Having an interest in computer science usually means that one gravitates towards use and implementation of compiled languages. (Writing programs that write programs is more fun than writing programs!) Having explored the use of
bison by generating an expression evaluator (calculator program) in a previous mini-series I wanted to progress onto implementing a full compiler. At the time of writing the compiler is feature-complete (for the pseudocode specification which was used – see under “Resources” below), however it has several known (and probably more unknown) bugs and needs a slight code clean-up. The source code (with Linux and Windows build scripts) and a 32-bit Windows .exe are available to download from this page.
When choosing a language to implement a compiler for, a good familiarity with the language itself is obviously a pre-requisite for all but the simplest programming languages. I decided against writing a compiler for a language that I knew already had such an implementation as I would be duplicating effort and other coders would naturally compare the two, possibly unkindly! My earliest introduction to programming was learning BBC BASIC in the early 1990s and this interest continued after I began to learn C/C++ around 2002. Most versions of interpreted BASIC dialects (leaving aside Microsoft Visual Basic) used upper-case keywords and I wanted to follow this convention. A web-search for a modern pseudocode specification (in the style of BASIC) led me to a download from the English AQA exam board’s website (direct link at the bottom of this page).
Liking the flavor of the language, I set myself a challenge of implementing a compiler for it using the tools with which I already had some degree of familiarity (
bison/C++). Not knowing whether a language has a suitable syntax to allow straightforward compilation, in other words whether it is possible to express it using a suitable grammar, means a bit of a stab-in-the-dark in trying to implement a compiler for it. This project’s
bison grammar specification in fact has no shift-reduce or reduce-reduce conflicts (grammar ambiguities), which may well not have been the case if I had designed my own language to try to implement.
-p, as well as the
Answering the second, according to the book [Compilers], a program that compiles code from one high-level language into another is a “source-to-source translator”. However, with the use of full syntax as well as semantic analysis and type checking to verify the correctness of the source program during parsing, I’m going to call this project a compiler.
As well as lexical, syntax and semantic analysis, the job of the (parser) front-end is to generate an in-memory a representation of the program. This representation is called an abstract syntax tree (AST) and usually takes the form of a (possibly highly asymmetric) binary tree. The (code generator) back end then usually just “walks” this tree in order to generate output code (almost as if this were a casual side-effect and not the point of the whole exercise!) Having chosen to use C++ to implement the compiler, use of a class hierarchy was the obvious way to create the AST. (By the way, use of
An AST can be described as a mathematical representation of a program, and as such is probably of more interest to mathematicians than most computer scientists: “If it produces correct code then its implementation is also probably correct,” may go the logic. Its existence forms a bridge between the source input and generated output, and its implementation is where knowledge of the syntax of the language being compiled matters the most. We’ll take a look at the implementation of this in the next article.
- AQA GCSE Computer Science (course code 8525) Pseudocode Specification (PDF). Please note that the provision of this link does not imply any endorsement from, or affiliation to, AQA exam board.
- Download the 32-bit Windows .exe and source code for this mini-series (hosted on GitHub), with Makefiles for Linux and Windows.
- [Compilers] Aho, Lam, Sethi, Ullman Compilers: Principles, Techniques and Tools (2nd ed., 2007). The first edition was known as the “Dragon Book” after its cover art, and the second edition remains the definitive book on the subject of compiler implementation. Looking on Amazon at the time of writing, it appears that the 12-chapter hardback “US Edition” may be a better purchase than the 11-chapter paperback “International Edition”.