Writing a pseudocode compiler (1) – Setting the scene

Having an interest in computer science usually means that one gravitates towards use and implementation of compiled languages. (Writing programs that write programs is more fun than writing programs!) Having explored the use of flex and bison by generating an expression evaluator (calculator program) in a previous mini-series I wanted to progress onto implementing a full compiler. At the time of writing the compiler is feature-complete (for the pseudocode specification which was used – see under “Resources” below), however it has several known (and probably more unknown) bugs and needs a slight code clean-up. The source code (with Linux and Windows build scripts) and a 32-bit Windows .exe are available to download from this page.

When choosing a language to implement a compiler for, a good familiarity with the language itself is obviously a pre-requisite for all but the simplest programming languages. I decided against writing a compiler for a language that I knew already had such an implementation as I would be duplicating effort and other coders would naturally compare the two, possibly unkindly! My earliest introduction to programming was learning BBC BASIC in the early 1990s and this interest continued after I began to learn C/C++ around 2002. Most versions of interpreted BASIC dialects (leaving aside Microsoft Visual Basic) used upper-case keywords and I wanted to follow this convention. A web-search for a modern pseudocode specification (in the style of BASIC) led me to a download from the English AQA exam board’s website (direct link at the bottom of this page).

Liking the flavor of the language, I set myself a challenge of implementing a compiler for it using the tools with which I already had some degree of familiarity (flex/bison/C++). Not knowing whether a language has a suitable syntax to allow straightforward compilation, in other words whether it is possible to express it using a suitable grammar, means a bit of a stab-in-the-dark in trying to implement a compiler for it. This project’s bison grammar specification in fact has no shift-reduce or reduce-reduce conflicts (grammar ambiguities), which may well not have been the case if I had designed my own language to try to implement.

Having had an email correspondence with AQA (once the compiler was more-or-less working) I understand that they do not either use or endorse products that are not produced in-house (other than textbooks). However they kindly gave permission to link to their GCSE pseudocode specification (meant for 14 to 16-year-olds to compose hand-written code fragments, possibly in a formal exam setting) and distribute my compiler for the same. This mini-series will discuss the design and implementation of an AQA Pseudocode to Javascript compiler written from scratch.

So why output Javascript? Isn’t this then just a translator, not a true compiler?

The answer to the first question is universality: Javascript runs in a browser or on the console, and just about everywhere in-between. To this end, I have created a simple two-pane web application (using the compiler as a CGI utility) that allows entry of code into the left-hand pane while executing Javascript in the right-hand pane. Also, the code output by the compiler is compatible with Mozilla’s Spidermonkey engine and Node.js (the latter needs a small support library provided by option -p, as well as the npm package readline-sync).

Answering the second, according to the book [Compilers], a program that compiles code from one high-level language into another is a “source-to-source translator”. However, with the use of full syntax as well as semantic analysis and type checking to verify the correctness of the source program during parsing, I’m going to call this project a compiler.

As well as lexical, syntax and semantic analysis, the job of the (parser) front-end is to generate an in-memory a representation of the program. This representation is called an abstract syntax tree (AST) and usually takes the form of a (possibly highly asymmetric) binary tree. The (code generator) back end then usually just “walks” this tree in order to generate output code (almost as if this were a casual side-effect and not the point of the whole exercise!) Having chosen to use C++ to implement the compiler, use of a class hierarchy was the obvious way to create the AST. (By the way, use of bison to create the front-end does not actually mean significantly less work than writing a compiler “by hand”, however the framework provided is useful in giving a structure to work within.) The AST developed for this project keeps all of the information needed to output Javascript code without a need for a separate back-end.

An AST can be described as a mathematical representation of a program, and as such is probably of more interest to mathematicians than most computer scientists: “If it produces correct code then its implementation is also probably correct,” may go the logic. Its existence forms a bridge between the source input and generated output, and its implementation is where knowledge of the syntax of the language being compiled matters the most. We’ll take a look at the implementation of this in the next article.

Resources:

  • AQA GCSE Computer Science (course code 8525) Pseudocode Specification (PDF). Please note that the provision of this link does not imply any endorsement from, or affiliation to, AQA exam board.
  • Download the 32-bit Windows .exe and source code for this mini-series (hosted on GitHub), with Makefiles for Linux and Windows.
  • [Compilers] Aho, Lam, Sethi, Ullman Compilers: Principles, Techniques and Tools (2nd ed., 2007). The first edition was known as the “Dragon Book” after its cover art, and the second edition remains the definitive book on the subject of compiler implementation. Looking on Amazon at the time of writing, it appears that the 12-chapter hardback “US Edition” may be a better purchase than the 11-chapter paperback “International Edition”.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s