In this article we’ll look at some of the design decisions to be made when implementing an abstract syntax tree in C++, called “abstract” because of being a (slight) simplification of the source text. There is no bison
(or flex
) code involved, just pure C++ and textual output of Javascript; the AST has to contain sufficient information about the source to enable generation of the output code (this is almost true, usually at least a symbol table is needed as well, this is covered in a later article).
Firstly to name of the base class of the class hierarchy I chose class Tree
as that is what it represents, other common names could be Node
or AST
. I decided against using smart pointers, or even a destructor, so memory claimed via new
is only released when the program exits. (This would probably be the case even if we used the obvious choice of std::shared_ptr
, so little would be gained except to slow the compiler down slightly.) Here is the relevant part of the class definition:
class Tree {
public:
Tree(Tree *l, Tree *r)
: left{ l }, right{ r } {}
void link(Tree *next) {
if (right) {
throw std::runtime_error("cannot link");
}
right = next;
}
virtual void emit() = 0;
protected:
Tree *left{ nullptr }, *right{ nullptr };
// ... definition of the Indenter class
};
Notice there is no default constructor—only a constructor taking two Tree*
pointers is defined—therefore every immediately derived class must call Tree(l, r)
from its own constructor explicitly. In addition, r
(right
) will always initially be nullptr
in this scheme. Tree elements in the same logical block (as would be between {
and }
in the Javascript code to be output) are manually linked by calling the member function link()
. Also, emit()
is a pure virtual function which must be overridden, meaning that class Tree
is an abstract base class, or ABC; a typical implementation of emit()
in a derived class would be:
virtual void emit() override {
// ... do stuff with this->left
if (right) {
right->emit();
}
}
Should the non-null check for right
be omitted, then the output would end early! From this design it can be seen that an external function or class which “walks the tree” is unnecessary; the emit()
member function provides this capability, which means that calling this member on the head node is all that is required to output the Javascript code (assuming that the tree is fully and correctly constructed). We will need a minimal class Empty
to act as dummy nodes, and this uses the above minimal definition of emit()
.
Consider the following code fragment (written in the pseudocode language which we are translating from):
A <- 10
WHILE A > 0
OUTPUT INT_TO_STRING(A)
A <- A - 1
ENDWHILE
OUTPUT 'Blastoff!'
This can be represented in tree form as follows:

The tree starts from the top left and continues to the bottom right (when considered as a fragment and not a complete program, which it also happens to be). We’ve needed to use expressions involving A
quite a lot, so a class Expression : public Tree
can usefully be created, another ABC:
enum class ExpI { None, BoolT, IntT, StringT, /* other types */ };
using BoolT = bool;
using IntT = int;
using StringT = std::string;
// ... other types
using ExpT = std::variant<std::monostate,BoolT,IntT,StringT, /* other types */>;
class Expression : public Tree {
public:
Expression(Tree *l = nullptr, Tree *r = nullptr)
: Tree{ l, r } {}
virtual bool isConstant() = 0;
virtual ExpI type() = 0;
virtual ExpT apply() = 0;
protected:
// ... definitions of valid operations for this->left and this->right
};
The member function type()
, which is overridden for sub-classes Value
and Variable
, has its value set exactly once, when the entity it refers to first appears in the source code. In an initial assignment a <- 3
, the type of a
is synthesized from the initialization expression, the type of which becomes the type of this variable. (The opposite occurs with an initial assignment in C such as int i = 4.2;
, where the type of i
is inherited from its assumed parent node int
, causing a truncation to 4
when the assignment takes place.) Synthesized attributes are difficult to implement in an LL parser, however in an LR parser the leaves of the AST are resolved first meaning that attribute values can be passed “up” to the parent Assign
node using virtual
functions.
Implementing the CONSTANT
keyword (after the syntax of constexpr
from Modern C++) was seen as a challenge, as was implementing type-checking as if we were outputting TypeScript, as opposed to plain Javascript. The constant-calculation function apply()
can only be called if isConstant()
is true
, which is the case if left->isConstant()
and right->isConstant()
both return true
. Member function type()
returns an enum class
indicating the types of members left
and right
; mixed mode arithmetic (operands of different types) is not allowed:
enum class Operator{ none, plus, minus, negative, multiply, divide, DIV, MOD, AND, OR, NOT, equal, not_equal, less_than, less_equal, greater_equal, greater_than };
class ExpressionOp : public Expression {
const Operator op;
public:
ExpressionOp(Tree *l = nullptr, Tree *r = nullptr, Operator o = Operator::none)
: Expression{ l, r }, op{ o } {}
// ... implementations of isConstant(), type() and apply()
};
A design decision made after study of the specification was to disallow mixed-mode operations, where the types of the operands differ. Therefore a check that left.type()==right.type()
is made, and a compile-time error is produced if they do not. The concrete class ExpressionOp
contains a single member variable op
which holds an arithmetic or relative operator. (Purists who argue that binary tree instances should not hold member variables other than left
and right
would probably concede this one.) We have followed the scheme that tree-referencing classes such as class While
should not usually have sub-trees as member variables (in order to maintain the tree paradigm), hence the need for a class WhileCond
(indicated as test
in the above diagram):
class WhileCond : public Tree {
public:
WhileCond(Expression *c, Tree *b)
: Tree(c, b) {}
virtual void emit() override {
output << '(';
left->emit();
output << ") {\n";
if (right) {
++indent;
right->emit();
--indent;
}
output << "}\n";
}
};
class While : public Tree {
public:
While(Expression *c, Tree *b)
: Tree(new WhileCond(c, b), nullptr) {}
virtual void emit() override {
output << indent << "while ";
left->emit();
if (right) {
right->emit();
}
}
};
So, our AST-building class hierarchy has begun to take shape, so we’re getting ready to launch into creation of our parser, which will create instances of these classes from the syntax rules of the language. Actually, in the next article we’re going to look at creating the scanner (or lexer), jumping back to the first stage of the (single) compilation pass, lexical analysis.