In this part we’re going to look at how the bison
grammar rules, specified in the file “grammar.y”, can be made to match against the stream of tokens which the parser requests (this process being called syntax directed translation). Ultimately we want to output correctly generated code by through walking the abstract syntax tree, created in memory by the parser program we wrote. We’ll also take a look at how the global symbol table plays a part in early type checking, as the syntax tree is being constructed.
When generating the abstract syntax tree we do, of course, have to make sure all relevant structural and semantic details from the source text are both checked for correctness and preserved so that (output) code generation can take place. Let’s look first at the IF
/THEN
/ENDIF
and IF
/THEN
/ELSE
/ENDIF
constructs which are both part of the language specification (leaving aside the open-ended chaining of optional ELSE
IF
clauses also specified for the language). The rules are specified in the bison
input file as follows:
Stmnt : // ... other statement rules
| IF BoolExp THEN EOL Marker Block ELSE EOL Marker Block ENDIF EOL {
$$ = new If($2, $6, $10);
$5->link($$);
prev = $$;
}
| IF BoolExp THEN EOL Marker Block ENDIF EOL {
$$ = new If($2, $6);
$5->link($$);
prev = $$;
}
// ... further statement rules follow
Marker : %empty { $$ = prev; prev = new Empty; }
;
The order in which the rules are defined should not matter (as long as there are no ambiguities, also known as conflicts) and the more complex version with an ELSE
clause is shown first. This rule extracts the associated semantic values for the second, sixth and tenth matches and forwards them to the If()
constructor (the types involved being Expression*
, Tree*
and Tree*
). However the new If()
instance (itself treated as a Tree*
) needs to be attached at the correct point in the parse tree, as well as its two sub-trees maintaining their own element(s) correctly.
Recall that the variable prev
was set to the head node on entry to the parsing function when calling the constructor in main()
. Being a single-value (non-array-typed) variable it is unable to cope with nested blocks, even with a simple IF
statement at global scope. To avoid having to use a stack, the value of prev
needs to be saved so that it can be used and then reset in order to attach the statement following the ENDIF
to the correct place. To do this we have made a rule called Marker
which stores the current value of prev
(a Tree*
) as its own semantic value. This is available (without the need for either a mid-rule action or a stack) to the body of the IF
/THEN
/ENDIF
rule as its fifth semantic value. (Therefore to attach the If()
tree we simply use $5->link($$)
, not prev->link($$)
as for other statements). The use of markers in this way, which match against empty, are described in [Compilers] $5.5.4; they have a side-effect of turning any grammar which was LL into LR, but that is harmless for our purposes.
In the Marker
rule we start a new block with new Empty
, which provides a stub for the rest of the sub-tree to be attached to, and which eventually (through the default production for Stmnts
and Block
) becomes the pointer for the parameters to If()
. Going back to the other tree, which is the IF
-statement’s conditional expression, this is again recursively constructed from an arbitrarily complex expression whose type is checked against Boolean by another rule:
BoolExp : Exp {
if ($1->type() != ExpI::BoolT) {
error(@1, "non-boolean expresssion in conditional context");
YYERROR;
}
$$ = $1;
}
;
A new variable is created by syntax such as ID <- new_value
, and this is both a statement and an entry for the symbol table, which keeps track of types (as we’ve interpreted the specification as defining a statically-typed language where types are known or deduced at compile-time). Here is the rule for assignment to a variable:
| ID ASSIGN Exp EOL {
if (!table->check($1).first) {
table->store($1, $3->type());
$$ = new Assign($3, $1);
prev->link($$);
prev = $$;
}
else {
if ($3->type() == table->type($1)) {
$$ = new Assign($3, $1);
prev->link($$);
prev = $$;
}
else {
error(@1, "variable previously assigned with different type: " + $1);
YYERROR;
}
}
}
This rule needs to check whether the assignment is to a variable name already encountered in this scope, which it does by calling the symbol table’s check()
method. If it hasn’t then a new table entry is created. In either case a new Assign()
instance is created and linked into the tree in the same way as for other statements.