Generating C++ programs with flex and bison (2)

In this second article we’ll get straight on with generating the C++ parser using bison. The source code is about twice the length of the scanner and is more complex in implementation, and it would be impossible to explain every part from scratch in one article. The main parts of the configuration to instruct bison to use a C++ parser class are covered. Interestingly the overall format is similar to flex‘s input file, being three sections separated by %% on a line by itself. Here is the bison input file grammar1.y:

%{
#include <iostream>
#include <string>
#include <cmath>
#include "Scanner1.hpp"
%}

%require "3.7.4"
%language "C++"
%defines "Parser1.hpp"
%output "Parser1.cpp"

%define api.parser.class {Parser}
%define api.namespace {calc}
%define api.value.type variant
%param {yyscan_t scanner}

%code provides
{
    #define YY_DECL \
        int yylex(calc::Parser::semantic_type *yylval, yyscan_t yyscanner)
    YY_DECL;
}

%token              EOL LPAREN RPAREN
%token <long long>  INT
%token <double>     FLT
%token <char>       INTVAR FLTVAR

%nterm <long long>  iexp
%nterm <double>     fexp

%nonassoc           ASSIGN
%left               PLUS MINUS
%left               MULTIPLY DIVIDE MODULO
%precedence         UMINUS
%precedence         FACTORIAL
%right              EXPONENT

%code
{
    namespace calc {
        long long ivars['Z' - 'A' + 1];
        double fvars['z' - 'a' + 1];
    
        long long factorial(long long n) {
            if (n < 2) {
                return 1;
            }
            else {
                return n * factorial(n - 1);
            }
        }
    } // namespace calc
} // %code

%%

lines   : %empty
        | lines line
        ;

line    : EOL                       { std::cerr << "Read an empty line.\n"; }
        | iexp EOL                  { std::cout << $1 << '\n'; }
        | fexp EOL                  { std::cout << $1 << '\n'; }
        | INTVAR ASSIGN iexp EOL    { ivars[$1 - 'A'] = $3; }
        | FLTVAR ASSIGN fexp EOL    { fvars[$1 - 'a'] = $3; }
        | error EOL                 { yyerrok; }
        ;

iexp    : INT                       { $$ = $1; }
        | iexp PLUS iexp            { $$ = $1 + $3; }
        | iexp MINUS iexp           { $$ = $1 - $3; }
        | iexp MULTIPLY iexp        { $$ = $1 * $3; }
        | iexp DIVIDE iexp          { $$ = $1 / $3; }
        | iexp MODULO iexp          { $$ = $1 % $3; }
        | MINUS iexp %prec UMINUS   { $$ = -$2; }
        | iexp FACTORIAL            { $$ = factorial($1); }
        | LPAREN iexp RPAREN        { $$ = $2; }
        | INTVAR                    { $$ = ivars[$1 - 'A']; }
        ;

fexp    : FLT                       { $$ = $1; }
        | fexp PLUS fexp            { $$ = $1 + $3; }
        | fexp MINUS fexp           { $$ = $1 - $3; }
        | fexp MULTIPLY fexp        { $$ = $1 * $3; }
        | fexp DIVIDE fexp          { $$ = $1 / $3; }
        | fexp EXPONENT fexp        { $$ = pow($1, $3); }
        | MINUS fexp %prec UMINUS   { $$ = -$2; }
        | LPAREN fexp RPAREN        { $$ = $2; }
        | FLTVAR                    { $$ = fvars[$1 - 'a']; }
        ;

%%

void calc::Parser::error(const std::string& msg) {
    std::cerr << msg << '\n';
}

The #includes at lines 2-5 are copied to the implementation file Parser1.cpp while the contents of the %code requires block at lines 20-22 is copied to the end of the header file Parser1.hpp. As we have seen, the scanner references this header file, and it is here that we control the signature of the generated yylex() function for the scanner implementation (recall that its default prototype is int yylex(void)). The pointer variable yylval is used when setting the semantic value of a token in the lexer.

Lines 8-16 configure the bison program to our needs meaning we don’t have to specify any command-line arguments. Using variant (not to be confused with C++17’s std::variant) is what generates the emplace<Type>(Value) methods used by the lexer. The line %param {yyscan_t scanner} is the “glue” that holds the main() function together as this parameter becomes both a member variable of calc::Parser and the second parameter used to call yylex() from within calc::Parser::parse.

Lines 25-38 declare the terminals, non-terminals and operators, each of which is given a unique integer token value (starting at 256). (The operators are listed in order of precedence low to high.) These are the same names as in the scanner, and generate fully qualified names such as calc::Parser::token::INT. Lines 40-55 contain C++ code which is copied verbatim to Parser1.cpp and consists of two array definitions and a support function. Lines 59-92 comprise the rules which reduce non-terminals via C++ code productions into values which can be output. Don’t be put off by the syntax $$, $1 etc., they are real type-safe variables that can be operated on like any other C++ variable. Importantly, the variable $$ is always the result of the production, which is saved for further use, while $N refers to the N’th token in the matched sequence.

The definition of the error() function after the second %% is necessary, and we could have put the main() function here too, or have put both in external source code files. This file can be processed using:

$ bison grammar1.y

Or under Windows:

> win_bison grammar1.y

We can now compile our generated C++ files in order to create a working program:

$ g++ -o calc1 Scanner1.cpp Parser1.cpp
$ ./calc1

Or under a Windows Visual Studio command prompt:

> cl /EHsc /FeCalc1.exe Parser1.cpp Scanner1.cpp
> Calc1

Both source files and a build script for Windows in a Zip file are available to download from part 3. If you’re interested in playing around with this program, here are a few ideas for features (in no particular order):

  • Check for divide-by-zero and output an error message
  • Conversion between integer and floating-point types
  • Allowing mixed-mode arithmetic
  • Outputting a prompt for user input, such as “? “
  • Set the output stream of the Parser class using a named member variable
  • Use of strings rather than characters for variable names
  • Access to built-in math functions log, sin etc.
  • Use of arbitrary-precision types instead of long long and double
  • Location tracking using bison‘s location type
  • Improved syntax error messages

In the final article of this mini-series we’ll look at the necessary changes to both source files in order to interface a C++ scanner class, still generated by flex, with a C++ parser class.

3 thoughts on “Generating C++ programs with flex and bison (2)”

  1. Hello there, I’m developer of winflexbison package.

    You mentioned a bug in the win_bison
    <<(Note: the version of win_bison.exe I compiled from source appears to have a small bug and would not accept the %header line;

    I checked bison original source code (v3.7.4) and didn't find any evidence of %header option (see parse-gram.y source file).

    As a current solution I propose to use `%defines "Parser.hpp"` that should work.

    If you know that unix bison accepts %header option, please create an issue on github and provide bison version which support it.

    Like

    1. Hi Alex,

      Many thanks for the solution, confirmed as working here with my win_bison.exe. I will update the article to reflect this.

      (The version of bison which allows “–header=” instead of “–defines=” is referenced as 3.8 in the manual, which is not yet a release.)

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s