Generating C++ programs with flex and bison (3)

In this final mini-series article we’ll cover using flex to generate a C++ scanner, and use this with a slightly modified version of our previous bison C++ parser program. A number of changes need to be made to both of the source files, as well as there being two new header dependencies, making for three header files in total (we don’t want to ask flex to create a header, this time). The first of these new headers is called FlexLexer.h and is distributed with the flex program in $PREFIX/include where $PREFIX is where flex was installed, such as /usr/local/. (Under Windows this file is probably in the same directory as win_flex.exe.) This header defines an abstract base class (ABC) FlexLexer from which the implementation class (snappily named yyFlexLexer) derives.

The problem we encounter is that the FlexLexer hierarchy ultimately defines the lexer member function as int yyFlexLexer::yylex(void) which is of no use to us, as we need to pass in the pointer to the semantic type of the token so it can be populated with the emplace<Type>(Value) method. One way round this restriction is to derive again from yyFlexLexer and declare a new member function, shown here as file Scanner2.hpp (note this file cannot be auto-generated):

namespace calc { // note: depends upon FlexLexer.h and Parser2.hpp

class Scanner : public yyFlexLexer {
public:
    Scanner(std::istream& arg_yyin, std::ostream& arg_yyout)
        : yyFlexLexer(arg_yyin, arg_yyout) {}
    Scanner(std::istream* arg_yyin = nullptr, std::ostream* arg_yyout = nullptr)
        : yyFlexLexer(arg_yyin, arg_yyout) {}
    int lex(Parser::semantic_type *yylval); // note: this is the prototype we need
};

} // namespace calc

Notice we’ve put this class definition into the calc namespace as for the parser and have called the member function lex(), not yylex(). The constructors allow us to initialize the input and output streams when we create the calc::Scanner object, defaulting to std::cin and std::cout. This is a key difference to the previous scanner: we use C++ streams not FILE*‘s. (Other member functions of this class hierarchy allow for switching streams during scanning.)

Perhaps surprisingly, there aren’t very many changes necessary to the flex input file, below is shown lexer2.l:

%{
#include "Parser2.hpp"
#include "Scanner2.hpp"
#define YY_DECL int calc::Scanner::lex(calc::Parser::semantic_type *yylval)
%}

%option c++ interactive noyywrap noyylineno nodefault outfile="Scanner2.cpp"

dseq            ([[:digit:]]+)
dseq_opt        ({dseq}?)
frac            (({dseq_opt}"."{dseq})|{dseq}".")
exp             ([eE][+-]?{dseq})
exp_opt         ({exp}?)

integer         ({dseq})
float           (({frac}{exp_opt})|({dseq}{exp}))
intvar          ([[:upper:]])
fltvar          ([[:lower:]])

%%

{integer}       yylval->emplace<long long>(strtoll(YYText(), nullptr, 10)); return Parser::token::INT;
{float}         yylval->emplace<double>(strtod(YYText(), nullptr)); return Parser::token::FLT;
{intvar}        yylval->emplace<char>(YYText()[0]); return Parser::token::INTVAR;
{fltvar}        yylval->emplace<char>(YYText()[0]); return Parser::token::FLTVAR;
"+"             return Parser::token::PLUS;
"-"             return Parser::token::MINUS;
"*"             return Parser::token::MULTIPLY;
"/"             return Parser::token::DIVIDE;
"%"             return Parser::token::MODULO;
"!"             return Parser::token::FACTORIAL;
"^"             return Parser::token::EXPONENT;
"("             return Parser::token::LPAREN;
")"             return Parser::token::RPAREN;
"="             return Parser::token::ASSIGN;
\n              return Parser::token::EOL;
<<EOF>>         return Parser::token::YYEOF;
.               /* no action on unmatched input */

%%

int yyFlexLexer::yylex() {
    throw std::runtime_error("Bad call to yyFlexLexer::yylex()");
}

int main() {
    calc::Scanner scanner{ std::cin, std::cerr };
    calc::Parser parser{ &scanner };
    std::cout.precision(10);
    parser.parse();
}

Lines 2-3 are new, as is the %option line where we’ve substituted c++ instead of reentrant. The output filename for the implementation file has changed slightly, and we also no longer auto-generate a header.

The patterns at lines 9-18 are identical to before, with the only change at lines 22-38 for the rules being the (purely stylistic) use of member function YYText() instead of yytext (which is now a member variable). Remember the code for the rules all goes into the member function calc::Scanner::lex(), we control this visibility by setting the YY_DECL macro at line 4.

Lines 42-44 define a dummy yyFlexLexer::yylex() function so that we are able to instantiate this (super-)class when creating a calc::Scanner object. Lines 46-51 define a slightly simpler main() program than before, the logic of which should be clear as it achieves the same as before.

As before process with:

$ flex lexer2.l

Or under Windows:

> win_flex --wincompat lexer2.l

We still need the header Parser2.hpp before compilation can take place. The (slightly modified) bison input file grammar2.y, follows:

%{
#include <iostream>
#include <string>
#include <cmath>
#include <FlexLexer.h>
%}

%require "3.7.4"
%language "C++"
%defines "Parser2.hpp"
%output "Parser2.cpp"

%define api.parser.class {Parser}
%define api.namespace {calc}
%define api.value.type variant
%parse-param {Scanner* scanner}

%code requires
{
    namespace calc {
        class Scanner;
    } // namespace calc
} // %code requires

%code
{
    #include "Scanner2.hpp"
    #define yylex(x) scanner->lex(x)
}

%token              EOL LPAREN RPAREN
%token <long long>  INT
%token <double>     FLT
%token <char>       INTVAR FLTVAR

%nterm <long long>  iexp
%nterm <double>     fexp

%nonassoc           ASSIGN
%left               PLUS MINUS
%left               MULTIPLY DIVIDE MODULO
%precedence         UMINUS
%precedence         FACTORIAL
%right              EXPONENT

%code
{
    namespace calc {
        long long ivars['Z' - 'A' + 1];
        double fvars['z' - 'a' + 1];

        long long factorial(long long n) {
            if (n < 2) {
                return 1;
            }
            return n * factorial(n - 1);
        }
    } // namespace calc
} // %code

%%

lines   : %empty
        | lines line
        ;

line    : EOL                       { std::cerr << "Read an empty line.\n"; }
        | iexp EOL                  { std::cout << $1 << '\n'; }
        | fexp EOL                  { std::cout << $1 << '\n'; }
        | INTVAR ASSIGN iexp EOL    { ivars[$1 - 'A'] = $3; }
        | FLTVAR ASSIGN fexp EOL    { fvars[$1 - 'a'] = $3; }
        | error EOL                 { yyerrok; }
        ;

iexp    : INT                       { $$ = $1; }
        | iexp PLUS iexp            { $$ = $1 + $3; }
        | iexp MINUS iexp           { $$ = $1 - $3; }
        | iexp MULTIPLY iexp        { $$ = $1 * $3; }
        | iexp DIVIDE iexp          { $$ = $1 / $3; }
        | iexp MODULO iexp          { $$ = $1 % $3; }
        | MINUS iexp %prec UMINUS   { $$ = -$2; }
        | iexp FACTORIAL            { $$ = factorial($1); }
        | LPAREN iexp RPAREN        { $$ = $2; }
        | INTVAR                    { $$ = ivars[$1 - 'A']; }
        ;

fexp    : FLT                       { $$ = $1; }
        | fexp PLUS fexp            { $$ = $1 + $3; }
        | fexp MINUS fexp           { $$ = $1 - $3; }
        | fexp MULTIPLY fexp        { $$ = $1 * $3; }
        | fexp DIVIDE fexp          { $$ = $1 / $3; }
        | fexp EXPONENT fexp        { $$ = pow($1, $3); }
        | MINUS fexp %prec UMINUS   { $$ = -$2; }
        | LPAREN fexp RPAREN        { $$ = $2; }
        | FLTVAR                    { $$ = fvars[$1 - 'a']; }
        ;

%%

void calc::Parser::error(const std::string& msg) {
    std::cerr << msg << '\n';
}

The changes up to line 15 are just the filenames modified slightly. The type and variable specified at line 16 becomes a member variable of the calc::Parser class, and is forward-declared in the %code requires block at lines 18-23 so that a pointer of this type can be used in the Parser2.hpp file (this code appears near the top of the generated header).

The next %code block at lines 25-29 allows the yylex() call in the calc::Parser::parse function to be translated into a member function call; the object scanner is the member variable given as %parse-param, which has member function lex(), not yylex(). The rest of the code at lines 31-102 is unchanged.

As before, this file is processed with:

$ bison grammar2.y

Or under Windows:

> win_bison grammar2.y

And the complete program can now be compiled:

$ g++ -o calc2 Scanner2.cpp Parser2.cpp
$ ./calc2

Or under Windows (note: modify /I. to point to the location of the distributed FlexLexer.h):

> cl /EHsc /FeCalc2.exe /I. Parser2.cpp Scanner2.cpp
> Calc2

In summary, flex and bison have always been able to work together when generating C code. We’ve discovered that interfacing them when generating C++ code is possible using the configuration options available, and can even be considered an elegant solution. The methods shown in this mini-series use some slightly tricky techniques to persuade the generated programs to interact, but the amount of “glue” code is small and the result is stable and scalable.

Resources

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s