In this final mini-series article we’ll cover using flex
to generate a C++ scanner, and use this with a slightly modified version of our previous bison
C++ parser program. A number of changes need to be made to both of the source files, as well as there being two new header dependencies, making for three header files in total (we don’t want to ask flex
to create a header, this time). The first of these new headers is called FlexLexer.h
and is distributed with the flex program in $PREFIX/include
where $PREFIX
is where flex was installed, such as /usr/local/
. (Under Windows this file is probably in the same directory as win_flex.exe
.) This header defines an abstract base class (ABC) FlexLexer
from which the implementation class (snappily named yyFlexLexer
) derives.
The problem we encounter is that the FlexLexer
hierarchy ultimately defines the lexer member function as int yyFlexLexer::yylex(void)
which is of no use to us, as we need to pass in the pointer to the semantic type of the token so it can be populated with the emplace<Type>(Value)
method. One way round this restriction is to derive again from yyFlexLexer
and declare a new member function, shown here as file Scanner2.hpp
(note this file cannot be auto-generated):
namespace calc { // note: depends upon FlexLexer.h and Parser2.hpp
class Scanner : public yyFlexLexer {
public:
Scanner(std::istream& arg_yyin, std::ostream& arg_yyout)
: yyFlexLexer(arg_yyin, arg_yyout) {}
Scanner(std::istream* arg_yyin = nullptr, std::ostream* arg_yyout = nullptr)
: yyFlexLexer(arg_yyin, arg_yyout) {}
int lex(Parser::semantic_type *yylval); // note: this is the prototype we need
};
} // namespace calc
Notice we’ve put this class definition into the calc
namespace as for the parser and have called the member function lex()
, not yylex()
. The constructors allow us to initialize the input and output streams when we create the calc::Scanner
object, defaulting to std::cin
and std::cout
. This is a key difference to the previous scanner: we use C++ streams not FILE*
‘s. (Other member functions of this class hierarchy allow for switching streams during scanning.)
Perhaps surprisingly, there aren’t very many changes necessary to the flex
input file, below is shown lexer2.l
:
%{
#include "Parser2.hpp"
#include "Scanner2.hpp"
#define YY_DECL int calc::Scanner::lex(calc::Parser::semantic_type *yylval)
%}
%option c++ interactive noyywrap noyylineno nodefault outfile="Scanner2.cpp"
dseq ([[:digit:]]+)
dseq_opt ({dseq}?)
frac (({dseq_opt}"."{dseq})|{dseq}".")
exp ([eE][+-]?{dseq})
exp_opt ({exp}?)
integer ({dseq})
float (({frac}{exp_opt})|({dseq}{exp}))
intvar ([[:upper:]])
fltvar ([[:lower:]])
%%
{integer} yylval->emplace<long long>(strtoll(YYText(), nullptr, 10)); return Parser::token::INT;
{float} yylval->emplace<double>(strtod(YYText(), nullptr)); return Parser::token::FLT;
{intvar} yylval->emplace<char>(YYText()[0]); return Parser::token::INTVAR;
{fltvar} yylval->emplace<char>(YYText()[0]); return Parser::token::FLTVAR;
"+" return Parser::token::PLUS;
"-" return Parser::token::MINUS;
"*" return Parser::token::MULTIPLY;
"/" return Parser::token::DIVIDE;
"%" return Parser::token::MODULO;
"!" return Parser::token::FACTORIAL;
"^" return Parser::token::EXPONENT;
"(" return Parser::token::LPAREN;
")" return Parser::token::RPAREN;
"=" return Parser::token::ASSIGN;
\n return Parser::token::EOL;
<<EOF>> return Parser::token::YYEOF;
. /* no action on unmatched input */
%%
int yyFlexLexer::yylex() {
throw std::runtime_error("Bad call to yyFlexLexer::yylex()");
}
int main() {
calc::Scanner scanner{ std::cin, std::cerr };
calc::Parser parser{ &scanner };
std::cout.precision(10);
parser.parse();
}
Lines 2-3 are new, as is the %option
line where we’ve substituted c++
instead of reentrant
. The output filename for the implementation file has changed slightly, and we also no longer auto-generate a header.
The patterns at lines 9-18 are identical to before, with the only change at lines 22-38 for the rules being the (purely stylistic) use of member function YYText()
instead of yytext
(which is now a member variable). Remember the code for the rules all goes into the member function calc::Scanner::lex()
, we control this visibility by setting the YY_DECL
macro at line 4.
Lines 42-44 define a dummy yyFlexLexer::yylex()
function so that we are able to instantiate this (super-)class when creating a calc::Scanner
object. Lines 46-51 define a slightly simpler main()
program than before, the logic of which should be clear as it achieves the same as before.
As before process with:
$ flex lexer2.l
Or under Windows:
> win_flex --wincompat lexer2.l
We still need the header Parser2.hpp
before compilation can take place. The (slightly modified) bison
input file grammar2.y
, follows:
%{
#include <iostream>
#include <string>
#include <cmath>
#include <FlexLexer.h>
%}
%require "3.7.4"
%language "C++"
%defines "Parser2.hpp"
%output "Parser2.cpp"
%define api.parser.class {Parser}
%define api.namespace {calc}
%define api.value.type variant
%parse-param {Scanner* scanner}
%code requires
{
namespace calc {
class Scanner;
} // namespace calc
} // %code requires
%code
{
#include "Scanner2.hpp"
#define yylex(x) scanner->lex(x)
}
%token EOL LPAREN RPAREN
%token <long long> INT
%token <double> FLT
%token <char> INTVAR FLTVAR
%nterm <long long> iexp
%nterm <double> fexp
%nonassoc ASSIGN
%left PLUS MINUS
%left MULTIPLY DIVIDE MODULO
%precedence UMINUS
%precedence FACTORIAL
%right EXPONENT
%code
{
namespace calc {
long long ivars['Z' - 'A' + 1];
double fvars['z' - 'a' + 1];
long long factorial(long long n) {
if (n < 2) {
return 1;
}
return n * factorial(n - 1);
}
} // namespace calc
} // %code
%%
lines : %empty
| lines line
;
line : EOL { std::cerr << "Read an empty line.\n"; }
| iexp EOL { std::cout << $1 << '\n'; }
| fexp EOL { std::cout << $1 << '\n'; }
| INTVAR ASSIGN iexp EOL { ivars[$1 - 'A'] = $3; }
| FLTVAR ASSIGN fexp EOL { fvars[$1 - 'a'] = $3; }
| error EOL { yyerrok; }
;
iexp : INT { $$ = $1; }
| iexp PLUS iexp { $$ = $1 + $3; }
| iexp MINUS iexp { $$ = $1 - $3; }
| iexp MULTIPLY iexp { $$ = $1 * $3; }
| iexp DIVIDE iexp { $$ = $1 / $3; }
| iexp MODULO iexp { $$ = $1 % $3; }
| MINUS iexp %prec UMINUS { $$ = -$2; }
| iexp FACTORIAL { $$ = factorial($1); }
| LPAREN iexp RPAREN { $$ = $2; }
| INTVAR { $$ = ivars[$1 - 'A']; }
;
fexp : FLT { $$ = $1; }
| fexp PLUS fexp { $$ = $1 + $3; }
| fexp MINUS fexp { $$ = $1 - $3; }
| fexp MULTIPLY fexp { $$ = $1 * $3; }
| fexp DIVIDE fexp { $$ = $1 / $3; }
| fexp EXPONENT fexp { $$ = pow($1, $3); }
| MINUS fexp %prec UMINUS { $$ = -$2; }
| LPAREN fexp RPAREN { $$ = $2; }
| FLTVAR { $$ = fvars[$1 - 'a']; }
;
%%
void calc::Parser::error(const std::string& msg) {
std::cerr << msg << '\n';
}
The changes up to line 15 are just the filenames modified slightly. The type and variable specified at line 16 becomes a member variable of the calc::Parser
class, and is forward-declared in the %code requires
block at lines 18-23 so that a pointer of this type can be used in the Parser2.hpp
file (this code appears near the top of the generated header).
The next %code
block at lines 25-29 allows the yylex()
call in the calc::Parser::parse
function to be translated into a member function call; the object scanner
is the member variable given as %parse-param
, which has member function lex()
, not yylex()
. The rest of the code at lines 31-102 is unchanged.
As before, this file is processed with:
$ bison grammar2.y
Or under Windows:
> win_bison grammar2.y
And the complete program can now be compiled:
$ g++ -o calc2 Scanner2.cpp Parser2.cpp
$ ./calc2
Or under Windows (note: modify /I.
to point to the location of the distributed FlexLexer.h
):
> cl /EHsc /FeCalc2.exe /I. Parser2.cpp Scanner2.cpp
> Calc2
In summary, flex
and bison
have always been able to work together when generating C code. We’ve discovered that interfacing them when generating C++ code is possible using the configuration options available, and can even be considered an elegant solution. The methods shown in this mini-series use some slightly tricky techniques to persuade the generated programs to interact, but the amount of “glue” code is small and the result is stable and scalable.
Resources
- Download or browse the source code for both example programs
- Flex on GitHub: westes/flex
- Bison on GitHub: akimd/bison
- Flex and bison manuals online
- Flex and bison for Windows (source and binary downloads)