Evaluating expressions using Reverse Polish Notation (1)

Most Mathematicians write expressions using infix notation, where the precedence and associativity of each operator is implied by standard math conventions. (Thus 1+2×3 is 7, not 9.) In 1924 the Polish logician Jan Łukasiewicz postulated prefix notation, which he called “Polish Notation”. A possible application of Polish Notation is translating expressions as named function calls (assuming suitably-named functions are available), such as: sum(1, product(2, 3)). However this isn’t particularly useful as function call overhead would in practice make this method of evaluation inefficient.

Around the time of the first Fortran compiler, the application of postfix notation was explored, which was named Reverse Polish Notation (RPN). In this form, the operators are written after the operands, thus: 1 2 3 × +. The method for parsing an expression written in Reverse Polish Notation using a Stack Machine is as follows:

In the case of a number, push it onto the top of the stack
In the case of an operator, pop two numbers off the stack, apply the operator, and push the result back onto the stack
The outcome from parsing a well-formed RPN expression should be a stack containing exactly one element, this being the value of the final result

A useful property of this method is that an sufficiently deep stack can cope with (infix) expressions of arbitrary complexity, including the use of parentheses. The challenge of converting infix to postfix is the subject of this mini-series.

Let’s start with the easy part which is a stack “simulator” which accepts a number to push or an operator to apply, and just prints them out; the order of this output will be the expression converted to Reverse Polish notation. As we’re planning to extend the program later, a class hierarchy is used where StackPrinter inherits from an abstract base class (ABC) StackMachine.

// rpn-calc-printer.cpp : convert an infix expression to reverse polish notation

#include <iostream>
#include <cctype>

using std::cout;
using std::cerr;
using std::cin;
using std::istream;
using std::ostream;

using Number = double;

enum class Op : char {
    add = '+',
    subtract = '-',
    multiply = '*',
    divide = '/',
    number = '.',
    none = '\0'
};

class StackMachine {
public:
    virtual ~StackMachine() = default;
    virtual void apply_operator(Op op) = 0;
    virtual void push_number(Number n) = 0;
};

class StackPrinter final : public StackMachine {
    ostream& out;
public:
    StackPrinter(ostream& os = cout) : out{os} {}
    virtual ~StackPrinter() {
        out << '\n';
    }
    virtual void apply_operator(Op op) override {
        out << ' ' << static_cast<char>(op);
    }
    virtual void push_number(Number n) override {
        out << ' ' << n;
    }
};

Lines 3-10 cover our use of the Standard Library. The using-declaration at line 12 allows the number type to be changed later with just one change to the code, while the strongly typed enum defined at lines 14-21 covers all of the operators we will need (and has its underlying type set to char).

The definition of the StackMachine base class at lines 23-28 should present no surprises if your knowledge of pure virtual syntax is up to speed. The concrete class StackPrinter at lines 30-43 is designed to print all of its numbers and operators on one line, which is why its destructor prints out a new-line.

Moving on to the ExpressionEvaluator class it is clear that this needs to take a reference to a StackMachine and store this as a class variable. It will also need a reference to an input stream and a single variable to store the lookahead symbol, which is a struct named Symbol with two fields. The ExpressionEvaluator class will perform top-down recursive descent parsing, and luckily the Mathematical expressions we will use only need us to look one symbol ahead (the input “language” is said to have an LL(1) grammar).

struct Symbol {
    Op sym;
    Number num;
};

class ExpressionEvaluator {
    Symbol lookahead;
    istream& in;
    StackMachine& target;
    void next();
    void term();
    void factor();
public:
    ExpressionEvaluator(StackMachine& sm, istream& is = cin)
        : target{sm}, in{is} { next(); }
    void expression();
};

int main() {
    StackPrinter sp;
    ExpressionEvaluator eval(sp);
    eval.expression();
}

Lines 45-48 define the Symbol struct, where the num field is only valid if the sym field is Op::number. (A suitable std::variant type could be used instead, but would probably use more memory and need manual type-checking.) The ExpressionEvaluator class declaration at lines 50-61 only has one public member function besides its construtor, with three private member variables and three private member functions. The variable lookahead stores the “lookahead symbol” which will be set by calling next(), and this is what the constructor does. The member function expression() is intended to be called just once in order to begin the process of parsing the complete expression. The main() program at lines 63-67 sets up the necessary objects and calls this function.

All that is left is implementing the four member functions of the ExpressionEvaluator class, all of which operate on the lookahead symbol without taking any parameters or returning a value.

void ExpressionEvaluator::next() {
    char c{};
    in >> c;
    if (isdigit(c) || (c == '.')) {
        in.unget();
        in >> lookahead.num;
        lookahead.sym = Op::number;
    }
    else {
        lookahead.num = Number{};
        lookahead.sym = static_cast<Op>(c);
    }
}

void ExpressionEvaluator::expression() {
    term();
    while ((lookahead.sym == Op::add) || (lookahead.sym == Op::subtract)) {
        Op op{ lookahead.sym };
        next();
        term();
        target.apply_operator(op);
    }
}

void ExpressionEvaluator::term() {
    factor();
    while ((lookahead.sym == Op::multiply) || (lookahead.sym == Op::divide)) {
        Op op{ lookahead.sym };
        next();
        factor();
        target.apply_operator(op);
    }
}

void ExpressionEvaluator::factor() {
    if (lookahead.sym == Op::number) {
        target.push_number(lookahead.num);
        next();
    }
    else {
        cerr << "Unknown operator: "
            << static_cast<char>(lookahead.sym) << '\n';
    }
}

The lookahead symbol is read by a call to next() which is defined at lines 69-81. Only two types are supported, number or operator, and the operator symbol is read without checking it is valid if a number cannot be read. The expression() member function at lines 83-91 calls term() straightaway; this is an indication of recursive descent parsing (although the program does not construct a binary “tree”). Then it continues to handle all cases of add and subtract in a while loop, which stores the operator, calls next() again and then term(), before applying its previous operator to the StackMachine member.

The term() member function at lines 93-101 is an exact copy of the previous member function except that it calls factor() and handles multiply and divide. By the time factor() is called, defined at lines 103-112, we are only interested in numbers, so any operator that gets this far is treated as an error. A valid number is pushed to the StackMachine member. The complete program is listed here.

Next time we’ll look at extending the parser to handle negative numbers, the exponentiation operator and parentheses, and will write a stack machine that performs the evaluation to a result. Note that using std::cin has its quirks, so you probably need to type expression, Enter, Ctrl-D (or expression, Ctrl-Z, Enter under Windows). We’ll fix this next time, too.

Evaluating expressions using Reverse Polish Notation (1)

Published by cpptutor

Leave a comment Cancel reply

Share this:

Published by cpptutor

Leave a comment Cancel reply