Writing a pseudocode compiler (2) – Abstract syntax tree

In this article we’ll look at some of the design decisions to be made when implementing an abstract syntax tree (AST) in C++; it is called “abstract” because it is a (slight) simplification of the source text. There is no bison (or flex) code involved, just pure C++ and textual output of JavaScript. The AST has to contain sufficient information about the source to enable generation of the output code (this is almost true; usually at least a symbol table is needed as well, which is covered in a later article).

Firstly, for the name of the base class of the class hierarchy I chose Tree, as that is what it represents; other common names would be Node or AST. I decided against using smart pointers, or even a destructor, so memory claimed via new is only released when the program exits. (This would probably be the case even if we used the obvious choice of std::shared_ptr, so little would be gained except to slow the compiler down slightly.) Here is the relevant part of the class definition:

Continue reading “Writing a pseudocode compiler (2) – Abstract syntax tree”

Writing a pseudocode compiler (1) – Setting the scene

Having an interest in computer science usually means that one gravitates towards use and implementation of compiled languages. (Writing programs that write programs is more fun than writing programs!) Having explored the use of flex and bison by generating an expression evaluator (calculator program) in a previous mini-series, I wanted to progress onto implementing a full compiler. At the time of writing the compiler is feature-complete (for the pseudocode specification which was used – see under “Resources” below); however, it has several known (and probably more unknown) bugs and needs a slight code clean-up. The source code (with Linux and Windows build scripts) and a 32-bit Windows .exe are available to download from this page.

Continue reading “Writing a pseudocode compiler (1) – Setting the scene”

A use case for perfect forwarding

The Standard Library classes (such as std::string and std::vector) are not usually regarded as being able to be derived from. They are little black boxes whose functionality is clearly specified in the C++ Standard and which should not be redefined. However, let us put this to one side for a moment and consider that it should be possible, if not always desirable.

Say we want to add two new member functions to std::string: utf8at() (which takes an index and returns a char32_t), and an overload of append() (which takes a char32_t). There are a number of ways of doing this:
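
As a rough illustration of one such way (not necessarily the one the article settles on), the sketch below derives from std::string and perfectly forwards all constructor arguments to the base class. The class name utf8_string is an assumption, utf8at() is assumed to take a code point index (not a byte index) and to be given well-formed UTF-8, and the append() overload is left out for brevity:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical derived class; the name utf8_string is an assumption.
class utf8_string : public std::string {
public:
    // Perfectly forward any constructor arguments to std::string, so every
    // existing std::string constructor remains usable.
    template<typename... Args>
    utf8_string(Args&&... args) : std::string(std::forward<Args>(args)...) {}

    // Return the code point at position `index`, counted in code points.
    // Assumes the stored bytes are well-formed UTF-8.
    char32_t utf8at(std::size_t index) const {
        std::size_t i = 0;
        // Skip `index` code points: one lead byte plus its continuation bytes.
        for (; index > 0 && i < size(); --index) {
            ++i;
            while (i < size() &&
                   (static_cast<unsigned char>((*this)[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
        if (i >= size()) {
            throw std::out_of_range("utf8at: index out of range");
        }
        const unsigned char lead = static_cast<unsigned char>((*this)[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        for (int k = 1; k <= extra; ++k) {
            // at() throws std::out_of_range if the sequence is truncated.
            cp = (cp << 6) | (static_cast<unsigned char>(at(i + k)) & 0x3F);
        }
        return cp;
    }
};
```

One caveat with this design: an unconstrained forwarding constructor accepts any argument list at overload resolution and only fails when the base-class initialization is instantiated, so a production version would probably constrain it.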

Continue reading “A use case for perfect forwarding”

Calling C++ code from Kotlin

In case you didn’t already know, Kotlin is a fairly new yet surprisingly mature programming language that targets the JVM (Java Virtual Machine). Simply put, Kotlin compiles to the same bytecode that Java compiles to (compilation to JavaScript is also supported). Kotlin is fully supported for writing apps using Android Studio, hence its surge in popularity, and can call native C/C++ code in the same way that Java can, through the JNI (Java Native Interface), being the subject of this article.

The header file jni.h is a prerequisite for creating a suitable C++ module (actually a DLL .dll under Windows or a Shared Library .so under Linux/Android). This is in fact a C header, which is fully compatible with C++ too. However, the goal being C++ called from Kotlin, you will be pleased to learn that no actual C or Java code is required. Let’s take a look at the client (caller) code before looking at the native (callee) code (apologies for lack of syntax highlighting):

Continue reading “Calling C++ code from Kotlin”

A std::format primer

As mentioned in a previous post, the popular C++ {fmt} library has been added to Modern C++ (in the form of the standard header <format>). This library’s development into standardization has seen a number of considerable improvements over the original, which of course remains available to non-bleeding-edge versions of compilers. The latest Microsoft C++ compiler (version 19.29 at the time of writing) supports std::format and supporting functions (by using either #include <format> or import std.core; with /std:c++latest in the options). This compiler was used to test the example code below, also requiring using namespace std;.

Continue reading “A std::format primer”

C++ folding expressions

Modern C++ has support for passing parameter packs to functions. What may not be apparent is that in addition to forwarding them to other functions, or recursively to the same function, they can be manipulated in a type-safe manner by the same operation applied to each one in turn. The concept of folding (as this application is known) may be familiar to programmers coming from a functional programming background, or to those with a background in Math.

In C++, folding expressions come in left and right forms, both unary and binary. They all operate on a parameter pack, with the expansion being implied by the use of an ellipsis (...). The binary left form can be demonstrated using std::cout inside a variadic function:

Continue reading “C++ folding expressions”

Templates in C++ primer (4)

In this article we’re going to look at an application of Substitution Failure Is Not An Error (SFINAE). The use case described actually uses SFINAE twice, firstly to create a traits template very similar to the one already seen in this mini-series, and secondly to switch a function template instantiation on or off.

So let’s first define the problem. Other languages allow a class to define a .toString() method (or similarly named, Python uses .__str__()) in order to allow output of objects of that class. As a starting point we might try to define a template function along the lines of:

template<typename T>
ostream& operator<< (ostream& os, const T& obj) {
    return os << obj.to_string();
}

This naïve approach might work depending on your library implementation and code structure; problems arise when (as you might guess) an ambiguity occurs as to whether template instantiation should be attempted.

Continue reading “Templates in C++ primer (4)”

Templates in C++ primer (3)

In this article we’re going to look at how to output a std::tuple to a std::ostream, such as the familiar std::cout. The method used is an example of TMP (template metaprogramming), where the compiler generates code for us at compile time. In this case we need it to output every element of a std::tuple (with a separator) in a fully generic (any combination of types and tuple sizes) and type-safe way (as we would expect from C++).

But let’s not get ahead of ourselves. Firstly, let’s appreciate that templates can be used to output a std::pair (the element type for all associative containers, thus giving this code a use case). The actual effort involved is not great, literally just a one-line function (template):

Continue reading “Templates in C++ primer (3)”

Templates in C++ primer (2)

In the first part of this mini-series we looked at the different types of template parameter and the syntax involved in using them. In this part we’re going to look at two idioms that involve use of templates in C++, both of which have acronyms: CRTP and SFINAE.

Curiously Recurring Template Pattern

From the earliest days of the availability of templates for C++ (around the mid-nineties, shortly before the publication of the C++98 standard), CRTP has been so named and known about. The pattern it describes is not difficult to comprehend – a base class has a template type parameter which is the type of a (single) derived class. What does take some explaining is that this has a use case in fully describing compile-time polymorphism (as opposed to run-time polymorphism which uses virtual functions, or compile-time “duck-typing” which is the usual behavior of template type parameters).
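
The pattern can be sketched as follows; all of the class and function names here (Shape, Square, Rectangle, area_impl, twice_area) are hypothetical:

```cpp
// Base class template: Derived is the type of the class deriving from it.
template<typename Derived>
class Shape {
public:
    // Resolved at compile time; no virtual function table is involved.
    double area() const {
        return static_cast<const Derived&>(*this).area_impl();
    }
};

class Square : public Shape<Square> {
public:
    explicit Square(double side) : side_{side} {}
    double area_impl() const { return side_ * side_; }
private:
    double side_;
};

class Rectangle : public Shape<Rectangle> {
public:
    Rectangle(double w, double h) : w_{w}, h_{h} {}
    double area_impl() const { return w_ * h_; }
private:
    double w_, h_;
};

// Accepts any CRTP-derived shape; unrelated types are rejected, unlike
// a plain "duck-typed" template parameter.
template<typename T>
double twice_area(const Shape<T>& shape) {
    return 2 * shape.area();
}
```

Each Shape<T> is a distinct base class, so there is no common base through which a heterogeneous container of shapes could be held; that trade-off is the price of avoiding virtual calls.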

Continue reading “Templates in C++ primer (2)”

Templates in C++ primer (1)

Types of template

There are three different types of templated entity in C++, they are:

  • Class templates
  • Function templates
  • Variable templates

Of these, class templates are the most flexible as they can optionally be either fully or partially specialized. Function templates (including member function templates) can optionally be fully specialized, but not partially. A template declaration or definition differs from a normal declaration or definition by being prefixed with: template<parameters...>
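
The three kinds of template, and the forms of specialization described above, can be sketched like this (C++17 assumed; the names type_name, describe and pi are illustrative only):

```cpp
#include <string_view>

// Class template: primary definition.
template<typename T>
struct type_name {
    static constexpr std::string_view value = "unknown";
};

// Partial specialization: matches every pointer type.
template<typename T>
struct type_name<T*> {
    static constexpr std::string_view value = "pointer";
};

// Full specialization: matches exactly int.
template<>
struct type_name<int> {
    static constexpr std::string_view value = "int";
};

// Function template, with a full specialization for bool.
template<typename T>
std::string_view describe(const T&) { return "generic"; }

template<>
std::string_view describe<bool>(const bool&) { return "bool"; }

// Variable template.
template<typename T>
constexpr T pi = T(3.1415926535897932385L);
```

For function templates, plain overloading is often preferred to full specialization, because specializations do not take part in overload resolution.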

Types of template parameters

There are (again) three different types of template parameters in C++, they are:

Continue reading “Templates in C++ primer (1)”

Locking C++ streams for multi-threaded code

Let’s start by introducing a fun program which has a slight defect (can you spot it?):

#include <future>
#include <vector>
#include <iostream>

using namespace std;

void print(int n) {
    cout << "This is line " << n << '\n';
}

int main() {
    vector<future<void>> tasks;
    for (int i{}; i != 10; ++i) {
        tasks.push_back(async(launch::async, &print, i + 1));
    }
}
Continue reading “Locking C++ streams for multi-threaded code”

A handful of C++ idioms

Just like in natural (spoken) languages, C++ coders will probably use idioms from time to time. These are code patterns that perform a specific task, whose function may not be obvious to new C++ coders. Once learnt, however, they can be applied in a variety of settings; a few useful ones are listed here:

1 – Shrink-to-fit

Used with std::string and std::vector mostly, the aim is to reduce memory footprint in a running program. It may be thought that a construct such as s.reserve(s.size()) would do this; however, member function reserve() only takes action when increasing the amount of memory reserved for the container. The correct shrink-to-fit construct for a std::string object named s would be:

std::string{ s }.swap(s);
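
The same idiom can be wrapped in a small helper that works for both container types. This is a sketch only, and the name shrink_by_swap is an assumption:

```cpp
#include <string>
#include <vector>

// Copy the container into a temporary, which normally allocates only what
// the elements need, then swap the temporary's storage into the original.
// Parentheses are used rather than braces so that the copy constructor is
// always chosen over an initializer-list constructor.
template<typename Container>
void shrink_by_swap(Container& c) {
    Container(c).swap(c);
}
```

The capacity chosen for the copy is implementation-defined, so the result is usually, but not provably, the tightest fit. Since C++11 both containers also have a shrink_to_fit() member function, which makes a non-binding request to the same effect.
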
Continue reading “A handful of C++ idioms”

A Modern C++ and Unicode primer

Without doubt, the near-universal acceptance of Unicode has been one of the globalized software success stories of recent years. With its availability, internationalization (i18n) is no longer an esoteric topic, while localization (l10n) of software can, in many cases, be performed without either recompilation of, or even access to, the source-code.

The Unicode Standard (UCS-4) defines slightly over a million code points, which are often written as hexadecimal (e.g. U+20AC for the Euro currency symbol). A number of encodings exist in 32, 16 and eight-bit forms, the wider ones in both big- and little-endian variants (they are: UTF-32BE, UTF-32LE, UTF-16BE, UTF-16LE and UTF-8). The UTF-8 (Unicode Transformation Format – Eight Bit) encoding is possibly the most common and can encode any Unicode code point (from either UCS-2/16 bit or UCS-4/32 bit) into a code sequence of between one and four bytes in length.
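
How a code point maps onto one to four bytes can be sketched as follows. The function name to_utf8 is an assumption, and no validation is attempted (surrogate values and values above U+10FFFF are not rejected):

```cpp
#include <string>

// Encode a single code point as a UTF-8 byte sequence.
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 7 bits: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 11 bits: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 21 bits: 11110xxx followed by 3 x 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For the Euro sign, to_utf8(0x20AC) yields the three bytes E2 82 AC.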

Continue reading “A Modern C++ and Unicode primer”

Reference semantics for C++ classes

A number of other modern, object-oriented programming languages use the keywords struct and class but, unlike C++, they differ when making copies of objects of these types, possibly by assignment or by use as a function parameter. Simply put, structs (value types) are passed by value, while classes (reference types) are passed by reference. The goal, of course, is efficiency; a reference is cheap to copy.

This article intends to demonstrate that this is also possible with Modern C++, but before that a little (re-)introduction to a long-established coding principle is in order. The “pimpl” idiom (an abbreviation of private implementation) is a way of separating interface from implementation for reasons of code confidentiality and compilation speed. Consider the following class definition:

Continue reading “Reference semantics for C++ classes”

Logical and bitwise operators

Decisions in if-statements often have to be based on one or more conditions evaluating to (logically) true or false. There are three logical operators in C++ which operate on expressions and yield a boolean result; they are “and” (&&), “or” (||) and “not” (!). C++ also allows the alternative tokens and, or and not to be used instead of the more traditional symbolic versions; there is no difference in meaning between the two. (In C these names are macros, provided by the header <iso646.h>; in C++ they are built into the language.)

Both of the binary operators (“or” and “and”) “short-circuit” which means that the value of the first operand determines whether the second is evaluated at all:
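
A small sketch of the behavior; the function names first_is_positive and counted are hypothetical:

```cpp
#include <vector>

// Safe only because of short-circuiting: v.front() is never evaluated
// when v.empty() is true.
bool first_is_positive(const std::vector<int>& v) {
    return !v.empty() && v.front() > 0;
}

// Helper that records how many times it has been evaluated.
bool counted(int& calls, bool result) {
    ++calls;
    return result;
}
```

With an int calls initialized to zero, evaluating counted(calls, false) && counted(calls, true) leaves calls at one because the second operand is skipped; the same holds for counted(calls, true) || counted(calls, true).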

Continue reading “Logical and bitwise operators”