C++ Coroutines Primer (2)

In the previous article we looked at co_yield producing a result from a coroutine and co_return (without parameter) causing an early “return” from a coroutine. In this article we’re going to look at the third C++ coroutine keyword, being co_await, and introduce the C++ boilerplate code necessary to allow it to function.

While co_await and co_yield are closely related, their semantics are slightly different: co_await says “suspend the current coroutine function until further notice, then resume at this point when directed by the caller”, while co_yield says “return this value to the caller right away, then resume at this point when another value is requested”. When control flow through a coroutine arrives at co_await a;, where a is an object we’ll call a “coroutine context” (the C++ standard calls it an awaitable), a number of steps take place:

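  • The coroutine state object, which was allocated (usually on the heap, or free store) when the coroutine was first called, already holds the coroutine’s parameters and local variables, so they survive the suspension
  • a.await_ready() is called; if it returns false, the coroutine is suspended at this point
  • Control flow is allowed to continue via a.await_suspend(), a method of a, the coroutine context, passing a handle to the state object to that method; control then returns to the caller until the handle is resumed, at which point a.await_resume() runs and the coroutine carries on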

To illustrate this process in action, consider a function that acts as a progress meter, maintaining its own timing state and outputting to the terminal every so often (say every half-second). This could be implemented using threads, but we’ll choose to stay single-threaded by using a coroutine. (We expect to call the coroutine far more often than it would update its progress, but the cost in additional execution time will be small.)

Here’s a fun program which eats a lot of single-threaded CPU time, calculating all of the primitive (non-multiple) Pythagorean Triples (such as (3, 4, 5) and (5, 12, 13)) up to a set maximum length for the hypotenuse. (This code is not intended to demonstrate an efficient algorithm.)

#include <cstdlib>
#include <numeric>
#include <tuple>
#include <vector>

using SideT = unsigned;

int main(const int argc, const char *argv[]) {
    std::vector<std::tuple<SideT,SideT,SideT>> triples;
    SideT max_hypot = (argc == 2) ? std::atoi(argv[1]) : 2500;
    for (SideT c = 2; c != max_hypot; ++c) {
        for (SideT b = 2; b != c; ++b) {
            SideT a = 2;
            // advance a until a*a + b*b reaches or passes c*c
            while ((a != b) && ((a * a + b * b) < (c * c))) {
                ++a;
            }
            if ((a * a + b * b) == (c * c)) {
                if (std::gcd(a, b) == 1) {  // keep primitive triples only
                    triples.push_back({ a, b, c });
                }
            }
        }
    }
    // display results
}

Say we want to update our progress after every triples.push_back(), but only if 500ms or more has elapsed since the last update. Here are the supporting CoroCtx and AwaitCtx classes and the coroutine itself:

#include <chrono>
#include <coroutine>
#include <iostream>

struct CoroCtx {
    struct promise_type {
        CoroCtx get_return_object() { return {}; }
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void unhandled_exception() {}
        void return_void() noexcept {}
    };
};

struct AwaitCtx {
    std::coroutine_handle<> *hp;
    constexpr bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) { *hp = h; }
    constexpr void await_resume() const noexcept {}
};

CoroCtx progress(std::coroutine_handle<> *continuation_out, auto n, auto delay) {
    AwaitCtx a{ continuation_out };
    auto t0{ std::chrono::high_resolution_clock::now() };
    for (;;) {
        co_await a;
        if (auto t1{ std::chrono::high_resolution_clock::now() }; (t1 - t0) >= delay) {
            std::cout << "Found " << n() << " triples...\n";
            t0 = t1;
        }
    }
}

The CoroCtx class consists of just a nested struct named promise_type; this struct must provide a method get_return_object() which returns an instance of the outer class, together with initial_suspend(), final_suspend(), unhandled_exception() and return_void(). The AwaitCtx has a member variable of type pointer to std::coroutine_handle<>, which is how it passes the coroutine’s handle back to the caller, plus three methods, one of which (await_suspend()) writes the handle through this pointer. (These are the minimal definitions necessary.)

The progress() coroutine itself has CoroCtx as the return type and takes a pointer continuation_out, a callable n assumed to return an integer, and a delay as parameters; the last two are declared auto for convenience. This function is called only once, in order to initialize the local variables of the coroutine, a and t0. The co_await a; occurs inside an infinite loop, with the test (t1 - t0) >= delay producing output if met.

The adaptation of our previous silent program is not complex, requiring three additional lines in total:

// assumes using namespace std::chrono_literals; at file scope (for 500ms)
int main(const int argc, const char *argv[]) {
    std::vector<std::tuple<SideT,SideT,SideT>> triples;
    std::coroutine_handle<> running_total;
    progress(&running_total, [&triples]{ return triples.size(); }, 500ms);
    SideT max_hypot = (argc == 2) ? std::atoi(argv[1]) : 2500;
// ...
                if (std::gcd(a, b) == 1) {
                    triples.push_back({ a, b, c });
                    running_total();  // resume progress() at its co_await
                }
// ...
}

A coroutine-restarting callable running_total of type std::coroutine_handle<> is declared and its address passed to progress() (along with a trivial lambda function and time delay); the handle is filled in when the coroutine first reaches co_await. It is then invoked using running_total(); as often as we like, guaranteed to produce output only if the test in the coroutine after co_await is satisfied, and is assumed to have a low overhead (potentially less than using a separate thread). Sample output from a run is:

Found 141 triples...
Found 181 triples...
Found 210 triples...
Found 232 triples...
Found 250 triples...
Found 265 triples...
Found 280 triples...
Found 291 triples...
Found 307 triples...
Found 320 triples...
Found 329 triples...
Found 339 triples...
Found 348 triples...
Found 356 triples...
Found 367 triples...
Found 374 triples...
Found 381 triples...
Found 390 triples...
Finished with 395 triples:

That concludes our look at just about the simplest use of co_await. Further details about the technicalities of coroutines are available at the links below, as is the full source code for this article.

