New floating-point types in C++23

In common with the C language, C++ has provided the float, double and long double types since its first versions. (Did you know that on x86 long double typically carries 80 bits of extended precision, padded to 96 bits of storage on 32-bit platforms and to 128 bits on 64-bit ones?) In this article we discuss the <stdfloat> header, new in C++23, which provides types in the std:: namespace with explicit widths between 16 and 128 bits, regardless of hardware platform.

The first point to note is that even when this header is present (it isn't yet in Clang's libc++), support for all five of the new floating-point types is not guaranteed. Each type has a corresponding feature test macro, defined with value 1 when support is present. Even then, hardware support is not guaranteed: a type may instead be emulated in software by the standard library (or be an alias for an implementation-specific type).
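Before relying on any one of these types, a program can therefore probe the macros individually. Here is a minimal sketch, assuming a C++23 compiler that ships <stdfloat>:

#include <stdfloat>
#include <iostream>

int main() {
    // Each macro is predefined by the implementation, with value 1,
    // only when the corresponding alias is available in <stdfloat>.
#if defined(__STDCPP_FLOAT16_T__)
    std::cout << "float16_t supported\n";
#endif
#if defined(__STDCPP_FLOAT32_T__)
    std::cout << "float32_t supported\n";
#endif
#if defined(__STDCPP_FLOAT64_T__)
    std::cout << "float64_t supported\n";
#endif
#if defined(__STDCPP_FLOAT128_T__)
    std::cout << "float128_t supported\n";
#endif
#if defined(__STDCPP_BFLOAT16_T__)
    std::cout << "bfloat16_t supported\n";
#endif
}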

There are two new 16-bit types, float16_t and bfloat16_t. The latter (known as Brain Floating-Point, with applications in neural networks) has the same exponent range as 32-bit float, but with much reduced precision: only 8 bits. Then there are float32_t, float64_t and float128_t, the first two of which map exactly to float and double in precision and exponent range (and therefore enjoy native hardware support). On 32-bit machines, float128_t support most likely either does not exist or relies on software emulation, while hardware support for bfloat16_t is probably limited to the GPU.
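The trade-off between the two 16-bit types is easy to demonstrate. This sketch (compiled with a recent GCC, which supports streaming these types) contrasts their precision and range:

#include <stdfloat>
#include <iostream>
#include <limits>

int main() {
    // Same 16 bits of storage, different trade-offs: float16_t keeps more
    // significand bits, bfloat16_t keeps the float-sized exponent.
    std::cout << "float16_t  1.2 -> " << 1.2f16  << '\n';  // ~1.2002
    std::cout << "bfloat16_t 1.2 -> " << 1.2bf16 << '\n';  // ~1.20312
    std::cout << "float16_t  max: "
              << std::numeric_limits<std::float16_t>::max()  << '\n'; // 65504
    std::cout << "bfloat16_t max: "
              << std::numeric_limits<std::bfloat16_t>::max() << '\n'; // ~3.38953e38
}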

The feature test macros have names of the form __STDCPP_FLOAT128_T__ and can be tested with #if (support for them does not yet appear to be present in Clang). All of the types except bfloat16_t map to C native types of the form _Float128, which is useful if binary compatibility is required. There are also literal suffixes of the form f128 (or F128), available whenever the corresponding type alias is supported. It would appear that using namespace std::literals is not needed with either GCC or Clang, although output directly to streams is not currently supported by Clang.
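Two consequences of this can be checked at compile time. A small sketch, verified against GCC 13's behavior:

#include <stdfloat>
#include <type_traits>

// The suffixes are part of the core language, so no using-directive
// for a literals namespace is required.
constexpr auto x = 1.5f64;    // std::float64_t
constexpr auto y = 1.5bf16;   // std::bfloat16_t

// The aliases name extended floating-point types, which are distinct
// from the standard types even where the representation is identical.
static_assert(std::is_same_v<decltype(x), const std::float64_t>);
static_assert(!std::is_same_v<std::float32_t, float>);
static_assert(!std::is_same_v<std::float64_t, double>);

int main() {}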

Here is a sample program illustrating basic usage; its output is 1.20312 x 9.99922e+09 = 1.20125e+10:

#include <stdfloat>
#include <iostream>

#if __STDCPP_BFLOAT16_T__ != 1
#error No support for 16-bit brain float
#endif

int main() {
    auto a = 1.2bf16;             // deduced as std::bfloat16_t
    std::bfloat16_t b = 1e10bf16; // literal suffix needed: no implicit conversion from double
    std::cout << a << " x " << b << " = " << a * b << '\n';
}

The table below shows the literal suffixes, C language types and ranges for all five types defined in the header <stdfloat>:

Type         Literal suffix   C language type   Bits of storage   Bits of precision   Bits of exponent   Max exponent
float16_t    f16 or F16       _Float16          16                11                  5                  15
float32_t    f32 or F32       _Float32          32                24                  8                  127
float64_t    f64 or F64       _Float64          64                53                  11                 1023
float128_t   f128 or F128     _Float128         128               113                 15                 16383
bfloat16_t   bf16 or BF16     (N/A)             16                8                   8                  127
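These properties can be cross-checked with std::numeric_limits. Note that its max_exponent is, by convention, one greater than the IEEE maximum exponent shown in the table:

#include <stdfloat>
#include <limits>

// digits counts significand bits, including the implicit leading bit.
static_assert(std::numeric_limits<std::float16_t>::digits == 11);
static_assert(std::numeric_limits<std::bfloat16_t>::digits == 8);
static_assert(std::numeric_limits<std::float128_t>::digits == 113);

// max_exponent is one greater than the table's "max exponent" column.
static_assert(std::numeric_limits<std::float32_t>::max_exponent == 127 + 1);
static_assert(std::numeric_limits<std::bfloat16_t>::max_exponent == 127 + 1);

int main() {}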

It would appear that narrowing conversions between these types are never implicit, so unless using auto, the type specifier must match the literal suffix exactly (implicit conversions to a type of greater or equal rank, such as float16_t to float32_t, are allowed). Also, mixed-mode arithmetic between bfloat16_t and float16_t is not provided, since neither type's values are a subset of the other's, and arithmetic on the 16-bit types is not automatically promoted to a wider type.
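The following sketch illustrates these rules (checked against GCC; the commented-out lines should fail to compile):

#include <stdfloat>
#include <iostream>

int main() {
    std::float16_t h = 1.0f16;
    std::bfloat16_t b = 1.0bf16;

    // Widening is implicit: every float16_t value is representable
    // as float32_t, so this conversion is allowed.
    std::float32_t w = h;

    // std::float16_t n = 1.0f32;  // error: implicit narrowing not allowed
    // auto m = h * b;             // error: neither 16-bit type's values are a
                                   // subset of the other's, so the usual
                                   // arithmetic conversions fail

    // Explicitly widening both operands sidesteps the problem.
    std::float32_t product = std::float32_t(h) * std::float32_t(b);
    std::cout << w << ' ' << product << '\n';
}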

To summarize, the <stdfloat> header provides a standardized way for C++ code to target floating-point hardware at a specified precision. Fixed-width types are beneficial where precise memory control or specific performance characteristics are required, such as in embedded systems or high-performance computing. Should hardware support arrive later, a simple recompile is all that is needed.
