How many ways to store a string?

C++ inherits its string-literal syntax from C: any number of characters enclosed between double quotes is copied verbatim into the generated executable (with a trailing NUL-byte added automatically). When such a literal is assigned to a “string” variable, what the variable actually holds is a pointer to the first character, and this points to a read-only part of the running program.

auto s1 = "abracadabra!"; // s1 is actually of type: const char *

To get write access to this string, a C-style array can be used; this also allows use of the begin()/end() family of function templates, so your “string” is now compatible with range-for.

char s2[] = "abracadabra!"; // s2 is of type: char[13]
s2[0] = 'A';
for (auto c : s2) { // Note: trailing NUL-byte is output
    cout << "- " << c << '\n';
}

Here the string-literal is still stored read-only, but its contents are copied into the stack space reserved for s2 when the function is entered. In addition, calling a function with s2 as an argument causes only a pointer to the first character to be passed; the length is “lost” (this feature, inherited from C, is known as array decay).
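To make array decay concrete, here is a minimal sketch contrasting a pointer parameter with a reference-to-array parameter. Both helper names (length_via_pointer and length_via_reference) are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a pointer parameter sees only the first character,
// so the length has to be recovered by scanning for the NUL-byte.
std::size_t length_via_pointer(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Hypothetical helper: a reference-to-array parameter does not decay,
// so N (which counts the trailing NUL-byte) is still part of the type.
template <std::size_t N>
constexpr std::size_t length_via_reference(const char (&)[N]) {
    return N;
}
```

Called with the s2 array from above, the first helper has to walk the string and reports 12 characters, while the second reads 13 straight from the type.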

To always pass the length along with the string, in cases where mutability is not required, C++17 provides std::string_view. As a function parameter, this type is constructible from built-in “string” types as well as std::string, and is not usually declared const or as a reference (&) in the parameter list, because a string_view is already a cheap-to-copy, read-only view. The standard literal suffix for std::string_view is: ""sv.

void f(string_view s) {
    for (auto c : s) {
        cout << "- " << c << '\n';
    }
}
//...
    f(s1); // Note: trailing NUL-byte is not output
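As a rough sketch of how flexible a std::string_view parameter is, the function below accepts a string-literal, a std::string, or a ""sv literal without copying any characters. The name count_char is invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::literals; // enables the ""sv and ""s suffixes

// Hypothetical helper: count occurrences of ch without copying the text.
std::size_t count_char(std::string_view s, char ch) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ch) {
            ++n;
        }
    }
    return n;
}
```

Taking a sub-view is equally cheap: count_char("abracadabra!"sv.substr(0, 4), 'a') only looks at "abra", and no new string is allocated.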

If you want to pass a (mutable) string by value to a function (as opposed to by reference), then look no further than std::array which, like std::string_view, does not use heap memory. The length of the string must be known at compile-time.

template <typename T,size_t N>
void g(array<T,N> s) {
    s[0] = toupper(s[0]);
    for (auto c : s) {
        cout << "- " << c << '\n';
    }
}
//...
    auto s4 = to_array("abracadabra!"); // Note: std::to_array needs C++20
                                        // type is: std::array<char,13>
                                        // (array s4{ "..." } would deduce std::array<const char *,1>)
    g(s4); // Note: trailing NUL-byte is output
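To show what pass-by-value means for the caller, here is a small sketch in which the function modifies and returns its own copy. The name capitalized is hypothetical, and the element type is fixed to char to keep things short.

```cpp
#include <array>
#include <cctype>
#include <cstddef>

// Hypothetical helper: s is a private copy of the caller's array,
// so the change to s[0] is visible only in the returned value.
template <std::size_t N>
std::array<char, N> capitalized(std::array<char, N> s) {
    static_assert(N > 0, "need at least one character");
    s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    return s;
}
```

Given std::array<char, 13> s4{ "abracadabra!" };, the result of capitalized(s4) starts with 'A' while s4 itself still starts with 'a'.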

To store a string on the heap, operator new[] can be used. However, don’t forget to pair every new[] with a corresponding call to delete[], or you’ll end up with a (silent) memory leak. Much safer and better in Modern C++ is to use a std::unique_ptr<char[]>.

auto s5 = new char[]{ "abracadabra!" }; // s5 is of type: char *
s5[0] = 'A';
delete[] s5;

unique_ptr<char[]> s6(new char[]{ "abracadabra!" }); // braces inside brackets
s6[0] = 'A';
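When the length is only known at run-time, std::make_unique<char[]> avoids writing new[] altogether. The following is a sketch under that assumption; heap_copy is a made-up name, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: copy a NUL-terminated string into an owned heap buffer.
// The buffer is released automatically when the unique_ptr goes out of scope.
std::unique_ptr<char[]> heap_copy(const char* src) {
    assert(src != nullptr);
    const std::size_t n = std::strlen(src) + 1; // +1 for the trailing NUL-byte
    auto buf = std::make_unique<char[]>(n);     // value-initialized (all zero)
    std::memcpy(buf.get(), src, n);
    return buf;
}
```

The returned buffer is writable through operator[], and get() yields a char * for C-style APIs.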

If std::array or operator new[] is too inflexible (as the length cannot easily be extended) and you want to use a standard container, you could consider std::vector<char>. Of course this always uses heap memory; however, it has the advantage that it can be passed to functions either by reference or by value. (Initializing directly from a string-literal is not possible.)

vector s7{ 'a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a', '!' };
s7[0] = 'A';
for (auto c : s7) {
    cout << "- " << c << '\n';
}
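Typing out every character soon gets tedious. One possible workaround, sketched here, is the iterator-range constructor of std::vector fed from a std::string_view; to_char_vector is a hypothetical helper name.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: build a vector<char> from any string-like source.
// Only the characters are copied; no trailing NUL-byte is stored.
std::vector<char> to_char_vector(std::string_view s) {
    return std::vector<char>(s.begin(), s.end());
}
```

The result behaves like s7 above, and unlike an array it can grow with push_back().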

Just a quick mention of a variation of the above, which is to use std::deque<char>. Being a double-ended queue it allows fast insertion at or deletion from the beginning as well as the end.
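A brief sketch of the double-ended behaviour, assuming you want to wrap some text in a pair of delimiters; the function name bracketed is invented for this example.

```cpp
#include <deque>
#include <string>
#include <string_view>

// Hypothetical helper: add one character at each end of the text.
// push_front() on a deque does not shift the existing elements.
std::string bracketed(std::string_view s) {
    std::deque<char> d(s.begin(), s.end());
    d.push_front('[');
    d.push_back(']');
    return std::string(d.begin(), d.end());
}
```

Note that a std::deque does not store its elements contiguously, so there is no data() member to hand to C-style functions.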

Which brings us (finally) to std::string. This has to be every C++ programmer’s first choice for a string class: it provides mutability, can be passed by value or by reference, and even avoids heap memory for short strings (typically those with length under 16; this is the short-string optimization). Just about everything possible with any of the previous examples can be performed with a std::string.

void h(const string& s) { // Note: pass by const-reference
    for (auto c : s) {
        cout << "- " << c << '\n';
    }
}
//...
    auto s8 = "abracadabra!"s; // or: string s8 = "abracadabra!";
    s8[0] = 'A';
    h(s8);
    h(s1); // Note: implicit conversion from const char * is permitted

So which to use? It’s hard to recommend anything other than std::string (or maybe std::u8string, available since C++20) as the storage type. For accepting a read-only “string” parameter, however, std::string_view is an attractive and efficient choice, as it can be initialized from both const char * and std::string. Both std::string and std::string_view can be used in constant expressions as of C++20, so it’s difficult to see a need for C-style arrays or new[]/delete[] even in metaprogramming. If heap memory must not be used and a container interface is required (front(), begin(), size() etc.), then you could consider std::array; however, the length is always fixed at compile-time.
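Putting that recommendation together, a function might accept std::string_view and return std::string, roughly as in the sketch below. The name greet is purely illustrative.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: read-only input as string_view, owned result as string.
std::string greet(std::string_view name) {
    std::string out = "Hello, ";
    out += name; // C++17: std::string can append a string_view directly
    return out;
}
```

Callers can pass a string-literal, a const char *, or a std::string without writing any overloads.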
