Understanding Iterator Invalidation

As introduced in the Tutorial on this site, practising correct use of iterators is essential for correct and efficient use of the Standard Containers. This article aims to uncover the details of pitfalls that novice and intermediate C++ programmers may encounter when accessing and modifying elements with iterators. To revisit familiar territory, there are a number of different iterator categories: Forward, bidirectional and random access, together with two variants on forward iterators, these being input iterators and output iterators.

Once the data structures involved in a particular container are understood, possibly from study of library source code/documentation, the reason for provision of the type of iterator becomes obvious. Forward iterators are the least featureful, allowing only traversal of the data structure from the beginning. Bidirectional iterators can go forward or backward by one element each time, while random access iterators can “jump” forwards or backwards by any number of elements, although it is the responsibility of the programmer to stay within the bounds of the container. An iterator can access a single element’s data with read or write access, while a pair of iterators specifies a (sub-)range within the container. (This is almost the full story; input iterators can only be dereferenced once before incrementing them, while output iterators can only be written to once before incrementing them.)

Iterators are therefore akin to pointers from the C origins of C++—efficient, lightweight and with a natural syntax (*iter to dereference, ++iter to increment, etc.). They are also type-safe: an std::vector<int>::iterator is not able to be substituted for an std::list<std::string>::iterator. Unfortunately, because they are not “owned” by their respective container, they can be made to point to the wrong location (possibly causing undefined behavior). This phenomenon is known as iterator invalidation.

If a container goes out of scope, all of its heap data is automatically destroyed and therefore all iterators would become invalidated. Due to scoping rules it is unlikely that any iterators would still be in existence themselves, so use of them would be impossible. Similarly if method clear() is invoked on any container, all iterators would be invalidated:

void f() {
    std::vector<double> vd{ 1.1, 2.2, 3.3 };
    auto iter = ++vd.begin();  // iter points to 2.2
    std::cout << *iter << '\n';
    vd.clear();  // all iterators invalidated
    std::cout << *iter << '\n';  // undefined behavior, may print garbage or cause program termination
}

It is difficult for the compiler to warn about such programming errors, so they must be manually detected by the test suite or during code review. Note that there is no such thing as a “checked iterator” (along the lines of std::vector‘s at() method) in the Standard Library.

To make matters more complex, different containers (each with one of the three main iterator categories outlined above) provide different (overlapping) operations each of which have varying rules for causing iterator invalidation. An incomplete list is provided in the following table:

ContainerInsertion of Element(s)Removal of Element(s)Other Operations
std::vector and std::stringInvalidates all iterators if the insertion causes a reallocation, otherwise iterators after the insertion point are invalidatedpop_back() invalidates only the iterator to the last element, erase() invalidates iterators at and after the point of removalresize(), reserve() and shrink_to_fit() cause reallocation and invalidate all iterators
std::dequepush_back(), push_front(), etc. may invalidate all iterators, especially if reallocation occurs, insert() invalidates iterators at and after the point of insertionpop_back() and pop_front() only invalidate iterators to the last or first element, respectively, erase() invalidates iterators at and after the point of removal.
std::list and std::forward_listInsertion does not invalidate iterators, as lists do not reallocate memoryerase() invalidates only the iterator to the erased elementMoving elements between lists with splice() does not invalidate iterators to other elements in either list
std::set, std::multiset, std::map, and std::multimapInsertion with insert() or emplace() does not invalidate iteratorserase() invalidates only the iterator to the erased element
std::unordered_set, std::unordered_map, std::unordered_multiset, std::unordered_multimapIf rehashing occurs (due to exceeding the load factor), all iterators are invalidated, otherwise no iterators are invalidatederase() invalidates only the iterator to the erased element
std::flat_mapAs for std::vector with the added consideration that the insertion point must maintain sorted order, iterators after this point are invalidatedAs for std::vector, erase() invalidates iterators pointing to the erased elements and any iterators after the erased elementAs for std::vector

As well as for the Standard Containers, iterators into C++20 Ranges (for example std::views::filter and std::views::transform) can become invalidated through write operations. This video from CppCon 2023 of a talk by Nicolai Josuttis features the following program which contains a bug:

#include <ranges>
#include <vector>
#include <iostream>

int main() {
    auto print = [](const auto& coll){ for (auto elem : coll) std::cout << elem << ' '; std::cout << '\n'; };
    auto isEven = [](auto&& i){ return i % 2 == 0; };

    std::vector<int> coll{ 1, 4, 7, 10 };
    print(coll);

    auto collEven = coll | std::views::filter(isEven);
    for (int& i : collEven) {
        i += 1;
    }
    print(coll);

    for (int& i : collEven) {
        i += 1;
    }
    print(coll);
}

The erroneous output (third line) from running this program is:

1 4 7 10 
1 5 7 11
1 6 7 11

Here 5 is incremented to 6 although it is not an even number. The reason for this (as explained in the talk) is that the view caches its begin() member, for performance reasons. The Standard states (again quoted from the talk):

Modification of the element a filter_view::iterator denotes is permitted, but results in undefined behavior if the resulting value does not satisfy the filter predicate.

For this reason it is difficult to recommend using std::ranges::views to modify elements in-place.

That just about wraps up this article. We’ve seen that some operation(s) (in addition to clear()) on different types of Standard Containers invalidate some or all iterators, and that in-place modification of std::views::filter() ranges are risky. The recommendation would be to audit all container method calls within a container’s iterator(s) lifetime, and to be aware of the fact that begin() is cached for views, meaning they should be treated as read-only.

Leave a comment