Designing Classes for Serialization (3)

In the previous articles we’ve looked at creating an in-memory XML hierarchy (or DOM), and at serializing from and deserializing to this memory model. In this article we’re going to look at querying the DOM in a similar fashion to the JavaScript methods, such as document.getElementById(), that are available in a browser. The scheme chosen is to overload operator [] (twice) and to provide other convenience methods such as elementName() and numberOfChildren().

The first overload of operator [], taking a size_t index, is the more obvious one: it dereferences the unique_ptr stored at that position in the member variable children and returns the result as a const XMLElement&. Out-of-bounds checking is always performed, rather than being left to a separate at() method as in the Standard Library containers.
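For comparison, here is a minimal sketch of the Standard Library convention that this design departs from: on std::vector, operator [] is unchecked, while at() throws std::out_of_range for a bad index. The helper name checkedAccessThrows is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of XMLElement): reports whether the
// bounds-checked at() rejects the given index.
bool checkedAccessThrows(const std::vector<int>& v, std::size_t idx) {
    try {
        (void)v.at(idx);   // at() checks idx against v.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;       // v[idx] here would instead be undefined behavior
    }
}
```

Folding the check into operator [] itself, as XMLElement does, trades a comparison on every access for never having an unchecked path.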

The second overload, taking a std::string_view index, is trickier. We don’t make a distinction between attributes and child elements: the attributes are searched first and a matching attribute’s value is returned, otherwise the value of the first leaf child element with a matching name is returned. Being able to return a const std::string& in either case is the key here, and it is why indexing with a string must be the last use of operator [] in an expression, and the only such use.

These two member functions in full are:

class XMLElement {
// ...
public:
    const XMLElement& operator[](size_t idx) const {
        if (idx >= children.size()) {
            throw XMLError("Index " + std::to_string(idx)
                + " out of range for element: " + name);
        }
        return *children[idx];
    }
    const std::string& operator[](std::string_view idx) const {
        auto iter1 = std::find_if(attributes.cbegin(), attributes.cend(),
            [&idx](const auto& attr){ return idx == attr.first;
        });
        if (iter1 != attributes.cend()) {
            return iter1->second;
        }
        auto iter2 = std::find_if(children.cbegin(), children.cend(),
            [&idx](const auto& child){ 
                return idx == child->name && !child->value.empty();
            });
        if (iter2 != children.cend()) {
            return (*iter2)->value;
        }
        throw XMLError("No such attribute or leaf child name "
            + std::string{ idx } + " for element: " + name);
    }
// ...

The first std::find_if call performs a linear search through the attributes (if any) using a suitable lambda. If a match for the name is found, that attribute’s value is returned. The second std::find_if call performs a similar search through any children, matching on the name being the same and the value being non-empty (which implies the child is a leaf element), and returns the value of the first match. Failure to match either an attribute or a leaf child causes an XMLError exception to be thrown.
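To see how the two overloads chain together, here is a minimal self-contained sketch. MiniElement is a hypothetical, simplified stand-in for XMLElement: it uses public members, assumes attributes are stored as name/value pairs, and throws std::runtime_error in place of XMLError. The root element name "Root" and the helper makeSample() are likewise made up for the illustration.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Simplified stand-in for XMLElement (hypothetical; public members and
// std::runtime_error instead of XMLError keep the sketch short).
struct MiniElement {
    std::string name, value;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<std::unique_ptr<MiniElement>> children;

    // Index by position: always bounds-checked.
    const MiniElement& operator[](std::size_t idx) const {
        if (idx >= children.size()) {
            throw std::runtime_error("Index " + std::to_string(idx)
                + " out of range for element: " + name);
        }
        return *children[idx];
    }
    // Index by name: attributes first, then leaf children.
    const std::string& operator[](std::string_view idx) const {
        for (const auto& attr : attributes) {
            if (idx == attr.first) return attr.second;
        }
        for (const auto& child : children) {
            if (idx == child->name && !child->value.empty()) return child->value;
        }
        throw std::runtime_error("No such attribute or leaf child name "
            + std::string{ idx } + " for element: " + name);
    }
};

// Builds <Root><Person id="41"><Name>Alice</Name></Person></Root> by hand.
MiniElement makeSample() {
    auto nameElem = std::make_unique<MiniElement>();
    nameElem->name = "Name";
    nameElem->value = "Alice";

    auto person = std::make_unique<MiniElement>();
    person->name = "Person";
    person->attributes.emplace_back("id", "41");
    person->children.push_back(std::move(nameElem));

    MiniElement root;
    root.name = "Root";
    root.children.push_back(std::move(person));
    return root;
}
```

Given MiniElement root = makeSample(), the expression root[0]["Name"] walks to the first child and then yields the string "Alice", and root[0]["id"] yields "41". A further [0] applied to either result would index into the std::string rather than into the element tree, which is why the string index has to come last.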

Other query functions, which return a std::vector<std::string>, are getAttributeNames() and getChildNames(). Their implementation is straightforward, and the remaining convenience functions are one-liners:

class XMLElement {
// ...
public:
    const std::string& elementName() const {
        return name;
    }
    size_t numberOfAttributes() const {
        return attributes.size();
    }
    size_t numberOfChildren() const {
        return children.size();
    }
    const std::vector<std::string> getAttributeNames() const {
        std::vector<std::string> attrs;
        std::for_each(attributes.cbegin(), attributes.cend(),
            [&attrs](const auto& attr){ attrs.push_back(attr.first);
        });
        return attrs;
    }
    const std::vector<std::string> getChildNames() const {
        std::vector<std::string> childNames;
        std::for_each(children.cbegin(), children.cend(),
            [&childNames](const auto& child){
                childNames.push_back(child->name);
            });
        return childNames;
    }
// ...

getAttributeNames() populates the attrs vector using the std::for_each algorithm and a capturing lambda, while getChildNames() does the same for childNames.
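As an alternative to std::for_each with a capturing lambda, the same collection step could be written with std::transform and std::back_inserter. This is only a sketch: the free function attributeNames() and the AttributeList alias are hypothetical, and the alias assumes attributes are stored as a vector of name/value pairs.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Assumed attribute storage for this sketch: name/value pairs in a vector.
using AttributeList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical free-function equivalent of getAttributeNames().
std::vector<std::string> attributeNames(const AttributeList& attributes) {
    std::vector<std::string> names;
    names.reserve(attributes.size());   // one allocation up front
    std::transform(attributes.cbegin(), attributes.cend(),
        std::back_inserter(names),
        [](const auto& attr){ return attr.first; });
    return names;
}
```

The lambda no longer needs to capture the output vector, and returning a non-const std::vector lets the caller move from the result.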

With these member functions defined, we can now test them out using a sample main():

int main() {
    try {
        XMLElement xml_doc(std::cin);
        auto children = xml_doc.numberOfChildren();
        std::cout << "Root element has " << children << " children\n";
        for (size_t c = 0; c != children; ++c) {
            std::cout << c << ": " << xml_doc[c].elementName() << ":\nAttributes:\n";
            for (const auto& n : xml_doc[c].getAttributeNames()) {
                std::cout << "  " << n << ": " << xml_doc[c][n] << '\n';
            }
            std::cout << "Children:\n";
            for (const auto& n : xml_doc[c].getChildNames()) {
                std::cout << "  " << n << ": " << xml_doc[c][n] << '\n';
            }
        }
    }
    catch (std::exception& e) {
        std::cerr << e.what() << '\n';
    }
}

The children count is first retrieved from the root element, and is then used as the bound when indexing the root element in the outer for loop. The first range-for loops over the return value of getAttributeNames(), using these names to index into the next-level element. The second range-for performs a similar loop over the return value from getChildNames(). Note that this second loop relies on every child being a leaf element with a value, as indexing with the name of a non-leaf child would throw. The output from running this program on the previously used XML data is:

Root element has 3 children
0: Person:
Attributes:
  id: 41
Children:
  Name: Alice
  Age: 25
  Location: New York
1: Person:
Attributes:
  id: 42
Children:
  Name: Bob
  Age: 30
  Location: Los Angeles
2: Person:
Attributes:
  id: 43
Children:
  Name: Charlie
  Age: 35
  Location: Detroit

That’s all for this time. In the next article of this mini-series we’ll look at how to convert the results of our XMLElement queries into in-memory data, and back again.
