d-ptr pitfalls

It’s been a week of obscure-bug-hunting. A bug report (PR in FreeBSD words) pointed out that KOrganizer was crashing. Since it works on Linux, it falls to the KDE-FreeBSD team to figure it out. A quick session with lldb shows it’s a nullptr-dereference. Since we had exciting nullptr differences before, my first idea is to look for smart pointers. And they’re there, and can lead the unaware into a bog, a fen, a morass of undefined behavior (but it happens to work on Linux).

Symptoms and Implementations

A “smart” pointer manages a chunk of memory it points to. Examples in Qt include QScopedPointer and QSharedPointer. C++ standard examples are std::shared_ptr and std::unique_ptr.

Smart pointers need a particular implementation: the Qt source code implements the Q-flavored ones (in Qt source code, and there’s only one Qt source code), but the standard ones are implemented by multiple standard libraries. There’s GNU libstdc++ and LLVM libcxx, for instance. There are differences in the implementations.

One important difference lies in the implementation of the destructor of a std::unique_ptr. The LLVM implementation replaces an internal pointer by a nullptr and then calls the destructor of the held object, while GNU calls the destructor of the held object and leaves the internal pointer alone (the std::unique_ptr is being destroyed anyway, so why bother updating the pointer-value).

This becomes visible in some situations where a not-completely-destroyed smart pointer is used: with the GNU implementation it may still hold a valid pointer, with LLVM it holds nullptr. Some will crash, some will not – it doesn’t really matter because to get into this situation you need to be in Undefined Behavior territory anyway and you should be glad that your computer doesn’t catch fire, fall over, and then sink into the swamp.

The visible symptom in a backtrace is an unexpectedly nullptr “smart” pointer.

D-ptring

A d-ptr is a technique for maintaining binary-compatibility when a class’ implementation details change. It’s explained fairly well on the Qt wiki.

When the size of (instances of) a class changes, all of its consumers need to be recompiled. Consider this very simple class that describes the state of a Game. It can print the state of the game, too.

struct Game {
    int score = 0;
    void print() const { std::cout << "Score=" << score << '\n'; }
};

If we now realise we need to keep track of which turn it is in the game, we add a data member – but that changes the size of the instances of the class to accomodate the new data member. Now all the code that relies on class Game needs to be recompiled.

struct Game {
    int score = 0;
    int turn = 0;
    void print() const { std::cout << "Score=" << score << '\n'; }
};

To avoid having to recompile consumers, the d-ptr technique moves the implementation details – those pesky data members – to a private class that consumers don’t know about. The Game class now holds a single smart pointer to that private class. Pointers are always the same size, so the Game class won’t change size any more no matter what we do with the implementation.

In Qt-related code, it is common to have a pointer from the internal, private class instance, to the owning instance. It’s just common practice, and this is called the q-ptr (a “q” is just a “d” pointing in a different direction, get it?).

So we introduce a private inner class, and give the Game class a smart pointer to an instance of that. While we’re at it, give the game a method to win some points. This is the last time we need to recompile our consumers.

This example code puts everything in one declaration of class Game and fits in one file. Typically you would hide the implementation detail in a separate translation unit or a .cpp file. So “last time” is a bit of a lie: in a typical realistic implementation this would be the last time.

class Game {
    struct Private {
        Game* const q;
        int score = 0;
        int turn = 0;
        Private(Game* owner) : q(owner) {}
    };
    std::unique_ptr<Private> const d = std::make_unique<Private>(this);
public:
    void print() const { std::cout << "Score=" << d->score << '\n'; }
    void win(int n) { d->score += n; }
};

Diagram with d- and q-pointers between classes. The d-pointer is smart, unlike q.

Now we can add data members all we like in the Private class, and it won’t affect the size of Game objects and everything is hunky-dory – except that the stage has been set for the bog monster.

Object Lifetime

In C++, objects have a lifetime. Before an object is alive, it cannot be used. As it lies dying, special rules apply, and once it is dead, it cannot be used. Accessing an object outside of its lifetime (there isn’t an object then!) is Undefined Behavior, and I hear the bog monster has quite shocking teeth.

Here’s a pretty innocuous idea: when the game ends (e.g. it is destroyed), we should print the score as well. There’s already a function implementing this for us, and in the interest of not adding functions to the Game class, let’s use the destructor of the internal Private class to do the work: we’ll just add one line,

        ~Private() { q->print(); }

See, when the Game destructor is called, the Private destructor is called, and we just print the score. (Narrator: this was not the case)

Depending on the environment your bog monster lives in, it may be a cute one or a monstrous one. Keep in mind that q is a raw pointer, and it points to the owning Game instance, which is “in the process of being destroyed” (not true, more on that later).

We call the Game::print() method, which dereferences the std::unique_ptr to get at the score variable held by the Private instance. The GNU implementation still has a pointer to memory, score is read, and the print() function seems to work.

The LLVM implementation has already set the pointer to null, so the dereference falls over with a SEGV. It doesn’t really matter, since it’s undefined behavior to call print() at all here.

Lifetime Details

Thinking about the destruction of Game and all of its data members (including Private, which is held through that std::unique_ptr) as “one action” is conceptually convenient until things go wrong. Then the details of object lifetime matter:

The destructor ~Game is called (the destructor starts). At this point the lifetime of the object is over. Dereferencing pointers to this object is undefined behavior from now on. That includes the q pointer held by the Private object. Basically, q is a dangling raw pointer.
During the destructor call it is still permitted to call member functions from the destructor (with some caveats), so calling print() here would be ok.
Destructors for members are called after the destructor of Game has run – so Game is really really dead by now. Only now does the destructor for the member d run, so that’s how we get to the destructor of Private. There is no Game object any more, so using q is undefined behavior (still).

We can apply the same analysis to the destruction of Private even without looking at Game. Suppose a Private object is owned by some specific std::unique_ptr d, and we call d.reset() to destroy the held object.

d.reset() might destroy the held object first, and then change the internal pointer to null, or might do it the other way around, it doesn’t really matter.
The destructor ~Private is called. At this point the lifetime of the object is over. Dereferencing pointers to this particular object is undefined behavior from now on. That includes the internal pointer of d.
During the destructor call it is still permitted to call member functions (and methods of other objects that are still alive and free functions and everything). So q->print() is legal to call, but ..
Dereferencing d in the implementation of print() is either going to dereference an invalid pointer to the Private object whose lifetime has ended, or dereference an invalid null pointer. Either way is Undefined Behavior.

Takeaway

Looking for a pithy rhyming takeaway I can’t get more than

If you Q_Q it, don’t do it.

Which is overly dismissive of Q_Q. Somewhat less pithy:

When d-pointering, the destructor of the Private class must not use the q pointer.

Following that advice avoids both of the scenarios leading to UB sketched in the previous section. It’s easy to overlook, and seems innocuous, until it crawls out of the bog and bites someone (in Kldap, this has been patched).