The joys and pains of unrestricted unions
One of the less talked-about features of C++11 is unrestricted unions. In this post, I’ll explain what they are, how they’ve been useful to me and why we’ll quickly forget about them in the future.
Once upon a time, back when I was working on TosLang, I set out to write an interpreter for the language. Without going into the fine details of the interpreter implementation, which would fill a blog post by itself, I’ll just say that it was designed as a simple AST-walking interpreter. To give a visual example of what I mean, let’s take the following program:
and turn it into an AST:
Then, to interpret the program, I just needed a simple visiting scheme that associates a value with each node and performs some action when needed (anything to do with I/O, for example).
As you can see, I needed an abstraction over a value that could be produced by an AST node. A value that, following TosLang’s grammar, could be either an integer, a boolean or a string. In other words, I needed what type theory calls a sum type, i.e. a data structure that holds a value of exactly one of a fixed set of types.
Enter the unrestricted union
Fortunately, C++ (in fact, I should say C) already offers us a means of expressing a sum type: the tagged union. To build one, you only have to encapsulate in a class/struct a value (held in a union) and a tag (usually an enum value) indicating what the type of the value is. Below is a concrete (academic) example of a tagged union.
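As a minimal sketch of the idea, here is a tagged union restricted to trivial types. The names (IntOrBool, makeInt, makeBool, toInt) are made up for illustration and are not taken from TosLang.

```cpp
#include <cassert>

// Hypothetical tagged union: holds either an int or a bool.
struct IntOrBool {
    enum class Tag { Int, Bool };

    Tag tag;  // records which union member is currently active
    union {   // anonymous union: both members share the same storage
        int  intVal;
        bool boolVal;
    };
};

// Factory helpers keep the tag and the active member in sync.
inline IntOrBool makeInt(int v) {
    IntOrBool r;
    r.tag = IntOrBool::Tag::Int;
    r.intVal = v;
    return r;
}

inline IntOrBool makeBool(bool v) {
    IntOrBool r;
    r.tag = IntOrBool::Tag::Bool;
    r.boolVal = v;
    return r;
}

// Always check the tag before reading: only the active member is valid.
inline int toInt(const IntOrBool& v) {
    switch (v.tag) {
        case IntOrBool::Tag::Int:  return v.intVal;
        case IntOrBool::Tag::Bool: return v.boolVal ? 1 : 0;
    }
    return 0;  // not reached for a well-formed value
}

// Small usage example; call it from main() to try the sketch.
inline void demoTaggedUnion() {
    assert(toInt(makeInt(42)) == 42);
    assert(toInt(makeBool(true)) == 1);
}
```

Since both members are trivial, the compiler-generated copy and destruction just work here. That stops being true once a non-trivial type enters the union.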
Now, what’s even better is that, since C++11, the restrictions over what types of variables can be put into a union have been relaxed. In case you didn’t know, in C++03, only types with trivial special member functions (copy constructor, copy-assignment operator and destructor) could be put into unions. With C++11 this restriction got lifted and we can now put pretty much whatever types of variables we want into a union, except reference types.
Armed with that knowledge, I could easily implement the abstraction over an interpreted value that I needed.
There! Job’s done! Let’s try it out.
One thing I quickly realized was that every implicitly-declared special member function of a class gets deleted the moment you use an unrestricted union holding a type with non-trivial special member functions as a data member. So, basically, my IVal class looked like this to the compiler:
What we’re left with is a situation where we have to deal with everything (assignment, destruction and initialization of members with non-trivial types) by ourselves.
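To see the deletion for yourself, you can ask the compiler through type traits. The sketch below uses a hypothetical stand-in (NaiveVal), not the actual IVal class, and assumes the string alternative is stored as a std::string.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a naive value class with an unrestricted union.
struct NaiveVal {
    enum class Tag { Int, Bool, String };

    Tag tag;
    union {
        int         intVal;
        bool        boolVal;
        std::string strVal;  // non-trivial member: this is what triggers the deletions
    };
};

// None of the implicitly-declared special member functions are usable.
static_assert(!std::is_default_constructible<NaiveVal>::value, "default constructor is deleted");
static_assert(!std::is_copy_constructible<NaiveVal>::value,    "copy constructor is deleted");
static_assert(!std::is_copy_assignable<NaiveVal>::value,       "copy assignment is deleted");
static_assert(!std::is_destructible<NaiveVal>::value,          "destructor is deleted");
```

The class definition itself compiles fine. The errors only show up when you try to create, copy or destroy an object of that type.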
Dealing with initialization
To initialize a union member with a non-trivial type, we need to be able to explicitly construct it at its address in memory. Fortunately, C++ offers us something called placement new that does just that. To make sure that we can handle any new non-trivial type that might come our way, let’s encapsulate it in the following helper function:
For those not familiar with the placement new syntax, you have to read it as follows:
- Create a
- Whose type will be
- And whose value will be
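A placement new helper along those lines could be sketched as follows. The names constructAt and Storage are assumptions made for this example, and the helper is not necessarily the one the interpreter used.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // placement new
#include <string>
#include <utility>  // std::forward

// Hypothetical helper: construct a T in the raw storage that `where` points to.
template <typename T, typename... Args>
T* constructAt(T* where, Args&&... args) {
    // The cast to void* makes sure the global, non-allocating placement new is picked.
    return ::new (static_cast<void*>(where)) T(std::forward<Args>(args)...);
}

// Minimal unrestricted union used to exercise the helper.
union Storage {
    int         intVal;
    std::string strVal;

    Storage() {}   // leaves every member uninitialized
    ~Storage() {}  // whoever owns the union must destroy the active member
};

// Small usage example: returns the length of the string built in place.
inline std::size_t demoConstruct() {
    Storage s;
    constructAt(&s.strVal, "hello");  // starts the lifetime of strVal
    assert(s.strVal == "hello");
    std::size_t len = s.strVal.size();
    s.strVal.~basic_string();         // ends it by hand before s goes away
    return len;
}
```

No memory is allocated by the placement new itself. It only runs the constructor at the address it is given.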
Dealing with destruction
Since any value with a non-trivial type will have been initialized through a placement new, it goes without saying that we will need to destroy it explicitly by calling its destructor ourselves (a plain delete would be wrong here, since no memory was allocated). To help us do so, let’s define a helper function.
For those wondering, this bug is the one that causes trouble with Clang.
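For illustration, an explicit-destruction helper can be sketched generically, so that the destructor is named through a template parameter. The names destroyIt, Tracker and Slot are hypothetical and only serve this example.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: run T's destructor without releasing any storage.
template <typename T>
void destroyIt(T& obj) {
    obj.~T();  // explicit destructor call; the memory itself is left untouched
}

// Test type that counts how many times its destructor has run.
struct Tracker {
    explicit Tracker(int* c) : counter(c) {}
    ~Tracker() { ++*counter; }
    int* counter;
};

union Slot {
    int     i;
    Tracker t;

    Slot() {}   // no member is constructed
    ~Slot() {}  // no member is destroyed automatically
};

// Small usage example: returns the number of destructor calls observed.
inline int demoDestroy() {
    int destroyed = 0;
    {
        Slot s;
        ::new (static_cast<void*>(&s.t)) Tracker(&destroyed);
        destroyIt(s.t);
        assert(destroyed == 1);
    }  // ~Slot does nothing, so the count does not change here
    return destroyed;
}
```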
Dealing with assignment
Assuming that everything that needed to be destroyed beforehand has indeed been destroyed, we can sum up the act of copying an IVal with the following function:
Putting it all together
All that’s left to do now is to use the helper functions we just created to define the special member functions that the compiler deleted.
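Put together, a complete class could look like the sketch below. It is a hypothetical reconstruction under stated assumptions (a class named Value, a std::string for the string alternative, private copyFrom and destroy helpers), not the original IVal.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical value type holding an int, a bool or a std::string.
class Value {
public:
    enum class Tag { Int, Bool, String };

    explicit Value(int v) : tag_(Tag::Int), intVal_(v) {}
    explicit Value(bool v) : tag_(Tag::Bool), boolVal_(v) {}
    explicit Value(const std::string& v) : tag_(Tag::String) {
        ::new (static_cast<void*>(&strVal_)) std::string(v);  // placement new
    }

    Value(const Value& other) : tag_(Tag::Int) { copyFrom(other); }

    Value& operator=(const Value& other) {
        if (this != &other) {
            destroy();        // end the lifetime of the current member first
            copyFrom(other);  // then build the new one in place
        }
        return *this;
    }

    ~Value() { destroy(); }

    Tag tag() const { return tag_; }
    int asInt() const { assert(tag_ == Tag::Int); return intVal_; }
    bool asBool() const { assert(tag_ == Tag::Bool); return boolVal_; }
    const std::string& asString() const { assert(tag_ == Tag::String); return strVal_; }

private:
    // Precondition: no non-trivial member is currently alive in *this.
    void copyFrom(const Value& other) {
        switch (other.tag_) {
            case Tag::Int:  intVal_ = other.intVal_;   break;
            case Tag::Bool: boolVal_ = other.boolVal_; break;
            case Tag::String:
                ::new (static_cast<void*>(&strVal_)) std::string(other.strVal_);
                break;
        }
        tag_ = other.tag_;  // set last, so a throwing string copy leaves a valid tag
    }

    // Destroys the active member if needed and falls back to a trivial state.
    void destroy() {
        if (tag_ == Tag::String) strVal_.~basic_string();
        tag_ = Tag::Int;
        intVal_ = 0;
    }

    Tag tag_;
    union {
        int         intVal_;
        bool        boolVal_;
        std::string strVal_;
    };
};

// Small usage example: copy-assigns across different alternatives.
inline std::string demoValue() {
    Value a(std::string("hello"));
    Value b(42);
    b = a;         // the int is replaced by a deep copy of the string
    a = Value(7);  // the string in a is destroyed, an int takes its place
    assert(a.asInt() == 7);
    return b.asString();
}
```

One pitfall with this constructor set: passing a string literal directly, as in Value("hi"), selects the bool overload, because a pointer-to-bool conversion beats the user-defined conversion to std::string. Hence the explicit std::string in the example.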
Voila! Job’s done for real!
(And if you ask me how we could make IVal movable, let’s just say I’ve left this as an exercise for the reader :) )
I do admit that this can be a bit more trouble than anticipated at first. For those who don’t want to go through the hassle, you’ll be happy to know that std::variant, which will bring a better sum type abstraction to the language, has been voted in for C++17. Let’s have a peek at what the future holds by rewriting our previous example.
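As a rough sketch of what that rewrite could look like, assuming a C++17 compiler (the alias Value17 and the describe function are made up for this example):

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical alias: the three alternatives mirror the int/bool/string value kinds.
using Value17 = std::variant<int, bool, std::string>;

// Inspect the active alternative and render it as text.
inline std::string describe(const Value17& v) {
    if (std::holds_alternative<int>(v)) {
        return "int: " + std::to_string(std::get<int>(v));
    }
    if (std::holds_alternative<bool>(v)) {
        return std::get<bool>(v) ? "bool: true" : "bool: false";
    }
    return "string: " + std::get<std::string>(v);
}

// Small usage example: construction, copy and reassignment need no manual work.
inline std::string demoVariant() {
    Value17 a = std::string("hello");
    Value17 b = 42;
    b = a;  // the variant destroys the int and copies the string for us
    assert(std::holds_alternative<std::string>(b));
    return describe(b);
}
```

The variant keeps track of the active alternative itself, so the tag, the placement new and the explicit destructor calls all disappear from user code.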
Ain’t that neat.