Posts The joys and pains of unrestricted unions
Post
Cancel

The joys and pains of unrestricted unions

One of the less talked about features of C++11 are unrestricted unions. In this post, I’ll explain what they are, how they’ve been useful to me and why we’ll quickly forget about them in the future.

Story time

Once upon a time, back when I was working on TosLang, I set out to write an interpreter for the language. Not to go into the fine details of the interpreter implementation, which would fill a blog post by itself, I’ll just say that it was designed as a simple AST-walking interpreter. To give a visual example of what I mean, let’s take the following program:

1
MyInt = MyOtherInt + 2;

and turn it into an AST:

Desktop View

Then, to interpretet the program, I just needed to use a simple visiting scheme to associate a value to each node and perform some actions if needed (like anything to do with I/O for example).

Desktop View

As you can see, I needed an abstraction over a value that could be produced by an AST. A value that, following TosLang’s grammar, could be either a integer, a boolean or a string. In other words, I needed what we call in type theory a sum type i.e. a data structure that can hold a value that can be of a fixed number of different types.

Enter the unrestricted union

Fortunately, C++ (in fact, I shoud say C) already offers us a mean by which to express a sum type: the tagged union. To do so, one only has to encapsulate in a class/struct a value (held in a union) and a tag (usually an enum value) indicating what the type of the value is. Below is an concrete (academic) example of a tagged union.

1
2
3
4
5
6
7
8
9
10
11
struct TaggedUnion
{
  enum Type { BOOL, INT };

  Type mType;
  union
  {
    bool bVal;
    int iVal;
  };
};

Now, what’s even better, is that, since C++11, the restrictions over what types of variables could be put into an union. In case you didn’t know, in C++03, only types with trivial special member functions (copy constructor copy-assignment operator and destructor), could be put into unions. With C++11 this restriction got lifted and we can now put pretty much whatever types of variables we want into an union except reference types.

Armed with that knowledge, I could easily implement the abstraction over an interpreted value that I needed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class IVal
{
public:
  // Possible value types of an AST node.
  enum class ValueType { BOOLEAN, INTEGER, STRING, VOID, UNKNOWN };

public:  // Constructors
  IVal() : mType{ ValueType::UNKNOWN } { }
  explicit IVal(bool val)
    : mType{ ValueType::BOOLEAN }, boolVal{ val } { }
  explicit IVal(int val)
    : mType{ ValueType::INTEGER }, intVal{ val } { }
  explicit IVal(const std::string& val)
    : mType{ ValueType::STRING }, strVal{ val } { }

public:  // Value accessors
  bool GetBoolVal() const
  {
    assert(mType == ValueType::BOOLEAN);
    return boolVal;
  }
  int GetIntVal() const
  {
    assert(mType == ValueType::INTEGER);
    return intVal;
  }
  const std::string& GetStrVal() const
  {
    assert(mType == ValueType::STRING);
    return strVal;
  }

private:
  ValueType mType; // Type of the IVal
  union            // Real value of the IVal
  {
    bool boolVal;
    int intVal;
    std::string strVal;
  };
};

There! Job’s done! Let’s try it out.

1
2
3
4
5
6
7
int main()
{
    std::string hello{ "Hello" };

    IVal val1{ hello };
    IVal val2 = val1;
}

Image: God dammit!

God dammit!

Delete everything

One thing I quickly realized was that every special member function of a class gets deleted the moment the moment you use an unrestricted union as a data member. So, basically, my IVal class looked like this to the compiler:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class IVal
{
public:
  /* ... */

public:
  ~IVal() = delete;
  IVal(const IVal& val) = delete;
  IVal& operator=(const IVal& val) = delete;
  IVal(IVal&& val) = delete;
  IVal& operator=(IVal&& val) = delete;

private:
  /* ... */
};

What we’re left with is a situation were we have to deal with everything (assignment, destruction and initialization of members with non-trivial types) by ourselves.

Dealing with initialization

To initialize a union member with a non-trivial type we need to be able to explicitely construct it at its address in memory. Fortunately, C++ offers us something called placement new that does just that. To make sure that we can handle any new non-trivial type that might come our way, let’s encapsulate it in the following helper function:

1
2
3
4
5
template <typename T>
void Init(T& member, const T& val)
{
  new (&member) std::string(val);
}

For those not familiar with the placement new syntax, you have to read it as follow:

  • Create a new variable
  • At member’s address
  • Whose type will be std::string
  • And whose value will be val

Dealing with destruction

Since any value with a non-primitive type will have been initialized through a placement new, it goes without saying that we will need to delete them explicitely. To help us doing so, let’s define a helper function.

1
2
3
4
5
6
7
8
9
10
void IVal::Destroy()
{
  // We only have to handle the string case in this example
  if (mType == ValueType::STRING)
  {
    // Using statement is necessary to compile with Clang
    using std::string;
    strVal.~string();
  }
}

For those wondering, this bug is the one that causes trouble with Clang.

Dealing with assignment

Going with the hypothesis that everything that needed to be deleted beforehand has indeed been deleted, we can sum up the act of copying an IVal with the following function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void IVal::Copy(const IVal& val)
{
  switch (val.mType)
  {
  case ValueType::BOOLEAN:
    boolVal = val.boolVal;
    break;
  case ValueType::INTEGER:
    intVal = val.intVal;
    break;
  case ValueType::STRING:
    Init(strVal, val.strVal);
    break;
  case ValueType::VOID:
    break;
  default:
    assert(false);  // Should never happen
  }

  mType = val.mType;
}

Putting it all together

All that’s left to do now is to define the special member functions that the compiler deleted by making use of the helper functions we just created.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
class IVal
{
public:
  /* ... */
  explicit IVal(const std::string& val)
    : mType{ ValueType::STRING }
  {
    Init(strVal, val.strVal);
  }

public:
  ~IVal() { Destroy(); }
  IVal(const IVal& val) { Copy(val); }
  IVal& operator=(const IVal& val)
  {
    // Nothing to do in case of self-assignment
    if (&val != this)
    {
      // Goodbye old value...
      Destroy();
      // ...and hello new value!
      Copy(val);
    }

    return *this;
  }

private:
  void AssignFrom(const InterpretedValue& val);
  void Destroy();
  template <typename T> Init(T& member, const T& val);

private:
  /* ... */
};

Voila! Job’s done for real!

(And if you question me about how we could make IVal movable, let’s just say I’ve left this as an exercise to the reader :) )

The future

I do admit that this can be a bit more trouble than anticipated at first. For those that don’t want to go through the hassle, you’ll be happy to know that std::variant, which will bring a better sum type abstraction to the language, has been voted in for C++17. Let’s have a peek at what the future holds by rewriting our previous example.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#include <iostream>
#include <string>
#include <utility>

using IVal = std::variant<bool, int, std::string>;

int main()
{
  std::string hello{ "Hello" };

  IVal val1{ hello };
  IVal val2 = val1;

  std::cout << std::get<std::string>(val2) << "\n";
}

Ain’t that neat.

This post is licensed under CC BY 4.0 by the author.