Building a build system - Part 2 - Compilation
Wrong! Wrong! (And most definitely) WRONG!!
Graph are ubiquitous in computer science: databases, compilers, and, as my last article established, build systems. Hell, if I was a betting man, I’d say that Skynet will probably be born out of an out-of-control TensorFlow (or whatever en vogue machine learning framework at the time) graph that will start to want to link up with EVERYTHING.
My point is: learning about graphs and how to manipulate them is a critical skill to ensure our children lives in a world free of despotic AI overlords. If you want an example of how to do so, read on.
Compiling a C++ application (the right way)
Going back to our problem of compiling C++ files. We just built a dependency graph indicating what we should be compiling. Now, we have to walk through it.
Normally at this point, I should be talking about how recursion is the way to go since our dependency graph is essentially a tree. However, with all the constraints we put on our build system, we can go for a far more straightforward solution. This will save us a lot of trouble since recursion can be tricky and result in awakening nameless fear like the dwarves did in the mines of Moria if we’re not careful enough.
However, remember the shape of our dependency graph. At the top, we have the main node to which we’ll link every library file produced. Then, at the level just below, we have subgraphs that will be used to create the library files.
With such a flat interpretation, a breadth-first traversal is more than adequate to accomplish our task.
Here’s how we’ll attack it. First, some definitions:
Then, a simple sanity check to make sure we don’t perform more work than necessary.
As I said earlier, every subgraph below the root node will be used to generate a library file.
Here’s the fun part. Since we have settled on no further partitionning of these subgraphs, we only need to walk through them to collect their constituant nodes. An effortless way to accomplish that is to ask a node about its children when we visit it and note down these children in a work queue to also visit them later on. For the more visually inclined, this website offers great visualizations of many algorithms including breadth-first graph traversal.
Translating the above description into code might yield something like this for our use case:
Notice two little optimizations that I’ve used in the code above. First, I made sure that we won’t process a node multiple times by using
std::set to reject duplicates insertion in our work queue. Second, I made it so no node would be copied by working on pointers to the actual nodes. I put them here if only to make you think about issues that may plague you when you try to scale a build system.
Now, at a high level, creating the actual library file is actually quite easy. Just use some magical function (read: no way this can be done in a portable fashion, so I’ll refrain from presenting an actual implementation for fear of offending anyone) to produce object files. Afterwards, aggregate those object files into a library file with some more magic.
When you finally go out of that loop, a final magic trick will deliver us the promised executable.
I hear you say: “But what if my subgraphs aren’t flat? How do I handle dependencies in that case?”. I’ll be honest, I’m tempted to leave that as an exercise to the reader. But I’m here to teach and I won’t be caught doing an half-assed job. I’ll fully ass my job.
The trick here is that the dependency graph of a source code repository a directed acyclic graph. Consequently, an ordering of its vertices can be obtained via topological sorting.
Now, it’s up to you to choose where you’ll fit this sorting in the build algorithm. You could either use it on the nodes collected by the breadth-first traversal or, if you happen to have a subgraph abstraction, you could directly pass it to a topological sorting function that would give you back an ordered sequence of nodes. In any case, the result would be a sequence of nodes that could then be fed to the
CompileFile function as shown above
Coming up next: More threads, more fun!