On reviewing code

All the code for this article can be found here.

Foreword

As with the Refactoring 101 article, this article is an adaptation of materials I made for a class that I once contributed to as a teaching assistant. As such, the targeted audiences are students and junior developers who don’t have much experience reviewing code. Still, I hope people of any level of experience might find something useful in my ramblings. Enjoy!

Motivation

Code reviews are some of the most important tools that software development teams have at their disposal to create high value products that won’t blow up in their customers’ faces. This beg the question: what’s the secret behind a good code review?

Answer

¯\_(ツ)_/¯

Joking aside, beyond some cursory validation of the project’s coding standards (which should be automated as much as possible anyway), the way a code review is conducted will tend to vary depending on who performs it. Some people might stick on things that will be barely a blip on someone else’s radar. Accordingly, instead on pontificating over some subjective rules of Good Code Reviewing™, I’ll be walking you through an example of how I’d review some real-world code.

Case Study

As with the refactoring article, I’ll use the Abuse source code as the basis for this article. More precisely, I’ll review the Abuse/src/imlib/linked.h and Abuse/src/imlib/linked.cpp files.

To reiterate what I’ve said in the refactoring article, I’ve chosen this codebase because it’s a prime example of legacy code: it’s old, unmaintained and untested (which isn’t surprising considering it’s the source code for a game released in the mid 90s). Hopefully, after having dabbled in such a codebase, students should be rid of their fears of having to dive deep in a new codebases in their future interships/jobs!

Laying Some Ground Rules

A code review is a contract. It involves two parties which both have duties they must attend to if they want the review to be successful.

On the reviewee side, you have to:

Be open to constructuve criticism (it’s the point the whole exercise!).
Give enough context to the reviewer. They should have a clear understanding of why the code to be reviewed has been written.
- For example, clearly indicate if the code is meant to push to production or merely a prototype to generate discussions.
Don’t forget that a code review is a learning process for both parties!

On the reviewer side, you have to:

Give constructive criticisms using respectful language. Shit code doesn’t mean the reviewee is a shitty person and deserve to be treated like shit.
Never hesitate to ask for precisions. Both reviewer and reviewee need to understand the problem for which the code has been developed.
Be generous with kudos for a job well done! It goes a long way for building positive relationships with your colleagues.
Don’t forget that a code review is a learning process for both parties!

Understanding The Intent

Personally, I think that a good code review always starts by asking some existential questions: Why was this code written? What are the design principles behind this code? How much is it going to cost to maintain it? (In the best case scenario: a valid reason, solid principles and next to nothing.)

Already with the linked_node and linked_list classes, things are starting to get muddy. At first glance, it seems that the intent was to create an intrusive list. What give off this impression is the decision to expose linked_node in the public API instead of hiding it away inside linked_list as an implementation detail. Further reinforcing this idea is the fact that linked_list doesn’t own any linked_node, it merely points to a head node. However, one detail doesn’t fit in this evaluation of the purpose of linked_list: it’s responsible for freeing linked_nodes’ memory. Normally, you’d expect an intrusive list to never touch its constituents’ memory (even with a ten feet pole). What gives?

Given this situation, here’s what I’d ask the author:

Why have a “semi-intrusive” list?
If having an intrusive list is really the goal, have you considered using tried and tested solutions already available? For example:
- Boost has an intrusive list.
- EASTL also offers an intrusive list.
- There’s already a similar class in the Abuse source code: isllist. Could this class be used to solve the problems which gave birth to linked_list?

Evidently, there’re a number of valid answers that the reviewee could give me. To wit:

A given library’s license could be incompatible with your employer’s business model.
(Statically) linking a third party library might unreasonnably balloon up an executable’s size.
The performance (on some chosen criteria) of external alternatives aren’t up to your or your employer’s the standards.

Detecting Potential Problems

One important thing to remember that we’re becoming partly responsible for some code the moment we start reviewing it. Yes, somebody else wrote it. But be that as it may, we still gave it our seal of approval. If the code ends up causing massive damage to the clients, both the author and the reviewer will end up on the chopping block!

Robustness

Where do I even begin?

The thing that jumps at me like an alien facehugger when I look at the linked.* files is that they are minefields of potential segfaults. For example:

linked_list::add_end: Give it a nullptr and BOOM!
linked_list::prev: Call it on an empty list and BOOM!
linked_list::unlink: Lines 64 and 65 give me nightmares. Their only saving grace is that they make excellent canaries for this coal mine!

Given the situation, I would advise the code’s author to use defensive programming as much as possible. This means:

Validate ALL the inputs. It’s better to be safe than sorry.
NEVER trust a pointer. This will bite you hard in the ass. Always validate them before use.

Coupling

A thing that saddens me about the contents of linked.h is that they force an unhealty coupling upon the whole codebase. This manifests itself in two ways.

First, there’s static coupling. For those unaware, the inclusion mechanism in C++ is basically glorified copy-paste. Consequently, all inclusions in a header file will transitively be included in all files that include said header file. More often than not, not doing any effort to keep headers to the bare essentials will result in humongous headers that’ll slow down compilation to a crawl. A small change will ripple through the inclusion network and end up as a tsunami of files to compile anew. Here’s a rule of thumb: before including a file, ask yourself: Do I really need it? If there’s nothing of use in a given file, don’t include it. If you can prevent an inclusion by using a forward declaration, go for it! In our case, I would point out to the reviewee that linked.h includes stdio.h, stdlib.h and string.h but uses nothing from them. If there’s a need for any of them (and I’d ask to make sure), it would be in linked.cpp which is were the includes should be.

Next, there’s also dynamic coupling. In this particular instance, it takes the form of the inheritance needed to define a node in a linked_list. Nothing can enter a linked_list if it doesn’t derive from linked_node first. This imposes a burden on client code who has to (re)implement linked_node’s interface when they want to make use of linked_list. While this interface is minimal at the moment and the burden is thus light, this could change in the future. Even worse, inheritance is used here as a mechanism for code reuse; linked_node isn’t a pure interface but a class on its own. This means that any modification must be done with the utmost care. Otherwise, it’d be easy to introduce a bug not only in it but also in (part of) its derived classes. Let me be clear, I don’t mean to say that inheritance is (the base class of) evil. It has its uses and here is clearly not one of them. Instead, I’d suggest the author to decouple linked_node from its children by favoring composition over inheritance. By having linked_node be generic over its contents we could remove the dynamic coupling and its problems altogether! No more forcing an interface on a class whose instances we want to put in a linked_list! No more subtle bugs caused by a modification to linked_node’s implementation!

(Im)Mutability

One of the things that I was (in)famous for among my students was that I am very conscientious about what ought to be mutable (or not) in my programs. The less mutability there is a program the better! Why?

This can help future developers understand my code. One of the hardest things when reading someone else’s code is to follow the data’s tranformations. The more moving parts (i.e. mutable data) there are, the harder it is to follow the flow of the program.
This can help reduce the incidence of code 18. Who hasn’t wasted hours trying to find the cause of a bug only to realize that he had assigned an erroneous value to a variable that should have been marked const in the first place? And if it hasn’t happened to you (yet) please use const as much as possible to make sure you don’t fall into this trap!
This can help the compiler generate more efficient machine code by making its job of propagating and reducing constants easier.

When it comes to the case at hand, the first comment I’d make would be: did the author think about what methods will modify this and which won’t? One of the important thing when defining an API is to set the correct expectations. Correct constness specifications will help clarify what a client can do with our class and in which circumstances some functionnalities are available.

Then, I’d also point out that while implementing linked_node’s and linked_list’s methods it’s important to add const wherever possible if only to clearly indicate to future developers what can be done with the pointers being used. Can the pointee be modified? Can the pointer be modified? Again, these are important questions to ask ourselves to make sure that any developers that’ll use or modify our code later on clearly understand what our intentions were.

Best Practices

Contrary to what some might believe, programming languages (including C++) do evolve over time. It is therefore important for a practicing software developer to keep up with the times. New versions of a language change what is considered good and bad practices. Moreover, taking advantages of new features can help you make the code you write more robust and efficient. In the linked.* files, here’s what could be done to to bring to code to the 21st century:

With the advent of nullptr, NULL as fallen out of favor. In fact, except to interact with APIs that specifically require the use of NULL, it shouldn’t be used anymore.
While on the subject of pointers, I’d strongly argue to use smart pointers to signify ownership of data on the heap. Manual memory management is rife with dangers (memory leaks and double deletes for instance) so please avoid it as much as possible.
The last one is somewhat controversial: the auto keyword could be used to make the code more concise. I won’t advocate for it strongly and rather advise you to follow the coding standards of the project you’re working on. Anything more than that might result in brutal flame wars.

Testing

When I think about it, many of the malpractices that I pointed out so far could have been avoided if (unit) tests were written at the same time as these classes. With tests, many bugs (or crashes!) could have been found by the reviewee and not the reviewer. My message to the author here would be to follow the Beyoncé principle:

Conclusion

To reiterate: what I presented here is only my take on a small part of the Abuse codebase. It’s higly likely that someone else would’ve made different comments and recommandations. What’s important isn’t really how one does (or ask for) a code review. It’s rather that a code review is a fundamentally collaborative process with the shared goal of producing high-value software. So help each other make better software!

Image: These dudes get it — *These dudes get it*