C++ unit testing frameworks revisited

For a long time, I've been using UnitTest++ for all of my C++ unit testing needs. UnitTest++ is a very nice and small unit testing library, which covers all basic needs and is very easy to integrate with a project. The problem is that development has been halted since 2008; I've been making some minor adjustments locally, but in the long run, I would like to use an up-to-date library and get new features from time to time.

So I went out to search for a modern unit-testing framework in C++. Primary requirements were:

  • Mature: It should be used on larger projects and have a stable API. I don't want to have to update all my unit tests again anytime soon.
  • Portable: Linux and Windows are a must, Mac OS X is a nice-to-have, and console platforms are a bonus. It should be at least reasonably easy to port if necessary.
  • Fast integration: It should require minimal work to integrate. Tests should require at most one macro (something like TEST(Foo).) It should not require linking against a shared DLL, and the framework itself should be very easy to build. Ideally, I would like to compile it as part of the project (using CMake.)

After some searching, I found two major contenders:

  • Google Test: The framework used by Google (internally), LLVM and Chromium
  • Boost.Test: The Boost testing framework, used by Boost

Boost.Test

Boost.Test is a huge library bundled with Boost. It has to be built using the Boost tools, but as I'm a heavy Boost user anyway, this is no problem. On the fast integration side, it has the BOOST_AUTO_* macros, which make it very easy to get started. The test macros are easy to use, and provide check and require test levels: if a require fails, the test execution stops, while it continues over failed checks. This is very useful; for instance, you can abort a test if the result has zero size, and then verify all contents of the result with checks. It also brings collection equality tests, which are handy. A minimal sketch is shown below.
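
To illustrate, here's a minimal sketch of a Boost.Test case using both test levels and the collection equality check; ComputeSquares is a hypothetical function under test:

#define BOOST_TEST_MODULE ExampleTests
#include <boost/test/unit_test.hpp>
#include <vector>

BOOST_AUTO_TEST_CASE(SquaresAreComputed)
{
    const std::vector<int> result = ComputeSquares (3); // hypothetical function under test

    // Require level: the test stops here if the size doesn't match
    BOOST_REQUIRE_EQUAL (result.size (), 3u);

    // Check level: failures are recorded, but execution continues
    BOOST_CHECK_EQUAL (result [0], 1);

    const int expected [] = { 1, 4, 9 };
    BOOST_CHECK_EQUAL_COLLECTIONS (result.begin (), result.end (),
        expected, expected + 3);
}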

As you would expect from a good test framework, it also has support for fixtures and output customisation. On the portability side, it runs everywhere Boost runs, which covers all platforms mentioned above. Being used to test Boost itself, it also qualifies as mature, even though I'm not aware of any bigger project outside Boost that uses it.

Google Test

Google Test is the framework used internally by Google and now available for free. Building is trivial, as there is a single-source amalgamation file. Integrating is easy as well: for each test, you have to use a single macro. Test suites are defined along with each test, which means you're forced to group them -- which is not as bad as it sounds, as you usually want to group all tests for a submodule/class together anyway. The test functionality is very similar to Boost; on the one hand, Google Test provides death tests, which Boost doesn't have, but it is missing a few tests that Boost provides (collection equality.) Adding additional tests is straightforward, though.
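
For comparison, a Google Test case is equally terse; Trim is a hypothetical function under test:

#include <gtest/gtest.h>

// The first macro argument names the test suite, the second the test itself
TEST (StringUtil, TrimRemovesSurroundingWhitespace)
{
    // ASSERT_* aborts the test on failure, EXPECT_* records it and continues
    ASSERT_FALSE (Trim ("  foo  ").empty ());
    EXPECT_EQ ("foo", Trim ("  foo  "));
}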

The output customisation is very simple through an event-based interface; a sketch of this is shown below. Portability is slightly worse than Boost; some features are supported on one platform only (some only run on Linux.) On the other hand, it has nice platform-dependent tests (for instance, you can check HRESULTs on Windows.) It's used by LLVM, which is a quite large project, as well as by Chromium, so it also qualifies as mature.
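
To give a rough idea of the event-based interface, the following sketch registers a custom listener which prints one line per test; treat it as an outline of how the listener API fits together, not copy-paste material:

#include <gtest/gtest.h>
#include <cstdio>

// EmptyTestEventListener provides no-op defaults for all events, so we
// only override what we need
class BriefListener : public ::testing::EmptyTestEventListener
{
    virtual void OnTestEnd (const ::testing::TestInfo& info)
    {
        std::printf ("%s.%s: %s\n", info.test_case_name (), info.name (),
            info.result ()->Passed () ? "ok" : "FAILED");
    }
};

int main (int argc, char* argv [])
{
    ::testing::InitGoogleTest (&argc, argv);
    // The listener list takes ownership of the appended listener
    ::testing::UnitTest::GetInstance ()->listeners ().Append (new BriefListener);
    return RUN_ALL_TESTS ();
}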

Conclusion

It's a really tight race between Google Test and Boost.Test with no clear winner. For small one-off projects, I typically use Boost.Test now, as it's really dead simple to integrate and puts the least burden on the client (i.e. I assume all users of my stuff have Boost installed.) For utmost portability, you should also stick with Boost.Test, as it focuses on standard-conformant C++ and should run almost everywhere.

On the other hand, the simple customisation of Google Test and the nice Google Mock library make it an excellent candidate for larger libraries. In my case, I've integrated it into the build of my in-house stuff, so it gets built along with the rest of the projects. In this case, I also spent some time customising the output etc. With its platform-dependent tests, it also excels for applications which are supposed to run on Linux/Windows/Mac OS X.

It's actually good to see that there are two equally good C++ testing frameworks out there which are on par with the tools you get for Java/C# and other languages. A much more comfortable situation than several years ago, when you were basically forced to write your own testing framework if you were serious about unit testing.

Reporting compiler bugs

Once in a lifetime, you run into a compiler bug ... well, that used to be the case, before HLSL/GLSL/OpenCL/CUDA/C++0x and whatnot started to crop up and require new front-ends and especially new code-generation back-ends. More often than not, people just accept that there is a bug somewhere and try to work around it, but you should always remember to report it so it can get fixed quickly. In this post, we'll take a look at how to efficiently report bugs in compilers, as it usually requires quite some work to make a useful bug report.

Verification

A small note of warning: In a language like OpenCL, where the compilers are still young, it's quite likely you'll see a bug or two every few weeks. In these cases, it's also typically easy to verify that it's a compiler bug (some plain old C doesn't compile any more, you get messages like "internal phase error", etc.)

However, if you're using C/C++, the chance that it's a problem on your side and not in the compiler is far above 99.9999%, unless you're making heavy use of a new feature (for instance, lambda function composition in C++ or something like this.) Typically, the mature compilers are practically bug-free, so please verify that it's indeed a compiler bug. If you're a programming newcomer reading this, always assume it's your fault and try to find someone senior before reporting a bug in GCC or so (the OS functions typically aren't broken, either.)

Repro case

The first and most important part of every bug report is the repro case. A good repro case should have:

  • The absolute minimal amount of code to trigger the bug.
  • As few dependencies as possible.
  • Only a single (!) problem. If there are two bugs triggered by one statement (for instance, a pointer dereference triggers one bug, and an assignment a second), don't group them together.

Make sure that it's really a compiler bug (ideally, you would try with one, better two, other compilers.) In OpenCL, I've easily hit a few bugs due to code selection (i.e. in the back-end.) These bugs can typically be worked around just by rephrasing the input code; for instance:

float a = 3; if (b) a -= 1;

would trigger an error, while

float a = 3; if (b) a = a - 1;

works. These are the easy bugs, and the bulk of bugs is in this category. The expression tree somehow gets so complicated that some phase of the compiler starts to choke on it. In this case, all you need is typically a tiny code snippet without any dependencies.

It gets much more complicated if the problem is bad code generation. Lately, we had a problem in our lab with a multi-threaded, SSE-optimised application (written in C++.) At some point, it would simply trash a data structure for no good reason, but only if run in release, and only when compiled with a particular compiler. That's basically the worst case; in order to check whether it was a programming error (some memory being overwritten, for instance), we ported the application to Linux so we could check with the memory debugging tools there (ideally, this would have been valgrind, but it choked on the SSE4.1 intrinsics, so we had to use efence.) This didn't highlight anything obviously wrong, so we started to disable optimisations until we found the function which caused the problem.

Now it was "just" a matter of looking at assembly, until we found out that the compiler incorrectly assigned registers. However, this was deep in some large function, so it took us another few hours to cut out enough of the code to get a comparatively small repro case. We also made sure to compile and save the generated code, so we could attach it to the bug report; moreover, we found a workaround. In total, we spent half a week making sure that it was a compiler bug! In this case, we ended up with:

  • Small repro case: A tiny executable, which, when executed, would create an incorrect result.
  • Assembly output of the buggy part as well as manually corrected version.
  • A usable workaround.

If all of this is in your repro case, you're ready to go to the next step: bundling it.

Bundling

So you have a tiny piece of code which clearly triggers a compiler bug; now how to make sure that the guys on the other side can start fixing it right away? Obviously, you need some kind of project, and for some bugs, you might also need some input to trigger it.

If it's only a single file using standard headers (i.e. those supplied by the compiler's vendor), it's typically enough to paste the file. It gets more interesting once you use custom code. For languages with a pre-processor, your best bet is to run the preprocessor (gcc -E with GCC, cl /P with Visual C++) and send the generated file. The resulting file will be really huge, so make sure you add comment markers around your code so the interesting part is easy to find.

For bugs which require some particular input data to trigger an error, I've found two methods to be most useful:

  • Embedding all data into the binary itself by converting the binary data into a string literal.
  • Adding a tiny binary blob reader, which reads a file and creates a memory buffer out of it (see the sketch after this list.) The blob reader is plain C, so there isn't a bug hidden in there or complicated code to choke on. Plain C also helps because, for instance, OpenCL is C-based, so it's easier to provide a C-only file with the report than to write one in C++.
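
Such a blob reader could look roughly like this (a sketch, with only minimal error handling):

#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'ed buffer; returns NULL on failure.
   Deliberately plain C, so the repro case has no extra dependencies. */
static void* ReadBlob (const char* path, size_t* size)
{
    FILE* file = fopen (path, "rb");
    long length;
    void* buffer;

    if (! file) return NULL;

    fseek (file, 0, SEEK_END);
    length = ftell (file);
    fseek (file, 0, SEEK_SET);

    buffer = malloc (length);
    if (buffer && fread (buffer, 1, length, file) != (size_t) length) {
        free (buffer);
        buffer = NULL;
    }
    fclose (file);

    if (buffer) *size = (size_t) length;
    return buffer;
}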

Armed with this, you can also add all the data that was passed in -- this is particularly interesting for things like OpenCL, where a compiler bug might not show up on trivial input (i.e. it won't crash), but will crash on real-world data. Make sure you grab the inputs as late as possible in this case (for OpenCL, I simply write out the contents of all buffers right before the function call; see the sketch below.)
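
For OpenCL, dumping a buffer right before the kernel launch could look roughly like this (DumpBuffer is a hypothetical helper; error checking is omitted for brevity):

#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes the current contents of an OpenCL buffer to a file, so the
   repro case can replay the exact real-world input later. */
static void DumpBuffer (cl_command_queue queue, cl_mem buffer,
    size_t size, const char* path)
{
    void* data = malloc (size);
    FILE* file = fopen (path, "wb");

    /* Blocking read: the buffer contents are complete once this returns */
    clEnqueueReadBuffer (queue, buffer, CL_TRUE, 0, size, data, 0, NULL, NULL);
    fwrite (data, 1, size, file);

    fclose (file);
    free (data);
}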

In case your code consists of several files, you now need to create a project file. In my experience, simple Makefiles work best (you can run them with nmake on Windows.) With Makefiles, there is no doubt about what is going on, and they cause far fewer problems than, for instance, Visual Studio solutions. As I noted above, prefer C to C++ if possible, as it removes yet another possible error source (for instance, you might have exceptions enabled or something like this.) Zip the files together using plain old ZIP, and cross your fingers!

Closing notes

If you ever run into a compiler or runtime bug, you should really take some time to report it. It'll make your life easier in the long run (you won't have to constantly work around it), and it'll help other developers as well. I'm constantly amazed by how few people actually report bugs; most of the time, it's just "I'm getting mad with this XYZ crap" instead of trying to improve the situation for everyone.

New blog style

I'm working on a new blog style. Please report any problems you encounter!

Blog style until June 2010

Right now, there are some issues with Opera and Chrome, while Firefox 3.6 works correctly. For the best viewing experience, please try Firefox for the time being while I iron out the issues with Chrome/Opera. [Update]I think I've fixed the issues with Chrome.[/Update]

Blog with new style from June 2010

The theme uses CSS3 for all shadows (specifically, text-shadow and box-shadow are used.) This keeps the load time pretty low and still provides some very nice effects. The menu uses the :last-child selector, and thus does not work with Internet Explorer versions before IE9. Overall, it's a pretty tiny theme, as I finally got round to creating it as a child theme instead of rebuilding most of it from scratch. The parent theme used here is twentyten (the WordPress 3.0 default.)

Disk failure (with some data loss)

My primary disk failed yesterday evening (around 21:30), so I lost at least all data from this week (the last backup is from Sunday, 19:00.) I don't think I missed any unread e-mails, but of course I cannot check, as all mails from this week are definitely lost. In case there was something urgent, please send me a mail again.

I ordered a new hard disk, which should arrive early next week. I hope to be back up (no pun intended) & running around Thursday again. As soon as the RMA for the faulty disk finishes, I'll plug it in for daily backups. While I'm at it: What's your backup strategy?

Here, I have one additional internal HDD which I use for backups once a week. I don't yet have a "disaster" backup at a secondary location; I wonder how well Amazon S3 would be suited for something like this (or SkyDrive.) I'm also planning to install a NAS in my home network and use it for backups, but I haven't found nice and free backup software yet. The second problem with the NAS is how to back up the NAS itself in case of failure (RAID-5 might be a solution, but how do I guarantee that I can rebuild the RAID-5 in 4 years, or, worse, if the controller fails and can't be replaced?) I'm curious to hear what backup strategies you employ, and how you have coped with disk failures so far.

Avoid unsigned types by default

For some reason or another, unsigned types in C++ are heavily overused. I know that Java doesn't have them, and many students who learn programming with Java think this is a really serious deficiency which they have to make up for by using unsigned types everywhere ... however, it turns out that you will rarely need unsigned types, and in most cases, they do more harm than good.

First of all, there are some cases where you need unsigned types. Pretty much all of these cases can be grouped into two categories:

  • Interop: Someone somewhere expects an unsigned value, so you have to convert your stuff at some point. You should convert as late as possible, and with a sanity check: if you call an API which expects an unsigned size, and you have a signed size in your application, convert only after sanity-checking your signed value (see the sketch after this list.)
  • I/O: Disk or network I/O often requires the use of unsigned data types, as you typically want to work on the raw bit pattern of the value and not on the actual value. In this case, unsigned types fit and should be used.
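
To make the interop point concrete, here's a minimal sketch of such a late, checked conversion; ZeroBuffer is a hypothetical wrapper:

#include <cassert>
#include <cstring>

// The application works with a signed size; the conversion to the
// unsigned size_t happens as late as possible, right at the API boundary
void ZeroBuffer (void* target, int size)
{
    // This check is only possible while the value is still signed; a
    // negative size is caught here instead of wrapping to a huge value
    assert (size >= 0);
    std::memset (target, 0, static_cast<size_t> (size));
}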

Everywhere else, you should really try hard to avoid them. Especially when you design an API, stay away from unsigned types. There's a very good article by Scott Meyers on this topic which explains the problems with unsigned types in interfaces. The main problem is that an unsigned type makes error/sanity checking impossible. If you have a function like memset(void* t, int value, size_t size) and a POD like

struct pod {
    unsigned char flags;
    Vector3 coords;
};

and you pass on something like memset(&pod, 0, sizeof (pod) - sizeof (Vector4)); because you vaguely remembered that pod has a Vector4 and a flags field in it, you're going to clear around 18446744073709551613 bytes worth of data when running on a 64-bit system -- with signed types, a quick assert(size >= 0); would have caught size == -3.

This happens far more often than you would expect. Since I'm trying to use signed types as much as possible, I've found lots and lots of places where I can sanity-check values using asserts and pre-condition checks. Even for typical use cases like I/O routines (fwrite) or memory stuff (malloc), you can get away with one less bit in practically every case -- I don't remember ever having seen a call which would allocate or write more than 2^31 elements on a 32-bit machine, let alone 2^63 elements on a 64-bit machine. So, next time before you type unsigned type size, think twice whether you're really ever going to need that extra bit, or whether you are willing to trade it for error checking.

Note: This does not imply that you don't have to think about valid value ranges. The only help you get from signed types is that overflows are much harder to exploit if you sanitize most values in between. You should however still use things like SafeInt to make sure your arithmetic stays in range.