Memory optimisations

Often underestimated, memory allocations and copying can be a serious performance problem. Especially when using STL containers, which often allocate memory "behind your back", the memory usage can become difficult to predict, resulting in non-obvious performance problems.

There are many reasons for this, at which we'll take a look now. I assume you are writing an application with C++, though most of this applies to C, too. The first problem only applies to C++ and the STL containers. Be careful to always pass by reference. A very simple mean, which can have dramatic effects, especially if you often copy small containers with just a few elements. While it can be tolerable for vectors, for all other containers you surely want to avoid copying at all costs, as it usually results in a lot of tiny allocations.

Tiny allocations are also a major problem. A standard list of integers is probably the worst you can do, as it will fragment the memory extremely. If your application becomes multi-threaded, this problem is multiplied, as the system allocators are often protected by a global lock, which can easily serialise all your threads. Even if you append to a container, you can fragment the memory. For example, a vector will double (or so, the Microsoft STL multiplies by 1.5) its memory each time it runs out of it. On each reallocation, the following happens:

  • A new memory block, large enough to store the data, is created.
  • The old data is copy-constructed into the new memory, and then destroyed (until we get move construction with C++0x).
  • The remaining data is initialised by running the default constructor.
  • The old block is deallocated.

The order of this operation may be different, but it should be obvious that his cannot be cheap. If possible, try to reserve the required amount of memory up front, instead of relying on automatic resizing. In most cases, you can either use a range-based insert (which in theory should call reserve first), or manually call reserve/resize and insert the elements.

Beware of non-efficient implementations of STL containers. I'm very sad that something like the EASTL is not coming anytime soon (if you are interested to work on a free implementation, contact me). In the meantime, different STL vendors have sometimes extremely inefficient implementations. For example, Microsoft has added range checks to the index-based operator [] in their STL (unless you compile all your code with _SECURE_SCL disabled, but there is hope Microsoft will disable it for VC10), which make the access approximately half the speed of direct pointer access.

Containers reloaded: Use vectors if possible. Vectors store their memory in a contiguous array, and have thus the best cache performance. The Loki library contains a convenient vector-based map implementation, which will be much faster for small maps than the standard map (those use often a red-black tree, and jump around the memory like hell). It's no use to have a list if you have only 5 elements, as you can be pretty sure that the vector will outperform it in every way.

So much for now, I'm pretty sure I forgot half the stuff I wanted to blog about, so stay tuned for a part two (or simply use the comment function).

Indexing PDFs for Windows Search on Vista x64

Ever wondered why your x64 Vista sucks at indexing PDF files, even when you installed the Adobe Reader? The reason is simple: The reader has only a 32bit filter for the Windows Search, which does not work on Vista x64, so the desktop search cannot look inside the PDF files. The fix is equally simple, go and get the x64 PDF IFilter, which allows you to finally index all those PDFs.

Boost 1.38 incoming

Pretty soon, Boost 1.38 is going to be released. One very annoying problem has been fixed: Boost.DateTime no longer includes <windows.h> by default:

On Windows platform made inclusion of windows.h optional. The header is only used when the BOOST_USE_WINDOWS_H macro is defined. Otherwise (and by default), the library uses internal definitions of symbols from this header.

This is very nice, because the thread library also uses Boost.DateTime, which forced me to write simple wrappers and use PIMPL to avoid putting <windows.h> into my headers. I try to avoid including <windows.h> in public headers at all costs. More so as I recently had problems because it adds lots of #defines like CreateFile, which is not funny if you have a function with the same name (see also this older post about <windows.h> for tips how to handle this problem).

Building Boost worked with my old build scripts just fine, without further changes. I also gave the experimental CMake build a try, which worked fine right out of the box, and I'm really looking forward to Boost using CMake exclusively (especially as I'm sure that this will improve CMake as well). I didn't try anything fancy, just get some basic libraries built and linked, and this at least worked right out of the box.

The final version should be due in the next weeks.

Debug war story

Time for a war story: I've been just debugging a rather interesting bug. The application is written in C++. Symptoms:

  • Starting the release build by double clicking fails
  • Starting the release build from the command line works
  • Appending a debugger to the release build after the failure fails
  • Debug runs fine

The failure was an exception somewhere in the system, which was never caught and terminated the application. I first tried to find out where it happened, and it turned out that a 3rd party library call failed. I was passing a string, which I read from a file, and the library refused to process it. First check, I was reading the right file, and the file itself looked fine. One pass with the debugger also verified the file was ok, and the passed string was valid. That is, in the debug window, I was passing exactly the right string to the function.

Still, the library couldn't process it when double clicking. Fortunately, it provided an interface to retrieve a human-readable error message, in which it complained about some invalid characters at the end of the file -- in my case an additional ',' which was not expected. I added error handling to each call with a descriptive message, so I could immediately see which one was failing. As a side effect I got pretty good error handling in this function, which will hopefully save me time in the future. As the additional ',' seemed only to appear when the file was double-clicked in the explorer, I had to add a wait call at the end of the application to read the error message, otherwise it would terminate immediately.

Ok, so we narrowed down the problem:

  • The file is ok
  • Reading from a file is ok
  • The string that comes to the function is wrong

So the problem in release must happen after reading from the file, but before passing the string to the function. Another quick check showed that when double clicking, the string indeed contained invalid characters at the end. I went back into debug mode, and verified that the reading was not an issue, but I was passing the data as is to the string constructor. The constructor in turn tried to determine the length of it using strlen, and here is what happened:

  • In debug, the memory after the data was initialised to 0
  • In release, starting it from the explorer filled the memory with random data -- and while some characters were ignored, a ',' was not.

And here we go, a simple fix to initialize the string with just the read data, and the mysterious bug was fixed. In cases like this, make sure you check each step, and don't scratch your head. Having seen the symptoms, I was thinking it's going to be some really nasty environment or what related problem, or maybe some problem with an object file which didn't get compiled or something like that, but not an actually quite simple problem.

Results:

  • Now, I'm checking an additional invariant (that the length of the string created by reading a whole file to a string is equal to the size of the file)
  • The error handling is much better
  • I've added an unit test to verify that the range-based construction works

All in all, it was really worth to fix it, as a large chunk of code improved during the post-mortem analysis (in which I added the assertions, which I didn't need while debugging). This is something which you shouldn't forget, each time you fix a more complex bug, take the time to do a proper analysis, and if possible, take a look at similar places. Often enough, you can add a few more assertions. Most important, if you get down to debugging, assume nothing, and check each step with special care, even those where you wouldn't expect a bug.