Latest posts

Invest in testing

The title says it all: invest time in testing, and the earlier, the better. Recently, I've been (re-)implementing a fairly simple class, with a few unit tests. I tend to test at least the basic functionality with unit tests (i.e. the good code paths), so I never commit really broken code. I was pretty sure I had done a good job on the class; after all, it was much simpler than the previous one.

Today, I started adding tests for the various corner cases and error reports. Most of my error checking lives in the interfaces, so I can easily write a mock implementation, test it, and be sure that all derived classes have exactly the same error checking. The tests didn't turn up anything new, except for one case in which the class did not set a state properly. This was no big issue, as it was just an error state and the next access would fail anyway, but I still wanted to get rid of it, and it turned out that I could simplify the code a bit. Not much: 5 LoC and one member variable fewer. But if you do this for many classes, the savings accumulate.
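To make the idea of "error checking in the interface" concrete, here is a minimal C++ sketch using the non-virtual interface idiom. It is an assumption about how such an interface could look, not the actual class from this post: `Reader`, `MockReader`, `doRead` and the sticky error flag are hypothetical names. The public `read()` does all argument checking and sets the error state itself, so one mock implementation is enough to test behaviour that every derived class inherits.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical interface: all checks live in the non-virtual public function,
// so every derived class gets identical error handling.
class Reader {
public:
    virtual ~Reader() = default;

    // Returns false and enters the error state on invalid arguments or failure.
    bool read(char* buffer, std::size_t size) {
        if (failed_) return false;              // the error state is sticky
        if (buffer == nullptr || size == 0) {   // argument checks shared by all
            failed_ = true;                     // the state must be set here too
            return false;
        }
        if (!doRead(buffer, size)) {            // failures reported by derived classes
            failed_ = true;
            return false;
        }
        return true;
    }

    bool failed() const { return failed_; }

private:
    virtual bool doRead(char* buffer, std::size_t size) = 0;
    bool failed_ = false;
};

// Minimal mock used only to exercise the checks in Reader.
class MockReader : public Reader {
public:
    explicit MockReader(std::string data) : data_(std::move(data)) {}

private:
    bool doRead(char* buffer, std::size_t size) override {
        if (size > data_.size() - pos_) return false;   // not enough data left
        for (std::size_t i = 0; i < size; ++i) buffer[i] = data_[pos_ + i];
        pos_ += size;
        return true;
    }

    std::string data_;
    std::size_t pos_ = 0;
};

// An error-case test written once against the mock.
bool invalidArgumentSetsErrorState() {
    MockReader reader("a");
    char buffer[1];
    const bool rejected = !reader.read(nullptr, 1);
    const bool sticky = !reader.read(buffer, 1);   // a valid call still fails afterwards
    return rejected && reader.failed() && sticky;
}
```

A bug like the one described above (an error path that forgets to set the state) would make `invalidArgumentSetsErrorState()` return false for every implementation at once.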

5 LoC here, 10 there, repeated over 20 classes, and you might end up with a binary that is 10 KiB smaller. Remember, the fastest and most correct code is the code that isn't there, so I consider this a real win. It's quite interesting to see how clean you can get things. After a few iterations, my classes are much better designed, more robust, require less code to use, and have less code of their own to maintain. This might suggest that the initial implementation is usually bad, but I disagree: often it's only after having done it once that you see how to do it better, and at the beginning there is no test net that lets you safely try things out.

However, this really requires a two-part approach:

  • Start unit testing early. If you add basic unit tests too late, refactoring becomes too expensive. These tests should cover the working cases.
  • A refactoring step, done either after a break from this particular code or by another person. In this step, you add tests for the error and corner cases.

I tend to write code and unit tests concurrently, and after the class is implemented, I document the most important functions. A week or so later, I go back and add unit tests for failure cases, trying to break the class. During this step, I almost always find parts to refactor, either because I notice that I'm repeating code while writing the unit tests, or because I hit bugs. Funnily enough, it really takes a break before I get a feeling for where a class or function might break. This is probably something that pair programming handles better, but unfortunately I haven't had a chance to try that yet.
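The two test phases could look roughly like this for a hypothetical fixed-capacity `RingBuffer` (C++17, for `std::optional`); the class and the test function names are illustrative assumptions. The first function covers the working cases written alongside the code; the second is the kind of "try to break it" test written after a break: popping from an empty buffer, pushing into a full one, checking that a rejected push changes nothing, and wrap-around.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical class under test: a fixed-capacity FIFO ring buffer.
template <typename T, std::size_t N>
class RingBuffer {
public:
    bool push(const T& value) {
        if (size_ == N) return false;            // full: reject, keep state intact
        data_[(head_ + size_) % N] = value;
        ++size_;
        return true;
    }

    std::optional<T> pop() {
        if (size_ == 0) return std::nullopt;     // empty: nothing to return
        T value = data_[head_];
        head_ = (head_ + 1) % N;
        --size_;
        return value;
    }

    std::size_t size() const { return size_; }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};

// Phase 1, written together with the class: the working cases.
bool testGoodPaths() {
    RingBuffer<int, 4> rb;
    return rb.push(1) && rb.push(2) && rb.pop() == 1 && rb.pop() == 2;
}

// Phase 2, written a week later: corner and error cases.
bool testCornerCases() {
    RingBuffer<int, 2> rb;
    if (rb.pop().has_value()) return false;      // pop from empty
    if (!rb.push(1) || !rb.push(2)) return false;
    if (rb.push(3)) return false;                // push into a full buffer
    if (rb.size() != 2) return false;            // the failed push changed nothing
    if (rb.pop() != 1) return false;
    if (!rb.push(3)) return false;               // this write wraps around
    return rb.pop() == 2 && rb.pop() == 3 && rb.size() == 0;
}
```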

So much for today, happy coding!

Posted December 29th, 2008 | no comments | Filed Under: Programming

Post schedule, SSE, other stuff

Well, I know I haven't posted for a few weeks now, and I'm getting complaints about it, so here we go :)

Posting schedule

As you can easily see, my posting schedule is no longer really regular. For a while I tried to keep up at least one post per week; the problem is that I don't want to post about work in progress. So as long as I'm working on something, I usually don't write about it until I've finished. Other kinds of posts I do on a case-by-case basis, usually based on reader requests. So if you want to read about a subject, drop me a line by mail or just leave a comment somewhere, and I'll take a look.

Current work

That said, I'm currently working on some image processing tools. They're not finished yet, so no further details there. What might be more interesting is that I've been using SSE2 for part of this, and so far I'm very pleased with the results. SSE2 is a particularly good fit for image processing, although it does not support 16-bit floats. For 8-bit images, however, you can usually process 2 pixels with 4 channels each at once. Why only 8 elements? Because many SSE2 integer operations need 16-bit intermediate results, you have to unpack the 8-bit values to 16 bits and pack them again afterwards, so a 128-bit register holds just 8 elements; and given that you have only 8 XMM registers (in 32-bit mode; x64 has 16), you can't really hope to have room for more than 2 pixels.
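As an illustration of the unpack/compute/pack pattern, here is a sketch that scales every channel of RGBA8 pixels by `factor / 256`, two pixels per iteration. The operation and the name `scalePixelsSSE2` are hypothetical, chosen only to show the pattern; the intrinsics are the standard SSE2 ones from `<emmintrin.h>`, so this compiles only for x86/x64 targets.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: dst = (src * factor) >> 8 per channel, factor in [0, 256].
// src/dst hold pixelCount RGBA pixels (4 bytes each).
void scalePixelsSSE2(const std::uint8_t* src, std::uint8_t* dst,
                     std::size_t pixelCount, std::uint16_t factor) {
    const __m128i zero = _mm_setzero_si128();
    const __m128i f = _mm_set1_epi16(static_cast<short>(factor));
    std::size_t i = 0;
    for (; i + 2 <= pixelCount; i += 2) {
        // Load 8 bytes = 2 pixels x 4 channels into the low half of the register.
        __m128i p8 = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src + 4 * i));
        // Unpack to 8 x 16-bit lanes so the multiplication has room to grow.
        __m128i p16 = _mm_unpacklo_epi8(p8, zero);
        // Keep the low 16 bits of each product, then shift back down by 8.
        p16 = _mm_mullo_epi16(p16, f);
        p16 = _mm_srli_epi16(p16, 8);
        // Pack back to 8 bits with unsigned saturation and store the 8 bytes.
        __m128i out = _mm_packus_epi16(p16, zero);
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + 4 * i), out);
    }
    // Scalar code for a trailing odd pixel.
    for (; i < pixelCount; ++i)
        for (int c = 0; c < 4; ++c)
            dst[4 * i + c] = static_cast<std::uint8_t>((src[4 * i + c] * factor) >> 8);
}
```

Since 255 * 256 = 65280 still fits into an unsigned 16-bit lane, `_mm_mullo_epi16` followed by a logical shift gives the exact result for any factor in [0, 256]; larger factors would overflow.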

I also have to say here that I'm a happy user of compiler intrinsics. While some people avoid them like the plague, I have observed pretty good code generation so far, even when I intentionally used more variables than there are registers, so that the compiler had to decide where to spill them. Moreover, as Visual C++, GCC and Intel C++ all understand the same intrinsics, I immediately get portability across x86 and x64 on Windows, Linux and Mac OS X, which is well worth the extra typing.

One word of warning: optimising with SSE2 takes a lot of time. First, you have to write a C baseline version, which must be maintained and tested as a fallback for CPUs that don't have SSE2. Once you have a correct and verified C version, you can start optimising it until you feel that the compiler should be able to turn it into great SSE2 code. Then you have to do the SSE2 translation, always checking the compiler output to make sure it didn't rearrange things unexpectedly (with VC 2008 SP1, I haven't had a problem so far; I haven't tried the 2010 CTP yet due to lack of time). Do it if your profiling tells you to, but don't do it just for fun.
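A sketch of the first two steps for a per-channel scaling kernel: a plain baseline, an intermediate C++ version restructured into blocks of 8 elements widened to 16-bit lanes (the shape the SSE2 code will have), and an exhaustive check of any variant against the baseline. All function names here are hypothetical; the point is only that each optimisation step is verified against the baseline before moving on.

```cpp
#include <cstddef>
#include <cstdint>

// Baseline: the simplest correct version. It is the fallback for CPUs without
// SSE2 and the reference every optimised variant is tested against.
void scaleBaseline(const std::uint8_t* src, std::uint8_t* dst,
                   std::size_t count, std::uint16_t factor) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = static_cast<std::uint8_t>((src[i] * factor) >> 8);
}

// Intermediate step: still plain C++, but laid out like the SIMD version.
void scaleBlocked(const std::uint8_t* src, std::uint8_t* dst,
                  std::size_t count, std::uint16_t factor) {
    std::size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        std::uint16_t lanes[8];
        for (int k = 0; k < 8; ++k) lanes[k] = src[i + k];                                   // "unpack"
        for (int k = 0; k < 8; ++k) lanes[k] = static_cast<std::uint16_t>(lanes[k] * factor); // "multiply"
        for (int k = 0; k < 8; ++k) lanes[k] >>= 8;                                         // "shift"
        for (int k = 0; k < 8; ++k) dst[i + k] = static_cast<std::uint8_t>(lanes[k]);       // "pack"
    }
    for (; i < count; ++i)   // leftover elements, handled like the baseline
        dst[i] = static_cast<std::uint8_t>((src[i] * factor) >> 8);
}

// Compare a variant against the baseline for every 8-bit input and every
// factor in [0, 256]. The odd length also exercises the leftover loop.
template <typename Fn>
bool matchesBaseline(Fn variant) {
    std::uint8_t src[256 + 3];
    for (std::size_t i = 0; i < sizeof src; ++i) src[i] = static_cast<std::uint8_t>(i);
    for (unsigned f = 0; f <= 256; ++f) {
        std::uint8_t expected[sizeof src];
        std::uint8_t actual[sizeof src];
        scaleBaseline(src, expected, sizeof src, static_cast<std::uint16_t>(f));
        variant(src, actual, sizeof src, static_cast<std::uint16_t>(f));
        for (std::size_t i = 0; i < sizeof src; ++i)
            if (expected[i] != actual[i]) return false;
    }
    return true;
}
```

The same `matchesBaseline` check can later be run against the intrinsics version, so the SSE2 translation is held to exactly the same standard as the C code it replaces.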

Compiler optimisations, abstraction penalty

Some more notes on writing fast code. First, don't assume that a compiler will unroll all those short loops over 3 or 4 elements: some don't, and sometimes this results in slightly slower code. Second, beware of passing standard containers around by value. Sometimes the compiler cannot construct the container directly at the target site (i.e. it cannot elide the copy), so you get a copy, which is more expensive than you might think: it requires at least one additional allocation, a memcpy (which isn't necessarily fast), and a deallocation.
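A minimal C++ sketch of both points, with hypothetical names. `dotUnrolled` spells out the four terms instead of relying on the compiler to unroll `dotLoop`. `pickCopy` shows a case where the compiler cannot construct the result in place: a conditional expression over two local vectors is an lvalue, so the selected vector is copied. `pickMove` returns the locals by name, which allows elision and otherwise at least a move (C++11 and later). `sumByRef` takes its container by const reference, so nothing is copied at all.

```cpp
#include <vector>

// Hypothetical 4-component vector for the unrolling example.
struct Vec4 {
    float v[4];
};

// Loop form: whether the 4 iterations get unrolled is up to the compiler.
float dotLoop(const Vec4& a, const Vec4& b) {
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i) sum += a.v[i] * b.v[i];
    return sum;
}

// Hand-unrolled form: no loop counter or branch left to optimise away.
float dotUnrolled(const Vec4& a, const Vec4& b) {
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] + a.v[2] * b.v[2] + a.v[3] * b.v[3];
}

// The conditional expression is an lvalue: no elision, no implicit move,
// so the chosen vector is copied (allocation + element copy).
std::vector<int> pickCopy(bool first) {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{4, 5, 6};
    return first ? a : b;
}

// Returning a local by name: the copy may be elided, otherwise it is moved.
std::vector<int> pickMove(bool first) {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{4, 5, 6};
    if (first) return a;
    return b;
}

// Read-only access through a const reference: the container is not copied.
long long sumByRef(const std::vector<int>& values) {
    long long sum = 0;
    for (int v : values) sum += v;
    return sum;
}
```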

Absolutely avoid indirection. I couldn't believe it, but removing even one level of indirection can give you a 5-10% speed-up. In my case, I replaced a virtual function call by an explicitly stored function pointer. For the virtual call, the chain is: look up the object's vtable, offset into it, and call the target entry. The direct call is simply: jump to the address stored here. I was actually pretty surprised, as I had assumed that both pointers would sit in the L1 cache, so the extra access should cost next to nothing.
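A sketch of the two call styles, with hypothetical names (`Filter`, `Doubler`, `FilterFn`, `doubleValue`). Whether the stored function pointer is actually faster depends on the compiler, which may devirtualise the call, and on the workload, so the 5-10% above is worth re-measuring in your own code.

```cpp
#include <cstddef>

// Virtual dispatch: per call, load the object's vtable pointer,
// index into the vtable, then call the entry found there.
struct Filter {
    virtual ~Filter() = default;
    virtual int apply(int x) const = 0;
};

struct Doubler : Filter {
    int apply(int x) const override { return 2 * x; }
};

int runVirtual(const Filter& f, const int* data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += f.apply(data[i]);
    return sum;
}

// Stored function pointer: the address is held directly, one indirect call.
using FilterFn = int (*)(int);

int doubleValue(int x) { return 2 * x; }

int runDirect(FilterFn fn, const int* data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += fn(data[i]);
    return sum;
}
```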

Blog theme

On a different subject: for some time now, I've been planning to give my blog a complete overhaul. If you know a good page with blank templates for WordPress, I'd be happy to hear about it, as I'm looking for a very clean template to start with.

Posted December 19th, 2008 | 1 comment | Filed Under: General, Programming