Relaxed comment rules

I've signed up for a WordPress account so I can use the "Akismet" anti-spam plugin. From today on, you can register and comment immediately without me having to approve your first comment. I'll try to keep it this way as long as possible.

Compiler "optimisation" ...

2007, the days of handwritten assembler are a thing of the past, right? Well, not really, if you are still using (otherwise excellent) compilers like VC++ 8.0 or even Intel C++ 9.1.

Vector code, anyone?

Around 2000, Intel decided to invent the SSE instruction set. Seven years later, the compiler writers are still not aware of it! The reference code is:

__declspec(dllexport) void mul4 (__m128& left, const __m128& right)
{
    left = _mm_mul_ps (left, right);
}

which transforms into:

mov    ecx, DWORD PTR _right$[ebp]
movaps  xmm0, XMMWORD PTR [eax]
movaps  xmm1, XMMWORD PTR [ecx]
mulps   xmm0, xmm1
movaps  XMMWORD PTR [eax], xmm0

Note that it's using movaps because __m128 is properly aligned by default. In the following examples, I've been passing a float* instead, so a movups instruction would be needed to load the four floats at once. Can we write the code in such a way that the compiler transforms it automagically, or at least emits a mulps instruction? The following tests were done with Visual C++ 8.0 SP1. First try, the most straightforward:

left[0] *= right [0];
left[1] *= right [1];
left[2] *= right [2];
left[3] *= right [3];

Ok, slightly modified:

left [0] = left [0] * right [0];
left [1] = left [1] * right [1];
left [2] = left [2] * right [2];
left [3] = left [3] * right [3];

Maybe with loop unrolling?

for (int i = 0; i < 4; ++i) {
    left [i] = left [i] * right [i];
}

The compiler unrolled the loop, but that's all. Next try, other unrolling ...

for (int i = 0; i < 4; ++i) {
    left [i] *= right [i];
}

Argh! Maybe with a temporary?

float r[4];
r [0] = left [0] * right [0];
r [1] = left [1] * right [1];
r [2] = left [2] * right [2];
r [3] = left [3] * right [3];
std::copy (r, r+4, left);
return left;

Hmm, using the __m128 data type? This data type is aligned by default, so maybe the compiler heuristics would see that a movaps would be sufficient in this case.

left.m128_f32 [0] *= right.m128_f32 [0];
left.m128_f32 [1] *= right.m128_f32 [1];
left.m128_f32 [2] *= right.m128_f32 [2];
left.m128_f32 [3] *= right.m128_f32 [3];

Unfortunately, this didn't help either. I've also tried a bit with restrict on the inputs, without results. None of the tries led to optimal code: you always wind up with four mulss instructions instead of the single mulps instruction which would do the job. Argh! At least, knowing ASM still separates the men from the boys :)
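For comparison, here is a minimal sketch of what the float* case looks like when written by hand with intrinsics. The function name mul4_unaligned is made up for this example, and I'm assuming an x86 compiler that ships xmmintrin.h. It uses unaligned loads and stores, since a plain float* gives no alignment guarantee:

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps

// Hypothetical helper: multiplies four floats element-wise and stores
// the result back into left. Both pointers must reference at least
// four floats; no 16-byte alignment is required.
inline void mul4_unaligned (float* left, const float* right)
{
    assert (left != 0 && right != 0);

    const __m128 l = _mm_loadu_ps (left);    // unaligned load (movups)
    const __m128 r = _mm_loadu_ps (right);   // unaligned load (movups)
    _mm_storeu_ps (left, _mm_mul_ps (l, r)); // one mulps, unaligned store
}
```

This is of course exactly the kind of code one would hope the compiler produces on its own from the plain C++ versions above.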

Working at warning level 4

On Linux, I usually develop with -pedantic -ansi -strict -Werror -Wall, which turns on virtually all available warnings (unused variables, anyone?) and makes basically any warning an error. Most of the time the Linux programs were rather small, though. Yesterday, I tried the same (i.e. warning level 4) on Windows. Read on for the results.

Warning level 4 - the cutting edge

Warning level 4 is above the default level, and it is there for a good reason. You will get a lot of warnings, no matter how good your code is. The good thing is that you usually wind up with 3-4 bogus warnings you can disable, and if your code is standards-conforming and well written, that should be it.

Warnings I disable by default

  • C4251 'identifier' : class 'type' needs to have dll-interface to be used by clients of class 'type2'. This appears for things like boost::noncopyable. As long as you know that your clients will have the interface in question, you can safely ignore that.
  • C4275 non DLL-interface classkey 'identifier' used as base for DLL-interface classkey 'identifier'. Basically the same as above.
  • C4190 'identifier1' has C-linkage specified, but returns UDT 'identifier2' which is incompatible with C. Happens when you have an extern "C" function which returns a C++ class. As long as you know that your clients will use C++, this can be ignored.

Disabled warnings from level 4

  • C4127 conditional expression is constant. Actually I didn't want to disable it, but unfortunately it crops up all the time in boost::lexical_cast. It's not that bad to disable it because mistakes like a = b instead of a == b will be caught by a different warning.
  • C4512 assignment operator could not be generated. Well, that's bad luck, but you'll notice it anyway when you try to assign it, if you don't try, then you probably didn't want it.
  • C4510 default constructor could not be generated. Crops up when you define a struct with a const char* member. Yeah, it cannot be default initialized, but as usual, you'll get an error if you try to.
  • C4610 struct 'identifier' can never be instantiated - user defined constructor required. Same as above.
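As a rough sketch, the two lists above could be collected in one project-wide header like this. The _MSC_VER guard and the Scale struct are my additions for illustration, not code from niven. The struct shows the kind of type that triggers C4512: its const member prevents the compiler from generating an assignment operator.

```cpp
#include <cassert>

// Only MSVC understands these pragmas, so hide them from other compilers.
#if defined(_MSC_VER)
#pragma warning (disable: 4251 4275 4190) // DLL-interface and C-linkage warnings
#pragma warning (disable: 4127)           // conditional expression is constant
#pragma warning (disable: 4512 4510 4610) // assignment operator / default constructor could not be generated
#endif

// Hypothetical example type: the const member means no assignment
// operator can be generated, which is what C4512 reports at level 4.
struct Scale
{
    explicit Scale (float f) : factor (f) {}
    float apply (float value) const { return value * factor; }
    const float factor;
};

inline void scale_example ()
{
    const Scale twice (2.0f);
    assert (twice.apply (3.0f) == 6.0f);
    // twice = Scale (3.0f); // would not compile: no assignment operator
}
```

The commented-out assignment is the point of disabling C4512: if you ever do try to assign, you get a hard error anyway.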

With these warnings disabled, I've been able to recompile niven without warnings, and I found a bug! In one function, I hadn't used a formal parameter; this was now flagged by a warning and was indeed a real bug. So, turning your warning level up to 4 can be really helpful!

Development fun

Yeah, you read right, developing software can be fun :) Especially if you stop walking the old roads and try new stuff.

Template metaprogramming

This can be real fun, seriously, as long as your compiler keeps up with you. I can recommend the MSVC++ 8.0 compiler (although it sometimes generates very weird code) and an EDG-based one, preferably the Intel C++ 9.1 compiler. It really kicks ass, provides meaningful messages in most cases and does proper template checking (not the lazy check at instantiation time which MSVC++ does). I've been playing around with those two compilers and some template metaprograms lately, and I have to admit that once you find a good problem domain, it can really suck you in. Finally, you can generate huge amounts of boilerplate code in basically no time. Even if you wind up writing 100 lines of templates, it is still better than 50 lines of boilerplate: if you ever need to change the boilerplate, you will have to rewrite it from scratch, while you should be able to avoid that with the template route. Moreover, the tougher the problem, the merrier :) As a starter, try to write the "dot" product function for a 16 element vector (and using a for does not count, we want HPC style "all-inline" code), and then tell me which way was more fun :). I've rewritten most of niven's math library using some rather heavy template code, and so far I'm very pleased with the results.
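In case you want a hint for the dot product exercise, here is one possible sketch using recursive template instantiation. This is not the code from niven's math library; the names Dot and dot are made up for the example, and whether everything really ends up inlined depends on your compiler and optimisation settings.

```cpp
#include <cassert>
#include <cstddef>

// Recursive case: the dot product of the first N - 1 elements
// plus the product of the last pair.
template <std::size_t N>
struct Dot
{
    static float eval (const float* a, const float* b)
    {
        return Dot<N - 1>::eval (a, b) + a [N - 1] * b [N - 1];
    }
};

// Base case: ends the recursion at compile time.
template <>
struct Dot<0>
{
    static float eval (const float*, const float*) { return 0.0f; }
};

// Convenience wrapper which deduces N from the array size.
template <std::size_t N>
inline float dot (const float (&a) [N], const float (&b) [N])
{
    return Dot<N>::eval (a, b);
}

inline void dot_example ()
{
    const float a [4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    const float b [4] = { 4.0f, 3.0f, 2.0f, 1.0f };
    assert (dot (a, b) == 20.0f); // 4 + 6 + 6 + 4
}
```

The same few lines work for 16 elements or any other size, which is exactly where the boilerplate version starts to hurt.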

Testing

In each book about software development, you get told that tests are the most important thing in good software. If you don't believe the books (or you don't read any ;) ), then believe me: Tests are the most important thing in software development. The immediate pay-off is usually close to zero: you spend some time writing them, and in exchange you catch more bugs earlier. But the real killer argument is that if you ever come to the point where you are about to change some larger piece, the tests become an invaluable source of information. First of all, they catch bugs, but second, you have to think twice about the interface of your new stuff, and this is more important than I thought. Case in point: In niven, matrices used to provide no special means to access their data (and exposed a pointer to it directly), whereas vectors encapsulated them (yeah, I know, bad design, that's why I've rewritten it). The first rewrite though used vector [1] and matrix (row, column) for accessing. After taking a look at the unit tests, I also added support for vector (1) just to be consistent with the matrices. Without having quite a lot of test code in place, I'd probably never have spotted this.
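To make the vector [1] versus vector (1) point concrete, here is a small sketch of a vector offering both access styles, together with the kind of test that makes the interface visible. The Vector class below is a hypothetical stand-in, not niven's actual class:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a small math vector.
template <std::size_t N>
class Vector
{
public:
    Vector () : data_ () {} // value-initialises all elements to 0

    float& operator[] (std::size_t i)       { assert (i < N); return data_ [i]; }
    float  operator[] (std::size_t i) const { assert (i < N); return data_ [i]; }

    // Same access, spelled like the matrix (row, column) syntax.
    float& operator() (std::size_t i)       { return (*this) [i]; }
    float  operator() (std::size_t i) const { return (*this) [i]; }

private:
    float data_ [N];
};

// A tiny unit test pinning down the interface: both spellings must agree.
inline bool vector_access_is_consistent ()
{
    Vector<3> v;
    v [1] = 2.5f;
    v (2) = 4.0f;
    return v (1) == 2.5f && v [2] == 4.0f && v [0] == 0.0f;
}
```

Writing the test forces you to type both spellings side by side, which is how an inconsistency like this gets noticed in the first place.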