Debug war story
Time for a war story: I’ve just been debugging a rather interesting bug. The application is written in C++. Symptoms:
- Starting the release build by double clicking fails
- Starting the release build from the command line works
- Attaching a debugger to the release build after the failure does not work
- Debug runs fine
The failure was an exception somewhere in the system that was never caught and terminated the application. I first tried to find out where it happened, and it turned out that a call into a 3rd party library failed. I was passing a string, which I had read from a file, and the library refused to process it. First check: I was reading the right file, and the file itself looked fine. One pass with the debugger also confirmed that the file was ok and that the passed string was valid. That is, as seen in the debugger, I was passing exactly the right string to the function.
Still, the library couldn’t process it when the application was started by double clicking. Fortunately, it provided an interface to retrieve a human-readable error message, in which it complained about some invalid characters at the end of the file: in my case an additional ‘,’ which was not expected. I added error handling to each call with a descriptive message, so I could immediately see which one was failing. As a side effect I got pretty good error handling in this function, which will hopefully save me time in the future. As the additional ‘,’ seemed to appear only when the application was started by double clicking in the explorer, I had to add a wait call at the end of the application to be able to read the error message; otherwise it would terminate immediately.
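To give an idea of what error handling with a descriptive message per call can look like, here is a minimal sketch. The helper `requireOk`, the step name and the message text are all made up for illustration; the real library and its error interface are not shown here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns a failed library call into an exception that
// names the step and carries the library's human-readable message.
void requireOk(bool ok, const std::string& step, const std::string& detail) {
    assert(!step.empty());  // a nameless step would defeat the purpose
    if (!ok) {
        throw std::runtime_error(step + " failed: " + detail);
    }
}
```

Each library call is then followed by something like `requireOk(result, "parse settings", message)`, where `result` and `message` stand for whatever the library reports, so the failing call is visible right in the error text.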
Ok, so we narrowed down the problem:
- The file is ok
- Reading from a file is ok
- The string that comes to the function is wrong
So the problem in release must happen after reading from the file, but before passing the string to the function. Another quick check showed that when double clicking, the string indeed contained invalid characters at the end. I went back into debug mode and verified that the reading itself was not the issue. However, I was passing the raw data as is to the string constructor. The constructor in turn tried to determine the length of the data using strlen, and here is what happened:
- In debug, the memory after the data was initialised to 0
- In release, when started from the explorer, the memory after the data contained random data, and while some of those characters were ignored, a ‘,’ was not.
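The difference between the two constructions can be sketched as follows. This is not the original code: the helper names are made up, and the "random" memory is simulated with a stray ‘,’ followed by a 0 so that the example stays well defined (running strlen over truly uninitialised memory is undefined behaviour).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns a buffer holding `size` bytes of file data
// followed by leftover bytes, imitating memory that was not zeroed.
std::vector<char> readIntoDirtyBuffer(const char* data, std::size_t size) {
    assert(data != nullptr);
    std::vector<char> buffer(size + 2);
    std::memcpy(buffer.data(), data, size);
    buffer[size] = ',';       // garbage that happens to be a printable character
    buffer[size + 1] = '\0';  // the first 0 byte only shows up after the garbage
    return buffer;
}

// Buggy: the const char* constructor scans for the first 0 byte, like strlen,
// so it picks up whatever follows the data.
std::string buildFromPointer(const std::vector<char>& buffer) {
    return std::string(buffer.data());
}

// Fixed: pass the number of bytes that were actually read.
std::string buildFromRange(const std::vector<char>& buffer, std::size_t bytesRead) {
    return std::string(buffer.data(), bytesRead);
}
```

With five bytes of data `[1,2]`, `buildFromPointer` yields `[1,2],` while `buildFromRange` yields `[1,2]`. In a debug build the trailing memory happened to be 0, which is why both variants looked identical there.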
And here we go: a simple fix to initialize the string with just the data that was read, and the mysterious bug was gone. In cases like this, make sure you check each step instead of scratching your head. Having seen the symptoms, I expected some really nasty environment-related problem, or maybe an object file which didn’t get compiled or something like that, but not such a simple problem.
Results:
- Now, I’m checking an additional invariant (that the length of the string created by reading a whole file to a string is equal to the size of the file)
- The error handling is much better
- I’ve added a unit test to verify that the range-based construction works
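A minimal sketch of how the invariant and the range-based construction can fit together, assuming the file is read through `std::ifstream`. The function name `readWholeFile` is hypothetical and the real code may well look different.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into a std::string.
std::string readWholeFile(const std::string& path) {
    // Binary mode, so that no newline translation changes the byte count.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff end = in.tellg();  // opened at the end: this is the size
    if (end < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    const std::size_t fileSize = static_cast<std::size_t>(end);
    in.seekg(0, std::ios::beg);

    std::vector<char> buffer(fileSize);
    in.read(buffer.data(), static_cast<std::streamsize>(fileSize));
    if (static_cast<std::size_t>(in.gcount()) != fileSize) {
        throw std::runtime_error("short read from " + path);
    }

    // Range-based construction: no scan for a terminating 0.
    std::string contents(buffer.begin(), buffer.end());
    // The invariant: the string is exactly as long as the file.
    assert(contents.size() == fileSize);
    return contents;
}
```

A unit test for this only needs to write a small file with known contents, read it back and compare both the contents and the length.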
All in all, it was really worth fixing, as a large chunk of code improved during the post-mortem analysis (in which I added the assertions, which I didn’t need while debugging). This is something you shouldn’t forget: each time you fix a more complex bug, take the time to do a proper analysis and, if possible, take a look at similar places. Often enough, you can add a few more assertions. Most importantly, when you get down to debugging, assume nothing, and check each step with special care, even those where you wouldn’t expect a bug.