Useful HTML reports

While doing research, you sometimes run into the situation where you are getting loads and loads of results, and you need to create nice reports to simplify the analysis. Often this is tabular data, sometimes you might be able to create some images, but the key point is that you need a simple way to bundle all of this together. As the topic title suggests, HTML fits the bill very nicely. It's reasonably easy to generate (after all, it's just text), you can embed additional data (images), and you can also use JavaScript to improve the usability of the report.

For HTML generation, I started out with basic C++ iostreams, but this quickly becomes unmaintainable, and you run into debugging problems: if your markup is wrong, it's very complicated to see where in the C++ code the error comes from. A much nicer solution is Google ctemplate, a C++ HTML template engine in the spirit of Smarty. Basically, you author the template using any HTML editor, add some markup for repeating sections, fill out some values on the C++ side, and you're done. For debugging, you simply strip the template tags and validate the code with any HTML validator. Two things are especially nice about ctemplate. First of all, repeating sections are trivial: if you add a dictionary for a section twice, it will be repeated twice in the output, with no need to set up loop constructs on your own. Second, it's fast -- generating a 50 MiB HTML file takes ~2 seconds. For viewing such large files, I can recommend using a WebKit-based browser, as at least Firefox 3.0 has some problems when closing such a file.
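To give an idea of how this looks, here is a minimal sketch of the C++ side; the template file name, the field names and the section name are made up for this example, and depending on your ctemplate version the namespace is either ctemplate or google:

    #include <ctemplate/template.h>
    #include <iostream>
    #include <string>

    int main ()
    {
        // "report.tpl" would contain ordinary HTML plus markers like {{TITLE}}
        // and a repeating section {{#ROW}} ... {{/ROW}}.
        ctemplate::TemplateDictionary dict ("report");
        dict.SetValue ("TITLE", "Measurement report");

        // Each AddSectionDictionary call repeats the section once more in the
        // output -- no loop construct is needed inside the template itself.
        for (int i = 0; i < 3; ++i)
        {
            ctemplate::TemplateDictionary* row = dict.AddSectionDictionary ("ROW");
            row->SetIntValue ("VALUE", i * 42);
        }

        ctemplate::Template* tpl =
            ctemplate::Template::GetTemplate ("report.tpl", ctemplate::DO_NOT_STRIP);

        std::string output;
        tpl->Expand (&output, &dict);
        std::cout << output;
    }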

To improve a report's usability, you can make it interactive using JavaScript. I can recommend the very nice jQuery framework -- in fact, it worked so well that I didn't bother trying another one ;). jQuery makes it very easy to add show/hide hyperlinks, which is useful if your report contains detail data that you don't want to display all the time. All you need is to give your items an id, and then toggle the visibility of an item using $("#yourId").toggle(). If you have tabular data, jQuery has a real killer plugin: TableSorter. It allows you to sort tables interactively, without any additional setup code. All you have to do is use proper headers for your table, and you're ready to sort.

I have to admit that I still use plain-text reporting quite a bit, but for analysis it's usually much faster to have a nice report with some comfort features (hyperlinks to additional data, embedded images). With ctemplate, the cost of adding HTML output over plain text is rather minimal, so the next time you run into a situation where you need some output, you should really consider giving HTML a try.

Google diff, match and patch ported to C#

The title says it all: I've ported the Google diff-match-patch library to C#. It's a nice diff library, and it supports Java, JavaScript, Python, C++ (with Qt) and C# with exactly the same API.
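As a quick illustration of that shared API, here is a minimal sketch using the C++/Qt flavour (the C# calls look the same, modulo QString vs. string); the input texts are of course made up:

    #include <QList>
    #include <QString>
    #include "diff_match_patch.h"

    int main ()
    {
        diff_match_patch dmp;

        // Compute a character-level diff between two texts.
        QList<Diff> diffs = dmp.diff_main (QString ("The quick brown fox"),
                                           QString ("The slow brown fox"));

        // Clean up the raw diff so it is easier to read for humans.
        dmp.diff_cleanupSemantic (diffs);

        // Render the diff as HTML, with insertions and deletions marked up;
        // this string could then go straight into a report.
        QString html = dmp.diff_prettyHtml (diffs);
        return 0;
    }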

Porting-wise, everything went fine thanks to the high test coverage (IIRC, something around 98% of the code is covered by unit tests), so I could quickly check whether everything worked as expected. For unit testing on C#, I initially used MSTest, but switched to NUnit for portability. The code works fine with Mono 2.4 and .NET 3.5, possibly also with older versions. If you change the LINQ extension calls like First () to index-based access, you'll probably get it working on .NET 2.0 as well. So, have fun with this nice library!

Thanks to Neil Fraser for adding this to the project codebase and for further maintenance.

Some notes on strings, data structures and databases

A short war story on (not) using strings. In my application, I had objects that use a texture, and the texture was given as a string. I needed to index all objects by the texture they use, so I started by replacing the texture string with an integer identifier in the objects, and providing a map from texture name to id. This improves performance, as grouping by texture is no longer a string compare but just an integer compare. (Longer explanation: as I create the identifiers in order, each identifier is actually a direct index into the texture array -- you can't get faster than that.)
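A minimal sketch of that indexing scheme; the names are made up, and the real code was of course embedded in a larger system:

    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical registry that maps texture names to dense integer ids.
    // Because ids are handed out in insertion order, an id is also a direct
    // index into the 'textures' vector.
    struct TextureRegistry
    {
        std::map<std::string, int> idByName;
        std::vector<std::string>   textures;

        int GetId (const std::string& name)
        {
            std::map<std::string, int>::const_iterator it = idByName.find (name);
            if (it != idByName.end ())
                return it->second;

            const int id = static_cast<int> (textures.size ());
            textures.push_back (name);
            idByName [name] = id;
            return id;
        }
    };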

The problem with this approach became obvious when I had to serialize the objects in a later step. I needed another map from the integer id back to the string, so I could simply write the string out to disk. That in turn meant maintaining two mappings, and passing one of them down to the serialization function -- which is worse than it sounds, because now the serialization is tightly coupled to the map.

Eventually, I dropped all of this, stored the strings directly on both sides, and changed everything to use strings (all maps, all sets, every intermediate structure would use strings again). Guess what? The performance difference was not measurable.

What can we learn from this experience? Strings are not inherently evil, and unless there is a clearly better solution, there is no reason to avoid them. Strings also have the nice advantage that they are much easier to debug than opaque integers, especially if you dump a data structure to disk. If speed is really a concern for you, I'd suggest you look at the excellent string-based containers in LLVM. They have string-keyed sets and maps which are likely to perform well enough for most use cases. If you use C++ TR1, you can also try the hash-based containers. Of course this is no excuse to use strings in tight inner loops where an integer identifier would work just as well, but that seems an unlikely use case anyway (plus your profiler will probably point out such usage rather quickly).
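Usage is as simple as with the standard containers; here is a minimal sketch with LLVM's StringMap, assuming a reasonably recent LLVM (a std::tr1::unordered_map<std::string, int> would look almost the same):

    #include "llvm/ADT/StringMap.h"

    // Group objects by texture name, keyed directly on the string. StringMap
    // hashes the keys and stores each string next to its value, so it is
    // usually fast enough that an integer-id scheme isn't needed.
    void CountObjectsPerTexture ()
    {
        llvm::StringMap<int> objectCountByTexture;

        objectCountByTexture ["stone.png"] += 1;
        objectCountByTexture ["grass.png"] += 1;
        objectCountByTexture ["stone.png"] += 1;

        // objectCountByTexture ["stone.png"] is now 2.
    }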

Databases to the rescue?

In retrospect, it might even be worth dumping the indexed data structures in my application completely in favour of a memory-based embedded database like SQLite. This solves the persistence problem nicely -- every database comes with a way to store the data to disk. Plus, it allows higher-level manipulation of the data itself. I guess this is something I'll have to try: basically, instead of loading a data structure from disk and transforming it into a runtime representation using maps, sets and vectors, I'll copy from a disk-based database into a memory database and work directly on that. That is, retrieval gives you a proxy structure which represents a row, setting parameters results in queries, and retrieving results is always cached. This is not so interesting for prototyping, due to the coding overhead, but could be a viable solution for the final version.
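A minimal sketch of what I have in mind, using the plain SQLite C API; the table layout is made up, and error handling is omitted:

    #include <sqlite3.h>
    #include <cstdio>

    int main ()
    {
        sqlite3* db = 0;

        // ":memory:" gives a purely in-memory database; persisting it boils
        // down to copying it into (or out of) a file-backed database.
        sqlite3_open (":memory:", &db);

        sqlite3_exec (db,
            "CREATE TABLE object (id INTEGER PRIMARY KEY, texture TEXT);"
            "INSERT INTO object (texture) VALUES ('stone.png');"
            "INSERT INTO object (texture) VALUES ('grass.png');"
            "INSERT INTO object (texture) VALUES ('stone.png');",
            0, 0, 0);

        // Higher-level manipulation: grouping by texture becomes a query
        // instead of a hand-rolled map.
        sqlite3_stmt* stmt = 0;
        sqlite3_prepare_v2 (db,
            "SELECT texture, COUNT(*) FROM object GROUP BY texture",
            -1, &stmt, 0);

        while (sqlite3_step (stmt) == SQLITE_ROW)
        {
            std::printf ("%s: %d\n",
                reinterpret_cast<const char*> (sqlite3_column_text (stmt, 0)),
                sqlite3_column_int (stmt, 1));
        }

        sqlite3_finalize (stmt);
        sqlite3_close (db);
    }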

[Edit]: Added a short warning not to use strings in inner loops and other performance-critical cases.

Improving your programming skills

If you are serious about your craft, you probably try to improve your skill set regularly. Read on for some ways I have found effective, and which could also help you improve. They are ordered roughly by increasing effectiveness.

Single developer approaches

For these approaches, you don't need any friends or peers, only your trusty computer.

  • Reading books: Believe it or not, the standard classics are actually useful. Books like The Mythical Man-Month, The Pragmatic Programmer or Code Complete can give you insights which you wouldn't get otherwise. For me, the most important take-away from these books is seeing approaches to problems which have already worked and which I might not be familiar with. So if you haven't done it already, go grab a few of those books.

  • Reading source code: Reading other people's source code can also be a gold mine, if done right. Done right means you try to solve the problem on your own first, before looking at a working solution. Case in point: if you want to learn how to write a scanline rasterizer, try as hard as you can on your own before you open up the Quake 1 source code to see how it's done properly.

  • New tools, development methods, languages: Sounds weird at first, but the idea is to try a different way of developing. If you never did test-driven development, try it on the next project. If you don't know Scala, write a small application to get an idea of what it is like. The same goes for new tools or new libraries. The goal here is to expand your horizon and see which problems these new things are meant to solve, and how they approach them. Often enough, you can reuse some of the ideas in another context, but you have to constantly expand your toolset. Just keep in mind that none of these tools or methods is going to solve all your problems. While I'm all for test-driven development, I don't use it in my research program at the moment due to a large third-party library which cannot easily be put into a test harness; instead, I rely on extensive logging and try to verify the results at each step.

  • Analyzing your code: Every time you finish a project, or at the end of a week, go ahead and review your code on your own. Look out for two things. First of all, is the code crystal clear to you? Do you see things you could have done better? If so, note them somewhere. Second, look at the bugs your code had or still has. For each bug, think about how you could avoid it next time. For example, I once had a copy-and-paste problem where I copied three lines to update the three components of a vector. Since that day, I always write a loop if there are more than two iterations of something, and I haven't run into a similar problem since (see the small sketch below).
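To illustrate the kind of slip I mean, here is a made-up vector addition in the spirit of the original bug:

    struct Vec3 { float v [3]; };

    // Copy-and-paste version (the kind of bug I ran into):
    //   result.x = a.x + b.x;
    //   result.y = a.y + b.y;
    //   result.z = a.y + b.z;   // oops -- the copied 'y' was never changed to 'z'

    // Loop version: with three or more repetitions, write a loop instead of
    // copying a line and editing the component by hand.
    Vec3 Add (const Vec3& a, const Vec3& b)
    {
        Vec3 result;
        for (int i = 0; i < 3; ++i)
            result.v [i] = a.v [i] + b.v [i];
        return result;
    }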

Peer-based approaches

No matter how hard you try on your own, one day you'll inevitably hit some plateau, and if you are on your own, it's extremely difficult to escape from it. Be afraid of this day, because if the plateau is high enough, you might have difficulty noticing how much further you could still improve, and simply stop. Remember the wise Bruce Lee:

If you always put limits on everything you do, physical or anything else, it will spread into your work and into your life. There are no limits. There are only plateaus, and you must not stay there, you must go beyond them.

True words, so let's go beyond this point by leaning on our peers.

  • Code reviews: I cannot stress this enough: doing code reviews with someone else is probably one of the most effective ways to improve your programming skills. You get other opinions directly on the stuff you are working on. To get the most out of a code review, don't ask what is good, but rather encourage people to actively criticize you. People tend to be more honest when pointing out problems than when praising the quality of your code, and after all you want to improve, that is, get rid of the stuff which is not good.

  • Pair programming: If you can, drag a fellow developer to your computer and do pair programming. To get the most out of it, make sure you discuss the problem first, describe how you want to solve it, and once both of you agree that your approach is viable, let the other person watch how you do it. You'll get a real-time code review, plus the other person is actively trying to solve the problem as well, which makes the comments much more valuable.

  • Mentor programming: Basically, pair programming taken to the extreme, where you have a really senior developer next to you who constantly challenges you while you write code. In the best case, your mentor will try to get as many approaches to a problem out of you as possible, and encourage you to think through each of them, guiding you in the process. Unfortunately, most companies value the time of such senior people far too highly to let them train other people, ultimately hurting the progress of the whole development team.

Of being determined

Some of you fellow readers have seen me in the "zone", writing code like there's no tomorrow. A few weeks ago, I had a particularly good opportunity to see what it means to be determined when writing code, something we should try to be every time. Actually, I believe it's the main trait that separates people like Carmack from us mere mortals. They are able to focus on the work at hand and overcome every problem with seemingly no effort, while we stumble at the simplest hurdle.

A side note on this post: if you wonder about the subtitles, there's a simple pattern, and if you get it, feel free to comment ;)

Let's get back to chronological order, or rather: it's a sunny day, and I come to work. I'm doing research now, so progress is much more important than anything else, and I had to get some basic algorithm running so we could see whether we were heading in the right direction. I know now that the direction was wrong, so all the stuff I describe here has been scrapped and replaced with a hopefully superior approach, but it's still a very good example of what I want to explain in this post.

It's only pain

The problem was an optimisation problem, for which we wanted to use an iterative solver. To keep it simple, we would compute each step by brute force (instead of some fancy gradient descent or other optimisation scheme), so I started the day by prototyping the algorithm in C++. Per iteration step (abbreviated IS for the remainder of this post), it would have to perform a small number of computation steps (CS) over a few million elements. The time for a CS clocked in at 3 seconds, which would allow 50 iterations per night -- enough to see whether the algorithm was optimising anything at all.

However, I had this feeling that I could do better:

  • the CS was crying out to be done with SSE, as it was already working on float arrays (with the length being a multiple of four -- it doesn't get more obvious than that)
  • the IS could execute all CS in parallel -- and my machine had four cores

So my mental plan was to do the SSE optimisation first, and as soon as it gave reasonable results, I would jump into making it multithreaded, and by lunch I would be able to start it. So much for the plan.

Land of confusion

I gave the SSE optimisation a quick shot with intrinsics, giving me 15 seconds per CS instead of the 3 seconds of the plain C++ version. You can imagine how puzzled I was, especially as all I did was replace the single-float operations with the corresponding vector operations. However, I was not ready to give up that quickly -- maybe I have read too much of Michael Abrash's "Black Book". I took a more in-depth look at the intrinsics I was using, dug out a new SSE3 operation I had never used, unrolled a loop, and came down to 6 seconds. Still much slower than the compiler, and if somebody had walked in, I'd have had a hard time explaining why exactly I had been fiddling with assembly for over an hour ;)
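To give an idea of the general pattern (not the actual computation step, which was different), here is a minimal sketch of a made-up multiply-add kernel over float arrays whose length is a multiple of four:

    #include <xmmintrin.h>   // SSE intrinsics
    #include <cstddef>

    // Hypothetical CS kernel: out[i] += a[i] * b[i].
    // count is assumed to be a multiple of four, the arrays 16-byte aligned.
    void ComputeStepSSE (const float* a, const float* b, float* out, size_t count)
    {
        for (size_t i = 0; i < count; i += 4)
        {
            __m128 va = _mm_load_ps (a + i);    // load four floats at once
            __m128 vb = _mm_load_ps (b + i);
            __m128 vo = _mm_load_ps (out + i);

            vo = _mm_add_ps (vo, _mm_mul_ps (va, vb));

            _mm_store_ps (out + i, vo);         // write four results back
        }
    }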

My next try was to investigate the generated code. A good move, as the compiler generated somewhat weird code for the intrinsics, and at that moment I was convinced I could do better on my own by rewriting the routine in assembly. Bear in mind, right up to that moment I had never used inline assembly in Visual C++, so I had no idea how to find out in which registers the parameters are passed, or how to actually access memory operands. Not to mention that my general assembly skills were not exactly stellar.

Look no further

However, it was all too late: I started reading the docs, browsed through a bit of example code, and hacked away. I think I was crazy enough to finish it in less than half an hour, including debugging, and bam, I was down to 1.3 seconds. Some random rescheduling of the instructions brought it down to 1.2, and that was it: immediate satisfaction, I had won once again. Overnight, I would be able to run three times as many iterations as before.

Quiet times

The next hours were filled with evaluation, that is, checking whether the CS was working properly. There were some slight differences between the vectorized code and the C++ code, but they turned out to be floating-point precision issues. Everything looked good so far, and one hour before I was planning to finish working, I thought it might be a good time to give the parallelisation a go. In the spirit of the day I decided to use the Intel Threading Building Blocks, as, just like inline assembly, I had never used them before (this is not entirely correct: I had used the concurrent vector from the TBB in an OpenMP-based project, but I had not yet used the TBB to execute code in parallel).

Makes me wanna pray

For the IS, I would have to run several CS concurrently and gather the results, so I had to come up with a way to store the intermediate results without any shared state between the CS. This proved to be quite simple, and I wrote the IS with a parallel loop, ready to see it running in all its parallel glory.
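A minimal sketch of what such a parallel IS can look like with the TBB, using the classic functor-style parallel_for; the computation step itself is a made-up stand-in, and all names are hypothetical:

    #include <tbb/blocked_range.h>
    #include <tbb/parallel_for.h>
    #include <vector>

    // Hypothetical stand-in for one computation step (CS); the real one worked
    // over a few million elements.
    static float RunComputationStep (const std::vector<float>& input, size_t step)
    {
        float sum = 0.0f;
        for (size_t i = 0; i < input.size (); ++i)
            sum += input [i] * static_cast<float> (step + 1);
        return sum;
    }

    // Each CS writes only into its own slot of the result vector, so the
    // parallel invocations share no mutable state.
    struct IterationStepBody
    {
        const std::vector<float>*   input;
        std::vector<float>*         results;    // one slot per CS

        void operator () (const tbb::blocked_range<size_t>& range) const
        {
            for (size_t step = range.begin (); step != range.end (); ++step)
                (*results) [step] = RunComputationStep (*input, step);
        }
    };

    void RunIterationStep (const std::vector<float>& input, std::vector<float>& results)
    {
        IterationStepBody body = { &input, &results };
        tbb::parallel_for (tbb::blocked_range<size_t> (0, results.size ()), body);
    }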

What I saw instead was an error that the CRT DLL was not found, which could not be fixed by simply copying the CRT DLL into the folder -- then an "application initialisation error" would come up instead. The handy Dependency Walker told me that my application was doing something wrong with the CRT. It really looked like some manifest or versioning problem, or rather: like several hours of fun figuring out which libraries were compiled against which DLL version.

I did some quick hacking, and the application did indeed start, only to crash at the first TBB call, so I suspected that some dynamic reloading was causing the problem, and that I would probably have to rebuild all my dependencies.

When I'm dead and gone

At that time, my supervisor came in, and I explained the problems to him in the darkest colours. He tried to convince me to throw away the TBB stuff, thread on my own with the bare WinAPI, and skip such optimisations so early in the project. The reasoning is not wrong: after all, if I spend 7 hours optimising something down from 8 hours to 1, I could just as well have let it run for 8 -- the net result would be the same.

That was the final nail in the coffin: I would let it go, come back in after the weekend, try to get it running for no more than one hour, and evaluate the non-parallel IS.

Do somethin'

I fired up my RSS reader to browse through the news, fed up with this issue. The next week, I would have come in to work again and started checking the algorithm, with each IS taking four times longer than necessary, and this blog post would be a rather lame one at this point. But if you know me, then you know that I'd rather spend the 7 hours to get it running 8 times faster, and then rewrite the whole thing the next morning to be both fast and maintainable. Probably I would rewrite it once more afterwards, just to make sure it had been rewritten three times, because I tend to rewrite everything at least three times.

It's hard to explain properly, but sometimes you can literally feel this determination building up. I guess that our programming heroes are in this state all the time, which allows them to go where none of us can. At this moment, I had the feeling that no matter how big the obstacle, I would be able to overcome it, even if it meant having to (re-)write the IS using something entirely different. Two things were clear to me at that moment. Number one: I would rather die than leave this single-threaded. Number two: if the TBB wouldn't run within 30 minutes, I'd pick yet another library -- this time for thread pools -- and learn how to use it. No chance that I would write something with the WinAPI, as that wouldn't let me improve any skill: I have had my share of WinAPI threading, and there is little to be learned from manually threading an application.

As I only write about days with happy endings, I eventually got the stuff working and was processing one IS in 3 minutes. The next week, I could evaluate it with the confidence that I had done as well as I could, and that for that day, there had been nothing left for me to improve.

Mercy on me

Looking back, I realize that I could have finished the stuff earlier if I had been determined to do it when the first errors with the TBB showed up. After all, it was a minor problem, and by the time my supervisor came in, it could have been running. I hope that the next time I get into a similar situation, I will remember this experience and try harder. That is one point; the other interesting question is: would I have been able to do more that day if I had skipped the optimisation stuff altogether and started evaluating in the afternoon?

Surprisingly, I'm sure the answer is no. No, because as explained above, I would have gone mad waiting 12 minutes for one IS to finish, knowing that this was solely my own fault. For me, this was yet another fork on the way to becoming a good programmer: on the left side, the broad road going down the hill, and on the right, the rocky path going upwards. This time, I chose the right one, and there is nothing to regret. After all, I was going to get results much faster, and more interesting results at that.

The swing of things

There is one key point to take home from this post: if you run into a tough problem, try to become really, deeply determined to solve it, to the point where you can be sure that you're trying as hard as you can. Having had this recent experience, I'm pretty sure that this is one of the keys to getting into the "zone" of high productivity, and eventually heroic programming. If you don't manage it on your own, ask a co-worker to push you, or chat with a friend and explain the problem. You'll be amazed at how lame your excuses will sound.

It can be difficult if you aren't confident in your skill set and don't have a long track record of solved problems, but remember: we're all in the same situation. Few of us are programming heroes. The only thing we can do is try harder every day, and I hope that this post gives you a little more confidence the next time you are about to do some heroic programming, or just to get things done! For me it's simple: I'll think about all you fellow readers who are on the right road towards awesomeness, and I'll try a bit harder to keep up with you!