Postmortem: Diploma Thesis, 1

As I'm in the final stage of my diploma thesis (writing it up ;) ), it's time for a post-mortem of the source code. This week, I'll look at the things that went right and that I'll try to use again in the future.

  • CMake and the subsequent port to Linux: Making the application portable allowed me to clean up the source code a bit and eventually get better performance. In retrospect, I should have used CMake earlier in the process to save time porting later. For the next project, I'll definitely start using CMake as early as possible.
  • OpenGL for all viewers and GUI applications: Even though I'm strongly biased towards DirectX due to its superior debugging capabilities and cleaner API, OpenGL proved to be reasonably easy to use and made porting very easy. Using GLSL was not too painful, though I only ran on nVidia hardware.
  • Libraries: SQLite turned out to be very easy to integrate as an intermediate format, and allowed me to store processing data without having to deal with file I/O issues. Definitely a good choice. For reports, I used CTemplate, which was also very easy to use and allowed me to generate nice HTML reports -- much better than purely textual output, as I could easily integrate images into them.
  • Text formats: All input/output data ended up in text files. I didn't use any binary formats, so I could easily debug all data. This turned out to be a major time-saver, as I could also parse the text files with scripts. Note that the text files contained data like meshes, while SQLite was used for storing intermediate data like per-mesh information.
  • Modular design/Python: The whole processing was designed as a pipeline of several libraries and executables that transform the data. Some of these modules were written in Python, while most were in C++. Especially towards the end, I could still easily add new algorithms without breaking stuff. This became important when I had to extend one processing module: I could simply write a new module that calls the old one where necessary, thus reusing a lot of code, while still adding the new functionality quickly. All I had to do was replace the calls to the old module, which were very easy to identify thanks to the separation. Python turned out to be a very good choice for quickly cleaning up data, and together with SQLite I could easily exchange data with the other C++ tools in the pipeline, which is much easier than doing the same in C++.
  • UI: The applications usually had something like 20 parameters, which became a problem towards the end, when I had to run them very often and tweak just a few parameters each time. The solution was to write small UIs to wrap the parameters; with PyQt4 + Qt Designer I could get a GUI front-end running in half a day, which saved me lots of time later on. In future projects, I'll try to write such small GUI runners; one nice side effect of having a GUI runner is that you can give a demo more quickly ;)
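To make the text-format and SQLite points above a bit more concrete, here is a minimal Python sketch of one pipeline step. It assumes a simple OBJ-like text mesh format (`v x y z` / `f i j k` lines) and a hypothetical `mesh_info` table; neither is the actual format or schema used in the thesis. The step parses a mesh from plain text, stores per-mesh information in SQLite, and a later step queries that intermediate data without re-parsing the mesh files.

```python
import sqlite3


def parse_mesh(text):
    """Parse a minimal OBJ-like mesh: 'v x y z' vertex and 'f i j k ...' face lines."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":
            # Face indices are 1-based; keep only the vertex index of 'v/vt/vn' entries.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces


def store_mesh_info(db, name, vertices, faces):
    """Store per-mesh information (not the mesh itself) as intermediate data."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS mesh_info ("
        "name TEXT PRIMARY KEY, vertex_count INTEGER, face_count INTEGER, height REAL)"
    )
    ys = [v[1] for v in vertices]
    height = max(ys) - min(ys) if ys else 0.0  # extent along the y axis
    db.execute(
        "INSERT OR REPLACE INTO mesh_info VALUES (?, ?, ?, ?)",
        (name, len(vertices), len(faces), height),
    )
    db.commit()


def meshes_with_min_faces(db, min_faces):
    """A later pipeline step: query the intermediate data instead of re-parsing files."""
    rows = db.execute(
        "SELECT name FROM mesh_info WHERE face_count >= ? ORDER BY name", (min_faces,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    mesh_text = """# a unit quad split into two triangles
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1 2 3
f 1 3 4
"""
    db = sqlite3.connect(":memory:")  # a file path would let other tools share it
    vertices, faces = parse_mesh(mesh_text)
    store_mesh_info(db, "quad", vertices, faces)
    print(meshes_with_min_faces(db, 2))  # ['quad']
```

Using a file path instead of `:memory:` lets other tools in the pipeline, including C++ ones linking against SQLite, open the same database, while the mesh itself stays in an easily debuggable text file.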

All in all, the programming went ok, and I'll try to do some things just the same in future projects. Next week (hopefully), I'll take a look at the stuff that didn't work out quite as expected.

Leaving the Côte d'Azur

It's time to leave: this week, my time at INRIA comes to an end, and I'm swapping the south of France for the cold German winter. I've been here for six months to finish my studies, and I'll hopefully be able to write about the results soon; at the moment, we're working on the paper.

The next three to four weeks are going to be quite busy for me, as I also have to write up the results for my diploma thesis, which is unfortunately expected to be longer than a standard paper ;) Moreover, I have to tie up a few loose ends regarding my studies so I can properly graduate. As a result, I might not be able to blog as regularly as I did over the last few weeks: my backlog is currently completely empty, and even though I have quite a bit of stuff in the pipeline, none of it is ready for publishing. One thing that is definitely ready will be out by December this year; stay tuned for more details.

Linux developer tools

After switching to Linux recently at work, it's time for the follow-up post with the tools I currently use for development.

Core tools

For compiling, I use the standard GCC, currently 4.3, which supports OpenMP and is more or less on par with Visual Studio 2005/2008; C++ support is quite good, and I had minimal problems while porting. It also compiles faster than VS 2005, which had some weird problems with generated code I threw at it (it took 30 minutes to compile a 500 KiB file containing straightforward C code).

For building, I use CMake, and by now it should be clear that I like it :) I'm really happy with it so far, especially because it allows me to generate standard makefiles on Linux as well as KDevelop/Eclipse projects. It is also supported by Qt Creator, which can open CMake projects directly. I tried both KDevelop and Eclipse, but Qt Creator really wins hands down at the moment, even though it lacks a bit of polish. Just make sure you use the latest released version: my Ubuntu came with 1.0 (while 1.2.1 was current), and it's under really rapid development. Supposedly, Eclipse CDT 6.0 also brings a lot of improvements, so I'll have to give that a try.

Side note on building: I replaced a bunch of batch files with Makefiles, which are really handy for various tasks, especially as you can build just about anything with them, including UIs and documentation ;)

For profiling and memory checking, you can use the excellent valgrind tool suite (together with its callgrind tool and the KCachegrind visualizer). Especially for memory checking, valgrind is probably the best tool you can find; if there is one good reason to port, it's valgrind. It runs the program on a synthetic CPU and checks every memory access, so you can easily find memory leaks, off-by-one errors and accesses to uninitialized memory without having to modify the application.

For UI development, I use PyQt. My tools were written in Python anyway, and adding a UI with PyQt was straightforward. You can use Qt Designer to create UIs for it (PyQt comes with its own UI compiler, pyuic4), and integrating them is easy. Nokia also provides an LGPL alternative, PySide, which might be interesting if you develop commercial applications. For me, the more mature PyQt turned out to be a good choice; I haven't had any problems so far.

Other tools

There's of course a bunch of other tools you need as a developer. Some of them are not exclusive to Linux, but I list them here as well; a few might be a good choice on Windows too, as you can keep using them if you switch.

For editing, I use Kate and VIM. VIM is very nice when it comes to large files, as it stays fast even on files of 500+ MiB. Kate is a simple text editor, nothing fancy; I guess the closest Windows equivalent is Notepad++.

For mail, I use the stock KMail, after trying Thunderbird (which is my primary mail client on Windows). KMail is an integrated part of the KDE desktop, and feature-wise I'd argue that it's on par with Thunderbird; I've yet to find a missing feature.

All my documentation is now written with AsciiDoc, a very nice documentation generator written in Python. It requires only minimal markup, so the files are still readable as plain text, which is a killer feature compared to DocBook and friends.

Well, doing graphics, I also need a 3D app, and in my case that's Maya 8.5. It works nearly without problems (sometimes the UI shows some weird transparency in the outliner, which might be related to compositing), and I'm simply used to it. Installing it on Ubuntu is a bit tricky (you should follow this excellent guide), but it's possible, including network license management and mental ray rendering. I gave Blender a shot, but until it follows some basic UI conventions (left-click selects, for instance), I see no point in learning it. Don't get me wrong, I do believe that Blender is a great tool, but you have to invest a lot of time to learn it, especially if you have used other tools (in my case, Maya and MAX) before. For image editing, I use GIMP.

Summary

These are probably the tools that account for 99% of my work time. I hope you find this list useful if you're also trying to get started with Linux development; at least, it would have saved me some time had I known it beforehand :) What's very nice about nearly all of these tools is that installing them is very easy, as they are free and directly available from the package manager, something I miss on Windows.

Yachtseeing in Antibes

In case you're in the area and want to see a few nice yachts, head to Antibes today. Right now, there's a bunch of yachts in the harbour, among others:

... and a few more of nearly the same size, which I missed ;) Specifically, the "Quai des Milliardaires" in Antibes is packed to the last berth, and the rest are anchored around the harbour.

Switching to Linux: A Windows developer’s view

A few weeks ago, I switched my development environment from Windows to Linux, on a project that had so far been developed on Windows only. In this post, I want to describe the issues that led me to make this switch, give a short overview of how I did the actual port, and share some observations on Linux for developers. This is the first post in a series of at least two; the second post will describe the tools I currently use on Linux.

Background

The project I'm working on is written in C++, with some Python tools mixed in. My original development environment was Visual Studio 2005 on Windows XP. This is already the first issue: updating Visual Studio or Windows is not trivial, as both OS upgrades and IDE updates require new licenses, and companies in particular don't buy new versions immediately.

The problems became apparent when I tried to multi-thread parts of the application. At its core, it does a lot of number crunching in small work blocks that can be processed independently. As I couldn't use OpenMP due to dependency issues (a third-party library could not be linked when OpenMP was enabled), I threaded manually. Unfortunately, the application had to allocate some memory in each thread, and as it turned out, the scaling on XP was catastrophic. While I did get a speedup going from 1 to 3 cores, it became slower going from 3 to 4. Clearly, I was hitting issues with either the scheduler or the memory subsystem, as my code didn't do any I/O.

A quick test on Vista showed that the same application ran more than twice as fast, but unfortunately I couldn't install my IDE on Vista, and developing on Windows XP while testing on Vista was out of the question. With a free IDE, the change would have been no problem, but the Express editions support neither x64 nor OpenMP (more on this later; see below for an update!).

On the other hand, getting a recent Linux with a recent compiler was not a problem. With Wubi, Linux can run side-by-side with Windows while giving you a full Linux-based development environment. Running side-by-side is especially important in a corporate environment, as you usually cannot simply wipe the disk and install Linux without angering IT.

Linux

I used Wubi with the Kubuntu flavour, as I like the KDE environment a bit more than GNOME, especially as I now use Qt for UI development. Specifically, I used Kubuntu 9.04 x64, whereas I had used an x86 Windows XP previously.

Porting

I started by checking out the code from SVN, so no problems there. Even though the application was written in standard C++ and didn't depend on Windows-specific functionality, it was built using Visual Studio project files and used a few WinAPI calls. As a first step, I ported everything to CMake, something I could (and should) have done on Windows already. With CMake, I was able to quickly convert one project after the other and immediately check for compile errors. This proved to be the best approach, as I never ended up with a flood of compiler and linker errors at once; that did happen to me when I moved an already CMake-based project from Windows to Linux and tried to get it running in its entirety. When porting, it's best to port one subproject at a time, even if the project already uses a portable build system.

As I mentioned, I used explicit threading on Windows, which I replaced with OpenMP on Linux. This also let me throw out all the threading configuration stuff; among other things, I no longer had to reduce the priority of my application on start-up, which was necessary on XP because the machine would otherwise become unresponsive during processing. Boost.Thread might have been a viable alternative here, but OpenMP is well suited to the kind of loop parallelism I had in my code, and it even simplified the code compared to the explicit splitting/execution I used previously.
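The difference between explicit splitting and loop-level parallelism can be sketched in Python, with `concurrent.futures` standing in for OpenMP (this is not OpenMP, and `process_block` is a hypothetical placeholder for the real per-block work). The manual version computes slice boundaries, starts threads and joins them; the pooled version just maps the work function over the independent blocks and lets the pool handle the splitting, which is roughly the convenience `#pragma omp parallel for` gives you in C++. CPU-bound pure-Python code won't actually scale across threads because of the GIL, so this only illustrates the code structure, not the speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    """Hypothetical stand-in for the number crunching on one independent work block."""
    return sum(x * x for x in block)


def run_manual(blocks, num_threads):
    """Explicit splitting: give each thread a contiguous range of blocks, then join."""
    results = [None] * len(blocks)

    def worker(start, stop):
        for i in range(start, stop):
            results[i] = process_block(blocks[i])  # each index is written by one thread only

    chunk = (len(blocks) + num_threads - 1) // num_threads  # ceiling division
    threads = []
    for t in range(num_threads):
        start, stop = t * chunk, min((t + 1) * chunk, len(blocks))
        if start >= stop:
            break  # no blocks left for the remaining threads
        thread = threading.Thread(target=worker, args=(start, stop))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return results


def run_pooled(blocks, num_threads):
    """Loop-level parallelism: the pool handles splitting and scheduling,
    similar in spirit to '#pragma omp parallel for' on a C++ loop."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(process_block, blocks))


if __name__ == "__main__":
    blocks = [list(range(i, i + 4)) for i in range(0, 40, 4)]
    print(run_manual(blocks, 4) == run_pooled(blocks, 4))  # True
```

Both functions return results in block order; the pooled version simply has far less bookkeeping to get wrong, which mirrors the cleanup the switch to OpenMP brought.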

For graphics, I was already using OpenGL. As I could easily get the nVidia binary drivers running, I had no trouble on this side. Overall, it took me half a day to port, including the time to set up the Linux installation.

Results and some thoughts

The net result is interesting: the same application now runs 5-10x faster when using all four cores, so porting to Linux was really worth the hassle. I assume that with Visual Studio 2010 on Windows 7 I would get similar performance, but the key point to take away here is: getting your stuff to work on Linux only costs you time, and not much of it if you are a bit careful. Using CMake on Windows (or another portable build system), writing more or less clean C++ and using portable libraries makes porting to Linux easy, and the switch itself is not too complicated. At the moment, the tooling on Linux is reasonably comfortable (more on this in the next post), and the "pain factor" of switching from Visual Studio to, for instance, KDevelop or Eclipse CDT is gone.

Actually, the switch is so simple that Microsoft should get concerned. For instance, I have been developing mainly on Windows for several years, and I occasionally tried Linux, but I never did a complete switch due to various smaller and bigger problems. However, for the last one or two years, the Linux desktop, together with its tools, has been good enough to provide some real benefit, especially if you cannot access the latest Microsoft products. Microsoft used to have by far the best developer tools and quite stable APIs, which in my opinion were the cornerstones of their success. However, they're now changing APIs rather quickly (WinForms? WPF? WinAPI?), they provide new platforms that require rewriting your applications (I'm still waiting for an application like AutoCAD with a C# UI and a C++ backend), and the tool release cycle is simply too long: waiting two years to get a compiler bug fixed is just ridiculous.

On the other hand, developing on Linux means you have an extremely stable API (POSIX isn't going to be replaced with PoseFX, for instance), the UI side is rather clear (GTK or Qt, your choice), and the tools are getting better as well (GCC and LLVM are improving quickly, and installing a new GCC does not require buying a new license). If Microsoft does not turn the ship around with Visual Studio 2010 and some clear statements on the APIs, I assume that more and more developers will find that Linux can also be a very nice environment. Again, more on this next time!

[Update] A few notes, as this article is getting a lot of attention and there have been some misunderstandings. Regarding the library: I had access to the source and could have built it on Windows, but that is simply very time-consuming, as it depends on many libraries like zlib, which I would have had to compile with the new settings on Windows as well. Getting these dependencies on Linux is obviously much easier.

OpenMP vs. manual threading: switching to OpenMP did not improve performance compared to the manual threading, but it was a nice bonus, as it cleaned up the code a bit. On Windows, I could not use OpenMP due to link issues. Another bonus came from the switch from x86 to x64. Finally, the Linux memory allocator is much better in multithreaded environments, and this was the biggest performance improvement. In total, the application now runs several times faster, without further changes on my side.

Summary: In this particular case, the switch to Linux took me just a few hours, while the benefit was rather big: improved performance and less hassle with dependencies. For me, the drawbacks (getting used to Linux, different tools) are outweighed by the advantages, and that's the main point of this blog post. I was surprised how easy it was to switch completely from Windows to Linux on this project, as I had expected a lot of problems (for instance, not being able to get the stuff running at all!).

Things might have been different if the library and its dependencies had been easier to build on Windows, or if I had had access to a newer Windows version or compiler, but this is simply how things turned out, and I don't miss my Windows setup at work.

[Update] Visual C++ Express does not support x64 out of the box, but it is possible to add x64 support manually using the compilers from the Windows SDK. Check out the official C++ Express feature list and a guide on how to enable x64.