Archive for the ‘Programming’ Category

Bazaar for version control

March 8th, 2010

I’ve been a long-time user of Subversion – and I really like the project. However, I recently switched to Bazaar for nearly all of my work, and I’d like to explain some of the reasons behind this. First of all, for those who are not familiar with Bazaar: Bazaar is a distributed versioning system built by Canonical, the same guys that created Ubuntu. It’s written in Python, and runs on Windows, Linux and Mac OS X. The UI is written with Qt and is very consistent across platforms, so you can easily switch between systems.

As I said, it’s distributed, which is a major difference from centralised versioning systems like Subversion or Perforce. Centralised systems store the history – like the name implies – at a central location. If you run log or diff, this central location has to be reachable. The clients store only the current revision and have to query the server for many operations.

Distributed systems, on the other hand, don’t require a central server at all. Instead, they store all history at each client. That is, you can work locally all the time and you never have to connect anywhere. It’s likely that you will have some location to which you push your changes so they can be easily discovered by your co-developers, but this is by no means required. Many modern versioning systems are distributed, like git, Mercurial and Bazaar.

Why distributed

Distributed systems have some advantages over central systems, specifically:

  • Working offline is easy: You can do all your work locally, revert, query the logs, diff, without network access. That’s really cool if you are not online or your server is unreachable.
  • Simple branching: Branching is typically very efficient and well supported in distributed systems, as it’s a core part of them. Merging is usually much more flexible as well.
  • Easy backup: Every developer has a full backup available all the time, so if your server explodes, nothing is lost at all.

Of course, there are some disadvantages as well:

  • It may be more difficult to know which copy is the “latest”. Usually, there is still a server running somewhere to which all developers push, or into which the latest changes get merged, but you have to agree on this. That’s simply an organisational issue.
  • Local history can become huge, especially when you store binary files. If you’re about to store binary data, a central system is usually superior.

Bazaar

I’ve tried git, Mercurial and Bazaar, and I stuck with the last of these. Feature-wise, they are all very similar. Git is more low-level than the others, so it might be easier to hack something together with git, but for day-to-day use, I didn’t find a killer feature in any of them. Performance-wise, git is the fastest, but that’s only relevant if you do benchmarking. For the projects I work on, Bazaar and Mercurial are fast enough, as the response time for most commands is less than a second. Most of the time, you’re going to be limited by network speed during pushing/pulling anyway, and typical commits (changing a dozen files or so) are instantaneous with all of them.

One nice feature of Bazaar is the excellent interoperation with other version control systems. I’ve been using Bazaar heavily against a Subversion server, and this worked really well – much better than with git, which was very slow when importing from SVN and took some time for each operation against the SVN server as well. The nice thing when using Bazaar against SVN is that renaming/changing content is much more robust than with SVN alone: I did some large-scale refactorings both with SVN only and with Bazaar against an SVN server, and using Bazaar is much more comfortable. For instance, you can rename, change content, rename again, commit, and then push to SVN without any problems, even if one of your renames only changes the case.

I’ve now switched a rather big project (5 years of history and 3000 revisions) to Bazaar, without any trouble. I’m developing on three different machines, and keeping them up-to-date has been very easy. For instance, one of them (let’s call it B) didn’t have internet access, so I would pull the trunk on machine A, and pull updates to B from A. Now B has internet, and I pull from trunk again, without any problems.

Another reason I like to use Bazaar is that the UI is nice, useful and consistent across all platforms. Most of the time, I actually use the UI, as it makes many operations rather easy (like selecting which files to commit, seeing added files, editing ignore lists …). Finally, Bazaar has corporate backing, which provides two main benefits: Someone is getting paid to write docs, so Bazaar is really well documented. The other benefit is that some developers are working full-time on Bazaar, which means that bugs, suggestions, and other issues eventually get looked at even if they’re rather boring. With git, you don’t even have a proper way to report a bug, while you can be pretty sure that any bug you report against Bazaar gets processed some day. All in all, I would recommend taking a look at Bazaar when you are choosing a DVCS, as it might actually be all you need in an easy-to-use package with a nice, helpful community.

One side note: If you try Bazaar 2.1.0, you should get the Bazaar Explorer 1.0.0rc2, as the one bundled with 2.1.0 has a known issue: its toolbar is simply empty.

Beta testing time

October 26th, 2009

Visual Studio 2010 Beta 2 has been released recently, and CMake 2.8 is also in RC mode right now. In case you haven’t done so yet, this is the time to give them a try, as Microsoft and Kitware are really into fixing bugs. For instance, nearly all of the bugs I reported in Beta 1 of VS 2010 have been fixed in Beta 2 (on the other hand, a whole crop of new bugs appeared in Beta 2 ;) ). Anyway, if you’ll have to use VS 2010 in the future (I will have to, as DirectX only works on Windows), you should really take some time and test your stuff now and report problems, otherwise you’ll end up cursing for the next few years.

For CMake, it’s not that critical, as the release cycle is much shorter, but if you want good support for VS 2010, you should start testing now. My personal project here does not quite work with CMake and VS 2010 yet, but Kitware is very helpful and we’re trying to iron out the last remaining issues.

In my opinion, far too many developers just blame the tools, and never actually report bugs. Sure, you can hope that your particular problem gets caught by QA one day, but reporting it really increases the chances. After all, you want your clients to properly report bugs as well instead of spreading bad publicity, don’t you?

Note: I’m not working for Microsoft or Kitware – I just picked those two, as they have products in the making which are likely to impact a lot of developers.

Postmortem: Diploma Thesis, 2

October 19th, 2009

Welcome to the second part of the post-mortem analysis. In the first part, I mentioned the things that went right, now it’s the time to find out what could have been better:

  • Direct usage of std::iostreams everywhere: Well, the main problem here is that I thought that an iostream will throw an exception if used incorrectly (as I was used to from .NET and my own projects), but this is of course not true – by default, an iostream will just set the fail bit, and you have to check for that. Unfortunately, this only became a problem really late, when certain files were missing and things would break silently. For the future, I’ll wrap the iostream creation in a function which throws an error if the file is missing.
  • File I/O, part two: Even though I had text files, I missed two key points: versioning and type information. That is, each file should have a header like “Scene 4” or so, which is checked by the corresponding reader. Again, this wasn’t a problem during development, as I used to pass the right files everywhere.
  • You probably see a pattern already, but there were a few more instances of “silent” failures – I’ll have to take special care next time to avoid all possibilities of loading wrong data or partial data. I’m also thinking about adding checksums to each file, even though it makes quick manual editing a bit more complicated.
  • Manual threading: All threading should have used OpenMP from the beginning, which would have saved quite some time. Notice that with OpenMP, static scheduling can lead to bad behaviour if the individual items take vastly different amounts of time, so make sure to profile the CPU usage. I had to use dynamic scheduling to get maximum efficiency.
  • Unit-testing: I started too late, and added all in all too few unit-tests, which led to some hard-to-track-down bugs.
  • Functional testing: I had no functional testing in place. In retrospect, there is a really easy solution to this: Add a script layer (for instance, using Lua) to issue all commands to the app and also to query the state. This allows easy recording of all UI actions and replaying them, and makes it extremely simple to automate testing. Another huge advantage is that an “undo” system comes for free, as one can replay everything until the last command. Definitely something I’ll try next time.
  • So much for the down-sides, doesn’t look too bad after all. I guess I’ll read through the code once again in a month or so, but so far, I’m halfway pleased with the results. Especially as the code was under constant change most of the time :)
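
To make the two file I/O points above a bit more concrete, here is a minimal sketch of what such wrappers might look like. The names openOrThrow and expectHeader are made up for this example and are not from the thesis code; the header format (a type name followed by a version number, like “Scene 4”) is taken from the description above.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: opens a text file for reading and throws instead of
// silently leaving the stream in a failed state when the file is missing.
std::ifstream openOrThrow(const std::string& path)
{
    std::ifstream in(path);
    if (!in) {
        throw std::runtime_error("Cannot open file: " + path);
    }
    return in;
}

// Hypothetical helper: checks that the first line reads "<type> <version>",
// for instance "Scene 4", and throws on a missing or mismatching header.
void expectHeader(std::istream& in, const std::string& type, int version)
{
    std::string line;
    if (!std::getline(in, line)) {
        throw std::runtime_error("Missing header, expected type " + type);
    }

    std::istringstream header(line);
    std::string actualType;
    int actualVersion = 0;
    if (!(header >> actualType >> actualVersion)
        || actualType != type || actualVersion != version) {
        throw std::runtime_error("Unexpected header '" + line + "', expected '"
            + type + " " + std::to_string(version) + "'");
    }
}
```

A reader would then call openOrThrow first and expectHeader right after it, so that both a missing file and a file of the wrong kind fail loudly at the point of loading instead of somewhere downstream.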
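
The scheduling remark from the threading point can be illustrated with a small sketch. The work function processItem is a made-up stand-in whose cost grows with the item index, just to mimic items that take vastly different amounts of time; the actual workload in the thesis looked different.

```cpp
#include <vector>

// Hypothetical work item with a very uneven cost: item i runs i * 1000
// loop iterations, so late items are far more expensive than early ones.
double processItem(int i)
{
    double sum = 0.0;
    for (int k = 0; k < i * 1000; ++k) {
        sum += 1.0 / (k + 1.0);
    }
    return sum;
}

std::vector<double> processAll(int itemCount)
{
    std::vector<double> results(itemCount);

    // With schedule(dynamic), each thread fetches the next iteration as soon
    // as it is idle, instead of receiving a fixed block of iterations up
    // front as with static scheduling. The guard keeps the code building
    // (serially) when OpenMP is not enabled.
#ifdef _OPENMP
    #pragma omp parallel for schedule(dynamic)
#endif
    for (int i = 0; i < itemCount; ++i) {
        results[i] = processItem(i);
    }
    return results;
}
```

OpenMP has to be enabled at compile time (for instance with -fopenmp on GCC or Clang). Whether dynamic scheduling actually pays off depends on the workload, so profiling both variants is still a good idea.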
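
The “undo by replay” idea from the functional testing point can be sketched roughly as follows. Note that this is a toy illustration and not a Lua binding: commands are plain strings, and the AppState, execute and Session names as well as the “add”/“mul” commands are invented for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical application state: just a single counter.
struct AppState {
    int value = 0;
};

// Applies one textual command ("add N" or "mul N") to the state.
// Returns false, leaving the state untouched, if the command is not understood.
bool execute(AppState& state, const std::string& command)
{
    std::istringstream in(command);
    std::string op;
    int arg = 0;
    if (!(in >> op >> arg)) {
        return false;
    }
    if (op == "add") {
        state.value += arg;
    } else if (op == "mul") {
        state.value *= arg;
    } else {
        return false;
    }
    return true;
}

// Records every successful command, so a session can be replayed later.
class Session {
public:
    bool run(const std::string& command)
    {
        if (!execute(state_, command)) {
            return false;
        }
        log_.push_back(command);
        return true;
    }

    // Undo: drop the last command and replay the rest from a fresh state.
    void undo()
    {
        if (log_.empty()) {
            return;
        }
        log_.pop_back();
        state_ = AppState{};
        for (const std::string& command : log_) {
            execute(state_, command);
        }
    }

    int value() const { return state_.value; }
    const std::vector<std::string>& log() const { return log_; }

private:
    AppState state_;
    std::vector<std::string> log_;
};
```

The recorded log doubles as a test script: feeding the same commands into a fresh Session should always produce the same state, which is what makes automated functional testing simple. Replaying from scratch gets slow for long sessions, so a real application would probably want occasional snapshots.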

Postmortem: Diploma Thesis, 1

October 12th, 2009

As I’m in the finishing stages (writing ;) ) of my diploma thesis, it’s time for a post-mortem analysis of the source code. This week, I’ll investigate the things that went right and which I’ll try to use in the future again.

  • CMake and the subsequent port to Linux: Making the application portable allowed me to clean up the source code a bit, and eventually get better performance. In retrospect, I should have used CMake earlier in the process to save time porting later. For the next project, I’ll definitely start using CMake as early as possible.
  • OpenGL for all viewers and GUI applications: Even though I’m strongly biased towards DirectX due to the superior debugging capabilities and the cleaner API, OpenGL proved to be reasonably easy to use and made the porting very easy. Using GLSL was not too painful, but I was running on nVidia hardware only …
  • Libraries: SQLite turned out to be very easy to integrate as an intermediate format, and allowed me to store processing data without having to deal with file I/O issues. Definitely a good choice. For reports, I used CTemplate, which was also very easy to use and allowed me to generate nice HTML reports – much better than purely textual output, as I could easily integrate images into them.
  • Text formats: All input/output data would eventually end up in text files. I didn’t use any binary formats, and I could easily debug all data. This turned out to be a major time-saver, as I could parse the text files using scripts as well. Notice that the text files contained data like meshes while SQLite was used for storing intermediate data like per-mesh information.
  • Modular design/Python: The whole processing was designed as a pipeline of several libraries and executables which would transform the data. Some of these modules were written in Python, while most were in C++. Especially towards the end, I could still easily add new algorithms without breaking stuff. This became important as I had to extend one processing module: I could simply write a new module and call the old module where necessary – thus reusing a lot of code – while still being able to quickly add the new functionality. All I had to do was to replace the calls to the old module, which were very easy to identify due to the separation.
    Python turned out to be a very good choice for quickly cleaning up data, and together with SQLite I could easily exchange data with the other C++ tools in the pipeline. Much easier than doing that in C++.
  • UI: The applications usually had something like 20 parameters, which became a problem towards the end when I had to run them very often and tweak just a few parameters. The solution here was to write small UIs to wrap the parameters; with PyQt4 + QtDesigner I could get GUI front-ends running in half a day, which saved me lots of time later on. In future projects, I’ll try to write such small GUI runners again; one nice point of having a GUI runner is also that you can give a demo more quickly ;)
  • All in all, the programming went ok, and I’ll try to do some things just the same in future projects. Next week (hopefully), I’ll take a look at the stuff that didn’t work out quite as expected.