# Optimising performance: Strings, maps, sets

I had to optimise the performance of my research stuff lately to get into near-interactive rates -- with a few new insights how to quickly improve code which uses lots of strings and complex data structures. Note that this is actually a follow-up to my post on strings, in which I said that the performance using strings is usually not too bad -- that part was still sufficiently fast, but in another area, I had to improve the performance. So let's investigate the core issues and how they can be solved.

## Problem statement

In my case, I had a small graph with string nodes, and I had to map another graph partially onto it -- 30.000 times. Caching and reusing some results would be complicated, so the goal was to improve the full calculation. The basic steps are:

• Gather ~ 20 subgraphs
• For each subgraph
• For each node
• check if the node is in the main graph
• retrieve some data
• small calculation
• Gather the results for each subgraph
• Select one

Even though I won't mention it throughout the text, I used a profiler to verify assumptions -- which is necessary to avoid optimising the wrong parts. Looking at the processing steps, we can easily identify a few problems: Creating and destroying lots of small containers containing strings, and lot of comparison instructions, especially during retrieval, as I used a standard map. The profile also indicated that a lot of time was spent in the memory allocation and in string comparisons.

### Optimisation, first part

As the first optimisation, I wanted to make sure I use proper data structures. In my case, I was mainly using sets and maps, with very few elements in the average case (~5-20). The sets turned out to be mostly static, so I simply replaced them with vectors -- this is very easy, you just need to call `std::sort`, `std::unique` and `erase` to make a set out of a vector. For the maps, I choose to use hash maps instead of the red-black tree maps.

The net result was good for the set optimisation, but now, a huge amount of time was spent in the string hash function. The main problem was that each lookup into a container would recompute the hash, even though the strings were immutable.

## Part two

Eventually, I replaced the strings in one part of the graph by identifiers, and translated between them at the beginning of the processing. This resulted in a 10x improvement, so it became obvious that for best performance, I would have to replace all my string processing ...

... which turned out to be fairly simple. I created a proxy structure with a single member -- an integer -- and a shared array of strings. Basically, I exploited the fact that my set of strings was limited. Each time a new string would be encountered, I would create a new slot for it, otherwise, I would simply return the id. Comparisons were now merely integer compares. Together with a very lightweight locking mechanism (a simple read-write lock, which would allow multiple readers to pass through if no writer was modifying the shared state), parallel execution and some other minor tweaks the final performance improved by 4x -- fast enough.

## Conclusion

Using strings is not inherently evil, and it also does not make future optimisations extremely difficult. Ideally, I would determine the set of strings up front in my case, and properly generate the identifiers instead of doing it on-the-fly, but even with this version (and the additional synchronisation) I could quickly improve the performance. Moreover, the code was already debugged, and for debugging, using strings instead of identifiers proved to be invaluable. It's always a difficult decision, but in my experience, using strings first, and tuning if necessary usually pays off: You get faster development, and the optimisation can be oftenÂ  done rather easily.

# SIGGRAPH 2009

Back from SIGGRAPH, and with quite a backlog of papers to look at. But first things first, let's take a look at how SIGGRAPH was!

## Travel & US

Travelling to the US was surprisingly painless. I expected to wait at the airports, but both the immigration and the customs went very smoothly. In New Orleans, there was a special shuttle service to the hotels provided by SIGGRAPH, quite a bit cheaper than the cabs, so getting to the hotel is really easy.

Arriving at the US is a bit of a change for a European, as the air conditioning is usually running extremely cold. On the airport, and also in our hotel, the temperature was approximately 20° -- while it had 30-35+° outside. Due to the extremely high humidity in New Orleans, the difference felt even bigger. At SIGGRAPH, it was even worse, so bad that in fact after a few days, everyone was running around with a cold -- even the Americans, which should be used to extreme air conditioning.

I was continently located right next to the convention centre, so I could walk there. Very nice, especially for taking a short break.

## SIGGRAPH

SIGGRAPH itself filled around 1/3 of the convention centre, and was located on three floors. I had prepared a plan, so I didn't have to wander around -- something I can recommend before going to a conference. Make a plan, with an alternative for every session, in case it turns out to be less interesting than expected.

I'll probably skip quite a lot of stuff here, but I'll try to post about them in the future.

## People

Well, quite obvious, but SIGGRAPH is full of famous people. You can easily spot academy award winners, game developers and researchers everywhere. Moreover, it's easily to discuss things with well-known people, as they are usually quite open to talk about tech stuff. Plus they really know what they talk about ;)

## Presentations

I found two sessions extremely interesting: The production sessions for "Star Trek", "Terminator 4" and "Transformers 2", and Will Wright's Keynote. Of course, there were other great sessions like "Beyond programmable shading", but those two really stood apart.

Will Wright's keynote was highly interesting, entertaining and very well presented. You can immediately see that Will has really a huge amount of experience in the entertainment industry, and is a very bright guy as well. The presentation was totally in the spirit of "The Zen of Presentations", with over 250 slides with mostly pictures, and a really fast speaking Will Wright next to it. If you can get a video capture of it, take a look, it's really worth it.

The other sessions is one for which a video capture is highly unlikely, so if you didn't see it -- bad luck. ILM was showcasing three movies, with lots of confidential material (concept art, pre-viz stuff) -- very interesting. It's just incredible to see how much time they spend on minute details, while still having some fun. For "Transformers 2", they even came with a small making-of video with quite a few jokes ("Could you load the devastator model?".) The next time I'll get a chance to go to SIGGRAPH, I'll definitely try to checkout the production session as well.

## Exhibition

The exhibition was interesting, but besides the few new announcements, there was not too much to see. nVidia presented OptiX, with some demos at their booth -- ATI didn't have anything new to see. On the other hand, both have people from their driver/developer relation team at the booth, so it's a good chance to ask questions and do lobby work (personally, I tried to make clear to ATI that support for `EXT_direct_state_access` would be most welcome.)

## After show/Party

Never underestimate party going. Every day at SIGGRAPH ends with a party, be it an official one, or an unofficial. The key rule is to know where the free stuff is ;)

## Closing words

Looking back, I'm very happy to have seen SIGGRAPH, mainly for two reasons: One is the ability to get in touch with interesting people, which are difficult to reach otherwise. The other one is to get a broader look at how the industry looks like -- the ILM stuff was really incredible, and it's simply stuff you never get to see as a student.

I'll try to cover some more interesting papers and talks in the following weeks, but at the moment, I have a huge backlog of stuff to read here. Worse still, I have some deadlines looming above me as well, so bear with me if I miss some of my post slots.

# Unit testing introduction

I'm a fan of unit tests since I started writing them. So far, they also turned out to be very valuable, even if added late to a project, so let's take a look at this whole unit test thing -- what it is, why you should do it, how to do it, and some practical notes.

## Who? What? Why?

A unit test is a test which covers a small unit of functionality. Ideally, you would have one unit test for each "leaf" function, or at least for functions which operate on isolated state. This has a nice side effect that unit testing helps to write side effect free (pun intended ;) ) code, as you want to test each function individually.

For object-oriented code, you probably want to test single objects or at most the interaction of two objects in a test. There are lots of nice examples, to take one from my work: Recently, I implemented a tiny graph class, and wrote unit tests to add a node, connect two nodes, check that the edges exist and similar low-level operations on the graph.

But why unit test in the first place? Sure, it increases the code coverage counter, but what else? I see mainly two points for unit testing: You get more isolated objects, and you can easily refactor or optimise code later. The latter is actually my killer argument, because it's highly like that you will have to either add functionality or improve performance. In both cases, you want to make sure you have a "good" set of functions on which you can rely on.

One side effect of unit tests is that you usually get better debugging capabilities as well, as instead of debugging the whole app, you can debug individual unit tests. Plus, you can be sure that your stuff actually works at all -- if it passes the unit tests, it's at least not completely broken.

So what makes up a good unit test:

• Fast: You should be able to run thousands of them per second. Loops, disk I/O, GUI creation is not unit testing! For I/O, consider to convert the binary data into a static array and compile it into the tests.
• Single point of failure: If a test fails, you should have to look at only one piece of code. That is, you should avoid testing several different things in a single test -- better write individual ones -- and you should also try to check only one code path per test. If you have a function which returns an error if a parameter is wrong, and a valid result otherwise, write two tests!
• Deterministic: Don't mess around with random numbers, current time, date, CPU count or other stuff which might change. That means also that checking whether your CPUID instruction works is something that should be moved into a functional test.

## When/How to start

There are two main approaches to unit testing, one is test-driven development, and the other one is -- everything else ;) With test driven development, you write a test first, then you write the code, and continue like this. It gives you very high coverage and nice interfaces, as you write the usage first. With TDD, the best way is to write tests right from the start of a project, or if you jump into one which does not have unit tests, then you write those for new code.

Adding tests to an existing project which didn't have unit tests in the first place is an interesting problem anyway. As a rule of the thumb, I tend to add unit tests before refactoring and optimisation. Especially for the latter, having a large test set is important, as you might easily wind up to break your function while throwing around with all these movmskps!

What I found to work in practice very well is to write the unit tests directly after designing the interface, as it gives you one additional information over writing the test along with the interface: If your interface turns out to be difficult to test, you immediately can check under which assumptions it has been created in the first place. If you start with the test right away, you might miss something, so in my opinion, it pays off to think twice about each interface.

## Practical notes

• For C#, NUnit and MSTest are very comprehensive test suites. They cover all you need, provide nice graphical tools and are very similar. Unfortunately, the order of parameters is slightly different, so switching between them might require some work.
• For Java, I only used JUnit, and I never had something to complain.
• C++: There are two unit test frameworks I can recommend, having excessive experience with them ;) (I'd guess 10x more than with Java and C# combined). First, there's Boost.Test, which is the canonical choice if you have to get running with unit tests quickly. In the latest version, it's easy to use, has minimal setup and well -- it's Boost, so it's tested on various platforms and has lots of docs. The alternative is UnitTest++, which is a tiny library. The nice thing about UnitTest++ is that it is so tiny, that it's worth to embed it into larger projects to be independent of Boost. Usually, if I know that I'll be writing lots of tests, I tend to prefer UnitTest++, simply as I can be pretty sure that in case I need to support a new platform, porting UnitTest++ will be easy (it has been ported to the XBox 360 for instance).

Of course, there are probably lots and lots of things I missed here, but I hope I could show why unit testing is actually something worth caring about. Especially when it comes to later maintenance of a project, having unit tests in place is a huge time saver. For me, these were the moments in which I was always happy to safely shuffle stuff around without randomly breaking things and noticing it much too late.

# OpenGL tips & tricks

A few tips & tricks which might be useful if you write stuff with OpenGL. I use OpenGL for research stuff at the moment, which means I need to get stuff working quickly, and it should be robust so it does not break in case of frequent changes.

## Direct state access

If you're starting fresh with OpenGL, you should definitely take a look at the direct state access extension. OpenGL is originally modelled as a state machine, which brings in one huge problem: Whenever you want to set something, you have to modify the state first, then the target object, and then you can modify the itself. For example, binding a texture requires you to change the active texture unit and it remains changed, unless you query the current one first and set it back afterwards.

With direct state access, all of these problems go away. You can directly set shader parameters, textures, matrices and quite a bit more, without interrupting the state. This is a real time saver, as it solves many cases where some weird state change breaks something.

## State management

As already noted, OpenGL maintains a state, and you have to be extremely careful to maintain the correct state for your application. For rapid development, I would recommend to use direct state access and pushing the complete state before every function block. If you are going to render something, push the state, render, pop it, and chances are high that your state afterwards is more or less the same as it was at the beginning.

While the fixed function pipeline is nice for prototyping -- you just bind a texture, pass on some vertices -- using shaders even for simple texturing does have one huge advantage: Debugging. Without shaders, you'll never know whether your surface is being rendered at all, whether the UVs are completely way off or whether the texture is simply not there. With a shader, you can first render a constant colour to see check point one, then you can render the UV coordinates to check point two, and then you can check whether the texture is properly bound.

What I found to work well is to have a simple class for each individual shader, instead of trying to have a more general mechanism. Usually, the number of shaders in a research application is not too large, so this method is well suited. Plus you can validate the parameters individually, which can catch quite a few bugs.

## Random tricks & tips, and notes

Randomly sorted, a few things which would have saved me quite some time:

• For fast resizing, render a fullscreen quad into a framebuffer with the right size.
• Always check your viewport/projection matrix when changing render targets!
• If a parameter is not used in a shader, binding to it will fail -- so make sure you use it somehow (multiplying by 0.00001 and adding to the result is already enough) during development.
• Check your shader version, some features require `#version 110`, while some require 130 or higher -- but 130 brings in quite a few changes.
• `gl_FragCoord` is in pixels, so you have to divide by the screen size when addressing fullscreen buffers.
• Always check your texture coordinates when you get weird results, maybe you sample only one pixel.
• Beware of imprecise interpolation. If you set a value to be 1.0 on all corners of a triangle, it still might be slightly different in-between -- enough to fail in a `== 1.0` test. Had this issue a lot of time, an easy workaround is to check for `< 0.5 && > 1.5` if you want "integer" style selection.
• When drawing 2D elements, disable the depth test. Be aware that blending always blends against the framebuffer contents, so drawing an opaque quad and a transparent after over it will not result in transparency, as the opaque one has been already blended with the framebuffer! Thatâ's one for the d'uh files, even though it's obvious :)

So much for today, I hope I could save some trouble :) I'm still not too happy when using OpenGL, even though the immediate mode rendering stuff is very nice during prototyping (with DX10, it takes me 60 LoC in my own framework to draw a fullscreen quad; in OpenGL, it takes me 10).

# Interesting graphics papers, articles and other stuff

A premiere on this blog -- a look at a a few interesting things going on in the graphics scenes. If I get positive feedback, I'll try to repeat this, for now, it's just a test. So let's start this walk of random graphics topics by looking at a few recent papers.

• "RenderAnts: Interactive REYES Rendering on GPUs": A very interesting paper, using the BSGP framework by Kun Zhou. This paper describes a full REYES renderer running completely on the GPU. That is, from bounding over dicing to shading and sampling, everything is done on the GPU. The interesting part is how they do the scheduling across 3 GPUs with very little overhead. For rendering the micropolygons, they use a simple scanline based rasterizer. I think this is also quite interesting, as the rasterization of micropolygons is still a problem, especially when effects like depth of field or motion blur are added.
• "Data-Parallel Rasterization of Micropolygons with Defocus and Motion Blur" is a paper which solely investigates how to rasterize micropolygons. It contains a description how Pixar's rasterizer (used in PRMan), and also efficiency measurements. The paper also proposes a new rasterization algorithm, which outperforms Pixar's -- which makes it worth to take a look.
• "A bit more deferred", a presentation by Crytek's Martin Mittring. It describes how the CryEngine 3 is approaching lighting. The basic idea is to decouple lighting from surface shading -- just as in deferred shading -- but still execute the actual shading while forward rendering the surfaces. Sounds a bit weird at first, but it's similar to what "light shaders" in RenderMan do. If you look at the RenderAnts paper, you will see that they group together lighting calculations (illuminance in RSL) -- with the same idea in mind as here. This approach is also described by Wolfgang Engel in his "Light pre-pass renderer", and has been picked up by the Realtime-Rendering blog as well. I guess that in the future we will see even more approaches which compute the lighting information independently of surface shading, and I'm curious to see how far this can be pushed. There are a few more interesting papers out there So much for the first time, if you are interested in reading more like this, feel free to comment!