I had to optimise the performance of my research stuff lately to get into near-interactive rates – with a few new insights how to quickly improve code which uses lots of strings and complex data structures. Note that this is actually a follow-up to my post on strings, in which I said that the performance using strings is usually not too bad – that part was still sufficiently fast, but in another area, I had to improve the performance. So let’s investigate the core issues and how they can be solved.
In my case, I had a small graph with string nodes, and I had to map another graph partially onto it – 30.000 times. Caching and reusing some results would be complicated, so the goal was to improve the full calculation. The basic steps are:
- Gather ~ 20 subgraphs
- For each subgraph
- For each node
- check if the node is in the main graph
- retrieve some data
- small calculation
- For each node
- Gather the results for each subgraph
- Select one
Even though I won’t mention it throughout the text, I used a profiler to verify assumptions – which is necessary to avoid optimising the wrong parts. Looking at the processing steps, we can easily identify a few problems: Creating and destroying lots of small containers containing strings, and lot of comparison instructions, especially during retrieval, as I used a standard map. The profile also indicated that a lot of time was spent in the memory allocation and in string comparisons.
Optimisation, first part
As the first optimisation, I wanted to make sure I use proper data structures. In my case, I was mainly using sets and maps, with very few elements in the average case (~5-20). The sets turned out to be mostly static, so I simply replaced them with vectors – this is very easy, you just need to call
erase to make a set out of a vector. For the maps, I choose to use hash maps instead of the red-black tree maps.
The net result was good for the set optimisation, but now, a huge amount of time was spent in the string hash function. The main problem was that each lookup into a container would recompute the hash, even though the strings were immutable.
Eventually, I replaced the strings in one part of the graph by identifiers, and translated between them at the beginning of the processing. This resulted in a 10x improvement, so it became obvious that for best performance, I would have to replace all my string processing …
… which turned out to be fairly simple. I created a proxy structure with a single member – an integer – and a shared array of strings. Basically, I exploited the fact that my set of strings was limited. Each time a new string would be encountered, I would create a new slot for it, otherwise, I would simply return the id. Comparisons were now merely integer compares. Together with a very lightweight locking mechanism (a simple read-write lock, which would allow multiple readers to pass through if no writer was modifying the shared state), parallel execution and some other minor tweaks the final performance improved by 4x – fast enough.
Using strings is not inherently evil, and it also does not make future optimisations extremely difficult. Ideally, I would determine the set of strings up front in my case, and properly generate the identifiers instead of doing it on-the-fly, but even with this version (and the additional synchronisation) I could quickly improve the performance. Moreover, the code was already debugged, and for debugging, using strings instead of identifiers proved to be invaluable. It’s always a difficult decision, but in my experience, using strings first, and tuning if necessary usually pays off: You get faster development, and the optimisation can be often done rather easily.