Scalable geometry handling

At work, I often have to process large geometric data sets. Recently, I overhauled my geometry pipeline so it is both more scalable and easier to use. When it comes to geometry data, I usually run into the following problems:

  • Data sets come in various formats: Meshes are usually stored as OBJ or PLY. Some data sets are procedurally generated though and have no natural storage format.
  • Data sets can be very large: The meshes have between a few thousand up to several hundred million triangles. The largest mesh I had to process has over 900 million triangles.
  • Slow algorithms: Some of the algorithms I have to run are rather slow; the easiest solution to speed them up is to run them in parallel. However, memory per thread is usually limited (1-2 GiB per thread), so the meshes need to be processed in small chunks.

Given these problems, I've recently come up with a simple solution which resolves them elegantly: geometry streams and mesh views.

Mesh views

Mesh views are a simple abstraction for geometry data. Instead of passing around the geometry, I use a simple interface which allows for efficient retrieval of per-vertex attributes and connectivity.

The tricky part here is how to make the interface efficient, as calling one virtual function per vertex will definitely result in bad performance. The way the mesh interface works is that the virtual functions only handle batch requests. The client must gather a bunch of requests first, which are then processed with a single function call. Usually, this ends up in a single pass through memory with some stride on the reading side into a tightly packed output buffer.

For non-indexed meshes, the index accessors simply return consecutive indices. This is potentially a waste, but it unifies the handling on the client side and if a non-indexed mesh is later converted into one with shared vertices, clients immediately benefit from reduced memory usage.

Constructing mesh views over various file and data formats is easy; I can create a view onto a PLY file just as easily as onto an OBJ file or some custom in-memory representation.

Geometry streams

With mesh views, all geometric data is treated uniformly on the client side. What's missing is a geometry format which is compact, easy to create and consume, handles large data sets easily and allows for efficient processing.

What I ended up with is geometry streams: a geometry stream is a stream of geometry chunks, and each chunk is a block of geometry with additional per-chunk attributes. Each geometry chunk is completely independent of all others in the stream. There is no special header which starts a stream; rather, a stream simply consists of chunks concatenated one after another.

This makes it trivial to process streams in parallel and merge the results. On all operating systems I care about, I can directly write individual chunks from multiple threads -- the final ordering inside the file will be undefined, but this hasn't been a problem so far. Removing or skipping chunks in a stream is also efficient.

For the on-disk format, I'm using a minimal header followed by a dictionary which stores per-chunk attributes. A dictionary makes versioning much simpler; instead of changing the file format, I just add new attributes. This is similar to how a protobuf implementation would work, but it's even easier as the keys are human-readable and can be easily inspected if necessary.

Geometry chunks can contain indexed meshes, but the most common use is just a list of triangles without shared vertices. This usually results in slightly larger memory use than necessary, but makes processing trivial. Most of the time the vertices are very small, containing only positions and maybe normals, and with some compression added, shared vertices are simply compressed away. I'm a big fan of LZ4, an extremely fast compression method which gives reasonably good compression rates. For archival use, I use its high-compression mode, which further reduces storage but results in much slower processing. This can be easily offset though by processing each chunk in parallel.

Of course, I can construct a mesh view for each geometry chunk, so the fact that I'm using geometry streams is transparent to all clients. Both the mesh views and the geometry streams have dramatically changed how I process geometry data. Compared to loading an OBJ file, loading from a geometry stream is roughly 10x faster while using less disk space, and by converting all data into geometry streams, I can skip lots of checks in the various importers and move them into geometry stream filters.

Overall, I'm very happy with the results, and I expect to build all my future processing algorithms to work exclusively on streams and views. Things I still have to investigate include re-indexing triangle soups -- to see whether it helps for regular and procedural geometry -- and how to store and expose adjacency information.

Making your library easy to integrate

Let's assume you have been writing a small-to-medium library which solves some specific problem, and now you would like to publish it as open-source so other people can integrate it. Think of stuff like a new compression algorithm, some image processing, mesh loading, you name it. How can you make sure that your library gets reused by many people, and patches come flowing back?

Use a proper license

License the code as BSD, MIT/Boost, or place it in the public domain. Any other license is an unnecessary hurdle to adoption. Putting your code under a license which requires a lawyer to be properly understood means many people in companies won't even take a look at it (they might not even be allowed to look at your code!)

BSD is probably the best choice: it's widely accepted and allows people to trace the code back to you. If in doubt, I would use BSD. Just make sure you don't use a copyleft license like the GPL. While there are certainly areas where the GPL makes sense, small, reusable libraries are not one of them -- putting your code under the GPL means basically any company developer will ignore it. Not the best way to get many people to use your code!

Use a sane version control system

It should go without saying that you use a version control system, but if you put a library online, make sure that patching and branching is easy. That basically means you need to use Mercurial or git, as all other version control systems are either too unpopular or make branching and patching extremely hard -- Subversion is such a system, so keep away from it.

Ideally, you want to put it online somewhere together with at least a short description and, if possible, an issue tracker. Bitbucket, GitHub, Gitorious, and Google Code are all popular choices with few differences in functionality between them.

The build system

I assume that your library is in C or C++; if it is in a scripting language like Python, just skip this paragraph. If you use C/C++, make sure that building your library is simple even without a build system in place. Ideally, provide an "amalgamation" file like SQLite does; if that is not possible, at least make sure that throwing all your source files into a single project results in a working binary.

Some projects split themselves into multiple libraries, with heavy-duty dependencies, and use extremely complex build scripting. An example of this is the Alembic library, which requires the ILM libraries and the HDF5 library. The easiest way to build such a library is not to use the provided build files, but to just slap everything together into one project and be done with it (all of the ILM files can easily be compiled into a single static library, same for all of Alembic, and you're done.)

If your library is indeed more complex because you do compile-time configuration or generation of files, provide a version with all generated files included. If someone just wants to build your library, that will be the easiest solution. If that isn't possible, then your library is probably too large or complicated, and not easy to reuse anyway. Consider releasing sub-parts of your library in this case.

There's nothing wrong with providing a bunch of makefiles as well, but don't assume people will be using them. If you provide makefiles, consider also providing CMake files, as CMake is a pretty standard way to generate project files for a wide range of IDEs.

The code

Your code should be reasonably portable and clean (see also this great blog post on how to clean bad code). In general, it should have few dependencies and compile without warnings at the highest warning level. Assume that users will always compile at the highest warning level, and fix all warnings. Ideally, you should also run some static analysis over the code to make sure you don't have obvious null-pointer dereferences and similar problems. Make sure that you don't include dead code in your library; that function which was useful five revisions back should live in the repository history, and nowhere else. This reduces the amount of code a user has to port if they decide to use your code on a new platform.

Portability is also very important. Try to make sure that your code works fine on the major compilers (MSVC on Windows, GCC on Linux, Clang on Mac OS X). The easiest way is probably to stick with simple C (C89) or simple C++ without fancy meta-programming. If you have dependencies, keep them few, and try to use the most common libraries. Boost is usually overkill, but if you really need threads, it's still the next-best choice to writing your own wrappers.

One important part is I/O. If your library does I/O, make sure the user can provide their own I/O routines. Often enough, I want to load data from memory, from a socket, or from some other non-standard source. RPly, for instance, is a very nice library for reading PLY files; unfortunately, it assumes your file is readable by fopen. A good example of how to do it right is libpng, which makes it trivial to hook in your own input/output routines.

The patch handling

And here's the most important part. Assuming you have a nice, easy-to-integrate library, you can expect that a few users will come back to you with patches. Make sure that submitting patches is easy, for instance by providing a contact address in every source file or by adding a prominent link to the mailing list in your project description. If someone submits a patch, give timely feedback; it's okay to reject patches, but explain why.

It's also nice if there is some simple set of tests which can be used to verify the correctness of the code, but the most important thing is to get usable patches. Once a patch is accepted, try to actually commit and push it out quickly as well. This makes life easier for people who keep their own private fork.

What's also nice is to have some kind of road-map, so everyone can understand whether you are planning big changes to your library or whether the code is in maintenance-only mode. If you are in maintenance-only mode -- which you should try to be; after all, your library is small and solves a particular problem -- users will be more willing to perform deeper integration on their side.

Open source projects waiting to be implemented

There's a bunch of things out there which should be implemented as open-source projects for the good of the (developer) community. This is of course a biased, personal list, but if some young developer out there is looking for an interesting project idea, feel free to pick one from the list below:

Minimal, fast, portable mail client

A portable, minimal, easy-to-extend mail client which uses a proper database for all storage. Thunderbird is a good client, no doubt, but a simple client written in, for instance, Python + Qt, with SQLite as the backend, should be faster than Thunderbird, more robust, and easy to synchronise across multiple machines.

Going 100% Qt and C++ might be an option as well, but my gut feeling is that PyQt should be fast enough for the job. Having a database backend for storage is the crucial thing here, to ensure that everything is robust and fast. mbox just doesn't cut the mustard any more when your inbox has several thousand e-mails, especially when it comes to backup.

Clang-based documentation generator for C++

We have Clang, but we're mostly still stuck with Doxygen for documenting C++. While Doxygen can be used to generate good documentation, it has lots of shortcomings as it has to guess the meaning of every single statement.

A proper documentation system requires accurate information, and Clang seems like the perfect solution for this. Together with a database backend, this would allow much higher quality documentation and potentially for a docutils-style mixing of code and documentation, where the code is tested while the documentation is built.

The documentation generator should also not contain output-related stuff. I've been working on the robin Doxygen to Sphinx bridge, which generates easy-to-integrate documentation in reStructuredText from Doxygen XML. Ideally, the next-gen documentation tool will directly produce structured and interlinked documentation fragments which are easy to convert into whatever representation you need. The focus should be on the data though, and the presentation should be independent (just take a look at the Doxygen XML to see how presentation and content get intermingled when the data representation is only a second-class citizen.)

Robust meta-shading language for OpenGL/DirectX

OpenGL and DirectX both come with their own shading language; the two are not quite identical but very similar otherwise. There is Cg, a unified language targeting both OpenGL and DirectX, which unfortunately doesn't get updated too often and is a "black box", and there are some HLSL/GLSL converters like HLSL2GLSL or ANGLE. So far, there is no open alternative which unifies GLSL and HLSL into a single front-end and generates GLSL/HLSL at the back-end.

Having a meta-shading language and the toolchain in an easy-to-hack format (ideally, some scripting language) would make the whole business much simpler. The key point is that it must be easy to hack, so integrating new features or support for new targets is simple. The output should be simply GLSL/HLSL code, so it can be trivially inspected and further optimized if necessary.

So much for this time, hopefully someone out there finds this list helpful and writes a kick-ass project :)

Canonical include files for frameworks & libraries

My research framework has gotten quite big over the years; the version my clients use at the moment comes with over 400 header files containing nearly 1000 classes. All of this is organized into several libraries (Core, Engine, Image, etc.)

So far, I used one folder for each library, so the user would have to #include "Core/inc/io/Stream.h" or #include "Image/inc/PngDecoder.h". The main problem with this is that I directly exposed the header structure I used for development to the users; if I renamed a file or folder, all users would have to adjust their code with the next release. Second, it's difficult to discover where a file lives; if I need the AxisAlignedBoundingBox3, is it in Core or in Engine?

The solution is surprisingly simple and similar to what Qt or WinRT does: provide "canonical" headers which are stable and easy to discover. While deploying the SDK, I generate an additional set of headers for the users which merely redirect to the correct header. Additionally, I flatten the directory structure, so all generated headers live in one folder and look like this: "niven.Core.IO.Stream.h" or "niven.Image.PngEncoder.h". The fun fact here is that Visual Studio uses string matching when you type #include, so it's enough to type #include "PngE to get to the "niven.Image.PngEncoder.h" header file. As usual, there ain't no problem that an additional layer of indirection can't solve :) It would be better if Visual Studio did a fuzzy search similar to Sublime Text, but just having any search is already a large improvement.

With the infrastructure to generate those headers in place, there's a bunch of cool stuff that becomes easy to do:

  • Add #pragma once to each header: Trivial, and saves some compile time for the clients.
  • Deprecate headers and provide a message where/why something is deprecated.
  • Provide "bundle" headers: for instance, "niven.Core.IO.h" can now trivially include everything related to IO, without me having to write and maintain that header manually.
  • Exclude detail and implementation headers: Throughout my research framework, there's a bunch of auto-generated detail and implementation headers, which are not designed to be consumed directly but must be shipped as other headers use them. Those are now completely excluded from the generated set, so they are not accidentally used.

Right now, there are only two downsides to this approach: First, right-clicking and opening a header sends you to the redirection header, so you need one more click; this can easily be resolved with an IDE macro, but that macro would need to be deployed as well. The second issue is that I don't get the super-fast include goodness while developing the library itself, as the headers are only generated during deployment. I have a few ideas how to hook this up as a pre-processing step -- I'm definitely not afraid of generating more code & files during the build -- but making this clean while keeping the advantages requires some careful thinking.

Having this deployed now, I was really surprised how big the improvement actually is: typing all the required includes is super-fast; it's much easier to find the required functionality; and finally, it's trivial to see which includes belong to my library and which are system includes. The user response was really positive, and the scheme was quickly adopted for new code.

Overall, a huge win for one-and-a-half hours of effort to get the generation & packaging going!

Minimal setup screen space quads: No buffers/layouts required

A neat trick that Rory Driscoll mentioned on Twitter is to use vertex IDs to generate screen-space quads. I just recently got around to using that in my stuff, so here's some ready-to-use code. The advantage of using the vertex ID is that you don't need a vertex layout, vertex buffer, or index buffer.

Here's the complete vertex shader:

float4 VS_main (uint id : SV_VertexID) : SV_Position
{
    switch (id) {
    case 0: return float4 (-1, 1, 0, 1);
    case 1: return float4 (-1, -1, 0, 1);
    case 2: return float4 (1, 1, 0, 1);
    case 3: return float4 (1, -1, 0, 1);
    default: return float4 (0, 0, 0, 0);
    }
}

All you need for rendering now is to issue a single draw call which renders a triangle strip containing four vertices, and you're done. Note: This quad is for RH rendering, if your culling order is reversed, you'll need to swap the second and third vertex. In the pixel shader, you can now use SV_Position, or alternatively, you can also generate UV coordinates in the vertex shader.

[Update] As noted in the comments by Martins, there is also a way to do it with a single triangle that spans the whole screen. Here's the code for that, again for an RH system:

float4 VS_main (uint id : SV_VertexID) : SV_Position
{
    // Without the explicit casts, this does not compile correctly using the
    // D3D Compiler (June 2010 SDK)
    float x = float ((id & 2) << 1) - 1.0;
    float y = 1.0 - float ((id & 1) << 2);

    return float4 (x, y, 0, 1);
}

There's a weird compiler bug that requires the explicit float casts; otherwise, the D3D compiler from the June 2010 SDK fails to compile this code.