
Codegen for fast Vulkan

If you're using Vulkan, you might have come across this document explaining how to get the best calling performance. The gist is that instead of using the entry points provided by the loader, you should query the real entry points using vkGetDeviceProcAddr and use those instead. This can yield significant performance gains when CPU limited, as it avoids an indirection through the loader. Querying all entry points doesn't sound too bad in theory. The problem is there are quite a few of them, so instead of typing this up manually, let's use some code generation to solve the problem!

Where to start?

If we want to auto-generate things, we need to find something machine readable first which we can parse and then use as the data source. Fortunately, the Vulkan specification is also available as an Xml file, as part of the normal repository. Let's grab the vk.xml and see what we can do with that! Now this looks quite promising: We see all types in there as well as all entry points. I'm going to use Python for the script we're about to write, and if you see something like ./types/type, that's XPath syntax to specify the path to the element(s) we're looking at. If you've never used XPath before, don't worry, we'll use very simple XPath only!
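Since we'll lean on ElementTree and XPath throughout, here's a quick self-contained warm-up. The registry snippet below is made up for illustration; the real data comes from vk.xml:

```python
from xml.etree import ElementTree

# Made-up mini registry -- the real vk.xml lives in the Vulkan-Docs repository
doc = ElementTree.fromstring ('''
<registry>
  <types>
    <type category="handle" name="VkDevice"/>
    <type category="basetype" name="VkBool32"/>
  </types>
</registry>''')

# './types/type[@category="handle"]' selects all <type> elements below <types>
# whose category attribute equals "handle"
handles = doc.findall ('./types/type[@category="handle"]')
print ([h.get ('name') for h in handles])  # ['VkDevice']
```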

Our task is to find all functions that can be loaded using vkGetDeviceProcAddr, stuff them into a structure, and provide some method to query them off the device. Easy enough, let's type up some example code so we know what our result is supposed to look like:

#ifndef VK_DIRECT_4E2E4399D9394222B329DDA74C76DD869EC8B8359E3626DD5706CDEE595FCB2C
#define VK_DIRECT_4E2E4399D9394222B329DDA74C76DD869EC8B8359E3626DD5706CDEE595FCB2C 1

#include <vulkan/vulkan.h>

struct VkDirect
{
    using FT_vkAllocateMemory = VkResult (VkDevice device, const VkMemoryAllocateInfo* pAllocateInfo, const VkAllocationCallbacks* pAllocator, VkDeviceMemory* pMemory);
    FT_vkAllocateMemory* vkAllocateMemory = nullptr;

    using FT_vkFreeMemory = void (VkDevice device, VkDeviceMemory memory, const VkAllocationCallbacks* pAllocator);
    FT_vkFreeMemory* vkFreeMemory = nullptr;

    // many more functions here

    void Bind (VkDevice device)
    {
        vkAllocateMemory = (FT_vkAllocateMemory*)vkGetDeviceProcAddr (device, "vkAllocateMemory");
        vkFreeMemory = (FT_vkFreeMemory*)vkGetDeviceProcAddr (device, "vkFreeMemory");

        // many more functions here
    }
};

#endif

We see that we need a couple of things to succeed:

  • The functions which can be queried
  • The function signatures

Let's get started with getting the functions!

Getting the types

We want to use vkGetDeviceProcAddr, and according to its documentation, this function is only valid for specific types. Quoting the specification here:

The function pointer must only be called with a dispatchable object (the first parameter) that is device or a child of device.

All right, so we need to find all handle types which are somehow derived from VkDevice. Looking at the Xml, we can see this bit:

<type category="handle" parent="VkDevice"><type>VK_DEFINE_HANDLE</type>(<name>VkQueue</name>)</type>
<type category="handle" parent="VkCommandPool"><type>VK_DEFINE_HANDLE</type>(<name>VkCommandBuffer</name>)</type>

That's quite close to what we want. We note that the name is the handle name, and then we can check the parent until we arrive at VkDevice. If VkDevice is a parent or the type itself is VkDevice, then the type matches our definition and should be included.

Unfortunately, there are a few problems: The parents are not necessarily in order in the Xml (so we can't link while we parse), some objects have multiple parents, and there are also some alias types which don't have a parent at all! To solve this, we're going to build a dictionary mapping each type to the set of its parents; and at the end we're going to walk the parents recursively for every type. If any of the parents ends up being equal to VkDevice, we have a winner! Let's start typing:

from collections import OrderedDict

def FindDeviceDispatchableTypes (tree):
    # We search for all types where the category = handle
    handleTypes = tree.findall ('./types/type[@category="handle"]')

    # Ordered dict for determinism
    typeParents = OrderedDict ()

    # for each handle type, we will store the type as the key, and the set of
    # the parents as the value
    for handleType in handleTypes:
        # if it's an alias, we just duplicate
        if 'alias' in handleType.attrib:
            name = handleType.get ('name')
            alias = handleType.get ('alias')

            # This assumes aliases come after the actual type,
            # which is true for vk.xml
            typeParents [name] = typeParents [alias]
        else:
            name = handleType.find ('name').text
            parent = handleType.get ('parent')

            # There can be more than one parent
            if parent:
                typeParents [name] = set (parent.split (','))
            else:
                typeParents [name] = set ()

    def IsVkDeviceOrDerivedFromVkDevice (handleType, typeParents):
        if handleType == 'VkDevice':
            return True

        parents = typeParents.get (handleType)
        if not parents:
            return False

        # If we derive from VkDevice through any path, we're set
        return any ([IsVkDeviceOrDerivedFromVkDevice (parent, typeParents) for parent in parents])

    deviceTypes = {t for t in typeParents.keys () if IsVkDeviceOrDerivedFromVkDevice (t, typeParents)}

    return deviceTypes

We now have the set of handle types. The next step is finding the functions using those.

Device functions

Finding the functions could be really complicated if the dispatchable type could appear anywhere, as we'd have to check all parameters then. Fortunately, Vulkan specifies that the dispatchable type always comes as the first argument, so we only have to check the first parameter, and if it's in the set we just computed, we're done. We're going to iterate over all ./commands/command entries -- those are the entry points. These look as follows:

<command>
    <proto><type>VkResult</type> <name>vkAllocateMemory</name></proto>
    <param><type>VkDevice</type> <name>device</name></param>
    <param>const <type>VkMemoryAllocateInfo</type>* <name>pAllocateInfo</name></param>
    <param optional="true">const <type>VkAllocationCallbacks</type>* <name>pAllocator</name></param>
    <param><type>VkDeviceMemory</type>* <name>pMemory</name></param>
</command>

We can ignore most of that. What we need is the proto element, which contains the return type and the name, and then the first param element. To build the signature, we also have to flatten the parameters back into plain text. Everything else can be ignored. Let's wrap this into a function which returns the parsed data in an easy-to-digest list of dictionaries:

def FindAllDeviceFunctions (tree, deviceTypes):
    functions = []

    for command in tree.findall ('./commands/command'):
        parameters = command.findall ('param')
        if parameters:
            firstParameter = parameters [0]
            if firstParameter.find ('type').text in deviceTypes:
                function = {
                    'return_type' : command.find ('proto/type').text,
                    'name' : command.find ('proto/name').text,
                    'parameters' : []
                }

                for parameter in parameters:
                    # This flattens ``<param>const <type>T</type> <name>N</name></param>``
                    # to ``const T N``
                    function ['parameters'].append (''.join (parameter.itertext ()))

                functions.append (function)

    return functions

You might think that's all we need to stamp them out, but there's one more thing we need to look at before we get going.

Handling #ifdef

If we just dump everything, we'll find out that it compiles fine on Windows (at least for 1.0.69), but on Linux, some entry points are not defined. Turns out, there's quite a few things protected by a platform #define. What we're going to do is to find all those entry points, and wrap them into an #ifdef block.

To find the protected bits, we have to look at the ./extensions. The way they are structured is as follows:

  • /extensions/extension[@protect] -- Each extension with protection has the protect attribute (which is selected using [@protect])
  • Extensions specify entry points in ./require/command

For example, here's one of those protected extensions:

<extension name="VK_KHR_external_memory_win32" number="74" type="device" requires="VK_KHR_external_memory" author="KHR" contact="James Jones @cubanismo" protect="VK_USE_PLATFORM_WIN32_KHR" supported="vulkan">
        <require>
            <!-- various fields omitted -->
            <command name="vkGetMemoryWin32HandleKHR"/>
            <command name="vkGetMemoryWin32HandlePropertiesKHR"/>
        </require>
</extension>

We'll just iterate over all extensions which have some protection, and then invert the index so we're storing the function name as the key, and the protections as the value:

def GetFunctionProtection (tree):
    extensions = tree.findall ('./extensions/extension[@protect]')

    result = {}

    for extension in extensions:
        protection = extension.get ('protect').split (',')
        for command in extension.findall ('./require/command[@name]'):
            result [command.get ('name')] = protection

    return result
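To see the inversion in action, here's a self-contained check; the function body repeats the one above, and the mini registry is made up for illustration:

```python
from xml.etree import ElementTree

def GetFunctionProtection (tree):
    extensions = tree.findall ('./extensions/extension[@protect]')

    result = {}

    for extension in extensions:
        protection = extension.get ('protect').split (',')
        for command in extension.findall ('./require/command[@name]'):
            result [command.get ('name')] = protection

    return result

# Made-up mini registry with a single protected extension
doc = ElementTree.fromstring ('''
<registry><extensions>
  <extension name="VK_KHR_external_memory_win32" protect="VK_USE_PLATFORM_WIN32_KHR">
    <require>
      <command name="vkGetMemoryWin32HandleKHR"/>
    </require>
  </extension>
</extensions></registry>''')

print (GetFunctionProtection (doc))
# {'vkGetMemoryWin32HandleKHR': ['VK_USE_PLATFORM_WIN32_KHR']}
```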

Combining it all

Now we have everything in place, and the only remaining bit is to generate the code. We iterate over the functions, creating the type definitions and fields first. Then we iterate a second time to fill out the bind method. As a bonus, we take the file pointer to write into so we can redirect easily into a file:

def GenerateHeader (tree, functions, protection, outputStream):
    import hashlib
    def Write (s=''):
        print (s, file=outputStream)

    # Same tree will always result in the same hash
    includeUuid = hashlib.sha256(ElementTree.tostring (tree)).hexdigest().upper ()

    Write (f'#ifndef VK_DIRECT_{includeUuid}')
    Write (f'#define VK_DIRECT_{includeUuid} 1')
    Write ()
    Write ('#include <vulkan/vulkan.h>')
    Write ()

    Write ('struct VkDirect')
    Write ('{')

    def UnpackFunction (function):
        return (function ['name'], function ['return_type'], function ['parameters'])

    for function in functions:
        name, return_type, parameters = UnpackFunction (function)

        if name == 'vkGetDeviceProcAddr':
            continue

        protect = protection.get (name, None)

        if protect:
            Write ('#if ' + ' && '.join (f'defined({p})' for p in protect))

        Write (f'\tusing FT_{name} = {return_type} ({", ".join (parameters)});')
        Write (f'\tFT_{name}* {name} = nullptr;')
        if protect:
            Write ('#endif')
        Write ()

    Write ('\tvoid Bind (VkDevice device)')
    Write ('\t{')
    for function in functions:
        name, return_type, parameters = UnpackFunction (function)

        if name == 'vkGetDeviceProcAddr':
            continue

        protect = protection.get (name, None)

        if protect:
            Write ('#if ' + ' && '.join (f'defined({p})' for p in protect))

        Write (f'\t\t{name} = (FT_{name}*)vkGetDeviceProcAddr (device, "{name}");')
        if protect:
            Write ('#endif')

    Write ('\t}')
    Write ('};')
    Write ()
    Write ('#endif')

... and that's it for today. You can find the whole script here -- enjoy!

Dependency management, 2018 edition

It's 2018 and C++ dependency management still remains a hot topic -- or an unresolved problem, depending on your point of view. I'm not sure any more it's actually "generally" fixable in the sense there's a one-size-fits-all solution, as C++ projects have some unique requirements. For instance, people like setting compiler options for individual packages, they like linking across compiler & language barriers (mostly to C libraries, but there's also stuff like ISPC), and all this dependency business needs to get integrated into the build system as well.

Given the C++ world couldn't agree on a build system yet, or an ABI, I think we'll be stuck with hand-rolled dependency management for the foreseeable future. Is this a bad thing? Yes and no. Yes, because this means everyone has to solve it somehow, but no, as the solutions don't have to be huge and complex. Which brings me to today's topic: I've redone my dependency handling once more -- the third time -- and I think I've finally arrived at some local optimum. To understand how it can be optimal for my problem case, we need to understand the problem I'm facing first, so let's start with that.

C++ framework with dependencies

In my spare time, I've been hacking on a C++ framework for over ten years, and that framework has grown to a decent size. Part of the value add from using my framework is the fact that it bundles up a lot of external libraries under one uniform interface. For instance, libjpeg, libpng, libtiff, are all nicely abstracted away so I can just load "any image" and under the hood the right thing happens. For many libraries, I ended up importing them into the source tree and building them as part of the compilation. That is, libpng gets built with the same compiler options as everything else, with the same compiler, which has lots of benefits when it comes to debugging and sanitizers.

At the same time, I got a couple of dependencies which are very rarely updated: Boost, primarily, but also (on Windows) OpenSSL, and ISPC which is a build-time dependency. I'm using CMake, so I have find modules for all of them, but it's still a pain to set up a dozen paths, more so as there are a few corner cases. For instance, for Boost & Clang, it's not enough to just set the directories, but you also need to specify a compiler suffix. Long story short, I compile them only once in a while, and store them on my server. That works, but it has a couple of downsides: What version did I compile against? Is the build compatible or not with the latest compiler? If I want to travel and take my engine with me, do I really need to copy those 10000 files?

Now you might say, well, just commit the binaries to the source repository, and be done with it. That's a reasonable approach and given I'm using Mercurial with largefiles, that wouldn't cause huge problems in terms of checkout size. The problem is not the storage, really.

The problem is that by tying your dependencies into your source control, you lose a lot of flexibility and comfort. The first problem you'll notice is that a version control system is not designed to check out subsets of the repository tree. If I'm on Linux, there's no use checking out the Windows binaries, so I need partial checkouts or dedicated repositories per OS. This doesn't scale, as I might need separate dependencies per compiler, per compiler settings, etc.

The second big problem is updating dependencies: Ok, so I have my engine compiled against Boost 1.65, and now Boost 1.66 comes out. How can I switch between those? It's not so simple if I have to do this in the source tree, especially if I want to try it on separate machines. Do I really want to add this to the repository history just for a test? Especially as it will require a new branch, and if it works, a merge of 10000 files with binaries? Sure, that works, but that's not what a version control system was designed for.

Finally, there's also a huge loss in comfort: Version control is optimized for source code, there's usually no provisioning to have something like a file vault for some files, there's no user friendly progress reporting when getting very large files, and generally you're working in a corner case when you throw in tons of binaries into your source tree. Let's say you want to store your dependencies in ceph -- where do you even start?

Solving it externally

It's no surprise that the solution to this is a customized package management solution. This is quite common in the big projects I've seen -- and there are various presentations and blog posts about this. I can highly recommend looking at EA's package management or other C++ package managers like Conan or Spack. All of them try to solve the problem of not having your dependencies directly in the source tree, with various levels of complexity.

For my own needs, I had a couple of goals:

  • A simple dependency specification -- similar to the pip requirements.txt
  • The ability to copy the whole repository around. This means it has to be few files (preferably archives) and no database attached.
  • The whole thing should integrate directly into CMake, i.e. generate a file I can include into CMake to find all my dependencies.

Turns out, that's not that hard to roll yourself. The way I structured this is as follows:

  • A repository provides packages.
  • Each package contains releases, which all have a version number.
  • Each release contains builds: A particular release, built with a particular compiler.
  • A separate package definition repository provides package definitions. These describe packages -- more on this below.

This means that for some dependencies, you need to introduce proper version numbers, as OpenSSL for instance is notorious for using letters to denote releases. I've opted into semantic versioning, so all dependencies have a three component version number (and in fact OpenSSL is the only dependency which requires some care there.) For each package, the package definition repository stores additional metadata about a package. This includes the package type -- whether it's a tool, a library, but also things like which ABI it's using.

That would have been nearly enough if not for the problem mentioned at the beginning with Clang -- which conditionally needs some extra setting. I've solved this by implementing conditions, which are available in two places (for now): One is the CMake build integration description, which tells the package manager what variables need to be set up for CMake. The other place is the dependency definition itself, where I can conditionally create dependencies based on platform, compiler, and so on. A sample package definition can be seen below:

<PackageDefinition Name="boost" Type="library" ABI="C++">
        <CMake Key="BOOST_INCLUDEDIR" Value="$(Path)/include"/>
        <CMake Key="BOOST_LIBRARYDIR" Value="$(Path)/lib"/>
        <CMake Key="Boost_COMPILER" Value="-clang40" If="IsClang4"/>
        <Condition Name="IsClang4">
            <IsCompiler Name="Clang" Version="&gt;=4,&lt;5" />
        </Condition>
</PackageDefinition>

All of the data is stored in a bunch of Xml files -- the repository is trivial to hand-edit when a new package comes in, the definition files practically never change, and the builds are just .xz archives. Moving the whole repository around means copying a folder, so that solves the simple to take with you requirement.

The dependency definition is a really simple Xml file:

    <Dependency Name="boost" Version=">=1.65" />
    <Dependency Name="ispc" Version="1.9" />
    <Dependency Name="openssl" Version="1.1" If="IsWindows" />
    <Condition Name="IsWindows">
        <IsOperatingSystem Name="Windows"/>
    </Condition>

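Version constraints like >=1.65 or >=4,&lt;5 are cheap to evaluate against three-component versions. Here's a hedged sketch of how such matching could work; the function names and the operator set are my invention, not the actual tool's code:

```python
import operator
import re

# Supported comparison operators; an empty operator means exact match
_OPS = {'>=': operator.ge, '<=': operator.le, '>': operator.gt,
        '<': operator.lt, '==': operator.eq, '': operator.eq}

def ParseVersion (s):
    # Pad to three components, so '1.65' becomes (1, 65, 0)
    parts = [int (p) for p in s.split ('.')]
    return tuple (parts + [0] * (3 - len (parts)))

def Matches (constraint, version):
    # A constraint is a comma-separated list of conditions, all of which must hold
    for term in constraint.split (','):
        m = re.match (r'(>=|<=|==|>|<)?\s*([\d.]+)', term.strip ())
        op = _OPS [m.group (1) or '']
        if not op (ParseVersion (version), ParseVersion (m.group (2))):
            return False
    return True

print (Matches ('>=1.65', '1.66.0'))  # True
print (Matches ('>=4,<5', '5.0.1'))   # False
```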
How big did this thing end up being? Roughly a thousand lines of Python -- so it's a rather small tool, and easy enough to hack on. You might wonder how I'm ok with yet another dependency to get the package manager, but it turns out that's a much simpler problem to solve. It's just some Python, so I could submodule this into my main repository, or build an executable with PyInstaller, or just fetch and run it as part of my build script. That's really a very minor nuisance, as the code is -- and always will be -- tiny relative to the dependencies themselves.

Summing it up

If you were hoping for the one solution solving all C++ dependency problems, you'll leave disappointed. Sorry about that! However, if you were always wondering how to roll your own solution, I hope this post brought you some inspiration. It's really not that much work once you nail down your requirements -- and it's definitely worth doing. Note that I've omitted a bunch of problems, like building the dependencies themselves. That's something which I envision could be solved long-term through the package definition, but it hasn't been enough of a problem yet to warrant writing it up. Other than that, I've yet to find some case where this breaks down. The next big step for me is integrating this with CMake 3.11's FetchContent functionality which should enable building everything with a single checkout and just using CMake, without any separate build scripts. But that's a topic for another blog post, most likely ...

From Google Test to Catch

If you ever met me, you'll probably know that I'm a big believer in automated testing. Even for small projects, I tend to implement some testing early on, and for large projects I consider testing an absolute necessity. I could ramble on for quite a while as to why tests are important and you should be doing them, but that's not the topic for today. Instead, I'm going to cover why I moved all of my unit tests from Google Test -- the previous test framework I used -- to Catch, and shed some light on how I did this as well. Before we start with the present, let's take a look back at how I arrived at Google Test and why I wanted to change something in the first place.

A brief history

Many many moons ago this blog post got me interested in unit testing. Given I had no experience whatsoever, and as UnitTest++ looked as good as any other framework, I wrote my initial tests using that. This was sometime around 2008. In 2010, I was getting a bit frustrated with UnitTest++ as development wasn't exactly going strong there, I was hoping for more test macros for things like string comparison, and so on. Long story short, I ended up porting all my tests to Google Test.

Back in the day, Google Test was developed on Google Code, and releases did happen regularly but not too often. That was rather good, as bundling Google Test into a single file required running a separate tool (and it still does.) I ended up using Google Test for all of my tests -- roughly 3000 of them total, with a bunch of fixtures. While developing, I run the unit tests on every build, so I also wrote a custom reporter so my console output would look like this:

SUCCESS (11 tests, 0 ms)
SUCCESS (1 tests, 0 ms)
SUCCESS (23 tests, 1 ms)

You might wonder why the time is logged there as well: Given the tests were run on every single compilation, they had better run fast, so I always had my eye on the test times, and if something started to go slow, I could move it into a separate test suite.

Over the years, this served me well, but there were a few gripes with Google Test. First of all, it was clear this project was developed by and for Google, so the direction they were going -- death tests, etc. -- was not exactly making my life simpler. At the same time, a new framework appeared on my radar: Catch.

Enter Catch

Why Catch, you may ask? For me, mostly for a few reasons:

  • Simple setup -- it's always just a single header, no manual combining needed.
  • No fixtures!
  • More expressive matchers.

The first reason should be obvious, but let me elaborate on the second one. The way Catch solves the "fixture problem" is by having sections in your code which contain the test code, and everything before that is executed once per section. Here's a small appetizer:

TEST_CASE("DateTime", "[core]")
{
    const DateTime dt (1969, 7, 20, 20, 17, 40, 42, DateTimeReference::Utc);

    SECTION("Year") {
        CHECK (dt.GetYear () == 1969);
    }

    SECTION("Month") {
        CHECK (dt.GetMonth () == 7);
    }

    // And so on
}

This, together with nicer matchers -- no more ASSERT_EQ macros; instead, you can use a normal comparison -- was enough to convince me of Catch. Now I needed a couple of things, though:

  • Port a couple of thousand tests, with tens of thousands of test macros from Google Test to Catch.
  • Implement a custom reporter for Catch.


As I'm a rather lazy person, and because the tests are super-uniform in format, I decided to semi-automate the conversion from Google Test to Catch. It's probably possible to make a perfect automated tool, at least for the assertions, by building it on Clang and rewriting things, but I figured if I get 80% or so done automatically that should be still fine. On top of that, I'm porting tests, so I can easily validate if the conversion worked (as the tests still should pass.) The script is not super interesting, it does a lot of regular expression matching on the macros and then hopes for the best. While it's probably going to explode when used in anger, it still converted the vast majority of the tests in my code. In total, it took me less than a day of typing to finish porting all my tests over.
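The conversion script itself isn't reproduced in this post, but the flavor of it is easy to sketch. Below is my own minimal illustration of the kind of regex rewriting involved -- the macro list is far from complete, and nested commas or multi-line macros would defeat it, exactly the "hopes for the best" behavior described above:

```python
import re

# A few representative Google Test -> Catch rewrites (illustrative, not exhaustive).
# These are naive: arguments containing commas inside parentheses will confuse them.
_REWRITES = [
    (re.compile (r'\b(?:ASSERT|EXPECT)_EQ\s*\(([^,]+),\s*(.+)\)\s*;'), r'CHECK (\1 == \2);'),
    (re.compile (r'\b(?:ASSERT|EXPECT)_NE\s*\(([^,]+),\s*(.+)\)\s*;'), r'CHECK (\1 != \2);'),
    (re.compile (r'\b(?:ASSERT|EXPECT)_TRUE\s*\((.+)\)\s*;'), r'CHECK (\1);'),
]

def ConvertLine (line):
    for pattern, replacement in _REWRITES:
        line = pattern.sub (replacement, line)
    return line

print (ConvertLine ('ASSERT_EQ (dt.GetYear (), 1969);'))
# CHECK (dt.GetYear () == 1969);
```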

Before you ask why I'm not porting to some other framework like doctest which is supposed to be faster: In my testing, Catch is fast enough to the point that the test overhead doesn't matter. I can easily execute 20000 assertions in less than 10 milliseconds, so "faster" is not really an argument at this point.

What is interesting though is that there was a significant reduction in lines of code by moving over to Catch, most of which came from the fact that fixtures were gone, and some more code now used the SECTION macros and I could merge common code. Previously, I would often end up duplicating some small setup because it was still less typing than writing a fixture. With Catch, this is so simple that I ended up cleaning my tests voluntarily. To give you some idea, this is the commit for the core library: 114 files changed, 6717 insertions(+), 6885 deletions(-) (or -3%). For my geometry library, which has more setup code, the relative reduction was quite a bit higher: 36 files changed, 2342 insertions(+), 2478 deletions(-) -- 5%. A couple of percent here and there might not seem too significant, but they directly translate into improved readability due to less boilerplate.

There are a few corner cases where Catch just behaves differently from Google Test. Notably, an EXPECT_FLOAT_EQ with 0 needs to be translated into CHECK (a == Approx (0).margin (some_eps)) as Catch by default uses a relative epsilon, which becomes 0 when comparing to 0. The other one affects STREQ -- in Catch, you need to use a matcher for this, which turns the whole test into CHECK_THAT (str, Catch::Equals ("Expected str"));. The script will try to translate that properly but be aware that those are the cases which are most likely to fail.

Terse reporter

The last missing bit is the terse reporter. This got changed again for Catch2, which is the current stable release. The reporter is part of a catch-main.cpp which I compile into a static library, which then gets linked into the test executable. The terse reporter is straightforward:

namespace Catch {
class TerseReporter : public StreamingReporterBase<TerseReporter>
{
public:
    TerseReporter (ReporterConfig const& _config)
        : StreamingReporterBase (_config)
    {
    }

    static std::string getDescription ()
    {
        return "Terse output";
    }

    virtual void assertionStarting (AssertionInfo const&) {}
    virtual bool assertionEnded (AssertionStats const& stats) {
        if (!stats.assertionResult.succeeded ()) {
            const auto location = stats.assertionResult.getSourceInfo ();
            std::cout << location.file << "(" << location.line << ") error\n"
                << "\t";

            switch (stats.assertionResult.getResultType ()) {
            case ResultWas::DidntThrowException:
                std::cout << "Expected exception was not thrown";
                break;
            case ResultWas::ExpressionFailed:
                std::cout << "Expression is not true: " << stats.assertionResult.getExpandedExpression ();
                break;
            case ResultWas::Exception:
                std::cout << "Unexpected exception";
                break;
            default:
                std::cout << "Test failed";
                break;
            }

            std::cout << std::endl;
        }

        return true;
    }

    void sectionStarting (const SectionInfo& info) override
    {
        ++sectionNesting_;
        StreamingReporterBase::sectionStarting (info);
    }

    void sectionEnded (const SectionStats& stats) override
    {
        if (--sectionNesting_ == 0) {
            totalDuration_ += stats.durationInSeconds;
        }

        StreamingReporterBase::sectionEnded (stats);
    }

    void testRunEnded (const TestRunStats& stats) override
    {
        if (stats.totals.assertions.allPassed ()) {
            std::cout << "SUCCESS (" << stats.totals.testCases.total () << " tests, "
                << stats.totals.assertions.total () << " assertions, "
                << static_cast<int> (totalDuration_ * 1000) << " ms)";
        } else {
            std::cout << "FAILURE (" << stats.totals.assertions.failed << " out of "
                << stats.totals.assertions.total () << " failed, "
                << static_cast<int> (totalDuration_ * 1000) << " ms)";
        }

        std::cout << std::endl;

        StreamingReporterBase::testRunEnded (stats);
    }

private:
    int sectionNesting_ = 0;
    double totalDuration_ = 0;
};

CATCH_REGISTER_REPORTER ("terse", TerseReporter)
}

To select it, run the tests with -r terse, which will pick up the reporter. This will produce output like this:

SUCCESS (11 tests, 18 assertions, 0 ms)
SUCCESS (1 tests, 2 assertions, 0 ms)
SUCCESS (23 tests, 283 assertions, 1 ms)

As an added bonus, it also shows the number of test macros executed. This is mostly helpful to identify tests running through some long loops.


Was the porting worth it? Having spent some time with the new Catch tests, and after writing some more tests in it, I'm still convinced it was worth it. Catch is really simple to integrate, the tests are terse and readable, and neither compile time nor runtime performance ended up being an issue for me. 10/10 would use again!

GraphQL in the GPU database

One thing I've been asked about is providing some kind of API access to the GPU database I'm running. I've been putting this off for most of the year, but over the last couple of days, I gave it yet another try. Previously, my goal was to provide a "classic" REST API, which would provide various endpoints like /card, /asic etc. where you could query a single object and get back some JSON describing it.

This is certainly no monumental task, but it never felt like the right thing to do. Mostly because I don't really know what people actually want to query, but also because it means I need to somehow version the API, provide tons of new routes, and then translate rather complex objects into JSON. Surely there must be some better way in 2017 to query structured data, no?


Turns out, there is, and it's called GraphQL. GraphQL is a query language where the user specifies the shape of the data needed, and the system then builds up tailor-made JSON. On top of that, introspection is also well defined so you can discover what fields are exposed by the endpoint. Finally, it provides a single end-point for everything, making it really easy to extend.

I've implemented a basic GraphQL endpoint which you can use to query the database. It does not expose all information, but provides access to hopefully the most frequently used data. I'm not exposing everything mostly due to the lack of pagination. If you use the allCards query, you can practically join large parts of the database together, and I don't want to put undue load on the server. As a small appetizer, here's a sample query executed locally through GraphiQL.
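To give a rough idea of the shape such a query takes, here's my own sketch of one; the exact field names are assumptions based on the schema excerpts in this post, since graphene.Node exposes Relay-style connections:

```
{
  allCards (first: 3) {
    edges {
      node {
        name
        releaseDate
        computeUnitCount
      }
    }
  }
}
```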


Using GraphiQL to query the GPU database programmatically.

If you want to see more data exported, please drop me a line, either by sending me an email or by getting in touch through Twitter.


What did I have to implement? Not that much, but at the same time, more than expected. The GPU database is built using Django, and fortunately there's a plugin for Django to expose GraphQL called graphene-django which in turn uses Graphene as the actual backend.

Unfortunately, Graphene and in particular, Graphene-Django is not as well documented as I was hoping for. There's quite a bit of magic happening where you just specify a model and it tries to map all fields, but those won't be documented. I ended up exposing things manually by restricting the fields I want using only_fields, and then writing at least a definition for each field, occasionally with a custom resolve function. For instance, here's a small excerpt from the Card class:

class CardType (DjangoObjectType):
    class Meta:
        model = Card
        name = "Card"
        description = 'A single card'
        interfaces = (graphene.Node, )
        only_fields = ['name', 'releaseDate'] # More fields omitted

    aluCount = graphene.Int (description = "Number of active ALU on this card.")
    computeUnitCount = graphene.Int (description = "Number of active compute units on this card.")

    powerConnectors = graphene.List (PowerConnectorType,
        description = "Power connectors")

    def resolve_powerConnectors(self, info, **kwargs):
        return [PowerConnectorType (c.count, c.connector.power) for c in self.cardpowerconnector_set.all()]

    # more fields and accessors omitted

Here's also an interesting bit. The connection between a card and its power or display connector is a ManyToManyField, complete with custom data on it. Here's the underlying code for the link:

class CardDisplayConnector(models.Model):
    """Map from card to display connector.
    connector = models.ForeignKey(DisplayConnector, on_delete=models.CASCADE)
    card = models.ForeignKey(Card, on_delete=models.CASCADE, db_index=True)
    count = models.IntegerField (default=1)

    def __str__(self):
        return '{}x {}'.format (self.count, self.connector)

In the card class, there's a field like this:

displayConnectors = models.ManyToManyField(DisplayConnector, through='CardDisplayConnector')

Now the problem is how to pre-fetch the whole thing, as otherwise iterating through the cards will issue a query to fetch the display connectors, and then one more query per connector to get the data related to this connector. This led to a rather lengthy quest to figure out how to optimize it.

Optimizing many-to-many prefetching with Django

The end goal we want is that we perform a single Card.objects.all() query which somehow pre-fetches the display connectors (equivalently, the power connectors, but I'll keep using the display connectors for the explanation.) We can't use select_related though, as this is only designed for foreign keys. The documentation hints at prefetch_related, but it's trickier than it seems. If we just use prefetch_related ('displayConnectors'), this will not prefetch what we want. What we want to prefetch is the actual relationship, and from there on select_related the connector. Turns out, we can use the Prefetch object to achieve this. What we're going to do is to prefetch the set storing the relationship (which is called carddisplayconnector_set), and provide the explicit query set to use, which can then specify the select_related data. Sounds complicated? Here's the actual query:

return Card.objects.select_related ().prefetch_related(
    Prefetch ('carddisplayconnector_set',
        queryset=CardDisplayConnector.objects.select_related ('connector__revision'))).all ()

What this does is to force an extra query on the display connector table (with joins, as we asked for a foreign key relation there), and then cache that data in Python. Now if we ask for a display connector, we can look it up directly without an extra roundtrip to the database. How much does this help? It reduces the allCards query time from anywhere between 4-6 seconds, with 500-800 queries, down to 100 ms and 3 queries!

Wrapping it up

With GraphQL in place, and some Django query optimizations, I think I can tick off the "programmatic access to the GPU database" item from my todo list. Just in time for 2017 :) Thanks for reading and don't hesitate to get in touch with me if you have any questions.

Version numbers

Version numbers are the unsung heroes of software development, and it still baffles me how often they get ignored, neglected or not implemented properly. Selfish as I am, I'd wish everyone would get it right, and today I'm going to try to convince you why they are really important!

The smell of a release process

Having version numbers indicates some kind of release process, assuming you don't add a version to every single commit you do in your repository. This means you've reached a point where you think it's useful for your clients to update, otherwise there's no need yet to assign a new number. That's reason number one to have it -- communication with your downstream clients. It might sound stupid, but just by assigning version numbers to your commits, a client can learn a lot about your project:

  • Size of each release -- seeing how many commits go into every single version gives an idea of how much churn there is.
  • Release frequency -- do you assign a new number once a week? Once a month? This gives a good idea of how quickly you're going to react to pull requests, issues, and more. This is also critical information for any system level application, as an administrator may have to install the update. Knowing the frequency and size of every release is critical to allocating the correct resources.
  • Bug fix check -- you fixed a bug, how does the client know it got fixed? Obviously, by presenting a version number to the user which can be queried.
  • Change logs -- assigning a version number is a good moment to sit back and think about what was added, writing up some documentation along the way.

You can encode even more information if you use semantic versioning, which in theory provides guarantees to clients about when it's safe to update, and more. While I like it in theory, I think that semantic versioning is mostly useful for libraries, less so for large applications and frameworks, as you'll typically end up incrementing the major version a lot. The only really large project I'm aware of that follows semantic versioning is Qt -- and they do a quite impressive job in regards to API and ABI compatibility. I think it's nice to have if you can enforce it, and worth striving towards, but it's not the main value add.
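The "safe to update" guarantee of semantic versioning can be sketched in a few lines. This is a minimal illustration of the numeric major.minor.patch core only; real version strings with pre-release tags or build metadata need a proper parser.

```python
def parse_semver(version):
    """Split a 'major.minor.patch' string into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def is_safe_update(current, candidate):
    """Per semantic versioning, a newer version with the same major number
    keeps the public API compatible, so taking it should be safe."""
    cur, cand = parse_semver(current), parse_semver(candidate)
    return cand > cur and cand[0] == cur[0]
```

Because the tuples compare element-wise, `5.10.0` correctly sorts after `5.9.3` -- something a plain string comparison would get wrong.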

But it's ... complicated!

I assume that most developers not using version numbers are aware of the reasons above, and didn't just "forget" them, but have a hard time versioning due to various reasons. Typically, there are two categories:

  • Continuous integration -- rapid releases, no formal release process.
  • Very branchy development process -- versions are branch-specific.

To point one, the continuous integration: No matter how you write software, your releases happen over time. You typically don't expect your clients to update to every single release you're doing, so how about using the ISO date (year-month-day.release) as your version number? Turns out, that will usually work just fine, and it still allows people to refer to things with a common naming system instead of referencing your code drops with a hash or some continuous integration commit. In fact, I'd argue you're set up for success already because the very same system you use for continuous integration can also assign version numbers.
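The date-based scheme above is trivial to automate in a continuous integration job; here's a small sketch (the `release_of_the_day` counter is my own addition to disambiguate multiple releases on the same date):

```python
import datetime

def date_version(release_of_the_day=1, today=None):
    """Build a 'year-month-day.release' version string.

    `release_of_the_day` distinguishes multiple releases cut on the
    same date; `today` can be pinned for reproducible builds/tests.
    """
    today = today or datetime.date.today()
    return "{:%Y-%m-%d}.{}".format(today, release_of_the_day)

# e.g. the second release cut on 2017-12-27
version = date_version(2, datetime.date(2017, 12, 27))
```

A nice side effect is that these version numbers sort chronologically as plain strings, so "is this build newer?" stays a simple comparison.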

The other problem is super branchy development, where you have multiple lines of code in development concurrently. Let's say you have one branch for stable releases, one branch for future releases, and one maintenance branch, and there's no good correlation. The trick here is to look at the problem from the client end -- for the client, there's only a single branch they see. It's your duty to fix it such that the useful properties outlined above are present for all your clients, which may mean that every branch gets versioned separately for instance, or that you treat your branches as separate products. This is something I noticed many people forget in software development -- we're not writing code for us, we're writing code for our users, and if our process makes their life harder, we've failed, because (at least, that's the theory) there will be many more users than us, so their time is more precious.

Version all the things

I hope I could shed some light on the value of version numbers and make you hesitate next time you're about to send an email which says "everyone should use commit #4237ac9b0f" or later :) Do yourself a favor, use that tag button in your revision control system, and make everyone's life simpler. Thanks!