Dependency-free autocomplete - a web development story

This week I've updated the search box on my GPU database. Previously, there were two search boxes -- one for ASICs, one for cards -- and the autocompletion was done using jQuery UI. For that, I had a simple REST API hooked up was easy to reach from jQuery UI, and you'd get the results instantly while typing. Selecting one result and hitting enter would get you to the page, so everything was fine. Well, mostly. Turns out, the front page of the GPU database is really tiny in terms of HTML and script code. Pulling in jQuery, jQuery UI, and the jQuery UI CSS is pretty huge compared to the rest of the page. Even a minimized jQuery UI with only the autocomplete ticked is around 45 KiB for the CSS and JS alone, and jQuery adds another 32 KiB on top. For comparison, the GPU database landing page is 0.94 KiB, the CSS is 1.15 KiB, and the (new script with autocomplete) is 12.41 KiB (and that is not minified). That's less than 15 total - compared to 77 for jQuery and its dependencies. Moreover, there's no access to external web pages to fetch jQuery from a CDN, which increases loading time even more.

The requirement list was pretty simple:

  1. Must work with keyboard (i.e. cursor up/down, enter)
  2. Must also work with mouse (and only one thing must be highlighted)
  3. Must be able to carry some additional data for each option (so I can actually use it instead of going to a generic search page)

Notice that 3. isn't a super-hard requirement -- it does simplify life however a bit.

And here's how the final solution looks like, before we go onto a detour on our way towards it!


The final product -- user types into the input field, autocomplete shows up, can be navigated using the cursor keys, etc.

The <datalist> fail

Technically, <datalist> is exactly what I should need. It's the HTML5 way of providing autocomplete hints to the user. The way it works is you provide a <input> element, set the list attribute on it and make it point to a <datalist>. Inside the <datalist>, you provide the options. The list itself can get populated using AJAX, so everything seems easy enough.

And in fact, I got that one working pretty quickly. But it turns out there's a few ugly things about <datalist>. To start with, it's designed for completion only -- that is, expanding text. It's not designed to provide additional data. The way I solved this is to add an event handler to select on the input, and once that happened, I'd go through the list of items and search for a matching entry, and if I found one, I assumed that's the selected one and would pull the data from there.

Well, all good, except select doesn't trigger on Chrome, only Firefox. Wrap a form around it, handle submit, and there you go. Except that Chrome (as of today) has a quite ... erm, unhelpful implementation of the completion.

Let's assume for a moment I have something like this:

<input list="autocomplete">
<datalist id="autocomplete">
    <option>R9 Fury</option>
    <option>R9 Fury X</option>

If I type Fury into the input field, what would you expect? If you want to give it a shot on your browser, try this Plunker demo. Turns out, on Firefox, it does autocomplete no matter where the string appears in an option, but Chrome only autocompletes if the option starts with the string. Which makes it pretty much useless, as graphics card names tend to have a boring prefix ...

Ok, so let's roll it on our own -- how hard can it be, after all?

100% custom

Given the fail with <datalist>, we're going to do it "old-school". Below the input field, a <div> is placed, matching the size exactly. We hook input on the input field (as well as focus) and run the autocomplete core loop there. Which is pretty simple:

  1. Send a XMLHttpRequest to get suggestions
  2. Build a list
  3. Clear the <div> and put the freshly built list into it

That gives us a basic autocomplete. Well, not so fast. It only solves 2. -- selecting by mouse. Selecting by cursor keys doesn't work, the list is very long, it does not disappear when the user clicks elsewhere ... let's get on those problems.

Focus & blur

On blur, we just hide the input list. Done. Except for one small thing -- this does break on Chrome when you click on a link inside the autocomplete suggestions. What happens is that mousedown triggers, then blur triggers -- hiding the list -- and then mouseup and click follow. By the time you get to click, which would follow the link, there's nothing left to click. Of course, it works in Firefox ... the solution is pretty simple though, add a new event listener on the element for mousedown and just call preventDefault on the event. That fixes it for Chrome, and we got this part working!

Keyboard handling

While we're on the list, we want to navigate through it using the keyboard. This means we need to hook up the cursor keys and enter. It's pretty straightforward, just hook up to keydown event and then we're going to store currently selected item somewhere and add/remove classes as needed.

inputBox.addEventListener('keydown', function (ev) {
    switch (ev.key) {
        case 'ArrowDown':
                // Update index
        // Similar for up
}, true);

And indeed, this would work, if not for Edge, who decided that ArrowDown (as the spec hints at) is not the string that should be used for the arrow down, but instead, Down. Of course the answer is to use the keyCode or just add another case. I used another case, to improve maintainability (if you want to look how different browsers interpret key codes, check out this huge table of which code maps to which key on which browser).

So this is solved now. We can navigate using the keyboard. Let's hook up the mouse next.

Mouse handling

Handling the mouse might seem easy to do by hooking up mouseenter but that's not going to fly. If the list scrolls due to keyboard events, then the mouse event will fire, and suddenly you'll have two items selected. This is not an uncommen problem: As of today, the Angular autocomplete has this issue. Select an entry using the keyboard, mouse over another, and you have two selected, and god knows where you'll go if you press enter.

The solution I use is to handle mousemove on my element, and also checking for the mouse move delta. Only if the cursor was actually moved, the selection changes, so basically whatever was used last (mouse or keyboard) decides what's selected.


Wait a moment, I just said when an entry scrolls under the mouse. That's another missing bit. So far, the autocomplete <div> was basically the size of the autocomplete list, but what if we add a max-height on the <div> to ensure it doesn't go crazy long?

Now we need to make sure that if we navigate using the keyboard, the entry is actually visible. Thank god there's scrollIntoView which is all we need. Except it's not quite what we want. The problem with scrollIntoView is that it's not 100% standardized and at least the Firefox implementation scrolls the container down until the item you called scrollIntoView is at the top of the container (or the container is already scrolled to bottom). This is pretty bad for two reasons. First of all, if you have 5 items in view, and your selection moves from the first to the second one, it will scroll even though no scrolling is needed (so you loose context). Second, if you mouse over, and it scrolls to the item selected by the mouse, hilarity ensues ... so you'd have to special case it for selections triggered by mouse and those by keyboard.

The way I solved it is to scroll "on-demand", that is, only if the item would go out of scope, and only from keyboard. The assumption is that you can always see the item you want to work on, so scrolling one item up or down is safe.

Wrapping up

Whew, that was a lot of stuff for a seemingly simple feature. The lesson here is that a lot of the user experience things are not obvious right away. On the desktop, we're used to good user experience because it's all taken care of by the OS or GUI toolkit. On the web, we only got a few basic things to work with and building a more complex user experience requires quite some work and attention to detail. Even then, my solution as presented here has some accessibility shortcomings which I hope to fix at some point. For instance, a user with a screen reader will probably fail to use the up/down keys to select an item. This means I should probably have a fallback path -- for example, a form submit handler which goes to a separate page which then redirects if the entry did match directly, instead of relying on the fact that the user selected the item from the list in one way or another.

Interestingly, the complete solution so far clocks in at roughly 150 lines of TypeScript. This seems quite reasonable for being able to get rid of jQuery and improve the loading times significantly. It also resolves all three requirements that I have, which the previous solution didn't, so there's that that :)

Mapping between HLSL and GLSL

It's 2016 and we're still stuck with various shading languages - the current contenders being HLSL for Direct3D, and GLSL for OpenGL and as the "default" front-end language to generate SPIR-V for Vulkan. SPIR-V may become eventually the IL of choice for everything, but that will take a while, so right now, you need to convert HLSL to GLSL or vice versa if you want to target both APIs.

I won't dig into the various cross-compilers today - that's a huge topic - and focus on the language similarities instead. Did you ever ask yourself how your SV_Position input is called in GLSL? Then this post is for you!


This is by no means complete. It's meant as a starting point when you're looking to port some shaders between GLSL to HLSL. I'm omitting functions which are the same for instance.

System values & built-in inputs

Direct3D specifies a couple of system values, GLSL has the concept of built-in variables. The mapping is as following:

SV_ClipDistance gl_ClipDistance
SV_CullDistance gl_CullDistance if ARB_cull_distance is present
SV_Coverage gl_SampleMaskIn & gl_SampleMask
SV_Depth gl_FragDepth
SV_DepthGreaterEqual layout (depth_greater) out float gl_FragDepth;
SV_DepthLessEqual layout (depth_less) out float gl_FragDepth;
SV_DispatchThreadID gl_GlobalInvocationID
SV_DomainLocation gl_TessCord
SV_GroupID gl_WorkGroupID
SV_GroupIndex N/A
SV_GroupThreadID gl_LocalInvocationID
SV_GSInstanceID gl_InvocationID
SV_InsideTessFactor gl_TessLevelInner
SV_InstanceID gl_InstanceID & gl_InstanceIndex (latter in Vulkan with different semantics)
SV_IsFrontFace gl_FrontFacing
SV_OutputControlPointID gl_InvocationID
N/A gl_PatchVerticesIn
SV_Position gl_Position in a vertex shader, gl_FragCoord in a fragment shader
SV_PrimitiveID gl_PrimitiveID
SV_RenderTargetArrayIndex gl_Layer
SV_SampleIndex gl_SampleID
The equivalent functionality is available through EvaluateAttributeAtSample gl_SamplePosition
SV_StencilRef gl_FragStencilRef if ARB_shader_stencil_export is present
SV_Target layout(location=N) out your_var_name;
SV_TessFactor gl_TessLevelOuter
SV_VertexID gl_VertexID & gl_VertexIndex (latter Vulkan with different semantics)
SV_ViewportArrayIndex gl_ViewportIndex

This table is sourced from the OpenGL wiki, the HLSL semantic documentation and the GL_KHR_vulkan_glsl extension specification.

Atomic operations

These map fairly easily. Interlocked becomes atomic. So InterlockedAdd becomes atomicAdd, and so on. The only difference is InterlockedCompareExchange which turns into atomicCompSwap.

Shared/local memory

groupshared memory in HLSL is shared memory in GLSL. That's it.


GroupMemoryBarrierWithGroupSync groupMemoryBarrier and barrier
GroupMemoryBarrier groupMemoryBarrier
DeviceMemoryBarrierWithGroupSync memoryBarrier, memoryBarrierImage, memoryBarrierImage and barrier
DeviceMemoryBarrier memoryBarrier, memoryBarrierImage, memoryBarrierImage
AllMemoryBarrierWithGroupSync All of the barriers above and barrier
AllMemoryBarrier All of the barriers above
N/A memoryBarrierShared

Texture access

Before Vulkan, this is bundled and not trivial to emulate. Fortunately, this changes with Vulkan, where the semantics are the same as in HLSL. The main difference is that in HLSL, the access method is part of the "texture object", while in GLSL, they are free functions. In HLSL, you'll sample a texture called Texture with a sampler called Sampler like this:

Texture.Sample (Sampler, coordinate)

In GLSL, you need to specify the type of the texture and the sampler, but otherwise, it's similar:

texture (sampler2D(Texture, Sampler), coordinate)
CalculateLevelOfDetail & CalculateLevelOfDetailUnclamped textureQueryLod
Load texelFetch and texelFetchOffset
GetDimensions textureSize, textureQueryLevels and textureSamples
Gather textureGather, textureGatherOffset, textureGatherOffsets
Sample, SampleBias texture, textureOffset
SampleCmp samplerShadow
SampleGrad textureGrad, textureGradOffset
SampleLevel textureLod, textureLodOffset
N/A textureProj

General math

GLSL and HLSL differ in their default matrix interpretation. GLSL assumes column-major, and multiplication on the right (that is, you apply \(M * v\)) and HLSL assumes multiplication from left (\(v * M\)) While you can usually ignore that - you can override the order, and multiply from whatever side you want in both - it does change the meaning of m[0] with m being a matrix. In HLSL, this will return the first row, in GLSL, the first column. That also extends to the constructors, which initialize the members in the "natural" order.

Various functions

atan2(y,x) atan, with parameters swapped
ddx dFdx
ddx_coarse dFdxCoarse
ddx_fine dFdxFine
ddy dFdy
ddy_coarse dFdyCoarse
ddy_fine dFdyFine
EvaluateAttributeAtCentroid interpolateAtCentroid
EvaluateAttributeAtSample interpolateAtSample
EvaluateAttributeSnapped interpolateAtOffset
frac fract
lerp mix
mad fma
saturate clamp(x, 0.0, 1.0)

Anything else I'm missing? Please add a comment and I'll update this post!

Setting up your own mailserver

Today we're going to look at yet another use for your own home server - handling all your emails. There's a couple of reasons why you want to do this, so let's start with some motivation. I always used to download my emails from my email account using POP3. That works fine as long as you only use one machine to access it, as the emails get immediately deleted after fetching. This didn't use to be an issue at first for me, but once you get access to your mail account from your mobile phone and notebook it starts to become an issue.

The easy solution is to change it to IMAP and just let all emails on the server. While that works, I did accumulate a lot of emails over the years - my inbox contains several tens of thousands of emails, and uses a couple GiB of disk space. Even though space is not so much of an issue these days, it's still slow to sync with an inbox that has thousands of emails, and there's not much reason for me to keep a lot of the old emails around.

With a home server in place, it was time to consolidate this. My goal was to achieve the following:

  • Have all emails on my server, so I don't need to worry about backups. That means the server needs to pull them from my web accounts.
  • Have all emails in an easy to process format - something where I can search through them manually, if ever needed.
  • Remove old emails from my web accounts after a grace period.

Turns out, all of this is possible, with the help of a couple tools and some scripting. Let's dive in how to do this!

The mail server

First we need a mail server. The mail server is the thing we'll connect to in the future - my Thunderbird at home doesn't connect to my web account any more, but to my home server instead. My server of choice is dovecot which you can install on Ubuntu using apt install dovecot-imapd. Next we need to configure it. There's a couple of files we need to edit. In the following, I'll assume we're going to store our emails on /tank/mail in a per-user directory.

In /etc/dovecot/dovecot.conf, add protocols = imap as the last line to enable IMAP. Next, you'll need to edit /etc/dovecot/conf.d/10-mail.conf where you specify the mail_location. I'm using mail_location = maildir:/tank/mail/%u/.maildir - maildir ensures the emails are stored in the maildir format, which you can for instance read using the Python mailbox module.

We also need to setup a way to connect to the server, so we edit /etc/dovecot/conf.d/10-auth.conf and enable the passwdfile authentication - just remove the # at the beginning of the !include auth-passwdfile.conf.ext line.

Before we continue, I'm going to set up a new user which will own all the emails - this user will be called vmail. This is straightforward:

adduser --disabled-password --shell=/bin/false vmailadduser --disabled-password --shell=/bin/false vmail
chown -R vmail /tank/mail

We also need the user-id and the group-id of that user - check it using:

id vmail

With that, we can finally add users to access our email server. I have a couple of mailboxes and one user per mailbox. I'll be using a simple password file here. In /etc/dovecot/conf.d/10-auth.conf, check that auth_mechanisms = plain is set, as well as disable_plaintext_auth = no. By default, dovecot only allows secure connections, but here we're connecting within our local network only, so we don't have to bother setting up SSL. Now we're ready to add the users - put them into /etc/dovecot/users, with one line per user like this:


vmail-user-id and vmail-group-id is the user and group id of the vmail user, respectively. Phew! You can actually try to connect now using the username and (plain-text) password you just specified.


If you're not comfortable with plain because you're not the only root on your server, you can also use encrypted passwords.

Getting email

The next part is to load the emails from your web account into dovecot. The easiest solution I've found is to stuff them directly into dovecot using getmail. getmail can write into a maildir directly, and dovecot - as we've set it up above - stores all emails in a maildir, so let's have getmail fetch it right in there!

Installing getmail is simple, just use apt install getmail, and then we need to set up one file per mail account we want to fetch. Those files go into /etc/getmail, and look like this (let's call it your-email_example_com.conf):

type = SimpleIMAPSSLRetriever
server =
username =
password = swordfish

type = Maildir
path = /tank/mail/your-email_example_com/.maildir/
user = vmail

verbose = 2
delete = false
messsage_log_syslog = true
read_all = false
delivered_to = false
received = false

All that is left is to set up a cron job to fetch the emails. I'm running it every 5 minutes - this is how my /etc/cron.d/get-mail job looks like:


*/5 * * * * root getmail --rcfile your-email_example_com.conf --getmaildir /etc/getmail/

You can pass in as many --rcfile options into getmail as you want. Notice that it's also possible to use IMAP idle to get instant emails - it's a bit tricky to set up and I didn't bother with it as I don't need this.

All right, emails are coming in, now we only need to clean up the mail folder somehow!

Cleaning up

Unfortunately, I didn't find a solution for mail cleanup, so I ended up writing my own script for it. It works very similar to getmail: For every account, you set up a configuration file. For the account above, we'd put the following content into /etc/delmail/your-email_example_com.conf:

server =
username =
password = swordfish

type = Maildir
path = /tank/mail/your-email_example_com/.backup/

min_age = 28

min_age specifies the age of emails before they get deleted. I'm also erring on the side of safety, so all emails the script is about to delete will be backed up into the path specified in the [backup] section.

All that is left is now a cron job for this, which doesn't have to run with the same frequency. Here's the contents of my /etc/cron.d/delete-mail:


2 */4 * * * vmail -config /etc/delmail/your-email_example_com.conf

Notice that the server only handles receiving emails, not sending them. For sending, I still go directly to my web email server, but I store the copy locally if sending from home.

And that's it! I've been using this setup since a couple of years now, and so far, I'm very pleased. While traveling, I can always look up my recent emails. At home, I have access to all emails I have ever received, and I have every backed up nicely. My web accounts run with 100 MiB of storage instead of several GiB as they used, and if I fire up my mobile phone or notebook occasionally, there's only a couple of emails it needs to synchronize because the web inbox is nearly empty all the time.

Moving from Wordpress to Nikola

If you're a regular reader, you will have noticed that this blog looks remarkably different since today. What happened? Well, I moved the blog from Wordpress to a static page generator -- Nikola. That might come as a surprise as I've been using Wordpress for 11 years now, so let's dig into the reasons.

Quo vadis, Wordpress?

The most important one for me is that I often add code snippets into my blog posts, and the experience of adding code to Wordpress is horrible. I ended up writing my own plugin for Wordpress to teach it code highlighting the way I wanted, but even then, moving between the "visual" and the "text" editor would mess up source formatting. Also, I would frequently run into HTML escaping problems, where my code would end up being full of &gt;.

The other issue I had is backing up my blog. Sure, there's the Export as XML function, which does work, but it doesn't backup your images and so on, so you need to grab a dump of the wp-content folder in addition to the XML export. I'm not sure how many of you do this regularly -- for me it was just a hassle.

Next up is theming. Wordpress used to be an easy to theme system when I started -- but over time, making a theme became more and more complicated. With any update, Wordpress would add minor tweaks which required upgrading the theme again. Eventually, I ended up using the sub-theming capabilities, but even then, theme development in Wordpress remained a hassle. My "solution" was a local docker installation of Wordpress and MySQL. That worked, but it was a big hassle to set up, and frankly, I lost all motivation to do theme related changes.

The final nail in the coffin were encoding issues which left a lot of my posts with “ and other funny errors. Fixing this in Wordpress turned out to be a huge pain, especially as there is no easy way to bulk-process your posts, or for the matter, do anything with your posts.

Not everything is bad with Wordpress though. The visual editor is pretty slick, and when you stick to the WSYIWYG part of Wordpress, it is actually an awesome tool. It's just failing short on my use cases, and judging from the last years of development, it's moving further away into a direction which has very little value add for me. Recently, they added for instance another REST based web-editor, so you have now two visual editors for Wordpress. Unfortunaly, source code is still not a "solved" issue, as is authoring files in something else than HTML.

Going static

All of those problems became big enough over time to make me bite the bullet and go to a static page generator. I do like static page generation -- having even written my own static page generator a couple of years back. What I didn't want to do though is to write an importer for Wordpress, deal with auto-updating, theming, and so on, so I looked for a static page generator which would suite me. I'm partial to reStructuredText, and Python, so I ended up with Nikola, which is a Python based static page generator with first-class support for reStructuredText.

It comes with an import_wordpress command, which seemed easy enough, but it turns out you need a bit more post-processing before you can call it a day. Let's start with the importing!

Import & cleanup

Ingesting everything through import_wordpress will give you the content as HTML. Even though the files are called .md, they just contain plain HTML (which is valid Markdown ...). To convert them to "proper" Markdown, I used pandoc:

find . -name *.md | xargs -I {} pandoc --from html --to markdown_strict {} -o {}

That cleanups up most of it, but you'll end up with weird source code. My source code was marked up with [source lang=""], so I had to go through all files with source code and fix them up manually. Sounds like a lot of work, but it's usually quite straightforward as you can just copy & paste from your existing page.

In retrospective, converting everything to reStructuredText might have been a better solution, but frankly, I don't care too much about the "old" content. For new content, I'm using reStructuredText, for old content -- I don't care.


Next up is redirecting your whole blog so your old links continue to work. I like to have "pretty" urls, that is, for a post named /my-awesome-post, I want an URL like /blog/my-awesome-post. This means there has to be a /blog/my-awesome-post/index.html page. By default, the imports will be however /posts/my-awesome-post.html. In order to solve this, you need to do two things:

  • Turn on pretty URLs using: PRETTY_URLS = True
  • Fix up the redirection table, which is stored in REDIRECTIONS in

To fix the redirections table, I used a small Python script to make sure that old URLs like /my-awesome-post were redirected to /blog/my-awesome-post -- I also used the chance to move all blog posts to a /blog subdirectory. Nikola will then generate /my-awesome-post/index.html with a redirection to the new URL.


Finally, the comments - I had a couple hundred in Wordpress, and Nikola, being a static page generator, doesn't have any idea of comments. The solution here is to import them to Disqus which is straightforward. First, you create an account at Disqus, install the Disqus Wordpress plugin, and import your comments into Disqus. Be aware: This will take a while. Finally, you need to teach Disqus the new URLs. This is done using an URL remapping, which is a simple CSV file that contains the original URL and the new one. Again, same exercise as above -- you'll probably want to reuse the REDIRECTIONS for this and dump it out into a CSV.

Closing remarks

Voilà, there you go -- you've ported your blog from Wordpress to Nikola. The remaining steps you'll want to do:

  • Set up some revision control for your blog. I just imported it wholesale into Mercurial with the largefiles extension to store all attachments. Backups: Check!
  • Set up a rsync to upload the blog. By its nature, Nikola generates all files, and you need to synchronize -- some scripting will be handy for this.
  • Fix up all URLs to use / as the prefix. I just did a search-and-replace for everything which didn't continue with wp-content, redirected that to /blog/, and then fixed up the /wp-content ones.

That's it -- let's see if Nikola will serve me for the next 11 years, just like Wordpress did :)

Designing C APIs in 2016

It's 2016, and C APIs are as popular as always. Many libraries are written in C, or provide C APIs, and there are tons of bindings for any language making C the de-facto standard for portable APIs. Yet, a lot of C APIs fail basic design guidelines, and there doesn't seem to have been much progress in recent years in the way we design those APIs. I've been working on a modern C API recently, and then there's also Vulkan with a few fresh design ideas. High time we take a look at what options we have when designing a C based API in 2016!

Design matters

API design matters a lot. I've written about it before, and I still get to use a lot of APIs where I'd like to get onto a plane and have a serious chat with the author. Today we're not going to talk about the basic issues, which are ABI compatibility, sane versioning, error handling and the like - instead we'll look at ways you can expose the API to the client.

My assumption here is you're designing an API which will live in a shared object/DLL. The basic approach to do this in C is to expose your API entry points directly. You just mark them as visible, decide on some naming scheme, and off you go. The client either links directly against your library, or loads the entry points manually, and calls them. This is practically how all C APIs have been designed so far. Look at sqlite, libpng, Win32, the Linux kernel - this is exactly the pattern.

Current problems

So what are the problems with this approach? Well, there's a couple:

  • API versioning
  • API loading
  • Extensibility

Let's tackle those one-by-one.

API versioning

For any API, you'll inevitably run into the issue that you're going to update a function signature. If you care about API and ABI compatibility, that means you need to add a new entry point into your API - the classic reason we see so many myFunctionEx or myFunctionV2. There's no way around this if you expose the entry points directly.

It also means you can't remove an entry point. Client applications can solve that issue if you provide a way to query the API version, but then we're going to run into the next problem - API loading.

In general, a direct C API has no really good way to solve this problem, as every version bump means either new entry points or more complicated loading. Adding a couple new entry points doesn't sound like a big issue, but over time, you'll accumulate lots of new versions and it'll become unclear for developers which one to use.

API loading

API loading covers the question how a user gets started with your API. Often enough, you just link directly against an import library, and then you expect a shared object or DLL exporting the same symbols. This makes it hard to dynamically use the library (i.e. if you want to use it only if needed.) Sure, you can do lazy loading tricks using the linker, but what if you don't have the import library to start with? In this case, you'll end up loading some kind of dispatch library which loads all entry points of your API. This is for instance what the OpenCL loader does, or GLEW. This way, your client is 100% isolated from the library, but it's quite some boilerplate to write.

The solutions for this aim at reducing that boilerplate. GLEW generates all load functions from an XML description, OpenCL just mandates the clients expose a single entry point which fills out a dispatch table. Which brings us to the last topic, extensibility.


How do you extend your API? That is, how can someone add something like a validation layer on top of it? For most C APIs, extensions mean just more entry point loading, but layering is completely ignored.

Vulkan explicitly attacks the layering problem. The solution they came up with allows layers to be chained, which is, layers call into underlying layers. To make this efficient, the chaining can skip several layers, so you don't pay per layer loaded, just per layer that is actually handling an API call. Extensions are still handled using the normal way of querying more API entry points.

Vulkan also has a declarative API version stored in the vk.xml file, which contains all extensions, so they can generate the required function pointer definitions. This reduces the boilerplate a lot, but still requires users to query entry points - though it would be possible to autogenerate a full loader like GLEW does.

Dispatch & generation focused APIs

Thinking about the issues above, I figured that ideally, what we want is:

  • As few entry points as possible, ideally one. This solves the dynamic loading issue, and makes it easy to have one entry point per version.
  • A way to group all functions for one version together. Switching a version would then result in compile-time errors.
  • A way to layer a new set of functions on top of the original API - i.e. the possibility to replace individual entry points.

If you think C++ classes and COM, you're not far off. Let's take a look at the following approach to design an API:

  • You expose a single entry point, which returns the dispatch table for your API directly.
  • The dispatch table contains all entry points for your API.
  • You require clients to pass in the dispatch table or some object pointing to the dispatch table to all entry points.

So how would such an API look like? Here's an example:

struct ImgApi
    int (*LoadPng) (ImgApi* api, const char* filename,
        Image* handle);
    int (*ReadPixels) (ImgApi* api, Image* handle,
        void* target);
    // or
    int (*ReadPixels) (Image* handle, void* target);

    // Various other entry points

// public entry points for V1_0
int CreateMyImgIOApiV1_0 (ImgApi** api);
int DestroyMyImgIOApiV1_0 (ImgApi* api);

Does this thing solve our issues? Let's check:

  • Few entry points - two. Yes, that works, for dynamic and static loading.
  • All functions grouped - check! We can add a ImgApiV2 without breaking older clients, and all changes become compile-time errors.
  • Layering - what do you know, also possible! We just instantiate a new ImgApi, and link it to the original one. In this case, the only difficulty arises from chaining through objects like Image, for which we'll need a way to query the dispatch table pointer from them.

Looks like we got a clear winner here - and indeed, I recently implemented a library using such an API design and the actual implementation is really simple. In particular if you use C++ lambdas, you can fill out a lot of the redirection functions in-line, which is very neat. So what are the downsides? Basically, the fact that you need to call through the dispatch table is the only one I see. This will yield one more indirection, and it's a bit more typing.

Right now my thinking is that if you really need the utmost performance per-call, your API is probably too low-level to start with. Even then, you could still force clients to directly load that one entry point, or provide it from the dispatch table. The more typing issue is generally a non-issue: First of all, any kind of autocompletion will immediately identify what you're doing, and if you really need to, you can auto-generate a C++ class very easily which inlines all forwarding and is simply derived from the dispatch table.

This is also where the generation part comes into play: I think for any API going forward, a declarative description, be it XML, JSON or something else, is a necessity. There's so much you want to auto-generate for the sake of usability that you should be thinking about this from day one.

Right now, this design, combined with a way to generate the dispatch tables, looks to me like the way to go in 2016 and beyond. You get an easy to use API for clients, a lot of freedom to build things on top, while keeping all of the advantages of plain C APIs like portability.