Graphics APIs done wrong
This is going to be partly a rant post, but I think we’re really moving in the wrong direction with current graphics APIs. Especially now that the GPGPU revolution is upon us, current graphics APIs put a huge, unnecessary burden on developers, and if they are meant to survive, changes will have to come.
Current problems
So where are the current problems, and where do they come from? Back in the old days, when the first graphics hardware was designed, everything was very low-level. No validation anywhere, no usage hints, but also little flexibility. Later, as hardware started to become programmable, the APIs added new functions and abstractions for the new functionality. We got vertex buffers, index buffers, textures, and shaders.
The problems became obvious when things started to mismatch. If a shader asked for a different format than the vertex buffer provided, the DX9 driver would do a fix-up to make them match. More and more validation was put into the runtime.
This turned out to be a large performance problem, so DX10 requires everything to be set in stone at creation time. If you create a buffer, you have to specify whether you want to read or write it, whether you want it to be immutable, and if you want read/write access, you have to say how often, so the driver can decide where to put the resource.
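For illustration, here is roughly what creating an immutable vertex buffer looks like in D3D10 (a sketch; the wrapper function and parameter names are mine): every usage decision is baked into the descriptor and cannot be changed afterwards.

#include <d3d10.h>

// Sketch: creating an immutable vertex buffer in D3D10. Usage, bind
// points and CPU access all have to be declared here, once, up front.
ID3D10Buffer* CreateStaticVertexBuffer(ID3D10Device* device,
                                       const void* vertices,
                                       UINT sizeInBytes)
{
    D3D10_BUFFER_DESC desc = {};
    desc.ByteWidth      = sizeInBytes;
    desc.Usage          = D3D10_USAGE_IMMUTABLE;     // contents fixed at creation
    desc.BindFlags      = D3D10_BIND_VERTEX_BUFFER;  // only usable as a vertex buffer
    desc.CPUAccessFlags = 0;                         // no CPU read or write later
    desc.MiscFlags      = 0;

    D3D10_SUBRESOURCE_DATA initialData = {};
    initialData.pSysMem = vertices;                  // immutable, so data must come now

    ID3D10Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, &initialData, &buffer);
    return buffer;
}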
The same happened for shaders: the input layout describing the vertex format now has to be created against the shader it is going to be used with, or you get an error. However, you can now reinterpret buffers, so you can, for example, render into a vertex buffer.
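A sketch of what this looks like in D3D10 (the surrounding function is hypothetical): the input layout can only be created together with the compiled shader blob, and a mismatch is reported right there, at creation time rather than at draw time.

#include <d3d10.h>

// Sketch: the vertex format is checked against the shader's input
// signature when the layout is created, not when it is used.
ID3D10InputLayout* CreateLayout(ID3D10Device* device,
                                const void* vsBytecode, SIZE_T vsBytecodeSize)
{
    const D3D10_INPUT_ELEMENT_DESC elements[] = {
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,  D3D10_INPUT_PER_VERTEX_DATA, 0 },
        { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    };

    ID3D10InputLayout* layout = nullptr;
    device->CreateInputLayout(elements, 2,
                              vsBytecode, vsBytecodeSize,  // compiled vertex shader blob; mismatch => error
                              &layout);
    return layout;
}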
With DX11, you can also bind most buffers as inputs to compute shaders and work with raw, “unstructured” data, while you still have to specify the usage etc. up front.
So even though we are getting more and more abstractions, we have to specify more and more data up front, and the whole pipeline remains very rigid despite the flexibility. If you want to take an index buffer and simply treat it as a texture, you have to specify lots of stuff before this is allowed. If you map a write-only texture for reading, you will get an error, even if you specified that the texture was to be placed on the host. These and other things start to add up. With DX11, we can now record a command list (essentially a display list), which translates all the commands we issue into the “real” low-level driver commands, ready to be run on the GPU without any validation whatsoever.
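Roughly, the DX11 path looks like this (a sketch with placeholder recording steps):

#include <d3d11.h>

// Sketch: recording a command list on a deferred context.
ID3D11CommandList* RecordCommands(ID3D11Device* device)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... record state changes and draw calls on 'deferred' ...

    ID3D11CommandList* commandList = nullptr;
    deferred->FinishCommandList(FALSE, &commandList);
    deferred->Release();
    return commandList;
}

// Later, the pre-translated commands are replayed on the immediate context:
// immediateContext->ExecuteCommandList(commandList, FALSE);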
The road to the future
So where do the problems come from? Basically, all this added validation and all these abstractions have moved us so far away from the hardware that people are willing to write a renderer from scratch, for example for Larrabee native. This leads us to the main point: we actually need two new graphics APIs.
At the high level, we need an API which makes it easy to do stuff. If you need a texture, it gives you a texture which you can access from GPU or CPU, for both read and write. However, it logs whether you have actually written to or read from it (ideally with a call stack, if wanted). Later on, you can query this information to optimize the usage. You should also be allowed to treat this texture as a vertex buffer, or whatever you want, and freely pass it to compute shaders. The performance of such an API might be horrible, but it would increase developer productivity enormously. If you think about it, it might even do run-time optimization based on the observed usage (as long as the texture is not modified on the CPU side, for example, it does not have to be copied to the GPU again).
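A purely hypothetical sketch of such an interface, just to illustrate the idea (none of these types exist in any current API):

// Hypothetical high-level interface: one resource type, no up-front
// usage flags; access is observed at runtime and can be queried later.
struct UsageReport {              // hypothetical bookkeeping structure
    int cpuReads, cpuWrites;
    int gpuReads, gpuWrites;
};

class Resource {                  // hypothetical
public:
    void*       MapCPU();                      // read/write from the CPU
    void        Unmap();
    void        BindAsTexture(int slot);       // the same object can be...
    void        BindAsVertexBuffer(int slot);  // ...a texture, a vertex buffer,
    void        BindToCompute(int slot);       // ...or a compute shader input
    UsageReport QueryObservedUsage() const;    // who read/wrote what, call stacks optional
};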
The other one requires a fresh look at what we are actually doing: if I create an index buffer, a vertex buffer, or a texture, all I do is allocate some memory. How I am going to use this memory should be of no concern to the graphics card: whether it’s going to be a 32-bit float shadow map or a 2x 16-bit integer input to a compute shader makes no difference to how it is stored. What’s important is where it is stored, whether that’s GPU or CPU memory. Ideally, each time I allocate GPU memory, I can pass along a structure which I can query later on. For instance, if I want a vertex buffer, I get a pointer to GPU memory and store the type etc. on my own in a custom structure. There is no need for the GPU/driver to know this as well; all the GPU knows is the size of the buffer. By default, the GPU treats all data as opaque blobs.
Moreover, I can specify where to put the buffer (GPU, CPU, both), and optionally some hints if the GPU guys really need them. When rendering, I’m free to do what I want with it. If I bind it as a vertex buffer, I have to take care that the data can be interpreted correctly by the shader. All the GPU checks is that the number of requested shader invocations times the size of the shader input is less than or equal to the buffer size.
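A minimal, entirely hypothetical sketch of what I have in mind:

#include <cstddef>

// Hypothetical low-level interface: the driver only ever sees a size
// and a placement hint; the interpretation stays with the application.
using GpuPtr = void*;                          // opaque handle to raw memory
enum class MemoryPool { Gpu, Cpu, Shared };

GpuPtr gpuAlloc(std::size_t sizeInBytes, MemoryPool pool);
void   gpuFree(GpuPtr memory);

// My own bookkeeping; the driver never needs to know any of this.
struct MyVertexStream {
    GpuPtr      memory;
    std::size_t stride;
    std::size_t count;
};

// At draw time the only check is: invocations * inputSize <= buffer size.
void bindVertexInput(GpuPtr memory, std::size_t sizeInBytes);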
There are some special cases I can think of where a GPU will require some type hints to provide good performance, especially when creating textures, due to padding issues. But these should be made explicit to the users, just like CUDA makes padded allocations explicit. This low-level API allows me to get top performance (as I can tune everything) and gives me the possibility to write parts of the pipeline using compute shaders, as the GPU does not know what’s in the buffers anyway.
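To make the CUDA comparison concrete: cudaMallocPitch pads each row of a 2D allocation for efficient access, but returns the resulting pitch instead of hiding it (a sketch; the wrapper function is mine and error handling is omitted):

#include <cuda_runtime.h>
#include <cstddef>

// Allocate a 2D array of floats; the driver may pad each row, but the
// actual row size (pitch) is reported explicitly to the caller.
float* AllocatePitched2D(std::size_t width, std::size_t height, std::size_t* pitchInBytes)
{
    void* devPtr = nullptr;
    cudaMallocPitch(&devPtr, pitchInBytes, width * sizeof(float), height);
    // Row y starts at static_cast<char*>(devPtr) + y * *pitchInBytes.
    return static_cast<float*>(devPtr);
}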
Conclusion
Unfortunately, what we currently have is the worst of both worlds. We have to specify, at a low level, how we are going to use our data without gaining truly low-level access; on the other hand, we don’t get an abstraction which frees us from all this trouble either. So why is this becoming a problem with GPGPU? Mainly because GPGPU shows us that, internally, graphics cards are really general. There is no special “index buffer memory” for index buffers. In CUDA, they are just memory. It is also no problem to reinterpret data as something different. It’s really all up to us, the developers.
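A trivial CUDA sketch of this point: the allocation is untyped as far as the device is concerned, and nothing stops us from reinterpreting it.

#include <cuda_runtime.h>
#include <cstdint>

// One allocation, two interpretations: the device does not care.
void ReinterpretDemo()
{
    void* blob = nullptr;
    cudaMalloc(&blob, 1024);

    std::uint32_t* asIndices = static_cast<std::uint32_t*>(blob); // treat it as an "index buffer"
    float*         asFloats  = static_cast<float*>(blob);         // or as float input to a kernel

    // Both pointers alias the same bytes; keeping track of what the
    // memory currently means is entirely up to the application.
    (void)asIndices;
    (void)asFloats;

    cudaFree(blob);
}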
I do hope that in the future we will stop doing highly constrained graphics on GPUs and instead treat them as a library of tools which we can use to do graphics (among other things). A GPU is not something built with index buffers as primitive data types, but a large graphics toolbox which can do several tasks very well, like creating primitives from a buffer containing integers and another one containing weird blobs of random data. We should get more or less direct access to the GPU command buffer and issue what we need to get the work done, harnessing the full flexibility of the GPU. After all, we are used to this on the CPU side! We have general memory, we have pointers to data, so please, hardware vendors, put us graphics developers on equal footing on the GPU.
If you’re doing graphics, what’s your opinion? I’d be really interested to hear whether I’m the only one who has these problems, or whether this is really common.