This is going to be a fairly technical post on a single issue which arises if you port an DirectX11 graphics engine to OpenGL 4.2. The problem I’m going to talk about is shader resource binding, in particular, how to map the Direct3D model of textures and samplers to the OpenGL model of sampler objects, texture units and uniform samplers.
First of all, a word on naming. I’ll be using texture unit from here on for the thing called uniform sampler in GLSL, otherwise, it’ll be never clear if I mean a sampler object or a sampler in GLSL.
Ok, so where is the problem, actually? There are two differences between Direct3D and OpenGL that we need to work on. The first is that OpenGL was originally designed to have a single shader program which covers multiple stages, while Direct3D always had those separate shader stages with their own resources. The second is that Direct3D10 separated samplers from textures, so you can decide inside the shader which texture you want to sample using which sampler. In OpenGL, the sampler was historically bound to the texture itself.
Let’s look at the first problem: Having a separate shader for each stage is easy with OpenGL 4.2, as the separate shader objects extension has become core. We can now create shader programs for each stage just like in Direct3D, so nothing special to see here. The only minor difficulty is we need to keep a program around to attach our separate programs to and then we have to enable the right stages, but this is no more difficult than in Direct3D.
The real problem comes when we try to bind textures. Unlike Direct3D, OpenGL does not separate textures from samplers cleanly. Even with the sampler object extension, which allows you to specify a sampler independent of the texture, you still have to connect them before sending them to a shader. Inside the shader, every texture unit (remember, this means every uniform sampler) is a combination of texture data and a sampler.
The way I solve this issue is to do a shader pre-process with some patching. I expose the same system as in Direct3D to the user, that is, a number of texture slots and sampler slots for a shader stage. While parsing the shader, I record which texture, sampler combinations are actually used, and those get assigned at compile time to a texture unit. I simply enumerate the units one-by-one. The users have to use a custom macro for texture sampling, otherwise, there is no real difference to HLSL here. HLSL? Yes, as the user has to write the names of the textures and samplers into the GLSL code – the user never writes a uniform sampler statement though.
For each texture and sampler slot, I record which texture units it is bound to, and when the user sets a slot, all associated texture units get updated. Each shader stage gets a different texture unit range, so if the fragment program is changed, all vertex program bindings remain unaffected. So far, so good, but there is one tricky issue left making this not as great as it could be.
The easy life of a graphics developer
Let’s think about this system for a moment. What it means is that each texture, sampler pair occupies a new texture unit. This is probably not an issue, as any recent graphics card supports 32 texture units per stage, so we should be fine, right? The problem here is called
GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS. This is the maximum number of texture units you can bind at the same time for all shader stages combined. On AMD hardware, this is currently 32 – if you statically partition them across 5 shader stages, you get 6 units per stage (and 8 for the fragment shader.) This is not that great. Ideally, you would like to have at least 32 slots per stage (or 160 in total), and then make sure to not use more than the maximum combined number of texture image units across all stages (unless the hardware limit is indeed 160 texture units. But hey, if you sample from 160 textures across 5 shader stages, you’re not doing realtime graphics anymore … I disgress.)
Intel and NVIDIA seem to expose just that number (160 on NVIDIA, 80 for Intel), which makes for easier porting. I’m not sure why AMD actually exposes only 32 there (even on my HD7970), as those texture image units don’t have a real meaning. It’s not like the hardware actually has textures associated with samplers. Instead, the sampler state is passed along with each texture fetch. If you don’t trust me, check the AMD instruction set reference :) In section 8.2.1, you can see the various image instructions, which can take a 128 bit sized sampler definition along with them. That’s where all that is necessary for sampling is stored. It’s simply four consecutive scalar registers, so in theory, you should be able to define them even from within the shader code (and I’m 100% sure someone will do this on the PS4!)
The correct solution which should work in all cases is to deferr the binding until the next draw call, and do the texture/sampler slot to texture unit assignment there. This gives you the flexiblity to assign all 32 texture units to a single shader stage, at the expense of having to (potentially) change the mappings on every shader change.
Right now, 6 texture units per stage (and 8 for the fragment shader) is plenty enough for me, and I guess that’s true for most games as well. Remember that a texture unit can be bound to a texture array as well, so the 8 is not really a practical limit. If you’re ok with this limits, my solution seems to be as good as any and allows for a nice and clean mapping between Direct3D11 and OpenGL. I would be curious to hear though how other people solve this problem, as there are surely more clever solutions out there!