
Shadow mapping basics

Welcome to a different kind of blog post! This time, I'll be writing an educational piece about shadow mapping for all of you who are just getting started with real-time graphics and shader programming. If you want to see the implementation, please follow the OctoAwesome live-coding project where this will be included over time.


Let's start right away by trying to define what problem we want to solve. As of today, OctoAwesome is an outdoor game without any kind of shadows but with some kind of fixed-function illumination. For every pixel that is shaded, the lighting is evaluated even if the point is occluded by some geometry. What we want to add is the ability for a point to query whether it is occluded or not.

Some of you might shout ray-tracing. And you're absolutely right, we will use this as our mental model to derive shadow mapping! So how would the lighting code work? If we had something like a shadow() API call in the shader, we could use that to trace a ray from the point being shaded to the light source. shadow() returns a boolean, which indicates whether the path is clear or not.
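To make the mental model concrete, here is a minimal sketch of what such lighting code could look like, written in Python instead of shader code. Everything in it is an assumption made for illustration: the occluders are spheres, the sun is a directional light, the names ray_hits_sphere and shade are made up, and shadow() returns True when the path is blocked.

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """True if the ray origin + t * direction (direction normalized) hits the sphere for t > 0."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return False  # the ray misses the sphere entirely
    sqrt_disc = math.sqrt(disc)
    eps = 1e-4  # ignore hits at the ray origin itself
    return (-b - sqrt_disc) > eps or (-b + sqrt_disc) > eps

def shadow(point, to_light, occluders):
    """True if any occluder blocks the path from point towards the light."""
    return any(ray_hits_sphere(point, to_light, center, radius)
               for center, radius in occluders)

def shade(point, normal, to_light, occluders):
    """Simple diffuse term, set to zero when the point is in shadow."""
    if shadow(point, to_light, occluders):
        return 0.0
    return max(0.0, sum(n * l for n, l in zip(normal, to_light)))
```

Note that this brute-force version tests every occluder for every shaded point, which is precisely the cost problem discussed next.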

Three shadow rays cast by the sun. Two hit the green occluder before hitting the blue object which is currently shaded.

The problem here is how to implement the shadow() call. For efficient ray-tracing, we're going to need some kind of acceleration structure and then a rather involved kernel to do the real tracing. In the case of OctoAwesome, we would probably want to trace a binary volume for maximum efficiency. We'd also need some special way to handle transparent blocks like the trees and animated objects. Not impossible but a lot of work.

We'll notice pretty quickly that for each frame, we have to trace a lot of rays, and that gets expensive due to the traversal. It's even worse because we trace the same rays over and over, at least as long as the light source and the camera are not moving. This is surely not the most efficient way; it feels as if we should be able to store and reuse the results of the shadow() calls somehow.


And indeed, there is a way to reuse shadow() calls. The key insight is that we can store one value per ray to resolve the shadow() query for all points along the ray. The value we store is the distance to the closest hit, measured from the light source, and our new shadow() call now just checks the distance of the query point against that stored value. If the query point is further away, it is in shadow.

Ray-tracing with a cache. The blue/green cell stores the values for all rays passing through it. The two top points on the blue object are tested against the blue cell, and one of them is classified as lit even though it should be in shadow. Nitpick: If done with utmost precision, all three points would actually be classified as occluded, as all of them have a larger depth value than the one stored. Generally, a small epsilon (bias) is introduced to avoid self-shadowing; in the example above, it's large enough to keep the upper point on the blue object from being shadowed by itself.
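The cached test is small enough to sketch directly. The snippet below is an illustration in Python with made-up names (closest_hit, shadow_cached) and an arbitrary default bias; distances are assumed to be measured from the light along the ray.

```python
import math

def closest_hit(hit_distances):
    """The one value we store per ray: the distance from the light to the nearest hit."""
    return min(hit_distances, default=math.inf)

def shadow_cached(cached_distance, query_distance, bias=0.01):
    """True if the query point lies behind the cached closest hit.

    The bias keeps a surface from shadowing itself when its own distance
    was stored slightly too small.
    """
    return query_distance > cached_distance + bias
```

For example, with hits at distances 2.0 and 5.0 along a ray, the cache stores 2.0 and a query at distance 5.0 is in shadow. If the cache holds 1.999 for a surface that really sits at 2.0, a bias of 0.0 would shadow the surface by itself, while a bias of 0.01 classifies it as lit.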

Now the only problem is how to store the per-ray data, given that the rays are cast from arbitrary points. For a distant light source, imagine we place a grid orthogonal to the light direction and quantize rays into small "cells". That is, all rays which are emitted "nearby" will go into the same cell. This introduces a bit of error, depending on how big our cells are and other factors, but in general it's quite acceptable.
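One way to picture the quantization is the following Python sketch. It is only an illustration under a few assumptions: the light is directional, light_dir is the direction the light travels, the grid is unbounded and kept in a dictionary, and all function names are invented for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def light_basis(light_dir):
    """Two grid axes orthogonal to the light direction, plus the direction itself."""
    forward = normalize(light_dir)
    # Pick a helper axis that is not parallel to the light direction.
    helper = (0.0, 1.0, 0.0) if abs(forward[1]) < 0.99 else (1.0, 0.0, 0.0)
    right = normalize(cross(helper, forward))
    up = cross(forward, right)
    return right, up, forward

def cell_and_depth(point, basis, cell_size):
    """Quantize a point into a grid cell and return its depth along the light direction."""
    right, up, forward = basis
    cell = (math.floor(dot(point, right) / cell_size),
            math.floor(dot(point, up) / cell_size))
    return cell, dot(point, forward)

def build_cache(surface_points, basis, cell_size):
    """Keep the smallest depth (closest to the light) for every cell."""
    cache = {}
    for point in surface_points:
        cell, depth = cell_and_depth(point, basis, cell_size)
        cache[cell] = min(depth, cache.get(cell, math.inf))
    return cache
```

All points whose rays fall into the same cell share one stored depth, which is where the quantization error mentioned above comes from: a larger cell_size means more unrelated points share a value.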


What we need to do now is to produce a grid of distance values from the light source, store this somehow, and during shading, project the points into that grid and compare the values. It turns out the GPU is perfectly suited for this. Producing distance values is exactly what we do when writing into the depth buffer. Storing is equally easy: we can reuse a depth buffer as a texture. The only remaining problem is to project the points into the shadow map and compare the values.

Let's tackle the problems one by one. First of all, we need a new camera, which captures the scene as seen from the light into a depth map. This means we need to create a new render target, which has only a depth buffer bound. We also need to set up the camera correctly: it should cover the complete view frustum and nothing else, to maximize the effective resolution. Finally, when rendering, we should turn off all pixel shaders to improve performance. Of course, for alpha-tested geometry, we need the shaders, but if possible we should use a simplified version which only performs the alpha test and discards the failing pixels while generating the shadow map.
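Fitting the light camera to the view frustum can be sketched as follows. This is a simplified illustration in Python, not engine code: it assumes an orthographic light camera, takes the eight world-space corners of the view frustum as input, and expects the light's right, up, and forward axes to be given as unit vectors. The function names are made up.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_light_bounds(frustum_corners, right, up, forward):
    """Bounds of the view frustum corners, measured along the light's axes."""
    xs = [dot(p, right) for p in frustum_corners]
    ys = [dot(p, up) for p in frustum_corners]
    zs = [dot(p, forward) for p in frustum_corners]
    return (min(xs), max(xs)), (min(ys), max(ys)), (min(zs), max(zs))

def texel_size(bounds, resolution):
    """World-space size covered by one shadow map texel along each grid axis."""
    (x_min, x_max), (y_min, y_max), _ = bounds
    return (x_max - x_min) / resolution, (y_max - y_min) / resolution
```

The tighter the bounds, the smaller the area each texel has to cover, which is the "effective resolution" mentioned above. One caveat with this sketch: occluders that sit outside the view frustum but between the light and the frustum still cast shadows into it, so in practice the depth range usually has to be extended towards the light.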

The next step is the normal render pass, in which we need to implement the shadow() call. For this to work, we have to project the point being shaded into the shadow map; that is, it has to go from world space into light space using the same projection we used to generate the shadow map. This means we need to pass the world-space position through the vertex shader somehow. The easiest way is to simply forward it and then multiply it with the light projection in the pixel shader. One division, one adjustment for the -1..1 to 0..1 coordinate system difference, and one comparison later, we know if the point is in shadow!
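The per-pixel work can be sketched on the CPU like this. The Python below only illustrates the steps and rests on several assumptions: the light uses an orthographic projection, depth after the division is already in the 0..1 range, there is no vertical flip between projected coordinates and texture rows, and the shadow map is a square list of rows. Conventions differ between graphics APIs, so a real shader may need an extra flip or remap. All names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) with a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def ortho(left, right, bottom, top, near, far):
    """Orthographic projection: x and y map to -1..1, z maps to 0..1 (assumed convention)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, 1.0 / (far - near), -near / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def to_shadow_map(world_pos, light_view_proj):
    """Project a world-space point into shadow map coordinates (u, v, depth)."""
    x, y, z, w = mat_vec(light_view_proj, list(world_pos) + [1.0])
    x, y, z = x / w, y / w, z / w          # the one division
    u, v = x * 0.5 + 0.5, y * 0.5 + 0.5    # -1..1 to 0..1
    return u, v, z

def in_shadow(world_pos, light_view_proj, shadow_map, bias=0.005):
    """Compare the point's depth against the nearest stored shadow map texel."""
    u, v, depth = to_shadow_map(world_pos, light_view_proj)
    size = len(shadow_map)
    col = min(size - 1, max(0, int(u * size)))
    row = min(size - 1, max(0, int(v * size)))
    return depth > shadow_map[row][col] + bias
```

For an orthographic light, w stays 1 and the division does nothing, but keeping it means the same code also works for a perspective projection such as a spot light.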


The solution above will work in practice but results in quite ugly, blocky shadows. We can achieve higher quality by using a comparison sampler and linear filtering, which enables hardware-accelerated percentage-closer filtering (PCF).
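What percentage-closer filtering computes can be emulated on the CPU. The sketch below is an approximation of the idea in Python, not a description of what any particular GPU does: it assumes a square shadow map stored as a list of rows, texel centers at half-integer positions, and clamping at the borders. The function name is made up.

```python
import math

def pcf_bilinear(shadow_map, u, v, depth, bias=0.005):
    """Return the lit fraction in 0..1 for a point at (u, v) with the given depth.

    The depth comparison is done per texel first, and only the results of
    the four comparisons are blended with bilinear weights.
    """
    size = len(shadow_map)
    x = u * size - 0.5
    y = v * size - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(col, row):
        col = min(size - 1, max(0, col))
        row = min(size - 1, max(0, row))
        return 1.0 if depth <= shadow_map[row][col] + bias else 0.0

    top = lit(x0, y0) * (1.0 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1.0 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1.0 - fy) + bottom * fy
```

The important detail is the order of operations: filtering the stored depths and comparing afterwards would still give a hard yes/no answer, whereas filtering the comparison results gives a smooth transition at the shadow edge.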

I've also omitted lots of other problems we're going to run into. For instance, the shadow map resolution should be improved by using cascaded shadow maps, we should use some kind of contact hardening shadows to make the shadows softer further away from the occluder, and so on. Good shadow mapping is actually really hard to implement, due to the fixed precision and resolution of shadow maps, but right now, it's the best we can do until the hardware becomes fast enough to trace soft shadows in real time. This will hopefully be the topic of a future blog post :)