Thursday, August 9, 2012

Lighting transparent surfaces with deferred shading. Part I

As the reader hopefully knows (if not, there are multiple good explanations of deferred shading out there and they are not hard to find - just read them all), rendering transparent surfaces in a deferred shading (DS) pipeline is kind of problematic. The main point of DS is to optimize the lighting process by evaluating lights only on visible surfaces. The way visibility is traditionally resolved (depth buffering - at every pixel only the closest surface is considered) doesn't really get along with transparent objects. Transparency causes multiple surfaces to affect a single pixel's outcome, so we have to cheat.

Developers have come up with many workarounds to this problem. First I will briefly describe those that I am aware of. Then (in part II), I will explain in more detail a method based on compute shaders that may be useful in specific situations (a small number of transparent objects that can get away with low-frequency lighting). I haven't seen anyone mention this method before; however, it is quite specific and closely related to other techniques (it can be described as a well-known technique used in a slightly unconventional way), so I consider it a very humble contribution. :)
I will also present a comparison of said method with the forward rendering technique.

Forward rendering

The simplest way to light transparent surfaces with deferred shading is not to. :) Instead, use the traditional forward pipeline for them. It means having to deal with all the forward-related goodies:
  • the necessity to store multiple shadow maps in GPU memory (instead of the render-and-forget DS way)
  • CPU code to find closest/most important lights for a given object
  • possibly multi-pass rendering
Lots of fun, in a word. The opportunity to avoid all that mess was an important reason for developers to turn to deferred shading in the first place.
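To make the contrast concrete, here is a minimal sketch of the shader side of such a forward pass, assuming the CPU has already picked a few point lights for the object and uploaded them into a constant buffer. All names, the light structure and the attenuation model below are my own assumptions, and shadow map sampling is omitted.

```hlsl
// Per-object forward lighting: the CPU selects a handful of relevant lights
// for this transparent object and the pixel shader simply loops over them.
#define MAX_LIGHTS_PER_OBJECT 4

struct PointLight
{
    float3 positionWS;   // world-space position
    float  radius;
    float3 color;
    float  pad;
};

cbuffer ObjectLights : register(b1)
{
    PointLight gLights[MAX_LIGHTS_PER_OBJECT];
    uint       gLightCount;   // how many slots the CPU actually filled
};

float4 ForwardTransparentPS(float4 posCS    : SV_Position,
                            float3 posWS    : TEXCOORD0,
                            float3 normalWS : TEXCOORD1,
                            float4 albedo   : COLOR0) : SV_Target
{
    float3 n   = normalize(normalWS);
    float3 lit = 0;

    for (uint i = 0; i < gLightCount; ++i)
    {
        float3 toLight = gLights[i].positionWS - posWS;
        float  dist    = length(toLight);
        float  atten   = saturate(1.0f - dist / gLights[i].radius);
        lit += albedo.rgb * gLights[i].color *
               saturate(dot(n, toLight / dist)) * atten;
    }

    // Alpha-blended over the already-lit opaque scene.
    return float4(lit, albedo.a);
}
```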

Multipass deferred shading

I've just made up that name, but the idea is very simple. DS works as long as you need to light only a single surface at any given pixel. So if you render only one transparent object at a time, you should be OK.

First, render the opaque objects as usual. Then, for each transparent object:
  1. render the object to the G-buffer and mark the affected pixels in the stencil buffer
  2. accumulate lighting into a separate buffer
  3. blend the result into the final buffer (the one the opaque objects were rendered to)
You may need to adjust it here and there to get the blending right, but the basic idea stands. A more thorough description can be found here: Deferred Rendering, Transparency & Alpha Blending.
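To illustrate step 2: it is essentially the regular deferred light accumulation shader, just run on a G-buffer that now holds a single transparent object, with the stencil test from step 1 restricting it to the pixels that object touched. The G-buffer layout, resource names and attenuation below are my assumptions (positions are stored directly to avoid depth reconstruction), so treat it as a sketch rather than the exact method from the linked article.

```hlsl
// Step 2 sketch: ordinary deferred point-light accumulation, additively
// blended into a separate buffer; the stencil test limits it to the pixels
// covered by the current transparent object.
Texture2D gGBufferNormal   : register(t0);   // world-space normal
Texture2D gGBufferAlbedo   : register(t1);
Texture2D gGBufferPosition : register(t2);   // world-space position, for simplicity

cbuffer Light : register(b0)
{
    float3 gLightPosWS;
    float  gLightRadius;
    float3 gLightColor;
};

float4 DeferredPointLightPS(float4 pos : SV_Position) : SV_Target
{
    int3 pix = int3(pos.xy, 0);
    float3 n      = normalize(gGBufferNormal.Load(pix).xyz);
    float3 albedo = gGBufferAlbedo.Load(pix).rgb;
    float3 p      = gGBufferPosition.Load(pix).xyz;

    float3 toLight = gLightPosWS - p;
    float  dist    = length(toLight);
    float  atten   = saturate(1.0f - dist / gLightRadius);

    // Accumulated light for this object; step 3 alpha-blends it into the final image.
    return float4(albedo * gLightColor * saturate(dot(n, toLight / dist)) * atten, 0);
}
```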

The good news is that we reuse the deferred shading pipeline. But notice that step 2 involves going through all the lights to process a G-buffer that contains a single object! So in terms of big Oh notation we're back to Oh(number of objects * number of lights) instead of Oh(number of objects + number of lights). Oh-oh. That's bad news. More bad news:
  • as with forward rendering, you can't reuse shadow map memory (you need to keep the maps around, since they are used over and over)
  • later stages that use the G-buffer (like SSAO) might not like that you overwrote all the useful data with that window glass in the foreground

 

Deep G-buffer

Again, a simple idea: have a G-buffer with multiple layers. If you create four layers, you can correctly render as many as three transparent surfaces in front of an opaque one. Unfortunately, memory consumption is huge. On the other hand, the processing overhead doesn't need to be very high, as Humus describes.
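To put a rough number on "huge": assuming, say, four render targets of 8 bytes per pixel each (RGBA16F) at 1920x1080, a four-layer G-buffer comes out to roughly 4 layers * 4 targets * 8 bytes * ~2 million pixels, which is about 265 MB, before any MSAA. A leaner layout shrinks this, but the multiplication by the layer count is unavoidable.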

Smart deep G-buffer

I haven't seen anyone mention this one and I haven't tried it yet, but I think it might not be completely unfeasible.

In the deep G-buffer method, a lot of memory is wasted on average. Usually you're not looking at opaque surfaces through exactly three (or whatever the number of G-buffer layers is) transparent objects covering the entire screen. A big part of the view will be covered only by opaque stuff, and at those places the additional G-buffer layers are wasted. On the other hand, in rare cases you might need more than 3-4 transparency layers in some small area of the screen and you can't have them, while the rest of the G-buffer sits unused! It would be great if we could use that memory in a smarter way.

There's been some hype recently about order-independent transparency using per-pixel linked lists (typically built in pixel shaders writing to UAVs). The idea is to use a GPU memory buffer (an RWStructuredBuffer in DirectX) to render transparent objects into per-pixel linked lists, storing color, alpha, depth and possibly blending type for each transparent pixel drawn. A resolve pass then sorts and blends those lists to produce the final pixel colors.
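For reference, here is a sketch of how the list-building pass is usually set up (along the lines of the published DX11 linked-list OIT samples). The fragment layout, register assignments and the 0xFFFFFFFF end-of-list marker are my assumptions; the head-pointer buffer is assumed to be cleared to that marker at the start of each frame, and the fragment's shaded color is assumed to have been computed earlier in the shader.

```hlsl
// Building per-pixel linked lists during the transparent geometry pass.
// A structured buffer holds all fragments; a raw buffer holds one "head
// pointer" (a fragment index) per screen pixel.
struct TransparentFragment
{
    float4 color;   // shaded color + alpha of this fragment
    float  depth;   // used later for sorting in the resolve pass
    uint   next;    // index of the next fragment in this pixel's list
};

RWStructuredBuffer<TransparentFragment> gFragments    : register(u1);
RWByteAddressBuffer                     gHeadPointers : register(u2);

cbuffer Screen : register(b0)
{
    uint gScreenWidth;
};

void BuildListsPS(float4 pos : SV_Position, float4 shadedColor : COLOR0)
{
    // Grab a fresh slot from the buffer's hidden counter
    // (the UAV must be created with the counter flag on the CPU side).
    uint newIndex = gFragments.IncrementCounter();

    // Atomically make this fragment the new head of the pixel's list.
    uint addr = 4 * (uint(pos.y) * gScreenWidth + uint(pos.x));
    uint oldHead;
    gHeadPointers.InterlockedExchange(addr, newIndex, oldHead);

    TransparentFragment frag;
    frag.color = shadedColor;
    frag.depth = pos.z;
    frag.next  = oldHead;   // the previous head becomes the second element
    gFragments[newIndex] = frag;
}
```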

Now, since you already store so much data per transparent sample, you could add a few more attributes (normals, texture samples) and use those linked lists as a deep G-buffer with a dynamic amount of memory per pixel! The geometry stage of the transparent objects would fill the linked lists. Then, in the light shader, the lists would be traversed, the samples shaded, and the light accumulation of each sample updated accordingly. The light accumulation would probably need to be stored alongside the rest of the sample data in the G-buffer. I originally assumed this meant the light shader couldn't be a pixel shader but had to be a compute shader (I thought pixel shaders couldn't write to an RWStructuredBuffer), which would unfortunately exclude the optimization of rendering light geometries to minimize the number of pixels processed - but apparently pixel shaders can write to arbitrary buffers accessed as UAVs after all.
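Either way, here is a compute-shader sketch of what such a light pass over the lists might look like. The sample layout (with positions stored directly, to skip depth reconstruction), the buffer names and the simple point-light model are all assumptions of mine, not a description of an existing implementation; one thread per screen pixel walks that pixel's list and adds the current light's contribution to every sample in it.

```hlsl
struct GBufferSample
{
    float3 positionWS;        // stored directly, so no depth reconstruction here
    float3 normalWS;
    float3 albedo;
    float  alpha;
    float3 accumulatedLight;  // running light sum, updated once per light
    uint   next;              // 0xFFFFFFFF terminates the list
};

RWStructuredBuffer<GBufferSample> gSamples      : register(u0);
ByteAddressBuffer                 gHeadPointers : register(t0);

cbuffer LightPass : register(b0)
{
    float3 gLightPosWS;
    float  gLightRadius;
    float3 gLightColor;
    uint   gScreenWidth;
    uint   gScreenHeight;
};

[numthreads(8, 8, 1)]
void AccumulateLightCS(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= gScreenWidth || id.y >= gScreenHeight)
        return;

    uint index = gHeadPointers.Load(4 * (id.y * gScreenWidth + id.x));

    while (index != 0xFFFFFFFF)
    {
        GBufferSample s = gSamples[index];

        float3 toLight = gLightPosWS - s.positionWS;
        float  dist    = length(toLight);
        float  atten   = saturate(1.0f - dist / gLightRadius);
        float  ndotl   = saturate(dot(normalize(s.normalWS), toLight / dist));

        s.accumulatedLight += s.albedo * gLightColor * ndotl * atten;
        gSamples[index] = s;   // write the updated accumulation back

        index = s.next;
    }
}
```

A final resolve pass would then sort each list by depth and blend the accumulated colors front to back.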

I realize that my "thought implementation" probably misses some important issues that might make this idea more inconvenient or even impossible. I'd be glad to hear about such issues in the comments.

Creative Assembly method

I don't really know what to call this one, because the name would have to be something like "unfolding transparent objects into textures for lightmap generation". The method has been described very recently by the Creative Assembly guys. I have only seen the slides, and it's possible that I misunderstood some details, but the general idea is clear, though it might be hard to explain in words. Open the slides now, look at the pictures and you'll know everything.

Each transparent object's surface is "unfolded" into a low-resolution texture that stores the object-space positions of the corresponding surface points. These positions basically become the sampling points for the lighting process. So, to perform the lighting, all such maps are packed into one big map, and deferred shading is simply run on it. Light is accumulated into light probes. When rendering a transparent object, the shader samples the proper area of the lightmap, interpolates between light probes and evaluates the lighting. The outcome is probably rather low-frequency lighting.
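Below is my reading of the "unfold" pass, sketched with assumed vertex semantics and UV conventions: the mesh is rasterized into its (unique) lightmap UV space, and the position of each surface point is written into the low-resolution map that the lighting pass later uses as its set of sampling points.

```hlsl
// "Unfold" pass: place each vertex at its lightmap UV instead of its projected
// position, so the rasterizer fills the low-res map with surface positions.
struct VSIn
{
    float3 posOS      : POSITION;   // object-space position
    float2 lightmapUV : TEXCOORD1;  // unique unwrap UVs
};

struct VSOut
{
    float4 posCS : SV_Position;
    float3 posOS : TEXCOORD0;
};

VSOut UnfoldVS(VSIn v)
{
    VSOut o;
    // UV [0,1] maps to clip space [-1,1], with Y flipped for D3D conventions.
    o.posCS = float4(v.lightmapUV * float2(2, -2) + float2(-1, 1), 0, 1);
    o.posOS = v.posOS;
    return o;
}

float4 UnfoldPS(VSOut i) : SV_Target
{
    // Object-space position of this surface point; the lighting pass transforms
    // it as needed once the maps are packed into the big atlas.
    return float4(i.posOS, 1);
}
```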

This idea is pretty decent and it made me a little embarrassed, since the method I've been working on is similar in some ways (light probes, low-frequency lighting) but apparently not as smart as theirs.


As I said before, in part II I am going to describe in more detail another method based on compute shaders and compare it to the forward rendering method.
