Overview

Overview • Instant radiosity • Tiled-deferred lighting • Results and future work

Instant radiosity
• The instant radiosity algorithm [Keller97] uses hardware-accelerated rendering of virtual point lights (VPLs) to accurately produce a GI solution for diffuse materials:
  • shoot N random particles from the light, randomly absorbing particles at their intersections with objects and spawning VPLs at those intersections.
• [LHK07]: shoot N particles according to a pattern; after the light has moved, reproject the VPLs back to the light and discard those that are no longer valid or that distort the pattern too much.

Instant radiosity (2)
• Pros:
  • Any engine able to render shadow-mapped lights can use IR;
  • No pre-computed or artist-generated data;
  • Indirect occlusion:
    • No indirect lighting leaks;
    • No need for a separate AO term;
  • Effortless dynamic objects:
    • No more SH probe inconsistencies.
• Cons:
  • Many VPLs are needed to construct accurate GI in complex scenes;
  • Having tons of VPLs requires highly efficient light rendering;
  • Shadow map rendering is very expensive in terms of DIPs;
  • Incremental IR reduces the number of shadow map updates, but real-time applications are still limited to (almost) static scenes.

Instant radiosity without indirect shadows.

Instant radiosity with indirect shadows.

VPL placement
• IR originates from the domain of ray tracing; most papers assume the engine can trace rays against the render geometry.
• The reflective shadow maps algorithm [DS05] uses a shadow map with additional data (albedo and normals), which allows shadow map pixels to be interpreted as sources of reflected light.
• RSM rendering uses (almost) the same shaders as regular G-buffer rendering => any deferred engine can do RSM.
• Treating each RSM pixel as a VPL would result in millions of VPLs.
• A method for reducing the number of VPLs is required.
• Clustering: replacing many small VPLs with a single representative VPL.

RSM clustering
• Replace all RSM pixels lying on the same face with a single VPL:
  • threshold-based face detection based on depth and normal variation over a group of pixels.
• Use a flood fill algorithm to find large clusters:
  • construct a map of sharp discontinuities;
  • perform an iterative 8-way flood fill that stops at the discontinuities.
• The algorithm is very simple and well suited to GPUs.
Movie from Wikipedia: http://en.wikipedia.org/wiki/Flood_fill
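To make the clustering step concrete, here is a minimal CPU-side sketch of an 8-way flood fill over a discontinuity map. It is sequential Python rather than the iterative GPU version the slide refers to, and the function name `label_clusters` and the boolean-grid input are assumptions made for illustration only.

```python
from collections import deque

def label_clusters(discontinuity):
    """Label 8-connected regions of a 2D grid, stopping at discontinuities.

    discontinuity[y][x] is True where a sharp depth/normal edge was detected
    (hypothetical input format). Returns a grid of cluster ids; pixels on a
    discontinuity keep the id -1.
    """
    h, w = len(discontinuity), len(discontinuity[0])
    labels = [[-1] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            # Skip edge pixels and pixels already claimed by a cluster.
            if discontinuity[sy][sx] or labels[sy][sx] != -1:
                continue
            labels[sy][sx] = next_id
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                # Visit all 8 neighbours (the centre is filtered out because
                # it is already labelled).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and not discontinuity[ny][nx]
                                and labels[ny][nx] == -1):
                            labels[ny][nx] = next_id
                            queue.append((nx, ny))
            next_id += 1
    return labels
```

Note that with 8-way connectivity two regions that touch only diagonally end up in the same cluster, so the discontinuity map has to be closed along diagonals as well if that is not desired.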

Our implementation test: clustering pixels with similar colors.

Resulting clusters.

Averaging colors over clusters.

RSM clustering (2)
• Once the clusters have been found, construct one VPL per cluster:
  • Every VPL parameter (position, intensity, color) is computed as a weighted average over all individual pixels in the cluster, with weights equal to the pixels’ areas in world space.
• Sort the resulting VPLs by intensity and choose the N brightest ones.
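The per-cluster averaging and the final selection can be sketched as follows. This is an illustration under assumptions, not the presented implementation: the pixel tuple layout `(cluster_id, area, position, intensity, color)` and the name `build_vpls` are hypothetical.

```python
from collections import defaultdict

def build_vpls(pixels, n):
    """Build one VPL per cluster and return the n brightest.

    pixels: iterable of (cluster_id, area, position, intensity, color), where
    area is the pixel's world-space area and position/color are 3-tuples
    (hypothetical layout chosen for this sketch).
    """
    acc = defaultdict(lambda: {"w": 0.0, "pos": [0.0] * 3,
                               "col": [0.0] * 3, "int": 0.0})
    for cluster_id, area, pos, intensity, color in pixels:
        a = acc[cluster_id]
        a["w"] += area
        a["int"] += area * intensity
        for i in range(3):
            a["pos"][i] += area * pos[i]
            a["col"][i] += area * color[i]

    vpls = []
    for a in acc.values():
        if a["w"] <= 0.0:
            continue  # degenerate cluster with no area
        inv = 1.0 / a["w"]
        vpls.append({
            "position": tuple(p * inv for p in a["pos"]),
            "intensity": a["int"] * inv,
            "color": tuple(c * inv for c in a["col"]),
        })

    # Brightest first, then keep only the n brightest VPLs.
    vpls.sort(key=lambda v: v["intensity"], reverse=True)
    return vpls[:n]
```

For example, two pixels of one cluster with areas 1 and 3 at x = 0 and x = 4 yield a VPL at x = 3, i.e. the position is pulled towards the pixel that covers more world-space area.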

Scene’s VPLs.

Indirect lighting only.

Tiled-deferred lighting
• Tiled-deferred is a method for fast rendering of a large number of VPLs.
• I’m only going to give the details of our implementation:
  • Requires compute shader 4.0-capable hardware; CS 4.1 is needed for optimal performance (cube map array support);
  • Uses 8x8 tiles;
  • Typically 300-500 lights per frame;
  • Almost all lights cast shadows;
  • Almost all lights are hemispherical lights coming from instant radiosity.

Tiled-deferred lighting: why?
• Dramatic reduction of work per light, e.g. for a point light with shadows:
from Shaders/PointLight.shader, ps 3.0:
    texldp r0, v0, s2
    mad r0.xyz, r0, c6.x, c6.y
    mul r1, c1, vPos.y
    mad r1, c0, vPos.x, r1
    texldp r2, v0, s1
    mad r1, c2, r2.x, r1
    add r1, r1, c3
    rcp r0.w, r1.w
    mad r2.xyz, r1, -r0.w, v1
    mul r1.xyz, r0.w, r1
    dp3 r0.w, r2, r2
    rsq r1.w, r0.w
    mad_sat r0.w, r0.w, v3.x, v3.y
    mul r2.xyz, r1.w, r2
    rcp r1.w, r1.w
    mad r1.w, r1.w, c4.x, c4.y
    texldp r3, v0, s0
    mad r3.xyz, r3, c6.x, c6.y
    add r2.w, r3.w, c6.z
    dp3 r2.x, r3, r2
    cmp r2.y, r2.x, c6.w, c6.y
    cmp r2.y, r2.w, c6.w, r2.y
    cmp r0.xyz, r2.y, r0, -r0
    cmp_sat r2.x, r2.y, r2.x, -r2.x
    mul r2.xyz, r2.x, v2
    mul r2.xyz, r0.w, r2
    mad r0.xyz, r1.w, r0, r1
    add r0.xyz, r0, -v1
    texld r1, r0, s3
    add r3.xyz, r1.x, -c6_abs.wyxw
    cmp r3.xyz, -r3_abs, -c6.y, -c6.w
    add r4.xyz, r1.x, c5
    cmp r4.xyz, -r4_abs, c6.y, c6.w
    add r3.xyz, r3, r4
    add r4.xyz, r1.y, -c6_abs.wyxw
    cmp r4.xyz, -r4_abs, -c6.y, -c6.w
    add r5.xyz, r1.y, c5
    cmp r5.xyz, -r5_abs, c6.y, c6.w
    add r4.xyz, r4, r5
    mul r5.xyz, r3.zxyw, r4.yzxw
    mad r5.xyz, r3.yzxw, r4.zxyw, -r5
    mul r4.xyz, r0.y, r4
    mad r0.xyw, r0.x, r3.xyzz, r4.xyzz
    mad r0.xyz, r0.z, r5, r0.xyww
    mul r3, r0.y, c13
    mad r3, c12, r0.x, r3
    mad r0, c14, r0.z, r3
    add r0, r0, c15
    rcp r0.w, r0.w
    mul r0.xyz, r0.w, r0
    mad r1.xy, r1.zwzw, c16.zwzw, r0
    mul r1.zw, r0.z, -c6.xyyw
    texldl r0, r1, s4
    mul r0.x, r0.x, c4.z
    mul oC0.xyz, r0.x, r2
    mov oC0.w, -c6.w
from Shaders/Compute/Lighting.shader, cs 4.1:
    ld_structured r2.w, r0.w, l(0), g0.xxxx
    ishl r3.x, r2.w, l(1)
    add r5.xyz, -r2.xyzx, cb1[r3.x + 0].xyzx
    dp3 r3.z, r5.xyzx, r5.xyzx
    mad_sat r3.w, -r3.z, cb1[r3.x + 0].w, l(1.000000)
    rsq r5.w, r3.z
    mul r6.xyz, r5.wwww, r5.xyzx
    dp3_sat r5.w, r1.xyzx, r6.xyzx
    mul r3.w, r3.w, r5.w
    sqrt r3.z, r3.z
    mad r5.xyz, r0.xyzx, r3.zzzz, -r5.xyzx
    dp3 r3.z, r5.xyzx, r5.xyzx
    mul r3.z, r3.z, cb2[r2.w + 0].x
    mov r5.w, cb2[r2.w + 0].y
    sample_c_lz_indexable(texturecubearray)(float,float,float,float) r2.w, r5.xyzw, t4.xxxx, s0, r3.z
    mul r2.w, r2.w, r3.w
    mad r4.xyz, cb1[r3.x + 1].xyzx, r2.wwww, r4.xyzx
- Some work is done only once for a batch of lights:
  * position reconstruction from depth;
  * G-buffer reads.
- Some memory accesses are shared through TGSM:
  * light index buffer access.
- Accumulation of results in-shader, no need for light buffer access.

Tiled-deferred lights culling
• Compute shaders: no more bounding volumes rendering
  • cannot rely on the rasterization unit to determine affected pixels.
• Project the light’s bounding sphere onto the screen to get a 2D bounding primitive:
  • rectangle vs. circle vs. ellipse;
  • we stick to a bounding circle: diameter = length of the ellipse’s major axis, center = projection of the bounding sphere’s center.
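Once the bounding circle is known in pixel coordinates, testing it against a tile reduces to a circle-versus-rectangle check. The sketch below assumes the projection has already been done and that tiles are addressed by integer tile coordinates; `circle_touches_tile` and its parameters are hypothetical names.

```python
TILE_SIZE = 8  # tile edge in pixels, matching the 8x8 tiles mentioned earlier

def circle_touches_tile(cx, cy, radius, tile_x, tile_y, tile_size=TILE_SIZE):
    """Return True if a screen-space circle overlaps the given tile.

    (cx, cy) and radius are in pixels; (tile_x, tile_y) are tile indices.
    """
    x0, y0 = tile_x * tile_size, tile_y * tile_size
    x1, y1 = x0 + tile_size, y0 + tile_size
    # Closest point of the tile rectangle to the circle centre.
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    dx, dy = cx - nx, cy - ny
    # Compare squared distances to avoid a square root.
    return dx * dx + dy * dy <= radius * radius
```

A circle centred at (12, 12) with radius 5 does not touch tile (0, 0), although its bounding rectangle would, which is one reason to prefer a circle over a rectangle as the 2D primitive.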

Tiled-deferred lights culling (2)
• Compute shaders: no more bounding volumes rendering
  • no z-culling, have to do it ourselves.
[Figure: light, light’s range, tile’s frustum and scene objects, with depth marks LZmin, LZmax, TZmin, TZmax.]
Our depth culling test:
- Compute TZmin and TZmax from the depth buffer;
- Test whether [LZmin; LZmax] and [TZmin; TZmax] overlap.
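The depth test is a plain interval-overlap check. The sketch below assumes linear view-space depths and takes the light's depth interval to be its view-space Z plus or minus its range; both are assumptions, as are the function names.

```python
def tile_depth_bounds(tile_depths):
    """TZmin and TZmax: depth extent of the tile's pixels."""
    return min(tile_depths), max(tile_depths)

def light_depth_bounds(light_view_z, light_range):
    """LZmin and LZmax: depth extent covered by the light's range (assumed)."""
    return light_view_z - light_range, light_view_z + light_range

def light_overlaps_tile_depth(light_view_z, light_range, tile_depths):
    """True if [LZmin; LZmax] and [TZmin; TZmax] overlap."""
    lz_min, lz_max = light_depth_bounds(light_view_z, light_range)
    tz_min, tz_max = tile_depth_bounds(tile_depths)
    return lz_min <= tz_max and tz_min <= lz_max
```

On a GPU the min/max over the tile would typically be a parallel reduction in shared memory; the Python `min`/`max` calls only stand in for that step.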

Tiled-deferred lights culling (3)
• Finally, our culling algorithm:
    for each light
        for each tile
            check if the tile intersects the light’s 2D bounding circle;
            depth culling;
            if the tile passes both tests, write the light’s index to the tile’s index buffer
• Additional test for hemispherical lights: reject tiles completely below the hemisphere’s plane.
• Additional test for shadow-casting lights: reject tiles completely in shadow (cf. next slide).

Tiled-deferred sparse shadows
• Sample the shadow map at the corners of the tile at the lights culling stage.
• Convert the shadow factors to 0.5 fixed-point numbers and pack them into the light index buffer:
[Figure: light index buffer entry (32 bits) with fields for the four shadow factors (s00, s10, s01, s11) and the light index; bit marks 31, 26, 21, 16, 11, 0.]
• When computing lighting, unpack the factors and use bilinear interpolation to get shadow factors for all of the tile’s pixels
  • thus, only 4 shadow map samples per tile instead of 64!
  • => if all four shadow factors are zero, we can skip the light altogether.
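A possible packing scheme, sketched under assumptions: a 12-bit light index in the low bits followed by four 5-bit shadow factors, which is one reading of the bit marks in the figure. The field order, the quantization to the range 0..31, and all names below are hypothetical and may differ from the actual implementation.

```python
INDEX_BITS = 12    # assumed width of the light index field
FACTOR_BITS = 5    # assumed width of each shadow factor
FACTOR_MAX = (1 << FACTOR_BITS) - 1

def pack_entry(light_index, s00, s01, s10, s11):
    """Pack a light index and four corner shadow factors into 32 bits."""
    if not 0 <= light_index < (1 << INDEX_BITS):
        raise ValueError("light index does not fit into the index field")
    entry = light_index
    shift = INDEX_BITS
    for s in (s00, s01, s10, s11):
        # Clamp to [0, 1] and quantize to FACTOR_BITS bits.
        q = round(min(max(s, 0.0), 1.0) * FACTOR_MAX)
        entry |= q << shift
        shift += FACTOR_BITS
    return entry

def unpack_entry(entry):
    """Inverse of pack_entry: returns (light_index, (s00, s01, s10, s11))."""
    light_index = entry & ((1 << INDEX_BITS) - 1)
    factors = []
    shift = INDEX_BITS
    for _ in range(4):
        factors.append(((entry >> shift) & FACTOR_MAX) / FACTOR_MAX)
        shift += FACTOR_BITS
    return light_index, tuple(factors)

def shadow_at(factors, u, v):
    """Bilinearly interpolate corner factors at tile-relative (u, v) in [0, 1].

    sXY is taken to be the corner at x = X, y = Y (assumed naming).
    """
    s00, s01, s10, s11 = factors
    top = s00 + (s10 - s00) * u
    bottom = s01 + (s11 - s01) * u
    return top + (bottom - top) * v
```

With this layout, an entry whose upper 20 bits are all zero identifies a light that is fully shadowed at every corner, so the skip test is a single shift and compare.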

Tiled-deferred sparse shadows (2)
• Sparse shadows are very important for our instant radiosity: they make indirect occlusion acceptable performance-wise.
• Sparse shadows cannot be used on just any tile, only on tiles that represent approximately planar regions.
• Whether sparse shadows are applicable is determined by analysing face normals and depth variation over the tile
  • simple thresholding is used at the moment.
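One way such a threshold test could look is sketched below. The specific criteria (every normal within a cosine threshold of the tile's average normal, and a bounded absolute depth spread) and the default threshold values are assumptions for illustration; the slides do not state the exact test.

```python
import math

def tile_is_planar(normals, depths, min_cos=0.95, max_depth_spread=0.1):
    """Decide whether a tile is approximately planar.

    normals: list of unit 3-tuples, depths: list of floats for the tile's
    pixels. min_cos and max_depth_spread are hypothetical thresholds.
    """
    count = len(normals)
    avg = [sum(n[i] for n in normals) / count for i in range(3)]
    length = math.sqrt(sum(c * c for c in avg))
    if length == 0.0:
        return False  # normals cancel out: certainly not planar
    avg = [c / length for c in avg]
    # Every pixel normal must be close to the tile's average normal.
    for n in normals:
        if sum(n[i] * avg[i] for i in range(3)) < min_cos:
            return False
    # Depth must not vary too much across the tile.
    return max(depths) - min(depths) <= max_depth_spread
```

Tiles that fail the test would fall back to regular per-pixel shadow map sampling.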

Black tiles are where sparse shadows should not be applied.

Results Sponza time!

Future work
• VPL level of detail via clustering, e.g. Lightcuts [WFA05]
• Light buffer post-filtering as shown in [LHK07]
