Enhancing and Optimizing Render Cache for Interactive Rendering

Enhancing and Optimizing the Render Cache • Bruce Walter, Cornell Program of Computer Graphics • George Drettakis, REVES/INRIA Sophia-Antipolis • Donald P. Greenberg, Cornell Program of Computer Graphics

Background • Render Cache • “Interactive Rendering using the Render Cache”, Rendering Workshop 1999 • Goal • Interactive Rendering • Exploit frame-to-frame coherence • Decouple renderer from display framerate • Reuse “expensive” rendering results

Background • Goal: Interactive rendering • Figure panels: Ray tracing, Path tracing

Background • Modified Visual Feedback Loop • Diagram labels: user, application, renderer, image, display, asynchronous interface

Background • Reproject rendered points • Figure panels: Original view, New view
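The reprojection step can be illustrated with a minimal sketch. The slides do not describe the camera model, so the code assumes a simple pinhole camera looking down +z with no rotation; the function name `reproject` and the `focal` parameter are hypothetical and chosen only for illustration.

```python
import math

def reproject(point, cam_pos, focal, width, height):
    """Project a cached world-space point into the image of a camera at cam_pos.

    Assumed model (not from the slides): pinhole camera looking down +z,
    no rotation, principal point at the image centre.
    Returns (px, py, depth) or None if the point is not visible.
    """
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0.0:
        return None  # point is behind the camera
    px = math.floor(width / 2 + focal * x / z)
    py = math.floor(height / 2 + focal * y / z)
    if 0 <= px < width and 0 <= py < height:
        return px, py, z
    return None  # point falls outside the image

# The same cached point lands on a different pixel once the camera moves.
original_view = reproject((0.0, 0.0, 5.0), (0.0, 0.0, 0.0), 100.0, 64, 64)
new_view = reproject((0.0, 0.0, 5.0), (1.0, 0.0, 0.0), 100.0, 64, 64)
```

Because the cached point keeps its world-space position and color, only this cheap projection has to be repeated when the view changes; the expensive shading result is reused.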

Background • Display process • Diagram labels: renderer, Update Points, Project/Z-Buffer, Depth Cull, Interpolate, image, Sampling, renderer

Background • Results after each stage • Figure panels: Projection, Depth cull, Interpolation

Background • Sampling • Figure panels: Displayed image, Priority image, Requested pixels

Related Work • Faster ray engines • Optimize and parallelize • E.g., Wald et al. • Hardware-based display • Mesh-based • E.g., Tapestry, Holodeck, Tole et al. • Texture-based • E.g., Corrective textures

Motivation • Render Cache works well • Can enable interactive use of higher quality ray-based renderers. • … but needs improvement • Images too small (256x256) • Gaps often visible during camera motion • Not fast enough in tracking shading changes

Enhancements • Tiled Z-Buffer • Better scalability and memory coherence • Larger Interpolation Prefilter • Can fill larger gaps between points • Predictive Sampling • Improved quality during camera motion • Point Eviction • Faster update of shading changes

Enhancements • Code Optimization • Use of SIMD (MMX/SSE/SSE2) • Data layout, branch conversions, etc. • Publicly Available • For evaluation, comparison, or use • Non-commercial binary release • URL is in the paper

Memory Coherence • Change from R10K to Pentium 4 • Cache reduced from 4MB to 256KB • Clock increased from 195MHz to 1.7GHz • Cache misses much more expensive • Change from 256x256 to 512x512 • Point data ~ 5MB, Image data ~ 3MB • Much bigger than cache • Projection and Z-Buffer problematic

Projection and Z-Buffer • Random order memory access • Read/modify/write operation is memory latency limited • Diagram: Point Cloud (5MB), Image (3MB)

Tiled Projection and Z-Buffer • Divide image into tiles • Tiles sized to fit in cache • Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)

Tiled Projection and Z-Buffer • Project and bucket sort by tile • Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)

Tiled Projection and Z-Buffer • Z-Buffer each tile separately • Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)

Tiled Projection and Z-Buffer • Uses more memory and instructions • But it is faster (25ms instead of 42ms) • Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)
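The two-pass structure of the tiled approach can be sketched as follows. This is an illustration of the control flow only: Python will not show the cache benefit, and the tile size, buffer layout, and the name `tiled_zbuffer` are assumptions rather than details from the slides.

```python
def tiled_zbuffer(projected, width, height, tile=64):
    """Bucket-sort projected points by tile, then z-buffer one tile at a time.

    projected: list of (px, py, depth) tuples already inside the image.
    Returns (depth_buf, index_buf); index_buf holds the index of the nearest
    point at each pixel, or -1 where no point landed.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = [[] for _ in range(tiles_x * tiles_y)]

    # Pass 1: append each point to the bucket of the tile it projects into.
    for idx, (px, py, depth) in enumerate(projected):
        buckets[(py // tile) * tiles_x + (px // tile)].append((px, py, depth, idx))

    depth_buf = [float("inf")] * (width * height)
    index_buf = [-1] * (width * height)

    # Pass 2: resolve visibility tile by tile, so the read/modify/write
    # traffic of each inner loop stays within one tile's pixels.
    for bucket in buckets:
        for px, py, depth, idx in bucket:
            pixel = py * width + px
            if depth < depth_buf[pixel]:
                depth_buf[pixel] = depth
                index_buf[pixel] = idx
    return depth_buf, index_buf
```

The bucket writes in pass 1 are sequential appends, and pass 2 touches only one tile of the image at a time, which is the property that lets a tile sized to the cache avoid the random-order accesses of the untiled version.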

Interpolation Filters • Larger filters • Fill larger gaps in point data • Generally more expensive • Result in more blurring of the image • The previous Render Cache • Used a 3x3 weighted filter • Can only fill very small gaps • Introduces only a small amount of blurring

Prefilter • Add a larger “backup” filter • Results used only when 3x3 filter fails • Uses a uniform 7x7 filter • Can be computed cheaply • Can fill in much larger gaps • Does not affect sampling priorities • Actually executed first then overwritten • Hence the name “prefilter”
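A minimal sketch of the two-filter scheme, on a grayscale image where `None` marks a pixel with no point. The slides do not give the 3x3 weights or the exact failure test, so the 4/2/1 weights and the rule "the 3x3 filter fails when its window holds no samples" are assumptions; sampling priorities are not modeled.

```python
def weighted_average(image, x, y, radius, weight):
    """Average the valid (non-None) samples around (x, y); None if there are none."""
    h, w = len(image), len(image[0])
    total = wsum = 0.0
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            value = image[yy][xx]
            if value is not None:
                wt = weight(xx - x, yy - y)
                total += wt * value
                wsum += wt
    return total / wsum if wsum > 0.0 else None

def uniform(dx, dy):
    return 1.0

def tent3(dx, dy):
    # Assumed 3x3 weights: 4 at the centre, 2 at the edges, 1 at the corners.
    return float((2 - abs(dx)) * (2 - abs(dy)))

def interpolate(image):
    h, w = len(image), len(image[0])
    # Prefilter pass runs first: uniform 7x7 window (radius 3).
    result = [[weighted_average(image, x, y, 3, uniform) for x in range(w)]
              for y in range(h)]
    # The 3x3 pass then overwrites the prefilter wherever it succeeds.
    for y in range(h):
        for x in range(w):
            value = weighted_average(image, x, y, 1, tent3)
            if value is not None:
                result[y][x] = value
    return result
```

Running the wide filter first and overwriting it means no per-pixel test is needed to decide which filter applies; the prefilter result simply survives wherever the 3x3 filter produced nothing.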

Prefilter • Figure panels: 3x3 filter only, 7x7 prefilter only, Both filters

Predictive Sampling • Sampling is purely reactive • Helps to guide sparse sampling • Samples returned in later frame • Problem when large new regions become visible • Predict large gaps ahead of time • Project using a predicted camera • Request samples before they are needed

Predictive Sampling • Projection is expensive • 47% of original render cache cost • Use simplified projection • No Z-Buffer • Only need to find regions with no points • Reduced resolution • 1/4 width and height (1/16 # of pixels) • Store only 1 byte per pixel • Occupancy image fits easily in cache
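The simplified projection can be sketched as a low-resolution occupancy image. The sketch assumes the points have already been projected with the predicted camera (how the camera is predicted is not stated in the slides); the names `occupancy_image` and `empty_cells` are hypothetical.

```python
def occupancy_image(pixels, width, height, scale=4):
    """Mark which low-resolution cells receive at least one projected point.

    pixels: iterable of (px, py) full-resolution coordinates obtained by
    projecting with the predicted camera. No depth test is performed.
    Returns (occ, ow, oh) where occ stores one byte per low-resolution cell.
    """
    ow = (width + scale - 1) // scale
    oh = (height + scale - 1) // scale
    occ = bytearray(ow * oh)
    for px, py in pixels:
        if 0 <= px < width and 0 <= py < height:
            occ[(py // scale) * ow + (px // scale)] = 1
    return occ, ow, oh

def empty_cells(occ, ow):
    """Cells with no points: candidate regions to request samples for early."""
    return [(i % ow, i // ow) for i, v in enumerate(occ) if v == 0]
```

With `scale=4` a 512x512 image yields a 128x128 occupancy image of 16KB, which is consistent with the slide's remark that it fits easily in cache.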

Predictive Sampling • Example during rapid camera rotation • Figure panels: No Prediction, With Prediction

Algorithm Overview • Diagram labels: renderer, Update Points, Prediction, Project/Sort, Z-Buffer, Depth Cull, Prefilter, Interpolate, image, Sampling, renderer

Point Eviction • Stale data can be worse than no data • Points may live a long time at high ratios • Not enough new samples to overwrite old • Color change detection already exists • Enhances sampling in regions of change • Works by aging nearby points • Evict points beyond an age limit • Speeds image convergence
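One way to picture aging and eviction is the sketch below. The slides only say that detected color changes age nearby points and that points beyond an age limit are evicted; the per-frame increment, the square neighborhood, the `penalty` value, and all names here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class CachedPoint:
    px: int
    py: int
    age: int = 0

def age_and_evict(points, changed_pixels, radius, penalty, age_limit):
    """Age every point by one frame, add a penalty near detected color
    changes, and return only the points that are still within the age limit."""
    survivors = []
    for p in points:
        p.age += 1
        near_change = any(abs(p.px - cx) <= radius and abs(p.py - cy) <= radius
                          for cx, cy in changed_pixels)
        if near_change:
            p.age += penalty  # assumed: a change ages nearby points faster
        if p.age <= age_limit:
            survivors.append(p)
    return survivors
```

Evicting the aged points leaves gaps where shading has changed, so those pixels are filled from fresh samples instead of continuing to display stale colors.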

SIMD Optimizations • Utilize MMX/SSE/SSE2 instructions • Project four points at once • Process R, G, B channels simultaneously • Add memory prefetches • Automatic prefetch works well for linear access • Convert branches to data dependencies • Compares set masks of zeroes or ones • Use boolean operations instead of branches • Roughly a factor of two total speedup
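The branch-to-mask conversion can be shown on a single scalar value. This is not SIMD code and not taken from the authors' implementation; it only mimics, with Python integers, how a compare that yields all ones or all zeroes can drive a selection through boolean operations.

```python
def select_min(a, b):
    """Return the smaller of two integers without an if/else branch."""
    # The comparison becomes a mask: -1 (all one bits) if a < b, else 0.
    mask = -int(a < b)
    # Keep a where the mask is set, b where it is clear.
    return (a & mask) | (b & ~mask)

def select_min_branchy(a, b):
    """Reference version with a branch, for comparison."""
    if a < b:
        return a
    return b
```

In a SIMD setting the same and/andnot/or pattern is applied to several lanes at once, so a depth test such as the one in the Z-Buffer stage never has to branch per point.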

Results • Single 1.7GHz processor - rotating camera • Figure panels: Ray trace only (1.8 fps), Render Cache (9 fps)

Results • Timing: 62.1 ms (up to 16 fps) • 512x512 image, render cache only • 1.7GHz Pentium 4 processor • Chart legend: Sampling, Update Points, Filter/Smooth, Prediction, Prefilter, Depth Cull, Project, Z-Buffer

Scalability with Image Size • Plot of Frame Size (Pixels, 0 to 1,600,000) versus Frame Time (ms, 0 to 350) • Labeled points: 512x512, 1200x1200

Results • Try it for yourself • Download publicly available binary • Includes Render Cache and simple Ray Tracer • Requires a Pentium 4 and Java Web Start • Free for evaluation and internal use • http://www.graphics.cornell.edu/research/interactive/rendercache • Demo

The End
