This paper explores enhancements for an interactive rendering system by improving memory coherence, scalability, and quality during camera motion. It covers predictive sampling, point eviction, SIMD optimizations, and more.
Enhancing and Optimizing the Render Cache • Bruce Walter, Cornell Program of Computer Graphics • George Drettakis, REVES/INRIA Sophia-Antipolis • Donald P. Greenberg, Cornell Program of Computer Graphics
Background • Render Cache • “Interactive Rendering using the Render Cache”, Rendering Workshop 1999 • Goal • Interactive Rendering • Exploit frame-to-frame coherence • Decouple renderer from display framerate • Reuse “expensive” rendering results
Background • Goal: Interactive rendering [Figure panels: Ray tracing, Path tracing]
Background • Modified Visual Feedback Loop • Asynchronous interface [Diagram labels: user, application, renderer, image, display]
Background • Reproject rendered points [Figure panels: Original view, New view]
Background • Display process [Diagram labels: renderer, Update Points, Project/Z-Buffer, Depth Cull, Interpolate, image, Sampling, renderer]
Background • Results after each stage [Figure panels: Projection, Depth cull, Interpolation]
Background • Sampling [Figure panels: Displayed image, Priority image, Requested pixels]
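The reprojection idea can be sketched in a few lines. This is only an illustration under stated assumptions: a pinhole camera that looks down the negative z axis with no rotation. The `Camera` class, the `project` function, and all parameter names are hypothetical and are not taken from the Render Cache code.

```python
import math
from dataclasses import dataclass

@dataclass
class Camera:
    # Hypothetical pinhole camera: position only, looking down -z, no rotation.
    x: float
    y: float
    z: float
    focal: float   # focal length expressed in pixels (assumed convention)
    width: int
    height: int

def project(cam, point):
    """Return (px, py, depth) for a world-space point, or None if not visible."""
    dx, dy, dz = point[0] - cam.x, point[1] - cam.y, point[2] - cam.z
    depth = -dz                          # distance in front of the camera
    if depth <= 0:
        return None                      # behind the camera
    px = math.floor(cam.width / 2 + cam.focal * dx / depth)
    py = math.floor(cam.height / 2 - cam.focal * dy / depth)
    if 0 <= px < cam.width and 0 <= py < cam.height:
        return px, py, depth
    return None                          # outside the image

# The same cached 3D point lands on different pixels in the two views.
point = (0.0, 0.0, -10.0)
old_view = Camera(0.0, 0.0, 0.0, focal=256.0, width=512, height=512)
new_view = Camera(1.0, 0.0, 0.0, focal=256.0, width=512, height=512)
old_pixel = project(old_view, point)
new_pixel = project(new_view, point)
```

Because the cached point stores a 3D position rather than a pixel, moving the camera only requires re-running the projection, not re-running the renderer.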
Related Work • Faster ray engines • Optimize and parallelize • E.g., Wald et al. • Hardware-based display • Mesh-based • E.g., Tapestry, Holodeck, Tole et al. • Texture-based • E.g., Corrective textures
Motivation • Render Cache works well • Can enable interactive use of higher quality ray-based renderers. • … but needs improvement • Images too small (256x256) • Gaps often visible during camera motion • Not fast enough in tracking shading changes
Enhancements • Tiled Z-Buffer • Better scalability and memory coherence • Larger Interpolation Prefilter • Can fill larger gaps between points • Predictive Sampling • Improved quality during camera motion • Point Eviction • Faster update of shading changes
Enhancements • Code Optimization • Use of SIMD (MMX/SSE/SSE2) • Data layout, branch conversions, etc. • Publicly Available • For evaluation, comparison, or use • Non-commercial binary release • URL is in the paper
Memory Coherence • Change from R10K to Pentium 4 • Cache reduced from 4MB to 256K • Clock increased from 195MHz to 1.7GHz • Cache misses much more expensive • Change from 256x256 to 512x512 • Point data ~ 5MB, Image data ~ 3MB • Much bigger than cache • Projection and Z-Buffer problematic
Projection and Z-Buffer • Random order memory access • Read/modify/write operation is memory latency limited [Diagram: Point Cloud 5MB, Image 3MB]
Tiled Projection and Z-Buffer • Divide image into tiles • Tiles sized to fit in cache [Diagram: Point Cloud 5MB, Tile Buckets 4MB, Image 3MB]
Tiled Projection and Z-Buffer • Project and bucket sort by tile [Diagram: Point Cloud 5MB, Tile Buckets 4MB, Image 3MB]
Tiled Projection and Z-Buffer • Z-Buffer each tile separately [Diagram: Point Cloud 5MB, Tile Buckets 4MB, Image 3MB]
Tiled Projection and Z-Buffer • Uses more memory and instructions • But it is faster (25ms instead of 42ms) [Diagram: Point Cloud 5MB, Tile Buckets 4MB, Image 3MB]
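The two-pass structure described on these slides (bucket sort by tile, then z-buffer each tile) might look roughly like the sketch below. It assumes the points are already projected to integer pixel coordinates. The function name `tiled_zbuffer` and the tuple layout are hypothetical. Python will not show the cache benefit itself; the sketch only shows the order of memory accesses.

```python
def tiled_zbuffer(points, width, height, tile):
    """points: iterable of (px, py, depth, point_id), already projected (assumed)."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = [[] for _ in range(tiles_x * tiles_y)]

    # Pass 1: bucket sort the projected points by the tile they fall in.
    for p in points:
        px, py = p[0], p[1]
        if 0 <= px < width and 0 <= py < height:
            buckets[(py // tile) * tiles_x + (px // tile)].append(p)

    # Pass 2: z-buffer one tile at a time, so all read/modify/write
    # operations for a bucket touch only that tile's small pixel region.
    depth = [[float("inf")] * width for _ in range(height)]
    ids = [[None] * width for _ in range(height)]
    for bucket in buckets:
        for px, py, d, pid in bucket:
            if d < depth[py][px]:
                depth[py][px] = d
                ids[py][px] = pid
    return ids, depth

pts = [(1, 1, 5.0, "a"), (1, 1, 2.0, "b"), (6, 3, 1.0, "c"), (9, 9, 1.0, "d")]
ids, depth = tiled_zbuffer(pts, width=8, height=4, tile=4)
```

In the usage above, point "b" wins pixel (1, 1) because it is nearer than "a", and point "d" is dropped because it projects outside the image. The extra bucket storage is the price paid for the localized second pass.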
Interpolation Filters • Larger filters • Fill larger gaps in point data • Generally more expensive • Result in more blurring of the image • The previous Render Cache • Used a 3x3 weighted filter • Can only fill very small gaps • Introduces only a small amount of blurring
Prefilter • Add a larger “backup” filter • Results used only when 3x3 filter fails • Uses a uniform 7x7 filter • Can be computed cheaply • Can fill in much larger gaps • Does not affect sampling priorities • Actually executed first then overwritten • Hence the name “prefilter”
Prefilter [Figure panels: 3x3 filter only, 7x7 prefilter only, Both filters]
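The "execute first, then overwrite" ordering of the prefilter can be sketched as follows. Several details are assumptions made for illustration: a single-channel image with `None` marking empty pixels, a 3x3 filter that "fails" only when it finds no valid pixel, and the particular 3x3 weights. The helper names are hypothetical. The 7x7 average is computed by direct summation for clarity, without the cheaper evaluation the slides allude to.

```python
def window_average(img, x, y, radius, weights=None):
    """Weighted average of valid (non-None) pixels around (x, y), else None."""
    h, w = len(img), len(img[0])
    total = weight_sum = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and img[yy][xx] is not None:
                wgt = 1.0 if weights is None else weights[dy + radius][dx + radius]
                total += wgt * img[yy][xx]
                weight_sum += wgt
    return total / weight_sum if weight_sum > 0 else None

WEIGHTS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # assumed weights, not from the slides

def interpolate(img):
    h, w = len(img), len(img[0])
    # Prefilter: uniform 7x7 average, computed first for every pixel.
    out = [[window_average(img, x, y, 3) for x in range(w)] for y in range(h)]
    # Main filter: the weighted 3x3 result overwrites the prefilter wherever it succeeds.
    for y in range(h):
        for x in range(w):
            fine = window_average(img, x, y, 1, WEIGHTS_3X3)
            if fine is not None:
                out[y][x] = fine
    return out

sparse = [[None] * 7 for _ in range(7)]
sparse[0][0] = 8.0                     # a single known point in the corner
result = interpolate(sparse)
```

In this example the pixel at (3, 3) has no valid neighbor within the 3x3 window, so it keeps the 7x7 prefilter value, while the pixel at (6, 6) is out of reach of both filters and stays empty.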
Predictive Sampling • Sampling is purely reactive • Helps to guide sparse sampling • Samples returned in later frame • Problem when large new regions become visible • Predict large gaps ahead of time • Project using a predicted camera • Request samples before they are needed
Predictive Sampling • Projection is expensive • 47% of original render cache cost • Use simplified projection • No Z-Buffer • Only need to find regions with no points • Reduced resolution • 1/4 width and height (1/16 the number of pixels) • Store only 1 byte per pixel • Occupancy image fits easily in cache
Predictive Sampling • Example during rapid camera rotation [Figure panels: No Prediction, With Prediction]
Algorithm Overview [Diagram labels: Update Points, renderer, Prediction, Project/Sort, Z-Buffer, Depth Cull, image, Prefilter, Interpolate, Sampling, renderer]
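A minimal sketch of the simplified projection used for prediction is given below. It assumes the pixel positions have already been computed with the predicted camera, and it leaves out how that camera is predicted. The function names are hypothetical.

```python
def occupancy_image(projected, width, height, scale=4):
    """projected: (px, py) pixel positions under the predicted camera (assumed input)."""
    ow, oh = width // scale, height // scale
    occ = bytearray(ow * oh)               # 1 byte per low-resolution cell
    for px, py in projected:
        if 0 <= px < width and 0 <= py < height:
            # No depth test: we only need to know whether any point lands here.
            occ[(py // scale) * ow + (px // scale)] = 1
    return occ, ow, oh

def empty_cells(occ, ow, oh):
    """Cells with no points: candidates for requesting samples ahead of time."""
    return [(i % ow, i // ow) for i in range(ow * oh) if occ[i] == 0]

occ, ow, oh = occupancy_image([(0, 0), (5, 1), (2, 6)], width=8, height=8)
gaps = empty_cells(occ, ow, oh)

# Size check for the slide's case: 512x512 at 1/4 resolution, 1 byte per cell.
full_size_bytes = len(occupancy_image([], 512, 512)[0])
```

For a 512x512 image the occupancy image is 128x128 bytes, that is 16 KB, which is consistent with the claim that it fits easily in a 256K cache.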
Point Eviction • Stale data can be worse than no data • Points may live a long time at high ratios • Not enough new samples to overwrite old • Color change detection already exists • Enhances sampling in regions of change • Works by aging nearby points • Evict points beyond an age limit • Speeds image convergence
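The aging and eviction rule can be sketched as below. The exact aging policy is not given on the slide, so the per-frame increment, the `penalty` added near a detected color change, the `radius` that defines "nearby", and the function name are all assumptions made for illustration.

```python
def age_and_evict(points, changed_pixels, age_limit, radius=1, penalty=8):
    """points: dicts with 'px', 'py', 'age'. Returns the surviving points."""
    survivors = []
    for p in points:
        age = p["age"] + 1                     # normal per-frame aging (assumed)
        # Extra aging for points near a detected color change (assumed rule).
        near_change = any(
            abs(p["px"] - cx) <= radius and abs(p["py"] - cy) <= radius
            for cx, cy in changed_pixels
        )
        if near_change:
            age += penalty
        if age <= age_limit:                   # evict points beyond the age limit
            survivors.append({**p, "age": age})
    return survivors

cached = [
    {"px": 2, "py": 2, "age": 0},     # near the change, but still young
    {"px": 10, "py": 10, "age": 0},   # far from the change
    {"px": 3, "py": 3, "age": 5},     # near the change and already old
]
kept = age_and_evict(cached, changed_pixels=[(3, 3)], age_limit=10)
```

Here the third point is evicted, which frees its pixel so that sampling can request fresh data there instead of continuing to display a stale color.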
SIMD Optimizations • Utilize MMX/SSE/SSE2 instructions • Project four points at once • Process R, G, B channels simultaneously • Add memory prefetches • Automatic prefetch works well for linear access • Convert branches to data dependencies • Compares set masks of zeroes or ones • Use boolean operations instead of branches • Roughly a factor of two total speedup
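The branch-to-mask conversion can be illustrated on scalar integers. This sketch emulates one 32-bit lane in plain Python and is not SIMD code. The function name and the choice of a depth test as the example are assumptions.

```python
MASK32 = 0xFFFFFFFF

def select_nearer(new_depth, new_color, old_depth, old_color):
    """Branch-free depth test on 32-bit unsigned integer values."""
    # The compare produces an all-ones or all-zeroes mask, as SIMD compares do.
    mask = -(new_depth < old_depth) & MASK32
    inv = ~mask & MASK32
    # Boolean operations select between the two candidates without a branch.
    depth = (new_depth & mask) | (old_depth & inv)
    color = (new_color & mask) | (old_color & inv)
    return depth, color

nearer = select_nearer(3, 0xFF0000, 7, 0x00FF00)
farther = select_nearer(9, 0xFF0000, 7, 0x00FF00)
```

With real SIMD instructions the same and/or pattern is applied to several lanes at once, so each lane can take a different outcome without any per-lane control flow.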
Results • Single 1.7GHz processor, rotating camera [Figure panels: Ray trace only (1.8 fps), Render Cache (9 fps)]
Results • Timing: 62.1 ms (up to 16 fps) • 512x512 image, render cache only • 1.7GHz Pentium 4 processor [Timing chart stages: Sampling, Update Points, Filter/Smooth, Prediction, Prefilter, Depth Cull, Project, Z-Buffer]
Scalability with Image Size [Plot: Frame Size (Pixels), 0 to 1,600,000, and Frame Time (ms), 0 to 350; marked sizes 512x512 and 1200x1200]
Results • Try it for yourself • Download publicly available binary • Includes Render Cache and simple Ray Tracer • Requires a Pentium 4 and Java Web Start • Free for evaluation and internal use • http://www.graphics.cornell.edu/research/interactive/rendercache • Demo