Memory Management and Parallelization
Paul Arthur Navrátil, The University of Texas at Austin
Overview
• Uniprocessor Coherent Ray Tracing [Pharr et al. 1997]
• Parallel Ray Tracing Summary [Chalmers et al. 2002]
• Demand-Driven Ray Tracing [Wald et al. 2001]
• Hybrid Scheduling [Reinhard et al. 1999]
Background: Reyes Inspirations [Cook et al. 87]
• Texture cache, CATs
• Programmable shaders
• Single primitive type
• Dicing
• Memory effects of scan-line architecture
Pharr: System
• Uses both a texture and a geometry 'cache'
• Lazy loading, LRU replacement
• One internal primitive, triangles:
  • Optimized ray intersection calculation
  • Known space requirements for representation
  • Tessellating other primitives increases space requirements
  • Supports procedurally generated geometry
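The slide names lazy loading with LRU replacement for the geometry cache. A minimal sketch, assuming a tiny fixed slot count (a real cache would bound memory, not slots; all names here are illustrative):

```c
#define CACHE_SLOTS 4   /* tiny capacity for illustration */

/* Hypothetical LRU cache of geometry voxels, keyed by voxel id. */
typedef struct {
    int  voxel_id;      /* -1 marks an empty slot */
    long last_use;      /* logical timestamp for LRU eviction */
} Slot;

static Slot cache[CACHE_SLOTS];
static long clock_tick   = 0;
static int  cache_misses = 0;

void cache_init(void) {
    for (int i = 0; i < CACHE_SLOTS; i++) cache[i].voxel_id = -1;
}

/* Return the slot holding voxel_id, loading it lazily on a miss and
 * evicting the least-recently-used slot when the cache is full. */
int cache_lookup(int voxel_id) {
    int lru = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].voxel_id == voxel_id) {    /* hit: refresh timestamp */
            cache[i].last_use = ++clock_tick;
            return i;
        }
        if (cache[i].last_use < cache[lru].last_use) lru = i;
    }
    /* miss: lazily "load" the voxel into the LRU slot */
    cache_misses++;
    cache[lru].voxel_id = voxel_id;
    cache[lru].last_use = ++clock_tick;
    return lru;
}
```

With four slots, touching voxels 1-4, re-touching 1, then touching 5 evicts voxel 2 (the least recently used), so a later lookup of 2 misses again.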
Pharr: Geometry Cache
• Geometry grids: a regular grid of voxels
• A few thousand triangles per voxel
• Acceleration grid of a few hundred triangles for the ray intersection calculation
• All geometry of a voxel is stored in a contiguous block of memory, independent of the geometry in other voxels, so spatial locality in the scene is tied to spatial locality in memory
• Different voxel sizes cause memory fragmentation
• Adaptive voxel sizes? Voxel size bounded by cache size for a hardware implementation?
Pharr: Ray Grouping
• Scheduling grid: queue all rays inside a voxel
• Dependencies in the ray tree prevent perfect scheduling
• All information needed for the computation is stored with the ray, so each ray can be calculated independently (parallelism!)
• Exploits coherence both from beams of rays and from disparate rays that move through the same space
• Superior to fixed-order traversal of the ray tree and to ray clustering
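The key idea above is that a ray record is self-contained, so any voxel's queue can be processed in any order. A sketch of such a record and a scheduling grid over a unit cube (struct fields, grid size, and names are illustrative assumptions):

```c
/* Hypothetical self-contained ray record: everything needed to resume
 * the computation travels with the ray. */
typedef struct {
    float origin[3], dir[3];
    float weight;              /* accumulated path weight */
    int   pixel_x, pixel_y;    /* destination on the image plane */
} Ray;

#define GRID_DIM   4           /* 4x4x4 scheduling grid over [0,1)^3 */
#define MAX_QUEUED 64

typedef struct { Ray rays[MAX_QUEUED]; int count; } RayQueue;
static RayQueue sched[GRID_DIM * GRID_DIM * GRID_DIM];

/* Map a point in [0,1)^3 to its scheduling-grid voxel index. */
int voxel_index(const float p[3]) {
    int ix = (int)(p[0] * GRID_DIM);
    int iy = (int)(p[1] * GRID_DIM);
    int iz = (int)(p[2] * GRID_DIM);
    return (iz * GRID_DIM + iy) * GRID_DIM + ix;
}

/* Queue a ray on the voxel containing its origin. */
void enqueue_ray(const Ray *r) {
    RayQueue *q = &sched[voxel_index(r->origin)];
    if (q->count < MAX_QUEUED) q->rays[q->count++] = *r;
}
```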
Pharr: Radiance Calculation
• Outgoing radiance is emitted radiance plus a weighted average of incoming radiances
• fr is the bidirectional reflectance distribution function (BRDF)
• At an intersection, weights are calculated for each spawned secondary ray
• The final weight is the product of the BRDF values of all surfaces on the path from the point on the ray to the image plane
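The slide's equation does not survive in the transcript; the relation it describes is the standard reflection form of the rendering equation, with $f_r$ the BRDF:

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\,
    (\omega_i \cdot n)\, d\omega_i
```

The per-ray weights mentioned above are the $f_r$ factors accumulated multiplicatively along the path back to the image plane.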
Pharr: Voxel Scheduling
• Naïve: iterate across voxels
• Better: weight voxels by cost and benefit
• Cost: how expensive is it to process the rays in the voxel?
  • More geometry in the voxel means higher cost
  • Voxel geometry not resident in memory means higher cost
• Benefit: how much progress toward completion does the voxel yield?
  • More rays in the voxel yield more benefit
  • Larger weights on the rays yield more benefit
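The cost/benefit terms above can be folded into a single priority. A minimal sketch, assuming a simple ratio and an illustrative penalty for non-resident geometry (the exact weighting is an assumption, not the paper's formula):

```c
#define FETCH_PENALTY 10.0f   /* illustrative cost multiplier for a disk fetch */

/* Hypothetical voxel priority: benefit (many/heavy rays) over
 * cost (lots of geometry, worse if it must be fetched from disk). */
float voxel_priority(int n_triangles, int resident,
                     int n_rays, float ray_weight_sum)
{
    float cost    = (float)n_triangles * (resident ? 1.0f : FETCH_PENALTY);
    float benefit = (float)n_rays + ray_weight_sum;
    return benefit / (cost + 1.0f);   /* +1 avoids division by zero */
}
```

The scheduler would then repeatedly process the voxel with the highest priority rather than iterating in a fixed order.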
Pharr: Discussion
• Parallelization
  • Ray independence, load-balanced geometry, and lazy geometry loading help
  • Will the cache results hold in a distributed model?
• Modern architecture
  • Testing was on a 190 MHz MIPS R10000 with 1 GB RAM
  • Can a modern architecture hold such scenes entirely in memory (no secondary storage)?
• Hardware acceleration
  • Use the memory/cache/GPU hierarchy rather than disk/memory/CPU
Chalmers: Parallel Ray Tracing
• Demand Driven
  • Scene is divided into subregions, or tasks
  • Processors are given tasks statically or by a master
  • Balance with task balancing or adaptive regions [Fig 3.4]
• Data Parallel
  • Object data is distributed across processors
  • Distribute objects according to spatial locality, a hierarchical spatial subdivision, or randomly [Fig 3.7]
• Hybrid Scheduling
  • Run demand-driven and data-parallel tasks on the same processors
  • DD ray traversal / DP ray-object intersection [Scherson and Caspary 88]
  • DD intersection / DP ray generation [Jevans 89]
  • Ray coherence [Reinhard and Jansen 99]
Wald: Demand-Driven Ray Tracing [Wald et al. 01]
• Exploits cache and space coherence on modern processors (dual Pentium III 800 MHz, 256 MB)
• Uses the SIMD instruction set to achieve data parallelism (e.g., the barycentric coordinate test)
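The barycentric coordinate test mentioned above maps well to SIMD because the same arithmetic runs on several rays at once. A scalar sketch of such a test (a Möller-Trumbore-style formulation, offered as an illustration rather than Wald et al.'s exact code; with SSE each operation below would process four rays, one per lane):

```c
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

static Vec3  v_sub(Vec3 a, Vec3 b)  { return (Vec3){a.x-b.x, a.y-b.y, a.z-b.z}; }
static float v_dot(Vec3 a, Vec3 b)  { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  v_cross(Vec3 a, Vec3 b) {
    return (Vec3){a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}

/* Ray-triangle test via barycentric coordinates: on a hit, writes the
 * distance t and barycentric coordinates (u, v) of the hit point. */
bool intersect(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2,
               float *t, float *u, float *v)
{
    Vec3 e1 = v_sub(v1, v0), e2 = v_sub(v2, v0);
    Vec3 p  = v_cross(dir, e2);
    float det = v_dot(e1, p);
    if (det > -1e-6f && det < 1e-6f) return false;   /* ray parallel to triangle */
    float inv = 1.0f / det;
    Vec3 s = v_sub(orig, v0);
    *u = v_dot(s, p) * inv;
    if (*u < 0.0f || *u > 1.0f) return false;        /* outside edge v0-v2 */
    Vec3 q = v_cross(s, e1);
    *v = v_dot(dir, q) * inv;
    if (*v < 0.0f || *u + *v > 1.0f) return false;   /* outside remaining edges */
    *t = v_dot(e2, q) * inv;
    return *t > 1e-6f;                               /* hit in front of the ray */
}
```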
Reinhard: Hybrid Scheduling [Reinhard et al. 99]
• Data-parallel approach with demand-driven subtasks for load balancing
• Data-parallel tasks are preferred; DD subtasks are requested from the master only when no DP tasks are available
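The preference order above can be sketched as a toy worker loop: drain the local data-parallel queue first, and only ask the master for demand-driven work when it is empty (the counters and single-threaded "master" here are illustrative stand-ins for real queues and message passing):

```c
/* Toy simulation of hybrid scheduling on one worker. */
static int dp_queue = 3;    /* local data-parallel tasks remaining */
static int dd_pool  = 5;    /* demand-driven tasks held by the master */
static int dp_done  = 0, dd_done = 0;

/* One scheduling step; returns 1 while there was work to do. */
int worker_step(void) {
    if (dp_queue > 0) { dp_queue--; dp_done++; return 1; }  /* prefer DP */
    if (dd_pool  > 0) { dd_pool--;  dd_done++; return 1; }  /* then ask master */
    return 0;                                               /* fully idle */
}
```

Running the loop to completion drains all local DP work before any DD task is touched, which is exactly the load-balancing behavior the slide describes.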