Ray Tracing on Programmable GPUs
Graphics Pipeline • Traditional Pipeline: Application → Command → Geometry → Rasterization → Texture → Fragment → Display • Programmable Fragment Pipeline: Fragment Input → Fragment Program (reads Textures, uses Registers) → Fragment Output
Fragment Processing Features • Rich instruction set • No branching yet (see PS 3.0 spec) • Floating point • Arithmetic • Texture memory • Dependent texturing • Multipass rendering flow control • NV_OCCLUSION_QUERY
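The multipass flow control bullet can be pictured as a host-side loop that keeps issuing passes until a query reports that no fragments are still active. The following is only a minimal Python sketch of that idea, not the presenters' code: `run_until_done`, `step`, and `is_active` are hypothetical names, and the plain count of active records stands in for the fragment count an occlusion query would return.

```python
def run_until_done(state, step, is_active, max_passes=1000):
    """Repeat a pass over all records until none is active (hypothetical helper)."""
    passes = 0
    while passes < max_passes:
        # Stand-in for an occlusion query: count records that would still be drawn.
        active = sum(1 for s in state if is_active(s))
        if active == 0:
            break
        # One rendering pass: only active records are updated, the rest pass through.
        state = [step(s) if is_active(s) else s for s in state]
        passes += 1
    return state, passes


# Toy data: each number is how many more steps a ray needs before it terminates.
final_state, num_passes = run_until_done(
    [3, 0, 1], step=lambda s: s - 1, is_active=lambda s: s > 0
)
```

The point of the design is that the fragment program itself never branches; the decision to continue is made on the host between passes.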
Ray Engine – Main Idea • Ray-triangle intersection done by GPU • CPU-based renderer does everything else
Ray Engine Algorithm • Renderer sends ray textures to GPU • Ray origin and direction • Renderer sends ‘triangles’ down pipeline • Vertex interpolants of a screen aligned quad • GPU performs ray-triangle intersection tests • Short fragment program • Framebuffer stores closest hit point • Renderer reads back closest hit
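The algorithm above can be sketched on the CPU as follows. This is an illustration under stated assumptions: the slides do not say which intersection test the fragment program uses, so the widely known Möller-Trumbore test is swapped in here, and `intersect` and `closest_hits` are hypothetical names. The outer loop over triangles mirrors sending one triangle at a time down the pipeline, and keeping the smallest `t` per ray plays the role of the framebuffer storing the closest hit.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(orig, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray-triangle test; returns hit distance t or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None


def closest_hits(rays, triangles):
    """For every ray keep (t, triangle index) of the nearest hit, or (inf, -1)."""
    best = [(math.inf, -1)] * len(rays)
    for tri_id, (v0, v1, v2) in enumerate(triangles):   # one "quad" per triangle
        for i, (orig, direction) in enumerate(rays):    # one "pixel" per ray
            t = intersect(orig, direction, v0, v1, v2)
            if t is not None and t < best[i][0]:        # depth-test analogue
                best[i] = (t, tri_id)
    return best


# Toy scene: two parallel triangles, the far one listed first.
far = ((-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (0.0, 1.0, 1.0))
near = ((-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
rays = [((0.0, 0.0, -1.0), (0.0, 0.0, 1.0)),   # hits both triangles
        ((5.0, 5.0, -1.0), (0.0, 0.0, 1.0))]   # misses both
hits = closest_hits(rays, [far, near])
```

In the Ray Engine the final `hits` array corresponds to what the renderer reads back from the framebuffer.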
Pixel Shader 1.4 Implementation Fixed Point Precision Problems
Ray Engine Results • Radeon 8500 fixed point implementation • 114 M ray-triangle intersections / s • Full precision simulator • 115K – 200K rays / s
Ray Engine Summary • GPU performs ray-triangle intersection • CPU-based renderer does everything else • Raw ray-triangle intersection rate is faster than CPU based approach • Total rays processed per second is slower than CPU • Readback limited
Streaming Ray Tracer – Main Ideas • Entire ray tracing computation can be done efficiently on the GPU • Minimal host interaction • Stream processor abstraction for programmable fragment processor
Streaming Ray Tracer Generate Eye Rays Camera Traverse Acceleration Structure Grid Intersect Triangles Triangles Shade Hits and Generate Shading Rays Materials
GPU Abstraction • Texture memory is memory • Think of dependent texture fetches as pointer dereferencing • Programmable fragment processor is a programmable stream processor • Think of multipass rendering as stream and kernel programming
Texture Memory Organization • Uniform Grid: 3D luminance texture, entries vox0 … voxM holding 0, 3, 11, 38, …, 564 • Triangle List: 1D luminance texture, entries grouped by voxel (vox0, vox2 marked) holding 0, 3, 1, 3, 7, 21, 216, … • Triangles (tri0 … triN): 3x 1D RGB textures, one per vertex (v0, v1, v2), each holding xyz per triangle
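The "dependent texture fetch as pointer dereference" idea can be illustrated with plain arrays standing in for textures. This is a minimal sketch with assumed details: the data is a made-up three-triangle scene, `triangles_in_voxel` is a hypothetical helper, and the convention that a voxel's list ends where the next voxel's list begins (with one extra end entry in `grid`) is an assumption, since the slide does not show how lists are terminated.

```python
# Grid "texture": voxel index -> start offset into tri_list.
# The extra last entry marks the end of the final list (assumed convention).
grid = [0, 2, 3]

# Triangle-list "texture": triangle indices, stored voxel by voxel.
tri_list = [0, 2, 1]

# Three vertex "textures", one per triangle corner, each holding xyz per triangle.
v0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
v1 = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
v2 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]


def triangles_in_voxel(voxel):
    """Follow grid -> triangle list -> vertex arrays, one fetch feeding the next."""
    start, end = grid[voxel], grid[voxel + 1]       # first fetch: grid texture
    result = []
    for tri in tri_list[start:end]:                 # second fetch: triangle list
        result.append((v0[tri], v1[tri], v2[tri]))  # third fetch: vertex textures
    return result
```

Each lookup uses the result of the previous one as its address, which is exactly what a dependent texture fetch does on the fragment processor.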
input record stream kernel globals kernel globals output record stream Stream Programming Model Programmable fragment processor is essentially a stream processor • Kernels and streams • Stream is a set of data records • Kernels operate on records • Streams connect kernels together • Kernels can read global memory
Streaming Flow Control Application and Geometry Stages Rasterization Fragments (Input Stream) Fragment Program (Kernel) Texture (Globals) Fragment Program Output (Output Stream)
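The mapping above (fragments as the input stream, the fragment program as the kernel, textures as globals) can be mimicked in a few lines of Python. This is only a toy sketch of the programming model, with hypothetical names (`run_kernel`, `scale`, `offset`); it says nothing about how the hardware schedules the work.

```python
def run_kernel(kernel, stream, kernel_globals):
    """Apply a kernel to each record independently; globals are read-only."""
    # One output record per input record, and no record can see another's result.
    return [kernel(record, kernel_globals) for record in stream]


def scale(record, g):
    return record * g["factor"]


def offset(record, g):
    return record + g["bias"]


# Streams connect kernels: the output stream of one is the input of the next.
input_stream = [1.0, 2.0, 3.0]
intermediate = run_kernel(scale, input_stream, {"factor": 2.0})
output_stream = run_kernel(offset, intermediate, {"bias": 0.5})
```

Because each record is processed in isolation, the records can be handled in any order or in parallel, which is what makes the fragment processor a natural fit.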
Multiple Rendering Passes • Pass 1: Generate Eye Rays • Draw quad, rasterize • Run fragment program • Save to offscreen buffer (rays) • Pass 2: Traverse • Draw quad, rasterize • Run fragment program, restore (rays) • Save to offscreen buffer (ray voxel pr)
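The two passes can be sketched as follows, with nested lists standing in for offscreen buffers. Everything here is an assumption made for illustration: the pinhole camera at the origin looking down -z, the 4x4x4 grid bounds, and the names `render_pass`, `generate_eye_ray`, and `locate_voxel` are all hypothetical. In particular, the second pass is a placeholder that only finds the voxel containing each ray's origin; it is not a full grid traversal.

```python
def render_pass(width, height, fragment_program, textures):
    """Draw a screen-sized quad: run the program once per pixel, return a buffer."""
    return [[fragment_program(x, y, textures) for x in range(width)]
            for y in range(height)]


def generate_eye_ray(x, y, tex):
    """Pass 1 program: pinhole camera, image plane at z = -1 spanning [-1, 1]."""
    cam = tex["camera"]
    u = (x + 0.5) / cam["width"] * 2.0 - 1.0
    v = (y + 0.5) / cam["height"] * 2.0 - 1.0
    return (cam["origin"], (u, v, -1.0))


def locate_voxel(x, y, tex):
    """Pass 2 placeholder: read the ray saved by pass 1 and find its start voxel."""
    origin, direction = tex["rays"][y][x]          # "restore" from previous pass
    g = tex["grid"]
    voxel = tuple(int((origin[i] - g["min"][i]) / g["cell"]) for i in range(3))
    return ((origin, direction), voxel)


camera = {"origin": (0.0, 0.0, 0.0), "width": 2, "height": 2}
grid_info = {"min": (-2.0, -2.0, -2.0), "cell": 1.0}

# Pass 1 writes rays to an offscreen buffer; pass 2 reads it back as a texture.
rays = render_pass(2, 2, generate_eye_ray, {"camera": camera})
state = render_pass(2, 2, locate_voxel, {"rays": rays, "grid": grid_info})
```

The key structural point is that a pass can only communicate with the next one through a saved buffer, so each stage of the ray tracer becomes its own pass.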
Demos Rendered using a Radeon 9700 Pro
Streaming Ray Tracer Results • Simulations • 50M – 200M ray-triangle intersections/s • Radeon 9700 Pro Implementation • 100M ray-triangle intersections/s • 300K – 4.0M rays/s
Streaming Ray Tracer Summary • Entire ray tracing computation can be mapped efficiently to the GPU • Stream processor is a good abstraction for a programmable fragment processor
Ray Tracing in Hardware • Volume Rendering • [Meissner98],[Pfister99] • Offline Rendering • [ART01],[ART02] • Interactive Rendering • [Schmittler02]
SaarCOR – Main Idea • Scalable and efficient real time hardware ray tracer • Implementation based on Saarland RTRT
SaarCOR Implementation • Packet based ray tracer • Several custom cores • Computational units • Traversal, intersection, ray generation and shading • Memory units • Memory controller, caches, routers • Multithreaded • Standard DRAM memory on board • Virtual memory support for large scenes • Support for programmable shading
Simulated Performance 137 fps 59 fps Standard 4-pipeline SaarCOR 100M – 400M rays/s 44 fps 170 fps
Simulated Bandwidth Usage
No VMA    With VMA    PCI
1.9MB     2.5MB       0.03MB
26.6MB    34.1MB      0.91MB
2.1MB     2.6MB       0.02MB
6.1MB     7.7MB       0.14MB
SaarCOR Summary • Scalable and efficient • Requires fewer FP units than GeForce3 • Low bandwidth requirements • Hides latency through multithreading • Fast frame rates
Conclusions • Real time ray tracing advantages • Physically correct renderings • High geometric complexity • Shading flexibility • Several options for real time ray tracing • Software, GPU, Hardware
Acknowledgments • Ian Buck, Bill Mark, Pat Hanrahan • James ‘RTD’ Percy, Pradeep Sen, Eric Chan • Matt Papakipos, Kurt Akeley - NVIDIA • Bob Drebin, Mark Peercy – ATI • Sponsors • ATI, MERL, NVIDIA, Sony, Sun • DARPA