Ray Tracing on Programmable GPUs

Graphics Pipeline • Traditional pipeline stages: Application, Command, Geometry, Rasterization, Texture, Fragment, Display • Programmable fragment pipeline: Fragment Input, Textures, Fragment Program, Registers, Fragment Output

Fragment Processing Features • Rich instruction set • No branching yet (see PS 3.0 spec) • Floating point • Arithmetic • Texture memory • Dependent texturing • Multipass rendering flow control • NV_OCCLUSION_QUERY

The Ray Engine [Carr02]

Ray Engine – Main Idea • Ray-triangle intersection done by GPU • CPU-based renderer does everything else

Ray Engine Algorithm • Renderer sends ray textures to GPU • Ray origin and direction • Renderer sends ‘triangles’ down pipeline • Vertex interpolants of a screen aligned quad • GPU performs ray-triangle intersection tests • Short fragment program • Framebuffer stores closest hit point • Renderer reads back closest hit
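The steps above amount to testing every ray against every triangle and keeping the nearest hit per ray. The following is a minimal CPU sketch of that loop, not the fragment program from [Carr02]. The slides do not say which intersection test is used, so the Möller-Trumbore test here is an assumption, and the names `intersect` and `closest_hits` are hypothetical.

```python
from math import inf

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, tri, eps=1e-9):
    """Return the ray parameter t of the hit, or None on a miss.

    Moller-Trumbore test (an assumption; the slides name no specific test).
    """
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def closest_hits(rays, triangles):
    """Test every ray against every triangle, keeping the nearest hit per ray."""
    best = [(inf, None)] * len(rays)             # plays the role of the framebuffer
    for tri_id, tri in enumerate(triangles):     # one 'triangle' sent down the pipeline
        for ray_id, (o, d) in enumerate(rays):   # one fragment per ray
            t = intersect(o, d, tri)
            if t is not None and t < best[ray_id][0]:
                best[ray_id] = (t, tri_id)
    return best
```

Each entry of the returned list is a `(t, triangle_index)` pair, with `(inf, None)` for rays that hit nothing. The list stands in for the closest-hit buffer that the renderer reads back.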

Pixel Shader 1.4 Implementation • Fixed point precision problems

Full Precision Simulations

Ray Engine Results • Radeon 8500 fixed point implementation: 114M ray-triangle intersections/s • Full precision simulator: 115K – 200K rays/s

Ray Engine Summary • GPU performs ray-triangle intersection • CPU-based renderer does everything else • Raw ray-triangle intersection rate is faster than CPU-based approach • Total rays processed per second is lower than on the CPU • Readback limited

Streaming Ray Tracer [Purcell02]

Streaming Ray Tracer – Main Ideas • Entire ray tracing computation can be done efficiently on the GPU • Minimal host interaction • Stream processor abstraction for programmable fragment processor

Streaming Ray Tracer • Generate Eye Rays (input: Camera) • Traverse Acceleration Structure (input: Grid) • Intersect Triangles (input: Triangles) • Shade Hits and Generate Shading Rays (input: Materials)

GPU Abstraction • Texture memory is memory • Think of dependent texture fetches as pointer dereferencing • Programmable fragment processor is a programmable stream processor • Think of multipass rendering as stream and kernel programming

Texture Memory Organization • Uniform Grid: 3D luminance texture, one entry per voxel (vox0 … voxM), each holding an offset into the triangle list (0, 3, 11, 38, …, 564) • Triangle List: 1D luminance texture of triangle indices grouped by voxel (0 3 1 3 7 21 216 …) • Triangles: 3x 1D RGB textures, one per vertex (v0, v1, v2), each storing xyz for tri0 … triN
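The layout above can be read as three levels of indirection, each hop being one dependent texture fetch. The sketch below uses plain Python lists in place of textures. All data values are hypothetical, the 3D grid is flattened to 1D for brevity, and the `-1` end-of-list marker is an assumption of this sketch rather than something the slide states.

```python
# Grid texture: one entry per voxel, holding the offset where that voxel's
# triangle list starts (hypothetical values, flattened to 1D).
grid_tex = [0, 3, 5]

# Triangle list texture: triangle indices grouped by voxel.
# The -1 end-of-list marker is an assumption of this sketch.
tri_list_tex = [0, 1, -1,   1, -1,   -1]

# Triangle textures: one array per vertex, one xyz texel per triangle.
v0_tex = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
v1_tex = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
v2_tex = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]

def triangles_in_voxel(voxel):
    """Follow grid -> triangle list -> vertex textures, one fetch per hop."""
    tris = []
    i = grid_tex[voxel]                  # first fetch: offset of the voxel's list
    while tri_list_tex[i] != -1:
        t = tri_list_tex[i]              # second fetch: triangle index
        tris.append((v0_tex[t], v1_tex[t], v2_tex[t]))  # third fetch: vertices
        i += 1
    return tris
```

With this data, `triangles_in_voxel(0)` returns both triangles, `triangles_in_voxel(1)` returns only the second one, and `triangles_in_voxel(2)` returns an empty list.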

Stream Programming Model • Programmable fragment processor is essentially a stream processor • Kernels and streams • Stream is a set of data records • Kernels operate on records • Streams connect kernels together • Kernels can read global memory • Diagram: input record stream, kernel (with globals), kernel (with globals), output record stream
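A small illustration of the model, assuming nothing beyond what the bullets say: a kernel is a function applied to each record independently, it may read shared globals, and the output stream of one kernel feeds the next. The names `run_kernel`, `scale`, and `offset` are hypothetical.

```python
def run_kernel(kernel, stream, globals_):
    """Apply a kernel to every record of the input stream independently."""
    return [kernel(record, globals_) for record in stream]

# Two hypothetical kernels; each sees one record plus read-only globals.
def scale(record, g):
    return record * g["factor"]

def offset(record, g):
    return record + g["bias"]

globals_ = {"factor": 2.0, "bias": 1.0}   # stands in for texture memory
stream = [0.0, 1.0, 2.0]

# The intermediate stream connects the two kernels.
result = run_kernel(offset, run_kernel(scale, stream, globals_), globals_)
```

Because no kernel invocation depends on another record, the per-record calls could run in any order or in parallel, which is the property that lets a fragment processor play this role.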

Streaming Flow Control • Application and Geometry Stages • Rasterization • Fragments (Input Stream) • Fragment Program (Kernel), reading Texture (Globals) • Fragment Program Output (Output Stream)

Multiple Rendering Passes • Pass 1: Generate Eye Rays • Draw quad • Rasterize

Multiple Rendering Passes • Pass 1: Generate Eye Rays • Run fragment program

Multiple Rendering Passes • Pass 1: Generate Eye Rays • Save to offscreen buffer (rays)

Multiple Rendering Passes • Pass 2: Traverse • Draw quad • Rasterize

Multiple Rendering Passes • Pass 2: Traverse • Run fragment program • Restore (rays)

Multiple Rendering Passes • Pass 2: Traverse • Save to offscreen buffer (ray voxel pr)
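The two passes can be mimicked on the CPU as follows. This is only a sketch of the control flow: `draw_quad` runs a kernel once per pixel and returns a new buffer, and each pass reads the buffer written by the previous one. The image size, camera model, grid placement, and the `start_traversal` stand-in kernel (which only finds the voxel containing the ray origin) are all assumptions made for illustration, not details taken from the slides.

```python
from math import floor, sqrt

WIDTH, HEIGHT = 4, 2                 # hypothetical tiny image
GRID_ORIGIN = (-2.0, -2.0, -4.0)     # hypothetical grid placement
VOXEL_SIZE = 1.0                     # hypothetical voxel edge length

def draw_quad(kernel, *buffers):
    """Run the kernel once per pixel, as rasterizing a screen-sized quad would."""
    out = []
    for y in range(HEIGHT):
        for x in range(WIDTH):
            idx = y * WIDTH + x
            out.append(kernel(x, y, *(b[idx] for b in buffers)))
    return out                       # 'save to offscreen buffer'

def generate_eye_ray(x, y):
    """Pass 1 kernel: pinhole camera at the origin looking down -z (assumed)."""
    px = (x + 0.5) / WIDTH * 2.0 - 1.0
    py = (y + 0.5) / HEIGHT * 2.0 - 1.0
    n = sqrt(px * px + py * py + 1.0)
    return ((0.0, 0.0, 0.0), (px / n, py / n, -1.0 / n))

def start_traversal(x, y, ray):
    """Pass 2 kernel (stand-in): pair the ray with the voxel holding its origin."""
    origin, _ = ray
    voxel = tuple(floor((origin[i] - GRID_ORIGIN[i]) / VOXEL_SIZE)
                  for i in range(3))
    return (ray, voxel)

rays = draw_quad(generate_eye_ray)             # pass 1: write the rays buffer
ray_voxel = draw_quad(start_traversal, rays)   # pass 2: restore rays, write new buffer
```

The host only issues the two `draw_quad` calls. The per-pixel state stays in the buffers between passes, which is what keeps host interaction minimal.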

Demos Rendered using a Radeon 9700 Pro

Streaming Ray Tracer Results • Simulations: 50M – 200M ray-triangle intersections/s • Radeon 9700 Pro implementation: 100M ray-triangle intersections/s, 300K – 4.0M rays/s

Streaming Ray Tracer Summary • Entire ray tracing computation can be mapped efficiently to the GPU • Stream processor is a good abstraction for a programmable fragment processor

Dedicated Hardware Ray Tracing

Ray Tracing in Hardware • Volume rendering: [Meissner98], [Pfister99] • Offline rendering: [ART01], [ART02] • Interactive rendering: [Schmittler02]

SaarCOR – Main Idea • Scalable and efficient real time hardware ray tracer • Implementation based on Saarland RTRT

SaarCOR Implementation • Packet based ray tracer • Several custom cores • Computational units: traversal, intersection, ray generation and shading • Memory units: memory controller, caches, routers • Multithreaded • Standard DRAM memory on board • Virtual memory support for large scenes • Support for programmable shading

SaarCOR Architecture

SaarCOR Test Scenes

Simulated Performance • Standard 4-pipeline SaarCOR: 100M – 400M rays/s • Frame rates on the test scenes: 137 fps, 59 fps, 44 fps, 170 fps

Simulated Bandwidth Usage
No VMA | With VMA | PCI
1.9MB | 2.5MB | 0.03MB
26.6MB | 34.1MB | 0.91MB
2.1MB | 2.6MB | 0.02MB
6.1MB | 7.7MB | 0.14MB

SaarCOR Summary • Scalable and efficient • Requires fewer FP units than GeForce3 • Low bandwidth requirements • Hides latency through multithreading • Fast frame rates

Conclusions for Part I

Conclusions • Real time ray tracing advantages: physically correct renderings, high geometric complexity, shading flexibility • Several options for real time ray tracing: software, GPU, hardware

Backup

Acknowledgments • Ian Buck, Bill Mark, Pat Hanrahan • James ‘RTD’ Percy, Pradeep Sen, Eric Chan • Matt Papakipos, Kurt Akeley - NVIDIA • Bob Drebin, Mark Peercy – ATI • Sponsors • ATI, MERL, NVIDIA, Sony, Sun • DARPA

Ray-Triangle Intersection as a Crossbar

Rasterization as a Crossbar
