1 / 34

Ray Tracing in CUDA

Ray Tracing in CUDA. Andrei Monteiro Marcelo Gattass Assignment 3 June 2010. Topics. Motivation Related Work Grid Construction Ray Tracing in CUDA Results Conclusion References. Motivation.

Télécharger la présentation

Ray Tracing in CUDA

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ray Tracing in CUDA Andrei Monteiro Marcelo Gattass Assignment 3 June 2010

  2. Topics • Motivation • Related Work • Grid Construction • Ray Tracing in CUDA • Results • Conclusion • References

  3. Motivation • Ray Tracing is a technique for generating an image by launching rays for each pixel and calculating its intersections with the scene objects. • Simulates several effects naturally, such as reflection and refraction, producing a very high degree of virtual realism. • Computationally expensive. • Can use different acceleration structures. (Kd-trees, Uniform grid, BVH) • Why CUDA? • Designed for General-Purpose Computing. • Construction of Grid is faster than other structures (e.g. Kd-trees, BVH). • Provides natural compactness, avoiding memory waste in contrast with the stencil routing algorithm using GLSL. • Use of shared memory speed up construction. • Fast data transfers. • Atomic operations

  4. Motivation

  5. Related Work • Uniform Grid • A Parallel Algorithm for Construction of Uniform Grids, Kalojanov, J. • GPU-Accelerated Uniform Grid Construction for Ray Tracing Dynamic Scenes, Ivson, P., Duarte, L., Celes, W. • Ray Tracing • Understanding the Efficiency of Ray Traversal on GPUs, Aila, T. NVIDIA. • Ray Tracing on Programmable Graphics Hardware, Purcell, T. • Ray Tracing Animated Scenes using Coherent Grid Traversal, Wald et al.

  6. Grid Construction • Uniform Grid • Speed Up simulation, avoids going through every scene object to test intersection. • Supports Dynamic Scenes. • Each voxel contains a list of primitives • The ray traverses the grid. • Grid Resolution

  7. Grid Construction • Algorithm in CUDA: • Insert triangles in voxels • Calculate grid hash table • Sort the pairs • Write cell start and end • Reorder particles

  8. Grid Construction • Insert triangles in voxels • Bounding Box (corners) • Check triangle plane Intersection • Avoids more than the same reference of the triangle inside the voxel. Contained in same voxel 4 times Contained in more than one voxel

  9. Grid Construction 9 3 5 • Grid Hash Table • Pair Cell Index – Particle Index. • E.g. Cell Dimension = 3, Grid resolution = 3x3 6 7 8 6 0 2 4 3 4 5 3 7 1 6 8 0 1 2 0 0 3 6 9 PARTICLES: 1 2 0 3 4 5 6 7 8 4 4 0 1 2 0 8 4 0 5 7 0 0 3 0 8 8 0 3 1 0 8 1 0 7 1 0 HASH: 4 0 5 7 3 8 1 2 2 Cell Index 0 1 2 3 4 5 6 7 8 Particle Index

  10. Grid Construction • Sorting the Pairs • In order to calculate the cells´start and end, it is necessary to order particles in respect to cell indices which they belong. • Actually, the application sorts the previous hash table with respect to their keys, or cell indices. • Use of Radix Sort from CUDA SDK.

  11. Grid Construction • Sorting the Pairs • Sort Hash table by key values (cell indices). HASH: 4 0 5 7 3 8 1 2 2 Cell Index 0 1 2 3 4 5 6 7 8 Particle Index Sorted HASH: 0 1 2 2 3 4 5 7 8 Cell Index 6 1 7 8 4 0 2 3 5 Particle Index

  12. Grid Construction • Finding Cell Start/End and Reordering Particles. Sorted HASH: 0 1 2 3 4 5 6 7 8 ... Current Thread 0 0 1 1 1 2 2 2 1 Cell Index CellStart [CellIndex[thread_id]] = 2 ≠ Cell 0: end = 2 CellIndex [thread_id - 1] CellIndex [thread_id] CellEnd [Cell Index [thread_id -1]] = 2 Cell 1: start = 2 Cell Start/End: 0 1 2 3 4 5 6 7 8 ... Cell Index 0/2 2/6 6/... Cell Start / End

  13. Ray Tracing in CUDA • Can be easily parallelized • Each thread is responsible for one pixel / ray intersection. • Problems that slow performance: • Internal Loops • Cause threads to diverge. • Random memory access • Causes bank conflicts, non-coalesce reading

  14. Ray Tracing in CUDA • Kernels • Build grid (if scene changes) • Setup rays (if camera moved) • Lauch Rays • Get Hits • Get Shadow Hits • Get Reflection Hits (repeat) • Shade

  15. Ray Tracing in CUDA • Setup Rays • Calculate the ray equation for each pixel. • One thread per pixel

  16. Ray Tracing in CUDA • Lauch Rays • Calculates the ray-grid intersection, if any. For rays that do not intersect the grid, they are discarded for the next steps. • Returns the first cell intersection and the parameters for traversing the grid. p(t)

  17. Ray Tracing in CUDA • Get Hits • The most expensive steps of the simulation. • Typical algorithm: • Problem: Causes too much thread divergency. • Solution: Use while-while algorithm • Causes less divergency While (not hit or ray inside grid) { Traverse cell; if (! Cell empty) { for each triangle in cell { get hit (); } } }

  18. Ray Tracing in CUDA • while- while algorithm while-while trace(): while ray not terminated while node does not contain primitives traverse to the next node while node contains untested primitives perform a ray-primitive intersection test

  19. Ray Tracing in CUDA • Get Hits • Traversal Algorithm: 3D DDA if (nextx < nexty) nextx += deltax X += 1; else nexty += deltay Y += 1; process_grid(X, Y);

  20. Ray Tracing in CUDA • Get Hits • Triangle Intersection • Möller • Enables face culling. • Greatly increased performance • Careful: • Triangles can be in more than one voxel, so it´s necessary to check if the intersection point is in the current voxel.

  21. Ray Tracing in CUDA • Increase efficiency • The internal loops make threads diverge and thus lower performance. • To contour this problem, NVIDIA researcher T. Aila included a method called Persistent Threads in CUDA. • The idea is to keep threads busy while at least one of them is not done. • Increased performance depends on the GPU. • 9800 GX2: 2.2x increase • GTX 480: 3.0x increase

  22. Ray Tracing in CUDA • Persistent threads implementation code

  23. Ray Tracing in CUDA • Shade • Linear Interpolation using baricentric coordinates • Normal • Texture • Texture • Used CUDA 3D Texture to support variable number of scene textures. • Phong Shading

  24. Results • Real-Time Ray Tracing Performance • Depends on: • Grid resolution • Number of primitives • Camera in/outside grid • Shadow Pass • Reflection Passes (1 or more times) • Scenes with reflections and many primitives vary about 20~30 fps

  25. Results

  26. Results

  27. Results

  28. Results

  29. Results

  30. Results

  31. Results

  32. Results

  33. Conclusion • The user was able to replicate physical effects. • CUDA is slower compared to other languages (e.g. GLSL) if not optimizing and use its maximum optimization resources. • There are still several optimizations pending in this work. • Math • CUDA threads and kernels • Too much memory used

  34. References • Kalojanov, J. A Parallel Algorithm for Construction of Uniform Grids. High Performance Graphics, 2009. Retrieved in Apr 21 2010. • Ivson, P., Duarte, L., Celes, W.,GPU-Accelerated Uniform Grid Construction for ray Tracing Dynamic Scenes. • Understanding the Efficiency of Ray Traversal on GPUs, Aila, T. NVIDIA Research. Retrieved May 23, 2010. • Ray Tracing on Programmable Graphics Hardware, Purcell, T. Stanford University. Retrieved May 28, 2010. • Ray Tracing Animated Scenes using Coherent Grid Traversal, Wald et al. SCI Institute, University of Utah. Retrieved May 25, 2010 • NVIDIA CUDA Programming Guide. V. 2.0, 2008. Retrieved Mar 29, 2010.

More Related