1 / 62

Maximizing Parallelism in the Construction of BVHs, Octrees , and k -d Trees

Maximizing Parallelism in the Construction of BVHs, Octrees , and k -d Trees. Tero Karras NVIDIA Research. Trees. Trees. Better. Faster. Ageia. . Collision detection. Pharr & Humphreys. NVIDIA. Path tracing. NVIDIA. Real-time ray tracing. Particle simulation. Crassin et al.

wylie-keith
Télécharger la présentation

Maximizing Parallelism in the Construction of BVHs, Octrees , and k -d Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research

  2. Trees

  3. Trees Better Faster Ageia  Collision detection Pharr & Humphreys NVIDIA Path tracing NVIDIA Real-time ray tracing Particle simulation Crassin et al. Amenta et al. Voxel-based global illumination Uchida Surface reconstruction Photon mapping

  4. Outline • Fastest existing methods are sequential • Parallelize within each hierarchy level • But not between levels

  5. Outline • Fastest existing methods are sequential • Parallelize within each hierarchy level • But not between levels • Lack of parallelism • Small workloads bottlenecked by top levels • Sub-linear scaling of performance

  6. Outline • Novel way to build the entire tree in parallel • Two algorithmic “building blocks” • Fast, scalable

  7. Outline • Novel way to build the entire tree in parallel • Two algorithmic “building blocks” • Fast, scalable • Main focus: BVHs • Point-based octrees and k-d trees also covered in the paper

  8. Bounding volume hierarchy

  9. Bounding volume hierarchy ?

  10. LBVH - Lauterbach et al. [2009] • Assign Morton codes • Sort primitives • Generate hierarchy • Fit bounding boxes p 1 0 1 0 px = 0. py = 0. 0 1 1 1 1 1 0 0 pz = 0.

  11. LBVH - Lauterbach et al. [2009] • Assign Morton codes • Sort primitives • Generate hierarchy • Fit bounding boxes p 1 0 1 1 0 0 1 0 px = 0. py = 0. 0 1 1 0 1 1 1 1 1 1 0 0 1 1 0 0 pz = 0.

  12. LBVH - Lauterbach et al. [2009] • Assign Morton codes • Sort primitives • Generate hierarchy • Fit bounding boxes p 1 0 1 0 px = 0. py = 0. code = 1 0 1 0 1 1 1 1 0 0 1 0 1 1 0 0 pz = 0.

  13. LBVH - Lauterbach et al. [2009] • Assign Morton codes • Sort primitives • Generate hierarchy • Fit bounding boxes

  14. LBVH - Lauterbach et al. [2009] • Assign Morton codes • Sort primitives • Generate hierarchy • Fit bounding boxes

  15. Binary radix tree 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  16. Binary radix tree 00 0 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  17. Binary radix tree 00 000 0 00 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  18. Binary radix tree 00 1 11 000 0010 1100 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  19. Binary radix tree 00 1 n-1 11 000 0010 1100 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110 n

  20. Longest common prefix 00 1 11 000 0010 1100 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  21. Longest common prefix δ(5,6) = 4 00 δ(5,6) = 4 4 1100 4 0 1 2 3 4 5 5 6 6 7 00001 00010 00100 00101 10011 11000 11000 11001 11001 11110

  22. Longest common prefix δ(0,3) = 2 δ(0,3) = 2 δ(0,3) = ? 00 2 δ(5,6) = 4 2 1100 4 0 0 1 2 3 3 4 5 6 7 00001 00001 00010 00100 00101 00101 10011 11000 11001 11110 

  23. Garanzha et al. [2011] Level 0 1 node 0 Level 1 2 nodes 1 2 3 nodes Level 2 4 5 3 Level 3 1 node 6 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  24. Our method ? 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  25. Our method • Define a numbering scheme for the nodes • Gain some knowledge of their identity • Establish a connection with the keys

  26. Our method • Define a numbering scheme for the nodes • Gain some knowledge of their identity • Establish a connection with the keys • Find the children of a given node • Only look at node index and nearby keys

  27. Our method • Define a numbering scheme for the nodes • Gain some knowledge of their identity • Establish a connection with the keys • Find the children of a given node • Only look at node index and nearby keys • Do this for all nodes in parallel

  28. Numbering scheme 0 3 4 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  29. Numbering scheme 0 3 4 5 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  30. Numbering scheme 0 0 3 3 4 4 1 1 2 2 5 5 6 6 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  31. Numbering scheme 0 3 4 1 2 5 6 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  32. Algorithm 0 3 4 1 2 5 6 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  33. Algorithm ?  δ(2,3) = 4 δ(3,4) = 0 δ(2,3) = 4 3 0 1 2 2 3 3 3 4 4 5 6 7 00001 00010 00100 00100 00101 00101 00101 10011 10011 11000 11001 11110

  34. Algorithm  δ(1,3) = 2 δ(3,4) = 0 δ(0,3) = 2 δ(2,3) = 4 ? 3 0 1 2 3 3 4 4 5 6 7 00001 00010 00100 00101 00101 10011 10011 11000 11001 11110

  35. Algorithm δ(0,3) = 2 3 ?   δ(2,3) = 4 δ(1,3) = 2 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  36. Algorithm δ(0,3) = 2 3 1 2 2 2 0 1 2 3 4 5 6 7 00001 00010 00100 00101 10011 11000 11001 11110

  37. Algorithm For each node i=0..n-2 in parallel: • Determine direction of the range • Expand the range as far as possible • Find where to split the range • Identify children Binary search

  38. Duplicate keys • The algorithm only works with unique keys • Duplicates are common in practice

  39. Duplicate keys • The algorithm only works with unique keys • Duplicates are common in practice • Trick: Augment each key with its index • Distinguishes between duplicates • Keys are still in lexicographical order

  40. Duplicate keys • The algorithm only works with unique keys • Duplicates are common in practice • Trick: Augment each key with its index • Distinguishes between duplicates • Keys are still in lexicographical order • Tie-break when evaluating δ(i,j)

  41. LBVH • Assign Morton codes • Sort primitives • Generate hierarchy • Fit bounding boxes

  42. Lauterbach et al. [2009]

  43. Our method • Need a different approach • How many levels are there? • Which nodes are located on a given level?

  44. Our method • Need a different approach • How many levels are there? • Which nodes are located on a given level? • Traverse paths in the tree in parallel • Start from leaves, advance toward the root • Terminate threads using per-node atomic flags

  45. Our method             

  46. Results • Evaluate performance on GTX 480 (Fermi) • CUDA, 30-bit Morton codes

  47. Results • Evaluate performance on GTX 480 (Fermi) • CUDA, 30-bit Morton codes • Compare against Garanzha et al. [2011] • Identical tree (top-level SAH splits disabled)

  48. Results • Evaluate performance on GTX 480 (Fermi) • CUDA, 30-bit Morton codes • Compare against Garanzha et al. [2011] • Identical tree (top-level SAH splits disabled) • Simulate large GPUs • N times as many cores • N times the memory bandwidth

  49. Fairy Forest 174K triangles Results milliseconds 1 × cores 2 × 4 × Our method Garanzha et al.

  50. Fairy Forest 174K triangles Results milliseconds 12.5 × 33.6 × 1.7 × 1.3 × 1.1 × 2.4 × Our method Garanzha et al.

More Related