Aaron Lefohn University of California, Davis

A Dynamic Adaptive Multi-resolution GPU Data Structure Adaptive Shadow Maps, Octree 3D Paint, Adaptive PDE Solver. Aaron Lefohn University of California, Davis. Problem Statement. Goal Dynamic, adaptive, multi-resolution GPU data structure Efficient read, data-write, structure change

Presentation Transcript


  1. A Dynamic Adaptive Multi-resolution GPU Data Structure • Adaptive Shadow Maps, Octree 3D Paint, Adaptive PDE Solver • Aaron Lefohn, University of California, Davis

  2. Problem Statement • Goal • Dynamic, adaptive, multi-resolution GPU data structure • Efficient read, data-write, structure change • Adaptive shadow maps, octree 3D paint, adaptive PDE solver • Challenges • All operations must be data-parallel • Trees difficult to update and cause incoherent accesses • Solution • Leverage virtual memory research from architecture • Page-table based structure • Decouple levels of indirection from resolution levels • Easy implementation with the Glift template library

  3. Collaborators • Joe Kniss, University of Utah • Robert Strzodka, CAESAR Research Institute • Shubhabrata Sengupta, University of California, Davis • John Owens, University of California, Davis

  4. Assumptions • This talk heavily relies on the contents of the “Glift” generic data structure talk

  5. Is This GPGPU Programming? • Yes • Inseparable mix of GPGPU stream programming and traditional graphics • High-quality interactive rendering • Updating complex GPU data structures

  6. Previous Work • Binotto et al. • Carr et al. • Coombe et al. • Ertl et al. • Lefebvre et al. • Purcell et al.

  7. Why A New Structure? • What’s Missing? • Fully GPU-based adaptive multi-resolution structure • GPU based address translator • GPU based updates of address translator • Trilinear/Quadlinear mipmap filtering support • Uniform, coherent memory accesses

  8. Application Adaptive Shadow Maps • Fernando et al., ACM SIGGRAPH 2001 • Elegant solution to shadow map aliasing • Quadtree of small shadow maps • Shadow maps need resolution only on shadow boundary • Required resolution determined by projected area of screen space pixel into light space

  9. Application Adaptive Shadow Maps • Why Adaptive Shadow Maps? • Many recent (2004) shadow papers cite ASMs as high quality solution but not possible on graphics hardware • Algorithm is simple. Data structure is hard.

  10. Application ASM Data Structure Requirements • Adaptive • Multiresolution • Fast, parallel random-access read • 2x2 native Percentage Closer Filtering (PCF) • Trilinear interpolated mipmapped PCF • Fast, parallel write • Fast, parallel insert and erase

  11. Application ASM Data Structure • Start with page table address translator • Coarse, uniform discretization of virtual domain • O(N) memory • O(1) computation • O(1) insert • O(1) erase • Uniform consistency • Partial mapping (sparse)

  12. Application ASM Data Structure • Page table example • vpn = va / pageSize • ppa = pageTable(vpn) • off = va % pageSize • pa = ppa + off • [Figure: Virtual Domain, Page Table, Physical Memory]
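
The four-step translation on this slide can be sketched on the CPU in one dimension. This is only an illustration under stated assumptions: `PageTable1D`, `translate`, and the `kUnmapped` sentinel are hypothetical names, not part of Glift, and the real structure runs on the GPU over 2D or 3D addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel for an unmapped virtual page (an assumption of this sketch,
// used to model the sparse, partial mapping).
const std::uint32_t kUnmapped = 0xFFFFFFFFu;

// Hypothetical 1D page table; not Glift's API.
struct PageTable1D {
    std::uint32_t pageSize;              // texels per page
    std::vector<std::uint32_t> entries;  // vpn -> ppa, or kUnmapped

    // Returns the physical address for va, or kUnmapped if no page is mapped.
    std::uint32_t translate(std::uint32_t va) const {
        assert(pageSize > 0);
        const std::uint32_t vpn = va / pageSize;  // virtual page number
        if (vpn >= entries.size() || entries[vpn] == kUnmapped) return kUnmapped;
        const std::uint32_t ppa = entries[vpn];   // physical page address
        const std::uint32_t off = va % pageSize;  // offset within the page
        return ppa + off;                         // pa = ppa + off
    }
};
```

The lookup costs one table read and a few arithmetic operations regardless of how many pages are mapped, which is the O(1) computation the previous slide lists.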

  13. Application ASM Data Structure Requirements • Adaptive • Multiresolution • Fast, parallel random-access read • 2x2 native Percentage Closer Filtering (PCF) • Trilinear interpolated mipmapped PCF • Fast, parallel write • Fast, parallel insert and erase

  14. Application ASM Data Structure • Adaptive Page Table • Map multiple virtual pages to single physical page • vpn = va / pageSize • ppa = pageTable(vpn).ppa() • s = pageTable(vpn).s() • off = (va * s) % pageSize • pa = ppa + off • [Figure: Virtual Domain, Page Table, Physical Memory]
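
A minimal 1D sketch of the adaptive lookup follows. It assumes the scale factor is a power-of-two fraction, s = 1 / 2^shift, so that `va * s` becomes a right shift; that restriction, and the names `AdaptivePTE` and `AdaptivePageTable1D`, are assumptions made for illustration and are not taken from Glift.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical adaptive page-table entry: physical page address plus a
// scale factor stored as a shift, s = 1 / 2^shift (assumed power of two).
struct AdaptivePTE {
    std::uint32_t ppa;
    std::uint32_t shift;
};

struct AdaptivePageTable1D {
    std::uint32_t pageSize;            // texels per physical page
    std::vector<AdaptivePTE> entries;  // indexed by vpn; assumed fully mapped

    std::uint32_t translate(std::uint32_t va) const {
        assert(pageSize > 0);
        const std::uint32_t vpn = va / pageSize;
        assert(vpn < entries.size());
        const AdaptivePTE& pte = entries[vpn];
        // off = (va * s) % pageSize, with s = 1 / 2^shift
        const std::uint32_t off = (va >> pte.shift) % pageSize;
        return pte.ppa + off;
    }
};
```

With `pageSize = 4` and two neighbouring entries that both hold `{16, 1}`, virtual addresses 0 to 7 share the single physical page starting at 16, which is the many-to-one mapping the slide describes.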

  15. Application ASM Data Structure Requirements • Adaptive • Multiresolution • Fast, parallel random-access read • 2x2 native Percentage Closer Filtering (PCF) • Trilinear interpolated mipmapped PCF • Fast, parallel write • Fast, parallel insert and erase

  16. Application ASM Data Structure • Multiresolution Page Table • [Figure: Virtual Domain, Mipmap Page Table, Physical Memory]

  17. Application ASM Data Structure Requirements • Adaptive • Multiresolution • Fast, parallel random-access read • 2x2 native Percentage Closer Filtering (PCF) • Trilinear interpolated mipmapped PCF • Fast, parallel write • Fast, parallel insert and erase

  18. Application ASM Data Structure Requirements • How to support bilinear filtering? • Duplicate 1 column and 1 row of texels in each page • Mipmapped trilinear? • “By-hand” interpolation between mipmap levels
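
The two ideas on this slide can be sketched on the CPU as follows. The page layout (an n x n page stored with one extra column and row), the names `PaddedPage` and `trilinear`, and the factor-of-two relation between levels are assumptions for illustration. The sketch filters plain values and leaves out the depth comparison that PCF performs.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A page of n x n texels plus one duplicated column and row taken from its
// neighbours, so it stores (n + 1) x (n + 1) values (layout is assumed).
struct PaddedPage {
    int n;
    std::vector<float> texels;  // row-major, (n + 1) * (n + 1) entries

    float at(int x, int y) const { return texels[y * (n + 1) + x]; }

    // Bilinear sample at page-local coordinates 0 <= x, y < n.
    // x0 + 1 and y0 + 1 never leave the page because of the padding.
    float bilinear(float x, float y) const {
        assert(x >= 0.0f && x < n && y >= 0.0f && y < n);
        const int x0 = static_cast<int>(std::floor(x));
        const int y0 = static_cast<int>(std::floor(y));
        const float fx = x - x0;
        const float fy = y - y0;
        const float top = at(x0, y0) * (1 - fx) + at(x0 + 1, y0) * fx;
        const float bot = at(x0, y0 + 1) * (1 - fx) + at(x0 + 1, y0 + 1) * fx;
        return top * (1 - fy) + bot * fy;
    }
};

// "By-hand" trilinear: blend bilinear samples from two adjacent mipmap levels.
// The coarse level is assumed to have half the resolution of the fine level.
inline float trilinear(const PaddedPage& fine, const PaddedPage& coarse,
                       float x, float y, float levelFrac) {
    const float a = fine.bilinear(x, y);
    const float b = coarse.bilinear(x * 0.5f, y * 0.5f);
    return a * (1 - levelFrac) + b * levelFrac;
}
```

Duplicating the border texels means all four bilinear taps come from one physical page, so a filtered read needs only a single address translation.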

  19. Application ASM Data Structure Requirements • Adaptive • Multiresolution • Fast, parallel random-access read • 2x2 native Percentage Closer Filtering (PCF) • Trilinear interpolated mipmapped PCF • Fast, parallel write • Fast, parallel insert and erase

  20. Application How to Define the ASM Structure in Glift? • Start with generic page table AddrTrans • Use mipmapped PhysMem for page table • Change template parameter to add adaptivity • Write page allocator • alloc_pages, free_pages • Finally…
      typedef PageTableAddrTrans<…> PageTable;
      typedef PhysMemGPU<vec2f, vec1s> PMem2D;
      typedef VirtMemGPU<PageTable, PMem2D> VPageTable;
      typedef AdaptiveMem<VPageTable, PageAllocator> ASM;

  21. Application ASM Data Structure Usage
      float4 main( uniform VMem2D asm, float3 shadowCoord, float4 litColor ) : COLOR
      {
          float isInLight = asm.vTex2Ds( shadowCoord );
          return lerp( black, litColor, isInLight );
      }
      asm.bind_for_read( … );
      asm.bind_for_write( … );
      asm.alloc_pages( … );
      asm.free_page( … );
      …

  22. Application Adaptive Shadow Map Algorithm • Faithful to Fernando et al. 2001 • Refinement algorithm • Identify shadow pixels w/ resolution mismatch (GPU) • Compact pixels into small stream (GPU) • See “Stream Reduction Operations for GPGPU Applications”, Daniel Horn, GPU Gems II, Ch. 36 • CPU reads back compacted stream (GPU → CPU) • Allocate pages • Draw new PTEs into mipmap page tables (CPU → GPU) • Draw depth into ASM for each new page (GPU)
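
The compaction step can be illustrated with a CPU stand-in. This is a sketch only: the talk performs compaction on the GPU, and the request encoding (`kNoRequest`), the function name `compactRequests`, and the deduplication pass are assumptions that the slides do not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Per-pixel refinement request: the virtual page a pixel wants refined,
// or kNoRequest when its current resolution already suffices (assumed encoding).
const std::int32_t kNoRequest = -1;

// CPU stand-in for GPU stream compaction: drop empty requests, then
// deduplicate so that each page would be allocated only once.
std::vector<std::int32_t> compactRequests(const std::vector<std::int32_t>& perPixel) {
    std::vector<std::int32_t> out;
    std::copy_if(perPixel.begin(), perPixel.end(), std::back_inserter(out),
                 [](std::int32_t r) { return r != kNoRequest; });
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

The compacted list is what the CPU would read back before allocating pages, so its length, not the screen size, bounds the readback cost.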

  23. ASM: Effective resolution 131,072² (37 MB); SM: 2048² [Thanks to Yong Kil for the tree model]

  24. Application “Octree” 3D Paint • Interactive painting on unparameterized 3D surfaces • 3D version of ASM data structure • Differs from previous work: • Quadrilinear filtering • O(1), uniform access • Interactive with effective resolutions between 64³ and 2048³

  25. Movie Demo

  26. Application ASM Results • Effective shadow map resolution up to 131,072² • 16² - 64² page size • 512² - 2048² page table • 2048² - 4096² physical memory • 20 - 80 MB • Performance (45k polygon model) • 15 fps while moving camera (including refinement) • 5-10 fps while moving light • Lookup time compared to 2048² shadow map: • Bilinear filtered: 90% performance of traditional • Trilinear filtered mipmapped: 73%
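
The maximum effective resolution quoted above follows from the largest page table and page size listed: 2048 entries per side times 64 texels per page side. A small sketch of that arithmetic, with the hypothetical helper `effectiveSide` and ignoring any texels duplicated for filtering:

```cpp
#include <cassert>
#include <cstdint>

// Effective resolution per side for one level of indirection:
// (page table entries per side) x (texels per page side).
// Hypothetical helper; border texels duplicated for filtering are ignored.
inline std::uint64_t effectiveSide(std::uint64_t pageTableSide,
                                   std::uint64_t pageSide) {
    assert(pageTableSide > 0 && pageSide > 0);
    return pageTableSide * pageSide;
}
```

For example, `effectiveSide(2048, 64)` gives 131072, matching the 131,072² figure on the slide.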

  27. ASM Results • Bottlenecks and Limitations • Stream compaction (50% - 85% of frame rate) • Missing frustum culling (1-2fps w/1M poly model) • Need framerate guarantee • Missing large PCFs • Better cache replacement strategy • 2-level page table

  28. Page Table Memory Coherency • 1- and 2-level page tables are bandwidth bound below 8 x 8 pages • RGBA8 textures, NVIDIA GeForce 6800 GT, NVIDIA driver 75.22, Cg 1.4a

  29. Application Static Instruction Counts • Static instruction results • With Cg program specialization
      Structure       Glift   By-Hand
      ASM             9       9
      Octree          10      9
      ASM + offset    10      9
      • Conclusion: Glift structures within 1 instr of hand-coded Cg • Measured with NVShaderPerf, NVIDIA driver 75.22, Cg 1.4a

  30. Overview • Motivation and previous work • Abstraction • Glift template library • Case study • Adaptive shadow maps and octree 3D paint • Conclusions

  31. Conclusions • Dynamic adaptive multires data structure • Coherent accesses if pages are larger than 8 x 8 • Decouple levels of indirection from levels of resolution • Page table literature • Continuum all the way from 1-level to full tree • Based on assumption that accesses are coherent within page

  32. Conclusions • Adaptive Shadow Maps • Interactive camera movements at 10 - 20 fps • Can rebuild entire ASM each frame at 5 - 10 fps • Trilinear mipmap filtering removes “popping” • Support up to 131,072² with 1 level of indirection • Add levels of indirection to reduce page table size, reduce page size, or get higher effective resolutions • How fast can ASMs get with full optimizations and hardware support for stream compaction?

  33. Conclusions • Octree 3D Paint • Fully interactive 3D painting on unparameterized models • Frame rates bound by geometry, not octree texture • Support up to 2048³ with 1 level of indirection • Add indirections to reduce page table memory or support higher effective resolutions • Quadlinear filtering on octree 3D texture • Future: Add support for “normal flags”

  34. Acknowledgements • Craig Kolb, Nick Triantos (NVIDIA) • Fabio Pellacini (Cornell/Pixar) • Adam Moerschell, Yong Kil, Serban Porumbescu, Chris Co, … (UC Davis) • Ross Whitaker, Chuck Hansen, Milan Ikits (U. of Utah) • National Science Foundation • Department of Energy

  35. More Information • Upcoming ACM Transactions on Graphics paper • “Glift : An Abstraction for Generic, Efficient GPU Data Structures” • Predicted, October 2005 • ASM and Octree submitted as SIGGRAPH sketches • Google “Lefohn GPU”
