Fragment-Parallel Composite and Filter

Fragment-Parallel Composite and Filter. Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis. Parallelism in Interactive Graphics. Well-expressed in hardware as well as APIs Consistently growing in degree & expression More and more cores on upcoming GPUs


Presentation Transcript


  1. Fragment-Parallel Composite and Filter Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis

  2. Parallelism in Interactive Graphics • Well-expressed in hardware as well as APIs • Consistently growing in degree & expression • More and more cores on upcoming GPUs • From programmable shaders to pipelines • We should rethink algorithms to exploit this • This paper provides one example • Parallelization of composite/filter stages

  3. A Feed-Forward Rendering Pipeline • Primitives → Geometry Processing → Rasterization → Composite → Filter → Pixels

  4. Composite & Filter [Figure: sample locations within a pixel] • Input: unordered list of fragments • Output: pixel colors • Assumption: no fragments are discarded

  5. Basic Idea Processors Pixel-Parallel

  6. Basic Idea Processors Insufficient parallelism Fragment-Parallel Irregularity

  7. Motivation • Most applications have low depth complexity • Pixel-level parallelism is sufficient • We are interested in applications with • Very high depth complexity • High variation in depth complexity • Further • Future platforms will demand more parallelism • High depth-complexity can limit pixel-parallelism

  8. Motivation

  9. Related Work Order-Independent Transparency (OIT) • Depth-Peeling [Everitt 01] • One pass per transparent layer • Stencil-Routed A-buffer [Myers & Bavoil 07] • One pass per 8 depth layers (the maximum number of MSAA samples per pixel) • Bucket Depth-Peeling [Liu et al. 09] • One pass per up to 32 layers (limited by the maximum number of render targets)

  10. Related Work Order-Independent Transparency (OIT) • OIT using Direct3D 11 [Gruen et al. 10] • Use fragment linked-lists • Per-pixel sort and composite • Hair Self-Shadowing [Sintorn et al. 09] • Each fragment computes its contribution • Assumes constant opacity

  11. Related Work Programmable Rendering Pipelines • RenderAnts [Zhou et al. 09] • Sort fragments globally • Per-pixel composite/filter • FreePipe [Liu et al. 10] • Sort fragments globally • Per-pixel composite/filter

  12. Pixel-Parallel Formulation [Figure: pixels P_i, P_(i+1), P_(i+2) cover subsamples S_j through S_(j+6); thread IDs j through j+6 map one thread to each subsample. P: Pixel, S: Subsample]

  13. Fragment-Parallel Formulation [Figure: the same pixels P_i, P_(i+1), P_(i+2) and subsamples S_j through S_(j+6), now with thread IDs j through j+23 assigned one per fragment. P: Pixel, S: Subsample]

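  14. Fragment-Parallel Formulation • How can this behavior be achieved? Revisit the composite equation • For fragments 1, 2, …, N sorted front to back over a background of color $C_B$: $C_s = \alpha_1 C_1 + (1-\alpha_1)\{\alpha_2 C_2 + (1-\alpha_2)(\dots(\alpha_N C_N + (1-\alpha_N) C_B)\dots)\}$ • Expanded: $C_s = 1\cdot\alpha_1 C_1 + (1-\alpha_1)\,\alpha_2 C_2 + (1-\alpha_1)(1-\alpha_2)\,\alpha_3 C_3 + \dots + (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1})\,\alpha_k C_k + \dots + (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_N)\,C_B$ • In the k-th term, the leading product of $(1-\alpha)$ factors is the global contribution $G_k$ and $\alpha_k C_k$ is the local contribution $L_k$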
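The equivalence of the nested and expanded forms on slide 14 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes single-channel (scalar) colors, fragments already sorted front to back, and the hypothetical function names `composite_nested` and `composite_expanded`.

```python
def composite_nested(fragments, background):
    """Evaluate the nested form, working from the background toward the viewer.

    fragments: list of (alpha, color) pairs, index 0 is the nearest fragment.
    Colors are scalars here purely for brevity (an assumption of this sketch).
    """
    color = background
    for alpha, frag_color in reversed(fragments):
        color = alpha * frag_color + (1.0 - alpha) * color
    return color


def composite_expanded(fragments, background):
    """Evaluate the expanded sum: each term is (product of 1 - alpha in front) * alpha * color."""
    total = 0.0
    transmittance = 1.0  # product of (1 - alpha) over all fragments in front
    for alpha, frag_color in fragments:
        total += transmittance * alpha * frag_color
        transmittance *= 1.0 - alpha
    # The background is attenuated by every fragment in front of it.
    return total + transmittance * background


example_fragments = [(0.5, 1.0), (0.25, 0.0), (1.0, 0.5)]
nested_result = composite_nested(example_fragments, background=0.2)
expanded_result = composite_expanded(example_fragments, background=0.2)
```

Both functions walk the fragment list sequentially; the point of the expanded form is that its terms no longer depend on one another except through the running product, which is what makes a scan-based formulation possible.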

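  15. Fragment-Parallel Formulation • $C_s = G_1 L_1 + G_2 L_2 + G_3 L_3 + \dots + G_N L_N$, where $G_k = (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1})$ and $L_k = \alpha_k C_k$ (the background term takes the same form if the background is treated as a final fragment with $\alpha = 1$) • $L_k$ is trivially parallel (local computation) • $G_k$ is the result of a scan operation (an exclusive product scan over $1-\alpha$) • For the list of input fragments • Compute G[ ] and L[ ], multiply • Perform reduction to add subpixel contributions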
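The scan-then-reduce recipe on slide 15 can be sketched for a single subsample as follows. This is an illustrative sketch under stated assumptions: colors are scalars, the fragments are already depth-sorted, the background is appended as a last fragment with alpha 1, and `exclusive_product_scan` and `composite_by_scan` are hypothetical names. The sequential `accumulate` call stands in for a data-parallel scan primitive.

```python
from itertools import accumulate
import operator


def exclusive_product_scan(values):
    """Return [1, v0, v0*v1, ...]: element k is the product of values[0..k-1]."""
    if not values:
        return []
    inclusive = list(accumulate(values, operator.mul))
    return [1.0] + inclusive[:-1]


def composite_by_scan(alphas, colors):
    """Composite one subsample as a sum of G_k * L_k terms."""
    # L_k = alpha_k * C_k depends only on fragment k (trivially parallel).
    local_terms = [a * c for a, c in zip(alphas, colors)]
    # G_k = (1 - alpha_1) ... (1 - alpha_{k-1}) is an exclusive product scan.
    global_terms = exclusive_product_scan([1.0 - a for a in alphas])
    # The final color is a reduction (sum) over the per-fragment products.
    return sum(g * l for g, l in zip(global_terms, local_terms))


# Two half-transparent fragments over a background of 0.4, with the
# background appended as an opaque last fragment (assumption of this sketch).
scan_color = composite_by_scan([0.5, 0.5, 1.0], [1.0, 0.0, 0.4])
```

Every step here is either element-wise, a scan, or a reduction, so none of it requires one thread to walk a whole fragment list.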

  16. Fragment-Parallel Formulation • Filter, for every pixel: $C_p = \kappa_1 C_{s_1} + \kappa_2 C_{s_2} + \dots + \kappa_M C_{s_M}$ • This can be expressed as another reduction • After multiplying with subpixel weights $\kappa_m$ • Can be merged with previous reduction
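Why the two reductions can be merged follows from distributing the filter weight over the composite sum. The worked step below uses double indices that are introduced here for illustration and do not appear on the slides: $G_{m,k}$ and $L_{m,k}$ are the global and local contributions of fragment $k$ in subsample $m$, and $N_m$ is that subsample's fragment count.

```latex
\begin{aligned}
C_p &= \sum_{m=1}^{M} \kappa_m \, C_{s_m}
     = \sum_{m=1}^{M} \kappa_m \sum_{k=1}^{N_m} G_{m,k} \, L_{m,k} \\
    &= \sum_{m=1}^{M} \sum_{k=1}^{N_m} \bigl( \kappa_m \, G_{m,k} \, L_{m,k} \bigr)
\end{aligned}
```

Once each fragment's term is premultiplied by its subsample's weight, a single sum over all fragments of the pixel yields the filtered color.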

  17. Fragment-Parallel Composite & Filter Final Algorithm • Two-key sort (Subpixel ID, depth) • Segmented Scan (obtain Gk) • Premultiply with weights (Lk, κm) • Segmented Reduction
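The four steps of the final algorithm can be traced end to end in a short sequential sketch. It is not the authors' CUDA implementation: the Python loops stand in for data-parallel sort, segmented scan, and segmented reduction primitives. The fragment layout `(subsample_id, depth, alpha, color)`, the scalar colors, the global subsample numbering (pixel = subsample_id // samples per pixel), the opaque background fragment at infinite depth, and the name `fragment_parallel_composite_filter` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fragment_parallel_composite_filter(fragments, weights):
    """Return {pixel_id: filtered color}.

    fragments: list of (subsample_id, depth, alpha, color) tuples.
    weights:   filter weight kappa_m for each subsample slot within a pixel.
    """
    samples_per_pixel = len(weights)

    # Step 1: two-key sort by (subsample ID, depth), nearest fragment first.
    ordered = sorted(fragments, key=lambda f: (f[0], f[1]))

    # Step 2: segmented exclusive product scan of (1 - alpha); the running
    # product restarts whenever a new subsample segment begins.
    global_terms = []
    running = 1.0
    previous_id = None
    for subsample_id, _depth, alpha, _color in ordered:
        if subsample_id != previous_id:
            running = 1.0
            previous_id = subsample_id
        global_terms.append(running)
        running *= 1.0 - alpha

    # Step 3: premultiply G_k by the local term L_k and the filter weight.
    terms = [
        g * alpha * color * weights[subsample_id % samples_per_pixel]
        for g, (subsample_id, _depth, alpha, color) in zip(global_terms, ordered)
    ]

    # Step 4: segmented reduction (sum) with one segment per pixel.
    pixels = defaultdict(float)
    for term, fragment in zip(terms, ordered):
        pixels[fragment[0] // samples_per_pixel] += term
    return dict(pixels)


background = 0.4
scene = [
    (0, 2.0, 0.5, 0.0),   # subsample 0, farther fragment
    (0, 1.0, 0.5, 1.0),   # subsample 0, nearer fragment
    (2, 1.0, 1.0, 0.8),   # subsample 2, opaque fragment
]
# One opaque background fragment per subsample (assumption of this sketch).
scene += [(s, math.inf, 1.0, background) for s in range(4)]
image = fragment_parallel_composite_filter(scene, weights=[0.5, 0.5])
```

Note that the work in steps 2 through 4 is proportional to the total number of fragments, regardless of how unevenly they are distributed across pixels.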

  18. Fragment-Parallel Formulation [Figure: the fragments of pixels P_i, P_(i+1), P_(i+2) pass through a segmented scan (product), followed by a segmented reduction (sum). P: Pixel, S: Subsample]

  19. Implementation • Hardware used: NVIDIA GeForce GTX 280 • We require fast Segmented Scan and Reduce • CUDPP library provides that • Restricts implementation to NVIDIA CUDA • No direct access to hardware rasterizer • We wrote our own

  20. Example System – Polygons • Applications • Games • Depth Complexity • 1 to few tens of layers • Suited to pixel-parallel • Fragment-parallel software rasterizer

  21. Example System – Particles • Applications • Simulations, games • Depth Complexity • Hundreds of layers • High depth-variance • Particle-parallel sprite rasterizer

  22. Example System – Volumes • Applications • Scientific Visualization • Depth Complexity • Tens to Hundreds of layers • Low depth-variance • Major-axis-slice rasterizer

  23. Example System – Reyes • Applications • Offline rendering • Depth Complexity • Tens of layers • Moderate depth variance • Data-parallel micropolygon rasterizer

  24. Performance Results

  25. Performance Variation

  26. Limitations • Increased memory traffic • Several passes through CUDPP primitives • Unclear how to optimize for special cases • Threshold opacity • Threshold depth complexity

  27. Summary and Conclusion • Parallel formulation of composite equation • Maps well to known primitives • Can be integrated with filter • Consistent performance across varying workloads • FPC is applicable to future rendering pipelines • Exploits higher degree of parallelism • Better related to size of rendering workload • A tool for building programmable pipelines

  28. Future Work • Performance • Reduction in memory traffic • Extension to special-case scenes • Hybrid PPC-FPC formulations • Applications • Integration with hardware rasterizer • Cinematic rendering, Photoshop

  29. Acknowledgments • NSF Award 0541448 • SciDAC Institute for Ultrascale Visualization • NVIDIA Research Fellowship • Equipment donated by NVIDIA • Discussions and Feedback • Shubho Sengupta (UC Davis), Matt Pharr (Intel), Aaron Lefohn (Intel), Mike Houston (AMD) • Anonymous reviewers • Implementation assistance • Jeff Stuart, Shubho Sengupta

  30. Thanks!
