
Programmable Pipelines for Mainstream Graphics: Irregular and Heterogeneous Parallelism

John Owens, Associate Professor, Electrical and Computer Engineering, University of California, Davis. Intel Science and Technology Center for Visual Computing, 2 May 2011.


Presentation Transcript


  1. Programmable Pipelines for Mainstream Graphics: Irregular and Heterogeneous Parallelism • John Owens • Associate Professor, Electrical and Computer Engineering • University of California, Davis • Intel Science and Technology Center for Visual Computing • 2 May 2011

  2. Building Efficient Graphics Pipelines

  3. The Programmable Pipeline (figure: three pipelines) • Rasterization: Input Assembly, Vertex Shading, Tess. Shading, Geom. Shading, Raster, Frag. Shading, Compose • Ray tracing: Ray Generation, Ray Traversal, Ray-Primitive Intersection, Shading • Reyes: Split, Dice, Shading, Sampling, Composition

  4. Reyes • Pipeline stages: Split, Dice, Shading, Sampling, Composition • What we’ve looked at: surface subdivision, rasterization (not described), composite, tasking (preliminary) • What we’re not doing: • Designing a new micropolygon pipeline • Wedging a micropolygon pipeline into a current hardware pipeline

  5. Reyes-Style Surface Subdivision • Split: for each input surface, evaluate its screen-space bound • Is bound > threshold? Yes: split the surface and test the pieces again • No: pass the surface to Dice, which outputs micropolygons • Patney, Ebeida, and Owens. “Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces”. HPG ’09.
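
The split-or-dice loop on slide 5 can be sketched as follows. This is a minimal sequential sketch, not the parallel GPU implementation from the cited paper: the `Patch` type, its rectangular screen-space bound, the midpoint split, and the threshold value are all illustrative assumptions, and dicing itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch: an axis-aligned screen-space rectangle in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    def bound(self) -> float:
        # Screen-space bound: the longer side of the rectangle.
        return max(self.x1 - self.x0, self.y1 - self.y0)

    def split(self):
        # Split across the longer axis into two halves.
        if self.x1 - self.x0 >= self.y1 - self.y0:
            xm = (self.x0 + self.x1) / 2
            return [Patch(self.x0, self.y0, xm, self.y1),
                    Patch(xm, self.y0, self.x1, self.y1)]
        ym = (self.y0 + self.y1) / 2
        return [Patch(self.x0, self.y0, self.x1, ym),
                Patch(self.x0, ym, self.x1, self.y1)]


def split_until_diceable(patches, threshold):
    """Split every patch whose bound exceeds threshold; return the rest."""
    work = list(patches)
    ready = []                       # patches small enough to hand to Dice
    while work:
        patch = work.pop()
        if patch.bound() > threshold:
            work.extend(patch.split())   # "Yes" branch: split and re-test
        else:
            ready.append(patch)          # "No" branch: ready for dicing
    return ready


# A 64x16 patch with a 16-pixel threshold ends up as four 16x16 patches.
ready = split_until_diceable([Patch(0, 0, 64, 16)], threshold=16)
```

The amount of work each input generates depends on its screen-space size, which is what makes this stage irregular.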

  6. Sample-Parallel Composition • Anjul Patney, Stanley Tzeng, and John D. Owens. Fragment-Parallel Composite and Filter. EGSR, 29(4):1251–1258, June 2010.

  7. The Programmable Pipeline (figure: three pipelines) • Rasterization: Input Assembly, Vertex Shading, Tess. Shading, Geom. Shading, Raster, Frag. Shading, Compose • Ray tracing: Ray Generation, Ray Traversal, Ray-Primitive Intersection, Shading • Reyes: Split, Dice, Shading, Sampling, Composition • ISTC-VC: Bricks & mortar: how do we allow programmers to build stages without worrying about assembling them together?

  8. Irregular Parallelism

  9. Recursive Subdivision is Irregular Patney, Ebeida, and Owens. “Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces”. HPG ’09.

  10. Static Task List • Figure: each SM reads tasks from its Input, new tasks are written to a shared Output through an atomic pointer (Atomic Ptr), and the kernel is restarted to consume the Output • Daniel Cederman and Philippas Tsigas, On Dynamic Load Balancing on Graphics Processors. Graphics Hardware 2008, June 2008.
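
The static task list on slide 10 can be illustrated with a small simulation. This is a sequential Python sketch under stated assumptions, not the authors' GPU code: the plain integer `out_ptr` stands in for the atomic pointer (on a GPU the slot reservation would be an atomic fetch-and-add), each outer loop iteration stands in for one kernel launch, and the `halve` workload is hypothetical.

```python
def run_static_task_list(initial_tasks, process, capacity=1024):
    """Run passes until a pass produces no new tasks.

    process(task) returns a list of child tasks (possibly empty).
    Returns the number of passes, i.e. simulated kernel launches.
    """
    tasks_in = list(initial_tasks)
    launches = 0
    while tasks_in:
        out = [None] * capacity      # preallocated output list
        out_ptr = 0                  # stands in for the atomic pointer
        for task in tasks_in:        # on a GPU these run in parallel
            for child in process(task):
                if out_ptr >= capacity:
                    raise OverflowError("output list is full")
                slot = out_ptr       # reserve a slot (atomic add on a GPU)
                out_ptr += 1
                out[slot] = child
        launches += 1
        tasks_in = out[:out_ptr]     # "restart kernel": output becomes input
    return launches


def halve(n):
    """Hypothetical workload: a task of size n > 1 splits into two halves."""
    return [n // 2, n - n // 2] if n > 1 else []


# Sizes per pass: [8] -> [4, 4] -> [2, 2, 2, 2] -> eight 1s -> nothing.
launches = run_static_task_list([8], halve)
```

The input list never changes during a pass, so the only contended operation is the pointer bump; the cost is one kernel restart per level of generated work.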

  11. Private Work Queue Approach • Allocate a private work queue of tasks per core • Each core can add work to or remove work from its local queue • A core marks itself as idle if its queue exhausts its storage or is empty • Cores periodically check a global idle counter • If the global idle counter reaches a threshold, rebalance work • gProximity: Fast Hierarchy Operations on GPU Architectures, Lauterbach, Mo, and Manocha, EG ’10
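
The steps on slide 11 can be sketched as a round-based simulation. This is a minimal sketch, not the gProximity implementation: the round structure, the round-robin rebalancing, the assumption that a task spawns at most two children, and the `halve` workload are all illustrative choices.

```python
from collections import deque


def run_private_queues(queues, process, capacity, idle_threshold):
    """Simulate cores with private bounded queues and global rebalancing.

    queues: one list of tasks per core. process(task) returns at most two
    child tasks. Returns (tasks_processed, rebalances).
    """
    if not 1 <= idle_threshold <= len(queues):
        raise ValueError("idle_threshold must be between 1 and the core count")
    queues = [deque(q) for q in queues]
    processed = rebalances = 0
    while any(queues):
        idle = 0                          # global idle counter for this round
        for q in queues:
            if not q or len(q) >= capacity:
                idle += 1                 # empty or out of storage: idle
                continue
            q.extend(process(q.popleft()))
            processed += 1
        if idle >= idle_threshold:
            # Rebalance: gather all pending tasks and deal them out evenly.
            tasks = [t for q in queues for t in q]
            if len(tasks) >= capacity * len(queues):
                raise OverflowError("work exceeds total queue storage")
            for q in queues:
                q.clear()
            for i, task in enumerate(tasks):
                queues[i % len(queues)].append(task)
            rebalances += 1
    return processed, rebalances


def halve(n):
    """Hypothetical workload: a task of size n > 1 splits into two halves."""
    return [n // 2, n - n // 2] if n > 1 else []


# All work starts on core 0; three idle cores trigger a rebalance.
processed, rebalances = run_private_queues(
    [[8], [], [], []], halve, capacity=16, idle_threshold=3)
```

Between rebalances the cores touch only their own queues, so no synchronization is needed on the common path.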

  12. Work Stealing & Donating • Figure labels: Lock, I/O Deque (one per SM), SM • Cederman and Tsigas: stealing gives the best performance and scalability (follows the CPU-based work of Arora et al.) • We showed how to do this with multiple kernels in an uberkernel and a persistent-thread programming style • We added donating to minimize memory usage • Stanley Tzeng, Anjul Patney, and John D. Owens. Task Management for Irregular-Parallel Workloads on the GPU. In HPG 2010, pages 29–37, June 2010.
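
Stealing and donating, as described on slide 12, can be sketched with one deque per worker. This is a sequential simulation under assumptions, not the locked GPU implementation from the cited paper: the owner takes work from the tail of its own deque, an idle worker steals from the head of a randomly chosen victim, and a worker whose deque is full donates new work to the next worker with room. The victim selection, donation target, and `halve` workload are hypothetical.

```python
import random
from collections import deque


def run_stealing_donating(n_workers, initial, process, capacity, seed=0):
    """Returns (tasks_processed, steals, donations)."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    processed = steals = donations = 0

    def push(owner, task):
        nonlocal donations
        if len(deques[owner]) < capacity:
            deques[owner].append(task)
            return
        # Local deque is full: donate to the next worker with free space.
        for k in range(1, n_workers):
            other = (owner + k) % n_workers
            if len(deques[other]) < capacity:
                deques[other].append(task)
                donations += 1
                return
        raise OverflowError("all deques are full")

    for task in initial:
        push(0, task)

    while any(deques):
        for w in range(n_workers):
            if deques[w]:
                task = deques[w].pop()            # owner works from the tail
            else:
                victims = [v for v in range(n_workers) if deques[v]]
                if not victims:
                    continue
                task = deques[rng.choice(victims)].popleft()  # steal the head
                steals += 1
            for child in process(task):
                push(w, child)
            processed += 1
    return processed, steals, donations


def halve(n):
    """Hypothetical workload: a task of size n > 1 splits into two halves."""
    return [n // 2, n - n // 2] if n > 1 else []


processed, steals, donations = run_stealing_donating(
    n_workers=4, initial=[16], process=halve, capacity=4)
```

Donating bounds the storage each deque needs, which is the memory-usage point made on the slide; without it, a full deque would have to grow or drop work.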

  13. Extensions to GPU Tasking • Dependencies • Picture at right: dynamic (left) vs. static (right) scheduling for H.264 intra encoding • Priorities • Context: multitasking • ISTC-VC: What does a massively-parallel tasking system look like?

  14. Heterogeneity

  15. PS3 & Fast Communication • CPUs are good at creating & manipulating data structures? • GPUs are good at accessing & updating data structures? • Image: http://www.watch.impress.co.jp/game/docs/20060329/3dps303.jpg

  16. Path Tracing on Hybrid Systems • Design principle: reorder & cache work until sufficient coherence is present • Budge et al., “Out-of-core Data Management for Path Tracing on Hybrid Resources”, Eurographics ’09
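
One way to picture the design principle on slide 16 is a queue that holds work back until enough items share the same data. This is a hypothetical sketch, not the system from the cited paper: the `CoherenceQueue` class, the use of a data-chunk key as the measure of coherence, and the fixed batch size are all assumptions.

```python
from collections import defaultdict


class CoherenceQueue:
    """Caches work items per key and releases a batch once it is large enough."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = defaultdict(list)   # key -> cached work items

    def submit(self, key, item):
        """Cache item under key; return (key, batch) once a batch is full."""
        bucket = self.pending[key]
        bucket.append(item)
        if len(bucket) >= self.batch_size:
            del self.pending[key]
            return key, bucket
        return None

    def flush(self):
        """Release the remaining partial batches, largest first."""
        batches = sorted(self.pending.items(), key=lambda kv: -len(kv[1]))
        self.pending.clear()
        return batches


# Rays 0..4 alternate between two hypothetical data chunks, "A" and "B".
queue = CoherenceQueue(batch_size=3)
released = [queue.submit("AB"[i % 2], i) for i in range(5)]
leftover = queue.flush()
```

Only the fifth submission completes a batch (three rays that all need chunk "A"), so the data for "A" is touched once for three rays instead of three separate times.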

  17. Scheduling tweaks • N : no static data required • G : data resides on scheduling GPU • O : data resides on other GPU • C : data resides in main RAM • T : data is being transferred to RAM • P : preferred to run on scheduling unit • S : like P, workload is too small to run efficiently • Big picture: scheduling is a function of data properties

  18. GRAMPS concepts • Big picture: scheduling is a function of topology • Generalizes real-time graphics pipelines to graphs • Exposes an execution model with fixed-function and programmable stages connected by queues • Target: processors >> computation phases • Sugerman et al., “GRAMPS: A Programming Model for Graphics Pipelines”, ACM TOG January 2009
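
The idea of stages connected by queues, with scheduling driven by the graph's topology, can be sketched in a few lines. This is a toy illustration and not the GRAMPS API: the `Stage` class, the `run_graph` function, the policy of preferring the most downstream stage that has input, and the two example stages are all assumptions.

```python
from collections import deque


class Stage:
    """A stage that reads one item from its input queue and emits a list."""

    def __init__(self, name, fn, out=None):
        self.name = name
        self.fn = fn            # fn(item) -> list of output items
        self.inbox = deque()    # queue feeding this stage
        self.out = out          # downstream Stage, or None for a sink


def run_graph(stages, results):
    """Run until every queue drains.

    stages is listed upstream to downstream; the scheduler always runs the
    most downstream stage that has input, which keeps the queues short.
    """
    while True:
        ready = [s for s in reversed(stages) if s.inbox]
        if not ready:
            return
        stage = ready[0]
        for item in stage.fn(stage.inbox.popleft()):
            if stage.out is None:
                results.append(item)
            else:
                stage.out.inbox.append(item)


# Hypothetical two-stage graph: "dice" expands n into n items, "shade" squares.
shade = Stage("shade", lambda x: [x * x])
dice = Stage("dice", lambda n: list(range(n)), out=shade)
dice.inbox.extend([2, 3])

results = []
run_graph([dice, shade], results)
```

Because the scheduler sees the whole graph, it can trade queue footprint against parallelism by choosing which stage to run next; the downstream-first policy here is only one such choice.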

  19. Intel 4-Core Sandy Bridge • ISTC-VC: What does a massively-parallel, heterogeneous tasking system look like? • Belief: Scheduling is vital here. HW? SW? • Image: http://www.anandtech.com/show/4083

  20. Building Complex, Irregular, Heterogeneous Applications

  21. Reverse Acceleration

  22. Johan Andersson, Beyond Programmable Shading, Siggraph course ’09

  23. Task stealing/migration • Heterogeneity? • Massive parallelism? • http://software.intel.com/en-us/blogs/2010/12/09/tbb-scheduler-clandestine-evolution/

  24. Toward Exascale • LANL visit 13–15 April • 7 candidates for “exascale design centers” • “Codesign”—put everyone in same room • Need for: • Redesign of abstractions / interfaces • Ability to reuse/port existing codes • Outside bounds of ISTC-VC but very interesting to me

  25. What I Want • Doug Carmean to give me the numbers he promised • Collaborators: • Hard parallel problems (data structures, algorithms) • Interest in helping to build plumbing • Interest in using the plumbing we build • Kavita & Ravi: rendering • Ron & Michael Lentine: PhysBAM • Your thoughts on future platforms & what Intel can help / lead / support / deliver
