
Compiling Metaprogrammed Shaders to Stream GPUs

Michael D. McCool, Computer Graphics Lab, University of Waterloo. Graphics Hardware 2003.





Presentation Transcript


  1. Compiling Metaprogrammed Shaders to Stream GPUs Michael D. McCool Computer Graphics Lab University of Waterloo Graphics Hardware 2003

  2. Topics • GPUs are “Stream Processors”… • But what does that mean, exactly? • Can general programs be compiled to GPUs? • Can they run efficiently on GPUs? • How can GPUs be evolved to support more powerful programming models without negatively impacting performance? • What abstractions should programming languages for GPUs support?

  3. Imagine Stream Processor • SIMD kernel processing on streams containing homogeneous records • Memory hierarchy • Local registers • Stream register file • External memory • Streaming external memory access • Conditional read and write
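SIMD kernel processing with conditional write can be illustrated with the following C++ sketch. It applies one kernel to every record of a stream and appends a result only when the kernel chooses to write one. This is a sequential CPU model under our own assumptions, not the Imagine programming interface. `Record` and `runKernel` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// A homogeneous stream record (hypothetical layout for illustration).
struct Record {
    int id;
    float value;
};

// Applies the same kernel to every record of `in`, as in SIMD kernel
// processing. The kernel fills in its output record and returns false to
// suppress the write, so `out` holds only records that were actually written
// (a conditional write that leaves no gaps in the output stream).
template <typename Kernel>
std::vector<Record> runKernel(const std::vector<Record>& in, Kernel kernel) {
    std::vector<Record> out;
    out.reserve(in.size());
    for (const Record& r : in) {
        Record result = r;              // kernel may overwrite fields
        if (kernel(r, result)) {
            out.push_back(result);      // conditional write
        }
    }
    return out;
}
```

For example, a kernel that doubles `value` and writes only when `value > 0` turns a three-record stream with one negative value into a two-record output stream.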

  4. Stream GPU Architecture [Diagram with blocks: Vertex Shader, Rasterizer, Fragment Shader, Compositor, Display; some blocks or paths are labeled "New" and "Optional"]

  5. Stream GPU Architecture • Stream input to vertex unit • Array inputs to fragment unit • At least two stream outputs from fragment unit supporting conditional writes • Array output from fragment unit via compositor

  6. Sh Metaprogramming Library • Embedded metaprogramming • Both a library and a high-level programming language • Available from SourceForge: • http://libsh.sourceforge.net • Currently semantically “Cg-equivalent” • Adding control constructs, stream algebra in next phase…

  7. Julia Set: Sh Example
ShAttrib1f julia_max_iter = 20.0;
ShAttrib1f julia_scale = 0.05;
ShAttrib2f julia_c(1.0, -0.3);
ShTexture2D<ShColor3f> julia_map(32, 32);
. . .
ShProgram julia0 = SH_BEGIN_VERTEX_SHADER {
  ShInputTexCoord2f ui;
  ShInputPosition3f pm;
  ShOutputTexCoord2f uo(ui);
  ShOutputPosition4f pd;
  pd = (perspective|modelview) | pm;
} SH_END_SHADER;
ShProgram julia1 = SH_BEGIN_FRAGMENT_SHADER {
  ShInputTexCoord2f u;
  ShInputPosition2f pdxy;
  ShOutputColor3f fc;
  ShAttrib1f i = 0.0;
  // Iterate u <- u^2 + c while (u|u) < 2.0 and i < julia_max_iter.
  // The condition tests u, because v is only declared inside the loop body.
  // Multiplying the two 0/1 comparisons acts as a logical AND.
  SH_WHILE(((u|u) < 2.0) * (i < julia_max_iter)) {
    ShTexCoord2f v;
    v(0) = u(0)*u(0) - u(1)*u(1);
    v(1) = 2.0*u(0)*u(1);
    u = v + julia_c;
    i++;
  } SH_ENDWHILE;
  // Map the iteration count to a color through the lookup texture.
  ShTexCoord2f lookup(0.0, 0.0);
  lookup(0) = julia_scale * i;
  fc = julia_map(lookup);
} SH_END_SHADER;
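The following plain C++ reference reproduces the iteration in `julia1` so you can check what the fragment loop computes. It assumes that the escape test applies to the running value u, with the slide's bound of 2.0 on (u|u). `Vec2` and `juliaIterations` are hypothetical helpers, not part of Sh.

```cpp
#include <cassert>

struct Vec2 {
    float x, y;
};

// CPU reference for the loop in julia1: iterate u <- u*u + c (complex
// squaring) while the squared magnitude (u|u) stays below `bound` and the
// count stays below `maxIter`. The shader scales the returned count by
// julia_scale to index julia_map.
int juliaIterations(Vec2 u, Vec2 c, int maxIter, float bound = 2.0f) {
    int i = 0;
    while (u.x * u.x + u.y * u.y < bound && i < maxIter) {
        Vec2 v;
        v.x = u.x * u.x - u.y * u.y;   // real part of u^2
        v.y = 2.0f * u.x * u.y;        // imaginary part of u^2
        u = {v.x + c.x, v.y + c.y};
        ++i;
    }
    return i;
}
```

With the slide's `julia_c` of (1.0, -0.3), a point starting at the origin escapes after two iterations. A point that never leaves the bound stops at `julia_max_iter`.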

  8. Compiler: Control Flow Graph

  9. Control Graph • The control flow graph from the compiler also describes a multipass stream program! • Conditional writes are needed to avoid accumulating “garbage records” • Iteration and conditionals may scramble the order of records, but records can always be sorted by ID later if necessary.
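The sketch below shows one way a WHILE node in the control graph could run as a multipass stream program. Each pass is one kernel over the active stream with two conditional outputs. Records that must iterate again go back to the loop stream, and finished records go to the output stream. At the end, the finished records are sorted by ID. The record layout and the names `Rec` and `multipassWhile` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rec {
    int id;      // original position, used to restore order at the end
    int value;
};

// Executes `while (cond(r)) body(r);` for every record as a sequence of
// passes. Nullified records are never stored, so the active stream shrinks
// as records terminate. If `passes` is non-null, it receives the pass count.
template <typename Cond, typename Body>
std::vector<Rec> multipassWhile(std::vector<Rec> active, Cond cond, Body body,
                                int* passes = nullptr) {
    std::vector<Rec> done;
    int count = 0;
    while (!active.empty()) {
        std::vector<Rec> next;
        for (Rec r : active) {
            if (cond(r)) {
                body(r);
                next.push_back(r);   // still iterating: write to loop stream
            } else {
                done.push_back(r);   // finished: write to output stream
            }
        }
        active.swap(next);
        ++count;
    }
    if (passes) *passes = count;
    // Iteration scrambles record order; restore it by sorting on ID.
    std::sort(done.begin(), done.end(),
              [](const Rec& a, const Rec& b) { return a.id < b.id; });
    return done;
}
```

Records leave the loop in the order they finish, not the order they arrived. The final sort is the "sort by ID later" step from the slide.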

  10. Julia Set: Control Graph [Diagram with stages Rasterize, Iterator, Render; stream sizes shown: 56.25 Kwords (800 tris), 197.48 Kwords (on three arcs), 9450 Kwords]

  11. Adaptive Tessellation: Control Graph [Diagram with nodes Bump, Split, Oracle, Tess2, Tess3, Tess4 and a stack arc; stream sizes shown: 86.71 Kwords (800 tris), 3.46 Kwords, 368.6 Kwords, 1137 Kwords, 5661 Kwords, 5748 Kwords, 4010 Kwords (42771 tris)]

  12. Scheduler • Local arcs are system-allocated stream buffers (ideally stream registers) • The system picks a kernel to run that: • Has enough input data • Has space available in its output buffer • Maximizes throughput among such candidates • Repeat until no more data remains in the input stream
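The scheduling policy above can be sketched as a greedy loop. This is a simplified model that rests on two assumptions. First, each kernel emits one output record per input record. Second, "throughput" is approximated by the number of records a kernel can process in its next invocation. `Buffer`, `Kernel`, and `schedule` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A local arc: a stream buffer holding `count` records out of `capacity`.
struct Buffer {
    int count;
    int capacity;
};

// A kernel reads from buffer `in` and writes to buffer `out` (out < 0 means
// the final output stream). `batch` is the most records per invocation.
struct Kernel {
    int in;
    int out;
    int batch;
};

// At each step, find the kernels that have input data and output space, run
// the one that can process the most records, and repeat until none can run.
// Returns the number of records delivered to the final output.
int schedule(std::vector<Buffer>& buffers, const std::vector<Kernel>& kernels,
             std::vector<int>& trace) {
    int delivered = 0;
    for (;;) {
        int best = -1;
        int bestN = 0;
        for (std::size_t k = 0; k < kernels.size(); ++k) {
            const Kernel& kn = kernels[k];
            int n = std::min(kn.batch, buffers[kn.in].count);  // input available
            if (kn.out >= 0) {
                const Buffer& ob = buffers[kn.out];
                n = std::min(n, ob.capacity - ob.count);      // output space
            }
            if (n > bestN) {  // strict: ties go to the lower-numbered kernel
                best = static_cast<int>(k);
                bestN = n;
            }
        }
        if (best < 0) break;  // nothing can run: finished (or stalled)
        const Kernel& kn = kernels[best];
        buffers[kn.in].count -= bestN;
        if (kn.out >= 0) {
            buffers[kn.out].count += bestN;
        } else {
            delivered += bestN;
        }
        trace.push_back(best);  // record which kernel ran at this step
    }
    return delivered;
}
```

Consider a 10-record input, a 4-record intermediate buffer, and two kernels with batches of 3 and 2. The loop alternates between the kernels as buffer space allows until all 10 records reach the output.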

  13. Observations: • True conditionals and iteration: • Implementable with conditional write to stream output • NEED NULL COMPRESSION! • Multiple stream outputs also desirable • Fragment scatter: • Implementable with render-to-vertex-array • F-buffer feedback also desirable

  14. Simulating Null Compression • Want conditional write to stream • No space wasted for nullified records • Can simulate on current GPUs: • Write to array • Use occlusion test to count number of non-null records • Sort array by mark bit (use depth channel to mark) • Discard null records (now at end of array) • Expensive, perhaps other ways…
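The simulation steps above can be modeled on the CPU as follows. `std::count_if` stands in for the occlusion-query count, and `std::stable_sort` stands in for the GPU sort on the depth mark. On the GPU the sort would be a multipass sorting network, which need not be stable. `Slot` and `compactByMark` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Slot {
    float depth;   // mark stored in the depth channel: 0 = valid, 1 = null
    int payload;
};

std::vector<Slot> compactByMark(std::vector<Slot> array) {
    // 1. Count non-null records (the occlusion-query step).
    const auto valid = std::count_if(array.begin(), array.end(),
                                     [](const Slot& s) { return s.depth == 0.0f; });
    // 2. Sort by mark so null records move to the end of the array.
    std::stable_sort(array.begin(), array.end(),
                     [](const Slot& a, const Slot& b) { return a.depth < b.depth; });
    // 3. Discard the null records, which now form the tail.
    array.resize(static_cast<std::size_t>(valid));
    return array;
}
```

The sort dominates the cost. This is why the slide calls the approach expensive compared with hardware support for conditional writes.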

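  15. HW Stream Null Compression [Diagram: eight fragment shader units (fsh) with interleaved record queues; unit 1: 1, 9, 17, 25, 33; unit 2: 2, 10, 18, 26, 34; unit 3: 3, 11, 19, 27; unit 4: 4, 12, 20, 28; unit 5: 5, 13, 21, 29; unit 6: 6, 14, 22, 30; unit 7: 7, 15, 23, 31; unit 8: 8, 16, 24, 32]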
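One reading of the figure is that records are dealt round-robin to eight fragment shader units. The hardware then packs whatever non-null outputs each round produces onto a dense output stream. The sketch below models that reading. The packing order (unit order within a round) and all names are assumptions, not a documented hardware design.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Record k (1-based) goes to unit (k - 1) % units, reproducing the
// interleaving in the figure (unit 1: 1, 9, 17, 25, 33; unit 2: 2, 10, ...).
std::vector<std::vector<int>> distributeRoundRobin(int numRecords, int units) {
    std::vector<std::vector<int>> perUnit(static_cast<std::size_t>(units));
    for (int k = 1; k <= numRecords; ++k)
        perUnit[static_cast<std::size_t>((k - 1) % units)].push_back(k);
    return perUnit;
}

// Each round, every unit processes one record. A kernel may return no output
// (a null record); only the non-null outputs of the round are appended to the
// output stream, in unit order, so nulls occupy no space.
template <typename Kernel>
std::vector<int> runWithNullCompression(int numRecords, int units, Kernel kernel) {
    std::vector<int> out;
    for (int base = 1; base <= numRecords; base += units) {    // one round
        for (int u = 0; u < units && base + u <= numRecords; ++u) {
            std::optional<int> result = kernel(base + u);      // unit u's output
            if (result) out.push_back(*result);                // skip nulls
        }
    }
    return out;
}
```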

  16. Stream Algebra
ShProgram p;
(a,b) = p(d,e,f);
(a,b) = p << (d,e,f);
(a,b) = p << d << e << f;
(a,b) = p << q << (d,e,f);
(a,b,u,v,w) = (p ** q) << (d,e,f,j,k,l);
fb += p << r << q << (c,n,v)[i];
ShStream cq = optimize(q << (c,n,v)[i]);
fb += p << r << cq;
a += s * t;
ShCampaign k = . . .
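The sketch below makes the composition operators concrete by modeling a program as a function on tuples with fixed input and output counts. `compose(p, q)` mirrors `p << q`, where q's outputs feed p's inputs. `concat(p, q)` mirrors one plausible reading of `p ** q`: the two programs run side by side, the inputs are split, and the outputs are concatenated. This is a standalone C++ model under those assumptions, not the Sh API. `Program`, `compose`, and `concat` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Tuple = std::vector<float>;

struct Program {
    std::size_t inputs;
    std::size_t outputs;
    std::function<Tuple(const Tuple&)> run;
};

// p << q: q's outputs feed p's inputs.
Program compose(const Program& p, const Program& q) {
    assert(q.outputs == p.inputs);
    return {q.inputs, p.outputs,
            [p, q](const Tuple& in) { return p.run(q.run(in)); }};
}

// p ** q: run side by side; the first p.inputs values go to p, the rest to q,
// and the outputs are concatenated.
Program concat(const Program& p, const Program& q) {
    return {p.inputs + q.inputs, p.outputs + q.outputs,
            [p, q](const Tuple& in) {
                assert(in.size() == p.inputs + q.inputs);
                auto split = in.begin() + static_cast<std::ptrdiff_t>(p.inputs);
                Tuple out = p.run(Tuple(in.begin(), split));
                Tuple rest = q.run(Tuple(split, in.end()));
                out.insert(out.end(), rest.begin(), rest.end());
                return out;
            }};
}
```

For example, composing a doubling program with a two-input adder gives a two-input program that returns 2(a + b). Concatenating the adder and the doubler gives a three-input, two-output program.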

  17. Targets • GPUs (via Cg, OGL Slang, etc.) • SIMD • Multithreaded • MIMD • SSE, SSE2 (via Intel compiler) • Cluster computers • Shared-mem computers • PS2, PS3

  18. Issues: • Null compression can be simulated with sparse texture compression, but this is slow; H/W support would be useful. • On-chip stream registers… • Off-chip stream buffer compression… • On-GPU scheduler… • Compilation of recursive algorithms? • Virtualization: registers, stream record size, stream length, textures, array read-write, synchronization, etc. • Abstractions: streams, sequences, sets, indexes, arrays, programs, campaigns, shapes, etc.

  19. Material Mapping: Control Graph [Diagram with nodes Rast, Split, HF, Wood, HF + Wood]

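  20. Control Construct Templates [Diagram: IF (C) { A } ELSE { B } between a Predecessor and a Successor, drawn as a graph with nodes P, C, A, B, S; WHILE (C) { A } between a Predecessor and a Successor, drawn as a graph with nodes P, C, A, S]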
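The two templates can be written down as edge lists. The arcs below are one interpretation of the diagram: P is the predecessor, C the condition, A and B the branch bodies, and S the successor. In the stream view, each arc would be a stream, and C's outgoing arcs would be conditional writes. The helper names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A directed arc between two nodes of the control graph.
using Edge = std::pair<std::string, std::string>;

// IF (C) { A } ELSE { B }: C feeds two conditional output streams, one into
// A and one into B, and both branches rejoin at the successor S.
std::vector<Edge> ifTemplate() {
    return {{"P", "C"}, {"C", "A"}, {"C", "B"}, {"A", "S"}, {"B", "S"}};
}

// WHILE (C) { A }: C sends each record either into the body A or on to S;
// the arc A -> C closes the loop.
std::vector<Edge> whileTemplate() {
    return {{"P", "C"}, {"C", "A"}, {"A", "C"}, {"C", "S"}};
}

// Nodes reachable from `node` in one step, in edge order.
std::vector<std::string> successors(const std::vector<Edge>& g, const std::string& node) {
    std::vector<std::string> out;
    for (const Edge& e : g)
        if (e.first == node) out.push_back(e.second);
    return out;
}
```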
