1 / 23

Programmable Graphics Hardware

Programmable Graphics Hardware. Admin. Handout: Cg in two pages. Acknowledgement & Aside. The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology

Télécharger la présentation

Programmable Graphics Hardware

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ProgrammableGraphics Hardware David Luebke 111/18/2014

  2. Admin • Handout: Cg in two pages David Luebke 211/18/2014

  3. Acknowledgement & Aside • The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology • For this reason, and because the lab is outfitted with GeForce 3 cards, we will focus on NVIDIA tech David Luebke 311/18/2014

  4. Outline • Programmable graphics • NVIDIA’s next-generation technology: GeForceFX (code name NV30) • Programming programmable graphics • NVIDIA’s Cg language David Luebke 411/18/2014

  5. GPU Programming Model CPU GPU VertexProcessor FragmentProcessor FramebufferOperations Assembly &Rasterization Application Framebuffer Textures David Luebke 511/18/2014

  6. 32-bit IEEE floating-pointthroughout pipeline • Framebuffer • Textures • Fragment processor • Vertex processor • Interpolants David Luebke 611/18/2014

  7. Hardware supports several other data types • Fragment processor also supports: • 16-bit “half” floating point • 12-bit fixed point • These may be faster than 32-bit on some HW • Framebuffer/textures also support: • Large variety of fixed-point formats • E.g., classical 8-bit per component • These formats use less memory bandwidth than FP32 David Luebke 711/18/2014

  8. Vertex processor capabilities • 4-vector FP32 operations, as in GeForce3/4 • True data-dependent control flow • Conditional branch instruction • Subroutine calls, up to 4 deep • Jump table (for switch statements) • Condition codes • New arithmetic instructions (e.g. COS) • User clip-plane support David Luebke 811/18/2014

  9. Vertex processor has high resource limits • 256 instructions per program(effectively much higher w/branching) • 16 temporary 4-vector registers • 256 “uniform” parameter registers • 2 address registers (4-vector) • 6 clip-distance outputs David Luebke 911/18/2014

  10. Fragment processor has clean instruction set • General and orthogonal instructions • Much better than previous generation • Same syntax as vertex processor: MUL R0, R1.xyz, R2.yxw; • Full set of arithmetic instructions:RCP, RSQ, COS, EXP, … David Luebke 1011/18/2014

  11. Fragment processor hasflexible texture mapping • Texture reads are just another instruction(TEX, TXP, or TXD) • Allows computed texture coordinates,nested to arbitrary depth • Allows multiple uses of a singletexture unit • Optional LOD control – specify filter extent • Think of it as…A memory-read instruction,with optional user-controlled filtering David Luebke 1111/18/2014

  12. Additional fragment processor capabilities • Read access to window-space position • Read/write access to fragment Z • Built-in derivative instructions • Partial derivatives w.r.t. screen-space x or y • Useful for anti-aliasing • Conditional fragment-kill instruction • FP32, FP16, and fixed-point data David Luebke 1211/18/2014

  13. Fragment processor limitations • No branching • But, can do a lot with condition codes • No indexed reads from registers • Use texture reads instead • No memory writes David Luebke 1311/18/2014

  14. Fragment processor has high resource limits • 1024 instructions • 512 constants or uniform parameters • Each constant counts as one instruction • 16 texture units • Reuse as many times as desired • 8 FP32 x 4 perspective-correct inputs • 128-bit framebuffer “color” output(use as 4 x FP32, 8 x FP16, etc…) David Luebke 1411/18/2014

  15. NV30 CineFX Technology Summary VertexProcessor FragmentProcessor FramebufferOperations Assembly &Rasterization Application Framebuffer Textures • FP32 throughout pipeline • Clean instruction sets • True branching in vertex processor • Dependent texture in fragment processor • High resource limits David Luebke 1511/18/2014

  16. Programming in assembly is painful Assembly …FRC R2.y, C11.w; ADD R3.x, C11.w, -R2.y; MOV H4.y, R2.y; ADD H4.x, -H4.y, C4.w; MUL R3.xy, R3.xyww, C11.xyww; ADD R3.xy, R3.xyww, C11.z; TEX H5, R3, TEX2, 2D; ADD R3.x, R3.x, C11.x; TEX H6, R3, TEX2, 2D;… … L2weight = timeval – floor(timeval); L1weight = 1.0 – L2weight; ocoord1 = floor(timeval)/64.0 + 1.0/128.0; ocoord2 = ocoord1 + 1.0/64.0; L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0)); L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0)); … • Easier to read and modify • Cross-platform • Combine pieces • etc. David Luebke 1611/18/2014

  17. Quick Demo David Luebke 1711/18/2014

  18. Cg – C for Graphics • Cg is a GPU programming language • Designed by NVIDIA and Microsoft • Compilers available in beta versions from both companies David Luebke 1811/18/2014

  19. Design goals for Cg • Enable algorithms to be expressed… • Clearly, and • Efficiently • Provide interface continuity • Focus on DX9-generation HW and beyond • But provide support for DX8-class HW too • Support both OpenGL and Direct3D • Allow easy, incremental adoption David Luebke 1911/18/2014

  20. Easy adoption for applications • Avoid owning the application’s data • No scene graph • No buffering of vertex data • Compiler sits on top of existing APIs • User can examine assembly-code output • Can compile either at run time, or at application-development time • Allow partial adoption e.g. Use Cg vertex program with assembly fragment program • Support current hardware David Luebke 2011/18/2014

  21. Some points inthe design space • CPU languages • C – close to the hardware; general purpose • C++, Java, lisp – require memory management • RenderMan – specialized for shading • Real-time shading languages • Stanford shading language • Creative Labs shading language David Luebke 2111/18/2014

  22. Design strategy • Start with C(and a bit of C++) • Minimizes number of decisions • Gives you known mistakes instead of unknown ones • Allow subsetting of the language • Add features desired for GPU’s • To support GPU programming model • To enable high performance • Tweak to make it fit together well David Luebke 2211/18/2014

  23. How are current GPU’s different from CPU? • GPU is a stream processor • Multiple programmable processing units • Connected by data flows VertexProcessor FragmentProcessor FramebufferOperations Assembly &Rasterization Application Framebuffer Textures David Luebke 2311/18/2014

More Related