Some Things

  1. Some Things Jeremy Sugerman 22 February 2005

  2. Topics
     • Quick GPU Topics
     • Conditional Execution
     • GPU Ray Tracing
     Jeremy Sugerman, FLASHG 22 February 2005

  3. PCI-Express • PCI-Express solves data transfer problems…

  4. 3DLabs Realizm 100 AGP
     • Mediocre Fill Rate (About half a 9800XT)
     • Reasonable Texture Bandwidth
     • Variable Cost Instructions
     • 6 GFLOPS ADD – 0.5 GFLOPS LG2
     • Remarkable Readback
     • But, No GL_TEXTURE_RECTANGLE_EXT

  5. Conditional Execution
     • Depth and Stencil are classic tools
     • Only effective early
     • All shaders support predication and KIL
     • No savings in execution time
     • KIL does gruesome things to the pipeline
     • Pixel Shader 3.0 has true branching
     • If-Then-Else, Data dependent loops
     • NV4x currently, no ATI until R500

  6. Compute Mask – Z Buffer
     • Clear Z to 1.0
     • Draw Depth-Only at Z = 0.3
     • KIL where computation will happen
     • Draw Color at Z = 0.7
     • Very Effective When it Works
     • Fragile, Easily Disabled
     • Stays Disabled Until glClear!
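The three passes on this slide can be simulated on the CPU. The following is a minimal sketch, not the presenter's code: it assumes a less-than depth test (the slide does not name the depth function), and the names `run_with_depth_mask`, `live`, and `compute` are hypothetical.

```python
def run_with_depth_mask(live, compute):
    """Simulate the Z-buffer compute mask on a small 2D grid.

    live[y][x] is True where computation should happen.
    compute(x, y) stands in for the expensive color shader (hypothetical).
    Assumes a less-than depth test; the slide does not state the depth function.
    """
    height, width = len(live), len(live[0])

    # Pass 0: clear Z to 1.0.
    depth = [[1.0] * width for _ in range(height)]

    # Pass 1: depth-only draw at Z = 0.3. Fragments are KIL'd where
    # computation will happen, so only dead pixels receive 0.3.
    for y in range(height):
        for x in range(width):
            if not live[y][x]:
                depth[y][x] = 0.3

    # Pass 2: color draw at Z = 0.7. The depth test rejects fragments over
    # the 0.3 mask, so the shader runs only where depth is still 1.0.
    out = [[None] * width for _ in range(height)]
    shaded = 0
    for y in range(height):
        for x in range(width):
            if 0.7 < depth[y][x]:
                out[y][x] = compute(x, y)
                shaded += 1
    return out, shaded
```

For a 2x2 grid with two live pixels, only two shader invocations are counted. On real hardware the saving depends on early-Z staying enabled, which the slide notes is fragile.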

  7. Compute Mask – EarlyZ
     [Chart omitted. Labels: NV41, X800; Random, 2x2 Blocks, 3x3 Blocks, 4x4 Blocks, Wavefront]

  8. Compute Mask – PS3.0
     • Rasterize normally, using a shader like:
         if (pixel is live) {
             …
             MOV result.color, <output>
         } else {
             MOV result.color, <placeholder>  // Or KIL
         }
     • Easy to Write
     • Must shade all fragments
     • Must write a value or KIL for all fragments

  9. Compute Mask – PS 3.0
     [Chart omitted. Labels: Random, 64x64 Blocks, 32x32 Blocks, 16x16 Blocks, Wavefront]

  10. Pixel Shader 3.0
     • Not (yet?) a replacement for Early-Z
     • What about loops?
     • What about state machines?
         if (fragment is in state a) {
             // Computation 1
         } else {
             // Computation 2
         }
     • Will execution time be MAX(a, b) or a + b?

  11. GPU Ray Tracing
     • Tim Purcell left us a Brook raycaster
     • Tim (Foley) et al. beat on it for DARPA Line-of-Sight
     • Early-Z, 2D Addressing
     • Tim and I have forked it again
     • Explore new hardware features
     • Explore new algorithm options
     • Mature, maintainable source base

  12. Demo • Break for demo…

  13. GPU Ray Tracing – Brute Force
     • Initialize Scene Parameters, Geometry (CPU)
     • Generate Eye Rays
     • Foreach( triangle in the scene )
         • Intersect with all rays
         • Record if it hits closer than any prior triangle
     • Shade Hits
     • Ray-Triangle kernel is 39 instructions
     • Over 100 million intersections per second
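The brute-force loop above can be sketched on the CPU as follows. This is an illustration only: the slide does not say which ray-triangle test the 39-instruction kernel uses, so the sketch swaps in the Möller–Trumbore test, and the names `intersect` and `brute_force` are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test (an assumed stand-in for the slide's kernel).
    Returns the hit distance t along the ray, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def brute_force(rays, triangles):
    """For each triangle, intersect with all rays and keep the closest hit.
    Returns one (distance, triangle_index) pair per ray."""
    best = [(float("inf"), None)] * len(rays)
    for ti, tri in enumerate(triangles):              # foreach triangle
        for ri, (origin, direction) in enumerate(rays):
            t = intersect(origin, direction, tri)
            if t is not None and t < best[ri][0]:     # closer than any prior
                best[ri] = (t, ti)
    return best
```

The outer loop is over triangles rather than rays, matching the slide: on the GPU each iteration is one pass that tests a single triangle against every ray in parallel.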

  14. GPU Ray Tracing – Uniform Grid
     • Initialize Scene Parameters, Geometry (CPU)
     • Generate Eye Rays
     • While (Any Rays Are Live)
         • Traverse the traversing rays
         • Intersect the intersecting rays
     • Shade Hits
     • Equivalent to ~14 million ray-triangles per second on our scenes.
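The traverse/intersect loop can be illustrated with a deliberately tiny model. This sketch assumes a 1D "grid" in which each cell is either empty (height 0) or holds an obstacle of some height, and a ray is hit by any obstacle at or above its flying height. None of this geometry comes from the talk; it only shows the per-ray state machine and the outer "any rays live" loop. All names are hypothetical.

```python
TRAVERSING, INTERSECTING, DONE = "traversing", "intersecting", "done"

def trace(ray_heights, cell_heights):
    """Toy 1D uniform-grid traversal.

    Every ray enters at cell 0 and advances one cell per traverse step.
    Returns (hit, passes): hit[i] is the index of the cell ray i hit
    (None for a miss), and passes counts iterations of the outer loop.
    """
    n = len(ray_heights)
    state = [TRAVERSING] * n
    cell = [-1] * n              # current cell per ray (-1: not yet entered)
    hit = [None] * n
    passes = 0
    while any(s != DONE for s in state):       # "any rays live?"
        passes += 1
        # Traverse kernel: only traversing rays do work.
        for i in range(n):
            if state[i] == TRAVERSING:
                cell[i] += 1
                if cell[i] >= len(cell_heights):
                    state[i] = DONE             # left the grid: miss
                elif cell_heights[cell[i]] > 0:
                    state[i] = INTERSECTING     # entered a non-empty cell
        # Intersect kernel: only intersecting rays do work.
        for i in range(n):
            if state[i] == INTERSECTING:
                if ray_heights[i] <= cell_heights[cell[i]]:
                    hit[i] = cell[i]
                    state[i] = DONE
                else:
                    state[i] = TRAVERSING       # missed; keep walking
    return hit, passes
```

The loop runs until the slowest ray finishes, so rays that terminate early sit idle in later passes. That is the situation the compute-mask techniques from the earlier slides are meant to address.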

  15. “Any Live Rays?”
     • Fundamentally a reduction
     • Sum across all rays
     • Readback to CPU
     • Many passes to do a GPU reduction
     • Could try occlusion query
     • Kernel that just KIL’s on dead rays
     • Still an extra pass
     • GPU global counter registers would be cool
     • Equivalent to 24 million ray-triangles per second when skipped.
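To see why the reduction takes many passes, here is a minimal sketch of a pairwise sum in which each pass halves the buffer, as a fragment-program reduction would. The 2-to-1 reduction factor and the names `reduce_pass` and `count_live` are assumptions for illustration; a real implementation might fold more elements per pass.

```python
def reduce_pass(values):
    """One reduction pass: each output element sums a pair of inputs."""
    return [values[i] + (values[i + 1] if i + 1 < len(values) else 0)
            for i in range(0, len(values), 2)]

def count_live(live_flags):
    """Count live rays with repeated pairwise passes.

    Returns (count, passes). On a GPU the final single value would then
    be read back to the CPU to decide whether to keep looping.
    """
    buf = [1 if flag else 0 for flag in live_flags]
    passes = 0
    while len(buf) > 1:
        buf = reduce_pass(buf)
        passes += 1
    return (buf[0] if buf else 0), passes
```

With pairwise folding, n rays need about log2(n) passes, so a 512x512 ray buffer (262,144 rays) would need 18 passes per liveness check under this assumption.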

  16. Ping Ponging Buffers
     • No read-modify-write causes copies:
         intersectTriangle(in ray, in oldHit, in tri, out hit) {
             if (ray hits tri closer than oldHit) {
                 hit = <where ray hits tri>;
             } else {
                 hit = oldHit;  // No RMW
             }
         }
     • Memory and Bandwidth Hungry
     • Add conditionals / predication to kernels
     • Complicates Early-Z compute masking
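The copy forced by the lack of read-modify-write can be shown with two buffers that swap roles after every pass. This is a sketch under toy assumptions: rays are plain numbers, each "triangle" is a hypothetical `(lo, hi, distance)` interval that a ray hits when it falls inside `[lo, hi]`, and the function names are illustrative.

```python
INF = float("inf")

def intersect_triangle(ray, old_hit, tri):
    """Return the new closest hit distance for one ray.

    The kernel cannot update old_hit in place, so it must always write
    an output, copying old_hit forward when there is no closer hit.
    """
    lo, hi, distance = tri        # toy "triangle" (assumed representation)
    if lo <= ray <= hi and distance < old_hit:
        return distance
    return old_hit                # the copy that no-RMW forces

def trace(rays, tris):
    src = [INF] * len(rays)       # read-only during a pass
    dst = [INF] * len(rays)       # write-only during a pass
    for tri in tris:              # one pass per triangle
        for i, ray in enumerate(rays):
            dst[i] = intersect_triangle(ray, src[i], tri)
        src, dst = dst, src       # ping-pong: swap the two buffers
    return src
```

Every pass writes every element, even for rays whose hit did not change, which is where the memory and bandwidth cost on the slide comes from.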

  17. Render to Texture
     • DirectX has it, OpenGL does not
     • DirectX raytracer bluescreens NV4x drivers
     • Every shader draws its results to a pbuffer
     • Copied back to a texture each time
     • Superbuffers offered a fix
     • ATI supported them (broken now)
     • ARB killed them
     • Framebuffer Objects made it through the ARB
     • Only drivers are preliminary NV4x drivers

  18. GPU Ray Tracer Enhancements
     • 2D Addressing (duh)
     • kD-Tree Accelerator
     • Early-Z and/or PS3.0 for the Accelerators
     • Tuning Traverse vs. Intersect vs. Shade
     • Occlusion Queries / Fast Reductions
     • Shadows
     • Tuning Bandwidth
     • Shading…
