Evolution and Innovation in GPU Technology
Explore the history of GPU development, from pre-GPU graphics acceleration to general-purpose computation on GPUs (GPGPU). Learn about the advances, applications, and computational resources that have driven the evolution of GPUs in the computer graphics industry.
GPU
• Precision, Power, Programmability
• CPU: x60/decade, 6 GFLOPS, 6 GB/sec
• GPU: x1000/decade, 20 GFLOPS, 25 GB/sec
• Arithmetic heavy (read OR write): faster hardware
• Parallelization
• Multi-billion $ entertainment market drives innovation
• 32-bit floating point
• Programmable (graphics, physics, general purpose data-flow)
• Can’t simply “port” CPU code to GPU
David Luebke et al., GPGPU, SIGGRAPH 2004
History of the 3D graphics industry
• 60s:
  • Line drawings, hidden lines, parametric surfaces (B-splines…)
  • Automated drafting & machining for car, airplane, and ship manufacturers
• 70s:
  • Mainframes, vector tubes (HP…)
  • Software: solids (CSG), ray tracing, Z-buffer for hidden lines
• 80s:
  • Graphics workstations ($50K-$1M): frame buffers, rasterizers, GL, PHIGS
  • VR: CAVEs and head-mounted displays
  • CAD/CAM & GIS: CATIA, SDRC, PTC
  • Sun, HP, IBM, SGI, E&S, DEC
• 90s:
  • PCs ($2K): graphics boards, OpenGL, Java3D
  • CAD + videogames + animations: AutoCAD, SolidWorks…, Alias-Wavefront
  • Intel, many board vendors
• 00s:
  • Laptops, PDAs, cell phones: parallel graphics chips
  • Everything will be graphics, 3D, animated, interactive
  • Nvidia, Sony, Nokia
History of GPU
• Pre-GPU Graphics Acceleration
  • SGI, Evans & Sutherland. Introduced concepts like vertex transformation and texture mapping. Very expensive!
• First-Generation GPU (up to 1998)
  • Nvidia TNT2, ATI Rage, Voodoo3. Vertex transformation on CPU, limited set of math operations.
• Second-Generation GPU (1999-2000)
  • GeForce 256, GeForce2, Radeon 7500, Savage3D. Transformation & Lighting. More configurable, still not programmable.
• Third-Generation GPU (2001)
  • GeForce3, GeForce4 Ti, Xbox, Radeon 8500. Vertex programmability, pixel-level configurability.
• Fourth-Generation GPU (2002 onward)
  • GeForce FX series, Radeon 9700 and on. Vertex-level and pixel-level programmability.
Architecture (pipeline stages, in order)
• Application
• Vertex Shader: outputs transformed vertices, normals, colors
• Geometry Shader
• Rasterizer: outputs fragments (surfels per pixel)
• Fragment Shader: reads texture; outputs pixel color, depth, stencil
• Compositor
• Display
Buffers
• Color: 8-bit index to color table, float/16-bit true color…
• Depth: 24-bit or float (0 at back plane)
• Back and front: display front, update back, swap
• Stereo: shutter glasses, HMD. Alternate frames
• Auxiliary: off-screen working space. Helps reduce passes.
• Stencil: 8 bits (left-over of depth buffer). <, >… mask, ++
• Accumulation: sum, scale (supersampling, blur)
• P-buffer, superbuffers: render to texture
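The "Accumulation: sum, scale" entry can be illustrated with a small CPU-side sketch. It is not taken from the slides: the function name `accumulate` and the representation of an image as a list of rows of floats are assumptions made for illustration only. Each rendering pass is scaled by 1/n and summed, which is how an accumulation buffer would average several jittered renderings for supersampling or blur.

```python
def accumulate(passes):
    """Average several renderings of one frame (hypothetical helper).

    `passes` is a list of images; each image is a list of rows of floats.
    """
    n = len(passes)
    height, width = len(passes[0]), len(passes[0][0])
    accum = [[0.0] * width for _ in range(height)]
    for image in passes:
        for y in range(height):
            for x in range(width):
                # "sum, scale": add this pass, scaled by 1/n
                accum[y][x] += image[y][x] / n
    return accum


# Two 1x2 "renderings" of the same frame with slightly different samples.
averaged = accumulate([[[0.0, 1.0]], [[1.0, 1.0]]])
```

On real hardware the passes would typically be rendered with slightly jittered camera offsets; here they are simply given as data.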
Fragment operations
• Depth tests: <, <=, >, >=, ==, Zdepth-interval
• Stencil test: mask?, counter, parity.
• Alpha tests: compare to reference alpha
• Alpha blending: +, max, min, replace, blend
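The tests listed above can be sketched on the CPU as a single function applied to one incoming fragment. This is an illustrative model only, under stated assumptions: the name `process_fragment`, the dictionary layout, the "greater than reference" alpha test, and the choice of the standard source-over blend are all hypothetical, and the stencil test is left out for brevity. The depth comparison is a parameter because the direction of the test depends on the depth convention in use.

```python
import operator

# Comparison operators a depth test may be configured with.
DEPTH_FUNCS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge, "==": operator.eq,
}


def process_fragment(pixel, fragment, depth_func="<", alpha_ref=0.0):
    """Return the pixel after one fragment passes through the tests.

    pixel:    {"color": (r, g, b), "depth": d}     (stored value)
    fragment: {"color": (r, g, b, a), "depth": d}  (incoming value)
    """
    r, g, b, a = fragment["color"]
    # Alpha test: compare fragment alpha to a reference alpha.
    if a <= alpha_ref:
        return pixel
    # Depth test: compare fragment depth to the stored depth.
    if not DEPTH_FUNCS[depth_func](fragment["depth"], pixel["depth"]):
        return pixel
    # Alpha blending ("blend" mode): src * a + dst * (1 - a).
    blended = tuple(s * a + d * (1.0 - a)
                    for s, d in zip((r, g, b), pixel["color"]))
    return {"color": blended, "depth": fragment["depth"]}


background = {"color": (0.0, 0.0, 0.0), "depth": 1.0}
red_half = {"color": (1.0, 0.0, 0.0, 0.5), "depth": 0.5}
result = process_fragment(background, red_half)
```

A fragment that fails either test leaves the stored pixel untouched, which is why these tests can stand in for conditional execution.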
Data Parallelism in GPUs
• Data flow: vertices > fragments > pixels
• Parallelism at each stage
• No shared or static data (except textures)
• ALU-heavy (multiple ALUs per stage in pipe)
• Fight memory latency with more computation
GPGPU
• Stream: collection of records (pixels, vertices…)
  • Stored in textures (a computational grid)
• Kernel: function applied to each element in stream
  • Transform, evolve (no dependency between records)
• Matrix algebra
• Image/volume processing
• Physical simulation
• Global illumination
  • Ray tracing
  • Photon mapping
  • Radiosity
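The stream and kernel vocabulary can be made concrete with a minimal sketch. It runs sequentially on the CPU and only models the programming pattern; `run_kernel` and `saxpy_kernel` are hypothetical names, and the stream is represented as a 2D grid (a list of rows) standing in for a texture.

```python
def run_kernel(kernel, stream):
    """Apply `kernel` independently to every record of a 2D stream."""
    # No record depends on another, so every call could run in parallel.
    return [[kernel(record) for record in row] for row in stream]


def saxpy_kernel(record, a=2.0):
    """Example kernel: each record is an (x, y) pair; compute a*x + y."""
    x, y = record
    return a * x + y


grid = [[(1.0, 2.0), (3.0, 4.0)],
        [(0.0, 1.0), (5.0, 0.0)]]
out = run_kernel(saxpy_kernel, grid)
```

Because the kernel sees one record at a time and keeps no state between calls, the order of evaluation does not matter, which is the property that lets a GPU spread the work over many pipelines.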
Computational Resources
• Programmable parallel processors
  • Vertex & fragment pipelines
• Rasterizer
  • Mostly useful for interpolating addresses (texture coordinates) and per-vertex constants
• Texture unit
  • Read-only memory interface
• Render to texture (or copy to texture)
  • Write-only memory interface
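A read-only texture unit paired with a write-only render target suggests the usual multi-pass pattern: read from one grid, write to another, then swap roles. The sketch below models that on the CPU; the name `iterate` and the kernel signature `kernel(src, i, j)` are assumptions for illustration, not an API from the slides.

```python
def iterate(kernel, state, steps):
    """Run `steps` passes, swapping source and destination each pass."""
    src = state
    for _ in range(steps):
        # Within a pass, `src` is only read and `dst` is only written.
        dst = [[kernel(src, i, j) for j in range(len(src[0]))]
               for i in range(len(src))]
        src = dst  # the render target becomes the next pass's texture
    return src


def double(src, i, j):
    """Trivial kernel: output twice the value stored at the same address."""
    return src[i][j] * 2


final = iterate(double, [[1, 2]], steps=3)
```

Keeping reads and writes on separate grids means no pass ever observes a partially updated result.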
Vertex Processor
• Fully programmable (SIMD / MIMD)
• Processes 4-vectors (RGBA / XYZW)
• Capable of scatter but not gather (A[i,j]=x;)
  • Can change the location of current vertex
  • Cannot read info from other vertices
  • Can only read a small constant memory
• Vertex Texture Fetch
  • Random access memory for vertices
  • Arguably still not gather
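Scatter, in the sense of A[i,j]=x, means each record decides where its value is written. A minimal CPU-side sketch, assuming a hypothetical `scatter` helper and records given as (i, j, value) triples:

```python
def scatter(records, width, height, fill=0):
    """Write each record's value to the address the record itself computes."""
    grid = [[fill] * width for _ in range(height)]
    for i, j, value in records:
        # Records landing outside the grid are dropped, much as off-screen
        # vertices would be clipped.
        if 0 <= i < height and 0 <= j < width:
            grid[i][j] = value  # A[i, j] = x
    return grid


placed = scatter([(0, 1, 5), (1, 0, 7), (3, 3, 9)], width=2, height=2)
```

Note that the loop body never reads another record or another grid cell, matching the "cannot read info from other vertices" restriction.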
Fragment Processor
• May be invoked at each pixel by drawing a full screen quad
• Fully programmable (SIMD)
• Processes 4-vectors (RGBA / XYZW)
• Random access memory read (textures)
• Capable of gather (x=A[i+1,j];) and some scatter
  • RAM read (texture), but no RAM write
  • Output address fixed to a specific pixel
  • But can change that address
• Typically more useful than vertex processor
  • More fragment pipelines than vertex pipelines
  • Gather
  • Direct output (fragment processor is at end of pipeline)
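Gather is the mirror image of scatter: the output address is fixed (one call per pixel, as with a full-screen quad), but the kernel may read any input address. The sketch below models this on the CPU. The names `gather_pass` and `shift_kernel` and the clamp-to-edge addressing are assumptions chosen for illustration.

```python
def gather_pass(texture, kernel):
    """Invoke `kernel` once per pixel; it may fetch any texel it likes."""
    height, width = len(texture), len(texture[0])

    def fetch(i, j):
        # Clamp out-of-range addresses to the nearest edge texel.
        i = min(max(i, 0), height - 1)
        j = min(max(j, 0), width - 1)
        return texture[i][j]

    # The write address (i, j) is fixed by the loop, not by the kernel.
    return [[kernel(fetch, i, j) for j in range(width)]
            for i in range(height)]


def shift_kernel(fetch, i, j):
    """x = A[i+1, j]: read the texel one row further along."""
    return fetch(i + 1, j)


shifted = gather_pass([[1, 2], [3, 4]], shift_kernel)
```

A neighborhood filter, such as a blur, follows the same shape with several `fetch` calls per pixel.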
Branching
• Not supported or expensive
• Avoid, replace by math
• Depth test
• Stencil test
• Occlusion query (conditional execution)
• Pre-computation (region of interest, use to set stencil mask)
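"Replace by math" can be sketched as follows. The helpers `step` and `mix` are written here in Python but modeled on the GLSL built-ins of the same names; the two `threshold_*` functions are hypothetical examples. The second version selects between two values with arithmetic only, so every record follows the same instruction sequence.

```python
def step(edge, x):
    """0.0 if x < edge, else 1.0 (same contract as GLSL step)."""
    return float(x >= edge)


def mix(a, b, t):
    """Linear blend: a when t == 0, b when t == 1 (same contract as GLSL mix)."""
    return a * (1.0 - t) + b * t


def threshold_branching(x, edge, low, high):
    """Reference version with an explicit branch."""
    if x >= edge:
        return high
    return low


def threshold_math(x, edge, low, high):
    """Branch-free version: the comparison result becomes a blend weight."""
    return mix(low, high, step(edge, x))


samples = [-1.0, 0.0, 0.5, 2.0]
same = all(threshold_branching(x, 0.5, 1.0, 3.0) == threshold_math(x, 0.5, 1.0, 3.0)
           for x in samples)
```

The trade-off is that both alternatives are always evaluated, so this pays off mainly when each side is cheap to compute.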