Evolution of Graphics Hardware: From Rasterization to Programmable GPUs

Image Synthesis GP-GPU

Graphics hardware
• Current performance: PlayStation 3
  • CPU: Cell processor (3.2 GHz)
    • 512 kB L2 cache
    • ~200 GFLOP/s
  • GPU (Graphics Processing Unit): Nvidia RSX Reality Synthesizer (550 MHz, ~300 M transistors)
    • ~1.8 TFLOP/s
    • ~20 GPixels/s
    • ~2 GTriangles/s

Graphics hardware: history
• 1980: simple rasterization
  • Windows, lines, polygons, text fonts
• 1990-95: "geometry engines" only on high-end workstations
  • e.g. SGI O2 vs. Indigo2
• 1995: new rasterization functionality
  • Realism by texturing, e.g. SGI Infinite Reality
• 1998: geometry processor (T&L) on PC graphics
• 2000: PC graphics achieves performance similar to high-end workstations
  • 3D is becoming standard in Aldi PCs
• 2001: PC graphics offers new functionality
  • Multitextures, vertex and pixel shaders
• 2002: DirectX 9.0-level hardware
  • High-level shader languages
• 2006: DirectX 10.0-level hardware
  • Geometry shader

Trends in graphics hardware
• Number of transistors doubles every 6 months
• Advances in performance and functionality
[Chart: transistors (millions) over time (month/year), 9/97 to 9/02; labeled data points: Riva 128 (3M), GeForce3 (57M), R200 (60M), GeForceFX / ATI Radeon 9800, ATI R520]

Trends in graphics hardware
• Graphics performance grows faster than Moore's law predicts
[Chart: performance over time for graphics, CPU, and network]

Parallel graphics hardware
• Graphics hardware has always been parallel
• Internal, on chip or board
  • Multiple rasterizers serve one frame buffer
• Multi-pipe
  • Multiple graphics cards in one system for one or multiple displays
  • Multiple geometry engines
• Distributed graphics
  • Multiple nodes in a connected cluster, each with one or multiple cards, serve one or multiple displays driven by one application

Graphics architectures
• State-of-the-art GPUs
  • Highly parallel stream architecture
    • A stream of vertices/fragments is processed
    • Pipelined and SIMD parallel processing
    • SIMD: a single set of instructions is applied to multiple stream elements
  • Specifies a new rendering pipeline
    • Additional stages that a vertex or a fragment passes through
  • Specifies new (vendor-specific) OpenGL extensions
  • Allows for new classes of algorithms
  • May make programs platform dependent

Graphics architectures State-of-the-Art GPUs (G80)

Graphics architectures
• State-of-the-art GPUs
  • Multiple (texture) render targets
  • Up to 2 GB video memory
  • Floating-point textures (4 x 32 bit)
  • Internal computations in float/double precision
  • Z-cull: discards fragments that will fail the depth test before they enter the pixel pipelines
  • Dynamic flow control: per-vertex/geometry/fragment specific operations (if-then-else)
  • PCIe: serial, point-to-point protocol; dual channels allow for bandwidth in both directions (upload/download)
  • Fixed fragment-to-pixel binding, i.e. a fragment at (x, y) cannot be written to a different pixel (x', y')
    • No scattering (at least not in DX/GL), only gathering
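The gather-only restriction in the last bullet can be illustrated on the CPU. The following is a minimal Python sketch, not GPU code; the function names and the histogram example are assumptions chosen for illustration. It contrasts a gather (each output reads several inputs but writes only its own position) with a scatter (each input chooses where to write), and shows one way a scatter can be reformulated as a gather.

```python
def gather_blur(src):
    """Gather: output i reads its neighbors but writes only to position i.

    This matches the fragment model, where the write address is fixed.
    """
    n = len(src)
    return [
        (src[max(i - 1, 0)] + src[i] + src[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def scatter_histogram(values, bins):
    """Scatter: each input element decides which output location to update.

    A fragment cannot do this directly, because it cannot choose its pixel.
    """
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    return hist


def gather_histogram(values, bins):
    """The same result as a gather: each output bin scans all inputs."""
    return [sum(1 for v in values if v == b) for b in range(bins)]
```

The gather formulation does more reads per output, which is the usual price for fitting a scatter-style algorithm into a fixed-output-position model.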

Graphics architectures State-of-the-Art programmable GPUs

GP-GPU Water

Programmable graphics hardware
Displacement mapping
[Figure: the simulation generates a height-field texture; a displacer applies it to a static grid to form the water surface, which is passed on to rendering]

Programmable graphics hardware
• GPU memory objects
  • Semantics can be specified for a chunk of memory
  • A memory object can be a texture, a vertex array, or a frame buffer object
  • What was a texture render target in the current pass becomes a vertex array in the upcoming pass
  • Texture elements can be interpreted as vertex attributes without any copy operations (not in OpenGL)
  • The same effect can be achieved with vertex texture fetch, but this fetch slows down performance

Programmable graphics hardware
• Example
  • Computation of height values u at the vertices of a 2D grid
  • Starting with an initial distribution, compute the evolution over time t
[Figure: 3 x 3 stencil of grid points P(i-1..i+1, j-1..j+1) centered at P(i,j), with grid spacing h along the x and y axes]

Programmable graphics hardware
Algorithm:
• Load initial height values (Nx x Ny) as 2D textures (sGridPrev, sGrid)
• Upload fragment shader (render to sGridNew):

    void PerPixelSim(float2 fragpos : TEXCOORD0,
                     out float4 height : COLOR0)
    {
        // Height at this grid point in the previous time step.
        float4 centerPrev = tex2D(sGridPrev, fragpos);
        // Offset of one texel to the left, in texture coordinates.
        float2 leftIndex = float2(-1.0 / TexSize, 0.0);
        float4 left = tex2D(sGrid, fragpos + leftIndex);
        // same for right, upper, lower, center
        // f: update rule of the simulation (not specified here).
        height = f(left, right, upper, lower, center, centerPrev);
    }

Programmable graphics hardware
Algorithm contd.:
• Simulation:
  • Render a quad that covers Nx x Ny pixels with appropriate texture coordinates
    [Figure: quad with texture coordinates (0,0), (1,0), (1,1), (0,1) at its corners]
  • Nx x Ny fragments will be generated
  • Data-parallel execution of fragments
• Rotate (swizzle) texture identifiers; all three assignments refer to the identifiers as they were before the rotation:
  • sGridPrev = sGrid; sGrid = sGridNew; sGridNew = sGridPrev
• Display height field in texture sGrid
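The simulation loop can be mirrored on the CPU to make the data flow concrete. The following Python sketch is illustrative only: the slides leave the update rule f unspecified, so an explicit discrete wave-equation step with a hypothetical coefficient `c` is assumed here, and clamp-to-edge addressing stands in for whatever texture addressing mode the original used. Each output cell reads only its own neighborhood, as a fragment would, and the three buffers are rotated after every pass.

```python
def step(grid, grid_prev, grid_new, c=0.25):
    """One simulation pass: fill grid_new from grid and grid_prev."""
    ny, nx = len(grid), len(grid[0])
    for j in range(ny):
        for i in range(nx):
            # Clamp-to-edge neighbor lookups (assumed addressing mode).
            left = grid[j][max(i - 1, 0)]
            right = grid[j][min(i + 1, nx - 1)]
            lower = grid[max(j - 1, 0)][i]
            upper = grid[min(j + 1, ny - 1)][i]
            center = grid[j][i]
            center_prev = grid_prev[j][i]
            # ASSUMPTION: f is an explicit discrete wave-equation update.
            laplacian = left + right + upper + lower - 4.0 * center
            grid_new[j][i] = 2.0 * center - center_prev + c * laplacian


def simulate(initial, steps, c=0.25):
    """Run several passes, rotating the three buffers after each one."""
    grid_prev = [row[:] for row in initial]  # zero initial velocity
    grid = [row[:] for row in initial]
    grid_new = [[0.0] * len(row) for row in initial]
    for _ in range(steps):
        step(grid, grid_prev, grid_new, c)
        # Simultaneous rotation: no buffer contents are copied.
        grid_prev, grid, grid_new = grid, grid_new, grid_prev
    return grid
```

The tuple assignment performs the rotation in one step, which avoids the pitfall of sequential assignments overwriting an identifier before it has been read.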

Programmable graphics hardware
Algorithm contd.:
• Display:
  • Upload fragment shader (render to color buffer):

    void PerPixelRefract(float2 fragpos : TEXCOORD0,
                         out float4 color : COLOR0)
    {
        // Finite differences of the height field in x and y.
        float3 tangent = float3(1.0, 0.0,
            tex2D(sGrid, fragpos + rightIndex).r - tex2D(sGrid, fragpos).r);
        float3 binormal = float3(0.0, 1.0,
            tex2D(sGrid, fragpos + upperIndex).r - tex2D(sGrid, fragpos).r);
        float3 normal = normalize(cross(tangent, binormal));
        // f: refraction model (not specified here); yields a texture-coordinate offset.
        float2 refractOffset = f(normal, refractionIndex);
        color = tex2D(sBackground, fragpos + refractOffset);
    }
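The normal construction in the display shader can be checked with a small CPU sketch. The Python code below is a hedged illustration, not the original shader: the height field is a nested list rather than a texture, and because the slides do not define the refraction function f, Snell's law for a viewer looking straight down is assumed, with a hypothetical ratio `eta` of refractive indices. The x and y components of the refracted direction serve as the lookup offset.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def surface_normal(grid, i, j):
    """Normal from forward differences, as in the display shader."""
    ny, nx = len(grid), len(grid[0])
    h = grid[j][i]
    right = grid[j][min(i + 1, nx - 1)]
    upper = grid[min(j + 1, ny - 1)][i]
    tangent = (1.0, 0.0, right - h)
    binormal = (0.0, 1.0, upper - h)
    return normalize(cross(tangent, binormal))


def refract_offset(normal, eta=0.75):
    """ASSUMPTION: f is Snell refraction of a straight-down view ray.

    Returns the (x, y) part of the refracted direction as a lookup offset.
    """
    incident = (0.0, 0.0, -1.0)
    d = sum(n * i for n, i in zip(normal, incident))
    k = 1.0 - eta * eta * (1.0 - d * d)
    if k < 0.0:
        return (0.0, 0.0)  # total internal reflection: no offset
    coeff = eta * d + math.sqrt(k)
    return (
        eta * incident[0] - coeff * normal[0],
        eta * incident[1] - coeff * normal[1],
    )
```

For a flat height field the normal points straight up and the offset vanishes, so the background is sampled undistorted; a slope in x shifts the lookup along x only.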

GPGPU Particle Tracing

GPU particle tracing

GPU particle tracing
[Figure: pipeline diagram with an input stream, Input Assembler, Vertex Shader, Rasterizer, Pixel Shader, Output Merger, and an output stream]

Programmable graphics hardware Demonstration
