This overview covers the significant advancements in graphics hardware over the past few decades, highlighting key technologies and milestones. Beginning with simple rasterization techniques in the '80s to the sophisticated programmable GPUs of today, we explore the evolution of performance, functionality, and architectural innovations. Key developments from the PlayStation 3's Cell processor and Nvidia RSX to the rise of parallel processing and the advent of programmable shaders are discussed. This journey unveils the relentless pursuit of higher fidelity, realism, and interactive experience in computer graphics.
Image Synthesis GP-GPU
Graphics hardware • Current performance – PlayStation 3 • CPU: Cell processor (3.2 GHz) • 512 kB L2 cache • ~200 GFLOP/s • GPU (Graphics Processing Unit) • Nvidia RSX Reality Synthesizer (550 MHz, ~300 M transistors) • ~1.8 TFLOP/s • ~20 GPixels/s • ~2 GTriangles/s
Graphics hardware - history • 1980s: simple rasterization • Windows, lines, polygons, text fonts • 1990-95: "geometry engines" only on high-end workstations • e.g. SGI O2 vs. Indigo2 • 1995: new rasterization functionality • Realism by texturing, e.g. SGI InfiniteReality • 1998: geometry processor (T&L) on PC graphics • 2000: PC graphics achieves performance similar to high-end workstations • 3D is becoming standard in Aldi PCs • 2001: PC graphics offers new functionality • Multitexturing, vertex and pixel shaders • 2002: DirectX 9.0-level hardware • High-level shader languages • 2006: DirectX 10.0-level hardware • Geometry shader
Trends in graphics hardware • Number of transistors doubles every 6 months • Advances in performance and functionality • [Chart: transistors (millions) over time (month/year), 9/97 to 9/02; labelled points: Riva 128 (3M), GeForce3 (57M), R200 (60M), GeForceFX / ATI Radeon 9800 (near 150M), ATI R520 (near 300M)]
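The doubling claim can be checked against any two points of the chart with a small helper. This is only a sketch: the function name is made up for illustration, and the dates attached to the two chips (Riva 128 at the 9/97 tick, GeForce3 at the 3/01 tick, i.e. 42 months apart) are assumptions read off the axis, not values stated on the slide.

```python
import math

def doubling_time_months(count_start, count_end, months_elapsed):
    """Months per doubling, assuming exponential growth between two samples."""
    doublings = math.log(count_end / count_start) / math.log(2.0)
    return months_elapsed / doublings

# Assumed endpoints: Riva 128 (3M transistors) and GeForce3 (57M transistors),
# taken to be 42 months apart (9/97 to 3/01). The dates are an assumption.
estimate = doubling_time_months(3.0, 57.0, 42.0)
print(f"estimated doubling time: {estimate:.1f} months")
```

Plugging in other pairs of chart points shows how sensitive the estimate is to the chosen interval.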
Trends in graphics hardware • Graphics performance grows faster than Moore's law predicts • [Chart: performance over time for graphics, CPU, and network]
Parallel graphics hardware • Graphics hardware has always been parallel • Internal, on chip or board • Multiple rasterizers serve one frame buffer • Multi-pipe • Multiple graphics cards in one system for one or multiple displays • Multiple geometry engines • Distributed graphics • Multiple nodes in a connected cluster, each with one or multiple cards, serve one or multiple displays driven by one application
Graphics architectures • State-of-the-art GPUs • Highly parallel stream architecture • Stream of vertices/fragments is processed • Pipelined and SIMD parallel processing • SIMD: single set of instructions on multiple stream elements • Specifies new rendering pipeline • Additional stages a vertex or a fragment passes through • Specifies new (vendor-specific) OpenGL extensions • Allows for new classes of algorithms • Eventually makes programs platform dependent
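The stream model can be imitated on the CPU as a rough sketch: one kernel (the single set of instructions) is applied independently to every element of a stream, and stages are chained into a pipeline. The names `run_kernel`, `scale_vertex`, and `to_grey` are hypothetical and only serve the illustration; a real GPU runs the elements in parallel rather than in a Python loop.

```python
def run_kernel(kernel, stream):
    """Apply the same kernel to every stream element (elements are independent)."""
    return [kernel(element) for element in stream]

def scale_vertex(vertex):
    # Stage 1: a per-vertex operation, here a uniform scale by 2.
    x, y, z = vertex
    return (2.0 * x, 2.0 * y, 2.0 * z)

def to_grey(vertex):
    # Stage 2: a per-element operation on the output of stage 1.
    return sum(vertex) / 3.0

vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5)]

# Pipelining: the output stream of one stage is the input stream of the next.
transformed = run_kernel(scale_vertex, vertices)
shaded = run_kernel(to_grey, transformed)
```

Because no element depends on another, the loop inside `run_kernel` could be split across any number of processing units without changing the result.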
Graphics architectures State-of-the-Art GPUs (G80)
Graphics architectures • State-of-the-art GPUs • Multiple (texture) render targets • Up to 2 GB video memory • Floating-point textures (4 x 32 bit) • Internal computations in float/double precision • Z-cull: discards fragments that will fail the depth test before they enter the pixel pipelines • Dynamic flow control: per-vertex/geometry/fragment specific operations (if-then-else) • PCIe: serial, point-to-point protocol, dual channels to allow for bandwidth in both directions (upload/download) • Fixed fragment-to-pixel binding, i.e. a fragment (X, Y) cannot be written to a different pixel (X', Y') • No scattering (at least not in DX/GL), only gathering
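The gather-only restriction can be sketched on the CPU as follows. Each kernel invocation may read from any input position but its result is stored only at its own (x, y), so an operation that would naturally be a scatter ("write my value to the right neighbour") has to be rephrased as a gather ("read from my left neighbour"). The helper names and the clamp-to-edge addressing are assumptions made for this illustration.

```python
def gather_pass(src, width, height, kernel):
    """Run `kernel` once per pixel; the result is stored only at that pixel."""
    def fetch(x, y):
        # Clamp-to-edge addressing (an assumed texture addressing mode).
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return src[y][x]
    return [[kernel(fetch, x, y) for x in range(width)] for y in range(height)]

def shift_right(fetch, x, y):
    # Scatter "write to (x + 1, y)" expressed as gather "read from (x - 1, y)".
    return fetch(x - 1, y)

image = [[1, 2, 3],
         [4, 5, 6]]
shifted = gather_pass(image, 3, 2, shift_right)
```

The kernel never receives a way to write elsewhere, which mirrors the fixed fragment-to-pixel binding described above.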
Graphics architectures State-of-the-Art programmable GPUs
Programmable graphics hardware • Displacement mapping • [Diagram labels: Simulation (generates height field texture), static grid, Displacer, water surface, Rendering]
Programmable graphics hardware • GPU memory objects • Semantics can be specified for a chunk of memory • A memory object can be a texture, a vertex array, or a frame buffer object • What was a texture render target in the current pass becomes a vertex array in the upcoming pass • Texture elements can be interpreted as vertex attributes without any copying operations (not in OpenGL) • The same effect can be achieved with vertex texture fetch, but this fetch slows down performance
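As a loose CPU analogy for reinterpreting one memory object without copying, the sketch below writes RGBA texels into a buffer in one "pass" and reads the same bytes back as (x, y, z, w) vertex attributes in the next. The buffer layout (4 floats per element) and all names are assumptions for illustration; this is not a graphics API.

```python
from array import array

ELEMENTS = 2
# One chunk of memory: ELEMENTS entries of 4 x 32-bit floats each.
memory_object = array("f", [0.0] * (4 * ELEMENTS))
view = memoryview(memory_object)  # views share the storage, no copy is made

def as_render_target(i):
    """Texel i as a writable RGBA slice of the shared buffer."""
    return view[4 * i : 4 * i + 4]

def as_vertex_array(i):
    """Vertex i as an (x, y, z, w) tuple read from the same buffer."""
    x, y, z, w = view[4 * i : 4 * i + 4]
    return (x, y, z, w)

# Pass 1: "render" a texel into the memory object.
as_render_target(1)[:] = array("f", [0.5, 0.25, 2.0, 1.0])

# Pass 2: the same memory is now interpreted as vertex data.
second_vertex = as_vertex_array(1)
```

Only the interpretation of the bytes changes between the two passes; the data stay where they were written.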
Programmable graphics hardware • Example • Computation of height values u at the vertices of a 2D grid • Starting with an initial distribution, compute the evolution over time t • [Figure: 3 x 3 stencil around grid point P(i,j) with neighbours P(i±1,j), P(i,j±1), and P(i±1,j±1); grid spacing h along the x and y axes]
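Programmable graphics hardware Algorithm: • Load initial height values (Nx x Ny) as 2D textures (sGridPrev, sGrid) • Upload fragment shader (render to sGridNew):
    void PerPixelSim(float2 fragpos : TEXCOORD0, out float4 height : COLOR0)
    {
        // height at this grid point one time step earlier
        float4 centerPrev = tex2D(sGridPrev, fragpos);
        // one-texel offsets to the horizontal and vertical neighbours
        float2 leftIndex = float2(-1.0 / TexSize, 0.0);
        float2 upperIndex = float2(0.0, 1.0 / TexSize);
        float4 left = tex2D(sGrid, fragpos + leftIndex);
        float4 right = tex2D(sGrid, fragpos - leftIndex);
        float4 upper = tex2D(sGrid, fragpos + upperIndex);
        float4 lower = tex2D(sGrid, fragpos - upperIndex);
        float4 center = tex2D(sGrid, fragpos);
        // f is the update rule of the simulation
        height = f(left, right, upper, lower, center, centerPrev);
    }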
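A CPU reference for one simulation pass might look like the sketch below. The slides leave the update rule f unspecified; here it is assumed to be the explicit finite-difference scheme for the 2D wave equation, with `c2` standing for the combined constant (c·Δt/h)² and clamp-to-edge texture addressing. Both choices are assumptions, not the author's stated method.

```python
def simulation_step(grid_prev, grid, c2=0.25):
    """Compute the next height field from the current and previous ones.

    Assumed update rule (not given in the slides):
        new = 2*center - center_prev + c2 * (left + right + upper + lower - 4*center)
    """
    ny, nx = len(grid), len(grid[0])

    def fetch(x, y):
        # Clamp-to-edge addressing, standing in for the texture sampler.
        return grid[min(max(y, 0), ny - 1)][min(max(x, 0), nx - 1)]

    grid_new = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):          # on the GPU, every (x, y) is one fragment
        for x in range(nx):
            center = grid[y][x]
            center_prev = grid_prev[y][x]
            left, right = fetch(x - 1, y), fetch(x + 1, y)
            lower, upper = fetch(x, y - 1), fetch(x, y + 1)
            laplacian = left + right + upper + lower - 4.0 * center
            grid_new[y][x] = 2.0 * center - center_prev + c2 * laplacian
    return grid_new

# A single bump at rest in the middle of a 3 x 3 grid.
grid = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
grid_prev = [row[:] for row in grid]
grid_new = simulation_step(grid_prev, grid)
```

Each output value depends only on reads from the two input grids, so all grid points can be computed independently, which is what makes the fragment-shader formulation possible.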
Programmable graphics hardware Algorithm contd.: • Simulation: • Render a quad that covers Nx x Ny pixels with appropriate texture coordinates (corners (0,0), (1,0), (1,1), (0,1)) • Nx x Ny fragments will be generated • Data-parallel execution of fragments • Rotate texture identifiers • tmp = sGridPrev; sGridPrev = sGrid; sGrid = sGridNew; sGridNew = tmp • Display height field in texture sGrid
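The identifier rotation is easy to get wrong when written as sequential assignments, because the old value of sGridPrev has to survive until it is reused as the next render target. A minimal sketch with simultaneous assignment, using strings as stand-ins for texture handles (the handle names are hypothetical):

```python
# Stand-ins for three texture handles.
s_grid_prev, s_grid, s_grid_new = "texA", "texB", "texC"

history = []
for _ in range(3):
    # ... render pass: read s_grid_prev and s_grid, write s_grid_new ...

    # Rotate roles in one step: current becomes previous, new becomes current,
    # and the old previous texture is recycled as the next render target.
    s_grid_prev, s_grid, s_grid_new = s_grid, s_grid_new, s_grid_prev
    history.append((s_grid_prev, s_grid, s_grid_new))
```

After three passes every texture has played each role once and the assignment is back to its starting state, so only identifiers move and no texture data are copied.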
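Programmable graphics hardware Algorithm contd.: • Display: • Upload fragment shader (render to color buffer):
    void PerPixelRefract(float2 fragpos : TEXCOORD0, out float4 color : COLOR0)
    {
        // rightIndex and upperIndex are one-texel offsets, defined like leftIndex in PerPixelSim
        // finite differences of the height field along x and y
        float3 tangent = float3(1.0, 0.0, tex2D(sGrid, fragpos + rightIndex).r - tex2D(sGrid, fragpos).r);
        float3 binormal = float3(0.0, 1.0, tex2D(sGrid, fragpos + upperIndex).r - tex2D(sGrid, fragpos).r);
        float3 normal = normalize(cross(tangent, binormal));
        // f maps the surface normal to a texture-space refraction offset
        float2 refractOffset = f(normal, refractionIndex);
        color = tex2D(sBackground, fragpos + refractOffset);
    }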
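The normal computation in the display shader can be reproduced on the CPU to see what it yields. The sketch below follows the same forward differences (tangent along x, binormal along y, normal as their normalized cross product); the function names and the clamping at the grid border are assumptions, and the refraction function f is left out because the slides do not define it.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def surface_normal(height, x, y):
    """Normal of the height field at (x, y) from forward differences."""
    ny, nx = len(height), len(height[0])
    center = height[y][x]
    right = height[y][min(x + 1, nx - 1)]   # clamp at the border (assumed)
    upper = height[min(y + 1, ny - 1)][x]
    tangent = (1.0, 0.0, right - center)
    binormal = (0.0, 1.0, upper - center)
    return normalize(cross(tangent, binormal))

flat = [[0.0, 0.0], [0.0, 0.0]]
ramp = [[0.0, 1.0], [0.0, 1.0]]   # height rises by 1 per step along x

flat_normal = surface_normal(flat, 0, 0)
ramp_normal = surface_normal(ramp, 0, 0)
```

For slopes a along x and b along y the cross product is (-a, -b, 1), so a flat surface gives a normal pointing straight up and a rising surface tilts the normal against the slope.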
GPU particle tracing • [Pipeline diagram labels: input stream, Input Assembler, Vertex Shader, Rasterizer, Pixel Shader, Output Merger, output stream]
Programmable graphics hardware Demonstration