
CUDA Lecture 2: History of GPUs



Presentation Transcript


  1. CUDA Lecture 2: History of GPUs. Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

  2. Graphics in a Nutshell • Make great images: intricate shapes, complex optical effects, seamless motion • Make them fast: invent clever techniques, use every trick imaginable, build monster hardware [Image credit: Eugene d’Eon, David Luebke, Eric Enderton, in Proc. EGSR 2007 and GPU Gems 3] History of GPUs – Slide 2

  3. The Graphics Pipeline [Diagram: Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer] History of GPUs – Slide 3

  4. The Graphics Pipeline [Same pipeline diagram as on slide 3] History of GPUs – Slide 4

  5. The Graphics Pipeline: Vertex Transform & Lighting [Pipeline diagram as on slide 3] • Transform from “world space” to “image space” • Compute per-vertex lighting History of GPUs – Slide 5
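
To make this stage concrete in the language the lecture builds toward, here is a minimal CUDA sketch (not from the slides): one thread per vertex applies the same world-to-clip matrix, the data-parallel pattern the fixed-function transform-and-lighting hardware exploits. The kernel name, row-major 4×4 matrix m, and float4 vertex layout are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: one thread transforms one vertex with a shared
// row-major 4x4 matrix m, mirroring the vertex transform stage.
__global__ void transformVertices(const float4* in, float4* out,
                                  const float* m, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 v = in[i];
    out[i] = make_float4(
        m[0]  * v.x + m[1]  * v.y + m[2]  * v.z + m[3]  * v.w,
        m[4]  * v.x + m[5]  * v.y + m[6]  * v.z + m[7]  * v.w,
        m[8]  * v.x + m[9]  * v.y + m[10] * v.z + m[11] * v.w,
        m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w);
}
```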

  6. The Graphics Pipeline: Triangle Setup & Rasterization [Pipeline diagram as on slide 3] • Convert geometric representation (vertex) to image representation (fragment) • Interpolate per-vertex quantities across pixels History of GPUs – Slide 6
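
A hedged sketch of the interpolation step mentioned above: given barycentric weights for a fragment inside a triangle, the rasterizer blends the three vertices’ attributes (a color here, as a hypothetical example). The function name and float3 attribute type are assumptions, not from the slides.

```cuda
// Hypothetical sketch: blend a per-vertex attribute (e.g. color) at one
// fragment using barycentric weights with w0 + w1 + w2 == 1.
__device__ float3 interpolateAttribute(float3 a0, float3 a1, float3 a2,
                                       float w0, float w1, float w2)
{
    return make_float3(w0 * a0.x + w1 * a1.x + w2 * a2.x,
                       w0 * a0.y + w1 * a1.y + w2 * a2.y,
                       w0 * a0.z + w1 * a1.z + w2 * a2.z);
}
```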

  7. The Graphics Pipeline [Pipeline diagram as on slide 3] History of GPUs – Slide 7

  8. The Graphics Pipeline [Simplified diagram: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer] • Key abstraction of real-time graphics • Hardware used to look like this • One chip/board per stage • Fixed data flow through pipeline History of GPUs – Slide 8

  9. The Graphics Pipeline [Simplified diagram as on slide 8] • Everything fixed function with a certain number of modes • Number of modes for each stage grew over time • Hard to optimize hardware • Developers always wanted more flexibility History of GPUs – Slide 9

  10. The Graphics Pipeline [Simplified diagram as on slide 8] • Remains a key abstraction • Hardware used to look like this • Vertex and pixel processing became programmable, new stages added • GPU architecture increasingly centers around shader execution History of GPUs – Slide 10

  11. The Graphics Pipeline [Simplified diagram as on slide 8] • Exposing an (at first limited) instruction set for some stages • Limited instructions and instruction types and no control flow at first • Expanded to full ISA History of GPUs – Slide 11

  12. Why GPUs Scale So Nicely • Workload and programming model provide lots of parallelism • Applications provide large groups of vertices at once • Vertices can be processed in parallel • Apply same transform to all vertices • Triangles contain many pixels • Pixels from a triangle can be processed in parallel • Apply same shader to all pixels • Very efficient hardware to hide serialization bottlenecks History of GPUs – Slide 12
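
As a sketch of “apply the same shader to all pixels,” the kernel below runs one thread per pixel and applies an identical (deliberately trivial) shading function to each. The kernel name, brightness factor, and float3 pixel format are illustrative assumptions, not from the slides.

```cuda
// Hypothetical sketch: every thread shades one pixel independently with
// the same trivial "shader" (scale each channel and clamp to 1.0).
__global__ void shadePixels(const float3* in, float3* out, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;

    float3 c = in[i];
    out[i] = make_float3(fminf(c.x * 1.2f, 1.0f),
                         fminf(c.y * 1.2f, 1.0f),
                         fminf(c.z * 1.2f, 1.0f));
}
```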

  13. With Moore’s Law… [Diagram: multiple Vertex, Raster, Pixel, and Blend units operating in parallel on Vrtx 0-2 and Pixel 0-3] History of GPUs – Slide 13

  14. More Efficiency • Note that we do the same thing for lots of pixels/vertices • A warp = 32 threads launched together • Usually execute together as well [Diagram: one Control unit per ALU versus a single Control unit shared by many ALUs] History of GPUs – Slide 14
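
A small sketch of how this warp grouping shows up in code: CUDA exposes the warp width as the built-in warpSize (32 on current NVIDIA hardware), and a thread can derive its warp and lane from threadIdx. The kernel and array names are assumptions for illustration.

```cuda
// Hypothetical sketch: record each thread's warp index and lane so the
// 32-thread grouping described above can be observed from the host.
__global__ void warpInfo(int* warpOf, int* laneOf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    warpOf[i] = threadIdx.x / warpSize;  // which warp within the block
    laneOf[i] = threadIdx.x % warpSize;  // position (lane) within that warp
}
```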

  15. What Is (Historical) GPGPU? • All this performance attracted developers • To use GPUs, they re-expressed their algorithms as general-purpose computations using the GPU and graphics API, in applications other than 3-D graphics • Pretend to be graphics: disguise data as textures or geometry, disguise the algorithm as render passes • Fool the graphics pipeline into doing computation to take advantage of the GPU’s massive parallelism • The GPU accelerates the critical path of the application History of GPUs – Slide 15

  16. General Purpose GPUs (GPGPUs) • Data parallel algorithms leverage GPU attributes • Large data arrays, streaming throughput • Fine-grain SIMD parallelism • Low-latency floating point (FP) computation • Applications – see http://GPGPU.org • Game effects (FX) physics, image processing • Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting History of GPUs – Slide 16
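
As one concrete instance of the fine-grain data parallelism listed above, here is a naive 1-D convolution sketch: one output element per thread, the same arithmetic over large streaming arrays. The kernel name, filter layout, and boundary handling are assumptions, not taken from the slides.

```cuda
// Hypothetical sketch: naive 1-D convolution, one output element per
// thread; out-of-range neighbors are treated as zero.
__global__ void convolve1D(const float* in, const float* filter,
                           float* out, int n, int radius)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = -radius; k <= radius; ++k) {
        int j = i + k;
        if (j >= 0 && j < n)
            acc += in[j] * filter[k + radius];
    }
    out[i] = acc;
}
```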

  17. Previous GPGPU Constraints • Dealing with graphics API: working with the corner cases of the graphics API • Addressing modes: limited texture size/dimension • Shader capabilities: limited outputs • Instruction sets: lack of integer & bit ops • Communication limited: between pixels; scatter a[i] = p [Diagram: fragment program model with Input Registers, Texture, Constants, Temp Registers, Output Registers, and FB Memory, labeled per thread / per shader / per context] History of GPUs – Slide 17

  18. Summary: Early GPGPUs • To use GPUs, re-expressed algorithms as graphics computations • Very tedious, limited usability • Still had some very nice results • This was the lead-up to CUDA History of GPUs – Slide 18

  19. Compute Unified Device Architecture (CUDA) • General purpose programming model • User kicks off batches of threads on the GPU • GPU = dedicated super-threaded, massively data parallel co-processor • Targeted software stack • Compute oriented drivers, language, and tools History of GPUs – Slide 19
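
A minimal sketch of the model described above: the host kicks off a batch (a grid of blocks) of threads, each of which handles one element. Names and launch configuration are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: each thread in the launched batch scales one element.
__global__ void scale(float* data, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

// Host side: launch enough 256-thread blocks to cover n elements, e.g.
// scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
```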

  20. Compute Unified Device Architecture (CUDA) • Driver for loading computation programs into GPU • Standalone Driver - Optimized for computation • Interface designed for compute – graphics-free API • Data sharing with OpenGL buffer objects • Guaranteed maximum download & readback speeds • Explicit GPU memory management History of GPUs – Slide 20

  21. Example of Physical Reality behind CUDA [Diagram: CPU (host) connected to GPU with local DRAM (device)] History of GPUs – Slide 21
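
A minimal sketch of that host/device split: data starts in host memory, is explicitly copied into the GPU’s local DRAM, processed there by kernels, and copied back. Sizes and variable names are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) memory.
    float* h_data = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Device (GPU local DRAM) memory, managed explicitly.
    float* d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // ... launch kernels that operate on d_data ...

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    free(h_data);
    return 0;
}
```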

  22. Parallel Computing on a GPU • 8-series GPUs deliver 25 to 200+ GFLOPS on compiled parallel C applications • Available in laptops, desktops, and clusters • GPU parallelism is doubling every year • Programming model scales transparently [Images: GeForce 8800, Tesla D870] History of GPUs – Slide 22

  23. Parallel Computing on a GPU • Programmable in C with CUDA tools • Multithreaded SPMD model uses application data parallelism and thread parallelism [Image: Tesla S870] History of GPUs – Slide 23
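
A common way the SPMD model’s transparent scaling shows up in code is a grid-stride loop: the same kernel is correct for any grid size, so it keeps working as GPUs add parallelism. The SAXPY example below is a standard illustration, not taken from the slides.

```cuda
// Hypothetical sketch: grid-stride SAXPY. Each thread strides over the
// array, so the kernel is correct for any number of blocks and threads.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
    {
        y[i] = a * x[i] + y[i];
    }
}
```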

  24. Final Thoughts • GPUs evolve as hardware and software evolve • The five-stage graphics pipeline • An example of GPGPU • Intro to CUDA History of GPUs – Slide 24

  25. End Credits • Reading: Chapter 2, “Programming Massively Parallel Processors” by Kirk and Hwu. • Based on original material from • The University of Illinois at Urbana-Champaign • David Kirk, Wen-mei W. Hwu • The University of Minnesota: Weijun Xiao • Stanford University: Jared Hoberock, David Tarjan • Revision history: last updated 5/24/2011. History of GPUs – Slide 25
