
GRAPHICS AND COMPUTING GPUS


Presentation Transcript


  1. GRAPHICS AND COMPUTING GPUS Jehan-François Pâris jfparis@uh.edu

  2. Chapter Organization • Why bother? • Evolution • GPU System Architecture • Programming GPUs • …

  3. Why bother? (I) • Yesterday's fastest computer was the Sequoia supercomputer • Can crunch 16.32 quadrillion calculations per second (16.32 petaflop/s) • 98,304 compute nodes • Each compute node is a 16-core PowerPC A2 processor

  4. Why bother? (II) • Today's fastest computer is the Cray XK7 • Hits 17.59 petaflop/s on the LINPACK benchmark • Features 560,640 processors, including 261,632 Nvidia K20x accelerator cores • Supercomputing version of the consumer-oriented GK104 GPU

  5. Why bother? (III) • Most techniques developed for high-speed computing end up trickling down to mass markets

  6. EVOLUTION

  7. History (I) • Up to the late 1990s • No GPUs • Much simpler VGA controller • Consisted of • A memory controller • Display generator + DRAM • DRAM was either shared with the CPU or private

  8. History (II) • By 1997 • More complex VGA controllers • Incorporated 3D acceleration functions in hardware • Triangle setup and rasterization • Texture mapping and shading

  9. Rasterization • Converting an image described in a vector graphics format as a combination of shapes (lines, polygons, letters, …) • into a raster image consisting of individual pixels
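  A minimal sketch of this conversion (plain C, which is also valid CUDA host code; illustrative only, the names and the tiny character "framebuffer" are hypothetical and not from the slides). It tests every pixel center of one triangle against the triangle's three edges and marks the covered pixels:

    /* Fill one triangle into a tiny character framebuffer using
       half-plane (edge-function) coverage tests. */
    #include <stdio.h>

    /* Twice the signed area of triangle (a, b, p); for a counter-clockwise
       triangle, p is inside when all three edge tests are >= 0. */
    static float edge(float ax, float ay, float bx, float by,
                      float px, float py) {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    #define W 16
    #define H 8

    int main(void) {
        /* Triangle vertices in counter-clockwise order (pixel coordinates). */
        float x0 = 1, y0 = 1, x1 = 14, y1 = 2, x2 = 4, y2 = 7;
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++) {
                float px = x + 0.5f, py = y + 0.5f;   /* sample the pixel center */
                int inside = edge(x0, y0, x1, y1, px, py) >= 0 &&
                             edge(x1, y1, x2, y2, px, py) >= 0 &&
                             edge(x2, y2, x0, y0, px, py) >= 0;
                putchar(inside ? '#' : '.');
            }
            putchar('\n');
        }
        return 0;
    }

  Hardware rasterizers apply this kind of coverage test in fixed-function logic, over many pixels in parallel.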

  10. History (III) • By 2000 • Single-chip graphics processors incorporated nearly all functions of the graphics pipeline of high-end workstations • Beginning of the end of the high-end workstation market • The VGA controller was renamed the Graphics Processing Unit (GPU)

  11. Current trends (I) • Graphics processing standards • Well-defined APIs • OpenGL: Open standard for 3D graphics programming • DirectX: Set of Microsoft multimedia programming interfaces (Direct3D for 3D graphics) • The Xbox was named after it!

  12. Current trends (II) • Frequent doubling of GPU speeds • Every 12 to 18 months • New paradigm: • Visual computing stands at the intersection of graphics processing and parallel computing • Can implement novel graphics algorithms • Use GPUs for non-conventional applications

  13. Two results • Triumph of heterogeneous architectures • Combining the powers of CPU and GPU • GPUs become scalable parallel processors • Moving from hardware-defined pipeline architectures to more flexible programmable architectures

  14. From GPGPU to CUDA • GPGPU • General-Purpose computing on GPUs • Uses the traditional graphics API and graphics pipeline

  15. From GPGPU to CUDA • CUDA • Compute Unified Device Architecture • Parallel computing platform and programming model • Based on C/C++ • Invented by NVIDIA • Single-Program Multiple-Data (SPMD) approach
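  To make the SPMD idea concrete, a minimal CUDA sketch (not from the slides): every thread executes the same kernel and uses only its block and thread indices to select its own element. Unified memory (cudaMallocManaged) is used here just to keep the example short, and error checking is omitted:

    // Every thread runs the same program (SPMD); indices select the data.
    #include <cstdio>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)                                      // guard the last block
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;                          // threads per block
        int blocks  = (n + threads - 1) / threads;  // blocks per grid
        vecAdd<<<blocks, threads>>>(a, b, c, n);    // launch one grid of blocks
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                // expect 3.000000
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }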

  16. GPU SYSTEM ARCHITECTURE

  17. Old school approach • CPU connects to the Northbridge, which connects to RAM • Northbridge connects to the PCI bus • On the PCI bus: the Southbridge (UART and other slow I/O) and the VGA controller with its framebuffer • The VGA controller drives the VGA display

  18. Intel architecture • CPU connects to the Northbridge • Northbridge connects to DDR2 RAM, to the Southbridge, and to the GPU • The GPU has its own GPU memory and drives the display

  19. AMD architecture • Northbridge functions are integrated into the CPU, which connects directly to DDR2 RAM • A separate chipset connects the CPU to the GPU • The GPU has its own GPU memory and drives the display

  20. Variations • Unified Memory Architecture (UMA): • GPU shares RAM with the CPU • Lower memory bandwidth, higher latency • Cheap, low-end solution • Scalable Link Interface (SLI): • NVIDIA • Allows multiple GPUs • High-end solution

  21. Integrated solutions • Integrate CPU and Northbridge • Integrate GPU and chipset

  22. Game consoles • Use similar architectures • Architectures evolve over time • Objective is to reduce costs while maintaining performance

  23. GPU interfaces and drivers • GPU attached to the CPU via PCI-Express • Replaces the older AGP • Interfaces such as OpenGL and Direct3D use the GPU as a coprocessor • They send commands, programs and data to the GPU through a specific GPU device driver • These drivers are often buggy!

  24. Graphics logical pipeline • Input Assembler → Vertex Shader → Geometry Shader → Setup & Rasterizer → Pixel Shader → Raster Operations & Merger • These functions must be mapped onto a programmable GPU

  25. Basic Unified GPU Architecture • Programmable processor array • Tightly integrated with fixed-function processors for texture filtering, rasterization, raster operations • Emphasis is on a very high level of parallelism

  26. Example architecture • Tesla architecture (NVIDIA GeForce 8800) • 128 streaming processor (SP) cores • Organized as 16 multithreaded streaming multiprocessors (SMs) • Each SP core • Manages 96 concurrent threads • Thread state is maintained by hardware • The SP cores connect with four 64-bit DRAM partitions

  27. Example architecture • Each SM has • 8 SP cores • 2 special function units • Separate caches for instructions and constants • A multithreaded instruction unit • Shared memory (NUMA?)
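  These per-chip figures vary from one GPU generation to the next; as an aside (not part of the slides), the CUDA runtime can report the corresponding numbers for whatever GPU is installed through the standard cudaGetDeviceProperties call:

    // Query SM count, warp size and per-block shared memory of device 0.
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("GPU:                       %s\n", prop.name);
        printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("Warp size:                 %d threads\n", prop.warpSize);
        printf("Shared memory per block:   %zu bytes\n", prop.sharedMemPerBlock);
        printf("Compute capability:        %d.%d\n", prop.major, prop.minor);
        return 0;
    }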

  28. PROGRAMMING GPUS • Will focus on parallel computing applications

  29. Key idea • Must decompose problem into set of parallel computations • Ideally two-level to match GPU organization

  30. Example • Data start in one big array • The big array is split into small arrays • Each small array is further split into tiny pieces

  31. CUDA • CUDA programs are written in C • Provides three abstractions • Hierarchy of thread groups • Shared memory • Barrier synchronization

  32. Barrier synchronization • Barriers let threads • Wait for the completion of a computation step by the other threads, so they can • Exchange results • Start the next step

  33. Example • [Diagram: many tiny parallel computations running side by side] • Barrier = wait for each other • Exchange partial results • Barrier = wait for each other • Exchange partial results
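  A sketch of the pattern in this diagram as a CUDA block-level reduction (illustrative, not from the slides; it assumes the kernel is launched with 256 threads per block). Each block sums its "small array" in shared memory, and __syncthreads() is the barrier at which threads wait for each other before reading each other's partial results:

    __global__ void blockSum(const float *in, float *blockSums, int n) {
        __shared__ float partial[256];            // one slot per thread
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;

        partial[tid] = (i < n) ? in[i] : 0.0f;    // each thread loads one element
        __syncthreads();                          // barrier: all loads finished

        // Tree reduction: halve the number of active threads at each step.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();                      // barrier: exchange partial results
        }

        if (tid == 0)                             // one partial result per block
            blockSums[blockIdx.x] = partial[0];
    }

  The per-block results can then be combined by a second, much smaller launch (or on the CPU), which is the second level of the two-level decomposition sketched on slides 29 and 30.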

  34. Big fallacies • GPUs • Are not good for general computation • Cannot run double-precision arithmetic • Do not do floating point correctly • Cannot speed up O(n) algorithms
