
GPUs: Overview of Architecture and Programming Options

Lee Barford (firstname dot lastname at gmail dot com). Covers why parallel computing is now important, what GPUs are and what they provide, and enough GPU architecture to orient the discussion of programming them.


Presentation Transcript


  1. GPUs: Overview of Architecture and Programming Options Lee Barford firstname dot lastname at gmail dot com

  2. Outline
     • Why parallel computing is now important
     • What GPUs are and what they provide
     • Overview of GPU architecture
       • Enough to orient the discussion of programming them
     • Future changes
     • Three “languages” for programming GPUs
       • Those we’re not doing include CUDA Fortran, Python CUDA & CL bindings, WebCL

  3. Exponentially growing gap
     [Graph: serial application performance. Source: UC Berkeley ParLab]

  4. Graphics Processor (GPU) as Parallel Accelerator
     • Commodity priced, massively parallel floating point
     • Claimed performance on various problems: 50-2500x that of a CPU running serial code
     [Graph from http://drdobbs.com/high-performance-computing/231500166]

  5. The GPU as a Co-Processor to the CPU: The physical and logical connections
     [Diagram: the CPU, with main memory, connects through the chipset to the I/Os (video, Ethernet, USB hub, Firewire, …) and, over PCIe (slow), to the GPU and its GPU memory. The CPU sends the GPU control actions and code (kernels) to run.]
     • Running GPU code is like requesting asynchronous I/O
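The "asynchronous I/O" analogy on slide 5 can be sketched in CUDA C++ as follows. This is only an illustration under stated assumptions: the kernel `scale_by_two` and the helper `double_on_gpu` are hypothetical names, not from the slides, and the sketch uses pinned host memory because copies from ordinary pageable memory may block the CPU. The CPU enqueues a copy over PCIe, a kernel, and a copy back, and only later waits for completion.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical kernel: doubles every element in place.
__global__ void scale_by_two(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: doubles `values` on the GPU.
// Returns true only if allocation and the queued work all succeeded.
bool double_on_gpu(std::vector<float>& values) {
    const int n = static_cast<int>(values.size());
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float* pinned = nullptr;   // page-locked CPU buffer, needed for truly async copies
    float* dev = nullptr;      // buffer in GPU memory
    cudaStream_t stream = nullptr;
    bool ok = cudaMallocHost((void**)&pinned, bytes) == cudaSuccess &&
              cudaMalloc((void**)&dev, bytes) == cudaSuccess &&
              cudaStreamCreate(&stream) == cudaSuccess;
    if (ok) {
        std::copy(values.begin(), values.end(), pinned);

        // These three calls only enqueue work on the stream and return.
        cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice, stream);
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale_by_two<<<blocks, threads, 0, stream>>>(dev, n);
        cudaMemcpyAsync(pinned, dev, bytes, cudaMemcpyDeviceToHost, stream);

        // The CPU could do unrelated work here while the GPU runs.

        // Wait for the queued work, much like waiting on an I/O request.
        ok = cudaStreamSynchronize(stream) == cudaSuccess;
        if (ok) std::copy(pinned, pinned + n, values.begin());
    }
    if (stream) cudaStreamDestroy(stream);
    if (dev) cudaFree(dev);
    if (pinned) cudaFreeHost(pinned);
    return ok;
}
```

On a machine with a CUDA-capable GPU, calling `double_on_gpu` on a vector holding 1, 2, 3 should leave it holding 2, 4, 6. For inputs this small the two PCIe copies would be expected to dominate the run time, which is the overhead the slide labels "slow".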

  6. 0.5-3 years from now: Fusion of CPU and GPU
     [Diagram: CPU (multiple cores) and GPU served by a hardware task scheduler and sharing main memory and the I/O subsystem]
     • Running GPU code will be like queuing method pointers for future execution (as in C++11, TBB, TPL, PPL)

  7. Programming Tomorrow’s CPU will be Like Programming Today’s GPU
     • GPUs that compute will come “for free” with computers
     • Slow step of moving data to/from GPU will be eliminated
     • Hardware task scheduler for both CPU and GPU will
       • Almost eliminate OS & I/O overhead for invoking GPU kernels
       • Also almost eliminate OS overhead for invoking parallel tasks on CPU
     • AMD laptop chip available now (but no boards/systems)
     • NVIDIA GPU+ARM chip available now for battery-operated devices
     • Both promise desktop chips in the next year or two
     • Programming models will probably evolve from what we’ll cover
     • Course will use current, PCIe-based GPUs
       • We will be dealing with overheads that will pass away over the next few years

  8. CUDA (NVIDIA) GPU Compute Architecture: Many Simple, Floating-Point Cores

  9. Cores organized into groups
     • 32 cores (Streaming Multiprocessor) share:
       • Instruction stream
       • Registers
     • Execute same program (kernel)
       • SPMD: approximately, same place in same kernel at the same time
     • Act as 100s to 1000s more cores by switching context instead of waiting for memory
     • 1000s of virtual cores executing same lines of code together, but
       • Sharing limited resources
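To make "same place in same kernel at the same time" concrete, here is a minimal SPMD sketch in CUDA C++. The SAXPY computation and the wrapper name `saxpy_on_gpu` are illustrative choices, not taken from the slides. Every thread executes the identical kernel body and differs only in the index it computes for itself.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every thread runs this same code; only the index i differs per thread.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n) y[i] = a * x[i] + y[i];              // guard the last, partly full block
}

// Hypothetical host wrapper: returns a*x + y, or an empty vector on failure.
std::vector<float> saxpy_on_gpu(float a, const std::vector<float>& x,
                                std::vector<float> y) {
    const int n = static_cast<int>(x.size());
    if (n == 0 || y.size() != x.size()) return {};
    const size_t bytes = n * sizeof(float);

    float* dx = nullptr;
    float* dy = nullptr;
    bool ok = cudaMalloc((void**)&dx, bytes) == cudaSuccess &&
              cudaMalloc((void**)&dy, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);

        const int threads = 128;  // four groups of 32 threads per block
        const int blocks = (n + threads - 1) / threads;
        saxpy<<<blocks, threads>>>(n, a, dx, dy);

        // This blocking copy waits for the kernel to finish first.
        ok = cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (dx) cudaFree(dx);
    if (dy) cudaFree(dy);
    if (!ok) y.clear();
    return y;
}
```

With a = 2, x = {1, 2, 3} and y = {10, 20, 30}, the wrapper should return {12, 24, 36}. Launching far more threads than there are physical cores is deliberate: it gives the hardware other threads to switch to while some wait on memory.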

  10. GPU has multiple SMs
     • SMs run in parallel
       • Do not need to be executing same location in the same program at the same time
     • In aggregate, many 1000s of parallel copies of same kernel running simultaneously
     • Total of up to 1 Tflop/s at peak
     • CENTRAL SOFTWARE ISSUE:
       • How to generate and control this much parallelism
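One small, concrete piece of "generating this much parallelism" is choosing how many blocks of threads to launch and knowing how many SMs are available to run them. The sketch below assumes device 0 and a default block size of 256 threads; the names `LaunchShape`, `shape_for`, and `describe_device` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical description of a one-dimensional kernel launch.
struct LaunchShape {
    int blocks;
    int threads_per_block;
};

// Enough blocks to give each of n elements its own thread (rounds up).
LaunchShape shape_for(int n, int threads_per_block = 256) {
    return { (n + threads_per_block - 1) / threads_per_block, threads_per_block };
}

// Reports how much hardware parallelism device 0 offers.
// Returns false if the query fails, for example when no GPU is present.
bool describe_device(int* sm_count, int* warp_size, int* max_resident_threads) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    *sm_count = prop.multiProcessorCount;   // number of SMs
    *warp_size = prop.warpSize;             // threads that share an instruction stream
    *max_resident_threads =
        prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;
    return true;
}
```

For example, `shape_for(1000)` yields 4 blocks of 256 threads (1024 threads, of which 24 do nothing), and `shape_for(1025)` yields 5. Blocks are distributed across the SMs by the hardware, so the same launch runs unchanged on GPUs with different SM counts.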

  11. GPUs: Programming Options
     • Libraries: called from CPU code. Write no GPU code. Examples:
       • Image/video processing, dense & sparse matrix, FFT, random numbers
     • Generic programming for GPU
       • Thrust
       • Like C++ Standard Template Library
       • Specialize & use built-in data structures and algorithms
       • NVIDIA GPUs only
     • Programming the GPU directly
       • CUDA C/C++, OpenCL, WebCL, CUDA Fortran, various Python libraries
       • Write code that runs on GPU (kernels)
       • Write CPU code that directly controls and coordinates
         • Data movement between CPU memory and GPU memory
         • Startup of kernels on GPU
         • CPU processing of results from GPU when they become available
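As a rough illustration of the generic-programming option, the sketch below uses Thrust containers and algorithms in the style of the C++ Standard Template Library. The function names `sort_on_gpu` and `sum_on_gpu` are hypothetical; the Thrust calls themselves (`device_vector`, `sort`, `reduce`, `copy`) are part of the library. No kernel is written by hand.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper: sorts a copy of `input` on the GPU.
std::vector<int> sort_on_gpu(const std::vector<int>& input) {
    // Constructing the device_vector copies the data to GPU memory.
    thrust::device_vector<int> d(input.begin(), input.end());
    thrust::sort(d.begin(), d.end());               // runs on the GPU
    std::vector<int> out(input.size());
    thrust::copy(d.begin(), d.end(), out.begin());  // copies back to CPU memory
    return out;
}

// Hypothetical helper: sums `input` on the GPU.
int sum_on_gpu(const std::vector<int>& input) {
    thrust::device_vector<int> d(input.begin(), input.end());
    return thrust::reduce(d.begin(), d.end(), 0);   // parallel reduction, result on the CPU
}
```

Here `sort_on_gpu({3, 1, 2})` should return {1, 2, 3} and `sum_on_gpu({1, 2, 3, 4})` should return 10. The data movement and kernel startup that direct programming makes explicit still happen, but inside the library.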

  12. CUDA C/C++ vs. OpenCL
     CUDA C/C++
     • Proprietary (NVIDIA)
     • Code runs on NVIDIA GPUs
     • Reportedly 10-50% faster than OpenCL
     • Compiles at build time to binary code for particular targeted hardware
       • Specific NVIDIA hardware architecture versions
     • No compiler available at run time
     OpenCL
     • Open standard (Khronos)
     • Code runs on NVIDIA & AMD GPUs, x86 multicore, FPGAs (academic research) at the same time
     • Compiles at build time to intermediate form that is compiled at run time for the hardware that is present
     • Compiler is available at run time
       • Can execute downloaded or dynamically generated source code

  13. The Three Programming Environments We’ll Cover
     • Thrust:
       • Easy to write
       • Provided algorithms are among the fastest (e.g., sort)
       • NVIDIA GPUs only
     • CUDA C/C++:
       • Very efficient code
       • Lots of fussy detail to get that efficiency
       • Robust tool chains for Linux, Windows, MacOS
       • Specific to NVIDIA
     • OpenCL:
       • Write once, run many
       • Supports heterogeneous parallel machines (fusion)
       • Tool chains good enough for research
       • IMHO, will eventually replace CUDA C/C++

  14. Class Project Idea
     • Accurate edge finding in a 1D signal
       • Journal paper published on multicore version
       • Student project last year doing Thrust implementation
     • Project: Do CUDA version + performance tests
     • Paper combining previous student’s work with above: 60% probability of getting accepted in a particular IEEE conference
       • 3 co-authors, including previous student & Lee
       • Extended abstract due: Nov 6
       • Class project due during finals, same as everyone else
       • Camera-ready paper due: March 4
     • See or email me in the next week or two if interested

  15. Questions
