OpenCL


Presentation Transcript


  1. Peter Holvenstot OpenCL

  2. OpenCL • Designed as an API and language specification • Standard maintained by the Khronos Group • Current versions: 1.0, 1.1, and 1.2 • Manufacturers release their own SDKs and drivers • Major backers: Apple, AMD/ATI, Intel

  3. OpenCL • Alternative to CUDA • Not limited to ATI GPUs • Designed for “heterogeneous computing” • Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs (see the device-discovery sketch below)
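To make the “many devices” point concrete, here is a minimal device-discovery sketch (host-side C, error handling omitted, array sizes chosen arbitrarily): it lists every CPU, GPU, or other OpenCL device exposed by the installed platforms. On Apple platforms the header is <OpenCL/opencl.h> rather than <CL/cl.h>.

```c
/* Minimal device-discovery sketch: lists every OpenCL device exposed by the
 * installed platforms, whether it is a CPU, GPU, or accelerator.
 * Error handling is omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint num_platforms;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[16];
        cl_uint num_devices;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256];
            cl_device_type type;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
            printf("Platform %u, device %u: %s (%s)\n", p, d, name,
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU" :
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "other");
        }
    }
    return 0;
}
```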

  4. OpenCL • Host-program/kernel structure similar to CUDA's • A set of compute devices is called a 'context' • Kernels are executed by 'processing elements' • Kernels can be compiled at run time or at build time (a minimal host-side sketch follows)
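A minimal sketch of that host-side flow, assuming a single GPU device and omitting all error checking: create a context and command queue, compile a small kernel from source at run time, and enqueue it over an NDRange. The kernel name `scale` and the buffer size are illustrative.

```c
/* Host-side flow sketch: context -> run-time compilation -> kernel launch.
 * Assumes a GPU device exists; error checks are omitted for brevity. */
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *x, float a) {"
    "    size_t i = get_global_id(0);"
    "    x[i] *= a;"
    "}";

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* A context groups the devices the host program will use. */
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Run-time compilation: the source string is built for this device now. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", NULL);

    float data[1024] = {0};
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);
    float a = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &a);

    size_t global = 1024;               /* one work-item per element */
    clEnqueueNDRangeKernel(q, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);
    clFinish(q);
    return 0;
}
```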

  5. OpenCL • Task parallelism – many kernels running at once • OpenCL 1.2 – a device can be partitioned down to a single compute unit • Built-in kernels for device-specific functionality
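As a sketch of the OpenCL 1.2 partitioning feature, the hypothetical helper below uses clCreateSubDevices with CL_DEVICE_PARTITION_EQUALLY to split a device into sub-devices of one compute unit each; error handling is omitted.

```c
/* Device-fission sketch (OpenCL 1.2): split `device` into sub-devices of one
 * compute unit each. `partition_per_compute_unit` is an illustrative name;
 * `device` is assumed to come from clGetDeviceIDs. */
#include <CL/cl.h>

cl_uint partition_per_compute_unit(cl_device_id device,
                                   cl_device_id *sub_devices,
                                   cl_uint max_sub_devices) {
    /* Partition equally into pieces of 1 compute unit; the list is 0-terminated. */
    const cl_device_partition_property props[] = {
        CL_DEVICE_PARTITION_EQUALLY, 1, 0
    };
    cl_uint num_created = 0;
    clCreateSubDevices(device, props, max_sub_devices, sub_devices, &num_created);
    return num_created;   /* each sub-device can get its own context/queue */
}
```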

  6. Advantages • Same code can be run on different devices • Can also be run on NVIDIA GPUs! • AMD/ATI attempting to integrate compute elements into other platforms (Accelerated Processing Units) • A limited library of portable math routines • Covers the most common BLAS and FFT routines

  7. Performance

  8. Performance

  9. Performance

  10. Disadvantages • No “official” implementation • Vendors may meet the spec differently or add their own restrictions • Apple, for example, restricts work-group sizes • Devices need appropriate settings to perform well • Different capabilities → different performance • Solution: a tuning/load-balancing framework (see the work-group-size sketch below)
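One common piece of such tuning is choosing the work-group size per kernel/device pair instead of hard-coding one value for every vendor. The sketch below (illustrative helper name, no error handling) queries the device and kernel limits and rounds down to the preferred multiple.

```c
/* Per-device tuning sketch: pick a work-group size from the limits this
 * kernel/device pair actually reports. `pick_local_size` is illustrative. */
#include <CL/cl.h>

size_t pick_local_size(cl_kernel kernel, cl_device_id device) {
    size_t device_max = 0, kernel_max = 0, preferred_multiple = 1;

    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(device_max), &device_max, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernel_max), &kernel_max, NULL);
    clGetKernelWorkGroupInfo(kernel, device,
                             CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(preferred_multiple), &preferred_multiple, NULL);

    /* Largest size allowed by both limits, rounded down to the preferred
     * multiple (a warp/wavefront-friendly value on GPUs, usually 1 on CPUs). */
    size_t limit = kernel_max < device_max ? kernel_max : device_max;
    size_t local = (limit / preferred_multiple) * preferred_multiple;
    return local > 0 ? local : 1;
}
```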

  11. Non-Optimized Performance

  12. Non-Optimized Performance

  13. Restrictions • No recursion, variadic functions, or function pointers • No dynamic memory allocation from the device • No native variable-length arrays or guaranteed double precision • Some restrictions can be worked around with extensions (e.g., cl_khr_fp64 for double precision)
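A kernel-side illustration of these restrictions, assuming the device advertises cl_khr_fp64: double precision is enabled via the extension pragma, the scratch array has a fixed compile-time size instead of being variable-length, and the computation is written as a loop rather than recursion. The kernel itself is a made-up example.

```c
/* Illustrative OpenCL C kernel written within the restrictions above. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

#define SCRATCH_SIZE 64   /* fixed size: no VLAs or device-side malloc */

__kernel void prefix_sum_rows(__global const double *in,
                              __global double *out,
                              const int row_len) {
    double scratch[SCRATCH_SIZE];     /* private, fixed-size buffer */
    int row = get_global_id(0);

    /* Iterative prefix sum; a recursive formulation is not allowed. */
    double running = 0.0;
    for (int i = 0; i < row_len && i < SCRATCH_SIZE; ++i) {
        running += in[row * row_len + i];
        scratch[i] = running;
    }
    for (int i = 0; i < row_len && i < SCRATCH_SIZE; ++i)
        out[row * row_len + i] = scratch[i];
}
```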

  14. Terminology (OpenCL ↔ CUDA) • Stream Core ↔ Scalar Core • Compute Unit ↔ Streaming Multiprocessor • Wavefront ↔ Warp • Intermediate Language (IL) ↔ PTX

  15. Terminology – memory (OpenCL ↔ CUDA) • Host Memory ↔ Host Memory • Global Memory ↔ Global/Device Memory • Global Memory ↔ Local Memory • Constant Memory ↔ Constant Memory • Local Memory ↔ Shared Memory • Private Memory ↔ Registers
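The address-space qualifiers in OpenCL C line up with this table; the illustrative kernel below labels each one with its CUDA counterpart.

```c
/* Memory-space sketch: __global (CUDA global/device), __constant (constant),
 * __local (CUDA shared), and unqualified locals in private memory (registers). */
__kernel void blend(__global const float *in,     /* global memory          */
                    __global float *out,
                    __constant float *weights,    /* constant memory        */
                    __local float *tile)          /* local memory ("shared") */
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);
    float acc = 0.0f;                             /* private memory (registers) */

    tile[lid] = in[gid];                          /* stage through local memory */
    barrier(CLK_LOCAL_MEM_FENCE);

    acc = tile[lid] * weights[0];
    out[gid] = acc;
}
```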

  16. Terminology – indexing (OpenCL ↔ CUDA) • NDRange ↔ Grid • Work-group ↔ Block • Work-item ↔ Thread • Global ID ↔ Thread ID • Work-group ID ↔ Block Index • Local ID ↔ Thread Index
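The same mapping in code: an illustrative kernel whose index queries are annotated with their (one-dimensional) CUDA equivalents.

```c
/* Index-mapping sketch: OpenCL work-item functions vs. CUDA built-ins. */
__kernel void copy(__global const float *in, __global float *out) {
    size_t gid   = get_global_id(0);   /* CUDA: blockIdx.x * blockDim.x + threadIdx.x */
    size_t group = get_group_id(0);    /* CUDA: blockIdx.x  */
    size_t lid   = get_local_id(0);    /* CUDA: threadIdx.x */
    size_t gsize = get_global_size(0); /* CUDA: gridDim.x * blockDim.x */

    (void)group; (void)lid; (void)gsize;  /* shown only for the mapping */
    out[gid] = in[gid];
}
```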

  17. References • http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf • https://wiki.aalto.fi/download/attachments/40025977/Cuda+and+OpenCL+API+comparison_presented.pdf • http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html • http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf • http://www.netlib.org/lapack/lawnspdf/lawn228.pdf
