OpenCL

OpenCL “The open standard for parallel programming of heterogeneous systems.”

What is OpenCL? • According to the Khronos Group, OpenCL is “the open standard for parallel programming of heterogeneous systems.” • What does that mean? • First, some history on OpenCL.

The History of OpenCL • Apple was the driving force behind OpenCL 1.0, which was published through the Khronos Group in collaboration with AMD, Intel, Nvidia and others. • The first implementation of version 1.0 shipped with Mac OS X 10.6 (Snow Leopard). • OpenCL is a fully open standard, available royalty-free to anyone.

The History of OpenCL • Additional versions of OpenCL have since been released, adding new functionality. • OpenCL 1.1 – June 2010 • OpenCL 1.2 – November 2011 • OpenCL 2.0 – November 2013 • Not all vendors have adopted the latest versions for their hardware. Nvidia, notably, supported only up through version 1.1 at the time of this presentation, even on its latest video cards.

What is parallel processing? • Parallel processing is the capability to perform multiple actions or operations at the same time. • In a single system this can be done by dividing computing tasks between multiple cores or processors. • GPUs have a large number of smaller cores (up to thousands), which makes them significantly more efficient at processing many independent pieces of data in parallel.

What is a heterogeneous system? • At its simplest, a heterogeneous system is a system that has more than one kind of processor. • The different kinds of processors can be used to perform different or specialized tasks. • For example, in a traditional PC setup, the CPU can be used to perform serial tasks while parallel tasks can be run more efficiently on the GPU. • Other processor types covered by the OpenCL standard include digital signal processors (DSPs) and field-programmable gate arrays (FPGAs).

What does it all mean? • OpenCL is designed to let a programmer develop parallelized programs for systems that mix different processor types, such as CPUs, GPUs, DSPs and FPGAs. • OpenCL is also designed to run on any supporting device, so code is portable even between different processor types and can be set up to balance program loads automatically.

OpenCL and GPGPU • OpenCL is the most popular open framework for GPGPU, or general-purpose computing on graphics processing units. • GPGPU allows non-graphics calculations to be run on the GPU, which is especially well suited to highly parallel tasks, as mentioned previously.

OpenCL Details • OpenCL uses a host model, in which the host is connected to one or more compute devices, such as GPUs. • A context is a collection of devices in OpenCL. • Each compute device is in turn a collection of compute units, which are the cores located on that device. • Compute units are further broken down into processing elements, which can be collections of elements of the core, such as ALUs.

OpenCL Details • OpenCL code is primarily written in the OpenCL C language, which is based on C99. • The OpenCL C language provides a large number of built-in math, geometric, relational and vector functions. • Bindings are available for other languages, for example OpenCL.NET or Cloo for .NET languages like C#, JOCL for Java, and WebCL for JavaScript, among others.

OpenCL Details • A unit of executable code in OpenCL is referred to as a kernel. Kernels are analogous to functions in C. • Smaller kernels run more efficiently on GPUs because of their greater number of simpler cores, while lengthy kernels are better run on CPUs. • A program is a collection of these kernels as well as standard functions.

OpenCL Details • In OpenCL, executing code is divided into work-items, which are contained within workgroups. • A work-item is an instance of a kernel being executed by a processing element, and is analogous to a thread – in an image processing application, each work-item could be a pixel that is being modified. • A workgroup is a set of related work-items executing on a single core or compute unit – in the same application, this could be a block of pixels that are being modified and depend on each other.

OpenCL Details • When developing in OpenCL, the developer defines both global and local dimensions • The global dimensions define the problem space for the whole program – for example, in image processing this would be the dimension of the image • The local dimensions define the size of the workgroups – again, to use image processing as an example, this would be the individual blocks or sets of blocks in an algorithm
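The relationship between the global and local dimensions can be sketched in plain C. This is only an illustration for a one-dimensional problem with no global offset, not OpenCL API code: the names work_item_pos, num_groups and locate are hypothetical helpers. Inside a real kernel the same values come from the built-in functions get_global_id, get_group_id and get_local_id.

```c
#include <stddef.h>

/* Position of one work-item in a one-dimensional index space
 * (hypothetical helper type, not part of OpenCL). */
typedef struct {
    size_t group_id;  /* which workgroup the work-item belongs to */
    size_t local_id;  /* position of the work-item inside that workgroup */
} work_item_pos;

/* Number of workgroups needed to cover global_size work-items.
 * Assumes global_size is a multiple of local_size, as OpenCL 1.x requires. */
size_t num_groups(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}

/* Split a global ID into (group ID, local ID), mirroring the relation
 * global_id = group_id * local_size + local_id. */
work_item_pos locate(size_t global_id, size_t local_size)
{
    work_item_pos pos;
    pos.group_id = global_id / local_size;
    pos.local_id = global_id % local_size;
    return pos;
}
```

For example, a row of 1024 pixels with a local size of 64 is covered by 16 workgroups, and the work-item with global ID 130 is item 2 of workgroup 2.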

OpenCL Details • The memory structure of an OpenCL program is shown in the slide's diagram (not included in this transcript). • Although every work-item can access global memory, OpenCL provides no method for global synchronization, so a kernel cannot rely on global memory to pass data between workgroups during a single launch; algorithms that need this must use multiple passes. • Workgroups, however, do have synchronization methods that can be used internally.
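The multiple-pass approach can be illustrated with a sum reduction. The sketch below is a serial plain-C emulation, not OpenCL code, and the function names reduce_pass1 and reduce_pass2 are hypothetical. It assumes the input length is a multiple of the workgroup size. In a real kernel, the work-items of one group would cooperate through __local memory and synchronize with barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <stddef.h>

/* Pass 1: each workgroup sums its own slice of the input and writes one
 * partial result. Groups never read each other's data in this pass. */
void reduce_pass1(const float *in, size_t global_size, size_t local_size,
                  float *partial /* one slot per workgroup */)
{
    size_t groups = global_size / local_size;
    for (size_t g = 0; g < groups; ++g) {          /* groups are independent */
        float sum = 0.0f;
        for (size_t l = 0; l < local_size; ++l) {  /* work-items of one group */
            sum += in[g * local_size + l];
        }
        partial[g] = sum;
    }
}

/* Pass 2: a second launch combines the per-group partial sums. The end of
 * pass 1 serves as the only point where all groups are known to be done. */
float reduce_pass2(const float *partial, size_t groups)
{
    float total = 0.0f;
    for (size_t g = 0; g < groups; ++g) {
        total += partial[g];
    }
    return total;
}
```

Splitting the work this way means no workgroup ever waits on another within a pass, which is why the lack of global synchronization does not get in the way.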

OpenCL Details • Kernels are written similarly to functions, begin with the keyword __kernel, and must return void, for example: • __kernel void demoKernel(__global float* exampleGlobal, __local float* exampleLocal) { } • In the parameters above, __global refers to a location in the device’s global memory. • __local variables are shared amongst all members of a workgroup. • __constant memory parameters and variables can also be declared.
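As a slightly fuller illustration of the kernel syntax, the sketch below holds the OpenCL C source of a hypothetical vector_add kernel as a C string, which is how a host program commonly carries kernel code before handing it to the runtime (typically through clCreateProgramWithSource). The kernel and the serial helper vector_add_reference are assumptions made for this example and are not taken from the presentation's demo.

```c
#include <stddef.h>

/* OpenCL C source for a hypothetical kernel named vector_add.
 * A host program would pass this string to the OpenCL runtime to compile. */
static const char *vector_add_source =
    "__kernel void vector_add(__global const float *a,\n"
    "                         __global const float *b,\n"
    "                         __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0); /* one work-item per element */\n"
    "    out[i] = a[i] + b[i];\n"
    "}\n";

/* Serial C equivalent of the kernel: the loop variable plays the role
 * that get_global_id(0) plays for each work-item. */
void vector_add_reference(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The kernel body contains no loop: the runtime launches one work-item per element, and each work-item handles only the index it receives from get_global_id(0).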

OpenCL Code Example • Now, we’ll take a look at an example of some OpenCL code in Visual Studio, the commands and steps needed to set up OpenCL, and a couple of kernels used for solving some basic math applications.

OpenCL Comparison - CUDA • CUDA is Nvidia’s proprietary GPGPU framework. • CUDA is considered easier to work with, requiring less setup from the programmer to get running. • Nvidia hardware supports far newer versions of CUDA than of OpenCL, with more functionality. • However, CUDA only operates on Nvidia hardware, whereas OpenCL operates on hardware from Nvidia, AMD, Intel and others. • Both have similar default languages based on C, so moving kernel code between the two requires relatively little modification.

OpenCL Comparison - DirectCompute • DirectCompute is Microsoft’s GPGPU framework, linked to their DirectX API. • DirectCompute operates on any video card capable of supporting DirectX 10 or 11. • However, although it runs on a wide range of cards, DirectCompute is platform dependent and runs only on Windows. • DirectCompute’s programming language is based on HLSL (High-Level Shading Language), Microsoft’s proprietary language, so moving code between DirectCompute and OpenCL is more difficult.

OpenCL – The Future • OpenCL has a firm base of support from AMD, Intel and Apple, among others. • There is a cross-industry effort to make getting started with OpenCL programming easier.

OpenCL – The Future • However, there are still possible speed bumps, such as Nvidia’s lack of support for newer versions • Microsoft can also be a tough competitor in any market space and has a huge install base with DirectX both on the desktop and in consoles

References • Gaster, Benedict. Heterogeneous computing with OpenCL, revised OpenCL 1.2 edition. Rev. OpenCL 1.2 ed. Waltham, Mass.: Morgan Kaufmann, 2013. Print. • Banger, Ravishekhar. OpenCL programming by example: a comprehensive guide on OpenCL programming with examples. Birmingham: Packt, 2013. Print. • "OpenCL Zone - AMD." AMD RSS2. N.p., n.d. Web. <http://developer.amd.com/resources/heterogeneous-computing/opencl-zone/>. • "OpenCL." NVIDIA Developer Zone. N.p., n.d. Web. <https://developer.nvidia.com/opencl>. • "What Is GPU Computing?" What is GPU Computing?. N.p., n.d. Web. <http://www.nvidia.com/object/what-is-gpu-computing.html>. • "The Khronos Group." The Khronos Group Inc. N.p., n.d. Web. <https://www.khronos.org/>.
