140 likes | 273 Vues
This document presents an overview of OpenCL and its applications within the Amsterdam Medical Centre internship program. It delves into the key concepts of OpenCL, including its runtime system, kernel execution, and accelerated devices. The text discusses the challenges of low-level C/C++ programming in OpenCL and emphasizes the importance of high-level frameworks for easing application development. It also describes dataflow processing frameworks and how they can enhance computational power using heterogeneous architectures across various devices, enabling better load balancing and scheduling.
E N D
Processing Framework Sytse van Geldermalsen Masters Grid Computing – University of Amsterdam Internship at Amsterdam Medical Centre
Contents • OpenCL • Concepts • Problems • Research and projects • Processing Framework • Example
OpenCL • Requires vendor support • ARM, AMD, Intel, Apple, Vivante Corporation, STMicroelectronics International NV, IBM Corporation, Imagination Technologies, Creative Labs, NVIDIA • Portable • Works on heterogeneous architecture • Provides great computational power http://www.khronos.org/opencl/
How much computational power? http://www.r-bloggers.com/cpu-and-gpu-trends-over-time/
Key Concepts • OpenCL - Runtime system • Kernel • Accelerated Device
OpenCL Kernel // Sequential c/c++ code for(intx=0;x<1024;x++) { for(inty=0;y<1024;y++) { matrix[x][y]=matrix[x][y]+1;// Code is run 1048576 times for this thread } } // Parallel kernel code kernelvoidMatrixIncrement(globalint**matrix) { intx=get_global_id(0); inty=get_global_id(1); matrix[x][y]=matrix[x][y]+1; // Code is run once for this thread }
Problems • Low level C/C++ Library • A lot of overhead code • Things can and will go wrong
Ease of OpenCL application development High Level Frameworks Increase ease of application development Tools: Debuggers, Profilers, Middleware/Library: Video, Imaging, Math/Sciences, Physics Wrappers C++, C#, Java, Python, Javascript OpenCL C Library Drivers and Hardware, CPU’s, GPU’s, Cell Processors, FPGA’s
High Level Frameworks • Research has been done in: • Scheduling multiple kernels on device • Overlapping memory transfers with kernel execution • Load balancing • Distribution over GPU’s on the grid • Task scheduling
Dataflow Processing Framework • In a nutshell: • Based on ideas of different research • Increase the ease of development • Uses the dataflow concept • Simplicity • Asynchronous overlapped data transfers and kernel executions
Conceptual Example Input B Input A Legend: Async Process Async memory xfer CPU Process GPU Process Data Dependency Data Output 1 2 3 4 5 6 Output A Output B
Programming with the Framework • Programmer defines a number of processes and data • The process uses a OpenCL kernel or a standard C/C++ function • User defines the arguments of the kernel with the defined data • These processes compute on user selected device: CPU/ GPU/FPGA… etc • Signal the framework to run
Programming Example Framework ProcessingFrameworkpf; ProcessingComponentone,two,three,four; DeviceMemoryArrayA,ArrayB,Output; ArrayA=pf.CreateInputMemory(mem_size); ArrayB=pf.CreateInputMemory(mem_size); Output=pf.CreateOutputMemory(mem_size); one=pf.CreateAPC(pf.GPUDevice(),"Sort"); two=pf.CreateAPC(pf.CPUDevice(),"Sort"); three=pf.CreateAPC(pf.GPUDevice(),"Filter"); four=pf.CreateAPC(pf.CPUDevice(),"Search"); one.SetArg(0,ArrayA); one.SetWorkSize(arr_size); two.SetArg(0,ArrayB); two.SetWorkSize(arr_size); three.SetDependency(one,0,ArrayA); three.SetDependency(two,1,ArrayB); three.SeWorkSize(arr_size); four.SetDependency(three,0,ArrayA); four.SetArg(1,Output); four.SetWorkSize(arr_size); pf.Run(); Array A Array B 1 Sort 2 Sort 3 Filter 4 Search Output