Few words about OpenCL
Dmytro Konobrytskyi
Content
• Introduction
• Supported devices
• OpenCL “Hello world”
• OpenCL vs. CUDA performance
• Conclusion / Questions
History
• Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), and other processors.
• OpenCL 1.0 - 8 December 2008
• 5 August 2009, AMD: SDK
• 28 September 2009, Nvidia: drivers and SDK
• OpenCL 1.1 - 14 June 2010
• OpenCL 1.2 - 15 November 2011
Supported devices
• Nvidia GPU
  • On CUDA architecture
• AMD CPU, GPU, APU
• Intel CPU
  • SSE/AVX support
  • IGP support starting with Ivy Bridge
• Multi-core ARM CPUs
• ZiiLABS (Creative) ZMS processor
• Mobile phone GPUs?
OpenCL platform model
Source: http://www.nvidia.com/content/GTC/documents/1409_GTC09.pdf
OpenCL memory model
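The figure for this slide did not survive. As a rough reminder, the OpenCL specification defines four address spaces: global (visible to all work-items), constant (read-only global), local (shared within one work-group), and private (per work-item). The sketch below emulates three of them in plain Python, with no OpenCL runtime involved; the function name `group_sum` and the per-group sum are illustrative assumptions, not something taken from the talk.

```python
def group_sum(global_data, local_size):
    """Emulate per-work-group sums using global, local, and private memory."""
    num_groups = len(global_data) // local_size
    # "Global" memory: one result slot per work-group, visible to everyone.
    partial_sums = [0] * num_groups
    for group_id in range(num_groups):
        # "Local" memory: scratch space shared by the work-items of one group.
        local_mem = [0] * local_size
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            # "Private" memory: a value that only this work-item sees.
            private_value = global_data[global_id]
            local_mem[local_id] = private_value
        # A real kernel would synchronize here with barrier(CLK_LOCAL_MEM_FENCE).
        partial_sums[group_id] = sum(local_mem)
    return partial_sums


data = list(range(16))
print(group_sum(data, 4))
```

On a real device the work-items of a group run in parallel, which is why the barrier matters; the sequential loops here only show which data lives where.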
OpenCL “Hello world”
Go to Visual Studio
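The live Visual Studio demo is not part of these slides. As a stand-in, here is a minimal sketch in plain Python (no OpenCL runtime needed) of what a vector-addition "hello world" typically does: the kernel body runs once per work-item, and each work-item uses its global ID to pick one element. The OpenCL C source in the string is for reference only and is not compiled; the names `vec_add` and `run_ndrange` are hypothetical and this is not the code from the talk.

```python
# Reference only: what such a kernel might look like in OpenCL C.
KERNEL_SOURCE = """
__kernel void vec_add(__global const float* a,
                      __global const float* b,
                      __global float* c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
"""


def vec_add(global_id, a, b, c):
    """Python stand-in for the kernel body: one call per work-item."""
    c[global_id] = a[global_id] + b[global_id]


def run_ndrange(kernel, global_size, *args):
    """Emulate a 1-D NDRange launch by calling the kernel once per global ID.

    A real device runs work-items in parallel; this loop is sequential.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


n = 8
a = [float(i) for i in range(n)]
b = [10.0 * i for i in range(n)]
c = [0.0] * n

run_ndrange(vec_add, n, a, b, c)
print(c)
```

In a real host program the launch is surrounded by setup calls from the OpenCL C API, roughly in this order: `clGetPlatformIDs`, `clGetDeviceIDs`, `clCreateContext`, `clCreateCommandQueue`, `clCreateProgramWithSource`, `clBuildProgram`, `clCreateKernel`, `clCreateBuffer`, `clSetKernelArg`, `clEnqueueNDRangeKernel`, and `clEnqueueReadBuffer`.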
OpenCL vs. CUDA performance
OpenCL vs. CUDA performance
• How should we compare performance?
  • The same code vs. optimized algorithms
  • The latest drivers vs. the most stable drivers
  • The latest hardware vs. the most popular hardware
  • Raw math and simple algorithms vs. complicated real-world algorithms vs. all possible algorithms
• Remember that Nvidia GPUs use the same hardware instruction set for both CUDA and OpenCL
OpenCL vs. CUDA performance
• There is no single right answer to all of these questions, and testing takes a lot of time.
• Test results may also stop being valid once new drivers are released.
• So we will look at existing test results available on the web, from different people, algorithms and hardware, and try to see the trends in the data.
www.sisoftware.net, Sep 2009
• Typical arithmetic results
• Environment: Windows Vista x64 SP2; Catalyst 9.11 video / STREAM 1.4.427 / OpenCL 1.0 Beta 4; ForceWare 190.89 video / CUDA 2.3 / OpenCL 1.0 live release.
Source: http://www.sisoftware.net/?d=qa&f=gpgpu_gpu_perf&l=en&a=oca
www.sisoftware.net, Sep 2009
• Typical memory bandwidth results
• Environment: Windows Vista x64 SP2; Catalyst 9.11 video / STREAM 1.4.427 / OpenCL 1.0 Beta 4; ForceWare 190.89 video / CUDA 2.3 / OpenCL 1.0 live release.
Source: http://www.sisoftware.net/?d=qa&f=gpgpu_gpu_perf&l=en&a=oca
Accelereyes blog, May 2010 (C2050)
Source: http://blog.accelereyes.com/blog/2010/05/10/nvidia-fermi-cuda-and-opencl/
Paper: A Performance Comparison of CUDA and OpenCL, by Kamran Karimi, May 2010
• Adiabatic QUantum Algorithms (AQUA), a Monte Carlo simulation
• Table: kernel execution and GPU data transfer times in seconds
Source: http://arxiv.org/abs/1005.2581
Accelereyes webinar, Feb 2012
Accelereyes: “Our OpenCL support is new and not nearly as mature as our support of CUDA. But our initial OpenCL support is better than our initial CUDA support was when we first launched our CUDA products. And we expect OpenCL to continue to mature rapidly in the near future.”
Source: http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/
Kyle Spafford of Oak Ridge National Laboratory, Feb 2012 (three slides of results)
Source: http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2012-02-20/13-shoc.pdf
OpenCL vs. CUDA performance
• Conclusions:
  • Performance of simple math operations was the same initially and is the same now;
  • OpenCL does not have access to a few hardware features (texture, cache size selection), so algorithms that rely on them are slower;
  • OpenCL uses more accurate special functions by default (but can use native functions);
  • OpenCL was slower initially, but modern implementations are as fast as CUDA.
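The point about special functions can be illustrated with a small accuracy experiment. OpenCL C offers `native_` variants such as `native_sin` whose accuracy is implementation-defined, traded for speed. The sketch below is plain Python and does not call OpenCL; `fast_sin` is a hypothetical stand-in (a three-term Taylor series) chosen only to show how a cheaper approximation loses accuracy, and it says nothing about what any particular device actually computes.

```python
import math


def fast_sin(x):
    """Hypothetical lower-accuracy sine: three-term Taylor series around 0.

    This is NOT how native_sin is implemented; real native function
    accuracy is implementation-defined.
    """
    return x - x**3 / 6.0 + x**5 / 120.0


# Sample [0, pi/2] and measure the worst-case absolute error
# against the full-precision library function.
xs = [i * (math.pi / 2) / 100 for i in range(101)]
max_err = max(abs(math.sin(x) - fast_sin(x)) for x in xs)
print(f"max abs error on [0, pi/2]: {max_err:.2e}")
```

When comparing OpenCL and CUDA timings, it is worth checking that both sides use the same accuracy setting (for example, default functions on both, or fast variants on both), since otherwise the benchmark measures precision choices rather than the platforms.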
A conclusion is supposed to be here, but let’s just discuss it together