High Performance Computing with CUDA™ Supercomputing 2011 Tutorial

High Performance Computing with CUDA™Supercomputing 2011 Tutorial Cyril Zeller, NVIDIA Corporation

Welcome • Goal: an introduction to high performance computing with CUDA • CUDA = NVIDIA’s architecture for GPU computing • Outline: • Motivation and introduction • CUDA C/C++ • CUDA Fortran and CUDA libraries • Optimizations • Multi-GPU programming • Case studies

GPUs are Fast! 8x Higher Linpack CPU 1U Server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz,48 GB memory, $7K, 0.55 kw GPU-CPU 1U Server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kw

World’s Fastest MD Simulation Sustained Performance of 1.87 Petaflops/s Institute of Process Engineering (IPE) Chinese Academy of Sciences (CAS) Used all 7168 Tesla GPUs on Tianhe-1A GPU Supercomputer MD Simulation for Crystalline Silicon

World’s Greenest Petaflop Supercomputer Tsubame 2.0Tokyo Institute of Technology 1.19 Petaflops 4,224 Tesla M2050 GPUs

Increasing Number of Professional CUDA Applications Available Now Future • TauCUDA • Perf Tools • PGI CUDA-X86 • CUDA C/C++ • Parallel Nsight • Vis Studio IDE • NVIDIA • Video Libraries • ParaTools • VampirTrace • PGI • Accelerators • EMPhotonics • CULAPACK • Allinea DDTDebugger • Platform LSF • Cluster Mgr • GPU.net Tools & Libraries • NVIDIA NPP • Perf Primitives • PGI Fortran • Thrust C++ • Template Lib • Bright Cluster • Manager • MAGMA • GPU Packages • For R Stats Pkg • CAPS HMPP • pyCUDA • R-Stream • Reservoir Labs • PBSWorks • MOAB • Adaptive Comp • Torque • Adaptive Comp • TotalView • Debugger • IMSL • Headwave Suite • OpenGeoSolns • OpenSEIS • GeoStar Seismic • Acceleware • RTM Solver • StoneRidge • RTM • Seismic City • RTM • Tsunami • RTM • Schlumberger • Petrel Oil & Gas • Paradigm • VoxelGeo • Paradigm • GeoDepth RTM • ffA SVI Pro • Paradigm • SKUA • VSG • Open Inventor • VSG • Avizo • SVI Pro • SEA 3D • Pro 2010 • Schlumberger • Omega Numerical Analytics • LabVIEW • Libraries • AccelerEyes • Jacket: MATLAB • MATLAB • Mathematica • Murex • MACS • NumerixCounterpartyRisk • Aquimin • AlphaVision • HanweckVolera • Options Analysi • NAG • RNG • SciComp • SciFinance Finance • Siemens • 4D Ultrasound • Digisens • CT • Schrodinger • Core Hopping • Useful Prog Medical Imag • ASUCA • Weather Model Other • MVTech • Mach Vision • Manifold • GIS • Dalsa • Mach Vision • WRF • Weather • Announced • Available

Increasing Number of Professional CUDA Applications Available Now Future • Acellera • ACEMD • AMBER • NAMD • GROMACS • GROMOS • HOOMD Bio- Chemistry • GAMESS • TeraChem • BigDFT • ABINT • VMD • LAMMPS • DL-POLY • CUDA-BLASTP • CUDA-EC • CUDA-MEME • CUDA SW++ • OpenEye ROCS Bio-Informatics • GPU-HMMR • MUMmerGPU • PIPER • Docking • HEX Protein • Docking • Agilent • EMPro 2010 • CST Microwave • SPEAG • SEMCAD X • ANSOFT Nexxim • Agilent ADS • SPICE Sim • Remcom • XFdtd • Synopsys • TCAD • Gauda OPC EDA • Metacomp • CFD++ • ACUSIM/Altair • AcuSolve • Autodesk • Moldflow • ANSYS • Mechanical • SIMULIA • Abaqus/Std • Impetus • AFEA • FluiDynaCulisesOpenFOAM • LSTC • LS-DYNA 972 • MSC.Software • Marc CAE • Adobe • Premier Pro • Elemental • Live & Server • MS Expression Encoder • MotionDSP • Ikena Video • MainConcept • CUDA H.264 • Sorenson • Squeeze 7 • Fraunhofer • JPEG2000 Video • Bunkspeed • Shot (iray) • Refractive SW • Octane • Chaos GroupV-Ray RT • Autodesk 3ds Max (iray) • Dassault • Catia v6 (iray) • Cebas • finalRender • Lightworks • Artisan, Author Rendering • mental images • iray (OEM) • NVIDIA OptiX (SDK) • Caustic • OpenRL (SDK) • Weta Digital • PantaRay • Works Zebra • Zeany • Announced • Available

CUDA by the Numbers 300,000,000 CUDA Capable GPUs 500,000 CUDA Toolkit Downloads 100,000 Active CUDA Developers 400 Universities Teaching CUDA 100 % OEMs offer CUDA GPU PCs

GPU Computing Applications GPU Computing Applications Libraries & Middleware C C++ CUBLAS OpenCL™ CUFFT CULAPACK Direct Compute NPP & CUDPP Video Fortran PhysXPhysics OptiXRay tracing Java & Python mental rayirayRendering Reality Server3D web services NVIDIA GPUwith CUDA Parallel Computing Architecture OpenCL is trademark of Apple Inc. used under license to the Khronos Group Inc.

Tesla Data Center & Workstation GPU Solutions Tesla C-series GPUs C2070| C2050 Tesla M-series GPUs M2090 | M2070 | M2050 Servers & Blades Workstations

NVIDIA Developer Ecosystem Parallelizing Compilers GPU Compilers Numerical Packages Debuggers & Profilers C C++ Fortran OpenCL DirectCompute Java Python PGI Accelerator CAPS HMPP mCUDA OpenMP MATLAB Mathematica NI LabView pyCUDA cuda-gdb NV Visual Profiler Parallel Nsight Visual Studio Allinea TotalView Libraries BLAS FFT LAPACK NPP Video Imaging GPULib GPGPU Consultants & Training OEM Solution Providers ANEO GPU Tech

Parallel Nsight Visual Studio Visual Profiler Windows/Linux cuda-gdb Linux/Mac

Schedule Beginner Beginner Intermediate IntermediateAdvanced

Schedule IntermediateAdvanced IntermediateAdvanced

High Performance Computing with CUDA™ Supercomputing 2011 Tutorial