Combining MPI and GPGPU for Monte Carlo Applications

Andrew Dunkman and Nathan Eloe

Overview
• Monte Carlo Simulations: The Ising Model
• Two approaches to a parallel solution:
  • MPI
  • GPGPU (CUDA/OpenCL)
• Combining solutions (MPI/GPGPU)

Monte Carlo Simulation
• Method of evaluating multidimensional integrals.
• Take a weighted average of points sampled in a bounded region as an approximation of the integral’s value.
• Applications in statistics and physics.
• Inherently parallelizable.
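The averaging idea above can be sketched in a few lines. This is a minimal illustration, assuming uniform sampling (so every point carries equal weight); the function name `mc_integrate`, the seed, and the unit-disk test integrand are hypothetical choices made for the example and do not come from the slides.

```python
import random

def mc_integrate(f, lower, upper, samples, seed=0):
    """Estimate the integral of f over the box [lower, upper] by uniform sampling."""
    rng = random.Random(seed)
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    total = 0.0
    for _ in range(samples):
        point = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        total += f(point)
    # Box volume times the mean sampled value approximates the integral.
    return volume * total / samples

def inside_unit_disk(p):
    """Indicator function: 1 inside the unit disk, 0 outside."""
    return 1.0 if p[0] * p[0] + p[1] * p[1] <= 1.0 else 0.0

# The integral of the indicator over [-1, 1] x [-1, 1] is the disk's area, pi.
estimate = mc_integrate(inside_unit_disk, [-1.0, -1.0], [1.0, 1.0], 200_000)
print(estimate)
```

Each sample is independent of the others, which is why the method splits so easily across processes: every worker can draw its own points and only the partial sums need to be combined.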

The Ising Model: A Specific Monte Carlo Simulation
• Simplistic model of a magnetic material.
• Square lattice of sites, each with a magnetic spin of either -1 or +1.
• During each Monte Carlo sweep, each site is given the chance to change its polarity.
• The probability of a site changing its polarity is [equation not preserved in the transcript]
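A sweep of this kind can be sketched as follows. The flip probability used on the slide is not available here, so this sketch assumes the common Metropolis rule (accept a flip with probability min(1, exp(-ΔE/T))), with coupling J = 1, no external field, Boltzmann constant 1, and periodic boundaries. All of those choices, and the names `energy_change` and `sweep`, are assumptions for illustration rather than the presenters' implementation.

```python
import math
import random

def energy_change(lattice, i, j):
    """Energy change from flipping site (i, j), with J = 1 and periodic edges."""
    n = len(lattice)
    neighbors = (lattice[(i - 1) % n][j] + lattice[(i + 1) % n][j]
                 + lattice[i][(j - 1) % n] + lattice[i][(j + 1) % n])
    return 2 * lattice[i][j] * neighbors

def sweep(lattice, temperature, rng):
    """Give every site one chance to flip; returns the number of flips."""
    n = len(lattice)
    flips = 0
    for i in range(n):
        for j in range(n):
            delta = energy_change(lattice, i, j)
            # Assumed Metropolis rule: always accept a flip that lowers the
            # energy, otherwise accept with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                lattice[i][j] = -lattice[i][j]
                flips += 1
    return flips

rng = random.Random(1)
n = 16
lattice = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
for _ in range(50):
    sweep(lattice, temperature=1.5, rng=rng)
magnetization = sum(map(sum, lattice)) / (n * n)
print(magnetization)
```

Note that this sequential version updates sites in place, so later sites see earlier flips within the same sweep. A parallel version that depends only on the state at the start of the sweep would read from one copy of the lattice and write to another.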

Parallelizing Monte Carlo: Using MPI
• Within a sweep, every site can be calculated independently of the other sites.
• The only information needed is the state of the system at the beginning of the sweep.
• Only the sites adjacent to the site of interest enter the calculation.
• Perfect for a mesh configuration of processors.
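To make the mesh layout concrete, here is a small sketch of how sites and neighbors might be assigned on a q by q mesh of processes. It assumes row-major rank numbering, square blocks with q dividing n, and wrap-around (periodic) neighbors; none of these details are stated on the slides, and `owner` and `mesh_neighbors` are hypothetical helper names. No MPI calls are made.

```python
def owner(i, j, n, q):
    """Rank of the process that owns site (i, j) on a q x q mesh of blocks."""
    block = n // q  # assumes q divides n evenly
    return (i // block) * q + (j // block)

def mesh_neighbors(rank, q):
    """Ranks above, below, left and right of `rank`, wrapping at the edges."""
    row, col = divmod(rank, q)
    return {
        "up": ((row - 1) % q) * q + col,
        "down": ((row + 1) % q) * q + col,
        "left": row * q + (col - 1) % q,
        "right": row * q + (col + 1) % q,
    }

print(owner(5, 2, n=8, q=2))    # a site in the lower-left block
print(mesh_neighbors(0, q=2))   # on a 2 x 2 mesh, up/down and left/right coincide
```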

Parallelizing Monte Carlo: Optimizing MPI’s Communication
• Assuming an n by n square lattice, communicating the entire lattice after every sweep would give a communication cost of O(n²) per sweep.
• This is unnecessary. We only need to communicate the sites along the boundaries of the data separation, as well as some aggregate data per sweep.
• This reduces the communication cost to O(n) per sweep.
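The gap between the two costs is easy to see with numbers. The sketch below counts elements only, assuming a q by q mesh of square blocks with one edge strip sent to each of four neighbors; the function names are hypothetical and the aggregate data (a few scalars per sweep) is ignored.

```python
def full_lattice_cost(n):
    """Elements sent if the whole n x n lattice is communicated each sweep."""
    return n * n

def boundary_cost(n, q):
    """Elements each process sends per sweep when only block edges are exchanged.

    Assumes a q x q mesh of square blocks, q dividing n, one edge per neighbor.
    """
    side = n // q
    return 4 * side

for n in (64, 256, 1024):
    print(n, full_lattice_cost(n), boundary_cost(n, q=2))
```

Quadrupling n multiplies the full-lattice cost by sixteen but the boundary cost only by four.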

Communication Example: P=4
[Figure: an n by n lattice divided into four blocks, P0 and P1 on top, P2 and P3 below]
• 4 communications, n elements each.
• Even if you gather the change in energy and magnetism per sweep at P0, you are still looking at O(n) elements to communicate.
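The boundary exchange for this layout can be simulated without any MPI library. In the sketch below, "sending" is just reading a neighbor's edge from a list, which stands in for a real message. It assumes periodic boundaries and a q by q mesh with q dividing n; the helper names are hypothetical. The point is that each block, given only its four received edge strips, computes the same neighbor sums as a calculation over the whole lattice.

```python
import random

def neighbor_sums_global(lattice):
    """Sum of the four neighbors of every site, computed on the full lattice."""
    n = len(lattice)
    return [[lattice[(i - 1) % n][j] + lattice[(i + 1) % n][j]
             + lattice[i][(j - 1) % n] + lattice[i][(j + 1) % n]
             for j in range(n)] for i in range(n)]

def split_blocks(lattice, q):
    """Cut the lattice into q*q square blocks, indexed by rank = row*q + col."""
    side = len(lattice) // q
    return [[row[c * side:(c + 1) * side]
             for row in lattice[r * side:(r + 1) * side]]
            for r in range(q) for c in range(q)]

def exchange_halos(blocks, q):
    """Return, per rank, the four edge strips received from its neighbors."""
    halos = []
    for rank in range(q * q):
        r, c = divmod(rank, q)
        up = blocks[((r - 1) % q) * q + c]
        down = blocks[((r + 1) % q) * q + c]
        left = blocks[r * q + (c - 1) % q]
        right = blocks[r * q + (c + 1) % q]
        halos.append({
            "top": up[-1],                       # neighbor's last row
            "bottom": down[0],                   # neighbor's first row
            "left": [row[-1] for row in left],   # neighbor's last column
            "right": [row[0] for row in right],  # neighbor's first column
        })
    return halos

def neighbor_sums_local(block, halo):
    """Neighbor sums for one block, using halo strips at the block's edges."""
    side = len(block)
    sums = []
    for i in range(side):
        row = []
        for j in range(side):
            above = block[i - 1][j] if i > 0 else halo["top"][j]
            below = block[i + 1][j] if i < side - 1 else halo["bottom"][j]
            west = block[i][j - 1] if j > 0 else halo["left"][i]
            east = block[i][j + 1] if j < side - 1 else halo["right"][i]
            row.append(above + below + west + east)
        sums.append(row)
    return sums

rng = random.Random(0)
n, q = 8, 2
lattice = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
blocks = split_blocks(lattice, q)
halos = exchange_halos(blocks, q)
local = [neighbor_sums_local(b, h) for b, h in zip(blocks, halos)]
print(local[0][0])  # first row of neighbor sums computed by rank 0
```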

Parallelizing Monte Carlo: Using MPI
Pros
• Reduces the problem size per process (for N >> P).
• Communication can be reduced to O(n) between sweeps.
Cons
• Communication overhead between all processes.
• Each process still has to sequentially calculate every site in its block.

Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) What is GPGPU?

What is GPGPU?
• General Purpose Programming on a GPU (GPGPU) is a relatively new trend in parallel computing.
• Graphics adapters are heavily optimized for floating-point math (the core of graphics applications).
• The GPU is controlled by a host process (CPU).

What are OpenCL and CUDA?
• OpenCL is an open standard, supported by the major graphics card manufacturers, that allows access to the graphics card’s computing abilities.
• CUDA is NVIDIA’s proprietary parallel computing platform and programming model. It is specific to NVIDIA graphics cards and is independent of OpenCL, not an extension of it.

OpenCL and CUDA

OpenCL and APP (Stream)

Parallelizing Monte Carlo: Using GPGPU (CUDA/OpenCL)
• One O(n²) communication at the beginning.
  • Can be avoided if the card can generate the initial state.
• After each Monte Carlo sweep, no communication of the global state to the host (CPU) is needed.

Parallelizing Monte Carlo: Advantages of GPGPU (CUDA/OpenCL)
• Floating-point operations are very fast on GPUs.
• Modern GPUs have a large number of processing units on board.
  • The new GeForce GTX 580 has 512 CUDA cores.
• On-card memory communication is very fast.
  • Shared memory to some degree.
• Closer to P=n
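Reading "closer to P=n" as approaching one thread per lattice site (an interpretation, not something the slide spells out), the usual CUDA-style index arithmetic can be emulated in plain Python. The sketch assumes a one-dimensional launch; `launch` is a hypothetical name, and the nested loops stand in for threads that would run concurrently on the card.

```python
def launch(n, threads_per_block):
    """Emulate a 1-D CUDA-style launch with one logical thread per lattice site."""
    sites = n * n
    blocks = (sites + threads_per_block - 1) // threads_per_block  # ceiling division
    visited = []
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            global_idx = block_idx * threads_per_block + thread_idx
            if global_idx >= sites:  # surplus threads in the last block do nothing
                continue
            visited.append(divmod(global_idx, n))  # (row, col) handled by this thread
    return blocks, visited

blocks, visited = launch(n=10, threads_per_block=32)
print(blocks, len(visited))
```

The bounds check matters whenever the number of sites is not a multiple of the block size, as in this example where the last block has more threads than remaining sites.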

CUDA vs CPU (Single Thread)

Resulting Lattice

Parallelizing Monte Carlo: Using GPGPU (CUDA/OpenCL)
Pros
• Fast, closer to P=n.
• Possible one-time communication, very little communication between sweeps.
• Communication speed only limited by bandwidth of PCIe bus.
Cons
• Limited memory.
  • The GTX 580 only has < 1.5 GB of RAM (albeit very fast RAM).
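As a rough feel for what the memory limit means, the sketch below computes the largest square lattice that would fit on a card. It is an upper bound under loud assumptions: 1.5 GiB of memory, all of it available to a single copy of the lattice, and a fixed number of bytes per spin. A real run would also need room for random-number state, a second lattice copy, and so on. `max_lattice_side` is a hypothetical helper.

```python
import math

def max_lattice_side(memory_bytes, bytes_per_spin):
    """Largest n such that an n x n lattice fits in the given memory."""
    return math.isqrt(memory_bytes // bytes_per_spin)

card_bytes = 1536 * 2**20  # assumes 1.5 GiB, all of it available to the lattice
for width in (1, 4):       # e.g. an 8-bit or a 32-bit integer per spin
    print(width, max_lattice_side(card_bytes, width))
```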

So, can we use BOTH!?

Parallelizing Monte Carlo: Using MPI and GPGPU (CUDA/OpenCL)
• Use the same communication scheme as MPI alone.
• Divide the problem over hosts, and only communicate bordering sites between sweeps.
• Only transfer the changed information between the host and the GPU.
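One way to picture the combined scheme is to track what has to move per sweep for a single host. The sketch below assumes each host keeps its block on the GPU and that only the four edge strips cross the PCIe bus and the network. The three-step breakdown and the names `extract_edges` and `per_sweep_traffic` are illustrative assumptions, not the presenters' code, and no MPI or GPU calls are made.

```python
def extract_edges(block):
    """Edge strips of a square block: the only sites other hosts need to see."""
    return {
        "top": list(block[0]),
        "bottom": list(block[-1]),
        "left": [row[0] for row in block],
        "right": [row[-1] for row in block],
    }

def per_sweep_traffic(side):
    """Elements moved per sweep for one host under the hybrid scheme (assumed)."""
    edges = 4 * side
    return {
        "device_to_host": edges,  # copy the block's edges off the GPU
        "host_to_host": edges,    # send them to the four neighboring hosts
        "host_to_device": edges,  # upload the halos received in return
    }

block = [[1, -1, 1], [1, 1, -1], [-1, 1, 1]]
print(extract_edges(block))
print(per_sweep_traffic(side=1024), "versus", 1024 * 1024, "sites in the block")
```

Every stage stays linear in the block's side length, so the interior of the block never has to leave the card between sweeps.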

Communication Example: P=4
[Figure: an n by n lattice divided into four blocks, P0 and P1 on top, P2 and P3 below]

Code Demo

Questions?

Bibliography
• Kirk, David B., and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, 2010. Print.
• Pang, Tao. Introduction to Computational Physics. [S.l.]: Cambridge Univ, 2005. Print.

Image Sources
• http://www.nvidia.com/object/product-geforce-gtx-580-us.html
• http://pressroom.nvidia.com/easyir/imga.do?easyirid=A0D622CE9F579F09
• amd.com
