Introduction to Parallel Computing and MPI
Learn about parallel computing and how it can be useful, including different parallel paradigms, how to parallelize problems, and an overview of the Message Passing Interface (MPI) standard.
FLASH Tutorial, May 13, 2004: Parallel Computing and MPI
What is Parallel Computing? And why is it useful
• Parallel computing means more than one CPU working together on one problem
• It is useful when:
  • The problem is large and would take very long on a single processor
  • The data are too big to fit in the memory of one processor
• When to parallelize:
  • When the problem can be subdivided into relatively independent tasks
• How much to parallelize:
  • As long as the speedup relative to a single processor remains of the order of the number of processors
Parallel Paradigms
• SIMD (single instruction, multiple data)
  • Processors work in lock-step
• MIMD (multiple instruction, multiple data)
  • Processors work independently, with occasional synchronization
• Shared memory
  • One-way communication
• Distributed memory
  • Message passing
• Loosely coupled
  • The process on each CPU is fairly self-contained and largely independent of processes on other CPUs
• Tightly coupled
  • CPUs need to communicate with each other frequently
How to Parallelize
• Divide the problem into a set of mostly independent tasks (partitioning)
• Give each task its own data (localization)
  • Tasks operate on their own data for the most part; try to make each task self-contained
• Occasionally:
  • Data may be needed from other tasks (inter-process communication)
  • Synchronization may be required between tasks (global operations)
• Map tasks to processors
  • One processor may get more than one task
  • Task distribution should be well balanced
New Code Components
• Initialization
• Query of the parallel state
  • Identify this process
  • Identify the number of processes
• Data exchange between processes
  • Local and global operations
• Synchronization
  • Barriers, blocking communication, locks
• Finalization
MPI
• Message Passing Interface: the standard for the distributed-memory model of parallelism
• MPI-2 adds support for one-sided communication, commonly associated with shared-memory operations
• Works with communicators: collections of processes
  • MPI_COMM_WORLD is the default communicator
• Supports both the lowest-level communication operations and composite operations
• Has blocking and non-blocking operations
Communicators [diagram: two communicators, COMM1 and COMM2, each grouping a subset of processes]
Low-Level Operations in MPI
• MPI_Init
• MPI_Comm_size: find the number of processes
• MPI_Comm_rank: find this process's rank
• MPI_Send / MPI_Recv: communicate with other processes one at a time
• MPI_Bcast: global data transmission
• MPI_Barrier: synchronization
• MPI_Finalize
Advanced Constructs in MPI
• Composite operations
  • Gather / Scatter
  • Allreduce
  • Alltoall
• Cartesian grid operations
  • Shift
• Communicators
  • Creating subgroups of processes to operate on
• User-defined datatypes
• I/O
  • Parallel file operations
Communication Patterns [diagram: point-to-point, shift, one-to-all broadcast, all-to-all, and collective patterns among ranks 0-3]
Communication Overheads
• Latency vs. bandwidth
• Blocking vs. non-blocking
  • Overlap of communication and computation
  • Buffering and copying
• Scale of communication
  • Nearest neighbor
  • Short range
  • Long range
• Volume of data
• Resource contention for links
• Efficiency
  • Hardware, software, communication method
Parallelism in FLASH
• Short-range communications
  • Nearest neighbor
• Long-range communications
  • Regridding
• Other global operations
  • All-reduce operations on physical quantities
• Specific to solvers
  • Multipole method
  • FFT-based solvers
Domain Decomposition [diagram: the computational domain split into four patches, assigned to processes P0-P3]
Border Cells / Ghost Points
• When solnData is split across processors, each processor needs data owned by others
• Each processor keeps a layer of cells copied from its neighbors
• These ghost layers must be updated every time step
Border/Ghost Cells: Short-Range Communication [diagram: exchange of ghost-cell layers between neighboring blocks]
Two MPI Methods for Ghost-Cell Exchange
• Method 1: Cartesian topology
  • MPI_Cart_create: create the topology
  • MPE_Decomp1d: domain decomposition on the topology
  • MPI_Cart_shift: who is on the left/right?
  • MPI_Sendrecv: fill ghost cells on the left
  • MPI_Sendrecv: fill ghost cells on the right
• Method 2: manual decomposition
  • MPI_Comm_rank, MPI_Comm_size
  • Manually decompose the grid over processes
  • Calculate left/right neighbors
  • MPI_Send / MPI_Recv, ordered carefully to avoid deadlocks
Adaptive Grid Issues
• The discretization is not uniform
• Simple left-right guard-cell fills are inadequate
• Adjacent grid points may not be mapped to nearest neighbors in the process topology
• Redistribution of work becomes necessary
Regridding
• The number of cells/blocks changes
  • Some processors get more work than others: load imbalance
• Data are redistributed to even out the work across processors
  • Long-range communications
  • Large quantities of data moved
Other Parallel Operations in FLASH
• Global max/sum, etc. (Allreduce)
  • Physical quantities
  • In solvers
  • Performance monitoring
• Alltoall
  • FFT-based solver on the uniform grid (UG)
• User-defined datatypes and file operations
  • Parallel I/O