
Basics of Message-passing



  1. Basics of Message-passing
  • Mechanics of message-passing
    • A means of creating separate processes on different computers
    • A way to send and receive messages
  • Single program multiple data (SPMD) model
    • Logic for multiple processes merged into one program
    • Control statements separate the blocks of logic for different processes
    • A compiled program is stored on each processor
    • All executables are started together statically
    • Example: MPI
  • Multiple program multiple data (MPMD) model
    • Each processor has a separate master program
    • The master program spawns child processes dynamically
    • Example: PVM

  2. Point-to-point Communication
  [Figure: Process 1 executes send(&x, 2) to send buffer x; Process 2 executes recv(&y, 1) into buffer y. Generic syntax; actual formats come later.]
  • General syntax
    • Send(data, destination, message tag)
    • Receive(data, source, message tag)
  • Synchronous
    • Completes after the data is safely transferred
    • No copying between message buffers
  • Asynchronous
    • Completes when transmission begins
    • Local buffers are then free for application use

  3. Synchronized sends and receives
  [Figure: (a) send() occurs before recv() — Process 1 issues a request to send and suspends until Process 2 reaches recv() and returns an acknowledgment; the message is then transferred and both processes continue. (b) recv() occurs before send() — Process 2 suspends in recv() until Process 1 issues its request to send; the acknowledgment and message are then exchanged and both processes continue.]

  4. Point-to-Point MPI Calls
  • Synchronous send — completes when the data has been successfully received
  • Buffered send — completes after the data is copied to a user-supplied buffer
    • Becomes synchronous if no buffer space is available
  • Ready send — synchronous; completion occurs when the remote processor receives the data
    • A matching receive must precede the send (see the sketch after this list)
  • Receive — completes when the data becomes available
  • Standard send
    • If a receive is posted, completes once the data is on its way (asynchronous)
    • If no receive is posted, completes when the data is buffered by MPI
    • Becomes synchronous if no buffers are available
  • Blocking — the call returns when the operation completes
  • Non-blocking — the call returns immediately
    • The application is responsible for polling or waiting for completion
    • Allows more parallel processing
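  A minimal sketch of the ready-send requirement (not from the slides; it assumes exactly two processes, that x and y are ints, and that myrank was set with MPI_Comm_rank): rank 1 pre-posts its receive with MPI_Irecv, and a barrier guarantees the receive is posted before rank 0 issues MPI_Rsend.

      /* Illustrative sketch: rank 1 posts its receive before rank 0's
         ready send, as MPI_Rsend requires. Assumes exactly two ranks. */
      if (myrank == 1) {
          MPI_Request req;
          MPI_Irecv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);  /* post receive early */
          MPI_Barrier(MPI_COMM_WORLD);        /* signal that the receive is posted */
          MPI_Wait(&req, MPI_STATUS_IGNORE);  /* wait for the data to arrive */
      } else if (myrank == 0) {
          MPI_Barrier(MPI_COMM_WORLD);        /* receive is now posted on rank 1 */
          MPI_Rsend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      }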

  5. Buffered Send Example
  • The application supplies a data buffer area using MPI_Buffer_attach() to hold the data during transmission
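  A minimal sketch of this pattern (assuming two processes, an int payload, and that myrank has been set; the buffer must include MPI_BSEND_OVERHEAD as the standard requires):

      /* Sketch: attach a user buffer, issue a buffered send, then detach. */
      int x = 42;
      if (myrank == 0) {
          int bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
          void *buf = malloc(bufsize);
          MPI_Buffer_attach(buf, bufsize);                  /* supply the buffer area */
          MPI_Bsend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* copies x into buf and returns */
          MPI_Buffer_detach(&buf, &bufsize);                /* blocks until the buffered data is sent */
          free(buf);
      } else if (myrank == 1) {
          MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      }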

  6. Message Tags
  • Differentiate between types of messages
  • The message tag is carried within the message
  • Wild-card receive operations
    • MPI_ANY_TAG: matches any message tag
    • MPI_ANY_SOURCE: matches messages from any sender
  [Figure: send a message with tag 5 from buffer x to buffer y in process 2.]
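  A minimal sketch of a wild-card receive (an illustrative fragment, assuming MPI has been initialized and this code runs on a receiving rank): the MPI_Status structure reports which source and tag actually matched.

      /* Sketch: receive from any source with any tag, then inspect the
         status fields for the sender's rank and the tag that was used. */
      int y;
      MPI_Status status;
      MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
               MPI_COMM_WORLD, &status);
      printf("received %d from rank %d with tag %d\n",
             y, status.MPI_SOURCE, status.MPI_TAG);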

  7. Collective Communication
  Route a message to a communicator (a group of processes)
  • Broadcast (MPI_Bcast()): broadcast or multicast data to the processes in a group
  • Scatter (MPI_Scatter()): send part of an array to separate processes
  • Gather (MPI_Gather()): collect array elements from separate processes
  • All-to-all (MPI_Alltoall()): a combination of gather and scatter
  • Reduce (MPI_Reduce()): combine values from all processes into a single value
  • Reduce-scatter (MPI_Reduce_scatter()): reduce and then scatter the result
  • Scan (MPI_Scan()): perform a prefix reduction on data across all processes
  • Barrier (MPI_Barrier()): pause until all processes reach the barrier call
  • Advantages
    • MPI can use the processor hierarchy to improve efficiency
    • Less programming is needed for collective operations

  8. Broadcast
  • Broadcast: sending the same message to all processes
  • Multicast: sending the same message to a defined group of processes
  [Figure: Process 0 through Process p-1 each call bcast(); the data in the root's buffer buf is delivered to every process.]
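  A minimal broadcast sketch (an illustrative fragment; assumes myrank has been set with MPI_Comm_rank, and the value 123 is arbitrary):

      /* Sketch: rank 0 broadcasts one int to every process in the
         communicator; all ranks make the same MPI_Bcast call. */
      int value = 0;
      if (myrank == 0) value = 123;     /* only the root's value matters */
      MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
      /* after the call, value == 123 on every rank */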

  9. Scatter
  • Distributes each element of an array to a separate process
  • The contents of the ith location of the array are sent to process i
  [Figure: Process 0 through Process p-1 each call scatter(); element i of the root's buffer buf goes to process i.]
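  A minimal scatter sketch (an illustrative fragment; the element values are arbitrary, and the send buffer only needs valid data at the root):

      /* Sketch: rank 0 distributes one int to each of p processes. */
      int p, myrank, recvval;
      MPI_Comm_size(MPI_COMM_WORLD, &p);
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
      int *sendbuf = NULL;
      if (myrank == 0) {
          sendbuf = (int *)malloc(p * sizeof(int));
          for (int i = 0; i < p; i++) sendbuf[i] = i;   /* element i goes to rank i */
      }
      MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);
      if (myrank == 0) free(sendbuf);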

  10. Gather
  • One process collects individual values from a set of processes
  [Figure: Process 0 through Process p-1 each call gather(); process 0 receives one element from every process into its buffer buf.]

  11. Reduce
  • Performs a distributed calculation
  • Example: perform addition over a distributed array
  [Figure: Process 0 through Process p-1 each call reduce(); the values are combined (here with +) into process 0's buffer buf.]
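  A minimal reduction sketch (an illustrative fragment; each process contributing its own rank is just an example, and myrank is assumed set):

      /* Sketch: sum one contribution from every process into 'total'
         on rank 0 using MPI_SUM. */
      int local = myrank;        /* each process contributes its rank */
      int total = 0;
      MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
      /* on rank 0, total == 0 + 1 + ... + (p-1) */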

  12. PVM (Parallel Virtual Machine)
  • Oak Ridge National Laboratory; free distribution
  • A host process controls the environment
  • The parent process spawns the other processes
  • Daemon processes handle the message passing
  • Get buffer, pack, send, unpack
  • Non-blocking send; blocking or non-blocking receive
  • Wild-card tags and process source
  • Sample program: page 52
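  A minimal sketch of the get-buffer / pack / send and receive / unpack pattern (this is not the page 52 program; the "worker" task name and the tags 1 and 2 are illustrative assumptions):

      /* Sketch of the PVM master side: spawn a child task, pack and
         send an int, then receive and unpack a reply. Assumes a PVM
         executable named "worker" is installed where the daemons can
         find it. */
      #include <pvm3.h>

      int main(void) {
          int child_tid, x = 42;

          pvm_mytid();                            /* enroll this process in PVM */
          pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &child_tid);

          pvm_initsend(PvmDataDefault);           /* get/clear the send buffer */
          pvm_pkint(&x, 1, 1);                    /* pack one int, stride 1 */
          pvm_send(child_tid, 1);                 /* non-blocking send, tag 1 */

          pvm_recv(child_tid, 2);                 /* blocking receive, tag 2 */
          pvm_upkint(&x, 1, 1);                   /* unpack the reply */

          pvm_exit();                             /* leave the virtual machine */
          return 0;
      }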

  13. mpij and mpiJava
  • Overview
    • mpiJava is a wrapper that sits on top of MPICH or LAM/MPI
    • mpij is a native Java implementation of MPI
  • Documentation
    • mpiJava (http://www.hpjava.org/mpiJava.html)
    • mpij uses the same API as mpiJava
  • Java Grande consortium (http://www.javagrande.org)
    • Sponsors conferences and encourages Java for parallel programming
    • Maintains Java-based paradigms (mpiJava, HPJava, and mpiJ)
  • Other Java-based implementations
    • JavaMpi is another, less popular, MPI Java wrapper

  14. SPMD Computation

      #include <mpi.h>

      int main(int argc, char *argv[]) {
          MPI_Init(&argc, &argv);
          . . .
          MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
          if (myrank == 0)
              master();
          else
              slave();
          . . .
          MPI_Finalize();
          return 0;
      }

  The master process executes master(); the slave processes execute slave().

  15. Unsafe Message Passing
  • Delivery order is a function of timing among the processors

  16. Communicators
  A communicator is a collection of processes
  • Communicators provide:
    • Collective communication to groups of processes
    • A mechanism to identify processes for point-to-point transfers
  • The default communicator is MPI_COMM_WORLD
    • A unique rank corresponds to each executing process
    • The rank is an integer from 0 to p - 1, where p is the number of executing processes
  • Applications can create subset communicators (see the sketch below)
    • Each process has a unique rank in each sub-communicator
    • The rank is an integer from 0 to g - 1, where g is the number of processes in the group
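  A minimal sketch of creating a subset communicator (illustrative; splitting by rank parity is just an example of a color choice):

      /* Sketch: split MPI_COMM_WORLD into two sub-communicators by the
         parity of the world rank; each process then has a rank 0..g-1
         within its own group. */
      MPI_Comm subcomm;
      int world_rank, sub_rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
      MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &subcomm);
      MPI_Comm_rank(subcomm, &sub_rank);   /* rank within the sub-communicator */
      MPI_Comm_free(&subcomm);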

  17. Point-to-point Message Transfer

      int x;
      int myrank;
      MPI_Status stat;
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
      if (myrank == 0) {
          MPI_Send(&x, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);
      } else if (myrank == 1) {
          MPI_Recv(&x, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &stat);
      }

  18. Non-blocking Point-to-point Transfer

      int x;
      int myrank;
      MPI_Request io;
      MPI_Status stat;
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
      if (myrank == 0) {
          MPI_Isend(&x, 1, MPI_INT, 1, 99, MPI_COMM_WORLD, &io);
          doSomeProcessing();        /* overlap computation with the send */
          MPI_Wait(&io, &stat);
      } else if (myrank == 1) {
          MPI_Recv(&x, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &stat);
      }

  • MPI_Isend() and MPI_Irecv() return immediately
  • MPI_Wait() returns after the operation completes
  • MPI_Test() sets a flag that is non-zero if the operation is complete

  19. Collective Communication Example
  • Process 0 gathers 10 items from each process in the group
  • The master process allocates memory to hold the gathered data
  • The other processes initialize their data arrays
  • All processes execute the MPI_Gather() call

      int i, myrank, grp_size;
      int data[10];                 /* data to gather from each process */
      int *buf = NULL;              /* receive buffer, used only at the root */
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
      if (myrank == 0) {
          MPI_Comm_size(MPI_COMM_WORLD, &grp_size);
          buf = (int *)malloc(grp_size * 10 * sizeof(int));
      } else {
          for (i = 0; i < 10; i++)
              data[i] = myrank;
      }
      /* recvcount is the number of items received from EACH process (10),
         not the total (grp_size * 10) */
      MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);

  20. Calculate Parallel Run Time
  • Sequential execution time: ts
    • ts = number of compute steps of the best sequential algorithm (Big-Oh)
  • Parallel execution time: tp = tcomp + tcomm
    • Communication overhead: tcomm = m(tstartup + n·tdata), where
      • tstartup is the message latency (the time to send a message with no data)
      • tdata is the transmission time for one data element
      • n is the number of data elements and m is the number of messages
    • Computation overhead: tcomp = f(n, p)
  • Assumptions
    • All processors are homogeneous and run at the same speed
    • tp is the worst-case execution time over all processors
    • tstartup, tdata, and tcomp are measured in computational step units so they can be added together
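  As a worked illustration (hypothetical numbers, not from the slides): suppose n = 1000 values are summed on p = 4 processors, each adding n/p = 250 values and then sending one partial sum to the master, which adds the p - 1 = 3 incoming partials. Then tcomp ≈ 250 + 3 = 253 steps and, taking tstartup = 100 and tdata = 1 in computational step units, tcomm = 3(100 + 1·1) = 303, so tp ≈ 253 + 303 = 556 steps versus ts ≈ n - 1 = 999 sequential steps.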

  21. Estimating Scalability
  • Notes:
    • p and n respectively denote the number of processors and the number of data elements
    • Formulae built from ts, tp, tcomp, and tcomm help estimate scalability with respect to p and n
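  As a reminder of the standard quantities used for such estimates (stated here as an assumption rather than taken from the slide): speedup S(p) = ts / tp and efficiency E = S(p) / p = ts / (p · tp); a program scales well when E stays close to 1 as p and n grow together.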

  22. Parallel Visualization Tools
  • Observe execution using a space-time diagram (or process-time diagram)

  23. Parallel Program Development
  • Cautions
    • Parallel programming is harder than sequential programming
    • Some algorithms don't lend themselves to running in parallel
  • Advised steps of development
    • Step 1: Program and test as much as possible sequentially
    • Step 2: Code the parallel version
    • Step 3: Run in parallel on one processor with a few threads
    • Step 4: Add more threads as confidence grows
    • Step 5: Run in parallel with a small number of processors
    • Step 6: Add more processes as confidence grows
  • Tools
    • Parallel debuggers can help
    • Insert assertion error checks within the code
    • Instrument the code (add print statements)
    • Timing: time(), gettimeofday(), clock(), MPI_Wtime()
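  A minimal timing sketch using MPI_Wtime() (illustrative fragment; do_work() is a hypothetical stand-in for the code being measured, and myrank is assumed set):

      /* Sketch: time a region of code with MPI_Wtime(), which returns
         wall-clock time in seconds as a double. */
      double start = MPI_Wtime();
      do_work();                                   /* hypothetical work to be timed */
      double elapsed = MPI_Wtime() - start;
      printf("rank %d: %.6f seconds\n", myrank, elapsed);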
