
Parallel Programming using MPI


Presentation Transcript


  1. Parallel Programming using MPI by Phil Bording, Husky Energy Chair in Oil and Gas Research, Memorial University of Newfoundland

  2. Table of Contents • Introduction • Program Structure • Defining Parallel Computing • Domain Decomposition

  3. Rank and Structure • Each processor has a name or rank • Rank = name = number • Identification • Processor organization • Structure = Communicator • The communication chain How the problem communicates defines the needed structure!

  4. Processor Rank • 16 Processors – Rank 0 to 15 Realize that the rank position is relative and could be structured differently as needed.

  5. Processor Rank with Structure • 16 Processors – Rank 0 to 15, arranged as a 4 x 4 grid (rows 0–3, 4–7, 8–11, 12–15) Realize that the rank position is relative and could be structured differently as needed.

  6. SPMD Parallel Computing • Simple Code Example - almost correct
        Integer Rank
        Dimension Psi(0:100)
        Call MPI_INIT(return_code)
        Call MPI_Rank(Rank,return_code)
        Write(6,*) Rank
        Do i=0,Rank
           Psi(i) = i
        Enddo
        Write(6,*) (Psi(i),i=0,Rank)
        Call MPI_finish(return_code)
        End

  7. SPMD Parallel Computing • Simple Code Example - almost correct • Assuming four parallel processes, the output looks like this:
        0
        0.0
        2
        0.0,1.0,2.0
        3
        0.0,1.0,2.0,3.0
        1
        0.0,1.0
     MPI has no standard for the sequence of appearance in output streams
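For reference, here is a cleaned-up version of the slide's "almost correct" program, using the real routine names from the MPI standard (MPI_COMM_RANK and MPI_FINALIZE in place of the shorthand MPI_Rank and MPI_finish). This is a minimal sketch, not the author's original listing:

      Program SPMD_Example
      Include 'mpif.h'
      Integer Rank, return_code, i
      Dimension Psi(0:100)
C     Start MPI and ask for this process's rank in the default communicator
      Call MPI_INIT(return_code)
      Call MPI_COMM_RANK(MPI_COMM_WORLD, Rank, return_code)
      Write(6,*) Rank
C     Each process fills a different number of entries, based on its rank
      Do i = 0, Rank
         Psi(i) = i
      Enddo
      Write(6,*) (Psi(i), i = 0, Rank)
C     Shut MPI down before the program ends
      Call MPI_FINALIZE(return_code)
      End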

  8. SPMD Parallel Computing We’ll get back to MPI coding after we figure out how we are going to do the domain decomposition. The Omega domain Ω is split into subdomains Ω0, Ω1, Ω2

  9. Discussion Time

  10. Domain Decomposition • Subdivision of problem domain into parallel regions • Example using 2 dimensional data arrays • Linear One Dimension versus • Grid of Two Dimensions

  11. Single Processor Memory Arrays, Nx by Ny Dimension Array (Nx,Ny)

  12. Multiple Processor Memory Arrays, Nx/2 by Ny/2 4 Processors Two way decomposition

  13. Multiple Processor Memory Arrays, Nx by Ny/3 3 Processors One way decomposition

  14. Multiple Processor Memory Arrays, Nx/3 by Ny 3 Processors One way decomposition – the other way

  15. So which one is better? Or does it make a difference?? One way decomposition – one way or the other?

  16. Dimension Array (Nx,Ny) becomes Dimension Array (Nx/3,Ny) or Dimension Array (Nx,Ny/3). The Nx/3 version in Fortran has shorter do-loop lengths in the fastest-moving (first) index, which could limit performance. Further, sharing the boundary data via message passing will have non-unit-stride access patterns, since the shared slice is a constant-first-index section of the array. (See the sketch below.)
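As an illustration of that stride difference (not taken from the slides), here is how the boundary slice would be packed for a message under each of the two 3-way splits; the subroutine and variable names are assumptions:

      Subroutine Pack_Boundaries(Nx, NxL, Ny, NyL, A1, A2, Buf1, Buf2)
      Integer Nx, NxL, Ny, NyL, i, j
      Real A1(NxL,Ny), A2(Nx,NyL), Buf1(Ny), Buf2(Nx)
C     Split as Array(Nx/3,Ny): the shared boundary is A1(NxL,1:Ny),
C     which is strided by NxL in memory and must be packed element
C     by element before it can be sent
      Do j = 1, Ny
         Buf1(j) = A1(NxL,j)
      Enddo
C     Split as Array(Nx,Ny/3): the shared boundary is A2(1:Nx,NyL),
C     which is contiguous (unit stride) and could even be handed to
C     MPI_SEND directly without a packing buffer
      Do i = 1, Nx
         Buf2(i) = A2(i,NyL)
      Enddo
      Return
      End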

  17. So the design issue becomes one of choice for the programming language: decide which language you need to use, and then create the decomposition plan.

  18. Realize, of course, that a one-dimensional decomposition has only Np_one processors, while a two-dimensional decomposition could have Np_two x Np_two. So in the design of your parallel code you have to be aware of your resources. Further, few if any programs scale well, and being realistic about the number of processors to be used is important in deciding how hard you want to work at the parallelization effort.

  19. Discussion Time

  20. Processor Interconnections • Communication hardware connects the processors • These wires carry data and address information • The best interconnection is the most expensive -- all machines have a direct connection to all other machines • Because of cost we have to compromise

  21. Processor Interconnections • The slowest is also the cheapest • We could just let each machine connect to some other machine in a daisy chain fashion. • Messages would bump along until they reach their destination. • What other schemes are possible?

  22. Processor Interconnections • The Linear Daisy Chain • The Binary Tree • The Fat Tree • The Flat Network • The Hypercube • The Torus • The Ring • The Cross Bar • And many, many others

  23. The Linear Daisy Chain: Processor 0 – Processor 1 – Processor 2

  24. The Cross Bar: every processor (Processor 0, 1, 2, ...) has a switched path to every other. Access is O(1), but the switch is O(n^2). The fastest and most expensive.

  25. Let's Look at the Binary Tree – O(log N)

  26. Let's Look at the Fat Tree – O(log N)+

  27. Let's Look at the Hypercube – order 1, order 2, order 3: duplicate the previous order and connect the edges together

  28. Let's Look at the Binary Tree • Every node can reach every other node • Has log N connections; 32 nodes have 5 levels • Some neighbors are far apart • A little more expensive • Root is a bottleneck

  29. Let's Look at the Fat Tree • Every node can reach every other node • Has log N connections; 32 nodes have 5 levels • Some neighbors are far apart • Even more expensive • Root bottleneck is better managed • Each level has multiple connections

  30. Fortran MPI Commands
        INCLUDE 'mpif.h'
        MPI_INIT(ierr)
        MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
        MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
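A minimal complete program built from just these calls (a sketch for orientation; the names p and my_rank follow the slide, and MPI_FINALIZE is added so the program ends cleanly):

      Program Hello
      Include 'mpif.h'
      Integer ierr, p, my_rank
      Call MPI_INIT(ierr)
C     p = number of processes, my_rank = this process's rank (0 .. p-1)
      Call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
      Call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
      Write(6,*) 'Hello from rank', my_rank, 'of', p
      Call MPI_FINALIZE(ierr)
      End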

  31. MPI Commands
        MPI_SCATTER(A, chunkA, MPI_REAL, A_local, chunkA, MPI_REAL, 0, MPI_COMM_WORLD, ierr)
        MPI_SCATTER(b, chunkb, MPI_REAL, b_local, chunkb, MPI_REAL, 0, MPI_COMM_WORLD, ierr)
        MPI_ALLGATHER(x_local, chunkb, MPI_REAL, x_new, chunkb, MPI_REAL, MPI_COMM_WORLD, ierr)
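In context, calls like these distribute a matrix and a right-hand side from rank 0 and then make an updated solution vector available on every process. The following sketch fills in assumed sizes (n divisible by the process count p, A stored as a flat column-major array) and is not the course's full program:

      Subroutine Distribute_And_Gather(n, p, A, b, A_local, b_local,
     &                                 x_local, x_new)
      Include 'mpif.h'
      Integer n, p, ierr, chunkA, chunkb
      Real A(n*n), b(n), A_local(*), b_local(*), x_local(*), x_new(n)
C     Each process receives n/p columns of A (a contiguous block in
C     Fortran's column-major storage) and n/p entries of b
      chunkA = (n/p)*n
      chunkb = n/p
      Call MPI_SCATTER(A, chunkA, MPI_REAL, A_local, chunkA,
     &                 MPI_REAL, 0, MPI_COMM_WORLD, ierr)
      Call MPI_SCATTER(b, chunkb, MPI_REAL, b_local, chunkb,
     &                 MPI_REAL, 0, MPI_COMM_WORLD, ierr)
C     ... each process computes its chunkb entries of x_local here ...
C     Then every process assembles the full vector x_new from the pieces
      Call MPI_ALLGATHER(x_local, chunkb, MPI_REAL, x_new,
     &                   chunkb, MPI_REAL, MPI_COMM_WORLD, ierr)
      Return
      End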

  32. Message Passing • Broadcast: from the Ith processor to all other processors
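In Fortran MPI a broadcast is a single collective call that every process makes; a small sketch, with an assumed real buffer of length n and rank 0 playing the role of the broadcasting processor:

      Subroutine Bcast_From_Root(n, buffer)
      Include 'mpif.h'
      Integer n, ierr
      Real buffer(n)
C     After the call, every process in MPI_COMM_WORLD holds the n
C     values that the root (rank 0 here) started with
      Call MPI_BCAST(buffer, n, MPI_REAL, 0, MPI_COMM_WORLD, ierr)
      Return
      End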

  33. Message Passing • Broadcast – Scatter: scatter from the Ith processor to all other processors; each chunk is of uniform length

  34. Message Passing • Gather: gather from all other processors to the Ith processor

  35. Message Passing • Broadcast – Gather: chunks are of uniform length; gather from the Ith processor to all other processors

  36. Message Passing • Exchange – based on a user topology (ring or linear): following the connection topology, processors exchange information
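A common way to code such an exchange is MPI_SENDRECV; here is a sketch for a ring in which each rank sends one value to its right-hand neighbour and receives one from its left (the ring layout and the variable names are assumptions for illustration):

      Subroutine Ring_Exchange(my_rank, p, sendval, recvval)
      Include 'mpif.h'
      Integer my_rank, p, right, left, ierr
      Integer status(MPI_STATUS_SIZE)
      Real sendval, recvval
C     Neighbours on a ring: rank p-1 wraps around to rank 0
      right = mod(my_rank + 1, p)
      left  = mod(my_rank - 1 + p, p)
C     Send to the right and receive from the left in one call, which
C     avoids the deadlock of everyone doing a blocking send first
      Call MPI_SENDRECV(sendval, 1, MPI_REAL, right, 0,
     &                  recvval, 1, MPI_REAL, left,  0,
     &                  MPI_COMM_WORLD, status, ierr)
      Return
      End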


  38. Just what is a message? Message Content To: You@Address From: Me@Address

  39. Just what is a message? Message Content To: You@Address:Attn Payroll From: Me@Address:Attn Payroll

  40. Message Structure • To: Address(Rank) • Content (starting array/vector/word address and length) • Tag • Data Type • Error Flag • Communicator We know who we are, so From: Address(Rank) is implicit! (These fields map onto the send/receive arguments sketched below.)
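A sketch of how those fields appear as arguments of the point-to-point calls, for a real array of length n passed from rank 0 to rank 1 (the buffer, length, and tag names are assumptions):

      Subroutine Send_Or_Receive(my_rank, n, buf)
      Include 'mpif.h'
      Integer my_rank, n, ierr, tag
      Integer status(MPI_STATUS_SIZE)
      Real buf(n)
      tag = 99
      If (my_rank .eq. 0) Then
C        To: rank 1, Content: buf and n, Tag: tag, Data Type: MPI_REAL,
C        Communicator: MPI_COMM_WORLD, Error Flag: ierr
         Call MPI_SEND(buf, n, MPI_REAL, 1, tag, MPI_COMM_WORLD, ierr)
      Else If (my_rank .eq. 1) Then
C        The receive names the expected sender and tag; status carries
C        the actual source, tag, and length of the message that arrived
         Call MPI_RECV(buf, n, MPI_REAL, 0, tag, MPI_COMM_WORLD,
     &                 status, ierr)
      Endif
      Return
      End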

  41. Messaging • For every SEND we must have a RECEIVE! • The transfer is cooperative (two-sided): by posting the receive, the receiver agrees to let the sender's data be placed into a memory location in the receiving process.

  42. Message Passing The interconnection topology is called a communicator, and a default one is predefined at startup. However, the user can define his own topology – and should, as needed. A problem-dependent communicator – in fact more than one – can be defined as needed.

  43. Program Structure (diagram): processors of rank 0, 1, and 2 each run the same Input – Loops – Output sequence, with sync barriers between the phases.
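The sync points in that picture correspond to barrier calls; a minimal runnable sketch of the pattern, with Write statements standing in for the real input, loop, and output phases:

      Program Phases
      Include 'mpif.h'
      Integer ierr, my_rank
      Call MPI_INIT(ierr)
      Call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
C     "Input" phase: every process does its own input work
      Write(6,*) 'rank', my_rank, 'input phase'
C     No process starts its loops until all processes have finished input
      Call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      Write(6,*) 'rank', my_rank, 'loops phase'
C     Likewise, output does not begin until every process is done computing
      Call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      Write(6,*) 'rank', my_rank, 'output phase'
      Call MPI_FINALIZE(ierr)
      End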

  44. MPI Send – Receive (diagram): the sending Processor K supplies Count elements; the receiving Processor L provides a buffer of Length ≥ Count. Each cell holds one MPI_Data_Type, and the MPI_Data_Type must be the same on both sides!
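The receiver may legally post a buffer longer than the message that arrives; MPI_GET_COUNT then reports how many elements were actually received. A sketch with assumed names:

      Subroutine Recv_Up_To(maxlen, buf, nreceived)
      Include 'mpif.h'
      Integer maxlen, nreceived, ierr
      Integer status(MPI_STATUS_SIZE)
      Real buf(maxlen)
C     Post a receive whose length is only an upper bound (Length >= Count)
      Call MPI_RECV(buf, maxlen, MPI_REAL, MPI_ANY_SOURCE,
     &              MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
C     Ask the status object how many MPI_REAL elements really arrived
      Call MPI_GET_COUNT(status, MPI_REAL, nreceived, ierr)
      Return
      End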

  45. MPI Data_Types
        Type        Number of bytes
        Float       4
        Double      8
        Integer     4?
        Boolean     4
        Character   1?
     A bit of care is needed between Fortran and C data types

  46. #define MPI_BYTE ...
      #define MPI_PACKED ...
      #define MPI_CHAR ...
      #define MPI_SHORT ...
      #define MPI_INT ...
      #define MPI_LONG ...
      #define MPI_FLOAT ...
      #define MPI_DOUBLE ...
      #define MPI_LONG_DOUBLE ...
      #define MPI_UNSIGNED_CHAR ...


  48. MPI Data_TYPE Issues • Just what is a data type? • How many bits? • Big Endian versus Little Endian? • Whatever is used must be consistent! • Could type conversions be automatic or transparent?

  49. The Grid for 2 by 2

  50. 2D decomposition (diagram): the grid indices 1 .. Nx and 1 .. Ny are split at Nx/2 and Ny/2 among processors P0, P1, P2, P3. The update u(i,j) = u(i-1,j) + u(i,j) needs u(i-1,j), which can sit on a neighbouring processor. How do the processors see the variables they don't have? (See the ghost-exchange sketch below.)
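One standard answer (the slide poses the question rather than answering it) is to keep a ghost copy of the neighbour's boundary values and refresh it before each update; a sketch of the exchange across the Nx/2 split, with assumed local array bounds and neighbour ranks (MPI_PROC_NULL can be passed on the outer edge of the domain):

      Subroutine Exchange_Ghost(NxL, NyL, u, sbuf, rbuf, left, right)
      Include 'mpif.h'
      Integer NxL, NyL, left, right, j, ierr
      Integer status(MPI_STATUS_SIZE)
      Real u(0:NxL,NyL), sbuf(NyL), rbuf(NyL)
C     Pack this processor's last owned slice in i (strided in memory)
      Do j = 1, NyL
         sbuf(j) = u(NxL,j)
      Enddo
C     Send it to the right-hand neighbour and receive the matching
C     slice from the left-hand neighbour in one deadlock-free call
      Call MPI_SENDRECV(sbuf, NyL, MPI_REAL, right, 1,
     &                  rbuf, NyL, MPI_REAL, left,  1,
     &                  MPI_COMM_WORLD, status, ierr)
C     Store the received values in the ghost cells at i = 0, so the
C     update u(i,j) = u(i-1,j) + u(i,j) also works for i = 1
      Do j = 1, NyL
         u(0,j) = rbuf(j)
      Enddo
      Return
      End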
