Introduction to Parallel Programming Using MPI (1)
Jun Ni, Ph.D., Associate Professor
Department of Radiology, College of Medicine
Information Technology Services
The University of Iowa
CIVICSE-HPC002
Message Passing Interface I • Outline • Introduction to the message passing library (MPI) • Basics of MPI implementation (blocking communication) • Basic input and output of data • Basic nonblocking communication
Introduction • Basic concept of message passing • The most commonly used method of programming distributed-memory MIMD systems • In message passing, processes coordinate their activities by explicitly sending and receiving messages
Introduction to MPI • Message Passing Interface (MPI) • The most commonly used message passing library; processes are allocated statically (the number of processes is set at the beginning of program execution, and no additional processes are created during execution) • Each process is assigned a unique integer rank in the range 0, 1, …, p-1 (p is the total number of processes defined) • Basically, one can write a single program and execute it on different processes (SPMD)
Introduction to MPI • Message Passing Interface (MPI) • Selective execution is based on conditional branching within the source code • Buffering in communication • Blocking and non-blocking communication
Introduction to MPI • A parallel computing utility library of subroutines/functions, not an independent language • MPI subroutines and functions can be called from FORTRAN and C, respectively • Compiled with FORTRAN or C compilers • MPI-1 doesn't support Fortran 90, but MPI-2 supports Fortran 90 and C++
Introduction to MPI (cont.) • Why do people use MPI? • to speed up computation • to meet big demands for CPU time and memory • more portable and scalable than an automatic "parallelizer", which might not work • good for distributed-memory computers, such as distributed clusters and networks of computers or workstations
Introduction to MPI (cont.) • Why are people afraid of MPI? • it is more complicated than serial computing • the technique takes longer to master • synchronization can be lost (e.g., deadlocks) • the amount of time required to convert serial code to parallelized code
Introduction to MPI (cont.) • Alternative ways? • a data-parallel model using a high-level language such as HPF • an advanced library (or interface), such as the Portable, Extensible Toolkit for Scientific Computation (PETSc) • Java multithreaded computing for Internet-based distributed computation
Basics of MPI • The MPI header file must be included in the user's FORTRAN or C code; it contains definitions of MPI constants and function prototypes.
include 'mpif.h' (for FORTRAN code)
#include "mpi.h" (for C code)
Basics of MPI • MPI is initiated by calling MPI_Init() before any other MPI subroutine or function is invoked. • MPI processing ends with a call to MPI_Finalize().
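A minimal C skeleton of this structure (a sketch for orientation; the variable names my_rank and p are local choices, not MPI requirements):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int my_rank, p;

    /* no MPI calls are allowed before MPI_Init ... */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &p);       /* number of processes */
    printf("Hello from process %d of %d\n", my_rank, p);

    /* ... and none after MPI_Finalize */
    MPI_Finalize();
    return 0;
}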
Basics of MPI • The only difference between MPI subroutines (for FORTRAN) and MPI functions (for C) is the error-reporting flag. • In FORTRAN, it is returned as the last member of the subroutine's argument list; in C, the integer error flag is returned through the function's return value.
Basics of MPI • Consequently, MPI FORTRAN subroutines always take one more argument than their C counterparts.
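For example (a sketch; note that under MPI's default error handler a failed call aborts the program anyway, so explicit checking of the flag is optional):

/* C: the error flag is the function's return value */
int err = MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (err != MPI_SUCCESS) {
    MPI_Abort(MPI_COMM_WORLD, err);  /* terminate all processes */
}

/* FORTRAN: the same flag arrives in the extra last argument:
   call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr) */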
Basics of MPI (cont.) • C's MPI function names start with MPI_ followed by a character string whose leading character is an upper-case letter and whose remaining characters are lower-case • FORTRAN subroutines bear the same names but are case-insensitive • On SGI's Origin2000 (NCSA), parallel I/O is supported
Compilation and Execution (f77) • To compile and execute an f77 (or f90) code without MPI • f77 -o example example.f • f90 -o example example.f • /bin/time example • or • time example
To compile and execute an f77 (or f90) code with MPI
f77 -o example1_1 example1_1.f -lmpi
g77 -o example1_1 example1_1.f -lmpi
f90 -o example1_1 example1_1.f -lmpi
mpif77 -o example1_1 example1_1.f (our cluster)
mpif90 -o example1_1 example1_1.f (our cluster)
/bin/time mpirun -np 4 example1_1
time mpirun -np 4 example1_1
To compile and execute a C code without MPI • gcc -o exampleC exampleC.c -lm • Or • cc -o exampleC exampleC.c -lm • exampleC
To compile and execute a C code with MPI • cc -o exampleC1_1 exampleC1_1.c -lm -lmpi • gcc -o exampleC1_1 exampleC1_1.c -lm -lmpi • mpicc exampleC1_1.c (our cluster) • Execution: • /bin/time mpirun -np 10 exampleC1_1 • time mpirun -np 10 exampleC1_1
Basic communication among processes • Example 0: basic communication between processes • p processes, ranked 0 to p-1 • process 0 receives a message from every other process
[Diagram: processes 1, 2, and 3 each send a message to process 0]
Learning MPI by Examples • Example 0: mechanism • the system copies the executable code to each process • each process begins executing the copied code simultaneously • different processes can execute different statements by branching within the program based on their ranks (this form of MIMD programming is called single-program multiple-data, or SPMD, programming)
/***************************************************
greetings.c -- greetings program

Send a message from all processes with rank != 0 to process 0.
Process 0 prints the messages received.

Input: none.
Output: contents of messages received by process 0.
***************************************************/
#include <stdio.h>
#include <string.h>
#include "mpi.h"      /* include the MPI library */

/* command-line parameters are passed to main and on to MPI_Init */
main(int argc, char* argv[])
{
    int         my_rank;      /* rank of process           */
    int         p;            /* number of processes       */
    int         source;       /* rank of sender            */
    int         dest;         /* rank of receiver          */
    int         tag = 0;      /* tag for messages          */
    char        message[100]; /* storage for message       */
    MPI_Status  status;       /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);

    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    printf("my_rank is %d\n", my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    printf("p, the total number of processes: %d\n", p);

    if (my_rank != 0) /* every process except process 0 */
    {
        /* Create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;   /* destination to which the message is sent */
        /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    }
    else /* my_rank == 0, process 0 */
    {
        for (source = 1; source < p; source++)
        {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    /* Shut down MPI */
    MPI_Finalize();
}   /* main */

Commands:
% mpicc greetings.c
% mpirun -np 8 a.out
Result:
mpicc greetings.c
mpirun -np 8 a.out
my_rank is 3
p, the total number of processes: 8
my_rank is 4
p, the total number of processes: 8
my_rank is 0
p, the total number of processes: 8
my_rank is 1
p, the total number of processes: 8
Greetings from process 1!
my_rank is 2
p, the total number of processes: 8
my_rank is 7
p, the total number of processes: 8
Greetings from process 2!
Greetings from process 3!
my_rank is 5
p, the total number of processes: 8
Greetings from process 4!
Greetings from process 5!
my_rank is 6
p, the total number of processes: 8
Greetings from process 6!
Greetings from process 7!
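For reference, the C prototypes of the two blocking calls used in greetings.c (MPI-1 style; later MPI versions declare the send buffer const):

int MPI_Send(void* buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);
int MPI_Recv(void* buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm,
             MPI_Status* status);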
Example 0 (in Fortran):

c     greetings.f -- greetings program
c
c     Send a message from all processes with rank != 0 to process 0.
c     Process 0 prints the messages received.
c
c     Input: none.
c     Output: contents of messages received by process 0.
c
c     Note: Due to the differences between character data in Fortran
c     and char in C, there may be problems in MPI_Send/MPI_Recv.
c
      program greetings
c
      include 'mpif.h'
c
      integer       my_rank
      integer       p
      integer       source
      integer       dest
      integer       tag
      character*100 message
      character*10  digit_string
      integer       size
      integer       status(MPI_STATUS_SIZE)
      integer       ierr
c     function
      integer       string_len
c
      call MPI_Init(ierr)
c
      call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
c
      if (my_rank.ne.0) then
          call to_string(my_rank, digit_string, size)
          message = 'Greetings from process '
     +        // digit_string(1:size) // '!'
          dest = 0
          tag = 0
          call MPI_Send(message, string_len(message), MPI_CHARACTER,
     +        dest, tag, MPI_COMM_WORLD, ierr)
      else
          do 200 source = 1, p-1
              tag = 0
              call MPI_Recv(message, 100, MPI_CHARACTER, source,
     +            tag, MPI_COMM_WORLD, status, ierr)
              call MPI_Get_count(status, MPI_CHARACTER, size, ierr)
              write(6,100) message(1:size)
 100          format(' ',a)
 200      continue
      endif
c
      call MPI_Finalize(ierr)
      stop
      end
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
c     Converts the integer stored in number into an ascii
c     string.  The string is returned in string.  The number
c     of digits is returned in size.
      subroutine to_string(number, string, size)
      integer       number
      character*(*) string
      integer       size
      character*100 temp
      integer       local
      integer       last_digit
      integer       i

      local = number
      i = 0
c     strip digits off, starting with the least significant
c     (do-while loop)
 100  last_digit = mod(local,10)
      local = local/10
      i = i + 1
      temp(i:i) = char(last_digit + ichar('0'))
      if (local.ne.0) go to 100
      size = i
c     reverse digits
      do 200 i = 1, size
          string(size-i+1:size-i+1) = temp(i:i)
 200  continue
c
      return
      end
c     end of to_string
c
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
c     Finds the number of characters stored in a string
c
      integer function string_len(string)
      character*(*) string
c
      character*1 space
      parameter  (space = ' ')
      integer i
c
      i = len(string)
c     while loop
 100  if ((string(i:i).eq.space).and.(i.gt.1)) then
          i = i - 1
          go to 100
      endif
c
      if ((i.eq.1).and.(string(i:i).eq.space)) then
          string_len = 0
      else
          string_len = i
      endif
c
      return
      end
c     end of string_len
mpif77 greetings.f
mpirun -np 8 a.out
• It is not necessary to call MPI_Init at the very beginning of your code. • It is not necessary to call MPI_Finalize at the very end of your code. • The MPI section should be inserted only where you need the code to run in parallel; all MPI calls, however, must fall between MPI_Init and MPI_Finalize (see the sketch below).
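A sketch of this placement (serial_setup, parallel_work, and serial_report are hypothetical placeholders; note that under mpirun every process still executes the code outside the MPI section):

#include <stdio.h>
#include "mpi.h"

/* hypothetical stand-ins for the serial and parallel parts */
void serial_setup(void)      { printf("serial setup\n"); }
void parallel_work(int rank) { printf("parallel work on rank %d\n", rank); }
void serial_report(void)     { printf("serial report\n"); }

int main(int argc, char* argv[])
{
    int my_rank;

    serial_setup();           /* before MPI_Init: no MPI calls allowed    */
    MPI_Init(&argc, &argv);   /* the MPI section begins here              */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    parallel_work(my_rank);
    MPI_Finalize();           /* the MPI section ends here                */
    serial_report();          /* after MPI_Finalize: no MPI calls allowed */
    return 0;
}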
Numerical Integration • Example 1: numerical integration using the trapezoidal rule • mathematical problem • numerical method • serial programming and parallel programming
Problem: • Testing integration of cos(x) from 0 to π/2
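For checking the numerical result: the exact value is the integral of cos(x) from 0 to π/2 = sin(π/2) - sin(0) = 1.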
Example of C serial program:

/* serial.c -- serial version of trapezoidal rule
 *
 * Calculate definite integral using trapezoidal rule.
 * The function f(x) is hardwired.
 * Input: a, b, n.
 * Output: estimate of integral from a to b of f(x)
 * using n trapezoids.
 *
 * See Chapter 4, pp. 53 & ff. in PPMPI.
 */
#include <stdio.h>
#include <math.h>   /* for cos() */

main()
{
    float integral;    /* Store result in integral   */
    float a, b;        /* Left and right endpoints   */
    int   n;           /* Number of trapezoids       */
    float h;           /* Trapezoid base width       */
    float x;
    int   i;
    float f(float x);  /* Function we're integrating */

    printf("Enter a, b, and n\n");
    scanf("%f %f %d", &a, &b, &n);

    h = (b-a)/n;
    integral = (f(a) + f(b))/2.0;
    x = a;
    for (i = 1; i <= n-1; i++)
    {
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral*h;

    printf("With n = %d trapezoids, our estimate\n", n);
    printf("of the integral from %f to %f = %f\n", a, b, integral);
}

float f(float x)
{
    float return_val;
    /* Calculate f(x); the integrand cos(x) is hardwired. */
    return_val = cos(x);
    return return_val;
}
Example of serial code in Fortran:

C     serial.f -- calculate definite integral using trapezoidal rule.
C
C     The function f(x) is hardwired.
C     Input: a, b, n.
C     Output: estimate of integral from a to b of f(x)
C     using n trapezoids.
C
C     See Chapter 4, pp. 53 & ff. in PPMPI.
C
      PROGRAM serial
      real    integral
      real    a
      real    b
      integer n
      real    h
      real    x
      integer i
C
      real    f
C
      print *, 'Enter a, b, and n'
      read *, a, b, n
C
      h = (b-a)/n
      integral = (f(a) + f(b))/2.0
      x = a
      do 100 i = 1 , n-1
          x = x + h
          integral = integral + f(x)
 100  continue
      integral = integral*h
C
      print *,'With n =', n,' trapezoids, our estimate'
      print *,'of the integral from ', a, ' to ',b, ' = ' , integral
      end
C
C******************************************************
      real function f(x)
      real x
C
C     Calculate f(x); the integrand cos(x) is hardwired.
      f = cos(x)
      return
      end
To compile and execute serial.f:
• g77 -o serial serial.f
• /bin/time serial
Result:
The result = 1.000000
real 0.021  user 0.002  sys 0.013
Parallel programming with MPI blocking Send/Receive • how input values reach the processes is implementation-dependent, so the inputs are assigned (hardwired) in the code rather than read at run time • Uses the following MPI functions (see the sketch after this list): • MPI_Init and MPI_Finalize • MPI_Comm_rank • MPI_Comm_size • MPI_Recv • MPI_Send
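A minimal sketch of the corresponding parallel code using only the functions listed above (the endpoints and n are hardwired here rather than read as input, and n is assumed to divide evenly among the processes; this is an illustration, not the only possible decomposition):

#include <stdio.h>
#include <math.h>
#include "mpi.h"

float f(float x)
{
    return cos(x);   /* integrand, hardwired as in serial.c */
}

/* trapezoidal rule on [local_a, local_b] with local_n trapezoids */
float Trap(float local_a, float local_b, int local_n, float h)
{
    float integral = (f(local_a) + f(local_b))/2.0;
    float x = local_a;
    int   i;
    for (i = 1; i <= local_n-1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    return integral*h;
}

int main(int argc, char* argv[])
{
    int        my_rank, p, source, tag = 0;
    int        n = 1024;               /* total trapezoids (hardwired) */
    int        local_n;
    float      a = 0.0, b = 1.5707963; /* integrate from 0 to pi/2     */
    float      h, local_a, local_b, integral, piece;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    h       = (b-a)/n;                 /* same base width everywhere   */
    local_n = n/p;                     /* assumes p divides n evenly   */
    local_a = a + my_rank*local_n*h;   /* this process's subinterval   */
    local_b = local_a + local_n*h;
    integral = Trap(local_a, local_b, local_n, h);

    if (my_rank == 0) {
        /* process 0 collects and sums the partial integrals */
        for (source = 1; source < p; source++) {
            MPI_Recv(&piece, 1, MPI_FLOAT, source, tag,
                     MPI_COMM_WORLD, &status);
            integral = integral + piece;
        }
        printf("With n = %d trapezoids, estimate = %f\n", n, integral);
    } else {
        /* every other process sends its piece to process 0 */
        MPI_Send(&integral, 1, MPI_FLOAT, 0, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}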