
Introduction to Parallel Programming with MPI



  1. Introduction to Parallel Programming with MPI Morris Law, SCID Apr 25, 2018

  2. Multi-core programming • Currently, most CPUs have multiple cores that can be utilized easily by compiling with OpenMP support. • Programmers no longer need to rewrite sequential code; they only add directives that instruct the compiler to parallelize the code with OpenMP. • Reference site: http://bisqwit.iki.fi/story/howto/openmp/

  3. OpenMP example
/*
 * Sample program to test runtime of simple matrix multiply
 * with and without OpenMP on gcc-4.3.3-tdm1 (mingw)
 * compile with gcc -fopenmp
 * (c) 2009, Rajorshi Biswas
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int i, j, k;
    int n;
    double temp;
    double start, end, run;

    printf("Enter dimension ('N' for 'NxN' matrix) (100-2000): ");
    scanf("%d", &n);
    assert(n >= 100 && n <= 2000);

    int **arr1 = malloc(sizeof(int*) * n);
    int **arr2 = malloc(sizeof(int*) * n);
    int **arr3 = malloc(sizeof(int*) * n);
    for (i = 0; i < n; ++i) {
        arr1[i] = malloc(sizeof(int) * n);
        arr2[i] = malloc(sizeof(int) * n);
        arr3[i] = malloc(sizeof(int) * n);
    }

    printf("Populating array with random values...\n");
    srand(time(NULL));
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            arr1[i][j] = (rand() % n);
            arr2[i][j] = (rand() % n);
        }
    }
    printf("Completed array init.\n");

    printf("Crunching without OMP...");
    fflush(stdout);
    start = omp_get_wtime();
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            temp = 0;
            for (k = 0; k < n; ++k) {
                temp += arr1[i][k] * arr2[k][j];
            }
            arr3[i][j] = temp;
        }
    }
    end = omp_get_wtime();
    printf(" took %f seconds.\n", end - start);

    printf("Crunching with OMP...");
    fflush(stdout);
    start = omp_get_wtime();
    #pragma omp parallel for private(i, j, k, temp)
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            temp = 0;
            for (k = 0; k < n; ++k) {
                temp += arr1[i][k] * arr2[k][j];
            }
            arr3[i][j] = temp;
        }
    }
    end = omp_get_wtime();
    printf(" took %f seconds.\n", end - start);

    return 0;
}

  4. Compiling for OpenMP support
  • GCC:
    gcc -fopenmp -o foo foo.c
    gfortran -fopenmp -o foo foo.f
  • Intel Compiler:
    icc -openmp -o foo foo.c
    ifort -openmp -o foo foo.f
  • PGI Compiler:
    pgcc -mp -o foo foo.c
    pgf90 -mp -o foo foo.f
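As a quick way to verify that the OpenMP flag took effect, a minimal sketch such as the following (not one of the course samples; the file name omp-check.c is an assumption) prints one line per thread:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Every thread in the parallel region reports its id and the team size. */
    #pragma omp parallel
    {
        printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

Compile and run with, e.g., gcc -fopenmp -o omp-check omp-check.c && ./omp-check; the number of lines printed should match the number of cores (or OMP_NUM_THREADS).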

  5. What is Message Passing Interface (MPI)? • A portable standard for communication • Processes communicate through messages • Each process is a separate program • All data is private to its process

  6. What is Message Passing Interface (MPI)? • This is a library, not a language! • Different compilers may be used, but all processes must use the same MPI library, e.g. MPICH, LAM, Open MPI, etc. • Programs are written in standard sequential languages: Fortran, C, C++, etc.

  7. Basic Idea of Message Passing Interface (MPI) • MPI environment • Initialize, manage, and terminate communication among processes • Communication between processes • Point-to-point communication, e.g. send, receive • Collective communication, e.g. broadcast, gather • Complicated data structures • Communicate the data effectively, e.g. matrices and memory

  8. Message Passing Model (figure): in the serial model, a single process executes tasks P1, P2, P3 one after another in time; in the message passing model, several processes (Process 0, Process 1, Process 2) run concurrently and exchange data via the interconnection.

  9. General MPI program structure: include the MPI header file, declare variables, initialize the MPI environment, do work and make message passing calls, then terminate the MPI environment.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    int np, rank, ierr;
    ierr = MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    /* Do some work */
    printf("Helloworld, I'm P%d of %d\n", rank, np);
    ierr = MPI_Finalize();
    return 0;
}
Sample output with 3 processes:
Helloworld, I'm P0 of 3
Helloworld, I'm P1 of 3
Helloworld, I'm P2 of 3

  10. When to use MPI? • You need a portable parallel program • You are writing a parallel library • You care about performance • You have a problem that can be solved in parallel

  11. F77/F90, C/C++ MPI library calls • Fortran 77/90 uses subroutines • CALL is used to invoke the library call • Nothing is returned, the error code variable is the last argument • All variables are passed by reference • C/C++ uses functions • Just the name is used to invoke the library call • The function returns an integer value (an error code) • Variables are passed by value, unless otherwise specified
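As a minimal sketch of the difference (not one of the lab files), the same rank query in C looks like this, with the Fortran form shown in the comment:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, ierr;
    MPI_Init(&argc, &argv);
    /* C: the library call is a function whose return value is the error code. */
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (ierr == MPI_SUCCESS)
        printf("P%d: MPI_Comm_rank returned MPI_SUCCESS\n", rank);
    /* Fortran 77/90: CALL invokes the routine, all arguments are passed by
     * reference, and the error code comes back in the last argument:
     *     call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
     */
    MPI_Finalize();
    return 0;
}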

  12. Types of Communication • Point to Point Communication • communication involving only two processes. • Collective Communication • communication that involves a group of processes.
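The lab programs later in this session only use collective calls; purely as an illustration of the point-to-point case (a sketch, not one of the lab files), rank 0 can send a single integer to rank 1 like this:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                            /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* dest = 1, tag = 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("P1 received %d from P0\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run it with at least two processes, e.g. mpirun -np 2 ./a.out.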

  13. Implementation of MPI

  14. Browse the sample files
  • Inside your home directory, the sample zip file mpi-1.zip has been stored for the laboratory. Please unzip the file:
    unzip mpi-1.zip
  • List the subdirectories inside mpi-1:
    ls -l mpi-1
    total 20
    drwxr-xr-x. 2 morris dean 4096 Nov 27 09:55 00openmp
    drwxrwxr-x. 2 morris dean 4096 Nov 27 09:41 0-hello
    drwxrwxr-x. 2 morris dean 4096 Nov 27 09:51 array-pi
    drwxrwxr-x. 2 morris dean 4096 Nov 27 09:49 mc-pi
    drwxrwxr-x. 2 morris dean 4096 Nov 27 09:51 series-pi
  • These subdirectories store the sample MPI-1 programs, with README files, for this laboratory.

  15. First MPI C program: hello1.c
  • Change directory to hello and use an editor, e.g. nano, to open hello1.c:
    cd hello
    nano hello1.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int version, subversion;
    MPI_Init(&argc, &argv);
    MPI_Get_version(&version, &subversion);
    printf("Hello world!\n");
    printf("Your MPI Version is: %d.%d\n", version, subversion);
    MPI_Finalize();
    return(0);
}

  16. First MPI Fortran program: hello1.f
  • Use an editor to open hello1.f:
    cd hello
    nano hello1.f
      program main
      include 'mpif.h'
      integer ierr, version, subversion
      call MPI_INIT(ierr)
      call MPI_GET_VERSION(version, subversion, ierr)
      print *, 'Hello world!'
      print *, 'Your MPI Version is: ', version, '.', subversion
      call MPI_FINALIZE(ierr)
      end

  17. Second MPI C program: hello2.c
  • Use an editor to open hello2.c:
    cd hello
    nano hello2.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world! I am P%d of %d\n", rank, size);
    MPI_Finalize();
    return(0);
}

  18. Second MPI Fortran program: hello2.f
  • Use an editor to open hello2.f:
    cd hello
    nano hello2.f
      program main
      include 'mpif.h'
      integer rank, size, ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      print *, 'Hello world! I am P', rank, ' of ', size
      call MPI_FINALIZE(ierr)
      end

  19. Make all files in hello
  • A 'Makefile' is written for each example directory.
  • Running 'make' will compile all hello examples; a sample Makefile sketch follows this slide:
    make
    /usr/lib64/mpich/bin/mpif77 -o helloF1 hello1.f
    /usr/lib64/mpich/bin/mpif77 -o helloF2 hello2.f
    /usr/lib64/mpich/bin/mpicc -o helloC1 hello1.c
    /usr/lib64/mpich/bin/mpicc -o helloC2 hello2.c
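The Makefile itself is not reproduced on the slide; a minimal sketch that would give the output above (compiler paths and target names copied from that output, everything else an assumption) might be:

MPICC  = /usr/lib64/mpich/bin/mpicc
MPIF77 = /usr/lib64/mpich/bin/mpif77

all: helloF1 helloF2 helloC1 helloC2

helloF1: hello1.f
	$(MPIF77) -o helloF1 hello1.f
helloF2: hello2.f
	$(MPIF77) -o helloF2 hello2.f
helloC1: hello1.c
	$(MPICC) -o helloC1 hello1.c
helloC2: hello2.c
	$(MPICC) -o helloC2 hello2.c

clean:
	rm -f helloF1 helloF2 helloC1 helloC2

(Recipe lines in a Makefile must start with a tab character.)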

  20. mpirun hello examples in foreground
  • You may run the hello examples in the foreground by specifying the number of processes and the machinefile with mpirun, e.g.
    mpirun -np 4 -machinefile machine ./helloC2
    Hello world! I am P0 of 4
    Hello world! I am P2 of 4
    Hello world! I am P3 of 4
    Hello world! I am P1 of 4
  • machine is the file storing the hostnames on which you want the programs to run.
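The contents of the machinefile are not shown on the slide; as an illustration (these hostnames are made up), it is just a plain text list of the hosts to run on, one per line:

node01
node02
node03
node04

Depending on the MPI implementation, a per-host slot count can also be given (e.g. node01:2 for MPICH-style machinefiles).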

  21. Exercise
  • Following the hello example above, mpirun helloC1, helloF1 and helloF2 in the foreground with 4 processes.
  • Change directory to mc-pi, compile all programs inside using 'make', then run mpi-mc-pi using 2, 4 and 8 processes.
  • Change directory to series-pi, compile all programs inside using 'make', then run series-pi using 2, 4, 6 and 8 processes.
  • Note the time difference.

  22. Parallelization example: serial-pi.c
#include <stdio.h>
static long num_steps = 10000000;
double step;
int main ()
{
    int i;
    double x, pi, sum = 0.0;
    step = 1.0/(double) num_steps;
    for (i=0; i < num_steps; i++){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n", pi);
}

  23. Parallelizing serial-pi.c into mpi-pi.c, Step 1: adding the MPI environment
#include "mpi.h"
#include <stdio.h>
static long num_steps = 10000000;
double step;
int main (int argc, char *argv[])
{
    int i;
    double x, pi, sum = 0.0;
    MPI_Init(&argc, &argv);
    step = 1.0/(double) num_steps;
    for (i=0; i < num_steps; i++){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n", pi);
    MPI_Finalize();
}

  24. Parallelizing serial-pi.c into mpi-pi.c, Step 2: adding variables to print ranks
#include "mpi.h"
#include <stdio.h>
static long num_steps = 10000000;
double step;
int main (int argc, char *argv[])
{
    int i;
    double x, pi, sum = 0.0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i=0; i < num_steps; i++){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f, Processor %d of %d \n", pi, rank, size);
    MPI_Finalize();
}

  25. Parallelizing serial-pi.c into mpi-pi.c, Step 3: divide the workload
#include "mpi.h"
#include <stdio.h>
static long num_steps = 10000000;
double step;
int main (int argc, char *argv[])
{
    int i;
    double x, mypi, pi, sum = 0.0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i=rank; i < num_steps; i+=size){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    mypi = step * sum;
    printf("Est Pi= %f, Processor %d of %d \n", mypi, rank, size);
    MPI_Finalize();
}

  26. Parallelizing serial-pi.c into mpi-pi.c, Step 4: collect partial results
#include "mpi.h"
#include <stdio.h>
static long num_steps = 10000000;
double step;
int main (int argc, char *argv[])
{
    int i;
    double x, mypi, pi, sum = 0.0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i=rank; i < num_steps; i+=size){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    mypi = step * sum;
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank==0) printf("Est Pi= %f, \n", pi);
    MPI_Finalize();
}

  27. Compile and run the MPI program
    $ mpicc -o mpi-pi mpi-pi.c
    $ mpirun -np 4 -machinefile machines mpi-pi

  28. Parallelization example 2: serial-mc-pi.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
int main(int argc, char *argv[])
{
    long in, i, n;
    double x, y, q;
    time_t now;
    in = 0;
    srand(time(&now));
    printf("Input no of samples : ");
    scanf("%ld", &n);
    for (i=0; i<n; i++) {
        x = rand()/(RAND_MAX+1.0);
        y = rand()/(RAND_MAX+1.0);
        if ((x*x + y*y) < 1) { in++; }
    }
    q = ((double)4.0)*in/n;
    printf("pi = %.20lf\n", q);
    printf("rmse = %.20lf\n", sqrt(((double) q*(4-q))/n));
}

  29. Parallelization example 2: mpi-mc-pi.c
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
int main(int argc, char *argv[])
{
    long in, i, n;
    double x, y, q, Q;
    time_t now;
    int rank, size;
    MPI_Init(&argc, &argv);
    in = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    srand(time(&now)+rank);
    if (rank==0) {
        printf("Input no of samples : ");
        scanf("%ld", &n);
    }
    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    for (i=0; i<n; i++) {
        x = rand()/(RAND_MAX+1.0);
        y = rand()/(RAND_MAX+1.0);
        if ((x*x + y*y) < 1) { in++; }
    }
    q = ((double)4.0)*in/n;
    MPI_Reduce(&q, &Q, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    Q = Q / size;
    if (rank==0) {
        printf("pi = %.20lf\n", Q);
        printf("rmse = %.20lf\n", sqrt(((double) Q*(4-Q))/n/size));
    }
    MPI_Finalize();
}

  30. Compile and run mpi-mc-pi
    $ mpicc -o mpi-mc-pi mpi-mc-pi.c
    $ mpirun -np 4 -machinefile machines mpi-mc-pi

  31. Collective communication (figure): scatter distributes your data among the processes; reduction uses information from all processes to provide a condensed result by/for one process, e.g. reducing the values 1, 3, 5, 7 with MPI_PROD gives 105.

  32. MPI_Scatter
  • Distributes data from root to all other tasks in a group.
    int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
                    void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                    int root, MPI_Comm comm)

  33. MPI_Scatter
  • Example: two vectors are distributed in order to prepare a parallel computation of their scalar product.
  • Figure: the root (processor 2) holds a[0..3] = 10, 11, 12, 13; after the call, processors 0-3 each hold one element in m (10, 11, 12, 13 respectively).
    MPI_Scatter(&a, 1, MPI_INT, &m, 1, MPI_INT, 2, MPI_COMM_WORLD);
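A self-contained sketch of this pattern (not one of the lab files; the array values follow the figure) is shown below; run it with exactly 4 processes so each one receives one element:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    int a[4] = {10, 11, 12, 13};    /* data held on the root, as in the figure */
    int m;                          /* one element received by every process   */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Root (rank 2, as on the slide) sends one int to every process,
     * including itself; send buffers on non-root ranks are ignored. */
    MPI_Scatter(a, 1, MPI_INT, &m, 1, MPI_INT, 2, MPI_COMM_WORLD);
    printf("P%d received m = %d\n", rank, m);

    MPI_Finalize();
    return 0;
}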

  34. MPI_Reduce
  • Reduces values on all processes to a single value on root.
    int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
                   MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

  35. MPI_Reduce
  • Example: calculation of the global minimum of the variables kept by all processes, calculation of a global sum, etc.
  • Figure: processors 0-3 hold b = 2, 3, 5, 9; with op = MPI_SUM the result d = 19 appears on the root (processor 2).
    MPI_Reduce(&b, &d, 1, MPI_INT, MPI_SUM, 2, MPI_COMM_WORLD);
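Similarly, a compilable sketch of the reduction (not one of the lab files; here each rank contributes rank + 1 rather than the figure's values, and root 0 is used) could be:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, b, d;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    b = rank + 1;   /* each process contributes its own value */

    /* Combine every b with MPI_SUM; the result d is valid only on root 0. */
    MPI_Reduce(&b, &d, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %d\n", d);

    MPI_Finalize();
    return 0;
}

With 4 processes this prints global sum = 10.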

  36. MPI Datatype
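The datatype table on this slide did not survive the transcript; for reference (standard predefined MPI datatypes, added here, not recovered from the slide), the most commonly used ones are:
  • C: MPI_CHAR (char), MPI_INT (int), MPI_LONG (long), MPI_FLOAT (float), MPI_DOUBLE (double), MPI_BYTE, MPI_PACKED
  • Fortran: MPI_CHARACTER (CHARACTER), MPI_INTEGER (INTEGER), MPI_REAL (REAL), MPI_DOUBLE_PRECISION (DOUBLE PRECISION), MPI_LOGICAL (LOGICAL), MPI_BYTE, MPI_PACKED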

  37. Thanks. Please give me some comments: https://goo.gl/forms/880NY3kZ9h7ay7r32 Morris Law, SCID (morris@hkbu.edu.hk)
