280 likes | 456 Vues
This program demonstrates the parallel implementation of matrix multiplication and numerical integration using the Message Passing Interface (MPI). For matrix multiplication, each processor computes a portion of the result, optimizing workload distribution. Special attention is given to handling non-integer row distributions. The numerical integration section approximates the integral of a continuous function by dividing the area into trapezoids, with each processor calculating a subset to enhance efficiency. This implementation showcases the power of parallel processing for computationally intensive tasks.
E N D
Two Example Parallel Programs using MPI UNC-Wilmington, C. Ferner, 2007 Mar 209, 2007
Matrix Multiplication • Matrices are multiplied together using the dot product of each row of the first matrix with each column of the second matrix B A C = *
Matrix Multiplication • For each value at row i and column j, the result is the dot product of the ith row from A and the jth column from B:
Matrix Multiplication • For each row i from [0..N-1] and each column j from [0..N-1] the value for position [i][j] of the resulting matrix is computed: for (i = 0; i < N; i++) for (j = 0; j < N; j++) { C[i][j] = 0; for (k = 0; k < N; j++) C[i][j] += A[i][k] * B[k][j]; }
Matrix Multiplication • This can be implemented on multiple processors where each processor is responsible for computing a different set of rows in the final matrix • As long as each processor has the parts of the A and B matrix, they can do this without communication C
Matrix Multiplication • If there are N rows and P processors, then each processor is responsible for N/P rows. • Each processor is responsible for the rows from my_rank * N/P up to (but excluding) (my_rank + 1) * N/P 0 * N/P { my_rank = 0 1 * N/P { my_rank = 1 2 * N/P { my_rank = 2 3 * N/P
Matrix Multiplication • This is coded as: for (i = 0 + my_rank * N/P; i < 0 + (my_rank + 1) * N/P; i++) for (j = 0; j < N; j++) { C[i][j] = 0; for (k = 0; k < N; j++) C[i][j] += A[i][k] * B[k][j]; }
Matrix Multiplication • One Problem: What if N/P is not an integer? • The last processor has fewer than N/P rows for which it is responsible. • The code on the previous slide will cause the last processors (or last couple of processors) to compute beyond the last row of the matrix
Matrix Multiplication • This is dealt with as follows: blksz = (int) ceil((float) N / P); for (i = 0 + my_rank * blksz; i < min(N, 0 + (my_rank + 1) * blksz); i++) for (j = 0; j < N; j++) { C[i][j] = 0; for (k = 0; k < N; j++) C[i][j] += A[i][k] * B[k][j]; }
Matrix Multiplication • For example suppose N=13 and P=4. Then: blksz = ceiling(13/4) = 4 Processor 0 : i = [0*4..1*4) = [0..4) Processor 1 : i = [1*4..2*4) = [4..8) Processor 2 : i = [2*4..3*4) = [8..12) Processor 3 : i = [3*4..min(13,4*4))=[12..13)
Matrix Multiplication • The assignment deals with the parallel execution of matrix multiplication
Numerical Integration • Suppose we have a non-negative, continuous function f and we want to compute the integral of f from a to b: y x a b
Numerical Integration • We can approximate the integral by dividing the area into trapezoids and summing the area of the trapezoids y x a b
Numerical Integration • If we use equal width partitions, then each partition is h=(a+b)/n y x a b
Numerical Integration • The area of the ith trapezoid is: y x h a b
Numerical Integration • The area for all trapezoids is:
Numerical Integration Sequential program double f(double x); main (int argc, char *argv[]) { int N, i; double a, b, h, x, integral; char *usage = "Usage: %s a b N \n"; double elapsed_time; struct timeval tv1, tv2;
Numerical Integration Sequential program if (argc < 4) { fprintf (stderr, usage, argv[0]); return -1; } a = atof(argv[1]); b = atof(argv[2]); N = atoi(argv[3]);
Numerical Integration Sequential program gettimeofday(&tv1, NULL); h = (b - a) / N; integral = (f(a) + f(b))/2.0; x = a + h; for (i = 1; i < N; i++) { integral += f(x); x += h; } integral = integral*h; gettimeofday(&tv2, NULL);
Numerical Integration Sequential program elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec) / 1000000.0); printf ("elapsed_time=\t%lf seconds\n", elapsed_time); printf ("With N = %d trapezoids, \n", N); printf ("estimate of integral from %f to %f = %f\n", N, a, b, integral); }
Numerical Integration Sequential program double f(double x) { return 6*x*x - 5*x; }
Numerical Integration Sequential program $ ./integ 1 3 10000 a = 1.000000, b = 3.000000, N = 10000 elapsed_time= 0.000567 seconds With N = 10000 trapezoids, estimate of integral from 1.000000 to 3.000000 = 32.000000
Numerical Integration Parallel program • Each processor will be responsible for computing the area of a subset of trapezoids y { { { x a b P2 P0 P1
Numerical Integration Parallel program double f (double x); int main(int argc, char *argv[]) { int N, P, mypid, blksz, i; double a, b, h, x, integral, localA, localB, total; char *usage = "Usage: %s a b N \n"; double elapsed_time; struct timeval tv1, tv2; int abort = 0;
Numerical Integration Parallel program a = atof(argv[1]); b = atof(argv[2]); N = atoi(argv[3]); MPI_Bcast (&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); MPI_Bcast (&b, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD); h = (b - a) / N;
Numerical Integration Parallel program blksz = (int) ceil ( ((float) N) / P); localA = a + mypid * blksz * h; localB = min(b, a + (mypid + 1) * blksz * h); integral = (f(localA) + f(localB))/2.0; x = localA + h; for (i = 1; i < blksz && x <= localB; i++) { integral += f(x); x += h; } integral = integral*h;
Numerical Integration Parallel program MPI_Reduce (&integral, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); if (mypid == 0) printf ("integral = %f\n", total); } float f(float x) { return 6*x*x - 5*x; }
Numerical Integration Parallel program $ mpicc mpiInteg.c -o mpiInteg -lm $ mpirun -nolocal -np 4 mpiInteg 1 3 10000 elapsed_time= 0.001416 seconds integral = 32.000000