
Parallel Matrix Multiplication Using MPI: Naive Implementation and Performance Analysis

This project proposal outlines a naive version of parallel matrix multiplication using MPI (Message Passing Interface). It walks through the computation step by step: Processor0 reads the input, distributes one matrix, broadcasts the other, and gathers the results from all processors. Test data is generated in R, producing matrices of varying sizes (up to 4096x4096) with integer values in the range [-1000, 1000]. The proposal also examines performance, including the conditions under which superlinear speedup might appear, and the trade-offs introduced by communication overhead.





Presentation Transcript


  1. CSE5304 Project Proposal: Parallel Matrix Multiplication (Tian Mi)

  2. A naive version with MPI. Result: P1 P2 … Pi … PN

  3. A naive version with MPI: Pi  Pi

  4. A naive version with MPI
     • Processor0 reads the input file
     • Processor0 distributes one matrix
     • Processor0 broadcasts the other matrix
     • All processors, in parallel, do the multiplication on their own piece of the data
     • Processor0 gathers the results
     • Processor0 writes the result to the output file

  5. MPI_Scatter

  6. MPI_Scatter

  7. MPI_Bcast

  8. MPI_Bcast

  9. MPI_Gather

  10. MPI_Gather

  11. Data generation
     • Generated in R with the "igraph" package
     • Integers in the range [-1000, 1000]
     • Matrix sizes: 1024x1024, 2048x2048, 4096x4096

  12. Result • Data size: 1024*1024

  13. Result • Data size: 1024*1024

  14. Result • Data size: 1024*1024

  15. Result • Data size: 2048*2048

  16. Result • Data size: 2048*2048

  17. Result • Data size: 2048*2048

  18. Result • Data size: 4096*4096

  19. Analysis
     • To observe superlinear speedup, the computation must be increased; it is not yet dominant enough
     • Options: larger matrices and larger integer types
     • However, larger matrices or long integers also increase the communication time (broadcast, scatter, gather)
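For reference, the standard definitions behind this analysis (not stated on the slide):

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
```

Speedup is superlinear when S(p) > p. A common cause is cache effects: each processor's working set shrinks as p grows, so more of it fits in cache, which is consistent with the slide's point that computation must dominate communication for the effect to be visible.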

  20. Cannon's algorithm: Example • http://www.vampire.vanderbilt.edu/education-outreach/me343_fall2008/notes/parallelMM_10_09.pdf

  21. Cannon's algorithm • Still implementing and debugging • No results to share at present

  22. Thank you • Questions & Comments?
