
Parallel Matrix Multiplication Using MPI: Naive Implementation and Performance Analysis

This project proposal outlines a naive version of parallel matrix multiplication using MPI (Message Passing Interface). It walks through the computation step by step: Processor0 reads the input, distributes one matrix, broadcasts the other, and gathers the results from all processors. Test data is generated in R, producing matrices of varying sizes (up to 4096x4096) with integer values in the range [-1000, 1000]. The proposal also examines performance, including the conditions under which superlinear speedup might appear, and the trade-offs introduced by communication overhead.





Presentation Transcript


  1. CSE5304 Project Proposal: Parallel Matrix Multiplication (Tian Mi)

  2. A naive version with MPI. Result: P1 P2 … Pi … PN

  3. A naive version with MPI: Pi  Pi

  4. A naive version with MPI
     • Processor0 reads the input file
     • Processor0 distributes one matrix
     • Processor0 broadcasts the other matrix
     • All processors, in parallel, do the multiplication on their own piece of the data
     • Processor0 gathers the results
     • Processor0 writes the result to the output file

  5. MPI_Scatter

  6. MPI_Scatter

  7. MPI_Bcast

  8. MPI_Bcast

  9. MPI_Gather

  10. MPI_Gather

  11. Data generation
     • Generated in R with the "igraph" package
     • Integers in the range [-1000, 1000]
     • Matrix sizes: 1024x1024, 2048x2048, 4096x4096

  12. Result • Data size: 1024*1024

  13. Result • Data size: 1024*1024

  14. Result • Data size: 1024*1024

  15. Result • Data size: 2048*2048

  16. Result • Data size: 2048*2048

  17. Result • Data size: 2048*2048

  18. Result • Data size: 4096*4096

  19. Analysis
     • To observe superlinear speedup, the computation must be increased; it is not yet dominant enough
     • Options: larger matrices and larger integer types
     • However, larger matrices or long integers also increase the communication time (broadcast, scatter, gather)
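For reference, the standard definitions behind this analysis (not stated on the slide):

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
```

Speedup is superlinear when S(p) > p. A common cause is cache effects: each processor's working set shrinks as p grows, so more of it fits in cache, which is consistent with the slide's point that computation must dominate communication for the effect to be visible.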

  20. Cannon's algorithm: Example • http://www.vampire.vanderbilt.edu/education-outreach/me343_fall2008/notes/parallelMM_10_09.pdf

  21. Cannon's algorithm • Still implementing and debugging • No results to share at present

  22. Thank you • Questions & Comments?
