Efficient Parallelization of Backsolve Algorithms in Matrix Computations

PARALLELIZATION OF MULTIPLE BACKSOLVES Project #2 James Stanley April 25, 2002

PARALLELIZATION OF MULTIPLE BACKSOLVES Project #2 • Introduction (Backsolve) • Challenge • Example (m = 5) • Problem Description • Solution Technique • Parallel Implementation • Results

Introduction (Backsolve) If $R$ is an $m \times m$ upper triangular matrix, a backsolve is the solution of the equation $Rx = b$, where $b$ is a vector of length $m$. The formulas for the solution are

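$m$-th equation: $r_{mm} x_m = b_m \;\Rightarrow\; x_m = \dfrac{b_m}{r_{mm}}$

Introduction (Backsolve) cont. $(m-1)$-th equation: $r_{m-1,m-1} x_{m-1} + r_{m-1,m} x_m = b_{m-1}$. With $x_m$ known from the $m$-th equation, $\Rightarrow\; x_{m-1} = \dfrac{b_{m-1} - r_{m-1,m} x_m}{r_{m-1,m-1}}$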

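Introduction (Backsolve) cont. $i$-th equation: $r_{ii} x_i + r_{i,i+1} x_{i+1} + \dots + r_{im} x_m = b_i$, for $0 < i \le m-1$ $\;\Rightarrow\; x_i = \dfrac{1}{r_{ii}} \left( b_i - \sum_{j=i+1}^{m} r_{ij} x_j \right)$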
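The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `backsolve`, the dense list-of-rows layout, and the 3×3 test system are assumptions made for the example. Indices are 0-based, so the "m-th equation" of the slides is row `m - 1` here.

```python
def backsolve(R, b):
    """Solve R x = b for an upper triangular R given as a list of rows."""
    m = len(b)
    x = [0.0] * m
    # Work from the last equation up to the first, as in the slides.
    for i in range(m - 1, -1, -1):
        if R[i][i] == 0:
            raise ZeroDivisionError("zero diagonal entry: R is singular")
        # Sum of r_ij * x_j over the already-computed unknowns j > i.
        s = sum(R[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (b[i] - s) / R[i][i]
    return x


# Hypothetical 3x3 example, chosen so the arithmetic is exact.
R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
b = [3.0, 13.0, 8.0]
x = backsolve(R, b)
print(x)  # [1.0, 3.0, 2.0]
```

Each unknown costs one pass over the part of its row to the right of the diagonal, so a single backsolve takes on the order of m²/2 multiply-adds.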

Challenge Storage: To avoid storing zeros, store the $n(n+1)/2$ nonzero elements of $R_1$ in a 1-dimensional array by rows, and the $m(m+1)/2$ nonzero elements of $R_2$ in a 1-dimensional array by rows.
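One way to realize this row-wise packed storage is sketched below. The helper names `packed_index`, `pack_upper`, and `backsolve_packed` are hypothetical, and 0-based indexing is assumed: row `i` of an m×m upper triangular matrix contributes its `m - i` entries from the diagonal rightward, so row `i` starts at offset `i*m - i*(i-1)/2` in the 1-dimensional array.

```python
def packed_index(i, j, m):
    """Position of r[i][j] (0-based, j >= i) in the row-packed array."""
    return i * m - i * (i - 1) // 2 + (j - i)


def pack_upper(R):
    """Copy the upper triangle of a dense matrix into a 1-D list, by rows."""
    m = len(R)
    return [R[i][j] for i in range(m) for j in range(i, m)]


def backsolve_packed(r, b):
    """Backsolve using only the m(m+1)/2 packed entries of R."""
    m = len(b)
    if len(r) != m * (m + 1) // 2:
        raise ValueError("packed array has the wrong length")
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(r[packed_index(i, j, m)] * x[j] for j in range(i + 1, m))
        x[i] = (b[i] - s) / r[packed_index(i, i, m)]
    return x


# Hypothetical 3x3 example: 6 stored values instead of 9.
R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
packed = pack_upper(R)
x = backsolve_packed(packed, [3.0, 13.0, 8.0])
print(packed)
print(x)
```

For m = 5 the rows start at offsets 0, 5, 9, 12, and 14, for 15 stored values in total.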

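Example Suppose $m = 5$, then $Rx = b$ is
$$\begin{pmatrix} r_{11} & r_{12} & r_{13} & r_{14} & r_{15} \\ 0 & r_{22} & r_{23} & r_{24} & r_{25} \\ 0 & 0 & r_{33} & r_{34} & r_{35} \\ 0 & 0 & 0 & r_{44} & r_{45} \\ 0 & 0 & 0 & 0 & r_{55} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}$$
or,
$$\begin{aligned} r_{11} x_1 + r_{12} x_2 + r_{13} x_3 + r_{14} x_4 + r_{15} x_5 &= b_1 \\ r_{22} x_2 + r_{23} x_3 + r_{24} x_4 + r_{25} x_5 &= b_2 \\ r_{33} x_3 + r_{34} x_4 + r_{35} x_5 &= b_3 \\ r_{44} x_4 + r_{45} x_5 &= b_4 \\ r_{55} x_5 &= b_5 \end{aligned}$$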

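Example cont. Solving for the $x_i$'s provides [formulas shown on the slide; not recoverable]. In memory: [storage layout shown on the slide; not recoverable].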

Problem Description Given the $n \times m$ right-hand-side (RHS) matrix $H$, solve for the $n \times m$ unknown matrix $Y$: $R_1 Y R_2^T = H$, where $R_1$ is a square upper triangular matrix of order $n$ and $R_2$ is a square upper triangular matrix of order $m$.

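Solution Technique Let $Z = R_1 Y$. Then $R_1 Y R_2^T = Z R_2^T$, so $Z R_2^T = H$, which is equivalent to $R_2 Z^T = H^T$. Parallel solution of (1) and (2):
(1) $R_2 Z^T = H^T = (h_1, h_2, \dots, h_n)$
(2) $R_1 Y = Z = (z_1, z_2, \dots, z_m)$
1st, solve (1) for the $m \times n$ matrix $Z^T$. 2nd, take the transpose of $Z^T$ to get $Z$. 3rd, solve (2) for the $n \times m$ solution matrix $Y$. Each column $h_i$ or $z_j$ gives an independent backsolve, which is what makes the work easy to divide among processes.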
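The three steps can be tried out serially with a short Python sketch. The names `solve_r1_y_r2t` and `transpose`, the dense storage, and the 2×3 test data are assumptions for illustration only. Since column $i$ of $H^T$ is row $i$ of $H$, stage (1) amounts to one backsolve with $R_2$ per row of $H$, and stage (2) to one backsolve with $R_1$ per column of $Z$.

```python
def backsolve(R, b):
    """Solve R x = b for an upper triangular R given as a list of rows."""
    m = len(b)
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (b[i] - s) / R[i][i]
    return x


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve_r1_y_r2t(R1, R2, H):
    """Solve R1 Y R2^T = H for Y with two sets of backsolves."""
    # Stage (1): R2 Z^T = H^T. Each row of H is one right-hand side, and
    # each solution vector is a column of Z^T, i.e. a row of Z.
    Z = [backsolve(R2, h_row) for h_row in H]
    # Transpose so that the columns z_j of Z become the list entries.
    ZT = transpose(Z)
    # Stage (2): R1 Y = Z. Each column z_j yields column j of Y.
    YT = [backsolve(R1, z_col) for z_col in ZT]
    return transpose(YT)


# Hypothetical data with n = 2, m = 3, built from a known Y.
R1 = [[2.0, 1.0],
      [0.0, 4.0]]
R2 = [[1.0, 2.0, 0.0],
      [0.0, 2.0, 1.0],
      [0.0, 0.0, 4.0]]
H = [[24.0, 30.0, 48.0],
     [56.0, 64.0, 96.0]]
Y = solve_r1_y_r2t(R1, R2, H)
print(Y)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

The right-hand side here was formed by hand as $H = R_1 Y R_2^T$ from $Y = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$, so the sketch recovers that $Y$ exactly.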

Parallel Implementation
• Generate H^T, R2, and R1 on process 0, using a random number generator.
• Move all of R2 to all processes with MPI_Bcast. (If there are p processors, make sure n/p and m/p are integers.)
• Ship n/p of the rows of H to each process with MPI_Scatter.
• On each process, solve n/p equations for the local Z.
• Ship all of the columns of Z to process 0 using MPI_Gather.
• Perform the transpose of Z on process 0.
• Ship m/p of the rows of Z^T to each process with MPI_Scatter.
• On each process, solve m/p equations for the local Y.
• Ship all of the m/p columns of the local Y to process 0 and print the solution on process 0.
Legend: each step is marked as either communication time or computation time.
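The data movement in these steps can be emulated serially to check the bookkeeping before any MPI code is written. The sketch below is plain Python, not MPI: `scatter` and `gather` are hypothetical stand-ins for what MPI_Scatter and MPI_Gather do with equal contiguous blocks, and the "ranks" are just entries in a list. It also assumes every rank has access to R1 for the second stage, which the slides do not spell out.

```python
def backsolve(R, b):
    """Solve R x = b for an upper triangular R given as a list of rows."""
    m = len(b)
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (b[i] - s) / R[i][i]
    return x


def transpose(A):
    return [list(col) for col in zip(*A)]


def scatter(rows, p):
    """Emulated scatter: split rows into p equal contiguous blocks."""
    if len(rows) % p != 0:
        raise ValueError("row count must be divisible by p")
    k = len(rows) // p
    return [rows[r * k:(r + 1) * k] for r in range(p)]


def gather(blocks):
    """Emulated gather: concatenate the per-rank blocks in rank order."""
    return [row for block in blocks for row in block]


def emulated_parallel_solve(R1, R2, H, p):
    # Communication: n/p rows of H go to each emulated rank.
    local_H = scatter(H, p)
    # Computation: each rank solves R2 z = h for its own rows.
    local_Z = [[backsolve(R2, h) for h in block] for block in local_H]
    # Communication: collect Z on "process 0", then transpose there.
    ZT = transpose(gather(local_Z))
    # Communication: m/p rows of Z^T (columns of Z) go to each rank.
    local_ZT = scatter(ZT, p)
    # Computation: each rank solves R1 y = z for its own columns.
    local_YT = [[backsolve(R1, z) for z in block] for block in local_ZT]
    # Communication: collect the columns of Y on "process 0".
    return transpose(gather(local_YT))


# Hypothetical data with n = m = 2 and p = 2 emulated ranks.
R1 = [[2.0, 1.0], [0.0, 4.0]]
R2 = [[1.0, 2.0], [0.0, 2.0]]
H = [[21.0, 16.0], [44.0, 32.0]]
Y = emulated_parallel_solve(R1, R2, H, 2)
print(Y)  # [[1.0, 2.0], [3.0, 4.0]]
```

Because the backsolves within a stage do not depend on one another, the result is the same for any p that divides both n and m; only the split between communication and computation changes.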

Results
