This presentation describes an algorithm based on Cramer's Rule that aims to reduce communication overhead when solving linear systems in parallel computing environments. It covers the motivation for the approach, reviews the algorithm's structure, and evaluates its numerical accuracy and stability. The implementation section describes the parallel communication scheme and reports lower communication overhead than traditional methods such as LU decomposition. The talk closes with possible next steps for optimizing the parallel implementation.
Gabriel Cramer (1704-1752) A Condensation-based Low Communication Linear Systems Solver Utilizing Cramer's Rule Ken Habgood, Itamar Arel, Department of Electrical Engineering & Computer Science, The University of Tennessee
Outline • Motivation & problem statement • Algorithm review • Numerical accuracy & stability • Parallel Implementation • Communication Results Source: http://tridane.faculty.asu.edu
Introduction • Mainstream approach: Gaussian Elimination • e.g. LU decomposition • Goal: an efficient parallel solver with lower communication overhead • Targeting an unpopular approach: Cramer’s Rule
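As a reminder of what the last bullet refers to: Cramer's Rule gives each unknown as a ratio of determinants, x_i = det(A_i) / det(A), where A_i is A with column i replaced by b. The sketch below only illustrates the rule on small integer systems. The names `det` and `cramer_solve` are hypothetical, and the cofactor-expansion determinant is an assumption chosen for brevity (it is exponential in N), not the condensation method the talk proposes.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (illustration only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def cramer_solve(a, b):
    """Solve a x = b for an integer matrix a and integer vector b."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    xs = []
    for i in range(len(a)):
        # A_i: copy of a with column i replaced by the right-hand side b.
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        xs.append(Fraction(det(a_i), d))
    return xs


if __name__ == "__main__":
    print(cramer_solve([[2, 1], [1, 3]], [3, 5]))
```

Exact `Fraction` arithmetic keeps the example free of rounding effects; a practical solver would work in floating point and would need a far cheaper determinant.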
LU Communication Pattern • Figure: communication for distributed LU decomposition (block layout, row by row: L00/U00, U01, U02; L10, A11, A12; L20, A21, A22) • Three sequential steps • Top-left processor computes and sends • Row and column leads compute and send • Remaining processors factorize their blocks • One-to-one communication • Idle time while the leads are processing Source: http://www.caam.rice.edu/~timwar/MA471F03/
Matrix “Mirroring” • Mirroring example • Applying Chio’s condensation yields:
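The slide's worked matrices do not survive in this transcript. For orientation, the textbook form of Chio's condensation reduces an N×N determinant to an (N-1)×(N-1) one: with pivot p = a_11 ≠ 0, each new entry is the 2×2 determinant p·a_ij - a_i1·a_1j, and det(A) = det(B) / p^(N-2). The sketch below applies that step repeatedly. The function name `chio_det` and the row swap used to find a nonzero pivot are assumptions for illustration; it does not include the authors' mirroring step.

```python
from fractions import Fraction


def chio_det(matrix):
    """Determinant via repeated Chio condensation, using exact rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    scale = Fraction(1)  # product of the pivot^(n-2) divisors from each step
    sign = 1             # flips on every row swap
    while len(m) > 1:
        n = len(m)
        # Chio's step needs a nonzero top-left pivot; swap a row up if needed.
        pivot_row = next((r for r in range(n) if m[r][0] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # first column is all zeros, so det = 0
        if pivot_row != 0:
            m[0], m[pivot_row] = m[pivot_row], m[0]
            sign = -sign
        p = m[0][0]
        # New entry (i, j) is the 2x2 determinant | p  m[0][j] ; m[i][0]  m[i][j] |.
        m = [[p * m[i][j] - m[i][0] * m[0][j] for j in range(1, n)]
             for i in range(1, n)]
        scale *= p ** (n - 2)
    return sign * m[0][0] / scale


if __name__ == "__main__":
    print(chio_det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))
```

Each condensation step touches every remaining entry once and the entries can be updated independently, which is what makes the step a natural candidate for parallel execution.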
Accuracy and Numerical Stability • Backward error estimation • Theoretical estimate of rounding error • E matrix depends on two items • The largest element in A or b • The growth factor of the algorithm • Same growth factor as LU-decomposition with partial pivoting
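The slide's own error bound is not reproduced here. As a rough companion, backward error can also be measured after the fact: for a computed solution x̂, the standard normwise relative backward error is ‖b - A·x̂‖ / (‖A‖·‖x̂‖ + ‖b‖), which is the size of the smallest relative perturbation of A and b that x̂ solves exactly. The sketch below uses the infinity norm; the helper names are hypothetical, and this measured quantity is an assumption standing in for the theoretical estimate on the slide.

```python
def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(x) for x in v)


def inf_norm_mat(a):
    """Infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def backward_error(a, x_hat, b):
    """Normwise relative backward error of x_hat as a solution of a x = b."""
    residual = [bi - sum(aij * xj for aij, xj in zip(row, x_hat))
                for row, bi in zip(a, b)]
    denom = inf_norm_mat(a) * inf_norm_vec(x_hat) + inf_norm_vec(b)
    return inf_norm_vec(residual) / denom


if __name__ == "__main__":
    a = [[2.0, 1.0], [1.0, 3.0]]
    b = [3.0, 5.0]
    print(backward_error(a, [0.8, 1.4], b))  # accurate solution
    print(backward_error(a, [0.8, 1.5], b))  # perturbed solution
```

A value near machine precision indicates a backward-stable result; the perturbed solution in the usage lines gives a much larger value.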
Serial Performance • Results support the theoretical ~2.5x complexity ratio
Communication Complexity • Two phases of parallel communication • Parallel Chio’s • Gather Columns • Overall bandwidth, with N: original matrix size, P: number of processors, F: gather columns size
Where’s the Breakeven Point? • Point at which communication “dead time” matches the computational workload • Assuming dC = 0.05 and N = 1000, the breakeven processor count would be P ≈ 142
Closing Thoughts … • Proposed O(N^3) Cramer’s Rule method • Significantly lower communication overhead • Many more “broadcasts” than “unicasts” • Communication is a function of problem size, not processor count • Next steps … • Optimize parallel implementation • Sparse matrix version