Download Presentation
## Solving Large Sparse Linear Systems

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Solving Large Sparse Linear Systems**Ruth Shaw Dept. of CS & Appl. Stats. UNB Saint John**Outline**• Solving Linear Systems • Iterative Methods • Jacobi Method • PDE Example • Designing the Parallel Algorithm**Solving Linear Systems**Ax = b**Iterative Methods**• Jacobi • Gauss-Seidel • successive over-relaxation (SOR) • conjugate gradient method • biconjugate gradient method • generalized minimal residual method**Jacobi Method**Convert Ax = b into the form (L + D + U)x = b Dx = (-L - U)x + b x = D-1(-L - U)x + D-1b x = Cx + d Generate a sequence of approximations x (k+1) = Cx (k) + d**Jacobi Method**Consider a small 3 x 3 system of equations: 2x1 – x2 + x3 = -1 x1 + 2x2 – x3 = 6 x1 – x2 + 2x3 = -3 This is converted to: x1 = [-1 – (x2 - x3)] / 2 x2 = [ 6 – (- x1 + x3)] / 2 x3 = [-3 – (x1 + x2)] / 2**Jacobi Method**Successive iterates are generated as follows: x1 (k+1) = [-1 – (x2 (k)- x3 (k))] / 2 x2(k+1) = [ 6 – (- x1 (k) + x3 (k))] / 2 x3(k+1) = [-3 – (x1 (k) + x2 (k))] / 2 The initial guess (k = 0) to the solution vector is usually taken to be zero**Jacobi Method**Approximation to solution after 13 iterations: x1 = 1.0002, x2 = 2.0001, x3 = -0.9997 Exact Solution: x1 = 1, x2 = 2, x3 = -1 Max Error: 3.0 x 10-4**PDE Example**• Finite-Difference Solution of a PDE • typically involves solving a linear system with a large banded coefficient matrix • based on subdividing the domain of the problem with a mesh of discrete points … derivatives are replaced with difference quotients**PDE Example**The steady-state temperature distribution in a plate, 0 ≤ x≤ 1, 0 ≤ y ≤ 1, can be described by the Poisson equation: uxx + uyy = f(x, y) If there are no heat sources (f(x, y) = 0) the equation is know as Laplace’s equation.**PDE Example**Consider the two-dimension Laplace equation uxx + uyy = 0, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 with boundary conditions u(0, y) = y2, u(1, y) = 1, 0 < y < 1 u(x, 0) = x2, u(x, 1) = 1, 0 < x < 1**PDE Example**We define a mesh in the (x, y) plane: xi = ih, i = 0, 1, …, n, h = ∆x = 1/n yi = jk, j = 0, 1, …, m, k = ∆y = 1/m Denote the value of u(x, y) at the point (xi, yi) as uij**PDE Example**Replacing the second derivatives in the Laplace equation uxx + uyy = 0 by centered differences gives a system of algebraic equations for the function values at the mesh points (1/k2)(ui,j+1 – 2uij + ui,j-1) + (1/h2)(ui+1,j – 2uij + ui-1,j) = 0**PDE Example**Setting ∆x = ∆ y gives (ui,j+1 – 2uij + ui,j-1) + (ui+1,j – 2uij + ui-1,j) = 0 Which results in ui,j+1 – 4uij + ui,j-1 + ui+1,j + ui-1,j = 0**PDE Example**Setting n =3 gives the following system of equations: u12 – 4u11 + u10 + u21 + u01 = 0 u13 – 4u12 + u11 + u22 + u02 = 0 : u10 u01**PDE Example**The values outside the mesh represent the boundaries of the plate**PDE Example**Denote the boundary values by bk , k = 1,…,4n and the unknown mesh values by ui , i = 1,…,n*n**PDE Example**Moving the boundary values to the right of each equation gives us the following linear system: 4u1 – u2 – u4 = b1 + b12 - u1 + 4u2 – u3 – u5 = b2 – u2 + 4u3 – u6 = b3 + b4 - u1 + 4u4 – u5 – u7 = b11 – u2 – u4 + 4u5 – u6 – u8 = 0 - u3 – u5 + 4u6 – u9 = b5 - u4 + 4u7 – u8 = b9 + b10 - u5 – u7 + 4u8 – u9 = b8 - u6 – u8 4u9 = b6 + b7**PDE Example**Assigning values to the boundaries then gives us the following graph**PDE Example**Which now represents the following system: 4u1 – u2 – u4 = 0.3 - u1 + 4u2 – u3 – u5 = 0.25 – u2 + 4u3 – u6 = 1.9 - u1 + 4u4 – u5 – u7 = 0.5 – u2 – u4 + 4u5 – u6 – u8 = 0 - u3 – u5 + 4u6 – u9 = 1.0 - u4 + 4u7 – u8 = 1.8 - u5 – u7 + 4u8 – u9 = 1.0 - u6 – u8 + 4u9 = 2.0**PDE Example**For the Jacobi method, this system is converted to the following set of equations: u1 = [0.3 – (u2 + u4)] / 4 u2 = [0.25 – (u1 + u3 + u5)] / 4 : u9 = [2.0 – (u6 + u8)] / 4**PDE Example**The Jacobi iteration scheme is then given by: u1(k+1) = [0.3 – (u2(k) + u4(k) )] / 4 u2(k+1) = [0.25 – (u1(k) + u3(k) + u5(k))] / 4 : u9(k+1) = [2.0 – (u6(k) + u8(k))] / 4**PDE Example**After 38 iterations Max error 8.9 x 10-7**Designing the Parallel Algorithm**There are essentially two approaches to designing a parallel program: data-parallel approach domain decomposition … divide the data into pieces control-parallel approach functional decomposition … divide the computation into pieces**Designing the Parallel Algorithm**After deciding on the decomposition of the algorithm, the communication pattern needs to be determined => minimize processor communication**Parallel Jacobi**Recall our PDE example for n = 3: u1(k+1) = [0.3 – (u2(k) + u4(k) )] / 4 u2(k+1) = [0.25 – (u1(k) + u3(k) + u5(k))] / 4 u3(k+1) = [1.9 – (u2(k) + u6(k))] / 4 u4(k+1) = [0.5 – (u1(k) + u5(k) + u7(k))] / 4 u5(k+1) = [0 – (u2(k) + u4(k) + u6(k) + u8(k))] / 4 u6(k+1) = [1.0 – (u3(k) + u5(k) + u9(k))] / 4 u7(k+1) = [1.8 – (u4(k) + u8(k))] / 4 u8(k+1) = [1.0 – (u5(k) + u7(k) + u9(k))] / 4 u9(k+1) = [2.0 – (u6(k) + u8(k))] / 4**Parallel Jacobi**Assume p processes where n ≥ p … An obvious approach is to have each process calculate the values in a subvector of the solution vector => for simplicity, assume n/p is even**Parallel Jacobi**Assume p = 3: P0: u1(k+1) = [0.3 – (u2(k) + u4(k) )] / 4 u2(k+1) = [0.25 – (u1(k) + u3(k) + u5(k))] / 4 u3(k+1) = [1.9 – (u2(k) + u6(k))] / 4 P1: u4(k+1) = [0.5 – (u1(k) + u5(k) + u7(k))] / 4 u5(k+1) = [0 – (u2(k) + u4(k) + u6(k) + u8(k))] / 4 u6(k+1) = [1.0 – (u3(k) + u5(k) + u9(k))] / 4 P2: u7(k+1) = [1.8 – (u4(k) + u8(k))] / 4 u8(k+1) = [1.0 – (u5(k) + u7(k) + u9(k))] / 4 u9(k+1) = [2.0 – (u6(k) + u8(k))] / 4**Parallel Jacobi**Partition coefficient matrix A using block-row chunkA = (n*n) / p call MPI_SCATTER(A, chunkA, MPI_REAL, A_local, chunkA, MPI_REAL, 0, PI_COMM_WORLD, ierr) 4 -1 0 -1 0 0 0 0 0 -1 4 -1 0 -1 0 0 0 0 0 -1 4 0 0 -1 0 0 0 -1 0 0 4 -1 0 -1 0 0 0 -1 0 -1 4 -1 0 -1 0 0 0 -1 0 -1 4 0 0 -1 0 0 0 -1 0 0 4 -1 0 0 0 0 0 -1 0 -1 4 -1 0 0 0 0 0 -1 0 -1 4**Parallel Jacobi**Scatter the RHS vector b chunkb = n/p call MPI_SCATTER(b, chunkb, MPI_REAL, b_local, chunkb, MPI_REAL, 0, MPI_COMM_WORLD, ierr) 0.3 0.25 1.9 0.5 0 1.0 1.8 1.0 2.0**Parallel Jacobi**Each processor calculates n/p elements of the solution vector P0: x_local(1), x_local(2), x_local(3) P1: x_local(4), x_local(5), x_local(6) P2: x_local(7), x_local(8), x_local(9)**Parallel Jacobi**After each iteration use MPI_Allgather to give all processes a copy of the updated solution vector call MPI_ALLGATHER(x_local, chunkb, MPI_REAL, x_new, chunkb, MPI_REAL, MPI_COMM_WORLD, ierr) P0 P2 P1**Parallel Jacobi**Determine if the norm of the solution vector has reached the set tolerance option 1: all processes compute global norm option 2: each process computes local norm and then use MPI_Allreduce to give all processes a copy of the global norm**Parallel Jacobi**Speedup = serial time / parallel time**Parallel Jacobi**Speedup = serial time / parallel time**References**Iterative Methods for Solving Linear Systems http://www.quantlet.com/mdstat/scripts/csa/html/node38.html Applied Numerical Analysis Using MATLAB, 2nd Ed., by Laurene Fausett, Pearson Prentic-Hall, 2008 Parallel programming in C with MPI and OpenMP by Michael Quinn, McGraw-Hill, 2004 Parallel Programming with MPI by Peter Pacheco, Morgan Kaufmann, 1997