Use of adaptive low-rank approximation in sparse direct solver for 3D Helmholtz problem



  1. Use of adaptive low-rank approximation in sparse direct solver for 3D Helmholtz problem. IPGG SB RAS, Novosibirsk, Russia. Sergey Solovyev

  2. Agenda • Statement of the problem, key points • Proposed algorithm • Testing results

  3. Statement of the problem, key points

  4. Statement of the problem, key points
• Acoustic problem: scalar Helmholtz equation
• Induction logging problem: equation
• Elasticity problem
• 3D, many right-hand sides
• Nested dissection reordering
The solver:
• Gauss elimination algorithm
• Low-rank/HSS approximation
• Adaptive Cross Approximation
• Intel MKL BLAS/LAPACK
• Iterative refinement

  5. Proposed algorithm

  6. Algorithm of multifrontal HSS solver
Input: initial SLAE AX = B
Key points:
• Adaptive Cross Approximation (ACA) approach
• Adaptive balancing between robust and low-rank arithmetic
Steps:
1) Permute columns/rows of matrix A.
2) Perform the decomposition using the standard LU algorithm, applying low-rank approximation to the factors during the factorization process.
3) Solve the triangular systems (inversion of the LU factors).
4) Perform iterative refinement if needed.
Output: solution X.
References:
Chandrasekaran S., Gu M., Li X. S., Xia J. Some fast algorithms for hierarchically semiseparable matrices.
Xia J. Efficient structured multifrontal factorization for general large sparse matrices.
Rjasanow S. Adaptive cross approximation of dense matrices.
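A minimal sketch of this solve flow (an illustration, not the authors' code): SciPy's exact sparse LU stands in for the nested-dissection permutation and the low-rank/HSS factorization of steps 1-2, the triangular solves of step 3 are done by the factor object, and the iterative refinement of step 4 is written out. The function name, the tolerance eps, the refinement limit and the random complex symmetric test system are all illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_with_refinement(A, B, eps=1e-6, max_refine=5):
    # Steps 1-2: permute and factorize. splu (COLAMD ordering + exact LU) stands in
    # for nested dissection + LU with low-rank compression of the factors.
    lu = spla.splu(A.tocsc())
    # Step 3: solve the triangular systems for all right-hand sides at once.
    X = lu.solve(B)
    # Step 4: iterative refinement compensates for the approximation error
    # that compressed factors would introduce.
    for _ in range(max_refine):
        R = B - A @ X
        if np.linalg.norm(R) <= eps * np.linalg.norm(B):
            break
        X += lu.solve(R)
    return X

# Usage on a small random complex symmetric system with several right-hand sides.
n = 200
A = sp.random(n, n, density=0.02, format="csr") * (1.0 + 0.3j) + 4.0 * sp.eye(n)
A = (A + A.T) * 0.5                  # complex symmetric (non-Hermitian), as in the talk
B = np.random.rand(n, 3) + 1j * np.random.rand(n, 3)
X = solve_with_refinement(A, B)
print(np.linalg.norm(B - A @ X) / np.linalg.norm(B))
```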

  7. Nested dissection technique for decreasing fill-in in LU factors
[Figure: nested dissection of the grid and the resulting block pattern of the LU factors.]
• Number of non-zeros in … and … is …
• Number of non-zeros in LU is not so dramatic, but still large for big n
• Factorization of the diagonal blocks can be done in parallel
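To make the reordering concrete, here is a small self-contained sketch of nested-dissection ordering for an N × N grid (an illustration of the idea, not the solver's code): the longer grid direction is split recursively, the two halves are numbered first and the separator line last, so fill-in in the LU factors stays confined to the separator blocks. The grid size and the 2-point recursion cut-off are arbitrary assumptions.

```python
def nd_order(rows, cols, N):
    """Return row-major grid indices in nested-dissection elimination order."""
    if len(rows) <= 2 and len(cols) <= 2:            # small block: keep natural order
        return [r * N + c for r in rows for c in cols]
    if len(cols) >= len(rows):                       # split along the longer side
        m = len(cols) // 2
        first = nd_order(rows, cols[:m], N)          # left subdomain
        second = nd_order(rows, cols[m + 1:], N)     # right subdomain
        separator = [r * N + cols[m] for r in rows]  # vertical separator line
    else:
        m = len(rows) // 2
        first = nd_order(rows[:m], cols, N)          # top subdomain
        second = nd_order(rows[m + 1:], cols, N)     # bottom subdomain
        separator = [rows[m] * N + c for c in cols]  # horizontal separator line
    return first + second + separator                # separator is eliminated last

N = 7
perm = nd_order(list(range(N)), list(range(N)), N)
assert sorted(perm) == list(range(N * N))            # it is a valid permutation
print(perm)
```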

  8. Rank properties of LU-factors
[Figure: block structure of the LU factors, with sparse blocks and dense blocks marked.]

  9. Rank properties of LU-factors
[Figure: same block structure, sparse and dense blocks marked.]
• Diagonal blocks can be compressed by the HSS technique
• Off-diagonal blocks can be approximated by low-rank matrices

  10. Low-rank approximation, internal accuracy
[Figure: an m × n dense block approximated by the product of an m × k and a k × n factor (low-rank approximation); the truncation tolerance is the internal accuracy.]
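A compact illustration of what "low-rank approximation with internal accuracy" means, here done with a truncated SVD; the talk itself uses Adaptive Cross Approximation, which builds a similar factorization without computing the full SVD. The matrix sizes, the smooth test kernel and eps are illustrative assumptions.

```python
import numpy as np

def low_rank(A, eps=1e-4):
    """Return U_k, V_k^T with the smallest k such that ||A - U_k V_k^T||_2 <= eps * ||A||_2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.searchsorted(-s, -eps * s[0])) or 1   # first index with s[k] <= eps * s[0]
    return U[:, :k] * s[:k], Vt[:k, :]               # U_k is m x k, V_k^T is k x n

# Off-diagonal blocks of the Helmholtz factors have fast rank decay;
# a smooth kernel evaluated on two separated intervals mimics that behaviour.
x, y = np.linspace(0, 1, 300), np.linspace(2, 3, 200)
A = 1.0 / (np.abs(x[:, None] - y[None, :]) + 1.0)    # 300 x 200 dense block
Uk, Vkt = low_rank(A, eps=1e-6)
print(Uk.shape[1],
      np.linalg.norm(A - Uk @ Vkt, 2) / np.linalg.norm(A, 2))
```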

  11. Structured multifrontal factorization
[Figure: elimination tree with nodes 1–15 on levels k … k+3; nodes below the switching level are factorized with robust LU, nodes at the switching level and above use LU with low-rank approximation. Pic. 1: L pattern in robust arithmetic. Pic. 2: L pattern for multifrontal factorization.]
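A rough sketch of the switching-level rule (my reading of the slide, not the authors' code): fronts at tree levels below the switching level keep robust dense LU, while larger fronts at the switching level and above have their coupling block compressed to low rank, here via truncated SVD driven by the internal accuracy eps. The two-block split, the level numbering and the test matrix are assumptions.

```python
import numpy as np
import scipy.linalg as sla

def factor_front(F, level, switch_level, eps=1e-4):
    """Factorize one frontal matrix F, choosing the arithmetic by tree level."""
    if level < switch_level:                     # below switching level: robust LU
        return ("dense", sla.lu_factor(F))
    m = F.shape[0] // 2                          # pivot block / coupling block split
    piv = sla.lu_factor(F[:m, :m])               # robust LU of the pivot block
    U, s, Vt = np.linalg.svd(F[m:, :m], full_matrices=False)
    k = max(1, int(np.sum(s > eps * s[0])))      # rank set by the internal accuracy
    return ("compressed", piv, (U[:, :k] * s[:k], Vt[:k, :]))

# Usage: a low-level front stays dense, a high-level front gets a compressed coupling.
F = np.random.rand(40, 40) + 40.0 * np.eye(40)
print(factor_front(F, level=1, switch_level=3)[0])   # 'dense'
print(factor_front(F, level=5, switch_level=3)[0])   # 'compressed'
```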

  12. Statement of the acoustic problem
Helmholtz equation:
• 3D computational domain
• Parallelepipedal grid
• Second-order approximation on a 7-point stencil
• Perfectly Matched Layer (PML) boundary condition
Resulting system AX = B:
• Complex symmetric matrix A (non-Hermitian)
• Many right-hand sides B
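A minimal assembly sketch of this discrete operator (a simplified assumption, not the production code): the second-order 7-point stencil for -Δu - (ω/v)² u on a cubic grid with constant velocity, using Dirichlet boundaries in place of the PML layers, which are omitted here for brevity. The default frequency and velocity match the test setup on slide 14; the 1000 m edge length is an interpretation of the domain size given there.

```python
import numpy as np
import scipy.sparse as sp

def helmholtz_7pt(n, freq=12.0, v=1500.0, length=1000.0):
    """7-point second-order discretization of -Laplace(u) - (omega/v)^2 u on a cube."""
    h = length / (n + 1)                           # grid step, Dirichlet boundary
    k2 = (2.0 * np.pi * freq / v) ** 2             # squared wavenumber (omega/v)^2
    d2 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2   # 1D -d^2/dx^2
    I = sp.eye(n)
    lap = (sp.kron(sp.kron(d2, I), I)
           + sp.kron(sp.kron(I, d2), I)
           + sp.kron(sp.kron(I, I), d2))           # 3D Laplacian on the 7-point stencil
    return (lap - k2 * sp.eye(n**3)).tocsr()       # real symmetric here; PML would make it complex

A = helmholtz_7pt(20)                              # 20^3 = 8000 unknowns
print(A.shape, A.nnz)
```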

  13. Results of testing

  14. Testing complex Helmholtz, n × n × n grid
Test description:
• Computational domain: cube 1000 m^3; V = 1500 m/s, frequency = 12 Hz
• Right-hand side: delta function
• OpenMP switched off
• Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00 GHz, 512 GB RAM
Checked characteristics (n = 141):
• Residual b - Ax
• Memory usage
• Performance
Also investigated: performance and memory consumption for n = 100, …, 300

  15. Validation testing results. Correlation between the relative residual and the internal accuracy (iterative refinement switched off).

  16. Memory testing results
[Figure: correlation between memory storage and internal accuracy.]
• HSS: memory gain up to 2.5 times
• PARDISO: number of non-zeros in the L factors nnz(L) = 3.5*10^9

  17. Performance testing results
• HSS: with iterative refinement, accuracy is equal; factorization time up to 7 times faster; total time up to 2.5 times faster
• PARDISO: factorization time 9045 s, solve time 500 s (100 right-hand sides)

  18. Performance and memory consumption
Test description:
• eps = …
• OpenMP switched off
• n is increased from 100 to 300
[Table 2: trend of increasing memory and factorization time on the cube domain.]

  19. Conclusion
• The first version of the solver supports SLAEs with symmetric matrices in both real and complex arithmetic.
• The effectiveness of the algorithm was demonstrated.
• There are prospects for further development.

  20. Road map
Next development steps:
• HSS threaded (OpenMP) version: Q3 2014
• First hybrid (MPI + OpenMP) version of the solver: Q4 2014; comparison with MUMPS and PARDISO
• Adapt the solver to general sparse matrices (METIS optimization)

  21. Acknowledgments
The research described was partially supported by:
• U.S. Civilian Research & Development Foundation (CRDF) grant RUE1-30034-NO-13
• Russian Foundation for Basic Research (RFBR) grants 14-01-31340, 14-05-31222, 14-05-00049

  22. Thank you for your attention!
