FSAIPACK: Parallel Preconditioning for Sparse Linear Systems using Factorized Sparse Approximate Inverses
FSAIPACK is a software package designed for high-performance parallel preconditioning, specifically for solving symmetric positive definite (SPD) linear systems. The package implements the Factorized Sparse Approximate Inverse (FSAI) approach, which builds effective parallel preconditioners with a flexible selection of the sparsity pattern. Coded in FORTRAN90 with OpenMP directives for shared memory architectures, FSAIPACK lets users combine the available techniques into their own construction strategies. This improves the performance of conjugate gradient-like methods on the large, sparse linear systems that arise in complex numerical applications.
Department ICEA. A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems. Massimiliano Ferronato, Carlo Janna, Giuseppe Gambolati, Flavio Sartoretto. Sparse Days 2014, June 5-6
Outline • Introduction: preconditioning techniques for high performance computing • Approximate inverse preconditioning for Symmetric Positive Definite matrices: the FSAI-based approach • FSAIPACK: a software package for high performance FSAI preconditioning • Numerical results • Conclusions and future work
Introduction Preconditioning techniques for high performance computing • Large-scale models are increasingly common in several applications, and the use of parallel computational resources is almost mandatory • One of the most expensive and memory-consuming tasks in any numerical application is the solution of large, sparse linear systems • Conjugate Gradient-like solution methods can be efficiently implemented on parallel computers provided that an effective parallel preconditioner is available • Algebraic preconditioners: robust algorithms that build a preconditioner from the knowledge of the system matrix only, independently of the problem it arises from • Most popular and successful classes of preconditioners: • Incomplete LU factorizations • Approximate inverses • Algebraic multigrid
Introduction Preconditioning techniques for high performance computing • For parallel computations the Factorized Sparse Approximate Inverse (FSAI) approach is quite attractive, as it is «naturally» parallel • FSAIPACK: a parallel software package for high performance FSAI preconditioning in the solution of Symmetric Positive Definite linear systems • Collection of routines that implement several existing methods for computing an FSAI-based preconditioner • Allows for a very flexible user-specified construction of a parallel FSAI preconditioner • General-purpose package that can easily be included as an external library in any existing code • Currently coded in FORTRAN90 with OpenMP directives for shared memory machines • Freely available online at www.dmsa.unipd.it/~janna/software.html
The FSAI-based approach FSAI definition • Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner for SPD problems [Kolotilina & Yeremin, 1993]: $M^{-1} = G^T G$, with G a lower triangular matrix such that $\| I - G L \|_F \to \min$ over the set of matrices with a prescribed lower triangular sparsity pattern S_L, e.g., the pattern of A or A^2, where L is the exact Cholesky factor of A ($A = L L^T$). L is not actually required for computing G! • Computed by solving n independent small dense systems and applied via matrix-vector products • Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix
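As a minimal sketch of why the application step is so parallel, assume G is stored row by row as a list of Python dicts mapping column index to value (a hypothetical layout chosen for illustration, not FSAIPACK's FORTRAN90 data structures). Applying $M^{-1} = G^T G$ to a vector r then needs only two sparse matrix-vector products, and every entry of y = G r is an independent dot product:

```python
def apply_fsai(G, r):
    """Return z = G^T (G r), i.e. M^{-1} r for the FSAI preconditioner M^{-1} = G^T G.

    G: lower triangular factor stored row-wise as a list of {column: value}
    dicts (hypothetical storage layout, for illustration only).
    """
    n = len(G)
    # First product y = G r: each row is an independent sparse dot product.
    y = [sum(v * r[j] for j, v in row.items()) for row in G]
    # Second product z = G^T y: scatter the contribution of each row of G.
    z = [0.0] * n
    for i, row in enumerate(G):
        for j, v in row.items():
            z[j] += v * y[i]
    return z


# G = [[2, 0], [1, 3]], so G^T G = [[5, 3], [3, 9]] and M^{-1} [1, 1] = [8, 12].
print(apply_fsai([{0: 2.0}, {0: 1.0, 1: 3.0}], [1.0, 1.0]))
```

The transpose product is written here as a sequential scatter; a shared-memory code would more likely keep an explicit copy of G^T so that both products are row-parallel.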
The FSAI-based approach FSAI definition • The key factor for the quality of any FSAI-based parallel preconditioner is the selection of the sparsity pattern S_L • Historically, S_L was first defined a priori, but more effective strategies can be developed by selecting the positions of the non-zero entries of S_L dynamically • Static FSAI: S_L is defined a priori, e.g., as the pattern of A^k, possibly after a sparsification of A [Huckle 1999; Chow 2000, 2001] • Dynamic FSAI: S_L is defined dynamically during the computation of G using some optimization algorithm [Huckle 2003; Janna & Ferronato, 2011] • Recurrent FSAI: the FSAI factor G is defined as the product of several factors, computed either statically or dynamically [Wang & Zhang 2003; Bergamaschi & Martinez 2012] • Post-filtration: it is generally recommended to apply an a posteriori sparsification of G, dropping its smallest entries [Kolotilina & Yeremin, 1999]
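The post-filtration step can be sketched as follows. The dropping rule used here (keep an off-diagonal entry only if its magnitude is at least tol times the magnitude of the diagonal entry of its row) is an assumption for illustration; the criterion of Kolotilina & Yeremin is not reproduced. The function name post_filter and the row-wise dict storage are hypothetical:

```python
def post_filter(G, tol):
    """Drop small off-diagonal entries from each row of G (illustrative sketch).

    Assumed criterion: keep g_ij (j != i) only if |g_ij| >= tol * |g_ii|.
    G is stored row-wise as a list of {column: value} dicts; the diagonal
    entry of every row is always kept.
    """
    filtered = []
    for i, row in enumerate(G):
        threshold = tol * abs(row[i])
        filtered.append({j: v for j, v in row.items() if j == i or abs(v) >= threshold})
    return filtered


G = [{0: 1.0}, {0: 0.001, 1: 2.0}, {0: 0.5, 1: -0.005, 2: 1.0}]
print(post_filter(G, 0.01))
```

Filtering after construction keeps the pattern search free to explore, while the applied preconditioner stays cheap; dropping entries of G keeps $G^T G$ positive definite as long as the diagonal is retained.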
FSAIPACK Static FSAI construction • FSAIPACK is a software library that collects several different ways of computing an FSAI preconditioner in a shared memory environment and allows the construction techniques to be combined into original user-specified strategies • Once S_L is given, G can be computed • Static FSAI: denote by P_i the set of column indices belonging to the i-th row of S_L, with m_i = |P_i| (in increasing order, so that i is the last one). Compute the vector $\hat{g}_i$ by solving the m_i × m_i linear system $A[P_i,P_i]\,\hat{g}_i = e_{m_i}$, where $e_{m_i}$ is the m_i-th vector of the canonical basis of $\mathbb{R}^{m_i}$, and scale it to obtain the dense i-th row of G: $g_i = \hat{g}_i / \sqrt{e_{m_i}^T \hat{g}_i}$
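A minimal sketch of the static construction on a small dense SPD matrix, assuming pure Python lists for A and a pattern given as one set of column indices per row. The names static_fsai and cholesky_solve are hypothetical; a real implementation would call an optimized dense solver for each row. Each row is computed independently, which is what makes the construction parallel:

```python
from math import sqrt


def cholesky_solve(M, b):
    """Solve M x = b for a small dense SPD matrix M (list of lists) by Cholesky."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def static_fsai(A, pattern):
    """Rows of G for a prescribed lower triangular pattern (illustrative sketch).

    pattern[i] is the set P_i of column indices j <= i of row i (it must contain i).
    Returns G as a list of {column: value} dicts.
    """
    G = []
    for i, cols in enumerate(pattern):
        P = sorted(cols)  # increasing order, so i is the last index
        if P[-1] != i:
            raise ValueError(f"row {i}: pattern must be lower triangular and contain {i}")
        A_PP = [[A[r][c] for c in P] for r in P]
        e = [0.0] * len(P)
        e[-1] = 1.0  # last vector of the canonical basis
        g_hat = cholesky_solve(A_PP, e)
        scale = 1.0 / sqrt(g_hat[-1])  # e_m^T g_hat is the last entry of g_hat
        G.append({c: v * scale for c, v in zip(P, g_hat)})
    return G


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
G = static_fsai(A, [{0}, {0, 1}, {1, 2}])
```

With the full lower triangular pattern, G is the exact inverse Cholesky factor and $G A G^T = I$; with the diagonal pattern, G reduces to $\mathrm{diag}(A)^{-1/2}$.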
FSAIPACK Static pattern generation • The non-zero pattern for the Static FSAI computation can be generated recursively from the patterns of the powers of A • Static pattern generation: S_L is the lower triangular pattern of a power k of A, or of a sparsified A in which the entries that are small according to the threshold t are dropped • User-specified parameters needed: k (integer), t (real)
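The pattern recurrence can be sketched structurally as below. The dropping rule for the sparsified A (keep $a_{ij}$ only if $|a_{ij}| \ge t \sqrt{a_{ii} a_{jj}}$) is an assumption, since the exact criterion is not given here, and the helper names are hypothetical. Powers are computed on patterns only, so numerical cancellation is ignored, and k ≥ 1 is assumed:

```python
from math import sqrt


def sparsified_pattern(A, t):
    """Full (symmetric) pattern of A after dropping small off-diagonal entries.

    Assumed rule: keep a_ij (i != j) if |a_ij| >= t * sqrt(a_ii * a_jj).
    """
    n = len(A)
    S = []
    for i in range(n):
        row = {i}  # the diagonal is always kept
        for j in range(n):
            if j != i and A[i][j] != 0.0 and abs(A[i][j]) >= t * sqrt(A[i][i] * A[j][j]):
                row.add(j)
        S.append(row)
    return S


def static_pattern(A, k, t):
    """Lower triangular pattern of (sparsified A)^k, one set of columns per row."""
    base = sparsified_pattern(A, t)
    current = [set(row) for row in base]  # pattern of the first power
    for _ in range(k - 1):
        # Structural product: row i of (S_prev * S) is the union of the rows m of S
        # for every column m present in row i of S_prev.
        current = [set().union(*(base[m] for m in row)) for row in current]
    return [{j for j in row if j <= i} for i, row in enumerate(current)]


A = [[2.0, -1.0, 0.0, 0.0], [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
print(static_pattern(A, 2, 0.0))
```

Larger k enlarges S_L (and the cost of each row's dense system), while larger t thins the base pattern before the powers are taken.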
FSAIPACK Dynamic FSAI construction • For ill-conditioned problems, high values of k may be needed to decrease the iteration count adequately, or even to achieve convergence, and the construction and application of the preconditioner can become quite expensive • A more efficient option is to select the pattern dynamically with an adaptive procedure that picks, in some sense, the “best” available positions for the non-zero coefficients • The Kaporin conditioning number β of an SPD matrix A of size n is defined as: $\beta(A) = \frac{\operatorname{tr}(A)/n}{\det(A)^{1/n}}$, where $\beta(A) \ge 1$, and $\beta(A) = 1$ iff $A = \alpha I$ for some $\alpha > 0$
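The definition can be checked numerically on small matrices. This sketch (hypothetical function kaporin_number, dense pure-Python storage) computes det(A) from a Cholesky factorization in logarithmic form, so that the n-th root does not overflow or underflow for larger n:

```python
from math import exp, log, sqrt


def kaporin_number(A):
    """Kaporin number beta(A) = (tr(A)/n) / det(A)^(1/n) of a small dense SPD matrix.

    det(A) = prod(L_ii)^2 from the Cholesky factorization A = L L^T,
    accumulated as a sum of logarithms.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    log_det = 2.0 * sum(log(L[i][i]) for i in range(n))
    trace = sum(A[i][i] for i in range(n))
    return (trace / n) / exp(log_det / n)


print(kaporin_number([[1.0, 0.0], [0.0, 4.0]]))
```

For $A = \mathrm{diag}(1, 4)$ it returns 1.25 up to rounding, since tr(A)/n = 2.5 and det(A)^{1/2} = 2; for any positive multiple of the identity it returns 1.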
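FSAIPACK Dynamic FSAI construction • The Kaporin conditioning number of an FSAI preconditioned matrix, with G scaled so that $GAG^T$ has unit diagonal as in the static construction, reads [Janna & Ferronato 2011; Janna et al. 2014]: $\beta(GAG^T) = \det(A)^{-1/n} \prod_{i=1}^{n} y_i^{1/n}$, where y_i depends only on the non-zero entries in the i-th row of G: writing that row, up to its diagonal scaling, as $e_i^T + \tilde{g}_i^T$, with $\tilde{g}_i$ collecting the entries in columns j < i, $y_i = a_{ii} + 2\,\tilde{g}_i^T a_i + \tilde{g}_i^T A\,\tilde{g}_i$, where $a_i$ is the i-th column of A • The scalar y_i is therefore a quadratic function of $\tilde{g}_i$ • Idea for generating the pattern dynamically: for each row, select the non-zero positions in $\tilde{g}_i$ that provide the largest decrease in y_i • Compute the gradient $\nabla y_i = 2\,(A\tilde{g}_i + a_i)$ with respect to $\tilde{g}_i$ (restricted to the positions j < i) and retain the positions containing the largest entries in absolute value • The procedure can be iterated until either a maximum number of iterations or some exit tolerance is met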
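A sketch of the two quantities used to grow the pattern of row i. It assumes the row is written, up to diagonal scaling, as $e_i^T + \tilde g_i^T$ with $\tilde g_i$ holding the entries in columns j < i, so that $y_i = a_{ii} + 2\tilde g_i^T a_i + \tilde g_i^T A \tilde g_i$ and $\nabla y_i = 2(A\tilde g_i + a_i)$; this representation and the function names are assumptions of the sketch:

```python
def row_objective(A, i, g):
    """Return y_i and its gradient for row i (illustrative sketch).

    g maps column j < i to the off-diagonal entry of the unit-diagonal row:
    y_i = a_ii + 2 g^T a_i + g^T A g,  grad_j = 2 ((A g)_j + a_ji) for j < i.
    """
    Ag = [sum(A[j][m] * v for m, v in g.items()) for j in range(i)]
    y = (A[i][i] + 2.0 * sum(v * A[j][i] for j, v in g.items())
         + sum(v * Ag[j] for j, v in g.items()))
    grad = [2.0 * (Ag[j] + A[j][i]) for j in range(i)]
    return y, grad


def best_positions(grad, pattern, s):
    """The s columns not yet in the pattern with the largest |gradient| entries."""
    candidates = [j for j in range(len(grad)) if j not in pattern]
    return sorted(candidates, key=lambda j: abs(grad[j]), reverse=True)[:s]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
y, grad = row_objective(A, 2, {})        # empty pattern: y = a_22
print(y, grad, best_positions(grad, set(), 1))
```

Starting from an empty pattern, the gradient is simply $2a_i$, so the first positions picked are those of the largest entries of the i-th column of A above the diagonal.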
FSAIPACK Dynamic FSAI construction • Dynamic construction of FSAI by an adaptive, row-by-row pattern generation: • Adaptive FSAI: S_L is built dynamically and G is computed immediately; at each step, the s positions of the i-th row with the largest entries (in absolute value) of the gradient of y_i are added, for at most k_max steps, until the exit tolerance ε is achieved • User-specified parameters needed: k_max (integer), s (integer), ε (real) • The default initial guess is $G_0 = \mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix can be used
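The adaptive loop for a single row can be sketched as follows. Assumptions: dense pure-Python storage, a hypothetical function adaptive_fsai_row, an empty off-diagonal pattern as the starting point (equivalent to the default $G_0 = \mathrm{diag}(A)^{-1/2}$), and an exit test on the decrease of y_i relative to its initial value, which stands in for FSAIPACK's actual criterion. After each enlargement, the entries are obtained by minimizing y_i exactly on the current pattern:

```python
from math import sqrt


def solve_spd(M, b):
    """Solve M x = b for a small dense SPD matrix M by Cholesky factorization."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def adaptive_fsai_row(A, i, kmax, s, eps):
    """Row i of G built adaptively (illustrative sketch).

    g holds the off-diagonal entries (columns j < i) of the unit-diagonal row,
    so g = {} corresponds to the initial guess G0 = diag(A)^(-1/2).
    The exit test on the relative decrease of y_i is an assumption.
    """
    g = {}
    y0 = y = A[i][i]  # y_i for the initial guess
    for _ in range(kmax):
        # gradient of y_i at the candidate positions j < i not yet in the pattern
        grad = {j: 2.0 * (sum(A[j][m] * v for m, v in g.items()) + A[j][i])
                for j in range(i) if j not in g}
        new = sorted(grad, key=lambda j: abs(grad[j]), reverse=True)[:s]
        if not new:
            break
        P = sorted(list(g) + new)
        # minimize y_i on the enlarged pattern: A[P,P] g = -A[P,i]
        vals = solve_spd([[A[r][c] for c in P] for r in P], [-A[r][i] for r in P])
        g = dict(zip(P, vals))
        y_new = A[i][i] + sum(v * A[j][i] for j, v in g.items())  # value at the minimum
        decrease, y = y - y_new, y_new
        if decrease <= eps * y0:
            break
    scale = 1.0 / sqrt(y)  # diagonal scaling that gives (G A G^T)_ii = 1
    row = {j: v * scale for j, v in g.items()}
    row[i] = scale
    return row


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
G = [adaptive_fsai_row(A, i, kmax=10, s=1, eps=1e-3) for i in range(len(A))]
```

Each row is processed independently, so the outer loop over i is the natural place for OpenMP-style parallelism.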
FSAIPACK Dynamic FSAI construction • As y_i is a quadratic function of the entries in the i-th row of G, it can be minimized with a gradient method • This gives rise to an iterative construction of S_L and G, another kind of Dynamic FSAI • Iterative FSAI: the i-th row of G is computed by minimizing y_i with an incomplete Steepest Descent method, retaining the s largest entries per row, for k_iter iterations or until the exit tolerance ε is achieved • User-specified parameters needed: k_iter (integer), s (integer), ε (real) • The default initial guess is $G_0 = \mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix can be used • The use of an inner preconditioner $M^{-1}$ is also allowed
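An incomplete Steepest Descent iteration for one row might look like the sketch below. The search direction is the negative gradient truncated to its s largest entries, the step length minimizes y_i exactly along that direction, and the exit test compares the residual norm with its initial value; the truncation rule, the exit test and the function name are assumptions about details not given in the slide, and no inner preconditioner is used:

```python
from math import sqrt


def iterative_fsai_row(A, i, kiter, s, eps):
    """Row i of G by an incomplete steepest descent on y_i (illustrative sketch).

    y_i(g) = a_ii + 2 g^T a_i + g^T A g over the positions j < i; g = 0 is the
    initial guess G0 = diag(A)^(-1/2).
    """
    g = [0.0] * i
    r0 = None
    for _ in range(kiter):
        # residual r = -(A g + a_i), i.e. minus half the gradient of y_i
        r = [-(sum(A[j][m] * g[m] for m in range(i)) + A[j][i]) for j in range(i)]
        rnorm = sqrt(sum(v * v for v in r))
        if r0 is None:
            r0 = rnorm
        if rnorm == 0.0 or rnorm <= eps * r0:
            break
        # incomplete direction: keep only the s largest |r_j|
        keep = sorted(range(i), key=lambda j: abs(r[j]), reverse=True)[:s]
        d = [0.0] * i
        for j in keep:
            d[j] = r[j]
        Ad = [sum(A[j][m] * d[m] for m in keep) for j in range(i)]
        # exact line search along d: alpha = d^T r / d^T A d
        alpha = sum(d[j] * r[j] for j in keep) / sum(d[j] * Ad[j] for j in keep)
        for j in keep:
            g[j] += alpha * d[j]
    y = (A[i][i] + 2.0 * sum(g[j] * A[j][i] for j in range(i))
         + sum(g[j] * sum(A[j][m] * g[m] for m in range(i)) for j in range(i)))
    row = {j: g[j] / sqrt(y) for j in range(i) if g[j] != 0.0}
    row[i] = 1.0 / sqrt(y)
    return row


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
print(iterative_fsai_row(A, 2, kiter=50, s=2, eps=0.0))
```

Because the direction is a truncation of the residual, $d^T r = d^T d > 0$, so every step still decreases y_i even though the direction is not the full gradient.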
FSAIPACK Recurrent FSAI construction • Implicit construction of the sparsity pattern S_L, writing the FSAI preconditioner as a product of factors • Recurrent FSAI: the final factor G is obtained as the product of n_l factors, $G = G_{n_l} G_{n_l-1} \cdots G_1$, where G_k is the k-level preconditioning factor for $A_{k-1}$, with $A_k = G_k A_{k-1} G_k^T$, $A_0 = A$ and $G_0 = I$ • Although each factor is very sparse and computationally very cheap, the resulting preconditioner is actually quite dense and is never formed explicitly: $M^{-1} = G_1^T \cdots G_{n_l}^T G_{n_l} \cdots G_1$
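Applying a Recurrent FSAI preconditioner only requires products with the individual factors, as in this sketch (hypothetical function name, with each $G_k$ stored row-wise as a list of {column: value} dicts). The dense product G is never assembled:

```python
def apply_recurrent_fsai(factors, r):
    """z = G^T G r with G = G_nl ... G_1, using only the individual factors.

    factors: [G_1, G_2, ..., G_nl], each a sparse lower triangular matrix
    stored row-wise as a list of {column: value} dicts.
    """
    def matvec(Gk, x):
        return [sum(v * x[j] for j, v in row.items()) for row in Gk]

    def matvec_t(Gk, x):
        z = [0.0] * len(Gk)
        for i, row in enumerate(Gk):
            for j, v in row.items():
                z[j] += v * x[i]
        return z

    y = list(r)
    for Gk in factors:            # y = G_nl ... G_1 r
        y = matvec(Gk, y)
    for Gk in reversed(factors):  # z = G_1^T ... G_nl^T y
        y = matvec_t(Gk, y)
    return y


G1 = [{0: 1.0}, {0: 1.0, 1: 1.0}]
G2 = [{0: 2.0}, {1: 1.0}]
print(apply_recurrent_fsai([G1, G2], [1.0, 2.0]))  # G = G2 G1 = [[2, 0], [1, 1]]
```

The cost per application is the sum of the costs of the individual factors, which is why a sequence of very sparse factors can emulate a dense preconditioner cheaply.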
FSAIPACK Numerical results • Analysis of the properties of each single method on a structural test case (size = 190,581, no. of non-zeroes = 7,531,389): • Static FSAI
FSAIPACK Numerical results • Adaptive FSAI
FSAIPACK Numerical results • Iterative FSAI
FSAIPACK Numerical results • Recurrent FSAI
FSAIPACK Numerical results • Comparison between the different methods on a Linux cluster with 24 processors • The most efficient option is to combine the different methods so as to maximize their advantages and minimize their drawbacks • FSAIPACK implements all the methods for building an FSAI-based preconditioner, following a user-specified strategy that can be prescribed with a pseudo-programming language
FSAIPACK Numerical results • Examples and numerical results (Linux cluster, 24 processors) EMILIA (reservoir mechanics): size = 923,136, non-zeroes = 41,005,206. Note: post-filtration is applied in all cases
FSAIPACK Numerical results STOCF (porous media flow): size = 1,465,137, non-zeroes = 21,005,389. Note: post-filtration is applied in all cases
FSAIPACK Numerical results MECH (structural mechanics): size = 1,102,614, non-zeroes = 48,987,558. Note: post-filtration is applied in all cases
FSAIPACK Numerical results • Example of a strategy prescribed using the pseudo-programming language:
> MK_PATTERN [ A : patt ] -t -k 1e-2 2
> STATIC_FSAI [ A, patt : F ]
> TRANSP_FSAI [ F : Ft ]
> PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8
> ADAPT_FSAI [ A : F ] -n -s -e 10 1 1e-3
> POST_FILT [ A : F ] -t 0.01
> TRANSP_FSAI [ F : Ft ]
> APPEND_FSAI [ F, Ft : PREC ]
Even complex strategies are easy to manage
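The command structure visible in the strategy above (a name, inputs and outputs in brackets separated by a colon, then the flags followed by their values in the same order) can be parsed with a few lines of code. This is a hypothetical reading of the syntax for illustration, not FSAIPACK's own parser, and the meaning of each option is not interpreted:

```python
import re

# Hypothetical reading of the strategy syntax (not FSAIPACK's parser):
#   > NAME [ inputs : outputs ] -flag1 -flag2 ... value1 value2 ...
COMMAND = re.compile(r"^>\s*(\w+)\s*\[\s*([^:\]]*?)\s*:\s*([^\]]*?)\s*\]\s*(.*)$")


def _number(text):
    """Convert an option value to int when possible, else to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def _is_flag(token):
    return token.startswith("-") and token[1:2].isalpha()


def parse_command(line):
    """Split one strategy line into command name, inputs, outputs and options."""
    m = COMMAND.match(line.strip())
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    name, ins, outs, rest = m.groups()
    tokens = rest.split()
    flags = [t[1:] for t in tokens if _is_flag(t)]
    values = [_number(t) for t in tokens if not _is_flag(t)]
    if len(flags) != len(values):
        raise ValueError(f"flags and values do not match: {line!r}")
    return {
        "command": name,
        "inputs": [x.strip() for x in ins.split(",") if x.strip()],
        "outputs": [x.strip() for x in outs.split(",") if x.strip()],
        "options": dict(zip(flags, values)),  # flags paired with values in order
    }


strategy = """
> MK_PATTERN [ A : patt ] -t -k 1e-2 2
> STATIC_FSAI [ A, patt : F ]
> TRANSP_FSAI [ F : Ft ]
> PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8
> ADAPT_FSAI [ A : F ] -n -s -e 10 1 1e-3
> POST_FILT [ A : F ] -t 0.01
> TRANSP_FSAI [ F : Ft ]
> APPEND_FSAI [ F, Ft : PREC ]
"""
steps = [parse_command(line) for line in strategy.strip().splitlines()]
```

Keeping inputs and outputs explicit in each command is what lets a strategy reuse and overwrite named objects such as F and Ft across steps.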
FSAIPACK Numerical results • FSAIPACK scalability on the largest example • Test on an IBM Blue Gene/Q node equipped with 16 cores • Between 16 and 64 threads the ideal profile is flat, because all physical cores are already saturated • Using more threads than cores is still convenient, as it helps hide memory access latencies
Conclusions Results… • FSAI-based approaches are attractive preconditioners for the efficient solution of SPD linear systems on parallel computers • The traditional static pattern generation is fast and cheap, but can give rise to poor preconditioners • The dynamic pattern generation can considerably improve the FSAI quality, especially in ill-conditioned problems, but its cost typically grows quite rapidly with the density of the preconditioner • FSAIPACK is a high performance software package for building an FSAI-based preconditioner with a user-specified strategy that combines different methods for selecting the sparsity pattern • A smart combination of static and dynamic pattern generation techniques is probably the most efficient way to build an effective preconditioner, even for very ill-conditioned problems
Conclusions … and future work • Generalizing the results to non-symmetric linear systems: difficulties with the existence and uniqueness of the preconditioner, and with an efficient dynamic pattern generation • Implementing the FSAIPACK library also for distributed memory computers and GPU accelerators, mixing OpenMP, MPI and CUDA • Studying the Iterative FSAI construction in more detail: • Analysis of the theoretical properties of incomplete gradient methods • Replacing the Incomplete Steepest Descent method with an Incomplete Self-Preconditioned Conjugate Gradient method • Understanding why the pattern is generally good, even though the computed coefficients may be inaccurate • FSAIPACK is freely available online at: http://www.dmsa.unipd.it/~janna/software.html
Thank you for your attention