Department ICEA

A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems Massimiliano Ferronato, Carlo Janna, Giuseppe Gambolati, Flavio Sartoretto Sparse Days 2014 June 5-6

Outline • Introduction: preconditioning techniques for high performance computing • Approximate inverse preconditioning for Symmetric Positive Definite matrices: the FSAI-based approach • FSAIPACK: a software package for high performance FSAI preconditioning • Numerical results • Conclusions and future work

Introduction Preconditioning techniques for high performance computing • Large-scale models are increasingly common in many applications, which makes the use of parallel computational resources almost mandatory • One of the most expensive and memory-consuming tasks in any numerical application is the solution of large and sparse linear systems • Conjugate Gradient-like solution methods can be efficiently implemented on parallel computers provided that an effective parallel preconditioner is available • Algebraic preconditioners: robust algorithms that generate a preconditioner from the knowledge of the system matrix only, independently of the problem it arises from • Most popular and successful classes of preconditioners: • Incomplete LU factorizations • Approximate inverses • Algebraic multigrid

Introduction Preconditioning techniques for high performance computing • For parallel computations the Factorized Sparse Approximate Inverse (FSAI) approach is quite attractive, as it is «naturally» parallel • FSAIPACK: a parallel software package for high performance FSAI preconditioning in the solution of Symmetric Positive Definite linear systems • Collection of routines that implement several different existing methods for computing an FSAI-based preconditioner • Allows for a very flexible user-specified construction of a parallel FSAI preconditioner • General purpose package, easy to include as an external library in any existing code • Currently coded in FORTRAN90 with OpenMP directives for shared memory machines • Freely available online at www.dmsa.unipd.it/~janna/software.html

The FSAI-based approach FSAI definition • Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner for SPD problems [Kolotilina & Yeremin, 1993]: $M^{-1} = G^T G$, with $G$ a lower triangular matrix such that $\| I - G L \|_F \to \min$ over the set of matrices with a prescribed lower triangular sparsity pattern $S_L$, e.g. the pattern of $A$ or $A^2$, where $L$ is the exact Cholesky factor of $A$. $L$ is not actually required for computing $G$! • Computed via the solution of n independent small dense systems and applied via matrix-vector products • Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix

The FSAI-based approach FSAI definition • The key property for the quality of any FSAI-based parallel preconditioner is the selection of the sparsity pattern $S_L$ • Historically, the first idea to build $S_L$ was to define it a priori, but more effective strategies can be developed by dynamically selecting the position of the non-zero entries in $S_L$ • Static FSAI: $S_L$ is defined a priori, e.g., as the pattern of $A^k$, possibly after a sparsification of $A$ [Huckle 1999; Chow 2000, 2001] • Dynamic FSAI: $S_L$ is defined dynamically during the computation of $G$ using some optimization algorithm [Huckle 2003; Janna & Ferronato, 2011] • Recurrent FSAI: the FSAI factor $G$ is defined as the product of several factors, computed either statically or dynamically [Wang & Zhang 2003; Bergamaschi & Martinez 2012] • Post-filtration: it is generally recommended to apply an a posteriori sparsification of $G$, dropping the smallest entries [Kolotilina & Yeremin, 1999]

FSAIPACK Static FSAI construction • FSAIPACK is a software library that collects several different ways of computing an FSAI preconditioner in a shared memory environment and allows the construction techniques to be combined into original user-specified strategies • Assuming that $S_L$ is given, it is possible to compute $G$ • Static FSAI: denote by $P_i$ the set of column indices belonging to the i-th row of $S_L$, with $m_i = |P_i|$. Compute the vector $\bar{g}_i$ by solving the $m_i \times m_i$ linear system: $A[P_i, P_i] \, \bar{g}_i = e_{m_i}$, where $e_{m_i}$ is the last vector of the canonical basis, and scale $\bar{g}_i$ to obtain the dense i-th row of $G$: $g_i = \bar{g}_i / \sqrt{\bar{g}_{i,m_i}}$
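The static construction can be sketched in a few lines of Python with NumPy. This is a minimal dense-matrix illustration and not FSAIPACK code (the package itself is written in FORTRAN90): the function name `static_fsai` and the list-of-lists pattern format are assumptions made for the example, and the sketch assumes the standard form of the local system, $A[P_i, P_i]\,\bar{g}_i = e_{m_i}$, followed by the scaling $g_i = \bar{g}_i / \sqrt{\bar{g}_{i,m_i}}$.

```python
import numpy as np

def static_fsai(A, pattern):
    """Dense sketch of static FSAI.

    pattern[i] is the sorted list P_i of column indices allowed in row i
    of G (all <= i, with i itself as the last entry).
    """
    n = A.shape[0]
    G = np.zeros_like(A, dtype=float)
    for i in range(n):                      # rows are independent: a parallel loop
        P = np.asarray(pattern[i])
        rhs = np.zeros(len(P))
        rhs[-1] = 1.0                       # e_{m_i}, the last unit vector
        g_bar = np.linalg.solve(A[np.ix_(P, P)], rhs)
        G[i, P] = g_bar / np.sqrt(g_bar[-1])  # scaling makes diag(G A G^T) = 1
    return G

# Small SPD example with the lower triangular pattern of A itself.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
pattern = [[0], [0, 1], [1, 2]]
G = static_fsai(A, pattern)
```

With the full lower triangular pattern the same routine returns the inverse of the exact Cholesky factor, which is a convenient sanity check on small matrices.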

FSAIPACK Static pattern generation • The non-zero pattern for the Static FSAI computation can be generated with the aid of the following recurrence • Static pattern generation: $S_L$ is the lower triangular pattern of a power k of $A$ or of a sparsified $A$ with: and: • User-specified parameters needed: k (integer), t (real)
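A possible way to generate such a pattern is sketched below in Python with NumPy. The slide's own recurrence is not reproduced here, so the details are assumptions: the function name `make_pattern` is hypothetical, the sparsification keeps $a_{ij}$ when $|a_{ij}| > t \sqrt{|a_{ii} a_{jj}|}$ (a common scaled-threshold rule, not necessarily the one used by FSAIPACK), and the pattern of the k-th power is obtained by repeated boolean products.

```python
import numpy as np

def make_pattern(A, k, t):
    """Lower triangular pattern of (sparsified A)^k, dense sketch."""
    d = np.sqrt(np.abs(np.diag(A)))
    # Assumed dropping rule: keep a_ij when |a_ij| > t * sqrt(|a_ii * a_jj|).
    keep = np.abs(A) > t * np.outer(d, d)
    np.fill_diagonal(keep, True)            # the diagonal is always retained
    B = keep.astype(int)
    P = np.eye(A.shape[0], dtype=int)
    for _ in range(k):                      # symbolic product: pattern of B^k
        P = ((P @ B) > 0).astype(int)
    # Row i of the result lists the column indices j <= i in the pattern.
    return [np.flatnonzero(row[: i + 1]).tolist() for i, row in enumerate(P)]

# 1D Laplacian: tridiagonal with 2 on the diagonal and -1 off the diagonal.
A = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
patt1 = make_pattern(A, 1, 0.0)   # pattern of A
patt2 = make_pattern(A, 2, 0.0)   # pattern of A^2, one extra band
```

Raising k widens the pattern and raising t thins it, which is the trade-off between preconditioner quality and cost that the two parameters control.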

FSAIPACK Dynamic FSAI construction • For ill-conditioned problems high values of k may be needed to properly decrease the iteration count, or even to allow for convergence, and the preconditioner construction and application can become quite heavy • A more efficient option is to select the pattern dynamically with an adaptive procedure that picks, in some sense, the “best” available positions for the non-zero coefficients • The Kaporin conditioning number $\beta$ of an SPD matrix is defined as: $\beta(A) = \frac{\frac{1}{n}\,\mathrm{tr}(A)}{\det(A)^{1/n}}$, where: $\beta(A) \ge 1$ and $\beta(A) = 1$ iff $A = \alpha I$
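For small dense matrices the Kaporin number is easy to evaluate directly, which helps build intuition for what the dynamic strategies try to reduce. The sketch below assumes the usual definition, the arithmetic mean of the eigenvalues divided by their geometric mean, i.e. $(\mathrm{tr}(A)/n) / \det(A)^{1/n}$; the function name `kaporin` is chosen for the example.

```python
import numpy as np

def kaporin(A):
    """Kaporin number of an SPD matrix: (trace / n) / det^(1/n)."""
    n = A.shape[0]
    # slogdet avoids overflow/underflow of the determinant for larger n.
    _, logdet = np.linalg.slogdet(A)
    return (np.trace(A) / n) / np.exp(logdet / n)

beta_identity = kaporin(3.0 * np.eye(5))        # scalar multiple of I
beta_diag = kaporin(np.diag([1.0, 4.0]))        # eigenvalues 1 and 4
```

For the diagonal example the arithmetic mean is 2.5 and the geometric mean is 2, so the value is 1.25; any multiple of the identity gives exactly 1.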

FSAIPACK Dynamic FSAI construction • The Kaporin conditioning number of an FSAI preconditioned matrix reads [Janna & Ferronato 2011; Janna et al. 2014] : where $\psi_i$ depends on the non-zero entries in the i-th row of $G$: • The scalar $\psi_i$ is a quadratic form of $A$ in the i-th row of $G$ • Idea for generating the pattern dynamically: for each row select the non-zero positions providing the largest decrease in the $\psi_i$ value • Compute the gradient of $\psi_i$ with respect to the entries of the i-th row of $G$ and retain the positions containing the largest entries • The procedure can be iterated until either a maximum number of iterations or some exit tolerance is met
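The link between the Kaporin number and the row-wise quantities can be made explicit with a short derivation. It is a sketch under assumptions that are not taken from the slide: $G$ is written as $G = D \tilde{G}$, with $\tilde{G}$ unit lower triangular and $D = \mathrm{diag}(d_1, \dots, d_n)$ a positive scaling, and $\psi_i$ is taken to be $\tilde{g}_i^T A \tilde{g}_i$, where $\tilde{g}_i$ is the i-th row of $\tilde{G}$ written as a column vector.

```latex
% Trace and determinant of the preconditioned matrix
\mathrm{tr}(G A G^T) = \sum_{i=1}^{n} d_i^2 \, \psi_i ,
\qquad
\det(G A G^T) = \det(A) \prod_{i=1}^{n} d_i^2 ,
\qquad
\psi_i = \tilde{g}_i^T A \, \tilde{g}_i .

% Kaporin number as a function of the scaling D
\beta(G A G^T)
  = \frac{\frac{1}{n} \sum_{i} d_i^2 \psi_i}
         {\det(A)^{1/n} \left( \prod_{i} d_i^2 \right)^{1/n}}
  \;\ge\;
  \frac{\left( \prod_{i} \psi_i \right)^{1/n}}{\det(A)^{1/n}} ,

% by the arithmetic-geometric mean inequality applied to d_i^2 \psi_i,
% with equality when d_i^2 \psi_i = 1 for all i, i.e. diag(G A G^T) = I.
```

Under these assumptions $\det(A)$ is fixed, so reducing each $\psi_i$ independently reduces the Kaporin number, which is why the pattern can be selected one row at a time.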

FSAIPACK Dynamic FSAI construction • Dynamic construction of FSAI by an adaptive pattern generation row-by-row: • Adaptive FSAI: $S_L$ is built dynamically and $G$ immediately computed, choosing s entries per step, with a maximum number of $k_{max}$ steps, into the i-th row such that: until the exit tolerance $\varepsilon$ is achieved: • User-specified parameters needed: $k_{max}$ (integer), s (integer), $\varepsilon$ (real) • The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible
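The adaptive procedure for a single row can be sketched as follows in Python with NumPy. This is an illustration of the idea, not the FSAIPACK implementation: the function name `adaptive_row` is hypothetical, the row is kept with a unit diagonal entry during the iteration and scaled only at the end, and the exit test (relative decrease of $\psi_i$ below the tolerance) is an assumed form, since the slide's criterion is not reproduced here.

```python
import numpy as np

def adaptive_row(A, i, kmax, s, eps):
    """Adaptive pattern and coefficients for row i of G (dense sketch)."""
    g = np.zeros(A.shape[0])
    g[i] = 1.0                              # start from the diagonal entry only
    P = [i]
    psi = g @ A @ g                         # psi_i = g^T A g
    for _ in range(kmax):
        grad = 2.0 * (A @ g)                # gradient of psi_i with respect to g
        cand = [j for j in range(i) if j not in P and grad[j] != 0.0]
        if not cand:
            break
        cand.sort(key=lambda j: -abs(grad[j]))
        P = sorted(P + cand[:s])            # add the s most promising positions
        low = P[:-1]                        # free entries; the diagonal stays 1
        g[:] = 0.0
        g[i] = 1.0
        # Minimizer of psi_i over the current pattern.
        g[low] = np.linalg.solve(A[np.ix_(low, low)], -A[low, i])
        psi_new = g @ A @ g
        done = (psi - psi_new) <= eps * psi  # assumed exit test
        psi = psi_new
        if done:
            break
    return P, g / np.sqrt(psi)              # scaled so that (G A G^T)_ii = 1

A = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
P_one, _ = adaptive_row(A, 3, 1, 1, 0.0)    # one step, one new entry
P_all, row = adaptive_row(A, 3, 3, 1, 0.0)  # three steps fill the whole row
```

On this small tridiagonal example the first step picks the position next to the diagonal, and after three steps the row coincides with the corresponding row of the exact inverse Cholesky factor.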

FSAIPACK Dynamic FSAI construction • As $\psi_i$ is a quadratic form of $A$ in the i-th row of $G$, it can be minimized by using a gradient method • This gives rise to an iterative construction of $S_L$ and $G$, another kind of Dynamic FSAI • Iterative FSAI: the i-th row of $G$ is computed by minimizing $\psi_i$ with an incomplete Steepest Descent method: retaining the s largest entries per row for $k_{iter}$ iterations until the exit tolerance $\varepsilon$ is achieved • User-specified parameters needed: $k_{iter}$ (integer), s (integer), $\varepsilon$ (real) • The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible • The use of an inner preconditioner $M^{-1}$ is also allowed
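One way to read "incomplete Steepest Descent" is a steepest descent step with exact line search followed by dropping all but the s largest off-diagonal entries. The Python sketch below follows that reading; it is an assumption about the method's shape rather than the FSAIPACK algorithm, the name `iterative_row` is hypothetical, no inner preconditioner is used, and the exit test on the relative change of $\psi_i$ is also assumed.

```python
import numpy as np

def iterative_row(A, i, kiter, s, eps):
    """Row i of G by incomplete steepest descent on psi_i (dense sketch)."""
    g = np.zeros(A.shape[0])
    g[i] = 1.0                               # unit diagonal, scaled at the end
    psi = g @ A @ g
    for _ in range(kiter):
        r = -(A @ g)                         # descent direction for g^T A g
        r[i:] = 0.0                          # only entries left of the diagonal move
        rAr = r @ A @ r
        if rAr == 0.0:
            break
        g = g + (r @ r) / rAr * r            # exact line search along r
        drop = np.argsort(-np.abs(g[:i]))[s:]
        g[drop] = 0.0                        # keep only the s largest entries
        psi_new = g @ A @ g
        # Dropping may increase psi, hence the absolute value.
        done = abs(psi - psi_new) <= eps * psi
        psi = psi_new
        if done:
            break
    return g / np.sqrt(psi)

A = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
dense_row = iterative_row(A, 3, 200, 3, 0.0)   # no dropping: s covers the row
sparse_row = iterative_row(A, 3, 5, 1, 0.0)    # at most one off-diagonal entry
```

When s is large enough that nothing is dropped, the iteration is plain steepest descent and converges to the exact row; with a small s the pattern and the coefficients are both approximate.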

FSAIPACK Recurrent FSAI construction • Implicit construction of the sparsity pattern $S_L$, writing the FSAI preconditioner as a product of factors: • Recurrent FSAI: the final factor $G$ is obtained as the product of $n_l$ factors: where $G_k$ is the k-level preconditioning factor for: with $A_0 = A$ and $G_0 = I$. Even if each factor is very sparse and computationally very cheap, the resulting preconditioner is actually very dense and never formed explicitly:
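Applying a preconditioner stored as a product of factors only needs a sequence of matrix-vector products. The Python sketch below assumes the ordering $G = G_{n_l} \cdots G_2 G_1$, so that the preconditioner is $G^T G$; both the ordering and the function name `apply_recurrent` are assumptions made for the illustration.

```python
import numpy as np

def apply_recurrent(factors, v):
    """Return G^T G v with G = G_nl ... G_1, without forming G.

    factors is the list [G_1, ..., G_nl] of lower triangular factors.
    """
    w = v
    for Gk in factors:             # w = G_nl ... G_1 v
        w = Gk @ w
    for Gk in reversed(factors):   # w = G_1^T ... G_nl^T w
        w = Gk.T @ w
    return w

# Two small lower triangular factors, chosen only for the demonstration.
G1 = np.tril(np.arange(1.0, 10.0).reshape(3, 3))
G2 = np.eye(3) + np.tril(np.ones((3, 3)), -1)
v = np.array([1.0, -2.0, 0.5])
w = apply_recurrent([G1, G2], v)
```

The cost is two sparse products per level, while the explicit product of the factors would generally be much denser than any single factor.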

FSAIPACK Numerical results • Analysis of the properties of each single method on a structural test case (size = 190,581, no. of non-zeroes: 7,531,389): • Static FSAI

FSAIPACK Numerical results • Adaptive FSAI

FSAIPACK Numerical results • Iterative FSAI

FSAIPACK Numerical results • Recurrent FSAI

FSAIPACK Numerical results • Comparison between the different methods on a Linux Cluster with 24 processors: • The most efficient option is to combine the different methods so as to maximize the pros and minimize the cons • FSAIPACK implements all the methods for building an FSAI-based preconditioner following a user-specified strategy that can be prescribed by a pseudo-programming language

FSAIPACK Numerical results • Examples and numerical results (Linux Cluster, 24 processors) EMILIA (reservoir mechanics): size = 923,136, non-zeroes = 41,005,206. Note: Post-filtration is used anyway

FSAIPACK Numerical results STOCF (porous media flow): size = 1,465,137, non-zeroes = 21,005,389. Note: Post-filtration is used anyway

FSAIPACK Numerical results MECH (structural mechanics): size = 1,102,614, non-zeroes = 48,987,558. Note: Post-filtration is used anyway

FSAIPACK Numerical results • Example of a strategy prescribed using the pseudo-programming language:
> MK_PATTERN [ A : patt ] -t -k 1e-2 2
> STATIC_FSAI [ A, patt : F ]
> TRANSP_FSAI [ F : Ft ]
> PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8
> ADAPT_FSAI [ A : F ] -n -s -e 10 1 1e-3
> POST_FILT [ A : F ] -t 0.01
> TRANSP_FSAI [ F : Ft ]
> APPEND_FSAI [ F, Ft : PREC ]
• Easy management also of complex strategies
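The command lines of the strategy appear to share one shape: a command name, a bracketed list of inputs and outputs separated by a colon, then a group of flags followed by their values in the same order. The Python sketch below parses a line under that reading. It is only an interpretation of the example shown on the slide, not the FSAIPACK parser, and the function name `parse_command` and the returned dictionary layout are hypothetical.

```python
import re

FLAG = r"-[A-Za-z]\w*"   # a flag such as -t or -k (not a negative number)

def parse_command(line):
    """Split '> NAME [ in1, in2 : out ] -f1 -f2 v1 v2' into its parts."""
    m = re.match(r">\s*(\w+)\s*\[(.*?):(.*?)\]\s*(.*)", line.strip())
    name, ins, outs, tail = m.groups()
    tokens = tail.split()
    flags = [t[1:] for t in tokens if re.fullmatch(FLAG, t)]
    values = [t for t in tokens if not re.fullmatch(FLAG, t)]
    return {
        "command": name,
        "inputs": [x.strip() for x in ins.split(",")],
        "outputs": [x.strip() for x in outs.split(",")],
        # Assumed convention: the k-th value belongs to the k-th flag.
        "params": dict(zip(flags, values)),
    }

step1 = parse_command("> MK_PATTERN [ A : patt ] -t -k 1e-2 2")
step2 = parse_command("> STATIC_FSAI [ A, patt : F ]")
```

Under this reading the first line builds a pattern from A with threshold t = 1e-2 and power k = 2, and the second computes the static factor F from A and that pattern.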

FSAIPACK Numerical results • FSAIPACK scalability on the largest example • Test on an IBM Blue Gene/Q node equipped with 16 cores • Between 16 and 64 threads the ideal profile is flat because all physical cores are saturated • Using more threads than cores still pays off because it hides memory access latencies

Conclusions Results… • FSAI-based approaches are attractive preconditioners for an efficient solution of SPD linear systems on parallel computers • The traditional static pattern generation is fast and cheap, but can give rise to poor preconditioners • The dynamic pattern generation can considerably improve the FSAI quality, especially in ill-conditioned problems, but its cost typically increases quite rapidly with the density of the preconditioner • FSAIPACK is a high performance software package for building an FSAI-based preconditioner with a user-specified strategy that combines different methods for selecting the sparsity pattern • A smart combination of static and dynamic pattern generation techniques is probably the most efficient way to build an effective preconditioner even for very ill-conditioned problems

Conclusions … and future work • Extending the results to non-symmetric linear systems: difficulties with existence and uniqueness of the preconditioner, and with an efficient dynamic pattern generation • Implementing the FSAIPACK library also for distributed memory computers and GPU accelerators, mixing OpenMP, MPI and CUDA • Studying the Iterative FSAI construction in more detail: • Analyze the theoretical properties of incomplete gradient methods • Replace the Incomplete Steepest Descent method with an Incomplete Self-Preconditioned Conjugate Gradient method • Understand why the pattern is generally good, even though the computed coefficients could be inaccurate • FSAIPACK is freely available online at: http://www.dmsa.unipd.it/~janna/software.html

Thank you for your attention Sparse Days 2014 June 5-6
