
MLD2P4: a package of parallel algebraic multilevel Preconditioners

Bologna, March 2008. MLD2P4: a package of parallel algebraic multilevel Preconditioners. Pasqua D’Ambra, Institute for High-Performance Computing and Networking (ICAR-CNR), Naples Branch, Italy. Joint work with Daniela di Serafino, Second University of Naples.


Presentation Transcript


  1. Bologna, March 2008
     MLD2P4: a package of parallel algebraic multilevel Preconditioners
     Pasqua D’Ambra, Institute for High-Performance Computing and Networking (ICAR-CNR), Naples Branch, Italy
     Joint work with Daniela di Serafino, Second University of Naples, and Salvatore Filippone, University of Rome “Tor Vergata”

  2. Overview
     • Motivations
     • Background
     • Objectives
     • MLD2P4: Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS
     • Algorithms and computational kernels
     • Software architecture
     • Some Results & Applications
     Pasqua D'Ambra - Bologna March 2008

  3. Background
     Large-scale applications have to solve linear systems Ax = b. The linear system matrix is:
     • Real or complex, and square
     • Large and sparse
     • Distributed among parallel processors
     Matrix dimensions and entries, conditioning, sparsity pattern and coupling among variables vary over the course of a simulation.

  4. Background (cont’d)
     What is the best method/preconditioner?
     • No absolute winner; experimentation is needed
     • Reliable preconditioners require access to the complete matrix
     • Parallel implementation is not trivial
     Interfacing with application software is required
     • Custom-made interfaces to parallel legacy codes
     • Different interfaces for different preconditioners/solvers

  5. Objectives
     Designing and implementing a suite of algebraic preconditioners based on Linear Algebra kernels for parallel sparse matrix computations
     • Flexibility: different preconditioners through a single API
     • Portability & efficiency: standard base software for serial kernels and data communications
     • Simplicity of usage: modern (OO) Fortran 95 features and auxiliary routines for smooth legacy code integration

  6. MLD2P4: Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
     Preconditioners:
     • Diagonal
     • Block-Jacobi
     • Additive Schwarz with arbitrary overlap
     • Algebraic multi-level Schwarz
     mld_prec_build(A,M,…)
     • A: distributed sparse matrix (input)
     • M: distributed sparse preconditioner (output)
     mld_prec_apply(M,x,y,…)
     • M: distributed sparse preconditioner (input)
     • x, y: distributed vectors (input/output)
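The build/apply split on this slide can be illustrated with a small stand-alone sketch. It is written in Python rather than Fortran 95, and the names `prec_build` and `prec_apply` are hypothetical stand-ins that only mirror the two-phase pattern; they are not the MLD2P4 routines, whose full argument lists are elided on the slide. Only the diagonal preconditioner is sketched, on a small dense matrix chosen for illustration.

```python
# Hypothetical Python analogue of a build/apply preconditioner interface.
# None of these names belong to MLD2P4; they only mirror the two-phase pattern.

def prec_build(A, kind="diagonal"):
    """Build phase: inspect the matrix A once and return a preconditioner M."""
    if kind != "diagonal":
        raise ValueError("only the diagonal preconditioner is sketched here")
    # Store the inverse of each diagonal entry of the dense matrix A.
    return {"kind": kind, "inv_diag": [1.0 / A[i][i] for i in range(len(A))]}


def prec_apply(M, x):
    """Apply phase: return y = M^{-1} x, reusing what the build phase stored."""
    return [d * xi for d, xi in zip(M["inv_diag"], x)]


# Small illustrative matrix (an assumption, not taken from the slides).
A = [[4.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 8.0]]

M = prec_build(A)                    # done once per matrix
y = prec_apply(M, [8.0, 4.0, 16.0])  # done at every solver iteration
print(y)
```

The point of the split is that the expensive setup happens once, while the apply step is cheap enough to be called inside every iteration of a Krylov solver.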

  7. PSBLAS (Filippone et al., http://www.ce.uniroma2.it/psblas/): Basic Linear Algebra Operations with Sparse Matrices on MIMD Architectures
     • Application layer: iterative sparse linear solvers (CG, BiCG, CGS, BiCGSTAB, RGMRES, …)
     • Kernel layer: parallel sparse matrix operations (matrix-matrix products, matrix-vector products, …) and parallel sparse matrix management (allocate, build, update, …)
     • Base software: SBLAS (Duff et al.), BLACS (Basic Linear Algebra Communication Subprograms), MPI, F77, F95

  8. MLD2P4 Design: Algorithms
     Algebraic multi-level Schwarz preconditioners
     • based on smoothed aggregation
     • good trade-off between parallelism and convergence
     • optimal scalability for symmetric positive-definite matrices
     • algebraic framework allows general-purpose application

  9. (1-lev) Schwarz: basic ingredients
     [Figure: adjacency graph of A on nodes 1-9, shown with a 0-overlap partition of Ω and a δ-overlap partition of Ω]
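How a 0-overlap partition grows into a δ-overlap partition can be sketched on a 9-node graph. The 3x3 grid connectivity and the split into two subdomains below are assumptions made for illustration; the slide's figure is not preserved, so the actual graph may differ.

```python
# Sketch: growing a 0-overlap partition into a delta-overlap partition by
# following edges of the adjacency graph. The 3x3 grid and the split into
# two subdomains are assumptions chosen for illustration.

def grid_adjacency(nx, ny):
    """Adjacency lists of an nx-by-ny grid graph, nodes numbered 1..nx*ny row by row."""
    adj = {}
    for j in range(ny):
        for i in range(nx):
            node = j * nx + i + 1
            nbrs = []
            if i > 0:
                nbrs.append(node - 1)   # left neighbour
            if i < nx - 1:
                nbrs.append(node + 1)   # right neighbour
            if j > 0:
                nbrs.append(node - nx)  # neighbour in the previous row
            if j < ny - 1:
                nbrs.append(node + nx)  # neighbour in the next row
            adj[node] = nbrs
    return adj


def add_overlap(adj, subdomain, delta):
    """Return the subdomain enlarged by delta layers of graph neighbours."""
    current = set(subdomain)
    for _ in range(delta):
        layer = {nbr for node in current for nbr in adj[node]}
        current |= layer
    return sorted(current)


adj = grid_adjacency(3, 3)
partition0 = [[1, 2, 3, 4], [5, 6, 7, 8, 9]]               # 0-overlap: disjoint sets
partition1 = [add_overlap(adj, s, 1) for s in partition0]  # 1-overlap: one extra layer
print(partition1)
```

Each extra layer adds every node adjacent to the current subdomain, so larger δ means more shared nodes between subdomains and more data exchanged between processes.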

  10. AS: basic ingredients (cont’d)
      [Figure: restriction/prolongation operators and the restriction of A, illustrated on the graph with nodes 1-9]
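The slide's formulas are not preserved in the transcript. For reference, the standard textbook definitions of the one-level Schwarz variants are given below; the notation is an assumption and may differ from the symbols used on the original slide. Here $R_i^\delta$ is the 0/1 matrix that selects the rows of the $i$-th δ-overlap subdomain, and $\tilde{R}_i^0$ is $R_i^\delta$ with the rows of overlap-only nodes set to zero.

```latex
% Restriction of A to the i-th overlapping subdomain
A_i^\delta = R_i^\delta \, A \, (R_i^\delta)^T

% Additive Schwarz (AS)
M_{AS}^{-1} = \sum_{i=1}^{m} (R_i^\delta)^T \, (A_i^\delta)^{-1} \, R_i^\delta

% Restricted Additive Schwarz (RAS): prolongation ignores the overlap
M_{RAS}^{-1} = \sum_{i=1}^{m} (\tilde{R}_i^0)^T \, (A_i^\delta)^{-1} \, R_i^\delta

% Additive Schwarz with Harmonic extension (ASH): restriction ignores the overlap
M_{ASH}^{-1} = \sum_{i=1}^{m} (R_i^\delta)^T \, (A_i^\delta)^{-1} \, \tilde{R}_i^0
```

In practice the local inverses $(A_i^\delta)^{-1}$ are replaced by approximate solves such as the ILU(0) factorization mentioned in the experimental settings.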

  11. Coarse level correction: basic ingredients
      • Algebraic coarsening: uncoupled aggregation
      • Smoothed prolongation/restriction operators
      • Coarse-level matrix
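The three ingredients can be traced on a tiny dense example. This is a minimal sketch of generic smoothed aggregation, not MLD2P4's implementation. The 1D Laplacian, the hand-picked aggregates and the damping value `omega = 2/3` are all assumptions; a real aggregation algorithm would derive the aggregates from the matrix graph.

```python
# Sketch of smoothed aggregation on a small dense matrix (lists of lists).
# The matrix, the aggregates and omega are illustrative assumptions.

def matmul(X, Y):
    """Dense matrix product X * Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def tentative_prolongator(aggregates, n):
    """P_tent[i][j] = 1 if fine node i belongs to aggregate j, else 0."""
    P = [[0.0] * len(aggregates) for _ in range(n)]
    for j, agg in enumerate(aggregates):
        for i in agg:
            P[i][j] = 1.0
    return P


def smooth_prolongator(A, P_tent, omega):
    """P = (I - omega * D^{-1} A) * P_tent, with D the diagonal of A."""
    AP = matmul(A, P_tent)
    return [[P_tent[i][j] - omega * AP[i][j] / A[i][i]
             for j in range(len(P_tent[0]))] for i in range(len(A))]


n = 6
# 1D Laplacian: 2 on the diagonal, -1 on the off-diagonals.
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]

aggregates = [[0, 1], [2, 3], [4, 5]]          # algebraic coarsening (by hand)
P = smooth_prolongator(A, tentative_prolongator(aggregates, n), omega=2.0 / 3.0)
R = transpose(P)                               # restriction = prolongation^T
A_c = matmul(matmul(R, A), P)                  # coarse-level matrix R * A * P
for row in A_c:
    print([round(v, 4) for v in row])
```

Smoothing widens the support of each prolongator column beyond its aggregate, which improves the coarse-space approximation at the cost of a denser coarse matrix.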

  12. Multilevel-Schwarz preconditioners & computational kernels
      Example: 2-lev hybrid-post (build and apply phases)
      P. D’Ambra, D. di Serafino, S. Filippone, On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners, Applied Numerical Mathematics, 57, 2007.
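The slide's build/apply diagram is not preserved. As a rough guide, the apply phase of a two-level hybrid preconditioner with post-smoothing can be sketched as a coarse-level correction followed by a one-level solve on the remaining residual, i.e. y = M_C^{-1} v + M_1L^{-1} (v - A M_C^{-1} v). This assumes the usual meaning of "post-smoothing"; the function names, the 2x2 matrix and the one-dimensional coarse space below are hypothetical.

```python
# Sketch of applying a two-level hybrid preconditioner with post-smoothing.
# All names and the tiny test problem are illustrative assumptions.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def apply_hybrid_post(A, coarse_solve, one_level_solve, v):
    """Return y = M_C^{-1} v + M_1L^{-1} (v - A M_C^{-1} v)."""
    y = coarse_solve(v)                                # coarse-level correction
    r = [vi - ai for vi, ai in zip(v, matvec(A, y))]   # residual left after it
    z = one_level_solve(r)                             # one-level (smoother) step
    return [yi + zi for yi, zi in zip(y, z)]


A = [[2.0, 1.0],
     [1.0, 3.0]]


def coarse_solve(v):
    """Coarse space spanned by (1, 1): A_C = R A R^T = 7 with R = [1, 1]."""
    coarse_value = (v[0] + v[1]) / 7.0
    return [coarse_value, coarse_value]


def jacobi_solve(r):
    """Stand-in for the one-level preconditioner: diagonal scaling."""
    return [r[0] / A[0][0], r[1] / A[1][1]]


def exact_solve(r):
    """Exact inverse of the 2x2 matrix A (determinant 5), used as a sanity check."""
    return [(3.0 * r[0] - r[1]) / 5.0, (-r[0] + 2.0 * r[1]) / 5.0]


v = [5.0, 5.0]
y_jacobi = apply_hybrid_post(A, coarse_solve, jacobi_solve, v)
y_exact = apply_hybrid_post(A, coarse_solve, exact_solve, v)
print(y_jacobi, y_exact)
```

If the one-level solve were exact, the combined operator would reproduce the exact inverse of A, which is what the `exact_solve` variant checks.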

  13. MLD2P4 Design: Software Architecture
      • Application layer: parallel preconditioners (BJA, ASM, RAS, ASH, ml-additive, ml-hybridpre, ml-hybridpost, ml-symmhybrid)
      • Kernel layer: preconditioner build (prolongation, restriction, coarse matrix, local sparse ILU and LU) and preconditioner application (distributed & serial coarse matrix solvers)
      • Base software: PSBLAS 2.0 (extended version of PSBLAS 1.0)

  14. Performance Results & Comparisons
      • Different test matrices from various sources
          • thm matrices: thermal diffusion in solids
          • kivap matrices: automotive engine design
          • shipsec matrices: from UF sparse matrix collection
      • Experiments carried out on different Linux clusters
          • 64 Intel Itanium dual-processor nodes connected by Quadrics QSNetII Elan 4
          • 32 AMD Opteron dual-processor nodes connected by Myrinet
          • 8 AMD Opteron dual-processor nodes connected by InfiniBand
          • 8 Intel Itanium dual-processor nodes connected by Myrinet
          • 16 Intel Pentium IV nodes connected by Fast Ethernet
      • Comparison with up-to-date related work
          • Trilinos-ML
      A. Buttari, P. D’Ambra, D. di Serafino, S. Filippone, 2LEV-D2P4: a package of high-performance preconditioners for scientific and engineering applications, Applicable Algebra in Engineering, Communication and Computing, Vol. 18, 2007.

  15. Experimental Setting
      • Stopping criterion: [formula missing from the transcript] or maxit
      • Unit right-hand side and null starting guess
      • Row-block distribution of matrices: # submatrices = # procs
      MLD2P4: right-preconditioned BiCGSTAB
      • 1-lev Restricted Additive Schwarz preconditioner with ILU(0) (RAS)
      • 2-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec.
          • Distributed coarsest matrix: 4 sweeps of block Jacobi with ILU(0) (2LDI) or with UMFPACK (2LDU) on diagonal blocks
      • 3-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec.
          • Distributed coarsest matrix: 4 sweeps of block Jacobi with ILU(0) (3LDI) or with UMFPACK (3LDU) on diagonal blocks
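"Right-preconditioned" has a standard meaning, written out below for reference. The auxiliary variable $u$ is notation introduced here and is not taken from the slides.

```latex
% Right preconditioning: the Krylov method is applied to the modified system
A M^{-1} u = b, \qquad x = M^{-1} u

% The residual of the modified system equals the residual of the original one
b - (A M^{-1}) u = b - A x
```

A consequence is that residual norms monitored by the solver refer to the original system, so a residual-based test is not distorted by the choice of preconditioner.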

  16. thm matrices: number of iterations
      thm1: n = 600000, nnz = 2996800
      64 Intel Itanium dual-processor nodes connected by QSNetII

  17. thm matrices: execution times and speed-ups (OV=1; best execution times: 3LDU)
      64 Intel Itanium dual-processor nodes connected by QSNetII

  18. Application test case: large eddy simulation of incompressible turbulent flows in a bi-periodical channel
      Main computational kernel: nonsymmetric and singular linear systems arising from elliptic PDE with Neumann b.c.
      A. Aprovitola, P. D’Ambra, F. M. Denaro, D. di Serafino, S. Filippone, Application of Parallel Algebraic Multilevel Domain Decomposition Preconditioners in Large-Eddy Simulations of Wall-bounded Turbulent Flows: First Experiments, RT-ICAR-NA-2007-02, July 2007.

  19. Experimental Setting
      • Reynolds number: 180
      • Computational grid: 140x32x45, non-uniform in the y direction; time step 10^-4
      • Pressure linear system: n = 201600, nnz = 1398600
      MLD2P4: right-preconditioned RGMRES(30)
      • 1-lev Restricted Additive Schwarz preconditioner with ILU(0) (RAS)
      • 2-lev/3-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec.
          • Distributed coarse matrix: 4 sweeps of block Jacobi with ILU(0) (2LDI/3LDI) on diagonal blocks
      • Stopping criterion: [formula missing from the transcript] or maxit
      • General row-block distribution
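The reported sizes of the pressure system can be cross-checked against the grid. The sketch below assumes a 7-point stencil that is periodic in x and z and non-periodic in y; the slides state only that the channel is bi-periodical and the grid is non-uniform in y, so the stencil and the choice of periodic axes are assumptions. Under them the counts match the reported n and nnz, which is consistent with the assumptions but does not confirm them.

```python
# Cross-check of n and nnz for the pressure system, assuming a 7-point
# stencil that is periodic in x and z and non-periodic in y.

def stencil_counts(nx, ny, nz):
    """Return (n, nnz) for the assumed 7-point stencil on an nx*ny*nz grid."""
    n = nx * ny * nz
    # Diagonal entry plus 2 neighbours in each of the two periodic directions.
    nnz = 5 * n
    # In y, interior nodes have 2 neighbours; nodes on the two walls have 1.
    nnz += 2 * n - 2 * nx * nz
    return n, nnz


n, nnz = stencil_counts(140, 32, 45)
print(n, nnz)
```

The average of about 6.94 nonzeros per row reflects the rows on the two walls, which have six entries instead of seven.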

  20. LES of incompressible wall-bounded flow
      SOR on 1 proc. = 9 sec.
      SOR on 1 proc. = 8580 sec.
      16 Intel Itanium dual-processor nodes connected by QSNetII

  21. Work in progress
      • Package available on the web very soon
      • More sophisticated aggregation algorithms
      • Integration of preconditioners and solvers in large-scale applications
