Accelerating generalized Cholesky decomposition using multiple processors

Accelerating generalized Cholesky decomposition using multiple processors. Application in Least-Squares Collocation. Error-covariance estimation. Cholesky Factorization. L: lower triangular matrix. Generalized Cholesky. More Generalized Cholesky. Parallelization.


Presentation Transcript


  1. Accelerating generalized Cholesky decomposition using multiple processors C.C.Tscherning & M.Veicherts, University of Copenhagen, Jan. 2008

  2. Application in Least-Squares Collocation

  3. Error-covariance estimation

  4. Cholesky Factorization • L: lower triangular matrix
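As a minimal sketch of the factorization named on this slide, the following computes the lower triangular matrix L with C = L·Lᵀ for a symmetric positive definite matrix C. The function name `cholesky` and the list-of-lists storage are illustrative assumptions, not taken from the presentation.

```python
import math


def cholesky(c):
    """Return lower triangular L such that C = L * L^T.

    `c` is a symmetric positive definite matrix stored as a list of rows
    (an assumed layout chosen for illustration).
    """
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element: remove the contribution of earlier columns.
        d = c[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Elements below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (c[i][j] - s) / l[j][j]
    return l
```

For example, `cholesky([[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]])` gives the rows `[2, 0, 0]`, `[6, 1, 0]`, `[-8, 5, 3]`.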

  5. Generalized Cholesky

  6. More Generalized Cholesky
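One common way to obtain both a least-squares collocation estimate and its error variance from a single Cholesky factor is to reduce the observation vector and the signal covariance column with the same factor L. The sketch below illustrates that idea; whether it matches the exact formulation on the slides is an assumption, and the names `lsc_predict`, `obs`, `c_p` and `c_pp` are hypothetical.

```python
import math


def cholesky_lower(c):
    """Lower triangular L with C = L * L^T (C symmetric positive definite)."""
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(c[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def forward_solve(l, b):
    """Solve L * y = b by forward substitution."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        s = sum(l[i][k] * y[k] for k in range(i))
        y[i] = (b[i] - s) / l[i][i]
    return y


def lsc_predict(c, obs, c_p, c_pp):
    """Return (estimate, error variance) for one predicted quantity.

    c    : covariance matrix of the observations (noise included)
    obs  : observation vector
    c_p  : covariances between the predicted quantity and the observations
    c_pp : variance of the predicted quantity
    """
    l = cholesky_lower(c)
    y_red = forward_solve(l, obs)    # L^-1 * obs
    cp_red = forward_solve(l, c_p)   # L^-1 * c_p
    # estimate = c_p^T C^-1 obs, error variance = c_pp - c_p^T C^-1 c_p
    estimate = sum(a * b for a, b in zip(cp_red, y_red))
    error_var = c_pp - sum(a * a for a in cp_red)
    return estimate, error_var
```

The reduced vectors make the inverse of C unnecessary: the estimate and the error variance are plain inner products of the reduced columns.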

  7. Parallelization • Once the diagonal element has been computed, each element in the row may be reduced separately. • Hence each processor may take care of one column.
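The independence described on this slide can be sketched with a row-wise factorization C = Uᵀ·U: after the diagonal element of row i is known, every remaining element of that row depends only on earlier rows, so each column can be handed to a separate worker. The function names and the use of a thread pool are assumptions for illustration; in CPython, threads show the structure of the algorithm rather than a real speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def reduce_element(u, c, i, j):
    """Reduce element (i, j) of row i using the already finished rows 0..i-1."""
    s = sum(u[k][i] * u[k][j] for k in range(i))
    return (c[i][j] - s) / u[i][i]


def cholesky_rowwise_parallel(c, workers=4):
    """Return upper triangular U with C = U^T * U, one task per column."""
    n = len(c)
    u = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n):
            # The diagonal element must be computed first (serial step).
            u[i][i] = math.sqrt(c[i][i] - sum(u[k][i] ** 2 for k in range(i)))
            # Each remaining element of row i is an independent task.
            cols = range(i + 1, n)
            row = pool.map(lambda j: reduce_element(u, c, i, j), cols)
            for j, value in zip(cols, row):
                u[i][j] = value
    return u
```

All tasks of a row are collected before the next diagonal element is computed, which is the only synchronization point in this sketch.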

  8. Blockwise factorization • Should one row be factorized at a time? • Or should blocks of elements be factorized? • Out-of-core factorization is needed for large matrices, so let the processors work on blocked matrices.
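A blockwise factorization can be sketched as follows, assuming a standard right-looking scheme with a hypothetical `block` size parameter: factor a diagonal block, reduce the panel below it, then update the remaining matrix. This is an in-memory illustration only; in an out-of-core setting each block would be read from and written back to disk, and the panel and update loops are the parts that could be shared among processors.

```python
import math


def blocked_cholesky(c, block):
    """Return lower triangular L with C = L * L^T, processed block by block."""
    n = len(c)
    a = [row[:] for row in c]  # working copy; its lower triangle becomes L
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # 1. Factor the diagonal block a[k0:k1][k0:k1] element by element.
        for j in range(k0, k1):
            a[j][j] = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j)))
            for i in range(j + 1, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # 2. Reduce the panel below the diagonal block (rows are independent).
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # 3. Update the trailing lower triangle with the finished panel.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))
    # Clear the upper triangle so that only L remains.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

With `block=1` the scheme degenerates to an element-wise factorization; larger blocks trade more work per step for fewer passes over the matrix.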

  9. Block division: column-wise and rectangular • [Figure: a 6×6 matrix with elements c11 to c66, shown under two block divisions] • 3 blocks, 'column-wise', 1-dim., of size 9 • 3 blocks, rectangular, 2-dim., of size 3*3

  10. Blocksize tests • NEQ = 10000, Nproc = 4 • NEQ = 20000, Nproc = 2

  11. Parallelization • Flowchart of the Cholesky factorisation with NES_MP and related subroutine(s)

  12. Parallelization Results • Performance test on two PCs, compiler PGF90

  13. Integration in GEOCOL18 • Geocol integration tests: timing (in s) for equation solving only.

  14. Performance Increase

  15. Conclusion • Generalized Cholesky factorization enables the use of parallelization for solution and error-covariance computation. • The time gained by parallelization depends on the number of processors, the block size and how busy the computer is with other tasks.

  16. Note: further use of multiprocessing • Evaluation of spherical harmonic series (N.Pavlis et al.). • Establishing the normal-equation matrix or computing a column of covariances. • Factorisation may start as soon as a row of blocks has been established. • Gives realistic speeds for LSC applications (minutes instead of days).
