
Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors


Presentation Transcript


  1. Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors Amik Singh ParLab, EECS CS 252 Project Presentation 05/04/2012 May the 4th be with you

  2. Outline • Introduction • Experimental Setup • Challenges • Optimizations • Results & Conclusions • Future Work

  3. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Multigrid Method • A multilevel technique to accelerate the convergence of iterative solvers • A conventional iterative solver operates on a grid at full resolution and requires many iterations to converge • Multigrid iterates towards convergence via a hierarchy of grid resolutions

  4. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Multigrid Method Finer Level Coarser Level • Coarsened grids damp out the low-frequency (long-wavelength) components of the error • Fine grids damp out high-frequency errors

  5. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Multigrid Method The multigrid method operates in what is called a V-cycle, which consists of three main phases • Smooth : relaxation such as Jacobi or Gauss-Seidel; Gauss-Seidel Red-Black (GSRB) is used in our study

  6. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Multigrid Method • Smooth • Restrict : copy information from the finest grid to progressively coarser grids • Interpolate : the reverse of restrict; copy the correction from a coarse grid to a finer grid (a sketch tying the three phases together follows below)
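To tie the three phases together, here is a minimal recursive V-cycle sketch in C. The Level type and the smooth, restrict_residual, interpolate_and_correct, bottom_solve, and set_to_zero helpers are hypothetical stand-ins for the phases named above, not the actual solver's API.

```c
/* Hypothetical per-level grid storage; not the solver's actual types. */
typedef struct { double *u; double *f; int n; } Level;

/* Hypothetical helpers corresponding to the three phases above. */
void smooth(Level *L);                                    /* GSRB relaxation */
void restrict_residual(Level *coarse, const Level *fine); /* fine -> coarse  */
void interpolate_and_correct(Level *fine, const Level *coarse);
void bottom_solve(Level *L);                              /* coarsest grid   */
void set_to_zero(Level *L);

/* One V-cycle: smooth on the way down, solve at the bottom, then
 * interpolate corrections and smooth on the way back up. */
void vcycle(Level *levels, int l, int coarsest) {
  if (l == coarsest) { bottom_solve(&levels[l]); return; }
  smooth(&levels[l]);                        /* damp high-frequency error    */
  restrict_residual(&levels[l + 1], &levels[l]);
  set_to_zero(&levels[l + 1]);               /* coarse correction starts at 0 */
  vcycle(levels, l + 1, coarsest);
  interpolate_and_correct(&levels[l], &levels[l + 1]);
  smooth(&levels[l]);                        /* clean up interpolation error */
}
```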

  7. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Team Samuel Williams, Brian Van Straalen, Ann Almgren, John Shalf, Leonid Oliker (Computational Research Division, Lawrence Berkeley National Laboratory); Dhiraj D. Kalamkar, Anand M. Deshpande, Mikhail Smelyanskiy, Pradeep Dubey (Intel Corporation); Amik Singh

  8. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Different Architectures Used

  9. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Different Architectures Used

  10. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Problem Specification Au = f • Variable-coefficient, finite-volume discretization of the canonical Helmholtz (Laplacian minus identity) operator • The right-hand side for our benchmarking is sin(πx)sin(πy)sin(πz) on the [0,1] cubical domain • Problem size fixed at a 256³ discretization for time-to-solution comparison across architectures (a sketch of the right-hand-side setup follows below)
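As a concrete illustration of the benchmark setup, here is a minimal sketch of initializing that right-hand side on a 256³ grid. Cell-centered sampling and the flat array layout are assumptions of this sketch, not details taken from the study.

```c
#include <math.h>
#include <stddef.h>

#define N 256                         /* 256^3 cells on the [0,1]^3 domain */

/* Fill f with sin(pi x) sin(pi y) sin(pi z) sampled at cell centers. */
void init_rhs(double *f) {
  const double pi = 3.14159265358979323846;
  const double h = 1.0 / N;           /* grid spacing */
  for (int k = 0; k < N; k++)
    for (int j = 0; j < N; j++)
      for (int i = 0; i < N; i++) {
        double x = (i + 0.5) * h, y = (j + 0.5) * h, z = (k + 0.5) * h;
        f[i + (size_t)N * (j + (size_t)N * k)] =
            sin(pi * x) * sin(pi * y) * sin(pi * z);
      }
}
```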

  11. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Smooth Pseudo-code • Read in 7 arrays, write out 1 array • 25 flops per update • Flops/byte = 0.2 << 3.6 machine balance for GPUs, so the kernel is strongly memory-bound (a GSRB sketch follows below)
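Since the pseudo-code figure did not survive the transcript, here is a minimal GSRB sketch consistent with the 7-read / 1-write, roughly 25-flop description above. The array names (phi, rhs, alpha, beta_i/j/k, lambda) and the exact variable-coefficient update are assumptions of this sketch, not the authors' code; lambda is assumed to hold the precomputed inverse of the operator's diagonal.

```c
#include <stddef.h>

#define IX(i, j, k) ((size_t)(i) + n * ((size_t)(j) + n * (size_t)(k)))

/* One red or black sub-sweep of a variable-coefficient GSRB smoother
 * over the interior of an n^3 grid. Reads 7 arrays (phi, rhs, alpha,
 * beta_i, beta_j, beta_k, lambda), writes 1 (phi). */
void smooth_gsrb(size_t n, int color,   /* color: 0 = red, 1 = black */
                 double *phi, const double *rhs, const double *alpha,
                 const double *beta_i, const double *beta_j,
                 const double *beta_k, const double *lambda,
                 double a, double b, double h2inv) {
  for (size_t k = 1; k < n - 1; k++)
    for (size_t j = 1; j < n - 1; j++)
      for (size_t i = 1; i < n - 1; i++) {
        if (((i + j + k) & 1) != (size_t)color) continue;  /* coloring */
        /* Apply the Helmholtz operator A = a*alpha - b*div(beta grad). */
        double Ax = a * alpha[IX(i,j,k)] * phi[IX(i,j,k)]
          - b * h2inv * (
              beta_i[IX(i+1,j,k)] * (phi[IX(i+1,j,k)] - phi[IX(i,j,k)])
            - beta_i[IX(i,  j,k)] * (phi[IX(i,j,k)] - phi[IX(i-1,j,k)])
            + beta_j[IX(i,j+1,k)] * (phi[IX(i,j+1,k)] - phi[IX(i,j,k)])
            - beta_j[IX(i,j,  k)] * (phi[IX(i,j,k)] - phi[IX(i,j-1,k)])
            + beta_k[IX(i,j,k+1)] * (phi[IX(i,j,k+1)] - phi[IX(i,j,k)])
            - beta_k[IX(i,j,k  )] * (phi[IX(i,j,k)] - phi[IX(i,j,k-1)]));
        /* Relax: phi -= D^{-1} (A phi - rhs) */
        phi[IX(i,j,k)] -= lambda[IX(i,j,k)] * (Ax - rhs[IX(i,j,k)]);
      }
}
```

The kernel reads seven arrays and writes one, matching the traffic accounting quoted on the slide.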

  12. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Challenges on GPU • No SIMDization due to red-black updates (see the stride-2 sketch below) • Very small shared memory (48 KB) • Expensive inter-thread-block communication (figure: Red-Black Update Pattern)
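To make the first challenge concrete, here is a sketch of the stride-2 inner loop implied by red-black coloring, assuming color c means cells with (i+j+k) of parity c; stencil() is a hypothetical per-cell update standing in for the real GSRB body.

```c
#include <stddef.h>

/* Hypothetical per-cell update standing in for the real GSRB body. */
double stencil(const double *phi, int n, int i, int j, int k);

/* Within one (j,k) pencil only every other cell carries the current
 * color, so the inner loop strides by 2; adjacent SIMD lanes would
 * straddle both colors, which is why the update resists vectorization. */
void gsrb_pencil(double *phi, int n, int j, int k, int color) {
  int first = 2 - ((j + k + color) & 1);  /* first interior cell of `color` */
  for (int i = first; i < n - 1; i += 2)  /* stride-2 access pattern */
    phi[i + (size_t)n * (j + (size_t)n * k)] = stencil(phi, n, i, j, k);
}
```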

  13. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Baseline Implementation • Only 1 ghost zone per sub-domain • Communicate among the sub-domains after each smoothing operation

  14. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Baseline Implementation • Only 1 ghost zone per sub-domain • Communicate among the sub-domains after each smoothing operation

  15. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work More ghost zones • Each smooth performs 2 red and 2 black updates, i.e. 4 sub-sweeps • With 4 ghost zones we need not communicate after each sub-sweep, only once per smooth. Communication Avoiding! (a sketch follows below)
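A minimal sketch of the communication-avoiding smooth, assuming a constant-coefficient Poisson GSRB as a stand-in for the variable-coefficient operator; the exchange_ghost_zones callback, the NGHOST layout, and all names are illustrative, not the paper's implementation. After the single deep-halo exchange, each sub-sweep gives up one halo layer of redundant work instead of communicating again.

```c
#include <stddef.h>

#define NGHOST 4   /* halo depth = number of sub-sweeps per smooth */

/* Index into an n^3 interior grid stored with NGHOST ghost layers on
 * every side; i, j, k may range over [-NGHOST, n + NGHOST - 1]. */
static size_t ix(int n, int i, int j, int k) {
  size_t s = (size_t)n + 2 * NGHOST;
  return (size_t)(i + NGHOST)
       + s * ((size_t)(j + NGHOST) + s * (size_t)(k + NGHOST));
}

/* One red or black sub-sweep, restricted to the region whose inputs are
 * still valid: `stale` halo layers have expired since the last exchange,
 * so the trusted region shrinks by one layer per sub-sweep. */
static void gsrb_subsweep(double *phi, const double *rhs, int n,
                          int color, int stale, double h2) {
  int lo = -NGHOST + stale + 1, hi = n + NGHOST - stale - 1;
  for (int k = lo; k < hi; k++)
    for (int j = lo; j < hi; j++)
      for (int i = lo; i < hi; i++) {
        if (((i + j + k) & 1) != color) continue;
        phi[ix(n,i,j,k)] = (phi[ix(n,i-1,j,k)] + phi[ix(n,i+1,j,k)]
                          + phi[ix(n,i,j-1,k)] + phi[ix(n,i,j+1,k)]
                          + phi[ix(n,i,j,k-1)] + phi[ix(n,i,j,k+1)]
                          - h2 * rhs[ix(n,i,j,k)]) / 6.0;
      }
}

/* Communication-avoiding smooth: one deep-halo exchange, then NGHOST
 * sub-sweeps. The exchange callback is a hypothetical stand-in for the
 * real inter-sub-domain copy. */
void smooth_comm_avoiding(double *phi, const double *rhs, int n, double h2,
                          void (*exchange_ghost_zones)(double *, int)) {
  exchange_ghost_zones(phi, n);          /* once per smooth, not per sweep */
  for (int sweep = 0; sweep < NGHOST; sweep++)
    gsrb_subsweep(phi, rhs, n, sweep & 1, sweep, h2);
}
```

The trade is explicit here: redundant work in the halo layers in exchange for one communication step per smooth instead of four.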

  16. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Wavefront Approach: sub-sweeps are pipelined plane-by-plane so that each plane receives several updates while it is resident in fast memory (see the sketch below)
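A minimal sketch of the wavefront idea, assuming sub-sweeps are fused along the k dimension; subsweep_plane is a hypothetical helper, and this shows only the sweep scheduling, not the on-chip memory staging the real implementation would need.

```c
#define NSWEEP 4   /* sub-sweeps fused into a single pass over the grid */

/* Hypothetical helper: apply one red (color 0) or black (color 1)
 * sub-sweep to the single k-plane `k` of an n^3 grid. */
void subsweep_plane(double *phi, int n, int k, int color);

/* Wavefront sketch: sweep s trails s planes behind the leading front.
 * Within one `front` step the earlier sweeps run first on the higher
 * planes, so each sweep only touches planes its predecessor has already
 * updated, and a plane receives all NSWEEP updates while it is still in
 * fast memory, instead of NSWEEP full passes over the grid. */
void smooth_wavefront(double *phi, int n) {
  for (int front = 1; front < (n - 1) + NSWEEP; front++)
    for (int s = 0; s < NSWEEP; s++) {
      int k = front - s;                   /* plane touched by sweep s */
      if (k >= 1 && k <= n - 2)
        subsweep_plane(phi, n, k, s & 1);  /* alternate red/black */
    }
}
```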

  17. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work GPU Baseline vs. GPU Optimized

  18. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Different Architectures

  19. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Conclusions CPUs: • Hardware prefetchers decouple memory access through speculative loads • Sufficient on-chip memory for communication-avoiding implementations GPUs: • Parallelism achieved through the multi-threaded paradigm • Limited on-chip memory hampers realization of the communication-avoiding benefits

  20. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Future Work • Build a multi-GPU, MPI-enabled implementation of the solver • Explore the use of communication-avoiding techniques in matrix-free Krylov subspace methods such as BiCGStab for fast bottom solves

  21. Introduction Experimental Setup Challenges Optimizations Results & Conclusions Future Work Future Work • Build a multi-GPU, MPI-enabled implementation of the solver • Explore the use of communication-avoiding techniques in matrix-free Krylov subspace methods such as BiCGStab for fast bottom solves Submit to SC’12 tonight!

  22. Thank You! Questions?
