
Efficient Parallelization for AMR MHD Multiphysics Calculations


Presentation Transcript


  1. Efficient Parallelization for AMR MHD Multiphysics Calculations: Implementation in AstroBEAR

  2.–3. Outline • Motivation – HPC • Better memory management • Sweep updates • Distributed tree • Improved parallelization • Level threading • Dynamic load balancing

  4. Sweep Method for Unsplit Schemes

  5.–10. Sweep Method for Unsplit Stencils • At the end of an update we need new values for the fluid quantities Q at time step n+1 • The new values for Q depend on the EMFs calculated at the cell corners, as well as on the x and y fluxes at the cell faces • The EMFs themselves also depend on the fluxes at the faces adjacent to each corner • Each calculation extends the stencil until we are left with the range of initial values Q at time step n needed for the update of a single cell
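
The dependency chain on these slides can be written out explicitly. Below is a generic 2D unsplit finite-volume update with constrained transport, shown only as a hedged illustration of how corner EMFs and face fluxes enter a single-cell update; it is not necessarily the exact discretization used in AstroBEAR.

```latex
% Cell-centered quantities: both x and y face fluxes enter one cell's update.
Q_{i,j}^{n+1} = Q_{i,j}^{n}
  - \frac{\Delta t}{\Delta x}\left(F_{i+1/2,j} - F_{i-1/2,j}\right)
  - \frac{\Delta t}{\Delta y}\left(G_{i,j+1/2} - G_{i,j-1/2}\right)

% Face-centered magnetic fields: updated from the corner EMFs.
B_{x,\,i+1/2,j}^{n+1} = B_{x,\,i+1/2,j}^{n}
  - \frac{\Delta t}{\Delta y}\left(E_{z,\,i+1/2,j+1/2} - E_{z,\,i+1/2,j-1/2}\right)

B_{y,\,i,j+1/2}^{n+1} = B_{y,\,i,j+1/2}^{n}
  + \frac{\Delta t}{\Delta x}\left(E_{z,\,i+1/2,j+1/2} - E_{z,\,i-1/2,j+1/2}\right)

% One common choice couples each corner EMF to the induction fluxes on the
% four adjacent faces (a simple arithmetic average):
E_{z,\,i+1/2,j+1/2} \approx \tfrac{1}{4}\left(
  -F^{B_y}_{i+1/2,j} - F^{B_y}_{i+1/2,j+1}
  + G^{B_x}_{i,j+1/2} + G^{B_x}_{i+1,j+1/2}\right)
```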

  11.–18. Sweep Method for Unsplit Stencils (figure sequence illustrating the stencil dependencies and the sweep update; no transcript text)
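
The point of sweeping is memory: intermediate quantities (interface states, fluxes, EMFs) only need to be stored for the narrow window of cells currently covered by the stencil, not for the whole grid. A minimal 1D sketch of the idea, assuming a simple upwind advection flux (hypothetical code, not AstroBEAR's implementation):

```python
import numpy as np

def sweep_update(q, dt, dx, flux):
    """Advance q one step, computing each face flux once and discarding it
    as soon as both cells sharing that face have consumed it."""
    n = q.shape[0]
    q_old = q.copy()                      # time-step-n values (read-only during the sweep)
    f_left = flux(q_old[0], q_old[1])     # flux on the left face of cell 1
    for i in range(1, n - 1):
        f_right = flux(q_old[i], q_old[i + 1])      # flux on the right face of cell i
        q[i] = q_old[i] - dt / dx * (f_right - f_left)
        f_left = f_right                  # slide the window: reuse the flux, then forget it
    return q

# Example: linear advection with unit speed and a first-order upwind flux.
upwind = lambda q_left, q_right: q_left
q = np.where(np.arange(64) < 32, 1.0, 0.0)
q = sweep_update(q, dt=0.5, dx=1.0, flux=upwind)
```

In 3D the analogous bookkeeping can be done plane by plane, so the intermediate storage scales with the stencil width times a plane of the grid rather than with the full grid.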

  19. Distributed Tree

  20. Distributed Tree (figure): the level 0 base grid and its level 0 node

  21. Distributed Tree (parent-child): level 1 nodes are children of the level 0 node; level 1 grids are nested within the parent level 0 grid

  22. Distributed Tree (neighbors): conservative methods require synchronization of fluxes between neighboring grids as well as the exchange of ghost zones

  23. Distributed Tree: level 2 nodes are children of level 1 nodes; level 2 grids are nested within their parent level 1 grids

  24. Distributed Tree: since the mesh is adaptive, there are successive generations of grids and nodes

  25. Distributed Tree: new generations of grids need access to data from the previous generation of grids on the same level that physically overlap

  26.–30. Distributed Tree (figure sequence: the current generation of grids overlaid on the previous generation, showing the level 1 and level 2 overlaps)

  31. Distributed Tree (figure: the AMR tree, current and previous generations): while the AMR grids and the associated computations are normally distributed across multiple processors, the tree is often not.

  32. Distributed Tree: each processor only needs to maintain its local sub-tree, with connections to each of its grids' surrounding nodes (parent, children, overlaps, neighbors).
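
A minimal sketch of the per-grid node record implied by this slide, with the four kinds of connections named above (hypothetical Python, not AstroBEAR's actual data structures):

```python
from dataclasses import dataclass, field

# eq=False keeps identity-based equality and hashing, which avoids recursive
# comparisons through the parent/children/neighbor/overlap links.
@dataclass(eq=False)
class Node:
    level: int                # AMR level of the grid this node describes
    bounds: tuple             # physical extent, e.g. ((x0, x1), (y0, y1), (z0, z1))
    rank: int                 # processor that owns the grid's data
    parent: "Node" = None                           # coarser node this grid is nested in
    children: list = field(default_factory=list)    # finer grids nested inside this one
    neighbors: list = field(default_factory=list)   # same-level grids sharing a face (ghost zones, flux sync)
    overlaps: list = field(default_factory=list)    # previous-generation grids on the same level that overlap

# A processor's local sub-tree is just the nodes of its own grids plus the
# parent, child, neighbor, and overlap nodes they link to; the rest of the
# global tree never has to be stored or communicated.
```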

  33. Distributed Tree: for example, processor 1 only needs to know about the following sub-tree

  34. Distributed Tree: and the sub-tree for processor 2

  35. Distributed Tree: and the sub-tree for processor 3

  36. Distributed Tree • While the memory and communication required for each node is small compared to that required for each grid, the memory required for the entire AMR tree can overwhelm that required for the local grids when the number of processors becomes large. • Consider a mesh in which each grid is 8x8x8 and each cell contains 4 fluid variables. Let's also assume that each node requires 7 values describing its physical location and its host processor. The memory required for the AMR tree then becomes comparable to that of the local grids when there are 8x8x8x4/7 ≈ 292 processors. • Even if each processor prunes its local tree by discarding nodes not connected to its local grids, the communication required to globally share each new generation of nodes will eventually prevent the simulation from scaling.
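
Making the slide's estimate explicit, under the stated per-grid and per-node sizes and the assumption of roughly one grid per processor with a fully replicated tree:

```latex
M_{\rm grid} \approx 8^3 \times 4 = 2048 \quad\text{values per processor},
\qquad
M_{\rm tree} \approx 7\,N_{\rm nodes} \approx 7P,
\qquad
M_{\rm tree} \gtrsim M_{\rm grid} \;\Longleftrightarrow\; P \gtrsim \tfrac{2048}{7} \approx 292.
```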

  37. Distributed Tree • When successive generations of level n nodes are created by their level n-1 parent nodes, each processor must update its local sub-tree with the subset of level n nodes that connect to the nodes associated with its local grids. • Instead of globally communicating every new level n node to every other processor, each processor determines which other processors need to know about which new level n nodes. • Since children are always nested within parent grids, nodes that neighbor or overlap will have parents that neighbor or overlap respectively. This can be used to localize tree communication as follows.
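
A sketch of how the nesting property localizes that search, building on the hypothetical Node record above (the function names are illustrative, not AstroBEAR's routines):

```python
def boxes_touch(a, b):
    """True if two bounding boxes overlap or touch (a simplified adjacency test)."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def find_child_connections(parent):
    """Wire up neighbor and overlap links for a parent's newly created children.
    Candidates come only from children of the parent's own neighbors and overlaps,
    so no global search (and no global communication) is needed."""
    for child in parent.children:
        # candidate neighbors: siblings plus children of the parent's neighbors
        candidates = [c for p in [parent] + parent.neighbors
                      for c in p.children if c is not child]
        child.neighbors = [c for c in candidates if boxes_touch(c.bounds, child.bounds)]
        # candidate overlaps: children of the parent's previous-generation overlaps
        previous = [c for p in parent.overlaps for c in p.children]
        child.overlaps = [c for c in previous if boxes_touch(c.bounds, child.bounds)]
```

This is the step illustrated on the sub-tree construction slides that follow: each processor only exchanges its new children with the processors owning its neighbors and overlaps, and then hands each child node to its assigned processor.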

  38. Sub-tree construction (figure: processors 1–3): the tree starts at a single root node that persists from generation to generation

  39.–41. Sub-tree construction: the root node creates the next generation of level 1 nodes; it then identifies overlaps between new and old children, as well as neighbors among the new children

  42. Sub-tree construction: new nodes, along with their connections, are then communicated to their assigned child processors

  43.–46. Sub-tree construction (figures): level 1 nodes then locally create the new generation of level 2 nodes

  47. Sub-tree construction: level 1 grids communicate new children to their overlaps so that new overlap connections on level 2 can be determined

  48. Sub-tree construction: level 1 grids communicate new children to their neighbors so that new neighbor connections on level 2 can be determined

  49. Sub-tree construction: level 1 grids and their local trees are then distributed to the child processors.
