
Dynamic Load Balancing in Scientific Simulation


Presentation Transcript


  1. Dynamic Load Balancing in Scientific Simulation Angen Zheng

  2. Static Load Balancing: No Data Dependency • Distribute the load evenly across processing units (PUs). • No communication among PUs, since there is no data dependency. • The load distribution remains unchanged as the computation proceeds, so the initial balanced distribution stays balanced. • Is this good enough? It depends! [Figure: PU 1, PU 2, and PU 3 each receive an equal share of the initial load, which stays unchanged throughout the computations.]
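
With no data dependency, "distribute the load evenly" amounts to a simple block distribution. A minimal sketch in Python (function and variable names are illustrative):

    def static_partition(n_items, n_pus):
        """Map n_items independent work items onto n_pus processing units
        as evenly as possible (part sizes differ by at most one item)."""
        base, extra = divmod(n_items, n_pus)
        assignment = []
        for pu in range(n_pus):
            count = base + (1 if pu < extra else 0)  # first PUs absorb the remainder
            assignment.extend([pu] * count)
        return assignment

    print(static_partition(10, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]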

  3. Static Load Balancing: Data Dependency • PUs need to communicate with each other to carry out the computation. • Distribute the load evenly across processing units. • Minimize inter-processing-unit communication by collocating the most communicating data on a single PU. • The load distribution still remains unchanged during the computation. [Figure: PU 1, PU 2, and PU 3 start from an initial balanced load distribution that stays unchanged, but now communicate during the computation.]

  4. Load Balancing in Scientific Simulation: Dynamic Load Balancing • PUs need to communicate with each other to carry out the computation, and the load distribution changes as the iterative computation steps proceed. • Distribute the load evenly across processing units. • Minimize inter-processing-unit communication by collocating the most communicating data on a single PU. • Minimize data migration among processing units when repartitioning. [Figure: the initial balanced load distribution becomes imbalanced during the iterative computation steps; repartitioning restores a balanced load distribution.]
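
A toy sketch of this compute/rebalance cycle, with random load drift standing in for the real simulation (the drift model, tolerance, and names are all illustrative):

    import random

    def imbalance(loads):
        return max(loads) / (sum(loads) / len(loads))  # max load over average load

    def rebalance(loads):
        total, k = sum(loads), len(loads)
        return [total // k + (1 if i < total % k else 0) for i in range(k)]

    loads = [100, 100, 100]                                 # initial balanced load
    for step in range(30):                                  # iterative computation steps
        loads = [l + random.randint(0, 10) for l in loads]  # load drifts each step
        if imbalance(loads) > 1.05:                         # tolerance exceeded
            loads = rebalance(loads)                        # redistribute evenly
    print(loads)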

  5. Dynamic Load Balancing: (Hyper)graph Partitioning • Given a (hyper)graph G = (V, E), partition V into k parts P1, P2, …, Pk such that the parts are: • Disjoint: P1 ∪ P2 ∪ … ∪ Pk = V and Pi ∩ Pj = Ø for i ≠ j. • Balanced: |Pi| ≤ (|V| / k) × (1 + ε). • Minimal edge-cut: the number of edges crossing different parts is minimized. [Figure: example partitioning with communication cost Bcomm = 3.]
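
The three conditions can be checked mechanically. A small sketch on a toy graph (data and ε are illustrative):

    def is_valid_partition(parts, vertices, eps):
        union = set().union(*parts)
        disjoint = sum(len(p) for p in parts) == len(union)   # no vertex counted twice
        balanced = all(len(p) <= (len(vertices) / len(parts)) * (1 + eps)
                       for p in parts)
        return union == set(vertices) and disjoint and balanced

    def edge_cut(parts, edges):
        part_of = {v: i for i, p in enumerate(parts) for v in p}
        return sum(1 for u, v in edges if part_of[u] != part_of[v])

    vertices = set(range(6))
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
    parts = [{0, 1, 2}, {3, 4, 5}]
    print(is_valid_partition(parts, vertices, eps=0.1))  # True
    print(edge_cut(parts, edges))                        # 2 edges cross the parts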

  6. Dynamic Load Balancing: (Hyper)graph Repartitioning • Given a partitioned (hyper)graph G = (V, E), repartition V into k parts P1, P2, …, Pk such that the parts are: • Disjoint. • Balanced. • Minimal edge-cut. • Minimal migration: data movement between the old and the new partition is minimized. [Figure: initial partitioning vs. repartitioning, with communication cost Bcomm = 4 and migration cost Bmig = 2.]
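
Both objectives follow directly from the old and new vertex-to-part maps. A minimal sketch with illustrative data:

    def edge_cut(part_of, edges):
        return sum(1 for u, v in edges if part_of[u] != part_of[v])

    def migration_cost(old_part_of, new_part_of, size_of):
        # total size of the vertices that must move to a different part
        return sum(size_of[v] for v in old_part_of
                   if old_part_of[v] != new_part_of[v])

    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    old = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # initial partitioning
    new = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}   # after repartitioning
    size = {v: 1 for v in old}                   # per-vertex data size
    print(edge_cut(new, edges), migration_cost(old, new, size))  # 1 1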

  7. Dynamic Load Balancing: (Hyper)graph Repartition-Based • Reduce dynamic load balancing to a (hyper)graph repartitioning problem. • Building the (hyper)graph: • Vertices represent data. • Vertex object size reflects the amount of data per vertex. • Vertex weight accounts for the computation per vertex. • Edges reflect data dependencies. • Edge weights represent the communication among vertices. • Flow: build the initial (hyper)graph and partition it; after the iterative computation steps, update the (hyper)graph and repartition it. [Figure: load distribution across PU 1, PU 2, and PU 3 before and after repartitioning.]
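
A sketch of this bookkeeping with plain dictionaries; a real code would hand these weights to a (hyper)graph partitioner such as ParMETIS or Zoltan (the field names here are illustrative):

    graph = {
        "vertex_weight": {},   # computation per vertex
        "vertex_size":   {},   # amount of data per vertex (what migration costs)
        "edge_weight":   {},   # (u, v) -> communication volume between vertices
    }

    def add_vertex(g, v, work, nbytes):
        g["vertex_weight"][v] = work
        g["vertex_size"][v] = nbytes

    def add_edge(g, u, v, volume):
        g["edge_weight"][(u, v)] = volume

    add_vertex(graph, "cell_0", work=4.0, nbytes=256)
    add_vertex(graph, "cell_1", work=2.5, nbytes=256)
    add_edge(graph, "cell_0", "cell_1", volume=32)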

  8. (Hyper)graph Repartition-Based Dynamic Load Balancing: Cost Model
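
A common formulation, following the repartitioning hypergraph model of [4]: if the simulation runs α iterations between two rebalancings, the repartitioner seeks a new balanced partition Π that minimizes the combined communication and migration cost rather than either one alone:

    \min_{\Pi} \; \alpha \cdot T_{\mathrm{comm}}(\Pi) \;+\; T_{\mathrm{mig}}(\Pi_{\mathrm{old}} \to \Pi)
    \quad \text{subject to} \quad |P_i| \le \frac{|V|}{k}\,(1 + \varepsilon)

A large α favors cutting communication even at high migration cost; a small α favors staying close to the old partition.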

  9. (Hyper)graph Repartition-Based Dynamic Load Balancing: Network Topology
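
One way to make the communication objective topology-aware, in the spirit of the TreeMatch work in [5], is to weight each cut edge by the network distance between the PUs owning its endpoints. A minimal sketch with an illustrative hop-distance matrix:

    # hops[i][j]: network distance between PU i and PU j (0 = same node)
    hops = [[0, 1, 2],
            [1, 0, 2],
            [2, 2, 0]]

    def topo_comm_cost(part_of, edges, volume):
        return sum(volume[(u, v)] * hops[part_of[u]][part_of[v]]
                   for u, v in edges)

    edges = [(0, 1), (1, 2)]
    volume = {(0, 1): 4, (1, 2): 8}
    part_of = {0: 0, 1: 0, 2: 2}                   # vertex -> owning PU
    print(topo_comm_cost(part_of, edges, volume))  # 4*0 + 8*2 = 16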

  10. (Hyper)graph Repartition-Based Dynamic Load Balancing: Cache Hierarchy [Figure: the rebalancing cycle on PU 1, PU 2, and PU 3: initial (hyper)graph and partitioning, iterative computation steps, updated (hyper)graph, and a single migration after repartitioning.]

  11. Hierarchical Topology-Aware (Hyper)graph Repartition-Based Dynamic Load Balancing • Inter-Node Repartitioning: • Goal: group the most communicating data onto compute nodes close to each other. • Solution: regrouping, repartitioning, and refinement. • Intra-Node Repartitioning: • Goal: group the most communicating data onto cores sharing more levels of cache. • Solution #1: hierarchical repartitioning. • Solution #2: flat repartitioning.

  12. Hierarchical Topology-Aware (Hyper)graph Repartition-Based Dynamic Load Balancing • Inter-Node Repartitioning: regrouping, repartitioning, refinement. [Figure: the regrouping step.]

  13. Hierarchical Topology-Aware (Hyper)graph Repartition-Based Dynamic Load Balancing • Inter-Node (Hyper)graph Repartitioning: regrouping, repartitioning, refinement. [Figure: the repartitioning step. Migration cost: 2 (inter-node) + 2 (intra-node); communication cost: 3 (inter-node).]

  14. Topology-Aware Inter-Node (Hyper)graph Repartitioning • Inter-Node (Hyper)graph Repartitioning: regrouping, repartitioning, refinement. [Figure: the refinement step. Migration cost: 2 (intra-node); communication cost: 3 (inter-node). Refinement eliminates the inter-node migration introduced by the repartitioning step while keeping the same communication cost.]
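
A hedged sketch of a refinement pass consistent with the costs above: relabel the freshly computed parts so that each lands on the node already holding most of its vertices, removing avoidable inter-node migration without touching the cut (greedy matching; data is illustrative):

    def refine_labels(old_part_of, new_part_of, n_parts):
        overlap = [[0] * n_parts for _ in range(n_parts)]
        for v, newp in new_part_of.items():
            overlap[newp][old_part_of[v]] += 1       # shared vertices per pairing
        relabel, taken = {}, set()
        for newp in range(n_parts):                  # greedily keep the biggest overlap
            best = max((c for c in range(n_parts) if c not in taken),
                       key=lambda c: overlap[newp][c])
            relabel[newp] = best
            taken.add(best)
        return {v: relabel[p] for v, p in new_part_of.items()}

    old = {0: 0, 1: 0, 2: 1, 3: 1}
    new = {0: 1, 1: 1, 2: 0, 3: 0}                   # same cut, labels swapped
    print(refine_labels(old, new, 2))                # {0: 0, 1: 0, 2: 1, 3: 1}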

  15. Hierarchical Topology-Aware Intra-Node (Hyper)graph Repartitioning • Main idea: repartition the subgraph assigned to each node hierarchically, following the cache hierarchy. [Figure: an example node with cores 0–5, showing how vertices are regrouped level by level down the cache hierarchy.]
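
A minimal sketch of the recursion, assuming a binary cache tree and using a contiguous split as a stand-in for a real min-cut (hyper)graph bisection:

    def split_in_two(vertices):
        half = len(vertices) // 2
        return vertices[:half], vertices[half:]      # stand-in for a min-cut bisection

    def chunk(seq, k):
        n = len(seq)
        return [seq[i * n // k:(i + 1) * n // k] for i in range(k)]

    def hierarchical_partition(vertices, cache_tree):
        # cache_tree: nested pairs of core ids, e.g. [[0, 1], [2, 3]] for two
        # L2 domains of two cores each (illustrative topology)
        if not isinstance(cache_tree[0], list):      # leaf: one part per core
            return dict(zip(cache_tree, chunk(vertices, len(cache_tree))))
        left, right = split_in_two(vertices)
        parts = hierarchical_partition(left, cache_tree[0])
        parts.update(hierarchical_partition(right, cache_tree[1]))
        return parts

    print(hierarchical_partition(list(range(8)), [[0, 1], [2, 3]]))
    # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}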

  16. Flat Topology-Aware Intra-Node (Hyper)graph Repartitioning

  17. Flat Topology-Aware Intra-Node (Hyper)graph Repartitioning [Figure: the old partition and its assignment to cores.]

  18. Flat Topology-Aware Intra-Node (Hyper)graph Repartitioning [Figure: the old partition alongside the new partition.]

  19. Flat Topology-Aware Intra-Node (Hyper)graph Repartitioning [Figure: the old partition assignment and the new partition, with the resulting partition migration matrix and partition communication matrix.]

  20. Flat Topology-Aware Intra-Node (Hyper)graph Repartitioning [Figure: the new partition with its partition migration matrix and partition communication matrix, used to choose the final partition-to-core assignment.]
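
A sketch of one way to use the partition migration matrix: let M[i][j] be the amount of data that stays in place if new partition j is put on the core holding old partition i; assigning new partitions to cores to maximize the total kept data minimizes migration. (Folding the partition communication matrix and cache distances into the same choice turns this into a quadratic assignment problem, usually handled heuristically; the matrix below is illustrative.)

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    M = np.array([[8, 1, 0],     # M[i][j]: data shared between old part i
                  [2, 6, 1],     # (its current core) and new part j
                  [0, 2, 7]])

    rows, cols = linear_sum_assignment(-M)   # negate: maximize kept data
    for old_part, new_part in zip(rows, cols):
        print(f"new partition {new_part} -> core of old partition {old_part}")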

  21. Major References
  • [1] K. Schloegel, G. Karypis, and V. Kumar, "Graph Partitioning for High Performance Scientific Simulations," Army High Performance Computing Research Center, 2000.
  • [2] B. Hendrickson and T. G. Kolda, "Graph Partitioning Models for Parallel Computing," Parallel Computing, vol. 26, no. 12, pp. 1519–1534, 2000.
  • [3] K. D. Devine, E. G. Boman, R. T. Heaphy, R. H. Bisseling, and U. V. Catalyurek, "Parallel Hypergraph Partitioning for Scientific Computing," in Parallel and Distributed Processing Symposium (IPDPS 2006), IEEE, 2006.
  • [4] U. V. Catalyurek, E. G. Boman, K. D. Devine, D. Bozdag, R. T. Heaphy, and L. A. Riesen, "A Repartitioning Hypergraph Model for Dynamic Load Balancing," Journal of Parallel and Distributed Computing, vol. 69, no. 8, pp. 711–724, 2009.
  • [5] E. Jeannot, E. Meneses, G. Mercier, F. Tessier, G. Zheng, et al., "Communication and Topology-Aware Load Balancing in Charm++ with TreeMatch," in IEEE Cluster 2013.
  • [6] L. L. Pilla, C. P. Ribeiro, D. Cordeiro, A. Bhatele, P. O. Navaux, J.-F. Mehaut, L. V. Kale, et al., "Improving Parallel System Performance with a NUMA-Aware Load Balancer," INRIA-Illinois Joint Laboratory on Petascale Computing, Urbana, IL, Tech. Rep. TR-JLPC-11-02, 2011.

  22. Thanks!
