A Parallelization of State-of-the-Art Graph Bisection Algorithms


Presentation Transcript

  1. A Parallelization of State-of-the-Art Graph Bisection Algorithms Nan Dun, Kenjiro Taura, Akinori Yonezawa Graduate School of Information Science and Technology The University of Tokyo

  2. Problem Description • Graph Partition • Goal: to minimize the edge-cut • K-partition • Bisection (Bipartition) • Problem Complexity • Finding the best partition, and even finding an approximate partition, is NP-hard1)2) • Solutions • Heuristics • Non-deterministic • On the Grid • Graph partitioning problem: given an undirected graph G=(V,E), find a partition (L,R) of V satisfying |L|=|R| that minimizes the number of edges between L and R. [Figure: example graph on six vertices, bisected into L={1,2,3} and R={4,5,6}] SWoPP 2006
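
The objective on this slide can be sketched in a few lines of Python. This is only an illustration: the edge list below is made up (the edges of the slide's figure are not recoverable from the transcript), and the function names `edge_cut` and `is_bisection` are hypothetical.

```python
def edge_cut(edges, left):
    """Count the edges that have exactly one endpoint in `left`."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)


def is_bisection(vertices, left):
    """True if `left` contains exactly half of the vertices (|L| = |R|)."""
    vertices, left = set(vertices), set(left)
    return left <= vertices and 2 * len(left) == len(vertices)


# Hypothetical graph: two triangles {1,2,3} and {4,5,6} joined by two edges.
vertices = range(1, 7)
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4), (2, 5)]

good = {1, 2, 3}   # keeps both triangles intact
bad = {1, 2, 4}    # splits both triangles
print(edge_cut(edges, good), edge_cut(edges, bad))
```

Both candidate sets satisfy |L| = |R|, so the bisection problem reduces to choosing the one with the smaller `edge_cut` value.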

  3. Practical Application • In Mathematics • Analysis of sparse systems of linear equations • In Computer Science • Modeling data placement on distributed memory, to minimize communication • In Various Other Domains • VLSI Design • Transportation Networks • Communication Networks

  4. Bisection Flow • Bisection Initialization • Random Initialization • Half-Half Initialization • Region Growing • Bisection Refinement • Kernighan-Lin3)4) • Tabu Search7) • Fixed Tabu Search • Reactive Tabu Search [Figure: Bisection Initialization → Initial Bisection → Bisection Refinement → Final Bisection]

  5. Min-Max Greedy Growing7) • Min: search for the vertices whose addition causes the minimal edge-cut • Max: break ties by maximizing internal connections [Figure: candidate vertices A, B, C being added to the growing set (addset)]
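
A minimal sketch of the Min and Max rules above, under several assumptions not stated on the slide: the two sets start from random seed vertices, they grow alternately by one vertex at a time, and any remaining ties are broken at random. The function name `min_max_greedy` and the adjacency representation (`dict` of vertex to neighbour set) are hypothetical.

```python
import random


def min_max_greedy(adj, seed=0):
    """Grow two vertex sets alternately from random seed vertices.

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    Returns the two sets; they have equal size when |V| is even.
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    a, b = rng.sample(vertices, 2)
    sides = [{a}, {b}]
    free = set(vertices) - {a, b}
    turn = 0
    while free:
        grow, other = sides[turn], sides[1 - turn]

        def key(v):
            # Min: fewest edges to the other set (new cut edges).
            # Max: most edges into the growing set (internal connections).
            return (len(adj[v] & other), -len(adj[v] & grow))

        best = min(key(v) for v in free)
        candidates = sorted(v for v in free if key(v) == best)
        chosen = rng.choice(candidates)  # assumption: random final tie-break
        grow.add(chosen)
        free.remove(chosen)
        turn = 1 - turn
    return sides[0], sides[1]


# Hypothetical test graph: two triangles joined by one edge.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
left, right = min_max_greedy(adj, seed=1)
print(sorted(left), sorted(right))
```

The result is only an initial bisection; in the flow described in this deck it would then be handed to a refinement heuristic.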

  6. Kernighan-Lin3)4) • 1. Calculate the gain of each vertex • 2. Search for a series of pairs that leads to the maximal edge-cut reduction if swapped • 3. Swap the pairs of vertices obtained in step 2 and lock them from further swaps in the current pass • 4. Iterate steps 1, 2, 3 until the edge-cut converges • Example (swapping a pair of vertices): gain(B) = -1, gain(C) = -2, ΔCut of swapping B, C = gain(B) + gain(C) + 2 = -1 • *gain := # of Internal Edges - # of External Edges [Figure: vertices A, B, C, D before and after swapping B and C]
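
The gain arithmetic on this slide can be checked with a short sketch. The four-vertex graph below is hypothetical: it was chosen so that gain(B) = -1 and gain(C) = -2 as on the slide, but it is not necessarily the graph in the original figure. The "+ 2" term is read here as a correction for the edge between B and C, which stays cut after the swap; the helper names are illustrative.

```python
def gain(adj, side, v):
    """Slide's definition: # of internal edges minus # of external edges of v."""
    internal = sum(1 for u in adj[v] if side[u] == side[v])
    external = len(adj[v]) - internal
    return internal - external


def swap_delta(adj, side, b, c):
    """Change in edge-cut if b and c (on opposite sides) are swapped."""
    assert side[b] != side[c]
    shared = 1 if c in adj[b] else 0  # an edge b-c remains cut after the swap
    return gain(adj, side, b) + gain(adj, side, c) + 2 * shared


def edge_cut(adj, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])


# Hypothetical graph with edges A-B, A-C, B-C, B-D; A, B on side 0, C, D on side 1.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
side = {"A": 0, "B": 0, "C": 1, "D": 1}

before = edge_cut(adj, side)
delta = swap_delta(adj, side, "B", "C")
swapped = dict(side, B=1, C=0)
after = edge_cut(adj, swapped)
print(gain(adj, side, "B"), gain(adj, side, "C"), delta, before, after)
```

Recomputing the cut from scratch after the swap agrees with the predicted ΔCut, which is what lets a Kernighan-Lin pass rank candidate swaps without re-evaluating the whole graph.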

  7. Tabu Search7) • Kernighan-Lin Like • Swaps pairs of vertices according to their gains • Temporarily Forbidden • Previously swapped vertices are temporarily forbidden to move for a period of time (Tabu Length) • Tabu Length: a fraction (Tabu Fraction) of |V| • E.g.: Tabu Fraction = 0.01, |V| = 1000, Tabu Length = 0.01 x |V| = 10 • Previously swapped pairs are allowed to move again after 10 other swaps • To escape local minima
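
The tabu bookkeeping described above might look like the following sketch. The rounding rule in `tabu_length`, the lower bound of 1, and the `TabuList` class itself are assumptions made for illustration; the slide only gives the formula Tabu Length = Tabu Fraction x |V|.

```python
def tabu_length(fraction, num_vertices):
    """Tabu Length = Tabu Fraction x |V| (rounded; at least 1 by assumption)."""
    return max(1, round(fraction * num_vertices))


class TabuList:
    """Tracks which vertices are temporarily forbidden to move."""

    def __init__(self, length):
        self.length = length
        self.clock = 0     # number of swaps performed so far
        self.release = {}  # vertex -> swap count at which it may move again

    def is_tabu(self, v):
        return self.release.get(v, 0) > self.clock

    def record_swap(self, a, b):
        """Register a swap of a and b; both stay tabu for `length` other swaps."""
        self.clock += 1
        self.release[a] = self.release[b] = self.clock + self.length


length = tabu_length(0.01, 1000)
tabu = TabuList(length)
tabu.record_swap(0, 1)
for i in range(length - 1):           # 9 other swaps: vertex 0 is still tabu
    tabu.record_swap(2 + 2 * i, 3 + 2 * i)
still_tabu = tabu.is_tabu(0)
tabu.record_swap(100, 101)            # the 10th other swap releases vertex 0
print(length, still_tabu, tabu.is_tabu(0))
```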

  8. Graph Types – Tabu Lengths • Vertex Degree • Denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths8) • Distribution of Vertex Degree • Graphs with a uniform distribution of vertex degree tend to have a single best-fitting tabu length [Figures: edge-cut vs. tabu fraction for two graphs: |V| = 35000, |E| = 346572, degree max 43, min 3, avg. 19.8; and |V| = 17758, |E| = 54196, degree max 573, min 1, avg. 6.1]

  9. RRTS7) • Synthesis of Heuristics • The heuristics complement each other • Reactive • Tries each tabu length to see which is better • Adaptive to various graphs • Best Quality • Escapes local minima • Long Running Time • Scoring Phase • REACTIVE RANDOMIZED TABU SEARCH: score each tabu length by small runs of TS; do I times: initial bisection by Min-Max; do J times: TS with the high-scored tabu length; refine by Kernighan-Lin runs • Reference: R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.

  10. Multi-level for Large Graphs • Coarsening Phase • Coarsen a large graph into a smaller one using a matching scheme • Multi-level coarsening • Bisection Phase • Bisecting small graphs is usually very fast • Uncoarsening Phase • Map back to the original graph • Perform refinement at each uncoarsening level • METIS5)12) [Figure: Matching Scheme]
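
One coarsening level can be sketched as follows. This uses a plain random maximal matching, which is an assumption: the slide does not say which matching scheme is meant, and METIS offers its own schemes that are not reproduced here. The names `coarsen`, `project`, and `cut_weight` are hypothetical.

```python
import random


def coarsen(adj, seed=0):
    """Collapse a random maximal matching into coarse vertices.

    adj: dict vertex -> dict neighbour -> edge weight (undirected, symmetric).
    Returns (coarse_adj, mapping from fine vertex to coarse vertex id).
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    mapping, next_id = {}, 0
    for v in order:
        if v in mapping:
            continue
        mapping[v] = next_id
        unmatched = [u for u in adj[v] if u not in mapping]
        if unmatched:  # merge v with one unmatched neighbour
            mapping[rng.choice(unmatched)] = next_id
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for v, nbrs in adj.items():
        for u, w in nbrs.items():
            cv, cu = mapping[v], mapping[u]
            if cv != cu:  # edges inside a merged pair disappear
                coarse[cv][cu] = coarse[cv].get(cu, 0) + w
    return coarse, mapping


def project(coarse_side, mapping):
    """Uncoarsening: each fine vertex inherits the side of its coarse vertex."""
    return {v: coarse_side[c] for v, c in mapping.items()}


def cut_weight(adj, side):
    """Total weight of edges crossing the bisection (each edge counted once)."""
    return sum(w for v in adj for u, w in adj[v].items()
               if v < u and side[v] != side[u])


# Hypothetical fine graph: the path 0-1-2-3 with unit weights.
fine = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
coarse, mapping = coarsen(fine, seed=0)
coarse_side = {c: c % 2 for c in coarse}
fine_side = project(coarse_side, mapping)
print(len(coarse), cut_weight(coarse, coarse_side), cut_weight(fine, fine_side))
```

Because merged edge weights are summed, a bisection of the coarse graph has the same cut weight after projection onto the fine graph, which is why refinement can resume from it at each uncoarsening level.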

  11. Comparison of Heuristics

  12. Comparison of Heuristics • METIS • Extremely Fast • Uses the Multi-level Technique • High-quality bisections, but worse than RRTS • Multi-level lacks global optimization during the coarsening phase • RRTS • Very Slow • Scoring phase is time-consuming • “Best-ever” bisections • Adapts to various kinds of graphs • FTS with Known Tabu Length • Much faster than RRTS • Results comparable to RRTS

  13. A Naive Parallelization • Run RRTS independently on each node • Simply equivalent to scaling up the number of iterations • Generate different seeds for different nodes • The heuristics are sensitive to initial conditions • 10% ~ 20% improvement [Figure: Dispatch Graphs → RRTS100 on each node → Synthesize Results]
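
The dispatch-and-synthesize pattern on this slide can be sketched as below. Two loud assumptions: worker threads stand in for Grid nodes, and `random_bisection` is only a placeholder for one RRTS run (it just draws a seeded random balanced split). None of the names come from the authors' implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def edge_cut(edges, left):
    """Count the edges with exactly one endpoint in `left`."""
    return sum((u in left) != (v in left) for u, v in edges)


def random_bisection(vertices, edges, seed):
    """Placeholder for one heuristic run: a seeded random balanced split."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    left = frozenset(order[: len(order) // 2])
    return edge_cut(edges, left), left


def best_of_independent_runs(vertices, edges, seeds, heuristic=random_bisection):
    """Run the heuristic once per seed and keep the smallest edge-cut."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: heuristic(vertices, edges, s), seeds))
    return min(results, key=lambda r: r[0])


# Hypothetical graph: two triangles joined by one edge.
vertices = list(range(1, 7))
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
cut, left = best_of_independent_runs(vertices, edges, seeds=range(20))
print(cut, sorted(left))
```

Giving every worker a distinct seed matters because identical seeds would reproduce the same run; with distinct seeds the scheme is equivalent to running more iterations, as the slide notes.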

  14. Statistical Properties of Cut-size • Incidence of Bests • Average quality is good • Only 0.25% of the results are the best • General Property • The distribution becomes more peaked as |V| grows • The distribution tends towards Gaussian8) • Mean and variance scale linearly with |V| [Figure: histogram of count vs. edge-cut; |V| = 35000, |E| = 346572, degree max 43, min 3, avg. 19.80; RRTS100 on 400 nodes provided by Grid Challenge Federation]

  15. Issues of Parallelizing Heuristics • Hard with the Message-Passing Model (MPI) • J.R. Gilbert and E. Zmijewski9): A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming • Par-METIS (Parallel METIS) • Par-METIS parallelizes only the “coarsen-uncoarsen” part • Hard to Be Efficient (statistical property) • Even if we could parallelize the heuristics efficiently • The fraction of iterations that reach the best bisections is still small • If we cooperatively run independent instances on the Grid • How many nodes are needed to reach the best partition? • When is a good threshold reached?

  16. Contribution of Phases • Initial Phase • Reduces a large portion of the edge-cut • Good initial partitions lead to good final partitions • Running time is consistent across runs; good initial partitions save time for refinement • TS and KL Phase • Reductions tend to be alike • More iterations, better results [Figure: ΔEdge-Cut; Best Edge-Cuts]

  17. Results from Same Initial Bisections • Given the Same Initial Partitions • The best initial partitions lead to the best final partitions • FTS and KL tend to be deterministic • Fewer swaps are available • Diversity of edge-cut can be eliminated by distributing only one phase • Running FTS and KL on one node is enough [Figure: count histogram; FTS and KL performed on the same initial partitions, 50 nodes]

  18. Multi-level Scoring • Mainly Used to Adapt to Large-Scale Graphs • If |V| = 1000, Tabu = 0.01 x 1000 = 10 • If |V| = 100000, Tabu = 0.01 x 100000 = 1000 • Tunes the tabu length to fit specific graphs better • Level-1 scoring distinguishes graphs by their types • Level-2 scoring tests for a better tabu length for the specific graph [Figures: edge-cut vs. Level-1 tabu fraction; edge-cut vs. Level-2 tabu fraction]

  19. Final Approaches • Not to Use Multi-level Partition • To preserve the “best” quality • Not to Parallelize the Heuristics Themselves • Not a good trade-off • To Parallelize the Scoring Phase • One group of nodes scores one tabu length • With the multi-level scoring technique • To Parallelize the Initial Phase Only • Remove diversity of edge-cut as soon as possible • Take advantage of distributed runs to remove diversity of edge-cut • Reduce computing effort as much as possible • Further refinement can be done on a single node • To Use GXP Cluster Shell • “mw” command: mw M {{ W }}

  20. Full Picture • Multi-Level Scoring • Level-1 tabu fractions scored (S): 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07 → High-Scored Level-1 Tabu Fraction • Level-2 tabu fractions scored (S): 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007 → High-Scored Level-2 Tabu Fraction • Initial Phase • Init run on multiple nodes → Best Initial Partitions • Refinement Phase • FTS and KL

  21. Conclusions • Bisection Quality • “Best-ever” partitions • Edge-Cut_OUR ≤ Edge-Cut_RRTS ≤ Edge-Cut_METIS • Bisection Time • Comparable and Reasonable • Time_METIS < Time_OUR << Time_RRTS • 10x speedup compared to RRTS • Adapted to Grid Environment • Scalable performance • Convenient usage • Good fault tolerance

  22. Thank you for your kind attention!
