
PARALLEL GRAPH PARTITIONING ON A HYPERCUBE






Presentation Transcript


  1. PARALLEL GRAPH PARTITIONING ON A HYPERCUBE
DISTRIBUTED GENERATION OF PAIRWISE COMBINATIONS
F. Ercal, P. Sadayappan, and J. Ramanujan
University of Missouri-Rolla and The Ohio State University

  2. PROBLEM DEFINITION
• Given a graph G(V,E), |V| = N, |E| = e
• Obtain K partitions of G subject to the following constraints:
• Balanced: each partition has equal size
• Minimum cut: the number of edges crossing between partitions is minimized
• Arises in: task allocation, VLSI layout, file placement, etc.
• Intractable: no polynomial-time algorithm is known
• Heuristics needed
• Kernighan-Lin mincut heuristic (1970)
• Time complexity: O(N^2 log N)
• Extension by Fiduccia and Mattheyses (1982)
• Uses buckets and moves; linear-time algorithm: O(e)
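A minimal Python sketch of the two quantities these heuristics manipulate: the cut size of a bisection and the gain of moving one vertex. The graph, function names, and representation below are illustrative assumptions, not the authors' code.

```python
# Illustrative helpers (assumed representation: adjacency lists; side[v] in {0, 1}).

def cut_size(adj, side):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def move_gain(adj, side, v):
    """Reduction in cut size if v moves to the other partition:
    (edges of v crossing the cut) - (edges of v inside its own partition)."""
    ext = sum(1 for u in adj[v] if side[u] != side[v])
    return ext - (len(adj[v]) - ext)

# Tiny hypothetical example (not the graph from the slides):
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
side = {1: 0, 2: 0, 3: 1, 4: 1}
print(cut_size(adj, side))      # 2
print(move_gain(adj, side, 3))  # moving v3: 1 external - 1 internal = 0
```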

  3. MINCUT ALGORITHM
[Figure: an example graph v1–v8 bisected into partitions P1 and P2, shown before and after a move, with the move gain annotated next to each vertex]
CUT = 5. If v2 moves, GAIN = 2 and TOT_GAIN = 2.
CUT = 3. If v5 moves, GAIN = 1 and TOT_GAIN = 3.

  4. MINCUT ALGORITHM (Contd.)
[Figure: the same example after the moves of v2 and v5, with updated gains]
CUT = 2. If v1 moves, GAIN = 0 and TOT_GAIN = 3.
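The CUT / GAIN / TOT_GAIN bookkeeping in this example is the core of a Kernighan-Lin / FM-style pass: tentatively move the best-gain unlocked vertex, accumulate the total gain, and finally keep only the prefix of moves with the highest TOT_GAIN. The sketch below is a plain illustration of that pass (balance handling omitted, simple gain recomputation instead of the bucket structure), not the linear-time Fiduccia-Mattheyses implementation mentioned earlier.

```python
def mincut_pass(adj, side):
    """One illustrative mincut pass over a bisection: tentatively move every vertex once
    (best gain first), track the running TOT_GAIN, and keep only the prefix of moves
    with the highest total gain. side[v] is 0 or 1; balance is ignored for brevity."""
    side = dict(side)                              # work on a copy

    def gain(v):                                   # external minus internal edges of v
        ext = sum(1 for u in adj[v] if side[u] != side[v])
        return ext - (len(adj[v]) - ext)

    locked, moves = set(), []
    tot_gain = best_gain = best_len = 0
    for _ in range(len(adj)):
        v = max((u for u in adj if u not in locked), key=gain)
        tot_gain += gain(v)
        side[v] ^= 1                               # tentative move to the other partition
        locked.add(v)
        moves.append(v)
        if tot_gain > best_gain:                   # remember the best prefix of moves
            best_gain, best_len = tot_gain, len(moves)
    for v in moves[best_len:]:                     # undo moves beyond the best prefix
        side[v] ^= 1
    return side, best_gain
```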

  5. RECURSIVE BISECTION

  6. TIME COMPLEXITY
Sequential Time Complexity for Recursive Bisection:
N + 2*(N/2) + 4*(N/4) + ... + 2^p*(N/2^p) ===> O(N*logK), where K = 2^p
Parallel Time Complexity for Recursive Bisection:
N + N/2 + N/4 + ... + N/2^p ===> O(N)
• COMMENT:
• speedup is very limited
• to increase speedup, the bisection algorithm itself must be parallelized
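As a quick numeric check of the two series (example values, not from the slides), summing the costs directly shows the O(N*logK) versus O(N) behaviour:

```python
# Each sequential term 2^i * (N / 2^i) equals N, so the sum grows with the number of
# bisection levels (log K); the parallel sum is a geometric series bounded by 2N.
N, p = 1024, 4                                              # K = 2**p = 16 partitions
sequential = sum(2**i * (N // 2**i) for i in range(p + 1))  # N + 2*(N/2) + ... + 2^p*(N/2^p)
parallel   = sum(N // 2**i for i in range(p + 1))           # N + N/2 + ... + N/2^p
print(sequential, parallel)                                 # 5120 1984
```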

  7. PAIRWISE MINCUT
[Figure: eight partitions P1–P8]
PAIRS TO BE CONSIDERED FOR MINCUT:
(1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)
(2,3) (2,4) ... (2,8)
...
(7,8)

  8. TIME COMPLEXITY
Sequential Time Complexity for Pairwise Mincut:
K(K-1)/2 pairs, each of size 2N/K ===> O(N*K)
Parallel Time Complexity for Pairwise Mincut (100% processor utilization):
(K-1) phases, K/2 concurrent pair-mincuts per phase, O(2N/K) work each ===> O(N)
• CONCLUSIONS
• Sequential Recursive Bisection (RB) has much lower time complexity than Pairwise Mincut (PM)
• but the superior parallelizability of PM renders its parallel time complexity comparable to that of parallel RB

  9. 1) RECURSIVE BISECTION
• Perform repeated bisection, each time doubling the number of partitions, until K partitions are obtained
Time Complexity: N + 2*(N/2) + 4*(N/4) + ... + 2^p*(N/2^p) ==> O(N*logK)
2) PAIRWISE MINCUT
• Initially obtain K partitions. Try to reduce the cut-size between each pair of partitions. K(K-1)/2 pairs (each of size 2N/K) must be considered
Time Complexity: K(K-1)/2 * O(2N/K) ==> O(N*K)
3) Any combination of RECURSIVE BISECTION + PAIRWISE MINCUT
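A hedged sketch of option 1, recursive bisection, written as a driver around an arbitrary balanced bisection heuristic; the function names and the trivial placeholder bisector are assumptions for illustration, not the authors' implementation.

```python
def recursive_bisection(vertices, K, bisect):
    """Split `vertices` into K balanced parts (K a power of two) by repeated bisection.
    `bisect(vertices)` is any balanced mincut bisection heuristic (e.g. a KL/FM pass)
    returning two equal-sized halves."""
    if K == 1:
        return [vertices]
    left, right = bisect(vertices)
    return recursive_bisection(left, K // 2, bisect) + \
           recursive_bisection(right, K // 2, bisect)

# Usage with a trivial placeholder bisector (splits in the middle, no cut minimization):
parts = recursive_bisection(list(range(16)), 4,
                            lambda vs: (vs[:len(vs) // 2], vs[len(vs) // 2:]))
print(parts)   # 4 parts of 4 vertices each
```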

  10. DISTRIBUTED GENERATION OF PAIRWISE COMBINATIONS ON A HYPERCUBE
Problem
• Given 2P disjoint items, P*(2P-1) distinct pairs can be formed.
• How would you efficiently generate these pairs on the processors of a hypercube?
• Similar to the problem of distributed scheduling of a round-robin tournament between 2C players using C courts, where the paths between courts form a hypercube topology
• Goals: maximum utilization of courts (processor utilization) + minimum walking between courts (min. comm. overhead)
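The tournament analogy can be made concrete with the classical circle method for round-robin scheduling, a sketch of the analogy only and not necessarily the schedule produced by the hypercube algorithm: 2P players are split into P disjoint pairs per round, and 2P-1 rounds cover all P*(2P-1) pairs while every court stays busy in every round.

```python
def round_robin(players):
    """Circle-method round-robin schedule: for 2P players, return 2P-1 rounds of
    P disjoint pairs; every player plays exactly once per round."""
    n = len(players)                              # assumed even
    fixed, ring = players[0], list(players[1:])
    rounds = []
    for _ in range(n - 1):
        line = [fixed] + ring
        rounds.append([(line[i], line[n - 1 - i]) for i in range(n // 2)])
        ring = ring[-1:] + ring[:-1]              # rotate everyone except the fixed player
    return rounds

schedule = round_robin(list(range(8)))            # 8 players -> 7 rounds of 4 pairs
print(len(schedule), sum(len(r) for r in schedule))   # 7 28 (= P*(2P-1) with P = 4)
```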

  11. Distributed PC Algorithm on a 2-D Hypercube (4 Processors)
[Figure: items A00, A01, A10, A11 and B00, B01, B10, B11 distributed over processors P00, P01, P10, P11, paired on courts C1 and C2 through phases d = 0, 1, 2]

  12. CYCLIC-TOUR and RING-FRAGMENTATION
[Figure: step 1 — a CYCLIC-TOUR between rings A1..AK and B1..BK, followed by RING-FRAGMENTATION; step 2 — CYCLIC-TOURs within the fragments (A1..AK/4 with AK/4+1..AK/2, AK/2+1..A3K/4 with A3K/4+1..AK, and likewise for B), followed by further RING-FRAGMENTATION]
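A minimal sketch of the cyclic-tour plus ring-fragmentation idea as a pair schedule, based on my reading of the figure rather than the authors' code: a cyclic tour between the two rings generates all cross pairs, and ring fragmentation then lets the two half-size subproblems generate their internal pairs concurrently.

```python
def pc_schedule(items):
    """Sketch of a cyclic-tour / ring-fragmentation schedule: for 2K items (2K a power
    of two), return 2K-1 rounds of K disjoint pairs covering all K*(2K-1) pairs."""
    half = len(items) // 2
    if half == 0:
        return []
    a, b = items[:half], items[half:]
    # Cyclic tour: rotate ring B past ring A; `half` rounds generate all A-B cross pairs.
    rounds = [[(a[i], b[(i + t) % half]) for i in range(half)] for t in range(half)]
    # Ring fragmentation: A-internal and B-internal pairs are generated concurrently.
    for round_a, round_b in zip(pc_schedule(a), pc_schedule(b)):
        rounds.append(round_a + round_b)
    return rounds

sched = pc_schedule(list(range(16)))              # 16 items -> 15 rounds of 8 pairs
print(len(sched), sum(len(r) for r in sched))     # 15 120 (= 16*15/2 pairs)
```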

  13. Ring Communication in different phases of the Distributed PC algorithm
[Figure: a 16-node (4-dimensional) hypercube with nodes labeled by 4-bit addresses]
(a) d = 0: 1 ring of size 16
(b) d = 1: 2 rings of size 8

  14. Ring Communication in different phases of the Distributed PC algorithm (Contd.)
(c) d = 2: 4 rings of size 4
(d) d = 3: 8 rings of size 2

  15. Ring Communication in different phases of the Distributed PC algorithm (Contd.)
(e) d = 4: 16 rings of size 1
