
Multithreaded Clustering for Multi-level Hypergraph Partitioning

Ümit V. Çatalyürek (1,2), Mehmet Deveci (1,3), Kamer Kaya (1), Bora Uçar (4). (1) Dept. of Biomedical Informatics, The Ohio State University; (2) Dept. of Electrical & Computer Engineering, The Ohio State University




Presentation Transcript


  1. Multithreaded Clustering for Multi-level Hypergraph Partitioning
  Ümit V. Çatalyürek (1,2), Mehmet Deveci (1,3), Kamer Kaya (1), Bora Uçar (4)
  (1) Dept. of Biomedical Informatics, The Ohio State University
  (2) Dept. of Electrical & Computer Engineering, The Ohio State University
  (3) Dept. of Computer Science & Engineering, The Ohio State University
  (4) CNRS and LIP, ENS Lyon

  2. Introduction
  • Hypergraph partitioning
    • Used to parallelize complex and irregular applications, providing:
      • balanced load distribution
      • good communication patterns
    • Other applications:
      • VLSI design
      • Sparse matrix reordering
      • Static and dynamic load balancing
      • Cryptosystem design
      • …
  Ç., Deveci, Kaya, Uçar “Multithreaded Clustering for Multi-level Hypergraph Partitioning”

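  3. Hypergraph Partitioning
  • Hypergraph: H = (V, N), where V is the set of vertices and N is the set of nets.
    • A net is a subset of vertices.
    • Each net n has a cost c(n), and each vertex v has a weight w(v).
    • $\lambda_n$: the connectivity of net n, i.e., the number of parts that net n connects.
  • Objective: find a partition $\Pi = \{V_1, \ldots, V_K\}$ of the vertices into K parts that
    • minimizes the cut size: $\mathrm{cutsize}(\Pi) = \sum_{n \in N} c(n)\,(\lambda_n - 1)$
    • provides load balance: $W_k \le W_{avg}\,(1 + \varepsilon)$ for $k = 1, \ldots, K$, where $W_k = \sum_{v \in V_k} w(v)$, $W_{avg} = \sum_{v \in V} w(v) / K$, and $\varepsilon$ is the allowed imbalance.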
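To make the objective concrete, here is a small Python sketch that evaluates a K-way partition. Python is used only for illustration: the experimental-setup slide says the actual implementation is in C with OpenMP. The sketch assumes the connectivity-1 cut-size metric $\sum_{n \in N} c(n)(\lambda_n - 1)$ and the balance condition $W_k \le W_{avg}(1 + \varepsilon)$. The function names `cut_size` and `is_balanced`, the list-of-pins hypergraph representation, and the tolerance `eps` are hypothetical and are not PaToH's API.

```python
def cut_size(nets, cost, part):
    """Connectivity-1 cut size: sum over nets n of c(n) * (lambda_n - 1)."""
    total = 0
    for n, pins in enumerate(nets):
        lam = len({part[v] for v in pins})  # lambda_n: number of parts net n connects
        total += cost[n] * (lam - 1)
    return total


def is_balanced(weight, part, k, eps):
    """True if every part weight W_k satisfies W_k <= W_avg * (1 + eps)."""
    part_w = [0] * k
    for v, p in enumerate(part):
        part_w[p] += weight[v]
    w_avg = sum(weight) / k
    return all(w <= w_avg * (1 + eps) for w in part_w)


# Hypothetical 5-vertex hypergraph with unit net costs (not the one on the slides).
nets = [[0, 1, 2], [1, 3], [2, 3, 4], [0, 4]]
part = [0, 0, 1, 1, 2]
print(cut_size(nets, [1] * len(nets), part))  # prints 4: each net spans 2 parts
```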

  4. Hypergraph Partitioning: Example
  [Figure: a hypergraph with vertices v1, …, v5 and nets n1, …, n5, partitioned into three parts P1, P2, and P3.]

  5. Multi-level Approach
  • Three phases:
    • Coarsening: obtain successively smaller hypergraphs that are similar to the original, until either a minimum vertex count is reached or the reduction in the number of vertices falls below a threshold.
    • Initial partitioning: find a solution for the smallest hypergraph.
    • Uncoarsening: project the initial solution back to the finer hypergraphs and refine it iteratively until a solution for the original hypergraph is obtained.
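The two stopping criteria of the coarsening phase can be sketched as a driver loop. This sketch tracks only vertex counts. `cluster_once` is a hypothetical stand-in for one clustering pass, and the default values of `min_vertices` and `min_reduction` are illustrative assumptions, not PaToH's defaults. In this version, a level that fails the reduction threshold is discarded.

```python
def coarsen(num_vertices, cluster_once, min_vertices=100, min_reduction=0.05):
    """Return the vertex counts of the successive levels of a coarsening run.

    cluster_once(n) stands in for one clustering pass and returns the vertex
    count of the next, coarser hypergraph.
    """
    levels = [num_vertices]
    while num_vertices > min_vertices:
        coarse = cluster_once(num_vertices)
        reduction = (num_vertices - coarse) / num_vertices
        if reduction < min_reduction:
            break  # clustering no longer shrinks the hypergraph enough
        num_vertices = coarse
        levels.append(num_vertices)
    return levels


# A pass that halves the vertex count stops once 100 or fewer vertices remain.
print(coarsen(1000, lambda n: n // 2))  # prints [1000, 500, 250, 125, 62]
```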

  6. Parallelization of Coarsening
  • Why coarsening?
    • Coarsening is an important phase of the multi-level approach.
    • Its worst-case time complexity is higher than that of the other phases.
    • The quality of coarsening affects the run time of the other phases: a good coarsening requires fewer local moves in the uncoarsening phase.
    • It also affects the quality of the partitioning result.
  • Two classes of clustering algorithms are parallelized:
    • Matching-based: only two vertices can be clustered together (faster).
    • Agglomerative: any number of vertices can be clustered together (better quality).

  7. Clustering Algorithms in PaToH
  • Heavy connectivity matching
    • An unmatched vertex u is matched with the unmatched adjacent vertex v that has the maximum connectivity to u.
    • Creates the adjacency list on the fly.
    • Then traverses the adjacency list and picks the most heavily connected vertex.
    • For efficiency, removes matched vertices from the pin lists of the nets.
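The heavy connectivity matching steps can be sketched sequentially in Python. The sketch assumes that the connectivity of u and v is the total cost of the nets they share, which reduces to the number of shared nets for unit costs. It also simplifies one step: matched pins are skipped during the traversal instead of being physically removed from the nets. The function `heavy_connectivity_matching` and the example hypergraph are hypothetical.

```python
def heavy_connectivity_matching(num_vertices, nets, cost):
    """Sequential matching sketch: match[u] is u's partner, or u if unmatched."""
    vnets = [[] for _ in range(num_vertices)]  # nets incident to each vertex
    for n, pins in enumerate(nets):
        for v in pins:
            vnets[v].append(n)
    match = [-1] * num_vertices  # -1 = not yet visited/matched
    for u in range(num_vertices):
        if match[u] != -1:
            continue
        conn = {}  # adjacency list built on the fly: v -> connectivity to u
        for n in vnets[u]:
            for v in nets[n]:
                if v != u and match[v] == -1:  # skip already matched pins
                    conn[v] = conn.get(v, 0) + cost[n]
        if conn:
            best = max(conn, key=conn.get)  # first vertex with maximum connectivity
            match[u], match[best] = best, u
        else:
            match[u] = u  # no unmatched neighbour: u stays a singleton
    return match


nets = [[0, 1, 2], [0, 2], [1, 2, 3], [3, 4], [1, 4]]  # hypothetical example
print(heavy_connectivity_matching(5, nets, [1] * 5))  # prints [2, 3, 0, 1, 4]
```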

  8. Heavy Connectivity Matching
  • u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1 → v* = v3
  • u = v2, adj = {v1, v3, v4, v5}: v1 and v3 are already matched; conn(v4) = 2, conn(v5) = 2 → v* = v4
  [Figure: the example hypergraph with vertices v1, …, v5 and nets n1, …, n5.]

  9. Clustering Algorithms in PaToH
  • Sequential agglomerative matching
    • An unmatched vertex u is either matched with an unmatched vertex v or added to an existing cluster of vertices.
    • Traverses the adjacent vertices and computes their connectivity to u.
    • Calculates the connectivity of clusters.
    • Picks the most heavily connected vertex or cluster and checks the maximum cluster weight (maxW) criterion.
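Sequential agglomerative clustering can be sketched in the same way. The sketch assumes that a cluster's connectivity to u is the sum of its members' connectivities, which is consistent with the example on the next slide where conn(v1+v3) = 3 = conn(v1) + conn(v3). It also assumes that the maxW criterion caps the total cluster weight. Each cluster is identified by a representative vertex. `agglomerative_clustering`, `cid`, `cw`, and `max_w` are hypothetical names.

```python
def agglomerative_clustering(num_vertices, nets, cost, weight, max_w):
    """Sequential sketch: returns cid[v], the representative of v's cluster."""
    vnets = [[] for _ in range(num_vertices)]
    for n, pins in enumerate(nets):
        for v in pins:
            vnets[v].append(n)
    cid = [-1] * num_vertices  # -1 = not yet clustered
    cw = list(weight)          # cluster weight, meaningful at representatives
    for u in range(num_vertices):
        if cid[u] != -1:
            continue
        conn = {}  # candidate (unclustered vertex or cluster rep) -> connectivity to u
        for n in vnets[u]:
            for v in nets[n]:
                if v != u:
                    key = v if cid[v] == -1 else cid[v]
                    conn[key] = conn.get(key, 0) + cost[n]
        best = None  # heaviest candidate that still satisfies the weight cap
        for key, c in conn.items():
            if cw[key] + weight[u] <= max_w and (best is None or c > conn[best]):
                best = key
        if best is None:
            cid[u] = u  # u becomes a singleton cluster (others may still join it)
        else:
            cid[u] = best
            cid[best] = best  # no-op if best is already a representative
            cw[best] += weight[u]
    return cid


nets = [[0, 1, 2], [0, 2], [1, 2, 3], [3, 4], [1, 4]]  # hypothetical example
print(agglomerative_clustering(5, nets, [1] * 5, [1] * 5, max_w=3))  # prints [2, 2, 2, 4, 4]
```

Unlike matching, vertex 1 joins the existing cluster {0, 2}. Vertex 3 is then rejected by that cluster because the cap `max_w` would be exceeded.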

  10. Agglomerative Clustering
  • u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1 → v* = v3
  • u = v2, adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2; as clusters, conn(v1+v3) = 3, conn(v4) = 2, conn(v5) = 2 → v* = v1 (v2 joins the cluster of v1 and v3)
  • u = v4, adj = {v1, v2, v3, v5}: conn(v1) = 1, conn(v2) = 2, conn(v3) = 2, conn(v5) = 1; as clusters, conn(v1+v2+v3) = 5, conn(v5) = 1 → v* = v5 (due to the maxW criterion)
  [Figure: the example hypergraph with vertices v1, …, v5 and nets n1, …, n5.]

  11. Multithreaded Clustering Algorithms
  • Matching-based algorithms
    • Parallel lock-based
    • Parallel resolution-based
  • Parallel agglomerative algorithm

  12. Parallel Lock-based Matching
  • Similar to the sequential algorithm, but uses an atomic CHECKANDLOCK operation.
  • Both the current vertex u and the candidate vertex v must be locked before they can be matched.
  • If a better candidate is found, the previous candidate is unlocked.
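The locking protocol can be sketched with Python threads. The atomic CHECKANDLOCK is emulated with `threading.Lock.acquire(blocking=False)`, which returns True only if that call took the lock. Matched vertices keep their locks permanently, so later attempts to lock them fail. `lock_based_matching` and the cyclic vertex distribution are assumptions made for illustration. Because of Python's global interpreter lock, this sketch shows the protocol, not the speedup.

```python
import threading


def lock_based_matching(num_vertices, nets, cost, num_threads=4):
    vnets = [[] for _ in range(num_vertices)]
    for n, pins in enumerate(nets):
        for v in pins:
            vnets[v].append(n)
    locks = [threading.Lock() for _ in range(num_vertices)]
    match = list(range(num_vertices))  # match[u] == u means u is unmatched

    def check_and_lock(v):
        # Stand-in for the atomic CHECKANDLOCK: True only if this call took v's lock.
        return locks[v].acquire(blocking=False)

    def worker(tid):
        for u in range(tid, num_vertices, num_threads):  # cyclic vertex distribution
            if not check_and_lock(u):
                continue  # u is already matched or held by another thread
            conn = {}  # adjacency list built on the fly: v -> connectivity to u
            for n in vnets[u]:
                for v in nets[n]:
                    if v != u:
                        conn[v] = conn.get(v, 0) + cost[n]
            best = None
            for v, c in conn.items():
                if (best is None or c > conn[best]) and check_and_lock(v):
                    if best is not None:
                        locks[best].release()  # found a better candidate
                    best = v
            if best is None:
                locks[u].release()  # nothing lockable; u may still be chosen later
            else:
                match[u], match[best] = best, u  # both locks are kept for good

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return match


nets = [[0, 1, 2], [0, 2], [1, 2, 3], [3, 4], [1, 4]]  # hypothetical example
print(lock_based_matching(5, nets, [1] * 5, num_threads=1))  # prints [2, 3, 0, 1, 4]
```

With one thread, this reproduces the sequential heavy connectivity matching on this example. With more threads, a vertex that was skipped because another thread briefly held its lock is not revisited in this sketch. That is one way the parallel matching can lose pairs.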

  13. Parallel Lock-based Matching
  • u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1 → v* = v3
  • u = v2, adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2; v3 is already locked → v* = v4
  [Figure: the example hypergraph with vertices v1, …, v5 and nets n1, …, n5.]

  14. Parallel Resolution-based Matching
  • The sequential algorithm runs in parallel, without locks.
  • At the end, conflicts are detected and resolved.
  • Each conflict costs a matched pair and therefore reduces both the cardinality and the quality of the matching.
  • To reduce the number of conflicts, the matching status of vertices is checked more frequently.
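A resolution-based variant can be sketched as follows. Threads write to the shared `match` array without synchronization and re-check a vertex's status before and during the scan. A final pass keeps a pair only if both endpoints point to each other, so any vertex whose partner was claimed by another thread ends up unmatched. `resolution_based_matching` is a hypothetical name, and Python threads are used only to illustrate the logic.

```python
import threading


def resolution_based_matching(num_vertices, nets, cost, num_threads=4):
    vnets = [[] for _ in range(num_vertices)]
    for n, pins in enumerate(nets):
        for v in pins:
            vnets[v].append(n)
    match = [-1] * num_vertices  # -1 = unmatched; shared and written without locks

    def worker(tid):
        for u in range(tid, num_vertices, num_threads):
            if match[u] != -1:
                continue  # re-check: another thread may have matched u already
            conn = {}
            for n in vnets[u]:
                for v in nets[n]:
                    if v != u and match[v] == -1:
                        conn[v] = conn.get(v, 0) + cost[n]
            if conn:
                best = max(conn, key=conn.get)
                match[u] = best  # unsynchronized: two threads may both claim best
                match[best] = u

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Resolution: keep u's partner only if the partner points back to u;
    # otherwise u loses its pair and ends up unmatched (match[u] == u).
    return [v if v != -1 and match[v] == u else u for u, v in enumerate(match)]


nets = [[0, 1, 2], [0, 2], [1, 2, 3], [3, 4], [1, 4]]  # hypothetical example
print(resolution_based_matching(5, nets, [1] * 5, num_threads=1))  # prints [2, 3, 0, 1, 4]
```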

  15. Parallel Resolution-based Matching
  • u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1 → v* = v3
  • u = v2, adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2 → v* = v3
  • Both select v3, which creates a conflict (marked "Conflict at v1" on the slide).
  • u = v4, adj = {v1, v2, v3, v5}: conn(v5) = 1 → v* = v5
  • u = v5, adj = {v2, v4}: conn(v4) = 1 → v* = v4
  [Figure: the example hypergraph with vertices v1, …, v5 and nets n1, …, n5.]

  16. Parallel Agglomerative Clustering
  • Traverses the neighbors and creates the adjacency list.
  • Sums up the connectivity of vertex clusters.
  • If the candidate vertex or cluster is available:
    • double-checks that it has not been matched in the meantime and that the vertex weight criterion still holds;
    • selects it as the best candidate and unlocks the previous best candidate.
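The "lock, then double-check" step can be sketched by extending the sequential agglomerative sketch with per-vertex locks, taken with `threading.Lock.acquire(blocking=False)`. This is an illustrative design under stated assumptions, not the authors' algorithm. A cluster is guarded by the lock of its representative. After the lock is taken, the candidate is re-validated: it must still be a valid target, and the weight cap `max_w` must still hold. A vertex that joins a cluster keeps its own lock for good. `parallel_agglomerative`, `cid`, and `cw` are hypothetical names.

```python
import threading


def parallel_agglomerative(num_vertices, nets, cost, weight, max_w, num_threads=4):
    vnets = [[] for _ in range(num_vertices)]
    for n, pins in enumerate(nets):
        for v in pins:
            vnets[v].append(n)
    locks = [threading.Lock() for _ in range(num_vertices)]
    cid = [-1] * num_vertices  # representative of v's cluster, -1 = unclustered
    cw = list(weight)          # cluster weight, meaningful at representatives

    def worker(tid):
        for u in range(tid, num_vertices, num_threads):
            if not locks[u].acquire(blocking=False):
                continue
            if cid[u] != -1:  # u was clustered in the meantime
                locks[u].release()
                continue
            conn = {}  # candidate (unclustered vertex or cluster rep) -> connectivity
            for n in vnets[u]:
                for v in nets[n]:
                    if v != u:
                        key = v if cid[v] == -1 else cid[v]
                        conn[key] = conn.get(key, 0) + cost[n]
            best = None
            for key, c in conn.items():
                if best is not None and c <= conn[best]:
                    continue
                if not locks[key].acquire(blocking=False):
                    continue  # candidate is busy
                # Double-check under the lock: still a valid target, weight fits.
                if cid[key] in (-1, key) and cw[key] + weight[u] <= max_w:
                    if best is not None:
                        locks[best].release()  # unlock the previous best candidate
                    best = key
                else:
                    locks[key].release()
            if best is None:
                locks[u].release()  # u stays unclustered
            else:
                cid[u], cid[best] = best, best
                cw[best] += weight[u]
                locks[best].release()  # the cluster may keep growing; u stays locked

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c if c != -1 else v for v, c in enumerate(cid)]


nets = [[0, 1, 2], [0, 2], [1, 2, 3], [3, 4], [1, 4]]  # hypothetical example
print(parallel_agglomerative(5, nets, [1] * 5, [1] * 5, 3, num_threads=1))  # prints [2, 2, 2, 4, 4]
```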

  17. Parallel Agglomerative Clustering
  • u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1 → v* = v3
  • u = v2, adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2; v3 is already locked → v* = v4
  • u = v5, adj = {v2, v4}: conn(v2) = 2, conn(v4) = 1; as a cluster, conn(v2+v4) = 3 → v* = v2 (v5 joins the cluster of v2 and v4)
  [Figure: the example hypergraph with vertices v1, …, v5 and nets n1, …, n5.]

  18. Experimental Setup
  • All the algorithms are implemented in PaToH, in C with OpenMP.
  • Compiled with icc version 11.3 and the -O3 flag.
  • All the algorithms are tested on an in-house cluster of 64 nodes:
    • Each node has two Intel Xeon E5520 processors (quad-core, clocked at 2.27 GHz, with hyper-threading).
  • The experiments are run on 70 large hypergraphs corresponding to matrices from the UFL Sparse Matrix Collection.

  19. Comparison Metrics
  • Clustering metrics:
    • Cardinality: the number of clustering decisions.
    • Quality: the sum of similarities over all vertex pairs that reside in the same cluster.
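Both metrics can be computed from a cluster assignment, as in the sketch below. It makes two assumptions: a cluster of size s counts as s - 1 clustering decisions, which equals the number of pairs for a matching, and the caller supplies the pairwise similarity, here the number of nets shared by two vertices. `clustering_metrics` and `conn` are hypothetical names.

```python
from itertools import combinations


def clustering_metrics(cluster_of, similarity):
    """Return (cardinality, quality) of a clustering.

    cluster_of[v] is the cluster id of vertex v; similarity(u, v) is a
    caller-supplied pairwise similarity (e.g. the connectivity of u and v).
    """
    members = {}
    for v, c in enumerate(cluster_of):
        members.setdefault(c, []).append(v)
    # A cluster of size s takes s - 1 decisions (one per vertex that joins it).
    cardinality = sum(len(m) - 1 for m in members.values())
    quality = sum(similarity(u, v) for m in members.values() for u, v in combinations(m, 2))
    return cardinality, quality


nets = [[0, 1, 2], [0, 2], [1, 2, 3], [3, 4], [1, 4]]  # hypothetical example


def conn(u, v):
    return sum(1 for pins in nets if u in pins and v in pins)


print(clustering_metrics([2, 2, 2, 4, 4], conn))  # prints (3, 6)
```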

  20. Comparison with Maximum Weighted Matching
  • The matching algorithms are compared against Gabow’s maximum weighted matching algorithm.
  • Because Gabow’s algorithm has $O(n^3)$ complexity, they are tested on 289 small matrices (at most 10000 rows), a dataset different from the one used in the other experiments.
  • The table gives the relative performance of the algorithms with respect to Gabow’s maximum matching algorithm.
  • The parallel algorithms are 17-26% worse in terms of quality.
  • The proposed lock-based parallelization does not hamper the performance of the sequential algorithm.
  • The resolution-based algorithm is outperformed by the other two.

  21. Performance Profiles Compared against Gabow’s Algorithm
  • A point (x, y) in the plot means that, with probability y, the quality of the matching found is at least $Q_{max}/x$, where $Q_{max}$ is the quality obtained by Gabow’s algorithm.
  • The probability of obtaining a matching quality of at least $Q_{max}/1.25$ is about 75% (45% for the resolution-based algorithm).
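One point of such a profile can be computed as shown below. Given per-instance qualities and the matching $Q_{max}$ values, the y-value at x is the fraction of instances whose quality is at least $Q_{max}/x$. The comparison is written as `q * x >= qm` to avoid division. `profile_point` and the numbers in the example are hypothetical, not data from the experiments.

```python
def profile_point(qualities, q_max, x):
    """Fraction of instances whose matching quality is at least Q_max / x."""
    hits = sum(1 for q, qm in zip(qualities, q_max) if q * x >= qm)
    return hits / len(qualities)


# Hypothetical qualities of one algorithm on four instances, and Gabow's Q_max.
qualities = [10.0, 8.0, 6.0, 9.0]
q_max = [10.0, 10.0, 10.0, 10.0]
print([profile_point(qualities, q_max, x) for x in (1.0, 1.25, 2.0)])  # prints [0.25, 0.75, 1.0]
```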

  22. Resolution-Based Algorithm Conflicts
  • Average number of matched vertices and conflicts for the proposed resolution-based algorithm.
  • The number of conflicts increases with the number of threads.
  • However, conflicts are rare compared to the cardinality of the matching.
  • The maximum conflict/match ratio on a single graph is only 0.7% (the mean is 0.008%).

  23. Matching Speedup
  • Speedups of 5.87, 5.82, and 5.23 for the resolution-based, parallel agglomerative, and lock-based algorithms, respectively.
  • 5-7% overhead due to OpenMP and atomic operations.

  24. Matching Speedup Profiles
  • Matching speedup profiles for #threads = 8.
  • The resolution-based algorithm is the best of the proposed algorithms in terms of scalability: it obtains a speedup of at least 6.6 with probability 33%, compared with 20% and 16% for the parallel agglomerative and lock-based algorithms, respectively.

  25. Overall PaToH Speedup
  • Only the coarsening phase is parallelized.
  • The speedup of the overall execution time can be found with Amdahl’s law: $S = \frac{1}{(1 - r) + r / S_c}$, where r is the ratio of total coarsening time to total sequential execution time and $S_c$ is the speedup of the coarsening phase.
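Amdahl’s law, $S = 1 / ((1 - r) + r / S_c)$, where only the fraction r of the sequential time (coarsening) is accelerated by a factor $S_c$, can be evaluated as follows. The value r = 0.6 is a hypothetical example and is not measured data. Even with unlimited coarsening speedup, the overall speedup cannot exceed $1/(1 - r)$.

```python
def overall_speedup(r, coarsening_speedup):
    """Amdahl's law: only the fraction r of sequential time (coarsening) is sped up."""
    return 1.0 / ((1.0 - r) + r / coarsening_speedup)


# Hypothetical numbers: coarsening takes 60% of the sequential run time.
for s_c in (2, 4, 8):
    print(s_c, round(overall_speedup(0.6, s_c), 3))
```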

  26. Overall Speedup for Matching Algorithms
  • The lock-based algorithm is more efficient, as its speedup is closer to the ideal.
  • Although the resolution-based algorithm scales better, it obtains lower-quality matchings, which increases the running time of the initial partitioning and uncoarsening phases.

  27. Overall Speedup for Agglomerative Clustering
  • 6%, 6%, and 10% slower than the best possible parallel execution time for 2, 4, and 8 threads, respectively.

  28. Overall Running Time
  • The matching time of agglomerative clustering is 25% higher, but its overall execution time is 13% lower.
  • Overall execution times normalized with respect to that of sequential agglomerative clustering.

  29. Minimum Cut Size
  • Minimum cut sizes are almost equal to those of the sequential algorithms (plot annotations: at most 1% difference; at most 3% difference).

  30. Conclusion
  • Clustering algorithms are the most time-consuming part of a multi-level hypergraph partitioner.
  • We have presented two different multithreaded implementations of matching-based clustering algorithms and a multithreaded agglomerative clustering algorithm.
  • We have presented several sets of experiments.
  • The proposed algorithms achieve decent speedups.
  • They perform as well as their sequential counterparts, and sometimes even better.
  • We observe that higher-quality clustering helps the partitioner obtain better cut sizes and reduces the time spent in the other phases.

  31. Thanks
  • For more information:
    • Email umit@bmi.osu.edu
    • Visit http://bmi.osu.edu/~umit or http://bmi.osu.edu/hpc
  • Research at the HPC Lab is funded by
