
Graph Partitioning and Clustering for Community Detection

Graph Partitioning and Clustering for Community Detection. Presented by: Group One. Outline. Introduction: Hong Hande. Graph Partitioning: Muthu Kumar C and Xie Shudong. Partitional Clustering: Agus Pratondo. Spectral Clustering: Li Furong and Song Chonggang.


Presentation Transcript


  1. Graph Partitioning and Clustering for Community Detection • Presented by: Group One

  2. Outline • Introduction: Hong Hande • Graph Partitioning: Muthu Kumar C and Xie Shudong • Partitional Clustering: Agus Pratondo • Spectral Clustering: Li Furong and Song Chonggang • Summary and Applications of Community Detection: Aleksandr Farseev

  3. Introduction, by Hong Hande

  4. Facebook Group https://www.facebook.com/thebeatles?rf=111113312246958

  5. Flickr group http://www.flickr.com/groups/49246928@N00/pool/with/417646359/#photo_417646359

  6. CS6234 Advanced Algorithms • Whole class as a community • Sub-community

  7. Graph construction from web data (1) • Webpage www.x.com: href = "www.y.com", href = "www.z.com" • Webpage www.y.com: href = "www.x.com", href = "www.a.com", href = "www.b.com" • Webpage www.z.com: href = "www.a.com" • [Figure: the resulting graph on nodes x, y, z, a, b, with one edge per hyperlink.]
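
The construction on this slide can be sketched in a few lines. This is only an illustration: the `links` dictionary is a hand-written stand-in for parsed `href` attributes, and `build_graph` is a hypothetical helper name, not something from the presentation.

```python
# Hypothetical page -> outgoing-links map, copied from the slide's example.
links = {
    "www.x.com": ["www.y.com", "www.z.com"],
    "www.y.com": ["www.x.com", "www.a.com", "www.b.com"],
    "www.z.com": ["www.a.com"],
}

def build_graph(links):
    """Return (nodes, edges): one node per page, one directed edge per hyperlink."""
    nodes = set(links)
    edges = set()
    for src, targets in links.items():
        for dst in targets:
            nodes.add(dst)          # pages that are only linked to are nodes too
            edges.add((src, dst))
    return nodes, edges

nodes, edges = build_graph(links)
print(sorted(nodes))
print(sorted(edges))
```

Pages such as www.a.com and www.b.com appear only as link targets, yet they still become nodes of the graph.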

  8. Graph construction from web data (2)

  9. Web pages as a graph • Cnn.com: lots of links, lots of images (1316 tags) • http://www.aharef.info/2006/05/websites_as_graphs.htm

  10. Internet as a graph • nodes = service providers • edges = connections • hierarchical structure • S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, E. Shir. A model of Internet topology using k-shell decomposition. PNAS 104 (27), pp. 11150-11154, 2007.

  11. Emerging structures • Graphs (from the web, from daily life) exhibit certain structural characteristics • Groups of nodes interacting with each other • Dense interconnections • Functional/topical associations • Community, a.k.a. group, subgroup, module, cluster

  12. Community Types • Explicit • The result of conscious human decision • Implicit • Emerging from the interactions & activities of users • Need special methods to be discovered

  13. Defining Communities • Communities are often defined with respect to a graph G = (V, E) representing a set of objects (V) and their relations (E). • Even if such a graph is not explicit in the raw data, it is usually possible to construct one, e.g. feature vectors → distances → graph
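
One common way to go from feature vectors to a graph is to connect every pair of points whose distance falls below a threshold. The sketch below assumes Euclidean distance and a hand-picked threshold `eps`; both choices, and the name `epsilon_graph`, are illustrative assumptions rather than the presenters' method.

```python
import math
from itertools import combinations

def epsilon_graph(points, eps):
    """Connect points i and j when their Euclidean distance is at most eps.

    Returns a dict mapping the edge (i, j), with i < j, to its distance.
    """
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d <= eps:
            edges[(i, j)] = d
    return edges

# Two well-separated pairs of 2-D feature vectors (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph_edges = epsilon_graph(points, eps=1.5)
print(graph_edges)
```

Other constructions (k-nearest neighbours, a fully connected graph with similarity weights) follow the same vectors, distances, graph pattern.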

  14. Communities and graphs • Given a graph, a community is defined as a set of nodes that are more densely connected to each other than to the rest of the network. • [Figure: internal edges join nodes of the same community; external edges join nodes of different communities.]

  15. Graph cuts • A cut is a partition of the vertices of a graph into two disjoint subsets. • The cut-set of the cut is the set of edges whose end points are in different subsets of the partition.
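
A minimal sketch of the cut-set definition, assuming the graph is given as a plain list of undirected edges and one side of the cut as a set of vertices. The function name `cut_set` and the example graph (two triangles joined by one edge) are made up for illustration.

```python
def cut_set(edges, side_a):
    """Return the edges with exactly one end point in side_a."""
    a = set(side_a)
    return [(u, v) for u, v in edges if (u in a) != (v in a)]

# Two triangles {1, 2, 3} and {4, 5, 6} joined by the single edge (3, 4).
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]

crossing = cut_set(example_edges, {1, 2, 3})
print(crossing)
```

Edges outside the cut-set are the internal edges of the two subsets; for the split {1, 2, 3} versus {4, 5, 6} only the bridge (3, 4) crosses.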

  16. Community detection methods • Graph partitioning • Node clustering • K-means clustering • Spectral clustering

  17. Graph partitioning, by Muthu Kumar C

  18. Graph Partitioning • Dividing vertices into groups of predefined size. • Given a graph G = (V, E, W_E) with vertices V, edges E and edge weights W_E, choose a partition such that: • V = V_1 ∪ V_2 ∪ … ∪ V_p • V_i ∩ V_j = Ø for all i ≠ j • Bisectioning: partitioning into two equal-sized groups of vertices.
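
The two partition conditions (the groups cover V, and no vertex belongs to two groups) can be checked directly. This is a small sketch with a hypothetical helper name, `is_partition`.

```python
def is_partition(vertices, parts):
    """True if the parts are pairwise disjoint and their union is the vertex set."""
    seen = set()
    for part in parts:
        part = set(part)
        if seen & part:         # some vertex already belongs to an earlier part
            return False
        seen |= part
    return seen == set(vertices)

V = range(1, 9)
print(is_partition(V, [{1, 2, 3, 4}, {5, 6, 7, 8}]))   # a bisection of 8 vertices
print(is_partition(V, [{1, 2, 3, 4}, {4, 5, 6, 7, 8}]))  # vertex 4 appears twice
```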

  19. How many partitions? • There are many possible partitionings to search. • Just to divide n vertices into 2 equal-sized groups there are $\binom{n}{n/2} = \frac{n!}{((n/2)!)^2}$ ways to choose one side, which is exponential in n. • Finding the optimal partitioning is NP-hard (the decision version is NP-complete).
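
To get a feel for the growth, the count can be computed exactly. The sketch assumes n is even and counts unordered bisections, that is, the binomial coefficient C(n, n/2) divided by 2 because swapping the two sides gives the same bisection. The function name is illustrative.

```python
import math

def count_bisections(n):
    """Number of ways to split n labelled vertices into two unordered halves."""
    if n % 2:
        raise ValueError("n must be even")
    return math.comb(n, n // 2) // 2

for n in (8, 16, 32, 64):
    print(n, count_bisections(n))
```

Even the 8-node graph used in the Kernighan/Lin example has 35 possible bisections, and the count grows roughly like 2^n.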

  20. Kernighan/Lin Algorithm [1] • An iterative, 2-way, balanced partitioning (bi-sectioning) heuristic. • The algorithm can also be extended to solve more general partitioning problems. • Given $G = (V, E, W_E)$ with $|V| = 2n$, find a partition $V = A \cup B$ with $|A| = |B| = n$ such that: • The cutsize $T$ between $A$ and $B$ is minimized, where $T = \sum_{a \in A,\, b \in B} w(a, b)$. • [1] Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2), 291-307.

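  21. Kernighan-Lin: Definitions • Let $a \in A$ and $b \in B$ be two vertices. • External cost: $E(a) = \sum_{v \in B} w(a, v)$ • Internal cost: $I(a) = \sum_{v \in A,\, v \neq a} w(a, v)$ • Moving node $a$ from A to B increases T by $I(a)$ and decreases T by $E(a)$. • This is measured as $D(a) = E(a) - I(a)$. • $E(b)$, $I(b)$ and $D(b)$ are defined analogously for b in B.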
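
A small sketch of these quantities, taking D(v) as external cost minus internal cost. It also uses the standard Kernighan-Lin swap gain g(a, b) = D(a) + D(b) - 2 w(a, b), which matches the "g1 = 2+1-0" arithmetic in the later example. The 4-node weighted graph and all function names are made up for illustration.

```python
def weight(w, u, v):
    """Weight of the undirected edge {u, v}; 0 if the edge is absent."""
    return w.get((u, v), w.get((v, u), 0))

def d_value(w, v, own, other):
    """D(v) = E(v) - I(v): external cost minus internal cost of v."""
    external = sum(weight(w, v, u) for u in other)
    internal = sum(weight(w, v, u) for u in own if u != v)
    return external - internal

def gain(w, a, b, A, B):
    """Reduction in cutsize if a in A and b in B are swapped."""
    return d_value(w, a, A, B) + d_value(w, b, B, A) - 2 * weight(w, a, b)

# Hypothetical graph: heavy edges 1-3 and 2-4, light edge 1-2.
w = {(1, 3): 2, (2, 4): 2, (1, 2): 1}
A, B = {1, 2}, {3, 4}          # initial cutsize is 2 + 2 = 4

print(d_value(w, 1, A, B))     # E(1) = 2, I(1) = 1
print(gain(w, 2, 3, A, B))     # swapping 2 and 3 lowers the cutsize from 4 to 1
```

The `- 2 * weight(w, a, b)` term corrects for the edge between a and b itself, which still crosses the cut after the swap.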

  22. K/L Algorithm: Swap • [Figure: partitions A and B and their cutsize, before and after swapping node a in A with node b in B.]

  23. Kernighan-Lin Algorithm
      // Kernighan-Lin, page 1 of 2
      Compute T = Cost(A, B) for initial A, B
      Repeat    // sweep begins
        Compute costs D(v) for all v in V
        Unmark all vertices in V
        While there are unmarked nodes
          Find an unmarked pair (a_i, b_i) with maximal gain g(i) = g(a_i, b_i)
          Mark a_i and b_i, but do not swap them
          Update D(v) for all unmarked v, as though a_i and b_i had been swapped
        Endwhile
      Note: each sweep greedily computes |V|/2 possible X ⊂ A, Y ⊂ B to swap, and picks a sequence of the best such swaps.

  24. Kernighan-Lin Algorithm
      // Kernighan-Lin, page 2 of 2
        // We have now computed:
        //   a sequence of pairs (a_1, b_1), …, (a_k, b_k) and
        //   gains g(1), …, g(k), where k = |V|/2,
        //   numbered in the order in which we marked them
        Pick m ≤ k which maximizes Gain = g(1) + g(2) + … + g(m)
        // Gain is the reduction in cost from swapping (a_1, b_1) through (a_m, b_m)
        If Gain > 0 then    // it is worth swapping
          Update newA = A - {a_1, …, a_m} ∪ {b_1, …, b_m}
          Update newB = B - {b_1, …, b_m} ∪ {a_1, …, a_m}
          Update T = T - Gain
        endif
      Until Gain <= 0    // sweep ends
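
The two pages of pseudocode can be put together as the following sketch. It is a straightforward, unoptimized reading of the slides, not the presenters' code. The assumptions are: the graph is a dict of undirected edge weights, the swap gain is g(a, b) = D(a) + D(b) - 2 w(a, b), and ties between equally good pairs are broken arbitrarily. The names `kl_pass`, `kernighan_lin` and `cut_cost` are hypothetical.

```python
def cut_cost(w, A):
    """Total weight of the edges with exactly one end point in A."""
    return sum(wt for (u, v), wt in w.items() if (u in A) != (v in A))

def kl_pass(w, A, B):
    """One sweep: returns (new_A, new_B, gain); gain is 0 if nothing was swapped."""
    def wt(u, v):
        return w.get((u, v), w.get((v, u), 0))

    A, B = set(A), set(B)
    D = {}
    for v in A:   # D(v) = external cost - internal cost
        D[v] = sum(wt(v, u) for u in B) - sum(wt(v, u) for u in A if u != v)
    for v in B:
        D[v] = sum(wt(v, u) for u in A) - sum(wt(v, u) for u in B if u != v)

    free_a, free_b = set(A), set(B)       # the unmarked nodes on each side
    pairs, gains = [], []
    while free_a and free_b:
        # Find the unmarked pair with maximal gain.
        g, a, b = max(((D[a] + D[b] - 2 * wt(a, b), a, b)
                       for a in free_a for b in free_b), key=lambda t: t[0])
        pairs.append((a, b))
        gains.append(g)
        free_a.remove(a)                  # mark a and b, but do not swap them
        free_b.remove(b)
        for x in free_a:                  # update D as though a, b were swapped
            D[x] += 2 * wt(x, a) - 2 * wt(x, b)
        for y in free_b:
            D[y] += 2 * wt(y, b) - 2 * wt(y, a)

    # Pick m maximizing g(1) + ... + g(m).
    best_gain, best_m, running = 0, 0, 0
    for m, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_gain, best_m = running, m

    if best_gain > 0:                     # it is worth swapping
        moved_a = {a for a, _ in pairs[:best_m]}
        moved_b = {b for _, b in pairs[:best_m]}
        A = (A - moved_a) | moved_b
        B = (B - moved_b) | moved_a
    return A, B, best_gain

def kernighan_lin(w, A, B):
    """Repeat sweeps until a sweep yields no positive gain."""
    while True:
        A, B, gain = kl_pass(w, A, B)
        if gain <= 0:
            return A, B

# Two unit-weight triangles joined by edge (3, 4), with a poor starting bisection.
w = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (4, 6): 1, (5, 6): 1}
A, B = kernighan_lin(w, {1, 2, 4}, {3, 5, 6})
print(A, B, cut_cost(w, A))
```

On this toy input the first sweep swaps nodes 4 and 3, which separates the two triangles; the second sweep finds no positive gain and the loop stops.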

  25. Kernighan-Lin Example • Edges are unweighted in this example. • Cut cost: 9 • Unmarked: 1, 2, 3, 4, 5, 6, 7, 8 • [Figure: the example graph on nodes 1 to 8 with its initial bisection.]

  26. Kernighan/Lin Example • Cut cost: 9 • Unmarked: 1, 2, 3, 4, 5, 6, 7, 8 • Calculate D values to find the best pair. • Costs D(v) of each node: D(1) = 1, D(2) = 1, D(3) = 2, D(4) = 1, D(5) = 1, D(6) = 2, D(7) = 1, D(8) = 1 • [Figure: the nodes that lead to maximum gain are highlighted.]

  27. Kernighan/Lin Example • Cut cost: 9 • Unmarked: 1, 2, 3, 4, 5, 6, 7, 8 • Mark the identified pair as a candidate swap. • Costs D(v) of each node: D(1) = 1, D(2) = 1, D(3) = 2, D(4) = 1, D(5) = 1, D(6) = 2, D(7) = 1, D(8) = 1 • Nodes that lead to maximum gain: 3 and 5 • Gain after node swapping: g1 = 2 + 1 - 0 = 3, swap (3,5) • Gain in the current pass: G1 = g1 = 3

  28. Kernighan/Lin Example • Before: cut cost 9, unmarked 1, 2, 3, 4, 5, 6, 7, 8 • Step 1: g1 = 2 + 1 - 0 = 3, swap (3,5), G1 = g1 = 3 • New partitions and cut cost: cut cost 6, unmarked 1, 2, 4, 6, 7, 8

  29. Kernighan/Lin Example • Step 1: g1 = 2 + 1 - 0 = 3, swap (3,5), G1 = g1 = 3; cut cost 9 → 6, unmarked 1, 2, 4, 6, 7, 8 • Updated D values of the unmarked nodes: D(1) = -1, D(2) = -1, D(4) = 3, D(6) = 2, D(7) = -1, D(8) = -1

  30. Kernighan/Lin Example • Step 1: g1 = 2 + 1 - 0 = 3, swap (3,5), G1 = g1 = 3; cut cost 9 → 6, unmarked 1, 2, 4, 6, 7, 8 • Step 2: D(1) = -1, D(2) = -1, D(4) = 3, D(6) = 2, D(7) = -1, D(8) = -1 • Nodes that lead to maximum gain: 4 and 6 • Gain after node swapping: g2 = 3 + 2 - 0 = 5, swap (4,6) • Gain in the current pass: G2 = G1 + g2 = 8

  31. Kernighan/Lin Example • Step 1: g1 = 2 + 1 - 0 = 3, swap (3,5), G1 = g1 = 3; cut cost 9 → 6, unmarked 1, 2, 4, 6, 7, 8 • Step 2: g2 = 3 + 2 - 0 = 5, swap (4,6), G2 = G1 + g2 = 8; cut cost 6 → 1, unmarked 1, 2, 7, 8 • Step 3: D(1) = -3, D(2) = -3, D(7) = -3, D(8) = -3 • Nodes that lead to maximum gain: 1 and 7 • Gain after node swapping: g3 = -3 - 3 - 0 = -6, swap (1,7) • Gain in the current pass: G3 = G2 + g3 = 2; cut cost 1 → 7, unmarked 2, 8

  32. Kernighan/Lin Example • Step 1: g1 = 2 + 1 - 0 = 3, swap (3,5), G1 = g1 = 3; cut cost 9 → 6, unmarked 1, 2, 4, 6, 7, 8 • Step 2: g2 = 3 + 2 - 0 = 5, swap (4,6), G2 = G1 + g2 = 8; cut cost 6 → 1, unmarked 1, 2, 7, 8 • Step 3: g3 = -3 - 3 - 0 = -6, swap (1,7), G3 = G2 + g3 = 2; cut cost 1 → 7, unmarked 2, 8 • Step 4: D(2) = -1, D(8) = -1; g4 = -1 - 1 - 0 = -2, swap (2,8), G4 = G3 + g4 = 0; cut cost 7 → 9, unmarked: none

  33. Kernighan/Lin Example
      Step  Swap   g(i)              G(i)  Cut cost after swap
      1     (3,5)  2 + 1 - 0 = 3     3     6
      2     (4,6)  3 + 2 - 0 = 5     8     1
      3     (1,7)  -3 - 3 - 0 = -6   2     7
      4     (2,8)  -1 - 1 - 0 = -2   0     9
      • Maximum positive gain Gm = 8 with m = 2.
      • Since Gm > 0, the first m = 2 swaps (3,5) and (4,6) are executed.
      • Since Gm > 0, more passes are needed until Gm ≤ 0.
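
Choosing m amounts to taking the largest running sum of the per-step gains. The sketch below replays the four gains from this example (3, 5, -6, -2); the function name `best_prefix` is hypothetical.

```python
from itertools import accumulate

def best_prefix(gains):
    """Return (m, G_m): the number of leading swaps with the largest total gain."""
    running = list(accumulate(gains))           # G(1), G(2), ..., G(k)
    m = max(range(len(running)), key=running.__getitem__)
    return m + 1, running[m]

step_gains = [3, 5, -6, -2]                     # g1..g4 from the example
m, G_m = best_prefix(step_gains)
print(list(accumulate(step_gains)))             # running sums G1..G4
print(m, G_m)
```

The running sums are 3, 8, 2, 0, so the pass keeps only the first two swaps and reduces the cut cost from 9 to 1.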

  34. Escaping local minima • The gains are not monotonically increasing: in the sequence of m swaps chosen, some individual gains may be negative. • Accepting such swaps makes it possible to escape "local minima". • But there is no guarantee of an optimal solution.

  35. Demerits • Bi-sectioning does not generalize well to k-way partitioning. • Partitioning into predefined sizes limits utility to niche applications.

  36. Analysis of K/L Algorithm, by Xie Shudong

  37. K/L Algorithm: Analysis
      Compute T = Cost(A, B) for initial A, B
      Repeat
        Compute costs D(v) for all v in V
        Unmark all vertices in V
        While there are unmarked nodes
          Find an unmarked pair (a, b) with maximal g(a, b)
          Mark a and b (but do not swap them)
          Update D(v) for all unmarked v, as though a and b had been swapped
        Endwhile
        Pick m maximizing G = g(1) + g(2) + … + g(m)
        If G > 0 then    // it is worth swapping
          Update newA = A - {a1, …, am} ∪ {b1, …, bm}
          Update newB = B - {b1, …, bm} ∪ {a1, …, am}
          Update T = T - G
        endif
      Until G <= 0

  38. K/L Algorithm: Analysis (pseudocode as on slide 37) • Compute T = Cost(A, B) for initial A, B: O(|V|²) • A holds |V|/2 nodes and B holds |V|/2 nodes, so there are at most |V|/2 * |V|/2 = |V|²/4 external edges to sum.

  39. K/L Algorithm: Analysis (pseudocode as on slide 37) • Compute T: O(|V|²) • Compute costs D(v) for all v in V: O(|V|²) • For one node a: D(a) = E(a) - I(a) takes O(|V|) • For all |V| nodes: O(|V|²)

  40. K/L Algorithm: Analysis (pseudocode as on slide 37) • Compute T: O(|V|²) • Compute costs D(v) for all v in V: O(|V|²) • Unmark all vertices in V: O(|V|)

  41. K/L Algorithm: Analysis (pseudocode as on slide 37) • Compute T: O(|V|²) • Compute costs D(v): O(|V|²) • Unmark all vertices: O(|V|) • Find an unmarked pair (a, b) with maximal g(a, b): O(|V|²)

  42. K/L Algorithm: Analysis (pseudocode as on slide 37) • Compute T: O(|V|²) • Compute costs D(v): O(|V|²) • Unmark all vertices: O(|V|) • Find an unmarked pair with maximal g(a, b): O(|V|²) • Mark a and b: O(1)

  43. K/L Algorithm: Analysis (pseudocode as on slide 37) • Compute T: O(|V|²) • Compute costs D(v): O(|V|²) • Unmark all vertices: O(|V|) • Find an unmarked pair with maximal g(a, b): O(|V|²) • Mark a and b: O(1) • Update D(v) for all unmarked v: O(|V|) • Each single update is O(1): newD(a') = D(a') + 2*w(a', a) - 2*w(a', b) • In the (i+1)-th loop there are |V| - 2i unmarked nodes.
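
The O(1) update rule can be checked against a full recomputation of D on a small graph. The slide states the rule for a node a' in A; the sketch assumes the symmetric rule for a node b' in B, newD(b') = D(b') + 2*w(b', b) - 2*w(b', a), which is the standard Kernighan-Lin form. The graph and helper names are made up for illustration.

```python
def weight(w, u, v):
    """Weight of the undirected edge {u, v}; 0 if the edge is absent."""
    return w.get((u, v), w.get((v, u), 0))

def d_value(w, v, own, other):
    """D(v) recomputed from scratch: external cost minus internal cost, O(|V|)."""
    return (sum(weight(w, v, u) for u in other)
            - sum(weight(w, v, u) for u in own if u != v))

# Hypothetical weighted graph and bisection.
w = {(1, 3): 2, (2, 4): 2, (1, 2): 1}
A, B = {1, 2}, {3, 4}
a, b = 2, 3                                   # the pair treated as swapped

# Incremental O(1) updates for the remaining nodes 1 (in A) and 4 (in B).
new_d1 = d_value(w, 1, A, B) + 2 * weight(w, 1, a) - 2 * weight(w, 1, b)
new_d4 = d_value(w, 4, B, A) + 2 * weight(w, 4, b) - 2 * weight(w, 4, a)

# Full recomputation on the swapped partition, for comparison.
A2, B2 = (A - {a}) | {b}, (B - {b}) | {a}
print(new_d1, d_value(w, 1, A2, B2))
print(new_d4, d_value(w, 4, B2, A2))
```

Both pairs of printed values agree, which is why a sweep never needs to recompute D from scratch after marking a pair.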

  44. K/L Algorithm: Analysis (pseudocode as on slide 37) • Inside the while loop: find pair O(|V|²), mark O(1), update D(v) O(|V|) • |V|/2 pairs have to be found, so the whole while loop costs O(|V|³).

  45. K/L Algorithm: Analysis (pseudocode as on slide 37) • While loop: O(|V|³) • Pick m maximizing G: the running sums g(1), g(1) + g(2), …, g(1) + g(2) + … + g(|V|/2) are computed in O(|V|), and the largest of them, g(1) + g(2) + … + g(m), is found in O(|V|) and assigned to G.

  46. K/L Algorithm: Analysis (pseudocode as on slide 37) • While loop: O(|V|³) • Pick m maximizing G: O(|V|) • Update newA = A - {a1, …, am} ∪ {b1, …, bm}: O(|V|)

  47. K/L Algorithm: Analysis (pseudocode as on slide 37) • While loop: O(|V|³) • Pick m maximizing G: O(|V|) • Update newA: O(|V|) • Update newB = B - {b1, …, bm} ∪ {a1, …, am}: O(|V|) • Update T = T - G: O(1)

  48. K/L Algorithm: Analysis
      Compute T = Cost(A, B) for initial A, B                                  O(|V|²)
      Repeat
        Compute costs D(v) for all v in V                                      O(|V|²)
        Unmark all vertices in V                                               O(|V|)
        While there are unmarked nodes                                         O(|V|³) for the whole loop
          Find an unmarked pair (a, b) with maximal g(a, b)                    O(|V|²)
          Mark a and b (but do not swap them)                                  O(1)
          Update D(v) for all unmarked v, as though a and b had been swapped   O(|V|)
        Endwhile
        Pick m maximizing G = g(1) + g(2) + … + g(m)                           O(|V|)
        If G > 0 then
          Update newA = A - {a1, …, am} ∪ {b1, …, bm}                          O(|V|)
          Update newB = B - {b1, …, bm} ∪ {a1, …, am}                          O(|V|)
          Update T = T - G                                                     O(1)
        endif
      Until G <= 0 (p iterations)                                              O(p |V|³) in total
      • How many iterations? Empirical testing by Kernighan and Lin on small graphs (|V| <= 360) showed convergence after 2 to 4 passes.

  49. K-means Clustering by Agus Pratondo

  50. Graph in R^n • [Figure: a weighted graph on nodes a, b, c, d, e drawn in the plane with axes x and y.]
