
Fast counting of triangles in large networks without counting: Algorithms and laws

Fast counting of triangles in large networks without counting: Algorithms and laws. Charalampos E. Tsourakakis, School of Computer Science, Carnegie Mellon University. Triangle-related problems.


Presentation Transcript


  1. Fast counting of triangles in large networks without counting: Algorithms and laws. Charalampos E. Tsourakakis, School of Computer Science, Carnegie Mellon University. ICDM, Dec. '08

  2. Triangle-related problems • Given an undirected, simple graph G(V,E), a triangle is a set of three vertices such that any two of them are connected by an edge of the graph. • Related problems: • Decide if a graph is triangle-free. • Count the total number of triangles Δ(G). • Count the number of triangles Δ(v) that vertex v participates in. • List the triangles that each vertex v participates in. [Slide annotations: "Our focus"; "Generality"]

  3. Why is Triangle Counting important? From the Graph Mining Perspective • Clustering coefficient • Transitivity ratio • Social Network Analysis fact: “Friends of friends are friends” [WF94] Other applications include: • Hidden Thematic Structure of the Web [EM02] • Motif Detection, e.g. in biological networks [YPSB05] • Web Spam Detection [BPCG08] [Figure: a triangle on vertices A, B, C]

  4. Outline • Related Work • Proposed Method • Theorems • Algorithms • Explaining efficiency • Experiments • Triangle-related Laws • Triangles in Kronecker Graphs • Conclusions

  5. Related Work [Table: related work, grouped into dense graphs and sparse graphs]

  7. Theorem [EigenTriangle] Theorem 1: $\Delta(G) = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3$, where Δ(G) = # triangles in graph G(V,E) and $\lambda_1, \dots, \lambda_n$ = eigenvalues of the adjacency matrix $A_G$.
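The identity behind Theorem 1 (slide 10 states it as "the sum of cubes of λi's divided by 6") can be checked numerically on a small graph. The sketch below is only an illustration: the function names and the toy graph K4 are assumptions, not part of the talk, and a dense eigenvalue solver is used because the graph is tiny.

```python
import numpy as np

def triangles_from_spectrum(adj):
    """Triangle count as (1/6) * sum of cubed eigenvalues of a symmetric 0/1 matrix."""
    eigenvalues = np.linalg.eigvalsh(adj)  # real eigenvalues of a symmetric matrix
    return float(np.sum(eigenvalues ** 3) / 6.0)

def triangles_from_trace(adj):
    """Same quantity via trace(A^3) / 6: each triangle yields 6 closed walks of length 3."""
    return int(np.trace(adj @ adj @ adj)) // 6

# Hypothetical toy graph: the complete graph on 4 vertices (zero diagonal).
K4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
spectral_count = triangles_from_spectrum(K4)
trace_count = triangles_from_trace(K4)
```

Both routes should agree up to floating-point error, since trace(A^3) equals the sum of the cubed eigenvalues.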

  8. Theorem [EigenTriangleLocal] Theorem 2: $\Delta(i) = \frac{1}{2} \sum_{j=1}^{n} \lambda_j^3 u_{j,i}^2$, where Δ(i) = # triangles vertex i participates in, $u_j$ = j-th eigenvector and $u_{j,i}$ = i-th entry of $u_j$.
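A short derivation of why the local formula holds, written with the assumption that $u_1, \dots, u_n$ are orthonormal eigenvectors of the symmetric adjacency matrix $A$ and that $u_{j,i}$ denotes the i-th entry of $u_j$ (this indexing is a notational choice made here). Each triangle through vertex i contributes two closed walks of length 3 starting at i, one per direction.

```latex
A = \sum_{j=1}^{n} \lambda_j u_j u_j^{\top}
\;\Longrightarrow\;
A^3 = \sum_{j=1}^{n} \lambda_j^3 u_j u_j^{\top}
\;\Longrightarrow\;
(A^3)_{ii} = \sum_{j=1}^{n} \lambda_j^3 u_{j,i}^2 .

(A^3)_{ii} = 2\,\Delta(i)
\;\Longrightarrow\;
\Delta(i) = \frac{1}{2} \sum_{j=1}^{n} \lambda_j^3 u_{j,i}^2 ,
\qquad
\Delta(G) = \frac{1}{3} \sum_{i=1}^{n} \Delta(i) = \frac{1}{6} \sum_{j=1}^{n} \lambda_j^3 .
```

The last equality uses $\sum_i u_{j,i}^2 = 1$ and the fact that every triangle is counted once at each of its three vertices.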

  10. EigenTriangle Algorithm (interactively) • “I want to compute the number of triangles!” • “Use Lanczos to compute the first two eigenvalues, please!” • “Is the cube of the second one significantly smaller than the cube of the first?” • “No.” • “Iterate then! Compute the k-th eigenvalue. Is its cube much smaller than the sum of the cubes of the eigenvalues computed so far?” • After some iterations (hopefully few!): “Yes!” • “The algorithm terminates! The estimated # of Δs is the sum of the cubes of the λi’s divided by 6.”

  11. EigenTriangle Algorithm
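The algorithm listing on this slide did not survive in the transcript. A minimal sketch of the idea described on slide 10 might look as follows. Several things here are assumptions: the tolerance `tol`, the function name, and the exact form of the stopping test. Note also that `np.linalg.eigvalsh` (a dense solver) stands in for Lanczos so that the example stays small and deterministic; on a large sparse graph one would compute only the top eigenvalues with a Lanczos-type solver.

```python
import numpy as np

def eigen_triangle(adj, tol=0.05):
    """Estimate the triangle count from the largest-magnitude eigenvalues.

    Eigenvalues are added one at a time (largest magnitude first) until the
    newest cube is at most `tol` times the running sum of cubes.
    """
    # Stand-in for Lanczos: all eigenvalues at once, then sorted by magnitude.
    lam = np.linalg.eigvalsh(adj)
    lam = lam[np.argsort(-np.abs(lam))]

    total = lam[0] ** 3
    rank = 1
    for value in lam[1:]:
        cube = value ** 3
        total += cube
        rank += 1
        if abs(cube) <= tol * abs(total):
            break  # the newest eigenvalue barely changes the estimate
    return total / 6.0, rank

# Hypothetical toy graph: the complete graph K10, which has C(10,3) = 120 triangles.
K10 = np.ones((10, 10)) - np.eye(10)
estimate, rank_used = eigen_triangle(K10)
```

On this toy graph the spectrum is 9 (once) and -1 (nine times), so two eigenvalues already give an estimate within about 1% of the exact count.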

  12. EigenTriangleLocal Algorithm • Why are these two algorithms efficient on power-law networks?
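The listing for the local variant is also missing from the transcript. The sketch below illustrates a low-rank version of the per-vertex formula $\Delta(i) = \frac{1}{2}\sum_j \lambda_j^3 u_{j,i}^2$, keeping only the `rank` largest-magnitude eigenpairs. The function name and the `rank` parameter are assumptions, and `np.linalg.eigh` again stands in for a Lanczos solver.

```python
import numpy as np

def eigen_triangle_local(adj, rank):
    """Per-vertex triangle estimates from the `rank` largest-magnitude eigenpairs."""
    lam, vecs = np.linalg.eigh(adj)           # columns of vecs are eigenvectors
    top = np.argsort(-np.abs(lam))[:rank]     # indices of the kept eigenpairs
    # Row i of vecs[:, top] ** 2 holds the squared i-th entries of the kept eigenvectors.
    return (vecs[:, top] ** 2) @ (lam[top] ** 3) / 2.0

# Hypothetical toy graph: K10, where every vertex lies in C(9,2) = 36 triangles.
K10 = np.ones((10, 10)) - np.eye(10)
low_rank = eigen_triangle_local(K10, rank=1)
full_rank = eigen_triangle_local(K10, rank=10)
```

With all eigenpairs the formula is exact; with a single eigenpair the estimate on this toy graph is 36.45 per vertex, close to the true value of 36.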

  13. Typical Spectra of Power-Law Networks [Figures: eigenvalue spectra of the Political blogs and Airports networks]

  14. 1st Reason: Top Eigenvalues of Power-Law Graphs • Very important for us because: • Few eigenvalues contribute a lot! • Cubes amplify this even more! • Lanczos converges fast due to large spectral gaps [GL89]!

  15. 1st Reason: Top Eigenvalues of Power-Law Graphs • Faloutsos, Faloutsos and Faloutsos [FFF99] were among the first to observe that the top eigenvalues follow a power law. • Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] gave an explanation of this fact.

  16. 2nd Reason: Bulk of eigenvalues • Almost symmetric around 0! • Sum of cubes almost cancels out, so omit the bulk! [Figure: spectrum of the Political Blogs network; keep only the top 3 eigenvalues]

  18. Datasets

  19. Datasets • Social Networks

  20. Datasets • Social Networks • Co-authorship network

  21. Datasets • Social Networks • Co-authorship network • Information Networks

  22. Datasets • Social Networks • Co-authorship network • Information Networks • Web Graphs

  23. Datasets • Social Networks • Co-authorship network • Information Networks • Web Graphs • Internet Graphs

  24. Datasets • ~3.15M nodes • ~37M edges

  25. Competitor: Node Iterator • Node Iterator algorithm: for each node, look at its neighbors, then count the edges among them. • Complexity: $O(n\,d_{\max}^2)$, where n is the number of nodes and $d_{\max}$ the maximum degree. • We report the results as the speedup vs. Node Iterator.
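The Node Iterator baseline described above can be sketched in a few lines. This is an illustration under assumptions: the adjacency-set representation, the function name, and the toy graph are not from the talk.

```python
from itertools import combinations

def node_iterator(neighbors):
    """Exact triangle counts for an undirected, simple graph.

    `neighbors` maps each vertex to the set of its adjacent vertices.
    Returns (total number of triangles, per-vertex triangle counts).
    """
    per_node = {}
    for v, nbrs in neighbors.items():
        # Count the pairs of neighbors of v that are themselves adjacent.
        per_node[v] = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    # Every triangle is seen once from each of its three vertices.
    total = sum(per_node.values()) // 3
    return total, per_node

# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
total, per_node = node_iterator(graph)
```

The work per vertex is quadratic in its degree, which is why high-degree hubs dominate the running time on skewed degree distributions.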

  26. Results: #Eigenvalues vs. Speedup [Figure]

  27. Results: #Edges vs. Speedup [Figure; annotation: "Observe the trend"]

  28. Some interesting observations • A rank of 6.2 is typical for at least 95% accuracy. • Speedups are between 33.7x and 1159x. • The mean speedup is 250x. • Notice the increasing speedup as the size of the network grows.

  29. Evaluating the Local Counting Method [Figure: scatter plot; axes: triangles node i participates in vs. triangles node i participates in according to our estimation]

  30. #Eigenvalues vs. ρ for three networks • With 2-3 eigenvalues the results are almost ideal! [Figure]

  32. Triangle Participation Power Law (TPPL) [Figure: EPINIONS; axes: δ = #triangles vs. count of nodes participating in δ triangles]

  33. Triangle Participation Power Law (TPPL) [Figures: HEP_TH (coauthorship); Flickr]

  34. Degree Triangle Power Law (DTPL) [Figure: EPINIONS; axes: d, over all degrees appearing in the graph, vs. mean #Δs over all nodes with degree d]

  35. Degree Triangle Power Law (DTPL) [Figures: Flickr; Reuters]

  36. Observations on TPPL & DTPL • TPPL: Many nodes participate in few triangles; few nodes participate in many triangles.

  37. Observations on TPPL & DTPL • DTPL: • A power law fits the Degree-Triangle plot nicely. • The slope is the opposite of the slope of the degree distribution (slope complementarity).

  39. Kronecker graphs • Kronecker graphs are a model for generating graphs that mimic properties of real-world networks. The basic operation is the Kronecker product [LCKF05]. [Figure: starting from the adjacency matrix A[0] of the initiator graph, the Kronecker product is applied repeatedly (k times), giving the adjacency matrices A[1], A[2], ..., A[k]]

  40. Triangles in Kronecker Graphs • Theorem [KroneckerTRC]: Let B = A[k] be the k-th Kronecker product of A, and let Δ(G_A), Δ(G_B) be the total numbers of triangles in G_A and G_B. Then the following equality holds: $\Delta(G_B) = 6^k \, \Delta(G_A)^{k+1}$
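The closed formula can be sanity-checked numerically. The sketch assumes that A[k] means the initiator multiplied with itself k times via the Kronecker product (so k + 1 factors in total), in which case the eigenvalue identity of Theorem 1 gives $\Delta(G_B) = 6^k \Delta(G_A)^{k+1}$. The helper names and the K4 initiator are illustrative assumptions.

```python
import numpy as np

def triangles(adj):
    """Exact triangle count of a simple graph via trace(A^3) / 6."""
    return int(np.trace(adj @ adj @ adj)) // 6

def kronecker_power(adj, k):
    """Apply the Kronecker product with the initiator k times: A[0] = adj, ..., A[k]."""
    result = adj
    for _ in range(k):
        result = np.kron(result, adj)
    return result

# Hypothetical initiator: the complete graph K4 (4 triangles, zero diagonal).
A = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
k = 2
B = kronecker_power(A, k)                      # 64 x 64 adjacency matrix
counted = triangles(B)
predicted = 6 ** k * triangles(A) ** (k + 1)
```

The check works because the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues, so the sum of cubed eigenvalues of B is the (k+1)-th power of that of A.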

  42. Conclusions • Triangles can be approximated with high accuracy in power-law networks using a small, constant number of eigenvalues. • The method is easily parallelizable (matrix-vector multiplications only) and converges fast due to large spectral gaps. • New triangle-related power laws. • Closed formula for triangles in Kronecker graphs.

  43. Future Work • Implementation on Hadoop • Ongoing work with U Kang and Christos Faloutsos, in collaboration with Yahoo! Research: PEGASUS (Peta-Graph Mining)

  44. Acknowledgements • Christos Faloutsos • Ioannis Koutis For the helpful discussions

  45. Acknowledgements • Maria Tsiarli For the PEGASUS logo

  47. References • [WF94] Wasserman, Faust: “Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)” • [EM02] Eckmann, Moses: “Curvature of co-links uncovers hidden thematic layers in the World Wide Web” • [YPSB05] Ye, Peyser, Spencer, Bader: “Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast”

  48. References • [BPCG08] Becchetti, Boldi, Castillo, Gionis: “Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs” • [LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: “Realistic, Mathematically Tractable Graph Generation and Evolution using Kronecker Multiplication” • [FFF99] Faloutsos, Faloutsos, Faloutsos: “On power-law relationships of the Internet topology”

  49. References • [MP02] Mihail, Papadimitriou: “On the Eigenvalue Power Law” • [CLV03] Chung, Lu, Vu: “Spectra of Random Graphs with given expected degrees” • [GL89] Golub, Van Loan: “Matrix Computations”

  50. References • For more references, paper and slides: http://www.cs.cmu.edu/~ctsourak
