
Large Graph Algorithms

Large Graph Algorithms. Christos Faloutsos (CMU), with Leman Akoglu, Polo Chau, U Kang, Mary McGlohon, Aditya Prakash, Hanghang Tong, and Babis Tsourakakis.


Presentation Transcript


  1. Large Graph Algorithms. Christos Faloutsos (CMU), with Leman Akoglu, Polo Chau, U Kang, Mary McGlohon, Aditya Prakash, Hanghang Tong, and Babis Tsourakakis. C. Faloutsos (CMU)

  2. Graphs: why should we care? [Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]

  3. Graphs: why should we care? [Figure: bipartite graph linking documents D1 ... DN to terms T1 ... TM] • IR: bipartite graphs (doc-terms) • Web: hypertext graph • Social networking sites (Facebook, Twitter) • Users posing and answering questions • Click-streams (user-page bipartite graph) • ... and more: any M:N database relationship

  4. Our goal: a one-stop solution for mining huge graphs, the PEGASUS project (PEta GrAph mining System) • www.cs.cmu.edu/~pegasus • Open-source code and papers

  5. Outline – Algorithms & results

  6. HADI for diameter estimation • Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec. SDM’10 • Naively, computing the diameter needs O(N^2) space and up to O(N^3) time, which is prohibitive (N ~ 1B) • Our HADI: linear in the number of edges E (~10B) • Near-linear scalability with respect to the number of machines • Several optimizations -> 5x faster
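To make the cost of the naive approach concrete, here is a minimal single-machine sketch that runs one BFS per node and derives the diameter and an effective diameter from the resulting hop counts. It is not HADI and does not reproduce its method; the function names, the adjacency-dictionary input, and the default 90% threshold for the effective diameter are all assumptions made for illustration.

```python
from collections import deque


def hop_counts(adj):
    """Return counts[h] = number of ordered pairs (u, v), u != v, at distance h.

    Runs one BFS per node, so the work grows roughly as N * (N + E):
    fine for toy graphs, hopeless at billions of nodes.
    """
    counts = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for d in dist.values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    return counts


def effective_diameter(adj, fraction=0.9):
    """Smallest h such that at least `fraction` of connected pairs are within h hops.

    The 0.9 default is an assumed convention, not taken from the slides.
    """
    counts = hop_counts(adj)
    total = sum(counts.values())
    reached = 0
    for h in sorted(counts):
        reached += counts[h]
        if reached >= fraction * total:
            return h
    return 0  # graph with no connected pairs


# Hypothetical toy input: a path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(max(hop_counts(path)))            # full diameter
print(effective_diameter(path, 0.5))    # hops needed to cover half of all pairs
```

Even this toy version stores a distance table per source node, which is the kind of quadratic blow-up the slide calls prohibitive at N ~ 1B.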

  7. [Figure: radius plot, Count vs. Radius, with the annotation "19+? [Barabasi+]"] • YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • Largest publicly available graph ever studied.

  8. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • Effective diameter: surprisingly small. • Multi-modality: probably a mixture of cores.

  9. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges) • Effective diameter: surprisingly small. • Multi-modality: probably a mixture of cores.

  10. Radius Plot of GCC of YahooWeb.

  11. Running time: Kronecker and Erdős-Rényi graphs with billions of edges.

  12. Outline – Algorithms & results

  13. Generalized Iterated Matrix Vector Multiplication (GIMV). Reference: PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).

  14. Generalized Iterated Matrix Vector Multiplication (GIMV). All of the following reduce to iterated matrix-vector multiplication: • PageRank • Proximity (RWR) • Diameter • Connected components • (Eigenvectors, Belief Propagation, ...)
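As a rough illustration of how one such task fits the iterated matrix-vector pattern, the sketch below expresses connected components as repeated generalized multiplication: each node starts with its own id as a label and repeatedly takes the minimum label among itself and its neighbors. The operation names `combine2`, `combine_all`, and `assign` loosely follow the three customizable operations described in the PEGASUS paper; the signatures, the dictionary-based matrix, and the single-machine loop are assumptions for illustration, not the project's Hadoop implementation.

```python
def gimv(matrix, v, combine2, combine_all, assign):
    """One generalized matrix-vector step.

    matrix: dict mapping (i, j) -> m_ij for the nonzero entries.
    v:      dict mapping node -> current value.
    Computes v'_i = assign(v_i, combine_all([combine2(m_ij, v_j) for all j])).
    """
    partial = {i: [] for i in v}
    for (i, j), m_ij in matrix.items():
        partial[i].append(combine2(m_ij, v[j]))
    return {
        # Nodes with no nonzero entries in their row keep their old value.
        i: assign(v[i], combine_all(partial[i])) if partial[i] else v[i]
        for i in v
    }


def connected_components(nodes, undirected_edges):
    """Label every node with the smallest node id in its component."""
    matrix = {}
    for a, b in undirected_edges:
        matrix[(a, b)] = 1
        matrix[(b, a)] = 1
    v = {n: n for n in nodes}
    while True:
        new_v = gimv(
            matrix,
            v,
            combine2=lambda m_ij, v_j: v_j,  # pass the neighbor's label along
            combine_all=min,                 # smallest label among neighbors
            assign=min,                      # keep the smaller of old and new
        )
        if new_v == v:  # converged: no label changed in this iteration
            return v
        v = new_v


# Hypothetical toy graph: components {0, 1, 2}, {3, 4}, and the isolated node 5.
labels = connected_components(range(6), [(0, 1), (1, 2), (3, 4)])
print(labels)
```

Swapping the three operations (for example, multiplication for `combine2` and a damped sum for `combine_all` and `assign`) would turn the same loop into a PageRank-style iteration, which is the point of the generalization.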

  15. Example: GIM-V At Work • Connected Components [Figure: Count vs. Size of connected components]

  16. Example: GIM-V At Work • Connected Components [Figure: Count vs. Size of connected components. Annotations: 300-size component x 500. Why? 1100-size component x 65. Why?]

  17. Example: GIM-V At Work • Connected Components [Figure: Count vs. Size of connected components. Annotation: suspicious financial-advice sites (no longer in existence)]

  18. Outline – Algorithms & results

  19. Triangles: real social networks have a lot of triangles.

  20. Triangles: real social networks have a lot of triangles. Friends of friends are friends. Q1: How to compute quickly? Q2: Any patterns?

  21. Triangles: Computations [Tsourakakis ICDM 2008]. Q: Can we do that quickly? Triangles are expensive to compute (3-way join; several approx. algos).

  22. Triangles: Computations [Tsourakakis ICDM 2008]. But: triangles are expensive to compute (3-way join; several approx. algos). Q: Can we do that quickly? A: Yes! $\#\text{triangles} = \frac{1}{6} \sum_i \lambda_i^3$, where the $\lambda_i$ are the eigenvalues of the adjacency matrix (and, because of skewness, we only need the top few eigenvalues!).
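The eigenvalue formula works because the sum of the cubed eigenvalues of the adjacency matrix A equals trace(A^3), and trace(A^3) counts closed walks of length 3, which visit each triangle 6 times (3 starting nodes, 2 directions). The small pure-Python sketch below checks this on a toy graph and shows what keeping only the top eigenvalue gives. All function names are hypothetical, and the power iteration is a stand-in chosen to keep the example dependency-free; it is not the Lanczos-based approach mentioned in the talk.

```python
from itertools import combinations


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def triangles_by_trace(A):
    """Exact count: trace(A^3) / 6, which equals (1/6) * sum of lambda_i^3."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 6


def triangles_brute_force(A):
    """Exact count by checking every node triple (the expensive 3-way join)."""
    n = len(A)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if A[i][j] and A[j][k] and A[i][k]
    )


def top_eigenvalue(A, iters=100):
    """Largest eigenvalue via power iteration with a Rayleigh quotient."""
    x = [float(i + 1) for i in range(len(A))]
    lam = 0.0
    for _ in range(iters):
        y = matvec(A, x)
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
        lam = sum(a * b for a, b in zip(x, matvec(A, x)))
    return lam


# Hypothetical toy graph: the complete graph on 4 nodes (eigenvalues 3, -1, -1, -1).
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
exact = triangles_by_trace(K4)
brute = triangles_brute_force(K4)
estimate = top_eigenvalue(K4) ** 3 / 6  # keeps only the largest eigenvalue
print(exact, brute, estimate)
```

On this tiny graph the single-eigenvalue estimate overshoots (27/6 against an exact count of 4) because the three eigenvalues of -1 are not negligible; the slide's claim is that on real, skewed networks the top few eigenvalues dominate the sum.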

  23. Triangles: Computations [Tsourakakis ICDM 2008]. 1000x+ speed-up, high accuracy.

  24. Triangles • Easy to implement on Hadoop: it only needs eigenvalues (working on it, using Lanczos).

  25. Triangles: real social networks have a lot of triangles. Friends of friends are friends. Q1: How to compute quickly? Q2: Any patterns?

  26. Triangle Law: #1 [Tsourakakis ICDM 2008]. [Figure panels, labels as extracted: "HEP-TH ASN", "Epinions". X-axis: number of triangles a node participates in. Y-axis: count of such nodes.]

  27. Triangle Law: #2 [Tsourakakis ICDM 2008]. [Figure panels, labels as extracted: "Reuters SN", "Epinions". X-axis: degree. Y-axis: mean number of triangles.] Notice: slope ~ degree exponent (insets).

  28. Outline – Algorithms & results

  29. Visualization: ShiftR • Supporting Ad Hoc Sensemaking: Integrating Cognitive, HCI, and Data Mining Approaches. Aniket Kittur, Duen Horng (‘Polo’) Chau, Christos Faloutsos, Jason I. Hong. Sensemaking Workshop at CHI 2009, April 4-5, Boston, MA, USA.

  30. Conclusions: one-stop shopping for large graph mining • www.cs.cmu.edu/~pegasus. With: Leman Akoglu, Babis Tsourakakis, U Kang, Polo Chau, Mary McGlohon. Thanks: NSF, Yahoo (M45), LLNL.
