
Large Graph Mining



Presentation Transcript


  1. Large Graph Mining Christos Faloutsos CMU

  2. Thank you! • Dr. Yan Liu • Dr. Jimeng Sun C. Faloutsos

  3. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Generators • Problem#3: Scalability • Conclusions

  4. Graphs - why should we care? (Figure panels: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01])

  5. Graphs - why should we care? • IR: bipartite graphs (doc-terms) • web: hypertext graph • ... and more (Figure: bipartite graph linking documents D1 ... DN to terms T1 ... TM)

  6. Graphs - why should we care? • network of companies & board-of-directors members • ‘viral’ marketing • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • ...

  7. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Generators • Problem#3: Scalability • Conclusions

  8. Problem #1 - network and graph mining • What does the Internet look like? • What does the web look like? • What is ‘normal’/‘abnormal’? • Which patterns/laws hold?

  9. Graph mining • Are real graphs random?

  10. Laws and patterns • Are real graphs random? • A: NO!! • Diameter • in- and out- degree distributions • other (surprising) patterns • So, let’s look at the data • (‘it is amazing what you hear, when you listen’)

  11. Solution# S.1 • Power law in the degree distribution [SIGCOMM99] (Figure: internet domains, log(degree) vs. log(rank); plot annotation: -0.82; labeled points: att.com, ibm.com)
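
A power law appears as a straight line on log-log axes, so its exponent can be estimated as the slope of a least-squares line through the (log rank, log degree) points. The following is a minimal sketch under stated assumptions: the data are synthetic (generated with an exponent of -0.82, borrowing the annotation on the slide), the function name `fit_loglog_slope` is hypothetical, and plain least squares on logs is only a rough estimator, not necessarily the method used in the cited work.

```python
import math


def fit_loglog_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line of log(y) on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical synthetic data: degree = 1000 * rank^(-0.82), not real measurements.
ranks = list(range(1, 201))
degrees = [1000.0 * r ** -0.82 for r in ranks]

slope, intercept = fit_loglog_slope(ranks, degrees)
print(f"estimated rank exponent: {slope:.2f}")
```

On real degree data the points scatter around the line, so the fitted slope should be read together with the quality of the fit.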

  12. Solution# S.2: Eigen Exponent E • A2: power law in the eigenvalues of the adjacency matrix (Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48)

  13. Solution# S.2: Eigen Exponent E • [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent (Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48)

  14. But: How about graphs from other domains?

  15. More power laws: • web hit counts [w/ A. Montgomery] (Figure: Web Site Traffic, users and sites; log(count) vs. log(in-degree); Zipf; labeled point: ‘ebay’)

  16. epinions.com • who-trusts-whom [Richardson + Domingos, KDD 2001] (Figure: count vs. (out) degree; annotation: trusts-2000-people user)

  17. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen*, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Generators

  18. Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles

  19. Triangle ‘Laws’ Real social networks have a lot of triangles. Friends of friends are friends. Any patterns?

  20. Triangle Law: #S.3 [Tsourakakis ICDM 2008] (Figure panels: HEP-TH, ASN, Epinions; x-axis: # of triangles a node participates in; y-axis: count of such nodes)

  21. Triangle Law: #S.4 [Tsourakakis ICDM 2008] (Figure panels: Reuters, SN, Epinions; x-axis: degree; y-axis: mean # triangles) Notice: slope ~ degree exponent (insets)

  22. Triangle Law: Computations [Tsourakakis ICDM 2008] (details) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly?

  23. Triangle Law: Computations [Tsourakakis ICDM 2008] (details) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = $\frac{1}{6}\sum_i \lambda_i^3$, where $\lambda_i$ are the eigenvalues of the adjacency matrix (and, because of skewness, we only need the top few eigenvalues!)
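
The eigenvalue formula can be sanity-checked without an eigensolver, because the sum of the cubed eigenvalues of the adjacency matrix A equals trace(A³), and each triangle contributes 6 closed walks of length 3 to that trace. The sketch below is an illustration only, on a small hypothetical 5-node graph: it compares trace(A³)/6 against a brute-force count. It does not implement the top-few-eigenvalues approximation from the cited paper, and all function names are made up for this example.

```python
from itertools import combinations


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matmul(x, y):
    """Plain dense matrix product, adequate for tiny examples."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangles_by_trace(a):
    """trace(A^3) / 6, which equals (1/6) * sum of cubed eigenvalues of A."""
    a3 = matmul(matmul(a, a), a)
    return sum(a3[i][i] for i in range(len(a))) // 6


def triangles_brute_force(a):
    """Check every node triple directly (the expensive 3-way join)."""
    n = len(a)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if a[i][j] and a[j][k] and a[i][k])


# Hypothetical example graph: a 4-clique on {0,1,2,3} plus node 4 tied to 2 and 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
A = adjacency(5, edges)
print(triangles_by_trace(A), triangles_brute_force(A))
```

For large graphs the dense cube is itself impractical; the speed-up on the slide comes from keeping only the largest-magnitude eigenvalues, which dominate the sum when the spectrum is skewed.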

  24. Triangle Law: Computations [Tsourakakis ICDM 2008] (details) 1000x+ speed-up, high accuracy

  25. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen*, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Generators

  26. Large Human Communication Networks: Patterns and a Utility-Driven Generator. Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu. KDD 2009

  27. Cliques (Figure: example graph on nodes 0, 1, 2, 3, 4) • A clique is a complete subgraph. • If a clique is not contained in any larger clique, it is called a maximal clique.

  28. Clique (Figure: example graph on nodes 0, 1, 2, 3, 4) • A clique is a complete subgraph. • If a clique is not contained in any larger clique, it is called a maximal clique.

  29. Clique (Figure: example graph on nodes 0, 1, 2, 3, 4) • A clique is a complete subgraph. • If a clique is not contained in any larger clique, it is called a maximal clique.

  30. Clique (Figure: example graph on nodes 0, 1, 2, 3, 4) • A clique is a complete subgraph. • If a clique is not contained in any larger clique, it is called a maximal clique. • {0,1,2}, {0,1,3}, {1,2,3}, {2,3,4}, {0,1,2,3} are cliques; • {0,1,2,3} and {2,3,4} are the maximal cliques.
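
The definition above can be tried out on the slide's example. The sketch below enumerates maximal cliques with the basic Bron-Kerbosch recursion (no pivoting), which is a standard technique chosen here for illustration and not necessarily what the cited KDD 2009 paper uses. The edge list is an assumption, reconstructed from the two maximal cliques named on the slide, and the function name `maximal_cliques` is hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph given as {node: set(neighbors)}."""
    found = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend r; x: already-explored nodes.
        if not p and not x:
            found.append(sorted(r))  # nothing can extend r, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)


# Assumed edges: a 4-clique on {0,1,2,3} plus node 4 adjacent to 2 and 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
adj = {v: set() for v in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

cliques = maximal_cliques(adj)
print(cliques)

# Number of maximal cliques each node participates in.
cliques_per_node = {v: sum(v in c for c in cliques) for v in adj}
print(cliques_per_node)
```

The per-node count at the end pairs each node's degree with the number of maximal cliques it belongs to, which is the pair of quantities a clique-degree plot compares.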

  31. S.5: Clique-Degree Power-Law • Power law between the degree of node i and the # maximal cliques of node i (Figure) • Dataset: who-calls-whom, anonymized, over several time units • More friends, even more social circles!

  32. S.5: Clique-Degree Power-Law • Outlier Detection

  33. S.5: Clique-Degree Power-Law • Outlier Detection

  34. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen*, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Generators

  35. Observation W.1 Question: Nodes in a triangle are topologically equivalent. Will they also make an equal number of phone calls to each other? (Figure: triangle with edges labeled Max Weight, Mid Weight, Min Weight)

  36. Observation W.1: Triangle Weight Law (Figure: Periods S1 – S3)

  37. Other observations on weighted graphs? • A: yes - even more ‘laws’! M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

  38. Observation W.2: fortification Q: How do the weights of nodes relate to degree?

  39. Observation W.2: fortification: Snapshot Power Law. The weight of a node is super-linear in the in-degree, with PL exponent ‘iw’: 1.01 < iw < 1.26. More donors, even more $: e.g. John Kerry, $10M received, from 1K donors (Figure: Orgs-Candidates; in-weights ($) vs. edges (# donors))

  40. Are there deviations from P.L.? • A: yes – but they also correspond to very skewed distributions (‘black/gray swans’)

  41. “Mobile Call Graphs: Beyond Power Law and Lognormal Distributions” KDD ’08. Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec

  42. Dataset • Who-calls-whom (anonymized) • Degree distribution, in green (Figure: count vs. degree of mobile-phone call graph; Area S1, time period T1, 1 month; legends: ‘+ Observed Data, .... LogNormal Fit’ and ‘.... Observed Data, --- Power Law Fit’)

  43. A poor fit? • Power Laws: p(x) ~ x^(-a) • Lognormal: log(x) is Normal (Figure: count vs. degree of mobile-phone call graph; Area S1, time period T1, 1 month; legends: ‘+ Observed Data, .... LogNormal Fit’ and ‘.... Observed Data, --- Power Law Fit’)
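
The two candidate fits differ in shape on log-log axes, which is what the plots let the eye compare. Writing both out (the symbols μ and σ for the mean and standard deviation of ln x are the standard lognormal parameters and do not appear on the slide; c and c' are normalizing constants):

```latex
\begin{align*}
\text{Power law:}\quad & p(x) \propto x^{-a}
  &&\Rightarrow\quad \ln p(x) = -a \ln x + c \\
\text{Lognormal:}\quad & p(x) = \frac{1}{x \sigma \sqrt{2\pi}}
  \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)
  &&\Rightarrow\quad \ln p(x) = -\ln x - \frac{(\ln x - \mu)^2}{2\sigma^2} + c'
\end{align*}
```

A power law is therefore a straight line of slope -a in ln x, while a lognormal is a downward parabola in ln x.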

  44. Solution: DPLN • Double Pareto Log Normal (Reed, 2003) • 4 parameters: [α,β,ν,τ] • Linear head and tail. • “Rich get Richer” BUT population lifetimes NOT identical (Figure: count vs. degree; Area S1, Time Period T1; annotations: β, α)

  45. Datasets • Anonymized monthly aggregates of call metrics per user • Collected at 4 coverage areas S1,S2,S3,S4 • Collected for two month-long periods T1 and T2, separated by 6 months • Total coverage: • >1 million users • >10 million calls • ~7000 square miles.

  46. Metrics • Degree (No. of Call Partners) • Calls • Talk Time. Total over a month, per user (Figure: degree distribution; Area S1, time period T1)

  47. Persistence – Across Metric, Time, Space (Figure panels: Calls, Area S1, Time T1; Partners, S1, T2; Partners, S2, T2)

  48. Applications • Outlier detection • Many un-answered calls => multiples of 27 sec! (Figure: per-user distribution of total talk time @ T1, S1)

  49. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Generators • …

  50. Problem: Time evolution • with Jure Leskovec (CMU -> Stanford) • and Jon Kleinberg (Cornell – sabb. @ CMU)
