1 / 39

Fast Discovery of Connection Subgraphs

Fast Discovery of Connection Subgraphs. Christos Faloutsos (CMU) Kevin McCurley (IBM) Andrew Tomkins (IBM). Outline. Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions. Introduction. What are the best path s between ‘Kidman’ and ‘Diaz’?. Diaz.

Télécharger la présentation

Fast Discovery of Connection Subgraphs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fast Discovery of Connection Subgraphs Christos Faloutsos (CMU) Kevin McCurley (IBM) Andrew Tomkins (IBM) Faloutsos, McCurley & Tomkins

  2. Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions Faloutsos, McCurley & Tomkins

  3. Introduction What are the best paths between ‘Kidman’ and ‘Diaz’? Diaz Kidman Faloutsos, McCurley & Tomkins

  4. s t f Problemdefinition • Given a graph, and two nodes s and t, and a 'budget' b of nodes • Find the best b nodes that capture the relationship between s and t Faloutsos, McCurley & Tomkins

  5. s t f Problemdefinition • Given a graph, and two nodes s and t, and a 'budget' b of nodes • Find the best b nodes that capture the relationship between s and t Faloutsos, McCurley & Tomkins

  6. s t f Problem definition • Part 1: How to quantify the goodness? • Part 2: How to pick ‘best few’ nodes? • Part 3: Scalability: large graphs (10**7 nodes) Faloutsos, McCurley & Tomkins

  7. Survey Graph Partitioning [Karypis+Kumar]; [Newman+]; [Virtanen]; … Communities [Flake+]; [Tomkins, Kleinberg+] External distances [Palmer+] Faloutsos, McCurley & Tomkins

  8. Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions Faloutsos, McCurley & Tomkins

  9. part 1: measuring goodness: electricity part 2: finding good paths dynamic programming part 3: scalability heuristics Proposed method Faloutsos, McCurley & Tomkins

  10. Electricity Why not shortest path? s t f Faloutsos, McCurley & Tomkins

  11. Electricity Why not shortest path? Why not net. flow? s t f Faloutsos, McCurley & Tomkins

  12. Electricity Why not shortest path? Why not net. flow? Why not plain ‘voltages’? s t f +1V 0V Faloutsos, McCurley & Tomkins

  13. Electricity Why not shortest path? Why not net. flow? Why not plain ‘voltages’? s t f +1V 0V +0.5V Faloutsos, McCurley & Tomkins

  14. Electricity, cont’d Proposed method: voltages with universal sink: ~ ‘tax collector’ goodness of a path: its electric current(*)! s t f ... +1V 0V 0V Faloutsos, McCurley & Tomkins

  15. Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions Faloutsos, McCurley & Tomkins

  16. Electricity – Algorithm Voltages/Amperages can be computed easily ( O(E) ) without universal sink: v(i) = Σumj [v(j) * C(i,j) / C(i,*) ] i != source, sink v(source)=1; v(sink)=0 Faloutsos, McCurley & Tomkins

  17. Electricity – Algorithm With universal sink: v(i) = 1/(1+a)Σumj [v(j) * C(i,j) / C(i,*) ] (~ insensitive to a (=1)) Faloutsos, McCurley & Tomkins

  18. Given the voltages and amperages Which b nodes to keep? (and how to spot them quickly?) Part 2: DisplayGen Faloutsos, McCurley & Tomkins

  19. Part2:DisplayGen Faloutsos, McCurley & Tomkins

  20. Part 2: DisplayGen ‘delivered current’ of a path: ~ ‘how many electrons’ choose this path =4/5 *1/2A Faloutsos, McCurley & Tomkins

  21. Part 2: DisplayGen • find subgraph that max’s delivered current • Incrementally, add nodes with max marginal delivered current Faloutsos, McCurley & Tomkins

  22. Part 3: Scalability ‘CandidateGen’ Starting from the large graph Eliminate nodes that are too far away to matter How? Faloutsos, McCurley & Tomkins

  23. Part 3: Scalability By successive, careful expansions sink t source s Faloutsos, McCurley & Tomkins

  24. Part 3: Scalability t s Faloutsos, McCurley & Tomkins

  25. Part 3: Scalability t s Faloutsos, McCurley & Tomkins

  26. Part 3: Scalability t s Faloutsos, McCurley & Tomkins

  27. Pseudo-code Until (stoppingCriterion) use pickHeuristic() to pick a node n expand node n Faloutsos, McCurley & Tomkins

  28. Pseudo-code pickHeuristic() favors • Nearby nodes with • Strong connections to source or sink and with • Small degree Faloutsos, McCurley & Tomkins

  29. Outline Introduction / Motivation Survey Proposed Method Algorithms Experiments Conclusions Faloutsos, McCurley & Tomkins

  30. Experiments on large real graph ~15M nodes, ~100M edges, weighted ‘who co-appears with whom’ (from 500M web pages) Q1: Quality of ‘voltage’ approach? Q2: Speed/accuracy trade-off? Faloutsos, McCurley & Tomkins

  31. Q1: Quality Actors (A); Computer-Scientists (CS) Kidman-Diaz (A-A) Negreponte-Palmisano (CS-CS) Turing-Stone (CS-A) Faloutsos, McCurley & Tomkins

  32. (A-A) Kidman-Diaz What are the best paths between ‘Kidman’ and ‘Diaz’? Diaz Kidman Strong, direct link Faloutsos, McCurley & Tomkins

  33. CS-CS: Negreponte - Palmisano SP NN • Mainly: CEOs of major Computer companies • (Dell, Gates, Fiorina, ++) Faloutsos, McCurley & Tomkins

  34. CS-CS: Negreponte - Palmisano SP NN Esther Dyson LouisGerstner Faloutsos, McCurley & Tomkins

  35. CS-A: Turing - Stone Anderson Turing Stone Faloutsos, McCurley & Tomkins

  36. Outline Introduction / Motivation ... Experiments Q1: quality Q2: speed/accuracy trade-off Conclusions Faloutsos, McCurley & Tomkins

  37. Speed/Accuracy Trade-off delivered current Kleinberg-Newell Rivest-Hoffman Turing-Stone Kidman-Diaz number of nodes kept (‘b’) Faloutsos, McCurley & Tomkins

  38. Speed/accuracy trade-off • 80/20-like rule: • the first few nodes/paths contribute the vast majority of ‘delivered current’ • Thus: CandidateGen makes sense Faloutsos, McCurley & Tomkins

  39. Conclusions Defined the problem Part 1: Electricity-based method to measure quality Part 2: Dynamic programming to spot best paths (‘DisplayGen’) Part 3: Scalability with good accuracy (‘CandidateGen’) Operational system Faloutsos, McCurley & Tomkins

More Related