
The Very Small World of the Well-Connected

Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert


Presentation Transcript


  1. The Very Small World of the Well-Connected • Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert

  2. Outline • VIGS: Vertex-Importance Graph Synopsis • Testing VIGS with different datasets and importance measures • Analytical expectations • Making guarantees about VIGS • Connectedness: KeepOne, KeepAll • Related Work • Graph Sampling, Rich Club, K-cores, Web Measure

  3. Network or Hairball? • Huge networks are difficult to study, store, and share • Can we shrink or summarize a network? • Starting point: important vertices • Vertex-Importance Graph Synopsis

  4. Vertex-Importance Graph Synopsis • Create subgraph of important vertices • Study both key nodes and entire graph • Which vertices are important? • High-traffic routers? The most quoted blog? • Standard, well-defined measures • Degree, Betweenness, Closeness, PageRank

  5. VIGS In Action • Starting point: random graph with 100 vertices • Select an importance measure: degree • Pick the 9 highest-degree vertices • Keep only edges between these 9 vertices • Figure labels: average degree = 4 (original graph); average degree = 0.9 (subgraph)
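The selection step on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the helper names (`er_graph`, `vigs_by_degree`, `average_degree`), the edge probability 4/99, and the seed are all assumptions chosen to mimic a 100-vertex random graph with average degree near 4.

```python
import random

def er_graph(n, p, seed=0):
    """Undirected Erdos-Renyi graph as a dict: vertex -> set of neighbours."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def vigs_by_degree(adj, k):
    """Keep the k highest-degree vertices and only the edges among them."""
    # Ties are broken by vertex id so the result is deterministic.
    top = sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]
    keep = set(top)
    return {v: adj[v] & keep for v in top}

def average_degree(adj):
    """Mean number of neighbours per vertex."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

graph = er_graph(100, 4 / 99)        # expected average degree about 4
synopsis = vigs_by_degree(graph, 9)  # the 9 highest-degree vertices
print(average_degree(graph), average_degree(synopsis))
```

Swapping the sort key for another score (betweenness, closeness, PageRank) would give the corresponding synopsis for that importance measure.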

  6. Motivating example: citations among ACM papers • Figure panels: 500 random papers; 500 most cited papers

  7. Datasets • Erdos-Renyi random graph and three real networks • BuddyZoo - collection of buddy lists • TREC - links between blogs • Web - an older web crawl from PARC

  8. Importance measures • degree (number of connections) denoted by size • betweenness (number of shortest paths a vertex lies on) denoted by color

  9. Importance measures • degree (number of connections) denoted by size • closeness (length of shortest path to all others) denoted by color
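As a rough illustration of the closeness measure named above, the sketch below computes it with breadth-first search on an unweighted graph. The normalisation used here (number of reachable vertices divided by the sum of distances to them) is one common convention and an assumption on our part; the slides do not say which variant was used.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, v):
    """Reachable vertices divided by total distance to them (0.0 if isolated)."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Three vertices in a line: the middle vertex is closest to everyone.
path = {0: {1}, 1: {0, 2}, 2: {1}}
scores = {v: closeness(path, v) for v in path}
print(scores)
```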

  10. Correlation among measures • High correlation between different importance measurements • Undirected graphs - higher correlation • Closeness has lowest correlation in all datasets

  11. Correlation among measures • High correlation between different importance measurements • Undirected graphs – higher correlation • Closeness has lowest correlation in all datasets

  12. Assortativity • In an assortative graph, high-value nodes tend to connect to other high-value nodes • Example: degree • Figure panels: assortative; disassortative

  13. Assortativity - Degree • ER: Neutral • BZ: Assortative • TREC and Web: Disassortative
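One standard way to put a number on degree assortativity is the Pearson correlation between the degrees at the two ends of every edge: positive values indicate an assortative graph, negative values a disassortative one, and values near zero a neutral one. The sketch below assumes an undirected graph stored as a dict of neighbour sets; whether the slides used exactly this coefficient is not stated.

```python
from math import sqrt

def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected edge is visited once per direction
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    if n == 0:
        return 0.0      # no edges: nothing to correlate
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0      # all degrees equal: undefined, reported as 0 here
    return cov / sqrt(vx * vy)

# A star is the extreme disassortative case: the hub only touches leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(degree_assortativity(star), degree_assortativity(line))
```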

  14. Assortativity

  15. Degree distributions

  16. Subgraphs • Apply VIGS! Select Degree, top 100 nodes • Example: degree • Substantial difference between datasets!

  17. Subgraphs • The selection of an importance measure may have an impact, even in the same dataset

  18. Connectivity: size of largest component Proportion of nodes that are connected either directly or indirectly
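The connectivity measure defined on this slide, the share of vertices in the largest connected component, can be computed with a simple graph traversal. The function name and the toy graph below are illustrative assumptions.

```python
def largest_component_fraction(adj):
    """Fraction of vertices that lie in the largest connected component."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

# Components of size 3, 2 and 1: the largest holds half of the vertices.
example = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(largest_component_fraction(example))
```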

  19. Subgraph Connectivity - ER • Highly connected, even with only a few vertices • All importance measures almost completely connected by 2000 nodes • Better performance than random

  20. Subgraph Connectivity

  21. Subgraphs: density • What is the proportion of edges to nodes in the original graphs vs. subgraphs? • Figure labels: average degree = 4 (original graph); average degree = 0.9 (subgraph)

  22. Subgraph Density - ER • Black line slope = Edges/Vertices in entire network • Lower dotted line = subgraph of random vertices • VIGS subgraphs: lower than total density, higher than random subgraph density
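The random-vertex baseline has a simple closed form, which helps explain why it sits below the whole-network line. Assuming the subgraph is induced by k vertices chosen uniformly at random from a graph with n vertices and E edges (the symbols k, n, E and E_S are our notation, not the slides'), an edge survives only if both of its endpoints are chosen:

```latex
\Pr[\text{edge survives}] = \frac{k(k-1)}{n(n-1)},
\qquad
\mathbb{E}[E_S] = E \cdot \frac{k(k-1)}{n(n-1)},
\qquad
\frac{\mathbb{E}[E_S]}{k} = \frac{E}{n} \cdot \frac{k-1}{n-1}.
```

For example, with n = 100, E = 200 (average degree 4) and k = 9, the expected edge count is 200 · 72 / 9900 ≈ 1.45, so the expected average degree of a random 9-vertex subgraph is about 0.32, which can be compared with the 0.9 quoted in the slide 5 example.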

  23. Subgraph Density

  24. Average Shortest Path (‘ASP’)

  25. Subgraph Average Shortest Path (‘ASP’) for Erdos-Renyi • Plot legend: whole-network ASP; ASP between important vertices (IV’s) in subgraph; ASP between IV’s in whole graph • ER: ASP shorter between IV’s, but higher in subgraph
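The two quantities compared on this slide can be illustrated on a toy graph: the average shortest path between important vertices measured in the whole graph, and the same average measured inside the subgraph that contains only those vertices. In this sketch, pairs that cannot reach each other are skipped, which is an assumption; the slides do not say how unreachable pairs were handled.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path(adj, vertices):
    """Mean distance over the pairs of `vertices` that are connected in `adj`."""
    vs = list(vertices)
    lengths = []
    for i, u in enumerate(vs):
        dist = bfs_distances(adj, u)
        lengths.extend(dist[v] for v in vs[i + 1:] if v in dist)
    return sum(lengths) / len(lengths) if lengths else float("inf")

def induced_subgraph(adj, keep):
    """Keep only the given vertices and the edges among them."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}

# A 5-cycle 0-1-2-3-4-0; vertex 4 is not important but shortcuts 0 and 3.
whole = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
important = [0, 1, 2, 3]
asp_whole = average_shortest_path(whole, important)
asp_sub = average_shortest_path(induced_subgraph(whole, important), important)
print(asp_whole, asp_sub)
```

Dropping vertex 4 forces the 0-to-3 route through the other important vertices, so the subgraph ASP is longer than the ASP between the same vertices in the whole graph.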

  26. Subgraph ASP’s

  27. Relative Rank of Vertices in Subgraph - ER • Do IV’s maintain their relative rank in subgraphs? • IV and edges only • ER - little correlation, steadily increasing until all vertices are included
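One way to test whether important vertices keep their relative rank is to correlate each vertex's score in the whole graph with its score in the subgraph. The sketch below uses degree as the score and Spearman rank correlation with average ranks for ties; both choices are assumptions, since the slides do not name the correlation statistic.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def degree_rank_correlation(adj, important):
    """Correlate whole-graph degree with subgraph degree for important vertices."""
    keep = set(important)
    whole = [len(adj[v]) for v in important]
    sub = [len(adj[v] & keep) for v in important]
    return spearman(whole, sub)

# Vertex 0 owes most of its degree to unimportant leaves 5, 6 and 7.
adj = {0: {1, 2, 5, 6, 7}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2},
       5: {0}, 6: {0}, 7: {0}}
rho = degree_rank_correlation(adj, [0, 1, 2, 3])
print(rho)
```

In the toy graph the top-ranked vertex loses most of its edges when the leaves are dropped, so its rank inside the subgraph no longer matches its rank in the whole graph.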

  28. Relative Rank in Subgraph

  29. TREC anomaly - closeness

  30. Four Regions • Four regions, highlighted in density plot (closeness only) • Figure panels: Regions highlighted; Original

  31. Cause: Blog Aggregator • One node has connections to 99% of the nodes between 1 and 7961! (regions 1, 2, 3) • This same node has only 1 connection to a node beyond 7961 (region 4) • Nodes between 5828 and 7961 (region 3) have only 1 connection: to the aggregator • Spam blogs? New blogs? Private blogs?

  32. Examining Density • The first 3 regions feature nodes connected to the aggregator • R1: well connected blogs • Average increase in total edges per node added: 12.93 • R2: far less connected, but not quite barren • Average increase per node: 3.2 • R3: isolated spam/new blogs • 1 edge per node increase

  33. Examining Density • R4: well connected, but not linked to aggregator • Average increase even higher than region 1: 17.8 • Aggregator inflated the closeness scores of connected nodes (R1, 2, 3) above those in region 4

  34. Examining Avg Shortest Paths (ASP) • R1: ASP slightly below 2 • Some nodes directly connected, 99%+ within 2 hops via aggregator • R2 and R3: ASP levels at ~2 • Fewer and fewer direct links, but all accessible via aggregator • R4: ASP’s begin to increase • ASP doesn’t explode: ~70% of R4 links are to R1 or R2 nodes • R3 only reachable from R4 via aggregator • Access to aggregator through connected R1/R2 nodes: adds a hop to path

  35. Examining Relative Ranking Correlation • R1-3: correlation steadily decreases • R4: rapid increase in correlation! • Spam blogs’ importance in subgraph initially inflated • Realigns when blogs in R4 connect with real blogs in R1-2

  36. Localized to closeness • Region 1, 2 and 3 nodes have high closeness thanks to the aggregator • Recall ASP graph - short distance to many, many nodes via aggregator • Connection to aggregator doesn’t confer high degree, PageRank or betweenness - nodes must ‘fend for themselves’ • Degree: link to aggregator is just 1 link • PR: aggregator’s ‘vote’ diluted by its high degree • Bet: aggregator is gateway to its children, could use any child to reach aggregator

  37. Empirical Analysis Summary • VIGS results vary by graph and importance measure • Still, subgraphs tended towards • High connectivity • Average or higher density • Shorter ASP’s • Maintain relative importance rank of vertices • “spam” affects closeness primarily

  38. Preserving Properties • So far, just studying subgraphs • Applying VIGS - may need guarantees • Hard to make a guarantee? • Example property: subgraph is connected

  39. Preserving Properties • Is it difficult to guarantee the connectedness of a VIGS subgraph? • NP-complete: reducible to Steiner Minimum Spanning Tree (MST) problem • Resort to heuristics • KeepOne, KeepAll from Gilbert and Levchenko (2004)

  40. KeepOne and KeepAll • KeepOne - build an MST: drop as many vertices/edges as possible while maintaining connectivity. • Problem! ASP/diameter could increase • Solution: KeepAll - MST, but add all vertices/edges on a shortest path
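The idea of repairing connectivity by pulling in vertices along shortest paths can be sketched as follows. This is a simplified illustration in the spirit of the slide's description, not the published KeepOne or KeepAll heuristics from Gilbert and Levchenko: it adds the vertices of one breadth-first shortest path for every pair of important vertices, and the function names are hypothetical.

```python
from collections import deque

def shortest_path(adj, source, target):
    """One shortest path from source to target via BFS, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:   # walk the parent links back to the source
        path.append(node)
        node = parent[node]
    return path[::-1]

def connect_important(adj, important):
    """Add the vertices on one shortest path between each pair of important vertices."""
    imp = list(important)
    keep = set(imp)
    for i, u in enumerate(imp):
        for v in imp[i + 1:]:
            path = shortest_path(adj, u, v)
            if path:
                keep.update(path)
    return {v: adj[v] & keep for v in keep}

# Line 0-1-2-3-4 with a leaf 5 hanging off vertex 2; only 0 and 4 are important.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3}, 5: {2}}
repaired = connect_important(graph, [0, 4])
print(sorted(repaired))
```

Because every pair keeps a full shortest path, distances between important vertices in the repaired subgraph match those in the whole graph, at the cost of adding vertices (here 1, 2 and 3) that were not important themselves.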

  41. Heuristic Performance - ER ASP • KO - did not have to add many vertices, but shortest path rather large (ER ASP was 4.26) • KA - good improvement in path length, but huge increase in vertices

  42. Heuristic Performance - BZ ASP • Similar performance to ER - KO results in significantly longer shortest paths, but KA adds many vertices • Is 4000 too many vertices to add? Small compared to total graph, but huge compared to number of important vertices

  43. Heuristic Performance - TREC ASP • Almost completely connected from the start • KA adds only a few vertices, doesn’t change much • Results for Web dataset similar

  44. Related Work • Graph sampling - Similar objective: synopsis • Concerned only with original graph • Random sampling, snowball sampling… • Lee, Kim, Jeong (2006), • Leskovec, Faloutsos (2006), • Li, Church, Hastie (2006) • Rich-club • Concerned only with high degree nodes • Zhou, Mondragon (2004), • Colizza, Flammini, Serrano, Vespignani (2006)

  45. Related Work • K-cores • Subgraphs where each vertex has at least k connections within the subgraph • Dorogovtsev, Goltsev, Mendes (2006) • Core connectivity • Smallest number of important vertices to remove before destroying largest component • Mislove, Marcon, Gummadi, Druschel, Bhattacharjee (2007)
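The k-core definition above suggests a simple peeling procedure: keep deleting any vertex with fewer than k neighbours among the remaining vertices until none is left. The sketch below is a generic illustration of that procedure, with an assumed dict-of-sets graph representation.

```python
def k_core(adj, k):
    """Repeatedly remove vertices with fewer than k neighbours in what remains."""
    core = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    pending = [v for v in core if len(core[v]) < k]
    while pending:
        v = pending.pop()
        if v not in core:
            continue                  # already removed via an earlier entry
        for u in core.pop(v):
            core[u].discard(v)
            if len(core[u]) < k:
                pending.append(u)
    return core

# Triangle 0-1-2 with a tail 0-3-4: the 2-core is just the triangle.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
print(sorted(k_core(graph, 2)))
```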

  46. VIGS wrap up • vertex-importance graph synopsis • create a subgraph of important vertices to study both the full graph and these vertices in particular • properties of VIGS depend on entire network and importance measure • real world networks have dense, closely knit VIGS • in some cases easy to meet connectivity & ASP guarantees

  47. Thanks to • Xiaolin Shi • Matthew Bonner • Lada Adamic NSF DMS 0547744
