
Challenges and Opportunities Posed by Power Laws in Network Analysis

Challenges and Opportunities Posed by Power Laws in Network Analysis. Bruno Ribeiro, UMass Amherst. MURI Review Meeting, Berkeley, 26th Oct 2011. Power Laws in Networks. Network topology: power-law distribution of node degrees (AS topology, social networks such as Facebook, etc.)


Presentation Transcript


  1. Challenges and Opportunities Posed by Power Laws in Network Analysis Bruno Ribeiro UMass Amherst MURI REVIEW MEETING Berkeley, 26th Oct 2011

  2. Power Laws in Networks • Network topology: • power-law distribution of node degrees • AS topology, social networks (Facebook, etc.) • Network traffic: • Flow: subset of packets • Power-law distribution of flow sizes [Figures: P[deg > d] versus vertex degree d for the Flickr dataset; packet stream passing through a router]

  3. Characterizing Networks from Incomplete Data • This talk: • Estimate distributions (of degrees, of flow sizes, …) from incomplete data (sampled edges, sampled packets, …) • Uncover central nodes in the network

  4. Outline • Challenge: Estimating subset size distributions from incomplete data • Incomplete data: • randomly sampled edges, randomly sampled packets, … • Impact of power laws on estimation accuracy • Impact of other distributions on estimation accuracy • Opportunity: Uncovering central nodes in power law networks

  5. Part 1: Challenge Estimating SUBSET SIZE distributions from incomplete data

  6. Subset size distributions • Set: the set of fish • Subsets: types of fish • Subset size: number of fish of a given type • Distribution: fraction of subsets (types of fish) with size x, where x is the subset size (number of fish)

  7. Estimating subset size distributions • Randomly sample N fish (uniformly) from the set of fish • From the sampled fish, form an unbiased estimate of the distribution: the fraction of subsets (types of fish) with size x, where x is the subset size (number of fish)

  8. Questions • How many fish need to be caught to obtain accurate distribution estimates? • What is the impact of the distribution shape on estimation accuracy?

  9. Incomplete Data Estimation • Flow: a set of IP packets • Random sampling yields sampled packets • Goal: IP flow size distribution estimation

  10. Network-related subset size distributions (web graph) • Distribution of the number of incoming links to a webpage • Q: do we need to crawl most of the web graph? • Incoming links are observed as outgoing links from other webpages • Set = set of links • Subset = incoming links to a webpage • Sampling: link sampling [Figure: outgoing links; in-degree = number of links to a webpage]

  11. Network-related subset size distributions (IP traffic) • Distribution of the number of packets in a TCP flow • Set = IP packets • Subset = an IP flow • Sampling: packet sampling [Figure: packet stream passing through a router]

  12. Incomplete Data, Edge Sampling Example [Figure panels: original graph; original in-degree distribution; sampled in-degrees; 3x estimator]

  13. Incomplete data model • Set elements are sampled with probability p • without replacement • independently • Model • b_ij: probability that j out of i subset elements are sampled • θ_i: fraction of subsets with i elements • e.g., fraction of nodes with degree i, fraction of flows with i packets

  14. Model (cont.) • b_ij: binomial(i, j) • θ_i: fraction of subsets with i elements • W: maximum subset size • d_j: fraction of subsets with j sampled elements • d_0 is not observable
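As an illustration of the sampling model on slides 13 and 14, the following minimal Python sketch builds the binomial sampling probabilities and maps a subset size distribution θ to the distribution d of sampled sizes. The function names, the layout of B (rows indexed by the sampled count j, columns by the true size i), and the example values of θ and p are assumptions made for this sketch and do not come from the talk.

```python
from math import comb

def sampling_matrix(W, p):
    """Return B with B[j][i-1] = P(j of i elements are sampled).

    Rows cover j = 0..W, columns cover i = 1..W (hypothetical layout).
    Each element is kept independently with probability p.
    """
    return [[comb(i, j) * p**j * (1 - p)**(i - j) if j <= i else 0.0
             for i in range(1, W + 1)]
            for j in range(W + 1)]

def sampled_distribution(theta, p):
    """Map theta (theta[i-1] = fraction of subsets of size i) to d[0..W]."""
    W = len(theta)
    B = sampling_matrix(W, p)
    return [sum(B[j][i - 1] * theta[i - 1] for i in range(1, W + 1))
            for j in range(W + 1)]

# Illustrative values only: three subset sizes, half of the elements sampled.
theta = [0.5, 0.3, 0.2]
d = sampled_distribution(theta, p=0.5)
print("d_0 (subsets with no sampled element):", d[0])
print("d_1..d_W:", d[1:])
```

In this sketch d[0] is the fraction of subsets that leave no sampled element at all, which is why d_0 cannot be observed from the sample alone.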

  15. Mean Squared Error Question • θ̂_i: unbiased estimate of θ_i • p: sampling probability • N: number of sampled subsets (e.g., N sampled flows) • Does an unbiased estimator with small mean squared error MSE(θ̂_i) exist? • Try the Maximum Likelihood Estimator (MLE)?

  16. Maximum Likelihood Estimation • Simulation: edge sampling • Flickr network (photo-sharing), 1.5M nodes [Figure axes: θ_i versus in-degree]

  17. Cramér-Rao Lower Bound (CRLB) • Let B = [b_ij], d = [d_j], θ = [θ_i] • Then d = Bθ • D = diag(d): diagonal matrix with D_jj = d_j • θ̂_i: unbiased estimate of θ_i • J: Fisher information matrix of N subsets • J = Bᵀ D B • Lower bound on the mean squared error of θ̂_i: MSE(θ̂_i) ≥ (J⁻¹)_ii / N • Need to find J⁻¹

  18. Recap • Interested in the inverse of the Fisher information matrix because MSE(θ̂_i) ≥ (J⁻¹)_ii / N • N: number of subsets sampled (number of nodes, number of TCP flows) • θ̂: subset size distribution estimate (what we seek) • p: sampling probability (edges, packets) • W: maximum subset size

  19. Results

  20. Heavier than exponential subset size distribution tail • Theorem 1: Suppose that θ_W decreases more slowly than exponentially in W. More precisely, assume −log(θ_W) = o(W) • Error grows with subset size W

  21. Exponential subset size distribution tail • Theorem 2: Suppose that θ_W decreases exponentially in W. More precisely, assume −log(θ_W) = W log a + o(W) as W → ∞ for some 0 < a < 1

  22. Lighter than exponential subset size distribution tail • Theorem 3: Suppose that θ_W decreases faster than exponentially in W. More precisely, assume −log(θ_W) = ω(W). Then it follows that 0 < p ≤ 1

  23. Infinite support θ & power laws • If θ is a power law with infinite support (W → ∞): • if p < ½, any unbiased estimator has “infinite” MSE • might as well output random estimates • if p > ½, estimates can be accurate if enough samples are collected

  24. Estimating Subset Size Average • I: size of a randomly chosen subset • Average subset size: E[I] • If E[I] < ∞ and E[I²] = ∞, then the estimation error is unbounded • Reason: inspection paradox • Sampling is biased towards very large subsets • Average size of sampled subsets ∝ E[I²]/2E[I] • Otherwise, the error is bounded
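The inspection paradox behind slide 24 can be sketched with exact arithmetic. This minimal Python example assumes a simplified selection rule that is not the talk's sampling model: a subset is chosen by drawing one element uniformly at random from the whole set, so a subset of size s is chosen with probability proportional to s. Under that assumption the expected size of the chosen subset is E[I²]/E[I]. Constant factors may differ under other sampling schemes. The function names and the toy sizes are hypothetical.

```python
from fractions import Fraction

def mean_size(sizes):
    """E[I]: average size when every subset is equally likely to be chosen."""
    return Fraction(sum(sizes), len(sizes))

def size_biased_mean(sizes):
    """Expected size of the subset that contains a uniformly chosen element.

    A subset of size s is hit with probability s / total, so the
    expectation is sum(s * s) / total, which equals E[I^2] / E[I].
    """
    total = sum(sizes)
    return Fraction(sum(s * s for s in sizes), total)

# Toy population (illustrative): nine singletons and one very large subset.
sizes = [1] * 9 + [91]
print("plain average size:      ", mean_size(sizes))
print("size-biased average size:", size_biased_mean(sizes))
```

The gap between the two averages widens as the largest subsets grow, which is the mechanism that makes the error unbounded when E[I²] is infinite.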

  25. Part 2: Opportunity Impact of power laws on sampling central network nodes

  26. Central Nodes • Central nodes are important in networks • Communication bottlenecks, trend setters, information aggregators • Notions of centrality: • betweenness, closeness, PageRank, degree • Challenge: identify the top k central nodes while exploring a small fraction of the network

  27. Degree as a proxy for centrality • Betweenness centrality: a node is central if it belongs to many shortest paths • Closeness centrality: a node is central if it has short paths to all other nodes • Rank correlation measures the degree of similarity between two rankings • Low rank correlation in planar graphs (e.g., power grid) [Figure: rank correlation with degree]
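To make "rank correlation with degree" concrete, here is a small Python sketch that computes closeness centrality by breadth-first search and compares it with degree using Spearman's rank correlation. The talk does not say which rank correlation measure was used, so Spearman's coefficient, the closeness normalization (n − 1 divided by the sum of distances), the helper names, and the toy graph are all assumptions for illustration.

```python
from collections import deque

def closeness(adj):
    """Closeness of each node: (n - 1) / sum of BFS distances.

    Assumes an undirected, connected graph given as an adjacency dict.
    """
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        scores[src] = (len(adj) - 1) / sum(dist.values())
    return scores

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy undirected graph (illustrative): a hub, a triangle edge, and a leaf.
adj = {"hub": ["a", "b", "c"], "a": ["hub", "b"], "b": ["hub", "a"], "c": ["hub"]}
nodes = list(adj)
degree = [len(adj[v]) for v in nodes]
close = closeness(adj)
print("rank correlation (degree vs closeness):",
      spearman(degree, [close[v] for v in nodes]))
```

On this toy graph degree and closeness order the nodes identically. The slide's point is that this agreement weakens in planar graphs such as power grids.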

  28. Looking for high degree nodes • A random walk in steady state visits a node with probability proportional to the node's degree • In power law graphs, this bias towards high degree nodes is strong • We observe that RWs are more efficient than more evolved techniques (AXS, RXS) [Figure axis: % of network sampled]
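The steady-state claim on slide 28 can be checked exactly on a small example. The Python sketch below assumes a simple random walk on an undirected, connected graph that moves to a uniformly chosen neighbor at each step. It verifies that the distribution π(v) = deg(v) / 2|E| is left unchanged by one step of the walk. The function names and the toy graph are hypothetical.

```python
from fractions import Fraction

def degree_distribution(adj):
    """Candidate stationary distribution: pi(v) = deg(v) / (2 * #edges)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), two_m) for v, nbrs in adj.items()}

def step(dist, adj):
    """Push a distribution through one step of the simple random walk."""
    nxt = {v: Fraction(0) for v in adj}
    for v, mass in dist.items():
        for u in adj[v]:
            # From v, each neighbor is chosen with probability 1 / deg(v).
            nxt[u] += mass / len(adj[v])
    return nxt

# Toy undirected graph (illustrative): one hub with three neighbors.
adj = {"hub": ["a", "b", "c"], "a": ["hub", "b"], "b": ["hub", "a"], "c": ["hub"]}
pi = degree_distribution(adj)
print("pi:", {v: str(q) for v, q in pi.items()})
print("unchanged after one step:", step(pi, adj) == pi)
```

Because the visiting probability scales with degree, the few very high degree nodes of a power law graph receive a large share of the walk's visits.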

  29. Thank you
