Challenges and Opportunities Posed by Power Laws in Network Analysis

Bruno Ribeiro, UMass Amherst
MURI Review Meeting, Berkeley, 26th Oct 2011

Power Laws in Networks
• Network topology:
  • power law distribution of node degrees
  • AS topology, social networks (Facebook, etc.)
• Network traffic:
  • Flow: subset of packets
  • Power law distribution of flow sizes
[Figure: Flickr dataset, P[deg > d] versus vertex degree d]
[Figure: packet stream through a router]

Characterizing Networks from Incomplete Data
This talk:
• Estimate distributions (of degrees, of flow sizes, …) from incomplete data (sampled edges, sampled packets, …)
• Uncover central nodes in the network

Outline
• Challenge: Estimating subset size distributions from incomplete data
  • Incomplete data: randomly sampled edges, randomly sampled packets, …
  • Impact of power laws on estimation accuracy
  • Impact of other distributions on estimation accuracy
• Opportunity: Uncovering central nodes in power law networks

Part 1: Challenge
Estimating subset size distributions from incomplete data

Subset size distributions
• Set: a set of fish
• Subsets: types of fish
• Subset size: number of fish of a given type
• Distribution: fraction of subsets (types of fish) with size x, where x is the subset size (number of fish)

Estimating subset size distributions
• Randomly sample N fish (uniformly) from the set of fish
• From the sampled fish, compute an unbiased estimate of the distribution: the fraction of subsets (types of fish) with size x, where x is the subset size (number of fish)

Questions
• How many fish do we need to catch to obtain accurate distribution estimates?
• What is the impact of the distribution shape on estimation accuracy?

Incomplete Data Estimation
[Figure: flows as a set of IP packets; random sampling yields sampled packets, which feed IP flow size distribution estimation]

Network-related subset size distributions (webgraph)
• Distribution of the number of incoming links to a webpage
• Q: do we need to crawl most of the web graph?
• Incoming links are observed as outgoing links from other webpages
  • set = set of links
  • subset = incoming links to a webpage
  • sampling: link sampling
[Figure: outgoing links of crawled pages; in-degree = # of links to a webpage]

Network-related subset size distributions (IP traffic)
• Distribution of the number of packets in a TCP flow
  • set = IP packets
  • subset = an IP flow
  • sampling: packet sampling
[Figure: packet stream through a router]

Incomplete Data, Edge Sampling Example
[Figure panels: original graph; original in-degree distribution; sampled in-degrees; 3x; estimator]

Incomplete data model
• Set elements are sampled with probability p
  • without replacement
  • independently
• Model
  • $b_{ij}$: probability that j out of i subset elements are sampled
  • $\theta_i$: fraction of subsets with i elements
  • e.g.: fraction of nodes with degree i, fraction of flows with i packets

Model (cont)
• $b_{ij}$ is binomial: $b_{ij} = \binom{i}{j} p^j (1-p)^{i-j}$
• $\theta_i$: fraction of subsets with i elements
• W: maximum subset size
• $d_j = \sum_{i=j}^{W} b_{ij}\,\theta_i$: fraction of subsets with j sampled elements
• $d_0$ is not observable
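The model above can be sketched in a few lines of Python. This is only an illustration, not the code behind the talk: the function names, the row/column layout of B (row j, column i), and the truncated power law used as an example are all assumptions.

```python
from math import comb

def sampling_matrix(W, p):
    """B[j][i-1] = P(j of the i elements of a subset are sampled), j = 0..W, i = 1..W."""
    return [[comb(i, j) * p**j * (1 - p)**(i - j) if j <= i else 0.0
             for i in range(1, W + 1)]
            for j in range(W + 1)]

def sampled_size_distribution(theta, p):
    """d = B theta, where theta[i-1] is the fraction of subsets with i elements."""
    W = len(theta)
    B = sampling_matrix(W, p)
    return [sum(B[j][i] * theta[i] for i in range(W)) for j in range(W + 1)]

# Hypothetical example: truncated power law, theta_i proportional to i^-2, sizes 1..W.
W, p = 5, 0.5
weights = [i ** -2.0 for i in range(1, W + 1)]
theta = [w / sum(weights) for w in weights]
d = sampled_size_distribution(theta, p)
print("d_0 (subsets that leave no trace):", round(d[0], 4))
print("d_1..d_W:", [round(x, 4) for x in d[1:]])
```

The first entry, d_0, is computed here only because the true theta is known in the sketch; with real data the subsets that contribute to d_0 are never seen.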

Mean Squared Error Question
• $\hat{\theta}_i$: unbiased estimate of $\theta_i$
• p: sampling probability
• N: number of sampled subsets (e.g. N sampled flows)
Is there an unbiased estimator with small mean squared error $\mathrm{MSE}(\hat{\theta}_i)$? Try the Maximum Likelihood Estimator (MLE)?

Maximum Likelihood Estimation
• Simulation: edge sampling
• Flickr network (photo-sharing), 1.5M nodes
[Figure: $\theta_i$ versus in-degree i]

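Cramér-Rao Lower Bound (CRLB)
• Let $B = [b_{ij}]$, $d = [d_j]$, $\theta = [\theta_i]$
• Then $d = B\theta$
• $D = \mathrm{diag}(d)$: diagonal matrix with $D_{jj} = d_j$
• $\hat{\theta}_i$: unbiased estimate of $\theta_i$
• $J$: Fisher information matrix (per sampled subset; N subsets are observed)
• $J = B^T D^{-1} B$
• Lower bound on the mean squared error of $\hat{\theta}_i$: $\mathrm{MSE}(\hat{\theta}_i) \ge (J^{-1})_{ii}/N$
Need to find $J^{-1}$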
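For small W the bound can be evaluated numerically. The sketch below is a rough illustration under stated assumptions, not the analysis from the talk: it takes the per-subset Fisher information to be $J_{ik} = \sum_j b_{ij} b_{kj} / d_j$ (the standard form when each subset yields j sampled elements with probability $d_j$), it treats the j = 0 outcome as if it were observed, and it ignores the constraint that the $\theta_i$ sum to one. The function names and the example distribution are hypothetical.

```python
from math import comb

def sampling_matrix(W, p):
    """B[j][i-1] = P(j of i elements are sampled), j = 0..W, i = 1..W."""
    return [[comb(i, j) * p**j * (1 - p)**(i - j) if j <= i else 0.0
             for i in range(1, W + 1)]
            for j in range(W + 1)]

def invert(M):
    """Gauss-Jordan matrix inverse with partial pivoting (pure Python)."""
    n = len(M)
    A = [list(row) + [float(r == c) for c in range(n)] for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def crlb(theta, p, N):
    """Return the list of (J^-1)_ii / N for i = 1..W."""
    W = len(theta)
    B = sampling_matrix(W, p)
    d = [sum(B[j][i] * theta[i] for i in range(W)) for j in range(W + 1)]
    # J = B^T D^-1 B; outcomes with d_j = 0 carry no information and are skipped.
    J = [[sum(B[j][a] * B[j][b] / d[j] for j in range(W + 1) if d[j] > 0)
          for b in range(W)] for a in range(W)]
    Jinv = invert(J)
    return [Jinv[i][i] / N for i in range(W)]

# Hypothetical example: truncated power law on sizes 1..6, N = 10000 subsets.
weights = [i ** -2.0 for i in range(1, 7)]
theta = [w / sum(weights) for w in weights]
for p in (0.25, 0.75):
    print(p, ["%.3g" % b for b in crlb(theta, p, 10_000)])
```

J becomes badly conditioned as W grows or p shrinks, so a plain floating-point inverse like this one is only meaningful for small W.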

Recap
• Interested in the inverse of the Fisher information matrix because $\mathrm{MSE}(\hat{\theta}_i) \ge (J^{-1})_{ii}/N$
• N: # of subsets sampled (# of nodes, # of TCP flows)
• $\hat{\theta}$: subset size distribution estimate (what we seek)
• p: sampling probability (edges, packets)
• W: maximum subset size

Results

Heavier than exponential subset size distribution tail
• Theorem 1: Suppose that $\theta_W$ decreases more slowly than exponentially in W. More precisely, assume $-\log(\theta_W) = o(W)$.
• Error grows with subset size W

Exponential subset size distribution tail
• Theorem 2: Suppose that $\theta_W$ decreases exponentially in W. More precisely, assume $-\log(\theta_W) = W \log(1/a) + o(W)$ as $W \to \infty$ for some $0 < a < 1$.

Lighter than exponential subset size distribution tail
• Theorem 3: Suppose that $\theta_W$ decreases faster than exponentially in W. More precisely, assume $-\log(\theta_W) = \omega(W)$. Then it follows that [equation not recovered], $0 < p \le 1$.

Infinite support $\theta$ and power laws
• If $\theta$ is a power law with infinite support ($W \to \infty$):
  • if p < ½, any unbiased estimator has "infinite" MSE
    • might as well output random estimates
  • if p > ½, estimates can be accurate if enough samples are collected

Estimating Subset Size Average
• $I$: size of a randomly chosen subset
• Average subset size: $E[I]$
• If $E[I] < \infty$ and $E[I^2] = \infty$, then the estimation error is unbounded
  • Reason: inspection paradox
  • Sampling is biased towards very large subsets
  • Average size of sampled subsets $\propto E[I^2] / (2E[I])$
• Otherwise, the error is bounded
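The inspection paradox can be checked with a small simulation. The sketch assumes one specific sampling scheme, picking a single element uniformly at random and reporting the size of the subset it belongs to; under that scheme the reported size has mean $E[I^2]/E[I]$, which depends on the second moment in the same way as the expression on the slide. All names and the example distribution are hypothetical.

```python
import random

def mean_size(theta):
    """E[I] for theta[i-1] = fraction of subsets with i elements."""
    return sum(i * t for i, t in enumerate(theta, start=1))

def size_biased_mean(theta):
    """E[I^2] / E[I]: mean size of the subset containing a uniformly chosen element."""
    second_moment = sum(i * i * t for i, t in enumerate(theta, start=1))
    return second_moment / mean_size(theta)

def simulate(theta, n_subsets, n_draws, seed=0):
    """Return (average subset size, average size seen through sampled elements)."""
    rng = random.Random(seed)
    sizes = rng.choices(range(1, len(theta) + 1), weights=theta, k=n_subsets)
    # Picking an element uniformly is the same as picking a subset
    # with probability proportional to its size.
    picks = rng.choices(sizes, weights=sizes, k=n_draws)
    return sum(sizes) / n_subsets, sum(picks) / n_draws

# Hypothetical truncated power law, theta_i proportional to i^-2, sizes 1..1000.
weights = [i ** -2.0 for i in range(1, 1001)]
theta = [w / sum(weights) for w in weights]
print("E[I]          :", round(mean_size(theta), 2))
print("E[I^2] / E[I] :", round(size_biased_mean(theta), 2))
print("simulated     :", simulate(theta, 20_000, 20_000))
```

Raising the truncation point makes the second quantity grow much faster than the first, which is the effect the slide attributes to an infinite second moment.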

Part 2: Opportunity
Impact of power laws on sampling central network nodes

Central Nodes
• Central nodes are important in networks
  • Communication bottlenecks, trend setters, information aggregators
• Notions of centrality:
  • betweenness, closeness, PageRank, degree
Challenge: identify the top k central nodes while exploring only a small fraction of the network
[Figure: network with central nodes highlighted]

Degree as a proxy for centrality
• Betweenness centrality: a node is central if it belongs to many shortest paths
• Closeness centrality: a node is central if it has short paths to all other nodes
• Rank correlation measures the degree of similarity between two rankings
• Low rank correlation in planar graphs (e.g. power grid)
[Figure: rank correlation with degree]
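As an illustration of comparing a degree ranking with a centrality ranking, the sketch below computes closeness by breadth-first search on a small hypothetical graph and scores the agreement with Kendall's tau. The slide does not say which rank correlation measure was used, so Kendall's tau (the tau-a variant, with tied pairs counted as zero) is an assumption, as are the graph and the function names. The graph must be connected and undirected.

```python
from collections import deque
from itertools import combinations

def closeness(adj):
    """Closeness centrality: (n - 1) / sum of shortest-path lengths, via BFS."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[source] = (len(adj) - 1) / sum(dist.values())
    return scores

def kendall_tau(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for a, b in pairs:
        prod = (x[a] - x[b]) * (y[a] - y[b])
        score += (prod > 0) - (prod < 0)
    return score / len(pairs)

# Hypothetical graph: a hub (node 0) with one short tail (4 - 5).
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
nodes = sorted(adj)
degree = [len(adj[v]) for v in nodes]
close = closeness(adj)
tau = kendall_tau(degree, [close[v] for v in nodes])
print("Kendall tau between degree and closeness:", tau)
```

A value near 1 means the two rankings order the nodes almost identically, which is the situation in which degree is a useful proxy.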

Looking for high degree nodes
• A random walk (RW) in steady state visits a node with probability proportional to the node's degree
• In power law graphs this bias towards high degree nodes is strong
• We observe that RWs are more efficient than more elaborate techniques (AXS, RXS)
[Figure: x-axis is % of network sampled]
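The degree bias of a random walk is easy to see on a toy example. The sketch below, with a hypothetical graph and function names, runs a simple random walk on an undirected connected graph and compares the empirical visit frequencies with deg(v) / (2 × number of edges). It does not implement AXS or RXS.

```python
import random
from collections import Counter

def degree_stationary(adj):
    """Steady-state visit probabilities of a simple RW: deg(v) / sum of degrees."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / total for v, nbrs in adj.items()}

def random_walk_visits(adj, steps, seed=0):
    """Fraction of steps spent at each node by a simple random walk."""
    rng = random.Random(seed)
    node = rng.choice(list(adj))
    visits = Counter()
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbour
        visits[node] += 1
    return {v: visits[v] / steps for v in adj}

# Hypothetical undirected graph: a hub (node 0), a triangle and two leaves.
adj = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
    5: [0, 6],
    6: [5],
}
expected = degree_stationary(adj)
observed = random_walk_visits(adj, steps=20_000)
for v in sorted(adj, key=lambda n: -expected[n]):
    print(v, "degree", len(adj[v]), "expected", round(expected[v], 3),
          "observed", round(observed[v], 3))
```

In a power law graph the few very high degree nodes take a large share of the total degree, so the walk reaches them after exploring only a small part of the network.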

Thank you
