Dynamics of networks

Jure Leskovec, Machine Learning Department, Carnegie Mellon University

Cyberspace • The emergence of ‘cyberspace’ and the World Wide Web is like the discovery of a new continent. • Jim Gray, 1998 Turing Award address

Today: Rich data • Large on-line systems have detailed records of human activity • On-line communities: • Facebook (64 million users, billion dollar business) • MySpace (300 million users) • Communication: • Instant Messenger (~1 billion users) • News and Social media: • Blogging (250 million blogs world-wide, presidential candidates run blogs) • On-line worlds: • World of Warcraft (internal economy 1 billion USD) • Second Life (GDP of 700 million USD in ‘07) Opportunities for impact in science and industry

The networks • a) World Wide Web • b) Internet (AS) • c) Social networks • d) Communication • e) Citations • f) Protein interactions

Networks: What do we know? • We know lots about the network structure: • Properties: Scale free [Barabasi ’99], 6-degrees of separation [Milgram ’67], Navigation [Adamic-Adar ’03, Liben-Nowell ’05], Bipartite cores [Kumar et al. ’99], Network motifs [Milo et al. ’02], Communities [Newman ’99], Conductance [Mihail-Papadimitriou-Saberi ’06], Hubs and authorities [Page et al. ’98, Kleinberg ’99] • Models: Preferential attachment [Barabasi ’99], Small-world [Watts-Strogatz ’98], Copying model [Kleinberg et al. ’99], Heuristically optimized tradeoffs [Fabrikant et al. ’02], Congestion [Mihail et al. ’03], Searchability [Kleinberg ’00], Bowtie [Broder et al. ’00], Transit-stub [Zegura ’97], Jellyfish [Tauro et al. ’01] • We know much less about processes and dynamics of networks

My research: Network dynamics • Network Dynamics: • Network evolution • How does network structure change as the network grows and evolves? • Diffusion and cascading behavior • How do rumors and diseases spread over networks?

My research: Scale matters • We need massive network data for the patterns to emerge: • MSN Messenger network [WWW ’08, Nature ’08] (the largest social network ever analyzed) • 240M people, 255B messages, 4.5 TB data • Product recommendations [EC ’06] • 4M people, 16M recommendations • Blogosphere [work in progress] • 60M posts, 120M links

My research: The structure

Diffusion and Cascades • Behavior that cascades from node to node like an epidemic • News, opinions, rumors • Word-of-mouth in marketing • Infectious diseases • As activations spread through the network they leave a trace: a cascade [Figure labels: Network; Cascade (propagation graph)]

[w/ Adamic-Huberman, EC ’06] Setting 1: Viral marketing • People send and receive product recommendations, purchase products • Data: Large online retailer: 4 million people, 16 million recommendations, 500k products [Figure labels: 10% credit; 10% off]

[w/ Glance-Hurst et al., SDM ’07] Setting 2: Blogosphere • Bloggers write posts and refer (link) to other posts and the information propagates • Data: 10.5 million posts, 16 million links

[w/ Kleinberg-Singh, PAKDD ’06] Q1) What do cascades look like? • Are they stars? Chains? Trees? • Information cascades (blogosphere): propagation • Viral marketing (DVD recommendations): (ordered by frequency) • Viral marketing cascades are more social: • Collisions (no summarizers) • Richer non-tree structures

Q2) Human adoption curves • Prob. of adoption depends on the number of friends who have adopted [Bass ’69, Granovetter ’78] • What is the shape? Diminishing returns? Critical mass? • Distinction has consequences for models and algorithms • To find the answer we need lots of data [Figure: two candidate curves, prob. of adoption vs. k = number of friends adopting]

[w/ Adamic-Huberman, EC ’06] Q2) Adoption curve: Validation • DVD recommendations (8.2 million observations) • Adoption curve follows the diminishing returns. Can we exploit this? • Later similar findings were made for group membership [Backstrom-Huttenlocher-Kleinberg ’06], and probability of communication [Kossinets-Watts ’06] [Figure: probability of purchasing vs. # recommendations received]

My research: The structure  

Cascade & outbreak detection • Blogs – information epidemics • Which are the influential/infectious blogs? • Viral marketing [Domingos-Richardson] • Who are the trendsetters? • Influential people? • Disease spreading • Where to place monitoring stations to detect epidemics?

[w/ Krause-Guestrin et al., KDD ’07 (best student paper)] The problem: Detecting cascades • How to quickly detect epidemics as they spread? [Figure: cascades c1, c2, c3]

[w/ Krause-Guestrin et al., KDD ’07 (best student paper)] Two parts to the problem • Cost: • Cost of monitoring is node dependent • Reward: • Minimize the number of affected nodes: • If A are the monitored nodes, let R(A) denote the number of nodes we save • We also consider other rewards: • Minimize time to detection • Maximize number of detected outbreaks [Figure: monitored set A and its reward R(A)]

Optimization problem • Given: • Graph G(V,E), budget M • Data on how cascades C1, …, Ci, …, CK spread over time • Select a set of nodes A maximizing the reward R(A) = ∑i Prob(i) Ri(A), where Ri(A) is the reward for detecting cascade i, subject to cost(A) ≤ M • Solving the problem exactly is NP-hard • Max-cover [Khuller et al. ’99]

[w/ Krause-Guestrin et al., KDD ’07 (best student paper)] Detection: Solution outline • Submodularity of the reward functions • CELF: • algorithm with approximation guarantee • Lazy evaluation to speed up CELF • We extend the result of [Kempe-Kleinberg-Tardos ’03]

Background: Greedy hill-climbing • If objective function exhibits diminishing returns: Submodularity (think of it as “convexity”) • Theorem [Nemhauser et al. ’78]: If R is a function that is monotone and submodular, then k-step greedy produces a set A for which R(A) is at least (1-1/e) ≈ 63% of optimal • Greedy: repeatedly add the sensor with the highest marginal gain • Slow: O(kN) reward evaluations [Figure: candidate sensors a, b, c, d, e ranked by marginal reward]
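The greedy hill-climbing step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `greedy_select`, the `COVERS` table, and the coverage reward are all made up for the example, with set coverage standing in for the reward R.

```python
def greedy_select(candidates, reward, k):
    """Pick up to k candidates, each round adding the one with the highest marginal gain."""
    selected = []
    for _ in range(k):
        current = reward(selected)
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in selected:
                continue
            # Marginal gain of adding c to the current solution.
            gain = reward(selected + [c]) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        selected.append(best)
    return selected


# Hypothetical toy instance: each sensor "covers" a set of nodes.
COVERS = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
    "e": {7},
}


def coverage(selected):
    """Reward = number of distinct nodes covered (monotone and submodular)."""
    covered = set()
    for s in selected:
        covered |= COVERS[s]
    return len(covered)


chosen = greedy_select(list(COVERS), coverage, 2)
print(chosen, coverage(chosen))
```

Every round re-evaluates the reward for all remaining candidates, which is where the O(kN) evaluation count comes from.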

Reward function is submodular • R(A) is monotone: monitoring more nodes never hurts • We must show R is submodular: for A ⊆ B, R(A ∪ {u}) – R(A) ≥ R(B ∪ {u}) – R(B), i.e. the gain of adding a node to a small set is at least the gain of adding it to a large set • Natural example: • Sets A1, A2, …, An • R(A) = size of union of Ai (size of covered area) • If R1, …, RK are submodular, then ∑ pi Ri is submodular (for weights pi ≥ 0) [Figure: sets A ⊆ B and a new node u]
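The diminishing-returns inequality can be checked by brute force on a tiny instance. The sketch below assumes a made-up family of sets (`SETS`) and the coverage reward from the "natural example"; `is_submodular` is a hypothetical helper that enumerates all pairs A ⊆ B, so it is only usable on very small ground sets.

```python
from itertools import combinations

# Hypothetical toy family of sets; R(A) is the size of the union of the chosen sets.
SETS = {"u1": {1, 2}, "u2": {2, 3}, "u3": {3, 4, 5}}


def R(A):
    covered = set()
    for name in A:
        covered |= SETS[name]
    return len(covered)


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(ground, f):
    """Check f(A + u) - f(A) >= f(B + u) - f(B) for all A subset of B and u outside B."""
    for B in subsets(ground):
        for A in subsets(B):
            for u in ground - B:
                if f(A | {u}) - f(A) < f(B | {u}) - f(B):
                    return False
    return True


ground = frozenset(SETS)
print(is_submodular(ground, R))                     # coverage reward
print(is_submodular(ground, lambda A: len(A) ** 2))  # gains grow with set size
```

The second call uses a function whose marginal gains increase with the set size, so the check fails for it.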

[w/ Krause-Guestrin et al., KDD ’07 (best student paper)] Reward function is submodular • Theorem: • Reward function is submodular • Consider cascade i: • Ri(uk) = set of nodes saved from uk • Ri(A) = size of union of Ri(uk), uk ∈ A • ⇒ Ri is submodular • Global optimization: • R(A) = ∑i Prob(i) Ri(A) • ⇒ R is submodular [Figure: cascade i with nodes u1, u2 and the saved sets Ri(u1), Ri(u2)]

[w/ Krause-Guestrin et al., KDD ’07 (best student paper)] Solution: CELF Algorithm • We extend the setting to variable node cost case • We develop CELF (cost-effective lazy forward-selection) algorithm: • Two independent runs of a modified greedy • Solution set A’: ignore cost, greedily optimize reward • Solution set A’’: greedily optimize reward/cost ratio • Pick best of the two: arg max(R(A’), R(A’’)) • Theorem: CELF is near optimal • CELF achieves ½(1-1/e) factor approximation • For the size of our problems naïve CELF is too slow
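The "two runs, pick the best" idea can be illustrated as follows. This is a hedged sketch rather than the published algorithm: the names `budgeted_greedy` and `best_of_two`, the toy `COVERS` and `COST` tables, and the choice to skip candidates that no longer fit the budget are assumptions made for the example, and no approximation guarantee is claimed for this code. Costs are assumed to be strictly positive.

```python
def budgeted_greedy(candidates, reward, cost, budget, use_ratio):
    """Greedy under a budget; ranks by gain, or by gain/cost when use_ratio is True."""
    selected, spent = [], 0.0
    remaining = list(candidates)
    while True:
        current = reward(selected)
        best, best_score = None, 0.0
        for c in remaining:
            if spent + cost[c] > budget:
                continue  # assumption: unaffordable candidates are skipped
            gain = reward(selected + [c]) - current
            score = gain / cost[c] if use_ratio else gain
            if score > best_score:
                best, best_score = c, score
        if best is None:
            return selected  # nothing affordable adds reward
        selected.append(best)
        spent += cost[best]
        remaining.remove(best)


def best_of_two(candidates, reward, cost, budget):
    """Run both greedy variants and keep the solution with the higher reward."""
    a1 = budgeted_greedy(candidates, reward, cost, budget, use_ratio=False)
    a2 = budgeted_greedy(candidates, reward, cost, budget, use_ratio=True)
    return a1 if reward(a1) >= reward(a2) else a2


# Hypothetical toy instance: one expensive sensor and two cheap ones.
COVERS = {"big": {1, 2, 3, 4, 5}, "s1": {1, 2, 3}, "s2": {4, 5, 6}}
COST = {"big": 2.0, "s1": 1.0, "s2": 1.0}


def coverage(selected):
    covered = set()
    for s in selected:
        covered |= COVERS[s]
    return len(covered)


solution = best_of_two(list(COVERS), coverage, COST, budget=2.0)
print(solution, coverage(solution))
```

In this toy instance the cost-blind run spends the whole budget on the expensive sensor, while the ratio run fits both cheap sensors and covers more nodes, which is why both runs are compared.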

Scaling up: Lazy evaluation • Observation: Submodularity guarantees that marginal rewards decrease with the solution size • Idea: • Use marginal reward from previous step as an upper bound on current marginal reward [Figure: nodes a, b, c, d, e ranked by marginal reward]

CELF: Lazy evaluation • CELF algorithm: • Keep an ordered list of marginal rewards ri from previous step • Re-evaluate ri only for the top node [Figure: nodes a, b, c, d, e re-ordered by marginal reward after each re-evaluation]
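A priority queue is one natural way to keep the ordered list of marginal rewards. The sketch below is an illustration for the unit-cost case only; `lazy_greedy`, the round stamp stored with each entry, and the toy `COVERS` data are assumptions for the example, not the authors' implementation.

```python
import heapq


def lazy_greedy(candidates, reward, k):
    """Greedy selection that re-evaluates only the candidate at the top of the queue."""
    selected = []
    current = reward(selected)
    evaluations = 0
    # Heap entries: (-gain_bound, tie_break_index, candidate, round_when_computed).
    heap = []
    for i, c in enumerate(candidates):
        gain = reward([c]) - current
        evaluations += 1
        heap.append((-gain, i, c, 0))
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, i, c, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain was computed against the current solution. By submodularity
            # every other entry is an upper bound, so this candidate is the best.
            selected.append(c)
            current += -neg_gain
        else:
            # Stale bound: recompute the gain and push the candidate back.
            gain = reward(selected + [c]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, c, len(selected)))
    return selected, evaluations


# Hypothetical toy instance: each sensor "covers" a set of nodes.
COVERS = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
    "e": {7},
}


def coverage(selected):
    covered = set()
    for s in selected:
        covered |= COVERS[s]
    return len(covered)


picked, n_evals = lazy_greedy(list(COVERS), coverage, 2)
print(picked, n_evals)
```

On this instance the lazy variant selects the same two sensors as plain greedy while computing 7 marginal gains instead of the 9 (5 + 4) that plain greedy would need. The saving grows with the number of candidates.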


Blogs: Information epidemics • Which blogs should one read to catch big stories? • For more info see our website: www.blogcascade.org [Figure: reward (higher is better) vs. number of selected blogs (sensors), for CELF, In-links (used by Technorati), Out-links, # posts, Random]

CELF: Scalability • CELF runs 700x faster than simple greedy algorithm [Figure: run time in seconds (lower is better) vs. number of selected blogs (sensors), for Exhaustive search, Greedy, CELF]

[w/ Krause et al., J. of Water Resource Planning] Same problem: Water Network • Given: • a real city water distribution network • data on how contaminants spread over time • Place sensors (to save lives) • Problem posed by the US Environmental Protection Agency [Figure: contaminations c1, c2 and sensors S]

[w/ Ostfeld et al., J. of Water Resource Planning] Water network: Results • Our approach performed best at the Battle of Water Sensor Networks competition [Figure: population saved (higher is better) vs. number of placed sensors, for CELF, Degree, Random, Population, Flow]

My research: The structure   

Background: Network models • Empirical findings on real graphs led to new network models • Such models make assumptions/predictions about other network properties • What about network evolution? • Model: Preferential attachment • Explains: Power-law degree distribution [Figure: log prob. vs. log degree]

[w/ Kleinberg-Faloutsos, KDD ’05 (best research paper)] Q5) Network evolution • What is the relation between the number of nodes and the edges over time? • Prior work assumes: constant average degree over time • Networks are denser over time • Densification Power Law: E(t) ∝ N(t)^a, where a is the densification exponent (1 ≤ a ≤ 2) [Figure: E(t) vs. N(t); Internet: a=1.2, Citations: a=1.6]
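Because the densification power law relates the edge count to a power of the node count, the exponent a is the slope of log E(t) against log N(t). The sketch below estimates it with ordinary least squares; the function name and the snapshot numbers are made up for illustration and are not data from the talk.

```python
import math


def fit_densification_exponent(nodes, edges):
    """Least-squares slope of log(edges) against log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic snapshots (made-up numbers) generated from E = 2 * N**1.5.
nodes = [100, 400, 1600, 6400]
edges = [2 * n ** 1.5 for n in nodes]

a = fit_densification_exponent(nodes, edges)
print(round(a, 3))
```

A slope of 1 corresponds to constant average degree, and a slope of 2 to a graph in which the number of edges grows quadratically with the number of nodes.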

[w/ Kleinberg-Faloutsos, KDD ’05 (best research paper)] Q5) Network evolution • Prior models and intuition say that the network diameter slowly grows (like log N, log log N) • Diameter shrinks over time • as the network grows the distances between the nodes slowly decrease [Figure: diameter vs. size of the graph (Internet); diameter vs. time (Citations)]

Q6) Need a new network model • None of the existing models generates graphs with these properties • We need a new model • Idea: Generate graphs recursively Recursion (fractals) Skewed distributions (power-laws)

[w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05] The model: Kronecker graphs • Kronecker product of graph adjacency matrices (actually, there is also a nice social interpretation of the model) • We prove Kronecker graphs mimic real graphs: • Power-law degree distribution, Densification, Shrinking/stabilizing diameter, Spectral properties [Figure: initiator (3x3) and its Kronecker powers (9x9), (27x27); entries are edge probabilities pij]
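The recursive construction can be sketched with plain nested lists. The 2x2 initiator values below are hypothetical (the slide shows a 3x3 initiator whose entries are not given), and `kronecker_product`, `kronecker_power`, and `sample_graph` are illustrative names rather than the authors' code.

```python
import random


def kronecker_product(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(A), len(B)
    size = n * m
    return [
        [A[i // m][j // m] * B[i % m][j % m] for j in range(size)]
        for i in range(size)
    ]


def kronecker_power(initiator, k):
    """k-th Kronecker power of the initiator matrix (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker_product(result, initiator)
    return result


def sample_graph(P, seed=0):
    """Draw a directed graph: edge (i, j) is present with probability P[i][j]."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in row] for row in P]


# Hypothetical 2x2 initiator of edge probabilities.
INITIATOR = [[0.9, 0.5],
             [0.5, 0.1]]

P = kronecker_power(INITIATOR, 2)  # 4x4 matrix of edge probabilities
G = sample_graph(P)                # one 0/1 adjacency matrix drawn from P
print(len(P), sum(map(sum, P)))
```

Each Kronecker power multiplies the number of nodes by the initiator size, and the expected number of edges equals the sum of the initiator entries raised to the power k.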

Q6) Generating realistic graphs • Want to generate realistic networks: given a real network, generate a synthetic network • Compare graph properties, e.g., degree distribution • Why synthetic graphs? • Anomaly detection, Simulations, Predictions, Null-model, Sharing privacy sensitive graphs, … • Q: Which network properties do we care about? • A: Don’t commit, let’s match adjacency matrices

[w/ Faloutsos, ICML ’07] Kronecker graphs: Estimation • Maximum likelihood estimation: arg max over the Kronecker parameters of P(graph | Kronecker parameters) • Naïve estimation takes O(N! N^2): • N! for different node labelings: • Our solution: Metropolis sampling: N! (big) → const • N^2 for traversing graph adjacency matrix • Our solution: Kronecker product (E << N^2): N^2 → E • Do stochastic gradient descent • We estimate the model in O(E)

[w/ Faloutsos, ICML ’07] Estimation: Epinions (N=76k, E=510k) • We search the space of ~10^1,000,000 permutations • Fitting takes 2 hours • Real and Kronecker are very close [Figure: degree distribution (probability vs. node degree); path lengths (# reachable pairs vs. number of hops); “network” values (network value vs. rank)]

My research: The structure    A3, A4:CELF  

Two applications • P1) Web Projections: • Predicting search result quality without page content • P2) MSN Instant Messenger: • Social network of the whole world

[w/ Dumais-Horvitz, WWW ’07] P1) Web Projections • User types in a query to a search engine • Search engine returns results • Is this a good set of search results? [Figure legend: results returned by the search engine; hyperlinks; non-search results connecting the graph]

[w/ Dumais-Horvitz, WWW ’07] P1) Web Projections: Results • We can predict search result quality with 80% accuracy just from the connection patterns between the results [Figure labels: Good? Poor? Predict “Good”; Predict “Poor”]

[w/ Horvitz, WWW ’08, Nature ’08] P2) Planetary look on a small-world • Small-world experiment [Milgram ’67] • People send letters from Nebraska to Boston • How many steps does it take? • Messenger social network: largest network analyzed • 240M people, 30B conversations, 4.5TB data • Lots of engineering effort and good hardware: • 4 dual-core Opteron server • 48GB memory • 6.8TB of fast SCSI disks • How to learn short paths? [Figure: number of steps between pairs of people (i.e., hops + 1), for the MSN Messenger network and Milgram’s small world experiment]

My research: The structure    Big data matters  

Future direction 1: Diffusion • How do news and information spread? • New ranking and influence measures for blogs • Sentiment analysis from cascade structure • Crawling 2 million blogs/day (with Spinn3r) [Figure labels: Obscure technology story; Small tech blog; Slashdot; Wired; New Scientist; New York Times; CNN; BBC]

Future direction 1: Diffusion • Models of information diffusion • When, where and what post will create a cascade? • Where should one tap the network to get the effect they want? • Social Media Marketing • How to handle richer classes of graphs • Interaction of network structure and node/edge attributes
