
Nonparametric Link Prediction in Dynamic Graphs


Presentation Transcript


  1. Nonparametric Link Prediction in Dynamic Graphs Purnamrita Sarkar (UC Berkeley) Deepayan Chakrabarti (Facebook) Michael Jordan (UC Berkeley)

  2. Link Prediction • Who is most likely to interact with a given node? • Should Facebook suggest Alice as a friend for Bob? (Friend suggestion in Facebook)

  3. Link Prediction • Should Netflix suggest this movie to Alice? (Movie recommendation in Netflix; the figure shows Alice, Bob, and Charlie)

  4. Link Prediction • Prediction using simple features • degree of a node • number of common neighbors • last time a link appeared • What if the graph is dynamic?
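As a rough sketch of how these simple features might be computed (a minimal illustration using networkx; the graph, the timestamps, and the helper names are our own, not from the paper):

```python
import networkx as nx

def pair_features(G, i, j, last_link_time):
    """Simple link-prediction features for a node pair (i, j)."""
    cn = len(list(nx.common_neighbors(G, i, j)))  # number of common neighbors
    deg_i, deg_j = G.degree(i), G.degree(j)       # node degrees
    ll = last_link_time.get((i, j))               # last time a link appeared, None if never
    return {"cn": cn, "deg_i": deg_i, "deg_j": deg_j, "ll": ll}

# Toy usage: a small graph and a (hypothetical) record of when each pair last linked.
G = nx.Graph([("Alice", "Carol"), ("Bob", "Carol"), ("Bob", "Dave")])
last_seen = {("Alice", "Bob"): 3}  # illustrative timestamp
print(pair_features(G, "Alice", "Bob", last_seen))
```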

  5. Related Work • Generative models • Exponential-family random graph models [Hanneke+/’06] • Dynamics in latent space [Sarkar+/’05] • Extension of mixed membership block models [Fu+/’10] • Other approaches • Autoregressive models for links [Huang+/’09] • Extensions of static features [Tylenda+/’09]

  6. Goal • Link Prediction • incorporating graph dynamics, • requiring weak modeling assumptions, • allowing fast predictions, • and offering consistency guarantees.

  7. Outline • Model • Estimator • Consistency • Scalability • Experiments

  8. The Link Prediction Problem in Dynamic Graphs • Given the history G1, G2, …, GT with observed edges Y1(i,j)=1, Y2(i,j)=0, …, predict YT+1(i,j)=? • Model: YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( gG1,G2,…,GT(i,j) ) • g is a function of features of the previous graphs and of this pair of nodes; YT+1(i,j) indicates an edge at time T+1

  9. Including graph-based features • Example set of features for pair (i,j): • cn(i,j) (common neighbors) • ℓℓ(i,j) (last time a link was formed) • deg(j) • Represent dynamics using “datacubes” of these features ≈ multi-dimensional histograms on binned feature values, e.g. the cell 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2 • ηt = #pairs in Gt with these features • ηt+ = #pairs in Gt with these features which had an edge in Gt+1 • A high ηt+/ηt means this feature combination is more likely to create a new edge at time t+1

  10. Including graph-based features • How do we form these datacubes? • Vanilla idea: one datacube for each transition Gt → Gt+1, aggregated over all pairs (i,j) • Does not allow for differently evolving communities

  11. Our Model • How do we form these datacubes? • Our model: one datacube for each neighborhood • Captures local evolution

  12. Our Model • Neighborhood Nt(i) = nodes within 2 hops of i • Features extracted from (Nt−p, …, Nt) • Datacube cells (e.g. 1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2): • ηt(s) = number of node pairs with feature s in the neighborhood of i at time t • ηt+(s) = number of node pairs with feature s in the neighborhood of i at time t which got connected at time t+1
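A rough sketch of how such a neighborhood datacube might be built, under stated assumptions: a 2-hop neighborhood, two binned features (common neighbors and degree sum; the slides also bin ℓℓ, omitted here for brevity), and illustrative bin edges. The dict-of-cells layout is our own choice:

```python
import itertools
from collections import defaultdict
import networkx as nx

def two_hop_neighborhood(G, i):
    """N_t(i): nodes within 2 hops of i (our reading of the slide)."""
    return set(nx.single_source_shortest_path_length(G, i, cutoff=2))

def bin_feature(x, right_edges):
    """Index of the bin (given by right edges) that x falls into."""
    for b, right in enumerate(right_edges):
        if x <= right:
            return b
    return len(right_edges)

def datacube(G_t, G_next, i, cn_bins=(0, 3, 10), deg_bins=(2, 6, 20)):
    """d_t(i): for each binned feature combination s, count
    eta[s]      = #pairs in N_t(i) with features s, and
    eta_plus[s] = #those pairs that gained an edge in G_{t+1}."""
    eta, eta_plus = defaultdict(int), defaultdict(int)
    hood = two_hop_neighborhood(G_t, i)
    for u, v in itertools.combinations(hood, 2):
        if G_t.has_edge(u, v):
            continue  # only non-edges can form a new link at t+1
        cn = len(list(nx.common_neighbors(G_t, u, v)))
        deg = G_t.degree(u) + G_t.degree(v)
        s = (bin_feature(cn, cn_bins), bin_feature(deg, deg_bins))
        eta[s] += 1
        if G_next.has_edge(u, v):
            eta_plus[s] += 1
    return eta, eta_plus
```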

  13. Our Model • Datacube dt(i) captures graph evolution • in the local neighborhood of a node • in the recent past • Model: YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( g(dt(i), st(i,j)) ), where dt(i) captures the local evolution patterns and st(i,j) holds the features of the pair • What is g(.)?

  14. Outline • Model • Estimator • Consistency • Scalability • Experiments

  15. Kernel Estimator for g • [Figure: the query datacube at T−1 and feature vector at time T are compared against (datacube, feature) pairs extracted at t = 1, 2, 3, … from G1, G2, …, GT; compute similarities]

  16. Kernel Estimator for g • Factorize the similarity function: similarity( (d, s), (d′, s′) ) = K(d, d′) · I{ s == s′ } • Allows computation of g(.) via simple lookups

  17. Kernel Estimator for g • [Figure: compute similarities only between datacubes; each historical datacube (ηk, ηk+) at t = 1, 2, 3, … across G1, …, GT receives a weight wk]

  18. Kernel Estimator for g • Factorize the similarity function: similarity( (d, s), (d′, s′) ) = K(d, d′) · I{ s == s′ } • Allows computation of g(.) via simple lookups • What is K(d, d′)?
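Because the similarity factorizes, every historical datacube contributes only the contents of the one cell matching the query feature vector, weighted by its datacube similarity. A minimal sketch of this lookup structure (the function names and the dict-of-cells datacube layout are our assumptions, matching the earlier sketch; the exact estimator is in the paper):

```python
def estimate_g(query_cube, query_s, historical, kernel):
    """Kernel estimate of the link probability for feature combination query_s.

    historical: list of (eta, eta_plus) datacubes from past (t, i) pairs.
    kernel(d, d'): similarity between two datacubes (defined on the next slides).
    The factorized similarity reduces the estimate to weighted table lookups
    of the query_s cell in each historical datacube."""
    num = den = 0.0
    for eta, eta_plus in historical:
        w = kernel(query_cube, (eta, eta_plus))
        num += w * eta_plus.get(query_s, 0)  # weighted #pairs that linked
        den += w * eta.get(query_s, 0)       # weighted #pairs overall
    return num / den if den > 0 else 0.0
```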

  19. Similarity between two datacubes (η1, η1+) and (η2, η2+) • Idea 1: for each cell s, take (η1+/η1 − η2+/η2)² and sum • Problem: • the magnitude of η is ignored • 5/10 and 50/100 are treated equally • Instead, consider the distribution

  20. Similarity between two datacubes • Idea 2: for each cell s, compute the posterior distribution of the edge-creation probability • dist = total variation distance between the posteriors, summed over all cells • 0 < b < 1; as b → 0, K(·,·) → 0 unless dist(·,·) = 0
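A sketch of this kernel under two stated assumptions: a uniform Beta(1,1) prior on each cell's edge-creation probability (the slide does not name the prior), and the exponential form K = b^dist, which is one kernel with the limiting behavior the slide describes. The total variation distance is approximated on a grid:

```python
import numpy as np
from scipy.stats import beta

def cell_tv(eta1, etap1, eta2, etap2, grid=np.linspace(0, 1, 1001)):
    """Total variation distance between the posterior distributions of the
    edge-creation probability in one cell, assuming a Beta(1,1) prior."""
    p1 = beta.pdf(grid, etap1 + 1, eta1 - etap1 + 1)
    p2 = beta.pdf(grid, etap2 + 1, eta2 - etap2 + 1)
    return 0.5 * np.trapz(np.abs(p1 - p2), grid)

def cube_kernel(cube1, cube2, b=0.3):
    """K(d, d') = b ** dist(d, d'), dist = TV distance summed over cells."""
    eta1, etap1 = cube1
    eta2, etap2 = cube2
    dist = sum(cell_tv(eta1.get(s, 0), etap1.get(s, 0),
                       eta2.get(s, 0), etap2.get(s, 0))
               for s in set(eta1) | set(eta2))
    return b ** dist

# Unlike Idea 1, 5/10 and 50/100 now differ: the second posterior is much
# more concentrated, so the TV distance between them is strictly positive.
print(cell_tv(10, 5, 100, 50))
```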

  21. Kernel Estimator for g • Want to show: the estimator ĝ is consistent, i.e. ĝ converges to the true g as T → ∞ (made precise in the following slides)

  22. Outline • Model • Estimator • Consistency • Scalability • Experiments

  23. Consistency of Estimator • Lemma 1: As T→∞, for some R>0, • Proof using: As T→∞,

  24. Consistency of Estimator • Lemma 2: As T→∞,

  25. Consistency of Estimator • Assumption: finite graph • Proof sketch: • Dynamics are Markovian with finite state space • so the chain must eventually enter a closed, irreducible communication class • geometric ergodicity if the class is aperiodic (if not, more complicated…) • hence strong mixing with exponential decay • so variances decay as o(1/T)

  26. Consistency of Estimator • Theorem: • Proof Sketch: • for some R>0 • So

  27. Outline • Model • Estimator • Consistency • Scalability • Experiments

  28. Scalability • Full solution: • Summing over all n datacubes for all T timesteps • Infeasible • Approximate solution: • Sum over nearest neighbors of the query datacube • How do we find nearest neighbors? • Locality Sensitive Hashing (LSH) [Indyk+/’98, Broder+/’98]

  29. Using LSH • Devise a hashing function for datacubes such that • “similar” datacubes tend to be hashed to the same bucket • “similar” = small total variation distance between cells of the datacubes

  30. Using LSH • Step 1: Map datacubes to bit vectors • Use B1 buckets to discretize [0,1]; use B2 bits for each bucket • For probability mass p, roughly the first p·B2 of a bucket’s bits are set to 1 (unary encoding) • Total M·B1·B2 bits, where M = max number of occupied cells << total number of cells

  31. Using LSH • Step 1: Map datacubes to bit vectors • Total variation distance ∝ L1 distance between distributions ≈ Hamming distance between bit vectors • Step 2: Hash function = k bits sampled out of the M·B1·B2 bits
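A sketch of both steps with toy sizes (M = 2 occupied cells, B1 = 3 probability buckets, B2 = 8 bits per bucket, k = 6 sampled bits; all of these values, and the rounding in the unary encoding, are illustrative assumptions):

```python
import numpy as np

def cell_to_bits(bucket_masses, B2=8):
    """Unary-encode each of the B1 bucket masses: for mass p, the first
    round(p * B2) of B2 bits are 1, so Hamming distance between encodings
    approximates the L1 distance between the discretized distributions."""
    bits = []
    for p in bucket_masses:
        ones = int(round(p * B2))
        bits.extend([1] * ones + [0] * (B2 - ones))
    return bits

def cube_to_bits(cube_hist, B2=8):
    """Concatenate the encodings of the M occupied cells of a datacube."""
    return np.array([b for cell in cube_hist for b in cell_to_bits(cell, B2)])

def lsh_hash(bitvec, sampled_positions):
    """Step 2: the hash key is k bits sampled (once, up front) from all bits."""
    return tuple(bitvec[sampled_positions])

rng = np.random.default_rng(0)
hist1 = [[0.5, 0.5, 0.0], [0.1, 0.8, 0.1]]   # toy: M=2 cells, B1=3 buckets each
hist2 = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
v1, v2 = cube_to_bits(hist1), cube_to_bits(hist2)
positions = rng.choice(len(v1), size=6, replace=False)  # k = 6 sampled bits
buckets = {}
buckets.setdefault(lsh_hash(v1, positions), []).append("cube1")
buckets.setdefault(lsh_hash(v2, positions), []).append("cube2")
print(buckets)  # similar datacubes tend to collide in the same bucket
```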

  32. Fast Search Using LSH • [Figure: a hash table whose buckets are indexed by short codes (0000, 0001, …, 1111); each bucket stores the bit vectors of the datacubes hashed to it]

  33. Outline • Model • Estimator • Consistency • Scalability • Experiments

  34. Experiments • Baselines • LL: last link (time of last occurrence of a pair) • CN: rank by number of common neighbors in the most recent graph GT • AA: more weight to low-degree common neighbors • Katz: accounts for longer paths • CN-all: apply CN to all timesteps combined • AA-all, Katz-all: similar

  35. Setup • Pick a random subset S of nodes with degree > 0 in GT+1 • For each s ∈ S, predict a ranked list of nodes likely to link to s • Report mean AUC (higher is better) • [Figure: G1, …, GT are training data; GT+1 is test data]
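A sketch of this evaluation loop, assuming per-node candidate scores and ground-truth link indicators have already been computed (the input layout is our assumption; sklearn's roc_auc_score computes each per-node AUC):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc(scores_per_node, truth_per_node):
    """Mean AUC over test nodes: for each s in S, predicted scores over
    candidate nodes are compared with the links actually formed in G_{T+1}."""
    aucs = []
    for s, scores in scores_per_node.items():
        y_true = truth_per_node[s]   # 1 if the candidate linked to s in G_{T+1}
        if len(set(y_true)) < 2:
            continue                 # AUC is undefined without both classes
        aucs.append(roc_auc_score(y_true, scores))
    return float(np.mean(aucs))

# Toy usage with hypothetical predicted scores and outcomes for two test nodes.
print(mean_auc({"a": [0.9, 0.2, 0.4], "b": [0.1, 0.8]},
               {"a": [1, 0, 0],       "b": [0, 1]}))
```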

  36. Simulations • Social network model of Hoff et al. • Each node has an independently drawn feature vector • Edge(i,j) depends on the features of i and j • Seasonality effect • Feature importance varies with season ⇒ different communities in each season • Feature vectors evolve smoothly over time ⇒ evolving community structures
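An illustrative simulator in this spirit (not the exact Hoff et al. model; the rotation schedule, weights, and offset below are our assumptions): latent features evolve smoothly, while the feature that matters for edge formation rotates with the season:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 50, 4, 12
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(n, d))                  # latent feature vector per node
graphs = []
for t in range(T):
    x += 0.1 * rng.normal(size=(n, d))       # features evolve smoothly
    w = np.zeros(d)
    w[(t // 3) % d] = 4.0                    # seasonality: important feature rotates
    logits = (x * w) @ x.T - 3.0             # edge score from weighted feature match
    A = (rng.random((n, n)) < sigmoid(logits)).astype(int)
    A = np.triu(A, 1)
    A = A + A.T                              # symmetric adjacency, no self-loops
    graphs.append(A)
print([A.sum() // 2 for A in graphs])        # edge counts per timestep
```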

  37. Simulations • NonParam is much better than the others in the presence of seasonality • CN, AA, and Katz implicitly assume smooth evolution

  38. Sensor Network* * www.select.cs.cmu.edu/data

  39. Summary • Link formation is assumed to depend on • the neighborhood’s evolution • over a time window • Admits a kernel-based estimator • Consistency • Scalability via LSH • Works particularly well for • Seasonal effects • differently evolving communities

  40. Thanks!

  41. Problem statement • We are given {G1, G2, …, Gt} and want to predict Gt+1 • Model 1: Yt+1(i,j) = f(Yt−p+1(i,j), …, Yt(i,j)) • Treats all edges as independent • Only looks at one feature • Model 2: Gt+1 = f(Gt−p+1, Gt−p+2, …, Gt) • Huge dimensionality • Probably intractable • Middle ground • Learn a local prediction model for Yt+1(i,j) using a few features, and patch these together to predict the entire graph

  42. Our Model • Idea: Yt+1(i,j) depends on features of (i,j) and the neighborhood of i in the p previous graphs • Features specific to (i,j) at time t: {deg(i), deg(j), cn(i,j), ℓℓ(i,j)} • Features of the neighborhood of i: • should be amenable to fast algorithms • should reflect the evolution of the graph • but should also be similar to the features of (i,j)

  43. Estimation • Kernel estimator of g • Once you have computed the kernel similarities between two datacubes, everything boils down to table lookups.

  44. Distance between two datacubes • We could just compare rates of link formation, i.e. η+/η, but this does not take the variance into account • Instead, make a normal approximation to η+/η and look at the total variation distance • As b → 0, K(dt(i), dt′(i′)) → 0 unless D(dt(i), dt′(i′)) = 0
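A sketch of this normal-approximation variant, assuming the usual Bernoulli-rate approximation with mean η+/η and variance (η+/η)(1 − η+/η)/η (the slide does not give the variance; this is our assumption), again with a grid approximation to the TV distance:

```python
import numpy as np
from scipy.stats import norm

def rate_tv_normal(eta1, etap1, eta2, etap2, grid=np.linspace(0, 1, 1001)):
    """TV distance between normal approximations of the link-formation
    rate eta+/eta in one cell."""
    mu1, mu2 = etap1 / eta1, etap2 / eta2
    sd1 = np.sqrt(max(mu1 * (1 - mu1), 1e-12) / eta1)  # floor guards mu in {0, 1}
    sd2 = np.sqrt(max(mu2 * (1 - mu2), 1e-12) / eta2)
    p1, p2 = norm.pdf(grid, mu1, sd1), norm.pdf(grid, mu2, sd2)
    return 0.5 * np.trapz(np.abs(p1 - p2), grid)

# 5/10 vs 50/100: same rate, different certainty, so the distance is > 0.
print(rate_tv_normal(10, 5, 100, 50))
```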


  46. Consistency of Estimator • Define • Kind of behaves like a bias term.

  47. Consistency of Estimator • Show • Assumption 1: b → 0 as nT → ∞ [similar to kernel density estimation] • Show that for bounded q, • Assumption 2: introduce the strong mixing coefficient α(k); roughly, this bounds the degree of dependence between two neighborhoods at distance k • The total covariance between all neighborhoods is bounded • Assume

  48. Including graph-based features • Idea 1: make one datacube per (Gt, Gt+1) transition, learning how successful each feature combination (e.g. 1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2) has been in generating links over the past. Too global. • Idea 2: make one datacube for each pair of nodes. Too local, not to mention expensive.

  49. Our Model • Datacube dt(i) captures the evolution of a small (2-hop) neighborhood around node i • Close nodes will have overlapping neighborhoods ⇒ similar datacubes • [Figure: the prediction of YT+1(i,j) uses the pair {dT−1(i), sT(i,j)}]

  50. Building neighborhood features • Let S = range of s(i,j); assume S is finite • Datacube: ηit(s) = number of pairs with feature s in the neighborhood of i at time t; ηit+(s) = the number of those ηit(s) pairs which got connected at time t+1 • Captures the evolution of the neighborhood from t to t+1 • We use the past evolution pattern of a neighborhood to predict its future evolution • But how do we estimate g efficiently? We will show that the inference of g boils down to table lookups in the datacubes dt(i)
