10-831 Event and Pattern Detection, Topic Presentation II
FAST RANDOM WALK WITH RESTART and its applications
Hanghang Tong, Christos Faloutsos, Jia-Yu (Tim) Pan, ICDM 2006
Leman Akoglu, lakoglu@cs.cmu.edu, February 2010
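Random Walk with Restart
• Neighborhood formation
• Given a node q, what are the relevance scores of all the other nodes with respect to q?
[Figure: graph with node sets V1 and V2 and query node q, annotated with relevance scores ranging from .3 down to .002]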
RWR for Anomaly Detection
• Anomaly detection (AD)
• Given a node q, what are the normality scores of the nodes that link to q?
[Figure: two example nodes t, one with high normality and one with low normality]
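RWR algebra: $\vec{r} = c\,\tilde{W}\,\vec{r} + (1 - c)\,\vec{e}$
• $\vec{r}$: steady-state probability vector (relevance scores)
• $\vec{e}$: starting vector
• $\tilde{W}$: transition matrix
• $1 - c$: restart probability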
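The steady-state equation can be checked numerically. The following is a minimal sketch, assuming a column-normalized adjacency matrix as the transition matrix and the convention that 1 - c is the restart probability; the toy path graph and the names `column_normalize` and `rwr_exact` are illustrative, not taken from the slides.

```python
import numpy as np

def column_normalize(adj):
    """Scale each column to sum to 1 so it is a probability distribution."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave empty columns as all zeros
    return adj / col_sums

def rwr_exact(W, q, c=0.9):
    """Solve r = c W r + (1 - c) e for query node q via a linear system."""
    n = W.shape[0]
    e = np.zeros(n)
    e[q] = 1.0  # starting vector: all mass on the query node
    r = (1 - c) * np.linalg.solve(np.eye(n) - c * W, e)
    return r, e

# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = column_normalize(adj)
c = 0.9
r, e = rwr_exact(W, q=0, c=c)
```

Because every column of `W` sums to 1, the scores in `r` form a probability distribution over the nodes.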
OnTheFly:
• No pre-computation, light storage
• Slow on-line response: O(mE), for m iterations over E edges
[Figure: example 12-node graph with the relevance scores computed at query time]
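The on-the-fly approach amounts to iterating the RWR update until it stops changing. This is a sketch under the same assumptions (column-normalized transition matrix, restart probability 1 - c); the function name, the tolerance, and the 5-node ring are hypothetical choices for illustration. With a sparse matrix each iteration would touch every edge once, which is where the O(mE) cost comes from; the dense arrays here are only for brevity.

```python
import numpy as np

def rwr_on_the_fly(W, q, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c W r + (1 - c) e until the L1 change drops below tol."""
    n = W.shape[0]
    e = np.zeros(n)
    e[q] = 1.0
    r = e.copy()
    for iteration in range(1, max_iter + 1):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next, iteration
        r = r_next
    return r, max_iter

# Hypothetical toy graph: a 5-node ring.
n = 5
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
W = adj / adj.sum(axis=0)  # every node has degree 2, so no empty columns

r, iterations = rwr_on_the_fly(W, q=0)
```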
PreCompute [Haveliwala]
[Figure: example 12-node graph with its relevance scores, next to the pre-computed score matrix R]
PreCompute:
• Fast on-line response
• Heavy pre-computation/storage cost: O(n^3) time, O(n^2) storage
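The trade-off can be made concrete with a small sketch: invert the full n x n matrix once, then answer each query with a column lookup. It assumes the restart probability is 1 - c and a column-normalized transition matrix; `precompute_all` and the 4-node star graph are hypothetical.

```python
import numpy as np

def precompute_all(W, c=0.9):
    """Return R = (1 - c) (I - c W)^-1; column q holds the scores for query q."""
    n = W.shape[0]
    return (1 - c) * np.linalg.inv(np.eye(n) - c * W)  # O(n^3) time, O(n^2) storage

# Hypothetical toy graph: a star with center 0 and leaves 1, 2, 3.
adj = np.array([[0, 1, 1, 1],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0]], dtype=float)
W = adj / adj.sum(axis=0)
c = 0.9

R = precompute_all(W, c)
r_query_2 = R[:, 2]  # on-line stage: a lookup, no iteration
```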
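Sherman–Morrison–Woodbury Lemma says: $(A + UCV)^{-1} = A^{-1} - A^{-1}U\,(C^{-1} + VA^{-1}U)^{-1}\,VA^{-1}$
• Can we write the matrix to be inverted for RWR, $I - c\tilde{W}$ (transition matrix $\tilde{W}$, restart probability $1 - c$), as $(A + UCV)$
• for which
• P1: A is easy to invert, and
• P2: C is small?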
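The lemma itself is easy to verify numerically. The sketch below uses a diagonal A (trivially invertible) and a 2 x 2 matrix C, so the only real inversions are 2 x 2; the sizes, the random seed, and the name `woodbury_inverse` are arbitrary illustrative choices.

```python
import numpy as np

def woodbury_inverse(A_inv, U, C, V):
    """Return (A + U C V)^-1 given A^-1, inverting only k x k matrices."""
    small = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)
    return A_inv - A_inv @ U @ small @ V @ A_inv

rng = np.random.default_rng(0)
n, k = 6, 2
a_diag = rng.uniform(1.0, 2.0, size=n)
A = np.diag(a_diag)                 # P1: easy to invert
A_inv = np.diag(1.0 / a_diag)
U = 0.1 * rng.standard_normal((n, k))
C = np.eye(k)                       # P2: small (k x k)
V = 0.1 * rng.standard_normal((k, n))

fast = woodbury_inverse(A_inv, U, C, V)
direct = np.linalg.inv(A + U @ C @ V)
```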
Intuition
• Within-partition links
• Cross-partition links
[Figure: example 12-node graph, shown twice to separate its within-partition links from its cross-partition links]
P1: block-diagonal
[Figure: the within-partition links of the example graph form a block-diagonal matrix]
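P2: Low-rank approximation for the cross-partition matrix: $\tilde{W}_2 \approx USV$, with $|S| \ll |\tilde{W}_2|$
[Figure: the cross-partition links of the example graph and their low-rank factors]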
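One way to obtain such a factorization is a truncated SVD of the cross-partition part. That choice is an assumption made for this sketch, and the slides do not say which low-rank method is used. The graph (two triangles joined by one edge), the partition labels, and the helper names are hypothetical.

```python
import numpy as np

def split_by_partition(W, labels):
    """Split W into within-partition (W1) and cross-partition (W2) parts."""
    same = labels[:, None] == labels[None, :]
    return W * same, W * ~same

def low_rank(W2, t):
    """Rank-t truncated SVD: W2 is approximated by U @ S @ V, with S of size t x t."""
    U, s, Vt = np.linalg.svd(W2)
    return U[:, :t], np.diag(s[:t]), Vt[:t, :]

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
W = adj / adj.sum(axis=0)
labels = np.array([0, 0, 0, 1, 1, 1])

W1, W2 = split_by_partition(W, labels)
U, S, V = low_rank(W2, t=2)  # W2 has rank 2 here, so t=2 loses nothing
```

On this toy graph a 2 x 2 matrix S stands in for the 6 x 6 matrix W2; on real graphs the truncation is lossy, which is where the approximation in the method comes from.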
$\tilde{W} = \tilde{W}_1 + \tilde{W}_2$ (within-partition links plus cross-partition links)
Is A (the block-diagonal part) easily invertible? YES! A few small, instead of ONE BIG, matrix inversions
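Inverting a block-diagonal matrix block by block can be sketched as follows. The two 3 x 3 blocks, the value of c, and the function name `invert_block_diagonal` are illustrative assumptions.

```python
import numpy as np

def invert_block_diagonal(A, blocks):
    """Invert a block-diagonal matrix one block at a time.

    blocks is a list of index lists, one per diagonal block.
    """
    A_inv = np.zeros_like(A, dtype=float)
    for idx in blocks:
        sub = np.ix_(idx, idx)
        A_inv[sub] = np.linalg.inv(A[sub])  # small inversion
    return A_inv

# Hypothetical within-partition matrix: two column-normalized triangles.
tri = (np.ones((3, 3)) - np.eye(3)) / 2.0
W1 = np.zeros((6, 6))
W1[:3, :3] = tri
W1[3:, 3:] = tri

c = 0.9
A = np.eye(6) - c * W1
A_inv = invert_block_diagonal(A, blocks=[[0, 1, 2], [3, 4, 5]])
```

For k blocks of roughly equal size, this replaces one inversion of an n x n matrix with k inversions of (n/k) x (n/k) matrices.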
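Back to SM Lemma
[Figure: the inverse of the full matrix is approximated by the inverse of the block-diagonal part plus a low-rank correction term]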
On-Line Stage: a handful of matrix-vector multiplications
[Figure: the query vector is multiplied by the pre-computed matrices to produce the result]
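Putting the pieces together, the two stages might look like the sketch below. It assumes a truncated SVD for the low-rank step, a restart probability of 1 - c, and a toy graph of two triangles joined by one edge; the names `precompute`, `query`, `Q1_inv`, and `Lam` are hypothetical. For brevity the block-diagonal matrix is inverted in one call rather than block by block.

```python
import numpy as np

def precompute(W, labels, c=0.9, t=2):
    """Off-line stage: invert the block-diagonal part, factor the rest."""
    same = labels[:, None] == labels[None, :]
    W1, W2 = W * same, W * ~same
    Q1_inv = np.linalg.inv(np.eye(len(W)) - c * W1)
    U, s, Vt = np.linalg.svd(W2)
    U, S, V = U[:, :t], np.diag(s[:t]), Vt[:t, :]  # t must not exceed rank(W2)
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)  # only t x t
    return Q1_inv, U, Lam, V

def query(q, Q1_inv, U, Lam, V, c=0.9):
    """On-line stage: a handful of matrix-vector products."""
    e = np.zeros(Q1_inv.shape[0])
    e[q] = 1.0
    base = Q1_inv @ e
    correction = Q1_inv @ (U @ (Lam @ (V @ base)))
    return (1 - c) * (base + c * correction)

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
W = adj / adj.sum(axis=0)
labels = np.array([0, 0, 0, 1, 1, 1])
c = 0.9

pre = precompute(W, labels, c=c, t=2)
r = query(0, *pre, c=c)
```

Here the cross-partition part has rank 2, so the result matches the exact solution; with a lossy truncation the scores would only be approximate.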
Query Time vs. Pre-Storage
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-storage: 3 orders of magnitude saving
• Dataset: DBLP/authorship, 315k nodes, 1,800k edges
[Figure: log query time vs. log storage]