Neighborhood Formation and Anomaly Detection in Bipartite Graphs

Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos. Speaker: Jimeng Sun.

Presentation Transcript


  1. Neighborhood Formation and Anomaly Detection in Bipartite Graphs • Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos • Speaker: Jimeng Sun

  2. Bipartite Graphs • G={V1 +V2, E} such that edges are between V1 and V2 • Many applications can be modeled using bipartite graphs • The key is to utilize these links across two natural groups for data mining

  3. Problem Definition • Neighborhood formation (NF) • Given a query node a in V1, what are the relevance scores of all the nodes in V1 to a? • Anomaly detection (AD) • Given a query node a in V1, what are the normality scores for nodes in V2 that link to a? [Figure: bipartite graph with node sets V1 and V2, query node a, and example scores on the nodes]

  4. Application I: Publication network • Authors vs. papers in research communities • Interesting queries: • Which authors are most related to Dr. Carman? • Which is the most unusual paper written by Dr. Carman?

  5. Application II: P2P network • Users vs. files in P2P systems • Interesting queries: • Find the users with similar preferences to me • Locate files that are downloaded by users with very different preferences [Figure: users vs. files]

  6. Application III: Financial Trading • Traders vs. stocks in stock markets • Interesting queries: • Which are the most similar stocks to company A? • Find most unusual traders (i.e., cross sectors)

  7. Application IV: Collaborative filtering • Collaborative filtering • Recommendation systems [Figure: customers vs. products]

  8. Outline • Problem Definition • Motivation • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  10. Neighborhood formation – intuition Input: a graph G and a query node q Output: relevance scores to q • run a random walk with restart from q in V1 • record the probability of visiting each node in V1 • the nodes with the highest probabilities are the neighbors of q [Figure: bipartite graph (V1, V2) with query node q and example visiting probabilities on the V1 nodes]

  11. Exact neighborhood formation Input: a graph G and a query node q Output: relevance scores to q • Construct the transition matrix P where • every node in the graph becomes a state • every state has a restart probability c to jump back to the query node q • with the remaining probability (1-c), the walk follows the transition probabilities along the edges • Find the steady-state probability u, which gives the relevance scores of all the nodes to q [Figure: graph in which every node has a restart edge labeled c back to q]
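The exact procedure on this slide can be sketched as a power iteration, u ← (1-c)·P·u + c·e_q, repeated until u stops changing. This is a minimal illustration, not the authors' implementation: the function name `rwr_scores`, the default `c=0.15`, the uniform choice among a node's neighbors, and the toy author/paper edges are all assumptions made for the example.

```python
from collections import defaultdict


def rwr_scores(edges, query, c=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected bipartite graph.

    edges: iterable of (v1_node, v2_node) pairs; node names are assumed
    to be unique across V1 and V2 (an assumption of this sketch).
    Returns {node: steady-state visiting probability}.
    """
    nbrs = defaultdict(list)
    for a, t in edges:
        nbrs[a].append(t)
        nbrs[t].append(a)
    nbrs = dict(nbrs)
    if query not in nbrs:
        raise ValueError("query node has no edges")

    # Start with all probability mass on the query node.
    u = {v: 0.0 for v in nbrs}
    u[query] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nbrs}
        nxt[query] = c  # restart mass collected from every state
        for v, p in u.items():
            # With probability (1-c), move to a uniformly chosen neighbor.
            share = (1.0 - c) * p / len(nbrs[v])
            for w in nbrs[v]:
                nxt[w] += share
        delta = sum(abs(nxt[v] - u[v]) for v in nbrs)
        u = nxt
        if delta < tol:
            break
    return u


# Toy publication network (illustrative data): authors a*, papers p*.
edges = [("a1", "p1"), ("a2", "p1"), ("a1", "p2"),
         ("a2", "p2"), ("a2", "p3"), ("a3", "p3")]
scores = rwr_scores(edges, "a1")
# Relevance scores to a1, restricted to the V1 side (authors).
v1_scores = {v: s for v, s in scores.items() if v.startswith("a")}
```

In this toy graph a2 shares two papers with a1, while a3 is reachable only through a2, so a2 ends up with the higher relevance score.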

  12. Approximate neighborhood formation • Scalability problem with exact neighborhood formation: • too expensive to run for every single node in V1 • Observation: • Nodes that are far away from q have relevance scores of almost 0. • Idea: • Partition the graph and apply neighborhood formation only to the partition containing q.
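The partitioning idea can be sketched as a filtering step that runs before neighborhood formation. The sketch assumes the partition assignment `part_of` has already been produced by some external graph partitioner; the function name `partition_subgraph` and the toy data are hypothetical.

```python
def partition_subgraph(edges, part_of, query):
    """Keep only edges whose endpoints both lie in the query's partition.

    part_of: {node: partition_id}, assumed to come from an external
    graph partitioner (hypothetical input for this sketch).
    """
    target = part_of[query]
    return [(a, t) for a, t in edges
            if part_of[a] == target and part_of[t] == target]


# Toy graph (illustrative data): two loosely connected groups.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
         ("a3", "p3"), ("a4", "p3"), ("a2", "p3")]
part_of = {"a1": 0, "a2": 0, "p1": 0, "p2": 0,
           "a3": 1, "a4": 1, "p3": 1}
local_edges = partition_subgraph(edges, part_of, "a1")
```

Exact neighborhood formation would then be run on `local_edges` only. Nodes outside the partition are treated as having relevance 0, which is the approximation the slide's observation justifies.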

  13. Outline • Problem Definition • Motivation • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  14. Anomaly detection - intuition • t in V2 is normal if all a in V1 that link to t belong to the same neighborhood [Figure: two example nodes t, one with high normality and one with low normality]

  15. Anomaly detection - method Input: a query node q from V2 Output: the normality score of q • Find the set S of nodes connected to q • Compute the relevance scores of the elements in S, denoted rs • Apply a score function f(rs) to obtain the normality score, e.g. f(rs) = mean(rs) [Figure: query node q and its connected set S]
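The three steps above can be sketched as follows. The sketch assumes that rs collects the relevance score of every ordered pair of distinct nodes in S, and that relevance is supplied as a callable `relevance(a, b)`; both are assumptions of this example, as are the function name and the made-up relevance table.

```python
from itertools import permutations
from statistics import mean


def normality_score(edges, t, relevance, f=mean):
    """Normality of a V2 node t from the relevance among its V1 neighbors.

    relevance(a, b): relevance score of b with respect to query a
    (hypothetical callable, e.g. backed by neighborhood formation).
    Returns None when t has fewer than two neighbors.
    """
    S = sorted({a for a, node in edges if node == t})
    rs = [relevance(a, b) for a, b in permutations(S, 2)]
    return f(rs) if rs else None


# Illustrative data: a1 and a2 are close, a3 is far from a1.
edges = [("a1", "p1"), ("a2", "p1"), ("a1", "p2"), ("a3", "p2")]
table = {("a1", "a2"): 0.3, ("a2", "a1"): 0.25,
         ("a1", "a3"): 0.01, ("a3", "a1"): 0.002}


def lookup(a, b):
    return table.get((a, b), 0.0)


normal_p1 = normality_score(edges, "p1", lookup)
normal_p2 = normality_score(edges, "p2", lookup)
```

Here p1 links two mutually relevant authors and scores higher than p2, which links authors from different neighborhoods. Passing a different `f` (for example `min`) changes how strictly a single unrelated neighbor is penalized.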

  16. Outline • Problem Definition • Motivation • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  17. Datasets

  18. Goals [Q1]: Do the neighborhoods make sense? (NF) [Q2]: How accurate is the approximate NF? [Q3]: Do the anomalies make sense? (AD) [Q4]: What about the computational cost?

  19. [Q1] Exact NF • The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node. • The relevance scores can quantify how close/related a node is to the query node. [Figure: two plots, Robert DeNiro (IMDB) and ICDM (CA); x-axis: most relevant neighbors, y-axis: relevance score]

  20. [Q2] Approximate NF • Precision = fraction of overlap between ApprNF and NF among the top k neighbors • The precision drops slowly as the number of partitions increases • The precision remains high for a wide range of neighborhood sizes [Figure: precision vs. # of partitions (neighborhood size = 20) and precision vs. neighborhood size (num of partitions = 10)]
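The precision measure defined on this slide can be computed as the overlap between two top-k sets. This is a small sketch: the function name `precision_at_k` and the score values are illustrative, not taken from the experiments.

```python
def precision_at_k(exact, approx, k):
    """Fraction of the exact top-k neighbors that the approximation also
    ranks in its top k. Both inputs map node -> relevance score."""
    def top_k(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    return len(top_k(exact) & top_k(approx)) / k


# Illustrative scores for one query node.
exact = {"a2": 0.3, "a3": 0.2, "a4": 0.05, "a5": 0.01}
approx = {"a2": 0.31, "a4": 0.06, "a3": 0.0, "a5": 0.0}
p_at_2 = precision_at_k(exact, approx, 2)
```

With k = 2, the exact top set is {a2, a3} and the approximate one is {a2, a4}, so one of the two neighbors overlaps.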

  21. [Q3] Anomaly detection • Randomly inject some nodes and edges (biased towards high-degree nodes) • The genuine nodes on average have higher normality scores than the injected ones [Figure: normality scores of genuine vs. injected nodes]

  22. [Q4] Computational cost • Even with a small number of partitions, the computational cost can be reduced dramatically. [Figure: approximate NF, time (sec) vs. # of partitions]

  23. Related Work • Random walk [Brin & Page98] [Haveliwala WWW02] • Graph partitioning [Karypis and Kumar98] [Kannan et al. FOCS00] • Collaborative filtering [Shardanand&Maes95] … • Anomaly detection [Aggarwal&Yu SIGMOD01] [Noble&Cook KDD03] [Newman03]

  24. Conclusion • Two important queries on bipartite graphs: NF and AD • An efficient method for NF using random walk with restart and graph partitioning techniques • Based on the results of NF, we can also spot anomalies (AD) • Effectiveness is confirmed on real datasets

  25. Future work and Q & A • Future work • What about time-evolving graphs? • Contact: Jimeng Sun jimeng@cs.cmu.edu http://www.cs.cmu.edu/~jimeng
