
Neighborhood Formation and Anomaly Detection in Bipartite Graphs

Neighborhood Formation and Anomaly Detection in Bipartite Graphs. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos. Presented by Bhavana Dalvi. Outline: Motivation, Problem Definition, Neighborhood formation, Anomaly detection, Experiments, Related work.


Presentation Transcript


  1. Neighborhood Formation and Anomaly Detection in Bipartite Graphs • Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos • Presented by Bhavana Dalvi

  2. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  3. Bipartite graphs and interesting questions • Author-paper graph: authors on one side, papers on the other • [Figure: author-paper bipartite graph with query author ‘a’]

  4. Bipartite graphs and interesting questions • Author-paper graph • Which authors are most related to ‘a’? • [Figure: author-paper bipartite graph with query author ‘a’]

  5. Bipartite graphs and interesting questions • Author-paper graph • Which authors are most related to ‘a’? • [Figure: author-paper bipartite graph with query author ‘a’]

  6. Bipartite graphs and interesting questions • Author-paper graph • Which authors are most related to ‘a’? • [Figure: author ‘b’ labeled with score 0.8]

  7. Bipartite graphs and interesting questions • Author-paper graph • Which authors are most related to ‘a’? • [Figure: authors labeled with scores 0.4, 0.8, 0.6 and 0.2; author ‘b’ marked]

  8. Bipartite graphs and interesting questions • Author-paper graph • Which is the uncommon paper written by ‘a’? • [Figure: scores 0.4, 0.8, 0.6 and 0.2]

  9. Bipartite graphs and interesting questions • Author-paper graph • Which is the uncommon paper written by ‘a’? • [Figure: scores 0.4, 0.8, 0.6 and 0.2]

  10. Bipartite graphs and interesting questions • P2P network: users on one side, files on the other • Which users have preferences similar to those of a particular user? • Which files are downloaded by users with very different preferences? (Source: Jimeng Sun’s presentation at ICDM 2005)

  11. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  12. Problem definition • Bipartite graph with node sets V1 and V2 and edge set E • Neighborhood formation (NF) • Input: query node q in V1 • Output: relevance scores of all the nodes in V1 to q • Anomaly detection (AD) • Input: query node q in V1 • Output: normality scores for nodes in V2 that link to q

  13. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  14. Neighborhood formation • Relevance(b, q) ~ (number of short paths from q to b) • [Figure: two ways of connecting q and b] • A connection that links only b and q contributes more relevance than a connection that links b, q and other nodes.

  15. Exact NF algorithm: random walk with restart • [Figure: nodes with restart probability c back to q] • Input: a graph G and a query node q • Output: relevance scores to q • Construct the transition matrix, where • every node in the graph becomes a state • every state has a restart probability c to jump back to the query node q • transition probability • Find the steady-state probability u, which is the relevance score of all the nodes to q (Source: Jimeng Sun’s presentation at ICDM 2005)
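The random walk with restart on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `exact_nf`, the list-of-lists weight matrix `M` (rows are V1 nodes, columns are V2 nodes), the default restart probability `c=0.15`, and the stopping rule are all assumptions made for the example.

```python
def exact_nf(M, a, c=0.15, tol=1e-12, max_iter=1000):
    """Random walk with restart on a bipartite graph (illustrative sketch).

    M: k x n list of lists of non-negative edge weights (rows: V1, columns: V2).
    a: row index of the query node in V1.
    Returns (relevance of V1 nodes, relevance of V2 nodes) to the query node.
    """
    k, n = len(M), len(M[0])
    row_sum = [sum(row) for row in M]                              # weighted degree of V1 nodes
    col_sum = [sum(M[i][j] for i in range(k)) for j in range(n)]   # weighted degree of V2 nodes
    u1 = [1.0 if i == a else 0.0 for i in range(k)]                # the walk starts at the query node
    u2 = [0.0] * n
    for _ in range(max_iter):
        # With probability (1 - c) the walker follows an edge: V2 -> V1 and V1 -> V2.
        new1 = [(1 - c) * sum(M[i][j] / col_sum[j] * u2[j]
                              for j in range(n) if col_sum[j] > 0)
                for i in range(k)]
        new2 = [(1 - c) * sum(M[i][j] / row_sum[i] * u1[i]
                              for i in range(k) if row_sum[i] > 0)
                for j in range(n)]
        # With probability c the walker jumps back to the query node.
        new1[a] += c
        diff = sum(abs(x - y) for x, y in zip(new1 + new2, u1 + u2))
        u1, u2 = new1, new2
        if diff < tol:  # stop once the probabilities no longer change
            break
    return u1, u2
```

Calling `exact_nf(M, a)` returns the steady-state probabilities split by side; sorting the first list in descending order gives the neighborhood of `a` within V1.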

  16. Finding Steady State Probabilities • |V1| = k, |V2| = n • M: k*n matrix representing the weighted graph G • MA: (k+n)*(k+n) adjacency matrix built from M • PA = col_norm(MA) • qA: the query node ‘a’ as a (k+n)*1 vector in which only the a-th entry is 1 and the rest are 0 • uA: steady-state probability vector with restart probability c • Bipartite structure: if k << n, the savings are significant
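The equations on this slide did not survive in the transcript. The block below is a reconstruction from the definitions given on the slide and the standard random-walk-with-restart fixed point, not a copy of the original slide. The symbols P_1, P_2, u_1, u_2 and e_a are introduced here for illustration only.

```latex
% Adjacency and transition matrices of the bipartite graph (V1 nodes first, then V2 nodes)
M_A = \begin{pmatrix} 0 & M \\ M^{T} & 0 \end{pmatrix},
\qquad
P_A = \mathrm{col\_norm}(M_A) = \begin{pmatrix} 0 & P_1 \\ P_2 & 0 \end{pmatrix},
\quad P_1 = \mathrm{col\_norm}(M),\; P_2 = \mathrm{col\_norm}(M^{T})

% Steady state with restart probability c
u_A = (1 - c)\, P_A\, u_A + c\, q_A

% Splitting u_A = (u_1; u_2) and q_A = (e_a; 0) into their V1 (size k) and V2 (size n) parts
u_1 = (1 - c)\, P_1\, u_2 + c\, e_a, \qquad u_2 = (1 - c)\, P_2\, u_1

% Substituting u_2 leaves a k-by-k system
u_1 = c \left( I_k - (1 - c)^2 P_1 P_2 \right)^{-1} e_a
```

Under this reading, only a k-by-k system has to be solved for the V1 scores, which is one way to see why the savings are significant when k << n.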

  17. Extensions to NF Algorithm • Parallel NF • If there are multiple queries, the computation can be done in parallel. • Approximate NF • Cluster the nodes into k partitions (preprocessing) • Given a query node q, find the partition Gi it belongs to • Run the Exact NF algorithm only on Gi • Set relevance = 0 for nodes not in Gi
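The approximate NF steps can be sketched as follows. The partitioning itself is assumed to be done beforehand (for example by a graph partitioner) and is passed in as the hypothetical label lists `part1` and `part2`. The helper `rwr`, the function name `approx_nf`, and the fixed iteration count are likewise assumptions for illustration.

```python
def rwr(M, a, c=0.15, iters=200):
    """Random walk with restart from V1 node a; returns relevance of all V1 nodes."""
    k, n = len(M), len(M[0])
    row_sum = [sum(row) for row in M]
    col_sum = [sum(M[i][j] for i in range(k)) for j in range(n)]
    u1 = [1.0 if i == a else 0.0 for i in range(k)]
    u2 = [0.0] * n
    for _ in range(iters):
        new1 = [(1 - c) * sum(M[i][j] / col_sum[j] * u2[j]
                              for j in range(n) if col_sum[j] > 0)
                for i in range(k)]
        new2 = [(1 - c) * sum(M[i][j] / row_sum[i] * u1[i]
                              for i in range(k) if row_sum[i] > 0)
                for j in range(n)]
        new1[a] += c  # restart at the query node
        u1, u2 = new1, new2
    return u1


def approx_nf(M, part1, part2, a, c=0.15):
    """Approximate NF: run the walk only inside the partition that contains a.

    part1[i] / part2[j] are precomputed partition ids of V1 node i / V2 node j.
    """
    g = part1[a]                                              # partition Gi of the query node
    rows = [i for i in range(len(M)) if part1[i] == g]        # V1 nodes in Gi
    cols = [j for j in range(len(M[0])) if part2[j] == g]     # V2 nodes in Gi
    sub = [[M[i][j] for j in cols] for i in rows]             # subgraph induced by Gi
    local = rwr(sub, rows.index(a), c)                        # exact NF on Gi only
    scores = [0.0] * len(M)                                   # relevance 0 outside Gi
    for pos, i in enumerate(rows):
        scores[i] = local[pos]
    return scores
```

Edges that cross partition boundaries are ignored by this sketch, which is where the approximation error comes from.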

  18. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  19. Anomaly Detection • A node x in V2 is normal if the nodes in V1 that link to x are in the same neighbourhood • [Figure: two examples of a node x in V2, one with high normality and one with low normality]

  20. Anomaly Detection Algorithm • Input: node t in V2, bipartite transition matrix P • Output: normality score ns(t) • Set St = neighbours of t in V1 • RSt: pairwise relevance scores for the nodes in St • Normality score ns(t) = function(RSt), e.g. the mean over the non-diagonal elements of RSt
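A minimal sketch of the normality score, using the mean over non-diagonal pairwise relevance scores that the slide gives as an example. The helper `rwr_v1`, the weight-matrix input `M` (in place of the transition matrix P), and the choice to return `None` for nodes with fewer than two neighbours are assumptions made here, not part of the slide.

```python
def rwr_v1(M, q, c=0.15, iters=200):
    """Relevance of every V1 node to V1 node q via random walk with restart."""
    k, n = len(M), len(M[0])
    row_sum = [sum(row) for row in M]
    col_sum = [sum(M[i][j] for i in range(k)) for j in range(n)]
    u1 = [1.0 if i == q else 0.0 for i in range(k)]
    u2 = [0.0] * n
    for _ in range(iters):
        new1 = [(1 - c) * sum(M[i][j] / col_sum[j] * u2[j]
                              for j in range(n) if col_sum[j] > 0)
                for i in range(k)]
        new2 = [(1 - c) * sum(M[i][j] / row_sum[i] * u1[i]
                              for i in range(k) if row_sum[i] > 0)
                for j in range(n)]
        new1[q] += c  # restart at the query node
        u1, u2 = new1, new2
    return u1


def normality_score(M, t, c=0.15):
    """Mean pairwise relevance among the V1 neighbours of V2 node t."""
    S_t = [i for i in range(len(M)) if M[i][t] > 0]   # neighbours of t in V1
    if len(S_t) < 2:
        return None  # assumption: the score is undefined without at least one pair
    total = 0.0
    for x in S_t:
        rel = rwr_v1(M, x, c)                         # one row of RS_t
        total += sum(rel[y] for y in S_t if y != x)   # skip the diagonal element
    return total / (len(S_t) * (len(S_t) - 1))
```

A low score suggests that t connects V1 nodes from different neighbourhoods, which is the notion of an anomaly used in the talk.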

  21. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  22. Datasets

  23. Do the neighborhoods make sense? • [Figure: plots of relevance score (y-axis) against the most relevant neighbors (x-axis)] • The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node.

  24. How accurate is the approximate NF? • [Figure: two precision plots, one with neighborhood size = 20 and one with number of partitions = 10] • Precision = fraction of overlap between ApprNF and NF among the top k neighbors • The precision drops slowly as the number of partitions increases • The precision remains high for a wide range of neighborhood sizes
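The precision measure defined above can be computed as in the sketch below. Excluding the query node from the ranking and breaking ties by node index are assumptions of this example; the function names are hypothetical.

```python
def top_k(scores, query, k):
    """Indices of the k highest-scoring nodes, excluding the query node itself."""
    candidates = [i for i in range(len(scores)) if i != query]
    candidates.sort(key=lambda i: -scores[i])  # stable sort: ties keep index order
    return candidates[:k]


def precision_at_k(exact_scores, approx_scores, query, k):
    """Fraction of the exact top-k neighbours that the approximation also returns."""
    exact_top = set(top_k(exact_scores, query, k))
    approx_top = set(top_k(approx_scores, query, k))
    return len(exact_top & approx_top) / len(exact_top)
```

Here `exact_scores` and `approx_scores` stand for the relevance scores produced by the exact and the approximate NF for the same query node.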

  25. Do the anomalies make sense? • [Figure: avg. normality score] • Injection: • Inject 100 nodes into V2, each connecting to k nodes in V1, where k = avg. degree of nodes in V2 • The nodes in V1 are randomly picked such that degree = 10 * avg. degree of nodes in V1 • Assumption: the injected nodes will induce connections across neighbourhoods

  26. What about the computational cost? • The computational cost drops significantly even with a small increase in the number of partitions

  27. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  28. Related Work • Random walk on graphs • PageRank [ISDN 1998] • Topic-Sensitive PageRank [WWW 2002] • Outlier detection • Outlier detection in high dimensional data: Aggarwal and Yu [SIGMOD 2001] • Outlier Detection Using Random Walks [ICTAI 2006] • Find outlier clusters • Graph partitioning • METIS package • Spectral clustering methods • Neighbourhoods can become personalized clusters

  29. Outline • Motivation • Problem Definition • Neighborhood formation • Anomaly detection • Experiments • Related work • Conclusion and future work

  30. Conclusions and Future Work • Solution to two problems for Bipartite Graphs • Neighborhood Formation (NF) • Anomaly Detection (AD) • Random walk with restart along with graph partitioning can be used to solve NF efficiently. • AD can be done based on relevance scores generated by NF • Experiments on real datasets show good results. • Proximity Tracking on Time-Evolving Graphs (SIAM 2008 paper) • Defines proximity scores in dynamic setting. • Efficient incremental updates

  31. Thank you
