Analyzing Probabilistic Graphs

Presentation Transcript


  1. Analyzing Probabilistic Graphs Michalis Potamias

  2. The probabilistic view of graphs: information propagation, protein-protein interaction networks, mobile ad hoc networks. Michalis Potamias : Analyzing Probabilistic Graphs

  3. Research Approach
     Data (social, biological, mobile ad hoc …) with uncertainty; graph analysis tasks: nearest neighbors, clustering, learning.
     Define tasks, design algorithms, and ask: Practical? Useful?

  4. Outline
     • Distance and k-nearest neighbors
         • Distance definition
         • Sampling
         • kNN pruning
         • Predicting known relationships from PPI networks
     • Clustering
         • Edit distance and cluster graphs
         • Clustering probabilistic graphs
     • Learning in information propagation
         • The problem
         • Anecdotes
     • Ongoing and future work
         • Information propagation
         • Random walks
         • Daily deals

  6. Distance and Nearest Neighbors
     How do we define a distance function in probabilistic graphs?
     • Reliability (Valiant, SIAM J. Comput. 1979) (Asthana et al., Genome Research 2004)
     • Probability of the most probable path (Sevon et al., DILS 2006)
     • Based on the shortest-path PDF (Potamias et al., VLDB 2010)
     • PPI nearest neighbors: filter candidate interactions.

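  7. Possible worlds
     Each edge exists independently with its probability; a possible world is one deterministic graph drawn this way, so a graph with |E| edges has 2^|E| possible worlds.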

  8. Distance Functions based on the PDF
     [Figure: the graph, one possible world w, and the PDF of the shortest-path distance between B and D: P(1) = .44, P(2) = .3, P(∞) = .26.]
     • Goal: compute the PDF of the shortest-path distance between B and D.
     • Find the shortest-path distance between B and D in each world; the probability of distance d is the total probability of the worlds in which the distance equals d (Frank, OR 1969).

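  9. Distance Functions based on the PDF
     A distance function must return a scalar, so we summarize the shortest-path PDF (B-D example: P(1) = .44, P(2) = .3, P(∞) = .26):
     • Expected distance: suffers from the infinity problem, since it is infinite whenever the two nodes are disconnected with nonzero probability.
     • Median distance.
     • Mode (majority) distance.
     • Expected-Reliable distance: the expected distance over the worlds in which the two nodes are connected.
     Exact computation is hard (Valiant, SIAM J. Comput. 1979).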
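A sketch of the four summaries in Python, applied to the B-D PDF from the slide. The functions take an already-computed PDF (a dict from distance to probability, with `math.inf` for "disconnected"); the median convention used here, the smallest d whose cumulative probability reaches 1/2, is an assumption.

```python
import math

def expected_distance(pdf):
    """Mean of the PDF; infinite whenever P(d = inf) > 0 (the infinity problem)."""
    return sum(d * p for d, p in pdf.items() if p > 0)

def median_distance(pdf):
    """Smallest d whose cumulative probability reaches 1/2 (inf sorts last)."""
    cumulative = 0.0
    for d in sorted(pdf):
        cumulative += pdf[d]
        if cumulative >= 0.5:
            return d
    return math.inf

def majority_distance(pdf):
    """Mode of the PDF: the single most probable distance."""
    return max(pdf, key=pdf.get)

def expected_reliable_distance(pdf):
    """Expected distance restricted to the worlds where the nodes are connected."""
    finite = {d: p for d, p in pdf.items() if d != math.inf}
    mass = sum(finite.values())
    if mass == 0:
        return math.inf
    return sum(d * p for d, p in finite.items()) / mass

# PDF of the B-D distance from the slide: P(1) = .44, P(2) = .3, P(inf) = .26
pdf = {1: 0.44, 2: 0.3, math.inf: 0.26}
```

On this PDF the median is 2, the majority distance is 1, the expected distance is infinite, and the expected-reliable distance is 1.04 / 0.74, about 1.41.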

  10. Computing the functions: Sampling
     Algorithm:
     • Sample M worlds.
     • In each world w, perform a Dijkstra traversal to compute the shortest-path distance.
     • Compute the sample median distance.
     In practice, a small number of worlds yields a good approximation.
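A minimal Python sketch of the sampling estimator. It assumes undirected edges given as hypothetical `(u, v, p, w)` tuples (existence probability `p`, length `w`) that appear independently; each sampled world is materialized in full and searched with Dijkstra from the source. The returned value is the lower sample median, i.e. the smallest distance reached in at least half of the worlds.

```python
import heapq
import math
import random

def sample_world(edges, rng):
    """Keep each undirected edge (u, v, p, w) independently with probability p."""
    adj = {}
    for u, v, p, w in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    """Shortest-path distances from source in one world (absent = unreachable)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def sample_median_distance(edges, s, t, num_worlds=200, seed=0):
    """Estimate the median s-t shortest-path distance from num_worlds samples."""
    rng = random.Random(seed)
    samples = sorted(dijkstra(sample_world(edges, rng), s).get(t, math.inf)
                     for _ in range(num_worlds))
    # Lower sample median: smallest value reached by at least half the worlds.
    return samples[(num_worlds - 1) // 2]
```

The default of 200 worlds mirrors the sample size reported for the kNN experiments on slide 32; it is not a tuned recommendation.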

  11. K-Nearest Neighbor Query
     Example kNN query: given a probabilistic PPI network and a source protein, find the set of k proteins closest to the source.
     Algorithm:
     • Sample M worlds.
     • In each world, perform a Dijkstra traversal.
     • Approximate the median shortest-path distance from the source to all nodes in the graph.
     • Process the kNN query.
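The non-pruned query can be sketched in Python as follows. Assumptions: unweighted undirected edges given as hypothetical `(u, v, p)` triples, so the per-world Dijkstra traversal reduces to breadth-first search; nodes whose median distance is infinite are left out of the answer; ties are broken by node label.

```python
import math
import random
from collections import deque

def sample_world(edges, rng):
    """Keep each undirected edge (u, v, p) independently with probability p."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    return adj

def bfs_all(adj, source):
    """Hop distances from source to every reachable node in one world."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def knn_median(edges, source, k, num_worlds=200, seed=0):
    """k nodes with the smallest (lower) median distance from source."""
    rng = random.Random(seed)
    nodes = {x for u, v, _ in edges for x in (u, v)} - {source}
    per_world = [bfs_all(sample_world(edges, rng), source) for _ in range(num_worlds)]
    medians = {}
    for n in nodes:
        ds = sorted(world.get(n, math.inf) for world in per_world)
        medians[n] = ds[(num_worlds - 1) // 2]
    # Rank by median distance, skipping nodes that are disconnected in most worlds.
    ranked = sorted((m, n) for n, m in medians.items() if m < math.inf)
    return [n for _, n in ranked[:k]]
```

Note that every world is traversed in full before any node is ranked; the pruning on the following slides avoids exactly this.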

  12-21. kNN Processing (1-NN, median distance)
     Source node: A. Sample: 5 worlds.
     [Animation over slides 12-21: a Dijkstra traversal from A in each of the 5 worlds fills in the shortest-path distance PDF from A to each other node; the per-world distances appear in the figure.]

  22. kNN Pruning (1-NN, median distance). Source node: A. Sample: 5 worlds.
     The pruning algorithm:
     • Idea: extend the horizon of each Dijkstra traversal one hop at a time and maintain truncated PDFs.
     • Node insertion into the kNN set: once the node's median distance is found.
     • Termination condition: the kNN set has size k.
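The pruning loop can be sketched in Python as below. Assumptions: unit edge lengths (so one hop of a Dijkstra traversal is one BFS level), a hypothetical adjacency-list input, and the lower-median convention (a node's median is the depth at which at least half of the worlds have reached it). Edges are sampled lazily, the first time a traversal touches them, and each world's coin flips are memoized so an undirected edge is consistent from both endpoints. The per-world `visited`, `frontier` and `edge_state` structures are the memory overhead mentioned on slide 28.

```python
import random
from collections import defaultdict

def knn_pruned(adjacency, source, k, num_worlds=200, seed=0):
    """Pruned kNN under the (lower) median hop distance.

    adjacency: dict node -> list of (neighbor, probability); undirected, so each
    edge is listed under both endpoints with the same probability.
    """
    rng = random.Random(seed)
    need = (num_worlds + 1) // 2                   # worlds needed for median <= depth
    edge_state = [{} for _ in range(num_worlds)]   # lazily sampled edges, per world
    visited = [{source} for _ in range(num_worlds)]
    frontier = [[source] for _ in range(num_worlds)]
    reached = defaultdict(int)   # truncated CDF: #worlds with distance <= depth
    result, done, depth = [], set(), 0
    while len(result) < k and any(frontier):
        depth += 1
        touched = set()
        for i in range(num_worlds):                # advance every world by one hop
            next_frontier = []
            for u in frontier[i]:
                for v, p in adjacency.get(u, ()):
                    key = frozenset((u, v))
                    if key not in edge_state[i]:   # flip each edge's coin only once
                        edge_state[i][key] = rng.random() < p
                    if edge_state[i][key] and v not in visited[i]:
                        visited[i].add(v)
                        next_frontier.append(v)
                        reached[v] += 1
                        touched.add(v)
            frontier[i] = next_frontier
        # Nodes crossing the half-of-worlds threshold now have median == depth.
        newly = sorted(v for v in touched if v not in done and reached[v] >= need)
        result.extend(newly)
        done.update(newly)
    return result[:k]
```

Nodes are emitted in nondecreasing median distance, so the loop can stop as soon as k nodes are known, without traversing the rest of any world.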

  23-27. kNN Pruning (1-NN, median distance). Source node: A. Sample: 5 worlds.
     [Animation over slides 23-27: the traversals advance one hop at a time, and the kNN set and the shortest-path distance PDF from A to the discovered nodes are updated.]

  28. kNN Pruning (1-NN, median distance). Source node: A. Sample: 5 worlds.
     • The 1-NN set is complete with B.
     • Only 2 nodes were visited.
     • Same answer as the non-pruned algorithm.
     • Overhead: the Dijkstra state must be kept in memory for all worlds.

  29-31. kNN Pruning (1-NN, median distance): animation frames repeating the final state, with B in the kNN set.

  32. kNN Pruning: Practical?
     5NN query with a sample of 200 worlds. Speedups: 247x (BIOMINE), 111x (FLICKR), 269x (DBLP).
     • BIOMINE: database of biological entities and uncertain interactions from U. Helsinki; 1M nodes, 10M edges.
     • FLICKR: users from flickr.com; edges created assuming homophily, based on the Jaccard similarity of the users' Flickr groups; 77K nodes, 20M edges.
     • DBLP: authors from DBLP; probabilities assigned based on the number of coauthored papers; 226K nodes, 1.4M edges.

  33. Useful Distance Functions
     • Dataset: probabilistic PPI network [Krogan et al., Nature 06].
     • Ground truth: known protein co-complex relationships [Mewes et al., Nuc. Acids Res. 04].
     • Experiment: choose a ground-truth pair of proteins (A,B), and a protein C such that the pair (A,C) has no ground-truth relationship.
     • Classification task: distinguish between the two types of pairs using the PPI network.
     Useful?

  34. Outline (repeated from slide 4)

  35. Clustering Probabilistic Graphs

  36. Graph Edit Distance and Cluster Graphs
     Cluster graph: a set of disjoint cliques.
     Edit distance between two graphs on the same nodes: the number of edges that must be added or removed to turn one into the other.
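Whether a graph is a cluster graph can be checked by testing that every connected component is a clique. A short Python sketch, assuming a simple undirected graph (no self-loops) given as a node list and an edge list:

```python
def is_cluster_graph(nodes, edges):
    """True if the undirected graph is a disjoint union of cliques."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        # Collect the connected component of start with an iterative DFS.
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        # In a clique, every member is adjacent to all other members.
        if any(adj[u] != comp - {u} for u in comp):
            return False
    return True
```

A path A-B-C is not a cluster graph; adding the edge A-C (or removing B-C) makes it one, each at edit distance 1.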

  37. ClusterEdit
     Parameter-free: the number of clusters is part of the output.
     • ClusterEdit [Shamir et al., Disc. Applied Math., 2004]: given a graph G, find a cluster graph C that minimizes the edit distance between G and C.
     • pClusterEdit: given a probabilistic graph G, find a cluster graph C that minimizes the expected edit distance between a world W drawn from G and C.
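By linearity of expectation, the pClusterEdit objective decomposes over node pairs: a pair placed in the same cluster costs 1 - p_uv (C has the edge, a world lacks it with that probability), and a pair split across clusters costs p_uv. A Python sketch of that objective, assuming a hypothetical input format (probabilities keyed by unordered pair, absent pairs having probability 0, and a cluster label per node); with probabilities in {0, 1} it reduces to the ClusterEdit edit distance.

```python
from itertools import combinations

def expected_edit_distance(nodes, prob, cluster_of):
    """Expected edit distance between a random world W and the cluster graph C.

    prob: dict frozenset({u, v}) -> edge probability (absent pairs count as 0).
    cluster_of: dict node -> cluster id; same id means the pair is an edge of C.
    """
    total = 0.0
    for u, v in combinations(nodes, 2):
        p = prob.get(frozenset((u, v)), 0.0)
        if cluster_of[u] == cluster_of[v]:
            total += 1.0 - p   # C has edge (u, v); W lacks it with prob 1 - p
        else:
            total += p         # C lacks edge (u, v); W has it with prob p
    return total
```

For probabilities A-B = 0.9 and B-C = 0.2, clustering {A, B}, {C} costs 0.1 + 0.2 = 0.3, while a single cluster {A, B, C} costs 0.1 + 1.0 + 0.8 = 1.9.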

  38. pClusterEdit
     Find a cluster graph C that minimizes the expected edit distance $E[D(W, C)] = \sum_{(u,v) \in C} (1 - p_{uv}) + \sum_{(u,v) \notin C} p_{uv}$.
     This is an instance of correlation clustering [Bansal et al., ML 2004], for which there is a linear-time randomized algorithm with an expected 5-approximation guarantee [Ailon et al., JACM 2008].
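The slide cites a linear-time randomized expected 5-approximation [Ailon et al., JACM 2008]. Below is a minimal Python sketch of the random-pivot idea from that line of work: repeatedly pick a random unclustered pivot and put into its cluster every unclustered node whose edge to the pivot has probability above 1/2. The 1/2 threshold, the input format, and the function name are assumptions of this sketch, which does not claim to reproduce the cited algorithm or its guarantee exactly.

```python
import random

def pivot_cluster(nodes, prob, seed=0):
    """Random-pivot clustering; prob maps frozenset({u, v}) -> edge probability."""
    rng = random.Random(seed)
    # Only pairs that are more likely present than absent attract each other.
    strong = {n: set() for n in nodes}
    for pair, p in prob.items():
        if p > 0.5:
            u, v = tuple(pair)
            strong[u].add(v)
            strong[v].add(u)
    remaining = set(nodes)
    order = list(nodes)
    rng.shuffle(order)   # scanning a random permutation = picking random pivots
    clusters = []
    for pivot in order:
        if pivot not in remaining:
            continue     # already absorbed into an earlier pivot's cluster
        cluster = {pivot} | (strong[pivot] & remaining)
        remaining -= cluster
        clusters.append(cluster)
    return clusters
```

Each node and each strong pair is handled a constant number of times, which is where the linear running time of pivot schemes comes from.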

  39. Work on Probabilistic Graphs
     • Most Probable Path [Sevon et al., DILS 2006]
     • Reliable Subgraphs [Hintsanen and Toivonen, ECML/PKDD 2008]
     • Nearest Neighbors [Potamias et al., VLDB 2010]
     • Frequent Subgraphs [Zou et al., KDD 2010]
     • Top-k Maximal Cliques [Zou et al., ICDE 2010]
     • Clustering [Potamias et al., ongoing work]

  40. Probabilistic Graphs Roadmap (same outline as slide 4)

  41. Information propagation: learning the probabilities of the edges in information propagation.
     [Figure: a network whose edge probabilities are unknown, marked "?".]

  42. The Problem
     • Observe the time series of an information item's spread in a given network. To what extent is the item endogenous (spreading through the network) versus exogenous (driven from outside it)?
     • Previous studies on inferring propagation probabilities [Rodriguez et al., KDD 2010] [Goyal et al., WSDM 2010]:
         • treat all information items as if they were identical;
         • assume that the network explains the observed spread.

  43. Endogenous Information
