Estimating PageRank on Graph Streams

Estimating PageRank on Graph Streams. Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research). PageRank determines the ranking of nodes in graphs, typically large graphs such as the WWW and social networks, and is run daily by commercial search engines.


Presentation Transcript


  1. Estimating PageRank on Graph Streams Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

  2. PageRank • Determine ranking of nodes in graphs • Typically large graphs: WWW, social networks • Run daily by commercial search engines

  3. PageRank Computation [figure: nodes a, b, c and u]

  4. PageRank Computation [figure: nodes a, b, c and u] • Our approach: no matrix-vector multiplication!

  5. Our Result • Many random walk samples, efficiently • Approximate PageRank [figure: node u]
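The link between random walk samples and PageRank can be illustrated with a small in-memory sketch. It is not the streaming algorithm from the talk: it assumes the whole graph fits in a dictionary, uses a reset probability of 0.15 (an assumption, the slides do not state one), and the name `pagerank_from_walks` is hypothetical. Each walk starts at a uniformly random node and stops with the reset probability at every step; the fraction of walks ending at a node is then an estimate of its PageRank.

```python
import random
from collections import Counter

def pagerank_from_walks(out_edges, num_walks, reset_prob=0.15, seed=0):
    """Estimate PageRank as the endpoint frequency of random walks with resets.

    out_edges maps every node to the list of its out-neighbors.
    reset_prob is an assumed value, not taken from the slides.
    """
    rng = random.Random(seed)
    nodes = sorted(out_edges)
    ends = Counter()
    for _ in range(num_walks):
        v = rng.choice(nodes)                  # uniform start node
        # Continue with probability 1 - reset_prob, so the length is geometric.
        while rng.random() > reset_prob:
            nbrs = out_edges[v]
            # A dangling node (no out-edges) jumps to a uniform random node.
            v = rng.choice(nbrs) if nbrs else rng.choice(nodes)
        ends[v] += 1
    return {v: ends[v] / num_walks for v in nodes}

graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
estimate = pagerank_from_walks(graph, num_walks=20000)
```

More walks give a tighter estimate, which is why obtaining many walk samples cheaply is the central problem.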

  6. Other Results from Random Walks • We can estimate mixing time and conductance using streams [figure: graph G, node u]

  7. Streaming • Input is a “stream” e1, e2, e3, e4, e5, e6, e7, … • Few passes • Frequency moments, quantiles: 010001011 011101011 0100110111 • Graphs: edges, arbitrary order • Small RAM working memory

  8. Related Work • Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08): given an undirected graph, produce a sparse one that approximately preserves x’Lx; can be used to compute sparse cuts • Streaming version of BK96 (Ahn, Guha 09): sparse cuts in 1 pass and O(n) space • Accelerated PageRank (McSherry 08): heuristics

  9. Key Idea • One walk from u of length l, efficiently [figure: walk of length l from u to v] • Later extend to many walks

  10. Single Random Walk - Naive Algo. • One step with every pass! • Constant space • Passes: one per step of the walk [figure: node s]
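The naive one-step-per-pass algorithm can be sketched as follows. This is an illustration, not code from the talk: the names `one_step`, `naive_walk` and `make_stream` are hypothetical, and `make_stream` is assumed to be a callable that replays the edge stream (as `(u, v)` pairs, in arbitrary order) from the beginning on each call. Reservoir sampling picks a uniform out-edge of the current node while storing only one candidate.

```python
import random

def one_step(edge_stream, current, rng):
    """One pass over the stream: reservoir-sample a uniform out-edge of `current`."""
    chosen, seen = None, 0
    for u, v in edge_stream:
        if u == current:
            seen += 1
            if rng.randrange(seen) == 0:   # keep this edge with probability 1/seen
                chosen = v
    return chosen                          # None if `current` has no out-edge

def naive_walk(make_stream, start, length, seed=0):
    """Walk of up to `length` steps using one pass per step and O(1) working memory."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):                # each iteration is one full pass
        nxt = one_step(make_stream(), walk[-1], rng)
        if nxt is None:                    # dead end: stop early
            break
        walk.append(nxt)
    return walk

edges = [("a", "b"), ("b", "c"), ("c", "a")]
walk = naive_walk(lambda: iter(edges), "a", 5)
```

The working memory here is a single candidate edge plus a counter (the returned walk is output, not working state), but a walk of length l costs l passes, which is the cost the merged-short-walks approach aims to reduce.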

  11. Second Naive Algo • Single pass • Sample sufficient edges! • If , then sample 2 out-edges from each node (store order) [figure: node s]

  12. Comparison Naive (single walk): l Our Result: u In fact walks! Automatically:

  13. Insight: Merge Short Walks • Sample fraction of nodes (centers) • passes - length walks • Merge and extend short walks! • Two problems: end up at a node a second time; end up at a non-sampled node [figure: nodes s, w, a, b]

  14. Stuck Nodes • Sample an edge from stuck. • Again. • And again... • Slow? If new nodes, good in passes! [figure: nodes s, w]

  15. Stuck Nodes • Stuck on same nodes? • Sample s edges from each • Must include previously seen centers in the set • s progress OR new node! [figure: nodes s, w]

  16. Summary • Perform short walks from sampled centers • Concatenate walks until stuck • Sample edges from stuck • Make local progress until new node • Local progress = s • New node : center with prob • Amortized progress, every pass
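The "concatenate walks until stuck" step in the summary above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the authors' full algorithm: the short walks are assumed to be already computed and stored in a dictionary, the name `stitch` is hypothetical, and the handling of stuck nodes (sampling edges and making local progress) is left out. The sketch only reports where the walk gets stuck, covering both cases named on slide 13: reaching a center for the second time and reaching a non-sampled node.

```python
def stitch(start, short_walks, target_len):
    """Concatenate stored short walks, using each center's walk at most once.

    short_walks maps a center to the node sequence of its short walk,
    beginning with the center itself. Returns (walk, stuck_node), where
    stuck_node is None if the walk reached target_len steps.
    """
    walk = [start]
    used = set()
    while len(walk) - 1 < target_len:
        cur = walk[-1]
        if cur not in short_walks or cur in used:
            return walk, cur               # non-sampled node, or center seen before
        used.add(cur)                      # a stored walk is consumed only once
        walk.extend(short_walks[cur][1:])  # skip the repeated center
    return walk[: target_len + 1], None

short_walks = {"s": ["s", "a", "w1"], "w1": ["w1", "b", "s"]}
done, stuck_at = stitch("s", short_walks, 3)    # reaches 3 steps
more, stuck_at2 = stitch("s", short_walks, 10)  # gets stuck back at "s"
```

Each stored walk is used once because reusing it would replay the same random choices and the steps would no longer be independent.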

  17. Summary • Total number of passes : • Total Space :

  18. Summary • Set • Number of passes = • Space =

  19. Many Walks • Naive Space Bound: • We show: • Observation: Many short walks not used in Single RW.

  20. Many Random Walks • : probability node ’s short walk used in single RW. • If known : save lot of space! • Perform K random walks • Total number of short walks required is about • Don’t know . But can estimate.

  21. Estimating • Run K = (log n) walks of length • Gives a crude estimate of • Sufficient to double K • Continue doubling K • Gives K walks in space • Passes l u

  22. Distributions samples Space Passes Distribution: u

  23. Mixing Time, Conductance • Undirected graphs: Compare Distribution with Steady State. • Estimating difference: samples. [Batu et al. ’01] • approximate mixing time. • Directed, till distribution “stabilizes”: samples. • Conductance: • Recall space for walks:
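Comparing a walk's endpoint distribution with the steady state can be sketched for the undirected case, where the steady state of a simple random walk is proportional to node degree. This is a plain plug-in estimate for illustration only, not the sample-efficient tester of Batu et al. cited on the slide; the names `stationary` and `l1_distance` are hypothetical.

```python
from collections import Counter

def stationary(edges):
    """Steady state of a simple random walk on an undirected graph: deg(v) / 2m."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total = sum(deg.values())              # equals 2m
    return {v: d / total for v, d in deg.items()}

def l1_distance(samples, pi):
    """L1 distance between the empirical distribution of `samples` and `pi`."""
    counts = Counter(samples)
    n = len(samples)
    support = set(pi) | set(counts)
    return sum(abs(counts[v] / n - pi.get(v, 0.0)) for v in support)

pi = stationary([("a", "b"), ("b", "c")])       # path a - b - c
gap = l1_distance(["a", "b", "b", "c"], pi)     # endpoints of four walks
```

Under this view, a walk length at which the gap falls below a chosen threshold serves as an approximate mixing time; `samples` would be the endpoints of many independent walks of that length.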

  24. Results recap • - Mixing Time for Undirected Graphs : • Quadratic Approximation to Conductance • PageRank to accuracy

  25. Open Questions? • Improve passes for random walks. In particular, sub-linear space and constant passes. • Graph Cuts and Graph Sparsification for directed graphs • Better (streaming) algorithms for computing eigenvectors

  26. Thank You!

  27. Summary • Perform short walks from sampled centers • Concatenate walks until stuck • Sample edges from stuck • Make local progress until new node • Local progress = s • New node = nodes gives center • Amortized, every pass -


  29. Analysis • Total number of passes : • Total Space : • Set • Number of passes = • Space =
