
Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods

Yangbo Zhu, Shaozhi Ye and Xing Li, Tsinghua University, Beijing, China. ACM CIKM 2005, Bremen.



Presentation Transcript


  1. Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods • Yangbo Zhu, Shaozhi Ye and Xing Li • Tsinghua University, Beijing, China • ACM CIKM 2005, Bremen

  2. Outline • Quick Review of PageRank • Distributed PageRank Computation • Motivation • Basic Idea • Algorithm • Experiments • Conclusion and Future Work

  3. PageRank - Background Ranking Web pages • Content-based methods • Link-based methods • PageRank [Page & Brin, 1998] • HITS [Kleinberg, 1998] • SALSA [Lempel & Moran, 2000]

  4. PageRank - Intuition • Page A points to B means that the author of A recommends B. • A page is of high quality if it is • referred to by many other pages • referred to by pages of high quality

  5. PageRank - Model • Random Surfer - Markov Chain

  6. PageRank - Algorithm • Power method
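
The power method named on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the toy graph `links`, the damping factor 0.85, uniform teleportation, the handling of dangling pages, and the stopping tolerance are all assumptions made for the example.

```python
def pagerank_power(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict {page: [pages it links to]}.

    Assumed conventions (not taken from the slides): uniform teleportation,
    dangling pages spread their rank uniformly, L1 stopping criterion.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is shared by every page.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank along every out-link.
        for p in pages:
            for q in links[p]:
                new[q] += damping * rank[p] / len(links[p])
        # Stop when the L1 distance between successive vectors is small.
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical four-page graph: A, B and D all recommend C; C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank_power(links)
```

In this toy graph C collects the most recommendations and ends up with the highest score, while D, which nobody links to, keeps only its teleportation share.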

  7. Outline • Quick Review of PageRank • Distributed PageRank Computation • Motivation • Basic Idea • Algorithm • Experiments • Conclusion and Future Work

  8. Motivation • Compass search engine confederation

  9. Motivation (cont.)

  10. Basic Idea • Divide and conquer • Make use of the natural block structure of web graphs
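
One way to see the block structure is to group pages by host and count how many links stay inside a site. The sketch below is only an illustration: the URLs and the helper names `group_by_site` and `intra_site_fraction` are hypothetical, and using the hostname as the site key is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_site(urls):
    """Map each hostname to the list of pages it hosts (one block per site)."""
    sites = defaultdict(list)
    for url in urls:
        sites[urlsplit(url).hostname].append(url)
    return dict(sites)


def intra_site_fraction(edges):
    """Fraction of links whose source and target share a hostname."""
    same = sum(1 for src, dst in edges
               if urlsplit(src).hostname == urlsplit(dst).hostname)
    return same / len(edges)


# Hypothetical link list: three links stay within a site, one crosses sites.
edges = [
    ("http://a.example/1", "http://a.example/2"),
    ("http://a.example/2", "http://a.example/1"),
    ("http://a.example/2", "http://b.example/1"),
    ("http://b.example/1", "http://b.example/2"),
]
pages = sorted({url for edge in edges for url in edge})
blocks = group_by_site(pages)
share = intra_site_fraction(edges)
```

When most links are intra-site, the transition matrix ordered by site is dominated by its diagonal blocks, which is what a divide-and-conquer scheme can exploit.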

  11. DPC Algorithm • Step 1 - Initialization Local nodes compute local PageRank vectors.

  12. DPC Algorithm (cont.) • Step 2 - Aggregation Central node computes the NodeRank vector.

  13. DPC Algorithm (cont.) • Step 3 - Disaggregation Local nodes compute extended local PageRank vectors. X: External nodes

  14. DPC Algorithm (cont.) • Step 4 - Central node computes the L1 distance between the current global PageRank vector and the previous one.
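
The four steps can be illustrated with a generic aggregation-disaggregation sweep. This is a sketch under stated assumptions, not the DPC algorithm itself: the smoothing step here is a single power-method step swapped in for the extended local PageRank computation of Step 3, everything runs in one process, and the graph, block partition, damping factor 0.85, and all function names are hypothetical.

```python
def google_matrix(links, n, damping=0.85):
    """Column-stochastic matrix: G[i][j] = P(surfer moves from page j to i)."""
    G = [[(1.0 - damping) / n] * n for _ in range(n)]
    for j in range(n):
        outs = links[j] or list(range(n))  # dangling page: jump uniformly
        for i in outs:
            G[i][j] += damping / len(outs)
    return G


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def stationary(M, tol=1e-13, max_iter=10000):
    """Power method for x = M x, with M column-stochastic and positive."""
    x = [1.0 / len(M)] * len(M)
    for _ in range(max_iter):
        y = matvec(M, x)
        done = sum(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if done:
            break
    return x


def iad_sweep(G, blocks, x):
    """One aggregation / disaggregation / smoothing sweep on guess x."""
    n = len(blocks)
    # Local view: the distribution of x inside each block.
    mass = [sum(x[i] for i in b) for b in blocks]
    local = [[x[i] / mass[J] for i in b] for J, b in enumerate(blocks)]
    # Aggregation: A[I][J] = probability of moving from block J to block I.
    A = [[sum(G[i][j] * w
              for i in blocks[I]
              for j, w in zip(blocks[J], local[J]))
          for J in range(n)] for I in range(n)]
    z = stationary(A)  # block-level ranks, playing the role of NodeRank
    # Disaggregation: rescale each local distribution by its block rank.
    y = [0.0] * len(x)
    for J, b in enumerate(blocks):
        for i, w in zip(b, local[J]):
            y[i] = z[J] * w
    # Smoothing: one power step (assumption; see the lead-in).
    return matvec(G, y)


links = [[1, 2], [2], [0, 3], [4], [5], [3]]  # hypothetical 6-page graph
blocks = [[0, 1, 2], [3, 4, 5]]               # two "sites"
G = google_matrix(links, 6)
pi = stationary(G)                            # reference: plain power method

x = [1.0 / 6] * 6
for _ in range(20):
    x = iad_sweep(G, blocks, x)
gap = sum(abs(a - b) for a, b in zip(x, pi))  # L1 distance, as in Step 4
```

The exact PageRank vector is a fixed point of `iad_sweep`, so `gap` measures how far the iterate still is from it. Convergence from an arbitrary starting vector is not guaranteed in general.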

  15. Advantages • DPC mainly consists of standard PageRank computation. • Small matrices fit into main memory. • Low communication overhead.

  16. Outline • Quick Review of PageRank • Distributed PageRank Computation • Motivation • Basic Idea • Algorithm • Experiments • Conclusion and Future Work

  17. Experimental Setup • Simulation on a single Linux box. • Group web pages by sites. • For comparison • Classic power method • LPR-Ref-2 algorithm in [Wang, VLDB 2004]

  18. Data Sets • ST01/03 - crawled in 2001/2003 by Stanford WebBase Project • CN04 - crawled in 2004 from web sites in China.

  19. Evaluation Metrics • L1 distance • Kendall's τ-distance: a pair of pages (i, j) counts as a disagreement if i and j are in different order in the two ranking lists.
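
Both metrics are easy to state in code. The sketch below assumes that Kendall's τ-distance is normalized by the total number of page pairs and that ties are not counted as disagreements; the slide's exact formula did not survive, so these conventions and the example vectors are assumptions.

```python
from itertools import combinations


def l1_distance(x, y):
    """Sum of absolute differences between two score vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def kendall_tau_distance(x, y):
    """Fraction of page pairs (i, j) that x and y rank in opposite order."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs
                     if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


# Hypothetical scores for four pages: only pages 2 and 3 swap places.
exact = [0.4, 0.3, 0.2, 0.1]
approx = [0.35, 0.3, 0.1, 0.25]
l1 = l1_distance(exact, approx)
tau = kendall_tau_distance(exact, approx)
```

The L1 distance reacts to any change in the scores, whereas the τ-distance only changes when the relative order of two pages flips, which is closer to what a user of a ranked result list would notice.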

  20. Accuracy of the First Iteration • L1 • Kendall

  21. Convergence Rate • Number of iterations for convergence ( )

  22. Outline • Quick Review of PageRank • Distributed PageRank Computation • Experiments • Conclusion and Future Work

  23. Conclusion • A distributed PageRank computation algorithm based on iterative aggregation-disaggregation (IAD) methods with Block Jacobi smoothing. • Experiments on real web graphs show that DPC outperforms LPR-Ref-2 [Wang, VLDB'04] and converges 5 to 7 times faster than the power method.

  24. Future Work • Implement DPC in a distributed system. Integrate it with the Compass search engine confederation. • How can PageRank vectors be updated efficiently within the DPC framework?

  25. Thank you!

  26. General PageRank Algorithm

  27. IAD Method - Notations • Aggregation matrix (n×N) • Disaggregation matrix (N×n)
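
The two matrices can be written out explicitly for a small partition. The slide's formulas did not survive, so the sketch below uses the standard IAD definitions as an assumption: the aggregation matrix R sums the entries of each block, and the disaggregation matrix S(x) spreads a block's mass according to the current guess x. The partition and the vector `x` are hypothetical.

```python
def aggregation_matrix(blocks, N):
    """R (n x N): R[J][i] = 1 if page i belongs to block J, else 0."""
    R = [[0.0] * N for _ in blocks]
    for J, block in enumerate(blocks):
        for i in block:
            R[J][i] = 1.0
    return R


def disaggregation_matrix(blocks, x):
    """S(x) (N x n): column J is x restricted to block J, scaled to sum to 1."""
    S = [[0.0] * len(blocks) for _ in x]
    for J, block in enumerate(blocks):
        mass = sum(x[i] for i in block)
        for i in block:
            S[i][J] = x[i] / mass
    return S


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


blocks = [[0, 1, 2], [3, 4]]               # N = 5 pages, n = 2 blocks
x = [0.1, 0.2, 0.1, 0.25, 0.35]            # current guess, sums to 1
R = aggregation_matrix(blocks, len(x))     # 2 x 5
S = disaggregation_matrix(blocks, x)       # 5 x 2
RS = matmul(R, S)                          # 2 x 2 identity
x_col = [[v] for v in x]
back = matmul(S, matmul(R, x_col))         # S(x) R x reproduces x
```

Under these definitions, aggregating after disaggregating is the identity on block-level vectors, and S(x) R leaves x itself unchanged. The dependence of S on x is what makes the disaggregation step non-linear.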

  28. IAD Method

  29. DPC Algorithm

  30. DPC Algorithm (cont.)

  31. DPC Algorithm (cont.)

  32. DPC - Convergence Analysis • The global convergence of the IAD method is still an open problem. • The difficulty comes partly from the fact that the disaggregation step is non-linear. • The paper proves the global convergence of the Block Jacobi method in the PageRank scenario when n > 2.

  33. Experiments - Basic Facts • Distribution over number of pages hosted by sites of different size • Distribution over size of sites

  34. Experiments - Communication Overhead • Pos(•) - Number of positive elements • L/U - Block strictly lower/upper triangular part of P • Compared: Power vs. LPR-Ref-2 / DPC
