
Using Adaptive Methods for Updating/Downdating PageRank


Presentation Transcript


  1. Using Adaptive Methods for Updating/Downdating PageRank Gene H. Golub Stanford University SCCM Joint Work With Sep Kamvar, Taher Haveliwala

  2. Motivation • Problem: compute PageRank after the Web has changed slightly • Motivation: “freshness” • Note: since the Web keeps growing, PageRank computations don’t get faster as computers do.

  3. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results [Figure: power-method iteration on a three-node graph with ranks 0.4, 0.4, 0.2]

  4. Link Counts [Diagram: two home pages (Gene’s and Martin’s); one is linked by 2 unimportant pages (Donald Rumsfeld, George W. Bush), the other by 2 important pages (Iain Duff’s Home Page, Yahoo!)]

  5. Definition of PageRank • The importance of a page is given by the importance of the pages that link to it: x_i = Σ_{j ∈ B_i} x_j / N_j, where x_i is the importance of page i, B_i is the set of pages j that link to page i, and N_j is the number of outlinks from page j.
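As a minimal runnable sketch of this rank equation (the three-page graph and page names are my own toy example, not from the slides):

```python
# Toy illustration of the rank equation x_i = sum over pages j linking to i of x_j / N_j.
# The three-page graph below is a made-up example, not one from the slides.
outlinks = {
    "A": ["B", "C"],  # page A links to B and C, so N_A = 2
    "B": ["C"],
    "C": ["A"],
}

def rank_update(x):
    """Apply the rank equation once to every page."""
    new_x = {page: 0.0 for page in x}
    for j, targets in outlinks.items():
        share = x[j] / len(targets)  # each outlink of j carries x_j / N_j
        for i in targets:
            new_x[i] += share
    return new_x

x = rank_update({"A": 1/3, "B": 1/3, "C": 1/3})  # -> A: 1/3, B: 1/6, C: 1/2
```

Note that the total rank is conserved: each page j redistributes exactly x_j across its outlinks.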

  6. Definition of PageRank [Diagram: example link graph over Gene, Martin, SCCM, Yahoo!, and Duff, with outlink weights 1/2, 1/2, 1, 1 and ranks 0.05, 0.25, 0.1, 0.1, 0.1]

  7. PageRank Diagram: initialize all nodes to rank 1/3. [Diagram: three nodes, each with rank 0.333]

  8. PageRank Diagram: propagate ranks across links (multiplying by link weights). [Diagram: node values 0.167, 0.333, 0.333, 0.167]

  9. PageRank Diagram [Diagram: node values 0.5, 0.333, 0.167]

  10. PageRank Diagram [Diagram: node values 0.167, 0.5, 0.167, 0.167]

  11. PageRank Diagram [Diagram: node values 0.333, 0.5, 0.167]

  12. PageRank Diagram: after a while, the ranks settle at 0.4, 0.4, 0.2.

  13. Matrix Notation: one propagation step can be written as x ← P^T x. [Slide shows the rank vector (.1, .3, .2, .3, .1, .1) written as P^T times itself; the matrix entries are only partially legible in the transcript.]

  14. Matrix Notation: find x that satisfies x = P^T x — that is, x is an eigenvector of P^T with eigenvalue 1.

  15. Eigenvalue Distribution • The matrix P^T has several eigenvalues on the unit circle, which makes power-method-like algorithms less effective.

  16. Rank-1 Correction • PageRank doesn’t actually use P^T. Instead, it uses A = cP^T + (1−c)E^T. • E is a rank-1 matrix, and in general c = 0.85. • This ensures a unique solution and fast convergence. • For the matrix A, λ2 = c.[1] [1] From “The Second Eigenvalue of the Google Matrix” (http://dbpubs.stanford.edu/pub/2003-20)
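A minimal sketch of building this corrected matrix, assuming E^T spreads the (1−c) mass uniformly (the 3×3 column-stochastic P^T is an illustrative toy, not from the slides):

```python
# Build A = c*P^T + (1-c)*E^T for c = 0.85, with E^T distributing (1-c) uniformly.
# The 3x3 column-stochastic P^T below is a made-up toy example.
c, n = 0.85, 3
PT = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
A = [[c * PT[i][j] + (1 - c) / n for j in range(n)] for i in range(n)]
# A stays column-stochastic, and every entry is at least (1-c)/n > 0 -- the
# positivity that guarantees a unique stationary vector and fast convergence.
```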

  17. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results

  18. Power Method • Initialize: x^(0) = v (e.g., the uniform vector) • Repeat until convergence: x^(k+1) = A x^(k)
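A runnable sketch of that loop (the matrix, start vector, and tolerance are illustrative; assuming A is column-stochastic, so the iterate keeps sum 1 without renormalization):

```python
# Power method: start from a probability vector and repeat x <- A x until
# the L1 change between successive iterates falls below a tolerance.
def power_method(A, x, tol=1e-10, max_iter=1000):
    n = len(x)
    for _ in range(max_iter):
        new_x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if sum(abs(new_x[i] - x[i]) for i in range(n)) < tol:  # L1 residual
            return new_x
        x = new_x
    return x

# Toy 3x3 matrix of the form c*P^T + (1-c)/n with c = 0.85 (illustrative).
A = [[0.05, 0.05, 0.90],
     [0.475, 0.05, 0.05],
     [0.475, 0.90, 0.05]]
x = power_method(A, [1/3, 1/3, 1/3])
```

The returned x approximately satisfies x = A x, i.e., it is the stationary (PageRank) vector of the toy matrix.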

  19. Power Method: express x^(0) in terms of the eigenvectors of A: x^(0) = u1 + a2 u2 + a3 u3 + a4 u4 + a5 u5

  20. Power Method: x^(1) = u1 + a2 λ2 u2 + a3 λ3 u3 + a4 λ4 u4 + a5 λ5 u5

  21. Power Method: x^(2) = u1 + a2 λ2^2 u2 + a3 λ3^2 u3 + a4 λ4^2 u4 + a5 λ5^2 u5

  22. Power Method: x^(k) = u1 + a2 λ2^k u2 + a3 λ3^k u3 + a4 λ4^k u4 + a5 λ5^k u5

  23. Power Method: as k → ∞, x^(k) → u1 (the coefficients of u2 through u5 go to 0)

  24. Why does it work? • Imagine our n x n matrix A has n distinct eigenvectors ui. • Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A: x^(0) = u1 + a2 u2 + a3 u3 + a4 u4 + a5 u5 (scaled so that the u1 coefficient is 1).

  25. Why does it work? • From the last slide: x^(0) = u1 + a2 u2 + … + a5 u5. • To get the first iterate, multiply x^(0) by A. • The first eigenvalue is 1, and all the others are less than 1 in magnitude. • Therefore: x^(1) = u1 + a2 λ2 u2 + … + a5 λ5 u5.

  26. Power Method: x^(0) = u1 + a2 u2 + a3 u3 + a4 u4 + a5 u5 → x^(1) = u1 + a2 λ2 u2 + … + a5 λ5 u5 → x^(2) = u1 + a2 λ2^2 u2 + … + a5 λ5^2 u5 — the u2,…,u5 components are damped by a factor λi each iteration.

  27. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results

  28. Convergence • The smaller λ2, the faster the convergence of the Power Method: in x^(k) = u1 + a2 λ2^k u2 + a3 λ3^k u3 + a4 λ4^k u4 + a5 λ5^k u5, the error is dominated by the λ2^k term.
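A small numeric check of this decay rate (the 2×2 matrix is a made-up example with λ2 = 0.7, chosen so the behavior is easy to verify by hand):

```python
# For this toy column-stochastic matrix the eigenvalues are 1 and 0.7, and the
# stationary vector is (2/3, 1/3). The start vector's error lies exactly along
# the second eigenvector (1, -1), so the L1 error shrinks by a factor of
# lambda2 = 0.7 at every iteration.
A = [[0.9, 0.2], [0.1, 0.8]]
true_x = [2/3, 1/3]

def err(x):
    return abs(x[0] - true_x[0]) + abs(x[1] - true_x[1])

x = [1.0, 0.0]
ratios = []
for _ in range(10):
    x_new = [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]
    ratios.append(err(x_new) / err(x))  # each ratio comes out ~0.7
    x = x_new
```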

  29. Quadratic Extrapolation (joint work with Kamvar and Haveliwala) • Estimate the components of the current iterate in the directions of the second and third eigenvectors, and eliminate them.

  30. Facts that work in our favor • For traditional problems: A is smaller, often dense; λ2 is often close to λ1, making the power method slow. • In our problem: A is huge and sparse; more importantly, λ2 is small.[1] [1] (“The Second Eigenvalue of the Google Matrix”, dbpubs.stanford.edu/pub/2003-20.)

  31. How do we do this? • Assume x^(k) can be written as a linear combination of the first three eigenvectors (u1, u2, u3) of A. • Compute an approximation to the {u2, u3} components, and subtract it from x^(k) to get x^(k)′.

  32. Sequence Extrapolation • A classical and important field in numerical analysis: techniques for accelerating the convergence of slowly convergent infinite series and integrals.

  33. Example: Aitken Δ^2-Process. Suppose A_n = A + aλ^n + r_n, where r_n = bμ^n + o(min{1, |μ|^n}), with a, b, λ, μ all nonzero and |λ| > |μ|. It can be shown that S_n = (A_n A_{n+2} − A_{n+1}^2) / (A_n − 2A_{n+1} + A_{n+2}) satisfies, as n → ∞, |S_n − A| / |A_n − A| = O((|μ|/|λ|)^n) = o(1).
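A quick sketch of the formula on a toy series (the values are my own; with a single, purely geometric error term, Aitken’s S_n recovers the limit exactly, up to rounding):

```python
# Aitken Delta^2: S_n = (A_n*A_{n+2} - A_{n+1}^2) / (A_n - 2*A_{n+1} + A_{n+2}).
# Toy series A_n = 1 + 0.9^n: limit A = 1 with one geometric error term.
A_seq = [1.0 + 0.9**n for n in range(10)]

def aitken(s, n):
    num = s[n] * s[n + 2] - s[n + 1] ** 2
    den = s[n] - 2 * s[n + 1] + s[n + 2]
    return num / den

S0 = aitken(A_seq, 0)  # already the limit 1.0, up to floating-point rounding
```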

  34. In other words… Assuming a geometric pattern for the series helps accelerate its convergence. We can apply this component-wise to get a better estimate of the eigenvector.
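A sketch of that component-wise application on three successive power-method iterates (the 2×2 matrix is an illustrative toy; because its error term is purely geometric, the extrapolation lands on the exact stationary vector):

```python
# Apply Aitken's formula to each component of three successive iterates.
def aitken_vec(x0, x1, x2, eps=1e-14):
    out = []
    for a, b, c in zip(x0, x1, x2):
        den = a - 2 * b + c
        # Fall back to the plain iterate when the denominator is ~0 (flat component).
        out.append(b if abs(den) < eps else (a * c - b * b) / den)
    return out

A = [[0.9, 0.2], [0.1, 0.8]]  # toy column-stochastic matrix (illustrative)

def step(x):
    return [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]

x0 = [1.0, 0.0]
x1 = step(x0)
x2 = step(x1)
est = aitken_vec(x0, x1, x2)  # ~ (2/3, 1/3), the stationary vector of A
```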

  35. Another approach • Assume x^(k) can be represented by three eigenvectors of A: x^(k) = u1 + a2 λ2^k u2 + a3 λ3^k u3.

  36. Linear Combination • We take some linear combination of three successive iterates: b1 x^(k) + b2 x^(k+1) + b3 x^(k+2).

  37. Rearranging Terms • We can rearrange the terms to express this combination in the eigenbasis. • Goal: find b1, b2, b3 so that the coefficients of u2 and u3 are 0, and the coefficient of u1 is 1.

  39. Results • Quadratic Extrapolation speeds up convergence; extrapolation was used only 5 times.

  40. Estimating the coefficients • Procedure 1: set β1 = 1 and solve the least-squares problem. • Procedure 2: use the SVD to compute the coefficients of the characteristic polynomial.
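Neither procedure is reproduced in full in the transcript, so here is only a hedged sketch of the least-squares idea: fit the quadratic whose roots are λ2 and λ3 from successive differences of the iterates, then combine iterates so the u2 and u3 terms cancel. This is a minimal-polynomial-style formulation of my own; the paper’s exact algorithm differs in its details. The eigen-decomposition below is synthetic test data.

```python
# Fit c0 + c1*t + t^2 (roots lambda2, lambda3) by least squares from the
# differences d_k = x^(k+1) - x^(k), then combine iterates to cancel u2, u3.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quad_extrapolate(x0, x1, x2, x3):
    d0 = [b - a for a, b in zip(x0, x1)]
    d1 = [b - a for a, b in zip(x1, x2)]
    d2 = [b - a for a, b in zip(x2, x3)]
    # Least squares min ||c0*d0 + c1*d1 + d2|| via 2x2 normal equations.
    g00, g01, g11 = dot(d0, d0), dot(d0, d1), dot(d1, d1)
    r0, r1 = -dot(d0, d2), -dot(d1, d2)
    det = g00 * g11 - g01 * g01
    c0 = (r0 * g11 - g01 * r1) / det
    c1 = (g00 * r1 - g01 * r0) / det
    s = c0 + c1 + 1.0  # nonzero, since 1 is not a root of the fitted quadratic
    return [(c0 * a + c1 * b + e) / s for a, b, e in zip(x0, x1, x2)]

# Synthetic iterates x^(k) = u1 + 0.8^k u2 + 0.5^k u3 (exactly three eigenvectors).
u1, u2, u3 = [0.5, 0.3, 0.2], [1.0, -1.0, 0.0], [0.0, 1.0, -1.0]
xs = [[u1[i] + 0.8**k * u2[i] + 0.5**k * u3[i] for i in range(3)] for k in range(4)]
est = quad_extrapolate(*xs)  # recovers u1 (up to rounding)
```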

  41. Results • Extrapolation dramatically speeds up convergence for high values of c (c = 0.99).

  42. Take-home message • Quadratic Extrapolation estimates the components of the current iterate in the directions of the second and third eigenvectors, and subtracts them off. • It achieves significant speedup, and its ideas are useful for further speedup algorithms.

  43. Summary of this part • We make an assumption about the form of the current iterate. • We solve for the dominant eigenvector as a linear combination of the next three iterates. • We use a few iterations of the Power Method to “clean it up”.

  44. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results

  45. Most Pages Converge Quickly

  46. Most Pages Converge Quickly

  47. Basic Idea • When the PageRank of a page has converged, stop recomputing it.

  48. Adaptive PageRank Algorithm
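The algorithm itself is not reproduced in the transcript, so here is only a hedged sketch of the basic idea as stated on the previous slide: freeze a page’s entry once its change drops below a threshold, while still letting other pages read the frozen value. The toy doubly-stochastic matrix, start vector, and threshold are my own, and the actual Adaptive PageRank algorithm surely differs in detail.

```python
# Adaptive power-method sketch: skip the update for pages whose PageRank entry
# has stopped changing, but keep using their (frozen) values for other pages.
def adaptive_power_method(A, x, tol=1e-8, max_iter=1000):
    n = len(x)
    converged = [False] * n
    for _ in range(max_iter):
        new_x = list(x)
        for i in range(n):
            if not converged[i]:
                new_x[i] = sum(A[i][j] * x[j] for j in range(n))
                if abs(new_x[i] - x[i]) < tol:
                    converged[i] = True  # stop recomputing this page
        x = new_x
        if all(converged):
            break
    return x

# Toy doubly stochastic matrix: eigenvalues 1, 0.4, 0.4; stationary vector is uniform.
A = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
x = adaptive_power_method(A, [0.6, 0.3, 0.1])  # -> approximately (1/3, 1/3, 1/3)
```

The freezing makes the result slightly approximate (a frozen entry stops tracking later changes in its neighbors), which is why the threshold must be chosen tight relative to the desired accuracy.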

  49. Updates • Use the previous PageRank vector as a start vector. • The speedup is not that great. Why? The old pages converge quickly, but the new pages still take a long time to converge. • But if you use Adaptive PageRank, you save the computation on the old pages.

  50. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results
