
Google’s PageRank


Presentation Transcript


  1. Google’s PageRank By Zack Kenz

  2. Outline • Intro to web searching • Review of Linear Algebra • Weather example • Basics of PageRank • Solving the Google Matrix • Calculating the PageRank • Wrapping up

  3. Some Search Engine History • Early searching was based on page content only • Bonuses for word placement • Paying for placement • Natural language searches (think Ask Jeeves) • Meta search engines

  4. Why Google? • No one had yet exploited the link structure of the internet • Content-based engines were relatively easy to exploit with concealed text • Adaptive to a growing internet • Simpler, faster

  5. PageRank, According to Google • “PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B.” • “Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves ‘important’ weigh more heavily and help to make other pages ‘important.’”

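  6. Linear Algebra Terms • Row Stochastic Matrix: a square matrix with nonnegative entries in which every row sums to 1 • Eigenvector: a nonzero vector x such that Ax = λx for some scalar λ • Eigenvalue: a scalar λ for which Ax = λx has a nontrivial solution x • Dominant Eigenvalue (eigenvector): the eigenvalue of largest absolute value (and a corresponding eigenvector)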
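To make the first two terms concrete, here is a small pure-Python sketch; the 3×3 matrix is hypothetical, not taken from the slides. It checks that a matrix is row stochastic and shows why such a matrix always has eigenvalue 1: multiplying it by the column vector of ones returns that same vector, so the ones vector is an eigenvector with λ = 1.

```python
# A row-stochastic matrix has eigenvalue 1: if every row of A sums to 1,
# then A e = e for the column vector of ones e.


def is_row_stochastic(A, tol=1e-12):
    """Return True if A is square, nonnegative, and each row sums to 1."""
    n = len(A)
    return all(
        len(row) == n
        and all(a >= 0 for a in row)
        and abs(sum(row) - 1.0) < tol
        for row in A
    )


def mat_vec(A, x):
    """Matrix times column vector: (A x)_i = sum_j A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Hypothetical example matrix, not taken from the slides.
A = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 1.0, 0.0],
]
e = [1.0, 1.0, 1.0]

print(is_row_stochastic(A))  # True
print(mat_vec(A, e))         # every entry is 1, so A e = 1 * e
```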

  7. Tomorrow’s Weather • Example on the board
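The board example itself is not preserved in the transcript. As a hedged stand-in, the sketch below uses a hypothetical two-state chain with invented probabilities: the row-stochastic matrix turns today's weather distribution into tomorrow's, and repeating the step approaches a steady-state distribution, which is the same idea PageRank applies to web pages.

```python
# Hypothetical two-state weather chain; the probabilities are invented
# for illustration because the board example is not preserved.
STATES = ["sunny", "rainy"]
P = [
    [0.9, 0.1],  # sunny today -> P(sunny), P(rainy) tomorrow
    [0.5, 0.5],  # rainy today -> P(sunny), P(rainy) tomorrow
]


def step(x, P):
    """Advance one day: row vector times matrix, x^T P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


today = [1.0, 0.0]          # certain it is sunny today
tomorrow = step(today, P)
print(dict(zip(STATES, tomorrow)))  # {'sunny': 0.9, 'rainy': 0.1}

# Repeating the step approaches a steady state that no longer
# depends on today's weather.
x = today
for _ in range(100):
    x = step(x, P)
print([round(p, 4) for p in x])  # [0.8333, 0.1667]
```

With these made-up numbers the long-run forecast is 5/6 sunny and 1/6 rainy, whichever state the chain starts in.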

  8. Scoring Web Pages • Random web surfer • Goal: Assign a score to over 25 billion web pages and store the scores • Score based on the probability that the random surfer ends up on a particular page
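A minimal sketch of the random-surfer idea, assuming a hypothetical four-page web given as a dictionary of outlinks: the surfer repeatedly follows a uniformly random link, and the fraction of steps spent on each page estimates that page's score. This Monte Carlo version is only for intuition; `LINKS` and `simulate_surfer` are illustrative names, and the later slides compute scores with matrices instead.

```python
import random

# Hypothetical 4-page web (page -> pages it links to); not from the slides.
LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}


def simulate_surfer(links, steps, start=0, seed=1):
    """Follow a uniformly random outlink at every step and return the
    fraction of steps spent on each page (a Monte Carlo score estimate)."""
    rng = random.Random(seed)
    visits = {page: 0 for page in links}
    page = start
    for _ in range(steps):
        page = rng.choice(links[page])
        visits[page] += 1
    return {page: count / steps for page, count in visits.items()}


scores = simulate_surfer(LINKS, steps=100_000)
print(scores)  # roughly {0: 0.4, 1: 0.2, 2: 0.4, 3: 0.0}
```

Page 3 has no inbound links, so the surfer never lands on it and its estimated score is 0.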

  9. Surf’s Up!

  10–11. Hyperlink Matrix
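The hyperlink matrix slides are images in the original. A common construction, used in the Wills reference listed under Sources, puts 1/(number of outlinks of page i) in row i at every page that page i links to, and 0 elsewhere. The sketch below builds that matrix for a hypothetical four-page web; `hyperlink_matrix` and the link data are illustrative, not from the slides.

```python
def hyperlink_matrix(links, n):
    """Build H with H[i][j] = 1/outdeg(i) if page i links to page j, else 0.

    Assumes each page's outlinks are listed without duplicates.
    """
    H = [[0.0] * n for _ in range(n)]
    for i, targets in links.items():
        for j in targets:
            H[i][j] = 1.0 / len(targets)
    return H


# Hypothetical 4-page web; page 3 has no outlinks (a dangling node).
LINKS = {0: [1, 2], 1: [2, 3], 2: [0], 3: []}
H = hyperlink_matrix(LINKS, 4)

for row in H:
    print(row)
print([sum(row) for row in H])  # [1.0, 1.0, 1.0, 0.0]
```

Page 3 links nowhere, so its row is all zeros and H is not row stochastic; that is the dangling-node problem.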

  12–14. Dangling Nodes
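One standard repair for dangling nodes, following the Wills reference rather than the slide figures (which are not preserved), assumes that a surfer stuck on a page with no outlinks jumps to any page uniformly at random: every all-zero row of H is replaced by the row 1/n. A hedged sketch with a hypothetical `fix_dangling` helper and example matrix:

```python
def fix_dangling(H):
    """Return a copy of H where every all-zero row (a dangling node)
    is replaced by the uniform row 1/n, so every row sums to 1."""
    n = len(H)
    S = []
    for row in H:
        if all(h == 0 for h in row):
            S.append([1.0 / n] * n)
        else:
            S.append(list(row))
    return S


# Hypothetical hyperlink matrix; page 3 has no outlinks.
H = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
S = fix_dangling(H)
print(S[3])                     # [0.25, 0.25, 0.25, 0.25]
print([sum(row) for row in S])  # [1.0, 1.0, 1.0, 1.0]
```

The corrected matrix is row stochastic, which the later eigenvalue argument relies on.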

  15. Web Link Surfer Matrix

  16–17. One More Fix • Need to account for the fact that a surfer can type in URLs instead of following links • Add in a personalization vector $v^T$, a row probability vector with one entry per page • Multiplying the column vector of ones $e$ by $v^T$ gives an additional personalization matrix $E = e v^T$
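The personalization matrix is an outer product: a column of ones times a row vector, so every row is a copy of the personalization vector. A sketch with a hypothetical vector `v` over four pages, chosen with power-of-two entries so the arithmetic is exact:

```python
def outer(col, row):
    """Outer product col * row: entry (i, j) is col[i] * row[j]."""
    return [[c * r for r in row] for c in col]


# Hypothetical personalization vector over 4 pages (entries sum to 1).
v = [0.5, 0.25, 0.125, 0.125]
e = [1.0] * len(v)  # column vector of ones

E = outer(e, v)     # personalization matrix e v^T
for row in E:
    print(row)      # every row is a copy of v
```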

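  18. Google Matrix • $G = \alpha S + (1 - \alpha) E$, where $S$ is the web link surfer matrix and $E$ is the personalization matrix • Recall that $\alpha$ is a damping factor, usually 0.85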
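A sketch of assembling the Google matrix, assuming the form $G = \alpha S + (1 - \alpha) e v^T$ from the Wills reference, with $S$ the dangling-node-corrected surfer matrix and $v^T$ the personalization vector. The matrix, the uniform `v`, and the `google_matrix` helper are hypothetical. Because $S$ and $e v^T$ are both row stochastic and the weights α and 1 − α sum to 1, every row of G sums to 1; with a strictly positive `v`, every entry is also positive.

```python
def google_matrix(S, v, alpha=0.85):
    """G = alpha * S + (1 - alpha) * e v^T, computed entry by entry."""
    n = len(S)
    return [
        [alpha * S[i][j] + (1 - alpha) * v[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical surfer matrix (dangling row already made uniform).
S = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
v = [0.25] * 4  # uniform personalization vector

G = google_matrix(S, v)
print([round(sum(row), 12) for row in G])  # [1.0, 1.0, 1.0, 1.0]
print(min(min(row) for row in G) > 0)      # True: every entry is positive
```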

  19. The True Google Matrix?

  20–21. Solution of the Google Matrix • Since the Google matrix is row stochastic, it has an eigenvalue of λ = 1 • λ = 1 is the largest eigenvalue in absolute value and is not repeated • Let $\pi^T$ be the corresponding left eigenvector • The eigensystem $\pi^T G = \pi^T$ has a unique solution once the entries of $\pi^T$ are required to sum to 1 • $\pi^T$, then, is a row probability vector • $\pi^T$ contains every page's PageRank

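  22. Computing Scores: The Linear Algebra Way • Recall that the PageRank vector $\pi^T$ satisfies $\pi^T G = \pi^T$, i.e. $\pi^T (I - G) = 0^T$, with the entries of $\pi^T$ summing to 1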
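The slide's formulas are missing, so this is a hedged sketch of one way to compute scores "the linear algebra way": transpose $\pi^T (I - G) = 0^T$ into an ordinary system, replace one redundant equation with the normalization that the entries of π sum to 1, and solve by Gaussian elimination. The 4-page G is hypothetical (α = 0.85, uniform personalization), and `solve` and `pagerank_direct` are illustrative helpers. Direct elimination costs on the order of n³ operations, which is impractical at web scale.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def pagerank_direct(G):
    """Solve pi^T (I - G) = 0^T with sum(pi) = 1.

    Transposing gives (I - G)^T pi = 0. One of those n equations is
    redundant, so the last one is replaced by sum(pi) = 1.
    """
    n = len(G)
    A = [[(1.0 if i == j else 0.0) - G[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    return solve(A, b)


# Hypothetical 4-page Google matrix: alpha = 0.85, uniform personalization.
S = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
G = [[0.85 * s + (1 - 0.85) / 4 for s in row] for row in S]

pi = pagerank_direct(G)
print([round(p, 4) for p in pi])
```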

  23. Computing Scores: The Power Method • λ = 1 is the dominant eigenvalue of G, and the PageRank vector $\pi^T$ is the dominant left eigenvector • As a result, the power method applied to G converges to the PageRank vector • Given a starting probability vector $\pi^{(0)T}$, the power method calculates successive iterates $\pi^{(k)T} = \pi^{(k-1)T} G$ until a stopping condition is reached
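A minimal power-method sketch on a hypothetical 4-page Google matrix. The uniform starting vector, the 1-norm stopping test, and the tolerance are assumptions chosen for illustration, since the slide only says "a stopping condition"; `power_method` is an illustrative name. Real implementations avoid forming the dense G and instead work with the sparse hyperlink matrix plus rank-one corrections; this sketch keeps G dense for clarity.

```python
def power_method(G, tol=1e-10, max_iter=1000):
    """Iterate pi^(k)T = pi^(k-1)T G starting from a uniform vector.

    Stops when the 1-norm change between iterates drops below tol
    (one possible stopping condition); returns (pi, iterations).
    """
    n = len(G)
    pi = [1.0 / n] * n  # assumed starting vector: uniform
    for k in range(1, max_iter + 1):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new, k
        pi = new
    return pi, max_iter


# Hypothetical 4-page Google matrix: alpha = 0.85, uniform personalization.
S = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
G = [[0.85 * s + (1 - 0.85) / 4 for s in row] for row in S]

pi, iterations = power_method(G)
print([round(p, 4) for p in pi], iterations)
```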

  24. Speeding Things Up

  25. Wrapping Up: The Overall Page Scoring • PageRank is still only a portion of what determines the order of search results • Results are based on many factors, especially page content

  26. Wrapping Up: Improving PageRank • Avoiding link spamming – tweak the personalization vector and α • Algorithms that speed up power method convergence • Dummy node

  27. Questions?

  28. Sources • Rebecca S. Wills. Google's PageRank: The Math Behind the Search Engine. Department of Mathematics, North Carolina State University. 1 May 2006. • Amy N. Langville and Carl D. Meyer. Fiddling with PageRank. Department of Mathematics, North Carolina State University. 15 August 2003. • http://www.searchenginehistory.com/ • http://www.google.com/technology/ and http://www.google.com • David C. Lay. Linear Algebra and Its Applications, 3rd ed. Pearson Education, 2003. • Dr. Biebighauser • http://eperformance.co.uk/uploaded_images/google%20beta-786468.jpg • http://webmechanics.uoregon.edu/Images/Surf%20web.jpg • http://www.modmyifone.com/iphone_wallpapers/file.php?n=282&w=l • http://www.smashingmagazine.com/images/pagerank/google-pagerank.jpg
