
How Google Relies on Discrete Mathematics

Explore how Google orders search results using traditional information retrieval techniques and the PageRank algorithm, which depends on the link structure of the web. This article provides an overview of PageRank and its mathematical definition.

gvanleer

Presentation Transcript


  1. How Google Relies on Discrete Mathematics Gerald Kruse Juniata College Huntingdon, PA kruse@juniata.edu http://faculty.juniata.edu/kruse

  2. How does Google order search results so well? • A mix of traditional information retrieval techniques and PageRank • PageRank is not a simple citation index • The algorithm that determines a web page's PageRank depends SOLELY on the link structure of the web, and NOT on the content of the web page • Link information can be gathered once web crawlers have traversed each link on each web page • Primary source: Larry Page, Sergey Brin, et al., The PageRank Citation Ranking: Bringing Order to the Web, Stanford Digital Library Technologies Project, 1998.

  3. PageRank is analogous to popularity • The web as a graph: each page is a vertex, each hyperlink a directed edge • I am a popular page if a few very popular pages point (via hyperlinks) to me • I am a popular page if many not-necessarily popular pages point (via hyperlinks) to me • [Figure: three linked pages, Page A, Page B, and Page C] Which of these three has the highest PageRank?
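The graph view on this slide can be sketched in a few lines of Python. The link structure below is an assumption made for illustration (A links to B and C, B links to C, C links to A); the transcript does not record which pages link to which. The helper name `backlinks` is likewise hypothetical.

```python
# Hypothetical three-page web: each key is a page (vertex),
# each listed target is a hyperlink (directed edge). Assumed, not from the slides.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def backlinks(links):
    """Invert the out-link map: for each page, list the pages that point to it."""
    incoming = {page: [] for page in links}
    for src, targets in links.items():
        for dst in targets:
            incoming.setdefault(dst, []).append(src)
    return incoming

incoming = backlinks(links)
for page, sources in incoming.items():
    print(page, "is pointed to by", sources)
```

Popularity in the slide's sense depends on this inverted map: who points to a page, and how popular those pages are.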

  4. So what is the mathematical definition of PageRank? In particular, my page's rank is equal to the sum of the ranks of all the pages pointing to me. Note the scaling of each page's rank.
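The formula itself is not preserved in the transcript. A likely form, following the simplified definition in the cited Page and Brin report with its constant set to 1, is shown below. The symbols are assumptions borrowed from that report: $R(u)$ is the rank of page $u$, $B_u$ is the set of pages linking to $u$, and $N_v$ is the number of links leaving page $v$.

```latex
R(u) \;=\; \sum_{v \in B_u} \frac{R(v)}{N_v}
```

The division by $N_v$ is the scaling the slide mentions: a page splits its rank evenly among the pages it links to.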

  5. Writing out the equation for each web page in our example gives one equation each for Page A, Page B, and Page C.
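The three equations are missing from the transcript. As an illustration only, assume a hypothetical link structure in which A links to B and C, B links to C, and C links to A. This structure is an assumption, chosen because it reproduces the ranks 0.4, 0.2, 0.4 reported later in the talk. Under that assumption, with each page's rank divided by its number of out-links, the system would read:

```latex
\begin{aligned}
R(A) &= R(C) \\
R(B) &= \tfrac{1}{2}\,R(A) \\
R(C) &= \tfrac{1}{2}\,R(A) + R(B)
\end{aligned}
```

Adding the normalization $R(A) + R(B) + R(C) = 1$ gives $R(A) = 0.4$, $R(B) = 0.2$, $R(C) = 0.4$.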

  6. Even though this is a circular definition, we can calculate the ranks. Rewrite the system of equations as a matrix-vector product. The PageRank vector is simply an eigenvector (scalar*vector = matrix*vector) of the coefficient matrix! (Note: we choose the vector with )
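A minimal sketch of the matrix-vector form, using exact fractions. The link structure is again an assumption (A links to B and C, B links to C, C links to A), and the function names are hypothetical. The check confirms that the vector (0.4, 0.2, 0.4), whose entries sum to 1, satisfies matrix*vector = 1*vector for this assumed graph.

```python
from fractions import Fraction

# Assumed link structure; the slides' actual example graph is not in the transcript.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pages = sorted(links)

def coefficient_matrix(links, pages):
    """M[i][j] = 1/outdegree(j) if page j links to page i, else 0."""
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    M = [[Fraction(0)] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            M[index[dst]][index[src]] += Fraction(1, len(targets))
    return M

def mat_vec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M = coefficient_matrix(links, pages)
r = [Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)]  # candidate ranks 0.4, 0.2, 0.4

# r is an eigenvector with eigenvalue 1 exactly when M r equals r.
print(mat_vec(M, r) == r)  # prints True
```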

  7. [Figure: the example graph of Page A, Page B, and Page C, annotated with the computed ranks; the labels read PageRank = 0.4, PageRank = 0.2, and PageRank = 0.4]

  8. Note that the coefficient matrix is stochastic. The eigenvector giving the ranks is associated with the dominant eigenvalue of 1. Some computational issues remain: - Rank sinks (endless hyperlink loops) - Eigenvector calculation on a huge matrix
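The rank-sink problem can be illustrated with a small sketch. The four-page graph below is hypothetical: C and D link only to each other, so any rank that enters the loop never leaves. Repeatedly applying the simple formula (no random-surfer term) drains A and B.

```python
# Hypothetical graph with a rank sink: the loop C <-> D has no links back out.
links = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": ["C"]}

def simple_step(links, rank):
    """One application of the simple formula: each page splits its rank among its out-links."""
    new = {page: 0.0 for page in links}
    for src, targets in links.items():
        share = rank[src] / len(targets)
        for dst in targets:
            new[dst] += share
    return new

rank = {page: 1.0 / len(links) for page in links}  # start from a uniform vector
for _ in range(200):
    rank = simple_step(links, rank)

# A and B end up with essentially no rank; C and D hold all of it.
print({page: round(value, 6) for page, value in rank.items()})
```

In this example the values of C and D also keep swapping from one step to the next, so the plain iteration never settles on a single vector.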

  9. Surf’s Up! Add a random-surfer term to the simple PageRank formula. This models the behavior of a real web-surfer, who might jump to another page by directly typing in a URL or by choosing a bookmark, rather than clicking on a hyperlink.
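The modified formula is not preserved in the transcript. One common way to write a random-surfer term is sketched below. It assumes a damping factor $d$ between 0 and 1, a total of $N$ pages, and a surfer who jumps to every page with equal probability; none of these symbols appear in the slides, and the cited report expresses the same idea with a general source-of-rank vector instead of a uniform jump.

```latex
R(u) \;=\; \frac{1 - d}{N} \;+\; d \sum_{v \in B_u} \frac{R(v)}{N_v}
```

With probability $d$ the surfer follows a hyperlink, and with probability $1 - d$ jumps to a page chosen at random.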

  10. This gives a regular matrix • In matrix notation we have • Since we can rewrite as • The new coefficient matrix is regular, so we can calculate the eigenvector iteratively. • This iterative process is a series of matrix-vector products, beginning with an initial vector (typically the previous PageRank vector). These products can be calculated without explicitly creating the huge coefficient matrix.
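The iterative process can be sketched as follows. This is an illustration under stated assumptions, not the method from the slides: the damping factor `d = 0.85`, the uniform jump, the function name `pagerank`, and the example graph are all hypothetical, and every page is assumed to have at least one out-link to a page in the map. The loop works directly from the link lists, so the coefficient matrix is never built, and `start` allows a previous PageRank vector to seed the iteration.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000, start=None):
    """Iterate rank <- (1 - d)/n + d * (link contributions) until the change is tiny."""
    pages = list(links)
    n = len(pages)
    # Initial vector: a previous result if supplied, otherwise uniform.
    rank = dict(start) if start else {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}  # random-surfer term
        for src, targets in links.items():
            share = d * rank[src] / len(targets)  # rank passed along each out-link
            for dst in targets:
                new[dst] += share
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank

# Hypothetical graph containing a loop (C <-> D) with no links back out.
links = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": ["C"]}
ranks = pagerank(links)
print({page: round(value, 4) for page, value in ranks.items()})
```

With the random-surfer term every page keeps a positive rank and the iteration settles on a single vector, even though the graph contains a closed loop.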

  11. Any Questions? Handouts Slides also available at http://faculty.juniata.edu/kruse
