PageRank


Presentation Transcript


  1. PageRank • Roshnika Fernando

  2. Why PageRank? • The internet is a global system of networks linking to smaller networks. • This system keeps growing, so there must be a way to sort through all the information available. • PageRank is the algorithm used by Google's search engine to sort through internet webpages. • A webpage’s rank determines the order in which it appears when a keyword search is performed on Google. • Fun Fact: PageRank is named after Larry Page, one of the founders of Google, not after webpages.

  3. Popularity Contest • Rank, at its simplest, is the probability that a webpage will be visited • The ranks of all pages sum to 1 • A page's rank depends on the ranks of the pages that link to it • Initially, every page gets rank 1/n, where n is the total number of pages; for the internet, 1/n ≈ 0
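A minimal Python sketch of the ideas above, assuming a small hypothetical 4-page web (`LINKS`, with pages numbered from 0 rather than 1) in which every page has at least one outgoing link. Every page starts at rank 1/n; in one step each page passes its rank evenly to the pages it links to, so a page's new rank comes from the pages linking to it, and the total stays 1. The helper names are illustrative only.

```python
# Hypothetical 4-page web: page -> pages it links to (0-based indices).
# Assumes every page has at least one outgoing link.
LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}


def initial_ranks(num_pages):
    """Every page starts with rank 1/n, so the ranks sum to 1."""
    return [1.0 / num_pages] * num_pages


def one_step(ranks, links):
    """Each page splits its current rank evenly among the pages it links to."""
    new_ranks = [0.0] * len(ranks)
    for page, targets in links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks


ranks = initial_ranks(len(LINKS))
print(ranks)        # [0.25, 0.25, 0.25, 0.25]
ranks = one_step(ranks, LINKS)
print(ranks)        # [0.375, 0.125, 0.5, 0.0]
print(sum(ranks))   # 1.0
```

Page 3 ends up with rank 0 because no page links to it, even though it started with the same rank as the others.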

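  4. Determining Rank • Let P be an n × n stochastic matrix, where n is the number of webpages and $p_{i,j}$ is the probability of going to webpage i from webpage j • $p_{i,j} = \dfrac{\#\text{ of links to page } i \text{ from page } j}{\#\text{ of links on page } j}$ • Note: i and j are positive integers with 1 ≤ i, j ≤ n • Note: With around 25 billion webpages, P has around 25 billion rows and columns, but almost all of its entries are 0, since a typical page links to only a few others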
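The link-counting formula can be sketched in Python as below. It uses the convention that matches the later slides on $(P - I)x = 0$ and dead ends: entry `P[i][j]` is the probability of moving to page i from page j, so each column sums to 1. The input format (a dictionary from each page to the list of pages it links to, numbered from 0, with a repeated target counting as several links), the helper name `transition_matrix`, and the 4-page example are all hypothetical. Exact `Fraction` values keep the column-sum check free of rounding error.

```python
from fractions import Fraction


def transition_matrix(links, n):
    """Build the n x n matrix P with
    P[i][j] = (# of links to page i from page j) / (# of links on page j).

    A page with no outgoing links gets an all-zero column (handled separately).
    """
    P = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            # Each link on page j carries an equal share of page j's probability.
            P[i][j] += Fraction(1, len(targets))
    return P


# Hypothetical 4-page web: page -> pages it links to.
LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
P = transition_matrix(LINKS, 4)
column_sums = [sum(P[i][j] for i in range(4)) for j in range(4)]
print(column_sums == [1, 1, 1, 1])   # True
```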

  5. Long Term Probability • After a very long time, what is the probability that a web surfer will be at a given webpage? • Let x be the stationary distribution vector, where $x_k$ is the probability of being at webpage k • Since stochastic matrices have eigenvalue λ = 1, there is a vector x with $Px = x$ • Solve for x to determine the long-term probability of being at each webpage (aka the rank)
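The claim that a stochastic matrix has eigenvalue λ = 1 can be checked in a few lines, assuming (as the slide on dead ends states) that every column of P sums to 1. Here $\mathbf{1}$ denotes the all-ones vector of length n:

```latex
\sum_{i=1}^{n} p_{i,j} = 1 \ \text{for every } j
\;\Longrightarrow\; \mathbf{1}^{\top} P = \mathbf{1}^{\top}
\;\Longrightarrow\; P^{\top} \mathbf{1} = \mathbf{1},
\qquad
\det(P - \lambda I) = \det\big((P - \lambda I)^{\top}\big) = \det(P^{\top} - \lambda I).
```

So 1 is an eigenvalue of $P^{\top}$, and because P and $P^{\top}$ have the same characteristic polynomial, 1 is also an eigenvalue of P: some nonzero x satisfies $Px = x$. Since P has no negative entries, the Perron-Frobenius theorem lets that x be chosen with nonnegative entries, which can then be scaled to sum to 1.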

  6. Small Scale Example • Diagram: 7 pages linked to one another

  7. Linear System • Solve $(P - I)x = 0$ for the vector x to obtain the PageRank • x is the eigenvector for eigenvalue λ = 1, scaled so that its entries sum to 1
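One way to solve $(P - I)x = 0$ directly is sketched below in Python with exact fractions and Gauss-Jordan elimination. The 4-page matrix is hypothetical (pages numbered from 0; column j lists where a surfer on page j goes next, so every column sums to 1). Because every column of $P - I$ sums to 0, the n equations add up to 0 = 0 and one of them is redundant, so the sketch swaps the last equation for the normalization $x_1 + \dots + x_n = 1$. This assumes the eigenvalue 1 has only one independent eigenvector; otherwise there is no unique solution and the function raises an error.

```python
from fractions import Fraction


def solve_pagerank(P):
    """Solve (P - I)x = 0 together with x_1 + ... + x_n = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Augmented matrix [P - I | 0], in exact fractions.
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    # The equations of (P - I)x = 0 sum to 0 = 0, so the last one is redundant:
    # replace it with the normalization row [1 ... 1 | 1].
    A[-1] = [Fraction(1)] * (n + 1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique PageRank vector for this matrix")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n] for row in A]


half = Fraction(1, 2)
# Hypothetical 4-page example; page 3 has no incoming links.
P = [
    [0,    0, 1, half],
    [half, 0, 0, 0],
    [half, 1, 0, half],
    [0,    0, 0, 0],
]
print([str(v) for v in solve_pagerank(P)])   # ['2/5', '1/5', '2/5', '0']
```

Exact arithmetic keeps the example easy to check by hand; for billions of pages, iterative methods are the practical choice instead of elimination.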

  8. Small Scale Solution • As t → ∞, the probability of being at each page settles to its PageRank: $x_1 = 0.304$, $x_2 = 0.166$, $x_3 = 0.141$, $x_4 = 0.105$, $x_5 = 0.179$, $x_6 = 0.045$, $x_7 = 0.061$
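The limit as t → ∞ can be approximated by power iteration: start from the uniform vector and apply $x \leftarrow Px$ until the vector stops changing. The transcript does not give the links of the 7-page example, so this Python sketch uses a hypothetical 4-page matrix instead (pages numbered from 0); the tolerance and step limit are arbitrary choices. The loop converges for this matrix, but convergence is not guaranteed in general: when the link structure is periodic, the probabilities can oscillate instead of settling.

```python
def power_iteration(P, tol=1e-12, max_steps=10_000):
    """Approximate the stationary vector by repeating x <- P x from the uniform vector."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_steps):
        new_x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Stop once no entry changes by more than tol between steps.
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("power iteration did not converge")


# Hypothetical 4-page example: column j lists where a surfer on page j goes next.
P = [
    [0.0, 0.0, 1.0, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
print([round(v, 6) for v in power_iteration(P)])   # [0.4, 0.2, 0.4, 0.0]
```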

  9. Sensitivity Analysis • What if a page has no outgoing links? What happens to the probability matrix P? • P is stochastic, meaning each column must sum to 1 • If page j has no outgoing links, column j would be all zeros, so instead every entry in column j is set to 1/n, where n is the number of pages: $p_{i,j} = 1/n$ for all i, which gives $\sum_{i=1}^{n} p_{i,j} = 1$ • This assumes that when someone reaches a dead end, the next page he/she goes to is chosen entirely at random
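The dead-end fix can be sketched in Python as below, assuming P is stored as a list of rows in which column j describes page j's outgoing links. The helper name `fix_dangling` and the 3-page example are hypothetical (pages numbered from 0; page 2 has no outgoing links).

```python
from fractions import Fraction


def fix_dangling(P):
    """Return a copy of P in which every all-zero column (a page with no
    outgoing links) is replaced by 1/n in every row."""
    n = len(P)
    fixed = [list(row) for row in P]
    for j in range(n):
        if all(fixed[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = Fraction(1, n)
    return fixed


half = Fraction(1, 2)
# Hypothetical 3-page web: page 0 -> 1, page 1 -> 0 and 2, page 2 -> nothing.
P = [
    [0, half, 0],
    [1, 0,    0],
    [0, half, 0],
]
fixed = fix_dangling(P)
print([str(row[2]) for row in fixed])                               # ['1/3', '1/3', '1/3']
print(all(sum(row[j] for row in fixed) == 1 for j in range(3)))     # True
```

The function copies P rather than editing it in place, so the original link matrix stays available for comparison.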

  10. Probability and Rank • The stationary distribution vector contains the rank of each webpage, which determines the order in which it appears when a keyword search is performed • This rank is the probability that a person will be at each of the billions of pages available online • Computing it at this scale takes several powerful computers
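In the simplified model of these slides, where rank alone decides the order of results, ordering pages is just a sort by rank. This Python snippet applies that to the values listed for the 7-page example on slide 8:

```python
# PageRank values from the 7-page example on slide 8 (page number -> rank).
ranks = {1: 0.304, 2: 0.166, 3: 0.141, 4: 0.105, 5: 0.179, 6: 0.045, 7: 0.061}

# Highest rank first: this is the order in which the pages would be listed.
order = sorted(ranks, key=ranks.get, reverse=True)
print(order)   # [1, 5, 2, 3, 4, 7, 6]

# The ranks are probabilities, so they sum to 1 up to the rounding of the listed values.
print(round(sum(ranks.values()), 2))   # 1.0
```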

  11. Questions?

  12. Citations • Austin, David. "How Google Finds Your Needle in the Web's Haystack." AMS.org. American Mathematical Society. Web. 09 Nov. 2009. <http://www.ams.org/featurecolumn/archive/pagerank.html>. • "PageRank." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/PageRank#False_or_spoofed_PageRank>. • Photograph. PageRanks-Example. Wikipedia, 08 July 2009. Web. 09 Nov. 2009. <http://upload.wikimedia.org/wikipedia/commons/f/fb/PageRanks-Example.svg>. • "Stochastic matrix." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/Stochastic_matrix>.
