PageRank Roshnika Fernando
Why PageRank? • The internet is a global system of networks linking to smaller networks. • This system keeps growing, so there must be a way to sort through all the information available. • PageRank is the algorithm used by Google's search engine to sort through internet webpages • A webpage's rank determines the order in which it appears when a keyword search is performed on Google • Fun Fact: PageRank is named after Larry Page, one of the founders of Google, not after webpages
Popularity Contest • Rank, at its simplest, is the probability that a webpage will be visited • The ranks of all pages sum to 1 • A page's rank is affected by the ranks of the pages that link to it • Initially, every page has rank = 1/(total # of pages available), which is ≈ 0 for the internet
Determining Rank • Let P be an n × n stochastic matrix, where n is the number of webpages and $p_{i,j}$ is the probability of going to webpage i from webpage j. • $p_{i,j} = \frac{\#\text{ of links to page } i \text{ from page } j}{\#\text{ of links on page } j}$, so each column of P sums to 1 • Note: i and j are positive integers indexing the pages ($1 \le i, j \le n$) • Note: There are around 25 billion $p_{i,j}$ combinations on the internet
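A minimal Python sketch of the matrix definition above, assuming a small hypothetical 4-page web described by a `links` dictionary (page j maps to the list of pages it links to; pages are numbered from 0, as is usual in Python). The helper name `transition_matrix` is illustrative only. Exact fractions make it easy to confirm that every column sums to 1.

```python
from fractions import Fraction


def transition_matrix(links, n):
    """Build the n x n matrix P with P[i][j] = (# of links to page i
    from page j) / (# of links on page j). Pages are numbered 0..n-1."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        targets = links.get(j, [])  # pages that page j links to
        for i in targets:
            # each link on page j carries an equal share of the probability
            P[i][j] += Fraction(1, len(targets))
    return P


# Hypothetical 4-page web: page j -> list of pages it links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
P = transition_matrix(links, 4)

for row in P:
    print(" ".join(str(p) for p in row))

# every column of P sums to 1 (each page here has at least one outgoing link)
print(all(sum(P[i][j] for i in range(4)) == 1 for j in range(4)))
```

In this made-up web, page 3 has no incoming links, and every page has at least one outgoing link. A page with no outgoing links would produce a column of zeros, which is the case covered under Sensitivity Analysis.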
Long Term Probability • After a very long time, what is the probability that a web surfer will be at a certain website? • Let $x$ be the stationary distribution vector, where $x_k$ is the probability of being at page k. • Since every stochastic matrix has eigenvalue λ = 1, there is a vector $x$ with $Px = x$ • Solve for $x$ to determine the long-term probability of being at each webpage (aka the rank)
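One way to approximate the long-term probability is to simulate the surfer directly: start every page at rank 1/n (as on the Popularity Contest slide) and repeatedly apply $x \leftarrow Px$ until the vector stops changing. This is power iteration, a standard technique swapped in here for illustration; the hypothetical 4-page web, the tolerance and the function names are assumptions for this sketch. P uses the convention that P[i][j] is the probability of moving to page i from page j, so its columns sum to 1, matching $(P - I)x = 0$.

```python
def transition_matrix(links, n):
    """Column-stochastic P: P[i][j] = (# links to page i from page j) / (# links on page j)."""
    P = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            P[i][j] += 1.0 / len(targets)
    return P


def long_term_probability(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeat x <- P x from the uniform start until it settles."""
    n = len(P)
    x = [1.0 / n] * n  # initially every page has rank 1/n
    for _ in range(max_iter):
        # one step of the random surfer: new_x[i] = sum over j of P[i][j] * x[j]
        new_x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x  # may not have converged within max_iter steps


# Hypothetical 4-page web: page j -> list of pages it links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
P = transition_matrix(links, 4)
ranks = long_term_probability(P)
print([round(r, 4) for r in ranks])
```

For this made-up web the vector settles near [0.4, 0.2, 0.4, 0.0]; page 3 ends with rank 0 because nothing links to it. Convergence is not guaranteed for every link structure: periodic link patterns can make the vector oscillate instead of settling.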
Small Scale Example • Figure: 7 pages linked to one another
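Linear System • Solve $(P - I)x = 0$ for the vector $x$ to obtain the PageRank • $x$ is the eigenvector of P for eigenvalue λ = 1; since any multiple of an eigenvector is also a solution, $x$ is scaled so that its entries sum to 1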
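The rank vector can also be obtained from the linear system directly. A minimal sketch, assuming a hypothetical 4-page web: because each column of P sums to 1, the rows of $P - I$ add up to zero, so one row is redundant. The sketch replaces the last row with the normalization $x_1 + \dots + x_n = 1$ and solves with Gaussian elimination; this works when the rank vector is unique, as it is here. The helper names are illustrative, and dense elimination like this is only practical for small matrices.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # swap in the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def rank_by_linear_system(P):
    """Solve (P - I) x = 0 together with the condition sum(x) = 1."""
    n = len(P)
    A = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n  # replace the redundant last row by x_1 + ... + x_n = 1
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)


# Hypothetical 4-page web (column j = where page j's links lead):
# page 0 -> 1, 2; page 1 -> 2; page 2 -> 0; page 3 -> 0, 2
P = [
    [0.0, 0.0, 1.0, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
x = rank_by_linear_system(P)
print([round(v, 4) for v in x])
```

The solution is x = [0.4, 0.2, 0.4, 0.0]; substituting it back gives $Px = x$ with entries summing to 1.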
Small Scale Solution • As t → ∞, the probability of being at each page converges to its PageRank: $x_1 = .304$, $x_2 = .166$, $x_3 = .141$, $x_4 = .105$, $x_5 = .179$, $x_6 = .045$, $x_7 = .061$
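As a quick check of the reported values, a short sketch that adds them up and sorts the pages by rank; the dictionary simply restates the seven numbers above, keyed by page number.

```python
# PageRank values reported for the 7-page example, keyed by page number
ranks = {1: 0.304, 2: 0.166, 3: 0.141, 4: 0.105, 5: 0.179, 6: 0.045, 7: 0.061}

total = sum(ranks.values())  # should be 1, up to rounding of each value
order = sorted(ranks, key=ranks.get, reverse=True)  # highest rank first

print(f"sum of ranks: {total:.3f}")
print("pages from highest to lowest rank:", order)
```

The total is 1.001 rather than exactly 1 because each value is rounded to three decimals. The sorted order (pages 1, 5, 2, 3, 4, 7, 6) is the order in which these pages would be listed if rank alone decided it.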
Sensitivity Analysis • What if a page has no outgoing links? What happens to the probability matrix P? • P is stochastic, meaning each of its columns must sum to 1. • If page j has no links leading out, its column would be all zeros, so its probability is instead distributed evenly over all n rows: $p_{i,j} = 1/n$ for every i • This assumes that when someone reaches a dead end, he or she moves to a new page chosen entirely at random
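A minimal sketch of the dead-end rule above, assuming the column-stochastic P is stored as a list of rows: any all-zero column (a page with no outgoing links) is replaced by 1/n in every row. The function name `fix_dead_ends` and the 3-page matrix are hypothetical.

```python
def fix_dead_ends(P):
    """Return a copy of the column-stochastic matrix P in which every
    all-zero column (a page with no outgoing links) becomes 1/n in every row."""
    n = len(P)
    Q = [list(row) for row in P]
    for j in range(n):
        if all(Q[i][j] == 0 for i in range(n)):
            for i in range(n):
                Q[i][j] = 1.0 / n  # the surfer jumps to a random page
    return Q


# Hypothetical 3-page web: page 0 -> 1, 2; page 1 -> 2; page 2 has no links
P = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
Q = fix_dead_ends(P)
for row in Q:
    print([round(p, 3) for p in row])
```

After the fix, page 2 (the third column) sends the surfer to each of the three pages with probability 1/3, and every column again sums to 1.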
Probability and Rank • The stationary distribution vector contains the rank of each webpage, which determines the order in which it appears when a keyword search is performed • This rank is the probability that a person will be at each of the billions of pages available online • Computing it at this scale takes several powerful computers