This presentation explains Google's PageRank algorithm through an imaginary web surfer. The surfer starts at any page and then either follows a randomly chosen link or, occasionally, jumps to a random page; the share of time spent on each page is that page's rank. The slides work through a ten-page example, estimate the storage and computing time a dense matrix calculation would need for the whole indexed web, and show how one step of the iteration can be carried out from link lists alone.
The $25 Billion Eigenvector: How Does Google Do PageRank?
The Imaginary Web Surfer:
• Starts at any page,
• Randomly goes to a page linked from the current page,
• Randomly goes to any web page from a dangling page,
• … except sometimes (e.g. 15% of the time), goes to a purely random page.
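The surfer's rules can be tried out directly with a short simulation. The sketch below is only an illustration: the function name `simulate_surfer`, the four-page web `DEMO_WEB` (with 0-based page indices), and the step count are assumptions, not part of the slides. It counts how often each page is visited, which approximates the ranks for a long enough walk.

```python
import random


def simulate_surfer(out_links, steps, teleport=0.15, seed=0):
    """Return the fraction of steps the surfer spends on each page.

    out_links[p] lists the pages that page p links to (0-based indices).
    """
    rng = random.Random(seed)
    n = len(out_links)
    visits = [0] * n
    page = rng.randrange(n)              # starts at any page
    for _ in range(steps):
        visits[page] += 1
        links = out_links[page]
        if not links or rng.random() < teleport:
            # dangling page, or the occasional purely random jump
            page = rng.randrange(n)
        else:
            # follow a randomly chosen link from the current page
            page = rng.choice(links)
    return [v / steps for v in visits]


# Hypothetical four-page web; page 3 is dangling and has no inbound links.
DEMO_WEB = [[1, 2], [2], [0], []]
frequencies = simulate_surfer(DEMO_WEB, steps=20000)
print(frequencies)
```

Because of the random jumps, even a page with no inbound links is visited now and then, so every page ends up with a nonzero rank.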
A tiny web: who should get the highest rank?
[Figure: directed graph of ten pages labeled A through J]
The associated stochastic matrix:
0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.4400 0.0150 0.0150 0.2983
0.4400 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150
0.0150 0.2983 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150
0.0150 0.2983 0.8650 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150
0.4400 0.0150 0.0150 0.8650 0.0150 0.8650 0.0150 0.0150 0.0150 0.0150
0.0150 0.2983 0.0150 0.0150 0.8650 0.0150 0.0150 0.0150 0.0150 0.0150
0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.8650 0.0150 0.0150
0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.8650 0.2983
0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.2983
0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.4400 0.0150 0.0150 0.0150
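The entries of this matrix follow from the surfer's rules: every entry gets 0.15/10 = 0.015 from the random jump, and column j adds 0.85 divided by the number of links leaving page j in the rows of the pages it links to. The sketch below rebuilds the matrix that way. The link lists in `TINY_WEB` are read off the matrix's larger entries, with pages numbered 1 to 10 by column; the function name `google_matrix` and the handling of dangling pages are assumptions for illustration.

```python
def google_matrix(out_links, damping=0.85):
    """Build the dense column-stochastic matrix for a small web.

    out_links[j] lists the 1-based page numbers that page j+1 links to.
    """
    n = len(out_links)
    teleport = (1.0 - damping) / n       # 0.015 when n = 10
    G = [[teleport] * n for _ in range(n)]
    for j, targets in enumerate(out_links):
        if not targets:
            # dangling page: assumed to jump to any page with equal chance
            for i in range(n):
                G[i][j] += damping / n
        else:
            for t in targets:
                G[t - 1][j] += damping / len(targets)
    return G


# Links inferred from the matrix above: page 1 links to 2 and 5, and so on.
TINY_WEB = [[2, 5], [3, 4, 6], [4], [5], [6], [5], [1, 10], [7], [8], [1, 8, 9]]
G = google_matrix(TINY_WEB)
for row in G:
    print(" ".join(f"{value:.4f}" for value in row))
```

Each column sums to 1, which is what makes the matrix stochastic: a page with two links gives 0.015 + 0.85/2 = 0.44 to each target, one with three links gives about 0.2983, and one with a single link gives 0.865.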
And the winners are…
• http://www.loc.gov/standards/iso639-2
• http://www.sil.org/iso639-3
• http://www.loc.gov/standards/iso639-5
• http://purl.org/dc/elements/1.1
• http://purl.org/dc/terms
• http://purl.org/dc
• http://creativecommons.org/licenses/by/3.0
• http://i.creativecommons.org/l/by/3.0/88x31.png
• http://www.nlb.gov.sg
• http://purl.org/dcpapers
• http://www.nl.go.kr
• http://purl.org/dcregistry
• http://www.kc.tsukuba.ac.jp/index_en.html
How much storage to hold this array?
• Current estimate of indexed WWW: 4.7 · 10^10 web pages
• If placed into an array this would have 2.21 · 10^21 elements
• If each element is stored in 4 bytes, this would be 8.8 · 10^21 bytes
• Current estimate of world's data storage capacity is 3.0 · 10^18 bytes (about 0.03% of the necessary space)
http://www.smartplanet.com/blog/thinking-tech/what-is-the-worlds-data-storage-capacity/6256
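The storage estimate is plain arithmetic and can be checked in a few lines. The sketch below uses only the figures quoted on the slide (page count, 4 bytes per element, and the quoted world storage capacity); the variable names are illustrative.

```python
PAGES = 4.7e10                 # indexed web pages, as quoted on the slide
BYTES_PER_ENTRY = 4            # storage per matrix element
WORLD_STORAGE_BYTES = 3.0e18   # world storage capacity, as quoted on the slide

entries = PAGES ** 2                        # a dense n-by-n array has n^2 elements
dense_bytes = entries * BYTES_PER_ENTRY     # bytes needed to hold it
fraction = WORLD_STORAGE_BYTES / dense_bytes

print(f"elements:        {entries:.2e}")
print(f"bytes needed:    {dense_bytes:.2e}")
print(f"world capacity:  {fraction:.4%} of the bytes needed")
```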
How much time to do one power step?
• Current estimate of indexed WWW: 4.7 · 10^10 web pages
• If placed into an array this would have 2.21 · 10^21 elements
• Fastest current machine does 33.86 · 10^15 operations per second
• One step of y = Ax takes 3.68 days
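How is x_{k+1} = A x_k performed?
[Figure: the tiny web of ten pages labeled A through J]
Only the links are stored, column by column, in two arrays:
connection = [2 5 3 4 6 4 5 6 5 1 10 7 8 1 8 9]
end = [2 5 6 7 8 9 11 12 13 16]
Here end_j is the position in connection of the last link leaving page j. For example, page 1 links to pages 2 and 5, and page 2 links to pages 3, 4, and 6.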
How is x_{k+1} = A x_k performed?
• x_{k+1} = (.15/n) e, where e is the vector of all 1's
• start = 1
• for j = 1, …, n
    a) col_tot = end_j - start + 1 (the number of links leaving page j)
    b) for i = start, …, end_j
         ii = connection_i
         x_{k+1}(ii) = x_{k+1}(ii) + (.85/col_tot) · x_k(j)
    c) start = end_j + 1
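The loop above can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the names `power_step` and `pagerank`, the convergence tolerance, and the treatment of dangling pages are assumptions. The arrays are the slide's `connection` and `end`, read as sixteen link entries for the ten-page web and kept 1-based; the function converts to Python's 0-based indexing internally.

```python
def power_step(connection, end, x, damping=0.85):
    """One step x_next = A x computed from the link lists alone.

    connection holds 1-based target pages, column by column;
    end[j] is the 1-based position of the last link leaving page j+1.
    Assumes the entries of x sum to 1.
    """
    n = len(end)
    x_next = [(1.0 - damping) / n] * n     # the (.15/n) e term
    start = 0                              # 0-based start of page j's links
    for j in range(n):
        stop = end[j]                      # 1-based inclusive = 0-based exclusive
        col_tot = stop - start             # number of links leaving page j+1
        if col_tot == 0:
            # dangling page (assumed handling): spread its rank over all pages
            for i in range(n):
                x_next[i] += damping * x[j] / n
        else:
            share = damping * x[j] / col_tot
            for idx in range(start, stop):
                x_next[connection[idx] - 1] += share
        start = stop
    return x_next


def pagerank(connection, end, tol=1e-10, max_steps=1000):
    """Repeat the power step from a uniform start until the change is tiny."""
    n = len(end)
    x = [1.0 / n] * n
    for _ in range(max_steps):
        x_next = power_step(connection, end, x)
        if sum(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


CONNECTION = [2, 5, 3, 4, 6, 4, 5, 6, 5, 1, 10, 7, 8, 1, 8, 9]
END = [2, 5, 6, 7, 8, 9, 11, 12, 13, 16]

ranks = pagerank(CONNECTION, END)
for page, rank in enumerate(ranks, start=1):
    print(f"page {page:2d}: {rank:.4f}")
```

Each step touches every stored link once, so the work and the storage grow with the number of links rather than with the n^2 elements of the full array.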