1 / 24

The effect of New Links on Google Pagerank

The effect of New Links on Google Pagerank. By Hui Xie Apr , 07. Computing PageRank. Matrix representation Let P be an n n matrix and p ij be the entry at the i-th row and j-th column. If page i has k>0 outgoing links p ij = 1/k if page i has a link to page j

lorimer
Télécharger la présentation

The effect of New Links on Google Pagerank

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The effect of New Links on Google Pagerank By Hui Xie Apr , 07

  2. Computing PageRank Matrix representation Let P be an nn matrix and pij be the entry at the i-th row and j-th column. If page i has k>0 outgoing links pij = 1/k if page i has a link to page j pij = 0 if there is no link from i to j If page I has no outgoing links pij = 1/n j=1,…,n

  3. Google matrix • G=cP+(1-c)(1/n)eeT e=(1,…,1)T • G is stochastic matrix Ge=e • There exists a unique column vector π such that πT G= πT, πT e=1 • πT =(1-c)/n eT(I-cP)-1

  4. Discrete Time Markov Chains • A sequence of random variables {Xn} is called a Markov chain if it has the Markov property: • States are usually labeled {(0,)1,2,…} • State space can be finite or infinite

  5. Transition Probability • Probability to jump from state i to state j • Assume stationary: independent of time • Transition probability matrix: P = (pij) • Two state MC:

  6. Side Topic: Markov Chains • A discrete time stochastic process is a sequence of random variables {X0, X1, …, Xn, …} where the 0, 1, …, n, … are discrete points in time. • A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states S in terms of a matrix P of transition probabilities. • Memorylessness property: for a Markov chain • Pr[Xt+1 = j | X0 = i0, X1 = i1, …, Xt = i] = Pr[Xt+1 = j | Xt = i]

  7. Side Topic: Markov Chains • Let pi(t) be the probability of being in state i at time step t. • Let p(t) = [p0(t), p1(t), … ] be the vector of probabilities at time t. • For an initial probability distribution p(0), the probabilities at time n are • p(n) = p(0) Pn • A probability distribution p is stationary if p = p P • P(Xm+n =j|Xm = i) = P(Xn =j|X0 = i) = Pn(i,j)

  8. absorbing Markov chain Define a discrete-time absorbing markov chain {Xt ,t=0,1,…}with the state space {0,1,…,n} Where transitions between the states 1,…, n are conducted by the matrix cP, and the state 0 is absorbing. The transition matrix is

  9. Random walk interpretation • Walk starts at a uniformly chosen web page • At each step, if currently at page p • W/p α, go to a uniformly chosen outneighbor of p • W/p 1 - α, stop

  10. Let Njbe the total number of visits to state j before absorption including the visit at time t = 0 if X0 is j . Formally, • Then zij=(I-cP)-1ij=E(Nj|X0=I) • Let qij be the probability of reaching the state j before absorption if the initial state is i. Then we have

  11. Theorem Let X denote a Markov chain with state space E. The total number of visits to a state j∈E under the condition that the chain starts in state i is given by P(Nj=m|X0=j)=qjjm-1(1-qjj) and for i!=j P(Nj=m|X0=i)= 1-qij if m=0 qij qjjm-1(1-qjj) if m>=1 Corollary For all i,j ∈E the relations zij=(1-qii)-1 and zij=qijzjj hold

  12. Outgoing links from i do not affect qji for any j!=I So by changing the outgoing links, a page can control its PageRank up to multiplication by a factor zii=1/(1-qii) For 0<=qii<=c2 , 1<=zii<=(1-c2)-1≈3.6 for c=0.85

  13. Rank one update of google pagerank • Page 1 with k0 old links has k1 newly created links to page 2 to k1+1 • k=k0+k1 , p1T be the first row of matrix P • Updated hyperlink matrix

  14. According to (9) the ranking of page 1 increases when For z11=1/(1-q11), zi2=qi1z11, i>1 The above is equivalent to

  15. Hence, the page 1 increases its ranking when it refers to pages that are characterized by a high value of qi1. These must be the pages that refer to page 1 or at least belong to the same Web community. Here by a Web community we mean a set of Web pages that a surfer can reach from one to another in a relatively small number of steps.

  16. the PageRank of page j increases if

  17. the PageRank of page j increases if if several new links are added then the PageRank of page j might actually decrease even if this page receives one of the new links. Such situation occurs when most of newly created links point to “irrelevant” pages.

  18. For instance, let j = 2 and assume that there is no hyperlink path from pages 3,…,k+1 to page 2.Then zijis close to zero for i = 3,…, k + 1, and the PageRank of page 2 will increase only if (c/k1)z22> z12, which is not necessarily true, especially if z12 and k1 are considerably large.

  19. Asymptotic analysis • Let be the stopping time of the first visit to the state j • Mij=E( |X0=i) be the average time needed to reach j starting from i(mean first passage time)

  20. Optimal Linking Strategy • Consider a page i = 1,…,n and assume that i has links to pages i1,…,ikdistinct from i. Further, let mij(c) be the mean first passage time from page i to page j for the Google transition matrix Gwith parameter c.

  21. outgoing links from i do not affect mji(c) for any j!= i. Thus, by linking from i to j , one can only alter k, this means that the owner of the page I has very little control over its pagerank. The best that he can do is to link only to one page j* such that Note that (surprisingly) the PageRank of j* plays no role here.

  22. Theorem. The optimal linking strategy for a Web page is to have only one outgoing link pointing to a Web page with a shortest mean first passage time back to the original page.

  23. Conclusions • Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links. • Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy.

More Related