# Inverted Index

##### Presentation Transcript

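1. Inverted Index: allows quick lookup of the ids of the documents that contain a particular word. A lexicon (dictionary, DIC) lists the words, and each word points to its posting list (PL), the list of documents containing it.
   [Figure: dictionary entries Stanford, UCLA, MIT, … pointing to posting lists PL(Stanford), PL(UCLA), PL(MIT), …]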
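
A minimal Python sketch of the structure in slide 1: the dictionary (lexicon) maps each word to its posting list, here a sorted list of document ids. The tokenizer (lowercase, split on whitespace), the function names `build_inverted_index` and `lookup`, and the sample documents are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Hypothetical helper: map each word to a sorted posting list of doc ids."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def lookup(index: dict[str, list[int]], word: str) -> list[int]:
    """Hypothetical helper: return the posting list for a word (empty if absent)."""
    return index.get(word.lower(), [])


# Hypothetical sample documents.
docs = {1: "Stanford and UCLA", 2: "UCLA and MIT", 3: "MIT"}
index = build_inverted_index(docs)
print(lookup(index, "UCLA"))  # [1, 2]
print(lookup(index, "MIT"))   # [2, 3]
```

A query for a word is a single dictionary lookup; the cost of scanning the documents is paid once, when the index is built.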

2. PageRank: A page is important if it is pointed to by many important pages.
   $PR(p) = PR(p_1)/c_1 + \cdots + PR(p_k)/c_k$, where $p_1, \ldots, p_k$ are the pages pointing to $p$ and $c_i$ is the number of links in $p_i$.
   That is, the PageRank of $p$ is the sum of its parents' PageRanks, each divided by that parent's number of links. There is one equation for every page: N equations in N unknown variables.
   Junghoo "John" Cho (UCLA Computer Science)

3. Example: Web of 1842, with three pages: Netscape (Ne), Microsoft (MS), and Amazon (Am)
   [Figure: link graph of Ne, MS, and Am]
   $$\begin{aligned} PR(n) &= PR(n)/2 + PR(a)/2 \\ PR(m) &= PR(a)/2 \\ PR(a) &= PR(n)/2 + PR(m) \end{aligned}$$
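
Solving these three equations by hand shows why one more constraint is needed: the third equation follows from the first two, so the PageRanks are fixed only up to a scale factor. The derivation below assumes the total importance is 3, i.e. one unit per page, matching the starting point of the iterative computation in slide 5.

$$
\begin{aligned}
PR(n) &= \tfrac{1}{2}PR(n) + \tfrac{1}{2}PR(a) &&\Rightarrow PR(n) = PR(a) \\
PR(m) &= \tfrac{1}{2}PR(a) \\
PR(a) &= \tfrac{1}{2}PR(n) + PR(m) = \tfrac{1}{2}PR(a) + \tfrac{1}{2}PR(a) &&\text{(adds no new constraint)} \\
PR(n) + PR(m) + PR(a) &= 3 &&\Rightarrow \tfrac{5}{2}PR(a) = 3 \\
PR(n) = PR(a) &= \tfrac{6}{5}, \qquad PR(m) = \tfrac{3}{5}
\end{aligned}
$$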

4. PageRank: Matrix Notation
   Web graph matrix $M = \{m_{ij}\}$: each page $i$ corresponds to row $i$ and column $i$ of $M$, and $m_{ij} = 1/c$ if page $i$ is one of the $c$ children of page $j$; $m_{ij} = 0$ otherwise.
   PageRank vector: $\mathbf{r}$, whose $i$-th entry is the PageRank of page $i$.
   PageRank equation: $\mathbf{r} = M\mathbf{r}$.
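
The matrix definition can be checked on the 1842 example. This Python sketch assumes the links implied by the slide-3 equations (Netscape links to itself and to Amazon, Microsoft links to Amazon, Amazon links to Netscape and Microsoft). It builds $M$ column by column with exact fractions and confirms that $\mathbf{r} = (6/5, 3/5, 6/5)$ for (n, m, a) satisfies $\mathbf{r} = M\mathbf{r}$. The helpers `web_matrix` and `mat_vec` are hypothetical names.

```python
from fractions import Fraction

# Assumed link structure of the 1842 example, read off the slide-3 equations.
LINKS = {"n": ["n", "a"], "m": ["a"], "a": ["n", "m"]}
PAGES = ["n", "m", "a"]  # row/column order of M


def web_matrix(links, pages):
    """Hypothetical helper: m[i][j] = 1/c if page i is one of the c children of page j, else 0."""
    idx = {p: k for k, p in enumerate(pages)}
    M = [[Fraction(0)] * len(pages) for _ in pages]
    for parent, children in links.items():
        for child in children:
            M[idx[child]][idx[parent]] = Fraction(1, len(children))
    return M


def mat_vec(M, r):
    """Hypothetical helper: plain matrix-vector product M r."""
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, r)) for row in M]


M = web_matrix(LINKS, PAGES)
r = [Fraction(6, 5), Fraction(3, 5), Fraction(6, 5)]
print(mat_vec(M, r) == r)  # True
```

Every column of this `M` sums to 1, because each page splits its importance among all of its children.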

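5. PageRank: Iterative Computation
   Initially every page has one unit of importance. In each round, each page shares its importance equally among its children and receives new importance from its parents: if $\mathbf{r}_k$ is the PageRank vector after round $k$, then $\mathbf{r}_{k+1} = M\mathbf{r}_k$. Eventually the importance of each page reaches a limit.
   $M$ is a stochastic matrix: the column of each page with at least one link sums to 1, so each round preserves the total importance.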

6. Example: Web of 1842
   [Figure: Netscape (Ne), Microsoft (MS), Amazon (Am)]
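
A minimal Python sketch of the iterative computation on this example, assuming the link structure implied by the slide-3 equations: every page starts with one unit of importance, and in each round a page splits its importance equally among its children. The function name `pagerank_iterate` and the fixed number of rounds (instead of a convergence test) are illustrative choices.

```python
# Assumed link structure of the 1842 example (from the slide-3 equations).
LINKS = {"n": ["n", "a"], "m": ["a"], "a": ["n", "m"]}


def pagerank_iterate(links, rounds=100):
    """Hypothetical helper: power iteration r_{k+1} = M r_k, starting from one unit per page."""
    r = {p: 1.0 for p in links}
    for _ in range(rounds):
        new = {p: 0.0 for p in links}
        for parent, children in links.items():
            for child in children:
                # Each page shares its importance equally among its children.
                new[child] += r[parent] / len(children)
        r = new
    return r


r = pagerank_iterate(LINKS)
print({p: round(v, 3) for p, v in r.items()})  # {'n': 1.2, 'm': 0.6, 'a': 1.2}
```

For (n, m, a), the first three rounds give (1, 0.5, 1.5), (1.25, 0.75, 1), and (1.125, 0.5, 1.375); the values oscillate toward the limit (1.2, 0.6, 1.2) while the total stays at 3.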

7. PageRank: Random Surfer Model
   PageRank models the probability that a Web surfer reaches a page after many clicks, each following a random link.
   [Figure: random click]
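
The random surfer model can be simulated directly. This sketch uses the 1842 example graph (links assumed from the slide-3 equations), a hypothetical function `random_surfer`, and a fixed seed for reproducibility. It follows a uniformly random link at every click and records how often each page is visited. The visit frequencies should approach the PageRanks 6/5, 3/5, 6/5 divided by the total importance 3, i.e. 0.4, 0.2, 0.4.

```python
import random
from collections import Counter

# Assumed link structure of the 1842 example (from the slide-3 equations).
LINKS = {"n": ["n", "a"], "m": ["a"], "a": ["n", "m"]}


def random_surfer(links, start, clicks, seed=0):
    """Hypothetical helper: return the fraction of clicks that land on each page."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    page, visits = start, Counter()
    for _ in range(clicks):
        page = rng.choice(links[page])  # follow a random link on the current page
        visits[page] += 1
    return {p: visits[p] / clicks for p in links}


freq = random_surfer(LINKS, "n", 200_000)
print(freq)  # close to {'n': 0.4, 'm': 0.2, 'a': 0.4}
```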

8. Problems on the Real Web
   - Dead end: a page with no links to send its importance to. All importance eventually “leaks out of” the Web.
   - Crawler trap: a group of one or more pages that have no links out of the group. The group eventually accumulates all the importance of the Web.

9. Example: Dead End
   No link from Microsoft: Microsoft is a dead end.
   [Figure: Netscape (Ne), Microsoft (MS), Amazon (Am)]

10. Example: Dead End
    [Figure: Netscape (Ne), Microsoft (MS), Amazon (Am)]
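
The leak can be reproduced with the plain iteration (a sketch; the function name `iterate` is hypothetical, and the graph is assumed to be the 1842 example with Microsoft's links removed). Each round loses exactly Microsoft's current importance, so the total, which starts at 3, shrinks every round toward 0.

```python
# Dead-end example (assumed): Microsoft ("m") has no links.
DEAD_END = {"n": ["n", "a"], "m": [], "a": ["n", "m"]}


def iterate(links, rounds):
    """Hypothetical helper: plain iteration, one unit of importance per page at the start."""
    r = {p: 1.0 for p in links}
    for _ in range(rounds):
        new = {p: 0.0 for p in links}
        for parent, children in links.items():
            # A dead end has no children, so its importance goes nowhere and is lost.
            for child in children:
                new[child] += r[parent] / len(children)
        r = new
    return r


for rounds in (0, 1, 2, 3, 100):
    print(rounds, round(sum(iterate(DEAD_END, rounds).values()), 4))
# 0 3.0 / 1 2.0 / 2 1.5 / 3 1.25 / 100 0.0
```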

11. Solution to Dead End: assume that the surfer jumps to a random page at a dead end.
    [Figure: Netscape (Ne), Microsoft (MS), Amazon (Am)]
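
One way to implement this fix, sketched in Python: when a page has no links, treat it as linking to every page (here including itself, which is an assumption), so its importance is spread evenly instead of leaking out. On the dead-end example (graph assumed from the slides) the total importance then stays at 3. The function name `iterate_with_jump` is hypothetical.

```python
# Dead-end example (assumed): Microsoft ("m") has no links.
DEAD_END = {"n": ["n", "a"], "m": [], "a": ["n", "m"]}


def iterate_with_jump(links, rounds=100):
    """Hypothetical helper: iteration where a dead end sends importance to all pages."""
    pages = list(links)
    r = {p: 1.0 for p in pages}
    for _ in range(rounds):
        new = {p: 0.0 for p in pages}
        for parent, children in links.items():
            # At a dead end, the surfer jumps to a random page (assumed: any page, itself included).
            targets = children or pages
            for child in targets:
                new[child] += r[parent] / len(targets)
        r = new
    return r


r = iterate_with_jump(DEAD_END)
print(round(sum(r.values()), 6))                    # 3.0
print({p: round(v * 13, 3) for p, v in r.items()})  # {'n': 18.0, 'm': 9.0, 'a': 12.0}
```

So the limit is n = 18/13, m = 9/13, a = 12/13: Microsoft keeps some importance, and nothing leaks.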

12. Example: Crawler Trap
    Microsoft's only link is a self-link, so Microsoft is a crawler trap.
    [Figure: Netscape (Ne), Microsoft (MS), Amazon (Am)]

13. Example: Crawler Trap
    [Figure: Netscape (Ne), Microsoft (MS), Amazon (Am)]
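
The plain iteration on the crawler-trap example (graph assumed from the slides: Microsoft's only link points to itself) shows Microsoft absorbing everything. The total stays at 3, but Netscape and Amazon drain toward 0. The function name `iterate` is hypothetical.

```python
# Crawler-trap example (assumed): Microsoft ("m") links only to itself.
TRAP = {"n": ["n", "a"], "m": ["m"], "a": ["n", "m"]}


def iterate(links, rounds):
    """Hypothetical helper: plain iteration, one unit of importance per page at the start."""
    r = {p: 1.0 for p in links}
    for _ in range(rounds):
        new = {p: 0.0 for p in links}
        for parent, children in links.items():
            for child in children:
                new[child] += r[parent] / len(children)
        r = new
    return r


r = iterate(TRAP, 100)
print({p: round(v, 4) for p, v in r.items()})  # {'n': 0.0, 'm': 3.0, 'a': 0.0}
```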

14. Crawler Trap: Damping Factor
    “Tax” each page some fraction of its importance and distribute the collected tax equally among all pages. The tax corresponds to the probability that the surfer jumps to a random page.
    Example assuming a 20% tax.
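
A sketch of the 20% tax on the crawler-trap example (graph assumed from the slides: Microsoft links only to itself). In each round, every page gives up 20% of its importance, the collected tax is split equally among all pages, and the remaining 80% flows along links as before. With one unit per page this amounts to $\mathbf{r} = 0.8\,M\mathbf{r} + 0.2$ in every entry. The function name `iterate_with_tax` is hypothetical. Microsoft still ends up with the most importance, but no longer all of it: the iteration settles at n = 7/11, m = 21/11, a = 5/11.

```python
# Crawler-trap example (assumed): Microsoft ("m") links only to itself.
TRAP = {"n": ["n", "a"], "m": ["m"], "a": ["n", "m"]}


def iterate_with_tax(links, tax=0.2, rounds=100):
    """Hypothetical helper: iteration with a 'tax' redistributed equally to all pages."""
    pages = list(links)
    r = {p: 1.0 for p in pages}
    for _ in range(rounds):
        pool = tax * sum(r.values())                 # importance collected as tax
        new = {p: pool / len(pages) for p in pages}  # shared equally by all pages
        for parent, children in links.items():
            for child in children:
                # The untaxed (1 - tax) share still flows along the links.
                new[child] += (1 - tax) * r[parent] / len(children)
        r = new
    return r


r = iterate_with_tax(TRAP)
print({p: round(v * 11, 3) for p, v in r.items()})  # {'n': 7.0, 'm': 21.0, 'a': 5.0}
```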

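15. Algorithm KMP: find the first occurrence of word W in document D. Start with m = 0 (start of the current match in D) and i = 0 (current position in W); T is the precomputed partial-match table, with T[0] = -1 so that a mismatch at i = 0 advances m by one.
    - while (m + i) < |D| do:
      - if W[i] = D[m + i], let i = i + 1; then, if i = |W|, return m
      - otherwise, let m = m + i - T[i]; then, if i > 0, let i = T[i]
    - return no-match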
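
A runnable Python version of the search loop above, together with one common way to build the table T: T[0] = -1 and, for i > 0, T[i] is the length of the longest proper prefix of W[:i] that is also a suffix of W[:i]. The function names `kmp_table` and `kmp_search` are hypothetical, and returning -1 for "no-match" (as `str.find` does) is an assumption for illustration.

```python
def kmp_table(word: str) -> list[int]:
    """Hypothetical helper: T[0] = -1; T[i] = longest proper border of word[:i] for i >= 1."""
    # border[j]: length of the longest proper prefix of word[:j+1] that is also its suffix.
    border = [0] * len(word)
    k = 0
    for j in range(1, len(word)):
        while k > 0 and word[j] != word[k]:
            k = border[k - 1]
        if word[j] == word[k]:
            k += 1
        border[j] = k
    return [-1] + border[:-1]


def kmp_search(document: str, word: str) -> int:
    """Hypothetical helper: first index where word occurs in document, or -1 (no-match)."""
    if not word:
        return 0
    t = kmp_table(word)
    m = i = 0  # m: start of the current match in document; i: position in word
    while m + i < len(document):
        if word[i] == document[m + i]:
            i += 1
            if i == len(word):
                return m
        else:
            m = m + i - t[i]  # shift the word; t[0] = -1 moves m forward by one
            if i > 0:
                i = t[i]      # keep the t[i] characters already known to match
    return -1


print(kmp_table("ABCDABD"))                              # [-1, 0, 0, 0, 0, 1, 2]
print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # 15
```

Because t[i] < i, every mismatch moves m strictly forward, and the text position m + i never moves backward, so the search takes O(|D|) steps after the O(|W|) table construction.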