
Sandeep Pandey 1 , Sourashis Roy 2 , Christopher Olston 1 , Junghoo Cho 2 , Soumen Chakrabarti 3

Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results. Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3. 1 Carnegie Mellon, 2 UCLA, 3 IIT Bombay.


Presentation Transcript


  1. Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results • Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3 • 1 Carnegie Mellon, 2 UCLA, 3 IIT Bombay

  2. Popularity as a Surrogate for Quality • Search engines want to measure the “quality” of pages • Quality is hard to define and measure • Various “popularity” measures are used in ranking • e.g., in-links, PageRank, user traffic

  3. Relationship Between Popularity and Quality • Popularity: depends on the number of users who “like” a page • relies on both awareness and quality of the page • Popularity is correlated with quality when awareness is large

  4. Problem • Popularity/quality correlation weak for young pages • Even if of high quality, may not (yet) be popular due to lack of user awareness • Plus, process of gaining popularity inhibited by “entrenchment effect”

  5. Entrenchment Effect • Search engines show entrenched (already-popular) pages at the top • Users discover pages via search engines; tend to focus on top results [Figure: ranked result list, labeled “user attention” and “entrenched pages”]

  6. Outline • Problem introduction • Evidence of entrenchment effect • Key idea: Mitigate entrenchment by introducing randomness into ranking • Model of ranking and popularity evolution • Evaluation • Summary

  7. Evidence of Entrenchment • Do search engines suppress controversy? - Susan L. Gerhart • More news, less diversity - New York Times • Googlearchy • Distinction of retrievability and visibility • The politics of search engines - IEEE Computer • The political economy of linking on the Web - ACM Conf. on Hypertext & Hypermedia • Are search engines biased? - Chris Sherman • Bias on the Web - Comm. of the ACM

  8. Quantification of Entrenchment Effect • Impact of Search Engines on Page Popularity • Real Web study by Cho et al. [WWW’04] • Pages downloaded every week from 154 sites • Partitioned into 10 groups based on initial link popularity • After 7 months: • 70% of new links went to the top 20% of pages • PageRank decreased for the bottom 50% of pages

  9. Alternative Approaches to Counteract Entrenchment Effect • Weight links to young pages more • [Baeza-Yates et al., SPIRE ’02] • Proposed an age-based variant of PageRank • Extrapolate quality based on increase in popularity • [Cho et al., SIGMOD ’05] • Proposed an estimate of quality based on the derivative of popularity

  10. Our Approach: Randomized Rank Promotion • Select random (young) pages to promote to good rank positions • Rank position to promote to is chosen at random [Figure: a ranked list of 501 pages before and after promotion; the page at rank 500 is moved up to rank 2]

  11. Our Approach: Randomized Rank Promotion • Consequence: Users visit promoted pages; improves quality estimate • Compared with previous approaches: • Does not rely on temporal measurements (+) • Sub-optimal (-)

  12. Exploration/Exploitation Tradeoff • Exploration/Exploitation tradeoff • exploit known high-quality pages by assigning good rank positions • explore quality of new pages by promoting them in rank • Existing search engines only exploit (to our knowledge)

  13. Possible Objectives for Rank Promotion • Fairness • Give each page an equal chance to become popular • Incentive for search engines to be fair? • Quality (our choice) • Maximize quality of search results seen by users (in aggregate) • Quality of page p: extent to which users “like” p • Q(p) ∈ [0,1]

  14. Model of the Web • Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.) • A community is made up of a set of pages, interested users and related queries

  15. Model of the Web • Users visit pages only by issuing queries to search engine • Mixed surfing & searching considered in the paper • Query answer = ordered list containing all pages in the corresponding community • A single ranked list associated with each community • Since queries within a community are very similar

  16. Model of the Web • Consequence: each community evolves independently of the other communities [Figure: separate ranked lists for the community on Squash and the community on Linux]

  17. Quality-Per-Click Metric (QPC) • V(p,t): number of visits to page p at time t • QPC: average quality of pages viewed by users, amortized over time
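The QPC definition on slide 17 can be read as a visit-weighted average of page quality. The sketch below is one minimal way to compute it over a finite observation window. The function name, the input layout (one dictionary of visit counts per time step), and the sample numbers are assumptions made for illustration and do not come from the slides.

```python
def quality_per_click(visits, quality):
    """Visit-weighted average quality over a finite window.

    visits:  list of dicts, one per time step, mapping page -> V(p, t)
    quality: dict mapping page -> Q(p) in [0, 1]
    """
    weighted = 0.0   # sum over t and p of V(p, t) * Q(p)
    total = 0.0      # sum over t and p of V(p, t)
    for step in visits:
        for page, count in step.items():
            weighted += count * quality[page]
            total += count
    if total == 0:
        raise ValueError("no visits recorded")
    return weighted / total


# Hypothetical two-page community observed for two time steps.
visits = [{"a": 8, "b": 2}, {"a": 6, "b": 4}]
quality = {"a": 0.4, "b": 0.9}
qpc = quality_per_click(visits, quality)
print(qpc)
```

In this toy input the low-quality page "a" receives most of the clicks, so QPC sits closer to 0.4 than to 0.9; shifting visits toward "b" raises it.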

  18. Outline • Problem introduction • Evidence of entrenchment effect • Key idea: Mitigate entrenchment by introducing randomness into ranking • Model of ranking and popularity evolution • Evaluation • Summary

  19. Desiderata for Randomized Rank Promotion • Want ability to: • Control exploration/exploitation tradeoff • “Select” certain pages as candidates for promotion • “Protect” certain pages from demotion

  20. Randomized Rank Promotion Scheme • Promotion pool Wm: random ordering, giving list Lm • Remainder W-Wm: order by popularity, giving list Ld

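  21. Randomized Rank Promotion Scheme • Example: k = 3, r = 0.5 [Figure: the promotion list Lm (2 pages) and the remainder list Ld (4 pages) are merged into a final ranking with positions 1 to 6; arrows are labeled k-1, r, and 1-r]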
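The merge pictured on slides 20 and 21 can be sketched as follows. This is a reading of the figure labels, not a transcription of the authors' code: the first k-1 positions are assumed to be taken from Ld unchanged, and each later position is assumed to be filled from Lm with probability r and from Ld with probability 1-r, falling back to whichever list still has pages. The function name and the sample pages are hypothetical.

```python
import random


def randomized_rank_promotion(pages, popularity, pool, k, r, rng=None):
    """Merge a shuffled promotion list (Lm) into a popularity ranking (Ld)."""
    rng = rng or random.Random()
    pool = set(pool)

    # Lm: promotion pool in random order.
    promoted = [p for p in pages if p in pool]
    rng.shuffle(promoted)

    # Ld: remaining pages in descending order of popularity.
    remainder = sorted((p for p in pages if p not in pool),
                       key=lambda p: popularity[p], reverse=True)

    # Assumption: the top k-1 ranks are protected from randomization.
    result = remainder[:k - 1]
    rest = remainder[k - 1:]

    i = j = 0
    while i < len(promoted) or j < len(rest):
        # Take from Lm with probability r, or unconditionally if Ld is empty.
        take_promoted = j >= len(rest) or (i < len(promoted) and rng.random() < r)
        if take_promoted:
            result.append(promoted[i])
            i += 1
        else:
            result.append(rest[j])
            j += 1
    return result


# Hypothetical community matching the slide's sizes: 4 pages in Ld, 2 in Lm.
pages = ["a", "b", "c", "d", "e", "f"]
popularity = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 0, "f": 0}
ranking = randomized_rank_promotion(pages, popularity, pool={"e", "f"},
                                    k=3, r=0.5, rng=random.Random(0))
print(ranking)
```

Under these assumptions r = 0 reduces to plain popularity ranking with the pool appended at the end, and r = 1 places the whole pool directly after the protected top k-1 pages.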

  22. Parameters • Promotion pool (Wm) • Uniform rank promotion: give an equal chance to each page • Selective rank promotion: exclusively target zero-awareness pages • Start rank (k) • rank to start randomization from • Degree of randomization (r) • controls the tradeoff between exploration and exploitation

  23. Tuning the Parameters • Objective: maximize quality-per-click (QPC) • Entrenchment in a community depends on many factors • Number of pages and users • Page lifetimes • Visits per user • Two ways to tune • set parameters per community • one parameter setting for all communities

  24. Outline • Problem introduction • Evidence of entrenchment effect • Key idea: Mitigate entrenchment by introducing randomness into ranking • Model of ranking and popularity evolution • Evaluation • Summary

  25. Popularity Evolution Cycle • Popularity P(p,t) → Rank R(p,t) → Visit rate V(p,t) → Awareness A(p,t) → Popularity P(p,t)

  26. DETAIL Popularity to Rank Relationship • Rank of a page under randomized rank promotion scheme • determined by a combination of popularity and randomness • Deterministic popularity-based ranking is a special case • i.e., r=0 • Unknown function FPR: rank as a function of the popularity of page p under a given randomized scheme: R(p,t) = FPR(P(p,t))

  27. DETAIL Viewing Likelihood • Depends primarily on rank in list [Joachims KDD’02] • From AltaVista data [Lempel et al. WWW’03]: FRV(r) ∝ r^(-1.5) [Figure: probability of viewing vs. rank, for ranks 0 to 150]
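To make the rank-to-view relationship on slide 27 concrete, the sketch below evaluates a power law with exponent -1.5 over a list of ranks. Normalizing the weights so they sum to 1, which turns them into each rank's share of views, is an added assumption for illustration; the slide only states the proportionality.

```python
def view_probabilities(n_ranks, exponent=1.5):
    """Share of views per rank, assuming views are proportional to rank^(-exponent)."""
    weights = [rank ** -exponent for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = view_probabilities(150)
# Share of views captured by the first ten ranks under this assumption.
top_ten_share = sum(probs[:10])
print(round(top_ten_share, 3))
```

The steep drop-off is what drives entrenchment in the model: pages near the top absorb most views, so pages ranked lower gain awareness slowly.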

  28. DETAIL Visit to Awareness Relationship • Awareness A(p,t): fraction of users who have visited page p at least once by time t

  29. DETAIL Awareness to Popularity Relationship • Quality Q(p): extent to which users like page p (contribute towards its popularity) • Popularity P(p,t): [formula not preserved in transcript]

  30. Popularity Evolution Cycle • Rank R(p,t) = FPR(P(p,t)) • Visit rate V(p,t) = FRV(R(p,t)) • Awareness A(p,t) = FVA(V(p,t)) • Popularity P(p,t) = FAP(A(p,t))
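The four-step cycle on slide 30 can be illustrated with a small deterministic simulation. The slides do not preserve the formulas for FVA and FAP, so everything below is an assumption chosen for illustration only: ranking is plain popularity order (r = 0), visits follow the rank^(-1.5) law, each visit is taken to come from a uniformly random user, and popularity is taken to be awareness times quality. The function name and parameter values are hypothetical.

```python
def simulate_cycle(quality, n_users, visits_per_day, days, exponent=1.5):
    """Iterate popularity -> rank -> visits -> awareness -> popularity."""
    n = len(quality)
    awareness = [0.0] * n
    weights = [rank ** -exponent for rank in range(1, n + 1)]
    total = sum(weights)

    for _ in range(days):
        # Assumed FAP: popularity = awareness * quality.
        popularity = [a * q for a, q in zip(awareness, quality)]
        # FPR with r = 0: rank by popularity (ties keep index order).
        order = sorted(range(n), key=lambda p: popularity[p], reverse=True)
        for rank_index, p in enumerate(order):
            # FRV: expected visits fall off as rank^(-exponent).
            v = visits_per_day * weights[rank_index] / total
            # Assumed FVA: chance that a not-yet-aware user is among v random visits.
            newly = 1.0 - (1.0 - 1.0 / n_users) ** v
            awareness[p] += (1.0 - awareness[p]) * newly

    return awareness, [a * q for a, q in zip(awareness, quality)]


# Hypothetical three-page community.
quality = [0.3, 0.9, 0.6]
awareness, popularity = simulate_cycle(quality, n_users=100,
                                       visits_per_day=50, days=10)
print(awareness, popularity)
```

Replacing the deterministic sort with a randomized ranking would turn this into a testbed for comparing promotion settings, which is the role the model plays in the tuning slides.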

  31. Deriving Popularity Evolution Curve • Next step: derive formula for popularity evolution curve • Derive it using the awareness distribution of pages [Figure: popularity P(p,t) vs. time t]

  32. Deriving Popularity Evolution Curve • Assumptions • Number of pages is constant • Pages are created and retired according to a Poisson process with a rate parameter • Quality distribution of pages is stationary • In the steady state, both the popularity and awareness distributions of the pages are stationary

  33. DETAIL Popularity Evolution Curve and Awareness Distribution • Awareness distribution: fraction of pages of quality q whose awareness is i / (#users) • Popularity evolution curve E(x,q): time duration for which a page of quality q has popularity value x • Next: derive popularity evolution curve using the awareness distribution

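  34. DETAIL Popularity Evolution Curve and Awareness Distribution • Awareness distribution: interpret it as the probability that a page of quality q has awareness ai at any point in time • We know that: [formula not preserved in transcript] • Hence: [formula not preserved in transcript]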

  35. DETAIL Deriving Awareness Distribution • Awareness distribution: fraction of pages of quality q whose awareness is i / (#users) • Doing the steady-state analysis, we get [formula not preserved in transcript] • But remember that we do not know FPR yet: R(p,t) = FPR(P(p,t))

  36. DETAIL Deriving Awareness Distribution • Good news: since rank is a combination of popularity and randomness, we can derive FPR given [symbol not preserved in transcript] (example below) • Start with an initial form of FPR; iterate until convergence

  37. Summary of Where We Stand • Formalized the popularity evolution cycle • Relationship between popularity evolution and awareness distribution • Derived the awareness distribution • Next step: tune parameters • Recall, goal is to obtain scheme that: • achieves high QPC (quality per click) • is robust across a wide range of community types

  38. Tuning the Promotion Scheme • Parameters: k, r and Wm • Objective: maximize QPC • Influential factors: • Number of pages and users • Page lifetimes • Visits per user

  39. Default Community Setting • Number of pages = 10,000 * • Number of users = 1000 • Visits per user = 1000 visits per day • Page lifetimes = 1.5 years [Ntoulas et al., WWW’04] • * How Much Information? SIMS, Berkeley, 2003

  40. Tuning: Wm parameter • k=1 and r=0.2 [Plot legend: no promotion, uniform promotion, selective promotion]

  41. Tuning: k and r • Optimal r lies in (0,1) • Optimal r increases with increasing k • Based on simulation (reason: analysis is only accurate for small values of r)

  42. Tuning: k and r Deciding k & r : • k >= 2 for “feeling lucky” • Minimize amount of “junk” perceived • Maximize QPC

  43. Final Parameter Settings • Promotion pool (Wm ): zero-awareness pages • Start rank (k): 1 or 2 • Randomization (r) : 0.1

  44. Tuning the Promotion Scheme • Parameters: k, r and Wm • Objective: maximize QPC • Influential factors: • Number of pages and users • Page lifetimes • Visits per user

  45. Influence of Number of Pages and Users

  46. Influence of Page Lifetime and Visit rate

  47. Influence of Visit Rate 1000 visits/day per user

  48. Summary • Entrenchment effect hurts search result quality • Solution: Randomized rank promotion • Model of Web evolution and QPC metric • Used to tune & evaluate randomized rank promotion • Initial results • Significantly increases QPC • Robust across wide range of Web communities • More study required

  49. THE END • Paper available at : www.cs.cmu.edu/~spandey
