Statistical Models for Web Search Click Log Analysis

  1. Statistical Models for Web Search Click Log Analysis • Fan Guo, Carnegie Mellon University • Chao Liu, Microsoft Research-Redmond

  2. Prologue • Search results for “CIKM” [Figure: result list annotated with # of clicks received] CIKM'09 Tutorial, Hong Kong, China

  3. Prologue • Adapt ranking to user clicks? [Figure: result list annotated with # of clicks received]

  4. Prologue • Tools needed for non-trivial cases [Figure: result list annotated with # of clicks received]

  5. Motivation – Click Data Are Valuable • One of the most extensive (yet indirect) surveys of user experience. • For researchers: • Help understand human interaction with IR results • Design and calibrate novel models and hypotheses • For practitioners: • Measure, monitor and improve search engine performance • Attract more page views and clicks, boost profit

  6. Tutorial Goals • Introduce problems and applications in web search click modeling. • Present the latest developments in click models for web search. • Provide examples and discuss trade-offs in model design, implementation and evaluation.

  7. Presenters – Fan Guo • Ph.D. Student (exp. 2011), Computer Science Department, Carnegie Mellon University • Advisor: Christos Faloutsos • Dissertation topic: graph mining for large bioinformatics image databases • 2008, M.S., CMU • 2005, B.E., Tsinghua University, Beijing, China

  8. Presenters – Chao Liu • Researcher, Internet Services Research Center (ISRC), MSR-Redmond. • Research focus: large-scale search/browsing log analysis for effective Web information access. • 2007, Ph.D., UIUC • 2005, M.S., UIUC • Advisor: Jiawei Han • Dissertation on statistical debugging and automated failure analysis • 2003, B.S., Peking University, China

  9. Outline • Introduction • Designing click models • Bayesian click models • Selected topics on click models • Conclusion

  10. Outline • Introduction • Web search click logs • Interpret clicks as relevance feedback • Building statistical models for clicks • Applications of click models • Designing click models • Bayesian click models • Selected topics on click models • Conclusion

  11. Diverse User Feedback • Click-through • Browser action • Dwelling time • Explicit judgment • Other page elements

  12. Web Search Click Log • Auto-generated data keeping important information about search activity.

  13. Web Search Click Log • A real-world example

  14. Web Search Click Log • How large is the click log? • Search logs: 10+ TB/day • In existing publications: • [Craswell+08]: 108k sessions • [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions) • [Guo+09a]: 8.8M sessions from 110k unique queries • [Guo+09b]: 8.8M sessions from 110k unique queries • [Chapelle+09]: 58M sessions from 682k unique queries • [Liu+09a]: 0.26PB data from 103M unique queries

  15. Web Search Click Log • How large is one ?

  17. Interpret Clicks: an Example • Clicks are good… • Are these two clicks equally “good”? • Non-clicks may have excuses: • Not relevant • Not examined

  18. Eye-tracking User Study

  19. Click Position-bias • Higher positions receive more user attention (eye fixation) and clicks than lower positions. • This is true even in the extreme setting where the order of positions is reversed. • “Clicks are informative but biased”. [Figure from [Joachims+07]: percentage by position, for the normal and the reversed impression]

  20. Clicks as Relative Judgments • “Clicked > Skipped Above” [Joachims02] • Preference pairs: #5 > #2, #5 > #3, #5 > #4. • Use Rank SVM to optimize the retrieval function. • Limitations: • Confidence of judgments • Little implication for user modeling [Figure: result list with positions 1 to 8]
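
The “Clicked > Skipped Above” rule on this slide can be sketched in a few lines. This is only an illustration: the function name `skip_above_pairs` and the example session (clicks at positions 1 and 5 in a list of 8 results, which is what the three pairs on the slide imply) are assumptions, not taken from [Joachims02].

```python
from typing import List, Tuple


def skip_above_pairs(clicked: List[bool]) -> List[Tuple[int, int]]:
    """Return (preferred, less_preferred) pairs of 1-based positions.

    Every clicked position is preferred over each unclicked position
    ranked above it ("Clicked > Skipped Above").
    """
    pairs = []
    for i, was_clicked in enumerate(clicked, start=1):
        if not was_clicked:
            continue
        for j in range(1, i):
            if not clicked[j - 1]:
                pairs.append((i, j))
    return pairs


# Hypothetical session: 8 results, clicks at positions 1 and 5.
session = [True, False, False, False, True, False, False, False]
print(skip_above_pairs(session))  # [(5, 2), (5, 3), (5, 4)]
```

Each pair could then serve as one training constraint for a pairwise learner such as Rank SVM; the pairs carry no confidence value, which is the first limitation listed on the slide.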

  22. Problem Definition • Given a set of web search click logs: • Predict clicks: output the probability of click vectors given a new order of URLs. 2^10 possible click vectors for a top-10 list!

  23. The Heart of the Solution • Given a set of web search click logs: • Estimate relevance: measure how good a URL is with regard to the information need of the query/user. Example: relevance score = 0.5

  24. Measuring Relevance • The probability of a click if the document appears at the top position. • Relevance score = 0.5 indicates that on average, the document will be clicked once per 2 sessions. • Bayesian click models characterize relevance using a probability distribution. [Figure: density function over the relevance score]

  25. Desired Properties • Effective: aware of the position-bias and addresses it properly • Scalable: linear complexity in both time and space, easy to parallelize • Incremental: flexible for model updates based on new data

  27. Applications of click models • Optimizing the retrieval function • Ranking alteration based on clicks [Liu+09b] [Figure: result list annotated with scores]

  28. Applications of click models • Optimizing the retrieval function • Ranking alteration based on clicks • As a feature to a learning-to-rank system (e.g., RankNet [Burges+05])

  29. Applications of click models • Online advertising • User model for sponsored search auctions

  30. Applications of click models • Online advertising • User model for sponsored search auctions • Click-through rate (CTR) prediction [Zhu+10]

  31. Applications of click models • Search engine evaluation • Pskip [Wang+09]: click-through rate above last clicks; dwelling time features could also be incorporated.

  32. Applications of click models • Search engine evaluation • Pskip [Wang+09]: click-through rate above last clicks • Search relevance score [Guo+09c]: average relevance score weighted by chance of examination

  33. Applications of click models • User behavior analysis • Preliminary work showing different user behavior patterns for navigational and informational queries [Guo+09c]

  34. Outline • Introduction • Designing click models • Basic user hypotheses • Modeling the first click • Extending to multiple clicks • Summary of model design • Bayesian click models • Selected topics on click models • Conclusion

  35. Examination Hypothesis [Richardson+07] • A document must be examined before a click. • The (conditional) probability of click upon examination depends on document relevance.

  36. Examination Hypothesis [Richardson+07] • The click probability can be decomposed into: • Global component: the examination probability, which reflects the position-bias • Local component: depends on the (query, URL) pair only • The building block for every existing model!
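
The formula on this slide did not survive in the transcript. The decomposition implied by the two bullets can be written as below, borrowing the symbols $C_i$, $E_i$ and $r_{u_i}$ from the cascade model slide; this is a reconstruction under that notation, not the slide's original rendering.

```latex
\begin{aligned}
P(C_i = 1) &= P(C_i = 1 \mid E_i = 1)\,P(E_i = 1) + P(C_i = 1 \mid E_i = 0)\,P(E_i = 0) \\
           &= \underbrace{P(E_i = 1)}_{\text{global: position-bias}} \cdot
              \underbrace{P(C_i = 1 \mid E_i = 1)}_{\text{local: } r_{u_i}}
\end{aligned}
```

The second term vanishes because a document must be examined before it can be clicked, so $P(C_i = 1 \mid E_i = 0) = 0$.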

  37. Cascade Hypothesis [Craswell+08] • The first document is always examined. • First-order Markov property: • Examination at position (i+1) depends on examination and click at position i only • Examination follows a strict linear order: Position i → Position (i+1)

  39. Cascade Hypothesis [Craswell+08] • Limitation: examination/click rate monotonically decreases with rank, which is not always true. • Some models do not follow this hypothesis (e.g., UBM) [Figures: web search data in [Guo+09a]; ads click data in [Zhu+10]]

  41. Cascade model • Put together two hypotheses: Cascade model [Craswell+08] = examination hypothesis + cascade hypothesis + modeling a single click • Formal model specification: • $P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$ • $P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$ • $P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$
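
As a rough sketch of what this specification implies, the click probability at each position can be computed by carrying the examination probability down the list. The function name `cascade_click_probs` and the example relevance values are illustrative assumptions. The sketch also assumes the user stops after a click, which is how "modeling a single click" is read here.

```python
from typing import List


def cascade_click_probs(relevance: List[float]) -> List[float]:
    """P(C_i = 1) for each position under the cascade model.

    relevance[i] plays the role of r_{u_i}, the click probability
    given that the URL at that position is examined.
    """
    probs = []
    p_examine = 1.0  # P(E_1 = 1) = 1
    for r in relevance:
        probs.append(p_examine * r)  # examined, then clicked
        p_examine *= 1.0 - r         # the next URL is examined only after a non-click
    return probs


# Hypothetical relevance values for a three-result list.
probs = cascade_click_probs([0.5, 0.2, 0.9])
print(probs)           # approximately [0.5, 0.1, 0.36]
print(1 - sum(probs))  # probability of a session without any click
```

Even though the third URL has the highest relevance, its click probability is lower than that of the first, which is the position-bias the model is meant to capture.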

  42. Cascade model • The user behavior chart: [Flowchart with nodes: Examine the URL; Click? (No/Yes); See next URL? (Yes); Done. Annotation: i is the index for the URL at position i]

  43. Alternatives • First click in Click Chain Model [Guo+09b] as well as Dynamic Bayesian Network model [Chapelle+09] [Flowchart with nodes: Examine the URL; Click? (No/Yes); See next URL? (Yes/No); Done. Annotation: the chance that the user may immediately abandon examination without a click]

  44. Alternatives • First click in User Browsing Model [Dupret+08] [Flowchart with nodes: Examine the URL; Click? (No/Yes); See next URL? (Yes/No); i ← i+1; Done. Annotation: position-dependent parameters]

  46. Dependent Click Model [Guo+09a] • Generalize the cascade model to 1+ clicks: • $P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$ • $P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$ • $P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$ • $P(E_{i+1}=1 \mid E_i=1, C_i=1) = \lambda_i$ • $\lambda$: global parameters characterizing user browsing behavior
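
The four rules above determine the probability of any complete click vector. The sketch below derives it directly from those rules; the function name `dcm_session_prob` and the parameter values are illustrative assumptions, and the code is not taken from [Guo+09a].

```python
from itertools import product
from typing import Sequence


def dcm_session_prob(clicks: Sequence[int], relevance: Sequence[float],
                     lam: Sequence[float]) -> float:
    """Probability of a full click vector under the DCM rules."""
    if not any(clicks):
        # No click: every position is examined and none is clicked.
        p = 1.0
        for r in relevance:
            p *= 1.0 - r
        return p
    last = max(i for i, c in enumerate(clicks) if c)  # 0-based last click
    p = 1.0
    for i in range(last + 1):
        if clicks[i]:
            p *= relevance[i]
            if i < last:
                p *= lam[i]  # the user kept examining after this click
        else:
            p *= 1.0 - relevance[i]
    # After the last click: stop, or continue and click nothing below.
    tail = 1.0
    for r in relevance[last + 1:]:
        tail *= 1.0 - r
    return p * ((1.0 - lam[last]) + lam[last] * tail)


# Hypothetical parameters for a three-result list.
rel = [0.5, 0.2, 0.9]
lam = [0.7, 0.4, 0.1]
total = sum(dcm_session_prob(c, rel, lam) for c in product([0, 1], repeat=3))
print(total)  # the 2^3 click vectors should sum to 1, up to rounding
```

Positions below the last click are ambiguous (examined but not clicked, or never examined), which is why the final factor sums over both cases.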

  47. Dependent Click Model [Guo+09a] • Generalize the cascade model to 1+ clicks:

  48. Dependent Click Model [Guo+09a] • DCM Algorithms: • Input: for each query session, the query term, with a (URL, clicked) tuple for each of the top-10 positions. • Output: relevance for each (query, URL) pair; global parameters for user behavior • Method: approximate* maximum-likelihood estimation. *Footnote: the algorithm maximizes a lower bound of the log-likelihood function.
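
To make the input and output concrete, here is a hedged sketch of a counting-based estimator in the spirit of the approximate maximum-likelihood method. It treats every position at or above the last click as examined, estimates relevance as clicks over such impressions, and estimates each $\lambda_i$ as one minus the fraction of clicks at position i that were the last click of the session. The name `estimate_dcm`, the session layout, and the handling of sessions without clicks are assumptions; the exact estimator in [Guo+09a] may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query, [(url, clicked), ...]) with results in ranked order
Session = Tuple[str, List[Tuple[str, bool]]]


def estimate_dcm(sessions: List[Session]):
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)  # (query, url) -> clicks
    shown: Dict[Tuple[str, str], int] = defaultdict(int)   # (query, url) -> examined impressions
    clicked_at: Dict[int, int] = defaultdict(int)  # position -> sessions with a click there
    last_at: Dict[int, int] = defaultdict(int)     # position -> sessions whose last click is there
    for query, results in sessions:
        clicked_pos = [i for i, (_, c) in enumerate(results) if c]
        # Assumption: with no click, all positions count as examined,
        # because the model always continues after a non-click.
        last = clicked_pos[-1] if clicked_pos else len(results) - 1
        if clicked_pos:
            last_at[last] += 1
        for i, (url, c) in enumerate(results[: last + 1]):
            shown[(query, url)] += 1
            if c:
                clicks[(query, url)] += 1
                clicked_at[i] += 1
    relevance = {k: clicks[k] / shown[k] for k in shown}
    lam = {i: 1.0 - last_at[i] / clicked_at[i] for i in clicked_at}
    return relevance, lam


# Hypothetical log: four sessions for one query over three URLs.
log = [
    ("cikm", [("a", True), ("b", False), ("c", True)]),
    ("cikm", [("a", True), ("b", False), ("c", False)]),
    ("cikm", [("a", False), ("b", True), ("c", False)]),
    ("cikm", [("a", False), ("b", False), ("c", False)]),
]
relevance, lam = estimate_dcm(log)
print(relevance[("cikm", "a")], lam[0])
```

Because the estimator only keeps counters, it takes a single pass over the log and new sessions can be folded in by updating the counts, in line with the scalable and incremental properties listed earlier in the tutorial.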

  49. Detour: last clicked position [Figure: last clicked position]
