
Click Chain Model in Web Search




Presentation Transcript


  1. Click Chain Model in Web Search Fan Guo, Carnegie Mellon University PPT Revised and Presented by Xin Xin

  2. Outline • Background and motivation • Designing a click model • Algorithms • Experiments

  3. How to utilize users’ feedback to improve search engine results?

  4. Diverse User Feedback • Click-through • Browser action • Dwelling time • Explicit judgment • Other page elements

  5. Web Search Click Log • Automatically generated data that records important information about search activity.

  6. A real-world example

  7. How large is the click log? • Search logs: 10+ TB/day • In existing publications: • [Craswell+08]: 108k sessions • [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions) • [Guo+09a]: 8.8M sessions from 110k unique queries • [Guo+09b]: 8.8M sessions from 110k unique queries • [Chapelle+09]: 58M sessions from 682k unique queries • [Liu+09a]: 0.26PB data from 103M unique queries

  8. Intuition to Utilize Clicks • Adapt ranking to user clicks (chart: # of clicks received)

  9. Position Bias Problem (chart: # of clicks received)

  10. Problem Definition • Given a click log data set, compute the user-perceived relevance of each query-document pair; the solution should be • Aware of position bias and context dependency • Scalable to terabyte-scale data • Incremental, to stay up to date

  11. Outline • Background and motivation • Designing a click model • Algorithms • Experiments

  12. Examination Hypothesis • A document must be examined before a click. • The (conditional) probability of click upon examination depends on document relevance.
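In the notation used on the later slides (Ei for examination at position i, Ci for click, Ri for document relevance), the examination hypothesis can be written as:

    P(Ci = 1 | Ei = 0) = 0
    P(Ci = 1 | Ei = 1) = Ri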

  13. Cascade Hypothesis • The first document is always examined. • First-order Markov property: • Examination at position (i+1) depends only on the examination and click at position i • Examination follows a strict linear order: position i → position (i+1)
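In the same notation, the cascade hypothesis amounts to:

    E1 = 1
    P(E(i+1) = 1 | Ei = 0) = 0
    P(E(i+1) = 1 | Ei = 1, Ci) depends only on Ei and Ci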

  14. User Behavior Description (flowchart: Examine the Document → Click? → whether Yes or No, See Next Doc? → if Yes, examine the next document; if No, Done)

  15. Click Chain Model (graphical model: a chain of examination variables E1 E2 E3 E4 E5 … linked by the cascade hypothesis; each Ei and its relevance variable Ri generate the click variable Ci via the examination hypothesis)
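A minimal sketch of the generative process this chain encodes, assuming the usual CCM parameterization in which the user continues past an unclicked document with probability alpha1 and past a clicked one with a relevance-weighted blend of alpha2 and alpha3; the function name and the example numbers below are illustrative, not taken from the paper:

    import random

    def simulate_ccm_session(relevance, alpha1, alpha2, alpha3, rng):
        """Simulate clicks for one query session under the Click Chain Model."""
        clicks = []
        examined = True  # cascade hypothesis: the first document is always examined
        for r in relevance:
            if not examined:
                clicks.append(0)  # unexamined documents are never clicked
                continue
            clicked = rng.random() < r  # examination hypothesis: P(C=1 | E=1) = R
            clicks.append(int(clicked))
            if clicked:
                # continue with probability alpha2*(1-R) + alpha3*R after a click
                examined = rng.random() < alpha2 * (1 - r) + alpha3 * r
            else:
                # continue with probability alpha1 after a skipped document
                examined = rng.random() < alpha1
        return clicks

    # toy example: five documents with made-up relevances
    print(simulate_ccm_session([0.8, 0.3, 0.6, 0.1, 0.5], 0.6, 0.3, 0.8, random.Random(0)))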

  16. Outline • Background and motivation • Designing a click model • Algorithms • Experiments

  17. A Coin-Toss Example for the Bayesian Framework • Starting from a flat prior, the posterior density function (not normalized) after each toss: x^1(1-x)^0 → x^2(1-x)^0 → x^3(1-x)^0 → x^3(1-x)^1 → x^4(1-x)^1
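A small sketch of the same update in code: starting from a flat prior over the head probability x, each toss multiplies the unnormalized posterior by x (heads) or (1 - x) (tails), producing the x^a(1-x)^b factors above (the toss sequence H H H T H is read off the exponents):

    import numpy as np

    x = np.linspace(0.0, 1.0, 1001)   # grid over the head probability
    density = np.ones_like(x)         # flat prior, not normalized

    for toss in [1, 1, 1, 0, 1]:      # H H H T H, as implied by the exponents above
        density *= x if toss == 1 else (1.0 - x)
        mean = np.trapz(density * x, x) / np.trapz(density, x)
        print(f"posterior mean after observing {'H' if toss else 'T'}: {mean:.3f}")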

  18. Click Data Example • Starting from a flat prior, the density function (not normalized) for the relevance x of one document is updated session by session:
    x^1(1-x)^0 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 →
    x^1(1-x)^1 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 →
    x^2(1-x)^1 (1-0.6x)^0 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 →
    x^3(1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 →
    x^3(1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0
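The relevance x of a single document is updated the same way, except that each session contributes factors that are linear in x, such as (1 - 0.6x) or (1 + 0.3x); the coefficients depend on the positions and clicks in that session. A sketch that summarizes the final (bottom-line) posterior above by numerical integration:

    import numpy as np

    x = np.linspace(0.0, 1.0, 1001)
    # unnormalized posterior from the last line above:
    # x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0
    density = x**3 * (1 - x) * (1 - 0.6 * x) * (1 + 0.3 * x)**2 * (1 - 0.5 * x)
    mean = np.trapz(density * x, x) / np.trapz(density, x)
    print(f"estimated relevance (posterior mean): {mean:.3f}")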

  19. Estimating P(C|Ri)

  20.–24. (figures: the graphical model of slide 15 with an observed click pattern 0 1 0 1 …, stepping through the inference over R1 … R5, E1 … E5, C1 … C5)

  25. Putting them together

  26. Alpha Estimation

  27. Outline • Background and motivation • Designing a click model • Algorithms • Experiments

  28. Data Set • Collected over 2 weeks in July 2008. • Preprocessing: • Discard no-click sessions for fair comparison. • Remove the 178 most frequent queries. • Split into training/test sets according to time stamps.

  29. Data Set • After preprocessing: • 110,630 distinct queries; • 4.8M/4.0M query sessions in the training/test set.

  30. Metric • Efficiency: • Computational time • Effectiveness: • Perplexity • Log-likelihood • Click prediction
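For reference, a sketch of how the two effectiveness metrics are typically computed from predicted click probabilities and observed clicks at a given position (standard definitions from the click-model literature, not code from the paper):

    import math

    def log_likelihood(pred, obs):
        """Average log-likelihood of the observed clicks under the predicted probabilities."""
        return sum(math.log(p) if c else math.log(1 - p)
                   for p, c in zip(pred, obs)) / len(obs)

    def perplexity(pred, obs):
        """Per-position click perplexity: 2^(-average log2-likelihood); 1 is a perfect predictor."""
        avg = sum(math.log2(p) if c else math.log2(1 - p)
                  for p, c in zip(pred, obs)) / len(obs)
        return 2.0 ** (-avg)

    # toy example: three sessions at one result position
    print(perplexity([0.7, 0.2, 0.4], [1, 0, 0]), log_likelihood([0.7, 0.2, 0.4], [1, 0, 0]))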

  31. Competitors • UBM: User Browsing Model (Dupret et al., SIGIR’08) • DCM: Dependent Click Model (WSDM’09)

  32. Results – Time • Environment: Unix server, 2.8 GHz cores, MATLAB R2008b.

  33. Results – Perplexity (chart; lower perplexity is better)

  34. Results – Log-Likelihood (chart; higher log-likelihood is better)

  35. First Clicked Position

  36. Last Clicked Position

  37. The End
