
Topics and Transitions: Investigation of User Search Behavior



Presentation Transcript


  1. Topics and Transitions: Investigation of User Search Behavior Xuehua Shen, Susan Dumais, Eric Horvitz

  2. What’s next for the user?

  3. Outline • Problem • Automatic Topic Tagging • Predictive models • Evaluation • Experiments and analysis • Conclusion and future directions

  4. Problem • Opportunity: Personalizing search • Focus: What topics do users explore? • How similar are users to each other, to special groups, and to the population at large? • Data, data, data… • MSN search engine log • Query & clickthrough • 87,449,277 rows, 36,895,634 URLs (5% sample from MSN logs, 05/29-06/29) • Create predictive models of the topics of queries and URLs visited

  5. Automatic Topic Tagging • ODP (Open Directory Project) manually categorizes URLs • MSN extended the methods with heuristics to cover more URLs • We developed a tool to automatically tag every URL in the log • 15 top-level categories: Arts, Business, Computers, Games, Health, Home, Kids_and_Teens, News, Recreation, Reference, Science, Shopping, Society, Sports, Adult

  6. Multiple Tagging • Avg: 1.38 tags per URL • [Figure: a snippet]

  7. Predictive Model: User Perspective • Individual model: use only an individual's clickthrough to build a model for each user's predictions • Group model: group similar users to build a model for each group's predictions (e.g., group users with the same ‘max topic’ clickthrough) • Population model: use clickthrough data for all users to build a model for all users' predictions
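To make the three perspectives concrete, here is a minimal sketch of how the training data for each kind of model could be pooled. The toy `clicks` data and the helper names `max_topic` and `training_pools` are assumptions for illustration and do not come from the slides; grouping by most frequent topic follows the ‘max topic’ example on the slide.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: user id -> topic tags of that user's clicked URLs.
clicks = {
    "u1": ["Sports", "Sports", "News"],
    "u2": ["Sports", "Health", "Sports"],
    "u3": ["Arts", "Arts", "Shopping"],
}

def max_topic(topics):
    """Return the most frequently clicked topic for one user."""
    return Counter(topics).most_common(1)[0][0]

def training_pools(clicks):
    """Return the click data that each kind of model would be trained on."""
    # Individual: one pool per user.
    individual = {user: list(topics) for user, topics in clicks.items()}
    # Group: pool users who share the same max topic.
    group = defaultdict(list)
    for user, topics in clicks.items():
        group[max_topic(topics)].extend(topics)
    # Population: a single pool over all users.
    population = [t for topics in clicks.values() for t in topics]
    return individual, dict(group), population

individual, group, population = training_pools(clicks)
```

In this toy data, `u1` and `u2` end up in the same "Sports" group, while `u3` forms an "Arts" group on its own.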

  8. Predictive Model: Considering Time Dependence • Marginal model • Base probability for topics • Markov model • Probability of moving from one topic to another • Time-interval-specific Markov model • User search behavior has two different patterns
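The difference between the marginal and Markov models can be sketched with maximum-likelihood counts over topic sequences. This is an illustrative sketch, not the authors' implementation: the function names and the toy `sessions` data are assumptions, and a first-order Markov chain is assumed.

```python
from collections import Counter, defaultdict

def marginal_model(sequences):
    """Base probability of each topic, ignoring order."""
    counts = Counter(topic for seq in sequences for topic in seq)
    total = sum(counts.values())
    return {topic: c / total for topic, c in counts.items()}

def markov_model(sequences):
    """First-order transition probabilities P(next topic | current topic)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

# Hypothetical topic sequences of clicked URLs.
sessions = [["News", "Sports", "Sports"], ["News", "News", "Sports"]]
marginal = marginal_model(sessions)
markov = markov_model(sessions)
```

Here the marginal model gives News and Sports equal weight, whereas the Markov model captures that a News click is more often followed by Sports than by News.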

  9. Evaluation Metrics • KL (Kullback-Leibler) Divergence • Likelihood • Top K: match the real top K topics against the predicted top K’ topics
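Two of these metrics can be sketched as follows. The slides do not give exact formulas, so the definitions here are assumptions: KL divergence is computed in nats with a small floor `eps` to avoid taking the log of zero, and the top-K score is taken to be the fraction of the real top K topics that appear among the predicted top K’ topics.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps is an assumed floor for topics missing from q."""
    return sum(
        pv * math.log(pv / max(q.get(topic, 0.0), eps))
        for topic, pv in p.items()
        if pv > 0
    )

def top_k(dist, k):
    """The k most probable topics of a distribution."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:k]]

def top_k_match(real, predicted, k=1, k_pred=1):
    """Assumed score: share of the real top k found in the predicted top k_pred."""
    real_top = set(top_k(real, k))
    pred_top = set(top_k(predicted, k_pred))
    return len(real_top & pred_top) / len(real_top)

# Hypothetical observed and predicted topic distributions.
real = {"News": 0.6, "Sports": 0.3, "Arts": 0.1}
pred = {"Sports": 0.5, "News": 0.4, "Arts": 0.1}
```

With these toy distributions the prediction misses when K = K’ = 1 but matches once K’ is widened to 2, which is the kind of effect the "Cur" and "Pre" settings in the backup slides appear to vary.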

  10. Experiment • 5 weeks of data (05/22-06/29) • Build models based on different subsets of the total data • Predict on a “holdout set”: the other weeks' data

  11. Results from Basic Experiment • Marginal model: the individual model has the best performance • Markov model: consistently better than the corresponding marginal model • Markov model: the individual model does not have the best performance. Why?

  12. Results: Training Data Size • Greater amounts of training data → Markov (same for Marginal) models improve • But: the individual Markov model still can't beat the population Markov model

  13. Results: Smoothing • Using the population Markov model to smooth helps the individual Markov model • But: the smoothed individual Markov model still can't outperform the population model
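Smoothing an individual model with the population model can be sketched as a linear interpolation in the Jelinek-Mercer style, the smoothing method named in the backup slides. The function name, the example rows, and the mixing weight `lam` are assumptions for illustration; the slides do not state which weight was used.

```python
def jelinek_mercer(individual_row, population_row, lam=0.5):
    """Interpolate two distributions over next topics.

    lam is the (assumed) weight given to the population model.
    """
    topics = set(individual_row) | set(population_row)
    return {
        t: (1 - lam) * individual_row.get(t, 0.0) + lam * population_row.get(t, 0.0)
        for t in topics
    }

# Hypothetical transition rows out of the same current topic.
ind = {"Sports": 1.0}
pop = {"Sports": 0.5, "News": 0.3, "Health": 0.2}
smoothed = jelinek_mercer(ind, pop, lam=0.4)
```

The sparse individual row assigns zero probability to News and Health; after interpolation those transitions receive some mass from the population row while the result still sums to one.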

  14. Results: Time Decay Effect • As the training data ages, prediction accuracy decreases

  15. Results: Time-Interval-Specific Markov Model • Markov models capture short-time access patterns better
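One possible reading of a time-interval-specific Markov model is to keep separate transition tables depending on how much time passed between two consecutive clicks. The sketch below follows that reading only; the two buckets, the 300-second threshold, the function name, and the toy `events` are all assumptions and are not taken from the slides.

```python
from collections import Counter, defaultdict

def interval_markov(events, threshold_s=300):
    """Build separate transition tables for short and long time gaps.

    events: (timestamp_seconds, topic) pairs for one user, sorted by time.
    threshold_s: assumed cut-off between a "short" and a "long" gap.
    """
    counts = {"short": defaultdict(Counter), "long": defaultdict(Counter)}
    for (t0, cur), (t1, nxt) in zip(events, events[1:]):
        bucket = "short" if t1 - t0 <= threshold_s else "long"
        counts[bucket][cur][nxt] += 1
    return {
        bucket: {
            cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for cur, row in rows.items()
        }
        for bucket, rows in counts.items()
    }

# Hypothetical click stream for one user.
events = [(0, "News"), (60, "News"), (120, "Sports"), (4000, "Health"), (4030, "Health")]
models = interval_markov(events)
```

In this toy stream the Sports to Health jump after a long pause lands in the "long" table, while the quick successive clicks populate the "short" table.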

  16. Conclusion • Use ODP categorization to tag URLs visited by users • Construct marginal and Markov models using tagged URLs • Explore performance of marginal and Markov models to predict transitions among topics • Set of results relating topic transition behaviors of population, groups, and specific users

  17. Directions • Study of reliability, failure modes of automated tagging process (use of expert human taggers) • Combination of query and clickthrough topics • Formulating and studying different groups of people • Topic-centric evaluation • Application of results in personalization of search experience • Interpretation of topics associated with queries • Ranking of results • Designs for client UI

  18. Acknowledgement • Susan and Eric for great mentoring and discussion • Johnson and Muru for development support • Haoyong for MSN Search Engine development environment

  19. Backup Slides

  20. Results from Basic Experiment

      W0/W1, Cur=1, Pre=1
      Model     Individ.  Group  Pop    #URLs   #Users  G>P   G<P   G>I   G<I   I>P   I<P
      Marginal  0.274     0.274  0.176  218950  5608    2592  1240  0     0     2592  1240
      Markov    0.294     0.298  0.421  207929  5508    1488  3305  1957  1423  1276  3401

      W0/W1, Cur=1, Pre=2
      Model     Individ.  Group  Pop    #URLs   #Users  G>P   G<P   G>I   G<I   I>P   I<P
      Marginal  0.411     0.403  0.314  218950  5608    2539  1276  1745  1764  2816  1676
      Markov    0.453     0.553  0.537  207929  5508    2568  1791  3596  701   1462  3194

      W0/W1, Cur=1, Pre=3
      Model     Individ.  Group  Pop    #URLs   #Users  G>P   G<P   G>I   G<I   I>P   I<P
      Marginal  0.507     0.501  0.418  218950  5608    2504  1246  2106  1246  2883  1824
      Markov    0.516     0.640  0.623  207929  5508    2554  1783  3948  542   1216  3430

      W0/W1, Likelihood
      Model     Individ.  Group  Pop    #URLs   #Users  G>P   G<P   G>I   G<I   I>P   I<P
      Marginal  0.204     0.162  0.097  218950  5608    3763  1549  1669  3643  4268  1044
      Markov    0.229     0.217  0.208  207929  5508    2540  2635  2448  2688  2707  2468

      • Marginal model: the individual model has the best performance
      • Markov model: consistently better than the corresponding marginal model
      • Markov model: the population model has the best performance. Why?

  21. Results: Training Data Size

      W0 / W4, Cur=1, Pre=1
      Model     Individual  Group  Pop    #URL   #User  G>P   G<P   G>I   G<I  I>P   I<P
      Marginal  0.272       0.272  0.179  86754  5608   1284  719   0     0    1284  719
      Markov    0.288       0.293  0.415  82938  5508   718   1808  910   660  585   1830

      W0+W1 / W4, Cur=1, Pre=1
      Model     Individual  Group  Pop    #URL   #User  G>P   G<P   G>I   G<I  I>P   I<P
      Marginal  0.296       0.296  0.182  91105  6153   1448  671   0     0    1448  671
      Markov    0.340       0.356  0.416  87749  6153   881   1586  1208  818  759   1684

      W0+W1+W2 / W4, Cur=1, Pre=1
      Model     Individual  Group  Pop    #URL   #User  G>P   G<P   G>I   G<I  I>P   I<P
      Marginal  0.312       0.312  0.182  91105  6153   1492  613   0     0    1492  613
      Markov    0.374       0.395  0.419  87749  6153   891   1165  1274  842  814   1458

      W0+W1+W2+W3 / W4, Cur=1, Pre=1
      Model     Individual  Group  Pop    #URL   #User  G>P   G<P   G>I   G<I  I>P   I<P
      Marginal  0.323       0.323  0.182  91105  6153   1560  578   0     0    1560  578
      Markov    0.389       0.407  0.419  87749  6153   906   974   1247  915  894   1337

      • Greater amounts of training data → Marginal and Markov models improve
      • But: the individual Markov model still can't beat the population Markov model

  22. Results: Smoothing • Individual Marginal model with Jelinek-Mercer smoothing • W0 / W1, Cur=1, Pre=1

  23. Results: Smoothing (2) • Population Markov model with Jelinek-Mercer smoothing • W0 / W1, Cur=1, Pre=1

  24. Results: Time-Interval-Specific Differentiated Markov Model • W0+W1 / W2+W3+W4, Cur=1, Pre=1

  25. Results: Time Decay Effect • As the training data ages, prediction accuracy decreases

  26. Results: Smoothing • Using the marginal distribution to smooth the Markov model does not help
