
Predicting Short-Term Interests Using Activity-Based Search Context

Predicting Short-Term Interests Using Activity-Based Search Context. CIKM’10. Advisor: Jia Ling, Koh. Speaker: Yu Cheng, Hsieh. Outline: Introduction, Modeling Search Activity, Study, Conclusions.


Presentation Transcript


  1. Predicting Short-Term Interests Using Activity-Based Search Context. CIKM’10. Advisor: Jia Ling, Koh. Speaker: Yu Cheng, Hsieh

  2. Outline • Introduction • Modeling Search Activity • Study • Conclusions

  3. Introduction • Satisfying searchers’ information needs involves a thorough understanding of their interests through: - search queries - search engine result page (SERP) clicks - post-SERP browsing behavior • Construct interest models for the current query that include: - previous queries - previous clicks on SERPs • Evaluate the predictive effectiveness of these models using future actions

  4. Modeling Search Activity • Data - The data set contained browser logs with both searching and browsing episodes. - Log entries include a timestamp for each page view and the URL of the Web page visited. - Only logs from the English-speaking United States locale were used. - Search sessions on the Bing Web search engine were extracted.

  5. Modeling Search Activity • ODP Labeling - Context is represented as a distribution across categories in the ODP topical hierarchy. - This provides a consistent topical representation of queries and page visits from which to build the models. - ODP category labels can also reflect topical differences in the search results for a query or in a user’s interests. - An automatic classification technique assigns an ODP category label to each page. - The 219 categories at the top two levels of the ODP hierarchy were used (called L).

  6. Modeling Search Activity • ODP Labeling - Strategy for labeling a page: 1. Begin with URLs present in the ODP 2. Incrementally prune non-present URLs until a match is found or a miss is declared 3. Check for exact match with a logistic regression classifier
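
The lookup-with-backoff idea in steps 1 and 2 can be sketched as follows. This is only an illustration under assumptions: the slide does not say how URLs are pruned, so the sketch drops one path segment at a time, and `odp_index`, `backoff_candidates`, `lookup_label`, and the example categories are hypothetical names and data, not taken from the paper. The classifier of step 3 is not shown.

```python
from urllib.parse import urlsplit


def backoff_candidates(url):
    """Yield the full URL key, then successively shorter path prefixes, ending at the host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    for end in range(len(segments), -1, -1):
        yield "/".join([host] + segments[:end])


def lookup_label(url, odp_index):
    """Return the label of the longest matching prefix, or None when a miss is declared."""
    for candidate in backoff_candidates(url):
        if candidate in odp_index:
            return odp_index[candidate]
    return None  # miss: a classifier would have to label this page instead


# Hypothetical directory entries (assumed for illustration only).
odp_index = {
    "example.com/sports": "Sports/Basketball",
    "example.org": "News/Media",
}

label = lookup_label("http://example.com/sports/nba/scores.html", odp_index)
```

Here the first two candidates are absent from the index, so the lookup backs off to `example.com/sports` and returns its label.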

  7. Modeling Search Activity • Sources and Source Combinations - ODP labels are automatically assigned to the following sources: 1. Query: the top 10 search results for the query 2. SERPClick: the search results clicked by the user during the search session 3. NavTrail: Web pages that the user visits from a SERP click

  8. Modeling Search Activity • Model Definitions – Query Model (Q) - For each query, the category labels for the top 10 search results were obtained. - Probabilities are assigned to the categories in L by 1. normalized click frequencies for each of the top 10 results, taken from search-engine click log data 2. the distribution across all ODP category labels - ODP categories in L that are not used to label any result are assigned prior probabilities
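
A minimal sketch of how such a query model could be assembled is given below. It assumes each result has one label, weights labels by normalized click frequency, gives unused categories their prior probability, and then renormalizes; the renormalization step, the function name `query_model`, and all numbers are assumptions for illustration and are not stated on the slide.

```python
def query_model(result_labels, click_counts, prior):
    """Build a distribution over the categories in `prior` (standing in for L).

    result_labels: one category label per top-ranked result
    click_counts:  click frequency of each result, in the same order
    prior:         prior probability for every category in L
    """
    total_clicks = sum(click_counts)
    if total_clicks > 0:
        weights = [c / total_clicks for c in click_counts]
    else:
        # No click data: fall back to equal weight per result (assumption).
        weights = [1.0 / len(result_labels)] * len(result_labels)

    observed = {}
    for label, weight in zip(result_labels, weights):
        observed[label] = observed.get(label, 0.0) + weight

    # Categories that label no result receive their prior probability.
    raw = {cat: observed.get(cat, prior[cat]) for cat in prior}
    norm = sum(raw.values())
    return {cat: value / norm for cat, value in raw.items()}


# Hypothetical three-category example (a real L would have 219 entries).
prior = {"Sports/Basketball": 0.5, "News/Media": 0.4, "Arts/Music": 0.1}
model = query_model(
    ["Sports/Basketball", "Sports/Basketball", "News/Media"],
    [6, 2, 2],
    prior,
)
```

Because unused categories keep a nonzero probability, the resulting distribution is strictly positive, which matters when it is later compared with other models using logarithms.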

  9. Modeling Search Activity • Model Definitions – Context Model (X) - The context model is constructed from the actions that precede the current query in the session: 1. Queries 2. Web pages visited through a SERP click 3. Web pages visited on the navigational trail following a SERP click

  10. Modeling Search Activity

  11. Modeling Search Activity • Model Definitions – Intent Model (I)

  12. Modeling Search Activity • Relevance Model or Ground Truth (R) - The relevance model contains actions that occur following the current query in the session

  13. Modeling Search Activity

  14. Study

  15. Study

  16. Study

  17. Study

  18. Study • Learning Optimal Context Weights - Steps: 1. Identify the optimal context weight (w) for each query on a held-out training set 2. Create features for the query and the context that could be useful in predicting w

  19. Study • Learning Optimal Context Weights - To create a training set, the query, context, and relevance models were used to compute the optimal context weight per query by minimizing the regularized cross-entropy for each query independently.

  20. Study • Learning Optimal Context Weights - The objective includes a regularizer that penalizes deviations from w = 0.5
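
The per-query optimization can be illustrated with a small sketch. The transcript does not preserve the formulas, so several things here are assumptions: the intent model is taken to be the linear mixture (1 - w) * Q + w * X, the regularizer is taken to be lam * (w - 0.5)^2, and the minimum is found by a simple grid search. The names `intent_model`, `regularized_cross_entropy`, `optimal_context_weight`, and the value of `lam` are hypothetical.

```python
import math


def intent_model(q, x, w):
    """Mix query model q and context model x with context weight w (assumed linear form)."""
    return {c: (1 - w) * q[c] + w * x[c] for c in q}


def regularized_cross_entropy(r, q, x, w, lam=0.1):
    """Cross-entropy of the intent model against relevance model r, plus a penalty on |w - 0.5|."""
    i = intent_model(q, x, w)
    # Assumes q and x are strictly positive (smoothed), so log is defined.
    cross_entropy = -sum(r[c] * math.log(i[c]) for c in r if r[c] > 0)
    return cross_entropy + lam * (w - 0.5) ** 2


def optimal_context_weight(r, q, x, lam=0.1, steps=100):
    """Grid-search w in [0, 1] for the lowest regularized cross-entropy."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda w: regularized_cross_entropy(r, q, x, w, lam))


# Hypothetical two-category models.
q = {"A": 0.8, "B": 0.2}
x = {"A": 0.2, "B": 0.8}

w_context = optimal_context_weight(x, q, x, lam=0.0)  # future actions match the context
w_query = optimal_context_weight(q, q, x, lam=0.0)    # future actions match the query
```

When the relevance model equals the context model the search pushes w toward 1, and when it equals the query model w goes toward 0; a larger `lam` pulls both answers back toward 0.5.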

  21. Study • Generating Features of Query and Context - Divide features into three classes: 1. Query class: capturing characteristics of the current query and the query model. 2. Context class: capturing aspects of the pre-query interaction behavior as well as features of the context model itself. 3. QueryContext class: capturing aspects of how the query model and context model compare. - These features were generated for each session in the set and used to train a predictive model

  22. Study • Generating Features of Query and Context - Query class

  23. Study • Generating Features of Query and Context - Context class

  24. Study • Generating Features of Query and Context - QueryContext class

  25. Study

  26. Study • Predicting the Optimal Context Weight - 60% of the queries were used for training, 20% for validation, and 20% for testing. - 10-fold cross validation was performed to improve result reliability. - The folds were constructed by splitting by session, so that all queries in a session are used for either training, validation, or testing
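
Splitting by session rather than by query can be sketched as below. This shows a single 60/20/20 split, not the full 10-fold procedure, and the function name `split_by_session`, the `(session_id, query)` input shape, and the seed are assumptions for illustration. Since whole sessions are assigned, the query proportions are only approximately 60/20/20 when sessions differ in length.

```python
import random


def split_by_session(queries, train=0.6, valid=0.2, seed=0):
    """Assign whole sessions to train/valid/test so no session is shared across splits.

    queries: list of (session_id, query) pairs
    """
    session_ids = sorted({sid for sid, _ in queries})
    rng = random.Random(seed)
    rng.shuffle(session_ids)

    n_train = round(len(session_ids) * train)
    n_valid = round(len(session_ids) * valid)

    assignment = {}
    for idx, sid in enumerate(session_ids):
        if idx < n_train:
            assignment[sid] = "train"
        elif idx < n_train + n_valid:
            assignment[sid] = "valid"
        else:
            assignment[sid] = "test"

    splits = {"train": [], "valid": [], "test": []}
    for sid, query in queries:
        splits[assignment[sid]].append((sid, query))
    return splits


# Hypothetical log: 10 sessions with 3 queries each.
queries = [(f"s{i}", f"q{i}-{j}") for i in range(10) for j in range(3)]
splits = split_by_session(queries)
```

Keeping a session's queries together avoids leaking context from a training query into a test query of the same session.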

  27. Study

  28. Study • Predicting the Optimal Context Weight - The best-performing features relate to the information divergence between the query model and the context model
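
To make "information divergence" concrete, the sketch below computes two common divergences between a query model and a context model. The transcript does not say which divergence the features use, so Kullback-Leibler and Jensen-Shannon are shown purely as examples; the function names and the toy distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q[c] > 0 wherever p[c] > 0."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p if p[c] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    m = {c: 0.5 * (p[c] + q[c]) for c in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical query and context models over three categories.
query_dist = {"A": 0.7, "B": 0.2, "C": 0.1}
context_dist = {"A": 0.1, "B": 0.3, "C": 0.6}

feature_kl = kl_divergence(query_dist, context_dist)
feature_js = js_divergence(query_dist, context_dist)
```

A small divergence indicates that the query and its context point at the same topics, while a large one signals a topic shift within the session.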

  29. Study • Predicting the Optimal Context Weight

  30. Study

  31. Study • Varying Context and Relevance Information

  32. Conclusions • A study investigating the effectiveness of activity-based context in predicting users’ search interests. • Explored the value of modeling the current query, its context, and their combination, using different sources. • Intent models developed from many sources perform best overall. • Developed techniques to learn the optimal combinations.
