
Personalized Web Search using Clickthrough History






Presentation Transcript


  1. Personalized Web Search using Clickthrough History U. Rohini 200407019 rohini@research.iiit.ac.in Language Technologies Research Center (LTRC) International Institute of Information Technology (IIIT) Hyderabad, India

  2. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web Search • Personalized Search using user Relevance Feedback: Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Personalized Search using user Relevance Feedback: Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback: Simple Statistical Language modeling based method • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  3. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web Search • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  4. Introduction • Current web search engines • Provide users with documents “relevant” to their information need • Issues • Information overload • Must cater to hundreds of millions of users • Terabytes of data • Poor description of the information need • Short queries – difficult to understand • Word ambiguities • Users only see the top few results • Relevance is subjective – depends on the user • One size fits all???

  5. Motivation • Search is not a solved problem! • Poorly described information need • Java – (Java island / Java programming language) • Jaguar – (cat / car) • Lemur – (animal / Lemur toolkit) • SBH – (State Bank of Hyderabad / Syracuse Behavioral Healthcare) • Given prior information • I am into biology – best guess for Jaguar? • Past queries – { information retrieval, language modeling } – best guess for Lemur?

  6. Background • Prior Information – user feedback

  7. Problem Description • Personalized Search • Customize search results according to each individual user • Personalized Search - Issues • What to use to Personalize? • How to Personalize? • When not to Personalize? • How to know Personalization helped?

  8. Problem Statement • Problem: How to personalize? • Our direction: • Use past search history • Long-term learning • Broken down into two sub-problems • How to model and represent past search contexts • How to use them to improve search results

  9. Solution Outline 1. How to model and represent past search contexts (User Profile Learning) • Past search history from the user over a period of time – query logs • User contexts – triples: {user, query, {relevant documents}} • Apply an appropriate method, learn from the user contexts, build a model – the user profile 2. How to use it to improve search results (Reranking) • Get the initial search results • Take the top few documents, re-score them using the user profile and sort again

  10. Contributions • I Search : A suite of approaches for Personalized Web Search • Proposed Personalized search approaches • Baseline • Basic Retrieval methods • Automatic Evaluation • Analysis of Query Log • Creating Simulated Feedback

  11. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web Search • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  12. Review of Personalized Search • Approaches to personalized search: Query logs • Machine learning • Language modeling • Community based • Others

  13. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web Search • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  14. I Search : A suite of approaches for Personalized Search • Suite of Approaches • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel Model based method • Machine learning based approach • Ranking SVM based method • Personalization without relevance feedback • Simple N-gram based method

  15. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web • Statistical Language modeling based approaches • Simple Language model based method • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  16. Statistical Language Modeling based Approaches: Introduction • Statistical language modeling: the task of estimating a probability distribution that captures the statistical regularities of natural language • Applied to a number of problems – speech, machine translation, IR, summarization

  17. Statistical Language Modeling based Approaches: Background • Diagram: user information need → ideal document → query formulation model → query (example query: “lemur”) • Given a query, which document is most likely to be the ideal document? • In spite of the progress, not much work to capture, model and integrate user context!

  18. Motivation for our approach • Two candidate ideal documents: an encyclopedia page (“Encyclopedia gives a brief description of the physical traits of this animal.”) and the Lemur toolkit page (“The Lemur toolkit for language modeling and information retrieval is documented and made available for download.”) • The user's past search context: the query “information retrieval” and a clicked snippet (“Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which …”) • Given this context, the toolkit page is the better guess for the ideal document

  19. Statistical Language Modeling based Approaches: Overview • From user contexts, capture statistical properties of the texts • Use these to improve search results • Different contexts captured: • Unigrams and bigrams – simple N-gram based approaches • Relationship between query and document words – Noisy Channel based approach

  20. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  21. N-gram based Approaches: Motivation • Past search context: the query “information retrieval” and a clicked snippet (“Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which …”) • Unigrams extracted from the context: information, retrieval, documents, … • Bigrams: information retrieval, searching documents, information documents, … • Candidate ideal documents: the encyclopedia page (“Lemur - Encyclopedia gives a brief description of the physical traits of this animal.”) and the Lemur toolkit page (“The Lemur toolkit for language modeling and information retrieval is documented and made available for download.”)

  22. Sample user profile

  23. Learning user profile • Given past search history Hu = {(q1, rf1), (q2, rf2), …, (qn, rfn)} • rfall = concatenation of all rf • For each unigram wi, estimate its probability from rfall • The resulting unigram distribution is the user profile
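A minimal sketch of how such a unigram profile could be estimated, assuming a plain maximum-likelihood estimate over the concatenated feedback text; the slide's actual estimation formula was an image, so any smoothing or stop-word handling used in the thesis is not reproduced here:

```python
from collections import Counter

def learn_unigram_profile(history):
    """Estimate a unigram user profile from past search history.

    history: list of (query, relevant_feedback_text) pairs, i.e. the
    (qi, rfi) pairs from the slide. The profile is a maximum-likelihood
    unigram distribution over the concatenation of all rf texts.
    """
    rf_all = " ".join(rf for _, rf in history)        # rfall on the slide
    tokens = rf_all.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# toy usage
history = [("lemur", "The Lemur toolkit for language modeling and information retrieval")]
profile = learn_unigram_profile(history)
print(profile["retrieval"])
```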

  24. Reranking • Recall, in general LM for IR • Our Approach
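The formulas on this slide were images. As a hedged illustration only, the sketch below assumes the standard query-likelihood score with the document model linearly interpolated against the unigram user profile; the interpolation weight lam and the floor eps are illustrative choices, not values from the thesis:

```python
import math
from collections import Counter

def rerank(query, documents, profile, lam=0.5, eps=1e-6):
    """Re-score the top retrieved documents with the user profile.

    score(D) = sum over query words w of
               log( lam * P(w|D) + (1 - lam) * P(w|profile) + eps )
    where P(w|D) is the maximum-likelihood estimate inside D.
    """
    q_words = query.lower().split()
    scored = []
    for doc in documents:
        tokens = doc.lower().split()
        counts, length = Counter(tokens), max(len(tokens), 1)
        score = sum(
            math.log(lam * counts[w] / length + (1 - lam) * profile.get(w, 0.0) + eps)
            for w in q_words
        )
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)   # best score first
    return [doc for _, doc in scored]
```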

  25. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  26. Noisy Channel based Approach • Documents and queries are in different information spaces • Queries – short, concise • Documents – more descriptive • Most retrieval or personalized web search methods do not model this • We capture the relationship between query and document words

  27. Noisy Channel based Approach: Motivation • Diagram: the ideal document passes through a query generation process (a noisy channel) to produce the query • Retrieval reverses this query generation process to recover the ideal document

  28. Similar to Statistical Machine Translation • In SMT: given a sentence in one language, translate it into the other • In retrieval: given a query, retrieve documents close to the ideal document • Noisy channel 1: English sentence → French sentence; decoding uses P(e|f) • Noisy channel 2: ideal document → query; decoding uses P(q|w)
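The channel diagrams on this slide were images; the standard noisy-channel decompositions that the analogy relies on can be written as follows (textbook form, not copied from the thesis):

```latex
% Noisy channel 1 (SMT): the English sentence e is distorted into French f;
% decoding recovers the most likely e
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)

% Noisy channel 2 (retrieval): the ideal document d is distorted into query q;
% decoding recovers the most likely d
\hat{d} = \arg\max_{d} P(d \mid q) = \arg\max_{d} P(q \mid d)\, P(d)
```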

  29. Learning user profile • User profile: a translation model – triples (qw, dw, p(qw|dw)) • Use statistical machine translation methods • Learning the user profile = training a translation model • In SMT, a translation model is trained • From parallel texts • Using the EM algorithm

  30. Learning User profile • Extracting parallel texts • From queries and the corresponding snippets of clicked documents • Training a translation model • GIZA++ – an open-source toolkit widely used for training translation models in statistical machine translation research. U. Rohini, Vamshi Ambati, and Vasudeva Varma. Statistical machine translation models for personalized search. Technical report, International Institute of Information Technology, 2007.
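The thesis trains the translation model with GIZA++ over the query/snippet "parallel texts"; the toy sketch below only illustrates the underlying EM idea with an IBM Model 1 style estimator, and is not the GIZA++ tool or its input format:

```python
from collections import defaultdict

def train_translation_model(pairs, iterations=10):
    """Toy IBM Model 1 style EM over (query, snippet) parallel texts.

    pairs: list of (query_text, snippet_text) tuples.
    Returns t[(qw, dw)], an estimate of p(qw | dw).
    """
    corpus = [(q.lower().split(), d.lower().split()) for q, d in pairs]
    q_vocab = {w for q, _ in corpus for w in q}
    t = defaultdict(lambda: 1.0 / max(len(q_vocab), 1))   # uniform start

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(qw, dw)
        total = defaultdict(float)   # expected counts c(dw)
        for q_words, d_words in corpus:
            for qw in q_words:
                z = sum(t[(qw, dw)] for dw in d_words)    # normaliser
                for dw in d_words:
                    frac = t[(qw, dw)] / z
                    count[(qw, dw)] += frac
                    total[dw] += frac
        for (qw, dw), c in count.items():                  # M-step
            t[(qw, dw)] = c / total[dw]
    return dict(t)

# toy usage: one query/snippet pair
model = train_translation_model(
    [("lemur", "the lemur toolkit for language modeling and information retrieval")]
)
```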

  31. Sample user profile

  32. Reranking • Recall, in general, LM for IR • Noisy Channel based approach: for the query “lemur”, the user profile contains P(retrieval|lemur), which favours D4 over D1 • D1: “Lemur - Encyclopedia gives a brief description of the physical traits of this animal.” • D4: “The Lemur toolkit for language modeling and information retrieval is documented and made available for download.”
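A hedged sketch of how such a translation-model profile could be used to re-score documents, assuming the common "sum over document words" expansion score; the exact combination with the baseline LM score used in the thesis is not shown on the slide. A document containing "retrieval" (D4) gets a higher score for the query "lemur" whenever the profile holds a large p(retrieval|lemur), which is the effect the slide illustrates.

```python
import math
from collections import Counter

def translation_score(query, doc, t, eps=1e-9):
    """Noisy-channel style score of a document for a query.

    score = sum over query words qw of
            log( sum over document words dw of p(qw|dw) * P(dw|D) + eps )
    t maps (query_word, document_word) -> p(qw|dw); eps avoids log(0).
    """
    tokens = doc.lower().split()
    counts, length = Counter(tokens), max(len(tokens), 1)
    score = 0.0
    for qw in query.lower().split():
        expanded = sum(t.get((qw, dw), 0.0) * c / length for dw, c in counts.items())
        score += math.log(expanded + eps)
    return score
```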

  33. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  34. Machine Learning based Approaches: Introduction • Most machine learning for IR treats it as a binary classification problem – “relevant” vs. “non-relevant” • Clickthrough data • A click is not an absolute relevance judgement but a relative one • i.e., assuming clicked = relevant and unclicked = irrelevant is wrong • Clicks are biased • Partial relative relevance – clicked documents are more relevant than the unclicked documents
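One common way to turn the "clicked > unclicked" observation into training data for a ranking learner is to emit pairwise preferences; the pairing strategy below (a clicked result preferred over unclicked results ranked above it) is a standard choice and an assumption here, not necessarily the exact strategy of the thesis:

```python
def click_preferences(ranked_urls, clicked_urls):
    """Derive partial relative relevance pairs from clickthrough data.

    Rather than treating clicks as absolute relevance, each clicked result is
    preferred over every unclicked result that was ranked above it.
    Returns a list of (preferred, less_preferred) url pairs.
    """
    clicked = set(clicked_urls)
    prefs = []
    for rank, url in enumerate(ranked_urls):
        if url in clicked:
            prefs.extend(
                (url, above) for above in ranked_urls[:rank] if above not in clicked
            )
    return prefs

# toy usage: the third result was clicked, the first two were skipped
pairs = click_preferences(["d1", "d2", "d3"], ["d3"])   # [("d3", "d1"), ("d3", "d2")]
```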

  35. Background • Ranking SVM • A variation of SVM • Learns from Partial Relevance Data • Learning similar to classification SVM

  36. Ranking SVMs based method • Use Ranking SVMs for learning user profile • Experimented • Different features • Unigram, bigram • Different Feature weights • Boolean, Term Frequency, Normalized Term Frequency

  37. Learning user profile • User profile: a weight vector • Learning: training an SVM model • Steps • Extracting features • Computing feature weights • Training the SVM 1. Uppuluri R., Ambati V. Improving web search results using collaborative filtering. In Proceedings of the 3rd International Workshop on Web Personalization (ITWP), held in conjunction with AAAI 2006, 2006. 2. U. Rohini and Vasudeva Varma. A novel approach for re-ranking of search results using collaborative filtering. In Proceedings of the International Conference on Computing: Theory and Applications (ICCTA’07), pages 491–495, Kolkata, India, March 2007.

  38. Extracting Features • Features: unigrams, bigrams • Given past search history Hu = {(q1, rf1), (q2, rf2), …, (qn, rfn)} • rfall = concatenation of all rf • Remove stop words from rfall • Extract all unigrams (or bigrams) from rfall
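A small sketch of the feature extraction step, assuming whitespace tokenisation and a tiny illustrative stop-word list; the tokenizer and stop-word list actually used in the thesis are not given on the slide:

```python
def extract_features(rf_all, n=1,
                     stop_words=frozenset({"the", "of", "and", "a", "in", "for", "is"})):
    """Extract unigram (n=1) or bigram (n=2) features from the concatenated
    relevant-feedback text rf_all, after removing stop words."""
    tokens = [t for t in rf_all.lower().split() if t not in stop_words]
    if n == 1:
        return sorted(set(tokens))
    return sorted({" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})
```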

  39. Computing Feature Weights • In each relevant document (di), compute the weight of each feature w: • Boolean weighting • 1 or 0 • Term frequency weighting • tfw – number of times w occurs in di • Normalized term frequency weighting • tfw / |di|
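A sketch of the three weighting schemes for unigram features; the slide's rendering of the normalisation denominator is garbled, so plain division by the document length |di| is assumed here:

```python
from collections import Counter

def feature_weights(doc, features, scheme="tf_norm"):
    """Weight each (unigram) feature within one relevant document di.

    scheme: "boolean"  -> 1 if the feature occurs, else 0
            "tf"       -> raw term frequency tfw in di
            "tf_norm"  -> tfw divided by the document length |di|
                          (assumed normalisation; the slide is garbled here)
    """
    tokens = doc.lower().split()
    counts, length = Counter(tokens), max(len(tokens), 1)
    weights = {}
    for f in features:
        tf = counts[f]
        if scheme == "boolean":
            weights[f] = 1 if tf else 0
        elif scheme == "tf":
            weights[f] = tf
        else:
            weights[f] = tf / length
    return weights
```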

  40. Training SVM • Each relevant document is represented as a string of features and their corresponding weights • We used SVMlight for training
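A hedged sketch of writing the training examples out in the "&lt;target&gt; qid:&lt;n&gt; &lt;feature_id&gt;:&lt;value&gt;" line format commonly used with SVMlight's ranking mode; the exact target values, feature numbering and training options used in the thesis are not given on the slide:

```python
def to_svmlight_lines(examples, feature_index):
    """Serialise training examples into SVMlight-style lines.

    examples: list of (qid, target, weights) where weights maps feature
              string -> weight for one document.
    feature_index: maps each feature string to an integer id (SVMlight
              expects numeric feature ids in ascending order per line).
    """
    lines = []
    for qid, target, weights in examples:
        pairs = sorted((feature_index[f], w) for f, w in weights.items() if w)
        feats = " ".join(f"{fid}:{w:g}" for fid, w in pairs)
        lines.append(f"{target} qid:{qid} {feats}")
    return lines

# toy usage
idx = {"information": 1, "retrieval": 2}
print(to_svmlight_lines([(1, 2, {"retrieval": 0.5})], idx))   # ['2 qid:1 2:0.5']
```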

  41. Sample training data / Sample user profile

  42. Reranking • Sim(Q,D) = W · Φ(Q,D) • W – the weight vector / user profile • Φ(Q,D) – a vector of terms and their weights • A measure of similarity between Q and D • Each component corresponds to a term in the query • Its weight is the product of the term's weights in the query and in the document (boolean, term frequency, or normalized term frequency)
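A small sketch of this re-scoring step, assuming W, the query weights and the document weights are all sparse dicts keyed by term; it mirrors the dot product on the slide, with the feature bookkeeping simplified:

```python
def rank_svm_score(w, query_weights, doc_weights):
    """Sim(Q, D) = W . Phi(Q, D).

    Phi(Q, D) has one component per query term, whose value is the product of
    the term's weight in the query and its weight in the document; W is the
    learned weight vector (the user profile). All three are dicts here.
    """
    return sum(
        w.get(term, 0.0) * q_wt * doc_weights.get(term, 0.0)
        for term, q_wt in query_weights.items()
    )
```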

  43. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  44. Personalized Search without Relevance Feedback: Introduction • Can personalization be done without relevance feedback about which documents are relevant? • How informative are the queries posed by users? • Is the information contained in the queries enough to personalize?

  45. Approach • Past queries of the user available • Make effective use of past queries • Simple N-gram based approach

  46. Learning user profile • Given past search history Hu = {q1, q2, …, qn} • qconcat : concatenation of all queries • For each unigram wi, estimate its probability from qconcat • The resulting unigram distribution is the user profile
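The same kind of maximum-likelihood unigram estimate as in the earlier sketch, but over the concatenated queries only; reranking then proceeds as before. Any smoothing used in the thesis is not shown on the slide and is left out here:

```python
from collections import Counter

def profile_from_queries(past_queries):
    """Build a unigram user profile from past queries alone (no clicks).

    past_queries: list of query strings q1 ... qn.
    Returns P(w | profile) estimated over qconcat, the concatenation of all
    past queries.
    """
    tokens = " ".join(past_queries).lower().split()
    counts = Counter(tokens)
    total = max(sum(counts.values()), 1)
    return {w: c / total for w, c in counts.items()}
```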

  47. Sample user profile

  48. Reranking • In general LM for IR • Our Approach U. Rohini, Vamshi Ambati, and Vasudeva Varma. Personalized search without relevance feedback. Technical report, International Institute of Information Technology, 2007

  49. Outline of the talk • Introduction • Current Search Engines – Problems • Motivation • Background • Problem Description • Solution Outline • Contributions • Review of Personalized Search • I Search : A suite of approaches for Personalized Web • Statistical Language modeling based approaches • Simple N-gram based methods • Noisy Channel based method • Machine Learning based approach • Ranking SVM based method • Personalization without Relevance Feedback • Experiments • Query Log Study • Simulated Feedback • Conclusions and Future Directions

  50. Experiments: Introduction, Problems • Aim: to see how the proposed approaches perform by comparing them with a baseline • Problems • No standard evaluation framework • Data • Lack of standardization • Comparison with previous work is difficult • Difficult to repeat previously conducted experiments • Difficult to share results and observations • Effort to collect data is repeated over and over • Identified as a problem needing standardization (Allan et al. 2003) • Lack of standard personalized search baselines • In our work, we used a variation of the Rocchio algorithm • Metrics
