Query Chains: Learning to Rank from Implicit Feedback


Presentation Transcript


  1. Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski, Thorsten Joachims Presented By: Steven Carr

  2. The Problem • Web search results are often cluttered with documents the user considers irrelevant • Search engines don’t learn from the results you click on or from the revisions you make to your query

  3. Page Ranking Non-learning methods: • Link-based (Google PageRank) Learning methods: • Explicit user feedback • Ask users how relevant they found each result • Very accurate data, but very time-consuming • Implicit user feedback • Infer relevance from search engine logs • Unlimited data at low cost, but requires interpretation
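As a concrete illustration of the non-learning, link-based approach the slide contrasts against, here is a minimal PageRank sketch via power iteration. This is background material, not code from the paper; the damping factor and adjacency encoding are conventional choices.

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Minimal PageRank via power iteration.

    adj[i][j] = 1 if page i links to page j.
    """
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transitions; dangling pages link uniformly everywhere.
    trans = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * trans.T @ rank
    return rank

# Three pages: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 1, 0]], dtype=float)
print(pagerank(adj))
```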

  4. The Solution • Automatically detect query chains • Use query chains to infer the relevance of results within each query and across all queries in the chain • Use a ranking Support Vector Machine (SVM) to learn a retrieval function from these relevance judgments • The Osmot search engine is built on this model

  5. Query Chains • People often reword their queries to get more useful results • Correcting a spelling mistake • Increasing or decreasing specificity • Posing a new but related query • A query chain is defined as such a sequence of reformulated queries

  6. Support Vector Machines • Learning method used for classification • Separates two classes of data points by finding the hyperplane that maximizes the margin: the distance between the hyperplane and the nearest points of each class • Assigns new data points to one of the two classes according to which side of the hyperplane they fall on (see the sketch below)
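A toy scikit-learn example (my illustration, not anything from the paper) showing the behaviour described above: fit a linear SVM on two separable classes, then classify new points by which side of the learned hyperplane they fall on.

```python
from sklearn import svm

# Two linearly separable classes in 2-D.
X = [[0, 0], [1, 1], [0, 1], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

# A linear SVM finds the maximum-margin separating hyperplane.
clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# New points are classified by which side of the hyperplane they land on.
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # -> [0 1]
print(clf.coef_, clf.intercept_)              # hyperplane parameters
```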

  7. Identifying Query Chains • Manually labeled query chains in five weeks of logs from the Cornell University library search engine • Used this data to train SVMs with various parameters, reaching an accuracy of 94.3% and a precision of 96.5% • A non-learning strategy, assuming all queries from the same IP address within a 30-minute period belong to the same chain, gave an accuracy and precision of 91.6% • The non-learning strategy was sufficiently accurate and cheaper, so the authors used it instead (a sketch follows below)
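A minimal sketch of that non-learning heuristic, assuming a log of (timestamp, ip, query) tuples sorted by time; the field layout is my assumption for illustration, not the paper's actual log schema.

```python
TIMEOUT = 30 * 60  # 30 minutes, in seconds

def segment_chains(log):
    """Group queries into chains: same IP address, gaps under 30 minutes.

    `log` is a list of (timestamp, ip, query) tuples sorted by timestamp;
    this field layout is an assumption, not the paper's log format.
    """
    chains = []
    last_seen = {}  # ip -> (last timestamp, index of its open chain)
    for ts, ip, query in log:
        if ip in last_seen and ts - last_seen[ip][0] <= TIMEOUT:
            idx = last_seen[ip][1]     # continue the existing chain
            chains[idx].append(query)
        else:
            chains.append([query])     # start a new chain
            idx = len(chains) - 1
        last_seen[ip] = (ts, idx)
    return chains
```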

  8. Inferring Relevance • Six strategies were developed for generating feedback from query chains • Click >q Skip Above: a clicked-on document is more relevant than any unclicked documents ranked above it • Click First >q No-Click Second: given the first two results, if only the first was clicked, it is the more relevant • Strategies 3 and 4 are the same as the first two, but judged with respect to the previous query in the chain • Click >q' Skip Earlier Query: a clicked-on document is more relevant than any documents that were skipped in an earlier query • Click >q' Top Two Earlier Query: if nothing was clicked in an earlier query, a document clicked later in the chain is more relevant than that query's top two results
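A sketch of the first strategy, "Click >q Skip Above", as I read it: each clicked document generates one preference pair over every unclicked document ranked above it. The data-structure choices here are mine, not the paper's.

```python
def click_skip_above(results, clicked):
    """Strategy 1, "Click >q Skip Above": a clicked document is
    preferred over every unclicked document ranked above it.

    `results` is the ranked list of doc ids for one query;
    `clicked` is the set of doc ids the user clicked.
    Returns (preferred, less_preferred) pairs.
    """
    prefs = []
    for rank, doc in enumerate(results):
        if doc in clicked:
            for above in results[:rank]:
                if above not in clicked:
                    prefs.append((doc, above))
    return prefs

# The user clicked the 3rd result, skipping the first two.
print(click_skip_above(["d1", "d2", "d3"], {"d3"}))
# -> [('d3', 'd1'), ('d3', 'd2')]
```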

  9. Example

  10. Learning Ranking Functions
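The formulas on this slide did not survive transcription. The ranking SVM the paper builds on (Joachims, 2002) learns a weight vector $w$ from the inferred preference pairs; in its standard form it solves:

```latex
\min_{w,\ \xi_{ij} \ge 0} \quad \frac{1}{2}\|w\|^{2} + C \sum_{i,j} \xi_{ij}
\qquad \text{s.t.} \quad
w \cdot \Phi(q, d_i) \;\ge\; w \cdot \Phi(q, d_j) + 1 - \xi_{ij}
\quad \text{for every inferred preference } d_i \succ_q d_j
```

Here Φ(q, d) maps a query-document pair to a feature vector, and documents are ranked by descending w · Φ(q, d). Treat this as the standard formulation rather than the slide's exact notation.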

  11. Experiment • The Osmot search engine was built as a wrapper implementing logging, analysis, and ranking • Users were presented with a combination of results from two different ranking functions • Which ranking was better was evaluated by which documents users clicked • The evaluation ran for two months, collecting around 2,400 queries
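A minimal sketch of how two rankings might be combined for this kind of paired click evaluation, in the spirit of balanced interleaving: each ranking alternately contributes its highest not-yet-shown document. The paper's exact combination scheme may differ.

```python
def interleave(ranking_a, ranking_b):
    """Alternate between two rankings, each contributing its highest
    not-yet-shown document (simplified balanced interleaving)."""
    combined, seen = [], set()
    iters, t = [iter(ranking_a), iter(ranking_b)], 0
    total = len(set(ranking_a) | set(ranking_b))
    while len(combined) < total:
        for doc in iters[t]:      # next unseen doc from ranking t
            if doc not in seen:
                combined.append(doc)
                seen.add(doc)
                break
        t = 1 - t                 # the other ranking goes next
    return combined

print(interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"]))
# -> ['d1', 'd2', 'd3', 'd4']
```

Clicks on the combined list can then be attributed back to whichever ranking contributed the clicked document, giving the preference statistics reported on the next slide.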

  12. Experiment Results • Users preferred results from the query-chain ranking function 53% of the time • The model trained with query chains outperformed the model trained without them, with 99% statistical confidence

  13. Conclusion • Developed an algorithm to infer the relevance of a document from log entries • Developed a second algorithm that uses these preference judgments to learn an improved ranking function • The learned function can rank documents highly even if they did not appear in the original results for a query

  14. Critique • The learning method runs offline over log files rather than continually updating itself • The paper refers to other papers rather than explaining concepts needed to understand it • No comparison was offered between the effectiveness of their learning algorithm and that of other learning algorithms

  15. Questions?
