This document explores the complexities of understanding and classifying web queries through query log analysis and automatic classification techniques. It discusses how existing search systems primarily focus on query terms without adequately addressing the tasks and topics users intend to convey. By analyzing user queries over time and classifying them into relevant topical categories, this research aims to improve the effectiveness and efficiency of search results. Key findings reveal significant variations in query trends and similarities across different topical categories, offering insights for future advancements in query routing and disambiguation.
On Understanding and Classifying Web Queries • Prepared for: Telcordia Applied Research • Contact: Steve Beitzel, steve@research.telcordia.com • April 14, 2008
Overview • Introduction: Understanding Queries • Query Log Analysis • Automatic Query Classification • Conclusions
Problem Statement • A query contains more information than just its terms • Search is not just about finding relevant documents – users have: • Target task (information, navigation, transaction) • Target topic (e.g., news, sports, entertainment) • General information need • User queries are simply an attempt to express all of the above in a couple of terms
Problem Statement (2) • Current search systems focus mainly on the terms in the queries • Systems do not focus on extracting target task & topic information about user queries • We propose two techniques for improving understanding of queries • Large-Scale Query Log Analysis • Automatic Query Classification • This information can be used to improve general search effectiveness and efficiency
Query Log Analysis • Introduction to Query Log Analysis • Our Approach • Key Findings • Conclusions
Introduction • Web query logs are a source of information on users’ behaviors on the web • Analysis of logs’ contents may allow search services to better tailor their products to serve users’ needs • Existing query log analysis focuses on high-level, general measurements such as query length and frequency
Our Approach • Examine several aspects of the query stream over time: • Total query volume • Topical trends by category: • Popularity (Topical Coverage of the Query Stream) • Stability (Pearson Correlation of Frequencies)
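As a rough illustration of the stability measure above, here is a minimal Python sketch computing the Pearson correlation between two hourly frequency series for one category; the data values are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length frequency series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical hourly query counts for the Sports category on two days:
sports_day1 = [120, 95, 80, 210, 340, 410, 385, 290]
sports_day2 = [115, 90, 85, 205, 330, 400, 390, 300]
print(pearson(sports_day1, sports_day2))  # near 1.0 -> a stable topical trend
```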
Query Log Characteristics • Analyzed two AOL search service logs: • One full week of queries from December, 2003 • Six full months of queries; Sept. 2004-Feb. 2005 • Some light pre-processing was done: • Case differences, punctuation, & special operators removed; whitespace trimmed • Basic statistics: • Queries average 2.2 terms in length • Only one page of results is viewed 81% of the time • Two pages: 18% • Three or more: 1%
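A minimal sketch of the kind of light pre-processing described above; the exact set of special operators removed by the original system is not specified, so the regular expressions here are assumptions.

```python
import re

def normalize_query(raw: str) -> str:
    """Approximate the slide's pre-processing: lowercase, strip punctuation
    and special operators (assumed set), and trim/collapse whitespace."""
    q = raw.lower()
    q = re.sub(r'[+\-"():]', ' ', q)        # drop assumed special search operators
    q = re.sub(r'[^\w\s]', ' ', q)          # drop remaining punctuation
    return re.sub(r'\s+', ' ', q).strip()   # collapse and trim whitespace

print(normalize_query('  "Harley-Davidson" +dealers, NYC '))
# -> 'harley davidson dealers nyc'
```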
Category Breakdown • Query lists for each category formed by a team of human editors • Query stream classified by exactly matching each query to category lists
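A minimal sketch of exact-match classification against editor-built category lists; the lists below are hypothetical stand-ins for the real editorial data.

```python
# Hypothetical editor-built query lists per category:
category_lists = {
    "Sports": {"nfl scores", "world series"},
    "Entertainment": {"movie times", "oscar winners"},
}

def classify_exact(query: str) -> list:
    """Return every category whose list contains the (normalized) query verbatim."""
    return [cat for cat, queries in category_lists.items() if query in queries]

print(classify_exact("nfl scores"))            # ['Sports']
print(classify_exact("nfl playoff schedule"))  # [] -- anything unlisted is missed
```

The second lookup hints at why exact matching alone yields low classification recall.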
Key Findings • Some topical categories vary substantially more in popularity than others over an average day • Some topics are more popular during particular times of the day • Others have a more constant level of interest • Most individual categories are substantially less divergent over longer periods • Still, some seasonal changes remain (Sports, Holidays)
Pearson Correlations for Selected Categories Over Six Months
Key Findings • How similar a category's query set stays over time differs markedly from category to category • At very large time scales, new trends become apparent: • Seasonal (climatic) • Holidays • Sports-related • Several major events fall within the studied six-month period, causing high divergence in some categories • Long-term trends like these can potentially be very useful for query routing & disambiguation
Summary • Query Stream contains trends that are independent of volume fluctuation • Query Stream exhibits different trends depending on the timescale being examined • Future work may be able to leverage these trends for improvement in areas such as • Caching strategies • Query disambiguation • Query routing & classification
Automatic Query Classification • Introduction: Query Classification • Motivations & Prior Work • Our approach • Results & Analysis • Conclusions • Future Work
Introduction • Goal is to conceive an approach that can identify a query with relevant topical categories • Automatic classifiers help a search service decide when to use specialized databases • Specialized databases may provide tailored, topic-specific results
Problem Statement • Current search systems focus mainly on the terms in the queries • No focus on extracting topic information • Manual query classification is expensive • Does not take advantage of the large supply of unlabeled data available in query logs
Prior Work • Much early text classification was document-based • Query Classification: • Manual (human assessors) • Automatic • Clustering techniques – don't help identify topics • Supervised learning via retrieved documents • Still expensive – retrieved documents must be classified
Automatic Query Classification Motivations • Web queries have very few features • Achieving and sustaining classification recall is difficult • Web query logs provide a rich source of unlabeled data; we must harness these data to aid classification
Our Approach • Combine three methods of classification: • Labeled Data Approaches: • Manual (exact-match lookup using labeled queries) • Supervised Learning (Perceptron trained with labeled queries) • Unlabeled Data Approach: • Unsupervised Rule Learning with unlabeled data from a large query log • Disjunctive Combination of the above
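A minimal sketch of the disjunctive combination, with placeholder functions standing in for the three real classifiers (the stub outputs are invented).

```python
def exact_match(query):   return set()                 # stub: manual lookup
def perceptron(query):    return {"Sports"}            # stub: supervised learner
def sp_rules(query):      return {"Sports", "News"}    # stub: selectional-preference rules

def classify_combined(query):
    """Disjunctive combination: a query gets every category any classifier predicts."""
    return exact_match(query) | perceptron(query) | sp_rules(query)

print(classify_combined("nfl playoff schedule"))  # {'Sports', 'News'}
```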
Approach #1 - Exact-Match to Manual Classifications • A team of editors manually classified approximately 1M popular queries into 18 topical categories • General topics (sports, health, entertainment) • Mostly popular queries • Pros • Expect high precision from exact-match lookup • Cons • Expensive to maintain • Very low classification recall • Not robust to changes in the query stream
Approach #2 - Supervised Learning with a Perceptron • Goal: achieve higher levels of recall than human efforts • Supervised Learning • Used heavily in text classification • Bayes, Perceptron, SVM, etc. • Use manually classified queries to train a classifier • Pros: • Leverages available manual classifications for training • Finds features that are good predictors of a class • Cons: • Entirely dependent on the quality and quantity of manual classifications • Does not leverage unlabeled data
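A minimal one-vs-rest perceptron sketch over bag-of-words query features, in the spirit of this approach; the training examples and the single "Sports" category are hypothetical.

```python
from collections import defaultdict

def features(query):
    return query.lower().split()

def train_perceptron(examples, epochs=10):
    """examples: list of (query, label) with label +1 (in category) or -1."""
    w, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for query, label in examples:
            score = bias + sum(w[t] for t in features(query))
            if label * score <= 0:            # misclassified -> update weights
                for t in features(query):
                    w[t] += label
                bias += label
    return w, bias

def predict(w, bias, query):
    return bias + sum(w[t] for t in features(query)) > 0

examples = [("nfl scores", +1), ("world series tickets", +1),
            ("cheap flights to paris", -1), ("movie times", -1)]
w, b = train_perceptron(examples)
print(predict(w, b, "nfl playoff schedule"))  # True -> classified as Sports
```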
Approach #3 - Unsupervised Rule Learning Using Unlabeled Data • We have query logs with very large numbers of queries • Must take advantage of millions of users showing us how they look for things • Build on manual efforts • Manual efforts tell us some words from each category • Find words associated with each category • Learn how people look for topics, e.g. “what words do users use to find musicians or lawn-mowers”
Unsupervised Rule Learning Using Unlabeled Data (2) • Find good predictors of a class based on how users look for queries related to certain categories • Use those words to predict new members of each category • Apply the notion of selectional preferences to find weighted rules for classifying queries automatically
Selectional Preferences: Step 1 • Obtain a large log of unlabeled web queries • View each query as pairs of lexical units: • <head, tail> • Only applicable to queries of 2+ terms • Queries with n terms form n-1 pairs • Example: “directions to DIMACS” forms two pairs: • <directions, to DIMACS> and <directions to, DIMACS> • Count and record the frequency of each pair
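A small sketch of Step 1: splitting each query into its n-1 <head, tail> pairs and counting pair frequencies over a tiny, hypothetical log.

```python
from collections import Counter

def query_pairs(query):
    """Split an n-term query into its n-1 <head, tail> pairs."""
    terms = query.split()
    return [(" ".join(terms[:i]), " ".join(terms[i:])) for i in range(1, len(terms))]

print(query_pairs("directions to DIMACS"))
# [('directions', 'to DIMACS'), ('directions to', 'DIMACS')]

# Count pair frequencies over a toy query log:
log = ["directions to DIMACS", "directions to newark airport", "harley chicks"]
pair_counts = Counter(p for q in log for p in query_pairs(q))
```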
Selectional Preferences: Step 2 • Obtain a set of manually labeled queries • Check the heads and tails of each pair to see if they appear in the manually labeled set • Convert each <head, tail> pair into: • <head, CATEGORY> (forward preference) • <CATEGORY, tail> (backward preference) • Discard <head, tail> pairs for which there is no category information at all • Sum counts for all contributing pairs and normalize by the number of contributing pairs
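A sketch of Step 2, using tiny hypothetical inputs; the final per-pair normalization mentioned above is omitted for brevity.

```python
from collections import defaultdict

# Tiny hypothetical inputs: pair counts from Step 1 and a manually labeled set.
pair_counts = {("directions", "to DIMACS"): 3, ("directions to", "DIMACS"): 3,
               ("cheap flights to", "newark airport"): 5}
labeled = {"DIMACS": {"PLACES"}, "newark airport": {"TRAVEL", "PLACES"}}

forward = defaultdict(float)   # (head, CATEGORY) counts -> forward preferences
backward = defaultdict(float)  # (CATEGORY, tail) counts -> backward preferences
for (head, tail), count in pair_counts.items():
    for cat in labeled.get(tail, ()):   # labeled tail -> <head, CATEGORY>
        forward[(head, cat)] += count
    for cat in labeled.get(head, ()):   # labeled head -> <CATEGORY, tail>
        backward[(cat, tail)] += count
    # pairs with no category information on either side contribute nothing
```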
Selectional Preferences: Step 3 • Score each preference using Resnik's Selectional Preference Strength formula: • S(x) = \sum_{u} P(u|x) \log \frac{P(u|x)}{P(u)} • Where u represents a category, as found in Step 2 • S(x) is the sum of the weighted scores for every category associated with a given lexical unit x
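A sketch of the scoring step, assuming the category distributions below (which are invented) have already been estimated from the converted pairs.

```python
import math

def preference_strength(p_u_given_x, p_u):
    """Resnik's selectional preference strength S(x) for one lexical unit x,
    given P(u|x) over categories u and the category priors P(u)."""
    return sum(p * math.log(p / p_u[u]) for u, p in p_u_given_x.items() if p > 0)

# Hypothetical distributions for one head, e.g. "harley chicks with":
p_u_given_x = {"PORN": 0.8, "AUTOS": 0.2}
p_u = {"PORN": 0.05, "AUTOS": 0.10}
s_x = preference_strength(p_u_given_x, p_u)
# Each category's weighted score is its contribution to S(x):
weights = {u: p * math.log(p / p_u[u]) for u, p in p_u_given_x.items()}
print(round(s_x, 3), {u: round(w, 3) for u, w in weights.items()})
```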
Selectional Preferences: Step 4 • Use the mined preferences (Step 2) and weighted scores (Step 3) to assign classifications to unseen queries
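A minimal sketch of Step 4, using a couple of the rules shown on the next slide; the matching threshold is an assumption, not a parameter from the original system.

```python
from collections import defaultdict

# Hypothetical mined rules with their category weights:
forward_rules = {"harley all stainless": {"AUTOS": 3.448, "SHOPPING": 0.021}}
backward_rules = {"getaway bargain": {"PLACES": 0.877, "SHOPPING": 0.047, "TRAVEL": 0.862}}

def classify_with_rules(query, threshold=0.5):
    """Sum the weights of all matching forward/backward rules per category."""
    terms = query.split()
    scores = defaultdict(float)
    for i in range(1, len(terms)):
        head, tail = " ".join(terms[:i]), " ".join(terms[i:])
        for cat, w in forward_rules.get(head, {}).items():
            scores[cat] += w
        for cat, w in backward_rules.get(tail, {}).items():
            scores[cat] += w
    return {cat for cat, s in scores.items() if s >= threshold}

print(classify_with_rules("harley all stainless exhaust"))  # {'AUTOS'}
```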
Selectional Preference Rule Examples • Forward Rules: • "harlem club X": ENT->0.722, PLACES->0.378, TRAVEL->1.531 • "harley all stainless X": AUTOS->3.448, SHOPPING->0.021 • "harley chicks with X": PORN->5.681 • Backward Rules: • "X gets hot wont start": AUTOS->2.049, PLACES->0.594 • "X getaway bargain": PLACES->0.877, SHOPPING->0.047, TRAVEL->0.862 • "X getaway bargain hotel and airfare": PLACES->0.594, TRAVEL->2.057
Combined Approach • Each approach exploits different qualities of our query stream • A natural next step is to combine them • How similar are the approaches?
Evaluation Metrics • Classification Precision: • #true positives / (#true positives + #false positives) • Classification Recall: • #true positives / (#true positives + #false negatives) • F-Measure: • F_β = (1 + β²) · Precision · Recall / (β² · Precision + Recall) • Higher values of β put more emphasis on recall
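A small helper reflecting the metrics on this slide; the counts in the example call are invented.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Compute precision, recall, and F_beta from raw classification counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

print(precision_recall_fbeta(tp=80, fp=20, fn=40, beta=2.0))  # recall-weighted F2
```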
Experimental Data Sets • Separate collections for training and testing: • Training: • Nearly 1M web queries manually classified by a team of editors • Grouped non-exclusively into 18 topical categories, and trained each category independently • Query log of several hundred million queries used for forming SP rules • Testing: • 20,000 web queries classified by human assessors • ~30% agreement with classifications in training set • 25% of the testing set was set aside for tuning the perceptron & SP classifiers
KDD Cup 2005 • 2005 KDD Cup task was Query Classification • 800,000 queries and 67 topical categories • 800 queries judged by three assessors • Top performers used information from retrieved documents • Retrieved result snippets for aiding classification decisions • Top terms from snippets and documents used for query expansion • Systems evaluated on precision and F1
KDD Cup Experiments • We mapped our manual classifications onto the KDD Cup category set • Obviously an imperfect mapping • Our categories are general, e.g., "Sports" • KDD Cup categories are specific, e.g., "Sports-Baseball" • Running a retrieval pass is prohibitively expensive • We relied only on our general manual classifications and queries in the log
Conclusions • Our system successfully makes use of large amounts of unlabeled data • The Selectional Preference rules allow us to classify a significantly larger portion of the query stream than manual efforts alone • Excellent potential for further improvements
Future Work • Expand available classification features per query • Mine web query logs for related terms and patterns • More intelligent combination methods • Learned combination functions • Voting algorithms • Utilize external sources of information • Patterns and trends from query log analysis • Topical ontology lookups • Use automatic query classification to improve effectiveness and efficiency in a production search system
Related Bibliography • Journals • S. Beitzel et al., "Temporal Analysis of a Very Large Topically Categorized Query Log", Journal of the American Society for Information Science and Technology (JASIST), Vol. 58, No. 2, 2007. • S. Beitzel et al., "Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs", ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2, April 2007. • Conferences • S. Beitzel et al., "Hourly Analysis of a Very Large Topically Categorized Web Query Log", ACM-SIGIR, July 2004. • S. Beitzel et al., "Automatic Query Classification", ACM-SIGIR, August 2005. • S. Beitzel et al., "Improving Automatic Query Classification via Semi-supervised Learning", IEEE-ICDM, November 2005.
Questions? • Thanks!