
On Understanding and Classifying Web Queries






Presentation Transcript


  1. On Understanding and Classifying Web Queries Prepared for: Telcordia Contact: Steve Beitzel Applied Research steve@research.telcordia.com April 14, 2008

  2. Overview • Introduction: Understanding Queries • Query Log Analysis • Automatic Query Classification • Conclusions

  3. Problem Statement • A query contains more information than just its terms • Search is not just about finding relevant documents – users have: • Target task (information, navigation, transaction) • Target topic (e.g., news, sports, entertainment) • General information need • User queries are simply an attempt to express all of the above in a couple of terms

  4. Popular Web Queries

  5. Problem Statement (2) • Current search systems focus mainly on the terms in the queries • Systems do not focus on extracting target task & topic information from user queries • We propose two techniques for improving understanding of queries • Large-Scale Query Log Analysis • Automatic Query Classification • This information can be used to improve general search effectiveness and efficiency

  6. Query Log Analysis • Introduction to Query Log Analysis • Our Approach • Key Findings • Conclusions

  7. Introduction • Web query logs are a source of information on users’ behaviors on the web • Analysis of logs’ contents may allow search services to better tailor their products to serve users’ needs • Existing query log analysis focuses on high-level, general measurements such as query length and frequency

  8. Our Approach • Examine several aspects of the query stream over time: • Total query volume • Topical trends by category: • Popularity (Topical Coverage of the Query Stream) • Stability (Pearson Correlation of Frequencies)
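A minimal sketch of the stability measure named above: Pearson correlation between a category's query-frequency distributions in two time windows. The helper name and the category data are illustrative, not from the study.

```python
from collections import Counter
import math

def pearson(freqs_a, freqs_b):
    """Pearson correlation between two query-frequency distributions
    (hypothetical helper; the query counts below are illustrative)."""
    queries = set(freqs_a) | set(freqs_b)
    xs = [freqs_a.get(q, 0) for q in queries]
    ys = [freqs_b.get(q, 0) for q in queries]
    n = len(queries)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Example: how stable is a "sports"-category query set between two days?
monday  = Counter({"nfl scores": 120, "yankees": 80, "nba": 40})
tuesday = Counter({"nfl scores": 95,  "yankees": 70, "nba": 60})
print(pearson(monday, tuesday))
```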

  9. Query Log Characteristics • Analyzed two AOL search service logs: • One full week of queries from December, 2003 • Six full months of queries; Sept. 2004-Feb. 2005 • Some light pre-processing was done: • Case differences, punctuation, & special operators removed; whitespace trimmed • Basic statistics: • Queries average 2.2 terms in length • Only one page of results is viewed 81% of the time • Two pages: 18% • Three or more: 1%
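A small sketch of the kind of light pre-processing the slide describes (lowercasing, removing punctuation and special operators, trimming whitespace). The operator list here is an assumption, not the one used in the study.

```python
import re
import string

def normalize(query: str) -> str:
    """Light query normalization: lowercase, drop assumed special operators,
    strip punctuation, and collapse whitespace."""
    q = query.lower()
    q = re.sub(r'[+\-"]|site:|url:|link:', ' ', q)   # illustrative operator set
    q = q.translate(str.maketrans('', '', string.punctuation))
    return ' '.join(q.split())

print(normalize('  "NFL Scores" site:espn.com  '))   # -> 'nfl scores espncom'
```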

  10. Traffic Volume Over a Day

  11. Category Breakdown • Query lists for each category formed by a team of human editors • Query stream classified by exactly matching each query to category lists

  12. Category Popularity Over a Day

  13. Category Popularity Over Six Months

  14. Key Findings • Some topical categories vary substantially more in popularity than others over an average day • Some topics are more popular during particular times of the day • others have a more constant level of interest • Most individual categories are substantially less divergent over longer periods • Still, some seasonal changes appear (Sports, Holidays)

  15. Pearson Correlations for Selected Categories Over A Day

  16. Pearson Correlations for Selected Categories Over Six Months

  17. Key Findings • The actual query sets received within a topical category change over time, and the degree of change differs by category • At very large time scales, new trends become apparent: • Climatic (Seasonal) • Holidays • Sports-related • Several major events fall within the studied six-month period, causing high divergence in some categories • Long-term trends like these can potentially be very useful for query routing & disambiguation

  18. Summary • Query Stream contains trends that are independent of volume fluctuation • Query Stream exhibits different trends depending on the timescale being examined • Future work may be able to leverage these trends for improvement in areas such as • Caching strategies • Query disambiguation • Query routing & classification

  19. Automatic Query Classification • Introduction: Query Classification • Motivations & Prior Work • Our approach • Results & Analysis • Conclusions • Future Work

  20. Introduction • Goal: develop an approach that can associate a query with relevant topical categories • Automatic classifiers help a search service decide when to use specialized databases • Specialized databases may provide tailored, topic-specific results

  21. Problem Statement • Current search systems focus mainly on the terms in the queries • No focus on extracting topic information • Manual query classification is expensive • Does not take advantage of the large supply of unlabeled data available in query logs

  22. Prior Work • Much early text classification was document-based • Query Classification: • Manual (human assessors) • Automatic • Clustering Techniques – doesn’t help identify topics • Supervised learning via retrieved documents • Still expensive – retrieved documents must be classified

  23. Automatic Query Classification Motivations • Web queries have very few features • Achieving and sustaining classification recall is difficult • Web query logs provide a rich source of unlabeled data; we must harness these data to aid classification

  24. Our Approach • Combine three methods of classification: • Labeled Data Approaches: • Manual (exact-match lookup using labeled queries) • Supervised Learning (Perceptron trained with labeled queries) • Unlabeled Data Approach: • Unsupervised Rule Learning with unlabeled data from a large query log • Disjunctive Combination of the above

  25. Approach #1 - Exact-Match to Manual Classifications • A team of editors manually classified approximately 1M popular queries into 18 topical categories • General topics (sports, health, entertainment) • Mostly popular queries • Pros • Expect high precision from exact-match lookup • Cons • Expensive to maintain • Very low classification recall • Not robust to changes in the query stream
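A minimal sketch of the exact-match lookup described above: a query receives a category only if it appears verbatim in an editor-built list. The category lists and queries here are illustrative.

```python
# Hypothetical editor-built lists: category -> set of normalized queries.
CATEGORY_LISTS = {
    "sports":        {"nfl scores", "yankees tickets"},
    "entertainment": {"movie times", "american idol"},
    "health":        {"flu symptoms"},
}

def exact_match_classify(query: str) -> set:
    """Return every category whose editorial list contains the query verbatim.
    High precision, but recall is limited to queries the editors listed."""
    q = query.strip().lower()
    return {cat for cat, queries in CATEGORY_LISTS.items() if q in queries}

print(exact_match_classify("NFL scores"))   # {'sports'}
print(exact_match_classify("rare query"))   # set()  (the recall problem)
```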

  26. Approach #2 - Supervised Learning with a Perceptron • Goal: achieve higher levels of recall than human efforts • Supervised Learning • Used heavily in text classification • Bayes, Perceptron, SVM, etc… • Use manually classified queries to train a classifier • Pros: • Leverages available manual classifications for training • Finds features that are good predictors of a class • Cons: • Entirely dependent on the quality and quantity of manual classifications • Does not leverage unlabeled data
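A schematic of the per-category perceptron idea: one binary classifier per topic, with query terms as features. The feature choice, training data, and class names are illustrative assumptions, not the study's setup.

```python
from collections import defaultdict

class QueryPerceptron:
    """Binary perceptron for one topical category over query-term features
    (a sketch; the actual features and training regime may differ)."""
    def __init__(self, epochs=10):
        self.w = defaultdict(float)
        self.bias = 0.0
        self.epochs = epochs

    def score(self, query):
        return sum(self.w[t] for t in query.split()) + self.bias

    def fit(self, labeled):            # labeled: [(query, +1 or -1), ...]
        for _ in range(self.epochs):
            for query, y in labeled:
                if y * self.score(query) <= 0:      # mistake-driven update
                    for t in query.split():
                        self.w[t] += y
                    self.bias += y

    def predict(self, query):
        return self.score(query) > 0

sports = QueryPerceptron()
sports.fit([("nfl playoff scores", +1), ("cheap flights", -1),
            ("yankees schedule", +1), ("flu symptoms", -1)])
print(sports.predict("nfl schedule"))   # True for this toy data
```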

  27. Approach #3 - Unsupervised Rule Learning Using Unlabeled Data • We have query logs with very large numbers of queries • Must take advantage of millions of users showing us how they look for things • Build on manual efforts • Manual efforts tell us some words from each category • Find words associated with each category • Learn how people look for topics, e.g. “what words do users use to find musicians or lawn-mowers”

  28. Unsupervised Rule Learning Using Unlabeled Data (2) • Find good predictors of a class based on how users phrase queries related to certain categories • Use those words to predict new members of each category • Apply the notion of selectional preferences to find weighted rules for classifying queries automatically

  29. Selectional Preferences: Step 1 • Obtain a large log of unlabeled web queries • View each query as pairs of lexical units: • <head, tail> • Only applicable to queries of 2+ terms • Queries with n terms form n-1 pairs • Example: “directions to DIMACS” forms two pairs: • <directions, to DIMACS> and <directions to, DIMACS> • Count and record the frequency of each pair
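A minimal sketch of Step 1: splitting an n-term query at every internal boundary to form its n-1 <head, tail> pairs, matching the "directions to DIMACS" example above.

```python
def head_tail_pairs(query: str):
    """Split an n-term query at every internal boundary, yielding n-1
    <head, tail> pairs (single-term queries yield nothing)."""
    terms = query.split()
    return [(" ".join(terms[:i]), " ".join(terms[i:]))
            for i in range(1, len(terms))]

print(head_tail_pairs("directions to DIMACS"))
# [('directions', 'to DIMACS'), ('directions to', 'DIMACS')]
```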

  30. Selectional Preferences: Step 2 • Obtain a set of manually labeled queries • Check the heads and tails of each pair to see if they appear in the manually labeled set • Convert each <head, tail> pair into: • <head, CATEGORY> (forward preference) • <CATEGORY, tail> (backward preference) • Discard <head, tail> pairs for which there is no category information at all • Sum counts for all contributing pairs and normalize by the number of contributing pairs
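A sketch of Step 2, assuming a lookup from labeled query string to its categories; the labeled entries and counts are illustrative, and the final normalization by the number of contributing pairs is omitted for brevity.

```python
from collections import Counter

# Hypothetical labeled lookup: normalized query -> set of categories.
LABELED = {"dimacs": {"PLACES"}, "directions to dimacs": {"TRAVEL"}}

def mine_preferences(pair_counts):
    """Replace the labeled side of each <head, tail> pair with its category,
    producing forward (<head, CATEGORY>) and backward (<CATEGORY, tail>)
    preference counts; pairs with no labeled side are discarded."""
    forward, backward = Counter(), Counter()
    for (head, tail), count in pair_counts.items():
        for cat in LABELED.get(tail.lower(), ()):
            forward[(head, cat)] += count          # forward preference
        for cat in LABELED.get(head.lower(), ()):
            backward[(cat, tail)] += count         # backward preference
    return forward, backward

pairs = Counter({("directions", "to DIMACS"): 3, ("directions to", "DIMACS"): 3})
print(mine_preferences(pairs))
```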

  31. Selectional Preferences: Step 2

  32. Selectional Preferences: Step 3 • Score each preference using Resnik’s Selectional Preference Strength formula (a rendering is sketched below) • where u represents a category, as found in Step 2 • S(x) is the sum of the weighted scores for every category associated with a given lexical unit
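The formula image did not survive the transcript; below is a plausible rendering using the standard form of Resnik's selectional preference strength, consistent with the description above (u ranges over categories, x is a lexical unit). The exact weighting used in the talk may differ.

```latex
% Selectional preference strength of a lexical unit x (assumed standard form):
S(x) = \sum_{u} P(u \mid x)\,\log \frac{P(u \mid x)}{P(u)}
% Per-category weighted score (selectional association), summed by S(x):
A(x, u) = \frac{P(u \mid x)\,\log \frac{P(u \mid x)}{P(u)}}{S(x)}
```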

  33. Selectional Preferences: Step 4 • Use the mined preferences from Step 2 and the weighted scores from Step 3 to assign classifications to unseen queries
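A sketch of applying mined rules to a new query, using weights taken from the example rules on the next slide. Summing matching rule weights and thresholding is an illustrative scoring choice, not necessarily the one used in the study.

```python
# Mined rules (weights from the examples on the next slide): unit -> {category: weight}
FORWARD  = {"harley chicks with": {"PORN": 5.681},
            "harley all stainless": {"AUTOS": 3.448, "SHOPPING": 0.021}}
BACKWARD = {"getaway bargain": {"PLACES": 0.877, "SHOPPING": 0.047, "TRAVEL": 0.862}}

def classify(query: str, threshold: float = 0.5):
    """Score every category whose forward rule matches a query prefix or whose
    backward rule matches a suffix, then keep categories above a threshold."""
    scores = {}
    terms = query.lower().split()
    for i in range(1, len(terms)):
        head, tail = " ".join(terms[:i]), " ".join(terms[i:])
        for cat, w in FORWARD.get(head, {}).items():
            scores[cat] = scores.get(cat, 0.0) + w
        for cat, w in BACKWARD.get(tail, {}).items():
            scores[cat] = scores.get(cat, 0.0) + w
    return {cat: s for cat, s in scores.items() if s >= threshold}

print(classify("harley all stainless exhaust"))   # {'AUTOS': 3.448}
```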

  34. Selectional Preference Rule Examples • Forward Rules: • harlem club X — ENT->0.722, PLACES->0.378, TRAVEL->1.531 • harley all stainless X — AUTOS->3.448, SHOPPING->0.021 • harley chicks with X — PORN->5.681 • Backward Rules: • X gets hot wont start — AUTOS->2.049, PLACES->0.594 • X getaway bargain — PLACES->0.877, SHOPPING->0.047, TRAVEL->0.862 • X getaway bargain hotel and airfare — PLACES->0.594, TRAVEL->2.057

  35. Combined Approach • Each approach exploits different qualities of our query stream • A natural next step is to combine them • How similar are the approaches?
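A sketch of the disjunctive combination from slide 24: a query receives a category if any of the three classifiers assigns it. The classifier interfaces (each returning a set of category names) and the stand-in lambdas are assumptions for illustration.

```python
def combined_classify(query, classifiers):
    """Disjunctive combination: union of the category sets returned by the
    exact-match, perceptron, and selectional-preference classifiers."""
    labels = set()
    for clf in classifiers:
        labels |= clf(query)
    return labels

# Illustrative stand-ins for the three approaches:
classifiers = [
    lambda q: {"SPORTS"} if "nfl" in q.lower() else set(),    # exact match
    lambda q: set(),                                          # perceptron
    lambda q: {"SPORTS", "NEWS"} if "scores" in q else set(), # SP rules
]
print(combined_classify("nfl scores", classifiers))   # {'SPORTS', 'NEWS'}
```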

  36. Evaluation Metrics • Classification Precision: • #true positives / (#true positives + #false positives) • Classification Recall: • #true positives / (#true positives + #false negatives) • F-Measure: weighted harmonic mean of precision and recall; higher values of beta put more emphasis on recall (formula below)
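The F-measure formula itself did not survive the transcript; the standard weighted form implied by the bullet is:

```latex
F_{\beta} = \frac{(1 + \beta^{2}) \cdot P \cdot R}{\beta^{2} \cdot P + R}
% P = precision, R = recall; beta > 1 emphasizes recall, beta < 1 precision.
```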

  37. Experimental Data Sets • Separate collections for training and testing: • Training: • Nearly 1M web queries manually classified by a team of editors • Grouped non-exclusively into 18 topical categories, and trained each category independently • Query log of several hundred million queries used for forming SP rules • Testing: • 20,000 web queries classified by human assessors • ~30% agreement with classifications in training set • 25% of the testing set was set aside for tuning the perceptron & SP classifiers

  38. Effectiveness of each approach

  39. Performance of Classifiers at varying levels of Beta

  40. KDD Cup 2005 • 2005 KDD Cup task was Query Classification • 800,000 queries and 67 topical categories • 800 queries judged by three assessors • Top performers used information from retrieved documents • Retrieved result snippets for aiding classification decisions • Top terms from snippets and documents used for query expansion • Systems evaluated on precision and F1

  41. KDD Cup Experiments • We mapped our manual classifications onto the KDD Cup category set • Obviously an imperfect mapping • Our categories are general, e.g., “Sports” • KDD Cup categories are specific, e.g., “Sports-Baseball” • Running a retrieval pass is prohibitively expensive • We relied only on our general manual classifications and queries in the log

  42. KDD Cup Results

  43. Conclusions • Our system successfully makes use of large amounts of unlabeled data • The Selectional Preference rules allow us to classify a significantly larger portion of the query stream than manual efforts alone • Excellent potential for further improvements

  44. Future Work • Expand available classification features per query • Mine web query logs for related terms and patterns • More intelligent combination methods • Learned combination functions • Voting algorithms • Utilize external sources of information • Patterns and trends from query log analysis • Topical ontology lookups • Use automatic query classification to improve effectiveness and efficiency in a production search system

  45. Related Bibliography • Journals • S. Beitzel, et al., “Temporal Analysis of a Very Large Topically Categorized Query Log”, Journal of the American Society for Information Science and Technology (JASIST), Vol. 58, No. 2, 2007. • S. Beitzel, et al., “Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs”, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2, April 2007. • Conferences • S. Beitzel, et al., “Hourly Analysis of a Very Large Topically Categorized Web Query Log”, ACM-SIGIR, July 2004. • S. Beitzel, et al., “Automatic Query Classification”, ACM-SIGIR, August 2005. • S. Beitzel, et al., “Improving Automatic Query Classification via Semi-supervised Learning”, IEEE-ICDM, November 2005.

  46. Questions? • Thanks!
