This paper explores the mining of term-level patterns from search logs to enhance the effectiveness of query reformulation. By analyzing context-sensitive term substitution and addition, the authors propose probabilistic methods to address common query ineffectiveness issues, such as vocabulary mismatches and lack of discrimination. Evaluated on commercial search engine logs, the proposed methods demonstrate significant improvements in search query performance, showcasing how contextual and translation models can improve user search experiences.
Mining Term Association Patterns from Search Logs for Effective Query Reformulation
Xuanhui Wang and ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign
ACM CIKM 2008, Oct. 26-30, Napa Valley
Ineffective Queries
• “reduce space command latex”
Effective Queries
• “squeeze space command latex”
More Examples
• If you want to wash your vehicle
  • “vehicle wash”, “auto wash”
  • “car wash”, “truck wash”
• If you want to buy a car
  • “auto quotes”
  • “auto sale quotes”?
  • “auto insurance quotes”?
What Makes a Query Ineffective?
• Vocabulary mismatch (addressed by term substitution)
  • “reduce space command latex” vs “squeeze space command latex”
  • “auto wash” vs “car wash”
• Lack of discrimination (addressed by term addition)
  • “auto quotes” vs “auto sale quotes”
• …
How can we help improve ineffective queries?
Our Contribution
• We cast query reformulation as term-level pattern mining from search logs
• We define two basic types of patterns at the term level and propose probabilistic methods
  • Context-sensitive term substitution
    • “auto → car | _wash”, “car → auto | _trade”
  • Context-sensitive term addition
    • “+sale | auto_quotes”
• We evaluate our methods on commercial search engine logs and show their effectiveness
Problem Formulation
[Diagram: offline and online parts of the pipeline]
• Offline part: search logs → query collection; Task 1: contextual models; Task 2: translation models
• Online part: input query q = auto wash; Task 3: pattern mining
• Patterns: “auto → car | _wash”, “auto → truck | _wash”, “+southland | _auto wash”, …
• Reformulated queries: “car wash”, “truck wash”, “southland auto wash”, …
Task 1: Contextual Models
• Syntagmatic relations
  • Capture terms that frequently co-occur with w inside queries
• Sample query collection: “enterprise car rental”, “rental car”, “budget car rental”, “car pricing”, “car pictures”, “car accidents”, …
• G: general context
• Model P_G(* | car): rental: 0.375, enterprise: 0.125, budget: 0.125, pricing: 0.125, …
Task 1: Contextual Models
• L1: 1st left context (computed from the same sample query collection)
• Model P_L1(* | car): rental: 0.333, enterprise: 0.333, budget: 0.333, …
Task 1: Contextual Models
• R1: 1st right context (computed from the same sample query collection)
• Model P_R1(* | car): rental: 0.4, pricing: 0.2, pictures: 0.2, accidents: 0.2, …
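The three contextual models can be reproduced from the six sample queries listed on the slides. The sketch below assumes plain relative-frequency estimates with no smoothing, which is enough to match the slide numbers for “car”; the function name `contextual_models` and the variable names are illustrative, not taken from the talk.

```python
from collections import Counter


def contextual_models(queries, w):
    """Return (general, left1, right1) relative-frequency models for term w.

    Assumption: unsmoothed maximum-likelihood estimates over whitespace tokens.
    """
    general, left1, right1 = Counter(), Counter(), Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            if term != w:
                continue
            # General context: every other term in the same query.
            general.update(terms[:i] + terms[i + 1:])
            # L1 context: the term immediately to the left of w.
            if i > 0:
                left1[terms[i - 1]] += 1
            # R1 context: the term immediately to the right of w.
            if i + 1 < len(terms):
                right1[terms[i + 1]] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()} if total else {}

    return normalize(general), normalize(left1), normalize(right1)


# The six sample queries shown on the slides (the slide list ends with "…").
SAMPLE_QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]

p_g, p_l1, p_r1 = contextual_models(SAMPLE_QUERIES, "car")
print(p_g["rental"], p_l1["rental"], p_r1["rental"])
```

With these six queries, “rental” accounts for 3 of the 8 general-context terms, 1 of the 3 left neighbours, and 2 of the 5 right neighbours of “car”, matching the 0.375, 0.333, and 0.4 shown on the slides.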
Task 2: Translation Models
• Paradigmatic relations (“car” and “auto”)
  • Capture terms that are substitutable with w
  • Similar contexts → high translation probability
• Translation models [equation not preserved in this transcript]; its annotated components:
  • Probability of generating s’s context from w’s contextual model
  • Size of L1 context
  • Size of R1 context
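Since the slide's equation is not preserved, the following is only one possible reading of its annotations, not the authors' formula. It assumes that (a) the per-context score is the exponentiated expected log-probability of s's context terms under w's contextual model, (b) a small additive floor stands in for smoothing, (c) the L1 and R1 scores are mixed with weights proportional to the two context sizes, and (d) scores are normalized over the candidate set. All names and the toy distributions are hypothetical.

```python
import math


def generation_score(context_of_s, model_of_w, floor=1e-6):
    """Exponentiated expected log-probability of s's context under w's model.

    Assumption: `floor` is a crude stand-in for proper smoothing.
    """
    log_p = sum(
        p * math.log(model_of_w.get(term, 0.0) + floor)
        for term, p in context_of_s.items()
    )
    return math.exp(log_p)


def translation_probs(w_models, candidate_models, l1_size, r1_size):
    """Score each candidate s as a substitute for w, normalized over candidates.

    w_models:         {"L1": dist, "R1": dist} for the original term w
    candidate_models: {s: {"L1": dist, "R1": dist}} for each candidate s
    l1_size, r1_size: sizes of w's L1 and R1 contexts (assumed mixing weights)
    """
    total = l1_size + r1_size
    weights = {"L1": l1_size / total, "R1": r1_size / total}
    raw = {
        s: sum(
            weights[c] * generation_score(models[c], w_models[c])
            for c in ("L1", "R1")
        )
        for s, models in candidate_models.items()
    }
    norm = sum(raw.values())
    return {s: score / norm for s, score in raw.items()}


# Toy, made-up distributions purely for illustration.
car_models = {
    "L1": {"rental": 0.5, "used": 0.5},
    "R1": {"wash": 0.5, "rental": 0.5},
}
candidates = {
    "auto": {"L1": {"used": 1.0}, "R1": {"wash": 0.5, "insurance": 0.5}},
    "pizza": {"L1": {"cheese": 1.0}, "R1": {"delivery": 1.0}},
}

probs = translation_probs(car_models, candidates, l1_size=3, r1_size=5)
print(probs)
```

In this toy setup “auto” shares context terms with “car” and therefore receives almost all of the probability mass, while “pizza” shares none.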
Task 3.1: Pattern Mining – Term Substitution
• Original query: q = [w_1 … w_{i-1} w_i w_{i+1} … w_n]
• Substitute w_i by s: q’ = [w_1 … w_{i-1} s w_{i+1} … w_n]
• Which word s should be chosen? [scoring equation not preserved in this transcript]
  • Global factor: translation model
  • Local factor
Estimating Local Factor
• How well does s fit the context w_1 … w_{i-1} _ w_{i+1} … w_n? [equations not preserved in this transcript]
• Simplifying steps:
  • Independence
  • Ignore those terms far away
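The interplay of the global and local factors can be sketched as follows. This is an illustration under stated assumptions rather than the slide's equation: the two factors are multiplied, the local factor uses only the immediate left and right neighbours (the extreme case of ignoring far-away terms), the two sides are treated as independent, and a small floor replaces smoothing. The function names and the toy numbers are hypothetical.

```python
def local_factor(s_models, left_term, right_term, floor=1e-6):
    """How well candidate s fits between left_term and right_term.

    Assumptions: left and right neighbours are independent given s, and only
    the immediate neighbours are considered.
    """
    score = 1.0
    if left_term is not None:
        score *= s_models["L1"].get(left_term, 0.0) + floor
    if right_term is not None:
        score *= s_models["R1"].get(right_term, 0.0) + floor
    return score


def rank_substitutions(query_terms, i, translation, models):
    """Rank candidates for replacing query_terms[i].

    translation: {s: global factor for substituting query_terms[i] by s}
    models:      {s: {"L1": dist, "R1": dist}} contextual models of each s
    """
    left = query_terms[i - 1] if i > 0 else None
    right = query_terms[i + 1] if i + 1 < len(query_terms) else None
    scored = [
        (t * local_factor(models[s], left, right), s)
        for s, t in translation.items()
    ]
    scored.sort(reverse=True)
    return [s for _, s in scored]


# Toy, made-up numbers: equal global factors, so the context decides.
translation = {"car": 0.5, "truck": 0.5}
models = {
    "car": {"L1": {"rental": 1.0}, "R1": {"wash": 0.2, "rental": 0.8}},
    "truck": {"L1": {"pickup": 1.0}, "R1": {"wash": 0.05, "parts": 0.95}},
}

ranking = rank_substitutions(["auto", "wash"], 0, translation, models)
print(ranking)
```

Because both candidates have the same global factor here, the ordering is decided entirely by how often each one is followed by “wash”, which is the context-sensitive behaviour the slides describe.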
Task 3.2: Pattern Mining – Term Addition
• Original query: q = [w_1 … w_{i-1} w_i … w_n]
• Adding r before w_i: q’ = [w_1 … w_{i-1} r w_i … w_n]
• Scoring equation not preserved in this transcript; its annotations:
  • “Uniform”
  • “Similar to the local factor in term substitution patterns”
Evaluation: Data Preparation
• From Microsoft Live Labs
• Timeline: history logs from 5/1/2006 to 5/20/2006, future logs from 5/20/2006 to 5/31/2006
• Diagram labels: history collection; 4.4M queries, of which 1.6M are distinct; 1.3M user sessions; used to construct test cases
Examples of Contextual Models
• Left and right contexts are different
• The general context mixes them together
Examples of Translation Models
• Conceptually similar keywords have high translation probabilities
• They make interactive exploratory search possible
Examples of Term Substitution
• Substitution is context sensitive
• Intuitively, reworded queries are more effective
Effectiveness Comparison of Term Substitution – Experiment Design
[Diagram: a session of queries Q1, Q2, …, Qk with results R21, R22, R23, …, Rk1, Rk2, Rk3, … and clicked documents C1, C2, C3]
• How well can a reformulated query rank C1, C2, and C3 at the top?
• Q1 is reformulated into Q1’, Q2’, Q3’, whose result lists are:
  • dx C3 C1 C2 dx … (P@5 = 0.6)
  • dx C1 dx dx dx … (P@5 = 0.2)
  • dx C2 dx C3 dx … (P@5 = 0.4)
• Best P@5 = 0.6
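The P@5 values on this slide follow directly from counting clicked documents among the top five results. The sketch below reproduces them; the non-clicked documents (“dx” on the slide) are given placeholder names d1, d2, …, and the assignment of the three lists to Q1’, Q2’, Q3’ is assumed from their order on the slide.

```python
def precision_at_k(ranked_docs, clicked, k=5):
    """Fraction of the top-k results that the user clicked in the session."""
    return sum(1 for doc in ranked_docs[:k] if doc in clicked) / k


# Documents clicked later in the session serve as the relevance signal.
clicked = {"C1", "C2", "C3"}

# Top-5 lists from the slide; d1..d4 are placeholders for non-clicked results.
rankings = {
    "Q1'": ["d1", "C3", "C1", "C2", "d2"],
    "Q2'": ["d1", "C1", "d2", "d3", "d4"],
    "Q3'": ["d1", "C2", "d2", "C3", "d3"],
}

scores = {q: precision_at_k(docs, clicked) for q, docs in rankings.items()}
best = max(scores.values())
print(scores, best)
```

Taking the maximum over the recommended reformulations gives the “Best P@5” figure that the slide reports.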
Results
[Chart: our method vs. [Jones’06]; x-axis: #Recommended Queries]
• Our method reformulates queries more effectively
Term Addition Patterns
• Term addition patterns can refine a broad query
Related Work
• Query suggestions [e.g., Jones’06, Sahami et al’06]
  • Discover patterns at the query level
  • Rely on external resources or training data
  • Do not consider effectiveness
• Query modifications in IR [Rocchio’71, Anick’03]
  • Expand queries from returned documents
  • Do not rely on search logs; mostly add terms
• Related work in the NLP community [Lin’98, Rapp’02]
  • Finding synonyms or near-synonyms
  • Syntagmatic and paradigmatic relations
  • Not used for query reformulation
Conclusions and Future Work
• We propose a new way to mine search logs for patterns to address ineffective queries
  • Vocabulary mismatch
  • Lack of discrimination
• We define and mine two basic patterns at the term level
  • Context-sensitive term substitution patterns
  • Context-sensitive term addition patterns
• Experiments show the effectiveness of our methods
• In the future:
  • Use relevance judgments instead of clicks
  • Exploit click information for better query reformulation
Thank You!