Ranking using Multiple Document Types in Desktop Search
Jinyoung Kim, W. Bruce Croft
SIGIR '10
Speaker: Hsin-Lan Wang
Date: 2011/05/02
Outline
• Introduction
• Retrieval Model
• Type-specific Retrieval
• Type Prediction
• Result Merging
• Experiment
• Conclusion
Introduction
• People have many types of documents on their desktop, with a different set of metadata fields for each type.
  • email → sender, receiver
  • office document → filename, author
• A desktop search system should therefore be able to predict which type of document a user is looking for given a query.
Introduction
• Main goal: to show how the retrieval effectiveness of a desktop search system can be enhanced by improving type prediction performance.
Retrieval Model
• query Q = (q_1, …, q_m)
• each collection C contains documents of n field types (F_1, …, F_n)
• each document d may include fields (f_1, …, f_n)
Type-specific Retrieval
• Goal: to rank documents from each sub-collection.
• Probabilistic Retrieval Model for Semi-structured Data (PRM-S)
  • Field-level query likelihood, smoothed with the collection-level field model:
    P_QL(q_i | f_j) = (1 − λ) P(q_i | f_j) + λ P(q_i | F_j)
    where f_j is a field of document d and F_j is the same field aggregated over the whole sub-collection.
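A minimal sketch of the smoothed field query likelihood above. The term counts, field names, and λ value are illustrative only; the full PRM-S model additionally combines these field-level estimates across fields via mapping probabilities, which is omitted here.

```python
# Sketch of the smoothed field query likelihood used in type-specific retrieval.
# Counts, field names, and the lambda value below are illustrative only.

def field_query_likelihood(term, doc_field_counts, coll_field_counts, lam=0.5):
    """P_QL(q_i|f_j) = (1 - lam) * P(q_i|f_j) + lam * P(q_i|F_j)."""
    doc_len = sum(doc_field_counts.values()) or 1
    coll_len = sum(coll_field_counts.values()) or 1
    p_doc = doc_field_counts.get(term, 0) / doc_len      # P(q_i|f_j), document field
    p_coll = coll_field_counts.get(term, 0) / coll_len   # P(q_i|F_j), collection field
    return (1 - lam) * p_doc + lam * p_coll

# Example: score the query term "croft" against the "author" field of one document.
doc_author = {"kim": 1, "croft": 1}
coll_author = {"kim": 40, "croft": 25, "smith": 300}
print(field_query_likelihood("croft", doc_author, coll_author))
```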
Type Prediction
• Goal: to score each collection given a user query.
• Methods for Type Prediction
  • Query-likelihood of Collection
  • Query-likelihood of Query Log
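A sketch of collection query likelihood for type prediction: each sub-collection is treated as one bag of words and scored by how likely it is to generate the query; applying the same scorer to a per-type query log gives the query-log variant. The smoothing scheme and counts are assumptions for illustration.

```python
import math

def collection_query_likelihood(query_terms, term_counts, mu=1000, background=None):
    """Log P(Q|C) with a simple Dirichlet-style smoothing against a background model."""
    background = background or {}
    total = sum(term_counts.values())
    bg_total = sum(background.values()) or 1
    score = 0.0
    for t in query_terms:
        p_bg = (background.get(t, 0) + 1) / (bg_total + 1)   # crude background estimate
        p = (term_counts.get(t, 0) + mu * p_bg) / (total + mu)
        score += math.log(p)
    return score

# Predict the type whose collection scores highest for the query.
query = ["meeting", "agenda"]
collections = {"email": {"meeting": 50, "agenda": 10},
               "pdf": {"meeting": 5, "figure": 80}}
scores = {c: collection_query_likelihood(query, counts) for c, counts in collections.items()}
print(max(scores, key=scores.get))
```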
Type Prediction
• Methods for Type Prediction
  • Geometric Average
  • ReDDE
  • Query Clarity
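A sketch of the query clarity score from the list above: the KL divergence between a query language model and the collection language model. For brevity the query model here is estimated from the query terms alone; clarity is often estimated from top-ranked documents instead, so treat this as illustrative.

```python
import math

def query_clarity(query_terms, coll_counts):
    """KL divergence between a term-frequency query model and the collection model."""
    coll_total = sum(coll_counts.values())
    q_total = len(query_terms)
    clarity = 0.0
    for t in set(query_terms):
        p_q = query_terms.count(t) / q_total
        p_c = (coll_counts.get(t, 0) + 1) / (coll_total + len(coll_counts))  # add-one smoothing
        clarity += p_q * math.log(p_q / p_c)
    return clarity

print(query_clarity(["sigir", "paper"], {"sigir": 3, "paper": 40, "email": 500}))
```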
Type Prediction
• Methods for Type Prediction
  • Dictionary-based Matching
    • Build a dictionary for each sub-collection using the names of the collection and its metadata fields.
  • Using Document Metadata Fields
    • New method: field-based collection query likelihood (FQL).
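A sketch of the dictionary-based matching bullet: a query scores a collection by how many of its terms appear in that collection's dictionary of type and field names. The dictionaries below are illustrative, not the ones used in the paper.

```python
# Hypothetical per-type dictionaries built from collection and metadata field names.
TYPE_DICTIONARIES = {
    "email": {"email", "mail", "sender", "receiver", "from", "to", "subject"},
    "office": {"document", "doc", "author", "filename", "title"},
}

def dictionary_match_score(query_terms, type_name):
    """Count query terms that match the type's dictionary."""
    vocab = TYPE_DICTIONARIES[type_name]
    return sum(1 for t in query_terms if t.lower() in vocab)

print(dictionary_match_score(["email", "from", "bruce"], "email"))  # -> 2
```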
Type Prediction
• Combining Type Prediction Methods
  • Grid-search of Parameter Values
  • Golden Section Search
  • Multi-class Classification
    • MultiSVM (LIBLINEAR toolkit)
  • Rank-learning Method
    • RankSVM
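A sketch of combining several type prediction scores with a weighted linear combination, with weights chosen by grid search on training queries. The features, grid step, and accuracy objective are assumptions for illustration; golden section search, multi-class SVM, and RankSVM are alternative ways to learn this combination.

```python
import itertools

def combined_score(feature_scores, weights):
    """feature_scores: {method_name: score}; weights: {method_name: weight}."""
    return sum(weights[m] * s for m, s in feature_scores.items())

def grid_search(train_queries, methods, step=0.25):
    """train_queries: list of ({type: {method: score}}, correct_type) pairs."""
    values = [i * step for i in range(int(1 / step) + 1)]
    grid = [w for w in itertools.product(values, repeat=len(methods))
            if abs(sum(w) - 1.0) < 1e-9]          # weights summing to 1
    best_w, best_acc = None, -1.0
    for w in grid:
        weights = dict(zip(methods, w))
        correct = 0
        for per_type_scores, gold in train_queries:
            pred = max(per_type_scores,
                       key=lambda t: combined_score(per_type_scores[t], weights))
            correct += (pred == gold)
        acc = correct / len(train_queries)
        if acc > best_acc:
            best_w, best_acc = weights, acc
    return best_w, best_acc
```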
Result Merging
• C: collection score (from type prediction)
• D: document score (from type-specific retrieval)
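A minimal sketch of the merging step, assuming a simple weighted sum of the collection score C and the document score D; this weighting is an assumption for illustration, not necessarily the exact merging function used in the paper.

```python
def merge_results(collection_scores, per_type_rankings, alpha=0.5):
    """collection_scores: {type: C}; per_type_rankings: {type: [(doc_id, D), ...]}.
    Returns one merged ranked list over all document types."""
    merged = []
    for t, ranking in per_type_rankings.items():
        c = collection_scores[t]
        for doc_id, d in ranking:
            merged.append((doc_id, alpha * c + (1 - alpha) * d))  # assumed weighted sum
    return sorted(merged, key=lambda x: x[1], reverse=True)

final = merge_results(
    {"email": 0.7, "pdf": 0.3},
    {"email": [("e1", 0.9), ("e2", 0.4)], "pdf": [("p1", 0.8)]})
print(final)
```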
Experiment
• Pseudo-desktop Collection
  • Generation Method
    • Collect documents with similar characteristics.
    • Generate known-item queries by statistically sampling terms from each target document.
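A sketch of generating a known-item query by sampling terms from a target document. Weighting terms by raw term frequency is a simplification assumed here; the paper's sampling scheme weights terms differently.

```python
import random
from collections import Counter

def generate_query(doc_terms, query_len=3, seed=None):
    """Sample query_len terms from the target document, weighted by term frequency."""
    rng = random.Random(seed)
    counts = Counter(doc_terms)
    terms, weights = zip(*counts.items())
    return rng.choices(terms, weights=weights, k=query_len)

doc = "project meeting notes meeting agenda for the search project".split()
print(generate_query(doc, seed=42))
```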
Experiment
• Pseudo-desktop Collection
  • Prediction Accuracy
  • Retrieval Performance
    • Mean Reciprocal Rank (MRR)
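Since each known-item query has exactly one relevant (target) document, retrieval performance is measured by Mean Reciprocal Rank: the average of 1 / rank of the target in the merged list. A small sketch with illustrative rankings:

```python
def mean_reciprocal_rank(runs):
    """runs: list of (ranked_doc_ids, target_doc_id) pairs."""
    rr = []
    for ranking, target in runs:
        rr.append(1.0 / (ranking.index(target) + 1) if target in ranking else 0.0)
    return sum(rr) / len(rr)

print(mean_reciprocal_rank([(["d3", "d1", "d2"], "d1"),   # target at rank 2 -> 0.5
                            (["d5", "d4"], "d4")]))        # target at rank 2 -> 0.5
```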
Experiment
• Best: use the retrieval method with the best aggregate performance for each sub-collection.
• Uniform: each collection has the same chance of containing the relevant document.
• Oracle: perfect knowledge of which collection contains the relevant document.
Experiment
• CS Collection
  • Generation Method
    • DocTrack game
  • Prediction Accuracy
Experiment
• CS Collection
  • Retrieval Performance
  • Leave-one-out Prediction Accuracy
Conclusion
• Suggest a retrieval model for desktop search where type-specific retrieval results are merged into the final ranked list based on type prediction scores.
• Introduce FQL, a new type prediction method.
Conclusion
• Develop a human computation game for collecting queries in a more realistic setting.
• Show that the combination method can improve type prediction performance.