Ranking using Multiple Document Types in Desktop Search

Jinyoung Kim, W. Bruce Croft. SIGIR ’10. Speaker: Hsin-Lan, Wang. Date: 2011/05/02.


  1. Ranking using Multiple Document Types in Desktop Search
     Jinyoung Kim, W. Bruce Croft, SIGIR ’10
     Speaker: Hsin-Lan, Wang
     Date: 2011/05/02

  2. Outline
     • Introduction
     • Retrieval Model
     • Type-specific Retrieval
     • Type Prediction
     • Result Merging
     • Experiment
     • Conclusion

  3. Introduction
     • People have many types of documents on their desktop, with a different set of metadata for each type.
       • email → sender, receiver
       • office document → filename, author
     • A desktop search system should be able to predict which type of document a user is looking for, given a query.

  4. Introduction
     • Main goal: to show how the retrieval effectiveness of a desktop search system can be enhanced by improving type prediction performance.

  5. Retrieval Model
     • query $Q = (q_1, \ldots, q_m)$
     • each collection $C$ contains documents of $n$ field types $(F_1, \ldots, F_n)$
     • each document $d$ may include fields $(f_1, \ldots, f_n)$

  6. Type-specific Retrieval
     • Goal: to rank documents from each sub-collection.
     • Probabilistic Retrieval Model for Semi-structured Data
       $P_{QL}(q_i \mid f_j) = (1-\lambda)\,P(q_i \mid f_j) + \lambda\,P(q_i \mid F_j)$
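As a minimal sketch of the smoothing formula on this slide, the Python below interpolates a document-field language model with the collection-wide model for the same field. The maximum-likelihood estimates, the function names, the toy data, and the value λ = 0.1 are assumptions for illustration; the slide does not specify them, nor how per-field scores are combined across fields.

```python
from collections import Counter

def mle(term, tokens):
    """Maximum-likelihood estimate P(term | tokens); 0.0 for an empty field."""
    return Counter(tokens)[term] / len(tokens) if tokens else 0.0

def p_ql(term, doc_field_tokens, coll_field_tokens, lam=0.1):
    """Smoothed P_QL(q_i | f_j) = (1 - lam) * P(q_i | f_j) + lam * P(q_i | F_j)."""
    return (1 - lam) * mle(term, doc_field_tokens) + lam * mle(term, coll_field_tokens)

# Hypothetical data: the "subject" field of one email and of the whole email collection.
doc_subject = ["meeting", "agenda"]
all_subjects = ["meeting", "agenda", "budget", "meeting"]

score_seen = p_ql("meeting", doc_subject, all_subjects)   # term occurs in the document field
score_unseen = p_ql("budget", doc_subject, all_subjects)  # term occurs only in the collection
```

The collection term keeps `score_unseen` above zero, so a query term missing from one field does not zero out the whole document score.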

  7. Type Prediction
     • Goal: to score each collection given a user query.
     • Methods for Type Prediction
       • Query-likelihood of Collection
       • Query-likelihood of Query Log
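Query-likelihood of collection can be sketched as scoring each sub-collection by the probability that its unigram language model generates the query. The code below is an illustrative assumption, not the authors' implementation: the additive smoothing constant `alpha`, the toy collections, and all names are hypothetical.

```python
import math
from collections import Counter

def collection_log_likelihood(query, coll_tokens, vocab_size, alpha=1.0):
    """log P(Q | C) under a unigram model with additive smoothing (assumed here)."""
    counts = Counter(coll_tokens)
    total = len(coll_tokens)
    return sum(
        math.log((counts[q] + alpha) / (total + alpha * vocab_size)) for q in query
    )

# Hypothetical sub-collections, each flattened to a bag of terms.
collections = {
    "email": ["meeting", "reply", "schedule", "meeting"],
    "pdf": ["paper", "retrieval", "model", "experiment"],
}
vocab = {t for tokens in collections.values() for t in tokens}

def predict_type(query):
    """Return the highest-scoring collection name and all collection scores."""
    scores = {
        name: collection_log_likelihood(query, tokens, len(vocab))
        for name, tokens in collections.items()
    }
    return max(scores, key=scores.get), scores

best, scores = predict_type(["meeting", "schedule"])
```

The same scoring function could be applied to a bag of past queries per collection to approximate the query-log variant, again as an assumption.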

  8. Type Prediction
     • Methods for Type Prediction
       • Geometric Average
       • ReDDE
       • Query Clarity
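The slide only names these methods. As one common reading of the geometric-average method, a collection is scored by the geometric mean of the query likelihoods of its top-ranked documents. The sketch below assumes that reading; the cutoff `k`, the function name, and the handling of short lists are hypothetical simplifications. ReDDE and Query Clarity are not shown.

```python
import math

def geometric_average_score(doc_log_likelihoods, k=3):
    """Geometric mean of the top-k document query likelihoods, computed in log space."""
    top = sorted(doc_log_likelihoods, reverse=True)[:k]
    if not top:
        return 0.0  # an empty collection gets the lowest possible score
    return math.exp(sum(top) / len(top))

# Hypothetical log P(Q | d) values for the documents of one sub-collection.
log_likelihoods = [math.log(0.04), math.log(0.01), math.log(0.5)]
score = geometric_average_score(log_likelihoods, k=2)  # uses 0.5 and 0.04
```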

  9. Type Prediction
     • Methods for Type Prediction
       • Dictionary-based Matching
         • Build a dictionary for each sub-collection from the names of the collection and its metadata fields.
       • Using Document Metadata Fields
         • New method: field-based collection query likelihood (FQL).
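Dictionary-based matching can be sketched as counting how many query terms appear in each sub-collection's dictionary. In the sketch below, the dictionary entries beyond the field names mentioned in the introduction, and the choice to score by the fraction of matched terms, are assumptions. FQL is not sketched because its formula does not appear in the transcript.

```python
def dictionary_score(query_terms, dictionary):
    """Fraction of query terms that appear in a collection's dictionary."""
    if not query_terms:
        return 0.0
    hits = sum(1 for term in query_terms if term.lower() in dictionary)
    return hits / len(query_terms)

# Hypothetical dictionaries built from collection names and metadata field names.
dictionaries = {
    "email": {"email", "mail", "sender", "receiver", "subject"},
    "office": {"office", "document", "filename", "author"},
}

query = ["email", "sender", "budget"]
scores = {name: dictionary_score(query, d) for name, d in dictionaries.items()}
```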

  10. Type Prediction
     • Combining Type Prediction Methods
       • Grid-search of Parameter Values
         • Golden Section Search
       • Multi-class Classification
         • MultiSVM (Liblinear Toolkit)
       • Rank-learning Method
         • RankSVM
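Golden section search, named on this slide, finds the optimum of a unimodal one-dimensional function by shrinking a bracket by a constant ratio at each step. The sketch below tunes a single combination weight against a toy objective; the objective, its peak at 0.3, and the tolerance are hypothetical, and how the authors apply the search to several weights is not stated in the transcript.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618

def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # Maximum lies in [lo, d]; the old c becomes the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Maximum lies in [c, hi]; the old d becomes the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

def toy_accuracy(weight):
    """Stand-in for prediction accuracy as a function of one combination weight."""
    return -(weight - 0.3) ** 2

best_weight = golden_section_max(toy_accuracy, 0.0, 1.0)
```

Each iteration reuses one of the two previous probe points, so only one new evaluation of the objective is needed per step.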

  11. Result Merging
     • C: collection score (from type prediction)
     • D: document score (from type-specific retrieval)
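The transcript lists only the two inputs to merging; the formula that combines them did not survive. As one illustrative possibility, a CORI-style heuristic min-max normalizes both scores and boosts documents from highly scored collections. This is an assumption about a well-known technique, not a reconstruction of the slide; the weight 0.4, the score ranges, and the sample results are hypothetical.

```python
def min_max(x, lo, hi):
    """Scale x into [0, 1]; return 0.0 when the range is degenerate."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def cori_merge(doc_score, coll_score, doc_range, coll_range, weight=0.4):
    """CORI-style merged score: (D' + weight * D' * C') / (1 + weight)."""
    d = min_max(doc_score, *doc_range)
    c = min_max(coll_score, *coll_range)
    return (d + weight * d * c) / (1 + weight)

# Hypothetical inputs: collection scores C and per-collection document scores D.
coll_scores = {"email": 0.7, "pdf": 0.2}
results = [("email", "msg-17", 0.8), ("pdf", "paper-3", 0.9)]

coll_range = (min(coll_scores.values()), max(coll_scores.values()))
merged = {
    doc_id: cori_merge(d, coll_scores[coll], (0.0, 1.0), coll_range)
    for coll, doc_id, d in results
}
ranked = sorted(merged, key=merged.get, reverse=True)
```

In this toy run the email message outranks the PDF despite a lower document score, because its collection was predicted as more likely.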

  12. Experiment
     • Pseudo-desktop Collection
       • Generation Method
         • collect documents with similar characteristics
         • generate queries by statistically taking terms from each of the target documents

  13. Experiment
     • Pseudo-desktop Collection
       • Prediction Accuracy
       • Retrieval Performance
         • Mean Reciprocal Rank
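Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the relevant document first appears. The sketch below assumes one target document per query, which matches the known-item setting implied by the slides; the query and document identifiers are hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the target document; a missing target contributes 0."""
    total = 0.0
    for query_id, ranked_docs in rankings.items():
        target = relevant[query_id]
        if target in ranked_docs:
            total += 1.0 / (ranked_docs.index(target) + 1)
    return total / len(rankings) if rankings else 0.0

# Hypothetical ranked lists and target documents for three queries.
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"], "q3": ["d5", "d4"]}
relevant = {"q1": "d1", "q2": "d2", "q3": "d8"}

mrr = mean_reciprocal_rank(rankings, relevant)  # (1/2 + 1 + 0) / 3
```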

  14. Experiment
     • Best: use the retrieval method with the best aggregate performance for each sub-collection.
     • Uniform: each collection has the same chance of containing the relevant document.
     • Oracle: assume perfect knowledge of which collection contains the relevant document.

  15. Experiment
     • CS Collection
       • Generation Method
         • DocTrack game
       • Prediction Accuracy

  16. Experiment
     • CS Collection
       • Retrieval Performance
       • Leave-one-out Prediction Accuracy

  17. Conclusion
     • Suggest a retrieval model for desktop search where type-specific retrieval results are merged into the final rank list based on type prediction scores.
     • Introduce FQL, a new type prediction method.

  18. Conclusion
     • Develop a human computation game for collecting queries in a more realistic setting.
     • Show that the combination method can improve type prediction performance.
