Ranking using Multiple Document Types in Desktop Search
Jinyoung Kim, W. Bruce Croft
SIGIR '10
Speaker: Hsin-Lan Wang
Date: 2011/05/02
Outline
• Introduction
• Retrieval Model
• Type-specific Retrieval
• Type Prediction
• Result Merging
• Experiment
• Conclusion
Introduction
• People have many types of documents on their desktop, with a different set of metadata fields for each type.
  • email → sender, receiver
  • office document → filename, author
• A desktop search system should therefore be able to predict which type of document a user is looking for given a query.
Introduction
• Main goal: to show how the retrieval effectiveness of a desktop search system can be enhanced by improving type prediction performance.
Retrieval Model
• query Q = (q_1, …, q_m)
• each collection C contains documents of n field types (F_1, …, F_n)
• each document d may include fields (f_1, …, f_n)
Type-specific Retrieval
• Goal: to rank documents from each sub-collection.
• Probabilistic Retrieval Model for Semi-structured Data (PRM-S)
  • Field-level query likelihood, smoothed with the collection-level field model:
    P_QL(q_i | f_j) = (1 − λ) P(q_i | f_j) + λ P(q_i | F_j)
    where f_j is a field of document d and F_j is the same field aggregated over the whole sub-collection.
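A minimal sketch of the smoothed field query likelihood above. The term counts, field names, and λ value are illustrative only; the full PRM-S model additionally combines these field-level estimates across fields via mapping probabilities, which is omitted here.

```python
# Sketch of the smoothed field query likelihood used in type-specific retrieval.
# Counts, field names, and the lambda value below are illustrative only.

def field_query_likelihood(term, doc_field_counts, coll_field_counts, lam=0.5):
    """P_QL(q_i|f_j) = (1 - lam) * P(q_i|f_j) + lam * P(q_i|F_j)."""
    doc_len = sum(doc_field_counts.values()) or 1
    coll_len = sum(coll_field_counts.values()) or 1
    p_doc = doc_field_counts.get(term, 0) / doc_len      # P(q_i|f_j), document field
    p_coll = coll_field_counts.get(term, 0) / coll_len   # P(q_i|F_j), collection field
    return (1 - lam) * p_doc + lam * p_coll

# Example: score the query term "croft" against the "author" field of one document.
doc_author = {"kim": 1, "croft": 1}
coll_author = {"kim": 40, "croft": 25, "smith": 300}
print(field_query_likelihood("croft", doc_author, coll_author))
```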
Type Prediction
• Goal: to score each collection given a user query.
• Methods for Type Prediction
  • Query-likelihood of Collection
  • Query-likelihood of Query Log
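A sketch of collection query likelihood for type prediction: each sub-collection is treated as one bag of words and scored by how likely it is to generate the query; applying the same scorer to a per-type query log gives the query-log variant. The smoothing scheme and counts are assumptions for illustration.

```python
import math

def collection_query_likelihood(query_terms, term_counts, mu=1000, background=None):
    """Log P(Q|C) with a simple Dirichlet-style smoothing against a background model."""
    background = background or {}
    total = sum(term_counts.values())
    bg_total = sum(background.values()) or 1
    score = 0.0
    for t in query_terms:
        p_bg = (background.get(t, 0) + 1) / (bg_total + 1)   # crude background estimate
        p = (term_counts.get(t, 0) + mu * p_bg) / (total + mu)
        score += math.log(p)
    return score

# Predict the type whose collection scores highest for the query.
query = ["meeting", "agenda"]
collections = {"email": {"meeting": 50, "agenda": 10},
               "pdf": {"meeting": 5, "figure": 80}}
scores = {c: collection_query_likelihood(query, counts) for c, counts in collections.items()}
print(max(scores, key=scores.get))
```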
Type Prediction
• Methods for Type Prediction
  • Geometric Average
  • ReDDE
  • Query Clarity
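A sketch of the query clarity score from the list above: the KL divergence between a query language model and the collection language model. For brevity the query model here is estimated from the query terms alone; clarity is often estimated from top-ranked documents instead, so treat this as illustrative.

```python
import math

def query_clarity(query_terms, coll_counts):
    """KL divergence between a term-frequency query model and the collection model."""
    coll_total = sum(coll_counts.values())
    q_total = len(query_terms)
    clarity = 0.0
    for t in set(query_terms):
        p_q = query_terms.count(t) / q_total
        p_c = (coll_counts.get(t, 0) + 1) / (coll_total + len(coll_counts))  # add-one smoothing
        clarity += p_q * math.log(p_q / p_c)
    return clarity

print(query_clarity(["sigir", "paper"], {"sigir": 3, "paper": 40, "email": 500}))
```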
Type Prediction
• Methods for Type Prediction
  • Dictionary-based Matching
    • Build a dictionary for each sub-collection using the names of the collection and its metadata fields.
  • Using Document Metadata Fields
    • New method: field-based collection query likelihood (FQL).
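A sketch of the dictionary-based matching bullet: a query scores a collection by how many of its terms appear in that collection's dictionary of type and field names. The dictionaries below are illustrative, not the ones used in the paper.

```python
# Hypothetical per-type dictionaries built from collection and metadata field names.
TYPE_DICTIONARIES = {
    "email": {"email", "mail", "sender", "receiver", "from", "to", "subject"},
    "office": {"document", "doc", "author", "filename", "title"},
}

def dictionary_match_score(query_terms, type_name):
    """Count query terms that match the type's dictionary."""
    vocab = TYPE_DICTIONARIES[type_name]
    return sum(1 for t in query_terms if t.lower() in vocab)

print(dictionary_match_score(["email", "from", "bruce"], "email"))  # -> 2
```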
Type Prediction
• Combining Type Prediction Methods
  • Grid-search of Parameter Values
  • Golden Section Search
  • Multi-class Classification
    • MultiSVM (LIBLINEAR toolkit)
  • Rank-learning Method
    • RankSVM
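A sketch of combining several type prediction scores with a weighted linear combination, with weights chosen by grid search on training queries. The features, grid step, and accuracy objective are assumptions for illustration; golden section search, multi-class SVM, and RankSVM are alternative ways to learn this combination.

```python
import itertools

def combined_score(feature_scores, weights):
    """feature_scores: {method_name: score}; weights: {method_name: weight}."""
    return sum(weights[m] * s for m, s in feature_scores.items())

def grid_search(train_queries, methods, step=0.25):
    """train_queries: list of ({type: {method: score}}, correct_type) pairs."""
    values = [i * step for i in range(int(1 / step) + 1)]
    grid = [w for w in itertools.product(values, repeat=len(methods))
            if abs(sum(w) - 1.0) < 1e-9]          # weights summing to 1
    best_w, best_acc = None, -1.0
    for w in grid:
        weights = dict(zip(methods, w))
        correct = 0
        for per_type_scores, gold in train_queries:
            pred = max(per_type_scores,
                       key=lambda t: combined_score(per_type_scores[t], weights))
            correct += (pred == gold)
        acc = correct / len(train_queries)
        if acc > best_acc:
            best_w, best_acc = weights, acc
    return best_w, best_acc
```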
Result Merging
• C: collection score (from type prediction)
• D: document score (from type-specific retrieval)
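A minimal sketch of the merging step, assuming a simple weighted sum of the collection score C and the document score D; this weighting is an assumption for illustration, not necessarily the exact merging function used in the paper.

```python
def merge_results(collection_scores, per_type_rankings, alpha=0.5):
    """collection_scores: {type: C}; per_type_rankings: {type: [(doc_id, D), ...]}.
    Returns one merged ranked list over all document types."""
    merged = []
    for t, ranking in per_type_rankings.items():
        c = collection_scores[t]
        for doc_id, d in ranking:
            merged.append((doc_id, alpha * c + (1 - alpha) * d))  # assumed weighted sum
    return sorted(merged, key=lambda x: x[1], reverse=True)

final = merge_results(
    {"email": 0.7, "pdf": 0.3},
    {"email": [("e1", 0.9), ("e2", 0.4)], "pdf": [("p1", 0.8)]})
print(final)
```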
Experiment
• Pseudo-desktop Collection
  • Generation Method
    • Collect documents with similar characteristics.
    • Generate known-item queries by statistically sampling terms from each target document.
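A sketch of generating a known-item query by sampling terms from a target document. Weighting terms by raw term frequency is a simplification assumed here; the paper's sampling scheme weights terms differently.

```python
import random
from collections import Counter

def generate_query(doc_terms, query_len=3, seed=None):
    """Sample query_len terms from the target document, weighted by term frequency."""
    rng = random.Random(seed)
    counts = Counter(doc_terms)
    terms, weights = zip(*counts.items())
    return rng.choices(terms, weights=weights, k=query_len)

doc = "project meeting notes meeting agenda for the search project".split()
print(generate_query(doc, seed=42))
```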
Experiment
• Pseudo-desktop Collection
  • Prediction Accuracy
  • Retrieval Performance
    • Mean Reciprocal Rank (MRR)
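Since each known-item query has exactly one relevant (target) document, retrieval performance is measured by Mean Reciprocal Rank: the average of 1 / rank of the target in the merged list. A small sketch with illustrative rankings:

```python
def mean_reciprocal_rank(runs):
    """runs: list of (ranked_doc_ids, target_doc_id) pairs."""
    rr = []
    for ranking, target in runs:
        rr.append(1.0 / (ranking.index(target) + 1) if target in ranking else 0.0)
    return sum(rr) / len(rr)

print(mean_reciprocal_rank([(["d3", "d1", "d2"], "d1"),   # target at rank 2 -> 0.5
                            (["d5", "d4"], "d4")]))        # target at rank 2 -> 0.5
```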
Experiment
• Best: use the retrieval method with the best aggregate performance for each sub-collection.
• Uniform: each collection has the same chance of containing the relevant document.
• Oracle: perfect knowledge of which collection contains the relevant document.
Experiment
• CS Collection
  • Generation Method
    • DocTrack game
  • Prediction Accuracy
Experiment
• CS Collection
  • Retrieval Performance
  • Leave-one-out Prediction Accuracy
Conclusion
• Suggest a retrieval model for desktop search where type-specific retrieval results are merged into the final ranked list based on type prediction scores.
• Introduce FQL, a new type prediction method.
Conclusion
• Develop a human computation game for collecting queries in a more realistic setting.
• Show that the combination method can improve type prediction performance.