A Study of Learning a Merge Model for Multilingual Information Retrieval

Presenter: Cheng-Hui Chen • Authors: Ming-Feng Tsai, Yu-Ting Wang, Hsin-Hsi Chen • SIGIR 2008

Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

Motivation • In multilingual information retrieval (MLIR), the merged result list usually includes many irrelevant documents. • Traditional merging methods for MLIR assume that relevant documents are homogeneously distributed over the monolingual result lists.

Objectives • Account for the varying translation and retrieval quality across collections when merging monolingual result lists into a single result list. • Propose a merge method that does not assume relevant documents are homogeneously distributed over the monolingual result lists. • Improve the quality of the merge model.

Methodology • Traditional MLIR framework • Raw-score • Round-robin • Normalized-by-top1 • Normalized-by-topk • The proposed learning method • FRank
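The traditional merging strategies listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each monolingual result list is a Python list of (doc_id, score) pairs sorted by descending score, that normalized-by-topk divides each score by the mean of the top-k scores in its own list (k = 1 gives normalized-by-top1), and all function names are hypothetical.

```python
from itertools import zip_longest

def raw_score_merge(lists):
    """Pool all documents and sort by their original retrieval scores."""
    merged = [pair for lst in lists for pair in lst]
    return sorted(merged, key=lambda p: p[1], reverse=True)

def round_robin_merge(lists):
    """Interleave the lists rank by rank, ignoring the scores."""
    merged = []
    for tier in zip_longest(*lists):
        merged.extend(p for p in tier if p is not None)
    return merged

def normalized_by_topk_merge(lists, k=1):
    """Divide each score by the mean of the top-k scores of its own list
    (assumed definition; k=1 corresponds to normalized-by-top1)."""
    merged = []
    for lst in lists:
        if not lst:
            continue
        top = lst[:k]
        norm = sum(score for _, score in top) / len(top)
        if norm == 0:
            norm = 1.0  # avoid division by zero for degenerate lists
        merged.extend((doc, score / norm) for doc, score in lst)
    return sorted(merged, key=lambda p: p[1], reverse=True)

# Toy result lists with made-up scores on different scales.
zh = [("zh1", 12.0), ("zh2", 6.0)]
en = [("en1", 3.0), ("en2", 2.4)]

raw_ids = [d for d, _ in raw_score_merge([zh, en])]
rr_ids = [d for d, _ in round_robin_merge([zh, en])]
top1_ids = [d for d, _ in normalized_by_topk_merge([zh, en], k=1)]
```

With these toy lists, raw-score merging is dominated by the collection with the larger score scale, while the normalized variant lets documents from the second list rise above lower-ranked documents from the first.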

MLIR merge process • Feature set • Query level • Document level • Translation level • Construction of a merge model • FRank ranking algorithm • BM25

Feature set • Query level • The terms within a query are manually classified into several pre-defined categories. • Location/country names (Loc) • Organization names (Org) • Event names (EN) • Technical terms (TT) • Document level • The extracted document length (Dlength) and title length (Tlength).

Feature set [Figure: example query terms annotated with category labels such as Loc and EN] • Translation level • The size of the bilingual dictionary used for each language (DictSize). • The average number of translation equivalents within a query (AvgTAD). • Example: if a query has two query terms, each with three translation equivalents, the AvgTAD of the query is (3 + 3)/2 = 3.
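The AvgTAD arithmetic above can be reproduced with a short sketch. The dictionary layout (a mapping from a query term to its list of translation equivalents), the placeholder entries, and the name avg_tad are all assumptions made for illustration; the slides do not specify how the bilingual dictionary is stored.

```python
def avg_tad(query_terms, dictionary):
    """Average number of translation equivalents per query term.

    `dictionary` maps a source-language term to a list of translation
    equivalents (hypothetical layout); unknown terms count as zero.
    """
    counts = [len(dictionary.get(term, [])) for term in query_terms]
    return sum(counts) / len(counts) if counts else 0.0

# Toy bilingual dictionary with placeholder entries: two terms,
# each with three translation equivalents.
toy_dict = {
    "term_a": ["a1", "a2", "a3"],
    "term_b": ["b1", "b2", "b3"],
}

value = avg_tad(["term_a", "term_b"], toy_dict)  # (3 + 3) / 2
```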

Construction of the merge model • Following FRank's generalized additive model, a merge model can be represented as a weighted sum of weak learners: $H(x) = \sum_{t} \alpha_t m_t(x)$ • $m_t(x)$ is a weak learner • $\alpha_t$ is the learned weight • $t$ ranges over the selected weak learners • The merge model is combined with a retrieval model (BM25) by linear combination.
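A rough sketch of how an additive merge model could be scored and then combined with BM25. Several details here are assumptions, since the slides do not give them: weak learners are modeled as binary threshold tests on a single feature, the linear combination is taken to be λ · merge score + (1 − λ) · BM25 score, and the names Stump, merge_score, and combined_score are hypothetical. The weights are hand-picked for illustration; in FRank they would be learned.

```python
from dataclasses import dataclass

@dataclass
class Stump:
    """Assumed weak learner: 1.0 if a feature exceeds a threshold, else 0.0."""
    feature: str
    threshold: float

    def __call__(self, x):
        return 1.0 if x[self.feature] > self.threshold else 0.0

def merge_score(x, learners, weights):
    """Additive model: weighted sum of weak-learner outputs."""
    return sum(alpha * m(x) for alpha, m in zip(weights, learners))

def combined_score(x, bm25_score, learners, weights, lam):
    """Assumed linear combination of the merge model and BM25."""
    return lam * merge_score(x, learners, weights) + (1.0 - lam) * bm25_score

# Toy feature vector and hand-picked (not learned) weights.
x = {"AvgTAD": 3.0, "Dlength": 120.0}
learners = [Stump("AvgTAD", 2.0), Stump("Dlength", 500.0)]
weights = [0.6, 0.3]

h = merge_score(x, learners, weights)
final = combined_score(x, bm25_score=0.4, learners=learners,
                       weights=weights, lam=0.5)
```

In practice the two scores would need to be on comparable scales before being combined; the sketch leaves any normalization out.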

Experiments • Data set • Details of the experimental collections • Percentage of retrieved documents

Experiments Mean Average Precision (MAP)
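For reference, MAP can be computed as in the sketch below. It uses the standard definition (average precision per query, averaged over queries) and assumes binary relevance judgments; the function names and the toy data are illustrative only and are not taken from the experiments.

```python
def average_precision(ranked_ids, relevant):
    """Mean of the precision values at the ranks of relevant documents."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """`runs` is a list of (ranked_ids, relevant_set) pairs, one per query."""
    aps = [average_precision(ranked, rel) for ranked, rel in runs]
    return sum(aps) / len(aps) if aps else 0.0

# Two toy queries with made-up judgments.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
map_value = mean_average_precision(runs)
```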

Experiments Experimental results of our method using different values of the combination coefficient λ.

Experiments Feature Analysis

Conclusions • The proposed merge model can significantly improve merging quality. • The merge model indicates that the key factors are the number of translatable terms and compound words.

Conclusions • Future work • Use other learning-based ranking algorithms, such as RankSVM and RankNet. • Extract more representative features, such as linguistic features, to construct the merge model. • Discover more relations among query terms, such as query term association and substitution.

Comments • Advantage • Improves merging quality. • Drawback • Application • Multilingual information retrieval.
