
Relevance-Based Language Models






Presentation Transcript


  1. Relevance-Based Language Models Victor Lavrenko and W. Bruce Croft, Department of Computer Science, University of Massachusetts, Amherst, MA 01003. SIGIR 2001. Presented by Yi-Ting

  2. Outline • Introduction • Related Work: Classical Probabilistic Models, Language Modeling Approaches • Relevance Model • Experimental Results • Conclusions

  3. Introduction • The paper marks a departure from the traditional models of relevance, following the emergence of language-modeling frameworks. • It introduces a technique for constructing a relevance model from the query alone, and compares the resulting query-based relevance models to models constructed with training data.

  4. Related Work • Classical Probabilistic Models • Probability ranking principle: P(D|R)/P(D|N) • Generative language models: P(w|R)/P(w|N) • Binary Independence Model • Multiple-Bernoulli language models • 2-Poisson Model

  5. Related Work • Language Modeling Approaches • Documents themselves are viewed as models, and queries as strings of text randomly sampled from those models. • Introduced by Ponte and Croft; developed further by Miller et al. and by Song and Croft. • Berger and Lafferty view the query Q as a potential translation of the document D, and use powerful estimation techniques.
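The query-likelihood ranking described above can be illustrated with a minimal Python sketch; the toy documents, query, and the Jelinek-Mercer smoothing weight `LAMBDA` are illustrative assumptions, not details from the paper:

```python
from collections import Counter

# Toy corpus and query -- made-up illustrations, not data from the paper.
docs = {
    "d1": "language model for information retrieval".split(),
    "d2": "probabilistic model of relevance".split(),
}
query = ["relevance", "model"]

LAMBDA = 0.6  # assumed Jelinek-Mercer smoothing weight

# Collection language model P(w|C) for smoothing.
coll = Counter(w for d in docs.values() for w in d)
coll_total = sum(coll.values())

def p_w_given_d(w, d):
    """Smoothed document model: lambda*P(w|D) + (1-lambda)*P(w|C)."""
    tf = Counter(d)
    return LAMBDA * tf[w] / len(d) + (1 - LAMBDA) * coll[w] / coll_total

def query_likelihood(q, d):
    """Score a document by its probability of generating the query."""
    score = 1.0
    for w in q:
        score *= p_w_given_d(w, d)
    return score

ranking = sorted(docs, key=lambda name: query_likelihood(query, docs[name]),
                 reverse=True)
print(ranking)
```

With these toy inputs, d2 (which contains both query terms) outranks d1.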

  6. Relevance Model • A relevance model is a mechanism that determines the probability P(w|R) of observing a word w in the documents relevant to a query. • The paper gives a formally justified way of estimating the relevance model when no training data is available. • Ranking with a Relevance Model: documents are ranked by comparing their language models to the estimated relevance model.
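The ranking formula itself is not preserved in this transcript; a common instantiation of relevance-model ranking, sketched here as an assumption, scores each document by the cross-entropy between the estimated relevance model and the smoothed document model:

```latex
\mathrm{score}(D) \;=\; \sum_{w \in V} P(w \mid R)\,\log P(w \mid M_D)
```

Higher scores indicate documents whose language models better match the relevance model; this is rank-equivalent to minimizing the KL divergence between the relevance model and the document model.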

  7. Relevance Model • We do not assume that the query is a sample from any specific document model. Instead, we assume that both the query and the relevant documents are samples from an unknown relevance model R.

  8. Relevance Model • Let Q = q1…qk. We relate the probability of w to the conditional probability of observing w given that we just observed q1…qk:
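The equation on the slide is not preserved; reconstructed from the surrounding slides (a sketch of the paper's setup), the approximation is:

```latex
P(w \mid R) \;\approx\; P(w \mid q_1 \ldots q_k)
\;=\; \frac{P(w, q_1 \ldots q_k)}{P(q_1 \ldots q_k)}
```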

  9. Relevance Model • Method 1: i.i.d. sampling • Assume that the query words q1…qk and the word w in relevant documents are sampled identically and independently from a unigram distribution MR.
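Under i.i.d. sampling, the joint probability factors as P(w, q1…qk) = Σ_M P(M) · P(w|M) · Π_i P(qi|M), summing over a set of (document) models M. The following Python sketch estimates P(w|R) this way; the toy documents, query, uniform model prior, and smoothing weight `LAMBDA` are all illustrative assumptions:

```python
import math
from collections import Counter

# Toy corpus and query -- illustrative assumptions, not data from the paper.
docs = {
    "d1": "relevance model for retrieval".split(),
    "d2": "language model of text".split(),
}
query = ["relevance", "model"]
LAMBDA = 0.8  # assumed Jelinek-Mercer smoothing weight

coll = Counter(w for d in docs.values() for w in d)
total = sum(coll.values())

def p_w_given_m(w, d):
    """Smoothed unigram model for document d."""
    return LAMBDA * Counter(d)[w] / len(d) + (1 - LAMBDA) * coll[w] / total

# Method 1 (i.i.d. sampling):
#   P(w, q1..qk) = sum_M P(M) * P(w|M) * prod_i P(qi|M)
# with a uniform prior P(M) over document models.
prior = 1 / len(docs)
joint = {
    w: sum(
        prior * p_w_given_m(w, d) * math.prod(p_w_given_m(q, d) for q in query)
        for d in docs.values()
    )
    for w in coll
}

norm = sum(joint.values())
relevance_model = {w: p / norm for w, p in joint.items()}  # estimated P(w|R)
```

Words that co-occur with the query words in high-likelihood documents receive high probability under the estimated relevance model.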

  10. Relevance Model • Method 2: conditional sampling • Fix a value of w according to some prior P(w); then pick a distribution Mi according to P(Mi|w) and sample the query word qi from Mi with probability P(qi|Mi). • Assume the query words q1…qk to be independent of each other, but keep their dependence on w:
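The slide's equation is not preserved; reconstructed as a sketch from the sampling process just described (with M ranging over the set of document models), the joint probability factors as:

```latex
P(w, q_1 \ldots q_k) \;=\; P(w) \prod_{i=1}^{k} \sum_{M \in \mathcal{M}} P(q_i \mid M)\, P(M \mid w)
```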

  11. Relevance Model • Method 2 performs slightly better in terms of retrieval effectiveness and tracking errors.

  12. Relevance Model • Estimation Details • Set the query prior P(q1…qk) in equation (6) to be: • P(w) to be: • P(w|MD) to be: • P(Mi|w) by Bayes' rule: P(Mi|w) = P(w|Mi)P(Mi)/P(w)

  13. Experimental Results • Model cross-entropy • Cross-entropy is an information-theoretic measure of the distance between two distributions, measured in bits. • Both models exhibit their lowest cross-entropy at around 50 documents; however, the model estimated with Method 2 achieves a lower absolute cross-entropy.
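The cross-entropy measure can be sketched in a few lines of Python; the two toy distributions are illustrative assumptions, not the paper's data:

```python
import math

def cross_entropy_bits(p, q):
    """H(p, q) = -sum_w p(w) * log2 q(w), measured in bits."""
    return -sum(pw * math.log2(q[w]) for w, pw in p.items() if pw > 0)

# Illustrative distributions (made up, not from the paper).
true_model = {"a": 0.5, "b": 0.25, "c": 0.25}
estimate = {"a": 0.4, "b": 0.3, "c": 0.3}

print(cross_entropy_bits(true_model, true_model))  # entropy of p itself: 1.5 bits
print(cross_entropy_bits(true_model, estimate))
```

Cross-entropy is minimized when the estimate matches the true distribution, so a lower value against a held-out model indicates a better estimate.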

  14. Experimental Results

  15. Experimental Results • TREC ad-hoc retrieval • Two query sets: TREC title queries 101-150 and 151-200. • Baseline: the language-modeling approach, where documents are ranked by their probability of generating the query. • Metrics: average precision, R-precision, and the Wilcoxon test for significance. • Relevance models are an attractive choice in high-precision applications.

  16. Experimental Results

  17. Experimental Results • TDT topic tracking • The tracking task in TDT is evaluated using the Detection Error Tradeoff (DET): the tradeoff between Misses and False Alarms. • The tracking system was run on a modified task. • TDT tracking systems were given 1 to 4 training examples.

  18. Experimental Results

  19. Conclusions • The method produces accurate models of relevance, which leads to significantly improved retrieval performance. • The technique is relatively insensitive to the number of documents used in “expansion”. • The model is simple to implement and does not require any training data. • Unsupervised relevance models can be competitive with supervised topic models in TDT. • There are a number of interesting directions for further investigation of relevance models.
