Language Models for TR

Language Models for TR (Lecture for CS410-CXZ Text Info Systems), Feb. 25, 2011, ChengXiang Zhai, Department of Computer Science, University of Illinois, Urbana-Champaign

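Text Generation with Unigram LM
A (unigram) language model $\theta$ assigns a probability $p(w|\theta)$ to each word; a document is generated by sampling words from the model.
• Topic 1 (text mining): … text 0.2, mining 0.1, association 0.01, clustering 0.02, … food 0.00001 … Sampling from this model yields a text mining paper.
• Topic 2 (health): … food 0.25, nutrition 0.1, healthy 0.05, diet 0.02 … Sampling from this model yields a food nutrition paper.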
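The sampling view of text generation can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the word probabilities are the ones listed on the slide, while the `<other>` bucket that absorbs the unlisted probability mass, the function names, and the fixed seed are assumptions made so the example runs.

```python
import random

# Word probabilities listed on the slide for each topic model.
TOPIC_TEXT_MINING = {"text": 0.2, "mining": 0.1, "association": 0.01,
                     "clustering": 0.02, "food": 0.00001}
TOPIC_HEALTH = {"food": 0.25, "nutrition": 0.1, "healthy": 0.05, "diet": 0.02}


def complete(model):
    """Return a copy of the model with the leftover mass assigned to '<other>'.

    The '<other>' bucket is an assumption: the slide elides most of the
    vocabulary, so the listed probabilities do not sum to one.
    """
    full = dict(model)
    full["<other>"] = 1.0 - sum(model.values())
    return full


def generate(model, length, seed=0):
    """Sample `length` words independently from a unigram language model."""
    full = complete(model)
    rng = random.Random(seed)
    return rng.choices(list(full), weights=list(full.values()), k=length)


doc = generate(TOPIC_TEXT_MINING, 20)
print(" ".join(doc))
```

Because each word is drawn independently, word order carries no information; only the word frequencies reflect the topic.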

Estimation of Unigram LM
Given a document, estimate the (unigram) language model: $p(w|\theta) = ?$
Document: a “text mining paper” (total #words = 100)
Word counts in the document: text 10, mining 5, association 3, database 3, algorithm 2, … query 1, efficient 1
Word          Count   Estimate of p(w|θ)
text          10      10/100
mining        5       5/100
association   3       3/100
database      3       3/100
…
query         1       1/100
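The estimates above are relative frequencies, which can be computed as follows. This is a sketch under stated assumptions: the slide lists only seven of the document's words, so the remaining 75 tokens are padded with hypothetical filler words (`other0`, `other1`, …) to reach the stated total of 100.

```python
from collections import Counter


def mle_unigram(tokens):
    """Maximum likelihood unigram model: p(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# Counts listed on the slide for the "text mining paper".
listed = {"text": 10, "mining": 5, "association": 3, "database": 3,
          "algorithm": 2, "query": 1, "efficient": 1}
tokens = [w for w, c in listed.items() for _ in range(c)]
# Assumption: pad with distinct filler words so the document has 100 tokens.
tokens += [f"other{i}" for i in range(100 - len(tokens))]

model = mle_unigram(tokens)
print(model["text"], model["mining"], model["query"])
```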

Language Models for Retrieval (Ponte & Croft 98)
Query = “data mining algorithms”
Each document has its own language model:
• Text mining paper: … text ?, mining ?, association ?, clustering ?, … food ? …
• Food nutrition paper: … food ?, nutrition ?, healthy ?, diet ? …
Which model would most likely have generated this query?

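Ranking Docs by Query Likelihood
Estimate a language model $\theta_{d_i}$ for each document $d_i$, then rank the documents by the likelihood of the query $q$ under each model.
Doc    Doc LM            Query likelihood
d1     $\theta_{d_1}$    $p(q|\theta_{d_1})$
d2     $\theta_{d_2}$    $p(q|\theta_{d_2})$
…
dN     $\theta_{d_N}$    $p(q|\theta_{d_N})$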

Retrieval as Language Model Estimation
• Document ranking based on query likelihood: $\log p(q|d) = \sum_{i=1}^{n} \log p(w_i|d)$, where $q = w_1 w_2 \ldots w_n$ and $p(w_i|d)$ is the document language model
• Retrieval problem ≈ estimation of p(wi|d)
• Smoothing is an important issue, and distinguishes different approaches
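Query-likelihood ranking can be sketched as below. The toy documents, the function names, and the choice of a simple linear interpolation with the collection model (weight `lam`) are illustrative assumptions; any smoothed estimate of p(w|d) could be plugged in instead.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log p(q|d) = sum_i log p(w_i|d).

    p(w|d) is the document relative frequency interpolated with the
    collection relative frequency; `lam` is an illustrative value.
    Assumes every query word occurs somewhere in the collection,
    otherwise math.log(0.0) raises ValueError.
    """
    d, c = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = d[w] / len(doc)
        p_col = c[w] / len(collection)
        score += math.log((1 - lam) * p_doc + lam * p_col)
    return score


def rank(query, docs, lam=0.5):
    """Return document names sorted by decreasing query log-likelihood."""
    collection = [w for doc in docs.values() for w in doc]
    scores = {name: query_log_likelihood(query, doc, collection, lam)
              for name, doc in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy documents standing in for the two papers.
docs = {
    "text_mining_paper": "text mining algorithms for data mining and clustering".split(),
    "food_nutrition_paper": "food nutrition and healthy diet data".split(),
}
ranking = rank("data mining algorithms".split(), docs)
print(ranking)
```

Working in log space avoids numerical underflow when queries are long, and does not change the ranking because the logarithm is monotonic.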

How to Estimate p(w|d)?
• Simplest solution: Maximum Likelihood Estimator
  • P(w|d) = relative frequency of word w in d
  • What if a word doesn’t appear in the text? P(w|d) = 0
• In general, what probability should we give a word that has not been observed?
• If we want to assign non-zero probabilities to such words, we’ll have to discount the probabilities of observed words
• This is what “smoothing” is about …
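The zero-probability problem is easy to reproduce. In this sketch (the toy document and function names are assumptions), a single unseen query word drives the whole query likelihood to zero under the maximum likelihood estimate, regardless of how well the other words match.

```python
from collections import Counter


def mle(doc):
    """Relative-frequency estimate of p(w|d); unseen words are absent."""
    counts = Counter(doc)
    return {w: c / len(doc) for w, c in counts.items()}


def query_likelihood(query, model):
    """p(q|d) as a product of word probabilities; unseen words get 0."""
    p = 1.0
    for w in query:
        p *= model.get(w, 0.0)
    return p


doc = "text mining algorithms for text clustering".split()
model = mle(doc)

seen = query_likelihood(["mining", "algorithms"], model)
unseen = query_likelihood(["data", "mining", "algorithms"], model)
print(seen, unseen)
```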

Language Model Smoothing (Illustration)
[Figure: P(w) plotted against word w, comparing the maximum likelihood estimate with the smoothed LM.]

A General Smoothing Scheme
• All smoothing methods try to
  • discount the probability of words seen in a doc
  • re-allocate the extra probability so that unseen words will have a non-zero probability
• Most use a reference model (collection language model) to discriminate unseen words:
$$p(w|d) = \begin{cases} p_{DML}(w|d) & \text{if } w \text{ is seen in } d \quad \text{(discounted ML estimate)} \\ \alpha_d \, p(w|C) & \text{otherwise} \quad \text{(collection language model)} \end{cases}$$

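Smoothing & TF-IDF Weighting
• Plugging the general smoothing scheme into the query likelihood retrieval formula for a query $q = w_1 \ldots w_n$, we obtain
$$\log p(q|d) = \sum_{w_i \in d,\; w_i \in q} \log \frac{p_{DML}(w_i|d)}{\alpha_d \, p(w_i|C)} + n \log \alpha_d + \sum_{i=1}^{n} \log p(w_i|C)$$
  • First term: TF weighting (through $p_{DML}(w_i|d)$) and IDF weighting (through $1/p(w_i|C)$)
  • Second term: doc length normalization (a long doc is expected to have a smaller $\alpha_d$)
  • Third term: ignore for ranking (it does not depend on the document)
• Smoothing with p(w|C) ≈ TF-IDF + length norm.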

Derivation of the Query Likelihood Retrieval Formula
• General smoothing scheme: a discounted ML estimate for words seen in the doc, and a scaled reference language model for unseen words
• Retrieval formula using the general smoothing scheme
• Key rewriting step
• Similar rewritings are very common when using LMs for IR…
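The equations for this slide did not survive in the transcript. The following is a reconstruction of the standard derivation, under the assumption that the smoothed model is $p(w|d) = p_{DML}(w|d)$ for words seen in $d$ and $\alpha_d \, p(w|C)$ otherwise, with $C$ denoting the reference (collection) model and $q = w_1 \ldots w_n$. The notation is chosen here for illustration and may differ from the original slide.

```latex
\begin{align*}
\log p(q|d)
  &= \sum_{i=1}^{n} \log p(w_i|d) \\
  &= \sum_{i:\, w_i \in d} \log p_{DML}(w_i|d)
   + \sum_{i:\, w_i \notin d} \log \alpha_d \, p(w_i|C) \\
  % Key rewriting step: sum over unseen words = sum over all words - sum over seen words
  &= \sum_{i:\, w_i \in d} \log p_{DML}(w_i|d)
   + \sum_{i=1}^{n} \log \alpha_d \, p(w_i|C)
   - \sum_{i:\, w_i \in d} \log \alpha_d \, p(w_i|C) \\
  &= \sum_{i:\, w_i \in d} \log \frac{p_{DML}(w_i|d)}{\alpha_d \, p(w_i|C)}
   + n \log \alpha_d
   + \sum_{i=1}^{n} \log p(w_i|C)
\end{align*}
```

The key step replaces the sum over unseen query words by the sum over all query words minus the sum over seen ones, so that scoring only needs to touch the query words that actually occur in the document.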

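Three Smoothing Methods (Zhai & Lafferty 01)
• Simplified Jelinek-Mercer: Shrink uniformly toward p(w|C)
  $p(w|d) = (1-\lambda)\, p_{ML}(w|d) + \lambda\, p(w|C)$
• Dirichlet prior (Bayesian): Assume pseudo counts $\mu\, p(w|C)$
  $p(w|d) = \dfrac{c(w,d) + \mu\, p(w|C)}{|d| + \mu}$
• Absolute discounting: Subtract a constant $\delta$
  $p(w|d) = \dfrac{\max(c(w,d) - \delta,\, 0) + \delta\, |d|_u\, p(w|C)}{|d|}$, where $|d|_u$ is the number of unique words in $d$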
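The three methods can be sketched as small Python functions. This is an illustration rather than a reference implementation: the function names, the toy document and collection model, and the default values of `lam`, `mu`, and `delta` are all assumptions. The collection model `p_c` is assumed to cover the whole vocabulary, and `delta` is assumed to lie in [0, 1].

```python
from collections import Counter


def jelinek_mercer(w, doc, p_c, lam=0.1):
    """Shrink the ML estimate uniformly toward p(w|C) with weight lam."""
    return (1 - lam) * doc[w] / sum(doc.values()) + lam * p_c[w]


def dirichlet_prior(w, doc, p_c, mu=2000.0):
    """Add mu * p(w|C) pseudo counts to the observed counts."""
    return (doc[w] + mu * p_c[w]) / (sum(doc.values()) + mu)


def absolute_discounting(w, doc, p_c, delta=0.7):
    """Subtract delta from each seen count; give the freed mass to p(w|C)."""
    length = sum(doc.values())
    unique = len(doc)  # number of distinct words seen in the document
    return (max(doc[w] - delta, 0.0) + delta * unique * p_c[w]) / length


# Hypothetical toy data: a four-word vocabulary and a four-token document.
p_c = {"text": 0.4, "mining": 0.3, "data": 0.2, "food": 0.1}
doc = Counter("text mining text data".split())

for method in (jelinek_mercer, dirichlet_prior, absolute_discounting):
    probs = {w: method(w, doc, p_c) for w in p_c}
    print(method.__name__, probs)
```

In all three cases the unseen word ("food" here) receives a non-zero probability proportional to p(w|C), and the probabilities over the vocabulary still sum to one.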

Comparison of Three Methods

The Need of Query-Modeling (Dual-Role of Smoothing)
[Figure panels: keyword queries; verbose queries.]
Why does query type affect smoothing sensitivity?

Another Reason for Smoothing
Query = “the algorithms for data mining”
                    the      algorithms   for      data       mining
pDML(w|d1):         0.04     0.001        0.02     0.002      0.003
pDML(w|d2):         0.02     0.001        0.01     0.003      0.004
• Content words: p(“algorithms”|d1) = p(“algorithms”|d2), p(“data”|d1) < p(“data”|d2), p(“mining”|d1) < p(“mining”|d2)
• Intuitively, d2 should have a higher score, but p(q|d1) > p(q|d2)…
• So we should make p(“the”) and p(“for”) less different for all docs, and smoothing helps achieve this goal…
After smoothing with the reference model:
                    the      algorithms   for      data       mining
P(w|REF):           0.2      0.00001      0.2      0.00001    0.00001
Smoothed p(w|d1):   0.184    0.000109     0.182    0.000209   0.000309
Smoothed p(w|d2):   0.182    0.000109     0.181    0.000309   0.000409
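The numbers on this slide can be checked directly. The sketch below uses the probabilities from the slide; the interpolation weights (0.1 on the document model, 0.9 on the reference model) are not stated in the transcript and are inferred here because they reproduce the listed smoothed values.

```python
import math

QUERY = ["the", "algorithms", "for", "data", "mining"]
P_REF = [0.2, 0.00001, 0.2, 0.00001, 0.00001]
P_DML_D1 = [0.04, 0.001, 0.02, 0.002, 0.003]
P_DML_D2 = [0.02, 0.001, 0.01, 0.003, 0.004]


def interpolate(p_doc, p_ref, doc_weight=0.1):
    """Mix document and reference probabilities word by word.

    doc_weight=0.1 is inferred from the slide's numbers, not stated there.
    """
    return [doc_weight * d + (1 - doc_weight) * r for d, r in zip(p_doc, p_ref)]


def likelihood(probs):
    """Query likelihood as the product of per-word probabilities."""
    return math.prod(probs)


smoothed_d1 = interpolate(P_DML_D1, P_REF)
smoothed_d2 = interpolate(P_DML_D2, P_REF)

print("unsmoothed:", likelihood(P_DML_D1), likelihood(P_DML_D2))
print("smoothed:  ", likelihood(smoothed_d1), likelihood(smoothed_d2))
```

Before smoothing, d1 wins only because it contains more occurrences of “the” and “for”. After smoothing, the probabilities of these common words are nearly identical across documents, so the content words decide the ranking and d2 scores higher.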

Two-stage Smoothing
• Stage-1: explain unseen words, using a Dirichlet prior (Bayesian) with parameter $\mu$
• Stage-2: explain noise in the query, using a 2-component mixture with parameter $\lambda$
$$P(w|d) = (1-\lambda)\,\frac{c(w,d) + \mu\, p(w|C)}{|d| + \mu} + \lambda\, p(w|U)$$
• $p(w|U)$: user background model
• $\lambda$ and $\mu$ can be automatically set through statistical estimation
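The two-stage formula translates directly into code. This is a minimal sketch: the function name, the toy data, and the values of `mu` and `lam` are assumptions, and the example sets the user background model p(w|U) equal to the collection model purely for illustration. The statistical estimation of the two parameters mentioned on the slide is not shown.

```python
from collections import Counter


def two_stage(w, doc, p_c, p_u, mu=2000.0, lam=0.3):
    """Two-stage smoothed p(w|d).

    Stage 1: Dirichlet prior smoothing with the collection model p_c.
    Stage 2: mixture with the user background model p_u, weight lam.
    """
    length = sum(doc.values())
    dirichlet = (doc[w] + mu * p_c[w]) / (length + mu)
    return (1 - lam) * dirichlet + lam * p_u[w]


# Hypothetical toy data: a four-word vocabulary and a four-token document.
p_c = {"text": 0.4, "mining": 0.3, "data": 0.2, "food": 0.1}
p_u = dict(p_c)  # assumption: user background model equals the collection model
doc = Counter("text mining text data".split())

probs = {w: two_stage(w, doc, p_c, p_u, mu=10.0, lam=0.3) for w in p_c}
print(probs)
```

Setting `lam=0` recovers plain Dirichlet prior smoothing, which makes the role of each stage easy to inspect separately.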

What You Should Know
• The basic idea of ranking docs by query likelihood (“the language modeling approach”)
• How smoothing is connected with TF-IDF weighting and document length normalization
• The basic idea of two-stage smoothing
