
Language Modeling Again


Presentation Transcript


  1. Language Modeling Again So are we smooth now? Courtesy of Chris Jordan

  2. So what did we talk about last week? • Language models represent documents as multinomial distributions • What is a multinomial? • The Maximum Likelihood Estimate calculates the document model • What is the Maximum Likelihood Estimate? • Smoothing document models
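
As a concrete anchor for the MLE question above, here is a minimal Python sketch of the unigram estimate p(t|d) = count(t, d) / |d|; the function name and the toy document are illustrative only.

```python
from collections import Counter

def mle_document_model(tokens):
    """Maximum Likelihood Estimate of a unigram document model:
    p(t|d) = count(t, d) / |d|. Terms not in the document get probability 0."""
    counts = Counter(tokens)
    doc_len = sum(counts.values())
    return {term: c / doc_len for term, c in counts.items()}

doc = "to be or not to be".split()
print(mle_document_model(doc))  # {'to': 0.333..., 'be': 0.333..., 'or': 0.166..., 'not': 0.166...}
```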

  3. Why is smoothing so important? • Maximum Likelihood Estimate gives 0 probabilities • Why is that an issue? • What does smoothing do? • What types of smoothing are there?

  4. Challenge questions • What is common in every smoothing technique that we have covered? • What does smoothing really do? • Does it make for a more accurate document model? • Does it replace the need for more data?

  5. A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval • Thoughts? • What is Additive? • What is Interpolation? • What is Backoff?

  6. Laplace / Additive Smoothing • Just increasing raw term frequencies • Is that representative of the document model? • How hard is this to implement? • What happens if the constant added is really large?
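
A minimal sketch of additive/Laplace smoothing as discussed above; the function name and the default delta are illustrative. It also shows the last question on the slide: a very large delta drowns out the counts and pushes every document model toward the uniform distribution over the vocabulary.

```python
from collections import Counter

def additive_smoothed_model(tokens, vocabulary, delta=1.0):
    """Additive smoothing (Laplace when delta = 1):
    p(t|d) = (count(t, d) + delta) / (|d| + delta * |V|).
    A huge delta makes every term's probability approach 1 / |V|."""
    counts = Counter(tokens)
    doc_len = sum(counts.values())
    denom = doc_len + delta * len(vocabulary)
    return {t: (counts[t] + delta) / denom for t in vocabulary}
```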

  7. Interpolation • Jelinek Mercer: ps(t) = λp(t|d) + (1-λ)p(t|corpus) • Dirichlet • Anyone know what this is? • Remember Gaussian? Poisson? Beta? Gamma? • Distributions for Binomials • Distribution for Multinomials
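
A one-line sketch of the Jelinek-Mercer interpolation formula above; lam stands for the weight λ, and the example values are made up.

```python
def jelinek_mercer(p_doc, p_corpus, lam=0.5):
    """Jelinek-Mercer interpolation for a single term:
    ps(t) = lam * p(t|d) + (1 - lam) * p(t|corpus)."""
    return lam * p_doc + (1.0 - lam) * p_corpus

# A term unseen in the document still gets (1 - lam) * p(t|corpus) > 0.
print(jelinek_mercer(p_doc=0.0, p_corpus=0.001, lam=0.7))  # 0.0003
```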

  8. Dirichlet / Absolute Discounting • What does Absolute Discounting do? • How is it different from Laplace? Jelinek Mercer? • What is the key difference between the smoothing parameter in Jelinek Mercer and the one in Dirichlet and Absolute Discounting? • The parameter determines how much probability mass is subtracted from seen terms and added to unseen ones
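
Sketches of both estimators on this slide, using the standard formulas; the parameter defaults mu and delta are illustrative, not values from the slides. Note how, unlike Jelinek Mercer, the effective amount of smoothing here depends on the document itself (its length and number of unique terms).

```python
def dirichlet_smoothed(count_t_d, doc_len, p_corpus, mu=2000.0):
    """Dirichlet prior smoothing:
    ps(t|d) = (count(t, d) + mu * p(t|corpus)) / (|d| + mu)."""
    return (count_t_d + mu * p_corpus) / (doc_len + mu)

def absolute_discounted(count_t_d, doc_len, unique_terms, p_corpus, delta=0.7):
    """Absolute discounting: subtract a constant delta from every seen count
    and give the freed mass (delta * unique_terms / |d|) to the corpus model."""
    seen = max(count_t_d - delta, 0.0) / doc_len
    alpha = delta * unique_terms / doc_len
    return seen + alpha * p_corpus
```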

  9. Back off • What is the idea here? • Do not pad the probability of seen terms • Any idea why this doesn't work? • The seen terms have their probabilities decreased • Too much smoothing?
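
One way to sketch the backoff idea in code, assuming raw document counts and an already-smoothed corpus model; the discount value and function name are illustrative. Seen terms are only discounted, never mixed with the corpus model; unseen terms back off to the corpus model.

```python
def backoff_model(counts, doc_len, corpus_model, discount=0.1):
    """Backoff sketch: seen terms keep a discounted document estimate;
    only unseen terms fall back to the corpus model, rescaled so the
    whole distribution still sums to 1."""
    model = {t: (1.0 - discount) * c / doc_len for t, c in counts.items()}
    unseen_mass = sum(p for t, p in corpus_model.items() if t not in counts)
    scale = discount / unseen_mass if unseen_mass > 0 else 0.0
    for t, p in corpus_model.items():
        if t not in counts:
            model[t] = scale * p
    return model
```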

  10. Pause… Review • Why do we smooth? • Does smoothing make sense? • What is Laplace? • What is Jelinek Mercer? • What is Dirichlet smoothing? • What is Absolute Discounting? • What is Back off?

  11. Let’s beat this horse some more! • Does everyone know what mean average precision is? • Let’s have a look at the results • Are these really improvements? • What does an increase of .05 precision really mean? • Will that matter to the user?
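
For reference, a small Python sketch of (mean) average precision; the document ids and the toy example are made up.

```python
def average_precision(ranking, relevant):
    """Average precision for one query: mean of precision@k over the ranks k
    at which a relevant document appears, divided by the number of relevant docs."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a set of queries; `runs` is a list of (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

print(average_precision(["d3", "d1", "d7"], {"d1", "d7"}))  # (1/2 + 2/3) / 2 ≈ 0.583
```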

  12. And now we come full circle • What is a real performance improvement? • Cranfield paradigm evaluation • Corpus • Queries • Qrels • User trials • Satisfaction • Effectiveness • Efficiency

  13. Cluster Based Smoothing • What will clustering give us? • Cluster the corpus • Find clusters for each document • Mixture model now involves • Document model • Cluster model • Corpus model • Some performance gains • Significant but not so special
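
One plausible way to write the three-component mixture described above; the linear form and the weights are assumptions for illustration, not the exact model from the paper.

```python
def cluster_smoothed(p_doc, p_cluster, p_corpus, lam=0.6, beta=0.3):
    """Cluster-based smoothing sketch:
    p(t) = lam * p(t|d) + beta * p(t|cluster(d)) + (1 - lam - beta) * p(t|corpus)."""
    return lam * p_doc + beta * p_cluster + (1.0 - lam - beta) * p_corpus
```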

  14. Relevance Modeling • Blind Relevance Feedback approach • Top documents in the result set used as feedback • A language model is constructed from these top ranked documents for each query • This model is used as the relevance model for probabilistic retrieval
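
A simplified sketch of building a feedback language model from the top-ranked documents. A full relevance model would also weight each document by its query likelihood; this pooled version omits that and just applies MLE to the pooled text.

```python
from collections import Counter

def pooled_relevance_model(top_documents):
    """Blind relevance feedback sketch: pool the tokens of the top-ranked
    documents for a query and estimate a single unigram model from the pool."""
    pooled = Counter()
    for doc_tokens in top_documents:
        pooled.update(doc_tokens)
    total = sum(pooled.values())
    return {t: c / total for t, c in pooled.items()}
```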

  15. On the topic of Blind Relevance Feedback • How can we use Relative Entropy here? • Derive a model that minimizes the relative entropy between the documents in the top rank • Does Relevance Modeling make sense? • Does using Relative Entropy make sense?
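
A short sketch of relative entropy (KL divergence) between two term distributions, assuming both have already been smoothed so the ratio is defined.

```python
import math

def relative_entropy(p, q):
    """KL divergence D(p || q), assuming q[t] > 0 wherever p[t] > 0."""
    return sum(p_t * math.log(p_t / q[t]) for t, p_t in p.items() if p_t > 0)
```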

  16. The big assumption • Top ranked documents are a good source of relevant text • This obviously is not always true • There is a lot of noise • Are the top ranked documents representative of the relevant set? • Relevance modeling and Relative Entropy BRF approaches have been shown to improve performance • But not really…

  17. Review • What is average precision? • What is the Cranfield paradigm? • What alternative sources can be used for smoothing? • Does Blind Relevance Feedback make sense? • Why does it work?

  18. You have been a good class • We have covered • Language Modeling for ad-hoc document retrieval • Unigram model • Maximum Likelihood Estimate • Smoothing Techniques • Different mixture models • Blind Relevance Feedback for Language Modeling

  19. Questions for me?

  20. Questions for you • Why do we work with the unigram model? • Why is smoothing important? • How does a language model represent a document? • What is interpolation?

  21. Let’s talk more about me

  22. Another application of language modeling • Unsupervised Morphological Analysis • A morpheme is a basic unit of meaning in a language, e.g. pretested: pre-test-ed • English is a relatively easy language • Turkish, Finnish, German are agglutinative • Very hard

  23. Morfessor • All terms in the vocabulary are candidate morphemes • Terms are recursively split • Build up the candidate morpheme set • Repeatedly analyze the whole vocabulary until the candidate morpheme set can no longer be improved

  24. Swordfish • Ngram based unsupervised morpheme analyzer • Character Ngrams • Substrings • A language model is constructed over all ngrams of all lengths • Maximum Likelihood Estimate • Terms are recursively split based on the likelihood of the ngrams
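
A rough sketch of the character-ngram language model behind this idea; the length cutoff max_n and the way the model would be used to score split points are assumptions about how such an analyzer might be set up, not the actual Swordfish code.

```python
from collections import Counter

def char_ngram_model(vocabulary, max_n=5):
    """MLE over character ngrams (substrings) of all terms in the vocabulary,
    up to max_n characters long; the relative frequency of each substring can
    then be used to score candidate split points inside a term."""
    counts = Counter()
    for term in vocabulary:
        for n in range(1, max_n + 1):
            for i in range(len(term) - n + 1):
                counts[term[i:i + n]] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}
```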

  25. Swordfish Results • Reasonable Results • Character ngrams are useful in finding morphemes • All morphemes are ngrams but not all ngrams are morphemes • The most prominent ngrams appear to be morphemes • How one defines prominent is an open question • Check out the PASCAL Morpho-Challenge
