
Statistical Translation and Web Search Ranking


Presentation Transcript


  1. Statistical Translation and Web Search Ranking Jianfeng Gao Natural language processing, MSR July 22, 2011

  2. Who should be here? • Interested in statistical machine translation and Web search ranking • Interested in modeling technologies • Looking for topics for your master's/PhD thesis • A difficult topic: very hard to beat a simple baseline • An easy topic: others cannot beat it either

  3. Outline • Probability • Statistical Machine Translation (SMT) • SMT for Web search ranking

  4. Probability (1/2) • Probability space: • Cannot say • Joint probability: • Probability that x and y are both true • Conditional probability: • Probability that y is true when we already know x is true • Independence: • x and y are independent
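
A minimal sketch of the standard definitions these bullets refer to, with x and y as generic events (my notation, not necessarily the slide's exact formulas):

% Probability space: probabilities lie in [0, 1] and sum to 1 over the space
0 \le P(x) \le 1, \qquad \sum_{x} P(x) = 1

% Conditional probability: probability of y when x is already known to be true
P(y \mid x) = \frac{P(x, y)}{P(x)}

% Independence: x and y are independent iff the joint probability factorizes
P(x, y) = P(x)\, P(y)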

  5. Probability (2/2) • : assumptions on which the probabilities are based • Product rule –from the def of conditional probability • Sum rule – a rewrite of the marginal probability def • Bayes rule – from the product rule
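
In the same notation, with H as a placeholder symbol for the background assumptions mentioned in the first bullet, the three rules read:

% Product rule – from the definition of conditional probability
P(x, y \mid H) = P(y \mid x, H)\, P(x \mid H)

% Sum rule – a rewrite of the marginal probability definition
P(x \mid H) = \sum_{y} P(x, y \mid H)

% Bayes rule – from the product rule
P(y \mid x, H) = \frac{P(x \mid y, H)\, P(y \mid H)}{P(x \mid H)}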

  6. An example: Statistical Language Modeling

  7. Statistical Language Modeling (SLM) • Model form • capture language structure via a probabilistic model • Model parameters • estimation of free parameters using training data

  8. Model Form • How to incorporate language structure into a probabilistic model • Task: next word prediction • Fill in the blank: “The dog of our neighbor ___” • Starting point: word n-gram model • Very simple, yet surprisingly effective • Words are generated from left-to-right • Assumes no other structure than words themselves

  9. Word N-gram Model • Word based model • Using chain rule on its history (=preceding words)
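
A sketch of the chain-rule decomposition the slide refers to, writing a sentence as w_1 … w_n:

P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})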

  10. Word N-gram Model • How do we get probability estimates? • Get text and count! • Problem of using the whole history • Rare events: unreliable probability estimates • Assuming a vocabulary of 20,000 words, the number of possible histories is astronomical (see the parameter counts in Manning and Schütze 1999: 194)

  11. Word N-gram Model • Markov independence assumption • A word depends only on N-1 preceding words • N=3 → word trigram model • Reduce the number of parameters in the model • By forming equivalence classes • Word trigram model • ...
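
With the Markov assumption and N = 3, the decomposition above reduces to the trigram approximation (same notation as before):

P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})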

  12. Model Parameters • Bayesian estimation paradigm • Maximum likelihood estimation (MLE) • Smoothing in N-gram language models

  13. Bayesian Paradigm • P(θ|D) – Posterior probability • P(D|θ) – Likelihood • P(θ) – Prior probability • P(D) – Marginal probability • Likelihood versus probability • For fixed θ, P(D|θ) defines a probability over D; • for fixed D, P(D|θ) defines the likelihood of θ • Never say “the likelihood of the data” • Always say “the likelihood of the parameters given the data”
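
Writing the model parameters as θ and the data as D (the notation used in the next slide), Bayes rule ties the four terms together:

P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
% posterior = likelihood × prior / marginal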

  14. Maximum Likelihood Estimation (MLE) • θ: model parameters; D: data • Assume a uniform prior P(θ) • P(D) is independent of θ and is dropped, leaving P(D|θ), the likelihood of parameter θ • Key difference between MLE and Bayesian estimation • MLE assumes that θ is fixed but unknown • Bayesian estimation assumes that θ itself is a random variable with a prior distribution
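
A sketch of the derivation these bullets outline, under the uniform-prior assumption:

\theta^{*} = \arg\max_{\theta} P(\theta \mid D)
           = \arg\max_{\theta} \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
           = \arg\max_{\theta} P(D \mid \theta)
% Under a uniform prior, P(θ) is a constant, and P(D) does not depend on θ, so both drop out.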

  15. MLE for Trigram LM • It is easy – let us get some real text and start to count • But, why is this the MLE solution?
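
A minimal Python sketch of “get some real text and start to count” for a trigram LM; the corpus file name and the whitespace tokenization are placeholder assumptions, and there is no smoothing yet:

from collections import defaultdict

# Count trigrams and their bigram histories from a whitespace-tokenized corpus.
trigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)

with open("corpus.txt", encoding="utf-8") as f:  # placeholder corpus file
    for line in f:
        words = ["<s>", "<s>"] + line.split() + ["</s>"]
        for i in range(2, len(words)):
            history = (words[i - 2], words[i - 1])
            trigram_counts[history + (words[i],)] += 1
            bigram_counts[history] += 1

def p_mle(w1, w2, w3):
    """MLE trigram probability: count(w1 w2 w3) / count(w1 w2)."""
    h = bigram_counts.get((w1, w2), 0)
    return trigram_counts.get((w1, w2, w3), 0) / h if h > 0 else 0.0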

  16. Derivation of MLE for N-gram • Homework – an interview question at MSR • Hints • This is a constrained optimization problem • Use log likelihood as the objective function • Assume a multinomial distribution for the LM • Introduce a Lagrange multiplier for the constraints
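
As a hint only (not the full answer), the setup for one history h = (w_{i-2}, w_{i-1}) with training counts c(·) looks like:

\max_{P(\cdot \mid h)} \sum_{w} c(h, w) \log P(w \mid h)
\quad \text{subject to} \quad \sum_{w} P(w \mid h) = 1
% Lagrangian: L = \sum_{w} c(h, w) \log P(w \mid h) + \lambda \big(1 - \sum_{w} P(w \mid h)\big)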

  17. Sparse Data Problem • Say our vocabulary size is |V| • There are |V|³ parameters in the trigram LM • |V| = 20,000 → 20,000³ = 8 × 10¹² parameters • Most trigrams have a zero count even in a large text corpus • oops…

  18. Smoothing: Adding One • Add-one smoothing (from the Bayesian paradigm) • But it works very badly – do not use this • Add-delta smoothing • Still very bad – do not use this
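
The standard forms of these two estimators, written here for trigrams with vocabulary size |V| (a sketch, not necessarily the slide's exact formulas):

% Add-one (Laplace) smoothing
P(w_i \mid w_{i-2}, w_{i-1}) = \frac{c(w_{i-2} w_{i-1} w_i) + 1}{c(w_{i-2} w_{i-1}) + |V|}

% Add-delta smoothing, with 0 < \delta < 1
P(w_i \mid w_{i-2}, w_{i-1}) = \frac{c(w_{i-2} w_{i-1} w_i) + \delta}{c(w_{i-2} w_{i-1}) + \delta |V|}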

  19. Smoothing: Backoff • Backoff trigram to bigram, bigram to unigram • D ∈ (0,1) is a discount constant – absolute discounting • α is calculated so probabilities sum to 1 (homework) • Simple and effective – use this one!
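
One standard way to write the backoff scheme with absolute discounting described above (the exact form of α is left as the homework):

P(w_i \mid w_{i-2}, w_{i-1}) =
\begin{cases}
\dfrac{c(w_{i-2} w_{i-1} w_i) - D}{c(w_{i-2} w_{i-1})} & \text{if } c(w_{i-2} w_{i-1} w_i) > 0 \\[1ex]
\alpha(w_{i-2}, w_{i-1})\, P(w_i \mid w_{i-1}) & \text{otherwise}
\end{cases}
% D ∈ (0, 1) is the absolute discount; α(w_{i-2}, w_{i-1}) redistributes the discounted mass
% so that the conditional probabilities sum to 1.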

  20. Outline • Probability • SMT and translation models • SMT for web search ranking

  21. SMT C: 救援人员在倒塌的房屋里寻找生还者 E: Rescue workers search for survivors in collapsed houses

  22. Translation process (generative story) • C is broken into translation units • Each unit is translated into English • Glue translated units to form E • Translation models • Word-based models • Phrase-based models • Syntax-based models

  23. Generative Modeling • Art → Story • Science → Math • Engineering → Code

  24. Generative Modeling for SMT • Story making • how a target sentence is generated from a source sentence step by step • Mathematical formulation • modeling each generation step in the generative story using a probability distribution • Parameter estimation • implementing an effective way of estimating the probability distributions from training data

  25. Word-Based Models: IBM Model 1 • We first choose the length of the target sentence, according to a length distribution. • Then, for each position in the target sentence, we choose a position in the source sentence from which to generate that target word, according to an alignment distribution. • Finally, we generate the target word by translating the chosen source word, according to a translation distribution.

  26. Mathematical Formulation • Assume that the choice of the target length is independent of the source sentence • Assume that all positions in the source sentence are equally likely to be chosen • Assume that each target word is generated independently from the source word it is aligned to
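
Putting the three assumptions together gives the familiar IBM Model 1 form (a sketch in my notation: source sentence C = c_1 … c_I plus a NULL word c_0, target sentence E = e_1 … e_J, uniform length probability ε):

P(E \mid C) = \frac{\epsilon}{(I + 1)^{J}} \prod_{j=1}^{J} \sum_{i=0}^{I} t(e_j \mid c_i)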

  27. Parameter Estimation • Model Form • MLE on word-aligned training data • Don’t forget smoothing
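
On word-aligned data, the MLE for the translation table is just relative frequency over aligned word pairs (a sketch; as the slide says, smoothing is still needed):

t(e \mid c) = \frac{\operatorname{count}(c, e)}{\sum_{e'} \operatorname{count}(c, e')}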

  28. Phrase-Based Models

  29. Mathematical Formulation • Assume a uniform probability over segmentations • Use the maximum approximation to the sum • Assume each phrase is translated independently, and use a distance-based reordering model
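
Under these assumptions, one common way to write the phrase-based model (a sketch, not necessarily the slide's exact notation): the source C is segmented into phrases c̄_1 … c̄_K, each phrase is translated independently, and reordering is scored by a distance-based distortion term d(·):

P(E \mid C) \approx \max_{\bar{c}_1 \ldots \bar{c}_K,\; \bar{e}_1 \ldots \bar{e}_K}\; \prod_{k=1}^{K} P(\bar{e}_k \mid \bar{c}_k)\; d(\mathrm{start}_k - \mathrm{end}_{k-1} - 1)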

  30. Parameter Estimation • MLE • Don't forget smoothing

  31. Syntax-Based Models

  32. Story • Parse an input Chinese sentence into a parse tree • Translate each Chinese constituent into English • VP → (PP 寻找 NP, search for NP PP) • Glue these English constituents into a well-formed English sentence.

  33. Other Two Tasks? • Mathematical formulation • Based on synchronous context-free grammar (SCFG) • Parameter estimation • Learning the SCFG from data • Homework • Let us go through an example (thanks to Michel Galley) • Hierarchical phrase model • Linguistically syntax-based models

  34. Word-aligned sentence pair: 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 ↔ rescue workers search for survivors in collapsed houses; extracted phrase pair: 倒塌的 房屋 ↔ collapsed houses

  35. Same word-aligned sentence pair; extracted phrase pair: 在 倒塌 的 房屋 里 寻找 生还者 ↔ search for survivors in collapsed houses


  37. A synchronous rule: 在 X₁ 里 寻找 X₂ ↔ search for X₂ in X₁ • Phrase-based translation unit • Discontinuous translation unit • Control on reordering

  38. A synchronous grammar • Rules: X → ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩; X → ⟨倒塌的 房屋, collapsed houses⟩; X → ⟨生还者, survivors⟩ • Context-free derivation: ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩ ⇒ ⟨在 倒塌的 房屋 里 寻找 X₂, search for X₂ in collapsed houses⟩ ⇒ ⟨在 倒塌的 房屋 里 寻找 生还者, search for survivors in collapsed houses⟩

  39. A synchronous grammar • Same rules as above • Recognizes: • search for survivors in collapsed houses • search for collapsed houses in survivors • search for survivors collapsed houses in

  40.–42. [Figures: the word-aligned sentence pair 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 (literal gloss: rescue staff / in / collapse / of / house / in / search / survivors) ↔ “Rescue workers search for survivors in collapsed houses.”, shown with its syntactic parse (POS tags and constituents PP, NP, VP, S)]

  43. [Figure: from the aligned parse, extract the translation rule VP → ⟨PP 寻找 NP, search for NP PP⟩]

  44. [Figure: the same rule written as an SCFG rule with co-indexed nonterminals: VP-234 → ⟨PP-32 寻找 NP-57, search for NP-57 PP-32⟩]

  45. [Figure: the full word-aligned parse again]

  46. Outline • Probability • SMT and translation models • SMT for web search ranking

  47. Web Documents and Search Queries • cold home remedy • cold remeedy • flu treatment • how to deal with stuffy nose?

  48. Map Queries to Documents • Fuzzy keyword matching • Q: cold home remedy • D: best home remedies for cold and flu • Spelling correction • Q: cold remeedies • D: best home remedies for cold and flu • Query alteration • Q: flu treatment • D: best home remedies for cold and flu • Query/document rewriting • Q: how to deal with stuffy nose • D: best home remedies for cold and flu • Where are we now?

  49. Research Agenda (Gao et al. 2010, 2011) • Model documents and queries as different languages (Gao et al., 2010) • Cast mapping queries to documents as bridging the language gap via translation • Leverage statistical machine translation (SMT) technologies and infrastructures to improve search relevance
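
One standard way to instantiate this idea is a word-based translation model for ranking, in the spirit of Gao et al. (2010) (a sketch; the exact model in the talk may differ): score document D for query Q by

P(Q \mid D) = \prod_{q \in Q} \sum_{w \in D} P_{t}(q \mid w)\, P(w \mid D)
% P_t(q | w): probability of “translating” a document word w into query term q;
% P(w | D): a smoothed document language model.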
