Machine Translation Domain Adaptation

Presentation Transcript


  1. Machine Translation Domain Adaptation • Day 19

  2. Project #2

  3. MEMM tools • Online description of project #2 has been updated with more information

  4. Quick walk through
     training.txt:
       I/PRP left/VBD ./.
       John/NNP arrived/VBD ./.

  5. Quick walk through • You write code to convert this to features! “featurize.pl training.txt training.feats”
     training.feats:
       PRP w0=I:1 w-1=<s>:1
       VBD w0=left:1 w-1=I:1
       . w0=.:1 w-1=left:1
       NNP w0=John:1 w-1=<s>:1
       VBD w0=arrived:1 w-1=John:1
       . w0=.:1 w-1=arrived:1
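
The featurization step above can be sketched in a few lines. This is only an illustrative Python version, not the course's featurize.pl: the function name `featurize` is hypothetical, and it assumes training.txt holds one sentence per line with tokens written as word/TAG.

```python
def featurize(lines):
    """Turn word/TAG sentences into 'TAG w0=word:1 w-1=prev:1' feature rows.

    Assumption: each input line is one sentence of whitespace-separated
    word/TAG tokens, as in the training.txt shown on the slide.
    """
    feats = []
    for line in lines:
        prev = "<s>"  # sentence-start marker for the first word's w-1 feature
        for token in line.split():
            # Split on the last slash so that a token like './.' works too.
            word, tag = token.rsplit("/", 1)
            feats.append(f"{tag} w0={word}:1 w-1={prev}:1")
            prev = word
    return feats


training = ["I/PRP left/VBD ./.", "John/NNP arrived/VBD ./."]
for row in featurize(training):
    print(row)
```

Running the same function on test.txt gives features built in exactly the same way, which is the point of reusing one featurization script for training and test data.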

  6. Quick walk through • Run memm_train to train this model “memm_train --input training.feats --classifier trigram.model --markovOrder 2”
     trigram.model: <binary gobbledegook>

  7. Quick walk through • Get some unseen test data…
     test.txt:
       he/PRP arrived/VBD ./.
       John/NNP left/VBD ./.

  8. Quick walk through • Use the same featurization code on test data “featurize.pl test.txt test.feats”
     test.feats:
       PRP w0=he:1 w-1=<s>:1
       VBD w0=arrived:1 w-1=he:1
       . w0=.:1 w-1=arrived:1
       NNP w0=John:1 w-1=<s>:1
       VBD w0=left:1 w-1=John:1
       . w0=.:1 w-1=left:1

  9. Quick walk through • memm_test predicts tags (memm_test ignores first column; can include true tags) “memm_test --input test.feats --classifier trigram.model --markovOrder 2 --output test.tags”
     test.tags: PRP VBD . NNP VBD .
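
Because the first column of test.feats can hold the true tags, the predictions can be scored against it. The sketch below is an assumption-laden illustration: `tag_accuracy` is a hypothetical helper, and it assumes the predicted tags have already been read from test.tags into a flat list, one tag per token, in the same order as the rows of test.feats.

```python
def tag_accuracy(feat_lines, predicted_tags):
    """Fraction of tokens whose predicted tag equals the true tag.

    Assumption: the first whitespace-separated field of each feature row
    is the true tag, and predicted_tags lists one tag per row, in order.
    """
    gold = [line.split()[0] for line in feat_lines if line.strip()]
    if not gold or len(gold) != len(predicted_tags):
        raise ValueError("need the same non-zero number of gold and predicted tags")
    correct = sum(g == p for g, p in zip(gold, predicted_tags))
    return correct / len(gold)


test_feats = [
    "PRP w0=he:1 w-1=<s>:1",
    "VBD w0=arrived:1 w-1=he:1",
    ". w0=.:1 w-1=arrived:1",
    "NNP w0=John:1 w-1=<s>:1",
    "VBD w0=left:1 w-1=John:1",
    ". w0=.:1 w-1=left:1",
]
predicted = "PRP VBD . NNP VBD .".split()
print(tag_accuracy(test_feats, predicted))
```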

  10. MEMM features • You provide these features… …and add the argument “--markovOrder 2” • The MEMM adds in features about tag context at training and test time
     training.feats (the features you provide):
       PRP w0=I:1 w-1=<s>:1
       VBD w0=left:1 w-1=I:1
       . w0=.:1 w-1=left:1
       NNP w0=John:1 w-1=<s>:1
       VBD w0=arrived:1 w-1=John:1
       . w0=.:1 w-1=arrived:1
     Actual features used by MEMM:
       PRP w0=I:1 w-1=<s>:1 t[-1]=<s>:1 t[-1]=<s>,t[-2]=<s>:1
       VBD w0=left:1 w-1=I:1 t[-1]=PRP:1 t[-1]=PRP,t[-2]=<s>:1
       . w0=.:1 w-1=left:1 t[-1]=VBD:1 t[-1]=VBD,t[-2]=PRP:1
       <s> t[-1]=.:1 t[-1]=.,t[-2]=VBD:1
       NNP w0=John:1 w-1=<s>:1 t[-1]=<s>:1 t[-1]=<s>,t[-2]=<s>:1
       VBD w0=arrived:1 w-1=John:1 t[-1]=NNP:1 t[-1]=NNP,t[-2]=<s>:1
       . w0=.:1 w-1=arrived:1 t[-1]=VBD:1 t[-1]=VBD,t[-2]=NNP:1
       <s> t[-1]=.:1 t[-1]=.,t[-2]=VBD:1
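
To make the tag-context features concrete, here is a rough Python sketch of how rows like the ones on this slide could be produced for one sentence. It is not the memm_train implementation: `add_tag_context` is a hypothetical name, it uses the true tags as history (as at training time), and it leaves out the extra end-of-sentence `<s>` row shown on the slide.

```python
def add_tag_context(sentence_feats, order=2):
    """Append t[-1], (t[-1],t[-2]), ... features to each token of a sentence.

    sentence_feats: list of (tag, [feature strings]) pairs for one sentence.
    order: Markov order, mirroring the --markovOrder argument on the slide.
    """
    history = ["<s>"] * order  # padding tags before the sentence starts
    out = []
    for tag, feats in sentence_feats:
        extra = []
        for k in range(1, order + 1):
            # k=1 gives 't[-1]=X', k=2 gives 't[-1]=X,t[-2]=Y', and so on.
            ctx = ",".join(f"t[-{i}]={history[-i]}" for i in range(1, k + 1))
            extra.append(f"{ctx}:1")
        out.append((tag, feats + extra))
        history.append(tag)  # this token's tag becomes context for the next
    return out


sentence = [
    ("PRP", ["w0=I:1", "w-1=<s>:1"]),
    ("VBD", ["w0=left:1", "w-1=I:1"]),
    (".", ["w0=.:1", "w-1=left:1"]),
]
for tag, feats in add_tag_context(sentence):
    print(tag, " ".join(feats))
```

At test time the true tags are unknown, so a tagger would have to fill the history with its own predicted tags instead; this sketch does not cover that search.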

  11. Machine Translation

  12. Acknowledgments • Many thanks to (for helpful content and input on content): • Chris Callison-Burch, Matt Post, & Adam Lopez (JHU) • Philipp Koehn & Barry Haddow (U Edinburgh) • Kevin Knight (ISI)

  13. Translation: global problem and interesting research problem • Non-English Internet content and user communities are increasing explosively • Human translation costs are excessive: major languages range from 10-50 cents per word • Result: the vast majority of published material remains untranslated!

  14. Prevalence of MT on the Web (from Rarrick et al., 2010)

  15. The Goal: (sentence) translation • 滴水之恩當以涌泉相報 • A drop of water shall be returned with a burst of spring. • Translate source sentences into target sentences • For now, ignore discourse structure, co-reference, and phenomena across sentence boundaries

  16. Types of MT systems Modified Vauquois pyramid • Source of information • Rule based: People write rules to specify translations of words, phrases • Data-driven: Use learning techniques to derive translation “rules” from data sources (e.g., parallel corpora) • Level of representation

  17. Advantages of data-driven translation • We can model the genres of documents that we would like to model • Learn contextually appropriate translations for technical data, chat data, etc. • Very flexible system • Given corpus C= ({x1,y1}, {x2,y2}, …) of sentence pairs • Translate(C, x) = y is a function of the training data and the input sentence • To build a new system (or optimize our old one) we just change the data • But…we need oodles of data to get “good” models

  18. Statistical MT • Learn word and phrase alignments from “parallel” data

  19. Statistical MT • Learn word and phrase alignments from “parallel” data • Parallel data? • Parallel documents?

  20. Statistical MT • Learn word and phrase alignments from “parallel” data • Parallel documents?

  23. Statistical MT • Learn word and phrase alignments from “parallel” data • Start with parallel documents • Need parallel sentences • Sentence break and sentence align • Word align and produce word and phrase translation tables (our translation models)

  24. Some Hmong

  25. Some More Hmong

  26. Even More Hmong

  27. Statistical MT • Learn word and phrase alignments from “parallel” data • Start with parallel documents • Need parallel sentences • Sentence break and sentence align • Word align and produce word and phrase translation tables (our translation models)

  28. Statistical MT • Learn word and phrase alignments from “parallel” data • Start with parallel documents • Need parallel sentences • Sentence break and sentence align • Word align and produce word and phrase translation tables (our translation models) • Use monolingual data to • Build language models • Inform ordering • Choose best translation from n-best list

  29. Statistical MT Recipe • Start with: parallel sentences • Align words & phrases, & generate counts • Build these components: Translation Model (probabilities associated with aligned words & phrases, P(E|F))
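
As a rough illustration of "align, then generate counts", the sketch below turns a list of already-aligned word pairs into relative-frequency estimates of P(E|F). The function name `translation_table` and the French/English toy pairs are hypothetical, and the alignment step itself is assumed to have been done elsewhere.

```python
from collections import Counter, defaultdict


def translation_table(aligned_pairs):
    """Estimate P(e | f) = count(f, e) / count(f) from aligned word pairs.

    aligned_pairs: list of (foreign_word, english_word) tuples, assumed to
    come from an earlier word-alignment step that is not shown here.
    """
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(f for f, _ in aligned_pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[f][e] = count / source_counts[f]
    return table


# Toy data, invented for illustration only.
pairs = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("bleue", "blue"),
]
table = translation_table(pairs)
print(table["maison"])
```

Phrase tables are built the same way in spirit, with counts over aligned phrase pairs rather than single words.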

  30. Statistical MT Recipe • Start with: parallel sentences; monolingual data • Align words & phrases, & generate counts • Build these components: Translation Model (probabilities associated with aligned words & phrases, P(E|F)); Language Model, P(E)
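
A language model P(E) needs only monolingual text. The following is a minimal sketch of a bigram model with add-one smoothing; the slides do not say which kind of language model is used, so the bigram order, the smoothing, and the name `train_bigram_lm` are all assumptions made for illustration.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Return a function giving an add-one-smoothed bigram probability P(E)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    vocab_size = len(vocab)

    def prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for prev, cur in zip(tokens, tokens[1:]):
            p *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        return p

    return prob


# Toy monolingual data, invented for illustration only.
lm = train_bigram_lm(["the house is blue", "the blue house"])
print(lm("the house is blue") > lm("blue the is house"))
```

The comparison at the end shows the role the slides give to monolingual data: a fluent word order scores higher than a scrambled one, which is what lets the model inform ordering and rerank an n-best list.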

  31. Statistical MT Recipe • Start with: parallel sentences; monolingual data • Align words & phrases, & generate counts • Decoding algorithm • Build these components: Translation Model (probabilities associated with aligned words & phrases, P(E|F)); Language Model, P(E); Decoder, which maximizes P(F|E)*P(E)

  32. Statistical Machine Translation • Given foreign f, find best English translation e*: $e^* = \arg\max_e P(e \mid f)$ • Use Bayes’ rule to get “noisy channel” model: $P(e \mid f) = P(f \mid e) \cdot P(e) / P(f)$, so $\arg\max_e P(e \mid f) = \arg\max_e P(f \mid e) \cdot P(e)$, since P(f) does not depend on e • P(f | e) is the channel or translation model • P(e) is the language model
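
The noisy-channel argmax can be illustrated with a toy scorer over a fixed candidate list. This is a sketch only: real decoders search a very large space rather than enumerating candidates, and the function `decode` and all probabilities below are invented for illustration.

```python
import math


def decode(f, candidates, tm, lm):
    """Return the candidate e maximizing P(f | e) * P(e), scored in log space.

    tm maps (f, e) to P(f | e); lm maps e to P(e). P(f) is omitted because
    it is the same for every candidate and cannot change the argmax.
    """
    def score(e):
        return math.log(tm[(f, e)]) + math.log(lm[e])

    return max(candidates, key=score)


# Hypothetical model scores for one French input.
tm = {
    ("la maison", "the house"): 0.6,
    ("la maison", "house the"): 0.6,
    ("la maison", "the home"): 0.3,
}
lm = {"the house": 0.05, "house the": 0.001, "the home": 0.04}

best = decode("la maison", list(lm), tm, lm)
print(best)
```

Here the translation model cannot tell "the house" from "house the", and the language model breaks the tie in favor of the fluent order.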

  33. Centauri/Arcturan [Knight, 1997] Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp Slides 38-74 adapted from Kevin Knight and CCB’s JHU crew

  34. Centauri/Arcturan [Knight, 1997] • Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
     1a. ok-voon ororok sprok .
     1b. at-voon bichat dat .
     2a. ok-drubel ok-voon anok plok sprok .
     2b. at-drubel at-voon pippat rrat dat .
     3a. erok sprok izok hihok ghirok .
     3b. totat dat arrat vat hilat .
     4a. ok-voon anok drok brok jok .
     4b. at-voon krat pippat sat lat .
     5a. wiwok farok izok stok .
     5b. totat jjat quat cat .
     6a. lalok sprok izok jok stok .
     6b. wat dat krat quat cat .
     7a. lalok farok ororok lalok sprok izok enemok .
     7b. wat jjat bichat wat dat vat eneat .
     8a. lalok brok anok plok nok .
     8b. iat lat pippat rrat nnat .
     9a. wiwok nok izok kantok ok-yurp .
     9b. totat nnat quat oloat at-yurp .
     10a. lalok mok nok yorok ghirok clok .
     10b. wat nnat gat mat bat hilat .
     11a. lalok nok crrrok hihok yorok zanzanok .
     11b. wat nnat arrat mat zanzanat .
     12a. lalok rarok nok izok hihok mok .
     12b. wat nnat forat arrat vat gat .

  37. Centauri/Arcturan [Knight, 1997] • Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp • ???

  41. Centauri/Arcturan [Knight, 1997] • Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp • ???

  43. Centauri/Arcturan [Knight, 1997] • Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp • process of elimination

  44. Centauri/Arcturan [Knight, 1997] • Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp • cognate?

  45. Centauri/Arcturan [Knight, 1997] • Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp • zero fertility
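
The first step of the Centauri/Arcturan puzzle can be mimicked mechanically: for a Centauri word, collect the Arcturan words that appear in every sentence pair where it occurs. The sketch below does only that intersection, using the twelve sentence pairs from the slides with the final periods dropped. It is not a word-alignment model, and the name `candidates` is hypothetical.

```python
# The 12 Centauri/Arcturan sentence pairs [Knight, 1997], periods dropped.
CORPUS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok", "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]


def candidates(word, corpus=CORPUS):
    """Arcturan words present in every pair whose Centauri side has `word`."""
    target_sets = [
        set(arcturan.split())
        for centauri, arcturan in corpus
        if word in centauri.split()
    ]
    return set.intersection(*target_sets) if target_sets else set()


for w in "farok crrrok hihok yorok clok kantok ok-yurp".split():
    print(w, sorted(candidates(w)))
```

For farok and hihok the intersection leaves a single candidate (jjat and arrat). For yorok it leaves wat, nnat, and mat, and words seen in only one sentence, such as clok, keep that whole sentence as candidates, which is where the slides' other clues (process of elimination, cognates, zero fertility) come in.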
