
Bridging the Gap: Machine Translation for Lesser Resourced Languages

Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst.


Presentation Transcript


  1. Bridging the Gap: Machine Translation for Lesser Resourced Languages. Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst

  2. • Mapudungun: 900,000 speakers • Inupiaq: hundreds of speakers • Katrina: hundreds of speakers • Quechua: 6 million speakers

  3. Machine Translation (MT) • Source Language → Target Language

  4. Machine Translation (MT) • Direct: Statistical MT, Example Based MT • Source Language → Target Language

  5. Machine Translation (MT) • Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation • Direct: Statistical MT, Example Based MT • Source Language → Target Language

  6. Machine Translation (MT) • Interlingua: Semantic Analysis, Sentence Planning • Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation • Direct: Statistical MT, Example Based MT • Source Language → Target Language

  7. Machine Translation (MT) • Interlingua: Semantic Analysis • Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation • Direct: Statistical MT, Example Based MT • Source Language → Target Language • + High quality • - Expertise intensive development cycle

  8. Machine Translation (MT) • Interlingua: Semantic Analysis • Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation • Direct: Statistical MT, Example Based MT • Source Language → Target Language • + Short development time • - Requires large bilingual corpus

  9. Machine Translation (MT) • Interlingua: Semantic Analysis • Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation • Direct: Statistical MT, Example Based MT • Source Language → Target Language • Our Approach

  10. Machine Translation (MT) • Interlingua: Semantic Analysis • Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation • Direct: Statistical MT, Example Based MT • Source Language → Target Language • + High quality • - Expertise intensive development cycle

  11. Machine Translation (MT) • Interlingua: Semantic Analysis • Morphological Analysis + Syntactic Parsing, Text Generation • Source Language → Target Language • + High quality • - Expertise intensive development cycle • Automate the development of deep-analysis MT

  12. Our Position Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems

  13. Sub-Problems • Morphology Induction • Syntax Refinement

  14. Morphology Induction 1. Linguistic Structure 2. Bilingual Informants

  15. Morphology Induction 1. Linguistic Structure 2. Bilingual Informants

  16. Paradigms Organize Morphology

  17. Paradigm Discovery in 3 Steps • Search out partial paradigms in a network of candidates • Cluster overlapping partial paradigms • Filter the clusters, keeping the largest clusters most likely to model true paradigms
      A portion of a Spanish paradigm candidate network (suffix set, number of stems: example stems):
      e.er.erá.ido.ieron.ió 28: deb, escog, ofrec, reconoc, vend, ...
      e.ido.ieron.ir.irá.ió 28: asist, dirig, exig, ocurr, sufr, ...
      azar.e.ido.ieron.ir.ió 1: sal
      e.er.erá.ieron.ió 32: deb, padec, romp, ...
      e.erá.ido.ieron.ió 28: deb, escog, ...
      e.er.ido.ieron.ió 46: deb, parec, recog, ...
      e.ido.ieron.irá.ió 28: asist, dirig, ...
      e.ido.ieron.ir.ió 39: asist, bat, sal, ...
      e.ido.ieron.ió 86: asist, deb, hund, ...
      e.erá.ieron.ió 32: deb, padec, ...
      er.ido.ieron.ió 58: ascend, ejerc, recog, ...
      ido.ieron.ir.ió 44: interrump, sal, ...
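The first step on this slide, searching a network of candidate partial paradigms, can be illustrated with a small sketch. This is not the authors' algorithm: it is a minimal Python illustration that assumes a candidate is simply a suffix set together with every stem that occurs with all of those suffixes. The function names, the `min_stem` cutoff, and the twelve-word list are hypothetical.

```python
from collections import defaultdict

def stem_suffixes(words, min_stem=3):
    """Map every candidate stem to the set of suffixes it occurs with."""
    table = defaultdict(set)
    for word in words:
        # Try every split point that leaves a stem of at least min_stem letters.
        for cut in range(min_stem, len(word)):
            table[word[:cut]].add(word[cut:])
    return table

def candidate(suffixes, table):
    """Return the stems that occur with every suffix in the given set."""
    wanted = set(suffixes)
    return sorted(stem for stem, seen in table.items() if wanted <= seen)

# Hypothetical toy corpus; a real run would use a large word list.
words = ["debe", "deber", "debido", "debió",
         "vende", "vender", "vendido", "vendió",
         "asiste", "asistir", "asistido", "asistió"]
table = stem_suffixes(words)

print(candidate({"e", "er", "ido", "ió"}, table))  # ['deb', 'vend']
print(candidate({"e", "ido", "ió"}, table))        # ['asist', 'deb', 'vend']
```

Dropping a suffix from the set can only add stems, which is why the smaller suffix sets in the network above list more stems than the larger ones.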

  18. Morpho Challenge 2007 Unsupervised Morphology Induction Competition • English • 3rd Place Overall • Bested the Strong Baseline Morfessor (Creutz, 2006) • German • 1st Place when Combined with Morfessor

  19. Morpho Challenge 2007 Unsupervised Morphology Induction Competition • English • 3rd Place Overall • Bested the Strong Baseline Morfessor (Creutz, 2006) • German • 1st Place when Combined with Morfessor • No Mapudungun yet • Agglutinative sequences of suffixes coming soon

  20. Our Machine Translation Architecture • INPUT TEXT

  21. Our Machine Translation Architecture • INPUT TEXT → Morphology Analysis (uses Morphology Analysis Lexicon)

  22. Our Machine Translation Architecture • INPUT TEXT → Morphology Analysis (uses Morphology Analysis Lexicon) → Machine Translation System (uses Grammar & Lexicon)

  23. Our Machine Translation Architecture • INPUT TEXT → Morphology Analysis (uses Morphology Analysis Lexicon) → Machine Translation System (uses Grammar & Lexicon) → Morphology Generation (uses Morphology Generation Lexicon)

  24. Our Machine Translation Architecture • INPUT TEXT → Morphology Analysis (uses Morphology Analysis Lexicon) → Machine Translation System (uses Grammar & Lexicon) → Morphology Generation (uses Morphology Generation Lexicon) → OUTPUT TEXT

  25. Our Machine Translation Architecture • INPUT TEXT → Morphology Analysis (uses Morphology Analysis Lexicon) → Machine Translation System (uses Grammar & Lexicon) → Morphology Generation (uses Morphology Generation Lexicon) → OUTPUT TEXT

  26. Our Machine Translation Architecture • INPUT TEXT → Morphology Analysis (uses Morphology Analysis Lexicon) → Machine Translation System (uses Grammar & Lexicon) → Morphology Generation (uses Morphology Generation Lexicon) → OUTPUT TEXT

  27. Sub-Problems • Morphology Induction • Syntax Refinement

  28. Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants

  29. Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants

  30. Linguistic Structure: Syntax • English: I didn’t see Maria • Mapudungun: pelafiñ Maria • Spanish: No vi a María

  31. Linguistic Structure: Syntax • English: I didn’t see Maria
      • Mapudungun: pelafiñ Maria
        pe -la -fi -ñ Maria
        see -neg -3.obj -1.subj.indicative Maria
      • Spanish: No vi a María
        No vi a María
        neg see.1.subj.past.indicative acc Maria

  32. pe-la-fi-ñ Maria • V (pe)

  33. pe-la-fi-ñ Maria • VSuff (la): negation = +

  34. pe-la-fi-ñ Maria • VSuffG [VSuff (la)]: pass all features up

  35. pe-la-fi-ñ Maria • VSuff (fi): object person = 3

  36. pe-la-fi-ñ Maria • VSuffG [VSuffG, VSuff (fi)]: pass all features up from both children

  37. pe-la-fi-ñ Maria • VSuff (ñ): person = 1, number = sg, mood = ind

  38. pe-la-fi-ñ Maria • VSuffG [VSuffG, VSuff (ñ)]: pass all features up from both children

  39. pe-la-fi-ñ Maria • V [V (pe), VSuffG]: pass all features up from both children • Check that: 1) negation = + 2) tense is undefined
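The bottom-up feature passing in slides 32 to 39 can be sketched in a few lines of Python. The feature values come from the slides; the dictionary layout, the function names, and the flat feature name `object_person` are assumptions made for illustration, not the system's actual grammar formalism.

```python
# Feature values are copied from the slides; the data layout is an assumption.
ROOTS = {"pe": {"lemma": "pe"}}
SUFFIXES = {
    "la": {"negation": "+"},
    "fi": {"object_person": 3},
    "ñ": {"person": 1, "number": "sg", "mood": "ind"},
}

def unify(left, right):
    """Merge two feature dicts; return None if they disagree on a value."""
    merged = dict(left)
    for name, value in right.items():
        if merged.get(name, value) != value:
            return None
        merged[name] = value
    return merged

def analyze_verb(segmented):
    """Build features bottom-up for a hyphen-segmented verb such as 'pe-la-fi-ñ'."""
    root, *suffixes = segmented.split("-")
    group = {}                        # the innermost VSuffG starts empty
    for suffix in suffixes:           # each VSuffG passes up both children's features
        group = unify(group, SUFFIXES[suffix])
        if group is None:
            return None
    verb = unify(ROOTS[root], group)  # the top V node merges the root and the suffix group
    # Checks from slide 39: negation = + and tense undefined.
    if verb is None or verb.get("negation") != "+" or "tense" in verb:
        return None
    return verb

features = analyze_verb("pe-la-fi-ñ")
print(features)
```

In this sketch a verb without the negation suffix, such as the hypothetical input `pe-fi-ñ`, returns `None` because this particular rule's check fails; a fuller grammar would presumably cover it with a different rule.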

  40. pe-la-fi-ñ Maria • NP [N (Maria)]: person = 3, number = sg, human = +

  41. pe-la-fi-ñ Maria • VP [V, NP]: pass features up from V • Check that NP is human = + • S [VP]

  42. Transfer to Spanish: Top-Down • Mapudungun S, VP → Spanish S, VP

  43. Transfer to Spanish: Top-Down • Spanish VP [V, “a”, NP] • Pass all features to Spanish side

  44. Transfer to Spanish: Top-Down • Spanish VP [V, “a”, NP] • Pass all features down

  45. Transfer to Spanish: Top-Down • Spanish VP [V, “a”, NP] • Pass object features down

  46. Transfer to Spanish: Top-Down • Spanish VP [V, “a”, NP] • Accusative marker on objects is introduced because human = +

  47. Transfer to Spanish: Top-Down • Spanish VP [V, “a”, NP] • Transfer rule:
      VP::VP [VBar NP] -> [VBar "a" NP]
      (
        (X1::Y1)
        (X2::Y3)
        ((X2 type) = (*NOT* personal))
        ((X2 human) =c +)
        (X0 = X1)
        ((X0 object) = X2)
        (Y0 = X0)
        ((Y0 object) = (X0 object))
        (Y1 = Y0)
        (Y3 = (Y0 object))
        ((Y1 objmarker person) = (Y3 person))
        ((Y1 objmarker number) = (Y3 number))
        ((Y1 objmarker gender) = (Y3 gender)))
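To make the rule on this slide easier to follow, here is a rough Python paraphrase of what each of its lines appears to do. It is a sketch under stated assumptions, not the transfer engine itself: feature structures are modelled as plain dictionaries, `=c` is read as a constraint that the value must already be present, and `apply_vp_rule` is a hypothetical name.

```python
def apply_vp_rule(vbar, np):
    """Return the Spanish-side constituents [VBar, "a", NP], or None if the rule fails."""
    # ((X2 type) = (*NOT* personal)): the object's type must not be "personal".
    if np.get("type") == "personal":
        return None
    # ((X2 human) =c +): the object must already carry human = +.
    if np.get("human") != "+":
        return None
    x0 = dict(vbar)           # (X0 = X1)
    x0["object"] = dict(np)   # ((X0 object) = X2)
    y0 = dict(x0)             # (Y0 = X0), which also carries the object features
    y1 = dict(y0)             # (Y1 = Y0)
    y3 = dict(y0["object"])   # (Y3 = (Y0 object))
    # Copy whichever agreement features the object has onto the verb's object marker.
    y1["objmarker"] = {f: y3[f] for f in ("person", "number", "gender") if f in y3}
    return [y1, "a", y3]

# Feature values taken from slides 37 and 40.
verb = {"lemma": "pe", "negation": "+", "person": 1, "number": "sg", "mood": "ind"}
maria = {"lemma": "Maria", "person": 3, "number": "sg", "human": "+"}
spanish_vp = apply_vp_rule(verb, maria)
print(spanish_vp[1], spanish_vp[0]["objmarker"])  # a {'person': 3, 'number': 'sg'}
```

An object that lacks human = + makes the function return `None`, so no “a” is inserted for it.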

  48. Transfer to Spanish: Top-Down • Spanish VP [“no”, V, “a”, NP] • Pass person, number, and mood features to Spanish Verb • Assign tense = past

  49. Transfer to Spanish: Top-Down • Spanish VP [“no”, V, “a”, NP] • “no” is introduced because negation = +

  50. Transfer to Spanish: Top-Down • Spanish VP [“no”, V, “a”, NP] • Spanish V: pe → ver
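The last transfer steps (introducing “no”, assigning past tense, mapping pe to ver, and marking the human object with “a”) can be put together in a toy generator. This is only a sketch: the two lookup tables and the function name are hypothetical stand-ins for the real transfer lexicon and Spanish morphology generator, and they cover just this one sentence.

```python
# Hypothetical one-entry tables standing in for the lexicon and the morphology generator.
LEMMAS = {"pe": "ver", "Maria": "María"}
VERB_FORMS = {("ver", 1, "sg", "past", "ind"): "vi"}

def generate_spanish(verb, obj):
    """Produce the Spanish word string from the transferred verb and object features."""
    words = []
    if verb.get("negation") == "+":   # slide 49: "no" is introduced because negation = +
        words.append("no")
    tense = "past"                    # slide 48: assign tense = past
    key = (LEMMAS[verb["lemma"]], verb["person"], verb["number"], tense, verb["mood"])
    words.append(VERB_FORMS[key])     # slide 50: pe → ver, inflected as "vi"
    if obj.get("human") == "+":       # slide 46: accusative marker for human objects
        words.append("a")
    words.append(LEMMAS[obj["lemma"]])
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:]

verb = {"lemma": "pe", "negation": "+", "person": 1, "number": "sg", "mood": "ind"}
maria = {"lemma": "Maria", "person": 3, "number": "sg", "human": "+"}
print(generate_spanish(verb, maria))  # No vi a María
```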
