
MT for Languages with Limited Resources



  1. MT for Languages with Limited Resources 11-731 Machine Translation April 20, 2011 Based on Joint Work with: Lori Levin, Jaime Carbonell, Stephan Vogel, Shuly Wintner, Danny Shacham, Katharina Probst, Erik Peterson, Christian Monson, Roberto Aranovich and Ariadna Font-Llitjos

  2. Why Machine Translation for Minority and Indigenous Languages?
  • Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
  • Is there hope for MT for languages with very limited resources?
  • Benefits include:
  • Better government access to indigenous communities (epidemics, crop failures, etc.)
  • Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
  • Language preservation
  • Civilian and military applications (disaster relief)

  3. MT for Minority and Indigenous Languages: Challenges
  • Minimal amount of parallel text
  • Possibly competing standards for orthography/spelling
  • Often relatively few trained linguists
  • Access to native informants possible
  • Need to minimize development time and cost

  4. MT for Low-Resource Languages: Possible Approaches
  • Phrase-based SMT, with whatever small amount of parallel data is available
  • Build a rule-based system (requires bilingual experts and resources)
  • Hybrid approaches, such as the AVENUE Project (Stat-XFER) approach:
  • Incorporate acquired manual resources within a general statistical framework
  • Augment with targeted elicitation and resource acquisition from bilingual non-experts

  5. CMU Statistical Transfer (Stat-XFER) MT Approach
  Integrate the major strengths of rule-based and statistical MT within a common framework:
  • A linguistically rich formalism that can express complex and abstract compositional transfer rules
  • Rules can be written by human experts and also acquired automatically from data
  • Easy integration of morphological analyzers and generators
  • Word and syntactic-phrase correspondences can be automatically acquired from parallel text
  • Search-based decoding from statistical MT adapted to find the best translation within the search space: multi-feature scoring, beam search, parameter optimization, etc.
  • Framework suitable for both resource-rich and resource-poor language scenarios

  6. Stat-XFER Main Principles
  • Framework: statistical search-based approach with syntactic translation transfer rules that can be acquired from data but also developed and extended by experts
  • Automatic word and phrase translation lexicon acquisition from parallel data
  • Transfer-rule learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages
  • Elicitation: use bilingual native informants to produce a small high-quality word-aligned bilingual corpus of translated phrases and sentences
  • Rule refinement: refine the acquired rules via a process of interaction with bilingual informants
  • XFER + Decoder: the XFER engine produces a lattice of possible transferred structures at all levels; the decoder searches for and selects the best-scoring combination

  7. Stat-XFER Framework
  [Architecture diagram: Source Input → Preprocessing (Morphology) → Transfer Engine (using the Transfer Rules and Bilingual Lexicon) → Translation Lattice → Second-Stage Decoder (using the Language Model and Weighted Features) → Target Output]

  8. Hebrew Input: בשורה הבאה
  (Pipeline: Preprocessing → Morphology → Transfer Engine, which applies the Transfer Rules and Translation Lexicon to produce a Translation Output Lattice, scored by the Decoder with an English Language Model)
  Transfer rule:
  {NP1,3}
  NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
  ((X3::Y1) (X1::Y2)
   ((X1 def) = +) ((X1 status) =c absolute)
   ((X1 num) = (X3 num)) ((X1 gen) = (X3 gen))
   (X0 = X1))
  Translation lexicon entries:
  N::N |: ["$WR"] -> ["BULL"] ((X1::Y1) ((X0 NUM) = s) ((Y0 lex) = "BULL"))
  N::N |: ["$WRH"] -> ["LINE"] ((X1::Y1) ((X0 NUM) = s) ((Y0 lex) = "LINE"))
  Translation output lattice:
  (0 1 "IN" @PREP)
  (1 1 "THE" @DET)
  (2 2 "LINE" @N)
  (1 2 "THE LINE" @NP)
  (0 2 "IN LINE" @PP)
  (0 4 "IN THE NEXT LINE" @PP)
  English output: in the next line

  9. Transfer Rule Formalism
  (Rule annotations: type information; part-of-speech/constituent information; alignments; x-side constraints; y-side constraints; xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)))
  ;SL: the old man, TL: ha-ish ha-zaqen
  NP::NP [DET ADJ N] -> [DET N DET ADJ]
  (
   (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
   ((X1 AGR) = *3-SING)
   ((X1 DEF) = *DEF)
   ((X3 AGR) = *3-SING)
   ((X3 COUNT) = +)
   ((Y1 DEF) = *DEF)
   ((Y3 DEF) = *DEF)
   ((Y2 AGR) = *3-SING)
   ((Y2 GENDER) = (Y4 GENDER))
  )

  10. Transfer Rule Formalism (II)
  (Rule annotations: value constraints; agreement constraints)
  ;SL: the old man, TL: ha-ish ha-zaqen
  NP::NP [DET ADJ N] -> [DET N DET ADJ]
  (
   (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
   ((X1 AGR) = *3-SING)
   ((X1 DEF) = *DEF)
   ((X3 AGR) = *3-SING)
   ((X3 COUNT) = +)
   ((Y1 DEF) = *DEF)
   ((Y3 DEF) = *DEF)
   ((Y2 AGR) = *3-SING)
   ((Y2 GENDER) = (Y4 GENDER))
  )
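To make the constraint machinery concrete, here is a minimal Python sketch of how a rule like the one above might be represented, and how its value and agreement constraints could be checked. This is illustrative only, not the actual Stat-XFER implementation; all names and data layouts are hypothetical.

```python
# Hypothetical representation of the NP rule above; feature structures are dicts.
rule = {
    "lhs": ("NP", "NP"),                              # SL category :: TL category
    "x_side": ["DET", "ADJ", "N"],                    # source constituent sequence
    "y_side": ["DET", "N", "DET", "ADJ"],             # target constituent sequence
    "alignments": [(1, 1), (1, 3), (2, 4), (3, 2)],   # Xi::Yj, 1-based
    "x_constraints": [("X1", "AGR", "3-SING"), ("X1", "DEF", "DEF"),
                      ("X3", "AGR", "3-SING"), ("X3", "COUNT", "+")],
    "y_constraints": [("Y1", "DEF", "DEF"), ("Y3", "DEF", "DEF"),
                      ("Y2", "AGR", "3-SING")],
    # agreement constraint: one feature value must match another
    "agreements": [(("Y2", "GENDER"), ("Y4", "GENDER"))],
}

def satisfies(rule, x_nodes, y_nodes):
    """Check value and agreement constraints against candidate constituents.
    x_nodes / y_nodes map names like 'X1', 'Y2' to feature dicts."""
    for node, feat, value in rule["x_constraints"] + rule["y_constraints"]:
        nodes = x_nodes if node.startswith("X") else y_nodes
        if nodes[node].get(feat) != value:
            return False
    for (n1, f1), (n2, f2) in rule["agreements"]:
        if y_nodes[n1].get(f1) != y_nodes[n2].get(f2):
            return False
    return True

x_nodes = {"X1": {"AGR": "3-SING", "DEF": "DEF"},
           "X2": {},
           "X3": {"AGR": "3-SING", "COUNT": "+"}}
y_nodes = {"Y1": {"DEF": "DEF"}, "Y2": {"AGR": "3-SING", "GENDER": "M"},
           "Y3": {"DEF": "DEF"}, "Y4": {"GENDER": "M"}}
print(satisfies(rule, x_nodes, y_nodes))   # True
```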

  11. Translation Lexicon: Hebrew-to-English Examples (Semi-manually developed)
  PRO::PRO |: ["ANI"] -> ["I"]
  ((X1::Y1) ((X0 per) = 1) ((X0 num) = s) ((X0 case) = nom))
  PRO::PRO |: ["ATH"] -> ["you"]
  ((X1::Y1) ((X0 per) = 2) ((X0 num) = s) ((X0 gen) = m) ((X0 case) = nom))
  N::N |: ["$&H"] -> ["HOUR"]
  ((X1::Y1) ((X0 NUM) = s) ((Y0 NUM) = s) ((Y0 lex) = "HOUR"))
  N::N |: ["$&H"] -> ["hours"]
  ((X1::Y1) ((X0 NUM) = p) ((Y0 NUM) = p) ((Y0 lex) = "HOUR"))

  12. Translation Lexicon: French-to-English Examples (Automatically acquired)
  DET::DET |: ["le"] -> ["the"] ((X1::Y1))
  Prep::Prep |: ["dans"] -> ["in"] ((X1::Y1))
  N::N |: ["principes"] -> ["principles"] ((X1::Y1))
  N::N |: ["respect"] -> ["accordance"] ((X1::Y1))
  NP::NP |: ["le respect"] -> ["accordance"] ( )
  PP::PP |: ["dans le respect"] -> ["in accordance"] ( )
  PP::PP |: ["des principes"] -> ["with the principles"] ( )
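Automatically acquired entries like these are typically scored with simple maximum-likelihood estimates over word-aligned parallel text. A minimal sketch of that scoring idea (a hypothetical helper; the real acquisition pipeline also handles phrase extraction and filtering):

```python
from collections import Counter

def mle_translation_scores(aligned_pairs):
    """aligned_pairs: (source, target) pairs read off word-aligned
    parallel text. Returns MLE estimates of P(target | source)."""
    pairs = list(aligned_pairs)
    joint = Counter(pairs)
    marginal = Counter(src for src, _ in pairs)
    return {(s, t): n / marginal[s] for (s, t), n in joint.items()}

pairs = [("le", "the"), ("le", "the"), ("le", "it"), ("dans", "in")]
print(mle_translation_scores(pairs))
# {('le', 'the'): 0.666..., ('le', 'it'): 0.333..., ('dans', 'in'): 1.0}
```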

  13. Hebrew-English Transfer Grammar: Example Rules (Manually developed)
  {NP1,2}
  ;;SL: $MLH ADWMH
  ;;TL: A RED DRESS
  NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
  (
   (X2::Y1) (X1::Y2)
   ((X1 def) = -)
   ((X1 status) =c absolute)
   ((X1 num) = (X2 num))
   ((X1 gen) = (X2 gen))
   (X0 = X1)
  )
  {NP1,3}
  ;;SL: H $MLWT H ADWMWT
  ;;TL: THE RED DRESSES
  NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
  (
   (X3::Y1) (X1::Y2)
   ((X1 def) = +)
   ((X1 status) =c absolute)
   ((X1 num) = (X3 num))
   ((X1 gen) = (X3 gen))
   (X0 = X1)
  )

  14. French-English Transfer Grammar: Example Rules (Automatically acquired)
  {PP,24691}
  ;;SL: des principes
  ;;TL: with the principles
  PP::PP ["des" N] -> ["with the" N]
  ( (X1::Y1) )
  {PP,312}
  ;;SL: dans le respect des principes
  ;;TL: in accordance with the principles
  PP::PP [Prep NP] -> [Prep NP]
  ( (X1::Y1) (X2::Y2) )

  15. The Transfer Engine
  • Input: source-language input sentence, or source-language confusion network
  • Output: lattice representing the collection of translation fragments at all levels supported by transfer rules
  • Basic algorithm: a "bottom-up" integrated "parsing-transfer-generation" chart parser guided by the synchronous transfer rules:
  • Start with translations of individual words and phrases from the translation lexicon
  • Create translations of larger constituents by applying applicable transfer rules to previously created lattice entries
  • Beam search controls the exponential combinatorics of the search space, using multiple scoring features
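A minimal sketch of this bottom-up lattice-filling loop (illustrative only: it ignores unification constraints and beam pruning, handles only binary rules, and all names are hypothetical):

```python
def build_lattice(words, lexicon, rules, max_passes=5):
    """Populate a translation lattice bottom-up.
    Lattice entries are (start, end, category, translation) arcs."""
    # Seed with word translations from the lexicon.
    lattice = {(i, i + 1, cat, tgt)
               for i, w in enumerate(words)
               for cat, tgt in lexicon.get(w, [])}
    # Repeatedly apply binary rules (rhs1 rhs2 -> lhs) to adjacent arcs.
    for _ in range(max_passes):
        new = set()
        for (s1, e1, c1, t1) in lattice:
            for (s2, e2, c2, t2) in lattice:
                if e1 == s2:
                    for (lhs, rhs, reorder) in rules:
                        if rhs == (c1, c2):
                            tgt = f"{t2} {t1}" if reorder else f"{t1} {t2}"
                            new.add((s1, e2, lhs, tgt))
        if new <= lattice:     # nothing new was added; stop
            break
        lattice |= new
    return lattice

lexicon = {"$MLH": [("N", "dress")], "ADWMH": [("ADJ", "red")]}
rules = [("NP", ("N", "ADJ"), True)]   # Hebrew N ADJ -> English ADJ N
for arc in sorted(build_lattice("$MLH ADWMH".split(), lexicon, rules)):
    print(arc)
# (0, 1, 'N', 'dress'), (0, 2, 'NP', 'red dress'), (1, 2, 'ADJ', 'red')
```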

  16. The Transfer Engine: Some Unique Features
  • Works with either learned or manually developed transfer grammars
  • Handles rules with or without unification constraints
  • Supports interfacing with servers for morphological analysis and generation
  • Can handle ambiguous source-word analyses and/or SL segmentations represented in the form of lattice structures

  17. Hebrew Example (from [Lavie et al., 2004])
  Input word: B$WRH
  0       1       2       3       4
  |---------B$WRH--------|
  |-----B-----|-$WR-|--H--|
  |--B--|--H--|--$WRH-----|

  18. Hebrew Example (from [Lavie et al., 2004])
  Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
  Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
  Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
  Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
  Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
  Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
  Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
  Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))

  19. XFER Output Lattice
  (28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
  (29 29 "SINCE" -8.20817 "MAZ" "(ADVP,0 (ADV,5 'SINCE'))")
  (29 29 "SINCE THEN" -12.0165 "MAZ" "(ADVP,0 (ADV,6 'SINCE THEN'))")
  (29 29 "EVER SINCE" -12.5564 "MAZ" "(ADVP,0 (ADV,4 'EVER SINCE'))")
  (30 30 "WORKED" -10.9913 "&BD" "(VERB,0 (V,11 'WORKED'))")
  (30 30 "FUNCTIONED" -16.0023 "&BD" "(VERB,0 (V,10 'FUNCTIONED'))")
  (30 30 "WORSHIPPED" -17.3393 "&BD" "(VERB,0 (V,12 'WORSHIPPED'))")
  (30 30 "SERVED" -11.5161 "&BD" "(VERB,0 (V,14 'SERVED'))")
  (30 30 "SLAVE" -13.9523 "&BD" "(NP0,0 (N,34 'SLAVE'))")
  (30 30 "BONDSMAN" -18.0325 "&BD" "(NP0,0 (N,36 'BONDSMAN'))")
  (30 30 "A SLAVE" -16.8671 "&BD" "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')))))")
  (30 30 "A BONDSMAN" -21.0649 "&BD" "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')))))")

  20. The Lattice Decoder
  • Stack decoder, similar to standard statistical MT decoders
  • Searches for the best-scoring path of non-overlapping lattice arcs
  • No reordering during decoding
  • Scoring based on a log-linear combination of scoring features, with weights trained using Minimum Error Rate Training (MERT)
  • Scoring components:
  • Statistical language model
  • Bi-directional MLE phrase and rule scores
  • Lexical probabilities
  • Fragmentation: how many arcs are needed to cover the entire translation?
  • Length penalty: how far from the expected target length?
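Concretely, each path is scored as score(path) = Σ_k λ_k · f_k(path), summing weighted feature values over its arcs. A minimal monotone dynamic-programming sketch of the non-overlapping-arcs search (illustrative: the actual decoder is a stack decoder that also integrates a language model; all names are hypothetical):

```python
import math

def decode(lattice_arcs, sent_len, weights):
    """lattice_arcs: (start, end, translation, features) tuples, where
    features maps feature-name -> value (log-probs, counts, penalties).
    Monotone search for the best log-linear score over non-overlapping
    arcs covering positions 0..sent_len. No LM, no reordering."""
    def arc_score(feats):
        return sum(weights.get(name, 0.0) * val for name, val in feats.items())

    best = [-math.inf] * (sent_len + 1)   # best[i]: best score covering 0..i
    back = [None] * (sent_len + 1)
    best[0] = 0.0
    for i in range(sent_len):
        if best[i] == -math.inf:
            continue
        for (s, e, tgt, feats) in lattice_arcs:
            if s == i and best[i] + arc_score(feats) > best[e]:
                best[e] = best[i] + arc_score(feats)
                back[e] = (s, tgt)
    out, i = [], sent_len                 # trace back the best path
    while i > 0 and back[i]:
        s, tgt = back[i]
        out.append(tgt)
        i = s
    return " ".join(reversed(out)), best[sent_len]

arcs = [(0, 2, "in line", {"logprob": -2.0, "frag": 1.0}),
        (0, 1, "in", {"logprob": -0.5, "frag": 1.0}),
        (1, 2, "the line", {"logprob": -1.0, "frag": 1.0})]
weights = {"logprob": 1.0, "frag": -0.4}  # frag acts as a per-arc penalty
print(decode(arcs, 2, weights))           # ('in the line', -2.3)
```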

  21. XFER Lattice Decoder
  0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
  Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0, Words: 13,13
  235 < 0 8 -19.7602: B H IWM RBI&I (PP,0 (PREP,3 'ON') (NP,2 (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH')) (NP1,0 (NP0,1 (N,6 'DAY')))))))>
  918 < 8 14 -46.2973: H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,17 'LION'))))) (VERB,0 (V,0 'ATE')) (NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
  584 < 14 17 -30.6607: L ARWXH BWQR (PP,0 (PREP,6 'TO') (NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING')) (NP0,0 (N,27 'MEAL')))))))>

  22. Stat-XFER MT Systems
  • General Stat-XFER framework under development for the past nine years
  • Systems so far: Chinese-to-English, French-to-English, Hebrew-to-English, Urdu-to-English, German-to-English, Hindi-to-English, Dutch-to-English, Turkish-to-English, Mapudungun-to-Spanish, Arabic-to-English, Brazilian Portuguese-to-English, English-to-Arabic, Hebrew-to-Arabic

  23. Learning Transfer Rules for Languages with Limited Resources
  • Rationale:
  • Large bilingual corpora not available
  • Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using an elicitation tool
  • Elicitation corpus designed to be typologically comprehensive and compositional
  • Transfer-rule engine and rule-learning approach support acquisition of generalized transfer rules from the data (see the sketch below)
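As a rough illustration of the starting point for such learning, a maximally specific "seed" rule can be read directly off one word-aligned elicitation example; later generalization steps then relax it. A hypothetical sketch of that first step only (the actual learner performs considerably more sophisticated seeded generalization):

```python
def seed_rule(src_tokens, tgt_tokens, src_pos, tgt_pos, alignment):
    """Build a maximally specific transfer rule from one aligned example.
    alignment: list of (src_index, tgt_index) pairs, 1-based as in the corpus."""
    return {
        "x_side": src_pos,            # e.g. ['DET', 'N']
        "y_side": tgt_pos,
        "alignments": alignment,      # e.g. [(1, 1), (2, 2)]
        "x_example": src_tokens,      # kept for later generalization steps
        "y_example": tgt_tokens,
    }

rule = seed_rule(["H", "ILD"], ["the", "boy"],
                 ["DET", "N"], ["DET", "N"], [(1, 1), (2, 2)])
print(rule["x_side"], "->", rule["y_side"])   # ['DET', 'N'] -> ['DET', 'N']
```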

  24. English-Chinese Example

  25. English-Hindi Example

  26. Spanish-Mapudungun Example

  27. English-Arabic Example

  28. The Typological Elicitation Corpus
  • Translated and aligned by a bilingual informant
  • Corpus consists of linguistically diverse constructions
  • Based on the elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
  • Organized compositionally: elicit simple structures first, then use them as building blocks
  • Goal: minimize size, maximize linguistic coverage

  29. The Structural Elicitation Corpus
  • Designed to cover the most common phrase structures of English → learn how these structures map onto their equivalents in other languages
  • Constructed using the constituent parse trees from the Penn TreeBank:
  • Extracted and frequency-ranked all rules in the parse trees
  • Selected the top ~200 rules, filtered idiosyncratic cases
  • Revised lexical choices within examples
  • Goal: minimize size, maximize linguistic coverage of structures

  30. The Structural Elicitation Corpus: Examples
  srcsent: in the forest
  tgtsent: B H I&R
  aligned: ((1,1),(2,2),(3,3))
  context: C-Structure: (<PP> (PREP in-1) (<NP> (DET the-2) (N forest-3)))
  srcsent: steps
  tgtsent: MDRGWT
  aligned: ((1,1))
  context: C-Structure: (<NP> (N steps-1))
  srcsent: the boy ate the apple
  tgtsent: H ILD AKL AT H TPWX
  aligned: ((1,1),(2,2),(3,3),(4,5),(5,6))
  context: C-Structure: (<S> (<NP> (DET the-1) (N boy-2)) (<VP> (V ate-3) (<NP> (DET the-4) (N apple-5))))
  srcsent: the first year
  tgtsent: H $NH H RA$WNH
  aligned: ((1,1 3),(2,4),(3,2))
  context: C-Structure: (<NP> (DET the-1) (<ADJP> (ADJ first-2)) (N year-3))
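A small sketch of a reader for records in this line-oriented format (a hypothetical helper; note that one source index can align to several target indices, as in the last example):

```python
def parse_record(lines):
    """Parse one record of 'key: value' lines (srcsent, tgtsent,
    aligned, context)."""
    rec = {}
    for line in lines:
        key, _, value = line.partition(":")
        rec[key.strip()] = value.strip()
    if rec.get("aligned"):
        aligned = []
        body = rec["aligned"][1:-1]            # drop the outer parens
        for chunk in body.split("),("):
            src, tgt = chunk.strip("()").split(",", 1)
            # target side may list several space-separated indices
            aligned.append((int(src), [int(t) for t in tgt.split()]))
        rec["aligned"] = aligned
    return rec

rec = parse_record(["srcsent: the first year",
                    "tgtsent: H $NH H RA$WNH",
                    "aligned: ((1,1 3),(2,4),(3,2))"])
print(rec["aligned"])   # [(1, [1, 3]), (2, [4]), (3, [2])]
```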

  31. A Limited Data Scenario for Hindi-to-English
  • Conducted during a DARPA "Surprise Language Exercise" (SLE) in June 2003
  • Put together a scenario with "miserly" data resources:
  • Elicited data corpus: 17,589 phrases
  • Cleaned portion (top 12%) of the LDC dictionary: ~2,725 Hindi words (23,612 translation pairs)
  • Manually acquired resources during the SLE:
  • 500 manual bigram translations
  • 72 manually written phrase transfer rules
  • 105 manually written postposition rules
  • 48 manually written time expression rules
  • No additional parallel text!

  32. Examples of Learned Rules (Hindi-to-English)

  33. Manual Transfer Rules: Hindi Example
  ;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
  ;; passive of 43 (7b)
  {VP,28}
  VP::VP : [V V V] -> [Aux V]
  (
   (X1::Y2)
   ((x1 form) = root)
   ((x2 type) =c light)
   ((x2 form) = part)
   ((x2 aspect) = perf)
   ((x3 lexwx) = 'jAnA')
   ((x3 form) = part)
   ((x3 aspect) = perf)
   (x0 = x1)
   ((y1 lex) = be)
   ((y1 tense) = past)
   ((y1 agr num) = (x3 agr num))
   ((y1 agr pers) = (x3 agr pers))
   ((y2 form) = part)
  )

  34. Manual Transfer Rules: Example
  [Diagram: source and target parse trees for the Hindi NP "jIvana ke eka aXyAya" and its English equivalent "one chapter of life", showing the PP/NP1 reordering]
  ; NP1 ke NP2 -> NP2 of NP1
  ; Ex: jIvana ke eka aXyAya
  ; life of (one) chapter
  ; ==> a chapter of life
  {NP,12}
  NP::NP : [PP NP1] -> [NP1 PP]
  (
   (X1::Y2) (X2::Y1)
   ; ((x2 lexwx) = 'kA')
  )
  {NP,13}
  NP::NP : [NP1] -> [NP1]
  ( (X1::Y1) )
  {PP,12}
  PP::PP : [NP Postp] -> [Prep NP]
  ( (X1::Y2) (X2::Y1) )

  35. Manual Grammar Development
  • Covers mostly NPs, PPs and VPs (verb complexes)
  • ~70 grammar rules, covering basic and recursive NPs and PPs, and verb complexes of the main tenses in Hindi (developed in two weeks)

  36. Testing Conditions
  • Tested on a section of the JHU-provided data: 258 sentences with four reference translations
  • SMT system (stand-alone)
  • EBMT system (stand-alone)
  • XFER system (naïve decoding)
  • XFER system with "strong" decoder:
  • No grammar rules (baseline)
  • Manually developed grammar rules
  • Automatically learned grammar rules
  • XFER+SMT with strong decoder (MEMT)

  37. Results on JHU Test Set

  38. Effect of Reordering in the Decoder

  39. Observations and Lessons (I)
  • XFER with the strong decoder outperformed SMT even without any grammar rules in the miserly data scenario:
  • SMT was trained on elicited phrases that are very short
  • SMT has insufficient data to train more discriminative translation probabilities
  • XFER takes advantage of morphology:
  • Token coverage without morphology: 0.6989
  • Token coverage with morphology: 0.7892
  • Manual grammar was somewhat better than the automatically learned grammar:
  • Learned rules were very simple
  • Large room for improvement on learning rules

  40. Observations and Lessons (II)
  • MEMT (XFER and SMT) based on the strong decoder produced the best results in the miserly scenario
  • Reordering within the decoder provided very significant score improvements:
  • Much room for more sophisticated grammar rules
  • Strong decoder can carry some of the reordering "burden"

  41. Modern Hebrew
  • Native language of about 3-4 million people in Israel
  • Semitic language, closely related to Arabic and with similar linguistic properties:
  • Root+pattern word-formation system
  • Rich verb and noun morphology
  • Particles attach as prefixes to the following word: definite article (H), prepositions (B,K,L,M), coordinating conjunction (W), relativizers ($,K$), ...
  • Unique alphabet and writing system:
  • 22 letters represent (mostly) consonants
  • Vowels represented (mostly) by diacritics
  • Modern texts omit the diacritic vowels, adding a further level of ambiguity: one "bare" word maps to several possible words
  • Example: MHGR → mehager, m+hagar, m+h+ger
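A naive sketch of why these attached particles make even tokenization ambiguous: simply enumerating possible prefix splits overgenerates heavily, which is why a full morphological analyzer is needed to filter the candidates (illustrative only; transliteration as on the slides):

```python
# Attachable particles: prepositions B/K/L/M, conjunction W, definite
# article H, relativizers K$ and $.
PARTICLES = ["B", "K", "L", "M", "W", "H", "K$", "$"]

def prefix_splits(word, prefix=()):
    """Enumerate candidate segmentations that peel particles off the front."""
    yield prefix + (word,)
    for p in PARTICLES:
        if word.startswith(p) and len(word) > len(p):
            yield from prefix_splits(word[len(p):], prefix + (p,))

for seg in sorted(set(prefix_splits("B$WRH"))):
    print(seg)
# ('B', '$', 'W', 'RH'), ('B', '$', 'WRH'), ('B', '$WRH'), ('B$WRH',)
```

Note that suffix analyses like B+$WR+H ("in her bull") are beyond this prefix-only sketch; they come from the morphological analyzer.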

  42. Modern Hebrew Spelling
  • Two main spelling variants:
  • "KTIV XASER" (deficient): spelling that relies on the vowel diacritics; words are purely consonantal once the diacritics are removed
  • "KTIV MALEH" (full): words with i/o/u vowels are written with vowel letters that mark the vowel
  • KTIV MALEH is predominant, but not strictly adhered to even in newspapers and official publications → inconsistent spelling
  • Example:
  • niqud (vocalization) may be spelled NIQWD, NQWD, NQD
  • When written as NQD, it could also be read as niqed, naqed, nuqad
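One crude way to cope with the variant spellings is to expand each word into candidate variants by optionally dropping vowel letters. A hypothetical sketch only, meant to show the idea; real ktiv-haser rules are much more restrictive, so this overgenerates:

```python
def spelling_variants(word, vowel_letters="WI"):
    """Generate candidate spellings by optionally dropping the vowel
    letters W/I (crude; overgenerates relative to the real conventions)."""
    if not word:
        return {""}
    tails = spelling_variants(word[1:], vowel_letters)
    variants = {word[0] + t for t in tails}
    if word[0] in vowel_letters:
        variants |= tails          # variant with this vowel letter dropped
    return variants

print(sorted(spelling_variants("NIQWD")))
# ['NIQD', 'NIQWD', 'NQD', 'NQWD']
```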

  43. Challenges for Hebrew MT
  • Paucity of existing language resources for Hebrew:
  • No publicly available broad-coverage morphological analyzer
  • No publicly available bilingual lexicons or dictionaries
  • No POS-tagged corpus or parse treebank for Hebrew
  • No large Hebrew/English parallel corpus
  • Scenario well suited to the CMU transfer-based MT framework for languages with limited resources

  44. Morphological Analyzer
  • We use a publicly available morphological analyzer distributed by the Technion's Knowledge Center, adapted for our system
  • Coverage is reasonable (for nouns, verbs and adjectives)
  • Produces all analyses, or a disambiguated analysis, for each word
  • Output format includes lexeme (base form), POS, and morphological features
  • Output was adapted to our representation needs (POS and feature mappings; see the sketch below)
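A sketch of what such an adaptation layer might look like. The analyzer-side tag and feature names here are hypothetical; the XFER-side names follow the feature structures shown in the morphology example below:

```python
# Hypothetical analyzer tagset mapped onto the XFER representation.
POS_MAP = {"noun": "N", "verb": "V", "adjective": "ADJ",
           "preposition": "PREP", "determiner": "DET"}
FEAT_MAP = {"gender": "GEN", "number": "NUM", "status": "STATUS"}

def adapt_analysis(analysis, start, end):
    """Map one analyzer output record onto the internal feature structure."""
    adapted = {"SPANSTART": start, "SPANEND": end,
               "LEX": analysis["lexeme"], "POS": POS_MAP[analysis["pos"]]}
    for src_feat, tgt_feat in FEAT_MAP.items():
        if src_feat in analysis:
            adapted[tgt_feat] = analysis[src_feat]
    return adapted

print(adapt_analysis({"lexeme": "$WRH", "pos": "noun", "gender": "F",
                      "number": "S", "status": "ABSOLUTE"}, 2, 4))
# {'SPANSTART': 2, 'SPANEND': 4, 'LEX': '$WRH', 'POS': 'N',
#  'GEN': 'F', 'NUM': 'S', 'STATUS': 'ABSOLUTE'}
```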

  45. Morphology Example
  Input word: B$WRH
  0       1       2       3       4
  |---------B$WRH--------|
  |-----B-----|-$WR-|--H--|
  |--B--|--H--|--$WRH-----|

  46. Morphology Example
  Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
  Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
  Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
  Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
  Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
  Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
  Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
  Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))

  47. Translation Lexicon
  • Constructed our own Hebrew-to-English lexicon, based primarily on the existing "Dahan" H-to-E and E-to-H dictionary made available to us, augmented by other public sources
  • Coverage is not great, but not bad as a start:
  • Dahan H-to-E is about 15K translation pairs
  • Dahan E-to-H is about 7K translation pairs
  • Base forms, with POS information on both sides
  • Converted Dahan into our representation; added entries for missing closed-class items (pronouns, prepositions, etc.)
  • Had to deal with spelling conventions
  • Recently augmented with ~50K translation pairs extracted from Wikipedia (mostly proper names and named entities)

  48. Manual Transfer Grammar (human-developed)
  • Initially developed by Alon in a couple of days; extended and revised by Nurit over time
  • Current grammar has 36 rules:
  • 21 NP rules
  • 1 PP rule
  • 6 verb-complex and VP rules
  • 8 higher-phrase and sentence-level rules
  • Captures the most common (mostly local) structural differences between Hebrew and English

  49. Transfer Grammar: Example Rules
  {NP1,2}
  ;;SL: $MLH ADWMH
  ;;TL: A RED DRESS
  NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
  (
   (X2::Y1) (X1::Y2)
   ((X1 def) = -)
   ((X1 status) =c absolute)
   ((X1 num) = (X2 num))
   ((X1 gen) = (X2 gen))
   (X0 = X1)
  )
  {NP1,3}
  ;;SL: H $MLWT H ADWMWT
  ;;TL: THE RED DRESSES
  NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
  (
   (X3::Y1) (X1::Y2)
   ((X1 def) = +)
   ((X1 status) =c absolute)
   ((X1 num) = (X3 num))
   ((X1 gen) = (X3 gen))
   (X0 = X1)
  )

  50. Hebrew-to-English MT Prototype
  • Initial prototype developed within a two-month intensive effort
  • Accomplished:
  • Adapted the available morphological analyzer
  • Constructed a preliminary translation lexicon
  • Translated and aligned the Elicitation Corpus
  • Learned XFER rules
  • Developed a (small) manual XFER grammar
  • System debugging and development
  • Evaluated performance on unseen test data using automatic evaluation metrics
