1 / 75

Towards Interactive and Automatic Refinement of Translation Rules

Towards Interactive and Automatic Refinement of Translation Rules. Ariadna Font Llitjós PhD Thesis Proposal Jaime Carbonell (advisor) Alon Lavie (co-advisor) Lori Levin Bonnie Dorr (Univ. Maryland) 5 November 2004. Outline. Introduction Thesis statement and scope Preliminary Research

said
Télécharger la présentation

Towards Interactive and Automatic Refinement of Translation Rules

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Towards Interactive and Automatic Refinement of Translation Rules Ariadna Font Llitjós PhD Thesis Proposal Jaime Carbonell (advisor) Alon Lavie (co-advisor) Lori Levin Bonnie Dorr (Univ. Maryland) 5 November 2004

  2. Outline • Introduction • Thesis statement and scope • Preliminary Research • Interactive elicitation of error information • A framework for automatic rule adaptation • Proposed Research • Contributions and Thesis Timeline Interactive and Automatic Rule Refinement

  3. Machine Translation (MT) • Source Language (SL) sentence: Gaudi was a great artist In Spanish, it translates as: Gaudi era un gran artista • MT System outputs : *Gaudi estaba un artista grande  *Gaudi era un artista grande Interactive and Automatic Rule Refinement

  4. Spanish Adjectives Automatic Rule Adaptation NP DET NADJ una casa grande a big house NP DET ADJ N NP DET ADJ N NP DET ADJN un gran artista a great artist Completed Work General order: grande big in size Exception: gran exceptional Interactive and Automatic Rule Refinement

  5. Commercial and Online Systems Correct Translation: Gaudi era un gran artista • Systran, Babelfish (Altavista), WorldLingo, Translated.net : • *Gaudi era gran artista • ImTranslation: *El Gaudi era un gran artista • 1-800-Translate  *Gaudi era un fenomenal artista Interactive and Automatic Rule Refinement

  6. Post-editing • Current solutions: Manual post-editing [Allen, 2003] Automated post-edition module (APE) [Allen & Hogan, 2000] Interactive and Automatic Rule Refinement

  7. Drawbacks of Current Methods • Manual post-editing  Corrections do not generalize  Gaudi era un artista grande  Juan es un amigo grande(Juan is a great friend) Era una oportunidad grande (It is a great opportunity) • APE  Humans need to predict all the errors ahead of time and code for the post-editing rules + new error  Interactive and Automatic Rule Refinement

  8. My Solution • Automate post-editing efforts by feeding them back into the MT system. • Possible alternatives: • Automatic learning of post-editing rules + system independent - several thousands of sentences might need to be corrected for the same error • Automatic refinement of translation rules + attacks the core of the problem • for transfer-based MT systems (need rules to fix!) Interactive and Automatic Rule Refinement

  9. Related Work [Corston-Oliver & Gammon, 2003] [Imamura et al. 2003] [Menezes & Richardson, 2001] [Brill, 1993] [Gavaldà, 2000] Machine Translation Rule Adaptation [Callison-Burch, 2004] [Su et al. 1995] My Thesis Post-editing No pre-existing training data required No human reference translations required Use Non-expert user feedback Interactive and Automatic Rule Refinement

  10. Resource-poor Scenarios (AVENUE) Mapudungun Quechua Aymara • Lack of electronic parallel data • Lack of manual grammar (or very small initial grammar)  Need to validate elicitation corpus and automatically learned translation rules Why bother? • Indigenous communities have difficult access to crucial information that directly affects their life (such as land laws, plagues, health warnings, etc.) • Preservation of their language and culture Interactive and Automatic Rule Refinement

  11. How is MT possible for resource-poor languages? Bilingual speakers Interactive and Automatic Rule Refinement

  12. AVENUE Project Overview Elicitation Morphology Rule Learning Run-Time System Word-Aligned Parallel Corpus Learning Module Handcrafted rules Run Time Transfer System Transfer Rules Morpho-logical analyzer Elicitation Corpus Lexical Resources Lattice Elicitation Tool Interactive and Automatic Rule Refinement

  13. My Thesis Elicitation Morphology Rule Learning Run-Time System Rule Refinement Translation Correction Tool Word-Aligned Parallel Corpus Learning Module Handcrafted rules Run Time Transfer System Transfer Rules Morpho-logical analyzer Rule Refinement Module Elicitation Corpus Lexical Resources Lattice Elicitation Tool Interactive and Automatic Rule Refinement

  14. Recycle corrections of Machine Translation output back into the system by refining and expandingexisting translation rules

  15. Thesis Statement - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable. - We can automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality. Interactive and Automatic Rule Refinement

  16. Assumptions • No parallel training data available • No human reference translations available • The SL sentence needs to be fully parsed by the translation grammar. • Bilingual speakers can give enough information about the MT errors. Interactive and Automatic Rule Refinement

  17. Scope Types of errors that: • Focus 1: can be refined fully automatically just by using correction information. • Focus 2: can be refined fully automatically using correction and error information. • Focus 3: require a reasonable amount of further user interaction and can be solved by available correction and error information. Interactive and Automatic Rule Refinement

  18. Technical Challenges Refine and expand a translation rules minimally Manually written Automatically Learned Automatic Evaluation of Refinement process Elicit minimal MT information from non-expert users Interactive and Automatic Rule Refinement

  19. Preliminary Work Interactive elicitation of error information A framework for automatic rule adaptation

  20. Interactive Elicitation of MT Errors Goal: • Simplify MT correction task maximally Challenges: • Find appropriate level of granularity for MT error classification • Design a user-friendly graphic user interface with: • SL sentence (e.g. I see them) • TL sentence (e.g. Yo veo los) • word-to-word alignments (I-yo, see-veo, them-los) • (context) Interactive and Automatic Rule Refinement

  21. MT Error Typology for RR (simplified) Completed Work Interactive elicitation of error information Missing word Extra word Wrong word order Incorrect word Wrong agreement Local vs Long distance Word vs. phrase + Word change Sense Form Selectional restrictions Idiom Missing constraint Extra constraint Interactive and Automatic Rule Refinement

  22. TCTool (Demo) Interactive elicitation of error information Actions: Add a word Delete a word Modify a word Change word order Interactive and Automatic Rule Refinement

  23. 1st Eng2Spa User Study Completed Work Interactive elicitation of error information [LREC 2004] • MT error classification  9 linguistically-motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect wordandno translation Interactive and Automatic Rule Refinement

  24. Completed Work Automatic Rule Refinement Framework Automatic Rule Adaptation • Find best RR operations given a: • Grammar (G), • Lexicon (L), • (Set of) Source Language sentence(s) (SL), • (Set of) Target Language sentence(s) (TL), • Its Parse tree (P), and • Minimal correction of TL (TL’) such that TQ2 > TQ1 • Which can also be expressed as: max TQ(TL|TL’,P,SL,RR(G,L)) Interactive and Automatic Rule Refinement

  25. Types of Refinement Operations Automatic Rule Adaptation NP DET N ADJ NP DET N ADJ NP DET ADJ N NP DET ADJ N Completed Work 1. Refine a translation rule: R0  R1 (R0 modified, either made more specific or more general) R0: una casa bonito a nice house R1: N gender = ADJ gender a nice house una casa bonita Interactive and Automatic Rule Refinement

  26. Types of Refinement Operations (2) Automatic Rule Adaptation NP DET NADJ NP DET ADJ N NP DET ADJ N NP DET ADJN Completed Work 2. Bifurcate a translation rule: R0  R0 (same, general rule)  R1 (R0 modified, specific rule) R0: una casa bonita a nice house R1: un gran artista a great artist Interactive and Automatic Rule Refinement

  27. Formalizing Error Information Automatic Rule Adaptation Completed Work Wi = error Wi’ = correction Wc = clue word • Example: SL: the red car - TL: *el auto roja  TL’: el auto rojo Wi = roja Wi’ = rojo Wc = auto need to agree Interactive and Automatic Rule Refinement

  28. Triggering Feature Detection Automatic Rule Adaptation Completed Work Comparison at the feature level to detect triggering feature(s) • Delta function: (Wi,Wi’) Examples: (rojo,roja) = {gender} (comiamos,comia) = {person,number} (mujer,guitarra) = {} If  set is empty, need to postulate a new binary feature Interactive and Automatic Rule Refinement

  29. Deciding on the Refinement Op Automatic Rule Adaptation Completed Work Given: - Action performed by the user(add, delete, modify, change word order) , and - Error information is available(clue word, word alignments, etc.)  Refinement Action Interactive and Automatic Rule Refinement

  30. Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…)Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Rule Refinement Operations Interactive and Automatic Rule Refinement

  31. Proposed Work - Batch and Interactive mode User studies Evaluation

  32. Rule Refinement Example Automatic Rule Adaptation Change word order SL: Gaudí was a great artist TL: Gaudí era un artista grande Corrected TL (TL’):Gaudí era un gran artista Interactive and Automatic Rule Refinement

  33. Automatic Rule Adaptation 1. Error Information Elicitation Refinement Operation Typology Interactive and Automatic Rule Refinement

  34. 2. Variable Instantiation from Log File Automatic Rule Adaptation Correcting Actions: 1. Word order change (artista grande grande artista): Wi = grande 2. Edited grande into gran: Wi’ = gran identified artist as clue word Wc = artist In this case, even if user had not identified Wc, refinement process would have been the same Interactive and Automatic Rule Refinement

  35. 3. Retrieve Relevant Lexical Entries Automatic Rule Adaptation • No lexical entry for great  gran • Duplicate lexical entry great-grande and change TL side: ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) (Morphological analyzer:grande = gran) ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) Interactive and Automatic Rule Refinement

  36. 4. Finding Triggering Feature(s) Automatic Rule Adaptation Feature  function: (Wi, Wi’) =   need to postulate a new binary feature: feat1 5. Blame assignment tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> Interactive and Automatic Rule Refinement

  37. 6. Variable Instantiation in the Rules Automatic Rule Adaptation Wi = grande  POSi = ADJ =Y3, y3 Wc = artista  POSc = N = Y2, y2 {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) ) (R0) Interactive and Automatic Rule Refinement

  38. 7. Refining Rules Automatic Rule Adaptation {NP,8’} NP::NP : [DET ADJ N] -> [DET ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y3 agr)) ; det-noun agreement ((y2agr) = (y3 agr)) ; adj-noun agreement (y2 = x3) ((y2 feat1) =c + )) (R1) Interactive and Automatic Rule Refinement

  39. 8. Refining Lexical Entries Automatic Rule Adaptation ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -)) ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +)) Interactive and Automatic Rule Refinement

  40. Done? Not yet Automatic Rule Adaptation NP,8 (R0) ADJ(grande) [feat1 = -] NP,8’ (R1) ADJ(gran) [feat1 =c +] [feat1 = +] • Need to restrict application of general rule (R0) to just post-nominal ADJ un artista grande un artista gran un gran artista *un grande artista Interactive and Automatic Rule Refinement

  41. Add Blocking Constraint Automatic Rule Adaptation NP,8 (R0) ADJ(grande) [feat1 = -] [feat1 = -] NP,8’ (R1) ADJ(gran) [feat1 =c +] [feat1 = +] Can we also eliminate incorrect translations automatically? un artista grande *un artista gran un gran artista *un grande artista Interactive and Automatic Rule Refinement

  42. Making the grammar tighter Automatic Rule Adaptation • If Wc = artista • Add [feat1= +] to N(artista) • Add agreement constraint to NP,8 (R0) between N and ADJ ((N feat1) = (ADJ feat1)) *un artista grande *un artista gran un gran artista *un grande artista Interactive and Automatic Rule Refinement

  43. Batch Mode Implementation Automatic Rule Adaptation Proposed Work • For Refinement operations of errors that can be refined : • Fully automatically just by using correction information (Focus 1) • Fully automatically using correction and error information (Focus 2) Interactive and Automatic Rule Refinement

  44. Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…)Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Rule Refinement Operations Interactive and Automatic Rule Refinement

  45. Modify Add Delete Change W Order +Wc –Wc+Wc –Wc +al –al Wi WcWi(…)Wc –Wc = +rule –rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista Focus 1 Rule Refinement Operations It is a nice house – Es una casa bonito  Es una casa bonita John and Mary fell – Juan y Maria cayeron  Juan y Maria se cayeron Interactive and Automatic Rule Refinement

  46. Modify Add Delete Change W Order +Wc –Wc+Wc –Wc +al –al Wi WcWi(…)Wc –Wc = +rule –rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1 Rule Refinement Operations J y M cayeron  J y M se cayeron Es una casa bonito  Es una casa bonita Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista I will help him fix the car – Ayudaréa él a arreglar el auto  Le ayudare a arreglar el auto Interactive and Automatic Rule Refinement

  47. Modify Add Delete Change W Order +Wc –Wc+Wc –Wc +al –al Wi WcWi(…)Wc –Wc = +rule –rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1 Rule Refinement Operations I would like to go – Me gustaria que ir  Me gustaria ir I will help him fix the car – Ayudaréa él a arreglar el auto  Le ayudare a arreglar el auto Interactive and Automatic Rule Refinement

  48. Modify Add Delete Change W Order +Wc–Wc+Wc–Wc +al –al Wi WcWi(…)Wc –Wc = +rule–rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1 & 2 Rule Refinement Operations PP  PREP NP I am proud of you – Estoy orgullosa tu Estoy orgullosa de ti Interactive and Automatic Rule Refinement

  49. Interactive Mode Implementation Automatic Rule Adaptation Proposed Work • Extra error information is required • More sentences need to be evaluated (and corrected) by users • Relevant Minimal Pairs (MP) • Focus 3: types of errors that require a reasonable amount of further user interaction and can be solved by available correction and error information. Interactive and Automatic Rule Refinement

  50. Modify Add Delete Change W Order +Wc–Wc+Wc–Wc +al –al Wi WcWi(…)Wc –Wc  =+rule –rule +al –al=WiWiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 3 Rule Refinement Operations Wally plays the guitar – Wally juega la guitarra  Wally toca la guitarra I saw the woman – Vi la mujer  Vi a la mujer I see them – Veo los  Los veo Interactive and Automatic Rule Refinement

More Related