Towards Interactive and Automatic Refinement of Translation Rules

Towards Interactive and Automatic Refinement of Translation Rules Ariadna Font Llitjós PhD Thesis Proposal Jaime Carbonell (advisor) Alon Lavie (co-advisor) Lori Levin Bonnie Dorr (Univ. Maryland) 5 November 2004

Outline • Introduction • Thesis statement and scope • Preliminary Research • Interactive elicitation of error information • A framework for automatic rule adaptation • Proposed Research • Contributions and Thesis Timeline Interactive and Automatic Rule Refinement

Machine Translation (MT) • Source Language (SL) sentence: Gaudi was a great artist In Spanish, it translates as: Gaudi era un gran artista • MT System outputs : *Gaudi estaba un artista grande  *Gaudi era un artista grande Interactive and Automatic Rule Refinement

Spanish Adjectives Automatic Rule Adaptation NP DET NADJ una casa grande a big house NP DET ADJ N NP DET ADJ N NP DET ADJN un gran artista a great artist Completed Work General order: grande big in size Exception: gran exceptional Interactive and Automatic Rule Refinement

Commercial and Online Systems Correct Translation: Gaudi era un gran artista • Systran, Babelfish (Altavista), WorldLingo, Translated.net : • *Gaudi era gran artista • ImTranslation: *El Gaudi era un gran artista • 1-800-Translate  *Gaudi era un fenomenal artista Interactive and Automatic Rule Refinement

Post-editing • Current solutions: Manual post-editing [Allen, 2003] Automated post-edition module (APE) [Allen & Hogan, 2000] Interactive and Automatic Rule Refinement

Drawbacks of Current Methods • Manual post-editing  Corrections do not generalize  Gaudi era un artista grande  Juan es un amigo grande(Juan is a great friend) Era una oportunidad grande (It is a great opportunity) • APE  Humans need to predict all the errors ahead of time and code for the post-editing rules + new error  Interactive and Automatic Rule Refinement

My Solution • Automate post-editing efforts by feeding them back into the MT system. • Possible alternatives: • Automatic learning of post-editing rules + system independent - several thousands of sentences might need to be corrected for the same error • Automatic refinement of translation rules + attacks the core of the problem • for transfer-based MT systems (need rules to fix!) Interactive and Automatic Rule Refinement

Related Work [Corston-Oliver & Gammon, 2003] [Imamura et al. 2003] [Menezes & Richardson, 2001] [Brill, 1993] [Gavaldà, 2000] Machine Translation Rule Adaptation [Callison-Burch, 2004] [Su et al. 1995] My Thesis Post-editing No pre-existing training data required No human reference translations required Use Non-expert user feedback Interactive and Automatic Rule Refinement

Resource-poor Scenarios (AVENUE) Mapudungun Quechua Aymara • Lack of electronic parallel data • Lack of manual grammar (or very small initial grammar)  Need to validate elicitation corpus and automatically learned translation rules Why bother? • Indigenous communities have difficult access to crucial information that directly affects their life (such as land laws, plagues, health warnings, etc.) • Preservation of their language and culture Interactive and Automatic Rule Refinement

How is MT possible for resource-poor languages? Bilingual speakers Interactive and Automatic Rule Refinement

AVENUE Project Overview Elicitation Morphology Rule Learning Run-Time System Word-Aligned Parallel Corpus Learning Module Handcrafted rules Run Time Transfer System Transfer Rules Morpho-logical analyzer Elicitation Corpus Lexical Resources Lattice Elicitation Tool Interactive and Automatic Rule Refinement

My Thesis Elicitation Morphology Rule Learning Run-Time System Rule Refinement Translation Correction Tool Word-Aligned Parallel Corpus Learning Module Handcrafted rules Run Time Transfer System Transfer Rules Morpho-logical analyzer Rule Refinement Module Elicitation Corpus Lexical Resources Lattice Elicitation Tool Interactive and Automatic Rule Refinement

Recycle corrections of Machine Translation output back into the system by refining and expandingexisting translation rules

Thesis Statement - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable. - We can automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality. Interactive and Automatic Rule Refinement

Assumptions • No parallel training data available • No human reference translations available • The SL sentence needs to be fully parsed by the translation grammar. • Bilingual speakers can give enough information about the MT errors. Interactive and Automatic Rule Refinement

Scope Types of errors that: • Focus 1: can be refined fully automatically just by using correction information. • Focus 2: can be refined fully automatically using correction and error information. • Focus 3: require a reasonable amount of further user interaction and can be solved by available correction and error information. Interactive and Automatic Rule Refinement

Technical Challenges Refine and expand a translation rules minimally Manually written Automatically Learned Automatic Evaluation of Refinement process Elicit minimal MT information from non-expert users Interactive and Automatic Rule Refinement

Preliminary Work Interactive elicitation of error information A framework for automatic rule adaptation

Interactive Elicitation of MT Errors Goal: • Simplify MT correction task maximally Challenges: • Find appropriate level of granularity for MT error classification • Design a user-friendly graphic user interface with: • SL sentence (e.g. I see them) • TL sentence (e.g. Yo veo los) • word-to-word alignments (I-yo, see-veo, them-los) • (context) Interactive and Automatic Rule Refinement

MT Error Typology for RR (simplified) Completed Work Interactive elicitation of error information Missing word Extra word Wrong word order Incorrect word Wrong agreement Local vs Long distance Word vs. phrase + Word change Sense Form Selectional restrictions Idiom Missing constraint Extra constraint Interactive and Automatic Rule Refinement

TCTool (Demo) Interactive elicitation of error information Actions: Add a word Delete a word Modify a word Change word order Interactive and Automatic Rule Refinement

1st Eng2Spa User Study Completed Work Interactive elicitation of error information [LREC 2004] • MT error classification  9 linguistically-motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect wordandno translation Interactive and Automatic Rule Refinement

Completed Work Automatic Rule Refinement Framework Automatic Rule Adaptation • Find best RR operations given a: • Grammar (G), • Lexicon (L), • (Set of) Source Language sentence(s) (SL), • (Set of) Target Language sentence(s) (TL), • Its Parse tree (P), and • Minimal correction of TL (TL’) such that TQ2 > TQ1 • Which can also be expressed as: max TQ(TL|TL’,P,SL,RR(G,L)) Interactive and Automatic Rule Refinement

Types of Refinement Operations Automatic Rule Adaptation NP DET N ADJ NP DET N ADJ NP DET ADJ N NP DET ADJ N Completed Work 1. Refine a translation rule: R0  R1 (R0 modified, either made more specific or more general) R0: una casa bonito a nice house R1: N gender = ADJ gender a nice house una casa bonita Interactive and Automatic Rule Refinement

Types of Refinement Operations (2) Automatic Rule Adaptation NP DET NADJ NP DET ADJ N NP DET ADJ N NP DET ADJN Completed Work 2. Bifurcate a translation rule: R0  R0 (same, general rule)  R1 (R0 modified, specific rule) R0: una casa bonita a nice house R1: un gran artista a great artist Interactive and Automatic Rule Refinement

Formalizing Error Information Automatic Rule Adaptation Completed Work Wi = error Wi’ = correction Wc = clue word • Example: SL: the red car - TL: *el auto roja  TL’: el auto rojo Wi = roja Wi’ = rojo Wc = auto need to agree Interactive and Automatic Rule Refinement

Triggering Feature Detection Automatic Rule Adaptation Completed Work Comparison at the feature level to detect triggering feature(s) • Delta function: (Wi,Wi’) Examples: (rojo,roja) = {gender} (comiamos,comia) = {person,number} (mujer,guitarra) = {} If  set is empty, need to postulate a new binary feature Interactive and Automatic Rule Refinement

Deciding on the Refinement Op Automatic Rule Adaptation Completed Work Given: - Action performed by the user(add, delete, modify, change word order) , and - Error information is available(clue word, word alignments, etc.)  Refinement Action Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…)Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Rule Refinement Operations Interactive and Automatic Rule Refinement

Proposed Work - Batch and Interactive mode User studies Evaluation

Rule Refinement Example Automatic Rule Adaptation Change word order SL: Gaudí was a great artist TL: Gaudí era un artista grande Corrected TL (TL’):Gaudí era un gran artista Interactive and Automatic Rule Refinement

Automatic Rule Adaptation 1. Error Information Elicitation Refinement Operation Typology Interactive and Automatic Rule Refinement

2. Variable Instantiation from Log File Automatic Rule Adaptation Correcting Actions: 1. Word order change (artista grande grande artista): Wi = grande 2. Edited grande into gran: Wi’ = gran identified artist as clue word Wc = artist In this case, even if user had not identified Wc, refinement process would have been the same Interactive and Automatic Rule Refinement

3. Retrieve Relevant Lexical Entries Automatic Rule Adaptation • No lexical entry for great  gran • Duplicate lexical entry great-grande and change TL side: ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) (Morphological analyzer:grande = gran) ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) Interactive and Automatic Rule Refinement

4. Finding Triggering Feature(s) Automatic Rule Adaptation Feature  function: (Wi, Wi’) =   need to postulate a new binary feature: feat1 5. Blame assignment tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> Interactive and Automatic Rule Refinement

6. Variable Instantiation in the Rules Automatic Rule Adaptation Wi = grande  POSi = ADJ =Y3, y3 Wc = artista  POSc = N = Y2, y2 {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) ) (R0) Interactive and Automatic Rule Refinement

7. Refining Rules Automatic Rule Adaptation {NP,8’} NP::NP : [DET ADJ N] -> [DET ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y3 agr)) ; det-noun agreement ((y2agr) = (y3 agr)) ; adj-noun agreement (y2 = x3) ((y2 feat1) =c + )) (R1) Interactive and Automatic Rule Refinement

8. Refining Lexical Entries Automatic Rule Adaptation ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -)) ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +)) Interactive and Automatic Rule Refinement

Done? Not yet Automatic Rule Adaptation NP,8 (R0) ADJ(grande) [feat1 = -] NP,8’ (R1) ADJ(gran) [feat1 =c +] [feat1 = +] • Need to restrict application of general rule (R0) to just post-nominal ADJ un artista grande un artista gran un gran artista *un grande artista Interactive and Automatic Rule Refinement

Add Blocking Constraint Automatic Rule Adaptation NP,8 (R0) ADJ(grande) [feat1 = -] [feat1 = -] NP,8’ (R1) ADJ(gran) [feat1 =c +] [feat1 = +] Can we also eliminate incorrect translations automatically? un artista grande *un artista gran un gran artista *un grande artista Interactive and Automatic Rule Refinement

Making the grammar tighter Automatic Rule Adaptation • If Wc = artista • Add [feat1= +] to N(artista) • Add agreement constraint to NP,8 (R0) between N and ADJ ((N feat1) = (ADJ feat1)) *un artista grande *un artista gran un gran artista *un grande artista Interactive and Automatic Rule Refinement

Batch Mode Implementation Automatic Rule Adaptation Proposed Work • For Refinement operations of errors that can be refined : • Fully automatically just by using correction information (Focus 1) • Fully automatically using correction and error information (Focus 2) Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…)Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Rule Refinement Operations Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc –Wc+Wc –Wc +al –al Wi WcWi(…)Wc –Wc = +rule –rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista Focus 1 Rule Refinement Operations It is a nice house – Es una casa bonito  Es una casa bonita John and Mary fell – Juan y Maria cayeron  Juan y Maria se cayeron Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc –Wc+Wc –Wc +al –al Wi WcWi(…)Wc –Wc = +rule –rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1 Rule Refinement Operations J y M cayeron  J y M se cayeron Es una casa bonito  Es una casa bonita Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista I will help him fix the car – Ayudaréa él a arreglar el auto  Le ayudare a arreglar el auto Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc –Wc+Wc –Wc +al –al Wi WcWi(…)Wc –Wc = +rule –rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1 Rule Refinement Operations I would like to go – Me gustaria que ir  Me gustaria ir I will help him fix the car – Ayudaréa él a arreglar el auto  Le ayudare a arreglar el auto Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc–Wc+Wc–Wc +al –al Wi WcWi(…)Wc –Wc = +rule–rule +al–al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1 & 2 Rule Refinement Operations PP  PREP NP I am proud of you – Estoy orgullosa tu Estoy orgullosa de ti Interactive and Automatic Rule Refinement

Interactive Mode Implementation Automatic Rule Adaptation Proposed Work • Extra error information is required • More sentences need to be evaluated (and corrected) by users • Relevant Minimal Pairs (MP) • Focus 3: types of errors that require a reasonable amount of further user interaction and can be solved by available correction and error information. Interactive and Automatic Rule Refinement

Modify Add Delete Change W Order +Wc–Wc+Wc–Wc +al –al Wi WcWi(…)Wc –Wc  =+rule –rule +al –al=WiWiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 3 Rule Refinement Operations Wally plays the guitar – Wally juega la guitarra  Wally toca la guitarra I saw the woman – Vi la mujer  Vi a la mujer I see them – Veo los  Los veo Interactive and Automatic Rule Refinement

Towards Interactive and Automatic Refinement of Translation Rules