
First and Second Language Models to Correct Preposition Errors

First and Second Language Models to Correct Preposition Errors. Matthieu Hermet, Alain Désilets National Research Council of Canada. Preposition Errors. A good case study : High error rate More than 17% of errors in our dataset


Presentation Transcript


  1. First and Second Language Models to Correct Preposition Errors Matthieu Hermet, Alain Désilets National Research Council of Canada

  2. Preposition Errors • A good case study: • High error rate • More than 17% of errors in our dataset • Instance of function-word errors, correctable using corpus-based methods • Instance of interference errors

  3. Preposition Errors • 2 major causes: • Confusion with a preposition of the same semantic class …à la conférence NAACL …at the NAACL conference …in the NAACL conference • Interference with L1 Écouter les intervenants Listen to the speakers Listen the speakers

  4. Approaches • Rule-based: • Mal-rules: cost of manual creation • Syntactic constraint relaxation: parser-dependent • Corpus-based: • Language models: low coverage • Web as a corpus: better coverage • Still not enough: less than 40% of our data set

  5. Approach • Interference errors may be hard to address properly through corpus-based methods • They represent a model of L2 correctness → To deal with interference errors, it may be advantageous to use a model which takes L1 into account

  6. Roundtrip MT • Carry out a single round-trip translation at the level of a clause or sentence • Use a phrase-based translation system → Google Translate

  7. Roundtrip MT • L1 (en): “Police arrived at the scene of the crime” • L2 (fr): “Les policiers sont arrivés à la scène de la crime.” → Send to phrase-based translation system • To L1: Policemen arrived at the crime scene • Back to L2: Les policiers sont arrivés sur les lieux du crime

  8. Theory Les policiers sont arrivés à la scène du crime

  9. Drawback • The round-trip translated sentence can show • A wrong translation N’hésitez pas de me contacter → s’il vous plait contactez moi • A correct translation that uses the wrong preposition J’ai de la difficulté de formuler des phrases → je trouve difficile de formuler des phrases • A wrong translation that uses the correct preposition […] demandé à mon amie pour le corriger → […] demandé à mon amie de le fixer

  10. Assessment • Correctness can take at least two forms: • Correct translation • Wrong translation but correct preposition • Two strategies for evaluation: • Clause: the roundtrip translation is a good correction, including the preposition • Prep: only the preposition is correct in the roundtrip translation

  11. Assessment • In the Clause strategy, the RT translation is sent back as the correction • In the Prep strategy, we need a procedure to retrieve the preposition from the incorrect translation → Only the preposition is sent back as the correction

  12. Prep • Greedy mining method to retrieve the preposition from the translation • Être proche à lui → être près de lui • The sequences <prep à> lui == <prep de> lui validate the preposition de as a correction
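One way to read the matching step above is sketched below. The slides do not spell out the mining procedure, so this is an assumption: it scans the translation left to right and accepts the first preposition followed by the same word that followed the erroneous preposition in the input. The `PREPOSITIONS` set and the function name are illustrative.

```python
# Illustrative preposition inventory; not the authors' list.
PREPOSITIONS = {"à", "de", "sur", "avec", "par", "pour", "dans",
                "en", "chez", "sous", "vers", "avant", "après"}


def mine_preposition(original, prep_index, translation,
                     prepositions=PREPOSITIONS):
    """Return the first preposition in `translation` that is followed by the
    same word as the erroneous preposition in `original`, or None."""
    if prep_index + 1 >= len(original):
        return None  # no right-hand context to match on
    right_context = original[prep_index + 1].lower()
    for i in range(len(translation) - 1):
        word, following = translation[i].lower(), translation[i + 1].lower()
        if word in prepositions and following == right_context:
            return word
    return None


original = ["être", "proche", "à", "lui"]
translation = ["être", "près", "de", "lui"]
suggestion = mine_preposition(original, 2, translation)
```

Matching only on the right-hand word is the simplest possible criterion; a fuller version might also compare the left context or tolerate inflected forms.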

  13. Unilingual • An instance of a corpus-based approach • Web as a probabilistic language model • Strength of an utterance measured in number of search hits • In practice, the Web’s coverage is incomplete • Impossible to discriminate when zero hits are returned for all alternatives → Syntactic pruning to maximize chances of hits

  14. Pruning 1 • Sentence is parsed and reduced to a phrasal minimum around the preposition • S → VP or NP (or AP) I have lived in a small town all my life → lived in a small town I’ll get a chance to meet people → a chance to meet • Words are lemmatized • Verbs to infinitive • Nouns to singular

  15. Pruning 2 • Suppress unnecessary words • Adj, when attributive: To live in a small town → To live in a town This is easy to understand → easy to understand • Adv, in all cases Call immediately for help → call for help • NP or PP Une fenêtre qui permet au soleil d’entrer → … qui permet d’entrer … au soleil d’entrer
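The adjective and adverb rules above can be illustrated on pre-tagged input. This is a sketch under assumptions: the tag names (`ADJ_ATTR`, `ADJ_PRED`, `ADV`, and so on) are hypothetical, a real system would take them from a parser, and the NP/PP splitting rule is not covered here.

```python
# Hypothetical tags: attributive adjectives and all adverbs are dropped,
# predicative adjectives (as in "easy to understand") are kept.
DROPPED_TAGS = {"ADJ_ATTR", "ADV"}


def suppress_words(tagged):
    """Keep every (word, tag) pair whose tag is not marked as droppable."""
    return [word for word, tag in tagged if tag not in DROPPED_TAGS]


small_town = [("to", "PART"), ("live", "VERB"), ("in", "PREP"),
              ("a", "DET"), ("small", "ADJ_ATTR"), ("town", "NOUN")]
call_help = [("call", "VERB"), ("immediately", "ADV"),
             ("for", "PREP"), ("help", "NOUN")]
easy = [("easy", "ADJ_PRED"), ("to", "PREP"), ("understand", "VERB")]

pruned_town = suppress_words(small_town)
pruned_help = suppress_words(call_help)
pruned_easy = suppress_words(easy)
```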

  16. Alternate prepositions • Once pruned, replace the erroneous preposition by alternates • Most common prepositions • De, sur, avec, par, pour, à • Prepositions of the same semantic class • Localization, temporal, cause, goal, manner, material, possession • 1 input sentence = as many sentences as there are alternate prepositions
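Generating one candidate phrase per alternate preposition might look like the sketch below. The most-common list comes from the slide; the semantic-class membership is a placeholder invented for illustration, since the authors' category table is not reproduced in this transcript. French elision (de → d') is deliberately not handled.

```python
MOST_COMMON = ["de", "sur", "avec", "par", "pour", "à"]  # from the slide

# Placeholder membership for illustration only; not the authors' table.
SEMANTIC_CLASSES = {
    "localization": ["à", "dans", "chez", "sur", "sous"],
    "temporal": ["à", "avant", "après"],
}


def alternates(prep):
    """Same-class prepositions first, then the most common ones,
    deduplicated and excluding the erroneous preposition itself."""
    same_class = [p for members in SEMANTIC_CLASSES.values()
                  if prep in members for p in members]
    seen = {prep}
    ordered = []
    for p in same_class + MOST_COMMON:
        if p not in seen:
            seen.add(p)
            ordered.append(p)
    return ordered


def candidate_phrases(tokens, prep_index):
    """One phrase per alternate, with the preposition slot swapped."""
    return [" ".join(tokens[:prep_index] + [alt] + tokens[prep_index + 1:])
            for alt in alternates(tokens[prep_index])]


phrases = candidate_phrases(["permettre", "à", "entrer"], 1)
```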

  17. Preposition Categories

  18. Unilingual • Input Sentence Il y a une grande fenêtre qui permet au soleil <à> entrer (there is a large window which lets the sun come in) • Syntactic Pruning and Lemmatization permettre <à> entrer (let come in) + au soleil <à> entrer (the sun come in) • Generation of alternate prepositions • semantically related: dans, en, chez, sur, sous, au, dans, après, avant, en, vers • most common: de, avec, par, pour • Query and sort alternative phrases permettre d'entrer: 119 000 hits au soleil d’entrer: 397 hits permettre avant entrer: 12 hits au soleil avant entrer: 0 hits permettre à entrer: 4 hits … permettre en entrer: 2 hits ... • → preposition <d'> is returned as correction
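The query-and-sort step in this example can be sketched as a ranking over hit counts. This is an illustration, not the authors' code: `hit_count` is a hypothetical callable standing in for a web search, and `SLIDE_HITS` simply replays the counts shown on the slide.

```python
def best_preposition(candidates, hit_count):
    """candidates: list of (preposition, phrase) pairs.
    Return the preposition of the phrase with the most hits,
    or None when there are no candidates or every count is zero."""
    scored = [(hit_count(phrase), prep) for prep, phrase in candidates]
    if not scored:
        return None
    hits, prep = max(scored, key=lambda pair: pair[0])
    return prep if hits > 0 else None


# Hit counts copied from the slide, standing in for live search queries.
SLIDE_HITS = {
    "permettre d'entrer": 119000,
    "au soleil d'entrer": 397,
    "permettre avant entrer": 12,
    "au soleil avant entrer": 0,
    "permettre à entrer": 4,
    "permettre en entrer": 2,
}

candidates = [
    ("d'", "permettre d'entrer"), ("d'", "au soleil d'entrer"),
    ("avant", "permettre avant entrer"), ("avant", "au soleil avant entrer"),
    ("à", "permettre à entrer"), ("en", "permettre en entrer"),
]
best = best_preposition(candidates, lambda phrase: SLIDE_HITS.get(phrase, 0))
```

Returning None for the all-zero case makes the "impossible to discriminate" situation explicit, so a caller can decide what to do next.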

  19. Results • Dataset: 133 sentences extracted from intermediate-advanced FSL productions • Unilingual returns hits in only ~85% of cases • Impact of L1 on L2 inputs • Incompleteness of the Web as a language model

  20. Hybrid • Agreement between the two strategies is only 65.4% • A third strategy to combine the two models • MT as a model of controlled incorrectness (here, anglicisms) • Web as a model of correctness

  21. Hybrid • Triggered when the unilingual approach does not give any hits → Then send to roundtrip MT (Prep) • Yields results of 82%
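The fallback described on this slide amounts to a simple dispatch, sketched below. Both correctors are hypothetical callables passed in by the caller; the sketch assumes the unilingual one returns None when no alternative gets any hits.

```python
from typing import Callable, Optional

# Hypothetical interface: takes a sentence, returns a preposition or None.
Corrector = Callable[[str], Optional[str]]


def hybrid(sentence: str, unilingual: Corrector,
           roundtrip_prep: Corrector) -> Optional[str]:
    """Use the unilingual (Web hit count) suggestion when there is one;
    otherwise fall back to round-trip MT with the Prep strategy."""
    suggestion = unilingual(sentence)
    if suggestion is not None:
        return suggestion
    return roundtrip_prep(sentence)


# Toy correctors: the unilingual one finds nothing, so MT answers.
fallback_result = hybrid("être proche à lui",
                         unilingual=lambda s: None,
                         roundtrip_prep=lambda s: "de")
```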

  22. Conclusion and Future Work • Unilingual and roundtrip MT are equivalent • Hybrid approach seems relevant due to the different paradigms of the two approaches • More data • Enhance pruning • Study in the context of error detection • Extend MT approach to other error classes
