
Automatic translation verification: can we rely on MOSES? 7-8 June 2018 – Paris

This seminar discusses the use of machine translation verification in large-scale assessments such as the SHARE survey. It explores how tools like MOSES can improve translation accuracy and data quality while reducing costs. The seminar also presents an experiment using back-translation and evaluation metrics such as the BLEU score and cosine similarity.


Presentation Transcript


  1. OECD-GESIS Seminar on Translating and Adapting Instruments in Large Scale Assessments. Automatic translation verification: can we rely on MOSES? 7-8 June 2018 – Paris. Yuri Pettinicchi, Jeny Tony Philip

  2. Motivation • Why do we need machines to verify translations? • General aim: • Avoiding avoidable mistakes • Improving data quality • Specific to SHARE: • SHARE is a cross-national survey • 28 national CAPIs • 40 languages • Translation verification: • National CAPI tools should correctly display the translated text • Green light to go into the field

  3. Background • Workflow • Workload • Volume • Time • Costs • Possible issues

  4. Workflow • Area Coordinators • Qnn • TMT (CentERdata) • QxQ • CTs (translators) • National CAPI

  5. Workload - 1 • Multiple fills in the translation • Dynamic fills with hidden text • Gender-specific fills (sketched below) • Numbered answer options
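An illustrative sketch (not SHARE's actual CAPI code) of why fills add to the verification workload: the displayed question text is assembled at runtime, so every variant must exist, and be checked, in every target language. The function and wording below are hypothetical.

```python
# Hypothetical sketch of a gender-specific "fill": the questionnaire
# text is assembled at runtime, so each gendered variant must be
# translated and verified separately in every language.
def fill_question(respondent_gender: str) -> str:
    pronoun = {"male": "he", "female": "she"}[respondent_gender]
    return f"Does {pronoun} receive a pension?"

print(fill_question("female"))   # "Does she receive a pension?"
```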

  6. Workload - 2 • Volume • Time: 1 year • Costs: 12 person-months (PM) per language (480 PM in total)

  7. Issues • Misspelling a word (minor) • Empty fields / missing sentences (a full sentence vs. part of a sentence) • Flipped translations (major) • Negative effects on questionnaire routing

  8. Automatic translation verification • Ideal environment: • High volume of data • Several languages • Available in the translators' working environment • Expected results: • Improving the quality of the translation • Reducing testing time

  9. Experiment design • Automatic back-translation • Automatically translate the French version back to English using web interfaces provided by third parties • Compute the BLEU index • Calculate a BLEU score per item using Python's NLTK library for symbolic and statistical natural language processing • The Bilingual Evaluation Understudy (BLEU) score evaluates a generated sentence against a reference sentence • A perfect match produces a score of 1.0, whereas a complete mismatch produces a score of 0.0 • Compute the similarity index • Calculate a (cosine) similarity score per item using SciPy, an open-source Python library for mathematics, science and engineering • Cosine similarity is computed by representing the texts as vectors (vectorization) and then comparing the angle between the vectors • Check flagged items with the help of human translators (both metrics are sketched below)
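A minimal sketch of how the two per-item checks could be computed with NLTK and SciPy, the libraries named on the slide. The example sentences, the lowercasing, the smoothing function, and the count-based vectorization are illustrative assumptions; the slides do not show the actual code.

```python
# Hypothetical sketch: per-item BLEU and cosine similarity between an
# original English item and its automatic back-translation.
from collections import Counter
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.spatial.distance import cosine

original = "How often do you meet your children?"
back_translated = "How frequently do you see your children?"

# --- BLEU: 1.0 for a perfect match, approaching 0.0 for a mismatch ---
reference = [original.lower().split()]       # NLTK expects a list of references
candidate = back_translated.lower().split()
smooth = SmoothingFunction().method1         # avoids zero scores on short items
bleu = sentence_bleu(reference, candidate, smoothing_function=smooth)

# --- Cosine similarity: vectorize both sentences as term-count vectors ---
vocab = sorted(set(reference[0]) | set(candidate))
ref_counts, cand_counts = Counter(reference[0]), Counter(candidate)
ref_vec = [ref_counts[w] for w in vocab]
cand_vec = [cand_counts[w] for w in vocab]
similarity = 1.0 - cosine(ref_vec, cand_vec)  # SciPy returns cosine *distance*

print(f"BLEU: {bleu:.3f}  cosine similarity: {similarity:.3f}")

# Items whose scores fall below a chosen threshold would be flagged
# for review by human translators.
```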

  10. Back-translation • Machine translation: • Trained on parallel corpora

  11. The Moses Experience - 1 • Statistical machine translation system (SMT) • Trained on parallel corpora • News-commentary • Europarl v7.0 • UN Corpus • Open source software: http://www.statmt.org/moses/
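The slide above names the Moses decoder; below is a minimal sketch, assuming a locally trained model, of how it could be driven from Python to produce a back-translation. The binary and model paths and the French example are hypothetical; reading tokenized input on stdin via `moses -f moses.ini` is the standard decoder invocation.

```python
# Hypothetical sketch of back-translating a French item with a locally
# trained Moses model. Paths are assumptions; the decoder reads text on
# stdin and writes the translation to stdout.
import subprocess

def back_translate(french_sentence: str) -> str:
    result = subprocess.run(
        ["/opt/moses/bin/moses", "-f", "/opt/moses/model/moses.ini"],
        input=french_sentence.lower(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

print(back_translate("à quelle fréquence voyez-vous vos enfants ?"))
```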

  12. The Moses Experience - 2 • MOSES: statistical machine translation system • Training • Performance: BLEU (Bilingual Evaluation Understudy) score of 22.95
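The 22.95 reported above is a corpus-level BLEU, conventionally quoted on a 0-100 scale. A minimal sketch of how such a score differs from the per-item BLEU, using NLTK's corpus_bleu; the two sentence pairs are made-up stand-ins, not SHARE test data.

```python
# Hypothetical sketch: a corpus-level BLEU aggregates n-gram statistics
# over all test sentences rather than averaging per-sentence scores.
from nltk.translate.bleu_score import corpus_bleu

references = [
    [["the", "economy", "is", "growing"]],       # one reference per sentence
    [["parliament", "adopted", "the", "report"]],
]
hypotheses = [
    ["the", "economy", "grows"],
    ["parliament", "adopted", "the", "report"],
]

score = corpus_bleu(references, hypotheses)      # NLTK reports on a 0-1 scale
print(f"corpus BLEU: {score * 100:.2f}")         # x100 to match the 22.95 convention
```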

  13. The SHARE questionnaire

  14. Evaluations

  15. Results – BLEU Score

  16. Results – Similarity Score

  17. Flagged

  18. Conclusions • Lessons learnt: • "In-house" solutions require a heavy investment of time, manpower and IT infrastructure. • Market solutions are more user-friendly but are limited by their "one size fits all" design. • Overall, market solutions are more effective. • Scope for collaboration across cross-national surveys: • Compilation of a domain-specific parallel corpus. • Joint effort to build a machine translation tool.

  19. Questions?

  20. Translation verification in SHARE

  21. Overview of the third-party technology • What was used: • Web interfaces • SciPy • NLTK • Word2vec • Measurements taken: • Cosine similarity (or similarity) • BLEU score VER 1 • Setting and performance details (a Word2vec-based variant of the similarity check is sketched below)
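Since the slide lists Word2vec alongside the cosine similarity measurement, here is a hedged sketch of a word-vector variant of the similarity check: average the word embeddings of each sentence and compare the means with cosine similarity. gensim (the usual Word2vec library, rather than NLTK itself), the toy training corpus, and the mean-pooling are all assumptions; the slides do not specify how Word2vec was applied.

```python
# Hypothetical sketch: sentence similarity via averaged Word2vec vectors.
import numpy as np
from gensim.models import Word2Vec
from scipy.spatial.distance import cosine

# Toy stand-in corpus; a real run would train on a large text collection.
corpus = [
    "how often do you meet your children".split(),
    "how frequently do you see your children".split(),
]
model = Word2Vec(corpus, vector_size=50, min_count=1, seed=1)

def sentence_vector(tokens):
    # Mean of the word vectors; unknown words are simply skipped.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

v1 = sentence_vector(corpus[0])
v2 = sentence_vector(corpus[1])
print(f"cosine similarity: {1.0 - cosine(v1, v2):.3f}")
```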
