
A Brief History



Presentation Transcript


  1. Using Machine Translation for fast, inexpensive, and accurate health information assimilation and dissemination: Experiences at the Pan American Health Organization. Julia Aymerich. ICLM-9, September 2005

  2. A Brief History
     • 1976: First contract, Spanish-English
     • 1979: In-house development began
     • 1980: Spanish-English on mainframe
     • 1985: English-Spanish on mainframe
     • 1992: PC version on PAHO LAN
     • 2000: 32-bit Windows version
     • 2003: Two new language pairs: Portuguese-English, English-Portuguese
     • 2004-5: Two new language pairs: Spanish-Portuguese, Portuguese-Spanish

  3. Users of PAHO’s Machine Translation Software
     • PAHO HQ Translation Services: Unit Chief, Spanish translator, English translator, Computational linguists (2), Office assistants (3), Dictionary coders (2 contractors)
     • Licensees: Ministries of Health, NGOs, International Organizations, Universities, Consulting firms, Private industry, Translators, Physicians
     • Field Offices: U.S.-Mexico Border; Regional Centers (Nutrition, Environment, Food Safety, Perinatology); PWR Offices
     • Subject Area Specialists
     • Administrative and Support Staff
     • Contract translators

  4. PAHO Translation Services
     • 8 staff members
     • Two in-house contractors (linguists)
     • Over 100 freelance contractors (translators)
     • MT usage: over 90% for documents into Spanish, English, and Portuguese
     • Productivity gain: 50%-100%
     • Cost reduction: approx. 33%

  5. Direct Cost Savings at PAHO HQ
     • Professional translators: 70% of Human Translation rate
     • No cost for PAHO HQ LAN users

  6. Survey of Translation Costs, 1998

  7. Keys to Success at PAHO
     • Appropriate input texts
     • Conducive institutional setting
     • Robust software
     • Years of hard work

  8. Input Texts
     • Large volume in three languages
         • 4 million words per year
     • Predictable subject areas
         • NOT JUST MEDICINE!
     • Some style and format guidelines
         • No controlled input
         • No pre-editing

  9. Input Texts (cont.)

  10. Institutional Setting
     • Management commitment
     • Translator involvement
     • Staff support

  11. Robust Software
     • Developed and maintained by PAHO’s language professionals
     • Designed to handle standard syntax
     • Fully customizable dictionaries
     • Used in a broad domain

  12. Hard Work
     • Translators provide feedback
     • Linguists update dictionaries with new terminology and translation rules
     • Computational linguists improve the algorithms for analysis, transfer, and synthesis

  13. Types of installations
     • Standalone
     • Local Area Network (5, 10+ users)
     • Intranet server
     • May be licensed by non-profit institutions
     Language combinations
     • English-Spanish
     • Spanish-English
     • English-Portuguese
     • Portuguese-English
     • Spanish-Portuguese
     • Portuguese-Spanish

  14. Some Features of PAHO MTS
     • All formatting is preserved
     • Spanish/English/Portuguese user interface
     • Fully graphical translation interface, invoked from the desktop, the MS Word toolbar, or the WordPerfect (8.0+) toolbar
     • Batch translation
     • Postediting macros for MS Word
     • ActiveX Automation objects for translation of short segments
     • On-line context-sensitive trilingual help
     • Translation speed
         • Minimum of 8,000 words/minute
         • In batch mode: 30,000 words/minute

  15. Features of PAHO MTS (cont.)
     • Dictionary Merge Utility
         • Used by clients to combine their entries with PAHO master entries
         • Used at PAHO HQ when users provide us with copies of their dictionaries
     • Import Utility for Bilingual Glossaries
         • World Bank glossaries (15,000+ entries)
         • International Classification of Diseases
         • INN: official non-proprietary or generic names of pharmaceutical substances
     • Dictionary Browse Utility
         • Accessible from MS Word
         • Copy/paste capabilities
     • Export Utility
         • To create lists of words added by users or small dictionaries that can be later merged
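The slides do not describe how the Dictionary Merge Utility resolves entries that exist in both the client dictionary and the PAHO master dictionary. The following is only a minimal sketch of the general idea, assuming a dictionary is a plain mapping from source term to target term and that the caller chooses which side wins on a conflict. The function name `merge_dictionaries` and the `client_wins` flag are hypothetical and are not part of PAHOMTS.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification: source term -> target term.
# Real MT dictionary entries carry much more (part of speech, rules, etc.).
Dictionary = Dict[str, str]


def merge_dictionaries(
    master: Dictionary, client: Dictionary, client_wins: bool = True
) -> Tuple[Dictionary, List[str]]:
    """Combine client entries with master entries.

    Returns the merged dictionary and the sorted list of terms whose
    translations differed, so a linguist can review them.
    """
    merged = dict(master)  # start from a copy of the master entries
    conflicts: List[str] = []
    for term, translation in client.items():
        if term in master and master[term] != translation:
            conflicts.append(term)
            if not client_wins:
                continue  # keep the master translation
        merged[term] = translation
    return merged, sorted(conflicts)


master = {"salud": "health", "brote": "outbreak"}
client = {"brote": "flare-up", "vacuna": "vaccine"}
merged, conflicts = merge_dictionaries(master, client)
print(merged)
print(conflicts)
```

Returning the conflicting terms separately reflects the workflow on the slides, where linguists check entries before they enter the dictionaries; whether the real utility reports conflicts this way is an assumption.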

  16. Recent Developments
     • Compatible with Translation Memory software
         • Files pretranslated with TRADOS can be processed by PAHOMTS without modifications
         • Macro to activate ActiveX object when processing a file with WordFast
     • Spanish-Portuguese released in March 2004
     • Portuguese-Spanish released in June 2005
     • Recently acquired MultiTrans to align a corpus of over 60 million words and automatically extract terminology

  17. Machine Translation processing
     • Several options can be selected at run-time:
         • Type of grammar
             • Abstract, letter, manual, report, resolution, survey, speech, post description, news article, summary record
             • One type of grammar only
         • Specialized vocabulary
             • Medical research, financial, environment, equipment, agriculture, patient education, legal, computer science, United Nations, radiation, pharmaceutical, statistics, European variety
             • Several microglossaries may be activated, in order of priority
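Activating several microglossaries "in order of priority" can be pictured as a lookup that tries each active microglossary in turn before falling back to the general dictionary. The sketch below illustrates that idea only; it is not PAHOMTS code. The function `lookup`, the sample glossaries, and the fallback to a general dictionary are assumptions made for the example.

```python
from typing import Dict, List, Optional


def lookup(
    term: str,
    microglossaries: List[Dict[str, str]],
    general: Dict[str, str],
) -> Optional[str]:
    """Return the translation from the first microglossary that has the term.

    `microglossaries` is ordered from highest to lowest priority.
    Falls back to the general dictionary, then to None (term not found).
    """
    for glossary in microglossaries:
        if term in glossary:
            return glossary[term]
    return general.get(term)


# Hypothetical English-Spanish sample entries.
general = {"discharge": "descarga"}
medical = {"discharge": "alta"}
environment = {"discharge": "vertido"}

print(lookup("discharge", [medical, environment], general))
print(lookup("discharge", [environment, medical], general))
print(lookup("discharge", [], general))
```

The same source term gets a different translation depending on which microglossary is listed first, which is why the order chosen at run-time matters.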

  18. Information Assimilation
     • Fast and inexpensive method for gisting, but with much higher quality than free web-based translators
     • Appropriate options are set at run-time
     • Used for translating segments, web pages, full documents
     • No need for postediting
     • Sample uses: Hospital in Panamá, NorthCom, NLM, Supercourse lectures, health professionals scanning the web

  19. Information Dissemination
     • MT produces a draft: it always needs to be postedited!
     • Output must be postedited by:
         • Professional (distribution, publication)
         • Native speaker of target language (internal distribution)
     • Publication quality translations or light postediting
     • Appropriate options are set at run-time
     • Used for translating segments, web pages or full documents
     • Benefits
         • Speed
         • Lower costs
         • Terminological consistency
     • Sample uses: CEPIS, translation unit, USP

  20. Postediting
     • Done directly on the RAW file
         • Using the side-by-side (SBS) file as a reference
         • Freelancers don’t have access to PAHOMTS, only to the output files
     • Making use of the MS Word postediting macros
         • General macros: change display, lower/uppercase, switch words to the right or to the left, delete words, look up in PAHOMTS dictionary
         • English macros: delete definite article, create possessive, delete and switch, create NN compound, undo NN compound, serial comma
         • Spanish/Portuguese macros: singular, plural, masculine, feminine, feminine plural, masculine plural, delete definite article, -mente adverb, add diacritic

  21. Bilingual corpus
     • Source and target documents preserved since 1990
     • Aligned using MultiTrans
     • Semi-automatic terminology extraction
     • Bilingual lists checked by linguist and imported into PAHOMTS dictionaries
     • Aligned corpus will be made available on-line to freelance posteditors

  22. MT enhancements
     • Daily
         • Lists of not-found words
         • Lexical issues pointed out by translators (SBS file)
         • SBS files examined by computational linguists to improve parsing/synthesis
     • Occasionally
         • Incorporation of bilingual glossaries using the Import or Merge utilities
         • Research into specific linguistic issues, always using bilingual corpus
         • Automatic terminology extraction from aligned corpus
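The daily "not-found words" list can be thought of as the set of source-text words that have no dictionary entry, ranked by how often they occur. The slides do not say how PAHOMTS builds this list, so the sketch below is an illustration under simplifying assumptions: words are matched by lowercase surface form with no morphological analysis, and `not_found_words` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import List, Set, Tuple


def not_found_words(text: str, dictionary: Set[str]) -> List[Tuple[str, int]]:
    """List words missing from the dictionary, most frequent first.

    Assumption: plain lowercase surface-form matching. A real MT system
    would analyse inflected forms before declaring a word not found.
    """
    # Runs of letters only (Unicode-aware), so accented words stay whole.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    missing = Counter(token for token in tokens if token not in dictionary)
    return missing.most_common()


known = {"el", "y", "grave"}
sample = "El dengue y el chikungunya: dengue grave."
print(not_found_words(sample, known))
```

Ranking by frequency is a design choice for the example: it puts the terms that affect the most text at the top of the list a linguist would review.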
