
Enhancing Patent Translation: EPO's Machine Translation Program Overview

The European Patent Office (EPO) has launched a program to provide machine translation (MT) of patents, abstracts and communications to and from English, under an agreement with its member states and prompted by the success of Japanese-to-English patent translation. The program adds three language pairs per year, starting with French, German and Spanish; Swedish, Dutch, Italian, Romanian and Greek are candidates for the following year. To improve translation quality, the EPO builds specific dictionaries in the OLIF format by aligning its patent corpora and extracting bilingual terms. The dictionary format supports conditional translations (for example, based on IPC class), and an OLIF editor developed at the EPO is used together with a concordancer to validate the extracted entries.


Presentation Transcript


  1. European Patent Office • European Machine Translation Programme • Wolfgang Täger • December 2006

  2. Programme Partners and Goals • Trigger: success of JP-EN patent translation • Agreement between the EPO and its member states • MT of patents, abstracts and communications to/from English • Three language pairs per year • First three languages: FR, DE, ES • Candidates for next year: Swedish, Dutch, Italian, Romanian, Greek

  3. MT engine • Trial with SMT system (Language Weaver) • Call for tender: winner WorldLingo (Systran) • Going public (esp@cenet): December 2006 • Needed: improve translation by specific dictionaries

  4. Dictionary format • Desiderata: open standard; XML/Unicode; support for the features of MT engines; support for conditional translations (e.g. based on IPC) • Not intended for terminology (no definitions; lexical rather than semantic focus) • OLIF format was chosen • How to get dictionaries? By bilingual term extraction!
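
A conditional translation can be pictured as a dictionary entry that carries several target terms, each optionally tied to an IPC class. The sketch below is only an illustration of that idea: the element and attribute names (`entry`, `transfer`, `ipc`) and the example entry are invented here and are not taken from the OLIF schema or from the EPO dictionaries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry layout. The element and attribute names are
# illustrative assumptions and do NOT follow the real OLIF schema.
ENTRY_XML = """
<entry lang="de" lemma="Feder" pos="noun">
  <transfer lang="en" ipc="F16F">spring</transfer>
  <transfer lang="en" ipc="B43K">nib</transfer>
  <transfer lang="en">feather</transfer>
</entry>
"""


def translate(entry, ipc_class):
    """Return the translation whose IPC condition matches the document's
    IPC class, falling back to the unconditional translation (if any)."""
    default = None
    for transfer in entry.findall("transfer"):
        condition = transfer.get("ipc")
        if condition is None:
            # No condition: remember it as the fallback.
            default = transfer.text
        elif ipc_class.startswith(condition):
            return transfer.text
    return default


entry = ET.fromstring(ENTRY_XML)
print(translate(entry, "F16F 1/02"))   # document classified under springs
print(translate(entry, "A01K 31/00"))  # no matching condition, uses fallback
```

Matching on the IPC prefix keeps the entry short: one condition on a subclass such as `F16F` covers every group below it.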

  5. Available corpora • 560,000 EP-B publications => claims in EN, DE, FR • 300,000 DE-T2 publications • 37,000 ES-B3/T3 publications • => Align corpora for term extraction, concordancing, translation memory (and SMT) • [Diagram: ES B3/T3 (LaTeX): DESC ES, CL ES; DE-T2: DESC DE, (CL DE); EP-B1: DESC EN or FR or DE, CL EN, CL FR, CL DE]


  7. Alignment & Extraction • Alignment: trial at EPO with internally developed software • External companies did not improve on this result during the call for tender.

  8. Alignment & Extraction • Call for tender for bilingual term extraction • Winner: DFKI • Alignment of corpora, POS tagging, Identification of terms • Pairing of terms using clues like co-occurrence score, string similarity, grammatical clues, position, available dictionaries, ... • Providing further information like gender, inflection, transitivity, countable, ...
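
Two of the pairing clues named on this slide, co-occurrence score and string similarity, can be combined into a single ranking score. The following is a minimal sketch under stated assumptions, not the method used by DFKI: the Dice coefficient as co-occurrence measure, `difflib` for string similarity, the weights, and the toy data are all choices made here for illustration.

```python
from difflib import SequenceMatcher

# Toy data (invented): candidate terms found in each aligned
# German/English segment pair.
ALIGNED = [
    ({"Ventil", "Gehäuse"}, {"valve", "housing"}),
    ({"Ventil", "Feder"}, {"valve", "spring"}),
    ({"Gehäuse", "Feder"}, {"housing", "spring"}),
]


def dice(src, tgt, aligned):
    """Dice coefficient: 2 * joint count / (source count + target count)."""
    n_src = sum(src in s for s, _ in aligned)
    n_tgt = sum(tgt in t for _, t in aligned)
    joint = sum(src in s and tgt in t for s, t in aligned)
    return 2 * joint / (n_src + n_tgt) if n_src + n_tgt else 0.0


def similarity(src, tgt):
    """Character-level similarity in [0, 1]; helps with cognates."""
    return SequenceMatcher(None, src.lower(), tgt.lower()).ratio()


def score(src, tgt, aligned, w_cooc=0.8, w_sim=0.2):
    """Weighted combination of both clues; the weights are arbitrary."""
    return w_cooc * dice(src, tgt, aligned) + w_sim * similarity(src, tgt)


def best_pairs(aligned):
    """Pick the highest-scoring target term for every source term."""
    sources = sorted(set().union(*(s for s, _ in aligned)))
    targets = sorted(set().union(*(t for _, t in aligned)))
    return {
        src: max(targets, key=lambda tgt: score(src, tgt, aligned))
        for src in sources
    }


print(best_pairs(ALIGNED))
```

The other clues on the slide (grammatical information, position, existing dictionaries) would enter as further weighted terms in `score`.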

  9. Validation & Concordancing • Development of OLIF editor at EPO • Remove noise • Correct entries • Use concordancer (provides statistics based on parallel corpora) • => DEMO
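
To illustrate what statistics from a parallel corpus can look like during validation, here is a minimal concordancer sketch. It is not the EPO tool: the sentence pairs are invented, and the naive substring matching is an assumption made to keep the example short.

```python
from collections import Counter

# Invented aligned sentence pairs (German, English).
PARALLEL = [
    ("Die Feder drückt das Ventil.", "The spring presses the valve."),
    ("Eine Feder hält den Kolben.", "A spring holds the piston."),
    ("Die Feder des Füllers ist aus Gold.", "The nib of the pen is made of gold."),
]


def concordance(source_term, candidates, parallel):
    """Return the aligned pairs whose source side contains source_term,
    plus a count of how often each candidate translation occurs in them."""
    hits = [(s, t) for s, t in parallel if source_term.lower() in s.lower()]
    counts = Counter()
    for _, target in hits:
        for cand in candidates:
            # Naive substring match; a real tool would tokenise and lemmatise.
            if cand.lower() in target.lower():
                counts[cand] += 1
    return hits, counts


hits, counts = concordance("Feder", ["spring", "nib", "feather"], PARALLEL)
for source, target in hits:
    print(source, "|", target)
print(counts.most_common())
```

A validator could use such counts to spot noisy entries: a proposed translation that never occurs alongside the source term is a candidate for removal.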

  10. OLIF format • Support of more languages • Clarification of inflection scheme • Clarification of term vs lex approach • Tools

  11. Relational database? • [Diagram of candidate entities: Transl, SemRel, Concept, Term, Naming, SurfForm, InflForm, Lemma, RegEx, LexType, Infl]

  12. Relational database? • [Diagram with example values: Transl, SemRel, „hot drink ...“, grüner Tee, Naming, grüner, Nom. Sg. str. f. pos., grün, -er, DE, Adj, iLike „klein“]
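
The slides only name the entities and give one example, so the relationships between them are not known. As a rough sketch of how a relational layout along these lines could work, the following uses four tables whose columns and foreign keys are assumptions made here, and reads "iLike" as "inflects like", which is also an assumption. Only the adjective from the example is stored, because the slide gives no details for the noun.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: table names echo the slide's entities, but the columns
# and the links between tables are guesses made for illustration.
conn.executescript("""
CREATE TABLE lemma (
    id        INTEGER PRIMARY KEY,
    form      TEXT NOT NULL,
    lang      TEXT NOT NULL,
    lex_type  TEXT NOT NULL,
    infl_like TEXT              -- model word with the same inflection
);
CREATE TABLE infl_form (
    id       INTEGER PRIMARY KEY,
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    surface  TEXT NOT NULL,
    features TEXT
);
CREATE TABLE term (
    id      INTEGER PRIMARY KEY,
    text    TEXT NOT NULL,
    concept TEXT
);
CREATE TABLE naming (
    term_id      INTEGER NOT NULL REFERENCES term(id),
    infl_form_id INTEGER NOT NULL REFERENCES infl_form(id),
    position     INTEGER NOT NULL   -- word position inside the term
);
""")

# Values taken from the slide's example.
conn.execute("INSERT INTO lemma VALUES (1, 'grün', 'DE', 'Adj', 'klein')")
conn.execute("INSERT INTO infl_form VALUES (1, 1, 'grüner', 'Nom. Sg. str. f. pos.')")
conn.execute("INSERT INTO term VALUES (1, 'grüner Tee', 'hot drink ...')")
conn.execute("INSERT INTO naming VALUES (1, 1, 0)")


def lemmas_of(term_text):
    """Resolve a term to the lemmas of its stored word forms, in order."""
    rows = conn.execute(
        """
        SELECT l.form, l.lex_type
        FROM term t
        JOIN naming n    ON n.term_id = t.id
        JOIN infl_form f ON f.id = n.infl_form_id
        JOIN lemma l     ON l.id = f.lemma_id
        WHERE t.text = ?
        ORDER BY n.position
        """,
        (term_text,),
    )
    return rows.fetchall()


print(lemmas_of("grüner Tee"))
```

Storing the inflected form separately from the lemma lets one lemma entry serve every term in which the word appears.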

  13. End • Thank you!
