
Machine Translation II


Presentation Transcript


  1. Machine Translation II: How MT works; Modes of use

  2. How MT works
     • Distinguish between generic “translation software” (algorithms) and language-pair-specific linguistic data
     • Software engineers ~ linguists
     • Idea (from computer science) of modularity
        • Break down problem into manageable subproblems, essentially independent though linked to each other
        • Modules usually linguistically motivated
     • Linguistic formalisms for lexicons and grammars
        • May be more or less like formal linguistic theories
        • Usually “less”!

  3. Modularity
     Possible sequence of modules (fictitious):
     • Text normalisation
     • Dictionary lookup
     • Morphological analysis
     • Syntactic parse
     • Attachment disambiguation
     • Semantic roles
     • TL lexical choice
     • TL syntax
     • TL morphology
     • Text reconstitution
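The idea of chaining independent modules can be sketched in a few lines of Python. This is only a toy illustration, not how any real MT system is built: the three modules, the `LEXICON` contents and the `run_pipeline` helper are all assumptions made up for this example, and most of the modules listed on the slide (morphology, parsing, disambiguation) are left out.

```python
from typing import Callable, Dict, List

# A "module" takes the shared state and returns an updated copy of it.
State = Dict[str, object]
Module = Callable[[State], State]

# Hypothetical toy French-to-English lexicon (the language-pair-specific data).
LEXICON = {"le": "the", "chat": "cat", "dort": "sleeps"}


def normalise(state: State) -> State:
    """Text normalisation: lower-case, strip end punctuation, split into tokens."""
    text = str(state["source"]).lower().strip(" .!?")
    return {**state, "tokens": text.split()}


def lookup(state: State) -> State:
    """Dictionary lookup: unknown words pass through unchanged."""
    tokens = state["tokens"]
    return {**state, "tl_tokens": [LEXICON.get(t, t) for t in tokens]}


def reconstitute(state: State) -> State:
    """Text reconstitution: join the words, capitalise, add a full stop."""
    sentence = " ".join(state["tl_tokens"])
    return {**state, "target": sentence[:1].upper() + sentence[1:] + "."}


def run_pipeline(source: str, modules: List[Module]) -> str:
    """Run each module in order (generic software, independent of the data)."""
    state: State = {"source": source}
    for module in modules:
        state = module(state)
    return str(state["target"])


PIPELINE = [normalise, lookup, reconstitute]
print(run_pipeline("Le chat dort.", PIPELINE))  # The cat sleeps.
```

Because each module only reads and extends the shared state, a module can be replaced or a new one inserted without touching the others, which is the point of the modular design.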

  4. The “Vauquois triangle” (diagram): analysis leads up from the source text, generation leads down to the target text; depth of analysis increases from direct translation, through transfer, to interlingua

  5. The “Vauquois triangle” (diagram, annotated)
     • Direct translation: word-for-word
     • Transfer: some syntactic awareness
     • Interlingua: full meaning representation

  6. Modes of use (diagram; labels: fully automatic, high quality, unrestricted texts, restricted input, interactive, low quality, impractical)

  7. Different scenarios for MT
     Assimilation
     • many SLs, one TL
     • any style
     • any topic
     • partial analysis
     • post-editing
     • user is reader
     Dissemination
     • one SL, many TLs
     • controlled style
     • single topic
     • full analysis
     • no post-editing
     • user is author

  8. Restricted input
     • Restrictions may be natural (sublanguage) or imposed (controlled language)
     • Related terms: special language, jargon, register, LSP
     • For human: (usually) more readable, less ambiguous, more “focussed”
     • For MT:
        • fewer syntactic constructions
        • closed vocabulary with fewer homonyms
        • greater certainty about interpretation

  9. Features of sublanguage
     • Lexicon
        • smaller size: fewer concepts to cover
        • finite/closed: innovation is controlled
        • nature: less homonymy, some synonyms (dis)favoured
        • grammatical use: fewer category ambiguities
     • Syntax
        • reduced range of structures
        • some structures (dis)favoured
        • less flexibility in choice of structure
        • some deviance from “standard” grammar

  10. Controlled languages
     • Widely used in technical authoring
     • Promotes consistency and readability
     • Similar features to sublanguage
     • Can be coupled with grammar checker
     • Permits “multilingual authoring”
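A controlled-language checker can be very simple in principle. The sketch below is a hypothetical illustration only: the approved word list, the six-word length limit and the `check_sentence` function are assumptions invented for this example, and a real checker would also test grammatical constructions, not just vocabulary and length.

```python
import re
from typing import List, Set

# Hypothetical approved vocabulary and length limit, for illustration only.
APPROVED: Set[str] = {"press", "the", "red", "button", "close", "cover", "open"}
MAX_WORDS = 6


def check_sentence(sentence: str,
                   approved: Set[str] = APPROVED,
                   max_words: int = MAX_WORDS) -> List[str]:
    """Return a list of problems; an empty list means the sentence conforms."""
    words = re.findall(r"[a-z]+", sentence.lower())
    # Closed vocabulary: every word must be on the approved list.
    problems = [f"unapproved word: {w}" for w in words if w not in approved]
    # Assumed style rule: keep sentences short.
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (limit {max_words})")
    return problems


print(check_sentence("Press the red button."))    # []
print(check_sentence("Depress the red button."))  # ['unapproved word: depress']
```

An author who gets immediate feedback of this kind produces text with the closed vocabulary and reduced range of structures that make later MT more reliable.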

  11. Use of low-quality output
     • To get a rough idea of content, and to identify which parts need to be translated “properly”
     • … especially with “exotic” languages
     • Widely used on the Internet for browsing, chat-rooms and email
     • Despite low quality, users seem satisfied
     • Task is especially difficult due to odd grammar, spelling, punctuation (GIGO), and wide variety of subject matter, often mixed
     • Most MT systems now customized for web-page translation (take HTML mark-up into account)

  12. Interactive translation
     • Tools for translators
     • “Translator’s workstation”
     • Humans and computers cooperate
     • Which takes the initiative?
        • MAHT (machine-aided human translation): human translation using translation tools
        • HAMT (human-aided machine translation): MT with human assistance

  13. Machine-readable version of dictionary for human users

  14. Pre-translation: terminology look-up

  15. Translation Memory
     • Database of previous translations
     • More or less sophisticated matching algorithm (“fuzzy match”, simple pattern-matching which may incorporate linguistic “knowledge”)
     • But user must decide what to do with the matches
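A “fuzzy match” of the simple pattern-matching kind can be sketched with Python's standard `difflib` module. This is a minimal illustration under stated assumptions, not the algorithm of any particular translation-memory product: the `MEMORY` contents, the `fuzzy_matches` function and the 0.7 threshold are all hypothetical, and the similarity score is plain character-level matching with no linguistic knowledge.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical translation memory: (source segment, stored translation).
MEMORY = [
    ("Press the red button.", "Appuyez sur le bouton rouge."),
    ("Close the cover.", "Fermez le capot."),
]


def fuzzy_matches(segment: str,
                  memory: List[Tuple[str, str]],
                  threshold: float = 0.7) -> List[Tuple[float, str, str]]:
    """Return (score, source, translation) for stored segments similar to `segment`."""
    results = []
    for src, tgt in memory:
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score >= threshold:
            results.append((score, src, tgt))
    # Best match first.
    return sorted(results, reverse=True)


for score, src, tgt in fuzzy_matches("Press the green button.", MEMORY):
    print(f"{score:.2f}  {src}  ->  {tgt}")
```

The function only proposes the stored translation of a similar segment; it does not change “rouge” to match “green”. The translator still has to adapt the suggestion.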

  16. MT system’s dictionary

  17. Bilingual concordance Source: TransSearch, Laboratoire de Recherche Appliquée en Linguistique Informatique, Université de Montréal http://www-rali.iro.umontreal.ca

  18. Parallel scrolling screens

  19. Interactive translation

  20. Conclusion
     • Translation is really hard, but laypeople don’t understand this
        • Example: evaluating systems by use of round-trip translation, often of idioms, jokes, or set phrases
     • Current MT systems are quite crude, and likely to remain so
     • But useful nevertheless in appropriate scenarios, under certain conditions of use
