Machine Translation II • How MT works • Modes of use
How MT works • Distinguish between generic “translation software” (algorithms) and language-pair-specific linguistic data • Software engineers ~ linguists • Idea (from computer science) of modularity • Break down the problem into manageable subproblems, essentially independent though linked to each other • Modules usually linguistically motivated • Linguistic formalisms for lexicons and grammars • May be more or less like formal linguistic theories • Usually “less”!
Modularity • Possible sequence of modules (fictitious): Text normalisation → Dictionary lookup → Morphological analysis → Syntactic parse → Attachment disambiguation → Semantic roles → TL lexical choice → TL syntax → TL morphology → Text reconstitution
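As a rough illustration of the modular idea, the sketch below chains three toy modules (normalisation, lexical choice, text reconstitution) over a shared state. The module names, the `LEXICON` entries and the `translate` function are all hypothetical and invented for this example. It shows only how generic software (the loop) can be kept separate from language-pair-specific data (the lexicon); it is not a real MT system.

```python
from typing import Callable, Dict, List

# A module takes the shared state and returns an updated copy of it.
Module = Callable[[Dict], Dict]

# Language-pair-specific data (hypothetical toy English -> French lexicon).
LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}


def normalise(state: Dict) -> Dict:
    # Text normalisation: lowercase, drop final punctuation, split on spaces.
    text = state["source"].lower().rstrip(".!?")
    return {**state, "tokens": text.split()}


def lexical_choice(state: Dict) -> Dict:
    # TL lexical choice: look each token up; unknown words pass through unchanged.
    return {**state, "tl_tokens": [LEXICON.get(t, t) for t in state["tokens"]]}


def reconstitute(state: Dict) -> Dict:
    # Text reconstitution: join the words, capitalise, add a full stop.
    text = " ".join(state["tl_tokens"])
    return {**state, "target": text[:1].upper() + text[1:] + "."}


# Generic software: the pipeline only knows that modules run in sequence.
PIPELINE: List[Module] = [normalise, lexical_choice, reconstitute]


def translate(source: str, pipeline: List[Module] = PIPELINE) -> str:
    state: Dict = {"source": source}
    for module in pipeline:
        state = module(state)
    return state["target"]


print(translate("The cat sleeps."))  # prints "Le chat dort."
```

Because each module only reads and extends the shared state, a module such as a morphological analyser could in principle be inserted or replaced without touching the others.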
The “Vauquois triangle” [Figure: source text at the lower left, target text at the lower right; analysis runs up the left side and generation down the right side. Depth of analysis increases towards the apex: direct translation (word-for-word), transfer (some syntactic awareness), interlingua (full meaning representation).]
Modes of use • Fully automatic, high-quality translation of unrestricted texts: impractical • Practical alternatives: restricted input, interactive translation, or low-quality output
Different scenarios for MT • Assimilation: many SLs, one TL; any style; any topic; partial analysis; post-editing; user is reader • Dissemination: one SL, many TLs; controlled style; single topic; full analysis; no post-editing; user is author
Restricted input • Restrictions may be natural (sublanguage) or imposed (controlled language) • Related terms: special language, jargon, register, LSP • For humans: (usually) more readable, less ambiguous, more “focussed” • For MT: • fewer syntactic constructions • closed vocabulary with fewer homonyms • greater certainty about interpretation
Features of sublanguage • Lexicon • smaller size: fewer concepts to cover • finite/closed: innovation is controlled • nature: less homonymy, some synonyms (dis)favoured • grammatical use: fewer category ambiguities • Syntax • reduced range of structures • some structures (dis)favoured • less flexibility in choice of structure • some deviance from “standard” grammar
Controlled languages • Widely used in technical authoring • Promotes consistency and readability • Similar features to sublanguage • Can be coupled with grammar checker • Permits “multilingual authoring”
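To make the coupling with a checker concrete, here is a minimal sketch of a controlled-language check. The approved word list `APPROVED`, the limit `MAX_WORDS` and the function `check_sentence` are assumptions made up for this example and do not correspond to any real controlled-language standard. Real checkers also test grammar, not just vocabulary and length.

```python
import re
from typing import List, Set

# Hypothetical closed vocabulary and sentence-length limit.
APPROVED: Set[str] = {
    "remove", "the", "cover", "and", "check", "oil", "level", "before", "use",
}
MAX_WORDS = 8


def check_sentence(sentence: str,
                   approved: Set[str] = APPROVED,
                   max_words: int = MAX_WORDS) -> List[str]:
    """Return a list of problems; an empty list means the sentence passes."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = []
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (limit {max_words})")
    for word in words:
        if word not in approved:
            problems.append(f"unapproved word: {word}")
    return problems


print(check_sentence("Remove the cover."))  # prints []
print(check_sentence("Detach the cover."))  # prints ['unapproved word: detach']
```

An author who gets immediate feedback of this kind writes text with a closed vocabulary and fewer constructions, which is exactly the input an MT system handles best.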
Use of low-quality output • To get a rough idea of content, and to identify which parts need to be translated “properly” • … especially with “exotic” languages • Widely used on the Internet for browsing, chat-rooms and email • Despite low quality, users seem satisfied • Task is especially difficult due to odd grammar, spelling, punctuation (GIGO), and wide variety of subject matter, often mixed • Most MT systems now customized for web-page translation (take HTML mark-up into account)
Interactive translation • Tools for translators • “Translator’s workstation” • Humans and computers cooperate • Which takes the initiative? • MAHT: human translation using translation tools • HAMT: MT with human assistance
Translation Memory • Database of previous translations • More or less sophisticated matching algorithm (“fuzzy match”, simple pattern-matching which may incorporate “linguistic knowledge”) • But the user must decide what to do with the matches
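A fuzzy match can be sketched with simple character-level similarity. The example below uses Python's standard `difflib.SequenceMatcher`. The memory contents `TM`, the function `fuzzy_matches` and the 0.7 threshold are assumptions chosen for illustration; commercial tools use their own, usually more elaborate, scoring.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical translation memory: (source segment, stored translation).
TM: List[Tuple[str, str]] = [
    ("Press the red button.", "Appuyez sur le bouton rouge."),
    ("Close the cover.", "Fermez le couvercle."),
]


def fuzzy_matches(segment: str,
                  memory: List[Tuple[str, str]] = TM,
                  threshold: float = 0.7) -> List[Tuple[float, str, str]]:
    """Return (score, source, translation) for stored segments similar to
    the new segment, best match first."""
    scored = []
    for src, tgt in memory:
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they differ.
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score >= threshold:
            scored.append((score, src, tgt))
    return sorted(scored, reverse=True)


for score, src, tgt in fuzzy_matches("Press the green button."):
    print(f"{score:.2f}  {src}  ->  {tgt}")
```

The stored translation still says “rouge”, so the tool only proposes it; the translator has to notice the difference and edit the match.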
Bilingual concordance Source: TransSearch, Laboratoire de Recherche Appliquée en Linguistique Informatique, Université de Montréal http://www-rali.iro.umontreal.ca
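The idea behind a bilingual concordance can be sketched as a search over sentence-aligned text: find every source sentence containing the query and show it alongside its translation. The `BITEXT` data and the `concordance` function below are invented for this example and make no claim about how TransSearch itself is implemented.

```python
import re
from typing import List, Tuple

# Hypothetical sentence-aligned English-French text.
BITEXT: List[Tuple[str, str]] = [
    ("The committee adopted the report.", "Le comité a adopté le rapport."),
    ("We will report back tomorrow.", "Nous ferons rapport demain."),
    ("The meeting is closed.", "La séance est levée."),
]


def concordance(query: str,
                bitext: List[Tuple[str, str]] = BITEXT) -> List[Tuple[str, str]]:
    """Return every aligned pair whose source side contains the query
    as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in bitext if pattern.search(src)]


for src, tgt in concordance("report"):
    print(src, "|", tgt)
```

The tool only retrieves examples of how a word was translated in context; choosing among them remains the translator's job.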
Conclusion • Translation is really hard, but lay-people don’t understand this • Example: evaluating systems by use of round-trip translation, often of idioms, jokes, or set phrases • Current MT systems are quite crude, and likely to remain so • But useful nevertheless in appropriate scenarios, under certain conditions of use