
On translation units and automatic processing

Patricia Fernández Carrelo, University of Deusto. CliP 2006, London, 29 June–1 July. Natural Language Processing, main lexical problems: disambiguation and multiword expressions, at all levels of language, from monolingual and multilingual points of view.





Presentation Transcript


  1. On translation units and automatic processing. Patricia Fernández Carrelo, University of Deusto. CliP 2006, London, 29 June–1 July.

  2. Natural Language Processing: main lexical problems
     • Disambiguation
     • Multiword expressions
     • All levels of language
     • Point of view:
       • Monolingual
       • Multilingual
       • Interlingual

  3. Interlingual task: translation (I)
     • Problem: text segmentation
     • Machine translation:
       • Need for objective criteria for segmentation

  4. Interlingual task: translation (II)
     • Multiword segments
     • Multiword expressions
       • “Of the same order of magnitude as the number of single words” (Jackendoff 1997)
       • 41% of the entries in WordNet 1.7 (Fellbaum 1999)

  5. Linguistic levels
     • Lexicology (and terminology)
       • Degree of lexicalization
     • Morphology and syntax
       • Components: order, cooccurrence, inflection...
     • Semantics
       • Decomposability, other relationships
     • Pragmatics
       • Context, equivalent words
     • Text analysis

  6. Points of view for analysis
     • Traditional linguistics
       • Since 1957...
     • Computational linguistics
       • “A pain in the neck” (Sag et al. 2002)
     • Translation – machine translation
       • Need for better approaches

  7. Names and definitions for MWE (I)
     • “Idiosyncratic interpretations that cross word boundaries (or spaces)” (Sag et al. 2002)
     • “A sequence of words that acts as a single unit at some level of linguistic analysis” (Calzolari et al. 2002)
     • “Any phrase that is not entirely predictable on the basis of standard grammar rules and lexical entries” (LinGO Lab, Stanford University)

  8. Names and definitions for MWE (II)
     • English:
       • Multiword expressions (MWE) or units (MWU) (Cowie, 1985)
       • Multi-word lexemes (MWL) (Gates, 1988)
       • Multiword lexical unit (Zgusta, 1967)
       • Complex lexemes and lexical units (Lipka, 1983)
     • Basque:
       • lexia konplexuak (Abaitua, 2002)
       • hitz anitzeko unitate lexikalak (HAUL) (IXA Group)
     • Spanish:
       • expresiones o unidades multipalabra
       • multiverbales (Alvar Ezquerra, 2000)
       • poliléxicas (Benson, 1985)
       • expresiones pluriverbales (Casares, 1992 [1950])
       • unidades pluriverbales lexicalizadas y habitualizadas (Haensch et al., 1982)
       • unidad léxica pluriverbal (Hernández, 1989)
       • unidades fraseológicas (UFS) o fraseologismo (Zuluaga, 1980)
       • lexías complejas (Abaitua, 1997)

  9. Classification criteria and linguistic description
     • Cooccurrence and/or need of some components
     • Syntactic and semantic transparency
     • Formal and semantic compositionality
     • Frozen or fixed status
     • Selectional restrictions
     • Violation of some general syntactic patterns or rules
     • Degree of lexicalization
     • Degree of conventionality
     • Idiomaticity

  10. Taxonomy (I) (Sag et al., 2002)
     • Lexicalized phrases
       • Fixed expressions
       • Semi-fixed expressions
         • Non-decomposable idioms
         • Compound nominals
         • Proper names
         • Multiword terminology
       • Syntactically flexible expressions
         • Verb-particle constructions
         • Decomposable idioms
         • Light verbs
     • Institutionalized phrases (collocations)

  11. Taxonomy (II)
     • Fixed expressions:
       • Adverbial phrases:
         • Al pie de la letra – to the letter – hitzez hitz
         • De improviso – suddenly – ziplo
       • Prepositional phrases:
         • A causa de – because of – (r)en ondorioz*
         • En torno a – around – inguruan
       • Multiword conjunctions:
         • Mientras tanto – meanwhile – bitartean
         • Con tal de que – so long as – ba...*
       • Latin expressions:
         • Ad hoc, sine dubio, sine die...
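Because fixed expressions like these never vary in form, they can be listed in the lexicon as single entries and matched as whole strings (the words-with-spaces approach). The following is only a minimal sketch of that idea, assuming a greedy longest-match strategy; the lexicon contents and the names `MWE_LEXICON` and `segment` are hypothetical and chosen for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical mini-lexicon of fixed expressions (taken from the slide examples),
# stored as tuples of lowercase tokens.
MWE_LEXICON: Set[Tuple[str, ...]] = {
    ("to", "the", "letter"),
    ("because", "of"),
    ("so", "long", "as"),
    ("ad", "hoc"),
    ("sine", "die"),
}


def segment(tokens: List[str], lexicon: Set[Tuple[str, ...]] = MWE_LEXICON) -> List[str]:
    """Group tokens into units, preferring the longest lexicon match at each position."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    units: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-token candidates.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                units.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword entry starts here: keep the single word as its own unit.
            units.append(tokens[i])
            i += 1
    return units


if __name__ == "__main__":
    print(segment("He followed the rules to the letter because of her".split()))
```

Exact matching of this kind only suits expressions that do not inflect or admit internal modification; anything more flexible needs a richer representation.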

  12. Taxonomy (III)
     • Semi-fixed expressions
       • Non-decomposable idioms:
         • kick the bucket / estirar la pata
       • Compound nominals
         • Viaje de novios – honeymoon – eztei-bidaia
       • Proper names
         • the (Oakland) Raiders (these raise problems of their own)
       • Multiword terminology
         • Mayoría absoluta – absolute majority – erabateko gehiengo
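Semi-fixed expressions such as kick the bucket keep their components and word order but still inflect (kicked the bucket), so exact string matching misses them. One possible way to handle this, sketched here under strong simplifying assumptions, is to match the head verb by lemma and the rest of the expression literally. The toy `LEMMAS` table and the function names are hypothetical; a real system would rely on a morphological analyser.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy lemma table, for illustration only.
LEMMAS = {"kicked": "kick", "kicks": "kick", "kicking": "kick"}


def lemma(token: str) -> str:
    """Return the lemma of a token, falling back to the lowercased token."""
    return LEMMAS.get(token.lower(), token.lower())


def find_semi_fixed(
    tokens: List[str], head_lemma: str, fixed_tail: Sequence[str]
) -> List[Tuple[int, int]]:
    """Return (start, end) spans where an inflected head is followed by the fixed tail."""
    spans: List[Tuple[int, int]] = []
    n = len(fixed_tail)
    for i, tok in enumerate(tokens):
        tail = [t.lower() for t in tokens[i + 1:i + 1 + n]]
        if lemma(tok) == head_lemma and tail == list(fixed_tail):
            spans.append((i, i + 1 + n))
    return spans


if __name__ == "__main__":
    sentence = "He kicked the bucket yesterday".split()
    print(find_semi_fixed(sentence, "kick", ("the", "bucket")))
```

The tail is matched literally on purpose: an inserted modifier, as in "kicked the red bucket", yields no match, which reflects the limited variability of non-decomposable idioms.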

  13. Taxonomy (IV)
     • Syntactically flexible expressions
       • Verb-particle constructions
         • Non-compositional: write up, look up / acordarse de, constar de / posposizioak (Basque postpositions)
         • Compositional: break up
       • Decomposable idioms
         • spill the beans – revelar un secreto
       • Light verbs:
         • make, do, have, give
         • hacer, tener, ser, dar
         • egin, izan, eman

  14. Taxonomy (V)
     • Institutionalized phrases (collocations)
       • Pay attention – poner/prestar atención – arreta eman
       • Heavy smoker – fumador empedernido – erretzaile amorratua
       • Red wine – vino tinto – ardo beltza
     (Examples from Testuteka, http://paginaspersonales.deusto.es/abaitua/deli/testuteka/index.html)

  15. Multiword expression as translation unit
     • Translation units: difficulty in definition and classification
     • Vázquez-Ayora (1977):
       • “simple”
       • “diluted” – “multiple-to-one-equivalents” (Nida)
       • “fractionary”
     “In fact there are good reasons for keeping the TU (in the sense of translation atom) in MT as small -and hence as manageable- as possible” (Bennett, 1994)

  16. Methods for processing
     • Symbolic
       • Words-with-spaces
       • Hierarchical lexicon with default constraint inheritance
       • Circumscribed constructions
       • Lexical selection
       • Information about frequency
       • Example: Villavicencio et al. 2004
     • Statistical
       • F. Smadja: Xtract
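To give a feel for the statistical family of methods, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a common association measure for collocation candidates. This is not Xtract's algorithm, which relies on its own frequency and positional statistics; PMI is swapped in here only because it is short to state. The function name `pmi_bigrams` and the `min_count` threshold are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def pmi_bigrams(tokens: List[str], min_count: int = 2) -> List[Tuple[Tuple[str, str], float]]:
    """Rank adjacent word pairs by PMI = log2( P(x, y) / (P(x) * P(y)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Skip rare pairs: PMI is unreliable for low counts.
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest association first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    text = "red wine is good and red wine is cheap but the wine list is long"
    for pair, score in pmi_bigrams(text.split()):
        print(pair, round(score, 2))
```

On a realistic corpus the candidates returned by a measure like this would still need linguistic filtering, which is where the symbolic descriptions listed above come in.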

  17. Conclusions
     • MWEs as translation units
     • Approach from translation and, especially, from machine translation
     • Linguistic definition and precision for better processing

  18. That’s all folks! ¡Eso es todo amigos! Agur Ben-Hur!
