This text explores transfer-based translation, focusing on lexical ambiguity and structural differences between source and target languages. It examines examples from the Scania corpus, illustrating the processes of transfer and generation within a multilingual translation engine. The study emphasizes the need for detailed transfer rules, modular architecture in translation design, and efficient linguistic analysis to generate correct target structures. By leveraging rules and dictionaries tailored to each language, the MATS system demonstrates effective transfer mechanisms and syntactic generation capabilities.
Motivations for transfer-based translation • lexical ambiguity • structural differences See further Ingo 91
Example 1 Sv. Fyll på olja i växellådan. En. Fill gearbox with oil. (from the Scania corpus) • fyll på → fill • obj → adv • adv → obj
Example 2 Sv. I oljefilterhållaren sitter en överströmningsventil. En. The oil filter retainer has an overflow valve. (from the Scania corpus) • sitter → has • adv → subj • subj → obj
Transfer-based translation • intermediary sentence structure • basic processes • analysis • transfer • generation (synthesis) • language modules • dictionary and grammar of SL • transfer dictionary and transfer rules • dictionary and grammar of TL
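The three basic processes can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the function names and the tiny dictionaries are hypothetical, the lexeme identifiers are borrowed from the deck's "Tippa hytten." example, and the analysis step handles nothing beyond a two-word imperative.

```python
# Toy pipeline: analysis -> transfer -> generation.
# All names and dictionaries are illustrative assumptions,
# not the actual Multra/MATS resources.

SL_LEXICON = {"tippa": "TIPPA.VB.1", "hytten": "HYTT.NN.1"}        # SL dictionary
TRANSFER_LEXICON = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}
TL_LEXICON = {"TILT.VB.0": "tilt", "CAB.NN.0": "cab"}              # TL dictionary


def analyse(sentence):
    """Map a two-word Swedish imperative to an intermediary structure."""
    verb, noun = sentence.rstrip(".").lower().split()
    return {
        "MODE": "IMP",
        "VERB": {"LEX": SL_LEXICON[verb]},
        "OBJ.DIR": {"DEF": "DEF", "HEAD": {"LEX": SL_LEXICON[noun]}},
    }


def transfer(source):
    """Swap source lexemes for target lexemes; keep the structure."""
    obj = source["OBJ.DIR"]
    return {
        "MODE": source["MODE"],
        "VERB": {"LEX": TRANSFER_LEXICON[source["VERB"]["LEX"]]},
        "OBJ.DIR": {
            "DEF": obj["DEF"],
            "HEAD": {"LEX": TRANSFER_LEXICON[obj["HEAD"]["LEX"]]},
        },
    }


def generate(target):
    """Linearise the target structure as an English imperative."""
    verb = TL_LEXICON[target["VERB"]["LEX"]]
    noun = TL_LEXICON[target["OBJ.DIR"]["HEAD"]["LEX"]]
    article = "the " if target["OBJ.DIR"]["DEF"] == "DEF" else ""
    return f"{verb.capitalize()} {article}{noun}."


def translate(sentence):
    return generate(transfer(analyse(sentence)))
```

Keeping the three dictionaries separate mirrors the division into language modules: only `TRANSFER_LEXICON` knows about both languages.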
[Figure: translation strategies between SL and TL: direct translation, transfer, interlingua; Metal and Multra are marked in the diagram]
Levels of intermediary structure • cf. J&M, Chapter 21 • word order
Metal • See H&S
MULTRA Multilingual Support for Translation and Writing • translation engine • transfer-based • shake-and-bake • modular • unification-based • preference machinery • traceable
Analysis • chart parser (Lisp C) • procedural formalism • unification and other kinds of operations • sentence structure • feature structure • grammatical relations • surface order implicit via grammatical relations See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)
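Unification over feature structures can be illustrated with nested dictionaries. This is a minimal sketch, not the parser's actual operation: the function name `unify` is an assumption, leaves are plain atomic values, and shared (reentrant) substructures are not modelled.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None when two values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None          # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value  # feature only present in b
        return result
    # Atomic values (or atom vs. structure) unify only if identical.
    return a if a == b else None
```

For example, unifying `{"NUMB": "SING"}` with `{"DEF": "DEF"}` yields a structure carrying both features, whereas `{"NUMB": "SING"}` and `{"NUMB": "PLUR"}` fail to unify.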
Transfer • unification-based • declarative formalism • Multra transfer formalism (Beskow 93) • lexical and structural rules • rules are partially ordered • a more specific rule takes precedence over a less specific one • specificity in terms of number of transfer equations • all applicable rules are applied • written in Prolog
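One possible reading of the specificity ordering, sketched in Python: among the rules that match, the one with the most equations wins. The `Rule` class, the flat path-to-value encoding of the source, and the general rule `tänka-think` are all assumptions made for illustration; the real formalism is declarative and implemented in Prolog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str
    equations: tuple  # (path, value) pairs that must hold in the source


def matches(rule, source):
    """A rule applies when every one of its equations holds."""
    return all(source.get(path) == value for path, value in rule.equations)


def select(rules, source):
    """Pick the most specific applicable rule (most equations), or None."""
    applicable = [rule for rule in rules if matches(rule, source)]
    return max(applicable, key=lambda rule: len(rule.equations), default=None)


# Hypothetical competing rules for the same verb.
GENERAL = Rule("tänka-think", (("verb lex", "tänka.vb.1"),))
SPECIFIC = Rule(
    "tänka.på-think.about",
    (("verb lex", "tänka.vb.1"), ("obj.prep prep lex", "på.pp.1")),
)
```

With a source containing both the verb and the preposition `på`, `select` returns the two-equation rule; without the preposition, only the general rule matches.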
Generation • syntactic generation • Multra syntactic generation formalism (Beskow 97a) • PATR-like style • unification • concatenation • typed features • morphological generation (Beskow 97b) • lexical insertion rules • morphological realisation and phonological finish in Prolog • written in Prolog
An example: Tippa hytten. Tippa hytten. : (* = (PHR.CAT = CL MODE = IMP SUBJ = 2ND VERB = (WORD.CAT = VERB INFF = IMP DIAT = ACT LEX = TIPPA.VB.1 VSURF = +) OBJ.DIR = (PHR.CAT = NP NUMB = SING GENDER = UTR CASE = BASIC DEF = DEF HEAD = (LEX = HYTT.NN.1 WORD.CAT = NOUN)) REG = (V1.LEM = TIPPA.VB) SEP = (WORD.CAT = SEP LEX = STOP.SR.0)))
Transfer structure Transfer structure [VERB : [WORD.CAT : VERB LEX : TILT.VB.0 DIAT : ACT INFF : IMP] OBJ.DIR : [PHR.CAT : NP DEF : DEF NUMB : SING HEAD : [WORD.CAT : NOUN LEX : CAB.NN.0]] MODE : IMP SUBJ: 2ND VSURF: + SEP : [WORD.CAT : SEP LEX : STOP.SR.0] PHR.CAT : CL]
Generation Tilt the cab.
A grammar rule defrule legal.obj { <?1 phr.cat> = 'np, not <?1 case> = 'gen, not <?1 case> = 'subj }
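Read as a predicate, the rule accepts a constituent that is an NP and whose case is neither genitive nor subject case. A rough Python equivalent follows, assuming constituents are flat dictionaries and that a missing `case` feature satisfies both negated conditions; the function name is hypothetical.

```python
def legal_obj(constituent):
    """True if the constituent may serve as an object under legal.obj."""
    return (
        constituent.get("phr.cat") == "np"
        and constituent.get("case") not in ("gen", "subj")
    )
```

An NP with `case` set to `basic` passes, while a genitive NP or a PP is rejected.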
Transfer rules • copy feature • delete feature • transfer feature • assign feature
Copy feature LABEL mode SOURCE <* mode> = ?x1 TARGET <* mode> = ?x2 TRANSFER
Delete feature LABEL REG SOURCE <* REG> = ANY TARGET <*> = <*> TRANSFER
Transfer feature LABEL OBJ.DIR SOURCE <* OBJ.DIR> = ?x1 TARGET <* OBJ.DIR> = ?x2 TRANSFER ?x1 <=> ?x2
Define feature LABEL trycka.in-press SOURCE <* lex sym>=trycka.vb+in.ab.1 <* word.cat>=VERB TARGET <* lex>=press.vb.1 <* word.cat>=VERB TRANSFER
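The four kinds of transfer rule can be mimicked by a small recursive function over nested dictionaries. This is a procedural approximation under stated assumptions, not the declarative Multra formalism: the function name, the hard-coded dispatch on feature names, and the two-entry lexicon are illustrative only.

```python
# Hypothetical transfer lexicon; identifiers follow the deck's example.
LEXICON = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}
DELETED = {"REG"}  # features dropped on the target side


def apply_rules(source):
    """Build a target structure from a source structure."""
    target = {}
    for feature, value in source.items():
        if feature in DELETED:            # delete feature
            continue
        if feature == "LEX":              # define feature: new target value
            target[feature] = LEXICON[value]
        elif isinstance(value, dict):     # transfer feature: recurse
            target[feature] = apply_rules(value)
        else:                             # copy feature: value unchanged
            target[feature] = value
    return target
```

Applied to a clause with `MODE`, `VERB` and `REG`, the result keeps `MODE`, carries a transferred `VERB` with the target lexeme, and has no `REG`.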
A generation rule LABEL CL.IMP X1 ---> X2 X3 X4 : <X1 PHR.CAT> = CL <X1 VERB> = <X2> <X1 TYPE> = IMP <X1 OBJ.DIR> = <X3> <X1 SEP> = <X4>
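The rule says that an imperative clause X1 is realised as its verb (X2), direct object (X3) and separator (X4), in that order. A sketch of that concatenation in Python follows. The surface-form table and the NP helper are assumptions, and because the rule tests `TYPE` while the example transfer structure carries `MODE`, the sketch accepts either feature.

```python
# Hypothetical surface forms for the lexemes in the deck's example.
SURFACE = {"TILT.VB.0": "tilt", "CAB.NN.0": "cab", "STOP.SR.0": "."}


def realise_np(np):
    """Realise a noun phrase, adding 'the' when it is definite."""
    noun = SURFACE[np["HEAD"]["LEX"]]
    return f"the {noun}" if np.get("DEF") == "DEF" else noun


def generate_cl_imp(x1):
    """X1 ---> X2 X3 X4 for imperative clauses; None if the rule fails."""
    clause_type = x1.get("TYPE", x1.get("MODE"))
    if x1.get("PHR.CAT") != "CL" or clause_type != "IMP":
        return None
    x2 = SURFACE[x1["VERB"]["LEX"]]       # <X1 VERB> = <X2>
    x3 = realise_np(x1["OBJ.DIR"])        # <X1 OBJ.DIR> = <X3>
    x4 = SURFACE[x1["SEP"]["LEX"]]        # <X1 SEP> = <X4>
    return f"{x2.capitalize()} {x3}{x4}"
```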
A contextual lexical rule LABEL tänka.på-think.about SOURCE <* verb lex sym> = tänka.vb.1 <* obj.prep phr.cat> = pp <* obj.prep prep> = ?prep <* obj.prep prep lex sym> = på.pp.1 <* obj.prep rect> = ?rect1 TARGET <* obj.prep phr.cat> = pp <* obj.prep prep word.cat> = PREP <* obj.prep prep lex> = about.pp.1 <* obj.prep rect> = ?rect2 TRANSFER ?rect1<=>?rect2
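The contextual rule fires only when the verb tänka governs a PP headed by på; it then rewrites the preposition as about and hands the governed phrase (rect) on to further transfer. The Python sketch below is a procedural paraphrase. The nested-dictionary encoding, the collapsing of the `lex sym` path into a single `lex` key, and the `transfer_rect` callback are assumptions.

```python
def transfer_tanka_pa(source, transfer_rect):
    """Return the target obj.prep structure, or None if the rule fails.

    transfer_rect is a caller-supplied function that transfers the
    governed phrase (?rect1 <=> ?rect2 in the rule).
    """
    obj_prep = source.get("obj.prep", {})
    if (
        source.get("verb", {}).get("lex") != "tänka.vb.1"
        or obj_prep.get("phr.cat") != "pp"
        or obj_prep.get("prep", {}).get("lex") != "på.pp.1"
    ):
        return None
    return {
        "phr.cat": "pp",
        "prep": {"word.cat": "PREP", "lex": "about.pp.1"},
        "rect": transfer_rect(obj_prep["rect"]),
    }
```

The verb itself is not touched here; the rule as shown only constrains the verb on the source side and specifies the prepositional object on the target side.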
A generation trace 1-Applying Rule cl-sep 1- Applying Rule cl.imp 1- Applying Rule subj2nd-verb-obj.dir 1- Applying Rule verb.main.act 1- Applying Rule np.the-df 1- Applying Rule ng.noun-def 1-Success!
Language resources in the MATS system • dictionary in a database with different views • analysis grammar • transfer grammar • incl. contextually defined lexical rules • generation grammar
The MATS system Frozen demo…
Assignment 2: Working with MATS http://stp.ling.uu.se/~evapet/mt04/assignment2.html
Lexicalistic translation • Identify (lexical) translation units in the source sentence • Translate each unit separately (considering the context) • Order the result in agreement with a model of the target language Formulation due to Lars Ahrenberg; see further AH (reading list); see also Beaven, John L., Shake-and-Bake Machine Translation. COLING-92, Nantes, 23-28 August 1992.
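The three steps can be sketched on the Scania sentence "Fyll på olja i växellådan." Everything in the block is a toy assumption: the unit dictionary (including the context-specific choice of "with" for "i"), the greedy longest-match segmentation, and the hand-written set of target bigrams that stands in for a model of the target language.

```python
from itertools import permutations

# Hypothetical unit dictionary; "fyll på" is a two-word unit.
UNIT_DICT = {"fyll på": "fill", "olja": "oil", "i": "with", "växellådan": "gearbox"}

# Hypothetical target-language model: the set of acceptable bigrams.
TARGET_BIGRAMS = {
    ("<s>", "fill"), ("fill", "gearbox"), ("gearbox", "with"),
    ("with", "oil"), ("oil", "</s>"),
}


def identify_units(tokens):
    """Step 1: greedy longest-match segmentation into translation units."""
    units, i = [], 0
    while i < len(tokens):
        for span in (2, 1):
            if i + span > len(tokens):
                continue
            candidate = " ".join(tokens[i:i + span])
            if candidate in UNIT_DICT:
                units.append(candidate)
                i += span
                break
        else:
            raise KeyError(tokens[i])
    return units


def score(words):
    """Count how many adjacent pairs the target model accepts."""
    padded = ("<s>",) + tuple(words) + ("</s>",)
    return sum(pair in TARGET_BIGRAMS for pair in zip(padded, padded[1:]))


def translate(tokens):
    units = identify_units(tokens)
    words = [UNIT_DICT[unit] for unit in units]        # Step 2: translate units
    return list(max(permutations(words), key=score))   # Step 3: order the result
```

Trying every permutation is only feasible for very short sentences; it is used here to make the ordering step explicit.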
T4F – a lexicalistic system • processes in T4F • tokenisation • tagging • transfer • transposition • filtering See further AH (in the reading list)
Interlingua translation • See SN
Applications of alignment • translation memories • translation dictionaries • lexicalistic translation • statistical machine translation • example-based translation
Translation memories • based on sentence links • optionally, sub-sentence links See further Macklovitch, E. (2000)
Translation dictionaries • based on word links • refinement of word links
Refinement of word alignment data • neutralise capital letters where appropriate • lemmatise or tag source and target units • identify ambiguities • search for criteria to resolve them • identify partial links • compounds? • remove or complete them • manual revision?
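The first refinement steps (neutralising capitals, then identifying ambiguities) can be sketched as follows. This assumes word links arrive as (source, target) string pairs and lowercases every word, which is cruder than neutralising capitals only "where appropriate"; lemmatisation, partial links and manual revision are left out. The function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict


def refine(links):
    """Normalise case, count links, and report ambiguous source words.

    links: iterable of (source_word, target_word) pairs.
    Returns (dictionary, ambiguous), both mapping a source word to a
    dict of target words with their link frequencies.
    """
    counts = Counter((s.lower(), t.lower()) for s, t in links)
    dictionary = defaultdict(dict)
    for (source, target), frequency in counts.items():
        dictionary[source][target] = frequency
    # A source word is ambiguous if it is linked to several targets.
    ambiguous = {s: ts for s, ts in dictionary.items() if len(ts) > 1}
    return dict(dictionary), ambiguous
```

The ambiguous entries are the ones for which resolution criteria, such as the contextual rules used in transfer, would then be sought.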
Informally about statistical MT • build a translation dictionary based on word alignment • aim for as big fragments as possible • keep information on link frequency • build an n-gram model of the target language • implement a direct translation strategy • including alternatives ordered by length and frequency • process the output with the n-gram model, filtering out the best alternatives, and adjust the translation accordingly
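A deliberately small sketch of this recipe: a dictionary with link frequencies proposes alternatives, and a bigram model of the target language picks among them. All data are invented for illustration, scoring uses raw bigram counts rather than probabilities, and only single-word source units are handled, so the "as big fragments as possible" step is not modelled.

```python
from itertools import product

# Hypothetical dictionary from word alignment: target -> link frequency.
DICTIONARY = {
    "tippa": {"tilt": 5, "tip": 2},
    "hytten": {"the cab": 6, "the cabin": 3},
}

# Hypothetical bigram counts for the target language.
BIGRAMS = {
    ("<s>", "tilt"): 3, ("tilt", "the"): 4, ("the", "cab"): 5,
    ("cab", "</s>"): 5, ("tip", "the"): 1, ("the", "cabin"): 2,
}


def candidates(tokens):
    """Direct translation: every combination of dictionary alternatives."""
    options = [
        sorted(DICTIONARY[token], key=DICTIONARY[token].get, reverse=True)
        for token in tokens
    ]
    for choice in product(*options):
        yield " ".join(choice).split()


def bigram_score(words):
    """Sum the counts of all adjacent word pairs, with sentence markers."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(BIGRAMS.get(pair, 0) for pair in zip(padded, padded[1:]))


def translate(tokens):
    """Keep the candidate that the bigram model scores highest."""
    return max(candidates(tokens), key=bigram_score)
```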
Example-based MT • See HS (in the reading list)