Tree Bimorphisms and Their Relevance in the Theory of Translations

Tree Bimorphisms and Their Relevance in the Theory of Translations Cătălin-Ionuţ Tîrnăucă GRLMC, Rovira i Virgili University catalinionut.tirnauca@estudiants.urv.cat CLG 08, Madrid, 28/06/2008

Outline • Motivation • Tree transformations • Ways to define tree transformations • What is a tree bimorphism? • What can we do with a bimorphism? • Conclusions • References CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Motivation: why using trees in NLP? • Strings and finite-state machines are extremely useful in speech recognition, text recognition, etc. • NOT suitable for translations between natural languages: • don’t capture syntax-sensitive transformations; • don’t execute certain reorderings of parts of sentences. • Hence, researchers from computational linguistic community have shown an increasing interest in more powerful formalisms, ones that can model trees and tree transformations. • The new field of syntax-based machine translation was established: (Knight and Graehl 2005). CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Motivation: formal properties required • Accuracy is an important issue in machine translation, so the whole theory should rely on a solid mathematical background (formal languages). • A perfect model: (Knight 2008), should have at least: • expressiveness (reordering parts of sentences, i.e., local rotation), • inclusiveness (generalizing the finite state machines), • modularity (closure under composition, i.e., breaking the initial problem into smaller pieces easier to solve), and • teachability (efficient training). CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Motivation: defining tree transformations • Many of the formalisms proposed until now fail in at least one of the above criteria. • We mention the weak and strongparts of some of them, as well as the relation between them. • We will see (something about and types of): • tree transducers (formal language theory), • tree homomorphisms (formal language theory), • synchronous grammars (linguistics), and • TREE BIMORPHISMS (formal language theory). • We speak more about modularity: think of what success finite-state machines had because of the noisy-channel concept! CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Ranked alphabet: symbols of a given rank, i.e., number of branches (if not unique you can rename); represent parts of sentences (NP, VP) Leaf alphabetX: usual alphabets of “words” called variables (Spanish, Russian) T(X): set of all trees Tree language: subset of T(X) Yield: concatenated sequence of leaf variables read from left to right ={S/2, NP/1, NP/3 N/0,VP/2,V/1} X=Spanish alphabet Yield = Vamos a la playa Tree transformations: first trees S VP NP V NP N Vamos la a playa CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Treetransformations and translations Output Romanian S S t  T (Y) s  T (X) Input English • Tree transformation  : a collection of pairs of parse (syntax) trees of natural language sentences. • Its translation ( ): a set of pairs of words from the two natural languages considered: just take the yield of such a pair of trees. VP NP NP VP  transforms s into t, i.e., (s,t)   D V N N V yd(s) is translated into yd(t) The castle burns Arde castelul Y= Romanian words ={S/2,NP/2,VP/1,NP/1, N/1} T (Y)=Romanian parse trees X= English words ={S/2,NP/2,VP/1,NP/1,D/1, N/1} T (X)= English parse trees  CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Tree Transducer Input tree Output tree Define tree transformations: tree transducers • A tree transducer uses a set of states to encode information, and given an input tree over the input ranked alphabet, computes a (set of) output tree(s) over the output ranked alphabet in two ways: • Top-down: starts at root, finishes at leaves; • Bottom-up: starts at leaves, finishes at root. • Many types introduced (in general, modularity doesn’t hold or cannot be proved), at least two look very promising: • Extended top-down tree transducers: (Knight and Graehl 2005) • Multi bottom-up tree transducers: (Maletti, Engelfriet and Lilin 2008), (Maletti 2007) CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Define tree transformations: tree homomorphisms (I) • A tree homomorphism:T(X)T(Y) is: • a mapping which, based on certain rules defined for every input symbol, transforms recursively an input tree into a (totally or not) different output tree, or • A top-down tree transducer with one state. • Defines a simple tree transformation ={(s,(s))s T(X)}; usually one just identifies  with  treating  as a relation. • We need auxiliary variablesto indicate an occurrence of a subtree in a tree during the processing. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Define tree transformations: tree homomorphisms (II) • :T(X)T(Y) is defined by the rules: • (x)= X(x): each x in the input replaced by a tree X(x) from output; • (c)= 0(c): each constant c in the input replaced by a tree 0(x); • (f(t1,…,tm))= m(f)(1 (t1),…, m (tm)): each root f in the input tree replaced by a tree m(f) from the output, where auxiliary variables appear as leaf symbols;  Y= {the, red, pen, out, of} ={VP/4,N/3,D/1} T (Y)=English parse trees X={in, blue} ={NP/3,ADJ/1,D/0} T (X)= English parse trees 3(NP)=VP(2, 3, 1, of)` 1(ADJ)=N(big, 1, pen) 0(D)=D(the) X(in)=out X(blue)=red VP NP in D D ADJ D N of VP out 3(NP)= 1 X(in)=out 2 1 2 the 3 of X(bleu)=red blue pen the big red 3 N 1(ADJ)= Output tree (t) in T(Y) Input tree s in T(X) big 1 pen CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Define tree transformations: tree homomorphisms (III) • Several types of tree homomorphisms exist depending on the restrictions imposed on each tree m(f) and implicitly on auxiliary variables: • Linear: in each m(f) no copy of a subtree is allowed (no i appears twice or more); • Non-deleting: in each m(f) no subtree information is lost (each i appears at least once); • Strict: no m(f) can be reduced to an auxiliary variable i; • Quasi-alphabetic: linear, non-deleting, strict, each constant is mapped to a constant or a tree of height 1 (all leaves are symbols from the output leaf alphabet), and each m(f) is a tree of height 1 in which the subtrees are reordered and symbols from the output leaf alphabet may appear as leaves. • H: the class of all tree homomorphisms. • lH, nH, sH, qH: the corresponding restricted class. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Define tree transformations: synchronous grammars (I) • Use the idea of synchronous rewriting: two formal grammars work in parallel, productions being linked by some relation and applied synchronously. • Pairs of recursively related words are generated simultaneously. • Syntax-directed translators, first used in compilers, use the power of context-free rewriting and can be viewed as a three-step process one of which is a tree transformation: • For a given input sentence u, construct a parse tree for u. • Transform the parse tree into a tree in the output grammar. • Take the yield of the output tree as a translation for u. S S S VP1P2you VP3 ; P2VP3VP1 P  before ; antes de VP  speak ; hablar VP  think ; pensar , you VP P VP VP VP P pensar hablar think before speak antes de CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Define tree transformations: synchronous grammars (III) • Many types of syntax-directed translations: • Syntax Directed Translation Schemata: (Aho and Ullman 1972), • Synchronous CFGs: (Satta and Peserico 2005), • Inversion Transduction Grammars: (Wu 1997), • Tree-to-string models: (Yamada and Knight 2001), • Hierarchical Phrase-Based Models: (Chiang 2007), etc • Beyond the generative capacity of CF rewriting (multi-level rules): • Synchronous Tree Substitution Grammars: (Shieber 2004). • Involve discontinuous constituents: • Synchronous TAGs: (Shieber and Schabes 1990), • Generalized Multitext Grammars: (Melamed et all 2004). CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Define tree transformations: synchronous grammars (III) • Synchronous grammars derivations can be viewed as pairs of trees, just as usual derivations in formal grammars can be viewed as trees. • From the way of defining them, we can see that they: • easily capture syntax-sensitive transformations, • execute local rotations, and • are in general trainable. • Unfortunately no modularity results were known or could be proved until recently: (Shieber 2004). • What helped us improve the modularity? Answer: TREE BIMORPHISMS. • Why? Don’t be impatient and don’ fell asleep. It follows right now. This is the best part and not too much mathematics. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

What is a tree bimorphism? (I) • A mathematical, nice alternative way to define tree transformations. • Excellent tool for proving mathematical properties of classes of tree transformations. • Consists of two tree homomorphisms defined on the same (regular) tree language, i.e., domains of tree transducers that computes a partial identity function. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

What is a tree bimorphism? (II) Common abstract language: Interlingua • B(H1,Rec,H2): a class of tree bimorphisms. • B(H1,Rec,H2): the class of tree transformations defined by B. • B=(,R,): a tree bimorphism from the class B, with H1, RRec, H2. • B(lnH,Rec,qH): the class of all tree bimorphisms in which the first tree homomorphism component is linear and non-deleting, the second is a quasi-alphabetic tree homomorphism. RT(Z) t   Input language T(X)  (t) T(Y) (t) transforms into Output language is translated into yd((t)) yd((t)) CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

What can we do with a bimorphism? • Concept very similar with synchronous grammars: from derivation trees to derived trees. • This way, we can encode the derivations into the common tree language, and by using the tree homomorphisms to get the derivations in input/output grammars. • In 2004, Shieber was the first one who linked synchronous grammars and tree bimorphisms; he was trying to improve modularity and other mathematical properties of classes of tree transformations defined by synchronous grammars. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Results obtained • The tree bimorphisms B(lnH,Rec,lnH): • not closed under composition: (Maletti 2007); • define the same class of tree transformations as STSGs: (Shieber 2004). • The tree bimorphisms B(lnsH,Rec,lnsH): • not closed under composition: (Arnold and Dauchet 1982); • define the same class of tree transformations as a special STAG: (Shieber 2006). • The (quasi-alphabetic) tree bimorphisms B(qH,Rec,qH): • closed under composition: (Steinby and Tîrnăucă 2007); • define the same translations as STDSs: (Steinby and Tîrnăucă 2007), SCFGs and ITGs: (Tîrnăucă 2007); • the tree transformations are encoded in their bimorphism counterpart. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Conclusions; future work • Properties of/ modeling tree transformations: a new challenge in machine translation. • Modularity, or closure under composition, a big issue. • There are several ways to define tree transformations. One connects all of them and helps us to prove modularity: tree bimorphisms. • We saw some results obtained recently. What about other useful properties of synchronous grammars, other synchronous grammars and other tree bimorphism? Can we obtain more improvements? CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

References (I) • Aho, Alfred and Jeffrey Ullman (1972). The theory of parsing, translation, and compiling, Volume I: Parsing. New Jersey: Prentice Hall Professional Technical Reference. • Arnold, André and Max Dauchet (1982). “Morphismes et bimorphismes d’arbres”, Theoretical Computer Science 20: 33–93. • Chiang, David (2007). “Hierarchical phrase-based translation”, Computational Linguistics 33(2):201–228. • Engelfriet, Joost, Eric Lilin and Andreas Maletti (2008). “Extended multi bottom-up tree transducers”. Manuscript. URL: http://wwwtcs.inf.tu-dresden.de/~maletti/pub/engmal08.pdf. • Knight, Kevin (2008). “Capturing practical natural language transformations”. Manuscript. URL: http://www.isi.edu/natural-language/mt/capturing.pdf. • Knight, Kevin and Jonathan Graehl (2005). “An overview of probabilistic tree transducers for natural language processing”. In: Alexander F. Gelbukh, ed., Computational Linguistics and Intelligent Text Processing 6th International Conference, CICLing 2005, Proceedings, vol. 3406 of LNCS. Berlin: Springer-Verlag, pp. 1–24. • Maletti, Andreas (2007). “Compositions of extended top-down tree transducers”. In: Remco Loos, Szilárd Z. Fazekas and Carlos Martín Vide, eds., Proceedings of the 1st International Conference on Language and Automata Theory and Applications, LATA 2007, vol. 35/07 of GRLMC Reports. Tarragona: Universitat Rovira i Virgili, pp. 379–390. • Melamed, I. Dan (2003). “Multitext grammars and synchronous parsers”. In: HLT-NAACL 2003, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1. Morristown: Association for Computational Linguistics, pp. 79–86. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

References (II) • Satta, Girogio and Enoch Peserico (2005). “Some computational complexity results for synchronous context-free grammars”. In: HLT/EMNLP 2005, Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Morristown: Association for Computational Linguistics, pp. 803–810. • Shieber, Stuart M. (2004). “Synchronous grammars as tree transducers”. In: Proceedings of the TAG+7: Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms, 2004, pp. 88–95. • Shieber, Stuart M. (2006). “Unifying synchronous tree-adjoining grammars and tree transducers via bimorphisms”. In: EACL 2006, 11st Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference. Association for Computer Linguistics, pp. 377-384. • Steinby, Magnus and Cătălin I. Tîrnăucă (2007). “Syntax-directed translations and quasi-alphabetic tree bimorphisms”. In Jan Holub and Jan Zdárek, eds,, Implementation and Application of Automata, 12th International Conference, CIAA 2007. Berlin: Springer-Verlag, pp. 265–276. • Tîrnăucă, Cătălin I. (2007). “Synchronous context-free grammars by means of tree bimorphisms”. In: Gemma Bel Enguix and Maria Dolores Jiménez Lopez, eds., Proceedings of the 1st International Workshop on Non-Classical Formal Languages in Linguistics (ForLing 2007), vol. 36/07 of GRLMC Reports. Tarragona: Universitat Rovira i Virgili, pp. 97–107. • Yamada, Kenji and Kevin Knight (2001). “A syntax-based statistical translation model”. In: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics. Morristown: Asociation of Computational Linguistics, pp. 523-530. • Wu, Dekai (1997). “Stochastic inversion transduction grammars and bilingual parsing of parallel corpora”, Computational Linguistics 23(3): 377–403. CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

That’s all folks! Thank you! Graçias! Arigato Mila esker! Moltes gràcies! 謝謝你。 Grazie! Spasiva! Mulţumesc! Kiitos CLG 08, Madrid, 28/06/2008: C. Tîrnăucă, Tree bimorphisms and their relevance in translations

Tree Bimorphisms and Their Relevance in the Theory of Translations