Non-deterministic Tree Automata Models for Statistical Machine Translation

Non-deterministic Tree Automata Models for Statistical Machine Translation ChiaraMoraglia

Mathematical Linguistics • Branch of computational linguistics • The study of mathematical structures and methods that pertain to linguistics. • Combines aspects of computer science, mathematics and linguistics.

Problem: translational ambiguity • Words: anchor • Sentences : Cleaning fluid can be dangerous. Claire kicked the bucket.

Statistical Machine Translation • Machine translation that keeps in mind the problem of ambiguity. • A sequence of reordering decisions and word translation decisions, each with a probability assigned based upon linguistic data. • 2 main reordering models: 1) phrase-based models: re-align phrases (strings of words) 2) syntax-based models: can use tree transducers to permute trees (syntactic structure) with words as leaves

Example of a syntax-based translation http://people.csail.mit.edu/koehn/publications/tutorial2003.pdf

My project • Generalize the work on tree automata and tree transductions to non-deterministic models and explore the equivalence properties that were proven to hold in the deterministic case.

Tree • A hierarchical collection of labeled nodes connected by edges, starting at a root node https://upload.wikimedia.org/wikipedia/commons/f/f7/Binary_tree.svg

Tree Transducers • A tree transducer is a 5-tuple <F,H,Q,qin,R> where i) F is a functional signature of input symbols ii) H is a functional signature of output symbols iii) Q is a finite set of states iv) qin∈Q is the initial state v) R is a finite set of rules <q, φ> ζ where ζ is • <q’, ψ> • h(< q1, ψ1>,…,<qk,ψk>) Φgives the conditions the currentnode must satisfy, Ψsayswhichnode to go to from the currentnode (Courcelle & Engelfriet, 2012)

Functional Signature • A functional signature is a set of function symbols, each with an associated arityρ(f) (the number of arguments the function takes on) • E.g. f(x), ρ(f)=1 h(x,y,z), ρ(h)=3 (Courcelle & Engelfriet, 2012)

Example of a Tree Transduction i) F={f,a,b} where ρ(f)=2, ρ(a)=ρ(b)=0 ii) H= {a,b,ε} where ρ(a)=ρ(b)= 1, ρ(ε)=0 iii) Q={qin,q1,q2} iv) qin∈Q is the initial state v) R= • <qin, labf(x1)> <qin, down1> • <qin, labx(x1)^bri(x1)> x(< qi, up>) • <q1, True> <qin, down2 > • <q2, bri(x1)> < qi, up> • <q2, rt(x1)> ε • <qin, labx(x1)^rt(x1)> x(< q2, stay>) (Courcelle & Engelfriet, 2012)

Graphical Representation a or b(<a or b &1st child, >) q1 <f, > <True, > <1st child, > qin <a or b & root, stay> <2nd child, > q2 ε a or b (<a or b & 2nd child, >) <root, stay>

Example input tree output tree f a a b b ε

Deterministic vs. Non-deterministic • A tree transducer is deterministic if the state and the position in the tree uniquely determine what rule should be applied • Otherwise, it is non-deterministic • E.g. <qin, labf(x1)> <qin, down> <qin, labf(x1)> <q1, up>

Non-deterministic Tree Transducer g(<f, >) <a, stay> qin a h(<f, >) Modified from (Fülöp, 1981)

Example f g h g h f g h h g a aaaa input possible outputs

Application to Statistical Machine Translation • The possible output trees would be assigned probabilities • Then the words would be translated into the target language

References Courcelle, B., & Engelfriet, J. (2012). Graph Structure and Monadic Second-Order Logic. Cambridge: University Press. Fülöp. (1981). On attributed tree transducers. ActaCybernetica, 5, p.261-279. Knight, K., & Koehn, P. What’s new in statistical machine translation [PDF document]. Retrieved from http://people.csail.mit.edu/koehn/publications/tutorial2003.pdf Tree (data structure). (n.d.). Retrieved from http://people.csail.mit.edu/koehn/publications/tutorial2003.pdf

Non-deterministic Tree Automata Models for Statistical Machine Translation

Non-deterministic Tree Automata Models for Statistical Machine Translation

Presentation Transcript

Statistical Machine Translation

Statistical Machine Translation

Non-Deterministic Finite Automata

Non-Deterministic Finite Automata

Statistical Machine Translation

Statistical Machine Translation

Statistical Machine Translation

Non Deterministic Automata

Non-Deterministic Finite Automata

Statistical Machine Translation

Statistical Machine Translation

Deterministic Finite Automata Machine

Non-Deterministic Finite Automata

Statistical Machine Translation

Statistical Machine Translation

Statistical Machine Translation

Statistical Machine Translation

Statistical Machine Translation Models for Personalized Search

Statistical Machine Translation

Non Deterministic Automata

Non-Deterministic Finite Automata