Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011
Announcements • Wednesday online • GP meeting scheduling • Seminar on Friday: Luke Zettlemoyer (CSE) • Automatic grammar induction • Treehouse Friday: Classifiers – Memory Lane
Roadmap • Motivation: • FST applications • FST perspectives • FSTs and Regular Relations • FST Operations
FSTs • Finite automaton that maps between two strings • Automaton with two labels per arc • input:output
FST Applications • Tokenization • Segmentation • Morphological analysis • Transliteration • Parsing • Translation • Speech recognition • Spoken language understanding • …
Approaches to FSTs • FST as recognizer: • Takes a pair of input:output strings • Accepts if the pair is in the relation, otherwise rejects • FST as generator: • Outputs the pairs of strings in the relation • FST as translator: • Reads an input string and prints the corresponding output string(s) • FST as set relator: • Computes relations between sets of strings
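The four perspectives can be made concrete on one small machine. The Python sketch below is an illustration only: the machine (which rewrites a as x and copies b) and the function names recognize, translate, generate and relate are hypothetical, and the sketch assumes an epsilon-free FST so that input and output advance one symbol at a time.

```python
from itertools import product

# Hypothetical example machine (not from the slides): an epsilon-free FST with a
# single state 0 that is both initial and final. Arcs are (source, input, output,
# target); the machine rewrites 'a' as 'x' and copies 'b' unchanged.
ARCS = [(0, "a", "x", 0), (0, "b", "b", 0)]
INITIAL, FINAL = {0}, {0}
ALPHABET = "ab"


def recognize(x, y):
    """Recognizer: accept (x, y) iff some path reads x while writing y."""
    # Every arc reads one symbol and writes one symbol, so x and y advance in lockstep.
    if len(x) != len(y):
        return False
    states = set(INITIAL)
    for a, b in zip(x, y):
        states = {dst for (src, i, o, dst) in ARCS if src in states and i == a and o == b}
    return bool(states & FINAL)


def translate(x):
    """Translator: return every output string the machine can write for input x."""
    configs = {(q, "") for q in INITIAL}
    for a in x:
        configs = {(dst, out + o) for (q, out) in configs
                   for (src, i, o, dst) in ARCS if src == q and i == a}
    return {out for (q, out) in configs if q in FINAL}


def generate(max_len):
    """Generator: list every pair in the relation whose input has length <= max_len."""
    pairs = set()
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            x = "".join(chars)
            pairs.update((x, y) for y in translate(x))
    return sorted(pairs)


def relate(inputs):
    """Set relator: map a set of input strings to the set of their outputs."""
    return set().union(*(translate(x) for x in inputs))


print(recognize("ab", "xb"))  # True
print(translate("aba"))       # {'xbx'}
print(generate(1))            # [('', ''), ('a', 'x'), ('b', 'b')]
```

All four functions walk the same arc list; they differ only in which side of the pair is given and which is produced.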
FSTs & Regular Relations • FSAs: equivalent to regular languages • FSTs: equivalent to regular relations • Sets of pairs of strings • Regular relations: • For all $(x, y) \in (\Sigma_1 \cup \{\varepsilon\}) \times (\Sigma_2 \cup \{\varepsilon\})$, $\{(x, y)\}$ is a regular relation • The empty set $\emptyset$ is a regular relation • If $R_1, R_2$ are regular relations, then $R_1R_2$, $R_1 \cup R_2$, and $R_1^*$ are regular relations
Regular Relation Closures • By definition, regular relations are closed under: • Concatenation: $R_1R_2$ • Union: $R_1 \cup R_2$ • Kleene star: $R_1^*$ • Like regular languages • Unlike regular languages, they are NOT closed under: • Intersection: $R_1 = \{(a^n b^*, c^n)\}$ and $R_2 = \{(a^* b^n, c^n)\}$ are regular, but $R_1 \cap R_2 = \{(a^n b^n, c^n)\}$; its input side $\{a^n b^n\}$ is not a regular language, so the intersection is not a regular relation • Difference • Complementation
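A bounded enumeration makes the intersection counterexample easy to inspect. This Python sketch is only an illustration under a stated assumption: it truncates both relations at a hypothetical bound N, so every set involved is finite (and therefore trivially regular). What it shows is that the surviving pairs all have the shape (a^n b^n, c^n), which stops being regular once n is unbounded.

```python
# Hypothetical truncation bound: the real relations are infinite, so only n, k <= N are listed.
N = 4


def r1(bound):
    """Truncation of R1 = {(a^n b^k, c^n)}: the c's count the a's; the b's are free."""
    return {("a" * n + "b" * k, "c" * n) for n in range(bound + 1) for k in range(bound + 1)}


def r2(bound):
    """Truncation of R2 = {(a^k b^n, c^n)}: the c's count the b's; the a's are free."""
    return {("a" * k + "b" * n, "c" * n) for n in range(bound + 1) for k in range(bound + 1)}


both = r1(N) & r2(N)
# Only pairs with as many a's as b's survive: the input side has the shape a^n b^n.
for x, y in sorted(both, key=lambda pair: len(pair[1])):
    print(x or "ε", "->", y or "ε")
```

The shared output c^n forces the a-count from R1 and the b-count from R2 to agree, which is exactly the counting a finite-state device cannot do.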
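Regular Relation Closures • Regular relations are also closed under: • Composition: $R_1 \circ R_2 = \{(x, z) \mid \exists y: (x, y) \in R_1 \wedge (y, z) \in R_2\}$ • Inversion: $R^{-1} = \{(y, x) \mid (x, y) \in R\}$ • Operations linking relations and languages: • Projection: the input projection $\pi_1(R) = \{x \mid \exists y: (x, y) \in R\}$ and the output projection $\pi_2(R) = \{y \mid \exists x: (x, y) \in R\}$ of a regular relation are regular languages • Identity & cross-product of regular languages: $\mathrm{Id}(L) = \{(x, x) \mid x \in L\}$ and $L_1 \times L_2 = \{(x, y) \mid x \in L_1, y \in L_2\}$ are regular relations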
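For finite relations stored as Python sets of (input, output) pairs, each of these operations reduces to a short set comprehension. This is a sketch on finite sets only (real regular relations are usually infinite and are manipulated through their transducers); the function names and the sample relations R1 and R2 are hypothetical.

```python
def compose(r1, r2):
    """R1 ∘ R2: pair x with z whenever some middle string y links them."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


def invert(r):
    """R^-1: swap the input and output side of every pair."""
    return {(y, x) for (x, y) in r}


def project(r, side):
    """side=0 keeps the input strings, side=1 keeps the output strings."""
    return {pair[side] for pair in r}


def identity(lang):
    """Id(L): relate every string of L to itself."""
    return {(x, x) for x in lang}


def cross(l1, l2):
    """L1 × L2: relate every string of L1 to every string of L2."""
    return {(x, y) for x in l1 for y in l2}


# Hypothetical finite relations used only for illustration.
R1 = {("a", "b"), ("aa", "bb")}
R2 = {("b", "c"), ("bb", "cc")}
print(sorted(compose(R1, R2)))  # [('a', 'c'), ('aa', 'cc')]
print(sorted(invert(R1)))       # [('b', 'a'), ('bb', 'aa')]
print(sorted(project(R1, 0)))   # ['a', 'aa']
```

As a sanity check, composing R1 with its own inverse gives the identity on R1's input side: compose(R1, invert(R1)) equals identity({"a", "aa"}).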
FST Formal Definition • A Finite-State Transducer is a 6-tuple (Q, Σ, Γ, I, F, δ): • A finite set of states: Q • A finite set of input symbols: Σ • A finite set of output symbols: Γ • A set of initial states: I ⊆ Q • A set of final states: F ⊆ Q • A transition relation between states: $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times Q$ • FSAs are a special case of FSTs: an FSA can be viewed as an FST whose arcs all have identical input and output labels
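One way to encode the tuple directly is a frozen Python dataclass whose fields mirror Q, Σ, Γ, I, F and δ. The class name FST, the helper fsa_as_fst, and the convention of writing ε as the empty string are assumptions of this sketch, not a standard API. The helper illustrates the last bullet by turning each FSA arc (p, a, q) into an FST arc with identical input and output labels.

```python
from dataclasses import dataclass

EPS = ""  # this sketch writes epsilon as the empty string


@dataclass(frozen=True)
class FST:
    states: frozenset   # Q
    inputs: frozenset   # Σ
    outputs: frozenset  # Γ
    initial: frozenset  # I ⊆ Q
    final: frozenset    # F ⊆ Q
    arcs: frozenset     # δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q, as (p, a, b, q) tuples

    def __post_init__(self):
        # Check the side conditions of the definition when the object is built.
        if not (self.initial <= self.states and self.final <= self.states):
            raise ValueError("I and F must be subsets of Q")
        for (p, a, b, q) in self.arcs:
            if p not in self.states or q not in self.states:
                raise ValueError(f"arc {(p, a, b, q)} uses a state outside Q")
            if a != EPS and a not in self.inputs:
                raise ValueError(f"arc {(p, a, b, q)} reads a symbol outside Σ")
            if b != EPS and b not in self.outputs:
                raise ValueError(f"arc {(p, a, b, q)} writes a symbol outside Γ")


def fsa_as_fst(states, alphabet, initial, final, fsa_arcs):
    """View an FSA with arcs (p, a, q) as an FST whose arcs are a:a (input copied to output)."""
    return FST(frozenset(states), frozenset(alphabet), frozenset(alphabet),
               frozenset(initial), frozenset(final),
               frozenset((p, a, a, q) for (p, a, q) in fsa_arcs))


fsa = fsa_as_fst({0, 1}, {"a"}, {0}, {1}, [(0, "a", 1)])
print(sorted(fsa.arcs))  # [(0, 'a', 'a', 1)]
```

Freezing the dataclass keeps the tuple immutable, which matches reading the definition as a fixed mathematical object.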
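FST Operations • Union: $R(T_1 \cup T_2) = R(T_1) \cup R(T_2)$ • Concatenation: $R(T_1T_2) = \{(x_1x_2, y_1y_2) \mid (x_1, y_1) \in R(T_1), (x_2, y_2) \in R(T_2)\}$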
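The union and concatenation constructions can be sketched on machines stored as (initial states, final states, arcs) triples. In this hypothetical Python sketch, union places the two machines side by side, and concatenation links the final states of the first machine to the initial states of the second with ε:ε arcs; the accepts function is a simple breadth-first recognizer included only so the result can be checked. All names and the example machines A and C are illustrative assumptions.

```python
from collections import deque

EPS = ""  # this sketch writes epsilon as the empty string


def tag(fst, t):
    """Rename each state q to (t, q) so two machines get disjoint state sets."""
    initial, final, arcs = fst
    return ({(t, q) for q in initial}, {(t, q) for q in final},
            {((t, p), a, b, (t, q)) for (p, a, b, q) in arcs})


def union(t1, t2):
    """Side-by-side union: start in either machine, finish in either."""
    i1, f1, a1 = tag(t1, 1)
    i2, f2, a2 = tag(t2, 2)
    return (i1 | i2, f1 | f2, a1 | a2)


def concat(t1, t2):
    """Run t1, then jump on eps:eps arcs from its final states into t2's initial states."""
    i1, f1, a1 = tag(t1, 1)
    i2, f2, a2 = tag(t2, 2)
    bridges = {(p, EPS, EPS, q) for p in f1 for q in i2}
    return (i1, f2, a1 | a2 | bridges)


def accepts(fst, x, y):
    """Breadth-first search over (state, chars of x read, chars of y written)."""
    initial, final, arcs = fst
    start = [(q, 0, 0) for q in initial]
    seen, queue = set(start), deque(start)
    while queue:
        q, i, j = queue.popleft()
        if q in final and i == len(x) and j == len(y):
            return True
        for (p, a, b, r) in arcs:
            # An epsilon label matches the empty slice, so it consumes nothing.
            if p == q and x[i:i + len(a)] == a and y[j:j + len(b)] == b:
                nxt = (r, i + len(a), j + len(b))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# Hypothetical one-arc machines: A maps "a" to "b", C maps "c" to "d".
A = ({0}, {1}, {(0, "a", "b", 1)})
C = ({0}, {1}, {(0, "c", "d", 1)})
print(accepts(union(A, C), "c", "d"))     # True
print(accepts(concat(A, C), "ac", "bd"))  # True
```

Tagging states before combining is the design choice that keeps the two machines from accidentally sharing a state.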
FST Operations • Inversion: switch input and output labels • If $T$ maps from $I$ to $O$, $T^{-1}$ maps from $O$ to $I$ • Composition: • If $T_1$ is a transducer from $I_1$ to $O_2$ and $T_2$ is a transducer from $O_2$ to $O_3$, then $T_1 \circ T_2$ is a transducer from $I_1$ to $O_3$
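Inversion only swaps the two labels on every arc. For ε-free machines, composition can be sketched as a product construction: an arc (p1, a, b, q1) of T1 and an arc (p2, b, c, q2) of T2 that agree on the middle symbol b combine into one arc ((p1, p2), a, c, (q1, q2)). This Python sketch assumes neither machine has ε labels; composing machines with ε arcs needs extra bookkeeping (an ε-filter) that is not shown. The names and example machines are hypothetical.

```python
def invert(fst):
    """Swap input and output labels on every arc; states are unchanged."""
    initial, final, arcs = fst
    return (initial, final, {(p, b, a, q) for (p, a, b, q) in arcs})


def compose(t1, t2):
    """Product construction for epsilon-free machines: pair an arc a:b of t1
    with an arc b:c of t2 (same middle symbol b) to get one arc a:c."""
    i1, f1, a1 = t1
    i2, f2, a2 = t2
    arcs = {((p1, p2), a, c, (q1, q2))
            for (p1, a, b, q1) in a1
            for (p2, b2, c, q2) in a2 if b == b2}
    return ({(p, q) for p in i1 for q in i2},
            {(p, q) for p in f1 for q in f2},
            arcs)


def transduce(fst, x):
    """All outputs for input x (epsilon-free machines only)."""
    initial, final, arcs = fst
    configs = {(q, "") for q in initial}
    for ch in x:
        configs = {(r, out + b) for (q, out) in configs
                   for (p, a, b, r) in arcs if p == q and a == ch}
    return {out for (q, out) in configs if q in final}


# Hypothetical machines: T1 rewrites every a as b, T2 rewrites every b as c.
T1 = ({0}, {0}, {(0, "a", "b", 0)})
T2 = ({0}, {0}, {(0, "b", "c", 0)})
print(transduce(compose(T1, T2), "aa"))  # {'cc'}
print(transduce(invert(T1), "bb"))       # {'aa'}
```

The composed machine never materializes the intermediate string: the middle symbol b is matched arc by arc and then dropped.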
FST Examples • R(T1) = {(ε, ε), (a, b), (aa, bb), (aaa, bbb), …} • R(T2) = {(a, x), (ab, xy), (abb, xyy), …}
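The slide figures for these two examples are not available, so the machines below are hypothetical reconstructions that generate the listed relations: the first has a single state, both initial and final, with an a:b loop; the second reads a:x into a final state that loops on b:y. The Python sketch enumerates the pairs produced by paths of bounded length.

```python
# Hypothetical reconstructions of the two example machines. Machines are
# (initial states, final states, arcs), with arcs written (src, in, out, dst).
T_AB = ({0}, {0}, {(0, "a", "b", 0)})                    # loop a:b on a final start state
T_AXY = ({0}, {1}, {(0, "a", "x", 1), (1, "b", "y", 1)})  # a:x, then loop b:y


def pairs(fst, max_len):
    """Enumerate the relation's pairs produced by paths of at most max_len arcs
    (epsilon-free machines only)."""
    initial, final, arcs = fst
    frontier = {(q, "", "") for q in initial}
    found = set()
    for _ in range(max_len + 1):
        found |= {(x, y) for (q, x, y) in frontier if q in final}
        frontier = {(dst, x + a, y + b) for (q, x, y) in frontier
                    for (src, a, b, dst) in arcs if src == q}
    return sorted(found, key=lambda pair: (len(pair[0]), pair))


print(pairs(T_AB, 3))   # [('', ''), ('a', 'b'), ('aa', 'bb'), ('aaa', 'bbb')]
print(pairs(T_AXY, 3))  # [('a', 'x'), ('ab', 'xy'), ('abb', 'xyy')]
```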
FST Application Examples • Case folding: • He said → he said • Tokenization: • “He ran.” → “ He ran . ” • POS tagging: • They can fish → PRO VERB NOUN
FST Application Examples • Pronunciation: • B AH T EH R → B AH DX EH R • Morphological generation: • Fox s → Foxes • Morphological analysis: • cats → cat s
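Case folding is the simplest of these applications to write down as a transducer: a single state that is both initial and final, with one arc per character that copies it and lower-cases ASCII letters. The symbol inventory and names below are assumptions of this Python sketch; characters outside the inventory have no arc, so inputs containing them are rejected.

```python
import string

# Hypothetical one-state case-folding transducer: state 0 is initial and final,
# and every arc copies a character, lower-casing ASCII letters on the way.
SYMBOLS = string.ascii_letters + string.digits + " .,!?'\""
CASE_FOLD = ({0}, {0}, {(0, c, c.lower(), 0) for c in SYMBOLS})


def transduce(fst, x):
    """Return every output for input x (epsilon-free machines only)."""
    initial, final, arcs = fst
    configs = {(q, "") for q in initial}
    for ch in x:
        configs = {(r, out + b) for (q, out) in configs
                   for (p, a, b, r) in arcs if p == q and a == ch}
    return {out for (q, out) in configs if q in final}


print(transduce(CASE_FOLD, "He said"))  # {'he said'}
print(transduce(CASE_FOLD, "Hé"))       # set(): 'é' has no arc, so the input is rejected
```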
FST Algorithms • Recognition: • Is a given string pair (x, y) accepted by the FST? • (x, y) → yes/no • Composition: • Given a pair of transducers $T_1$ and $T_2$, create a new transducer $T_1 \circ T_2$ • Transduction: • Given an input string and an FST, compute the output string(s) • x → y
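Transduction and recognition can share one search over configurations (state, input position, output so far). This Python sketch is one possible approach, not a reference algorithm from the course: it allows ε input labels, caps output length with a hypothetical max_out parameter so that ε-input cycles that keep writing output cannot run forever, and implements recognition as membership in the transduction result. A toy nondeterministic machine shows that one input can yield several outputs.

```python
from collections import deque

EPS = ""  # this sketch writes epsilon as the empty string


def transduce(fst, x, max_out=50):
    """Every output y such that the FST accepts (x, y); epsilon arcs are allowed.

    fst is (initial states, final states, arcs) with arcs (src, in, out, dst).
    Outputs longer than max_out are pruned, so cycles that read epsilon but still
    write symbols cannot make the search run forever.
    """
    initial, final, arcs = fst
    start = [(q, 0, "") for q in initial]  # (state, chars of x consumed, output so far)
    seen, queue, results = set(start), deque(start), set()
    while queue:
        q, i, out = queue.popleft()
        if q in final and i == len(x):
            results.add(out)
        for (src, a, b, dst) in arcs:
            # An epsilon input label matches the empty slice, so it consumes nothing.
            if src == q and x[i:i + len(a)] == a and len(out) + len(b) <= max_out:
                nxt = (dst, i + len(a), out + b)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return results


def recognize(fst, x, y):
    """Recognition: (x, y) is accepted iff y is among the outputs for x.

    Outputs only grow along a path, so capping them at len(y) loses nothing."""
    return y in transduce(fst, x, max_out=len(y))


# Hypothetical nondeterministic FST: on input "a" it can write "a" and then
# insert "!" via an epsilon-input arc, or it can write "b" directly.
T = ({0}, {2}, {(0, "a", "a", 1), (1, EPS, "!", 2), (0, "a", "b", 2)})
print(sorted(transduce(T, "a")))  # ['a!', 'b']
```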
WFST Definition • A Probabilistic (Weighted) Finite-State Transducer is a 9-tuple: • A finite set of states: Q • A finite set of input symbols: Σ • A finite set of output symbols: Γ • A set of initial states: I ⊆ Q • A set of final states: F ⊆ Q • A set of transitions: $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times Q$ • Initial state probabilities: $Q \to \mathbb{R}^+$ • Transition probabilities: $\delta \to \mathbb{R}^+$ • Final state probabilities: $Q \to \mathbb{R}^+$
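Under this definition, a path's probability is the initial probability of its first state times the probabilities of its transitions times the final probability of its last state, and the probability of an output sums over all accepting paths that produce it. This Python sketch computes that sum for ε-free machines; the toy machine, its weights, and the name output_probs are hypothetical, and the weights are chosen so that each state's outgoing and final probabilities sum to 1.

```python
from collections import defaultdict

# Hypothetical toy machine: state 0 is the only start state; states 1 and 2 are final.
INIT = {0: 1.0}                        # initial-state probabilities, Q -> R+
ARCS = [                               # transitions (src, in, out, dst, prob), delta -> R+
    (0, "a", "x", 1, 0.5),
    (0, "a", "x", 2, 0.3),
    (0, "a", "y", 2, 0.2),
]
FINAL = {1: 1.0, 2: 1.0}               # final-state probabilities, Q -> R+


def output_probs(init, arcs, final, x):
    """Probability of each output for input x, summed over all accepting paths.

    A path's probability is init(start) * product of arc probabilities * final(end).
    Epsilon-free machines only.
    """
    configs = defaultdict(float)  # (state, output so far) -> probability mass
    for q, p in init.items():
        configs[(q, "")] += p
    for ch in x:
        nxt = defaultdict(float)
        for (q, out), p in configs.items():
            for (src, a, b, dst, w) in arcs:
                if src == q and a == ch:
                    nxt[(dst, out + b)] += p * w
        configs = nxt
    result = defaultdict(float)
    for (q, out), p in configs.items():
        if q in final:
            result[out] += p * final[q]
    return dict(result)


probs = output_probs(INIT, ARCS, FINAL, "a")
print(max(probs, key=probs.get))  # x
```

Two distinct paths write "x" here (through states 1 and 2), so its probability is their sum, 0.5 + 0.3. Replacing the sums with max would give a Viterbi-style best-path score instead.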
Summary • FSTs • Equivalent to regular relations • Transduce strings to strings • Useful for a wide range of applications