1 / 50

Human Language Technology

Human Language Technology. Finite State Transducers. Acknowledgement. Material in this lecture derived/copied in part from Richard Sproat CL46 Lectures Lauri Karttunen LSA lectures 2005 Shuly Wintner 2008 Malta. Three Key Concepts. Regular Relations. Finite State Transducers.

arden-joyce
Télécharger la présentation

Human Language Technology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Human Language Technology Finite State Transducers

  2. Acknowledgement • Material in this lecture derived/copied in part from • Richard Sproat CL46 Lectures • Lauri Karttunen LSA lectures 2005 • Shuly Wintner 2008 Malta HLT: finite state transducers

  3. Three Key Concepts Regular Relations Finite State Transducers Computational Morphology HLT: finite state transducers

  4. Three Key Concepts Regular Relations Finite State Transducers Computational Morphology HLT: finite state transducers

  5. A Regular Set L1 ab abab ababab abababab ababababab . . HLT: finite state transducers

  6. Two Regular Sets L1 L2 ab abab ababab abababab ababababab . . ba baba bababa babababa bababababa . . HLT: finite state transducers

  7. A Regular Relation  L1 x L2 L1 L2 ab abab ababab abababab ababababab . . ba baba bababa babababa bababababa . . or {("ab","ba"), ("abab","baba"),...} HLT: finite state transducers

  8. Some closure properties for regular relations • Concatenation [R1 R2] • Power (Rn) • Reversal • Inversion (R-1) • Composition: R1 ○ R2 HLT: finite state transducers

  9. Concatenation and Power Concatenation R1 = {("a","b")} R2 = {("c","d")} [R1 R2] = {("ac","bd")} Power R1+ = {("a","b"),("aa","bb"), ("aaa","bbb"), ...} HLT: finite state transducers

  10. Composition • R1 ○ R2 denotes the composition of relations R1 and R2. • Definition If R1 contains <x,y> And R2 contains <y,z> Then R1 ○ R2 contains <x,z> • R1 and R2 and B must be relations. If either is just a language, it is assumed to abbreviate the identity relation. • R1 ○ R2 is written [R1 .o. R2] in xfst 61 HLT: finite state transducers

  11. Closure Properties of Regular Languages and Relations Operation Regular Languages Regular Relations Union yes yes Concatenation yes yes Iteration yes yes Intersection yes no Subtraction yes no Complementation yes no Composition n/a yes HLT: finite state transducers

  12. Morphology as a Regular Relation lexical language surface language cat cats mice lives . . . cat cat+N+PL mouse+N+PL life+N+PL live+V+3SING . . or {("cat,cat"),("cats","cat+N+PL"),......} HLT: finite state transducers

  13. Part-of-Speech Tagging • I know some new tricks • PRON V DET ADJ N • said the Cat in the Hat • V DET N P DET N HLT: finite state transducers

  14. Singular-to-plural mapping: • cat hat ox child mouse sheep • cats hats oxen children mice sheep HLT: finite state transducers

  15. Three Key Concepts Regular Relations Finite State Transducers Computational Morphology HLT: finite state transducers

  16. FSA a • Used for • Recognition • Generation HLT: finite state transducers

  17. Finite State Transducers • A finite state transducer (FST) is essentially an FSA finite state automaton that works on two (or more) tapes. • The most common way to think about transducers is as a kind of translating machine which works by reading from one tape and writing onto the other. HLT: finite state transducers

  18. FST Definition • A 2 way FST is a quintuple (K,s,F,ixo,) where • i, o are input and output alphabets • K is a finite set of states • s  K is an initial state • FK are final states •  is a transition relation of typeK x i x o x K HLT: finite state transducers

  19. FST a upper tape • Used for • Recognition • Generation • Translation lower tape b HLT: finite state transducers

  20. A Very Simple Transducer a b Relation { ("a","b") } Notation a:b encodes the transition HLT: finite state transducers

  21. A Very Simple Transducer a b also written as a:b HLT: finite state transducers

  22. A Very Simple Transducer upper side a b lower side a:b HLT: finite state transducers

  23. a Symbol Pairs • Symbols vs. symbol pairs • In general, no distinction is made in xfst betweena the language {“a”}a:a the identity relation {(“a”, “a”)} HLT: finite state transducers

  24. Relation { ("a","b"), ("aa","bb"), ...} Notationa:b* N.B. with this notation a and b must be single symbols A (more interesting) Transducer a:b 1 HLT: finite state transducers

  25. Transducer have SeveralModes of Operation • generation mode: It writes on both tapes. A string of as on one tape and a string of bs on the other tape. Both strings have the same length. • recognition mode: It accepts when the word on the first tape consists of exactly as many as as the word on the second tape consists of bs. • translation mode (left to right): It reads as from the first tape and writes a b for every a that it reads onto the second tape. • translation mode (right to left): It reads bs from the second tape and writes an a for every b that it reads onto the first tape. HLT: finite state transducers

  26. The Basic Idea • Morphology is regular • Morphology is finite state HLT: finite state transducers

  27. Morphology is Regular • The relation between the surface forms of a language and the corresponding lexical forms can be described as a regular relation, e.g.{ ("leaf+N+Pl","leaves"),("hang+V+Past","hung"),...} • Regular relations are closed under operations such as concatenation, iteration, union, and composition. • Complex regular relations can be derived from simpler relations. HLT: finite state transducers

  28. Morphology is finite-state • A regular relation can be defined using the metalanguage of regular expressions. • [{talk} | {walk} | {work}] • [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}]; • A regular expression can be compiled into a finite-state transducer that implements the relation computationally. HLT: finite state transducers

  29. Finite-state transducer +Base: final state +3rdSg:s a t +Progr:i :n :g a l k w o r +Past:e :d initial state Compilation Regular expression • [{talk} | {walk} | {work}] • [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}]; HLT: finite state transducers

  30. Generation work+3rdSg --> works +Base: +3rdSg:s a:a t:t +Progr:i :n :g a:a l:l k:k w:w o:o r:r +Past:e :d HLT: finite state transducers

  31. Analysis +Base: +3rdSg:s a:a t:t +Progr:i :n :g a:a l:l k:k w:w o:o r:r +Past:e :d talked --> talk+Past HLT: finite state transducers

  32. XFST Demo 2 start xfst % xfst xfst[0]: • xfst[0]: regex • [{talk} | {walk} | {work}] • [% +Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}]; compile a regular expression xfst[1]: apply up walked walk+Past apply the result xfst[1]: apply down talk+SgGen3 talks HLT: finite state transducers

  33. vouloir +IndP +SG + P3 Finite-state transducer veut citation form inflection codes v o u l o i r +IndP +SG +P3 v e u t inflected form Lexical transducer • Bidirectional: generation or analysis • Compact and fast • Comprehensive systems have been built for over 40 languages: • English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, Korean, Basque, Greek, Arabic, Hebrew, Bulgarian, … HLT: finite state transducers

  34. Morphotactics Lexicon Regular Expression Lexicon FST Lexical Transducer (a single FST) Compiler composition Rules Regular Expressions Rule FSTs Alternations f a t +Adj +Comp t e f a t r How lexical transducers are made HLT: finite state transducers

  35. fst 1 fst 2 fst n Sequential Model Lexical form Ordered sequence of rewrite rules (Chomsky & Halle ‘68) can be modeled by a cascade of finite-state transducers Johnson ‘72 Kaplan & Kay ‘81 Intermediate form ... Surface form HLT: finite state transducers

  36. fst n Parallel Model Lexical form ... fst 2 fst 1 Surface form Set of parallel of two-level rules (constraints) compiled into finite-state automata interpreted as transducers Koskenniemi ‘83 HLT: finite state transducers

  37. Koskenniemi 1983 Chomsky&Halle 1968 Lexical form Lexical form rule 1 rule 1 ... rule 2 rule 1 rule n Intermediate form Surface form intersect ... FST rule n Surface form Sequential vs. Parallel rules compose HLT: finite state transducers

  38. Sequential vs. Parallel Rules • Sequential rules are combined by means of composition. • Advantage: FSTs are closed under composition • Disadvantage: order of operations is sensitive • Parallel rules are combined by means of intersection • In general, FSTs are not closed under intersection. • … but FSTs without ε-transitions are closed under intersection. HLT: finite state transducers

  39. b:y a:x c:0 Crossproduct • A .x. B The relation that maps every string in A to every string in B, and vice versa • A:B Same as [A .x. B]. a b c .x. x y [a b c] : [x y] {abc}:{xy} HLT: finite state transducers

  40. b a c b:B a:A c:C b:B c:C a:A d:D Composition • A .o. B The relation C such that if A maps x to y and B maps y to z, C maps x to z. {abc} .o. [a:A | b:B | c:C | d:D]* HLT: finite state transducers

  41. Transducers are not closed under intersection c:a ε:b ε:b T1(Cn) = { anbm | m≥0 } T1∩T2 (Cn) = { anbn } c:b ε:a c:b T2(Cn) = { ambn | m≥0 } HLT: finite state transducers

  42. Xerox RE Operators • $ containment • => restriction • -> replacement • Make it easier to describe complex languages and relations without extending the formal power of finite-state systems. HLT: finite state transducers

  43. a ? ? a Containment $a [?* a ?*] HLT: finite state transducers

  44. b a => b _ c b ? a c “Anyamust be preceded byb and followed byc.” ? c c ~[~[?* b] a ?*] & ~[?* a ~[c ?*]] Equivalent expression Restriction HLT: finite state transducers

  45. a:b a b -> b a b:a ? a:b b “Replace ‘ab’ by ‘ba’.” ? a a [[~$[a b] [[a b] .x. [b a]]]* ~$[a b]] Equivalent expression Replacement HLT: finite state transducers

  46. a|e|i|o|u -> %[ ... %] 0:[ a [ i o ? e ] u 0:] Replacement + Marking p o t a t o p[o]t[a]t[o] HLT: finite state transducers

  47. L _ R A -> B Context Replacement The relation that replaces A by B between L and R leaving everything else unchanged. Conditional Replacement HLT: finite state transducers

  48. Sequential application k a N p a n k a m p a n k a m m a n N -> m / _ p p -> m / m _ HLT: finite state transducers

  49. N:m 2 p m N:m ? m 0 ? p 1 N N m p 1 ? m 0 ? p:m Sequential application in detail k a N p a n k a m p a n k a m m a n 0 0 0 2 0 0 0 0 0 0 1 0 0 0 HLT: finite state transducers

  50. 3 0 1 2 Composition N:m p:m k a N p a n k a m m a n N:m m 0 0 0 3 0 0 0 m ? ? p p:m N:m m N ? N N HLT: finite state transducers

More Related