1 / 32

LING/C SC/PSYC 438/538 Computational Linguistics

LING/C SC/PSYC 438/538 Computational Linguistics. Sandiway Fong Lecture 10 : 9/27. Today’s Topics. Homework 2 review Recap: DCG system New topic: Finite State Automata (FSA). Question 1

hani
Télécharger la présentation

LING/C SC/PSYC 438/538 Computational Linguistics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LING/C SC/PSYC 438/538Computational Linguistics Sandiway Fong Lecture 10: 9/27

  2. Today’s Topics • Homework 2 review • Recap: DCG system • New topic: Finite State Automata (FSA)

  3. Question 1 Consider a language L3or5 = {111, 11111, 111111, 111111111, 1111111111, 111111111111, 111111111111111,...} each member of L3or5 is a string containing only 1s the number of 1s in each string is divisible by 3 or 5 Give a regular grammar that generates language L3or5 Homework 2 Review

  4. Regular grammar: divisible by 3 side s --> [1], a. a --> [1], b. b --> [1]. b --> [1], c. c --> [1], d. d --> [1], b. Regular grammar: divisible by 5 side s --> [1], one. one --> [1], two. two --> [1], three. three --> [1], four. four --> [1]. four --> [1], five. five --> [1], six. six --> [1], seven. seven --> [1], eight. eight --> [1], four. Homework 2 Review take the disjunction of the two sub-grammars, i.e. merge them

  5. Question 2: Language L = {a2nbn+1 | n ≥1} is also non-regular but can be generated by a regular grammar extended to allow left and right recursive rules Given a Prolog grammar satisfying these rules for L Legit rules: X -> aY X -> a X -> Ya where X, Y are non-terminals, a is some arbitirary terminal Homework 2 Review

  6. L = {a2nbn+1 | n≥ 1} DCG: s --> [a], b. b --> [a], c. c --> [b], d. c --> s, [b]. d --> [b]. s a b a c b d b s a b a c s b Homework 2 Review generated by rules 1 + 2 + 3 + 5 generated by rules 1 + 2 + 4 (center-embedded recursive tree)

  7. Question 3 (Optional 438) A right recursive regular grammar that generates a (rightmost) parse for the language {an| n>=2}: s(s(a,B)) --> [a], b(B). b(b(a,B)) --> [a], b(B). b(b(a))--> [a]. Example ?- s(Tree,[a,a,a],[]). Tree = s(a,b(a,b(a)))? A corresponding left recursive regular grammar will not halt in all cases when an input string is supplied Modify the right recursive grammar to produce a left recursive parse, e.g. ?- s(Tree,[a,a,a],[]). Tree = s(b(b(a),a),a) Can you do this for any right recursive regular grammar? e.g. the one for sheeptalk Homework 2 Review

  8. Original: s(s(a,B)) --> [a], b(B). b(b(a,B)) --> [a], b(B). b(b(a))--> [a]. New: s(s(B,a)) --> [a], b(B). b(b(B,a)) --> [a], b(B). b(b(a))--> [a]. Homework 2 Review taking advantage of the fact that we know we’re generating only a’s example: in rule 1, we match a terminal a on the left side but place an a at the end for the tree

  9. Extra Arguments: Recap • some uses for extra arguments on non-terminals in the grammar: • to generate a parse tree • recording the derivation history • feature agreement • implement counting

  10. Implement Determiner-Noun Number Agreement Data: the man/men a man/*a men Modified grammar: np(np(D,N)) --> det(D,Number), common_noun(N,Number). det(det(the),_) --> [the]. det(det(a),sg) --> [a]. common_noun(n(ball),sg) --> [ball]. common_noun(n(man),sg) --> [man]. common_noun(n(men),pl) --> [men]. Extra Arguments: Recap np(np(D,N)) --> detsg(D), common_nounsg(N). np(np(D,N)) --> detpl(D), common_nounpl(N). detsg(det(a)) --> [a]. detsg(det(the)) --> [the]. detpl(det(the)) --> [the]. common_nounsg(n(man)) -->[man]. common_nounpl(n(men)) --> [men]. syntactic sugar could just encode feature values in nonterminal name

  11. Class Exercise: Implement Case = {nom,acc} agreement system for the grammar Examples: the man hit me vs. *the man hit I Modified grammar: s(Y,Z)) --> np(Y,Case), vp(Z), { Case = nom }. np(np(Y),Case) --> pronoun(Y,Case). pronoun(i,nom) --> [i]. pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num). vp(vp(Y,Z)) --> transitive(Y), np(Z,Case), { Case = acc }. Extra Arguments: Recap { ... Prolog code ... }

  12. Class Exercise: Implement Case = {nom,acc} agreement system for the grammar Examples: the man hit me vs. *the man hit I Modified grammar: s(Y,Z)) --> np(Y,nom), vp(Z). np(np(Y),Case) --> pronoun(Y,Case). pronoun(i,nom) --> [i]. pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num). vp(vp(Y,Z)) --> transitive(Y), np(Z,acc). Extra Arguments: Recap without { ... Prolog code ... }

  13. Extra Arguments: Recap s --> [a],b. b --> [a],b. b --> [b],c. b --> [b]. c --> [b],c. c --> [b]. s(X) --> [a],b(a(X)). b(X) --> [a],b(a(X)). b(a(X)) --> [b],c(X). b(a(0)) --> [b]. c(a(X)) --> [b],c(X). c(a(0)) --> [b]. Lab = { anbn | n ≥ 1 } regular grammar + extra argument L = a+b+ a regular grammar Query:?- s(0,L,[]).

  14. s(X) --> [a],b(a(X)). b(X) --> [a],b(a(X)). b(a(X)) --> [b],c(X). b(a(0)) --> [b]. c(a(X)) --> [b],c(X). c(a(0)) --> [b]. derivation tree: Extra Arguments: Recap s(0) rule 1 b(a(0)) rule 2 a b(a(a(0))) rule 3 a logic: ?- s(0,L,[]). start with 0 wrap an a(_) for each “a” unwrap a(_) for each “b” match #a’s with #b’s = counting c(a(0)) rule 6 b b

  15. FSA So far • Equivalent formalisms regexp regular grammars

  16. FSA • example (sheeptalk) • baa! • baaa! … • regexp • baa+! basic idea: “just follow the arrows” a s w x y a b a > state transition: in state s, if current input symbol is b go to state w ! state z accepting computation: in a final state, if there is no more input, accept string points to start state end state marked in red

  17. FSA: Construction • step-by-step • regexp • baa+! = • baaa*! s >

  18. FSA: Construction • step-by-step • regexp • baaa*! • b • from state s, • see a ‘b’, • move to state w s w b >

  19. FSA: Construction • step-by-step • regexp • baaa*! • ba • from state w, • see an ‘a’, • move to state x s w x b a >

  20. FSA: Construction • step-by-step • regexp • baaa*! • baa • from state x, • see an ‘a’, • move to state y s w x b a > a y

  21. FSA: Construction • step-by-step • regexp • baaa*! • baaa* • baa • baaa • baaaa... • from y, • see an ‘a’, • move to ? s w x b a > a y but machine must have a finite number of states! a y’ a ...

  22. Regular Expressions: Example • step-by-step • regexp • baaa*! • baa* • baa • baaa • baaaa... • from state y, • see an ‘a’, • “loop”, i.e. move back to, state y s w x b a > a y a

  23. s w x b a > a z ! y a Regular Expressions: Example • step-by-step • regexp • baaa*! • baaa*! • from state y, • see an ‘!’, • move to final state z (indicated in red) Note: machine cannot finish (i.e. reach the end of the input string) in states s, w, x or y

  24. Finite State Automata (FSA) • construction • the step-by-step FSA construction method we just used • works for any regular expression • conclusion • anything we can encode with a regular expression, we can build a FSA for it • an important step in showing that FSA and regexps are formally equivalent

  25. a b c d x y e ... ... z etc. a etc. one loop for each character y regexp operators and FSA • basic wildcards • . and * • . any single character • e.g. p.t • put, pit, pat, pet • * zero or more characters over the whole alphabet

  26. a x y a a e i o x y u regexp operators and FSA • basic wildcards • + • one or more of the preceding character • e.g. a+ • [ ] • range of characters • e.g. [aeiou]

  27. x y > a regexp operators and FSA λ denotes the empty transition • basic wildcards • ? • zero or one of the preceding character • e.g. a? • Non-determinism • any FSA with an empty transition is non-deterministic • see example: could be in state x or y simultaneous • any FSA with an empty transition can be rewritten without the empty transition λ x y a

  28. ya a ye a a a e e e e yi i i i o o o y x y z x o yo z u u u u yu regexp operators and FSA • backreferences: • by state splitting • e.g. ([aeiou])\1

  29. s w x b a > a z ! y a Finite State Automata (FSA) • more formally • (Q,s,f,Σ,) • set of states (Q): {s,w,x,y,z} 4 states must be a finite set • start state (s): s • end state(s) (f): z • alphabet (Σ): {a, b, !} • transition function : signature: character × state → state • (b,s)=w • (a,w)=x • (a,x)=y • (a,y)=y • (!,y)=z

  30. FSA • Finite State Automata (FSA) have a limited amount of expressive power • Let’s look at a modification to FSA and its effect on its power

  31. b a b b abb String Transitions • so far... • all machines have had just a single character label on the arc • so if we allow strings to label arcs • do they endow the FSA with any more power? • Answer: No • because we can always convert a machine with string-transitions into one without

  32. a s w x b a > s y baa > a z ! y ! a z Finite State Automata (FSA) • equivalent 3 state machine using string transitions 5 state machine

More Related