
LING 438/538 Computational Linguistics




  1. LING 438/538 Computational Linguistics • Sandiway Fong • Lecture 25: 11/21

  2. Administrivia • Lecture schedule (from last time) • Tuesday 21st November • Homework #6: Context-free Grammars and Parsing • due Tuesday 28th • Thursday 23rd November • Turkey Day • Tuesday 28th November • Thursday 30th November • Homework #7: Machine Translation • due December 7th • 538 Presentations • Tuesday 5th December • Homework #7: Machine Translation • 538 Presentations

  3. Administrivia • 538 Presentations: assignments

  4. Last Time • Chapter 10: • Parsing with Context-Free Grammars • Top-down Parsing • Prolog’s DCG rule system • Left recursion • Left-corner idea • Bottom-up Parsing • Dotted rules • LR parsing: shift and reduce operations

  5. Bottom-Up Parsing
     • LR(0) parsing
       • An example of bottom-up tabular parsing
       • Similar to the top-down Earley algorithm described in the textbook in that both methods use the idea of dotted rules
       • LR is more efficient: it computes the dotted rules offline (during parser/grammar construction)
       • Earley computes the dotted rules at parse time
     • LR actions
       • Shift: read an input word, i.e. advance the current input word pointer to the next word
       • Reduce: complete a nonterminal, i.e. complete parsing a grammar rule
       • Accept: complete the parse, i.e. the start symbol (e.g. S) derives the terminal string

  6. Tabular Parsing
     • Dotted Rule Notation
       • “dot” used to indicate the progress of a parse through a phrase structure rule
       • examples
           vp --> v . np      means we’ve seen v and predict np
           np --> . d n       means we’re predicting a d (followed by n)
           vp --> vp pp .     means we’ve completed a vp
     • state
       • a set of dotted rules encodes the state of the parse
       • kernel
           vp --> v . np
           vp --> v .
       • completion (of predict NP)
           np --> . d n
           np --> . n
           np --> . np cp
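The dotted-rule notation can be mirrored directly as a Prolog term. The following is a minimal sketch, assuming SWI-Prolog; the term dotted/3 and the three predicate names are hypothetical and are not taken from the course code.

```prolog
% dotted(LHS, RHS, Pos): Pos symbols of RHS lie before the dot.
% All names here are illustrative assumptions.

% split_at_dot(+RHS, +Pos, -Seen, -Expected)
% Seen is what has been parsed so far, Expected is what is still predicted.
split_at_dot(RHS, Pos, Seen, Expected) :-
    length(Seen, Pos),
    append(Seen, Expected, RHS).

% next_symbol(+Dotted, -Symbol): the symbol right after the dot, if any.
next_symbol(dotted(_, RHS, Pos), Symbol) :-
    nth0(Pos, RHS, Symbol).

% complete(+Dotted): the dot is at the end of the rule.
complete(dotted(_, RHS, Pos)) :-
    length(RHS, Pos).
```

For example, ?- next_symbol(dotted(vp,[v,np],1), X). asks what vp --> v . np predicts next, and ?- complete(dotted(vp,[vp,pp],2)). checks that vp --> vp pp . is finished.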

  7. Tabular Parsing
     • compute possible states by advancing the dot
     • example: (assume d is next in the input)
         vp --> v . np
         vp --> v .         (eliminated)
         np --> d . n
         np --> . n         (eliminated)
         np --> . np cp

  8. Tabular Parsing
     • Dotted rules example
       • State 0:
           s -> . np vp
           np -> . d n
           np -> . n
           np -> . np pp
       • possible actions
         • shift d and go to new state
         • shift n and go to new state
     • Creating new states
       • State 0, shift d: State 1
           NP -> D . N
       • State 0, shift n: State 2
           NP -> N .

  9. Tabular Parsing
     • State 1: Shift N, goto State 3
     • State 3:
         NP -> D N .
     (state diagram: States 0, 1 and 2 as on the previous slide, plus State 3)

  10. Tabular Parsing
      • Shift
        • take input word, and
        • place on stack
      (figure: state diagram with States 0 to 3 and arcs “shift d” and “shift n”; items [D a ] [N man] [V hit ] … under the labels Input and Stack; state 3)

  11. Tabular Parsing • State 2: Reduce action NP -> N . (state diagram: States 0 to 3 as before)

  12. Tabular Parsing
      • Reduce NP -> N .
        • pop [N milk] off the stack, and
        • replace with [NP [N milk]] on stack
      (figure: items [N milk] [V is ] … under the labels Input and Stack, result [NP milk]; State 2)

  13. Tabular Parsing • State 3: Reduce NP -> D N . (state diagram: States 0 to 3 as before)

  14. Tabular Parsing
      • Reduce NP -> D N .
        • pop [N man] and [D a] off the stack
        • replace with [NP[D a][N man]]
      (figure: items [D a ] [N man] [V hit ] … under the labels Input and Stack, result [NP[D a ][N man]]; State 3)

  15. Tabular Parsing
      • State 0: Transition NP
        • State 0:
            S -> . NP VP
            NP -> . D N
            NP -> . N
            NP -> . NP PP
        • State 4:
            S -> NP . VP
            NP -> NP . PP
            VP -> . V NP
            VP -> . V
            VP -> . VP PP
            PP -> . P NP
        • State 2:
            NP -> N .

  16. Tabular Parsing • for both states 2 and 3 • NP -> N . (reduce NP -> N) • NP -> D N . (reduce NP -> D N) • after Reduce NP operation • Goto state 4 • notes: • states are unique • grammar is finite • procedure generating states must terminate since the number of possible dotted rules is finite

  17. Tabular Parsing

  18. Tabular Parsing • Observations • table is sparse • example • State 0, Input: [V ..] • parse fails immediately • in a given state, input may be irrelevant • example • State 2 (there is no shift operation) • there may be action conflicts • example • State 1: shift D, shift N • more interesting cases • shift-reduce and reduce-reduce conflicts

  19. Tabular Parsing • finishing up • an extra initial rule is usually added to the grammar • SS --> S . $ • SS = start symbol • $ = end of sentence marker • input: • milk is good for you $ • accept action • discard $ from input • return element at the top of stack as the parse tree

  20. LR Parsing in Prolog • Recap • finite state machine • each state represents a set of dotted rules • example • S --> . NP VP • NP --> . D N • NP --> . N • NP --> . NP PP • we transition, i.e. move, from state to state by advancing the “dot” over terminal and nonterminal symbols

  21. LR Parsing in Prolog • Plan: • formally describe an LR finite state machine construction process • define the parse procedure • parse(Sentence,Tree) in terms of the LR finite state machine • run • John saw the man with a telescope • ?- parse([john,saw,the,man,with,a,telescope],T). • which produces two parses (PP-attachment ambiguity)

  22. Grammar
      • assume grammar rules and lexicon (a convenient format for the LR(0) generator):
          rule(s,[np,vp]).
          rule(np,[d,n]).
          rule(np,[n]).
          rule(np,[np,pp]).
          rule(vp,[v,np]).
          rule(vp,[v]).
          rule(vp,[vp,pp]).
          rule(pp,[p,np]).
          lexicon(the,d).     lexicon(a,d).
          lexicon(man,n).     lexicon(john,n).     lexicon(telescope,n).
          lexicon(saw,v).     lexicon(runs,v).
          lexicon(with,p).

  23. Grammar
      • extra definitions
          :- dynamic rule/2.
          start(ss).
          rule(ss,[s,$]).
          nonT(ss). nonT(s). nonT(np). nonT(vp). nonT(pp).
          term(n). term(v). term(p). term(d). term($).
      • notes:
        • $ = end of sentence marker
        • Prolog programming trick
          • declaring rule/2 as dynamic allows us to use the builtin clause(rule(LHS,RHS),true,Ref) to keep a pointer (Ref) to a particular rule

  24. Grammar Rule Predicates
      • define
          %%% Assume grammar rules are stored as database facts
          %%% rule(LHS,RHS)
          ruleLHS(NonT,Ref) :- clause(rule(NonT,_),true,Ref).
          ruleRHS(RHS,Ref) :- clause(rule(_,RHS),true,Ref).
          ruleElements(LHS,RHS,Ref) :-        % assume Ref instantiated
              clause(rule(LHS,RHS),true,Ref).
      • note
        • Ref (when instantiated) is a pointer to an instance of rule(LHS,RHS).

  25. A Counter in Prolog
      • define
        • stateCounter(N) to hold the current state number (N = 0,1,2,3…)
      • define predicates
          resetStateCounter :-
              retractall(stateCounter(_)),
              assert(stateCounter(0)).
          incStateCounter :-
              retract(stateCounter(X)),
              Y is X + 1,
              assert(stateCounter(Y)).
      • Prolog builtins used:
        • retract/1 - removes matching item from the database
        • retractall/1 - removes all matching items from the database
        • assert/1 - adds item to the database
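As a usage sketch of the counter (assuming SWI-Prolog), the two predicates can be combined into a helper that hands out the current state number and then advances it. The name freshState/1 is hypothetical; the explicit dynamic declaration is an addition that keeps stateCounter/1 callable before the first reset.

```prolog
:- dynamic stateCounter/1.

resetStateCounter :-
    retractall(stateCounter(_)),
    assert(stateCounter(0)).

incStateCounter :-
    retract(stateCounter(X)),
    Y is X + 1,
    assert(stateCounter(Y)).

% freshState(-N): hypothetical helper; N is the current state number,
% and the counter is advanced as a side effect.
freshState(N) :-
    stateCounter(N),
    incStateCounter.
```

After ?- resetStateCounter., successive calls to freshState/1 return 0, 1, 2 and so on.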

  26. Data Structures
      • define cfsm/3
        • cfsm(L,CFSet,N) “state configuration”
        • CFSet = list of dotted rules for state N
        • L = |CFSet| (used for quicker lookup)
      • define cf/2
        • cf(Ref,I) “dotted rule configuration”
        • Ref points to a rule(LHS,RHS)
        • (I = 0,1,2…) is the index of the “dot” in RHS
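To make the cf(Ref,I) representation concrete, the sketch below (assuming SWI-Prolog) recovers the two halves of a dotted rule from a clause reference. The predicate dottedParts/4 is hypothetical, and the two rule/2 facts are only there to make the example self-contained.

```prolog
:- dynamic rule/2.

rule(np,[d,n]).
rule(np,[n]).

ruleElements(LHS,RHS,Ref) :- clause(rule(LHS,RHS),true,Ref).

% dottedParts(+CF, -LHS, -Before, -After): hypothetical helper.
% Before holds the I symbols left of the dot, After the remainder.
dottedParts(cf(Ref,I), LHS, Before, After) :-
    ruleElements(LHS, RHS, Ref),
    length(Before, I),
    append(Before, After, RHS).
```

For instance, ?- clause(rule(np,[d,n]),true,R), dottedParts(cf(R,1),LHS,B,A). decodes the configuration np --> d . n.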

  27. Build FSA
      • initially
        • R1 = rule(ss,[s,$]).
        • ss --> . s $
        • cf(R1,0)
      • do a closure on the dotted rule, adding
        • s --> . np vp
        • np --> . d n
        • …
      • State 0:
          SS --> . S $
          S --> . NP VP
          NP --> . D N
          NP --> . N
          NP --> . NP PP

  28. Build FSM: Closure Operation
      • define
          mkStartCF(cf(Ref,0)) :- start(Start), ruleLHS(Start,Ref).
      • call
          mkStartCF(StartCF),
          closure([StartCF],S0),
      • define closure/2 recursively
          closure(CFSet,CFSet1) :-
              dotNonT(CFSet,NonT),
              predict(NonT,CFSet,CFSet2),
              closure(CFSet2,CFSet1).
          closure(CFSet,CFSet).

  29. Build FSM: Closure Operation
      • define dotNonT/2 to pick out possible instances of Y in X --> … . Y …
          dotNonT([cf(Ref,Pos)|_],NonT) :-
              dotNonT1(Ref,Pos,NonT).
          dotNonT([_|L],NonT) :- dotNonT(L,NonT).
          dotNonT1(Ref,Pos,NonT) :-
              ruleRHS(RHS,Ref), nth(Pos,RHS,NonT), nonT(NonT).
      • notes
        • dotNonT/2 works just like list member/2
        • nth(N,L,X) picks out the (N+1)th element (X) in list L

  30. Build FSM: Closure Operation
      • define predict/3 to add new dotted rules for NonT
          predict(NonT,CFSet,NewCFSet) :-
              findall(cf(Ref,0),ruleLHS(NonT,Ref),NewCFs),
              merge(NewCFs,CFSet,NewCFSet,[],new).
      • define merge/5 to add new dotted rules only if they’re not already present in CFSet
          merge([],L,L,Flag,Flag).
          merge([cf(Ref,Pos)|L],CFSet,CFSet1,Flag,Flag1) :-    % already present
              member(cf(Ref,Pos),CFSet),
              !,                      % cut: otherwise backtracking re-adds it via the next clause
              merge(L,CFSet,CFSet1,Flag,Flag1).
          merge([CF|L],CFSet,CFSet1,_,Flag) :-                 % CF is new
              merge(L,[CF|CFSet],CFSet1,new,Flag).
      • note
        • the variable Flag ([]/new) is used to make sure something has been added to CFSet

  31. Build FSM: Closure Operation
      • call
          mkStartCF(StartCF),
          closure1([StartCF],S0),
          resetStateCounter,
          length(S0,L),
          cfsmEntry(S0,L),
      • define storage predicate cfsmEntry/2
          cfsmEntry(CFSet,L) :-
              stateCounter(State),
              incStateCounter,
              asserta(cfsm(L,CFSet,State)).
      • cfsm(L,CFSet,N) “state configuration”
        • CFSet = list of dotted rules for state N
        • L = |CFSet| (used for quicker lookup)
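The closure steps can be assembled into one small runnable file that computes state 0 for the toy grammar. This is a sketch under stated assumptions, not the course's lr0.pl: it assumes SWI-Prolog, uses nth0/3 in place of nth/3, replaces the flag-passing merge with subtract/3, adds a cut in closure/2 so that only the full closure is returned, and introduces a hypothetical stateZero/1 for inspection.

```prolog
:- dynamic rule/2.

% Toy grammar plus the extra start rule ss --> s $.
rule(ss,[s,'$']).
rule(s,[np,vp]).
rule(np,[d,n]).
rule(np,[n]).
rule(np,[np,pp]).
rule(vp,[v,np]).
rule(vp,[v]).
rule(vp,[vp,pp]).
rule(pp,[p,np]).

start(ss).
nonT(ss). nonT(s). nonT(np). nonT(vp). nonT(pp).

ruleLHS(NonT,Ref) :- clause(rule(NonT,_),true,Ref).
ruleRHS(RHS,Ref)  :- clause(rule(_,RHS),true,Ref).

mkStartCF(cf(Ref,0)) :- start(Start), ruleLHS(Start,Ref).

% closure/2: keep predicting until no nonterminal after a dot adds anything.
closure(CFSet,CFSet1) :-
    dotNonT(CFSet,NonT),
    predict(NonT,CFSet,CFSet2),
    !,                          % assumption: commit, so no partial closures on backtracking
    closure(CFSet2,CFSet1).
closure(CFSet,CFSet).

% dotNonT/2: a nonterminal that appears immediately after some dot.
dotNonT(CFSet,NonT) :-
    member(cf(Ref,Pos),CFSet),
    ruleRHS(RHS,Ref),
    nth0(Pos,RHS,NonT),         % nth0/3 stands in for nth/3
    nonT(NonT).

% predict/3: succeeds only if at least one dotted rule is new.
predict(NonT,CFSet,NewCFSet) :-
    findall(cf(Ref,0),ruleLHS(NonT,Ref),NewCFs),
    subtract(NewCFs,CFSet,Fresh),
    Fresh \== [],
    append(Fresh,CFSet,NewCFSet).

% stateZero(-Rules): state 0 as LHS-RHS pairs (every dot is at position 0).
stateZero(Rules) :-
    mkStartCF(StartCF),
    closure([StartCF],S0),
    findall(LHS-RHS,
            ( member(cf(Ref,0),S0), clause(rule(LHS,RHS),true,Ref) ),
            Rules).
```

Running ?- stateZero(Rules). should list the five rules of state 0: the ss rule, the s rule and the three np rules.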

  32. Build FSM: Build new state
      • define buildState/2
          buildState(CFSet,S1) :-
              transition(CFSet,Symbol,CFSet1),
              length(CFSet1,L),
              addCFSet(CFSet1,L,S2),
              assert(goto(S1,Symbol,S2)),
              fail.
          buildState(_,_).
      • notes
        • transition/3 produces a new CFSet by advancing the dot over Symbol
        • addCFSet/3 will add a new state represented by CFSet1 (if it doesn’t already exist)
        • State transitions represented by goto(S1,Symbol,S2)

  33. Build FSM: Build new state
      • define transition/3
          transition(CFSet,Symbol,CFSet1) :-
              pickSymbol(CFSet,Symbol),
              advanceDot(CFSet,Symbol,CFSet2),
              closure(CFSet2,CFSet1).
      • Note: pickSymbol/2 picks a symbol next to a dot in a dotted rule in CFSet
      • define advanceDot/3
          advanceDot([cf(Ref,Pos)|L],Symbol,[cf(Ref,Pos1)|CFSet]) :-
              ruleRHS(RHS,Ref), nth(Pos,RHS,Symbol),
              !,
              Pos1 is Pos+1,
              advanceDot(L,Symbol,CFSet).
          advanceDot([_|L],Symbol,CFSet) :- !, advanceDot(L,Symbol,CFSet).
          advanceDot([],_,[]).
      • example: State 0 to State 4
        • State 0:
            S --> . NP VP
            NP --> . D N
            NP --> . N
            NP --> . NP PP
        • State 4:
            S --> NP . VP
            NP --> NP . PP
            VP --> . V NP
            VP --> . V
            VP --> . VP PP
            PP --> . P NP
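pickSymbol/2 is described but not defined on this slide. One possible definition is sketched below, assuming SWI-Prolog; it is a guess at the behaviour, not the course code. advanceDot/3 is repeated with nth0/3 in place of nth/3 so the example runs on its own, and state0/1 is a hypothetical helper that builds state 0 from the five rules shown.

```prolog
:- dynamic rule/2.

% The five rules whose dotted versions make up state 0.
rule(ss,[s,'$']).
rule(s,[np,vp]).
rule(np,[d,n]).
rule(np,[n]).
rule(np,[np,pp]).

ruleRHS(RHS,Ref) :- clause(rule(_,RHS),true,Ref).

% pickSymbol(+CFSet,-Symbol): enumerate, without repeats, each symbol
% that appears immediately after a dot in CFSet (hypothetical definition).
pickSymbol(CFSet,Symbol) :-
    findall(S,
            ( member(cf(Ref,Pos),CFSet),
              ruleRHS(RHS,Ref),
              nth0(Pos,RHS,S) ),
            Symbols),
    sort(Symbols,Unique),        % sort/2 also removes duplicates
    member(Symbol,Unique).

% advanceDot/3 with nth0/3 standing in for nth/3.
advanceDot([cf(Ref,Pos)|L],Symbol,[cf(Ref,Pos1)|CFSet]) :-
    ruleRHS(RHS,Ref), nth0(Pos,RHS,Symbol),
    !,
    Pos1 is Pos+1,
    advanceDot(L,Symbol,CFSet).
advanceDot([_|L],Symbol,CFSet) :- !, advanceDot(L,Symbol,CFSet).
advanceDot([],_,[]).

% state0(-CFSet): every rule above with the dot at position 0.
state0(CFSet) :- findall(cf(Ref,0),ruleRHS(_,Ref),CFSet).
```

With this, ?- state0(S0), advanceDot(S0,np,Kernel). keeps only the two configurations s --> np . vp and np --> np . pp, which form the kernel of state 4 before closure.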

  34. Build FSM: Build new state
      • define addCFSet/3
          addCFSet(CFSet,L,S) :-         % CFSet already established
              findCFSet(CFSet,S,L),
              !.
          addCFSet(CFSet,L,S) :-         % CFSet is new state #N
              cfsmEntry(CFSet,L,S).      % add it
      • Note:
        • findCFSet/3 will succeed only if CFSet exists in the current cfsm/3 database
        • cfsmEntry/3 defined earlier will increment the state number (S) and perform:
            ?- asserta(cfsm(L,CFSet,S)).
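findCFSet/3 is only described here. A minimal sketch of one way it could work, assuming SWI-Prolog and assuming that a stored state never contains duplicate dotted rules, is the following; the definition is hypothetical.

```prolog
:- dynamic cfsm/3.

% findCFSet(+CFSet,-S,+L): succeed if a stored state of size L holds
% exactly the dotted rules in CFSet, in any order (hypothetical definition).
findCFSet(CFSet,S,L) :-
    cfsm(L,Stored,S),        % L filters out states of a different size
    subset(CFSet,Stored),    % same size and subset: same set, given no duplicates
    !.
```

Storing the size L alongside each state means most candidates are rejected by first-argument matching before any set comparison happens, which is the "quicker lookup" mentioned for cfsm/3.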

  35. Build Actions • two main actions • Shift • move a word from the input onto the stack • Example: • NP --> .D N • Reduce • build a new constituent • Example: • NP --> D N.

  36. Build Actions
      • Machine components
        • Input: [V hit ] …
        • Structure Stack (items): [N man] [D a ]
        • Control Stack (states): 3 2 0
      • A machine operation step (action) will have signature:
          CS x Input x SS → CS’ x Input’ x SS’
      • where
        • CS = control stack
        • SS = (constituent) structure stack

  37. Build Actions: shift action
      • example
        • shift(n)
      • code
          action(S, CS, Input, SS, CS2, Input2, SS2) :-
              Input = [Item|Input2],
              category(Item,n),
              goto(S,n,S2),
              CS2 = [S2|CS],
              SS2 = [Item|SS].
      • notes: (changes)
        • Input2 is Input minus Item
        • SS2 is SS plus Item
        • CS2 is CS plus S2 from goto(S,n,S2)

  38. Build Actions: shift action
      • calling pattern for action/7
          action(S, CS, Input, SS, CS2, Input2, SS2)
      • given values for:
        • current state (S)
        • control and structure stacks (CS,SS)
      • compute new values of:
        • state (S2)
        • control and structure stacks (CS2,SS2)

  39. Build Actions: reduce action
      • example
        • reduce NP --> D N.
      • code
          action(S, CS, Input, SS, CS2, Input2, SS2) :-
              Input = Input2,
              SS = [N,D|SS3],
              SS2 = [np(D,N)|SS3],
              CS = [_,_,S1|CS3],
              CS2 = [S2,S1|CS3],
              goto(S1,np,S2).
      • notes
        • input is unchanged
        • pop 2 items off the stacks
        • goto is not based on current state
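The shift and reduce steps can be traced by hand on the input "a man hit …". The sketch below assumes SWI-Prolog and simplifies the interface: the state is read off the top of the control stack rather than passed separately, and shift/7, reduce_np_d_n/6 and demo/2 are hypothetical names. The goto facts use the state numbers from the earlier state-diagram slides.

```prolog
% Fragment of the goto table, numbered as in the state diagrams.
goto(0, d, 1).
goto(0, n, 2).
goto(1, n, 3).
goto(0, np, 4).

% Input items look like d(a), n(man): the functor is the category.
category(Item, Cat) :- functor(Item, Cat, 1).

% shift(+Cat, +CS, +Input, +SS, -CS2, -Input2, -SS2)
% Move one word of category Cat from the input onto the structure stack
% and push the new state onto the control stack.
shift(Cat, CS, [Item|Input], SS, [S2|CS], Input, [Item|SS]) :-
    CS = [S|_],
    category(Item, Cat),
    goto(S, Cat, S2).

% Reduce NP --> D N: pop two entries from each stack, push np(D,N),
% and take the goto from the state uncovered on the control stack.
reduce_np_d_n([_,_,S1|CS], Input, [N,D|SS], [S2,S1|CS], Input, [np(D,N)|SS]) :-
    goto(S1, np, S2).

% demo(-CS, -SS): shift d, shift n, then reduce.
demo(CS, SS) :-
    shift(d, [0], [d(a),n(man),v(hit)], [], CS1, In1, SS1),
    shift(n, CS1, In1, SS1, CS2, In2, SS2),
    reduce_np_d_n(CS2, In2, SS2, CS, _, SS).
```

The control stack goes [0], [1,0], [3,1,0] and finally [4,0]: the goto after the reduce is taken from state 0, the state uncovered by popping, not from state 3.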

  40. Build Actions
      • define shift/reduce action generation procedure
          buildActions :-
              cfsm(_,CFSet,State),
              actions(CFSet,Instructions),
              genActions(State,Instructions),
              fail.
          buildActions.
      • define actions/2
          actions([],[]).
          actions([CF|CFs],L) :-
              reduceAction(CF,L1),
              shiftAction(CF,L2),
              append(L1,L2,L3),
              actions(CFs,L4),
              union(L3,L4,L).        % should be no duplicate actions

  41. Build Actions
      • define shift and reduce actions
          reduceAction(cf(Ref,Pos),[reduce(Ref)]) :-
              ruleRHS(RHS,Ref),
              length(RHS,Pos),       % finds config. A-->.
              !.
          reduceAction(_,[]).
          % assume that Symbol in Vt
          shiftAction(cf(Ref,Pos),[shift(Symbol)]) :-
              ruleRHS(RHS,Ref),      % finds config. A-->.a
              nth(Pos,RHS,Symbol),
              term(Symbol),
              !.
          shiftAction(_,[]).
      • builds sequences of instructions of the form
        • [shift(n), reduce(R3)] etc.

  42. Build Actions
      • define procedure genActions/2
      • which turns instructions such as:
        • shift(n)
      • into code like
          action(S, CS, Input, SS, CS2, Input2, SS2) :-
              Input = [Item|Input2],
              category(Item,n),
              goto(S,n,S2),
              CS2 = [S2|CS],
              SS2 = [Item|SS].

  43. Build Actions
      • genActions/2
        • processes a list of actions for a given state S
          genActions(_,[]).
          genActions(S,[Action|As]) :-
              nl,
              actionClause(S,Action,Clause),
              write(Clause), write('.'),
              genActions(S,As).
      • Prolog builtins
        • nl - writes a newline to standard output
        • write/1 - writes supplied argument to standard output

  44. Build Actions: shift
      • generate action/7 for shift
          % shifting a $
          actionClause(State,shift($),action(State,_,[$],SS,accept,[],SS)) :- !.
          % shifting anything other than a $
          actionClause(State,shift(Symbol),
              (action(State,CS,[I|Is],SS,[S|CS],Is,[I|SS]) :-
                  functor(I,Symbol,_),
                  goto(State,Symbol,S))).
      • note:
        • see words/2 later
        • assume input item is of form c(word), e.g. n(john)
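Because actionClause/3 builds an ordinary clause term, the generated code can also be added to the running database instead of being written out. The sketch below assumes SWI-Prolog; installAction/2 is a hypothetical helper and is an alternative to the write-based genActions/2, not part of the course code. Only the general shift case is reproduced, with a single illustrative goto fact.

```prolog
:- dynamic action/7.

goto(0, n, 3).     % one goto entry, for illustration only

% Shift case of actionClause/3: the third argument is a clause term.
actionClause(State, shift(Symbol),
    (action(State,CS,[I|Is],SS,[S|CS],Is,[I|SS]) :-
        functor(I,Symbol,_),
        goto(State,Symbol,S))).

% installAction(+State,+Instruction): hypothetical helper that asserts
% the generated clause rather than printing it.
installAction(State, Instruction) :-
    actionClause(State, Instruction, Clause),
    assertz(Clause).
```

After ?- installAction(0, shift(n))., the query ?- action(0,[0],[n(milk)],[],CS,In,SS). performs the shift directly.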

  45. Build Actions: reduce
      • generate action/7 for reduce
          actionClause(State,reduce(Ref),
              (action(State,CS,I,SS,[S2,Last|CS1],I,[Item|SS1]) :-
                  goto(Last,NT,S2))) :-
              ruleElements(NT,RHS,Ref),
              popStk(RHS,CS,Last,CS1),
              popAndLink(RHS,SS,SS1,L),
              Item =.. [NT|L].
      • note
        • popStk/4 and popAndLink/4 both generate code to pop the control and structure stacks
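popStk/4 and popAndLink/4 are not shown. One way they might be written, assuming SWI-Prolog, is sketched below; both definitions are hypothetical, chosen to be consistent with the stack patterns used in the reduce action for NP --> D N.

```prolog
% popStk(+RHS, ?CS, -Last, -CS1): CS is a control stack with one state per
% RHS symbol on top, then Last (the uncovered state), then the rest CS1.
popStk(RHS, CS, Last, CS1) :-
    length(RHS, N),
    length(Popped, N),
    append(Popped, [Last|CS1], CS).

% popAndLink(+RHS, ?SS, -SS1, -L): SS has one item per RHS symbol on top,
% most recent first; L lists those items in left-to-right rule order.
popAndLink(RHS, SS, SS1, L) :-
    length(RHS, N),
    length(Popped, N),
    append(Popped, SS1, SS),
    reverse(Popped, L).
```

Called with unbound stacks, as in actionClause/3, these build patterns such as [_,_,Last|CS1] and [N,D|SS1] that are then compiled into the head of the generated clause; called with concrete stacks, they perform the pop.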

  46. LR Machine: goto table
      • example of LR Machine constructed
          % State 8: pp->.p np vp->vp .pp s->np vp.
          goto(4,vp,8).
          % State 9: vp->vp pp.
          goto(8,pp,9).
          goto(8,p,6).
          goto(7,d,2).
          goto(7,n,3).
          % State 10: pp->.p np np->np .pp vp->v np.
          goto(7,np,10).
          goto(10,pp,5).
          goto(10,p,6).
          goto(6,d,2).
          goto(6,n,3).
          % State 11: pp->.p np np->np .pp pp->p np.
          goto(6,np,11).
          goto(11,pp,5).
          goto(11,p,6).
          % State 12: np->d n.
          goto(2,n,12).
          % State 13: ss->s $.
          goto(1,$,13).

  47. LR Machine: action table
      • example of action table constructed
      • action(State,CS,Input,SS,CS’,Input’,SS’)
          % 7
          action(7,_14,[_20|_18],_16,[_22|_14],_18,[_20|_16]):-functor(_20,n,_32),goto(7,n,_22).
          action(7,_58,[_64|_62],_60,[_66|_58],_62,[_64|_60]):-functor(_64,d,_76),goto(7,d,_66).
          action(7,[_38,_10|_11],_03,[_44|_13],[_08,_10|_11],_03,[vp(_44)|_13]):-goto(_10,vp,_08).
          % 6
          action(6,_78,[_84|_82],_80,[_86|_78],_82,[_84|_80]):-functor(_84,n,_96),goto(6,n,_86).
          action(6,_22,[_28|_26],_24,[_30|_22],_26,[_28|_24]):-functor(_28,d,_40),goto(6,d,_30).
          % 5
          action(5,[_68,_70,_38|_39],_31,[_78,_82|_41],[_36,_38|_39],_31,[np(_82,_78)|_41]):-goto(_38,np,_36).

  48. Parser
      • define parse/2 as follows
          parse(Words,Parse) :-
              words(Words,L),
              machine([0],L,[],Parse).
          machine(CS,Input,SS,Parse) :-
              (   CS = accept
              ->  SS = [Parse]
              ;   CS = [State|_],
                  action(State,CS,Input,SS,CS2,Input2,SS2),
                  machine(CS2,Input2,SS2,Parse)
              ).
          words([],[$]).
          words([W|Ws],[I|Is]) :- lexicon(W,C), I =.. [C,W], words(Ws,Is).
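To see the driver loop run end to end without the table generator, the sketch below pairs parse/2 and machine/4 with a small hand-built LR(0) table. It assumes SWI-Prolog. The three-rule grammar (s --> np vp, np --> n, vp --> v), the state numbering and the goto/action facts are all constructed by hand for illustration and are not output of lr0.pl; the end marker is written as the quoted atom '$'.

```prolog
lexicon(john, n).
lexicon(runs, v).

% Hand-built goto table for: ss --> s $, s --> np vp, np --> n, vp --> v.
goto(0, s, 1).   goto(0, np, 2).  goto(0, n, 3).
goto(2, vp, 4).  goto(2, v, 5).

% Hand-built action table, in the same shape as the generated clauses.
action(0,CS,[I|Is],SS,[S|CS],Is,[I|SS]) :- functor(I,n,_), goto(0,n,S).     % shift n
action(1,_,['$'],SS,accept,[],SS).                                          % accept
action(2,CS,[I|Is],SS,[S|CS],Is,[I|SS]) :- functor(I,v,_), goto(2,v,S).     % shift v
action(3,[_,Last|CS1],I,[X|SS1],[S2,Last|CS1],I,[np(X)|SS1]) :-             % reduce np --> n
    goto(Last,np,S2).
action(4,[_,_,Last|CS1],I,[Y,X|SS1],[S2,Last|CS1],I,[s(X,Y)|SS1]) :-        % reduce s --> np vp
    goto(Last,s,S2).
action(5,[_,Last|CS1],I,[X|SS1],[S2,Last|CS1],I,[vp(X)|SS1]) :-             % reduce vp --> v
    goto(Last,vp,S2).

parse(Words,Parse) :-
    words(Words,L),
    machine([0],L,[],Parse).

% machine/4: apply actions until the control stack is replaced by accept.
machine(CS,Input,SS,Parse) :-
    (   CS = accept
    ->  SS = [Parse]
    ;   CS = [State|_],
        action(State,CS,Input,SS,CS2,Input2,SS2),
        machine(CS2,Input2,SS2,Parse)
    ).

% words/2: tag each word with its category and append the end marker.
words([],['$']).
words([W|Ws],[I|Is]) :- lexicon(W,C), I =.. [C,W], words(Ws,Is).
```

The query ?- parse([john,runs],T). walks through states 0, 3, 2, 5, 4 and 1 before accepting, and a word order the table does not license, such as [runs,john], simply fails.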

  49. Administrivia • Prolog code available on the course webpage • files • grammar0.pl - example grammar • lr0.pl - LR(0) parser/generator • machine0.pl - generated tables

  50. LR Parsing in Prolog
      • How to use
        • steps
            ?- [grammar0].             (consult toy grammar)
            ?- [lr0].                  (consult LR code)
            ?- build.                  (constructs goto table)
            ?- buildActions.           (constructs shift/reduce actions)
      • How to use (saving output to a file)
        • steps
            ?- [grammar0].             (consult toy grammar)
            ?- [lr0].                  (consult LR code)
            ?- tell('filename.pl').    (redirect screen output to filename.pl)
            ?- build.                  (constructs goto table)
            ?- buildActions.           (constructs shift/reduce actions)
            ?- told.                   (close filename.pl)
