CSA3050: Natural Language Algorithms
Finite State Devices

Sources
• Blackburn & Striegnitz Ch. 2
CSA3050 NLP Algorithms

Parsers vs. Recognisers
• Recognizers tell us whether a given input is accepted by some finite state automaton.
• Often we would like to have an explanation of why it was accepted.
• Parsers give us that kind of explanation.
• What form does it take?

Finite State Parser
• The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4].
• The technique in Prolog for turning a recognizer into a parser is to add one or more extra arguments to keep track of the structure that was found.

Base Case
Recogniser
  recognize1(Node,[]) :- final(Node).
Parser
  parse1(Node,[],[Node]) :- final(Node).

Recursive Case
Recogniser
  recognize1(Node1, String) :-
      arc(Node1,Node2,Label),
      traverse1(Label, String, NewString),
      recognize1(Node2, NewString).
Parser
  parse1(Node1, String, [Node1,Label|Path]) :-
      arc(Node1,Node2,Label),
      traverse1(Label, String, NewString),
      parse1(Node2, NewString, Path).

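The base and recursive clauses can be tried together in a small runnable sketch. The slides do not list the arcs of the first laughing automaton, so the facts below are an assumption reconstructed from the expected output [1,h,2,a,3,h,2,a,3,!,4]. The one-clause definition of traverse1/3 is likewise an assumption: it simply consumes one input symbol equal to the arc label.

```prolog
% Assumed arcs for the first laughing automaton (not given on the slides).
initial(1).
final(4).
arc(1,2,h).
arc(2,3,a).
arc(3,4,!).
arc(3,2,h).

% Assumed helper: an arc is traversed by consuming one matching symbol.
traverse1(Label, [Label|Symbols], Symbols).

% Recogniser: succeeds if the string leads from Node to a final node.
recognize1(Node, []) :- final(Node).
recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).

% Parser: same control flow, plus a third argument recording the path.
parse1(Node, [], [Node]) :- final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

% ?- recognize1(1, [h,a,h,a,!]).
% true.
% ?- parse1(1, [h,a,h,a,!], Path).
% Path = [1,h,2,a,3,h,2,a,3,!,4].
```

The only difference between the two predicates is the extra argument: each recursive step pushes the current node and arc label onto the front of the path that the recursive call returns.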
Complex Labels
• So far we have only considered transitions with single-character labels.
• More complex labels are possible, e.g. symbols comprising several characters.
• We can construct an FSA recognizing English noun phrases that can be built from the words: the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.

FSA for Noun Phrases

FSA for NPs in Prolog
  initial(1).
  final(3).
  arc(1,2,a).
  arc(1,2,the).
  arc(2,2,brave).
  arc(2,2,fast).
  arc(2,3,witch).
  arc(2,3,wizard).
  arc(2,3,broomstick).
  arc(2,3,rat).
  arc(1,3,harry).
  arc(1,3,ron).
  arc(1,3,hermione).
  arc(3,1,with).

Parsing a Noun Phrase
  testparse1(Symbols,Parse) :-
      initial(Node),
      parse1(Node,Symbols,Parse).

  ?- testparse1([the,fast,wizard],Z).
  Z=[1, the, 2, fast, 2, wizard, 3]

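Putting the noun phrase FSA and the parser into one file gives a sketch that can be loaded and queried directly. The arc facts, parse1/3 and testparse1/2 are taken from the slides; traverse1/3 is an assumed definition that consumes one word equal to the arc label.

```prolog
% Noun phrase FSA from the slides: whole words are the arc labels.
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

% Assumed helper: consume one word that equals the arc label.
traverse1(Label, [Label|Symbols], Symbols).

parse1(Node, [], [Node]) :- final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).

% ?- testparse1([the,fast,wizard], Z).
% Z = [1,the,2,fast,2,wizard,3].
% ?- testparse1([harry,with,the,broomstick], Z).
% Z = [1,harry,3,with,1,the,2,broomstick,3].
```

The second query shows the arc labelled with leading from the final node back to node 1, so a noun phrase can be followed by with and another noun phrase.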
Rewriting Categories
• It is also possible to obtain a more abstract parse, e.g.
  ?- testparse2([the,fast,wizard],Z).
  Z=[1, det, 2, adj, 2, cn, 3]
• What changes are required to obtain this behaviour?

1. Changes to the FSA
%FSA
  initial(1).
  final(3).
  arc(1,2,det).
  arc(2,2,adj).
  arc(2,3,cn).
  arc(1,3,pn).
  arc(3,1,prep).
%Lexicon
  lex(a,det).
  lex(the,det).
  lex(fast,adj).
  lex(brave,adj).
  lex(witch,cn).
  lex(wizard,cn).
  lex(broomstick,cn).
  lex(rat,cn).
  lex(harry,pn).
  lex(hermione,pn).
  lex(ron,pn).
  lex(with,prep).

2. Changes to the Parser
Parse1
  parse1(Node1, String, [Node1,Label|Path]) :-
      arc(Node1,Node2,Label),
      traverse1(Label, String, NewString),
      parse1(Node2, NewString, Path).
Parse2
  parse2(Node1, String, [Node1,Label|Path]) :-
      arc(Node1,Node2,Label),
      traverse2(Label, String, NewString),
      parse2(Node2, NewString, Path).

  traverse2(Label,[Symbol|Symbols],Symbols) :-
      lex(Symbol,Label).

Handling Jumps
  traverse3('#',String,String).
  traverse3(Cat,[Word|Words],Words) :-
      lex(Word,Cat).

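The category-based FSA, the lexicon and the jump-aware traverse3/3 can be combined into one runnable sketch. The names parse3/3 and testparse3/2 are hypothetical, formed by analogy with parse1 and parse2. The arc arc(1,2,'#') is also hypothetical: the noun phrase FSA on the slides has no jump arc, so one is added here only to show a jump being taken (it lets a noun phrase start without a determiner).

```prolog
% Category-labelled FSA from the slides.
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).
arc(1,2,'#').   % HYPOTHETICAL jump arc, not on the slides

% Lexicon from the slides: word -> category.
lex(a,det).          lex(the,det).
lex(fast,adj).       lex(brave,adj).
lex(witch,cn).       lex(wizard,cn).
lex(broomstick,cn).  lex(rat,cn).
lex(harry,pn).       lex(hermione,pn).
lex(ron,pn).         lex(with,prep).

% A jump consumes no input; any other label must match the
% category of the next word.
traverse3('#', String, String).
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).

% Hypothetical parser name, same shape as parse1/parse2.
parse3(Node, [], [Node]) :- final(Node).
parse3(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse3(Label, String, NewString),
    parse3(Node2, NewString, Path).

testparse3(Symbols, Parse) :-
    initial(Node),
    parse3(Node, Symbols, Parse).

% ?- testparse3([the,fast,wizard], Z).
% Z = [1,det,2,adj,2,cn,3].
% ?- testparse3([brave,witch], Z).
% Z = [1,#,2,adj,2,cn,3].
```

Because a jump consumes no input, a cycle made up only of jump arcs would let the parser recurse forever; the single jump arc used here does not form such a cycle.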
Finite State Transducers
• A finite state transducer is essentially a finite state automaton that works on two (or more) tapes.
• The most common way to think about transducers is as a kind of "translating machine" which works by reading from one tape and writing onto the other.

A Translator from a to b
• initial state: arrowhead
• final state: double circle
• a:b: read a from the first tape and write b to the second tape

Prolog Representation
  :- op(250,xfx,:).
  initial(1).
  final(1).
  arc(1,1,a:b).

Modes of Operation
• generation mode: It writes a string of as on one tape and a string of bs on the other tape. Both strings have the same length.
• recognition mode: It accepts when the word on the first tape consists of exactly as many as as the word on the second tape consists of bs.
• translation mode (left to right): It reads as from the first tape and writes a b for every a that it reads onto the second tape.
• translation mode (right to left): It reads bs from the second tape and writes an a for every b that it reads onto the first tape.

Transducers and Jumps
Transducers can make jumps, going from one state to another without doing anything on either one or on both of the tapes. So, transitions of the form a:# or #:a or #:# are possible.

Simple Transducer in Prolog
  transduce1(Node,[],[]) :- final(Node).
  transduce1(Node1,Tape1,Tape2) :-
      arc(Node1,Node2,Label),
      traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
      transduce1(Node2,NewTape1,NewTape2).

Traverse for FST
  traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

  testtrans1(Tape1,Tape2) :-
      initial(Node),
      transduce1(Node,Tape1,Tape2).

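Assembled into one file, the a-to-b translator can be run in each of the four modes of operation simply by choosing which tape arguments are bound. This is a sketch built from the clauses on the slides. It leaves out the op/3 directive on the assumption that the Prolog system already treats : as an infix operator (SWI-Prolog does); restore the directive if your system does not.

```prolog
% The a-to-b translator: one state, one arc labelled a:b.
initial(1).
final(1).
arc(1,1,a:b).

% Both tapes must be exhausted in a final state.
transduce1(Node, [], []) :- final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

% Take one symbol from each tape, matching the two halves of the label.
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).

% Translation, left to right:
% ?- testtrans1([a,a,a], T).
% T = [b,b,b].
% Translation, right to left:
% ?- testtrans1(T, [b,b]).
% T = [a,a].
% Recognition:
% ?- testtrans1([a,a], [b,b]).
% true.
% Generation (first answer; backtracking yields longer pairs):
% ?- testtrans1(X, Y).
% X = [], Y = [].
```

No mode-specific code is needed: unification fills in whichever tape is left unbound, which is why the same two clauses serve all four modes.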
Handling Jumps: 4 cases
• Jump on both tapes.
• Jump on the first but not on the second tape.
• Jump on the second but not on the first tape.
• Jump on neither tape (this is what traverse1 does).

4 Corresponding Clauses
  traverse2('#':'#',Tape1,Tape1,Tape2,Tape2).
  traverse2('#':L2,Tape1,Tape1,[L2|RestTape2],RestTape2).
  traverse2(L1:'#',[L1|RestTape1],RestTape1,Tape2,Tape2).
  traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

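The four clauses can be exercised with a small transducer that contains a jump. Everything about this example machine is hypothetical: the arc c:'#', which reads a c from the first tape and writes nothing, and the names transduce2/3 and testtrans2/2, formed by analogy with transduce1 and testtrans1. As before, the sketch assumes the Prolog system already treats : as an infix operator.

```prolog
% HYPOTHETICAL transducer: translate each a to b, delete each c.
initial(1).
final(1).
arc(1,1,a:b).
arc(1,1,c:'#').

% The four traverse2 clauses from the slides.
traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2).
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

% Hypothetical names, same shape as transduce1/testtrans1.
transduce2(Node, [], []) :- final(Node).
transduce2(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse2(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce2(Node2, NewTape1, NewTape2).

testtrans2(Tape1, Tape2) :-
    initial(Node),
    transduce2(Node, Tape1, Tape2).

% ?- once(testtrans2([a,c,a], T)).
% T = [b,b].
```

The query uses once/1 on purpose. The fourth clause also unifies with a label such as c:'#', treating '#' as an ordinary symbol, so on backtracking the same query produces a second answer, [b,#,b], in which the jump symbol is written to the second tape. Guarding the fourth clause with L1 \== '#', L2 \== '#' is one way to rule that out.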
Morphological Analysis with FSTs
• Morphology is concerned with the internal structure of words.
• How can a word be decomposed into morphemes?
• How do the morphemes combine?
• What are legitimate combinations?
• The basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa.
• Example: wizard+s to wizard+PL or kiss+ed to kiss+PAST.

Plural Nouns in English
• Regular forms
  • add an s as in wizard+s
  • add -es as in witch+es
  • Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative.
• Irregular forms
  • mouse/mice
  • automaton/automata
  • Handled on a case-by-case basis
• Requires a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.

FST for English Plurals

FST in Prolog
  lex(wizard:wizard,'STEM-REG1').
  lex(witch:witch,'STEM-REG2').
  lex(automaton:automaton,'IRREG-SG').
  lex(automata:'automaton-PL','IRREG-PL').
  lex(mouse:mouse,'IRREG-SG').
  lex(mice:'mouse-PL','IRREG-PL').
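One way these lexicon entries might be plugged into a transducer is sketched below. Only the lex/2 facts come from the slides. The arcs, the final states, and the names traverse3/5, transduce3/3 and testtrans3/2 are all hypothetical, since the slides do not list the plural FST's arcs. As a further simplification the tapes hold whole morphemes such as wizard, s and es rather than single characters, and the + boundary is left out.

```prolog
% Lexicon entries from the slides: SurfaceForm:LexicalForm, Category.
lex(wizard:wizard, 'STEM-REG1').
lex(witch:witch, 'STEM-REG2').
lex(automaton:automaton, 'IRREG-SG').
lex(automata:'automaton-PL', 'IRREG-PL').
lex(mouse:mouse, 'IRREG-SG').
lex(mice:'mouse-PL', 'IRREG-PL').

% HYPOTHETICAL arcs: a stem category, then an optional plural suffix.
initial(1).
final(2).
final(3).
final(4).
arc(1,2,'STEM-REG1').
arc(1,3,'STEM-REG2').
arc(1,4,'IRREG-SG').
arc(1,4,'IRREG-PL').
arc(2,4,s:'PL').
arc(3,4,es:'PL').

% An X:Y label matches one symbol on each tape; a category label
% matches any pair of forms listed under that category in the lexicon.
traverse3(L1:L2, [L1|Rest1], Rest1, [L2|Rest2], Rest2).
traverse3(Cat, [W1|Rest1], Rest1, [W2|Rest2], Rest2) :-
    lex(W1:W2, Cat).

transduce3(Node, [], []) :- final(Node).
transduce3(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse3(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce3(Node2, NewTape1, NewTape2).

testtrans3(Tape1, Tape2) :-
    initial(Node),
    transduce3(Node, Tape1, Tape2).

% ?- testtrans3([wizard,s], T).
% T = [wizard,'PL'].
% ?- testtrans3([mice], T).
% T = ['mouse-PL'].
% ?- testtrans3(S, [witch,'PL']).
% S = [witch,es].
```

Keeping the irregular forms in the lexicon means a new irregular noun needs only two more lex/2 facts; the arcs stay unchanged.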