This document explores the various levels of linguistic analysis relevant to Natural Language Processing (NLP) and their applications in understanding and processing human language. It covers phonology, morphology, syntax, semantics, and pragmatics, emphasizing their roles in recognizing speech sounds, analyzing word forms, and constructing meaningful sentences. Important processes like tokenization, template systems, and generative grammars are discussed to illustrate how language structures can be effectively processed by computational systems.
Natural Language Processing CS480/580
Levels of Linguistic Analysis
• Phonology: recognize speech sounds
• Morphology: analysis of word forms (e.g., adding "s" to make a plural)
• Syntax: sentence structure
• Semantics: meaning
• Pragmatics: relation of language to context
Tokenization
• The input string is broken into words, punctuation is removed, and the result is represented as a sequence of lowercase words (tokens).
• E.g., "How are you today?" is converted to [how, are, you, today].
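Tokenize.pl

% lower_case(+Code, -Lower): map an upper-case ASCII code to lower case.
lower_case(A, B) :- A>=65, A=<90, !, B is A+32.
lower_case(A, A).

% tokenize(+Codes, -Tokens): split a list of character codes into atoms.
tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).

% punctuation_mark(+Code): any ASCII code that is not a letter or a digit.
punctuation_mark(A) :- A=<47.
punctuation_mark(A) :- A>=58, A=<64.
punctuation_mark(A) :- A>=91, A=<96.
punctuation_mark(A) :- A>=123.

% grab_word(+Codes, -Word, -Rest): take one word, stopping at a space (32);
% punctuation is dropped and letters are lower-cased.
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).

% In SWI-Prolog 7 and later, "..." is a string object by default;
% switch it to a code list before running the query.
?- set_prolog_flag(double_quotes, codes).

?- tokenize("This is CS480/580 course", X).
X = [this, is, cs480580, course].

?- name(john, X).
X = [106, 111, 104, 110].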
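For comparison, the same idea can be sketched with string built-ins instead of character codes. This is only a sketch, assuming SWI-Prolog 7 or later; tokens/2 and word_token/2 are hypothetical names that do not appear in Tokenize.pl. Unlike grab_word/3, it strips punctuation only at the ends of each word, so CS480/580 would become 'cs480/580' rather than cs480580.

```prolog
% Sketch only: assumes SWI-Prolog 7+ (split_string/4, string_lower/2).
% tokens/2 and word_token/2 are hypothetical helpers, not part of Tokenize.pl.

% tokens(+Text, -Tokens): Text is a string, Tokens a list of lowercase atoms.
tokens(Text, Tokens) :-
    % Split at spaces; strip spaces and common punctuation from both ends
    % of every piece.
    split_string(Text, " ", " .,;:!?", Parts),
    % Drop empty pieces left behind by repeated separators.
    exclude(==(""), Parts, Words),
    maplist(word_token, Words, Tokens).

% word_token(+Word, -Token): lower-case one string and turn it into an atom.
word_token(Word, Token) :-
    string_lower(Word, Lower),
    atom_string(Token, Lower).
```

A query such as `?- tokens("How are you today?", T).` should give `T = [how, are, you, today]`.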
Template System
• Templates: stored sentence patterns
• Each template is accompanied by a translation schema
• E.g., [X, is, a, Y] is translated to Y(X).
• process([X, is, a, Y]) :- Fact =.. [Y, X], assert(Fact).
• process([is, X, a, Y]) :- Query =.. [Y, X], call(Query).
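The two templates above can be tried in isolation. The sketch below repeats them and adds one assumption of its own: the query is wrapped in catch/3 so that asking about a predicate that has never been asserted simply fails instead of raising an existence error (this is what SWI-Prolog does by default; other systems may differ).

```prolog
% Sketch of the two templates; the catch/3 wrapper is an addition for
% illustration and is not part of the slide.

% "X is a Y"  ->  assert the fact Y(X).
process([X, is, a, Y]) :-
    Fact =.. [Y, X],          % univ: build the term Y(X) from the list [Y, X]
    assertz(Fact).

% "is X a Y"  ->  try to prove Y(X).
process([is, X, a, Y]) :-
    Query =.. [Y, X],
    % An unknown predicate raises an existence error; treat that as "no".
    catch(call(Query), error(existence_error(_, _), _), fail).
```

After `?- process([fido, is, a, dog]).`, the query `?- process([is, fido, a, dog]).` succeeds, while `?- process([is, rex, a, dog]).` and `?- process([is, fido, a, cat]).` fail.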
Template.pl

% --- Tokenizer helpers (same as Tokenize.pl) ---
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).

punctuation_mark(A) :- A=<47.
punctuation_mark(A) :- A>=58, A=<64.
punctuation_mark(A) :- A>=91, A=<96.
punctuation_mark(A) :- A>=123.

lower_case(A, B) :- A>=65, A=<90, !, B is A+32.
lower_case(A, A).

% --- Input and output ---
write_str([A|B]) :- put(A), write_str(B).
write_str([]).

% Stop reading at end of file (-1), line feed (10) or carriage return (13).
read_str_aux(-1, []) :- !.
read_str_aux(10, []) :- !.
read_str_aux(13, []) :- !.
read_str_aux(A, [A|B]) :- read_str(B).

do_one_sentence :- write('>'), read_str(A), tokenize(A, B), process(B).

% note(+Clause): store a fact or rule and acknowledge it.
note(A) :- asserta(A), write('OK'), nl.

read_atom(A) :- read_str(B), name(A, B).

start :- write('TEMPLATE.PL at your service.'), nl,
         write('Terminate by pressing Break.'), nl,
         repeat, do_one_sentence, fail.

% check(+Goal): answer a yes/no question.
check(A) :- call(A), !, write('Yes.'), nl.
check(_) :- write('Not as far as I know.'), nl.

read_num(A) :- read_str(B), name(A, B).

% remove_s(+Word, -Stem): strip a final "s"; fails if Word does not end in "s".
remove_s(A, C) :- name(A, B), remove_s_list(B, D), name(C, D).

read_str(B) :- get0(A), read_str_aux(A, B).

remove_s_list([115], []).
remove_s_list([A|B], [A|C]) :- remove_s_list(B, C).

% --- Templates ---
process([B, is, a, A]) :- !, C=..[A, B], note(C).
process([A, is, an, B]) :- !, process([A, is, a, B]).
process([is, B, a, A]) :- !, C=..[A, B], check(C).
process([is, A, an, B]) :- !, process([is, A, a, B]).
process([A, are, B]) :- !, remove_s(A, D), remove_s(B, C), F=..[C, E], G=..[D, E], note((F:-G)).
process([does, B, A]) :- !, C=..[A, B], check(C).
process([A, B]) :- \+ remove_s(A, _), remove_s(B, C), !, D=..[C, A], note(D).
process([A, B]) :- remove_s(A, C), \+ remove_s(B, _), !, E=..[B, D], F=..[C, D], note((E:-F)).
process(_) :- write('I do not understand.'), nl.

tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).

Sample session:

?- start.
TEMPLATE.PL at your service.
Terminate by pressing Break.
>CS480 is a course.
OK
>is CS480 a course?
Yes.
>is cs471 a course?
Not as far as I know.
>cs471 is a course.
OK
>is cs471 a course?
Yes.
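The `[A, are, B]` template is the least obvious one: it strips the plural "s" from both words and stores a rule rather than a fact. The sketch below isolates that step. plural_rule/3 is a hypothetical name, and atom_concat/3 is used here in place of remove_s/2 purely to keep the example short.

```prolog
% Sketch only: plural_rule/3 is a hypothetical helper, not part of TEMPLATE.PL.
% plural_rule(+PluralA, +PluralB, -Rule):
%   "PluralA are PluralB"  ->  Rule = (B(X) :- A(X)), with the final "s" removed.
plural_rule(PluralA, PluralB, (Head :- Body)) :-
    atom_concat(A, s, PluralA),   % dogs    -> dog
    atom_concat(B, s, PluralB),   % animals -> animal
    Head =.. [B, X],              % animal(X)
    Body =.. [A, X].              % dog(X), sharing the same variable X
```

For example, `?- plural_rule(dogs, animals, R).` binds R to a rule of the form `animal(X) :- dog(X)`, which is the clause that "dogs are animals" would add to the database.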
Generative Grammars
• Templates are inadequate to describe human language (in the last example, the only sentences allowed were of the form "X is a Y").
• Sentences can be embedded in other sentences without limit:
  • John arrived
  • Max said John arrived
  • Bill claimed Max said John arrived
  • Mary thought Bill claimed Max said John arrived
• Chomsky's suggestion: treat syntax as a problem in set theory, i.e., express an infinite set of sentences by a finite description.
Context Free Grammars
• Phrase Structure Rules
  • S → NP VP
  • NP → Det N
  • N → N PP
  • N → N N
  • PP → P NP
  • VP → IV
  • VP → TV NP
  • VP → DV NP NP
• Lexical Entries
  • N → book, cow, course, …
  • P → in, on, with, …
  • Det → the, every, …
  • IV → ran, hid, …
  • TV → likes, hit, …
  • DV → gave, showed
[Image: Noam Chomsky]
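The grammar above can be tried out in Prolog's grammar-rule (DCG) notation. The two rules that rewrite N as N PP and as N N are left-recursive, and a direct translation would loop forever in Prolog's top-down, depth-first search. The sketch below therefore swaps in a standard left-recursion elimination: a noun group is one lexical noun followed by any number of PPs or further nouns. It should accept the same strings, but it does not build the same tree shapes. The names n_word and n_tail are hypothetical, and only the lexical entries listed on the slide are included.

```prolog
% Sketch only: the slide's CFG with left recursion removed from the N rules.
s   --> np, vp.
np  --> det, n.
pp  --> p, np.

% N -> N PP | N N | word, rewritten as: word followed by (PP | word)*.
n      --> n_word, n_tail.
n_tail --> [].
n_tail --> pp, n_tail.
n_tail --> n_word, n_tail.

vp  --> iv.
vp  --> tv, np.
vp  --> dv, np, np.

% Lexical entries from the slide.
n_word --> [book] ; [cow] ; [course].
p      --> [in] ; [on] ; [with].
det    --> [the] ; [every].
iv     --> [ran] ; [hid].
tv     --> [likes] ; [hit].
dv     --> [gave] ; [showed].
```

For instance, `?- phrase(s, [every, cow, likes, the, book, on, the, course]).` succeeds, while `?- phrase(s, [cow, the, ran]).` fails.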
Context-Free Derivations
• S ⇒ NP VP ⇒ Det N VP ⇒ the N VP ⇒ the kid VP ⇒ the kid IV ⇒ the kid ran
• Penn TreeBank bracketing notation (Lisp-like)
  • (S (NP (Det the) (N kid)) (VP (IV ran)))
• Theorem: A sequence has a derivation if and only if it has a parse tree
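The link between a derivation and its parse tree can be made concrete by giving each non-terminal an extra argument that records the tree built so far. The sketch below does this for just the rules and words used in the derivation of "the kid ran"; it is an illustration written in Prolog grammar-rule notation, not a full grammar.

```prolog
% Sketch only: each rule returns the subtree it recognised.
s(s(NP, VP))   --> np(NP), vp(VP).
np(np(Det, N)) --> det(Det), n(N).
vp(vp(IV))     --> iv(IV).

% Lexical entries needed for the example sentence.
det(det(the))  --> [the].
n(n(kid))      --> [kid].
iv(iv(ran))    --> [ran].
```

The query `?- phrase(s(T), [the, kid, ran]).` gives `T = s(np(det(the), n(kid)), vp(iv(ran)))`, which is the bracketed tree above written as a Prolog term.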
A simple Parser

% Each predicate takes the input word list and returns what is left
% after the phrase has been consumed.
sentence(A, C) :- noun_phrase(A, B), verb_phrase(B, C).

noun_phrase(A, C) :- determiner(A, B), noun(B, C).

verb_phrase(A, C) :- verb(A, B), noun_phrase(B, C).
verb_phrase(A, C) :- verb(A, B), sentence(B, C).

determiner([the|A], A).
determiner([a|A], A).

noun([dog|A], A).
noun([cat|A], A).
noun([boy|A], A).
noun([girl|A], A).

verb([chased|A], A).
verb([saw|A], A).
verb([said|A], A).
verb([believed|A], A).

?- sentence([the, cat, saw, the, dog], []).
true .

?- sentence([the, dog, saw, the, dog], []).
true .

?- sentence([a, dog, chased, the, cat], []).
true .

?- sentence([that, dog, chased, the, cat], []).
false.
Definite Clause Grammar (DCG)
• A Prolog notation that provides an easy way to write grammar rules.
• E.g., sentence --> noun_phrase, verb_phrase.
• This is equivalent to the rule:
  sentence(X,Z) :- noun_phrase(X,Y), verb_phrase(Y,Z).
• Also, noun --> [dog]. or noun --> [dog]; [cat]; [boy]; [girl].
• or verb --> [gives, up]. where "gives up" is a single verb.
• The sentence rule above defines sentence/2, so a query passes the word list and the remainder, e.g., sentence([the, dog, chased, the, cat], []).
• Try sentence([A,B,C,D,E],[]) or sentence([the, A, B, C, cat|E],[]).
• Non-terminal symbols can also take arguments, e.g., sentence(N) --> noun_phrase(N), verb_phrase(N).
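A small grammar makes these points easy to try. The sketch below is a cut-down illustration (two nouns, two verbs, one determiner), not the course's parser; it includes the two-word terminal [gives, up] and can be queried either through the standard phrase/2 interface or by calling sentence/2 with the two list arguments directly.

```prolog
% Sketch only: a tiny DCG with a multi-word terminal.
sentence    --> noun_phrase, verb_phrase.
noun_phrase --> [the], noun.
verb_phrase --> verb, noun_phrase.

noun --> [dog] ; [cat].

verb --> [chased].
verb --> [gives, up].      % "gives up" is consumed as a single verb
```

Both `?- phrase(sentence, [the, dog, gives, up, the, cat]).` and `?- sentence([the, dog, chased, the, cat], []).` succeed. Because the grammar is finite, `?- phrase(sentence, S).` can also be used to enumerate all eight sentences it generates.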
Parser2.pl based on DCG

sentence --> noun_phrase, verb_phrase.

noun_phrase --> determiner, noun.

verb_phrase --> verb, noun_phrase.
verb_phrase --> verb, sentence.

determiner --> [the].
determiner --> [a].

noun --> [dog]; [cat]; [boy]; [girl].

% The separate clauses for saw, said and believed are dropped: they repeat
% the alternatives below and would only produce duplicate solutions.
verb --> [chased]; [saw]; [said]; [believed].
Grammatical Features
• How to handle agreement in tense and number between the noun and the verb? The grammar below enforces number agreement by passing a feature N (singular or plural) through the rules.

sentence(N) --> noun_phrase(N), verb_phrase(N).

noun_phrase(N) --> determiner(N), noun(N).

verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).   % embedded sentence has its own number

determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].

noun(singular) --> [dog];[cat];[boy];[girl].
noun(plural) --> [dogs];[cats];[boys];[girls].

verb(singular) --> [chases];[sees];[says];[believes].
verb(plural) --> [chase];[see];[say];[believe].
?- sentence(plural, [the, dogs, A, B, C],[]).
A = chase, B = a, C = dog ;
A = chase, B = a, C = cat ;
A = chase, B = a, C = boy ;
A = chase, B = a, C = girl ;
A = chase, B = the, C = dog
Morphology
• How to generate plural nouns from singular?
• How to generate third person singular verbs from plural verbs?
• Mostly by adding: s
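The "add s" idea can be sketched in the generating direction with atom_concat/3 and a small table of exceptions. The predicate names plural/2, third_singular/2 and irregular_plural/2 are hypothetical, and the sketch deliberately ignores spelling changes, which the rule "mostly by adding s" does not cover.

```prolog
% Sketch only: generate inflected forms by appending "s".
% irregular_plural/2 lists exceptions that the regular rule must not touch.
irregular_plural(child, children).

% plural(+Singular, -Plural)
plural(Singular, Plural) :-
    irregular_plural(Singular, Plural), !.
plural(Singular, Plural) :-
    atom_concat(Singular, s, Plural).

% third_singular(+PluralVerb, -SingularVerb): chase -> chases
third_singular(Verb, Verb3) :-
    atom_concat(Verb, s, Verb3).
```

For example, `?- plural(dog, P).` gives `P = dogs`, `?- plural(child, P).` gives `P = children`, and `?- third_singular(believe, V).` gives `V = believes`.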
sentence(N) --> noun_phrase(N), verb_phrase(N).

noun_phrase(N) --> determiner(N), noun(N).

verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).

determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].

% Accept any word X whose morphology matches the required category.
noun(N) --> [X], { morph(noun(N),X) }.
verb(N) --> [X], { morph(verb(N),X) }.

morph(noun(singular),dog).          % Singular nouns
morph(noun(singular),cat).
morph(noun(singular),boy).
morph(noun(singular),girl).
morph(noun(singular),child).

morph(noun(plural),children).       % Irregular plural nouns

morph(noun(plural),X) :-            % Rule for regular plural nouns
    remove_s(X,Y),
    morph(noun(singular),Y).

morph(verb(plural),chase).          % Plural verbs
morph(verb(plural),see).
morph(verb(plural),say).
morph(verb(plural),believe).

morph(verb(singular),X) :-          % Rule for singular verbs
    remove_s(X,Y),
    morph(verb(plural),Y).

% remove_s(+X,-X1) [lifted from TEMPLATE.PL]
% removes final S from X giving X1,
% or fails if X does not end in S.
remove_s(X,X1) :-
    name(X,XList),
    remove_s_list(XList,X1List),
    name(X1,X1List).

% [115] is the code list for "s"; written out so that it does not depend
% on how the Prolog system reads double-quoted text.
remove_s_list([115],[]).
remove_s_list([Head|Tail],[Head|NewTail]) :-
    remove_s_list(Tail,NewTail).