Natural Language Processing. Spring 2007 V. “Juggy” Jagannathan. Course Book. Foundations of Statistical Natural Language Processing. By Christopher Manning & Hinrich Schutze. Chapter 3. Linguistic Essentials January 22, 2007. Parts of Speech and Morphology.
Natural Language Processing Spring 2007 V. “Juggy” Jagannathan
Course Book Foundations of Statistical Natural Language Processing By Christopher Manning & Hinrich Schutze
Chapter 3 Linguistic Essentials January 22, 2007
Parts of Speech and Morphology • Syntactic/grammatical categories – parts of speech (POS) • Nouns – refer to people, animals, concepts and things • Verbs – express the action in a sentence • Adjectives – describe properties of nouns • Substitution test for adjectives • Ex: The {sad, intelligent, green, fat, …} one is in the corner.
Word class/lexical categories • Open or lexical categories • Nouns, verbs and adjectives, which have a large membership that continually grows as new words are added to the language • Closed or functional categories • Prepositions and determiners • Ex. of, on, the, a • Words are listed in a “dictionary” referred to by linguists as the “lexicon”
Tags • Part-of-speech tagging – 8 categories – the labels are referred to as POS tags. • Corpus linguists use more fine-grained tagging • Various corpora have been tagged extensively; the pioneering one is the Brown corpus. • Adjectives in the Brown corpus are referred to by the tag “JJ”
Morphological process • Source: http://www.sil.org/LINGUISTICS/GlossaryOfLinguisticTerms/WhatIsAMorphologicalProcess.htm • “Definition A morphological process is a means of changing a stem to adjust its meaning to fit its syntactic and communicational context.” • Examples • Plural form (dog-s) derived from (dog)
Morphological processes • Major forms of morphological processes • Inflection • Systematic modification of a root (stem) form by means of prefixes and suffixes • Inflection does not significantly change the word class or meaning of the word, but does change features such as tense and plurality. • All of the inflectional forms of a word are grouped as manifestations of a single “lexeme” • Derivation • Can dramatically change the meaning of the derived word. • Ex: Adverb “widely” derived from adjective “wide” • Ex: suffix use – weak-en; soft-en; understand-able; accept-able; teach-er; lead-er • Compounding • Merging of two or more words into a new word (concept) • Ex. disk drive, tea kettle, college degree, down market, mad cow disease, overtake
Nouns and Pronouns • Nouns – refer to people, animals and things • dog, tree, person, hat, speech, idea, philosophy • Inflection is a process by which the stem of a word is modified to create a new word form • In English, the only form of noun inflection is the one indicating whether a noun is singular or plural • Ex. dogs, trees, hats, speeches, persons • Irregular inflection example: women • Other languages use inflection to convey “gender” (masculine, feminine, neuter) and “case” (nominative, genitive, dative, accusative).
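The contrast between regular and irregular plural inflection can be sketched in a few lines of Python. This is only a toy illustration: the function name `pluralize`, the exception table, and the two spelling rules are assumptions made for this example, and real English morphology has many more cases (for instance nouns ending in -y).

```python
# Illustrative sketch only: a tiny subset of English plural inflection.
# The exception table and rules are hypothetical, not a complete analysis.
IRREGULAR_PLURALS = {"woman": "women", "man": "men", "child": "children"}


def pluralize(noun):
    """Return the plural form of a singular noun using a few toy rules."""
    # Irregular forms have to be listed in the lexicon.
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Stems ending in a sibilant take -es (speech -> speeches).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Default regular rule: add -s (dog -> dogs).
    return noun + "s"


if __name__ == "__main__":
    for word in ["dog", "tree", "hat", "speech", "person", "woman"]:
        print(word, "->", pluralize(word))
```

The lookup comes before the rules because an irregular form has to override the regular suffix; otherwise "woman" would come out as "womans".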
Gender forms • Pronouns • Masculine (he), feminine (she), neuter (it) • Case relationship in English – the genitive case • Ex: the woman’s house; the students’ grievances • Possessive pronouns • Ex: my car • Second possessive form of pronoun: a friend of mine • Reflexive pronouns – ex. herself, myself • Ex: • Mary saw herself in the mirror. • Mary saw her in the mirror. • Reflexive pronouns are also referred to as “anaphors”; they must refer to something nearby in the text.
Brown tags ** Examples from: http://www.tameri.com/edit/doubles.html
Words that accompany nouns: determiners and adjectives • Determiners – describe the particular reference of a noun • Articles – indicate whether the noun refers to someone or something already known • “the” indicates that we are referring to someone or something we already know about • Ex. “the tree” refers to a known tree. • “a” or “an” introduces a new reference to something that has not appeared before or whose identity cannot be inferred from the context.
Determiners and adjectives • Demonstratives • “this” or “that” • Adjectives • Describe properties of nouns • Ex: a red rose, this long journey, many intelligent children, a very trendy magazine. • The uses above are referred to as attributive or adnominal. • Predicative form of adjective (appearing after a verb such as “be”, as its complement) • Ex. The rose is red. The journey will be long.
Agreement • Agreement here refers to congruence in gender, case and number between the determiner, the adjective and the noun. In many languages, this can be quite complex.
Adjectives and Brown tags • Positive – the basic form of an adjective [JJ] • Ex. rich, trendy, intelligent • Comparative [JJR] • Ex. richer, trendier • Superlative [JJT] • Ex. richest, trendiest • Semantically superlative adjectives [JJS] • Ex. chief, main and top • Numbers – a subclass of adjectives • Cardinals [CD] • Ex. one, two, and 6,000,000 • Ordinals [OD] • Ex. first, second, tenth • Periphrastic forms – forms made by using auxiliary words • Ex. more intelligent, most intelligent
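The adjective tags above can be illustrated with a naive suffix-based guesser. This is a sketch under strong assumptions, not how the Brown corpus was tagged: the function name `brown_adjective_tag` and the word list are hypothetical, the input is assumed to already be an adjective, and the suffix test will mislabel positive adjectives such as “clever” that merely end in -er.

```python
# Illustrative sketch: guess the Brown adjective tag from the word form.
# Assumes the input is already known to be an adjective.
SEMANTIC_SUPERLATIVES = {"chief", "main", "top"}  # examples from the slide


def brown_adjective_tag(adjective):
    """Return JJ, JJR, JJT or JJS for an adjective using toy heuristics."""
    word = adjective.lower()
    # Semantically superlative adjectives cannot be detected from the form,
    # so they have to be listed explicitly.
    if word in SEMANTIC_SUPERLATIVES:
        return "JJS"
    # Morphological superlative: richest, trendiest.
    if word.endswith("est"):
        return "JJT"
    # Comparative: richer, trendier (over-generates for words like "clever").
    if word.endswith("er"):
        return "JJR"
    # Otherwise assume the positive (basic) form.
    return "JJ"


if __name__ == "__main__":
    for adj in ["rich", "richer", "richest", "trendy", "chief"]:
        print(adj, brown_adjective_tag(adj))
```

Periphrastic forms such as “more intelligent” are outside the scope of this sketch, since the degree is carried by a separate word rather than by a suffix.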
Brown tags for determiners, quantifiers • Determiners • Articles [AT] • Singular determiners [DT] • this, that • Plural determiners [DTS] • these, those • Determiners that can be either singular or plural [DTI] • some, any • Double conjunction determiners [DTX] • either, neither • Quantifiers • Words that express ideas like “all”, “many”, “some” • Pre-quantifier [ABN] • all, many • Nominal pronoun [PN] • one, something, anything • Interrogative pronouns • [WDT] – wh-determiner: what, which • [WP$] – possessive wh-pronoun: whose • [WPO] – objective wh-pronoun: whom, which, that • [WPS] – nominative wh-pronoun: who, which, that
Phrase Structure • Noun phrases [NP] • Noun is the head of the noun phrase • Prepositional phrases [PP] • Headed by preposition and contain a NP complement • Verb phrases [VP] • Headed by a verb • Ex. Getting to school on time was a struggle. • Adjective phrases [AP] • She is very sure of herself • He seemed a man who was quite certain to succeed.
Phrase Structure Grammars • Syntactic analysis allows us to infer the meaning – the meaning is completely different in the following two sentences, which use the same words • Mary gave Peter a book • Peter gave Mary a book • In some languages the order of the words does not matter – free word order languages
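To make the role of word order concrete, here is a minimal sketch of a hand-written parser for a toy grammar with the rules S → NP V NP NP and NP → Name | Det N. The grammar, the lexicon, the role labels (giver, recipient, theme) and the function names are all assumptions invented for this illustration; they cover only the two example sentences.

```python
# Illustrative sketch: a tiny recursive-descent parser for
#   S  -> NP V NP NP
#   NP -> Name | Det N
# The lexicon and role names are hypothetical and cover only the examples.
LEXICON = {"Mary": "Name", "Peter": "Name", "gave": "V", "a": "Det", "book": "N"}


def parse_np(tokens, i):
    """Parse an NP starting at index i; return (phrase, next_index) or None."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "Name":
        return tokens[i], i + 1
    if (i + 1 < len(tokens)
            and LEXICON.get(tokens[i]) == "Det"
            and LEXICON.get(tokens[i + 1]) == "N"):
        return tokens[i] + " " + tokens[i + 1], i + 2
    return None


def parse_sentence(sentence):
    """Return the roles assigned by position, or None if there is no parse."""
    tokens = sentence.split()
    subject = parse_np(tokens, 0)
    if subject is None:
        return None
    giver, i = subject
    # The verb must follow the subject NP.
    if i >= len(tokens) or LEXICON.get(tokens[i]) != "V":
        return None
    first_object = parse_np(tokens, i + 1)
    if first_object is None:
        return None
    recipient, i = first_object
    second_object = parse_np(tokens, i)
    if second_object is None:
        return None
    theme, i = second_object
    # Reject input with trailing material after the second object.
    if i != len(tokens):
        return None
    return {"giver": giver, "recipient": recipient, "theme": theme}


if __name__ == "__main__":
    print(parse_sentence("Mary gave Peter a book"))
    print(parse_sentence("Peter gave Mary a book"))
```

Both sentences contain the same words, but the parser assigns the roles purely by position, so swapping “Mary” and “Peter” swaps giver and recipient. A free word order language would need case marking or other cues instead.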
Non-local and long-distance dependencies • Subject-verb agreement • The women who found the wallet were given a reward. • Long-distance relationship • Which book should Peter buy? • These dependencies impact statistical NLP approaches
Dependency: Arguments and adjuncts • Dependency • Concept of dependents • “Sue watched the man at the next table” • “Sue” and “man” are dependents of “watched”. • The PP “at the next table” is a dependent of “man”. It modifies “man”. • The two noun phrases can be viewed as “arguments” of the verb “watched”. • Semantic roles • Agent – the person or thing doing the action [often the subject] • Patient – the person or thing being acted on [often the object]
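One common way to store such dependencies is a list giving, for each token, the index of its head. The sketch below encodes the example sentence that way. The attachments of “Sue”, “man” and the PP follow the slide; the remaining arcs (“the” and “next” attached to their nouns, “table” attached to “at”, and the preposition taken as the head of the PP) are assumptions made for this illustration, as are the names `TOKENS`, `HEADS`, `dependents` and `subtree`.

```python
# Illustrative sketch: dependencies stored as one head index per token.
TOKENS = ["Sue", "watched", "the", "man", "at", "the", "next", "table"]
# HEADS[i] is the index of the head of token i; -1 marks the root ("watched").
# Arcs below the level described on the slide are assumptions.
HEADS = [1, -1, 3, 1, 3, 7, 7, 4]


def dependents(head_index):
    """Return the words that depend directly on the token at head_index."""
    return [TOKENS[i] for i, head in enumerate(HEADS) if head == head_index]


def subtree(head_index):
    """Return indices of the token and everything below it, in sentence order."""
    indices = [head_index]
    for i, head in enumerate(HEADS):
        if head == head_index:
            indices.extend(subtree(i))
    return sorted(indices)


if __name__ == "__main__":
    print(dependents(1))  # direct dependents of "watched"
    print(" ".join(TOKENS[i] for i in subtree(4)))  # the PP headed by "at"
```

Indices are used instead of word strings because “the” occurs twice in the sentence and the two occurrences have different heads.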
Active & Passive voice • Example • Children eat candy. • Candy is eaten by children.
Subcategorization frame • The set of arguments that a verb can appear with is referred to as its subcategorization frame.
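A subcategorization lexicon can be sketched as a mapping from each verb to the complement patterns it allows. The verbs, the frames and the function name `licenses` below are hypothetical examples chosen for illustration, and the sketch simplifies by listing only the complements after the verb (the subject is left implicit).

```python
# Illustrative sketch: a hypothetical subcategorization lexicon.
# Each frame is a tuple of complement categories following the verb.
SUBCAT_FRAMES = {
    "sleep": [()],                         # intransitive: no complements
    "eat": [(), ("NP",)],                  # optionally transitive
    "give": [("NP", "NP"), ("NP", "PP")],  # ditransitive, or NP plus PP
}


def licenses(verb, complements):
    """Return True if the verb allows exactly this sequence of complements."""
    return tuple(complements) in SUBCAT_FRAMES.get(verb, [])


if __name__ == "__main__":
    print(licenses("give", ["NP", "NP"]))  # e.g. "gave Peter a book"
    print(licenses("sleep", ["NP"]))       # e.g. "*slept the book"
```

A parser can consult such a table to reject analyses in which a verb is combined with arguments it does not accept.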
X’ Theory • N’ – “N bar nodes” • http://en.wikipedia.org/wiki/X-bar_theory
Garden Paths • Parsing the following sentence • The horse raced past the barn fell. • Garden path parse is the phenomenon by which a parse that is generated from “the horse raced past the barn” will have to be abandoned to accommodate “fell”.
Ungrammatical constructs • Parsing may fail or can get multiple parses due to ungrammatical constructs • Slept children the • Some sentences may be grammatically correct but meaningless • Colorless green ideas sleep furiously. • The cat barked.
Semantics and Pragmatics • Lexical semantics: the study of the meanings of individual words; semantics more broadly also covers how the meanings of individual words are combined into the meaning of sentences. • Hypernymy vs. hyponymy • “animal” is a hypernym of “cat” • “cat” is a hyponym of “animal” • Antonyms – words with opposite meanings • Meronymy – a part belonging to a whole • “tire” is a meronym of “car” • Holonym – the whole corresponding to a part • “car” is a holonym of “tire” • Synonyms – words with similar meanings • Homonyms – words that are spelled the same but have different meanings • bank – river bank; bank – a financial institution • Senses • Polysemy – the different senses (meanings) of a word are related. • Ex. “branch” could mean part of a tree or a dependent part of an organization. • Ambiguity – lexical ambiguity refers to both homonymy and polysemy • Homophony – homonyms that are also pronounced the same. • “bass”, for example, could mean a fish or a low-pitched sound; the two senses are pronounced differently, so it is NOT a homophone.
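The hypernym/hyponym relation forms a hierarchy that is easy to sketch as a child-to-parent table. Only the pair cat/animal comes from the slide; the other entries and the function names are assumptions added so that the example has more than one level.

```python
# Illustrative sketch: a tiny hypernym hierarchy as a child -> parent table.
# Only "cat" -> "animal" is from the slide; the rest is made up for the demo.
HYPERNYM_OF = {
    "cat": "animal",
    "dog": "animal",
    "animal": "living thing",
}


def hypernym_chain(word):
    """Return all hypernyms of a word, from most specific to most general."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, candidate):
    """Return True if candidate is a direct or indirect hypernym of word."""
    return candidate in hypernym_chain(word)


def hyponyms(word):
    """Return the direct hyponyms of a word, sorted alphabetically."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)


if __name__ == "__main__":
    print(hypernym_chain("cat"))
    print(is_hyponym_of("cat", "animal"))
    print(hyponyms("animal"))
```

Meronymy and holonymy could be stored the same way with a separate part-to-whole table, since they are a different relation from hypernymy.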