
Natural Language Processing COMPSCI 423/723




  1. Natural Language Processing COMPSCI 423/723 Rohit Kate

  2. POS Tagging and HMMs Some of the slides have been adapted from Raymond Mooney’s NLP course at UT Austin.

  3. Parts of Speech (POS) • Linguists group words of a language into categories which occur in similar places in a sentence and have similar types of meaning, e.g. nouns, verbs, adjectives; these are called parts of speech (POS) • A basic test to see if words belong to the same category is the substitution test • This is a good [dog/chair/pencil]. • This is a [good/bad/green/tall] chair.

  4. Parts of Speech(POS) Tagging • Given a sentence, tag each word with its POS tag • Lowest level of syntactic analysis John saw the saw and decided to take it to the table. NOUN VERB DT NOUN CONJ VERB TO VERB PRP PREP DT NOUN
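The tagged sentence above can be sketched as a list of (word, tag) pairs, the usual data structure for POS-tagged text; a minimal Python sketch using the slide's coarse tag names:

```python
# The tagged example sentence from the slide, as (word, tag) pairs.
tagged = [("John", "NOUN"), ("saw", "VERB"), ("the", "DT"),
          ("saw", "NOUN"), ("and", "CONJ"), ("decided", "VERB"),
          ("to", "TO"), ("take", "VERB"), ("it", "PRP"),
          ("to", "PREP"), ("the", "DT"), ("table", "NOUN")]

# The same word can receive different tags depending on its position.
tags_for_saw = {t for w, t in tagged if w == "saw"}
print(tags_for_saw)  # contains both VERB and NOUN
```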

  5. Why do POS tagging? • Useful for subsequent syntactic analysis: Which are noun/verb phrases? Where are the prepositional phrases connected? • Identifying nouns can help information retrieval as they are the important words usually searched for • Can help in identifying names, places etc. for information extraction • Can help word sense disambiguation and subsequent semantic analysis

  6. POS Tagging is Not Trivial • The same word can have different tags in different sentences or even in the same sentence • “Like” can be a verb or a preposition • I like/VERB candy. • Time flies like/PREPOSITION an arrow. • “Around” can be a preposition, particle, or adverb • I bought it at the shop around/PREPOSITION the corner. • I never got around/PARTICLE to getting a car. • A new Prius costs around/ADVERB $25K. • “Saw” could be a verb or a noun • John saw/VERB the saw/NOUN

  7. POS Tagging is Not Trivial • Degree of ambiguity in English (based on the Brown corpus) • 11.5% of word types are ambiguous • The rest are used with only one POS tag • 40% of word instances are ambiguous • Some frequently used words are ambiguous • New words or names may be encountered or get invented • Zizappi/NOUN googled/VERB geekishly/ADVERB. • Can’t simply use a dictionary of words with their usual POS tags
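The type/token ambiguity figures above come from counting how many distinct tags each word receives in a tagged corpus. A minimal sketch of that count, over a tiny invented corpus rather than the Brown corpus:

```python
from collections import defaultdict

# Tiny invented tagged corpus, just to illustrate the counting;
# the Brown-corpus percentages come from the same computation at scale.
corpus = [("I", "PRP"), ("like", "VB"), ("candy", "NN"),
          ("time", "NN"), ("flies", "VBZ"), ("like", "IN"),
          ("an", "DT"), ("arrow", "NN")]

# Collect the set of tags each word type is seen with.
tags_per_type = defaultdict(set)
for word, tag in corpus:
    tags_per_type[word.lower()].add(tag)

ambiguous_types = {w for w, ts in tags_per_type.items() if len(ts) > 1}
type_ambiguity = len(ambiguous_types) / len(tags_per_type)
token_ambiguity = sum(1 for w, _ in corpus
                      if w.lower() in ambiguous_types) / len(corpus)
print(type_ambiguity, token_ambiguity)  # only "like" is ambiguous here
```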

  8. Deciding a POS Tagset • How fine-grained should the distinction between POS tags be? • Nouns: common, plurals, adverbial (east, west, Monday, Tuesday) • Pronouns: personal (I, me), possessive personal (my, your), second possessive personal (mine, yours) • Verbs: past tense, past participle

  9. English POS Tagsets and Corpora • The original Brown corpus used a large set of 87 POS tags • Most common in NLP today is the Penn Treebank set of 45 tags (the tagset used in these slides), reduced from the Brown set for use in the context of a parsed corpus (i.e. treebank) • The C5 tagset used for the British National Corpus (BNC) has 61 tags • Tagsets also come with tagging conventions: cotton/(NOUN or ADJECTIVE) sweater, Chinese/(NOUN or ADJECTIVE) cooking

  10. English Parts of Speech (Penn Treebank) • Noun (person, place or thing): Singular (NN): dog, fork; Plural (NNS): dogs, forks; Proper (NNP, NNPS): John, Springfields • Pronoun: Personal pronoun (PRP): I, you, he, she, it; Wh-pronoun (WP): who, what • Verb (actions and processes): Base, infinitive (VB): eat; Past tense (VBD): ate; Gerund (VBG): eating; Past participle (VBN): eaten; Non-3rd person singular present tense (VBP): eat; 3rd person singular present tense (VBZ): eats; Modal (MD): should, can • To (TO): to (to eat)

  11. English Parts of Speech (Penn Treebank) • Adjective (modify nouns): Basic (JJ): red, tall; Comparative (JJR): redder, taller; Superlative (JJS): reddest, tallest • Adverb (modify verbs): Basic (RB): quickly; Comparative (RBR): quicker; Superlative (RBS): quickest • Preposition (IN): on, in, by, to, with • Determiner: Basic (DT): a, an, the; WH-determiner (WDT): which, that • Coordinating Conjunction (CC): and, but, or • Particle (RP): off (took off), up (put up)

  12. Closed vs. Open Class • Closed class categories are composed of a small, fixed set of grammatical function words for a given language • Pronouns, Prepositions, Modals, Determiners, Particles, Conjunctions • Open class categories have a large number of words and new ones are easily invented • Nouns (Googler, textlish), Verbs (Google), Adjectives (geeky), Adverbs (chompingly)

  13. POS Tagging Process • Usually assume a separate initial tokenization process that separates and/or disambiguates punctuation, including detecting sentence boundaries • Average POS tagging disagreement amongst expert human judges for the Penn Treebank was 3.5% • Baseline: picking the most frequent tag for each specific word type gives about 90% accuracy; 93.7% if a model for unknown words is used (Penn Treebank tagset)
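The most-frequent-tag baseline described above is simple to implement; a sketch on invented toy data (tag names follow the Penn Treebank):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Most-frequent-tag-per-word baseline (the ~90% baseline above)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    # Fall back to the globally most frequent tag for unseen words.
    default = all_tags.most_common(1)[0][0]
    model = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return model, default

def tag(words, model, default):
    return [model.get(w, default) for w in words]

# Invented toy training data; "saw" is NN more often than VBD here.
train = [[("the", "DT"), ("saw", "NN")],
         [("John", "NNP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")]]
model, default = train_baseline(train)
print(tag(["the", "saw", "hammer"], model, default))
```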

  14. POS Tagging Approaches • Rule-Based • Learning-Based

  15. Rule Based Approaches • Early approaches from the 60s and 70s • Use a dictionary to assign each word in a sentence a list of potential POS tags • Use a few thousand hand-crafted rules based on linguistic knowledge to narrow down the list to one POS tag for each word • Example: If the next word is an adjective and the previous word is not a verb then eliminate the adverb tag…
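A sketch of this dictionary-plus-rules pipeline. The mini-lexicon and the single rule (the slide's example, in Penn-style tags: RB = adverb, JJ = adjective) are illustrative only; real systems used a few thousand such hand-crafted rules.

```python
# Tiny invented lexicon mapping each word to its possible tags.
lexicon = {"the": {"DT"}, "still": {"RB", "JJ", "NN"}, "old": {"JJ"},
           "house": {"NN"}}

def candidates(words):
    # Unknown words default to NN in this sketch.
    return [set(lexicon.get(w, {"NN"})) for w in words]

def rule_next_adj(cands):
    """Slide's example rule: if the next word can only be an adjective
    and the previous word cannot be a verb, eliminate the adverb tag."""
    for i, c in enumerate(cands):
        nxt = cands[i + 1] if i + 1 < len(cands) else set()
        prev = cands[i - 1] if i > 0 else set()
        if nxt == {"JJ"} and "VB" not in prev:
            c.discard("RB")
    return cands

cands = rule_next_adj(candidates(["the", "still", "old", "house"]))
print(cands[1])  # the adverb reading of "still" has been eliminated
```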

  16. Learning Based Approaches • Trained on human-annotated corpora like the Penn Treebank • Statistical models: Hidden Markov Model (HMM), Conditional Random Field (CRF) • Rule learning: Transformation Based Learning (TBL) • Generally, learning-based approaches have been found to be more effective overall, taking into account the total amount of human expertise and effort involved

  17. Transformation-Based Tagging • Developed by Eric Brill (1995) • Uses rules, but they are learned from an annotated corpus • First label each word in the sentence with its most likely POS tag • Then transform the tags by applying the learned transformation rules in order • Rule templates: Change tag A to B when: • The following word is tagged Z • The word two before is tagged Z • One of the preceding words is tagged Z • The preceding word is tagged Z and the following word is tagged W • …

  18. Transformation-Based Tagging • How to learn the rules (i.e. instantiate the rule templates)? • Efficiently try all possible instantiations of the templates and include the one that corrects the most tags when tried on the training corpus • Iterate the above step until transformations don’t help much • Performs competitively
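The greedy learning loop above can be sketched for a single template ("change tag A to B when the preceding tag is Z"); the toy tag sequences are invented, and a real TBL learner scores rules far more efficiently than this brute-force search:

```python
from itertools import product

def apply_rule(tags, rule):
    """Change tag a to b wherever the preceding tag is prev."""
    a, b, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == prev:
            out[i] = b
    return out

def learn_rules(tags, gold, tagset, max_rules=3):
    """Greedily pick the rule that corrects the most tags, then iterate."""
    rules = []
    for _ in range(max_rules):
        best, best_gain = None, 0
        for a, b, prev in product(tagset, repeat=3):
            if a == b:
                continue
            new = apply_rule(tags, (a, b, prev))
            gain = (sum(n == g for n, g in zip(new, gold))
                    - sum(t == g for t, g in zip(tags, gold)))
            if gain > best_gain:
                best, best_gain = (a, b, prev), gain
        if best is None:          # no transformation helps any more
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules, tags

# Toy example: "John saw the saw", with "saw" initially mistagged NN.
initial = ["NNP", "NN", "DT", "NN"]
gold    = ["NNP", "VBD", "DT", "NN"]
rules, fixed = learn_rules(initial, gold, {"NNP", "NN", "DT", "VBD"})
print(rules[0], fixed)
```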

  19. Sequence Labeling Problem Many NLP problems like POS tagging can be viewed as sequence labeling • Each token in a sequence is assigned a label • Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors John saw the saw and decided to take it to the table. NNP VBD DT NN CC VBD TO VB PRP IN DT NN

  20. Information Extraction Identify phrases in language that refer to specific types of entities and relations in text. Named entity recognition is the task of identifying names of people, places, organizations, etc. in text: Michael Dell [person] is the CEO of Dell Computer Corporation [organization] and lives in Austin, Texas [place]. Extract pieces of information relevant to a specific application, e.g. used car ads (make, model, year, mileage, price): For sale, 2002 [year] Toyota [make] Prius [model], 20,000 mi [mileage], $15K [price] or best offer. Available starting July 30, 2006.

  21. Semantic Role Labeling For each clause, determine the semantic role played by each noun phrase that is an argument to the verb: John [agent] drove Mary [patient] from Austin [source] to Dallas [destination] in his Toyota Prius [instrument]. The hammer [instrument] broke the window [patient]. John [agent] broke the window [patient]. Also referred to as “case role analysis,” “thematic analysis,” and “shallow semantic parsing”

  22. Bioinformatics Sequence labeling is also valuable for labeling genetic sequences in genome analysis, e.g. marking each base of AGCTAACGTTCGATACGGATTACAGCCT as part of an exon or an intron

  23. Classification for Sequence Labeling Can we use classification for sequence labeling? For example, naïve Bayes? [Figure: a naïve Bayes model with class Y and features X1 … Xn, parameterized by P(Y) and P(Xi|Y); applied to tagging, the class is the POS tag and the features are the word, the previous word, and the next word, with parameters P(POS Tag), P(Word|POS Tag), P(Previous word|POS Tag), P(Next word|POS Tag)]
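A minimal naïve Bayes tagger over the three features named on the slide (the word, the previous word, and the next word), with invented counts and add-one smoothing; the vocabulary size used for smoothing is an arbitrary placeholder:

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """Count priors P(tag) and feature counts for P(feature value | tag)."""
    prior = Counter()
    feat = defaultdict(Counter)   # (feature_name, tag) -> value counts
    for feats, tag in examples:
        prior[tag] += 1
        for name, value in feats.items():
            feat[(name, tag)][value] += 1
    return prior, feat

def classify(feats, prior, feat, tags, vocab_size=100):
    """Pick argmax_tag log P(tag) + sum_f log P(f | tag), add-1 smoothed."""
    def logp(tag):
        lp = math.log(prior[tag] / sum(prior.values()))
        for name, value in feats.items():
            c = feat[(name, tag)]
            lp += math.log((c[value] + 1) / (sum(c.values()) + vocab_size))
        return lp
    return max(tags, key=logp)

# Invented training examples for the two readings of "saw".
examples = [({"word": "saw", "prev": "John", "next": "the"}, "VBD"),
            ({"word": "saw", "prev": "the", "next": "and"}, "NN"),
            ({"word": "dog", "prev": "the", "next": "ran"}, "NN")]
prior, feat = train_nb(examples)
print(classify({"word": "saw", "prev": "John", "next": "a"},
               prior, feat, ["VBD", "NN"]))
```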

  24. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier NNP

  25. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier VBD

  26. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier DT

  27. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier NN

  28. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier CC

  29. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier VBD

  30. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier TO

  31. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier VB

  32. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier PRP

  33. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier IN

  34. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier DT

  35. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier NN
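The sliding-window feature extraction shown in these slides can be sketched as a function over the token index; the padding symbol is an arbitrary choice:

```python
def window_features(words, i, pad="<s>"):
    """Features for classifying words[i] from its neighbors (window of 3)."""
    return {"word": words[i],
            "prev": words[i - 1] if i > 0 else pad,
            "next": words[i + 1] if i + 1 < len(words) else pad}

words = "John saw the saw".split()
feats = [window_features(words, i) for i in range(len(words))]
print(feats[1])  # {'word': 'saw', 'prev': 'John', 'next': 'the'}
print(feats[3])  # {'word': 'saw', 'prev': 'the', 'next': '<s>'}
```

Each feature dictionary is then classified independently, which is what lets the two occurrences of "saw" receive different tags.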

  36. Sequence Labeling as Classification Using Outputs as Inputs • Better input features are usually the categories of the surrounding tokens, but these are not available yet • Can use the category of either the preceding or succeeding tokens by going forward or backward and using the previous output

  37. Forward Classification John saw the saw and decided to take it to the table. classifier NNP

  38. Forward Classification NNP John saw the saw and decided to take it to the table. classifier VBD

  39. Forward Classification NNP VBD John saw the saw and decided to take it to the table. classifier DT

  40. Forward Classification NNP VBD DT John saw the saw and decided to take it to the table. classifier NN

  41. Forward Classification NNP VBD DT NN John saw the saw and decided to take it to the table. classifier CC

  42. Forward Classification NNP VBD DT NN CC John saw the saw and decided to take it to the table. classifier VBD

  43. Forward Classification NNP VBD DT NN CC VBD John saw the saw and decided to take it to the table. classifier TO

  44. Forward Classification NNP VBD DT NN CC VBD TO John saw the saw and decided to take it to the table. classifier VB

  45. Forward Classification NNP VBD DT NN CC VBD TO VB John saw the saw and decided to take it to the table. classifier PRP

  46. Forward Classification NNP VBD DT NN CC VBD TO VB PRP John saw the saw and decided to take it to the table. classifier IN

  47. Forward Classification NNP VBD DT NN CC VBD TO VB PRP IN John saw the saw and decided to take it to the table. classifier DT

  48. Forward Classification NNP VBD DT NN CC VBD TO VB PRP IN DT John saw the saw and decided to take it to the table. classifier NN
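Forward classification can be sketched as a left-to-right loop that feeds each prediction back as a feature for the next token; the stand-in classifier below is a hypothetical hand-written rule, not a trained model:

```python
def forward_tag(words, classify):
    """Left-to-right tagging: each predicted tag becomes a feature
    for classifying the next token, as in the slides above."""
    tags = []
    for w in words:
        prev_tag = tags[-1] if tags else "<s>"
        tags.append(classify({"word": w, "prev_tag": prev_tag}))
    return tags

def toy_classify(feats):
    # Hypothetical rule standing in for a trained classifier:
    # "saw" is a verb after a proper noun, a noun otherwise.
    w, prev = feats["word"], feats["prev_tag"]
    if w == "saw":
        return "VBD" if prev in ("NNP", "<s>") else "NN"
    return {"John": "NNP", "the": "DT"}.get(w, "NN")

print(forward_tag("John saw the saw".split(), toy_classify))
# ['NNP', 'VBD', 'DT', 'NN']
```

Backward classification is the same loop run right-to-left, using the following tag instead of the preceding one.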

  49. Backward Classification Disambiguating “to” in this case would be even easier backward. John saw the saw and decided to take it to the table. classifier NN

  50. Backward Classification Disambiguating “to” in this case would be even easier backward. NN John saw the saw and decided to take it to the table. classifier DT
