
Natural Language Processing COMPSCI 423/723




  1. Natural Language Processing COMPSCI 423/723 Rohit Kate

  2. POS Tagging and HMMs Some of the slides have been adapted from Raymond Mooney’s NLP course at UT Austin.

  3. Parts of Speech (POS) • Linguists group words of a language into categories which occur in similar places in a sentence and have similar types of meaning, e.g. nouns, verbs, adjectives; these are called parts of speech (POS) • A basic test to see if words belong to the same category is the substitution test • This is a good [dog/chair/pencil]. • This is a [good/bad/green/tall] chair.

  4. Parts of Speech(POS) Tagging • Given a sentence, tag each word with its POS tag • Lowest level of syntactic analysis John saw the saw and decided to take it to the table. NOUN VERB DT NOUN CONJ VERB TO VERB PRP PREP DT NOUN
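The tagged sentence above can be sketched as a list of (word, tag) pairs, the usual data structure for POS-tagged text; a minimal Python sketch using the slide's coarse tag names:

```python
# The tagged example sentence from the slide, as (word, tag) pairs.
tagged = [("John", "NOUN"), ("saw", "VERB"), ("the", "DT"),
          ("saw", "NOUN"), ("and", "CONJ"), ("decided", "VERB"),
          ("to", "TO"), ("take", "VERB"), ("it", "PRP"),
          ("to", "PREP"), ("the", "DT"), ("table", "NOUN")]

# The same word can receive different tags depending on its position.
tags_for_saw = {t for w, t in tagged if w == "saw"}
print(tags_for_saw)  # contains both VERB and NOUN
```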

  5. Why do POS tagging? • Useful for subsequent syntactic analysis: Which are noun/verb phrases? Where are the prepositional phrases connected? • Identifying nouns can help information retrieval as they are the important words usually searched for • Can help in identifying names, places etc. for information extraction • Can help word sense disambiguation and subsequent semantic analysis

  6. POS Tagging is Not Trivial • The same word can have different tags in different sentences or even in the same sentence • “Like” can be a verb or a preposition • I like/VERB candy. • Time flies like/PREPOSITION an arrow. • “Around” can be a preposition, particle, or adverb • I bought it at the shop around/PREPOSITION the corner. • I never got around/PARTICLE to getting a car. • A new Prius costs around/ADVERB $25K. • “Saw” could be a verb or a noun • John saw/VERB the saw/NOUN

  7. POS Tagging is Not Trivial • Degree of ambiguity in English (based on the Brown corpus) • 11.5% of word types are ambiguous • The rest are used with only one POS tag • 40% of word instances are ambiguous • Some frequently used words are ambiguous • New words or names may be encountered or get invented • Zizappi/NOUN googled/VERB geekishly/ADVERB. • Can’t simply use a dictionary of words with their usual POS tags
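The type/token ambiguity figures above come from counting how many distinct tags each word receives in a tagged corpus. A minimal sketch of that count, over a tiny invented corpus rather than the Brown corpus:

```python
from collections import defaultdict

# Tiny invented tagged corpus, just to illustrate the counting;
# the Brown-corpus percentages come from the same computation at scale.
corpus = [("I", "PRP"), ("like", "VB"), ("candy", "NN"),
          ("time", "NN"), ("flies", "VBZ"), ("like", "IN"),
          ("an", "DT"), ("arrow", "NN")]

# Collect the set of tags each word type is seen with.
tags_per_type = defaultdict(set)
for word, tag in corpus:
    tags_per_type[word.lower()].add(tag)

ambiguous_types = {w for w, ts in tags_per_type.items() if len(ts) > 1}
type_ambiguity = len(ambiguous_types) / len(tags_per_type)
token_ambiguity = sum(1 for w, _ in corpus
                      if w.lower() in ambiguous_types) / len(corpus)
print(type_ambiguity, token_ambiguity)  # only "like" is ambiguous here
```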

  8. Deciding a POS Tagset • How fine-grained should the distinction between POS tags be? • Nouns: common, plurals, adverbial (east, west, Monday, Tuesday) • Pronouns: personal (I, me), possessive personal (my, your), second possessive personal (mine, yours) • Verbs: past tense, past participle

  9. English POS Tagsets and Corpora • The original Brown corpus used a large set of 87 POS tags • Most common in NLP today is the Penn Treebank set of 45 tags (the tagset used in these slides), reduced from the Brown set for use in the context of a parsed corpus (i.e. treebank) • The C5 tagset used for the British National Corpus (BNC) has 61 tags • Tagsets also come with tagging conventions: cotton/(NOUN or ADJECTIVE) sweater, Chinese/(NOUN or ADJECTIVE) cooking

  10. English Parts of Speech (Penn Treebank) • Noun (person, place or thing): Singular (NN): dog, fork; Plural (NNS): dogs, forks; Proper (NNP, NNPS): John, Springfields • Pronoun: Personal pronoun (PRP): I, you, he, she, it; Wh-pronoun (WP): who, what • Verb (actions and processes): Base, infinitive (VB): eat; Past tense (VBD): ate; Gerund (VBG): eating; Past participle (VBN): eaten; Non-3rd person singular present tense (VBP): eat; 3rd person singular present tense (VBZ): eats; Modal (MD): should, can • To (TO): to (to eat)

  11. English Parts of Speech (Penn Treebank) • Adjective (modify nouns): Basic (JJ): red, tall; Comparative (JJR): redder, taller; Superlative (JJS): reddest, tallest • Adverb (modify verbs): Basic (RB): quickly; Comparative (RBR): quicker; Superlative (RBS): quickest • Preposition (IN): on, in, by, to, with • Determiner: Basic (DT): a, an, the; WH-determiner (WDT): which, that • Coordinating Conjunction (CC): and, but, or • Particle (RP): off (took off), up (put up)

  12. Closed vs. Open Class • Closed class categories are composed of a small, fixed set of grammatical function words for a given language • Pronouns, Prepositions, Modals, Determiners, Particles, Conjunctions • Open class categories have a large number of words and new ones are easily invented • Nouns (Googler, textlish), Verbs (Google), Adjectives (geeky), Adverbs (chompingly)

  13. POS Tagging Process • Usually assume a separate initial tokenization process that separates and/or disambiguates punctuation, including detecting sentence boundaries • Average POS tagging disagreement amongst expert human judges for the Penn Treebank was 3.5% • Baseline: picking the most frequent tag for each specific word type gives about 90% accuracy; 93.7% if a model for unknown words is used (Penn Treebank tagset)
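The most-frequent-tag baseline described above is simple to implement; a sketch on invented toy data (tag names follow the Penn Treebank):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Most-frequent-tag-per-word baseline (the ~90% baseline above)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    # Fall back to the globally most frequent tag for unseen words.
    default = all_tags.most_common(1)[0][0]
    model = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return model, default

def tag(words, model, default):
    return [model.get(w, default) for w in words]

# Invented toy training data; "saw" is NN more often than VBD here.
train = [[("the", "DT"), ("saw", "NN")],
         [("John", "NNP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")]]
model, default = train_baseline(train)
print(tag(["the", "saw", "hammer"], model, default))
```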

  14. POS Tagging Approaches • Rule-Based • Learning-Based

  15. Rule Based Approaches • Early approaches from the 60s and 70s • Use a dictionary to assign each word in a sentence a list of potential POS tags • Use a few thousand hand-crafted rules based on linguistic knowledge to narrow down the list to one POS tag for each word • Example: If the next word is an adjective and the previous word is not a verb then eliminate the adverb tag…
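A sketch of this dictionary-plus-rules pipeline. The mini-lexicon and the single rule (the slide's example, in Penn-style tags: RB = adverb, JJ = adjective) are illustrative only; real systems used a few thousand such hand-crafted rules.

```python
# Tiny invented lexicon mapping each word to its possible tags.
lexicon = {"the": {"DT"}, "still": {"RB", "JJ", "NN"}, "old": {"JJ"},
           "house": {"NN"}}

def candidates(words):
    # Unknown words default to NN in this sketch.
    return [set(lexicon.get(w, {"NN"})) for w in words]

def rule_next_adj(cands):
    """Slide's example rule: if the next word can only be an adjective
    and the previous word cannot be a verb, eliminate the adverb tag."""
    for i, c in enumerate(cands):
        nxt = cands[i + 1] if i + 1 < len(cands) else set()
        prev = cands[i - 1] if i > 0 else set()
        if nxt == {"JJ"} and "VB" not in prev:
            c.discard("RB")
    return cands

cands = rule_next_adj(candidates(["the", "still", "old", "house"]))
print(cands[1])  # the adverb reading of "still" has been eliminated
```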

  16. Learning Based Approaches • Trained on human-annotated corpora like the Penn Treebank • Statistical models: Hidden Markov Model (HMM), Conditional Random Field (CRF) • Rule learning: Transformation Based Learning (TBL) • Generally, learning-based approaches have been found to be more effective overall, taking into account the total amount of human expertise and effort involved

  17. Transformation-Based Tagging • Developed by Eric Brill (1995) • Uses rules, but they are learned from an annotated corpus • First label each word in the sentence with its most likely POS tag • Then transform the tags by applying the learned transformation rules in order • Rule templates: Change tag A to B when: • The following word is tagged Z • The word two before is tagged Z • One of the preceding words is tagged Z • The preceding word is tagged Z and the following word is tagged W • …

  18. Transformation-Based Tagging • How to learn the rules (i.e. instantiate the rule templates)? • Efficiently try all possible instantiations of the templates and include the one that corrects the most tags when tried on the training corpus • Iterate the above step until transformations don’t help much • Performs competitively
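The greedy learning loop above can be sketched for a single template ("change tag A to B when the preceding tag is Z"); the toy tag sequences are invented, and a real TBL learner scores rules far more efficiently than this brute-force search:

```python
from itertools import product

def apply_rule(tags, rule):
    """Change tag a to b wherever the preceding tag is prev."""
    a, b, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == prev:
            out[i] = b
    return out

def learn_rules(tags, gold, tagset, max_rules=3):
    """Greedily pick the rule that corrects the most tags, then iterate."""
    rules = []
    for _ in range(max_rules):
        best, best_gain = None, 0
        for a, b, prev in product(tagset, repeat=3):
            if a == b:
                continue
            new = apply_rule(tags, (a, b, prev))
            gain = (sum(n == g for n, g in zip(new, gold))
                    - sum(t == g for t, g in zip(tags, gold)))
            if gain > best_gain:
                best, best_gain = (a, b, prev), gain
        if best is None:          # no transformation helps any more
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules, tags

# Toy example: "John saw the saw", with "saw" initially mistagged NN.
initial = ["NNP", "NN", "DT", "NN"]
gold    = ["NNP", "VBD", "DT", "NN"]
rules, fixed = learn_rules(initial, gold, {"NNP", "NN", "DT", "VBD"})
print(rules[0], fixed)
```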

  19. Sequence Labeling Problem Many NLP problems like POS tagging can be viewed as sequence labeling • Each token in a sequence is assigned a label • Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors John saw the saw and decided to take it to the table. NNP VBD DT NN CC VBD TO VB PRP IN DT NN

  20. Information Extraction Identify phrases in language that refer to specific types of entities and relations in text. Named entity recognition is the task of identifying names of people, places, organizations, etc. in text: Michael Dell [person] is the CEO of Dell Computer Corporation [organization] and lives in Austin, Texas [place]. Extract pieces of information relevant to a specific application, e.g. used car ads (make, model, year, mileage, price): For sale, 2002 [year] Toyota [make] Prius [model], 20,000 mi [mileage], $15K [price] or best offer. Available starting July 30, 2006.

  21. Semantic Role Labeling For each clause, determine the semantic role played by each noun phrase that is an argument to the verb: John [agent] drove Mary [patient] from Austin [source] to Dallas [destination] in his Toyota Prius [instrument]. The hammer [instrument] broke the window [patient]. John [agent] broke the window [patient]. Also referred to as “case role analysis,” “thematic analysis,” and “shallow semantic parsing”

  22. Bioinformatics Sequence labeling is also valuable for labeling genetic sequences in genome analysis, e.g. marking each base of AGCTAACGTTCGATACGGATTACAGCCT as part of an exon or an intron

  23. Classification for Sequence Labeling Can we use classification for sequence labeling? For example, naïve Bayes? [Figure: a naïve Bayes model with class Y and features X1 … Xn, parameterized by P(Y) and P(Xi|Y); applied to tagging, the class is the POS tag and the features are the word, the previous word, and the next word, with parameters P(POS Tag), P(Word|POS Tag), P(Previous word|POS Tag), P(Next word|POS Tag)]
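A minimal naïve Bayes tagger over the three features named on the slide (the word, the previous word, and the next word), with invented counts and add-one smoothing; the vocabulary size used for smoothing is an arbitrary placeholder:

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """Count priors P(tag) and feature counts for P(feature value | tag)."""
    prior = Counter()
    feat = defaultdict(Counter)   # (feature_name, tag) -> value counts
    for feats, tag in examples:
        prior[tag] += 1
        for name, value in feats.items():
            feat[(name, tag)][value] += 1
    return prior, feat

def classify(feats, prior, feat, tags, vocab_size=100):
    """Pick argmax_tag log P(tag) + sum_f log P(f | tag), add-1 smoothed."""
    def logp(tag):
        lp = math.log(prior[tag] / sum(prior.values()))
        for name, value in feats.items():
            c = feat[(name, tag)]
            lp += math.log((c[value] + 1) / (sum(c.values()) + vocab_size))
        return lp
    return max(tags, key=logp)

# Invented training examples for the two readings of "saw".
examples = [({"word": "saw", "prev": "John", "next": "the"}, "VBD"),
            ({"word": "saw", "prev": "the", "next": "and"}, "NN"),
            ({"word": "dog", "prev": "the", "next": "ran"}, "NN")]
prior, feat = train_nb(examples)
print(classify({"word": "saw", "prev": "John", "next": "a"},
               prior, feat, ["VBD", "NN"]))
```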

  24. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier NNP

  25. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier VBD

  26. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier DT

  27. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier NN

  28. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier CC

  29. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier VBD

  30. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier TO

  31. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier VB

  32. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier PRP

  33. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier IN

  34. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier DT

  35. Sequence Labeling as Classification Classify each token independently but use as input features information about the surrounding tokens (sliding window). John saw the saw and decided to take it to the table. classifier NN
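The sliding-window feature extraction shown in these slides can be sketched as a function over the token index; the padding symbol is an arbitrary choice:

```python
def window_features(words, i, pad="<s>"):
    """Features for classifying words[i] from its neighbors (window of 3)."""
    return {"word": words[i],
            "prev": words[i - 1] if i > 0 else pad,
            "next": words[i + 1] if i + 1 < len(words) else pad}

words = "John saw the saw".split()
feats = [window_features(words, i) for i in range(len(words))]
print(feats[1])  # {'word': 'saw', 'prev': 'John', 'next': 'the'}
print(feats[3])  # {'word': 'saw', 'prev': 'the', 'next': '<s>'}
```

Each feature dictionary is then classified independently, which is what lets the two occurrences of "saw" receive different tags.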

  36. Sequence Labeling as Classification Using Outputs as Inputs • Better input features are usually the categories of the surrounding tokens, but these are not available yet • Can use the category of either the preceding or succeeding tokens by going forward or backward and using the previous output

  37. Forward Classification John saw the saw and decided to take it to the table. classifier NNP

  38. Forward Classification NNP John saw the saw and decided to take it to the table. classifier VBD

  39. Forward Classification NNP VBD John saw the saw and decided to take it to the table. classifier DT

  40. Forward Classification NNP VBD DT John saw the saw and decided to take it to the table. classifier NN

  41. Forward Classification NNP VBD DT NN John saw the saw and decided to take it to the table. classifier CC

  42. Forward Classification NNP VBD DT NN CC John saw the saw and decided to take it to the table. classifier VBD

  43. Forward Classification NNP VBD DT NN CC VBD John saw the saw and decided to take it to the table. classifier TO

  44. Forward Classification NNP VBD DT NN CC VBD TO John saw the saw and decided to take it to the table. classifier VB

  45. Forward Classification NNP VBD DT NN CC VBD TO VB John saw the saw and decided to take it to the table. classifier PRP

  46. Forward Classification NNP VBD DT NN CC VBD TO VB PRP John saw the saw and decided to take it to the table. classifier IN

  47. Forward Classification NNP VBD DT NN CC VBD TO VB PRP IN John saw the saw and decided to take it to the table. classifier DT

  48. Forward Classification NNP VBD DT NN CC VBD TO VB PRP IN DT John saw the saw and decided to take it to the table. classifier NN
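Forward classification can be sketched as a left-to-right loop that feeds each prediction back as a feature for the next token; the stand-in classifier below is a hypothetical hand-written rule, not a trained model:

```python
def forward_tag(words, classify):
    """Left-to-right tagging: each predicted tag becomes a feature
    for classifying the next token, as in the slides above."""
    tags = []
    for w in words:
        prev_tag = tags[-1] if tags else "<s>"
        tags.append(classify({"word": w, "prev_tag": prev_tag}))
    return tags

def toy_classify(feats):
    # Hypothetical rule standing in for a trained classifier:
    # "saw" is a verb after a proper noun, a noun otherwise.
    w, prev = feats["word"], feats["prev_tag"]
    if w == "saw":
        return "VBD" if prev in ("NNP", "<s>") else "NN"
    return {"John": "NNP", "the": "DT"}.get(w, "NN")

print(forward_tag("John saw the saw".split(), toy_classify))
# ['NNP', 'VBD', 'DT', 'NN']
```

Backward classification is the same loop run right-to-left, using the following tag instead of the preceding one.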

  49. Backward Classification Disambiguating “to” in this case would be even easier backward. John saw the saw and decided to take it to the table. classifier NN

  50. Backward Classification Disambiguating “to” in this case would be even easier backward. NN John saw the saw and decided to take it to the table. classifier DT
