Introduction to Natural Language Processing (NLP)

Dekang Lin, Department of Computing Science, University of Alberta, lindek@cs.ualberta.ca


Presentation Transcript


  1. Introduction to Natural Language Processing (NLP) • Dekang Lin • Department of Computing Science • University of Alberta • lindek@cs.ualberta.ca

  2. Outline • What is NLP • Applications • Challenges • Linguistics Issues • Part of Speech Tagging

  3. What is Natural Language Processing? • Natural Language Processing • Process information contained in natural language text. • Also known as Computational Linguistics • Can machines understand human language? • Define ‘understand’ • Understanding is the ultimate goal. However, one doesn’t need to fully understand to be useful.

  4. Why Study NLP? • A hallmark of human intelligence. • Text is the largest repository of human knowledge and is growing quickly. • emails, news articles, web pages, IRC, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, …… • Are we reading any faster than before?

  5. NLP Applications • Question answering • Who is the first Taiwanese president? • Text Categorization/Routing • e.g., customer e-mails. • Text Mining • Find everything that interacts with BRCA1. • Machine (Assisted) Translation • Language Teaching/Learning • Usage checking • Spelling correction • Is that just dictionary lookup?

  6. Challenges in NLP: Ambiguity • Words or phrases can often be understood in multiple ways. • Teacher Strikes Idle Kids • Killer Sentenced to Die for Second Time in 10 Years • They denied the petition for his release that was signed by over 10,000 people. • child abuse expert/child computer expert • Who does Mary love?

  7. Probabilistic/Statistical Resolution of Ambiguities • When there are ambiguities, choose the interpretation with the highest probability. • Example: how many times people say • “Mary loves …” • “the Mary love” • Which interpretation has the highest probability?
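A minimal sketch of the idea on slide 7, assuming we already have counts of how often each pattern occurs in some corpus. The counts below are invented for illustration, not measured, and `most_likely` is a hypothetical helper name.

```python
def most_likely(counts):
    """Return (interpretation, relative frequency) for the most frequent reading."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations to estimate probabilities from")
    best = max(counts, key=counts.get)
    return best, counts[best] / total


# Hypothetical corpus counts for the two patterns on the slide (made-up numbers).
counts = {"Mary loves ...": 90, "the Mary love": 10}
best, prob = most_likely(counts)
print(f"{best!r} wins with estimated probability {prob:.2f}")
```

Relative frequency is the simplest possible estimate; it says nothing about patterns that never occur in the corpus.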

  8. Challenges in NLP: Variations • The same meaning can be expressed in different ways • Who wrote “The Language Instinct”? • Steven Pinker, an MIT professor and author of “The Language Instinct”, ……

  9. Linguistic Issues • Morphology • Internal structure of words • Syntax • Internal structure of sentences • Semantics • How to interpret the meanings of words, phrases and sentences.

  10. Morphology • Morphology is concerned with the internal make-up of words • The fearsome cats attacked the foolish dog • The fear-some cat-s attack-ed the fool-ish dog • Inflectional morphology • Does not change the grammatical category of words: cats/cat-s, attacked/attack-ed • Derivational morphology • May involve changes to grammatical categories: fearsome/fear-some, foolish/fool-ish
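The segmentation on slide 10 can be sketched as suffix stripping against a stem lexicon. This is only an illustration: the suffix table comes from the slide's examples, while the tiny `STEMS` set and the function names are assumptions made up for this sketch.

```python
# Suffix -> kind of morphology, taken from the examples on slide 10.
SUFFIXES = {
    "s": "inflectional",
    "ed": "inflectional",
    "some": "derivational",
    "ish": "derivational",
}
# Hypothetical stem lexicon; a real system would need a far larger one.
STEMS = {"fear", "cat", "attack", "fool", "dog"}


def segment(word):
    """Return (stem, suffix, kind) if a known stem plus a known suffix matches."""
    for suffix, kind in SUFFIXES.items():
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem.lower() in STEMS:
            return stem, suffix, kind
    return word, "", None  # no analysis found: leave the word whole


def hyphenate(sentence):
    """Mark morpheme boundaries with hyphens, as on the slide."""
    parts = []
    for word in sentence.split():
        stem, suffix, _ = segment(word)
        parts.append(f"{stem}-{suffix}" if suffix else word)
    return " ".join(parts)


print(hyphenate("The fearsome cats attacked the foolish dog"))
```

Requiring the remainder to be a known stem is what stops the sketch from splitting every word that happens to end in a suffix-like string.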

  11. Morphology Is Not as Easy as It May Seem to Be • Examples from Woods et al. 2000 • delegate (de + leg + ate) take the legs from • caress (car + ess) female car • cashier (cashy + er) more wealthy • lacerate (lace + rate) speed of tatting • ratify (rat + ify) infest with rodents • infantry (infant + ry) childish behavior

  12. A Turkish Example [Oflazer & Guzey 1994] • uygarlastiramayabileceklerimizdenmissinizcesine • uygar/civilized las/BECOME tir/CAUS ama/NEG yabil/POT ecek/FUT ler/3PL imiz/POSS-1PL den/ABL mis/NARR siniz/2PL cesine/AS-IF • an adverb meaning roughly “(behaving) as if you were one of those whom we might not be able to civilize.”
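One way to hold such an analysis in a program is a list of (morpheme, gloss) pairs. The sketch below is illustrative only: it spells the first morpheme `uygar` so that the pieces concatenate to the surface word, and it glosses `imiz` as a first-person plural possessive; the variable names are assumptions.

```python
WORD = "uygarlastiramayabileceklerimizdenmissinizcesine"

# (morpheme, gloss) pairs following the segmentation on slide 12.
ANALYSIS = [
    ("uygar", "civilized"), ("las", "BECOME"), ("tir", "CAUS"),
    ("ama", "NEG"), ("yabil", "POT"), ("ecek", "FUT"),
    ("ler", "3PL"), ("imiz", "POSS-1PL"), ("den", "ABL"),
    ("mis", "NARR"), ("siniz", "2PL"), ("cesine", "AS-IF"),
]

# Sanity check: the morphemes must concatenate back to the surface word.
surface = "".join(morpheme for morpheme, _ in ANALYSIS)
print(surface == WORD)

# Print one morpheme per line with its gloss.
for morpheme, gloss in ANALYSIS:
    print(f"{morpheme:<8}{gloss}")
```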

  13. Sentence Structures • Sentences have structures and are made up of constituents. • The constituents are phrases. • A phrase consists of a head and modifiers. • The categorial type of the head determines the categorial type of the phrase (e.g., a phrase headed by a noun is a noun phrase).

  14. Parsing • Analyze the structure of a sentence • [S [NP [D The] [N student]] [VP [V put] [NP [D the] [N book]] [PP [P on] [NP [D the] [N table]]]]]
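A parse tree like the one on slide 14 can be stored as nested tuples. The sketch below assumes the prepositional phrase attaches to the verb phrase, which is one reading of the flattened figure; the helper names are made up for illustration.

```python
# (label, child, child, ...) for phrases; (tag, word) for leaves.
TREE = ("S",
        ("NP", ("D", "The"), ("N", "student")),
        ("VP", ("V", "put"),
               ("NP", ("D", "the"), ("N", "book")),
               ("PP", ("P", "on"),
                      ("NP", ("D", "the"), ("N", "table")))))


def is_leaf(node):
    """A leaf pairs a part-of-speech tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


def phrases(node, label):
    """Collect the text of every constituent carrying the given label."""
    if is_leaf(node):
        return []
    found = [" ".join(leaves(node))] if node[0] == label else []
    for child in node[1:]:
        found.extend(phrases(child, label))
    return found


print(" ".join(leaves(TREE)))
print(phrases(TREE, "NP"))
```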

  15. Teacher strikes idle kids • [S [NP [N Teacher] [N strikes]] [VP [V idle] [NP [N kids]]]] • [S [NP [N Teacher]] [VP [V strikes] [NP [A idle] [N kids]]]]
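The two readings of the headline can be reproduced with a toy grammar and an exhaustive parser. This is a sketch under stated assumptions: the grammar rules and lexicon below are invented to cover just this sentence, and `parses` is a naive enumerator, not an efficient parsing algorithm.

```python
# Toy grammar: each right-hand side has one or two symbols (assumed rules).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("N", "N"), ("A", "N")],
    "VP": [("V", "NP")],
}
# Toy lexicon: "strikes" and "idle" each allow two parts of speech.
LEXICON = {
    "teacher": {"N"},
    "strikes": {"N", "V"},
    "idle": {"V", "A"},
    "kids": {"N"},
}


def parses(symbol, words):
    """Return every tree rooted in `symbol` that covers exactly `words`."""
    if symbol not in GRAMMAR:  # part-of-speech tag: must match a single word
        if len(words) == 1 and symbol in LEXICON.get(words[0].lower(), ()):
            return [(symbol, words[0])]
        return []
    trees = []
    for rhs in GRAMMAR[symbol]:
        if len(rhs) == 1:
            trees += [(symbol, sub) for sub in parses(rhs[0], words)]
        else:
            left, right = rhs
            for i in range(1, len(words)):  # try every split point
                for lt in parses(left, words[:i]):
                    for rt in parses(right, words[i:]):
                        trees.append((symbol, lt, rt))
    return trees


def bracket(tree):
    """Render a tree in labelled-bracket notation."""
    if isinstance(tree[1], str):
        return f"[{tree[0]} {tree[1]}]"
    return "[" + tree[0] + " " + " ".join(bracket(c) for c in tree[1:]) + "]"


trees = parses("S", ("Teacher", "strikes", "idle", "kids"))
for tree in trees:
    print(bracket(tree))
```

The parser finds one tree where "strikes" is a verb and one where it is a noun, which is exactly the ambiguity the headline exploits.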

  16. Syntax • Syntax is the study of the regularities and constraints of word order and phrase structure • How words are organized into phrases • How phrases are combined into larger phrases (including sentences).

  17. Phrase Structures • Noun phrases • A noun phrase consists of a head noun and a set of modifiers. • The meaning of the noun phrase is largely determined by the noun. • Verb phrases • A verb phrase consists of a head verb and a set of modifiers • the head verb denotes the action/activity/state

  18. Part of Speech • Syntactic categories that words belong to • N, V, Adj/Adv, Prep, Aux, … • Open/Closed class, lexical/functional categories • Also known as: grammatical categories, syntactic tags, POS tags, word classes, …

  19. POS Examples • Open Class • N (noun): baby, toy • V (verb): see, kiss • ADJ (adjective): tall, grateful, alleged • ADV (adverb): quickly, frankly, ... • Closed Class • P (preposition): in, on, near • DET (determiner): the, a, that • WhPron (wh-pronoun): who, what, which, … • COORD (coordinator): and, or

  20. Substitution Test • Two words belong to the same category if replacing one with another does not change the grammaticality of a sentence. • The _____ is angry. • The ____ dog is angry. • Fifi ____ . • Fifi ____ the book.

  21. POS Tags • There is no standard set of POS tags • Some use coarse classes: e.g., N • Others prefer finer distinctions (e.g., Penn Treebank): • PRP: personal pronouns (you, me, she, he, them, him, her, …) • PRP$: possessive pronouns (my, our, her, his, …) • NN: singular common nouns (sky, door, theorem, …) • NNS: plural common nouns (doors, theorems, women, …) • NNP: singular proper names (Fifi, IBM, Canada, …) • NNPS: plural proper names (Americas, Carolinas, …)

  22. (Figure: PRP and PRP$ tags)

  23. Part of Speech Tagging • Words often have more than one POS: back • The back door = JJ • On my back = NN • Win the voters back = RB • Promised to back the bill = VB • The POS tagging problem is to determine the POS tag for a particular instance of a word.

  24. POS Ambiguity (in the Brown Corpus) • Unambiguous (1 tag): 35,340 • Ambiguous (2-7 tags): 4,100 • (DeRose, 1988)
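Counts like these come from grouping a tagged corpus by word type and checking how many distinct tags each type receives. The sketch below runs on a toy corpus built from the "back" phrases on slide 23; the tags for the other words are Penn-Treebank-style guesses added for illustration, and `ambiguity_counts` is a hypothetical helper.

```python
from collections import defaultdict


def ambiguity_counts(tagged_words):
    """Return (unambiguous types, ambiguous types, tags seen per word type)."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        tags_by_word[word.lower()].add(tag)
    unambiguous = sum(1 for tags in tags_by_word.values() if len(tags) == 1)
    ambiguous = len(tags_by_word) - unambiguous
    return unambiguous, ambiguous, dict(tags_by_word)


# Toy corpus: tags for "back" follow slide 23; all other tags are assumptions.
CORPUS = [
    ("The", "DT"), ("back", "JJ"), ("door", "NN"),
    ("On", "IN"), ("my", "PRP$"), ("back", "NN"),
    ("Win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB"),
    ("Promised", "VBD"), ("to", "TO"), ("back", "VB"),
    ("the", "DT"), ("bill", "NN"),
]

unambiguous, ambiguous, tags_by_word = ambiguity_counts(CORPUS)
print(unambiguous, ambiguous, sorted(tags_by_word["back"]))
```

Note that this counts word types, not tokens: a small number of ambiguous types can still account for a large share of running text.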

  25. POS Tagging with HMM
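The slide gives only the title, so the following is a generic sketch of Viterbi decoding for a first-order hidden Markov model, not a reconstruction of the lecture's method. Every probability below is invented so the example can be checked by hand, and the function and variable names are assumptions.

```python
def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    if not words:
        return []
    # Each column maps tag -> (best path probability ending here, previous tag).
    columns = [{t: (start.get(t, 0.0) * emit[t].get(words[0], 0.0), None)
                for t in tags}]
    for word in words[1:]:
        prev_col = columns[-1]
        column = {}
        for t in tags:
            column[t] = max(
                (prev_col[p][0] * trans[p].get(t, 0.0) * emit[t].get(word, 0.0), p)
                for p in tags)
        columns.append(column)
    # Backtrace from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Made-up model parameters, for illustration only.
TAGS = ["DT", "JJ", "NN"]
START = {"DT": 0.8, "JJ": 0.1, "NN": 0.1}
TRANS = {
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"DT": 0.6, "JJ": 0.1, "NN": 0.3},
}
EMIT = {
    "DT": {"the": 1.0},
    "JJ": {"back": 0.3},
    "NN": {"back": 0.2, "door": 0.4},
}

print(viterbi(["the", "back", "door"], TAGS, START, TRANS, EMIT))
```

With these numbers the path DT JJ NN scores 0.8 × (0.4 × 0.3) × (0.9 × 0.4) = 0.03456, beating DT NN NN at 0.01152. A practical tagger would work in log space to avoid underflow on long sentences.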
