Introduction to Natural Language Processing (NLP)

Introduction to Natural Language Processing (NLP). Dekang Lin Department of Computing Science University of Alberta lindek@cs.ualberta.ca. Outline. What is NLP? Applications Challenges Linguistics Issues Course Overview. Textbook.

Presentation Transcript


  1. Introduction to Natural Language Processing (NLP) Dekang Lin Department of Computing Science University of Alberta lindek@cs.ualberta.ca

  2. Outline • What is NLP? • Applications • Challenges • Linguistics Issues • Course Overview

  3. Textbook • Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2000. • Note: errata are available on the website; please check them before reading each chapter.

  4. What is Natural Language Processing? • Natural Language Processing • Process information contained in natural language text. • Also known as Computational Linguistics (CL), Human Language Technology (HLT), Natural Language Engineering (NLE) • Can machines understand human language? • Define ‘understand’ • Understanding is the ultimate goal. However, one doesn’t need to fully understand to be useful.

  5. Why Study NLP? • A hallmark of human intelligence. • Text is the largest repository of human knowledge and is growing quickly. • emails, news articles, web pages, IM, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, …… • Are we reading any faster than before?

  6. NLP Applications • Question answering • Who is the first Taiwanese president? • Text Categorization/Routing • e.g., customer e-mails. • Text Mining • Find everything that interacts with BRCA1. • Machine (Assisted) Translation • Language Teaching/Learning • Usage checking • Spelling correction • Is that just dictionary lookup?

  7. Challenges in NLP: Ambiguity • Words or phrases can often be understood in multiple ways. • Teacher Strikes Idle Kids • Killer Sentenced to Die for Second Time in 10 Years • They denied the petition for his release that was signed by over 10,000 people. • child abuse expert/child computer expert • Who does Mary love? (three-way ambiguous)

  8. Probabilistic/Statistical Resolution of Ambiguities • When there are ambiguities, choose the interpretation with the highest probability. • Example: how many times do people say • “Mary loves …” • “the Mary love” • Which interpretation has the highest probability?
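
The idea on this slide can be sketched in a few lines of Python. This is only an illustration: the counts below are invented, not taken from any corpus, and the names `pattern_counts` and `most_probable` are hypothetical.

```python
from collections import Counter

# ASSUMPTION: made-up counts of how often each pattern is observed.
# A real system would collect these from a large text corpus.
pattern_counts = Counter({
    "Mary loves ...": 120,  # "love" read as a verb with Mary as subject
    "the Mary love": 1,     # "love" read as a noun
})

def most_probable(counts):
    """Return the reading with the highest relative frequency and that frequency."""
    total = sum(counts.values())
    probs = {reading: n / total for reading, n in counts.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]

best, p = most_probable(pattern_counts)
print(best, round(p, 3))
```

Relative frequency is the simplest possible probability estimate; it says nothing about readings that never occur in the counted data.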

  9. Challenges in NLP: Variations • Syntactic Variations • I was surprised that Kim lost • It surprised me that Kim lost • That Kim lost surprised me. • The same meaning can be expressed in different ways • Who wrote “The Language Instinct”? • Steven Pinker, an MIT professor and author of “The Language Instinct”, ……

  10. Subareas of Linguistics • Morphology: • structures and patterns in words • analyzes how words are formed from minimal units of meaning, or morphemes, e.g., dogs = dog + s. • Syntax: • structures and patterns in phrases • how phrases are formed by smaller phrases and words

  11. Subareas of Linguistics • Semantics: the meaning of a word or phrase within a sentence • How to represent meaning? • Semantic network? Logic? Policy? • How to construct meaning representation? • Is meaning compositional? • Pragmatics: structures and patterns in discourses • Co-reference resolution • Jane races Mary on weekends. She often beats her. • Implicatures: • How many times do you go skating each week? • Speech acts: • Do you know the time?

  12. Morphology • Morphology is concerned with the internal make-up of words • Input: The fearsome cats attacked the foolish dog • Output: The fear-some cat-s attack-ed the fool-ish dog • Inflectional morphology • Does not change the grammatical category of words: cats/cat-s, attacked/attack-ed • Derivational morphology • May involve changes to grammatical categories: fearsome/fear-some, foolish/fool-ish

  13. Morphology Is Not as Easy as It May Seem • Examples from Woods et al. 2000 • delegate (de + leg + ate) take the legs from • caress (car + ess) female car • cashier (cashy + er) more wealthy • lacerate (lace + rate) speed of tatting • ratify (rat + ify) infest with rodents • infantry (infant + ry) childish behavior
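
A naive suffix-stripping segmenter makes both points concrete: it reproduces the segmentations of slide 12 and also the spurious analysis of "caress". This is a toy sketch, not a real morphological analyzer; the suffix list and the function name `naive_segment` are assumptions made for illustration.

```python
# ASSUMPTION: a tiny hand-picked suffix list, for illustration only.
SUFFIXES = ["some", "ish", "ess", "ed", "s"]

def naive_segment(word):
    """Split off the longest matching suffix, leaving a stem of at least 3 letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[:-len(suffix)], suffix]
    return [word]

sentence = "The fearsome cats attacked the foolish dog"
print(" ".join("-".join(naive_segment(w)) for w in sentence.split()))
# → The fear-some cat-s attack-ed the fool-ish dog

print(naive_segment("caress"))
# → ['car', 'ess']  (the wrong "female car" analysis)
```

Pattern matching on letters alone cannot tell a real suffix from an accidental one, which is why practical analyzers also need a lexicon of stems.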

  14. A Turkish Example [Oflazer & Guzey 1994] • uygarlastiramayabileceklerimizdenmissinizcesine • uygar/civilized las/BECOME tir/CAUS ama/NEG yabil/POT ecek/FUT ler/3PL imiz/POSS-1PL den/ABL mis/NARR siniz/2PL cesine/AS-IF • an adverb meaning roughly “(behaving) as if you were one of those whom we might not be able to civilize.”

  15. Why not just Use a Dictionary? • How many words are there in a language? • English: OED 400K entries • Turkish: 600 × 10^6 forms • Finnish: 10^7 forms • New words are being invented all the time • e-mail • IM

  16. Syntax is about Sentence Structures • Sentences have structures and are made up of constituents. • The constituents are phrases. • A phrase consists of a head and modifiers. • The category of the head determines the category of the phrase • e.g., a phrase headed by a noun is a noun phrase

  17. Parsing • Analyze the structure of a sentence • [Figure: parse tree for “The student put the book on the table”; word tags D N V D N P D N; constituents NP, PP, VP, S]

  18. [Figure: two parse trees for “Teacher strikes idle kids”, one with word tags N N V N and one with word tags N V A N; each tree has constituents NP, VP, S]
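
The two structures for the headline "Teacher strikes idle kids" can be reproduced with a small chart parser that counts parses. This is a minimal CKY-style sketch under a toy grammar: the lexicon, the rules, and the name `count_parses` are assumptions chosen to cover this one sentence, not a grammar from the course.

```python
from collections import defaultdict

# ASSUMPTION: a toy lexicon and grammar written only for this headline.
LEXICON = {
    "teacher": {"N"},
    "strikes": {"N", "V"},   # plural noun or verb
    "idle": {"V", "A"},      # verb or adjective
    "kids": {"N"},
}
BINARY_RULES = [             # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "N", "N"),
    ("NP", "A", "N"),
    ("VP", "V", "NP"),
]
UNARY_RULES = [("NP", "N")]  # a bare noun can be a noun phrase

def count_parses(words, goal="S"):
    """Count the distinct parse trees of `words` rooted in `goal`."""
    n = len(words)
    # chart[i, j][category] = number of ways to derive words[i:j] as category
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        cell = chart[i, i + 1]
        for tag in LEXICON[word.lower()]:
            cell[tag] += 1
        for parent, child in UNARY_RULES:
            if child in cell:
                cell[parent] += cell[child]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                left, right = chart[i, k], chart[k, j]
                for parent, b, c in BINARY_RULES:
                    if b in left and c in right:
                        chart[i, j][parent] += left[b] * right[c]
    return chart[0, n].get(goal, 0)

print(count_parses("Teacher strikes idle kids".split()))  # → 2
```

One parse treats "teacher strikes" as a noun phrase and "idle" as the verb; the other treats "strikes" as the verb and "idle kids" as its object.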

  19. Syntax • Syntax is the study of the regularities and constraints of word order and phrase structure • How words are organized into phrases • How phrases are combined into larger phrases (including sentences).

  20. Course Overview: Background Theories • Linguistics • Syntax • Binding theory • Probability and Information Theory • Markov model • Bayesian network • EM (expectation-maximization)

  21. Course Overview: Enabling Technologies • Stemming • Reduce detects, detected, detecting, and detect to the same form. • POS Tagging • Determine for each word whether it is a noun, adjective, verb, … • Parsing • sentence → parse tree • Word Sense Disambiguation • orange juice vs. orange coat • Learning from text
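
As a rough illustration of the stemming bullet, the sketch below strips a few common inflectional suffixes. It is not a published stemming algorithm; the suffix list, the minimum stem length, and the name `toy_stem` are assumptions.

```python
def toy_stem(word):
    """Strip one inflectional suffix if at least 3 letters of stem remain."""
    word = word.lower()
    # ASSUMPTION: only three suffixes are handled in this toy version.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

forms = ["detects", "detected", "detecting", "detect"]
print({toy_stem(w) for w in forms})  # → {'detect'}
```

A rule this crude also produces errors on words such as "caress", so real stemmers add many more conditions and exceptions.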

  22. Course Overview: Applications • Question Answering • Machine Translation • Text Mining/Information Extraction
