Natural Language Processing Meeting 2 — 9/4/2012 CSCE 5290 Rodney Nielsen
Natural Language Processing We’re going to study what goes into getting computers to perform useful and interesting tasks involving human language.
Natural Language Processing More specifically, it’s about the algorithms that we use to process human language, the formal basis for those algorithms, and the facts about human language that allow those algorithms to work.
Major Topics • Morphology / Words • Syntax / Structure • Semantics / Meaning • Pragmatics & Dialog / Texts, Context & Implicatures • Applications
How? • Exploiting regularities, in both complex and trivial ways • Language structure • Formal models • Practical applications
Topics: Techniques • Finite-state methods • Context-free methods • Probabilistic models • Supervised machine learning methods
Categories of Knowledge • Phonology • Morphology • Syntax • Semantics • Pragmatics • Discourse • Typically mapped to separate processes, connected by interfaces • Leads to: Morphological Processing → Syntactic Analysis → Semantic Interpretation → Context
Ambiguity • Ambiguity is a fundamental problem in computational linguistics • Hence, resolving, or managing, ambiguity is a recurrent theme
Ambiguity • How many meanings can you find for this sentence: • I made her duck
Ambiguity • Find at least 5 meanings of this sentence: • I made her duck • I cooked waterfowl for her benefit (to eat) • I cooked waterfowl belonging to her • I created the (ceramic?) duck she owns • I caused her to quickly lower her upper body • I waved my magic wand and turned her into undifferentiated waterfowl
Ambiguity is Pervasive • I caused her to quickly lower her head or body • Lexical category: “duck” can be a noun or verb • I cooked waterfowl belonging to her. • Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun • I made the (ceramic) duck statue she owns • Lexical Semantics: “make” can mean “create” or “cook”, and about 100 other things as well
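To see how quickly these choices multiply, here is a minimal Python sketch that gives each word of "I made her duck" only the readings named on these slides (the senses "create", "cook", and "cause" for "make", two roles for "her", two lexical categories for "duck") and enumerates every combination. The mini-lexicon and its labels are hypothetical; a real system would also use grammar and meaning to rule out combinations that cannot co-occur.

```python
from itertools import product

# Hypothetical mini-lexicon: each word lists only the readings mentioned on the slides.
LEXICON = {
    "I": ["I:pronoun"],
    "made": ["make:create", "make:cook", "make:cause"],
    "her": ["her:possessive", "her:dative"],
    "duck": ["duck:noun", "duck:verb"],
}

def lexical_readings(sentence):
    """Return every combination of per-word readings (a Cartesian product)."""
    choices = [LEXICON[word] for word in sentence.split()]
    return list(product(*choices))

readings = lexical_readings("I made her duck")
print(len(readings))  # 1 * 3 * 2 * 2 = 12 combinations before any filtering
for reading in readings[:3]:
    print(reading)
```

Several of these combinations (for instance "make:cook" together with "duck:verb") match none of the readings listed earlier, which is exactly the pruning that syntactic and semantic analysis has to do.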
Ambiguity is Pervasive • Grammar: “make” can be: • Transitive (verb has a noun direct object): • I cooked [waterfowl belonging to her] • Ditransitive (verb has two noun objects): • I made [her] (into) [undifferentiated waterfowl] • Action-transitive (verb has a direct object and another verb): • I caused [her] [to move her body]
Ambiguity is Pervasive • Phonetics! • I mate or duck • I’m eight or duck • Eye maid; her duck • Aye mate, her duck • I maid her duck • I’m aid her duck • I mate her duck • I’m ate her duck • I’m ate or duck
Problem • Remember our pipeline... Morphological Processing → Syntactic Analysis → Semantic Interpretation → Context
Really it’s this [Figure: the pipeline redrawn as a branching tree, in which one Morphological Processing output fans out into many alternative Syntactic Analyses, each of which fans out into many alternative Semantic Interpretations]
Or is it? [Figure: a branching tree of alternative Morphological Processing, Syntactic Analysis, and Semantic Interpretation results]
Dealing with Ambiguity • Four possible approaches: • Tightly coupled • Pipeline • Probabilistic (or n-best) • Don’t do anything; maybe it won’t matter: • We’ll leave when the duck is ready to eat. • The duck is ready to eat now.
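The probabilistic (n-best) option can be sketched in a few lines of Python: score every candidate analysis, then carry forward only the n highest-scoring ones instead of committing to a single choice. The candidate readings come from the "I made her duck" slide, but the scores are invented purely for illustration; in practice they would come from a trained model.

```python
import heapq

# Readings from the "I made her duck" slide; the scores are made up for illustration.
CANDIDATES = {
    "I cooked waterfowl for her benefit": 0.40,
    "I cooked waterfowl belonging to her": 0.25,
    "I caused her to quickly lower her upper body": 0.20,
    "I created the duck she owns": 0.10,
    "I turned her into undifferentiated waterfowl": 0.05,
}

def n_best(scored, n):
    """Return the n (reading, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, scored.items(), key=lambda item: item[1])

# A 1-best system commits to one reading; a 2-best system keeps an alternative
# alive so that a later stage (e.g., context) can still overturn the first choice.
for reading, score in n_best(CANDIDATES, 2):
    print(f"{score:.2f}  {reading}")
```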
Models and Algorithms • Models • linguistic knowledge • Algorithms
Models • State machines • Rule-based approaches • Logical formalisms • Probabilistic models
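As a concrete instance of the state-machine model, the sketch below is a table-driven deterministic FSA for the "sheep language" baa+! (a "b", two or more "a"s, then "!"), the running example in Jurafsky and Martin's chapter on finite-state automata. The state numbering and the dictionary encoding of the transition table are our own choices; a missing table entry plays the role of the implicit fail state.

```python
# Transition table for /baa+!/: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop accepts any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def d_recognize(tape):
    """Return True if the deterministic FSA accepts the whole input string."""
    state = 0
    for symbol in tape:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING

for s in ["baa!", "baaaaa!", "ba!", "baa", "baa!!"]:
    print(s, d_recognize(s))
```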
Algorithms • Transducers • But ambiguity…
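A finite-state transducer pairs input symbols with outputs, and the "But ambiguity…" caveat appears as soon as the machine is nondeterministic. The minimal Python sketch below assumes a hypothetical one-word lexicon and made-up tags (+N, +V, +SG, +PL, +3SG); it maps the surface form "ducks" to both a noun analysis and a verb analysis by exploring every path instead of committing to one.

```python
# Hypothetical toy FST: state -> list of (input symbol, output string, next state).
# An empty input symbol "" is an epsilon transition that consumes no input.
FINAL = 7
FST = {
    0: [("d", "d", 1)],
    1: [("u", "u", 2)],
    2: [("c", "c", 3)],
    3: [("k", "k", 4)],
    4: [("", "+N", 5), ("", "+V", 6)],  # the stem is ambiguous: noun or verb
    5: [("s", "+PL", FINAL), ("", "+SG", FINAL)],
    6: [("s", "+3SG", FINAL), ("", "", FINAL)],
}

def transduce(word, state=0, pos=0, output=""):
    """Return every output produced by a path that consumes all of word and ends
    in the final state (depth-first search over the nondeterministic choices)."""
    results = []
    if pos == len(word) and state == FINAL:
        results.append(output)
    for symbol, emitted, nxt in FST.get(state, []):
        if symbol == "":
            results.extend(transduce(word, nxt, pos, output + emitted))
        elif pos < len(word) and word[pos] == symbol:
            results.extend(transduce(word, nxt, pos + 1, output + emitted))
    return results

print(transduce("ducks"))  # ['duck+N+PL', 'duck+V+3SG']
print(transduce("duck"))   # ['duck+N+SG', 'duck+V']
```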
Paradigms • In particular: • State-space search • To manage the problem of making choices during processing when we lack the information needed to make the right choice • Dynamic programming • To avoid having to redo work during the course of a state-space search • CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch • Classifiers • Machine learning based classifiers that are trained to make decisions based on features extracted from the local context
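Minimum edit distance, one of the dynamic-programming algorithms listed above, shows the idea compactly: each table cell stores the cheapest way to turn one prefix into another, so no subproblem is solved twice. This is a sketch assuming unit costs for insertion and deletion, with the substitution cost left as a parameter because some presentations, including Jurafsky and Martin's, charge 2 for a substitution.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance with insertion/deletion cost 1 and a configurable
    substitution cost, computed bottom-up over an (n+1) x (m+1) table."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or match
            )
    return dist[n][m]

print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

Filling the table costs O(nm) time; a naive recursion without the table would recompute the same prefix pairs exponentially many times.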
Administrivia • Course web page: • http://www.cse.unt.edu/~nielsen/csce5290/ • Syllabus, readings, slides, assignments, announcements, etc. • E-mail • Office hours – open door • TR 12:20-12:50 • W 2:00-…
Readings • Readings: • Speech and Language Processing by Jurafsky and Martin, 2nd ed., Prentice-Hall, 2009 • A few conference or journal papers
Grading • 5% Reading responses / questions • 30% Quiz / class participation • Question responses (20%): bring laptops Thursday the 13th • Discussion (10%) • 45% Semester project • Project proposal (5%) • Project literature review (5%) • Intermediate progress (18%) • Final paper (10%) • Final presentation (7%): Tuesday Dec 11, 10:30-12:30 • 20% Significant constructive peer feedback
Projects • Thesis related • Question Answering • Robotic CSE guide • Other
Introductions • Area of specialization / primary interests
Your Questions • Uncanny valley? • How do we detect sentence boundaries? • Questions about “grep”? • “grep -i”: case insensitive • “grep -v”: inverted search • Lazy regex
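The grep options and lazy matching mentioned above can be mimicked with Python's re module, which uses Perl-style syntax. This sketch filters a few made-up lines the way `grep -i duck` and `grep -v duck` would, then contrasts a greedy `.*` with its lazy counterpart `.*?`.

```python
import re

lines = ["The Duck is ready", "the duck is ready", "I made her a hat"]

# Like `grep -i duck`: keep matching lines, ignoring case.
case_insensitive = [line for line in lines if re.search(r"duck", line, re.IGNORECASE)]

# Like `grep -v duck`: keep lines that do NOT match (case-sensitive, so "Duck" survives).
inverted = [line for line in lines if not re.search(r"duck", line)]

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", html)  # .* grabs as much as possible: one long match
lazy = re.findall(r"<.*?>", html)   # .*? stops at the first ">": one match per tag

print(case_insensitive)  # ['The Duck is ready', 'the duck is ready']
print(inverted)          # ['The Duck is ready', 'I made her a hat']
print(greedy)            # ['<b>bold</b> and <i>italic</i>']
print(lazy)              # ['<b>', '</b>', '<i>', '</i>']
```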
Your Questions • Why does the author stress that the results of a Turing machine will not determine whether or not a computer will ever be intelligent or understand language? (Is he implying that computer learning is impossible, or pointing to the limitations of Turing machines?) • Would there be any issues with regular expressions handling foreign characters (e.g., Mandarin Chinese characters)? • Can a DFSA be converted into an NFSA?
Data: She [the Borg Queen] brought me closer to humanity than I ever thought possible. And for a time, I was tempted by her offer. • Picard: How long a time? • Data: Zero point six-eight seconds, sir... For an android, that is nearly an eternity. • Star Trek: First Contact • http://www.youtube.com/watch?v=kSHytxvDDqU&feature=related 5:20
Your Questions • Regular expressions: all these symbols (., ^, $, etc.) seem to come from the Perl language. Will other languages' compilers recognize and process them? • Why is the memory operation (\1 together with ()) considered part of regular expressions if it cannot be realized as a finite automaton?
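The memory operation the last question asks about can be tried directly in Python's re module: parentheses capture a substring, and \1 requires that exact substring to appear again. The pattern below is the "the Xer they were, the Xer they will be" example from Jurafsky and Martin. Because \1 must repeat whatever was captured, patterns with backreferences can describe languages (such as a string followed by an exact copy of itself) that no finite automaton recognizes.

```python
import re

# Parentheses capture a substring; \1 must match that exact substring again.
PATTERN = re.compile(r"the (.*)er they were, the \1er they will be")

print(bool(PATTERN.search("the bigger they were, the bigger they will be")))  # True
print(bool(PATTERN.search("the bigger they were, the faster they will be")))  # False
```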
Your Questions • How do lexical disambiguation and syntactic disambiguation techniques work? • What are probabilistic parsing and speech act interpretation? • What do Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields do? In what ways do they differ from one another?