Shallow Processing: Summary

Shallow Processing Techniques for NLP (Ling570), December 7, 2011. Roadmap: looking back (course units, tools and techniques) and looking forward (upcoming courses). Unit #0 (0.5 weeks, HW #1): introduction to NLP and shallow processing; tokenization.

krysta

Presentation Transcript


  1. Shallow Processing: Summary • Shallow Processing Techniques for NLP • Ling570 • December 7, 2011

  2. Roadmap • Looking back: • Course units • Tools and Techniques • Looking forward • Upcoming courses

  4. Units #0, #1 • Unit #0 (0.5 weeks): • HW #1 • Introduction to NLP & shallow processing • Tokenization • Unit #1: Formal Languages and Automata (2.25 weeks) • HW #2, #3 • Formal languages • Finite-state Automata • Finite-state Transducers • Morphological analysis
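As a reminder of what the Unit #0 tokenization task involves, here is a minimal regular-expression tokenizer sketch. It is not the HW #1 solution; the pattern `TOKEN_RE` and the function `tokenize` are hypothetical names, and the rule (keep internal apostrophes and hyphens, split off other punctuation) is just one plausible convention.

```python
import re

# Assumption: a token is either a word (optionally with internal
# apostrophes or hyphens) or a single non-space punctuation character.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Split raw text into a list of word and punctuation tokens."""
    return TOKEN_RE.findall(text)


tokens = tokenize("Don't panic, it's state-of-the-art!")
```

A real English tokenizer would also need rules for abbreviations, numbers, and clitics, which this sketch ignores.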

  5. Unit #2 • Unit #2 (3.25 weeks): • HW #4, #5, #6, #7 • N-gram Language Models and HMMs • N-gram Language Models and Smoothing • Part-of-speech (POS) tagging: • N-gram • Hidden Markov Models
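The N-gram language modeling and smoothing material in Unit #2 can be illustrated with a small bigram model. The sketch below compares the maximum likelihood estimate with add-one (Laplace) smoothing on a toy corpus; the function names and the corpus are assumptions made for illustration, not course code.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count bigrams and their histories, padding sentences with <s> and </s>."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded[1:])  # <s> is never predicted, so it is not in the vocabulary
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def mle_prob(prev, cur, history_counts, bigram_counts):
    """Maximum likelihood estimate: count(prev, cur) / count(prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / history_counts[prev]


def laplace_prob(prev, cur, history_counts, bigram_counts, vocab):
    """Add-one smoothing: (count(prev, cur) + 1) / (count(prev) + |V|)."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))


# Toy corpus, invented for this example.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
hist, bigrams, vocab = train_bigram_counts(corpus)
```

With this corpus the unseen bigram ("cat", "dog") gets probability 0 under MLE but a small nonzero probability under Laplace smoothing, which is the motivation for smoothing in the first place.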

  7. Units #3, #4 • Unit #3: Classification (3 weeks) • HW #8, #9 • Intro to classification & Mallet • POS tagging with classifiers • Chunking • Named Entity (NE) recognition • Unit #4: Selected Topics (1.5 weeks) • HW #10 • Clustering • Information Extraction • Summary

  8. Techniques & Tools

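  10. Main Techniques • Probability: • Chain rule: $P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$ • Bayes’ Rule: $P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$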
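Bayes' rule, P(h | e) = P(e | h) P(h) / P(e), can be checked with a small worked example. The sketch below computes the posterior over two POS tags for one word; the function name `bayes_posterior` and all probabilities are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood):
    """Return P(h | e) for each hypothesis h.

    prior maps h -> P(h); likelihood maps h -> P(e | h).
    """
    # P(e) by the law of total probability.
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Hypothetical numbers: tag priors and P("book" | tag).
prior = {"NOUN": Fraction(3, 5), "VERB": Fraction(2, 5)}
likelihood = {"NOUN": Fraction(1, 100), "VERB": Fraction(1, 20)}
posterior = bayes_posterior(prior, likelihood)
```

Exact fractions are used so that the posterior can be verified by hand: the evidence is 13/500, so the posteriors are 3/13 and 10/13.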

  15. Main Techniques • Formal languages: • Chomsky hierarchy • Regular languages, regular expressions, regular grammars, finite state automata • Regular relations, finite state transducers • Finite state morphology: two-level morphological analysis • Cascades of finite state transducers
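A deterministic finite-state automaton from the formal-languages unit can be simulated in a few lines. This is a minimal sketch, not the HW #2 acceptor or the CARMEL format: the transition table `SHEEP`, its state names, and the function `accepts` are hypothetical, and the machine recognizes the regular language `baa+!`.

```python
def accepts(transitions, start, finals, symbols):
    """Run a deterministic FSA over symbols and report whether it accepts."""
    state = start
    for sym in symbols:
        key = (state, sym)
        if key not in transitions:
            return False  # no transition defined: reject
        state = transitions[key]
    return state in finals


# Hypothetical FSA for the language b a a+ ! ("sheep talk").
SHEEP = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",  # loop: any number of further a's
    ("q3", "!"): "q4",
}
SHEEP_FINALS = {"q4"}

ok = accepts(SHEEP, "q0", SHEEP_FINALS, "baaa!")
```

A transducer extends the same idea by attaching an output symbol to each transition.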

  19. Main techniques • Language Modeling and Hidden Markov Models • N-gram language models • Maximum likelihood estimation • Smoothing • Laplace, Good-Turing, Backoff, Interpolation • Hidden Markov Models • Markov assumptions • Forward algorithm & Viterbi decoding
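Viterbi decoding for a first-order HMM can be sketched as follows. This is an illustrative implementation under the usual Markov assumptions, using plain probabilities rather than log space; the function signature and the two-state toy model are assumptions, not the HW #6/#7 code.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best state sequence, its joint probability) for an HMM."""
    if not observations:
        return [], 1.0
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of s given the previous column.
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s].get(obs, 0.0)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model with made-up parameters.
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.9, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.9}}
path, prob = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
```

The forward algorithm has the same structure with `max` replaced by a sum over predecessors. For long sequences, log probabilities would be needed to avoid underflow.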

  24. Main Techniques • Classification for NLP: • Modeling NLP tasks as classification problems • Developing feature representations for instances • Sequence labeling tasks and algorithms • Beam search
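Beam search for sequence labeling keeps only the top-scoring partial label sequences at each position instead of exploring all of them. The sketch below assumes a local scoring function over (previous label, label, token); `beam_search`, `toy_score`, and the score table are hypothetical and stand in for a trained classifier's log probabilities.

```python
def beam_search(tokens, labels, score, beam_size=2):
    """Return the best (label sequence, total score) found with a fixed-width beam."""
    beam = [([], 0.0)]
    for token in tokens:
        candidates = []
        for seq, total in beam:
            prev = seq[-1] if seq else "<s>"
            for label in labels:
                candidates.append((seq + [label], total + score(prev, label, token)))
        # Keep only the beam_size highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


# Hypothetical local scores (log-probability-like); unseen events get a low default.
TOY_TABLE = {
    ("<s>", "DT", "the"): -0.1,
    ("DT", "NN", "dog"): -0.2,
    ("NN", "VB", "barks"): -0.3,
}


def toy_score(prev, label, token):
    return TOY_TABLE.get((prev, label, token), -5.0)


best_seq, best_score = beam_search(["the", "dog", "barks"], ["DT", "NN", "VB"], toy_score)
```

Unlike Viterbi, beam search is not guaranteed to find the optimal sequence, but it works with classifiers whose features depend on more than the immediately preceding label.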

  25. Tools Developed • English tokenizer: HW #1 • FSA & FST acceptors: HW #2, #3 • FST morphological analyzer: HW #3 • N-gram language models with smoothing: HW #4, #5 • Authorship identification system: HW #5 • Hidden Markov Model training & decoding: HW #6, #7 • HMM POS tagger • Classification-based text categorization, POS tagging: HW #8, #9 • Unsupervised POS tagger: HW #10

  29. Corpora & Systems • Data: • Penn Treebank • Wall Street Journal • Air Travel Information System (ATIS) • Project Gutenberg • Federalist Papers, Jane Austen novels • Systems: • CARMEL Finite State Toolkit • Mallet Machine Learning Toolkit

  30. Looking Forward

  34. Winter Courses • Ling571: Deep Processing Techniques for NLP • Parsing, Semantics (Lambda Calculus), Generation • Ling572: Advanced Statistical Methods in NLP • Roughly, machine learning for CompLing • Decision Trees, Naïve Bayes, MaxEnt, SVM, CRF, … • Ling567: Knowledge Engineering for Deep NLP • HPSG and MRS for novel languages • Ling575: Spoken Dialog Systems • Design, analysis, and implementation of SDS

  36. Tentative Outline for Ling572 • Unit #0 (0.5 weeks): Basics • Introduction • Feature representations • Classification review • Unit #1 (3 weeks): Classic Machine Learning • K Nearest Neighbors • Decision Trees • Naïve Bayes

  39. Tentative Outline for Ling572 • Unit #3 (4 weeks): Discriminative Classifiers • Feature Selection • Maximum Entropy Models • Support Vector Machines • Unit #4 (1.5 weeks): Sequence Learning • Conditional Random Fields • Transformation Based Learning • Unit #5 (1 week): Other Topics • Semi-supervised learning, …

  42. Ling572 Information • No required textbook: • Online readings and articles • More math/stat content than 570 • Probability, Information Theory, Optimization • Please try to register at least 2 weeks in advance

  44. Beyond Ling572 • Machine learning: • Graphical models • Bayesian approaches • Online learning • Reinforcement learning • … • Applications: • Information Retrieval • Question Answering • Generation • Machine translation • …

  45. Ling 575: Spoken Dialog Systems • Design, analysis, and implementation of SDS • Will be offered online • Please make sure you have a microphone • Arrive early to test • Loew 202: T: 3:30-5:50

  47. Notes • Grades should be submitted by 12/16 • For any issues (errors) with grades in Gradebook, please email by 12/14 • Graduation Planning: • Students in Group 1 (planning to graduate ~8/12) • Due dates in early to mid January • 1st thesis proposal draft • 1st pre-internship report
