Overview of Statistical NLP


Presentation Transcript


  1. Overview of Statistical NLP • IR Group Meeting, March 7, 2006

  2. Outline • Some basic/important NLP problems • Topics that have recently attracted much interest • NLP research groups • Discussion on the relation between NLP and IR IR Group Meeting -- NLP

  3. Levels of Analysis in NLP (from Dan Roth’s CS598) • Morphology: how words are constructed • Syntax: structural relations between words • Semantics: the meaning of words and of combinations of words • Pragmatics: how is a sentence used? What’s its purpose? • Discourse (sometimes distinguished as a subfield of Pragmatics): relationships between sentences; global context

  4. Some NLP Problems • N-gram Models • Word Sense Disambiguation • Lexical Acquisition • (POS) Tagging • (Syntactic) Parsing • Semantic Role Labeling (Semantic Parsing) • Named Entity Recognition • Textual Entailment • …

  5. N-gram Models • The task: to estimate P(w_n | w_1, …, w_{n-1}) • Approaches: maximum likelihood estimation; various smoothing methods • Applications: automatic speech recognition; spelling correction; handwriting recognition; statistical machine translation
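
The estimation task on this slide can be sketched for the bigram case (n = 2). This is a minimal illustration, not code from the presentation: it assumes pre-tokenized sentences, uses add-one (Laplace) smoothing as just one example of the smoothing methods the slide alludes to, and the function names and toy corpus are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def p_mle(word, prev, unigrams, bigrams):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def p_add_one(word, prev, unigrams, bigrams):
    """Add-one smoothing; the vocabulary is assumed to be the observed token types."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus (hypothetical data, for illustration only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)

print(p_mle("cat", "the", unigrams, bigrams))       # seen bigram
print(p_mle("bird", "the", unigrams, bigrams))      # unseen bigram gets zero mass
print(p_add_one("bird", "the", unigrams, bigrams))  # smoothing gives it some mass
```

The unsmoothed estimate assigns zero probability to any bigram absent from the training data, which is the problem smoothing is meant to address.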

  6. Word Sense Disambiguation (WSD) • The task: to determine which of the senses of an ambiguous word is involved in a particular use of the word • Approaches: • Supervised: log-linear models; information-theoretic; memory-based learning (kNN) • Dictionary-based: sense definitions; thesauri; translations in a second language • Unsupervised: clustering using the EM algorithm
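
As an illustration of the dictionary-based approach using sense definitions, the sketch below picks the sense whose definition shares the most words with the context (a simplified Lesk-style heuristic). It is not taken from the presentation; the glosses, sense labels, and stopword list are invented for the example, and tokenization is plain whitespace splitting.

```python
def simplified_lesk(context_tokens, sense_glosses, stopwords=frozenset()):
    """Return the sense whose gloss has the largest word overlap with the context."""
    context = {t.lower() for t in context_tokens} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_tokens = set(gloss.lower().split()) - stopwords
        overlap = len(context & gloss_tokens)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical glosses for the ambiguous word "bank".
glosses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}
stop = frozenset({"a", "an", "the", "of", "and", "on", "that", "as", "such", "to"})

river_context = "she sat on the bank of the river and watched the water".split()
money_context = "he went to the bank to borrow money".split()

print(simplified_lesk(river_context, glosses, stop))
print(simplified_lesk(money_context, glosses, stop))
```

Ties are resolved in favor of the first sense listed, which is an arbitrary choice made for this sketch.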

  7. Word Sense Disambiguation (WSD) • Accuracy: word-specific • Easy words: > 90% • Hard words: 50~70% • Applications: statistical machine translation; information retrieval

  8. Lexical Acquisition • The task: to develop algorithms and statistical techniques for filling the holes in existing machine-readable dictionaries by looking at the occurrence patterns of words in large text corpora • Examples: verb subcategorization; prepositional phrase attachment disambiguation; selectional preferences; semantic similarity

  9. Semantic Similarity • The task: to acquire a relative measure of similarity between two words • Approaches: vector space measures (document space, word space, modifier space, etc.); probabilistic measures (KL-divergence, etc.) • Applications: information retrieval (query expansion)
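
The two families of measures named on this slide can be sketched as follows. This is an illustrative example, not the presenter's code: cosine stands in for the vector space measures, the vectors are assumed to be co-occurrence counts over a shared set of dimensions, and the KL-divergence function assumes both inputs are probability distributions with q nonzero wherever p is nonzero.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical co-occurrence counts over three context words.
car = [4, 1, 0]
truck = [8, 2, 0]
banana = [0, 1, 5]

print(cosine(car, truck))   # same direction, so maximal similarity
print(cosine(car, banana))  # little shared context

p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q), kl_divergence(q, p))  # the measure is asymmetric
```

Note that KL-divergence is a dissimilarity (0 means identical distributions) and is not symmetric, so D(p || q) and D(q || p) generally differ.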

  10. POS Tagging • The task: labeling each word in a sentence with its appropriate part of speech • Major approaches: HMM; transformation-based • Advantages: speed and storage • Other approaches: neural networks, decision trees, memory-based learning, maximum entropy models
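
For the HMM approach, decoding is usually done with the Viterbi algorithm. The sketch below assumes a first-order HMM whose start, transition, and emission probabilities are already estimated; all probabilities, the three-tag tagset, and the small floor used for unseen events are hypothetical values chosen for illustration, not figures from the presentation.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for words under a first-order HMM."""
    if not words:
        return []

    def lp(x):
        # Log probability, with a floor so unseen events do not yield log(0).
        return math.log(max(x, floor))

    # best[i][t]: best log probability of any tag path ending in tag t at position i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((s, best[i - 1][s] + lp(trans_p[s].get(t, 0))) for s in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model (hypothetical parameters).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.4, "barks": 0.1},
    "VB": {"dog": 0.05, "barks": 0.5},
}

print(viterbi("the dog barks".split(), tags, start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow on long sentences; the floor is a crude stand-in for proper smoothing of unseen words.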

  11. POS Tagging • Accuracy: 95~97%, achieved only when the application text and the training text are from a similar source • Applications: higher-level NLP tasks such as partial parsing, parsing, NER, etc. • “…the best lexicalized probabilistic parsers are now good enough that they perform better starting with untagged text and doing the tagging themselves, rather than using a tagger as preprocessor.” (Charniak 1997)

  12. (Syntactic) Parsing • The task: to find the most likely syntactic parse tree of a sentence • Approaches: • Probabilistic context-free grammar (PCFG) • Supervised • Unsupervised • Lexicalized models • Dependency-based models
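
Finding the most likely parse under a PCFG can be sketched with a probabilistic CKY chart parser. The example below is an illustration under stated assumptions, not the method of any parser named in these slides: the grammar is assumed to be in Chomsky normal form, and the rules, probabilities, and function name are hypothetical.

```python
from collections import defaultdict

def cky_best_parse(words, lexical, binary, start="S"):
    """Return (tree, probability) of the best parse, or (None, 0.0) if none exists.

    lexical maps (A, word) to P(A -> word); binary maps (A, B, C) to P(A -> B C).
    """
    n = len(words)
    # chart[(i, j)][A] = (probability, backpointer) for the best A over words[i:j].
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] = (p, w)
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        prob = p * chart[(i, k)][b][0] * chart[(k, j)][c][0]
                        if prob > chart[(i, j)].get(a, (0.0, None))[0]:
                            chart[(i, j)][a] = (prob, (k, b, c))
    if start not in chart[(0, n)]:
        return None, 0.0

    def build(i, j, a):
        # Rebuild the tree as nested tuples by following backpointers.
        _, bp = chart[(i, j)][a]
        if isinstance(bp, str):
            return (a, bp)
        k, b, c = bp
        return (a, build(i, k, b), build(k, j, c))

    return build(0, n, start), chart[(0, n)][start][0]

# Toy grammar (hypothetical rules and probabilities).
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

tree, prob = cky_best_parse("she eats fish".split(), lexical, binary)
print(tree, prob)
```

The lexicalized and dependency-based models listed on the slide condition on head words rather than on bare categories, which this plain PCFG sketch does not attempt.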

  13. (Syntactic) Parsing • Accuracy: • Charniak 1997: recall 0.875, precision 0.874 • Collins 1997: recall 0.881, precision 0.886 • Applications: other NLP tasks such as semantic role labeling and relation extraction
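
Recall and precision figures for parsers are typically computed over labeled constituents. The sketch below assumes that convention (the slide does not say how its numbers were obtained) and represents each constituent as a hypothetical (label, start, end) triple. The F1 helper is an addition for convenience; the slide reports only recall and precision.

```python
def labeled_precision_recall(predicted, gold):
    """Precision and recall over sets of (label, start, end) constituents."""
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold and predicted brackets for a three-word sentence.
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
predicted = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3), ("NP", 1, 2)}

precision, recall = labeled_precision_recall(predicted, gold)
print(precision, recall, f1(precision, recall))
```

Here three of the five predicted constituents match the four gold constituents, so precision is 3/5 and recall is 3/4.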

  14. Semantic Role Labeling • The task: to identify the predicate-argument structures in sentences • Approaches: supervised learning • Accuracy: best ~70% (CoNLL 04 shared task) • Applications: information extraction; question answering

  15. Textual Entailment • The task: given two text fragments, to recognize whether the meaning of one text is entailed by (can be inferred from) the other text • Approaches: word overlap; statistical lexical relations; syntactic matching; logic inference • Accuracy: ~0.56, best ~0.60 (PASCAL Challenge 05) • Applications: question answering; multi-document summarization
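
The simplest approach listed, word overlap, can be sketched as the fraction of hypothesis words that also appear in the text. This is a hedged illustration rather than any system from the PASCAL challenge: the whitespace tokenization, the 0.75 threshold, and the function names are all assumptions made for the example.

```python
def overlap_score(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    text_tokens = set(text.lower().split())
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    return sum(t in text_tokens for t in hyp_tokens) / len(hyp_tokens)

def predicts_entailment(text, hypothesis, threshold=0.75):
    """Predict entailment when overlap reaches the (hypothetical) threshold."""
    return overlap_score(text, hypothesis) >= threshold

text = "a dog chased the cat across the yard"

print(predicts_entailment(text, "a dog chased the cat"))  # entailed, predicted True
print(predicts_entailment(text, "the cat chased a dog"))  # not entailed, still True
print(predicts_entailment(text, "a bird sang"))           # predicted False
```

The second case shows the weakness of pure overlap: it ignores word order and structure, which is one reason the slide also lists syntactic matching and logic inference.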

  16. Tools • Brill Tagger • Charniak Parser • Collins Parser • MiniPar • Semantic Parser • ASSERT Parser • CCG’s demo

  17. Corpora • WordNet • Penn Treebank (Sample) • PropBank • FrameNet

  18. Other Tasks • Automatic Speech Recognition • Natural Language Generation • Automatic Summarization • …

  19. Outline • Some basic/important NLP problems • Topics that have recently attracted much interest • NLP research groups • Discussion on the relation between NLP and IR

  20. Recent topics • Unsupervised and semi-supervised approaches • Knowledge acquisition bottleneck • Semantic role labeling • Improve the performance of SRL • Use the results for other tasks • Relation extraction • WSD • Parsing • Statistical machine translation • Word alignment

  21. Outline • Some basic/important NLP problems • Topics that have recently attracted much interest • NLP research groups • Discussion on the relation between NLP and IR

  22. NLP Research Groups • USC/ISI • Stanford • UPenn • Johns Hopkins • UIUC • …

  23. Outline • Some basic/important NLP problems • Topics that have recently attracted much interest • NLP research groups • Discussion on the relation between NLP and IR