
Natural Language Toolkit (NLTK)


Presentation Transcript


  1. Natural Language Toolkit (NLTK) • April Corbet

  2. Overview • What is NLTK? • NLTK Basic Functionalities • Part of Speech Tagging • Chunking and Trees • Example: Calculating WordNet Synset Similarity • Other Functionalities

  3. What is NLTK? • A toolkit: a collection of Python libraries and programs for customizing and optimizing natural language processing (NLP) tasks • Downloading

  4. What is NLTK? • NLP tools typically build on other NLP tools • Other tools include • WordNet • Stanford Dependency Parser • ConceptNet • DBpedia • Google Mate-Tools

  5. Overview • What is NLTK? • NLTK Basic Functionalities • Part of Speech Tagging • Chunking and Trees • Other Functionalities • Works Cited

  6. NLTK Basic Functionalities • Sentence Tokenization • Word Tokenization • WordNet, Synsets, and Synonyms • Stemming Words and Lemmas
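Stemming is listed here but not expanded on a later slide, so a minimal sketch may help. It assumes NLTK 3.x is installed and uses the Porter stemmer, which needs no downloaded data; the word list is made up for illustration. Lemmatization (for example with `WordNetLemmatizer`) is left out because it depends on the WordNet corpus being downloaded.

```python
from nltk.stem import PorterStemmer

# The Porter stemmer strips common English suffixes with fixed rules.
stemmer = PorterStemmer()

# Illustrative word list (not from the slides).
words = ["cooking", "cooked", "cooks"]
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(word, "->", stem)
```

All three inflected forms reduce to the same stem, which is the property that makes stemming useful for indexing and search.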

  7. Sentence Tokenization • Basic Tokenization • Statistically Based Training Methodology • Tokenizing for Multiple Sentences • Pickle File • Tokenizing with Other Languages
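As a rough illustration of sentence tokenization, the sketch below uses an untrained `PunktSentenceTokenizer`, which works without any downloaded model. This is an assumption made to keep the example self-contained: the pickle file mentioned on the slide is a pretrained model (English and other languages) that would normally be downloaded and loaded instead, and it handles abbreviations far better. The sample text is invented.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "Hello World. It's good to see you. Thanks for buying this book."

# Untrained tokenizer: no abbreviation list has been learned, so every
# sentence-final period followed by whitespace is treated as a boundary.
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)

# Passing raw text to the constructor trains the statistical model on it:
#   trained = PunktSentenceTokenizer(training_text)
# where training_text is a large sample in the target language.
```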

  8. Word Tokenization • Basic Word Tokenizer • Penn Treebank Project • Other Types of Word Tokenizers: • PunktWordTokenizer: splits on punctuation but keeps it with the associated word token • WordPunctTokenizer: splits all punctuation onto separate tokens • Word Tokenizers and Regular Expressions • Match on tokens, or on the separators (gaps) between them • Stopwords and Filtering
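The tokenizers named above can be compared side by side. This is a sketch assuming NLTK 3.x; it uses only tokenizer classes that need no downloaded data. The sample sentence and the tiny stopword set are assumptions for illustration (NLTK ships a fuller stopword corpus that must be downloaded separately).

```python
from nltk.tokenize import (
    RegexpTokenizer,
    TreebankWordTokenizer,
    WordPunctTokenizer,
)

sentence = "Can't is a contraction."

# Penn Treebank conventions: contractions are split into separate tokens.
treebank_tokens = TreebankWordTokenizer().tokenize(sentence)

# Every run of punctuation becomes its own token.
punct_tokens = WordPunctTokenizer().tokenize(sentence)

# Regular expressions can describe the tokens themselves...
match_tokens = RegexpTokenizer(r"[\w']+").tokenize(sentence)

# ...or the gaps (separators) between tokens.
gap_tokens = RegexpTokenizer(r"\s+", gaps=True).tokenize(sentence)

# Stopword filtering with a hand-written set (hypothetical, for brevity).
stopwords = {"is", "a"}
filtered = [tok for tok in match_tokens if tok.lower() not in stopwords]

print(treebank_tokens)
print(punct_tokens)
print(match_tokens)
print(gap_tokens)
print(filtered)
```

Matching on gaps keeps trailing punctuation attached to the word, while matching on tokens drops it, so the choice depends on whether punctuation matters downstream.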

  9. WordNet, Synsets, and Synonyms • WordNet is a lexical database integrated into NLTK that contains listings of word relations • Groupings of synonymous words that express the same concept are synset instances • Expressed in a tree • Hypernyms and Hyponyms • Synonyms and Antonyms
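A minimal sketch of looking up synsets and their relations follows. It assumes NLTK 3.x method names (`name()`, `definition()`, `lemmas()`) and that the WordNet corpus has been downloaded; the helper `describe` is hypothetical and returns `None` when the corpus is missing, so the snippet still runs without it.

```python
from nltk.corpus import wordnet


def describe(word):
    """Summarize the first synset of `word`, or return None if unavailable."""
    try:
        synsets = wordnet.synsets(word)
    except LookupError:
        # The WordNet corpus has not been downloaded on this machine.
        return None
    if not synsets:
        return None

    first = synsets[0]
    lemmas = [lemma for synset in synsets for lemma in synset.lemmas()]
    return {
        "name": first.name(),
        "definition": first.definition(),
        # Hypernyms are the more general concepts one level up the tree.
        "hypernyms": [h.name() for h in first.hypernyms()],
        # Lemma names across all synsets act as synonyms.
        "synonyms": sorted({lemma.name() for lemma in lemmas}),
        # Antonyms are attached to lemmas, not to synsets.
        "antonyms": sorted({a.name() for lemma in lemmas for a in lemma.antonyms()}),
    }


info = describe("cookbook")
print(info)
```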

  10. Overview • What is NLTK? • NLTK Basic Functionalities • Part of Speech Tagging • Chunking and Trees • Other Functionalities • Works Cited

  11. POS Tagging • String Representation for Tagged Tokens (tuples) • Default Tagging • Tagging Based on a Training Corpus (Brown)
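The tuple representation and default tagging can be sketched as follows, assuming NLTK 3.x. Training on the Brown corpus is not shown because it requires downloading that corpus; the tokens here are invented.

```python
from nltk.tag import DefaultTagger
from nltk.tag.util import str2tuple, tuple2str, untag

# A tagged token is a (word, tag) tuple; "word/TAG" is its string form.
token = str2tuple("fly/NN")
token_string = tuple2str(token)

# A default tagger assigns the same tag to every token. It is mostly
# useful as the last fallback behind smarter taggers.
default_tagger = DefaultTagger("NN")
tagged = default_tagger.tag(["Hello", "World"])

# untag() strips the tags again.
words = untag(tagged)

print(token, token_string)
print(tagged)
print(words)
```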

  12. POS Tagging • Types of Tagging • Unigram/Bigram Tagger • Regexp Tagging • Brill: uses an initial tagger, then applies transformation rules learned from the training corpus using “rule templates”
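The unigram, bigram, and regexp taggers can be chained so that each falls back to a simpler one. The sketch below assumes NLTK 3.x and trains on a two-sentence toy corpus written for this example, not on Brown; the regexp patterns are likewise illustrative. The Brill tagger is omitted because its trainer API differs between NLTK versions.

```python
from nltk.tag import BigramTagger, DefaultTagger, RegexpTagger, UnigramTagger

# Toy training data (hypothetical); a real tagger would train on a
# corpus such as Brown.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Backoff chain: bigram -> unigram -> default.
default = DefaultTagger("NN")
unigram = UnigramTagger(train_sents, backoff=default)
bigram = BigramTagger(train_sents, backoff=unigram)

ngram_tags = bigram.tag(["the", "cat", "barks", "loudly"])

# Regexp tagging: the first pattern that matches a word decides its tag.
regexp_tagger = RegexpTagger([
    (r"^\d+$", "CD"),    # cardinal numbers
    (r".*ing$", "VBG"),  # gerunds
    (r".*ly$", "RB"),    # adverbs
    (r".*", "NN"),       # everything else: noun
])
regexp_tags = regexp_tagger.tag(["running", "42", "quickly", "dog"])

print(ngram_tags)
print(regexp_tags)
```

The unseen word "loudly" reaches the default tagger and is labeled NN, which shows why a regexp tagger is often placed in the backoff chain as well.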

  13. Overview • What is NLTK? • NLTK Basic Functionalities • Part of Speech Tagging • Chunking and Trees • Other Functionalities • Works Cited

  14. Chunking and Trees • Default Chunking • Trees and Parsing • Drawing Trees
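A small sketch of chunking and tree handling, assuming NLTK 3.x (where tree nodes expose `label()`). It uses a regular-expression chunker with a hand-written noun-phrase rule rather than a pretrained default chunker, since that avoids downloading a model; the tagged sentence and the grammar are illustrative.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD")]

# Chunk rule: optional determiner, any number of adjectives, then a noun.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
chunk_tree = chunker.parse(tagged)

# Collect the leaves of every NP subtree.
noun_phrases = [
    subtree.leaves()
    for subtree in chunk_tree.subtrees(filter=lambda t: t.label() == "NP")
]

# Trees can also be built from bracketed strings.
parsed = Tree.fromstring("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")

print(chunk_tree)
print(noun_phrases)
print(parsed.leaves())

# parsed.draw() opens a window with a graphical rendering of the tree;
# it needs a display and Tk, so it is left commented out here.
```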

  15. Overview • What is NLTK? • NLTK Basic Functionalities • Part of Speech Tagging • Chunking and Trees • Other Functionalities • Works Cited

  16. Other Functionalities • Replacing and Correcting Words • Calculating WordNet Synset Similarity • Word Collections • Text Classification • Transforming Chunks and Trees • Processes for Distributed Processing and Handling Large Datasets • Parsing for Specific Data (Location, Dates and Times)
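For the synset similarity item, one common measure is Wu-Palmer similarity, which scores two synsets by how deep their closest shared ancestor sits in the hypernym tree. The slides do not say which measure was used, so this choice is an assumption, as are the two synset names. The helper `wup` is hypothetical and returns `None` when the WordNet corpus is not installed.

```python
from nltk.corpus import wordnet


def wup(name_a, name_b):
    """Wu-Palmer similarity of two named synsets, or None without WordNet."""
    try:
        synset_a = wordnet.synset(name_a)
        synset_b = wordnet.synset(name_b)
    except LookupError:
        # The WordNet corpus has not been downloaded on this machine.
        return None
    return synset_a.wup_similarity(synset_b)


# Two closely related nouns; a score near 1.0 means very similar.
score = wup("cookbook.n.01", "instruction_book.n.01")
print(score)
```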

  17. Works Cited • Perkins, Jacob. Python Text Processing with NLTK 2.0 Cookbook. • http://wordnet.princeton.edu/ • http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html • http://nltk.org
