
Natural Language Toolkit


paiva



Presentation Transcript


  1. Natural Language Toolkit Examples taken from: nltk.sourceforge.net/tutorial/introduction/index.html

  2. Overview
     • NLTK is a set of Python modules for carrying out many common natural language tasks.
     • Access it at nltk.sourceforge.net.
     • There are versions for Windows, OS X, Unix, and Linux. Detailed instructions are on the Installation tab.
     • In addition to the toolkit you will need two other modules: tkinter and Numeric. We haven’t been able to get Numeric to install smoothly with Python 2.4 under Windows, only with 2.3.
     • You also want the contrib and data packages.
     • Pay attention to what INSTALL.TXT in the data package says about the NLTK_CORPORA path.
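The NLTK_CORPORA note can be checked before starting a session. The following is a minimal sketch that assumes NLTK_CORPORA is an environment variable holding a directory path; the slide only calls it a "path", so INSTALL.TXT remains the authority. The helper name `corpora_path_status` is hypothetical and is not part of NLTK.

```python
import os


def corpora_path_status(env=os.environ, var_name="NLTK_CORPORA"):
    """Report whether the corpus path variable is set and points to a directory.

    Assumption: the data package is located through an environment variable
    named NLTK_CORPORA. This helper is illustrative, not an NLTK function.
    """
    value = env.get(var_name)
    if not value:
        return "unset"      # variable absent or empty
    if not os.path.isdir(value):
        return "missing"    # variable set, but the directory does not exist
    return "ok"             # variable set and the directory exists


print(corpora_path_status())
```

Passing a plain dictionary as `env` makes the check easy to try without changing the real environment.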

  3. Accessing NLTK
     • Use the standard Python import command:
       >>> from nltk.corpus import gutenberg
       >>> gutenberg.items()
       ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'blake-songs.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
     • Or:
       >>> import nltk.corpus
       >>> nltk.corpus.gutenberg.items()
       ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'blake-songs.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
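The two import forms on this slide reach the same object; only the name used to refer to it differs. Here is a small sketch of that point using the standard library's `os.path` in place of `nltk.corpus`, so it runs without NLTK installed. The substitution is an illustration only.

```python
# Form 1: import a single name from a module.
from os.path import join

# Form 2: import the module and use the fully qualified name.
import os.path

# Both names are bound to the same function object.
print(join is os.path.join)

# Either spelling can be used to call it.
print(join("corpora", "gutenberg") == os.path.join("corpora", "gutenberg"))
```

The first form keeps call sites short; the second makes it obvious where each name comes from.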

  4. Modules
     • The NLTK modules include:
       • token: classes for representing and processing individual elements of text, such as words and sentences
       • probability: classes for representing and processing probabilistic information
       • tree: classes for representing and processing hierarchical information over text
       • cfg: classes for representing and processing context-free grammars
       • fsa: finite state automata
       • tagger: tagging each word with a part of speech, a sense, etc.
       • parser: building trees over text (includes chart, chunk and probabilistic parsers)
       • classifier: classifying text into categories (includes feature, featureSelection, maxent, naivebayes)
       • draw: visualizing NLP structures and processes
       • corpus: accessing (tagged) corpus data
     • We will cover some of these explicitly as we reach the relevant topics.
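As a rough illustration of the kind of "probabilistic information" the probability module deals with, the sketch below computes relative word frequencies with the standard library. It does not use NLTK's own classes, and the function name `relative_frequencies` is hypothetical.

```python
from collections import Counter


def relative_frequencies(words):
    """Map each distinct word to its share of the total word count."""
    counts = Counter(words)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {word: count / total for word, count in counts.items()}


freqs = relative_frequencies("the cat saw the dog".split())
print(freqs["the"])  # "the" occurs 2 times out of 5 words
```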

  5. One Simple Example
     IDLE 1.0.3
     >>> from nltk.tokenizer import *
     >>> text_token = Token(TEXT='Hello world. This is a test file.')
     >>> print text_token
     <Hello world. This is a test file.>
     >>> WhitespaceTokenizer(SUBTOKENS='WORDS').tokenize(text_token)
     >>> print text_token
     <[<Hello>, <world.>, <This>, <is>, <a>, <test>, <file.>]>
     >>> print text_token['TEXT']
     Hello world. This is a test file.
     >>> print text_token['WORDS']
     [<Hello>, <world.>, <This>, <is>, <a>, <test>, <file.>]
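For readers without this NLTK release installed, the whitespace splitting in the example can be approximated in plain Python. This is a sketch of the idea only, not NLTK's `Token` or `WhitespaceTokenizer`: a dictionary stands in for the token, with the same `TEXT` and `WORDS` keys.

```python
# A plain dict stands in for the slide's Token object (an assumption for
# illustration; NLTK's Token class has more behaviour than this).
text_token = {"TEXT": "Hello world. This is a test file."}

# str.split() with no arguments splits on runs of whitespace, so punctuation
# stays attached to the neighbouring word, as in the slide's output.
text_token["WORDS"] = text_token["TEXT"].split()

print(text_token["TEXT"])
print(text_token["WORDS"])
# ['Hello', 'world.', 'This', 'is', 'a', 'test', 'file.']
```

Note that "world." and "file." keep their full stops, which is why whitespace tokenization is usually only a first step.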

  6. LAB
     • Detailed documentation and tutorials are under the Documentation tab at the SourceForge site.
     • Work through the “gentle introduction” and “elementary language processing” tutorials on the NLTK: nltk.sourceforge.net/tutorial/introduction/index.html
