1 / 11

SNoW & FEX Libraries; Document Classification

SNoW & FEX Libraries; Document Classification. A third task. SNoW and FEX libraries. SNoW and FEX class libraries Alternative to scripts/server mode Functions for processing single examples Examples: Main.cpp in SNoW, FEX directories ShallowParser. Using SNoW, FEX classes.

ena
Télécharger la présentation

SNoW & FEX Libraries; Document Classification

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SNoW & FEX Libraries;Document Classification A third task

  2. SNoW and FEX libraries • SNoW and FEX class libraries • Alternative to scripts/server mode • Functions for processing single examples • Examples: • Main.cpp in SNoW, FEX directories • ShallowParser

  3. Using SNoW, FEX classes • Parameters set in GlobalParams.h for SNoW, FexGlobalParams.h for FEX • Change defaults, or change in code • #include class headers as normal • Makefile: • compile with –I $(path-to-headers) • link with –L $(path to library) –lsnow (or –lfex)

  4. New task: Document-level Predicates • NOTE: actually easier than phrase-level • How to classify documents? • In reality, can be hard – multiple labels for each document • We’ll simplify – categorize by source (so single label for each example) • Dataset: 20 Newsgroups (bydate) • http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/

  5. Framing the Problem • First question: FEX format and mode • FEX has a document mode: ./fex –D <scr> <lex> <corp> <out> • Format: • Sentence, POS-tagged and column • Labelling is complicated

  6. Some Convenient Scripts • If pre-processing NG data doesn’t appeal... • Data in appropriate format is provided • Not labeled – use script to add labels to fex output ./run-ng-fex.sh input label output • Concatenate files with cat cat file1 file2 file3 > file4

  7. Applying the Tools... • ...should be easy! • Could implement solution in C++ with class libraries • When we’re done... • Summarize the process • Finish up

  8. Summarizing the Process • Frame problem as classification task • Find labeled data (for training/testing) • Decide how much preprocessing needed (how much info beyond bare text) • Custom & standard preprocessing • Apply and tune FEX and SNoW

  9. Summarizing the Process (cont’d) • Use ./analyze.pl to evaluate performance on test data • Use standard tools to process new text • May need custom post-processing to make use of newly classified data

  10. Summary • Have looked at three NLP tasks: • Context-sensitive spelling • Named Entity Tagging • Document Classification • Framed each problem as ML task • Word-, phrase- and document-level predicates

  11. Summary (cont’d) • Applied SNoW and FEX to each problem • Looked at different ways to run SNoW and FEX • Stand-alone & server mode • Scripts • Class libraries • Looked at ways to post-process data • Outlined standard procedure for using these tools

More Related