PoS tagging and Chunking with HMM and CRF

Dept. of CSE, IIT Madras. Pranjal Awasthi, Delip Rao, Ravindran Balaraman.


Presentation Transcript


  1. PoS tagging and Chunking with HMM and CRF. Dept. of CSE, IIT Madras. Pranjal Awasthi, Delip Rao, Ravindran Balaraman

  2. Outline
     • Overview of the system
     • PoS tagging with HMM
     • Chunking with CRF
     • Results
     • Summary

  3. Overview of the system
     • Aim: to leverage existing tools and algorithms (developed for English) for the NLPAI task
     • Tools used: TnT tagger, TBL, MALLET

  4. Overview of the system (diagram)
     • PoS tagging: TnT + TBL
     • Chunking: CRF (MALLET)

  5. The TnT tagger (Brants, 2000)
     • A second-order Hidden Markov Model based tagger
     • Used for English and other languages
     • On the NLPAI dataset, TnT alone gave F1 = 78.9
     • Why TnT?
        • PoS tagging is a sequence labeling task
        • HMMs and CRFs are good candidates
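To make "second order" concrete: in such a model the probability of a tag depends on the two preceding tags. The sketch below is not TnT itself; it only estimates tag trigram transition probabilities from tagged data and smooths them by linear interpolation of unigram, bigram and trigram estimates. The function names, the `<s>` padding symbol and the fixed interpolation weights are assumptions made for illustration (a real tagger would estimate the weights from data and would also model word emissions).

```python
from collections import Counter

def train_counts(tag_sequences):
    """Count tag n-grams (and their contexts) from lists of tags."""
    c = {name: Counter() for name in ("uni", "bi", "tri", "ctx1", "ctx2")}
    for tags in tag_sequences:
        padded = ["<s>", "<s>"] + list(tags)  # hypothetical start padding
        for i in range(2, len(padded)):
            t1, t2, t3 = padded[i - 2], padded[i - 1], padded[i]
            c["uni"][t3] += 1
            c["bi"][(t2, t3)] += 1
            c["tri"][(t1, t2, t3)] += 1
            c["ctx1"][t2] += 1          # how often t2 occurs as a 1-tag context
            c["ctx2"][(t1, t2)] += 1    # how often (t1, t2) occurs as a 2-tag context
    return c

def transition_prob(t1, t2, t3, c, lambdas=(0.1, 0.3, 0.6)):
    """Interpolated P(t3 | t1, t2); the lambdas are illustrative, not tuned."""
    l1, l2, l3 = lambdas
    total = sum(c["uni"].values())
    p_uni = c["uni"][t3] / total if total else 0.0
    p_bi = c["bi"][(t2, t3)] / c["ctx1"][t2] if c["ctx1"][t2] else 0.0
    p_tri = (c["tri"][(t1, t2, t3)] / c["ctx2"][(t1, t2)]
             if c["ctx2"][(t1, t2)] else 0.0)
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

# Toy usage with made-up tag sequences
counts = train_counts([["DT", "NN", "VB"], ["DT", "NN", "NN"]])
print(transition_prob("DT", "NN", "VB", counts))
```

A decoder would combine these transition scores with word emission probabilities and search for the best tag sequence, typically with the Viterbi algorithm.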

  6. Poor performance of CRFs in PoS tagging
     • For the NLPAI dataset, F1 = 69.4
     • Features used: w_{i-1}, w_{i-1} w_i, w_{i+1}, w_i w_{i+1}
     • A linear-chain CRF was used (MALLET)
     • Reasons for poor performance
        • Large number of PoS tags (26) compared to chunking
        • Selection of features
        • Type of CRF?
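The four word-context features listed on this slide can be extracted per token roughly as follows. This is a sketch only: the feature-name prefixes and the `<BOS>`/`<EOS>` boundary symbols are hypothetical choices, not taken from the slides, and the code follows the slide literally (it does not add the current word on its own as a feature).

```python
def token_features(words, i, pad_left="<BOS>", pad_right="<EOS>"):
    """Return the four context features for the token at position i."""
    prev_w = words[i - 1] if i > 0 else pad_left
    next_w = words[i + 1] if i + 1 < len(words) else pad_right
    cur = words[i]
    return [
        f"prev={prev_w}",            # w_{i-1}
        f"prev_cur={prev_w}|{cur}",  # w_{i-1} w_i
        f"next={next_w}",            # w_{i+1}
        f"cur_next={cur}|{next_w}",  # w_i w_{i+1}
    ]

# Toy usage
print(token_features(["the", "dog", "barks"], 1))
# ['prev=the', 'prev_cur=the|dog', 'next=barks', 'cur_next=dog|barks']
```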

  7. Transformation Based Learning (Brill, 1995)
     • Added as a post-processing step to “correct” TnT output
     • Idea:
        • Derive correction rules during training by observing what has gone wrong
        • Apply these rules at test time
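The idea can be sketched with a single rule template: "change tag A to tag B when the previous tag is C". The code below is an illustrative toy, not the authors' TBL implementation; the template, the function names and the greedy selection by net error reduction are assumptions. It proposes rules from the current errors, keeps the rule that fixes the most tags, applies it, and repeats.

```python
def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag) to one sentence, reading contexts from the input tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def learn_rules(predicted, gold, max_rules=10):
    """Greedily learn correction rules that turn `predicted` tags into `gold` tags."""
    current = [list(s) for s in predicted]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per observed error
        candidates = set()
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != ref[i]:
                    candidates.add((cur[i], ref[i], cur[i - 1]))
        # Score each candidate by net number of tags it corrects
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = 0
            for cur, ref in zip(current, gold):
                new = apply_rule(cur, rule)
                gain += sum((n == r) - (c == r) for n, c, r in zip(new, cur, ref))
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule improves the tagging any further
            break
        rules.append(best_rule)
        current = [apply_rule(s, best_rule) for s in current]
    return rules

# Toy usage: the base tagger mislabels a noun after a determiner as a verb
predicted = [["DT", "VB", "VB"], ["DT", "VB"]]
gold = [["DT", "NN", "VB"], ["DT", "NN"]]
print(learn_rules(predicted, gold))  # [('VB', 'NN', 'DT')]
```

At test time the learned rules are applied, in the order they were learned, to the base tagger's output.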

  8. Transformation Based Learning (contd.)
     • Use of TBL improved F1 by 1%
     • TBL is sensitive to the templates used
     • Possible improvements in template selection
     • Training time can be long unless indexing is used

  9. Summary of PoS tagging Results

  10. Chunking with CRF
     • Based on (Sha & Pereira, 2003)
     • Using the SimpleTagger provided with MALLET
     • Chunking accuracies
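MALLET's SimpleTagger reads plain-text training data with one token per line, the token's features separated by spaces, the label as the last item on the line, and a blank line between sentences. The sketch below writes chunking data in that layout. The feature names (`w=`, `pos=`) and the example chunk labels are hypothetical; the slides do not say which features were used for chunking.

```python
def to_simpletagger_lines(sentence):
    """Turn [(word, pos, chunk_label), ...] into 'features... label' lines."""
    return [f"w={word} pos={pos} {chunk}" for word, pos, chunk in sentence]

def format_corpus(sentences):
    """Join sentences with a blank line between them, as SimpleTagger expects."""
    blocks = ["\n".join(to_simpletagger_lines(s)) for s in sentences]
    return "\n\n".join(blocks) + "\n"

# Toy usage with made-up data
sentences = [
    [("the", "DT", "B-NP"), ("dog", "NN", "I-NP")],
    [("barks", "VB", "B-VP")],
]
print(format_corpus(sentences), end="")
```

Tokens or feature values that contain spaces would need escaping before being written this way.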

  11. Summary
     • Demonstrated the use of off-the-shelf software for tagging and chunking
     • Only code written: TBL + glue scripts
     • Overall PoS F1 = 80.74 and chunk F1 = 79.58
     • Have we “hit the wall” with purely ML-based tools?
        • Not sure yet!

  12. Thanks!
