
Text classification from unlabeled documents with bootstrapping and feature projection techniques

Presenter: You Lin Chen. Authors: Youngjoong Ko, Jungyun Seo. IPM, 2009. Outline: Motivation, Objective, Methodology, Experiments, Conclusion, Comments.


Presentation Transcript


  1. Text classification from unlabeled documents with bootstrapping and feature projection techniques • Presenter: You Lin Chen • Authors: Youngjoong Ko, Jungyun Seo • IPM, 2009

  2. Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

  3. Motivation [Figure labels: Automobile, Sport, Travel; ???, ???, ???; Training; Classifier] A text classifier that uses a supervised learning method requires a large number of labeled documents.

  4. Objectives [Figure labels: ???, ???, ???; title word (one per category); classifier] This paper proposes a new text classification method that uses unlabeled documents and the title word of each category for learning.

  5. Methodology

  6. Methodology http://www.cst.dk/online/pos_tagger/uk/index.html A sequence of 60 content words within a document is regarded as the window size for one context.
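
The windowing rule on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the content words are taken to be already extracted (for example by a POS tagger), the windows are assumed to be consecutive and non-overlapping, and a shorter final window is kept. The slide does not specify any of these details, and the function name `split_into_contexts` is hypothetical.

```python
from typing import List


def split_into_contexts(content_words: List[str], window_size: int = 60) -> List[List[str]]:
    """Split a document's content words into consecutive windows (contexts).

    Assumption: windows do not overlap and a short trailing window is kept.
    """
    return [
        content_words[start:start + window_size]
        for start in range(0, len(content_words), window_size)
    ]


# Hypothetical document with 130 content words.
words = [f"w{i}" for i in range(130)]
contexts = split_into_contexts(words)
print([len(c) for c in contexts])  # [60, 60, 10]
```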

  7. Methodology • Category 'Autos'; title word 'car'
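
One way to read this slide is that contexts containing a category's title word serve as the initial (seed) contexts for that category. The Python sketch below illustrates that reading only. The function name `select_keyword_contexts`, the exact-match rule, and the sample contexts are assumptions, not details taken from the slides.

```python
from typing import Dict, List


def select_keyword_contexts(
    contexts: List[List[str]],
    title_words: Dict[str, str],
) -> Dict[str, List[List[str]]]:
    """Group contexts by the category whose title word they contain.

    Assumption: a context that contains several title words is added
    to every matching category.
    """
    selected: Dict[str, List[List[str]]] = {category: [] for category in title_words}
    for context in contexts:
        words = set(context)
        for category, title_word in title_words.items():
            if title_word in words:
                selected[category].append(context)
    return selected


# Hypothetical contexts; only the 'Autos' / 'car' pair comes from the slide.
contexts = [
    ["engine", "buy", "car", "have"],
    ["team", "win", "game"],
    ["engine", "sell", "car", "is"],
]
seeds = select_keyword_contexts(contexts, {"Autos": "car"})
print(len(seeds["Autos"]))  # 2
```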

  8. Methodology • Context similarity: (…, engine, …, buy, car, have, …) and (…, engine, …, sell, car, is, …) • Centroid context: (is, is, is, …, car, is, is, is, …) [figure values: 0.01, 0.01]

  9. Methodology • Word similarity: (…, X, car, price, …) and (…, engine, …, X, car, price, …) • Assignment of remaining contexts to a category
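
The centroid contexts and the assignment step on slides 8 and 9 can be illustrated with a small Python sketch. The slides do not give the similarity formulas, so cosine similarity over word counts is used here as a stand-in and is not the paper's context or word similarity measure. All function names and sample data are hypothetical, including the 'Sport' contexts.

```python
import math
from collections import Counter
from typing import Dict, List


def centroid(contexts: List[List[str]]) -> Dict[str, float]:
    """Average word-count vector of a category's contexts."""
    if not contexts:
        return {}
    total: Counter = Counter()
    for context in contexts:
        total.update(context)
    return {word: count / len(contexts) for word, count in total.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (stand-in similarity measure)."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(context: List[str], centroids: Dict[str, Dict[str, float]]) -> str:
    """Assign a remaining context to the category with the most similar centroid."""
    vector = {word: float(count) for word, count in Counter(context).items()}
    return max(centroids, key=lambda category: cosine(vector, centroids[category]))


# Hypothetical seed contexts per category.
seed_contexts = {
    "Autos": [["engine", "buy", "car"], ["engine", "sell", "car"]],
    "Sport": [["team", "win", "game"], ["player", "score", "game"]],
}
centroids = {category: centroid(ctxs) for category, ctxs in seed_contexts.items()}
print(assign(["car", "price", "engine"], centroids))  # Autos
```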

  10. Methodology

  11. Methodology

  12. Experiments

  13. Conclusion Labeled data is expensive, while unlabeled data is inexpensive and plentiful. The proposed method is useful for low-cost text classification.

  14. Comments • Advantage • The idea is practical. • Drawback • There are too few examples. • Application • Web mining
