Some Effective Techniques for Naive Bayes Text Classification

Some Effective Techniques for Naive Bayes Text Classification. Advisor: Dr. Hsu. Presenter: Ai-Chen Liao. Authors: Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng. TKDE, 2006, pages 1457-1466.


Presentation Transcript


  1. Some Effective Techniques for Naive Bayes Text Classification. Advisor: Dr. Hsu. Presenter: Ai-Chen Liao. Authors: Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng. TKDE, 2006, pages 1457-1466.

  2. Outline • Motivation • Objective • About Naïve Bayes • Method • A per-document length normalization approach • Weight-enhancing method • Experimental Result • Conclusion • Personal Opinions

  3. Motivation • Although naïve Bayes is quite effective in various data mining tasks, it gives disappointing results in automatic text classification. • Based on our observations of naïve Bayes on natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain.

  4. Objective • We aim to propose methods that address these problems.

  5. About Naive Bayes • Multivariate Bernoulli naïve Bayes: a document is represented as a binary feature vector indicating whether each word is present or absent, so the model cannot make use of term frequencies within documents. • Multinomial model • Two serious problems: (1) rough parameter estimation; (2) handling rare categories
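To make the difference between the two document representations concrete, here is a minimal sketch. The function names, the toy vocabulary, and the toy document are illustrative assumptions and do not come from the slides; the count vector is the representation that multinomial-style models work with.

```python
from collections import Counter

def bernoulli_features(tokens, vocabulary):
    """Binary vector: 1 if the word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

def count_features(tokens, vocabulary):
    """Count vector: how many times each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical toy data, chosen only for illustration.
vocabulary = ["bayes", "naive", "poisson", "text"]
tokens = "naive bayes text text text".split()

bernoulli_vec = bernoulli_features(tokens, vocabulary)
count_vec = count_features(tokens, vocabulary)
```

In the binary vector, the three occurrences of "text" collapse to a single 1, which is the loss of term-frequency information described above.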

  6. About Naive Bayes

  7. Method ─ Multivariate Poisson Model for Text Classification • λ denotes the average number of times an event occurs within a given interval.
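As an illustration of the role of λ, the sketch below evaluates the standard Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!. It shows only the textbook distribution; the slide does not reproduce the paper's classification formula, so nothing here should be read as the authors' exact model. The function name and the value λ = 2.0 are assumptions made for the example.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical mean: a term that occurs twice per document on average.
lam = 2.0

# Probabilities of seeing the term 0, 1, ..., 49 times in a document.
probs = [poisson_pmf(k, lam) for k in range(50)]
```

In a text setting, k would be the number of times a term occurs in a document and λ the term's average frequency for a given class.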

  8. Method ─ A per-document length normalization approach

  9. Method ─ Feature Weighting Scheme

  10. Experimental Results • DS1: Reuters-21578 (21,578 news articles) • DS2: 20 Newsgroups (19,997 Usenet articles collected from 20 different newsgroups)

  11. Experimental Results

  12. Experimental Results

  13. Conclusion • We propose a Poisson naive Bayes text classification model with a weight-enhancing method. • We suggest per-document term frequency normalization to estimate the Poisson parameter, whereas the traditional multinomial classifier estimates its parameters by treating all the training documents as a single huge training document.
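The contrast between the two estimation strategies can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the slides do not give the normalization formula or any smoothing, so plain relative frequency (term count divided by document length) is assumed here, and the function names and toy documents are hypothetical.

```python
def pooled_estimate(docs, term):
    """Treat all training documents as one long document."""
    total_occurrences = sum(doc.count(term) for doc in docs)
    total_length = sum(len(doc) for doc in docs)
    return total_occurrences / total_length

def per_document_estimate(docs, term):
    """Normalize within each document first, then average over documents.

    Assumes relative-frequency normalization and non-empty documents.
    """
    return sum(doc.count(term) / len(doc) for doc in docs) / len(docs)

# Hypothetical training set: one short and one long document.
docs = [
    ["ball"] + ["other"] * 1,   # "ball" is 1 of 2 tokens
    ["ball"] + ["other"] * 9,   # "ball" is 1 of 10 tokens
]

pooled = pooled_estimate(docs, "ball")              # 2 / 12
per_document = per_document_estimate(docs, "ball")  # (0.5 + 0.1) / 2
```

With pooling, the long document dominates the estimate because it contributes most of the tokens; with per-document normalization, each document contributes equally regardless of its length.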

  14. Personal Opinions • Advantage • … • Drawback • … • Application • Text classification…
