Some Effective Techniques for Naive Bayes Text Classification

Presentation Transcript

  1. Some Effective Techniques for Naive Bayes Text Classification • Advisor: Dr. Hsu • Presenter: Ai-Chen Liao • Authors: Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng • TKDE, 2006, Page(s): 1457-1466

  2. Outline • Motivation • Objective • About Naïve Bayes • Method • A per-document length normalization approach • Weight-enhancing method • Experimental Result • Conclusion • Personal Opinions

  3. Motivation • Although naïve Bayes is quite effective in various data mining tasks, its results on automatic text classification are disappointing. • From observing how naïve Bayes behaves on natural language text, we found a serious problem in the parameter estimation process that causes poor results in the text classification domain.

  4. Objective • We aim to propose methods that mitigate these problems.

  5. About Naive Bayes • Multivariate Bernoulli naïve Bayes: a document is represented as a binary feature vector indicating whether each word is present or absent, so the model cannot use term frequencies in documents. • Multinomial model: two serious problems: (1) rough parameter estimation and (2) handling of rare categories.
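A minimal sketch of the difference between the two models, assuming the per-class parameters have already been estimated. The dictionaries `p_present` (probability that a word appears at least once) and `theta` (per-occurrence word probability) are hypothetical names used only for illustration. Repeating a word changes the multinomial score but leaves the Bernoulli score unchanged.

```python
import math

def bernoulli_log_likelihood(tf, p_present):
    """Multivariate Bernoulli: only presence/absence of each vocabulary word matters."""
    total = 0.0
    for word, p in p_present.items():
        if tf.get(word, 0) > 0:
            total += math.log(p)        # word present, however many times
        else:
            total += math.log(1.0 - p)  # absent words also contribute
    return total

def multinomial_log_likelihood(tf, theta):
    """Multinomial: every occurrence of a word contributes log(theta[word]).
    The multinomial coefficient is omitted because it is the same for every class."""
    return sum(count * math.log(theta[word]) for word, count in tf.items() if count > 0)

# Hypothetical parameters for one class over a two-word vocabulary.
p_present = {"ball": 0.5, "vote": 0.2}
theta = {"ball": 0.6, "vote": 0.4}
print(bernoulli_log_likelihood({"ball": 1}, p_present) ==
      bernoulli_log_likelihood({"ball": 5}, p_present))  # True
```

The Bernoulli comparison prints `True` because both documents have the same presence pattern; the multinomial score for `{"ball": 5}` is five times `log(0.6)` and therefore lower than for `{"ball": 1}`.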

  6. About Naive Bayes

  7. Method ─ Multivariate Poisson Model for Text Classification (λ denotes the average number of times an event occurs within a specific interval)
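A hedged sketch of a generic Poisson naive Bayes classifier, where the "interval" is a document. It assumes that, given class c, each term frequency k is independently Poisson with mean λ, so P(k) = λ^k e^(-λ) / k!, and it picks the class with the highest log posterior. The `model` layout and all parameter values are hypothetical, and this is not necessarily the authors' exact decision function.

```python
import math

def poisson_log_pmf(k, lam):
    """log P(k; lam) = k*log(lam) - lam - log(k!). Requires lam > 0 (e.g. after smoothing)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_nb_score(tf, lambdas, log_prior):
    """Log posterior of one class, up to a class-independent constant.

    tf:      term -> frequency in the document (missing terms have frequency 0)
    lambdas: term -> expected frequency of that term in a document of this class
    """
    score = log_prior
    for term, lam in lambdas.items():
        score += poisson_log_pmf(tf.get(term, 0), lam)
    return score

def classify(tf, model):
    """model: class -> (lambdas, log_prior); returns the highest-scoring class."""
    return max(model, key=lambda c: poisson_nb_score(tf, *model[c]))

# Hypothetical two-class model over a two-term vocabulary.
model = {
    "sports":   ({"ball": 3.0, "vote": 0.2}, math.log(0.5)),
    "politics": ({"ball": 0.3, "vote": 2.5}, math.log(0.5)),
}
print(classify({"ball": 4}, model))  # sports
```

Unlike the Bernoulli model, the score depends on how often a term occurs; unlike the multinomial model, a term that is absent from the document still contributes `-λ` for each class.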

  8. Method ─ A per-document length normalization approach
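The contrast this method draws is between pooling all training documents into one huge document (the traditional multinomial estimate) and normalizing each document's term frequency by that document's length before averaging. The sketch below assumes non-empty documents given as term-count dictionaries; the function names are hypothetical. The estimates are per-token rates, and scaling them to an expected count per document (the Poisson λ) would require a reference length that the slide does not specify.

```python
def pooled_estimate(docs, term):
    """Multinomial-style estimate: treat all training documents as one big document."""
    total_tf = sum(doc.get(term, 0) for doc in docs)
    total_len = sum(sum(doc.values()) for doc in docs)
    return total_tf / total_len

def per_document_estimate(docs, term):
    """Per-document length normalization: divide each document's term frequency
    by that document's length, then average the normalized values."""
    rates = [doc.get(term, 0) / sum(doc.values()) for doc in docs]
    return sum(rates) / len(rates)

# One long document dominated by "goal" plus two short ones (hypothetical data).
docs = [{"goal": 90, "the": 10}, {"goal": 1, "the": 9}, {"goal": 1, "the": 9}]
print(round(pooled_estimate(docs, "goal"), 3))        # 0.767
print(round(per_document_estimate(docs, "goal"), 3))  # 0.367
```

In the pooled estimate the single long document outweighs the other two; with per-document normalization each training document counts equally.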

  9. Method ─ Feature Weighting Scheme
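The slide gives no formula for the weighting scheme. As a minimal, hypothetical sketch of where per-term weights could enter the model, the code below scales each term's Poisson log-likelihood by a non-negative weight. How the weights are computed, and whether the authors apply them this way, are assumptions not taken from the slide; `weighted_poisson_score` and the `weights` dictionary are illustrative names only.

```python
import math

def weighted_poisson_score(tf, lambdas, log_prior, weights):
    """Poisson naive Bayes class score with each term's log-likelihood scaled by a weight.

    weights: term -> non-negative weight (hypothetical); terms missing from
             `weights` get 1.0, i.e. their ordinary unweighted contribution.
    """
    score = log_prior
    for term, lam in lambdas.items():
        k = tf.get(term, 0)
        log_pmf = k * math.log(lam) - lam - math.lgamma(k + 1)
        score += weights.get(term, 1.0) * log_pmf
    return score

# Hypothetical class parameters; a zero weight removes "the" from the score entirely.
lambdas = {"goal": 2.0, "the": 5.0}
tf = {"goal": 3, "the": 4}
print(weighted_poisson_score(tf, lambdas, 0.0, {"the": 0.0}))
```

With an empty `weights` dictionary the function reduces to the unweighted Poisson score, so the weights act purely as a per-term emphasis on top of the base model.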

  10. Experimental Results • DS1: Reuters-21578 (21,578 news articles) • DS2: 20 Newsgroups (19,997 Usenet articles collected from 20 different newsgroups)

  11. Experimental Results

  12. Experimental Results

  13. Conclusion • We propose a Poisson naive Bayes text classification model with a weight-enhancing method. • We suggest per-document term frequency normalization to estimate the Poisson parameter, whereas the traditional multinomial classifier estimates its parameters by treating all the training documents as a single huge training document.

  14. Personal Opinions • Advantage • … • Drawback • … • Application • Text classification…