
Naive Bayes for Document Classification


Presentation Transcript


  1. Naive Bayes for Document Classification: Illustrative Example

  2. Document Classification • Given a document, find its class (e.g. headlines, sports, economics, fashion…) • We assume the document is a “bag-of-words”: d ~ { t1, t2, t3, …, tnd } • Using Naive Bayes with a multinomial distribution, we choose the class c that maximizes P(c|d) ∝ P(c) · P(t1|c) · P(t2|c) · … · P(tnd|c), as sketched in the code below.
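
A minimal sketch of this decision rule, assuming the class priors P(c) and term probabilities P(t|c) have already been estimated; the dictionaries priors and cond_prob are hypothetical inputs, not something defined on the slides:

    import math

    def classify(doc_tokens, priors, cond_prob):
        # Pick the class c maximizing log P(c) + sum_k log P(t_k | c).
        # priors:    dict class -> P(c)            (assumed precomputed)
        # cond_prob: dict class -> {term: P(t|c)}  (assumed precomputed)
        best_class, best_score = None, float("-inf")
        for c, prior in priors.items():
            score = math.log(prior)          # log space avoids underflow
            for t in doc_tokens:
                if t in cond_prob[c]:        # skip out-of-vocabulary terms
                    score += math.log(cond_prob[c][t])
            if score > best_score:
                best_class, best_score = c, score
        return best_class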

  3. Binomial Distribution • n independent trials (Bernoulli trials), each of which results in success with probability p • The binomial distribution gives the probability of any particular split of successes between the two categories. • e.g. You flip a coin 10 times with P(Heads) = 0.6. What is the probability of getting 8 heads and 2 tails? • P(k) = C(n, k) · p^k · (1 − p)^(n − k), with k being the number of successes (or, to see the similarity with the multinomial, consider that the first category is selected k times and the second n − k times); the coin example is evaluated in the sketch below.
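
A quick check of the coin example (an added illustration, not part of the original slides; binomial_pmf is an illustrative helper name):

    from math import comb

    def binomial_pmf(k, n, p):
        # P(k) = C(n, k) * p^k * (1 - p)^(n - k)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # 10 flips with P(Heads) = 0.6: probability of 8 heads and 2 tails
    print(binomial_pmf(8, 10, 0.6))   # ~0.1209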

  4. Multinomial Distribution • Generalization of the binomial distribution • n independent trials, each of which results in one of k outcomes • The multinomial distribution gives the probability of any particular combination of counts across the k categories. • e.g. You have balls in three colours in a bin (3 balls of each colour, so pR = pG = pB = 1/3), from which you draw n = 9 balls with replacement. What is the probability of getting 8 red, 1 green, 0 blue? • P(x1, x2, x3) = n! / (x1! · x2! · x3!) · p1^x1 · p2^x2 · p3^x3, evaluated for the example in the sketch below.
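
The ball-drawing example evaluated with the formula above (an added sketch; multinomial_pmf is an illustrative helper, not from the slides):

    from math import factorial, prod

    def multinomial_pmf(counts, probs):
        # P(x_1, ..., x_k) = n! / (x_1! ... x_k!) * prod_i p_i^{x_i}
        n = sum(counts)
        coeff = factorial(n)
        for x in counts:
            coeff //= factorial(x)
        return coeff * prod(p**x for p, x in zip(probs, counts))

    # n = 9 draws with replacement, pR = pG = pB = 1/3: 8 red, 1 green, 0 blue
    print(multinomial_pmf([8, 1, 0], [1/3, 1/3, 1/3]))   # ~0.000457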

  5. Naive Bayes w/ Multinomial Model (from McCallum and Nigam, 1998) (Advanced)

  6. Naive Bayes w/ Multivariate Bernoulli Model (from McCallum and Nigam, 1998) (Advanced)

  7. Smoothing • For each term t, we need to estimate P(t|c) from the training data: P(t|c) = Tct / Σt′ Tct′ • Tct is the count of term t in all documents of class c

  8. Smoothing • Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing • Laplace smoothing: P(t|c) = (Tct + 1) / (Σt′ Tct′ + |V|) • |V| is the number of terms in the vocabulary (see the sketch below)
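
A short sketch of the Laplace-smoothed estimate, assuming each document of class c is given as a plain list of tokens (estimate_cond_probs is an illustrative helper name):

    from collections import Counter

    def estimate_cond_probs(docs_of_class, vocabulary):
        # Laplace-smoothed P(t|c) = (T_ct + 1) / (sum_t' T_ct' + |V|)
        counts = Counter(t for doc in docs_of_class for t in doc)   # T_ct
        total = sum(counts[t] for t in vocabulary)                  # sum_t' T_ct'
        return {t: (counts[t] + 1) / (total + len(vocabulary))
                for t in vocabulary}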

  9. Two topic classes: “China”, “not China” • V = {Beijing, Chinese, Japan, Macao, Tokyo, Shanghai} • N = 4 training documents

  10. Classification Probability Estimation
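
This slide carried only a figure; below is an end-to-end sketch of the classification step using the vocabulary and classes from the previous slide. The four training documents and the test document are assumptions (they follow the widely used textbook example built on this exact vocabulary), not content recovered from the slide:

    import math
    from collections import Counter

    # Hypothetical training data (an assumption consistent with V and N = 4 above)
    train = [
        (["Chinese", "Beijing", "Chinese"],  "China"),
        (["Chinese", "Chinese", "Shanghai"], "China"),
        (["Chinese", "Macao"],               "China"),
        (["Tokyo", "Japan", "Chinese"],      "not China"),
    ]
    test_doc = ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]
    vocab = {"Beijing", "Chinese", "Japan", "Macao", "Tokyo", "Shanghai"}

    scores = {}
    for c in {"China", "not China"}:
        docs_c = [doc for doc, label in train if label == c]
        prior = len(docs_c) / len(train)                    # P(c)
        counts = Counter(t for doc in docs_c for t in doc)  # T_ct
        total = sum(counts.values())                        # sum_t' T_ct'
        # Laplace-smoothed P(t|c), scored in log space to avoid underflow
        scores[c] = math.log(prior) + sum(
            math.log((counts[t] + 1) / (total + len(vocab))) for t in test_doc
        )

    print(max(scores, key=scores.get))   # "China"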

  11. Summary: Miscellaneous • Naïve Bayes is linear in the time it takes to scan the data. • When we have many terms, the product of probabilities will cause a floating-point underflow; therefore, sum the logarithms of the probabilities instead of multiplying them (demonstrated below). • For a large training set, the vocabulary is large, and it is better to select only a subset of terms; this is called “feature selection”. However, accuracy is not badly affected by irrelevant attributes if the data set is large.
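
A tiny added illustration of why log probabilities are needed: multiplying many small per-term probabilities underflows to 0.0 in double precision, while summing their logarithms stays well within range.

    import math

    probs = [1e-5] * 100                          # 100 small per-term probabilities

    product = 1.0
    for p in probs:
        product *= p                              # underflows to 0.0

    log_score = sum(math.log(p) for p in probs)   # about -1151.3, no underflow
    print(product, log_score)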
