Dear SIR,

jgolson

Presentation Transcript


  1. Dear SIR, I am Mr. John Coleman and my sister is Miss Rose Colemen, we are the children of late Chief Paul Colemen from Sierra Leone. I am writing you in absolute confidence primarily to seek your assistance to transfer our cash of twenty one Million Dollars ($21,000.000.00) now in the custody of a private Security trust firm in Europe the money is in trunk boxes deposited and declared as family valuables by my late father as a matter of fact the company does not know the content as money, although my father made them to under stand that the boxes belongs to his foreign partner. …

  2. This mail is probably spam. The original message has been attached along with this report, so you can recognize or block similar unwanted mail in future. See http://spamassassin.org/tag/ for more details.
     Content analysis details: (12.20 points, 5 required)
     • NIGERIAN_SUBJECT2 (1.4 points) Subject is indicative of a Nigerian spam
     • FROM_ENDS_IN_NUMS (0.7 points) From: ends in numbers
     • MIME_BOUND_MANY_HEX (2.9 points) Spam tool pattern in MIME boundary
     • URGENT_BIZ (2.7 points) BODY: Contains urgent matter
     • US_DOLLARS_3 (1.5 points) BODY: Nigerian scam key phrase ($NN,NNN,NNN.NN)
     • DEAR_SOMETHING (1.8 points) BODY: Contains 'Dear (something)'
     • BAYES_30 (1.6 points) BODY: Bayesian classifier says spam probability is 30 to 40% [score: 0.3728]

  3. Bayes Classifiers
     • Bayesian classifiers use Bayes theorem, which says
       $$ p(c_j \mid d) = \frac{p(d \mid c_j)\, p(c_j)}{p(d)} $$
       where p(cj | d) = probability of instance d being in class cj, p(d | cj) = probability of generating instance d given class cj, p(cj) = probability of occurrence of class cj, and p(d) = probability of instance d occurring.
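
A minimal Python sketch of Bayes theorem as stated above. The function names `evidence` and `posterior` are hypothetical, and `evidence` assumes the classes are mutually exclusive and exhaustive, so that p(d) can be computed by the law of total probability. The probabilities in the usage lines are made up for illustration.

```python
def evidence(likelihoods, priors):
    """p(d) by the law of total probability: sum over j of p(d | c_j) * p(c_j)."""
    return sum(lik * pri for lik, pri in zip(likelihoods, priors))


def posterior(likelihood, prior, p_d):
    """Bayes theorem: p(c_j | d) = p(d | c_j) * p(c_j) / p(d)."""
    if p_d <= 0:
        raise ValueError("p(d) must be positive")
    return likelihood * prior / p_d


# Hypothetical two-class example: p(d | c1) = 0.5, p(c1) = 0.2,
# p(d | c2) = 0.25, p(c2) = 0.8.
p_d = evidence([0.5, 0.25], [0.2, 0.8])  # 0.1 + 0.2, up to floating-point rounding
print(posterior(0.5, 0.2, p_d), posterior(0.25, 0.8, p_d))  # about 1/3 and 2/3
```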

  4. Assume that we have two classes, c1 = male and c2 = female. We have a person whose sex we do not know, say “drew”, or d. Classifying drew as male or female is equivalent to asking whether it is more probable that drew is male or female, i.e., which is greater, p(male | drew) or p(female | drew)?
       $$ p(\text{male} \mid \text{drew}) = \frac{p(\text{drew} \mid \text{male})\, p(\text{male})}{p(\text{drew})} $$

  5. Officer Drew
       $$ p(\text{male} \mid \text{drew}) = \frac{p(\text{drew} \mid \text{male})\, p(\text{male})}{p(\text{drew})} = \frac{1/3 \times 3/8}{3/8} = \frac{0.125}{3/8} $$
       $$ p(\text{female} \mid \text{drew}) = \frac{p(\text{drew} \mid \text{female})\, p(\text{female})}{p(\text{drew})} = \frac{2/5 \times 5/8}{3/8} = \frac{0.250}{3/8} $$
     Here p(drew) = 3/8 = 1/3 × 3/8 + 2/5 × 5/8, the total probability of the name drew over both classes.
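
The Officer Drew arithmetic can be checked with exact fractions. This sketch uses only the quantities given on the slide (the underlying training table is not part of this transcript); it confirms that p(drew) = 3/8 is consistent with the law of total probability and that the normalized posteriors are 1/3 for male and 2/3 for female.

```python
from fractions import Fraction

# Quantities as given on the slide.
p_drew_given = {"male": Fraction(1, 3), "female": Fraction(2, 5)}
prior = {"male": Fraction(3, 8), "female": Fraction(5, 8)}
p_drew = Fraction(3, 8)

# Numerators of Bayes theorem: p(drew | c) * p(c).
numerator = {c: p_drew_given[c] * prior[c] for c in prior}

# p(drew) by total probability matches the slide's 3/8.
assert sum(numerator.values()) == p_drew

post = {c: numerator[c] / p_drew for c in prior}
print({c: float(v) for c, v in numerator.items()})  # {'male': 0.125, 'female': 0.25}
print(post)  # male: 1/3, female: 2/3
```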

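  6. Officer Drew IS a female!
       $$ p(\text{male} \mid \text{drew}) = \frac{1/3 \times 3/8}{3/8} = \frac{0.125}{3/8}, \qquad p(\text{female} \mid \text{drew}) = \frac{2/5 \times 5/8}{3/8} = \frac{0.250}{3/8} $$
     Both posteriors share the denominator p(drew) = 3/8, so comparing the numerators is enough: 0.250 > 0.125, and Officer Drew is classified as female.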

  7. Naïve Bayesian Classifiers
     • Bayesian classifiers require
       • computation of p(d | cj)
       • computation of p(cj)
       • p(d) can be ignored, since it is the same for all classes
     • To simplify the task, naïve Bayesian classifiers assume that attributes have independent distributions given the class, and thereby estimate
       $$ p(d \mid c_j) = p(d_1 \mid c_j) \times p(d_2 \mid c_j) \times \cdots \times p(d_n \mid c_j) $$
     • Each of the p(di | cj) can be estimated from the training data.
     (Figure: attribute nodes Height, Eye-color, …, Long-hair.)
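
The two requirements above, together with the independence assumption, fit in a few lines. In this sketch the probability tables, the attribute names (`height`, `long_hair`), and the helper names are hypothetical; a real classifier would estimate the tables from training data.

```python
import math


def naive_likelihood(d, c, cond):
    """p(d | c) under the naive assumption: product of p(d_i | c) over attributes."""
    return math.prod(cond[c][attr][value] for attr, value in d.items())


def classify(d, priors, cond):
    """Pick the class maximizing p(d | c) * p(c).

    p(d) is left out: it is the same for every class, so dividing by it
    cannot change which class scores highest.
    """
    return max(priors, key=lambda c: naive_likelihood(d, c, cond) * priors[c])


# Hypothetical probability tables, for illustration only.
priors = {"male": 0.5, "female": 0.5}
cond = {
    "male": {
        "height": {"tall": 0.7, "short": 0.3},
        "long_hair": {"yes": 0.2, "no": 0.8},
    },
    "female": {
        "height": {"tall": 0.3, "short": 0.7},
        "long_hair": {"yes": 0.6, "no": 0.4},
    },
}

print(classify({"height": "short", "long_hair": "yes"}, priors, cond))  # female
```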

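  8. Naïve Bayesian Classifier
       $$ p(d \mid c_j) = p(d_1 \mid c_j) \times p(d_2 \mid c_j) \times \cdots \times p(d_n \mid c_j) $$
     (Figure: the Naïve Bayesian Classifier drawn as a node p(d | cj) with nodes p(d1 | cj), p(d2 | cj), …, p(dn | cj).)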
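
One practical caveat, not covered on the slide: multiplying many probabilities below 1 can underflow to 0.0 in floating point. A common implementation choice is to add logarithms instead; because log is monotonic, the class with the highest log score is the class with the highest product. The values below are hypothetical, chosen to force underflow.

```python
import math


def log_score(prior, cond_probs):
    """log p(c) + sum_i log p(d_i | c), the log of p(c) * prod_i p(d_i | c)."""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)


# Hypothetical: 400 attributes, each with p(d_i | c) = 0.01 for class A
# and 0.02 for class B, equal priors.
probs_a = [0.01] * 400
probs_b = [0.02] * 400

direct_a = 0.5 * math.prod(probs_a)  # 1e-800 is below the smallest double: 0.0
direct_b = 0.5 * math.prod(probs_b)  # about 2.6e-680: also 0.0

print(direct_a, direct_b)  # 0.0 0.0, so the classes look tied
print(log_score(0.5, probs_a), log_score(0.5, probs_b))  # finite; B is higher
```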

  9. Naïve Bayes is NOT sensitive to irrelevant features. Suppose we are trying to classify sex based on eye color…
     • p(d | cj) = p(d1 | cj) × p(d2 | cj) × … × p(dn | cj)
     • p(drew | male) = p(d1 | male) × … × p(blue_eyes | male)
     • p(drew | female) = p(d1 | female) × … × p(blue_eyes | female)
     If eye color is irrelevant to sex, then p(blue_eyes | male) ≈ p(blue_eyes | female), so this factor scales both class scores by about the same amount and barely changes which class wins.
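
The irrelevant-feature argument can be checked numerically. This sketch uses hypothetical probabilities and assumes the idealized case where eye color has exactly the same distribution in both classes; with probabilities estimated from real data, the two factors would differ slightly and shift the ratio slightly.

```python
import math


def score(prior, cond_probs):
    """Unnormalized naive Bayes score: p(c) * prod_i p(d_i | c)."""
    return prior * math.prod(cond_probs)


# Hypothetical p(d_i | c) values for the informative attributes of one instance.
male = score(0.5, [0.3, 0.2])
female = score(0.5, [0.7, 0.6])

# Add an irrelevant attribute: eye color, assumed to be distributed the same
# way in both classes, p(blue_eyes | male) = p(blue_eyes | female) = 0.25.
male_eyes = score(0.5, [0.3, 0.2, 0.25])
female_eyes = score(0.5, [0.7, 0.6, 0.25])

# Both scores are multiplied by the same factor, so the ratio between them,
# and hence the predicted class, does not change.
print(female / male, female_eyes / male_eyes)
```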

  10. Naïve Bayes is fast and does not need much space. We can compute all the probabilities once from the training data and store them in a table, so classifying a new instance takes only table lookups and multiplications.
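
A minimal sketch of the "compute once, store in a table" idea, assuming categorical attributes and plain relative-frequency estimates with no smoothing, so an attribute value never seen with a class gets probability 0. The training rows and function names are hypothetical. The tables hold one entry per (class, attribute, value) seen in training, and classifying an instance takes one lookup per attribute per class.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """One pass of counting over the training data, turned into tables:
    priors[c] = p(c) and cond[c][attr][value] = p(attr = value | c)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attr) -> counts of each value
    for row, c in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(c, attr)][value] += 1
    n = len(labels)
    priors = {c: k / n for c, k in class_counts.items()}
    cond = defaultdict(dict)
    for (c, attr), counts in value_counts.items():
        cond[c][attr] = {v: k / class_counts[c] for v, k in counts.items()}
    return priors, cond


def classify(row, priors, cond):
    """Classification needs only table lookups and multiplications."""
    def score(c):
        s = priors[c]
        for attr, value in row.items():
            s *= cond[c][attr].get(value, 0.0)  # unseen value -> 0 (no smoothing)
        return s
    return max(priors, key=score)


# Hypothetical toy training data, for illustration only.
rows = [
    {"height": "tall", "long_hair": "no"},
    {"height": "short", "long_hair": "yes"},
    {"height": "tall", "long_hair": "no"},
    {"height": "short", "long_hair": "yes"},
    {"height": "short", "long_hair": "no"},
]
labels = ["male", "female", "male", "female", "female"]

priors, cond = train(rows, labels)
print(classify({"height": "short", "long_hair": "no"}, priors, cond))  # female
```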

  11. Problem! Naïve Bayes assumes independence of features…
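
One way to see why the independence assumption is a problem: if two attributes are perfectly correlated, the second carries no new information, yet naive Bayes multiplies its likelihood in again. This sketch takes the extreme case of one attribute recorded twice, with hypothetical probabilities; a model that accounted for the dependence would leave the posterior at 0.6.

```python
import math


def posterior_male(prior_male, lik_male, lik_female):
    """Two-class naive Bayes posterior p(male | d) from per-attribute
    likelihoods p(d_i | male) and p(d_i | female)."""
    m = prior_male * math.prod(lik_male)
    f = (1 - prior_male) * math.prod(lik_female)
    return m / (m + f)


# One attribute whose observed value has p = 0.6 under "male" and 0.4 under
# "female" (hypothetical values), with equal priors.
once = posterior_male(0.5, [0.6], [0.4])

# The same attribute recorded twice, i.e. a perfectly correlated copy.
# Naive Bayes treats the copy as independent and counts the evidence again.
twice = posterior_male(0.5, [0.6, 0.6], [0.4, 0.4])

print(once, twice)  # the posterior moves from 0.6 to about 0.69
```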

  12. Solution
      Consider the relationships between attributes…

  13. Solution
      Consider the relationships between attributes… But how do we find the set of connecting arcs?
      Read Keogh, E. & Pazzani, M. (1999). Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches. In Uncertainty 99, 7th Int'l Workshop on AI and Statistics, Ft. Lauderdale, FL, pp. 225–230.
      Don’t bother writing a reaction paper, but if we had a pop quiz…
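
The slide leaves the question to the reading. As one illustration only, and not necessarily the approach the paper favors, tree-augmented naive Bayes (TAN) uses a distribution-based criterion: score each pair of attributes by their conditional mutual information given the class, I(A_i; A_j | C), and connect strongly dependent pairs. A full TAN keeps a maximum-weight spanning tree over these scores; this sketch stops at the ranking. The data, including the `weight` attribute (a deterministic function of `height`), is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) in nats from three equal-length sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x,y,c) * log[p(x,y,c) p(c) / (p(x,c) p(y,c))], written with counts
        total += (k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def rank_attribute_pairs(rows, labels):
    """Score every pair of attributes by I(A_i; A_j | class), highest first."""
    scores = []
    for a, b in combinations(sorted(rows[0]), 2):
        xs = [r[a] for r in rows]
        ys = [r[b] for r in rows]
        scores.append(((a, b), cond_mutual_info(xs, ys, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical data: within each class, height and eye color vary independently,
# while weight is fully determined by height.
heights = ["tall", "tall", "short", "short"] * 2
eyes = ["blue", "brown"] * 4
labels = ["male"] * 4 + ["female"] * 4
rows = [
    {"height": h, "weight": "heavy" if h == "tall" else "light", "eyes": e}
    for h, e in zip(heights, eyes)
]

ranked = rank_attribute_pairs(rows, labels)
print(ranked)  # ("height", "weight") ranks first: the arc to add
```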

  14. Naïve Bayesian Classifiers Visual Intuition I
      (Figure: height distributions, with heights marked at 4 foot 8, 5 foot 8, and 6 foot 6.)

  15. Naïve Bayesian Classifiers Visual Intuition II
      p(cj | d) = probability of instance d being in class cj. At a height of 5 foot 8, the figure counts 10 males and 2 females, so
      P(male | 5 foot 8) = 10 / (10 + 2) = 0.833
      P(female | 5 foot 8) = 2 / (10 + 2) = 0.167
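
The counting view on this slide translates directly into code: among training instances at a given height, each class's posterior is its share of the count. This sketch assumes the counts 10 and 2 read off the figure; with continuous heights one would need to bin the values or fit a per-class density, which the slide does not specify. The function name is hypothetical.

```python
def posterior_from_counts(counts):
    """Estimate p(c | d) as the fraction of training instances at d that
    belong to class c, e.g. 10 / (10 + 2) for males at 5 foot 8."""
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}


# Counts at height 5 foot 8 as read from the figure: 10 males, 2 females.
at_5ft8 = {"male": 10, "female": 2}
post = posterior_from_counts(at_5ft8)
print({c: round(p, 3) for c, p in post.items()})  # {'male': 0.833, 'female': 0.167}
```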