Bayesian Classification Dr. Navneet Goyal BITS, Pilani
Bayesian Classification • What are Bayesian Classifiers? • Statistical Classifiers • Predict class membership probabilities • Based on Bayes Theorem • Naïve Bayesian Classifier • Computationally Simple • Performance comparable with decision tree (DT) and neural network (NN) classifiers
Bayesian Classification • Probabilistic learning: Calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems • Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Bayes Theorem • Let X be a data sample whose class label is unknown • Let H be the hypothesis that X belongs to a class C • For classification we want to determine P(H|X) • P(H|X) is the probability that H holds given the observed data sample X • P(H|X) is the posterior probability
Bayes Theorem Example: Sample space: All Fruits X is "round" and "red" H = hypothesis that X is an Apple P(H|X) is our confidence that X is an apple given that X is "round" and "red" • P(H) is the Prior Probability of H, i.e., the probability that any given data sample is an apple, regardless of how it looks • P(H|X) is based on more information • Note that P(H) is independent of X
Bayes Theorem Example: Sample space: All Fruits • P(X|H)? • It is the probability that X is round and red given that we know X is an apple • Here P(X) is the prior probability = P(a data sample from our set of fruits is red and round)
Estimating Probabilities • P(X), P(H), and P(X|H) may be estimated from the given data • Bayes Theorem: P(H|X) = P(X|H) P(H) / P(X) • Use of Bayes Theorem in the Naïve Bayesian Classifier!!
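The fruit example above can be worked through numerically. The probabilities below are hypothetical, chosen only to illustrate how the three estimated quantities combine; a minimal sketch:

```python
# Hypothetical fruit-basket numbers (not from any real data set):
# P(H)   = P(apple)                  -- prior
# P(X|H) = P(round and red | apple)  -- likelihood
# P(X)   = P(round and red)          -- evidence
p_h = 0.30          # assume 30% of fruits are apples
p_x_given_h = 0.90  # assume 90% of apples are round and red
p_x = 0.45          # assume 45% of all fruits are round and red

# Bayes Theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)  # → 0.6
```

Seeing that X is round and red raises our confidence that it is an apple from the prior 0.30 to the posterior 0.60.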
Naïve Bayesian Classification • Also called Simple BC • Why Naïve/Simple?? • Class Conditional Independence • The effect of an attribute's value on a given class is independent of the values of the other attributes • This assumption simplifies computations
Naïve Bayesian Classification Steps Involved • Each data sample is of the form X = (x1, x2, …, xn), where xi is the value of X for attribute Ai • Suppose there are m classes Ci, i = 1, …, m. X ∈ Ci iff P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i, i.e. the BC assigns X to the class Ci having the highest posterior probability conditioned on X
Naïve Bayesian Classification The class Ci for which P(Ci|X) is maximized is called the maximum a posteriori hypothesis. From Bayes Theorem: P(Ci|X) = P(X|Ci) P(Ci) / P(X) • P(X) is constant for all classes, so only P(X|Ci) P(Ci) need be maximized • If class prior probabilities are not known, assume all classes equally likely • Otherwise estimate P(Ci) = si/s, where si is the number of training samples in class Ci and s is the total number of training samples Problem: computing P(X|Ci) directly is infeasible! (find out how you would find it and why it is infeasible)
Naïve Bayesian Classification • Naïve assumption: attribute independence ⇒ P(X|Ci) = P(x1,…,xn|Ci) = ∏k P(xk|Ci) • To classify an unknown sample X, evaluate P(X|Ci) P(Ci) for each class Ci. Sample X is assigned to the class Ci iff P(X|Ci) P(Ci) > P(X|Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i
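The decision rule above can be sketched directly: multiply the class prior by the per-attribute conditionals and keep the class with the largest product. The data structures (`priors`, `cond_probs`) and the toy two-class, one-attribute example are assumptions made for this sketch, not part of the slides:

```python
import math

def naive_bayes_classify(x, priors, cond_probs):
    """Return the class Ci maximizing P(Ci) * prod_k P(x_k | Ci).

    priors:     {class: P(Ci)}
    cond_probs: {class: list, one dict {value: P(x_k=value | Ci)}
                 per attribute k} (hypothetical layout for this sketch)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = prior
        for k, value in enumerate(x):
            score *= cond_probs[c][k][value]  # naive independence step
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy example: one attribute with values "a"/"b", two classes
priors = {"Y": 0.6, "N": 0.4}
cond_probs = {"Y": [{"a": 0.8, "b": 0.2}],
              "N": [{"a": 0.3, "b": 0.7}]}
print(naive_bayes_classify(("a",), priors, cond_probs))  # → Y
print(naive_bayes_classify(("b",), priors, cond_probs))  # → N
```

Note that, because P(X) is dropped, the scores are unnormalized; only their order matters for classification.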
Naïve Bayesian Classification EXAMPLE
Naïve Bayesian Classification EXAMPLE X = (age<=30, income=MEDIUM, student=Y, credit_rating=FAIR, buys_comp=???) We need to maximize P(X|Ci) P(Ci) for i = 1, 2. P(Ci) is estimated from the training sample: P(buys_comp=Y) = 9/14 = 0.643 P(buys_comp=N) = 5/14 = 0.357 How to calculate P(X|Ci) P(Ci) for i = 1, 2? P(X|Ci) = P(x1, x2, x3, x4|Ci) = ∏k P(xk|Ci)
Naïve Bayesian Classification EXAMPLE P(age<=30 | buys_comp=Y)=2/9=0.222 P(age<=30 | buys_comp=N)=3/5=0.600 P(income=medium | buys_comp=Y)=4/9=0.444 P(income=medium | buys_comp=N)=2/5=0.400 P(student=Y | buys_comp=Y)=6/9=0.667 P(student=Y | buys_comp=N)=1/5=0.200 P(credit_rating=FAIR | buys_comp=Y)=6/9=0.667 P(credit_rating=FAIR | buys_comp=N)=2/5=0.400
Naïve Bayesian Classification EXAMPLE P(X | buys_comp=Y)=0.222*0.444*0.667*0.667=0.044 P(X | buys_comp=N)=0.600*0.400*0.200*0.400=0.019 P(X | buys_comp=Y)P(buys_comp=Y) = 0.044*0.643=0.028 P(X | buys_comp=N)P(buys_comp=N) = 0.019*0.357=0.007 Since 0.028 > 0.007, CONCLUSION: X buys a computer (buys_comp=Y)
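The whole worked example can be reproduced in a few lines, using exactly the counts read off the slides:

```python
# Class priors from the 14-tuple training set on the slides
p_y = 9 / 14  # P(buys_comp=Y)
p_n = 5 / 14  # P(buys_comp=N)

# P(x_k | Ci) for age<=30, income=medium, student=Y, credit_rating=FAIR
likelihood_y = (2/9) * (4/9) * (6/9) * (6/9)   # ≈ 0.044
likelihood_n = (3/5) * (2/5) * (1/5) * (2/5)   # ≈ 0.019

score_y = likelihood_y * p_y   # ≈ 0.028
score_n = likelihood_n * p_n   # ≈ 0.007

print("buys computer" if score_y > score_n else "does not buy")  # → buys computer
```

Working with the exact fractions (rather than the rounded three-decimal figures on the slide) gives the same decision.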
Naïve Bayes Classifier: Issues • Probability values can be ZERO! • Recall what you observed in WEKA! • What if Ak is continuous valued? • Recall what you observed in WEKA! If there are no tuples in the training set corresponding to students for the class buys_comp=N, then P(student=Y | buys_comp=N) = 0 Implications? Solution?
Naïve Bayes Classifier: Issues • Laplacian Correction or Laplace Estimator • Philosophy – we assume the training data set is so large that adding one to each count we need makes only a negligible difference in the estimated probability value • Example: D has 1000 tuples in class buys_comp=Y income=low – 0 tuples income=medium – 990 tuples income=high – 10 tuples Without Laplacian correction the probabilities are 0, 0.990, and 0.010 With Laplacian correction: 1/1003 = 0.001, 991/1003 = 0.988, and 11/1003 = 0.011 respectively
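The correction on this slide can be checked directly: add 1 to each count and add the number of distinct values (here 3) to the denominator. A minimal sketch using the slide's counts:

```python
# Counts of income within class buys_comp=Y (from the slide's example)
counts = {"low": 0, "medium": 990, "high": 10}
total = sum(counts.values())  # 1000

# Without correction: P(income=low | Y) = 0 wipes out the whole product
raw = {v: c / total for v, c in counts.items()}

# Laplacian correction: +1 to each count, +|distinct values| to the total
k = len(counts)  # 3 distinct income values
smoothed = {v: (c + 1) / (total + k) for v, c in counts.items()}
print(smoothed)  # {'low': 1/1003, 'medium': 991/1003, 'high': 11/1003}
```

The smoothed estimates are nonzero, still sum to 1, and differ only negligibly from the raw ones for the well-populated values.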
Naïve Bayes Classifier: Issues • Continuous attributes need more work than categorical attributes! • A continuous attribute is typically assumed to have a Gaussian distribution with mean μ and standard deviation σ • Do it yourself! And cross-check with WEKA!
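Under the Gaussian assumption, P(xk|Ci) for a continuous attribute is read off the normal density with the class's sample mean and standard deviation. A minimal sketch (the μ = 38, σ = 12 figures for age are invented for illustration, not taken from the slides' data set):

```python
import math

def gaussian_density(x, mu, sigma):
    """Normal density used as P(x | Ci) for a continuous attribute,
    with mu and sigma estimated from the class's training tuples."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# e.g. age within class buys_comp=Y, with hypothetical mu=38, sigma=12
print(gaussian_density(30, 38, 12))
```

This value is a density, not a probability, but it plugs into the product ∏k P(xk|Ci) in the same way, since only the relative scores across classes matter.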
Naïve Bayes (Summary) • Robust to isolated noise points • Handles missing values by ignoring the instance during probability estimation • Robust to irrelevant attributes • The independence assumption may not hold for some attributes • Use other techniques, such as Bayesian Belief Networks (BBN), in that case
Probability Calculations No. of attributes = 4 Distinct values per attribute = 3, 3, 3, 3 No. of classes = 2 Total no. of probability calculations in NBC = 4*3*2 = 24 What if conditional independence were not assumed? O(k^p) joint value combinations for p k-valued attributes, multiplied by m classes
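The two counts above can be checked with a few lines of arithmetic for this slide's data set (4 three-valued attributes, 2 classes):

```python
n_attrs, values_per_attr, n_classes = 4, 3, 2

# Naive Bayes: one estimate P(x_k=v | Ci) per (attribute, value, class)
nbc = n_attrs * values_per_attr * n_classes
print(nbc)  # → 24

# Without conditional independence: one estimate P(x1,...,x4 | Ci) per
# joint value combination and class, i.e. k^p * m
full_joint = (values_per_attr ** n_attrs) * n_classes
print(full_joint)  # → 162
```

Even on this tiny schema the full joint needs 162 estimates versus 24, and the gap grows exponentially in the number of attributes, which is exactly why the naive assumption is made.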