Computational Intelligence in Biomedical and Health Care Informatics, HCA 590 (Topics in Health Sciences). Rohit Kate. Naïve Bayes Model.

Presentation Transcript


  1. Computational Intelligence in Biomedical and Health Care Informatics, HCA 590 (Topics in Health Sciences). Rohit Kate. Naïve Bayes Model

  2. Reading Paper: Constructing naïve Bayesian classifiers for veterinary medicine: A case study in the clinical diagnosis of classical swine fever, P. L. Geenen, L. C. van der Gaag, W. L. A. Loeffen, A. R. W. Elbers, Research in Veterinary Science 91 (2011) 64-70 (skip 2.3.2)

  3. Estimating Joint Probability Distributions is Not Easy :-( • Suppose prostate cancer control (2 states) depends on: • Tumor Dose (5 states) • PSA (6 states) • Gleason (4 states) • T Stage (3 states) • Lymph Node Dose (2 states) • Local Tumor Control (2 states) • Lymph Node Control (2 states) • Total entries in the joint probability table: 5*6*4*3*2*2*2*2 = 5,760 • Not easy to estimate from clinical trials!
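
A minimal Python sketch (not from the slides) that recomputes the size of the joint probability table from the state counts listed above:

```python
# Count the entries a full joint probability table would need for the
# prostate cancer example above (state counts taken from the slide).
from math import prod

state_counts = {
    "Prostate cancer control": 2,
    "Tumor Dose": 5,
    "PSA": 6,
    "Gleason": 4,
    "T Stage": 3,
    "Lymph Node Dose": 2,
    "Local Tumor Control": 2,
    "Lymph Node Control": 2,
}

print(prod(state_counts.values()))  # 5760 entries: far too many to estimate from clinical trials
```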

  4. Estimating Joint Probability Distributions • Simplifying assumptions are made about the joint probability distribution to reduce the number of parameters to estimate • Letting the random variables be nodes of a graph, there are two major types of simplification, represented as: • Directed probabilistic graphical models • Simplest: Naïve Bayes model • General: Bayesian networks • Undirected probabilistic graphical models • Simplest: Maximum entropy model • General: Markov networks

  5. Directed Graphical Models • Also known as Bayesian networks or Bayes nets • Simplification assumption: Some random variables are conditionally independent of others given values for some other random variables • Simplest directed graphical model: Naïve Bayesian or Naïve Bayes • Naïve Bayes assumption: The features (inputs) are conditionally independent given the category (output) • Very simplistic assumption, hence called “naïve”

  6. Probabilistic Classification and Naïve Bayes • Suppose X1,...,Xn are inputs and Y is an output • We want to predict Y given all inputs, i.e. P(Y|X1,...,Xn), for example, P(Label|Shape,Color) • Naïve Bayes Assumption: The features are conditionally independent given the category: P(X1,...,Xn|Y) = P(X1|Y)*P(X2|Y)*...*P(Xn|Y) • How do we estimate P(Y|X1,X2,...,Xn) from this? • Bayes’ theorem lets us calculate P(B|A) in terms of P(A|B)

  7. Recall Bayes’ Theorem: P(B|A) = P(A|B)*P(B) / P(A) • Simple proof from the definition of conditional probability: P(B|A) = P(A,B)/P(A) (def. cond. prob.) and P(A|B) = P(A,B)/P(B) (def. cond. prob.), so P(A,B) = P(A|B)*P(B); substituting into the first equation gives P(B|A) = P(A|B)*P(B)/P(A). QED

  8. Naïve Bayes Model • From Bayes’ Theorem: P(Y|X1,...,Xn) = P(X1,...,Xn|Y)*P(Y) / P(X1,...,Xn) • Computing the marginal in the denominator (by the definition of conditional probability): P(X1,...,Xn) = Σy P(X1,...,Xn|Y=y)*P(Y=y), so P(Y|X1,...,Xn) = P(X1,...,Xn|Y)*P(Y) / Σy P(X1,...,Xn|Y=y)*P(Y=y)

  9. Naïve Bayes Model • Applying the naïve Bayes assumption to the previous expression: P(Y|X1,...,Xn) = P(Y)*P(X1|Y)*...*P(Xn|Y) / Σy P(Y=y)*P(X1|Y=y)*...*P(Xn|Y=y)

  10. Naïve Bayes Model • We only need to estimate P(Y) and P(Xi|Y) for all i • Assuming Y and all the Xi are binary, only 2n+1 parameters instead of 2^(n+1) - 1 parameters: a dramatic reduction (for n=9, a reduction from 1023 to 19!) • Directed graphical model representation: [Figure: Y (with P(Y)) is the parent of X1, X2, X3, ..., Xn, each annotated with P(Xi|Y); illustrated with Lightning as Y and Rain, Thunder as features]
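
A small sketch, assuming binary Y and binary features X1..Xn, that contrasts the two parameter counts quoted above:

```python
# Parameter counts for n binary features and a binary class Y.
def full_joint_parameters(n):
    # The full joint table over (Y, X1..Xn) has 2^(n+1) entries;
    # one is determined by the others because the entries sum to 1.
    return 2 ** (n + 1) - 1

def naive_bayes_parameters(n):
    # One parameter for P(Y), plus P(Xi|Y=0) and P(Xi|Y=1) for each feature.
    return 2 * n + 1

for n in (3, 9, 20):
    print(n, full_joint_parameters(n), naive_bayes_parameters(n))
# n=9 gives 1023 versus 19, the reduction quoted on the slide
```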

  11. Naïve Bayes Example • [Figure: Y (with P(Y)) is the parent of Size, Color, and Shape, with conditional tables P(Size|Y), P(Color|Y), P(Shape|Y)]

  12. Naïve Bayes Example P(Label|Size,Color,Shape) Test Instance: <medium, red, circle> What is the probability of it being positive?

  13. Naïve Bayes Example (relevant part of the probability table) Test Instance: <medium, red, circle>
  P(positive|medium,red,circle) = P(medium,red,circle|positive)*P(positive) / P(medium,red,circle) = P(positive)*P(medium|positive)*P(red|positive)*P(circle|positive) / P(medium,red,circle)
  P(medium,red,circle) = P(medium,red,circle|positive)*P(positive) + P(medium,red,circle|negative)*P(negative) = P(positive)*P(medium|positive)*P(red|positive)*P(circle|positive) + P(negative)*P(medium|negative)*P(red|negative)*P(circle|negative) = 0.5*0.1*0.9*0.9 + 0.5*0.2*0.3*0.3 = 0.0405 + 0.009 = 0.0495
  P(positive|medium,red,circle) = 0.5*0.1*0.9*0.9 / 0.0495 = 0.0405/0.0495 = 0.8182
  P(negative|medium,red,circle) = 0.5*0.2*0.3*0.3 / 0.0495 = 0.009/0.0495 = 0.1818
  Hence the test example is more likely to be positive.
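
The same computation as a short Python sketch; the priors and conditional probabilities below are the ones implied by the slide's arithmetic (P(positive)=P(negative)=0.5, P(medium|positive)=0.1, P(red|positive)=P(circle|positive)=0.9, P(medium|negative)=0.2, P(red|negative)=P(circle|negative)=0.3):

```python
# Recompute the posterior for <medium, red, circle> under the naive Bayes assumption.
from math import prod

priors = {"positive": 0.5, "negative": 0.5}
cond = {
    "positive": {"medium": 0.1, "red": 0.9, "circle": 0.9},
    "negative": {"medium": 0.2, "red": 0.3, "circle": 0.3},
}
features = ["medium", "red", "circle"]

# Unnormalized scores P(Y) * prod_i P(Xi|Y)
scores = {y: priors[y] * prod(cond[y][f] for f in features) for y in priors}
evidence = sum(scores.values())                    # P(medium, red, circle) = 0.0495
posteriors = {y: s / evidence for y, s in scores.items()}
print(posteriors)                                  # positive ~ 0.818, negative ~ 0.182
```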

  14. Naïve Bayes Example • [Figure: Prostate cancer (PC), with its prior or pre-test probability, is the parent of two test results, High PSA (HPSA) and Abnormal Free PSA (FPSA); each test is annotated with its sensitivity and specificity] • Independence assumption: Given presence or absence of prostate cancer, the results of the two tests are independent. • This is all the information needed to compute, for example, P(PC=Present|HPSA=Positive,FPSA=Positive), called the posterior or post-test probability.
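
A sketch of the post-test calculation this slide sets up; the prior, sensitivities, and specificities below are hypothetical placeholders, since the figure's actual numbers are not preserved in this transcript:

```python
# Post-test probability of prostate cancer given two positive test results,
# under the naive Bayes independence assumption. All numbers are made up.
prior = 0.10                               # pre-test probability P(PC = Present)
sens = {"HPSA": 0.80, "FPSA": 0.70}        # P(test positive | PC present)
spec = {"HPSA": 0.75, "FPSA": 0.85}        # P(test negative | PC absent)

p_present = prior * sens["HPSA"] * sens["FPSA"]
p_absent = (1 - prior) * (1 - spec["HPSA"]) * (1 - spec["FPSA"])

post_test = p_present / (p_present + p_absent)     # posterior / post-test probability
print(round(post_test, 3))                         # ~0.62 with these placeholder numbers
```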

  15. Estimating P(Y) and P(X|Y) • Normally, probabilities are estimated from observed frequencies in the training data. • If the data contain n examples and nk examples belong to category yk, then P(Y=yk) = nk / n • If nijk of these nk examples have the jth value xij for feature Xi, then P(Xi=xij | Y=yk) = nijk / nk • Although intuitive, why is dividing this way the best estimate of the probability? • These are called maximum likelihood estimates: they maximize the probability of observing the given examples
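
A minimal sketch of these frequency-count (maximum likelihood) estimates on a tiny, made-up training set:

```python
# Maximum likelihood estimates of P(Y) and P(Xi|Y) from raw counts.
from collections import Counter, defaultdict

# Hypothetical training examples: (size, color, shape) -> label
examples = [
    (("small", "red", "circle"), "positive"),
    (("large", "red", "circle"), "positive"),
    (("small", "red", "triangle"), "negative"),
    (("large", "blue", "square"), "negative"),
]

n = len(examples)
class_counts = Counter(label for _, label in examples)      # n_k
feature_counts = defaultdict(Counter)                       # n_ijk
for features, label in examples:
    for i, value in enumerate(features):
        feature_counts[label][(i, value)] += 1

p_y = {y: n_k / n for y, n_k in class_counts.items()}                     # P(Y = y_k) = n_k / n
p_x_given_y = {y: {fv: c / class_counts[y] for fv, c in counts.items()}   # P(Xi = x_ij | Y = y_k) = n_ijk / n_k
               for y, counts in feature_counts.items()}
print(p_y)
print(p_x_given_y["positive"])
```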

  16. Maximum Likelihood Estimate • P(A teenager will drink and drive) = (# of teenagers who drink and drive) / (# of teenagers) • Suppose that out of a sample of 5 teenagers, one drinks and drives: P(A teenager will drink and drive) = 1/5 • This estimate can be proven to be the maximum likelihood estimate (MLE) because it maximizes the probability of generating the sampled data • Any other probability value gives a lower probability to the sampled data

  17. Maximum Likelihood Estimates • If P(TDD) = 1/5 then P(not TDD) = 4/5 • Probability of the sampled data, assuming each teenager is independent: P(Data) = (1/5)*(4/5)^4 = 0.08192 • P(Data) is lower for any other value of P(TDD): • For P(TDD) = 1/6: P(Data) = (1/6)*(5/6)^4 = 0.0804 • For P(TDD) = 1/4: P(Data) = (1/4)*(3/4)^4 = 0.0791 • This can also be shown mathematically • Hence simple frequency counts are not only intuitive but also theoretically the best probability estimates
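
A short sketch that reproduces this comparison by evaluating the likelihood of the sample at different candidate values of P(TDD):

```python
# Likelihood of observing 1 drinker-and-driver out of 5 independent teenagers,
# as a function of the assumed probability p; it is maximized at p = 1/5.
def likelihood(p, successes=1, total=5):
    return (p ** successes) * ((1 - p) ** (total - successes))

for p in (1/6, 1/5, 1/4):
    print(round(p, 3), round(likelihood(p), 5))
# 0.167 0.08038 / 0.2 0.08192 / 0.25 0.0791 -- the frequency count 1/5 wins
```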

  18. Probability Estimation Example • Test Instance: <medium, red, circle> • P(positive | medium, red, circle) = 0.5 * 0.0 * 1.0 * 1.0 / P(medium, red, circle) = 0? • P(negative | medium, red, circle) = 0.5 * 0.0 * 0.5 * 0.5 / P(medium, red, circle) = 0? • Because "medium" was never observed with either class in the small training sample, both estimates become zero; this motivates smoothing (next slide)

  19. Smoothing • To account for estimation from small samples, probability estimates are adjusted or smoothed. • Laplace smoothing using an m-estimate assumes that each feature is given a prior probability, p, that is assumed to have been previously observed in a “virtual” sample of size m. • For binary features, p is simply assumed to be 0.5.

  20. Laplace Smoothing Example • Assume the training set contains 10 positive examples: 4 small, 0 medium, 6 large • Estimate parameters as follows (with m=1, p=1/3): P(small | positive) = (4 + 1/3) / (10 + 1) = 0.394, P(medium | positive) = (0 + 1/3) / (10 + 1) = 0.03, P(large | positive) = (6 + 1/3) / (10 + 1) = 0.576
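
The same m-estimate computation as a sketch; note that the unobserved "medium" value no longer gets a zero probability:

```python
# Laplace-smoothed (m-estimate) probabilities for the Size feature,
# reproducing the numbers on the slide (m = 1, p = 1/3 for three possible values).
counts = {"small": 4, "medium": 0, "large": 6}    # counts among the 10 positive examples
n_positive = sum(counts.values())
m, p = 1, 1 / 3                                   # virtual sample size and prior

smoothed = {value: (c + m * p) / (n_positive + m) for value, c in counts.items()}
print({k: round(v, 3) for k, v in smoothed.items()})
# {'small': 0.394, 'medium': 0.03, 'large': 0.576}
```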

  21. Reading: An Application of Naïve Bayes in Medicine Paper: Constructing naïve Bayesian classifiers for veterinary medicine: A case study in the clinical diagnosis of classical swine fever, P. L. Geenen, L. C. van der Gaag, W. L. A. Loeffen, A. R. W. Elbers, Research in Veterinary Science 91 (2011) 64-70 (skip 2.3.2) • Why this paper? It demonstrates a real-world application of the naïve Bayes model in medicine, and it is recent and easy to understand • Applies the naïve Bayes model to predict classical swine fever in herds • Uses 32 binary features (symptoms shown by one or more pigs in a herd) • Probabilities are estimated using data collected from another study • The naïve Bayes model performs as well as manually constructed diagnostic rules for predicting classical swine fever

  22. Comments on Naïve Bayes • Tends to work well in practice despite the naïve assumption of conditional independence. • Although it does not produce accurate probability estimates when its independence assumptions are violated, it may still pick the correct maximum-probability class in many cases. • It is a machine learning method and is also available in Weka.
