Bayes’ Theorem

Let’s consider an example. Say you have 31 people who play golf. One way to divide up the people is to put them in groups based on whether they hit a golf ball from the “right handed” side or the “left handed” side. Say X1 is the group label for the right handed duffers and X2 is the label for the left handed duffers. Another way to divide them up is by whether or not they wear a golf glove. Say Y1 is for those who wear a golf glove and Y2 is for those who do not wear one when they play. Say the following crosstabulation, or contingency table, results for the 31 people.

|       | Y1 | Y2 | Total |
|-------|----|----|-------|
| X1    | 5  | 12 | 17    |
| X2    | 10 | 4  | 14    |
| Total | 15 | 16 | 31    |

From this table we can say something about a variety of probabilities: 20, in fact.

Now, from this table you could very naturally consider the following:

1) Marginal probabilities (using the relative frequency approach to probability, like the prob of X1).
P(X1) = 17/31, P(X2) = 14/31, P(Y1) = 15/31, P(Y2) = 16/31

2) Conditional probabilities (like the probability of X1 given that Y1 has occurred).
In columns (like what is the prob of being X1 given you have a Y1?):
P(X1|Y1) = 5/15, P(X1|Y2) = 12/16, P(X2|Y1) = 10/15, P(X2|Y2) = 4/16
In rows:
P(Y1|X1) = 5/17, P(Y2|X1) = 12/17, P(Y1|X2) = 10/14, P(Y2|X2) = 4/14
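The marginal and conditional probabilities can be checked with a few lines of Python. This is only a minimal sketch: the `counts` dictionary and the other variable names are illustrative assumptions, not something from the slides. Exact fractions are used so the results can be compared with the hand calculations.

```python
from fractions import Fraction

# Counts from the crosstabulation: (handedness group, glove group) -> golfers
counts = {("X1", "Y1"): 5, ("X1", "Y2"): 12,
          ("X2", "Y1"): 10, ("X2", "Y2"): 4}
total = sum(counts.values())  # grand total of 31

# Row totals (X groups) and column totals (Y groups)
row_tot = {x: sum(n for (r, _), n in counts.items() if r == x) for x in ("X1", "X2")}
col_tot = {y: sum(n for (_, c), n in counts.items() if c == y) for y in ("Y1", "Y2")}

# Marginal probabilities: group total divided by the grand total
p_x = {x: Fraction(n, total) for x, n in row_tot.items()}
p_y = {y: Fraction(n, total) for y, n in col_tot.items()}

# Conditional probabilities: cell count divided by the column or row total
p_x_given_y = {(x, y): Fraction(n, col_tot[y]) for (x, y), n in counts.items()}
p_y_given_x = {(y, x): Fraction(n, row_tot[x]) for (x, y), n in counts.items()}

print(p_x["X2"], p_x_given_y[("X1", "Y2")], p_y_given_x[("Y2", "X1")])
```

Note that `Fraction` reduces to lowest terms, so 12/16 is reported as 3/4.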

3) Intersections (like what is the prob of X1 and Y1).
P(X1 ⋂ Y1) = 5/31 (which could have been written P(Y1 ⋂ X1)), P(X2 ⋂ Y1) = 10/31, P(X1 ⋂ Y2) = 12/31, P(X2 ⋂ Y2) = 4/31

4) Unions (like what is the prob of X1 or Y1).
P(X1 ⋃ Y1) = (5 + 12 + 10)/31 = (17 + 15 - 5)/31 = 27/31
P(X2 ⋃ Y1) = (10 + 4 + 5)/31 = (14 + 15 - 10)/31 = 19/31
P(X1 ⋃ Y2) = (5 + 12 + 4)/31 = (17 + 16 - 12)/31 = 21/31
P(X2 ⋃ Y2) = (10 + 4 + 12)/31 = (14 + 16 - 4)/31 = 26/31

Each union is computed two ways. The first way seems more natural: just add the relevant boxes and divide by the grand total. The second way says add the row and column totals and subtract the intersection, so you only count the overlap once.
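The two ways of getting a union can be compared in code. The sketch below is illustrative only: the function names `p_union_by_cells` and `p_union_by_rule` are made up for this example.

```python
from fractions import Fraction

# Counts from the crosstabulation: (handedness group, glove group) -> golfers
counts = {("X1", "Y1"): 5, ("X1", "Y2"): 12,
          ("X2", "Y1"): 10, ("X2", "Y2"): 4}
total = sum(counts.values())

def p_union_by_cells(x, y):
    """Add every box that lies in row x or column y, then divide by the total."""
    n = sum(v for (r, c), v in counts.items() if r == x or c == y)
    return Fraction(n, total)

def p_union_by_rule(x, y):
    """Add the row total and column total, subtract the overlap counted twice."""
    row = sum(v for (r, _), v in counts.items() if r == x)
    col = sum(v for (_, c), v in counts.items() if c == y)
    return Fraction(row + col - counts[(x, y)], total)

for x in ("X1", "X2"):
    for y in ("Y1", "Y2"):
        print(x, y, p_union_by_cells(x, y), p_union_by_rule(x, y))
```

Both functions should print the same fraction for each pair, which is the point of the comparison.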

Now, let’s take the crosstabulation and construct the joint probability table by dividing by the grand total of 31.

|       | Y1    | Y2    | Total |
|-------|-------|-------|-------|
| X1    | 5/31  | 12/31 | 17/31 |
| X2    | 10/31 | 4/31  | 14/31 |
| Total | 15/31 | 16/31 | 31/31 |

On the next slide I show what this would be like in general terms. You will note that in the joint probability table we have marginal probabilities and intersections.

|        | Y1         | Y2         | Totals |
|--------|------------|------------|--------|
| X1     | P(X1 ⋂ Y1) | P(X1 ⋂ Y2) | P(X1)  |
| X2     | P(X2 ⋂ Y1) | P(X2 ⋂ Y2) | P(X2)  |
| Totals | P(Y1)      | P(Y2)      | 1.00   |

Again, from the table we have intersections and marginal probabilities already calculated. To get conditional probabilities and unions from this table we just follow the rules.

For conditional probabilities, like the prob of X1 given Y1, we have
P(X1|Y1) = P(X1 ⋂ Y1)/P(Y1) = (5/31)/(15/31) = 5/15.

For unions, like the prob of X1 or Y1, we have
P(X1 ⋃ Y1) = P(X1) + P(Y1) - P(X1 ⋂ Y1) = 17/31 + 15/31 - 5/31 = 27/31.
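As a rough illustration of "following the rules", the sketch below starts from the joint probabilities alone (no raw counts) and recovers a marginal, a conditional, and a union. The helper names are hypothetical, chosen only for this example.

```python
from fractions import Fraction as F

# Joint probability table for the golfers: P(X ⋂ Y) for each cell
joint = {("X1", "Y1"): F(5, 31), ("X1", "Y2"): F(12, 31),
         ("X2", "Y1"): F(10, 31), ("X2", "Y2"): F(4, 31)}

def marginal_x(x):
    """P(X): add the joint probabilities across row x."""
    return sum(p for (r, _), p in joint.items() if r == x)

def marginal_y(y):
    """P(Y): add the joint probabilities down column y."""
    return sum(p for (_, c), p in joint.items() if c == y)

def cond_x_given_y(x, y):
    """P(X|Y) = P(X ⋂ Y) / P(Y)."""
    return joint[(x, y)] / marginal_y(y)

def union(x, y):
    """P(X ⋃ Y) = P(X) + P(Y) - P(X ⋂ Y)."""
    return marginal_x(x) + marginal_y(y) - joint[(x, y)]

print(marginal_y("Y2"), cond_x_given_y("X1", "Y1"), union("X1", "Y1"))
```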

Notice on slide 5 (the joint probability table for the golfers) that, for example, in the Y2 column 12/31 + 4/31 = 16/31. On slide 6 (the same table in general terms) this would be seen as P(X1 ⋂ Y2) + P(X2 ⋂ Y2) = P(Y2).

In this example we have seen a lot of information. In fact, you know the raw data and you have seen how we calculate many types of probability. Bayes’ Theorem is just a statement that if we know only some of this information, we can still calculate some of the other parts from what we know. As an example, say we know P(X1) = 17/31, P(X2) = 14/31, P(Y2|X1) = 12/17, and P(Y2|X2) = 4/14. From this we can calculate P(X1 ⋂ Y2), P(X2 ⋂ Y2), P(Y2), P(X1|Y2), and P(X2|Y2).

|        | Y1         | Y2         | Totals |
|--------|------------|------------|--------|
| X1     | P(X1 ⋂ Y1) | P(X1 ⋂ Y2) | P(X1)  |
| X2     | P(X2 ⋂ Y1) | P(X2 ⋂ Y2) | P(X2)  |
| Totals | P(Y1)      | P(Y2)      | 1.00   |

So, say we know P(X1), P(X2), P(Y2|X1), P(Y2|X2). In this context P(X1) and P(X2) are called prior probabilities. P(Y2|X1) and P(Y2|X2) would be considered additional information that has to be obtained from some source. Note the additional information is about Y2, conditionally.

From the definition of conditional probability we can calculate what is under the Y2 column above, and then we can calculate the conditional probabilities P(X1|Y2) and P(X2|Y2), here called posterior probabilities. Posterior probabilities are just conditional probabilities, but we do not construct them from raw data because we do not have the raw data. But we have gained enough information to exploit what is known about the relationships that exist in such a table.
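Written as a single formula, the route from priors and known conditionals to a posterior is the usual two-group statement of Bayes’ Theorem. The slides never write it out in this form, so take it as a compact summary in the same symbols rather than the presenter’s own notation:

```latex
P(X_1 \mid Y_2)
  = \frac{P(X_1 \cap Y_2)}{P(Y_2)}
  = \frac{P(Y_2 \mid X_1)\,P(X_1)}
         {P(Y_2 \mid X_1)\,P(X_1) + P(Y_2 \mid X_2)\,P(X_2)}
```

The numerator is the X1 cell of the Y2 column, and the denominator is the Y2 column total. P(X2|Y2) follows by swapping X1 and X2 in the numerator.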

Let’s think about an example. Say that in a group of golfers 17/31 are right handed and 14/31 are left handed. Also say that of those who are right handed, 12/17 play without a glove, while 4/14 of those who are left handed play without a glove. Of those who play without a glove, what percentage are right handed and what percentage are left handed?

So, overall we know P(X1) and P(X2) (the prior probabilities), but we want to know P(X1|Y2) and P(X2|Y2) (the posterior probabilities). As an intermediate step we have to find what goes under column Y2 in the table on the previous slide. To do this, remember that conditional probability is defined as P(B|A) = P(B ⋂ A)/P(A), so P(A ⋂ B) = P(B|A)P(A).

From the example, P(X1) = 17/31, P(X2) = 14/31, P(Y2|X1) = 12/17, and P(Y2|X2) = 4/14. Thus
P(X1 ⋂ Y2) = P(Y2|X1)P(X1) = (12/17)(17/31) = 12/31,
P(X2 ⋂ Y2) = P(Y2|X2)P(X2) = (4/14)(14/31) = 4/31,
P(Y2) = P(X1 ⋂ Y2) + P(X2 ⋂ Y2) = 12/31 + 4/31 = 16/31.
Thus,
P(X1|Y2) = P(X1 ⋂ Y2)/P(Y2) = (12/31)/(16/31) = 12/16, and
P(X2|Y2) = P(X2 ⋂ Y2)/P(Y2) = (4/31)/(16/31) = 4/16.

WOW, I need to put this away and go get a drink - - of water!
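The golfer calculation can be reproduced step by step in Python with exact fractions. This is a minimal sketch; the dictionary names (`prior`, `no_glove_given`, and so on) are assumptions made for the illustration.

```python
from fractions import Fraction as F

# Known information: priors and the conditionals about Y2 (no glove)
prior = {"X1": F(17, 31), "X2": F(14, 31)}          # P(X1), P(X2)
no_glove_given = {"X1": F(12, 17), "X2": F(4, 14)}  # P(Y2|X1), P(Y2|X2)

# Joint probabilities: P(X ⋂ Y2) = P(Y2|X) P(X)
joint = {x: no_glove_given[x] * prior[x] for x in prior}

# Marginal probability: P(Y2) is the sum of the Y2 column
p_y2 = sum(joint.values())

# Posterior probabilities: P(X|Y2) = P(X ⋂ Y2) / P(Y2)
posterior = {x: joint[x] / p_y2 for x in prior}

for x in prior:
    print(x, "joint:", joint[x], "posterior:", posterior[x])
print("P(Y2) =", p_y2)
```

The posteriors print in lowest terms, 3/4 and 1/4, which are the same values as 12/16 and 4/16.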

How can I summarize what I have done here? Well,

1) I had prior probabilities (marginal probs),
2) Multiplied the priors by known conditionals to get joint probs,
3) Added the joint probs to get a marginal prob,
4) Calculated new conditionals by taking the calculated joint probabilities and dividing by the calculated marginal prob.

Let’s do another example. Say A1 = parts to a company come from supplier 1, with P(A1) = .65 (65% of parts come from supplier 1), and A2 = parts to the company come from supplier 2, with P(A2) = .35 (35% of parts come from supplier 2).

Now, say that given the part is from supplier 1 the part is good (G) 98 percent of the time, and given that it is from supplier 2 the part is good 95 percent of the time. The joint probabilities are (by multiplication) P(A1 ⋂ G) = .65(.98) = .637 and P(A2 ⋂ G) = .35(.95) = .3325, and the marginal probability is found by adding these two together. So, P(G) = .65(.98) + .35(.95) = .637 + .3325 = .9695.

Then get the new conditional probabilities by division:
P(A1|G) = .65(.98)/[(.65(.98)) + (.35(.95))] = .637/.9695 = .657, and
P(A2|G) = .35(.95)/[(.65(.98)) + (.35(.95))] = .3325/.9695 = .343.

So, what does this all mean? We started out knowing that 65% of parts come from supplier 1. We find that 96.95% of the parts that are received are good. Of those that are good, 65.7% are from supplier 1 and 34.3% are from supplier 2.
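The multiply, add, divide recipe can be wrapped in one small function and applied to the supplier numbers. The function name `bayes_posteriors` and its arguments are hypothetical, chosen for this sketch; it assumes the groups are mutually exclusive and that their priors sum to 1.

```python
def bayes_posteriors(priors, likelihoods):
    """Return (marginal, posteriors) for one observed event.

    priors:      {group: P(group)}
    likelihoods: {group: P(event | group)}
    """
    # Multiply each prior by its known conditional to get the joint probs
    joints = {g: priors[g] * likelihoods[g] for g in priors}
    # Add the joint probs to get the marginal prob of the event
    marginal = sum(joints.values())
    # Divide each joint prob by the marginal to get the new conditionals
    posteriors = {g: j / marginal for g, j in joints.items()}
    return marginal, posteriors

priors = {"A1": 0.65, "A2": 0.35}       # share of parts from each supplier
likelihoods = {"A1": 0.98, "A2": 0.95}  # P(G | supplier)

p_good, post = bayes_posteriors(priors, likelihoods)
print(round(p_good, 4), round(post["A1"], 3), round(post["A2"], 3))
```

Plain floats are used here because the inputs are given as decimals, so the results are rounded before comparing them with the hand calculation.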
