Learn about conditional, joint, marginal, and independent probabilities in business statistics. Understand how to calculate probabilities, such as the chance a building catches fire and is completely destroyed. Practice exercises included.
 
                
Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 6, Monday February 3, 2014
Agenda & Announcement • Today: • Chapter 8. • Start Chapter 9. • Read all of 8 and 9. • HW 2 is due on Wednesday February 12th, 5 pm. BUAD 310 - Kam Hamidieh
FYI See: http://www.meetup.com/ticketing/ticket_printable/?event_id=162171692
From Last Time • For any events A and B defined on S: • 0 ≤ P(A) ≤ 1 • P(S) = 1 (and P(empty set) = 0) • If A and B are disjoint, then P(A or B) = P(A ∪ B) = P(A) + P(B) • Using the above, you can also show that P(A) + P(A^C) = 1. • General addition rule: P(A ∪ B) = P(A) + P(B) – P(A ∩ B) • Conditional probability: P(A|B) = P(A ∩ B) / P(B), P(B) > 0 • Independent events: P(A ∩ B) = P(A) × P(B), or equivalently P(A|B) = P(A) (when P(B) > 0)
In Class Exercise From Last Time A fire insurance company insures a building that they estimate would have a 0.70 probability of being completely destroyed given that it catches fire. They have accurate estimates that a building has only a 0.02 probability of experiencing a fire. What is the probability that a building catches fire and is completely destroyed by fire? Hint: apply the multiplication rule.
Solution Let: D = { Building completely destroyed } F = { Building catches fire } You have the following info: P(D|F) = 0.70 and P(F) = 0.02 You are asked to find: P( D ∩ F ) = ? Note D and F are not independent. Apply multiplication rule: P( D ∩ F ) = P(D|F) × P(F) = 0.70 × 0.02 = 0.014
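The arithmetic in this solution can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
# Values taken from the exercise statement.
p_destroyed_given_fire = 0.70  # P(D|F)
p_fire = 0.02                  # P(F)

# Multiplication rule: P(D and F) = P(D|F) * P(F)
p_destroyed_and_fire = p_destroyed_given_fire * p_fire

print(f"P(D and F) = {p_destroyed_and_fire:.3f}")
```

Note that the rule multiplies the conditional probability P(D|F), not P(D), by P(F); that is what makes it valid even though D and F are not independent.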
Conditional Probabilities in Business • Corporations, just like people, borrow money to conduct day-to-day business. • Agencies such as Moody's, S&P, and Fitch give “ratings” to companies. • S&P: AAA, AA, A, …, C. • They issue reports that are used by market participants.
From Tables to Probabilities • This contingency table summarizes two variables for visits to Amazon.com. • Two variables: Host (that sent the visitor) = { Comcast, Google, Nextag } and Purchase = { Yes, No }. • Assume the next visitor to Amazon.com behaves like a random choice from the 28,975 cases in the contingency table.
Joint Probabilities Take the counts in the contingency table and divide each by the total of 28,975. The resulting numbers are the joint probabilities of the Host and Purchase variables.
Joint Probabilities • A joint probability (distribution) gives the probability that two or more events occur together. • Example: Let Y = { Yes }, C = { Comcast }. Then P( Yes and Comcast ) = P( Y and C ) = 0.001. • Note that the joint probabilities add up to 1.
Marginal Probabilities • Marginal probabilities (distributions) give the probabilities for each value of a single variable. • Example: Let Y = { Yes }, C = { Comcast }, N = { No }. Then P( Y ) = 0.034, P( C ) = 0.010, P( N ) = 0.966. • The marginal probabilities of each variable add up to 1.
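The joint and marginal probabilities can be reproduced from the cell counts. The sketch below assumes the counts implied by the fractions quoted on the conditional probabilities slide (27 of 295 Comcast visits, 926 of 27,995 Google visits, and 29 of 685 Nextag visits ended in a purchase); the "No" counts are derived as host total minus purchases, and all names are illustrative.

```python
# Counts recovered from the fractions 27/295, 926/27,995 and 29/685.
yes_counts = {"Comcast": 27, "Google": 926, "Nextag": 29}
host_totals = {"Comcast": 295, "Google": 27_995, "Nextag": 685}

total = sum(host_totals.values())  # grand total of visits (28,975)

# Joint probabilities: each cell count divided by the grand total.
joint = {}
for host in host_totals:
    joint[(host, "Yes")] = yes_counts[host] / total
    joint[(host, "No")] = (host_totals[host] - yes_counts[host]) / total

# Marginal probabilities: sum the joint probabilities over the other variable.
p_host = {h: joint[(h, "Yes")] + joint[(h, "No")] for h in host_totals}
p_purchase = {
    ans: sum(joint[(h, ans)] for h in host_totals) for ans in ("Yes", "No")
}

print(f"P(Yes and Comcast) = {joint[('Comcast', 'Yes')]:.3f}")
print(f"P(Yes) = {p_purchase['Yes']:.3f}, P(No) = {p_purchase['No']:.3f}")
print(f"P(Comcast) = {p_host['Comcast']:.3f}")
```

The derived host totals add up to 28,975, which matches the total stated for the contingency table.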
Conditional Probabilities • Of interest to Amazon.com is the question “which host will deliver the best(?) visitors, that is those who are more likely to make a purchase?” • Find conditional probabilities to answer questions like “among visitors from Comcast, what is the chance a purchase is made?” • Let Y = { Yes }, C = { Comcast } , G = { Google }, E = { NexTag } • Compare: P(Y|C) vs. P(Y|G) vs. P(Y|E)
Conditional Probabilities P(Y|C) = 27/295 ≈ 0.09. Likewise: P(Y|G) = 926/27,995 ≈ 0.03 and P(Y|E) = 29/685 ≈ 0.04. SO WHAT? Best?
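The comparison of hosts can be sketched in Python using only the counts that appear in the fractions on this slide. The dictionary names are illustrative.

```python
# Purchases and total visits per host, as quoted in the fractions on the slide.
yes_counts = {"Comcast": 27, "Google": 926, "Nextag": 29}
host_totals = {"Comcast": 295, "Google": 27_995, "Nextag": 685}

# P(Yes | host) = (# purchases from host) / (# visits from host)
p_yes_given_host = {h: yes_counts[h] / host_totals[h] for h in host_totals}

# List hosts from highest to lowest purchase rate.
ranked = sorted(p_yes_given_host.items(), key=lambda kv: kv[1], reverse=True)
for host, p in ranked:
    print(f"P(Yes | {host}) = {p:.3f}  ({yes_counts[host]} purchases)")
```

One way to read the "Best?" question: Comcast visitors have the highest purchase rate, but Google sends far more visitors and therefore far more purchases in total, so the answer depends on what "best" is taken to mean.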
In Class Exercise 1 (N.R. Cook et al., “Long term effects of dietary sodium reduction on cardiovascular disease”, BMJ, 2007) A study was conducted to investigate the relationship between cardiovascular disease (CVD) and salt intake. The collected information is summarized below. Find the row and column totals. What are the joint probabilities? Assume we can take the above results to estimate probabilities for the general population. What is the probability of developing CVD given that an individual had low salt consumption? How about for high consumption? Comment on the difference.
Probability Trees A blood test is 99% effective in detecting a certain disease when the disease is present. However, the test also yields a false-positive result for 2% of the healthy patients tested. (That is, if a healthy person is tested, then with a probability of 0.02 the test will say that this person has the disease.) Suppose 0.5% (5 out of 1000) of the population has the disease. Given that a person has tested positive, what is the probability that this person actually has the disease? Any Guesses?
Probability Trees • We have: • P( +| disease ) = 0.99 so we also know P( - |disease ) = 1 – 0.99 = 0.01 • P( disease ) = 0.005 so we also know P( no disease ) = 1 – 0.005 = 0.995 • P( +| no disease ) = 0.02 • Note we want to find P( disease| + ). • We can use a table but a tree is a better way.
Probability Trees
Person
  Disease, P(D) = 0.005
    Positive, P(+|D) = 0.99: P(+ and D) = 0.005 × 0.99 = 0.00495
    Negative, P(-|D) = 0.01: P(- and D) = 0.005 × 0.01 = 0.00005
  No Disease, P(D^C) = 0.995
    Positive, P(+|D^C) = 0.02: P(+ and D^C) = 0.995 × 0.02 = 0.0199
    Negative, P(-|D^C) = 0.98: P(- and D^C) = 0.995 × 0.98 = 0.9751
The four joint probabilities add up to 1 (the sum is always 1).
Probability Trees (with counts for a population of 100,000)
Person (100,000 people)
  Disease, 0.005 (500 people)
    Positive, 0.99: 0.005 × 0.99 = 0.00495 (495 people)
    Negative, 0.01: 0.005 × 0.01 = 0.00005 (5 people)
  No Disease, 0.995 (99,500 people)
    Positive, 0.02: 0.995 × 0.02 = 0.0199 (1,990 people)
    Negative, 0.98: 0.995 × 0.98 = 0.9751 (97,510 people)
The four joint probabilities add up to 1 (the sum is always 1).
P( disease | positive ) = 0.00495 / (0.00495 + 0.0199) ≈ 0.1992
P( disease | positive ) = 495 / (495 + 1,990) ≈ 0.1992
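The tree calculation can be written as a small function. This is a minimal sketch rather than a general-purpose tool; the function and parameter names are hypothetical, and the inputs are the three probabilities given in the blood test problem.

```python
def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive) from the two 'positive' branches of the tree."""
    p_pos_and_disease = prevalence * sensitivity                # P(+ and D)
    p_pos_and_healthy = (1 - prevalence) * false_positive_rate  # P(+ and no D)
    # Condition on a positive result: keep only the positive branches.
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)


p_disease_given_pos = posterior_positive(
    prevalence=0.005, sensitivity=0.99, false_positive_rate=0.02
)
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")
```

Even with a 99% detection rate, roughly four out of five positive results here come from healthy people, because healthy people vastly outnumber those with the disease.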
Some Tidbits (Skip?) • The general technique used above to get P(A|B) from P(B|A) is called Bayes’ Rule. See page 190. • When a disease is rare, testing lots of people creates a lot of false positives. So what do you do?
HIV (Time Permitting & not required.) • Some terminology you may come across: • P( test positive | have disease ) is called the sensitivity of the test. This is the probability of correctly testing positive. • P( test negative | don’t have disease ) is called the specificity of the test. This is the probability of correctly testing negative. • P( have disease ) is called the disease prevalence. • P( have disease | test positive ) is called the positive predictivity. • Our problem asks: what is the positive predictivity of this test? • Go to http://www.cdc.gov/hiv/statistics/basics/ • Go to http://www.census.gov/popclock/ • Go to http://www.cdc.gov/hiv/pdf/library_slideSet_testing_usca_branson.pdf • Go to http://www.oraquick.com/Home & http://www.oraquick.com/Taking-the-Test/Understanding-Your-Results • Go to http://finance.yahoo.com/q?s=OSUR
In Class Exercise 2 Recent studies show that a randomly selected email has a 0.80 probability of being spam. One of the favorite tricks of the spammers is to peddle products with the word “Viagra”! It has been estimated that given a message is spam, the probability that the word “Viagra” is in it is 0.95. We also know that given a message is not spam, there is a 0.01 probability that it has the word “Viagra” in it. Suppose you get a message with the word “Viagra” in it. What is the probability that this message is spam?
Real World Applications Check out: http://en.wikipedia.org/wiki/Bayesian_spam_filtering The probability that a message is spam given that it contains a word is called the “spamicity” of the word. Generally a cutoff of 0.9 or higher for P(Spam|Word) is used to classify an email as spam.
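As a rough illustration of the cutoff idea, the sketch below computes the spamicity of a single word with Bayes' rule and compares it with the 0.9 cutoff. It uses the probabilities from In Class Exercise 2 as inputs; the function name and the single-word decision rule are simplifying assumptions, not a description of how any real spam filter works.

```python
def spamicity(p_spam, p_word_given_spam, p_word_given_not_spam):
    """Return P(spam | word) using Bayes' rule."""
    p_word_and_spam = p_spam * p_word_given_spam
    p_word_and_not_spam = (1 - p_spam) * p_word_given_not_spam
    return p_word_and_spam / (p_word_and_spam + p_word_and_not_spam)


CUTOFF = 0.9  # classification threshold mentioned on the slide

p_spam_given_word = spamicity(
    p_spam=0.80, p_word_given_spam=0.95, p_word_given_not_spam=0.01
)
is_spam = p_spam_given_word >= CUTOFF
print(f"P(spam | word) = {p_spam_given_word:.4f}, classified as spam: {is_spam}")
```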
Random Variable • Random variables assign numbers to the events in the sample space. • The value of a random variable is unknown before the experiment. • They are usually represented by X, Y, Z. • As an example, X( ) takes an event A and assigns it some number: X(A) = some number.
Quick Example & Motivation Tossing a Coin Twice • Sample space, S = { HH, HT, TH, TT } • A game is played: H -> you win $5, T -> you lose $5 • Some possible events: A = {HH}, B = {TT} • Example random variable X: • X(A) = +10 • X(B) = -10 • Now instead of asking P(A), what is the probability you get two heads in a row, you ask P(X = 10), what is the probability you win $10.
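The coin game can be sketched in Python by treating the random variable X as an ordinary function on outcomes. The slide does not say the coin is fair; the code assumes a fair coin so that the four outcomes are equally likely, and the helper name `winnings` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT.
outcomes = ["".join(t) for t in product("HT", repeat=2)]

# Assumption: a fair coin, so every outcome has the same probability.
p_outcome = Fraction(1, len(outcomes))


def winnings(outcome):
    """Random variable X: win $5 per head, lose $5 per tail."""
    return 5 * outcome.count("H") - 5 * outcome.count("T")


# P(X = 10) is the total probability of the outcomes that X maps to 10.
p_win_10 = sum(p_outcome for o in outcomes if winnings(o) == 10)
print(f"P(X = 10) = {p_win_10}")
```

Asking for P(X = 10) is the same as asking for P(A) with A = {HH}; the random variable just relabels the event with a number.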
Random Variables • Two different classes of random variables: • A continuous random variable can take any value in an interval or collection of intervals. • A discrete random variable can take one of a countable list of distinct values. • We will focus on the discrete case first.
Some Notation • Notation: • X, the random variable. • P(X = x), the probability that X takes on the value x, where x is one of the possible values of X. Sometimes written P(X = x) = p(x). • P(X ≤ x), the probability that X takes on a value less than or equal to x. • And so on…
Probability Distribution • The probability distribution (pd) of X is a table or a rule that assigns probabilities to the possible values of X. • This is basically our probability (statistical) model. • How do you find the pd of X (discrete)? • List all simple events in the sample space. • Find the probability of each simple event. • List the possible values of the random variable X and identify the value for each simple event. • For each possible value x, find all simple events for which X = x. • P(X = x) is the sum of the probabilities of all simple events for which X = x.
Example of PD of X • Our experiment: Toss a coin 3 times, so S = {HHH, HHT, HTH, THH, TTT, TTH, THT, HTT}. • Let X = number of heads. Then X(HHH) = 3, …, X(HTT) = 1. • Note X can take on only 4 possible values: {0, 1, 2, 3}. • With a fair coin each of the 8 outcomes has probability 1/8, so the pd of X is: P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8.
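The procedure for finding the pd of a discrete random variable can be followed step by step in code for the three-toss example. This sketch assumes a fair coin (equally likely outcomes); the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Step 1: list all simple events in the sample space.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# Step 2: probability of each simple event (assumption: fair coin).
p_simple = Fraction(1, len(outcomes))

# Steps 3 to 5: map each simple event to its value of X (number of heads)
# and add up the probabilities of the events that share the same value.
pmf = defaultdict(Fraction)
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] += p_simple

for x in sorted(pmf):
    print(f"P(X = {x}) = {pmf[x]}")
```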
Graph and Table of PD of X [Figure: bar graph and table of P(X = x) for x = 0, 1, 2, 3, with values 1/8, 3/8, 3/8, 1/8]
Conditions for PD of X • Condition 1 - The sum of the probabilities over all possible values of a discrete random variable must equal 1. • Condition 2 - The probability of any specific outcome for a discrete random variable must be between 0 and 1. • Conditions 1 and 2 must both be satisfied to have a legitimate probability distribution for a discrete random variable.
Example • Customers who buy tires at an auto service center purchase one tire, two tires, three tires, or a full set of four tires. The probabilities of buying one, two, or three tires are 1/2, 1/4, and 1/16, respectively. • Let Y = number of tires purchased. • Find: • P(Y = 4) • P(Y > 1) • P(Y ≥ 1) • P(Y = 1.4)
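One way to work the tire example is to use Condition 1 to fill in the missing probability and then add up the probabilities of the values in each event. The sketch below is only an illustration; `tire_pmf`, `is_valid_pmf`, and `prob` are hypothetical names.

```python
from fractions import Fraction

# Probabilities given in the example for 1, 2, and 3 tires.
tire_pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 16)}

# Condition 1: the probabilities must sum to 1, which pins down P(Y = 4).
tire_pmf[4] = 1 - sum(tire_pmf.values())


def is_valid_pmf(pmf):
    """Check Condition 1 (sums to 1) and Condition 2 (each value in [0, 1])."""
    return sum(pmf.values()) == 1 and all(0 <= p <= 1 for p in pmf.values())


def prob(event):
    """Add up P(Y = y) over the possible values y that satisfy the event."""
    return sum(p for y, p in tire_pmf.items() if event(y))


print("valid pd:", is_valid_pmf(tire_pmf))
print("P(Y = 4)   =", tire_pmf[4])
print("P(Y > 1)   =", prob(lambda y: y > 1))
print("P(Y >= 1)  =", prob(lambda y: y >= 1))
print("P(Y = 1.4) =", prob(lambda y: y == 1.4))
```

The last query returns 0 because 1.4 is not one of the possible values of Y, which is the point of including it in the exercise.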
In Class Exercise 3 Choose a person at random and ask “How many times did you go to a gym?” Let X = # of times gone to the gym last week. Historical data show: Answer: • Verify that this is a legitimate discrete model. • Describe the event X < 7 in words. What is P(X < 7)? • Express the event “worked out at least once in the last week” in terms of X. What is the probability of this event?
Expected Value or Mean of X • If X is a random variable with possible values x1, x2, …, xk occurring with probabilities p(x1), p(x2), …, p(xk), the expected value of X is calculated as E(X) = μ = x1 p(x1) + x2 p(x2) + … + xk p(xk). • The expected value of a random variable is its mean value. • Expected value is a weighted average of the possible X values, where the weights are the probabilities. • E(X) gives us a summary measure of where the values of X fall on average.
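The weighted-average idea can be sketched in a few lines of Python. The example reuses the distribution of the number of heads in three coin tosses from the earlier slide; the function name `expected_value` is illustrative.

```python
from fractions import Fraction


def expected_value(pmf):
    """Weighted average of the possible values, weighted by their probabilities."""
    return sum(x * p for x, p in pmf.items())


# pd of X = number of heads in three tosses of a coin (from the earlier example).
coin_pmf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

mean_heads = expected_value(coin_pmf)
print(f"E(X) = {mean_heads} = {float(mean_heads)}")
```

The result, 1.5 heads, is not a value X can actually take; the expected value describes the long-run average, not a typical single outcome.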
Example
Variance and SD of X • The variance of a random variable X is the expected value of the squared deviations from its mean μ. • The standard deviation of a random variable is just the square root of the variance. • Official formula: Var(X) = E[(X − μ)²] = (x1 − μ)² p(x1) + (x2 − μ)² p(x2) + … + (xk − μ)² p(xk), and SD(X) = √Var(X).
Variance and SD of X • The standard deviation of a random variable is essentially the average distance the random variable falls from its mean over the long run. • Note that SD will be in the same units as your data.
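Variance and standard deviation follow directly from their definitions: the variance is the expected squared deviation from the mean, and the SD is its square root. The sketch below applies this to the three-coin-toss distribution from the earlier example; the function name `mean_var_sd` is hypothetical.

```python
import math
from fractions import Fraction


def mean_var_sd(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mu = sum(x * p for x, p in pmf.items())
    # Variance: expected value of the squared deviations from the mean.
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var, math.sqrt(var)


# pd of X = number of heads in three tosses of a coin (from the earlier example).
coin_pmf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

mu, var, sd = mean_var_sd(coin_pmf)
print(f"E(X) = {mu}, Var(X) = {var}, SD(X) = {sd:.3f}")
```

The variance is in squared units (heads squared), while the SD is back in the original units (heads), which is why the SD is usually the one that gets interpreted.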
Example
Question! We talked about mean, variance, and standard deviation before. We are talking about them again. What’s the difference?
In Class Exercise 4 In Class Exercise 3 is continued. We have: Answer: • Find E(X). • Write out the equation with numbers plugged in to find SD(X); you need not compute the final answer. • How do you interpret E(X) and SD(X)?
Next Time • More of Chapter 9 and start Chapter 10.