This text explains conditional probability concepts and common mistakes in homework, emphasizing the importance of prior probabilities and the law of total probability. It offers a step-by-step approach to calculating probabilities and suggests simplifications for easier understanding.
HW 8 I wanted to bring up a couple of issues from grading HW 8. Even people who got problem #1 exactly right didn’t think about it enough. The problem was supposed to be easy, and it is… if you think about it.
Definition of Conditional Probability Some of you knew the definition of conditional probability, and tried to apply that: P(C/ W) = P(C & W) ÷ P(W) What’s important here is that the instructions did not tell you the prior (unconditional) probability of winning P(W).
A Common Mistake The instructions explained the probability of winning by chance alone, as if you were betting on 10 coin flips: P(W/ not-C), the probability of winning fairly. The most common mistake on the homework was taking P(W) to be equal to 1 ÷ 1024 and using that number to calculate P(C/ W).
Cheating Increases Winning But remember: that’s the probability that someone who isn’t cheating will win. Once you add in cheaters along with the non-cheaters the probability of winning increases. If the chances were still 1 ÷ 1024 even when people started cheating, nobody would bother to cheat!
P(W)? But how do you find out the probability of winning, P(W)? You know the probability of winning given that you cheated: P(W/ C) = 100%. Here, the law of total probability is useful.
Law of Total Probability This is the law for the special case where we have a binary variable like C: P(W) = P(W/ C)P(C) + P(W/ not-C)P(not-C) The first term is going to be 100% times the probability of being a cheater, which is 250 ÷ 1024000.
Second Term P(W) = P(W/ C)P(C) + P(W/ not-C)P(not-C) The second term is going to be (1 in 1024) times the probability that a randomly selected player is not a cheater: P(not-C) = 1 – P(C) = 1 – (250 ÷ 1024000) = 1023750 ÷ 1024000.
P(W) So we get: P(W) = (250 ÷ 1024000) + [(1 ÷ 1024) x (1023750 ÷ 1024000)] ≈ (250 + 999.75) ÷ 1024000 = 1249.75 ÷ 1024000
P(C & W) Now we just need the probability that a randomly selected individual will be a winner and a cheater, P(C & W). Since only cheaters will be winners and cheaters, and every winner and cheater will be a cheater, C & W = C. So P(C & W) = P(C) = 250 ÷ 1024000.
20.004% Now we can use the definition of conditional probability: P(C/ W) = P(C & W) ÷ P(W) = (250 ÷ 1024000) ÷ (1249.75 ÷ 1024000) = 250 ÷ 1249.75 = 20.004%
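For anyone who would rather check the arithmetic by machine, here is a minimal Python sketch of the same calculation. The variable names are illustrative and not part of the homework. It uses exact fractions, so the fair-winner term is the unrounded 1023750 ÷ 1024 rather than 999.75, which changes nothing at the precision quoted above. The value labeled `mistaken` is what comes out if 1 ÷ 1024 is plugged in where P(W) belongs.

```python
from fractions import Fraction

# Numbers as stated in the homework slides.
p_c = Fraction(250, 1024000)          # P(C): prior probability of being a cheater
p_w_given_c = Fraction(1)             # P(W/ C): cheaters always win
p_w_given_not_c = Fraction(1, 1024)   # P(W/ not-C): winning fairly

# Law of total probability for the binary variable C.
p_w = p_w_given_c * p_c + p_w_given_not_c * (1 - p_c)

# Every cheater wins, so C & W = C and P(C & W) = P(C).
p_c_and_w = p_c

# Definition of conditional probability.
p_c_given_w = p_c_and_w / p_w

# The common mistake: using P(W/ not-C) in place of P(W).
mistaken = p_c_and_w / p_w_given_not_c

print(f"P(W)     = {float(p_w):.6f}")
print(f"P(C/ W)  = {float(p_c_given_w):.3%}")
print(f"mistaken = {float(mistaken):.3%}")
```

Using `Fraction` instead of floats is only a convenience: it keeps every intermediate step exact, so any rounding happens once, at the final print.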
An Easier Way? But that was hard and it involved a lot of numbers. How can we make it easier? The first thing we need to observe is that both the numerator and the denominator share a common factor:
Remove Common Factors P(C/ W) = P(C & W) ÷ P(W) P(C & W) is going to be the number of cheaters divided by the number of people who play roulette. And P(W) is going to be the number of winners divided by the number of people who play roulette. So you can forget about the common factor: 1 ÷ 1024000.
#C ÷ #W P(C/ W) = P(C & W) ÷ P(W) = #(C & W) ÷ #W The probability that someone who won was a cheater is just the number of people who cheat and win out of the number of people who win. And as we noted before, #(C & W) = #C. So: P(C/ W) = #C ÷ #W
Think about It But even that wasn’t very smart, and needed us to remember the definition of P(C/ W). What if we’re no good at probability? Well, think about the question. We have someone who is a winner. Did he cheat? If the percentage of cheaters among winners is very low, probably not. How many cheaters are there out of the total number of winners?
#C? #W? Now we’ve reduced the problem to two questions: how many people cheat and how many people win? If we take those two numbers and divide the first by the second, we get the probability that someone who wins has cheated.
#C I made the answer to the question “how many cheaters are there?” very easy. The probability of being a cheater is 250 ÷ 1024000, and the population of players is 1024000, so #C = (250 ÷ 1024000) x 1024000 = 250. No calculator required!
#W? The second question, “how many winners are there?” is a little bit tougher. It should be the number of cheaters, #C, the number we just calculated– 250– plus the people who won without cheating. #W = #C + #(W & not-C)
#(W & not-C) How do we figure out the number of winners who did not cheat, #(W & not-C) = [P(W/ not-C) x #not-C]? We know the number of non-cheaters: it’s 1024000 – 250, or 1023750. And we know the probability that a non-cheater will win: it’s 1 in 1024. So: #(W & not-C) = 1023750 ÷ 1024 ≈ 999.75
P(C/ W) So here’s our answer: P(C/ W) = #C ÷ [#C + #(W & not-C)] = 250 ÷ (250 + 999.75) = 20.004%
999.75 ≈ 1000 If you realized that 999.75 was almost the same as 1000 (there’s only a .25 difference), you could solve the problem in your head: P(C/ W) = 250 ÷ (250 + 1000) = 250 ÷ [250 + (250 x 4)] = 250 ÷ (250 x 5) = 1 ÷ 5 = 20%
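The head-counting version can be sketched in a few lines of Python as well. The names below are illustrative. The population of 1024000, the 250 cheaters, and the 1-in-1024 fair win rate come from the problem, and the round figure of 1000 fair winners is the mental shortcut described above.

```python
# Head counts taken from the problem statement.
population = 1_024_000
cheaters = 250                        # #C

# Expected number of winners who did not cheat: #(W & not-C).
non_cheaters = population - cheaters
fair_winners = non_cheaters / 1024

# Every cheater wins, so #W = #C + #(W & not-C).
winners = cheaters + fair_winners

# P(C/ W) is the share of cheaters among winners.
share = cheaters / winners

# Mental-arithmetic shortcut: round the fair winners to 1000.
approx = cheaters / (cheaters + 1000)

print(f"fair winners  = {fair_winners:.2f}")
print(f"P(C/ W)       = {share:.3%}")
print(f"approximation = {approx:.0%}")
```

No probabilities appear until the last step, which is the point of the counting approach: the common factor 1 ÷ 1024000 never has to be written down.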
Problem #2 In the second problem, you were supposed to give three potential explanations for an as-yet unexplained correlation: the positive correlation between the car accident rates in areas of Chicago, and the rates of street crime in those areas. If an area has more car accidents, it has more street crime and vice versa.
Correlation ≠ Causation There were two problems that I saw show up frequently in your “car crashes cause street crime” and “street crime causes car crashes” explanations. The first was simply a failure to recognize that correlation is not the same as causation, and pointing out a correlation is not the same as explaining it.
Sample Answer Here’s a sample answer that I got: “According to the article, areas that had higher rates of street crime had more car accidents. So street crime causes car accidents.” The claim “areas that had higher rates of street crime had more car accidents” is just a description of the correlation.
Causal Explanation If you want to provide an explanation of a phenomenon, you have to do two things: (a) make a causal claim and (b) provide a mechanism. To explain why car accidents and street crime are correlated you might say (a) car accidents cause street crime and (b) the way this happens is that car accidents distract people and make them vulnerable to robbery.
Second Problem The second problem with some of your “A causes B” and “B causes A” explanations was that they were really common cause explanations in disguise. For example: “Here’s how car accidents cause street crime: people with a deviant tendency often drive recklessly. This tendency also causes them to commit crimes.”
Really a Common Cause Here, the deviant tendency is what is causing both the car accidents and the street crime. If you stopped the car accidents (for example, by removing everyone’s driver’s license) you wouldn’t reduce the street crime, because it’s the deviant tendency and not the car accidents that causes the crime.
Common Cause There were also some misconceptions about common cause explanations. To explain a correlation between A and B by a common cause C, you make two causal claims (C causes A, and C causes B) and provide two mechanisms: say why C causes A and why it causes B too.
C Must Be One Thing Importantly, C can’t be two different things, one of which explains A and the other of which explains B. For example, you can’t say C = “people who don’t pay attention and need money” and then go on to explain: “If people don’t pay attention, then they get in car accidents; if they need money, they commit robberies.”
X and Y Correlated? Unless you had an independent reason to think that there was a strong correlation between not paying attention and needing money, this couldn’t possibly explain the correlation between car accidents and street crime. This explanation is really of the form “X = not paying attention causes car crashes” and “Y = needing money causes street crime.”
Bad Explanation And you can’t argue like this: X causes A Y causes B___________________ Therefore A and B are correlated. Compare: Broken bones cause extreme pain. Smiling babies cause happiness.______ Therefore, extreme pain is correlated with happiness.
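A small simulation makes the same point numerically. This is only a sketch under invented assumptions: X and Y are drawn independently, A depends only on X, B depends only on Y, and the noise levels and the helper `pearson` are made up for illustration. Nothing here is data from the homework.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)   # fixed seed so the run is repeatable
n_areas = 10_000         # hypothetical number of areas

# Two separate, independent causes (assumed for illustration).
x = [rng.gauss(0, 1) for _ in range(n_areas)]   # X: not paying attention
y = [rng.gauss(0, 1) for _ in range(n_areas)]   # Y: needing money

# X causes A and Y causes B, each with some unrelated noise.
a = [xi + rng.gauss(0, 0.5) for xi in x]        # A: car accidents
b = [yi + rng.gauss(0, 0.5) for yi in y]        # B: street crime

r_xa = pearson(x, a)
r_yb = pearson(y, b)
r_ab = pearson(a, b)

print(f"corr(X, A) = {r_xa:.2f}")
print(f"corr(Y, B) = {r_yb:.2f}")
print(f"corr(A, B) = {r_ab:.2f}")
```

Each cause is strongly correlated with its own effect, while the correlation between A and B stays near zero. Two true causal claims about two different causes do not add up to a correlation between the effects.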
Correlated In the Right Way It’s also important that the common cause in your explanation is correlated in the right way with the variables A and B. Here’s an example of the wrong way: “C = residents having more money. People who have more money will buy more cars, and more cars means more accidents. People who have more money will turn to street crime less often.”
Positive vs. Negative Correlations This proposed explanation provides a variable C, “how much money residents have” that causes the values of the variables A, “number of car accidents” and B, “amount of street crime.” However, it predicts that we should see areas of Chicago with more accidents and less crime, and areas with more crime and fewer accidents. It predicts that crime and crashes are negatively correlated.
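To see the sign problem concretely, here is a toy Python sketch. The five areas, their money levels, and the two slopes are all invented for illustration. The only thing taken from the proposed explanation is the direction of each effect: more money means more accidents and less street crime.

```python
# Hypothetical money levels for five areas (illustrative numbers only).
money = {"area_1": 1, "area_2": 2, "area_3": 3, "area_4": 4, "area_5": 5}

# Proposed mechanisms, with made-up slopes:
# more money -> more cars -> more accidents
accidents = {area: 10 * m for area, m in money.items()}
# more money -> less street crime
crime = {area: 60 - 10 * m for area, m in money.items()}

# Rank the areas from fewest to most accidents, and fewest to most crime.
by_accidents = sorted(money, key=lambda area: accidents[area])
by_crime = sorted(money, key=lambda area: crime[area])

print("fewest to most accidents:", by_accidents)
print("fewest to most crime:    ", by_crime)
```

The two rankings come out in opposite orders: the area with the most accidents has the least crime. That is a negative correlation, the opposite of what was observed.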
Goal of Causal Models Finally, remember what the goal of finding various causal models for discovered correlations is. We want to know the best explanation for the correlation. This requires that any proposed explanation, like a common cause explanation, be reasonable.
Unreasonable Explanation I got a lot of examples like this: “night time/ bad weather/ natural disasters are the common cause that explains the correlation between car accidents and street crime. When it’s dark outside, more crime happens. Also when it’s dark, it’s tougher to see and easier to crash. So darkness causes both increased crime and increased car accidents.”
How Dark It Is But “how dark it is outside” is a variable that has different values at different times. I’m sure there is a positive correlation between street crime at a certain time of day and car accidents at that time of day– for exactly this reason. But the correlation we wanted to explain was not the values of “street crime” and “car crashes” at different times of day– there’s a positive correlation at different parts of the city!
Observed Correlation Here’s the observed correlation: neighborhoods in Chicago that have higher rates of car crashes have higher rates of street crime. But it gets dark at the same time in every neighborhood in Chicago. No neighborhood has more night-time than any other neighborhood. So the reason that neighborhoods differ in these ways cannot be explained by the correlation between street crime and car accidents at night.
Bad Weather & Natural Disasters The same thing is true for bad weather and earthquakes. All the areas of Chicago have the same amount of good weather and the same amount of bad weather. So even if there’s more crime and more accidents during bad weather, this can’t explain why some areas of Chicago have more crime and more accidents than other areas.
Variables This confusion is partly my fault. I should have explained variables more carefully. Here’s how I introduced variables: a variable is something that takes on different values. So for example, “height” is a variable because different people have different heights, and “daylight” is a variable, because different times have different amounts of daylight.
Domain of a Variable But there is an important concept here that I did not stress: the domain of the variable. Different people have different heights, but different times do not. So people (and other things that can have heights, like buildings) are in the domain of the variable “height.” Different times can have different amounts of daylight, but different people (at least people at the same latitude) cannot, so only “time” is in the domain of “amount of daylight.”
Subtly Different Variables Two variables can seem the same even though they are different, and what makes them different is that they have different domains. So there is one variable “crime” that has as its domain time. At different times, there are different amounts of crime. And there is another variable “crime” whose domain is places: different places have different amounts of crime.
Keep the Domain Fixed When you want to explain a correlation between two variables with domain D, and you are proposing a common cause variable C, C must also have D as its domain. Since the variables we’re looking at are varying rates of crime and car accidents by location, we need to look at potential common causes that vary by location, not daylight or bad weather.
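Here is a toy Python sketch of why the domain matters. Everything in it is an assumption made for illustration: three made-up neighborhoods, a darkness schedule that is identical everywhere, and invented effect sizes for how much darkness raises crime and accidents.

```python
hours = range(24)

# Assumed schedule: dark before 6:00 and from 20:00 on, in every neighborhood.
dark = [1 if h < 6 or h >= 20 else 0 for h in hours]

neighborhoods = ["north", "south", "west"]  # hypothetical places

# Darkness raises both crime and accidents (made-up effect sizes).
crime = {n: [2 + 3 * d for d in dark] for n in neighborhoods}
accidents = {n: [1 + 2 * d for d in dark] for n in neighborhoods}

# Domain = times: add up over places, one value per hour.
crime_by_hour = [sum(crime[n][h] for n in neighborhoods) for h in hours]
accidents_by_hour = [sum(accidents[n][h] for n in neighborhoods) for h in hours]

# Domain = places: add up over hours, one value per neighborhood.
crime_by_place = {n: sum(crime[n]) for n in neighborhoods}
accidents_by_place = {n: sum(accidents[n]) for n in neighborhoods}

print("midnight vs. noon crime:    ", crime_by_hour[0], crime_by_hour[12])
print("midnight vs. noon accidents:", accidents_by_hour[0], accidents_by_hour[12])
print("crime by place:    ", crime_by_place)
print("accidents by place:", accidents_by_place)
```

Over times, crime and accidents rise and fall together. Over places, every neighborhood ends up with the same totals, so darkness leaves no differences between neighborhoods for a common cause to explain.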
Think The answers to HW 8 were mostly good. Still, I can’t teach you everything; there’s only so much time in class. You need to think about the problems. What’s the probability that someone who won was a cheater? The answer is a lot easier if you think first instead of breaking out the calculator and the statistics textbooks.
Think What is a common cause explanation for correlated levels of car accidents and street crime over different locations in Chicago? Some answers to this question (“earthquakes”) don’t make any sense. I can explain why they don’t make sense, but you don’t need that explanation to see that they don’t work. Think about how the causal model is supposed to run, and you will see that it doesn’t.