Medical Biometry I (Biostatistics 511) Week 6 Discussion Section Phillip Keung Biostat 511
Objectives
• Solve specific probability problems
• Step-by-step demonstrations
• Translating questions (English) into events and random variables (probability)
• Applications of the probability rules (multiplication, total probability, conditional) and conditions like independence and mutually exclusive events (when appropriate)
• Computing probabilities and quantiles, and performing basic problem solving, for continuous random variables (using the normal distribution, in particular)
Glaucoma & Diabetes
[From Activity 5, #8] For a particular population, the lifetime probability of contracting glaucoma is approximately 0.007 and the lifetime probability of contracting diabetes is 0.020. A research study finds (for a sample population) that the probability of contracting both of these diseases in a lifetime is 0.0008.
Questions:
(a) What is the lifetime probability of contracting either glaucoma or diabetes?
(b) What is the lifetime probability of contracting diabetes for a person who already has glaucoma?
(c) What is the lifetime probability of contracting glaucoma for a person who already has diabetes?
(d) On the basis of the information given, one can conclude that contracting diabetes and glaucoma (are/are not) independent.
Glaucoma & Diabetes
To begin:
[Step 1:] Translate the information provided (from English) into events (for probability).
Events: Let
G = the event that a person contracts glaucoma
D = the event that a person contracts diabetes
[Step 2:] Detail all other information provided in the description.
P(G) = 0.007
P(D) = 0.020
P(G, D) = 0.0008
Are G and D mutually exclusive?
Glaucoma & Diabetes
[Step 3:] Now translate each question (from English) into events using the information provided, using relations like “and”, “or” and “given”.
(a) What is the lifetime probability of contracting either glaucoma or diabetes?
This is equivalent to finding P(G or D).
[Step 4:] Determine which rule to apply to solve P(G or D). Which one?
General addition (“or”) rule: P(G or D) = P(G) + P(D) - P(G, D)
Why do we subtract P(G, D)? People who contract both diseases are counted once in P(G) and again in P(D), so the overlap must be removed once.
Glaucoma & Diabetes
[Step 5:] We have all the information we need to solve the problem.
P(G or D) = P(G) + P(D) - P(G, D) = 0.007 + 0.020 - 0.0008 = 0.0262.
To recap, we
1. Translated the problem information into events
2. Detailed all relevant information about the problem in terms of events
3. Translated the question information into events (using “or”)
4. Determined which probability rules to apply (general addition)
5. Solved the problem
Let’s move to the next question, part (b) ...
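The arithmetic for part (a) can be checked with a few lines of code. This is a minimal sketch in Python (the course itself uses Stata); the variable names are illustrative assumptions, and the three input probabilities are the ones given in the problem statement.

```python
# Lifetime probabilities given in the problem statement.
p_g = 0.007          # P(G): contracts glaucoma
p_d = 0.020          # P(D): contracts diabetes
p_g_and_d = 0.0008   # P(G, D): contracts both

# General addition ("or") rule: the overlap P(G, D) is counted in both
# P(G) and P(D), so it is subtracted once.
p_g_or_d = p_g + p_d - p_g_and_d

print(f"P(G or D) = {p_g_or_d:.4f}")
```

Rounded to four decimals, the sum 0.007 + 0.020 - 0.0008 is 0.0262.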
Glaucoma & Diabetes
Part (b): We don’t need to redefine the problem [Step 1] or the information provided about the problem [Step 2]. We start with [Step 3].
[Step 3:] Translate part (b) from English to events using the relations “and”, “or” and “given.”
(b) What is the lifetime probability of contracting diabetes for a person who already has glaucoma?
This is equivalent to finding P(D | G).
[Step 4:] Which rule to apply to solve P(D | G)?
The conditional probability rule: P(D | G) = P(D, G)/P(G) = P(G, D)/P(G)
Why?
Glaucoma & Diabetes
[Step 5:] We have all the information needed. Solve the problem.
P(D | G) = P(D, G)/P(G) = 0.0008/0.007 = 0.1143.
To recap, we
1. Translated the problem information into events
2. Detailed all relevant information about the problem in terms of events
3. Translated the question information into events (conditional prob.)
4. Determined which probability rules to apply (conditional prob.)
5. Solved the problem
Moving on to question (c) ...
Glaucoma & Diabetes
Part (c): Starting at [Step 3].
[Step 3:] Translate English to our events using the relations “and”, “or” and “given”.
(c) What is the lifetime probability of contracting glaucoma for a person who already has diabetes?
This is equivalent to finding P(G | D).
[Step 4:] Which rule to apply to solve P(G | D)?
The conditional probability rule: P(G | D) = P(G, D)/P(D)
[Step 5:] Solve the problem.
P(G | D) = 0.0008/0.020 = 0.04.
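Parts (b) and (c) apply the same conditional probability rule with the roles of G and D swapped. The sketch below, written in Python as an illustration, wraps the rule in a small helper; the function name `conditional` is a hypothetical choice, not something from the course materials.

```python
def conditional(p_joint: float, p_given: float) -> float:
    """Return P(A | B) = P(A, B) / P(B). Requires P(B) > 0."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given


p_g, p_d, p_g_and_d = 0.007, 0.020, 0.0008

p_d_given_g = conditional(p_g_and_d, p_g)  # part (b): P(D | G)
p_g_given_d = conditional(p_g_and_d, p_d)  # part (c): P(G | D)

print(f"P(D | G) = {p_d_given_g:.4f}")
print(f"P(G | D) = {p_g_given_d:.4f}")
```

The joint probability is the same in both calls; only the denominator changes, which is why the two conditional probabilities differ.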
Glaucoma & Diabetes
Part (d)
[Steps 3 & 4:] Translation step of the question.
(d) Are contracting diabetes and glaucoma independent?
If they are independent, we know that all of the following (equivalent) conditions will hold:
P(G, D) = P(G) × P(D)
P(D | G) = P(D)
P(G | D) = P(G)
[Step 5:] Solve the problem.
We previously solved (c) for P(G | D), and we already know P(G):
P(G | D) = 0.04; P(G) = 0.007.
Since 0.04 is not equal to 0.007, diabetes and glaucoma are NOT independent. You can confirm using the other two conditions.
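All three independence conditions can be checked side by side. The following Python sketch is only an illustration: it compares the left and right sides of each condition with a floating-point tolerance, and the dictionary layout and tolerance are assumptions made for this example.

```python
import math

p_g, p_d, p_g_and_d = 0.007, 0.020, 0.0008

# Each entry pairs the left-hand side and right-hand side of one condition.
checks = {
    "P(G, D) = P(G) x P(D)": (p_g_and_d, p_g * p_d),
    "P(D | G) = P(D)": (p_g_and_d / p_g, p_d),
    "P(G | D) = P(G)": (p_g_and_d / p_d, p_g),
}

for label, (lhs, rhs) in checks.items():
    holds = math.isclose(lhs, rhs, rel_tol=1e-9)
    print(f"{label}: {lhs:.5f} vs {rhs:.5f} -> {'holds' if holds else 'fails'}")

# Independence requires the conditions to hold; here none of them does.
independent = all(math.isclose(lhs, rhs, rel_tol=1e-9) for lhs, rhs in checks.values())
print("Independent?", independent)
```

Because the three conditions are equivalent, they fail together: P(G) × P(D) = 0.00014, which is well below the observed joint probability of 0.0008.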
Alzheimer’s Disease
[Adapted from Homework 4, #4] Suppose an unrelated 77-year old man, 76-year old woman and 82-year old woman are selected from a community.
Questions:
(a) What is the probability that all three individuals will have Alzheimer’s disease?
(b) What is the probability that at least one of the women will have Alzheimer’s disease?
(c) What is the probability that at least one of the three people will have Alzheimer’s disease?
(d) What is the probability that exactly one of the three people has Alzheimer’s disease?
Alzheimer’s Disease
To begin:
[Step 1:] Translate the information provided (from English) into events (for probability).
Events: Let
M = event that the 77-year old man has Alzheimer’s disease
W1 = event that the 76-year old woman has Alzheimer’s disease
W2 = event that the 82-year old woman has Alzheimer’s disease
[Step 2:] Detail all other information provided in the description (see the table in Homework 4, #4 for the probabilities).
P(M) = 0.049
P(W1) = 0.023
P(W2) = 0.078
All three individuals are unrelated. What does this tell us?
Alzheimer’s Disease
[Step 3:] Now translate each question (from English) into events using the information provided, using relations like “and”, “or” and “given”.
(a) What is the probability that all three individuals will have Alzheimer’s disease?
This is equivalent to finding P(M, W1, W2).
[Step 4:] Determine which rule to apply to solve P(M, W1, W2).
General “and” rule: is P(M, W1, W2) = P(M) × P(W1) × P(W2)? Can we do this? Why or why not?
Since the individuals are unrelated, it’s reasonable to assume the events are independent. The product rule for independence will hold.
Alzheimer’s Disease
[Step 5:] We have all the information we need to solve the problem.
P(M, W1, W2) = P(M) × P(W1) × P(W2) = (0.049)(0.023)(0.078) = 0.000088.
Steps...
1. Translated the problem information into events
2. Detailed all relevant information about the problem in terms of events
3. Translated the “question” into events (using “and”)
4. Determined which probability rules to apply (independence)
5. Solved the problem
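The product rule for part (a) is a single multiplication. A short Python sketch, assuming (as the slides do) that the three events are independent:

```python
import math

# Probabilities taken from the problem setup (Homework 4, #4 table).
p_m, p_w1, p_w2 = 0.049, 0.023, 0.078

# Product rule for independent events: P(M, W1, W2) = P(M) * P(W1) * P(W2).
p_all_three = math.prod([p_m, p_w1, p_w2])

print(f"P(M, W1, W2) = {p_all_three:.6f}")
```

If the individuals were related, independence could not be assumed and this product would no longer be justified.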
Alzheimer’s Disease
Part (b): Begin at [Step 3].
[Step 3:] Translation part:
(b) What is the probability that at least one of the women will have Alzheimer’s disease?
Counting up “at least one” requires us to think of all the possible things that can happen for our events W1 and W2. Consider all the possibilities (i.e., the sample space, Ω). The possible outcomes are

Outcome            X = # Alzheimer’s   Satisfies at least 1?
O1 = W1^c, W2^c    0                   no
O2 = W1, W2^c      1                   yes
O3 = W1^c, W2      1                   yes
O4 = W1, W2        2                   yes

Here, the superscript “c” (written ^c) indicates the complement event (i.e., no Alzheimer’s).
There are three possible outcomes that satisfy “at least 1” (X ≥ 1).
Let’s find the probabilities of these events using our probability rules.
Alzheimer’s Disease
[Step 4:] Which probability rule(s) to apply?
The last three possible outcomes satisfy “at least one”:
P(W1, W2^c)
P(W1^c, W2)
P(W1, W2)
Questions:
(1) Are the events W1 and W2 independent?
(2) Are the three outcomes (above) mutually exclusive?
If the answer to (1) is ‘yes’, we can write
P(W1, W2^c) = P(W1) × P(W2^c)
P(W1^c, W2) = P(W1^c) × P(W2)
P(W1, W2) = P(W1) × P(W2)
If the answer to (2) is ‘yes’, how can we use it to help answer question (b)?
Alzheimer’s Disease
[Step 4:] (continued...)
If the three outcomes are mutually exclusive, how can we combine these outcomes to arrive at our answer?
P(At least 1) = P(X ≥ 1)
= P( [W1, W2^c] or [W1^c, W2] or [W1, W2] )
= P(W1, W2^c) + P(W1^c, W2) + P(W1, W2)
Since the outcomes are mutually exclusive, we can add the probabilities of the individual outcomes. Using these two rules (i.e., mutually exclusive outcomes and independent events), we have
P(X ≥ 1) = P(W1, W2^c) + P(W1^c, W2) + P(W1, W2)
= P(W1) × P(W2^c) + P(W1^c) × P(W2) + P(W1) × P(W2)
= P(W1) × [1 - P(W2)] + [1 - P(W1)] × P(W2) + P(W1) × P(W2)
[Step 5:] Solve the problem.
P(X ≥ 1) = P(W1) × [1 - P(W2)] + [1 - P(W1)] × P(W2) + P(W1) × P(W2)
= (.023)(.922) + (.977)(.078) + (.023)(.078)
= 0.0992.
Alzheimer’s Disease
[Step 5:] (continued...)
Now the clever person would realize that the solution could be had using complement events and the associated probability rule.
The sample space, Ω, can be split into the two complementary events [X < 1] and [X ≥ 1]. Because the probabilities of complementary events sum to 1, we know that P(X ≥ 1) = 1 - P(X < 1).
There is only one outcome, neither woman has Alzheimer’s, that satisfies [X < 1]. It is the same as writing [X = 0].
Our solution can be found as
P(X ≥ 1) = 1 - P(X < 1) = 1 - P(X = 0)
= 1 - P(W1^c, W2^c)
= 1 - P(W1^c) P(W2^c)   by independence
= 1 - [1 - P(W1)][1 - P(W2)]   by complements
= 1 - (.977)(.922)
= 0.0992.
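The two routes to part (b), summing the three favourable outcomes and using the complement, should give the same number. A minimal Python sketch that computes both, assuming independence of W1 and W2 as in the slides (variable names are illustrative):

```python
import math

p_w1, p_w2 = 0.023, 0.078

# Route 1: add the three mutually exclusive outcomes with at least one case.
p_direct = (
    p_w1 * (1 - p_w2)        # W1 and not W2
    + (1 - p_w1) * p_w2      # not W1 and W2
    + p_w1 * p_w2            # W1 and W2
)

# Route 2: complement rule, 1 - P(neither woman has the disease).
p_complement = 1 - (1 - p_w1) * (1 - p_w2)

print(f"direct sum : {p_direct:.4f}")
print(f"complement : {p_complement:.4f}")
print("agree:", math.isclose(p_direct, p_complement))
```

The complement route needs one product instead of three, and the saving grows quickly as more people are added.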
Alzheimer’s Disease
Part (c): We can begin at [Step 3].
[Step 3:] Translation part
(c) What is the probability that at least one of the three people will have Alzheimer’s disease?
Counting up “at least one” requires us to think of all the possible outcomes that can happen for our three events M, W1 and W2. Let’s consider all the possibilities (i.e., the sample space, Ω). They are

Outcome                  X = # Alz   Satisfies at least 1?
O1 = M^c, W1^c, W2^c     0           no
O2 = M, W1^c, W2^c       1           yes
O3 = M^c, W1, W2^c       1           yes
O4 = M^c, W1^c, W2       1           yes
O5 = M^c, W1, W2         2           yes
O6 = M, W1^c, W2         2           yes
O7 = M, W1, W2^c         2           yes
O8 = M, W1, W2           3           yes
Alzheimer’s Disease
[Step 4:] Which probability rule(s) to apply?
We have seven possible outcomes that satisfy “at least 1” [X ≥ 1]. We could calculate the probabilities of all seven outcomes and sum them. What else can we do?
Use the complement probability rule to find the solution to P(X ≥ 1) here:
P(At least one) = P(X ≥ 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - P(no Alzheimer’s disease cases)
P(X ≥ 1) = 1 - P(X = 0)
= 1 - P(M^c, W1^c, W2^c)
= 1 - P(M^c) × P(W1^c) × P(W2^c)   by independence
= 1 - [1 - P(M)] [1 - P(W1)] [1 - P(W2)]
[Step 5:] Solve the problem.
P(X ≥ 1) = 1 - [1 - P(M)] [1 - P(W1)] [1 - P(W2)]
= 1 - [1 - .049] [1 - .023] [1 - .078]
= 1 - (.951)(.977)(.922)
= .1433.
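The complement calculation generalizes to any number of independent events. The Python sketch below uses a hypothetical helper, `p_at_least_one`, that is not part of the course materials; it assumes independence throughout.

```python
import math


def p_at_least_one(probs):
    """P(at least one event occurs) for independent events with the given probabilities."""
    # Complement rule: 1 - P(none occur) = 1 - product of (1 - p_i).
    return 1 - math.prod(1 - p for p in probs)


p_m, p_w1, p_w2 = 0.049, 0.023, 0.078

p_any_of_three = p_at_least_one([p_m, p_w1, p_w2])   # part (c)
p_any_woman = p_at_least_one([p_w1, p_w2])           # part (b), for comparison

print(f"P(at least one of three) = {p_any_of_three:.4f}")
print(f"P(at least one woman)    = {p_any_woman:.4f}")
```

Adding the man to the group can only raise the probability of at least one case, which the two printed values reflect.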
Alzheimer’s Disease
Part (d): We can begin at [Step 3].
[Step 3:] Translation part
(d) What is the probability that exactly one of the three people will have Alzheimer’s disease?
Looking again at the outcomes for M, W1 and W2:

Outcome                  X = # Alz   X = 1?
O1 = M^c, W1^c, W2^c     0           no
O2 = M, W1^c, W2^c       1           yes
O3 = M^c, W1, W2^c       1           yes
O4 = M^c, W1^c, W2       1           yes
O5 = M^c, W1, W2         2           no
O6 = M, W1^c, W2         2           no
O7 = M, W1, W2^c         2           no
O8 = M, W1, W2           3           no
Alzheimer’s Disease
[Step 4:] Which probability rule(s) to apply?
We have three outcomes that satisfy “exactly one”, so we need P(O2 or O3 or O4), where
P(O2) = P(M, W1^c, W2^c)
P(O3) = P(M^c, W1, W2^c)
P(O4) = P(M^c, W1^c, W2)
Questions:
(1) Are the events M, W1 and W2 independent?
(2) Are the three outcomes O2, O3, O4 mutually exclusive?
If the events are independent, we can write
P(M, W1^c, W2^c) = P(M) × P(W1^c) × P(W2^c)
P(M^c, W1, W2^c) = P(M^c) × P(W1) × P(W2^c)
P(M^c, W1^c, W2) = P(M^c) × P(W1^c) × P(W2)
If the three outcomes are mutually exclusive, we can sum their probabilities:
P(O2 or O3 or O4) = P(O2) + P(O3) + P(O4)
Alzheimer’s Disease
[Step 4:] (continued...)
P(X = 1) = P( [M, W1^c, W2^c] or [M^c, W1, W2^c] or [M^c, W1^c, W2] )
= P(M, W1^c, W2^c) + P(M^c, W1, W2^c) + P(M^c, W1^c, W2)
The outcomes are mutually exclusive; we can add the probabilities of the three outcomes. Using these two rules (i.e., mutually exclusive outcomes, independent events), we have
P(X = 1) = P(M, W1^c, W2^c) + P(M^c, W1, W2^c) + P(M^c, W1^c, W2)
= P(M) P(W1^c) P(W2^c) + P(M^c) P(W1) P(W2^c) + P(M^c) P(W1^c) P(W2)
= P(M) [1 - P(W1)][1 - P(W2)] + [1 - P(M)] P(W1) [1 - P(W2)] + [1 - P(M)][1 - P(W1)] P(W2)
[Step 5:] Solve the problem.
P(X = 1) = (.049)(.977)(.922) + (.951)(.023)(.922) + (.951)(.977)(.078)
= .1368.
Challenge: Could we use the Binomial distribution to solve these problems?
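The outcome tables for parts (c) and (d) can also be generated by brute force. The Python sketch below enumerates all eight outcomes, multiplies the event probabilities under the independence assumption, and accumulates the distribution of X, the number of people with the disease. The dictionary names are illustrative assumptions. Note that the enumeration does not require the three people to share a common probability, which is worth keeping in mind for the challenge question.

```python
from itertools import product

# Event probabilities from the problem setup; independence is assumed.
probs = {"M": 0.049, "W1": 0.023, "W2": 0.078}

# dist[k] will hold P(X = k), where X counts how many of the three have the disease.
dist = {k: 0.0 for k in range(len(probs) + 1)}

for outcome in product([False, True], repeat=len(probs)):
    p_outcome = 1.0
    for has_disease, p_event in zip(outcome, probs.values()):
        # Use P(event) if it occurs, otherwise the complement 1 - P(event).
        p_outcome *= p_event if has_disease else 1 - p_event
    dist[sum(outcome)] += p_outcome

for k, p in dist.items():
    print(f"P(X = {k}) = {p:.6f}")

print(f"P(X >= 1) = {1 - dist[0]:.4f}")
```

The entries of `dist` sum to 1, P(X = 1) matches part (d), P(X = 3) matches part (a), and 1 - P(X = 0) matches part (c).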
Problem Solving and the Normal Distribution
• We will investigate random variables (characteristics), X, that will be modeled as having a normal distribution with some mean and variance (or standard deviation).
• The focus will be solving problems where we
  • Compute probabilities for intervals or ranges of values for a normal random variable, X
  • Find “values” of a normal random variable, X0, that correspond to specific probabilities
• Use our 5 steps to problem solving, when needed
Normal Distribution
• The normal distribution or “bell-shaped” curve has two parameters.
• For a normal random variable, X, we have
  • μ = the mean of X
  • σ = the standard deviation of X
• We sometimes write this in shorthand as X ~ N(μ, σ)
• We already know the following about X
  • P(μ - σ < X < μ + σ) = 0.68 approximately
  • P(μ - 2σ < X < μ + 2σ) = 0.95 approximately
  • P(μ - 3σ < X < μ + 3σ) = 0.997 approximately
• We’ll find solutions for values other than μ +/- kσ (for k = 1, 2, 3)
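The three rule-of-thumb probabilities can be reproduced numerically. This sketch uses `NormalDist` from Python's standard `statistics` module rather than the Stata functions used in the course; because the probability of falling within k standard deviations of the mean is the same for every normal distribution, the standard normal is used here.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(mu - k*sigma < X < mu + k*sigma) equals P(-k < Z < k) for any normal X.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"P(within {k} SD of the mean) = {p:.4f}")
```

To four decimals these come out to 0.6827, 0.9545 and 0.9973, which the slide rounds to 0.68, 0.95 and 0.997.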
Standard Normal Distribution - Calculating Probabilities
First, let’s consider the standard normal, N(0,1). We will usually use Z to denote a random variable with a standard normal distribution.
In Stata, Pr(Z < 1.65) is given by normal(1.65). The normal table is also useful for performing probability calculations.
Standard Normal Probabilities
P[Z < 1.65] = ?
P[Z > 0.5] = ?
P[-1.96 < Z < 1.96] = P[Z < 1.96] - P[Z < -1.96] = ?
P[-0.5 < Z < 2.0] = ?
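One way to check answers to these four exercises is to compute them from the standard normal CDF. The sketch below uses Python's `statistics.NormalDist` as a stand-in for Stata's `normal()` function; the variable names are illustrative.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # phi(z) = P(Z < z) for the standard normal

p_less_165 = phi(1.65)                 # P[Z < 1.65]
p_greater_05 = 1 - phi(0.5)            # P[Z > 0.5], by the complement rule
p_between_196 = phi(1.96) - phi(-1.96) # P[-1.96 < Z < 1.96]
p_between = phi(2.0) - phi(-0.5)       # P[-0.5 < Z < 2.0]

print(f"P[Z < 1.65]         = {p_less_165:.4f}")
print(f"P[Z > 0.5]          = {p_greater_05:.4f}")
print(f"P[-1.96 < Z < 1.96] = {p_between_196:.4f}")
print(f"P[-0.5 < Z < 2.0]   = {p_between:.4f}")
```

Each interval probability is a difference of two CDF values, which is the same pattern the slide writes out for the ±1.96 case.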
Converting to Standard Normal
Q: This solves the problem for the N(0,1) case. How do we calculate normal probabilities when the mean is not 0 and the standard deviation is not equal to 1?
A: Any normal random variable can be transformed to N(0,1).
E(X - μ) = ?
V(X - μ) = V(X) = σ²
V( (X - μ)/σ ) = ?
Linear transformations of normal random variables are still normal. So
Z = (X - μ)/σ ~ N( ? , ? )
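The effect of standardizing can be seen numerically: a probability computed directly from N(μ, σ) matches the standard normal probability at the corresponding z-value. In this Python sketch the values of `mu`, `sigma` and `x` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters, used only to illustrate the transformation.
mu, sigma = 10.0, 2.0
x = 13.0

# Standardize: Z = (X - mu) / sigma.
z = (x - mu) / sigma

p_direct = NormalDist(mu, sigma).cdf(x)   # P(X < x) under N(mu, sigma)
p_standardized = NormalDist().cdf(z)      # P(Z < z) under N(0, 1)

print(f"z = {z}")
print(f"P(X < {x}) directly      = {p_direct:.6f}")
print(f"P(Z < {z}) standardized = {p_standardized:.6f}")
```

Both lines print the same probability, which is why a single standard normal table (or a single `normal()` function) is enough for every normal distribution.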
Serum Cholesterol
Example: Serum cholesterol is approximately normally distributed with mean 219 mg/mL and standard deviation 50 mg/mL. If the clinically desirable range is < 200 mg/mL, then what proportion of the population falls in this range?
X = serum cholesterol in an individual.
μ = 219
σ = 50
Negative values for cholesterol - huh? (A normal model puts a very small but nonzero probability on negative values, so it is only an approximation.)
Serum Cholesterol
Return to cholesterol example...
Serum cholesterol is approximately normally distributed with mean 219 mg/mL and standard deviation 50 mg/mL. If the clinically desirable range is < 200 mg/mL, then what proportion of the population falls in this range?
P(X < 200) = P( Z < (200 - 219)/50 ) = P(Z < -0.38) = .3520
In Stata, it is given by the command: display normal(-0.38)
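The same calculation can be sketched in Python as a cross-check on the Stata command. `NormalDist` is from the standard library; the last line, which looks at the probability of a negative value under the model, is an added illustration and not part of the original example.

```python
from statistics import NormalDist

mu, sigma = 219.0, 50.0        # mean and standard deviation from the example
cutoff = 200.0                 # upper end of the clinically desirable range

z = (cutoff - mu) / sigma      # standardized cutoff, (200 - 219)/50 = -0.38

p_via_z = NormalDist().cdf(z)                  # P(Z < -0.38)
p_direct = NormalDist(mu, sigma).cdf(cutoff)   # P(X < 200), no standardizing

# Added illustration: probability the model assigns to impossible negative values.
p_negative = NormalDist(mu, sigma).cdf(0.0)

print(f"z = {z:.2f}")
print(f"P(X < 200) = {p_via_z:.4f} (via Z), {p_direct:.4f} (direct)")
print(f"P(X < 0)   = {p_negative:.2e}")
```

Both routes give about 0.352, so roughly 35% of the population falls in the desirable range under this model, and the probability placed on negative values is negligible.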
Summary
• Five (5) Steps to Problem Solving
  1. Translate the information into events
  2. Collect all information in terms of events
  3. Translate the problem/question into the events defined in (1) using probability rules like “and”, “or” or “given”, or possibly in terms of random variables, such as [X ≥ 1], [X = 1]
  4. Use probability rules (e.g., addition, multiplication, independence, etc.) to help solve the problem
  5. Solve the problem
• These steps should be followed for any probability or “word” problem.