Comparing Two Proportions

Statistics. Comparing Two Proportions. Be able to state the null and alternative hypotheses for testing the difference between two population proportions.

roden

Presentation Transcript


  1. Statistics Comparing Two Proportions

  2. Be able to state the null and alternative hypotheses for testing the difference between two population proportions. Know how to examine your data for violations of conditions that would make inference about the difference between the two population proportions unwise or invalid. Understand that the formula for the standard error of the difference between two independent sample proportions is based on the principle that when finding the sum or difference of two independent random variables, their variances add. What you will learn

  3. Variances of independent random variables add: • The variance of a sum or difference of independent random variables is the sum of the variances of those variables. Terms

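  4. Sampling distribution: • The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is, under appropriate assumptions, modeled by a Normal model with mean $\mu = p_1 - p_2$ and standard deviation $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, where $q = 1 - p$. Terms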

  5. Two-proportion z-interval: • A two-proportion z-interval gives a confidence interval for the true difference in proportions, p1 – p2, in two independent groups. • The confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$, where $z^*$ is a critical value from the standard Normal model corresponding to a specified confidence level. Terms

  6. Pooling: • When we have data from different sources that we believe are homogeneous, we can get a better estimate of the common proportion and its standard deviation. We can combine, or pool, the data into a single group for the purpose of estimating the common proportion. The resulting pooled standard error is based on more data and is thus more reliable (if the null hypothesis is true and the groups are truly homogeneous). Terms

  7. Two-proportion z-test: • Test the null hypothesis H0: p1 – p2 = 0 by referring the statistic $z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$ to a standard Normal model. Terms

  8. Who do you think is more intelligent, men or women? • Gallup poll of 520 women and 506 men. • 28% of the men thought men were more intelligent. • 14% of the women thought men were more intelligent. • Questions comparing two percentages are much more common than questions about isolated percentages. • Example– Treatment is better than placebo control • Example– This year’s results are better than last year’s. Example

  9. We know the difference between the two proportions of the random sample is 14%, but what is the true difference? We would like to find the true difference and the margin of error. For this we need to determine the standard deviation of the sampling distribution model for the difference in the proportions. Comparing two proportions

  10. Remember– The variance of the sum or difference of two independent random variables is the sum of their variances. (Chapter 16). Why will this work? Comparing two proportions

  11. How does this work? Consider grabbing a box of cereal. It claims there are 16 ounces in the box. We know that this is not exact because there is some variance from box to box. When you pour 2 ounces of cereal in a bowl, there will be further variance from bowl to bowl. How much cereal is left in the box? Comparing two proportions

  12. According to our rule, the variance of the amount of cereal left in the box is the sum of the two variances. We need the standard deviation, not the variance, so we take the square root of that sum. Comparing two proportions

  13. Here are the formulas: $Var(X \pm Y) = Var(X) + Var(Y)$ and $SD(X \pm Y) = \sqrt{SD^2(X) + SD^2(Y)}$. These formulas apply only when X and Y are independent. Comparing two proportions
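The cereal example can be checked with a small simulation. This is only an illustrative sketch: the Normal distributions and their spreads (0.2 oz for the box, 0.1 oz for the pour) are assumptions made up for the demonstration, not figures from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 100_000
# Assumed distributions, for illustration only:
# box contents ~ Normal(16, 0.2) oz, amount poured ~ Normal(2, 0.1) oz.
box = [random.gauss(16.0, 0.2) for _ in range(N)]
bowl = [random.gauss(2.0, 0.1) for _ in range(N)]

# Cereal left in the box is a difference of two independent quantities.
left = [b - p for b, p in zip(box, bowl)]

var_sum = statistics.pvariance(box) + statistics.pvariance(bowl)
var_left = statistics.pvariance(left)
sd_left = var_left ** 0.5

print(f"Var(box) + Var(bowl) = {var_sum:.4f}")
print(f"Var(box - bowl)      = {var_left:.4f}")
print(f"SD(box - bowl)       = {sd_left:.4f}")
```

Under these assumptions the theoretical variance of the difference is 0.2² + 0.1² = 0.05, so both printed variances should land close to that value even though one quantity is subtracted from the other.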

  14. The samples can have different sizes and different proportion values. We use subscripts to keep the different values straight. In comparing males and females, we could use the subscripts of M and F or 1 and 2. Comparing two proportions

  15. The standard deviations of the sample proportions are: $SD(\hat{p}_1) = \sqrt{\frac{p_1 q_1}{n_1}}$ and $SD(\hat{p}_2) = \sqrt{\frac{p_2 q_2}{n_2}}$. Comparing two proportions

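  16. The variance of the difference in the proportions is: $Var(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$. The standard deviation is: $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. Comparing two proportions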

  17. Since we usually don’t know the true values of p1 and p2, we use the sample proportions from the data we are given. We use them to estimate the variances and find the standard error. Comparing two proportions
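As a rough sketch of that estimate, the function below plugs sample proportions into the standard deviation formula to get a standard error. The function name `se_diff` is a hypothetical helper; the numbers are the Gallup figures from the earlier example.

```python
from math import sqrt

def se_diff(p1_hat, n1, p2_hat, n2):
    """Standard error of p1_hat - p2_hat for two independent samples."""
    return sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Gallup poll: 28% of 506 men and 14% of 520 women.
se = se_diff(0.28, 506, 0.14, 520)
print(f"SE = {se:.4f}")  # SE = 0.0251
```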

  18. Within each group the data should be based on results for independent individuals. • Randomization Condition– • The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment. • The 10% Condition— • If the data are sampled without replacement, the sample should not exceed 10% of the population. Independence Assumptions

  19. Since we are comparing two groups, we need to add the Independent Groups Assumption. • This is the most important assumption. • Independent Groups Assumption: • The two groups we are comparing must also be independent of each other. Usually, the independence of the groups from each other is evident in the way the data were collected. Independence Assumptions

  20. Each of the groups must be big enough. • Success/Failure Condition— • Both groups are big enough that at least 10 successes and at least 10 failures have been observed in each. Sample Size condition

  21. The sampling distribution model for a difference between two independent proportions. • Provided that the sampled values are independent, the samples are independent, and the sample sizes are large enough, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is modeled by a Normal model with mean $\mu = p_1 - p_2$ and standard deviation $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. Sampling Distribution

  22. If we have the sampling distribution model and the standard deviation, we have what we need to find the margin of error for the differences in proportions. Sampling Distribution

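  23. Two-proportion z-interval: • When the conditions are met, we are ready to find the confidence interval for the difference of two proportions, $p_1 - p_2$. The confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$, where we find the standard error of the difference, $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$, from the observed proportions. The critical value $z^*$ depends on the particular confidence level, C, that you specify. Sampling Distribution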
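A minimal sketch of the two-proportion z-interval, applied to the Gallup numbers from the opening example (28% of 506 men, 14% of 520 women). The function name `two_prop_z_interval` is a hypothetical helper, and the sketch assumes the independence and sample-size conditions have already been checked.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    # Standard error from the observed (unpooled) proportions.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical value z* from the standard Normal model.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

low, high = two_prop_z_interval(0.28, 506, 0.14, 520)
print(f"95% CI for p_men - p_women: ({low:.3f}, {high:.3f})")
```

For these data the interval runs from roughly 9 to 19 percentage points, so the 14-point sample difference comes with a margin of error of about 5 points.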

  24. Consider this example: The National Sleep Foundation asked a random sample of 1010 U.S. adults questions about their sleep habits. The study ensured that there were equal numbers of men and women. Of the 995 respondents to the question about snoring, 37% reported that they snored at least a few nights a week during the past year. Of the 184 people under 30, 26% snored, compared with 39% of the 811 in the older group. Can the difference really be 13%, or is it due to the natural fluctuations in the sample that was chosen? Pooling

  25. This type of question uses a hypothesis test. What would be the null hypothesis? H0: p1 – p2 = 0 or H0: p1 = p2. What would be the alternative hypothesis? HA: p1 – p2 ≠ 0. Pooling

  26. The hypothesis is about a new parameter– the difference in proportions. We need to find the standard error for that. But we can actually do better than the standard error. Pooling

  27. The proportions and the standard deviations are linked. There are two proportions in the standard error formula, but look at the null hypothesis. It claims the proportions are equal. To test the hypothesis, we assume that the null hypothesis is true. This means that there is a single value for $\hat{p}$ in the SE formula. Pooling

  28. How can we do this? If the null hypothesis is true, then among all adults the two groups have the same proportion. Overall, we saw 48 + 318 = 366 snorers out of a total of 184 + 811 = 995 adults who responded to the question. The overall proportion of snorers was 366/995 = 0.3678. Pooling

  29. Pooling– Combining the counts to get an overall proportion. Whenever we have data from different sources or different groups but we believe that they really came from the same underlying population, we can pool them to get better estimates. Pooling

  30. When we have only proportions and not the counts, as in the snoring example, we have to reconstruct the number of successes by multiplying the sample sizes by the proportions. If these calculations don’t come out to whole numbers, round first. There must have been a whole number of successes to begin with. (This is the only time you round in the middle of a calculation.) Pooling

  31. We can then put the pooled value into the formula, substituting it for both sample proportions in the standard error formula. Pooling
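The pooled calculation for the snoring data can be sketched as follows. The counts (48 of 184 under 30, 318 of 811 in the older group) are the ones used in the pooling slide; the two-sided alternative and the function name `two_prop_z_test` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-proportion z-test of H0: p1 - p2 = 0 using a pooled SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pool the counts: under H0 both groups share one proportion.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Substitute the pooled value for both sample proportions.
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    # Two-sided P-value (assumed alternative: p1 - p2 != 0).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pooled, se_pooled, z, p_value

# Group 1: aged 30 and over; group 2: under 30.
p_pooled, se_pooled, z, p_value = two_prop_z_test(318, 811, 48, 184)
print(f"pooled p-hat = {p_pooled:.4f}")
print(f"pooled SE    = {se_pooled:.4f}")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

With these counts the observed difference sits a little more than three pooled standard errors from zero, which makes sampling fluctuation alone an unlikely explanation.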

  32. Snoring-- Pooling

  33. A presidential candidate fears he has a problem with women voters. His campaign staff plans to run a poll to assess the situation. They’ll randomly sample 300 men and 300 women, asking if they have a favorable impression of the candidate. Obviously, the staff can’t know this, but suppose the candidate has a positive image with 59% of males but with only 53% of females. Example-- #1 Page 507

  34. What kind of sampling design is his staff planning to use? This is a stratified random sample, stratified by gender. Example-- #1 Page 507

  35. What difference would you expect the poll to show? We would expect the difference in proportions in the sample to be the same as the difference in proportions in the population, with the percentage of the respondents with a favorable impression of the candidate 6% higher among males. Example-- #1 Page 507

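  36. Of course, sampling error means the poll won’t reflect the difference perfectly. What’s the standard error for the difference in the proportions? The standard deviation of the difference in proportions is: $SD(\hat{p}_M - \hat{p}_F) = \sqrt{\frac{(0.59)(0.41)}{300} + \frac{(0.53)(0.47)}{300}} \approx 0.040$, or about 4%. Example-- #1 Page 507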

  37. Sketch a sampling model for the size of the difference in proportions of men and women with favorable impressions of this candidate that might appear in a poll like this. [Figure: Normal model for the difference in proportion with favorable impression (Male – Female), with axis marks at -6%, -2%, 2%, 6%, 10%, 14%, and 18% and the 68%, 95%, and 99.7% regions indicated.] Example-- #1 Page 507

  38. Could the campaign be misled by the poll, concluding that there really is no gender gap? Explain. The campaign could certainly be misled by the poll. According to the model, a poll showing little difference could occur relatively frequently. That result is only 1.5 standard deviations below the expected difference in proportions. Example-- #1 Page 507
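To put a rough number on "relatively frequently", the sketch below builds the Normal model for this poll from the supposed true proportions (59% of 300 men, 53% of 300 women) and asks how often a poll would show no gap, or a gap in the other direction. Reading "little difference" as a sample difference of zero or less is an assumption made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

p_m, p_f, n = 0.59, 0.53, 300

# Standard deviation of the difference in sample proportions.
sd = sqrt(p_m * (1 - p_m) / n + p_f * (1 - p_f) / n)

# Normal model for (male - female), centered on the true difference.
model = NormalDist(mu=p_m - p_f, sigma=sd)

z_zero = (0 - model.mean) / model.stdev  # how far 0 is from the center
p_no_gap = model.cdf(0.0)                # chance the poll shows no gap

print(f"SD of the difference = {sd:.3f}")
print(f"z-score of a 0% gap  = {z_zero:.2f}")
print(f"P(difference <= 0)   = {p_no_gap:.3f}")
```

Under this model a poll showing no gender gap at all would turn up roughly 7% of the time, which is consistent with the slide's point that the campaign could be misled.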

  39. In October 2000 the U.S. Department of Commerce reported the results of a large-scale survey on high school graduation. Researchers contacted more than 25,000 Americans aged 24 years to see if they had finished high school; 84.9% of the 12,460 males and 88.1% of the 12,678 females indicated that they had high school diplomas. Example-- #4 Page 508

  40. Are the assumptions and conditions necessary for inference satisfied? Explain. • Randomization condition— • Assume that the samples are representative of all recent graduates. • 10% condition— • Although large, the samples are less than 10% of all graduates. • Independent samples condition— • The sample of men and the sample of women were drawn independently of each other. • Success/Failure condition— • The samples are very large, certainly large enough for the methods of inference to be used. Example-- #4 Page 508

  41. Create a 95% confidence interval for the difference in graduation rates between males and females. Example-- #4 Page 508
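The interval can be sketched as below. One caution: the calculation assumes a male graduation rate of 84.9%, since that is the value consistent with the interval of 2.4% to 4.0% given in the interpretation that follows; treat that figure as an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

p_f, n_f = 0.881, 12678   # females with diplomas
p_m, n_m = 0.849, 12460   # males with diplomas (assumed rate, see note above)

# Standard error of the difference from the observed proportions.
se = sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)

# Critical value for 95% confidence.
z_star = NormalDist().inv_cdf(0.975)

diff = p_f - p_m
low, high = diff - z_star * se, diff + z_star * se

print(f"difference = {diff:.3f}, SE = {se:.4f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```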

  42. Interpret your confidence interval. We are 95% confident that the proportion of 24-year-old American women who have graduated from high school is between 2.4% and 4.0% higher than the proportion of American men the same age who have graduated from high school. Example-- #4 Page 508

  43. Does this provide strong evidence that girls are more likely than boys to complete high school? Explain. Since the interval for the difference in proportions of high school graduates does not contain 0, there is strong evidence that women are more likely than men to complete high school. Example-- #4 Page 508

  44. The painful wrist condition called carpal tunnel syndrome can be treated with surgery or less invasive wrist splints. In September 2002, Time magazine reported on a study of 176 patients. Among the half that had surgery, 80% showed improvement after three months, but only 54% of those who used the wrist splints improved. Example– #6 Page 508

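  45. What’s the standard error of the difference in the two proportions? $SE(\hat{p}_{surg} - \hat{p}_{splint}) = \sqrt{\frac{(0.80)(0.20)}{88} + \frac{(0.54)(0.46)}{88}} \approx 0.068$ Example– #6 Page 508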

  46. Construct a 95% confidence interval for this difference. • Randomization condition: • It’s not clear whether or not this study was an experiment. If so, assume that the subjects were randomly allocated to treatment groups. If not, assume that the subjects are representative of all carpal tunnel sufferers. • 10% condition: • 88 subjects in each group are less than 10% of all carpal tunnel sufferers. • Independent samples condition: • The improvement rates of the two groups are not related. • Success/Failure condition: • The numbers of successes and failures in each group (about 70 and 18 with surgery, 48 and 40 with splints) are all greater than 10, so the samples are large enough. Example– #6 Page 508
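The Success/Failure check can be sketched by reconstructing the counts from the reported percentages, rounding to whole patients as described in the pooling discussion. The helper name `success_failure_counts` is hypothetical, and the group size of 88 comes from splitting the 176 patients in half.

```python
def success_failure_counts(p_hat, n):
    """Reconstruct whole-number successes and failures from a proportion."""
    successes = round(p_hat * n)  # there must have been a whole number
    return successes, n - successes

groups = {"surgery": (0.80, 88), "splints": (0.54, 88)}
counts = {name: success_failure_counts(p, n) for name, (p, n) in groups.items()}

# Condition: at least 10 successes and 10 failures in every group.
ok = all(min(c) >= 10 for c in counts.values())

for name, (s, f) in counts.items():
    print(f"{name}: {s} improved, {f} did not")
print("Success/Failure condition met:", ok)
```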

  47. Since the conditions have been satisfied, we will find a two-proportion z-interval. Example– #6 Page 508

  48. The two-proportion z-interval is $(0.80 - 0.54) \pm 1.96 \sqrt{\frac{(0.80)(0.20)}{88} + \frac{(0.54)(0.46)}{88}}$, or approximately (0.126, 0.394). Example– #6 Page 508

  49. State an appropriate conclusion. • We are 95% confident that the proportion of patients who show improvement in carpal tunnel syndrome with surgery is between 12.6% and 39.4% higher than the proportion who show improvement with wrist splints. Example– #6 Page 508
