Please turn off cell phones, pagers, etc. The lecture will begin shortly.

There will be a quiz at the end of today’s lecture.

Final Exam Info

The final exam will take place Wednesday, May 3 from 12:20 to 2:10 pm in 105 and 108 Forum. It will consist of 50 questions: 10 from material covered on Exam 1, 10 from Exam 2, 10 from Exam 3, and 20 from material covered after Exam 3. It will be very helpful to review Exams 1, 2 and 3; at least five questions from each of these exams will appear on the final. More information on the final will be given next week.

Lecture 33

Today’s lecture will cover material from Chapter 21.
• Review CI for a proportion
• CI for a mean (Section 21.1)
• CI for a difference between two means (Section 21.1)

1. Review CI for a Proportion

Confidence Intervals

Last time, we introduced the idea of a confidence interval (CI). A confidence interval is a range of numbers (i.e., a lower limit and an upper limit) computed from sample data that covers a population value with at least the specified probability. Remember: the level of confidence (typically 95%) is not a property of a single CI from one sample; rather, it is a property of the procedure if it were applied over and over with many samples.
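The “property of the procedure” idea can be checked by simulation. The sketch below is written in Python and is only an illustration: it assumes a hypothetical population whose true proportion is 0.7 and draws samples of size 800 (numbers borrowed from the drivers example, but the true value is an assumption). It builds the interval “sample proportion plus or minus 2 standard errors” from each sample and counts how often the interval covers the true value. Any single interval either covers or misses; the fraction over many samples should come out close to 95%.

```python
import math
import random

def coverage_rate(p_true, n, trials, seed=0):
    """Fraction of simulated 95% CIs (p-hat plus or minus 2 SE) that cover p_true."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    covered = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its sample proportion.
        successes = sum(1 for _ in range(n) if rng.random() < p_true)
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        lower, upper = p_hat - 2 * se, p_hat + 2 * se
        if lower <= p_true <= upper:
            covered += 1
    return covered / trials

# Hypothetical population: true proportion 0.7, samples of size 800.
rate = coverage_rate(p_true=0.7, n=800, trials=1000)
print(f"Fraction of intervals that cover the true value: {rate:.3f}")
```

The function name `coverage_rate` and its parameters are hypothetical, introduced only for this illustration.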

CI for a proportion

A 95% confidence interval for a proportion is

p plus or minus 2 × sqrt( p × (1 – p) / n )

where p is the sample proportion and n is the sample size. That is, the sample estimate, plus or minus 2 standard errors.

Earlier in the semester, we claimed that one could take the estimate plus or minus the “margin of error” (one divided by the square root of n). That method does work, but the resulting interval is wider than necessary. If the sample proportion is close to .5, the two methods are essentially the same. But if the proportion is close to 0 or 1, the new formula produces a narrower, more precise interval. From now on, we’ll use the new formula.
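To see why the two methods agree near .5 but not near 0 or 1, the Python sketch below compares the two margins for a sample of 800 at a few sample proportions. The function names `margin_new` and `margin_old` are hypothetical labels for the two formulas above.

```python
import math

def margin_new(p_hat, n):
    """New formula: 2 standard errors, 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)

def margin_old(n):
    """Earlier 'margin of error': 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

n = 800
for p_hat in (0.5, 0.7, 0.95):
    print(f"p = {p_hat}: new = {margin_new(p_hat, n):.4f}, old = {margin_old(n):.4f}")
```

Because p × (1 – p) is never larger than .25, the new margin is never wider than the old one, and the two agree exactly when p = .5 (since 2 × sqrt(.25 / n) = 1 / sqrt(n)).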

Example

In a sample of 800 drivers, 560 (that is, 70%) said that they were “better-than-average.” Find a 95% confidence interval for the percentage of drivers in the population who think they are better than average.

Solution

p = 560 / 800 = .7
SE = sqrt( p × (1 – p) / n ) = sqrt( .7 × .3 / 800 ) = sqrt( .0002625 ) = .0162
2 × SE = .0324

The 95% CI goes from .7 – .0324 = .6676 = 66.8% to .7 + .0324 = .7324 = 73.2%.
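The same arithmetic can be written as a few lines of Python, which is a handy way to check hand calculations. This is a minimal sketch using only the numbers from the drivers example.

```python
import math

successes, n = 560, 800
p_hat = successes / n                      # sample proportion, .7
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error, about .0162
lower, upper = p_hat - 2 * se, p_hat + 2 * se
print(f"95% CI: {lower:.1%} to {upper:.1%}")   # prints "95% CI: 66.8% to 73.2%"
```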

2. Confidence interval for a mean

Recall the Central Limit Theorem for a sample mean. A sample mean is approximately normally distributed with
• mean = μ and
• standard deviation = σ / sqrt(n).

Recall that μ is the population mean, and σ is the population standard deviation. The standard deviation of the sample mean, σ / sqrt(n), is more commonly called the “standard error of the mean.” Calling σ / sqrt(n) by this new name will help us to distinguish it from the population standard deviation σ.
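The Central Limit Theorem statement above can be illustrated by simulation. The Python sketch below assumes a hypothetical, deliberately skewed population (exponential with mean 5, so μ = σ = 5), draws many samples of size 40, and compares the spread of the sample means with σ / sqrt(n). The population and the sample sizes are assumptions chosen for illustration only.

```python
import math
import random
import statistics

rng = random.Random(1)                 # seeded for reproducibility

# Hypothetical skewed population: exponential with mean 5, so mu = sigma = 5.
mu, sigma, n = 5.0, 5.0, 40

# Draw 5000 samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
    for _ in range(5000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(f"mean of sample means: {mean_of_means:.3f}   (mu = {mu})")
print(f"SD of sample means:   {sd_of_means:.3f}   (sigma / sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

The mean of the sample means should land near μ, and their standard deviation near σ / sqrt(n), even though individual observations from this population are far from normal.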

CI for a mean

A 95% confidence interval for a population mean is

sample mean plus or minus 2 × σ / sqrt( n )

Note on SE of the mean: The SE of the mean is σ / sqrt( n ). In most cases, σ is unknown and must be replaced by the standard deviation of the sample. Replacing σ by the sample standard deviation is OK as long as the sample is reasonably large (say, n ≥ 30). If n is small, then the interval should be widened slightly to account for the extra uncertainty in estimating σ. But we won’t worry about that in Stat 100.
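When the raw data are available, the recipe above (sample mean plus or minus 2 × s / sqrt(n), with the sample standard deviation s standing in for σ) can be sketched in Python as follows. The function name `mean_ci` and the data are hypothetical; the integers 1 to 30 are used only because the arithmetic is easy to check by hand (mean 15.5, sample variance 77.5).

```python
import math
import statistics

def mean_ci(data):
    """Approximate 95% CI for a population mean: sample mean +/- 2 * s / sqrt(n).

    s (the sample standard deviation) stands in for the unknown sigma,
    which the lecture treats as reasonable when n >= 30.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)         # sample SD, computed with divisor n - 1
    se = s / math.sqrt(n)              # estimated standard error of the mean
    return xbar - 2 * se, xbar + 2 * se

# Hypothetical data: the integers 1 to 30.
lower, upper = mean_ci(list(range(1, 31)))
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

This sketch does not apply the small-sample widening mentioned above, so it should only be used for reasonably large samples.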

Example

Forty-two sedentary male subjects were placed on a diet for one year. In this group, the average weight loss was 7.2 kg with a standard deviation of 3.7 kg. Find a 95% CI for the true mean weight loss (i.e., the average weight loss if everyone in this population were placed on a diet for one year).

Solution

The SE of the mean is estimated to be 3.7 / sqrt( 42 ) = 0.57. Twice the SE is 2 × .57 = 1.14. The 95% confidence interval goes from 7.2 – 1.14 = 6.06 kg to 7.2 + 1.14 = 8.34 kg.
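When only summary statistics are reported (mean, standard deviation, sample size), the interval can be computed directly from them. A minimal Python sketch, with the hypothetical helper name `mean_ci_from_summary`, applied to the diet example:

```python
import math

def mean_ci_from_summary(mean, sd, n):
    """95% CI for a mean from summary statistics: mean +/- 2 * sd / sqrt(n)."""
    se = sd / math.sqrt(n)             # estimated standard error of the mean
    return mean - 2 * se, mean + 2 * se

lower, upper = mean_ci_from_summary(7.2, 3.7, 42)
print(f"Diet: {lower:.2f} kg to {upper:.2f} kg")   # prints "Diet: 6.06 kg to 8.34 kg"
```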

Another example

Another sample of forty-seven sedentary male subjects was placed on an exercise regimen. Over the course of one year, the average weight loss in this group was 4.0 kg, and the standard deviation was 3.9 kg. Find a 95% CI for the true average weight loss under the exercise regimen.

Solution

The SE of the mean is estimated to be 3.9 / sqrt( 47 ) = 0.57. Twice the SE is 2 × .57 = 1.14. The 95% confidence interval goes from 4.0 – 1.14 = 2.86 kg to 4.0 + 1.14 = 5.14 kg.

Is one better than the other?

In the last two examples, we found that
• a 95% CI for the average weight loss from dieting alone went from 6.06 kg to 8.34 kg;
• a 95% CI for the average weight loss from exercise alone went from 2.86 kg to 5.14 kg.

Notice that these two intervals do not overlap. Thus we can be quite confident that the true average weight loss from dieting alone is greater than the true average weight loss from exercise alone. A better way to decide whether one population mean is greater than another is to compute a 95% confidence interval for the difference between the two means.
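The overlap comparison can be sketched in Python as below. It recomputes both intervals from the summary statistics of the two examples and checks whether they share any point. The helper names `mean_ci_from_summary` and `intervals_overlap` are hypothetical.

```python
import math

def mean_ci_from_summary(mean, sd, n):
    """95% CI for a mean from summary statistics: mean +/- 2 * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - 2 * se, mean + 2 * se

def intervals_overlap(a, b):
    """True if intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

diet = mean_ci_from_summary(7.2, 3.7, 42)
exercise = mean_ci_from_summary(4.0, 3.9, 47)
print("diet:    ", [round(x, 2) for x in diet])       # [6.06, 8.34]
print("exercise:", [round(x, 2) for x in exercise])   # [2.86, 5.14]
print("overlap: ", intervals_overlap(diet, exercise)) # False
```

Non-overlap is a quick visual check only; the interval for the difference between the means, described next, answers the question more directly.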

3. CI for difference between two means

Suppose that we draw independent samples from two different populations. Here, “independent” means that there are no strong connections between the subjects in one sample and the subjects in the other sample. Independence is usually a reasonable assumption as long as the members of sample #1 and the members of sample #2 are not the same subjects and are not related in any particular way (e.g., husbands and wives). (If the same subjects are being measured twice, or if each subject in sample #1 has a pairwise relationship with a subject in sample #2, then the method I am about to show you should not be used.)

Standard error of the difference

The difference between the sample means is

diff = mean from sample #1 – mean from sample #2

The standard error of this difference is

SE diff = sqrt( (SE of first mean)² + (SE of second mean)² )

Find the standard error for each mean, square them, add them up, then take the square root.

Analogy to Pythagorean Theorem

Recall the Pythagorean Theorem: for a right triangle with legs a and b and hypotenuse c,

a² + b² = c², so c = sqrt( a² + b² )

The same principle applies to the SE of the difference. Think of the SEs of the two means as the lengths of the two legs of a right triangle (a and b). The SE of the difference is the length of the hypotenuse (c). The length of the hypotenuse is the square root of the sum of the squared lengths of the legs.
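The analogy can be made concrete in Python: the “square, add, take the square root” rule gives exactly the same number as Python’s `math.hypot`, which returns the length of the hypotenuse of a right triangle with the given legs. The helper name `se_diff` is hypothetical.

```python
import math

def se_diff(se1, se2):
    """SE of a difference of two independent means: sqrt(se1**2 + se2**2)."""
    return math.sqrt(se1**2 + se2**2)

# The classic 3-4-5 right triangle: both print 5.0.
print(se_diff(3, 4), math.hypot(3, 4))

# Two means, each with SE .57 (as in the weight-loss examples).
print(round(se_diff(0.57, 0.57), 2))   # 0.81
```

Note that the SE of the difference (0.81) is larger than either SE alone but smaller than their sum (1.14), just as a hypotenuse is shorter than the two legs laid end to end.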

CI for the difference between two means

A 95% confidence interval for the difference between two means is

diff plus or minus 2 × SE diff

Remember that diff is the mean of sample #1 minus the mean of sample #2. The SE of a sample mean is the standard deviation from that sample, divided by the square root of the size of that sample. Find the SE for each sample mean, square them, add them up, and take the square root. That will give you “SE diff.”

Example

Among the 42 male subjects who dieted, the average weight loss was 7.2 kg, and the standard deviation was 3.7 kg. Among the 47 male subjects who exercised, the average weight loss was 4.0 kg, and the standard deviation was 3.9 kg. Find a 95% CI for the difference in means.

Solution

The SE of the first mean is 3.7 / sqrt( 42 ) = 0.57.
The SE of the second mean is 3.9 / sqrt( 47 ) = 0.57.
The SE of the difference is sqrt( .57² + .57² ) = 0.81.
diff = 7.2 – 4.0 = 3.2
2 × SE diff = 2 × .81 = 1.62

The 95% CI goes from 3.2 – 1.62 = 1.58 kg to 3.2 + 1.62 = 4.82 kg.
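The whole procedure for two independent samples fits in one small Python function. This is a sketch under the lecture’s assumptions (independent samples, reasonably large n); the function name `diff_ci` is hypothetical.

```python
import math

def diff_ci(mean1, sd1, n1, mean2, sd2, n2):
    """95% CI for mean1 - mean2 from two independent samples: diff +/- 2 * SE diff."""
    se1 = sd1 / math.sqrt(n1)              # SE of the first mean
    se2 = sd2 / math.sqrt(n2)              # SE of the second mean
    se_diff = math.sqrt(se1**2 + se2**2)   # square, add, take the square root
    diff = mean1 - mean2
    return diff - 2 * se_diff, diff + 2 * se_diff

lower, upper = diff_ci(7.2, 3.7, 42, 4.0, 3.9, 47)
print(f"95% CI for diet minus exercise: {lower:.2f} kg to {upper:.2f} kg")
```

Carrying full precision gives about 1.59 kg to 4.81 kg; the hand calculation’s 1.58 to 4.82 comes from rounding each SE to two decimals before combining them.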

Testing a hypothesis

Once we have the 95% CI for the difference, we can easily test the null hypothesis “The two population means are equal” versus the alternative hypothesis “The two population means are not equal.” We should reject the null hypothesis and accept the alternative if the confidence interval does not cover zero. In the last example, the confidence interval (1.58 kg to 4.82 kg) did not cover zero. So we may conclude that dieting alone is more effective than exercising alone, in the sense that it produces a greater average weight loss.
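The decision rule amounts to a single check on the interval’s endpoints. A minimal Python sketch; the function name `reject_equal_means` is hypothetical, and the second interval is a made-up example of an interval that covers zero.

```python
def reject_equal_means(ci):
    """Reject 'the two population means are equal' if the 95% CI for the
    difference does not cover zero."""
    lower, upper = ci
    return not (lower <= 0 <= upper)

print(reject_equal_means((1.58, 4.82)))   # True: the interval lies entirely above 0
print(reject_equal_means((-0.5, 2.0)))    # False: hypothetical interval that covers 0
```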

Today’s quiz
• Write your name legibly.
• Write the Pythagorean Theorem (just the formula).
• Give the formula that shows how to compute “SE diff” from the SEs of the two means.
