Math 3680 Lecture #17 Two-Sample Inference

Two-Sample Data: Matched Pairs

Example. An industrial safety program was recently instituted. Ten similar plants recorded the average weekly loss (averaged over a month) in man-hours due to accidents. The chart shows the results both before and after the safety program was implemented. Do the data provide statistically significant evidence that the safety program was effective?

Note: We encountered this kind of problem before with the sign test. However, the sign test did not take into account the magnitudes of the differences between the before and after data; instead, the sign test only looked at which one was larger. Using our improved techniques of hypothesis testing, we are now able to give a more powerful test to determine the effectiveness of the safety program.

Solution: Let $X$ denote the before data, and $Y$ the after data. Let $D = X - Y$, the difference between the two.
$H_0: \mu_D = 0$
$H_a: \mu_D > 0$
Significance level: $\alpha = 0.05$.

The problem now reduces to regular one-variable hypothesis testing, which we know well by now. Using the P-value method, we reject the null hypothesis. It appears that the safety program was effective.
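The reduction to a one-variable test can be sketched in a few lines of Python. The chart's values are not reproduced in this transcript, so the `before` and `after` lists below are hypothetical numbers chosen only for illustration. The sketch also compares the test statistic against the one-sided critical value $t_{0.05,\,9} \approx 1.833$ rather than computing a P-value, which keeps it within the standard library.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL data (not the chart from the lecture): weekly man-hours lost
# at ten plants, before and after the safety program.
before = [50, 62, 41, 88, 30, 55, 70, 36, 28, 20]
after = [44, 55, 42, 80, 31, 49, 66, 30, 27, 15]

# Matched pairs: work with the differences D = X - Y, one per plant.
differences = [x - y for x, y in zip(before, after)]
n = len(differences)

d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation (divides by n - 1)

# One-sample t statistic for H0: mu_D = 0 against Ha: mu_D > 0.
t_stat = d_bar / (s_d / sqrt(n))

# One-sided critical value for alpha = 0.05 with n - 1 = 9 degrees of freedom.
T_CRITICAL = 1.833
reject_null = t_stat > T_CRITICAL

print(f"mean difference = {d_bar:.2f}, t = {t_stat:.3f}, reject H0: {reject_null}")
```

With real data, only the two lists (and the critical value, if the sample size changes) would need to be replaced.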

Example. Mechanical science engineers studied the impact of infrasound (sound waves at a frequency below the audibility range of the human ear) on a person’s blood pressure. Five university students were exposed to infrasound for one hour. See table. Does it appear that the mean systolic blood pressure changed as a result of the infrasound? Find a 95% (two-sided) confidence interval for the mean difference in blood pressure.
C. Y. H. Qibai and H. Shi, Journal of Low Frequency Noise, Vibration and Active Control, Vol. 23 (2004)

Two-Sample Data: Independent Samples with Different Variances

Previously, we had problems in which the data was obviously paired. However, it’s not uncommon to compare two different data sets which are not paired.

Example. The Ohio EPA collected Index of Biotic Integrity (IBI) measurements for sites located in two Ohio river basins; high IBIs indicate healthier fish populations. Does it appear that the IBI values are the same for both locations?
E. L. Boone, Y. Keying and E. P. Smith, Journal of Agricultural, Biological, and Environmental Sciences, Vol. 10 (2005)

Notice that this is a different problem than the one-sample problems that we saw earlier. Before, a typical question would be “Is the mean less than 0.4?” Now, the question is, “Is there a difference?” For such problems, we can use all of the previous machinery of confidence intervals and hypothesis testing. However, two things will be different: (1) the computation of the standard error (and hence the test statistic), and (2) the computation of the number of degrees of freedom (when using the Student’s t-distribution).

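Let’s define $D = \bar{X} - \bar{Y}$, the difference of the two sample means. As discussed in the past, $E(D) = \mu_X - \mu_Y$, and the standard error of $D$ is estimated by $SE = \sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}$, where $s_X, s_Y$ are the sample standard deviations and $n_X, n_Y$ are the sample sizes. We will typically be testing if the means are equal, in which case the null hypothesis will be $\mu_D = \mu_X - \mu_Y = 0$.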

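Furthermore, we will use Welch’s formula for computing the number of degrees of freedom:
$$\nu = \frac{\left( \dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y} \right)^2}{\dfrac{(s_X^2/n_X)^2}{n_X - 1} + \dfrac{(s_Y^2/n_Y)^2}{n_Y - 1}},$$
rounded down to the nearest integer, where $s_X, s_Y$ are the sample standard deviations and $n_X, n_Y$ are the sample sizes. (There isn’t precise agreement on this, but we’ll defer this discussion to a more advanced statistics class.)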
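As a rough illustration of the two quantities that change for independent samples, here is a minimal Python sketch of the unpooled standard error and Welch's degrees of freedom. The function names and the summary statistics in the usage lines are hypothetical, chosen only to show the arithmetic; they are not taken from the lecture's data.

```python
import math

def welch_standard_error(s_x, n_x, s_y, n_y):
    """Standard error of the difference of two sample means (variances not pooled)."""
    return math.sqrt(s_x**2 / n_x + s_y**2 / n_y)

def welch_df(s_x, n_x, s_y, n_y):
    """Welch's approximate degrees of freedom, before rounding."""
    a = s_x**2 / n_x
    b = s_y**2 / n_y
    return (a + b) ** 2 / (a**2 / (n_x - 1) + b**2 / (n_y - 1))

# HYPOTHETICAL summary statistics: s_X = 2.0 with n_X = 10, s_Y = 3.0 with n_Y = 15.
se = welch_standard_error(2.0, 10, 3.0, 15)
df = math.floor(welch_df(2.0, 10, 3.0, 15))   # round down, as in the lecture

print(f"standard error = {se:.4f}, degrees of freedom = {df}")
```

The rounding is kept outside `welch_df` so that the unrounded value stays available; when the two samples have equal sizes and equal standard deviations, the formula reduces to $2(n - 1)$.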

Example. (Repeated for convenience) The Ohio EPA collected Index of Biotic Integrity (IBI) measurements for sites located in two Ohio river basins; high IBIs indicate healthier fish populations. Does it appear that the IBI values are the same for both locations?
E. L. Boone, Y. Keying and E. P. Smith, Journal of Agricultural, Biological, and Environmental Sciences, Vol. 10 (2005)

Solution.
$H_0: \mu_M = \mu_H$, or $\mu_D = 0$.
$H_a: \mu_M \neq \mu_H$, or $\mu_D \neq 0$.
Significance level: $\alpha = 0.05$.

Now the cumbersome part: computing the number of degrees of freedom. Welch’s formula, rounded down, gives 101, so we use 101 degrees of freedom.

Test statistic: $t = \dfrac{\bar{x}_M - \bar{x}_H}{\sqrt{s_M^2/n_M + s_H^2/n_H}}$. The critical values are $\pm 1.98373$.

P-value: 12.42%. We fail to reject the null hypothesis. There is not enough evidence to think that the mean IBI values are different at these two locations.

Example. While carpets are nice in hospitals, they may not be sanitary. In a Montana hospital, bacteria levels per cubic foot of air were tested in 8 carpeted and 8 uncarpeted rooms.
a) Are the bacteria levels in the uncarpeted rooms lower than in the carpeted rooms?
b) Find a 95% confidence interval for the difference in the mean number of bacteria per cubic foot of air.
W. G. Walter and A. Stober, Journal of Environmental Health, Vol. 30, p. 405 (1968)

Two-Sample Data: Testing for Proportions with Independent Samples

Example. A mobile computer network consists of computers that maintain wireless communication with one another as they move about a given area. Two different protocols are compared. With protocol A, 170 of 200 (85%) sent messages were successfully received. With protocol B, 123 of 150 (82%) sent messages were successfully received. Can we conclude that protocol A has the higher success rate?
T. Camp et al., Proceedings of the IEEE International Conference on Communications, pp. 3318-3324 (2002)

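This problem is a case where the X and Y populations measure proportions which are assumed to be equal under the null hypothesis. Then $E(\hat{p}_X - \hat{p}_Y) = 0$, and the standard error is $SE = \sqrt{\hat{p}(1 - \hat{p}) \left( \dfrac{1}{n_X} + \dfrac{1}{n_Y} \right)}$. In the last formula, the estimate $\hat{p}$ is pooled, meaning that we compute the total number of successes over the total number of trials.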

Also, just like our previous problems regarding proportions, we use the normal distribution and not the Student t-distribution. Our test statistic will therefore be labeled z. As a consequence, we do not have to compute the degrees of freedom for this kind of problem.

Example. (Repeated for convenience) A mobile computer network consists of computers that maintain wireless communication with one another as they move about a given area. Two different protocols are compared. With protocol A, 170 of 200 (85%) sent messages were successfully received. With protocol B, 123 of 150 (82%) sent messages were successfully received. Can we conclude that protocol A has the higher success rate?
T. Camp et al., Proceedings of the IEEE International Conference on Communications, pp. 3318-3324 (2002)

Solution.
$H_0: p_X = p_Y$
$H_a: p_X \neq p_Y$
Significance level: $\alpha = 0.05$
Under the null hypothesis, the proportion $p$ of received messages is the same under both protocols, and thus the population variance $p(1 - p)$ is the same.

For $p$, we use the pooled proportion from both samples: $\hat{p} = \dfrac{170 + 123}{200 + 150} = \dfrac{293}{350} \approx 0.8371$.

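Test statistic: $z = \dfrac{0.85 - 0.82}{\sqrt{\dfrac{293}{350} \cdot \dfrac{57}{350} \left( \dfrac{1}{200} + \dfrac{1}{150} \right)}} \approx 0.752$. The critical values are $\pm 1.96$.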

P-value: 45.21%. We fail to reject the null hypothesis. There is not enough evidence to think that the proportions of successful deliveries are different.
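The arithmetic in this solution can be checked with a short Python sketch that uses only the counts given in the example (170 of 200 and 123 of 150). The function name `two_proportion_z_test` is a hypothetical helper written for this illustration, and the two-sided P-value follows the two-sided critical values used above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided P-value)."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Pooled estimate: total successes over total trials.
    pooled = (x_a + x_b) / (n_a + n_b)
    # Standard error under H0, where both populations share the same proportion.
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided P-value from the standard normal distribution.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Counts from the example: protocol A 170 of 200, protocol B 123 of 150.
z, p_value = two_proportion_z_test(170, 200, 123, 150)
print(f"z = {z:.3f}, two-sided P-value = {p_value:.4f}")
```

The P-value computed from the unrounded z comes out near 45.2%, in line with the value quoted in the solution, which appears to use z rounded to three decimals.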

Example. Researchers studied coliform bacteria counts among particles found in wastewater samples. Of 161 particles that were 75-80 μm in diameter, 19 contained coliform bacteria. Of 95 particles that were 90-95 μm in diameter, 22 contained coliform bacteria. Can we conclude that the larger particles are more likely to contain coliform bacteria?
R. Emerick et al., Water Environment Research, pp. 432-438 (2000)
