Chapter 9 Inferences from Two Samples

9.2 Inferences About Two Proportions
9.3 Inferences About Two Means (Independent)
9.4 Inferences About Two Means (Matched Pairs)
9.5 Comparing Variation in Two Samples

Objective
Compare the parameters of two populations using one sample from each population.
Use Confidence Intervals and Hypothesis Tests.
For the first population use index 1. For the second population use index 2.
9.2 Compare p1, p2
9.3 Compare µ1, µ2 (Independent)
9.4 Compare µ1, µ2 (Matched Pairs)
9.5 Compare σ1², σ2²

Section 9.2 Inferences About Two Proportions
Objective
Compare the proportions of two populations using one sample from each population.
Hypothesis Tests and Confidence Intervals of two proportions use the z-distribution.

Notation
First Population
p1  First population proportion
n1  First sample size
x1  Number of successes in first sample
p̂1  First sample proportion (p̂1 = x1/n1)

Second Population
p2  Second population proportion
n2  Second sample size
x2  Number of successes in second sample
p̂2  Second sample proportion (p̂2 = x2/n2)

Definition
The pooled sample proportion p̄ and its complement q̄:
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$

Requirements
(1) Have two independent random samples
(2) For each sample:
    The number of successes is at least 5
    The number of failures is at least 5
Both requirements must be satisfied to make a Hypothesis Test or to find a Confidence Interval.
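Requirement (2) can be checked mechanically. The following is a minimal sketch in Python; the function name `meets_requirements` is hypothetical and not part of the slides, and requirement (1), independent random samples, still has to be judged from how the data were collected.

```python
def meets_requirements(x, n, minimum=5):
    """Return True when a sample of size n with x successes has
    at least `minimum` successes and at least `minimum` failures."""
    successes = x
    failures = n - x
    return successes >= minimum and failures >= minimum


# Airbag data from Example 1 of this section: 41 of 11541 and 52 of 9853.
both_ok = meets_requirements(41, 11541) and meets_requirements(52, 9853)
print(both_ok)
```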

Tests for Two Proportions
The goal is to compare the two proportions.
Two-tailed:    H0: p1 = p2   H1: p1 ≠ p2
Left-tailed:   H0: p1 = p2   H1: p1 < p2
Right-tailed:  H0: p1 = p2   H1: p1 > p2
Note: We only test the relation between p1 and p2 (not the actual numerical values).

Finding the Test Statistic
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}$$
Note: p1 – p2 = 0 according to H0.
This equation is an altered form of the test statistic for a single proportion (see Ch. 8-3).

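Test Statistic
With p1 – p2 = 0 under H0:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}$$
Note: Hypothesis Tests are done in the same way as in Ch. 8 (but with different test statistics).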

Steps for Performing a Hypothesis Test on Two Proportions
• Write what we know
• State H0 and H1
• Draw a diagram
• Calculate the sample and pooled proportions
• Find the Test Statistic
• Find the Critical Value(s)
• State the Initial Conclusion and Final Conclusion
Note: Same process as in Chapter 8
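The computational steps above (sample proportions, pooled proportion, test statistic, P-value) can be sketched in Python using only the standard library. The function name `two_proportion_z_test` and the `tail` argument are hypothetical names chosen for illustration; this is a sketch of the pooled z-test described in this section, not a replacement for StatCrunch.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, tail="two"):
    """Pooled two-proportion z-test of H0: p1 = p2.

    tail is "left" (H1: p1 < p2), "right" (H1: p1 > p2) or "two" (H1: p1 != p2).
    Returns the test statistic z and the P-value.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)      # pooled sample proportion
    q_bar = 1 - p_bar
    std_error = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    z = (p1_hat - p2_hat) / std_error  # p1 - p2 = 0 under H0
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Airbag data from Example 1 of this section (left-tailed test).
z, p_value = two_proportion_z_test(41, 11541, 52, 9853, tail="left")
print(round(z, 4), round(p_value, 3))
```

With the airbag data this should land close to the z = –1.9116 and P-value = 0.028 reported in Example 1; compare the P-value with α to reach the initial conclusion.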

Example 1
The table below lists results from a simple random sample of front-seat occupants involved in car crashes. Use a 0.05 significance level to test the claim that the fatality rate of occupants is lower for those in cars equipped with airbags.

                        Airbags    No airbags
Occupant fatalities          41            52
Total occupants           11541          9853

p1: Proportion of fatalities with airbags
p2: Proportion of fatalities with no airbags
What we know: x1 = 41  n1 = 11541  x2 = 52  n2 = 9853  α = 0.05
Claim: p1 < p2
Note: Each sample has at least 5 successes and at least 5 failures, thus fulfilling the requirements.

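Example 1
Given: x1 = 41  n1 = 11541  x2 = 52  n2 = 9853  α = 0.05  Claim: p1 < p2
H0: p1 = p2
H1: p1 < p2 (Left-tailed, H1 = Claim)
Sample Proportions: p̂1 = 41/11541 = 0.003553, p̂2 = 52/9853 = 0.005278
Pooled Proportion: p̄ = (41 + 52)/(11541 + 9853) = 0.004347
Test Statistic: z = –1.9116
Critical Value: –zα = –z0.05 = –1.645 (Using StatCrunch)
Diagram: z-distribution with the critical region to the left of –1.645; z = –1.9116 falls inside it.
Initial Conclusion: Since z is in the critical region, reject H0.
Final Conclusion: The sample data support the claim that the fatality rate of occupants is lower for those in cars equipped with airbags.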

Example 1 (Using StatCrunch)
Given: x1 = 41  n1 = 11541  x2 = 52  n2 = 9853  α = 0.05  Claim: p1 < p2
H0: p1 = p2
H1: p1 < p2 (Left-tailed, H1 = Claim)
Stat → Proportions → Two sample → With summary
Sample 1: Number of successes: 41  Number of observations: 11541
Sample 2: Number of successes: 52  Number of observations: 9853
● Hypothesis Test
  Null: prop. diff. = 0
  Alternative: <
P-value = 0.028
Initial Conclusion: Since the P-value is less than α (with α = 0.05), reject H0.
Final Conclusion: The sample data support the claim that the fatality rate of occupants is lower for those in cars equipped with airbags.

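Confidence Interval Estimate
We can observe how the two proportions relate by looking at the Confidence Interval Estimate of p1 – p2:
CI = ( (p̂1 – p̂2) – E, (p̂1 – p̂2) + E )
Where
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}, \qquad \hat{q}_1 = 1 - \hat{p}_1,\quad \hat{q}_2 = 1 - \hat{p}_2$$
Note: The margin of error uses the two individual sample proportions, not the pooled proportion.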
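A minimal Python sketch of this interval, assuming the usual margin of error built from the two individual (unpooled) sample proportions, E = zα/2 · sqrt(p̂1q̂1/n1 + p̂2q̂2/n2). The function name `two_proportion_ci` is hypothetical; the critical value comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, level=0.90):
    """Confidence interval estimate of p1 - p2 at the given confidence level."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_crit * std_error                    # E
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Airbag data (Example 1 of this section) at the 90% level.
lower, upper = two_proportion_ci(41, 11541, 52, 9853, level=0.90)
print(round(lower, 6), round(upper, 6))
```

For the airbag data the limits should agree with the (-0.003232, -0.000218) interval quoted in Example 2 up to rounding.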

Example 2
Use the same sample data as in Example 1 to construct a 90% Confidence Interval Estimate of the difference between the two population proportions (p1 – p2).
x1 = 41  n1 = 11541  p̂1 = 0.003553
x2 = 52  n2 = 9853   p̂2 = 0.005278
CI = (-0.003232, -0.000218)
Note: Both limits are negative, which implies p1 – p2 is negative. This implies p1 < p2.

Example 2 (Using StatCrunch)
Use the same sample data as in Example 1 to construct a 90% Confidence Interval Estimate of the difference between the two population proportions (p1 – p2).
x1 = 41  n1 = 11541  x2 = 52  n2 = 9853
Stat → Proportions → Two sample → With summary
Sample 1: Number of successes: 41  Number of observations: 11541
Sample 2: Number of successes: 52  Number of observations: 9853
● Confidence Interval
  Level: 0.9
CI = (-0.003232, -0.000218)
Note: Both limits are negative, which implies p1 – p2 is negative. This implies p1 < p2.

Interpreting Confidence Intervals
• If the limits of a confidence interval do not contain 0, it implies there is a significant difference between the two proportions (i.e. p1 ≠ p2).
• Thus, we can interpret a relation between the two proportions from the confidence interval.
• In general:
  • If p1 = p2 then the CI should contain 0
  • If p1 > p2 then the CI should be mostly positive
  • If p1 < p2 then the CI should be mostly negative
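The interpretation rule can be written as a small decision function. This is a sketch only: the name `interpret_ci` is hypothetical, and it applies the strict version of the rule (the whole interval on one side of 0) rather than the looser "mostly positive" or "mostly negative" wording.

```python
def interpret_ci(lower, upper):
    """Describe the relation between p1 and p2 suggested by a CI for p1 - p2."""
    if lower > 0:
        return "p1 > p2"          # whole interval is positive
    if upper < 0:
        return "p1 < p2"          # whole interval is negative
    return "no significant difference"  # interval contains 0


print(interpret_ci(-0.003232, -0.000218))  # prints "p1 < p2"
print(interpret_ci(0.0500, 0.2123))        # prints "p1 > p2"
```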

Example 3
Drug Clinical Trial
Chantix is a drug used as an aid to stop smoking. The number of subjects experiencing insomnia in each of two treatment groups in a clinical trial of the drug Chantix is given below:

                               Chantix Treatment    Placebo
Number in group                       129              805
Number experiencing insomnia           19               13

(a) Use a 0.01 significance level to test the claim that the proportion of subjects experiencing insomnia is the same for both groups.
(b) Find the 99% confidence interval estimate of the difference of the two proportions. Does it support the result of the test?
What we know: x1 = 19  n1 = 129  x2 = 13  n2 = 805  α = 0.01
Claim: p1 = p2
Note: Each sample has at least 5 successes and at least 5 failures, thus fulfilling the requirements.

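Example 3a
Given: x1 = 19  n1 = 129  x2 = 13  n2 = 805  α = 0.01  Claim: p1 = p2
H0: p1 = p2 (H0 = Claim)
H1: p1 ≠ p2 (Two-tailed)
Sample Proportions: p̂1 = 19/129 = 0.14729, p̂2 = 13/805 = 0.01615
Pooled Proportion: p̄ = (19 + 13)/(129 + 805) = 0.03426
Test Statistic: z = 7.602
Critical Values: –zα/2 = –2.576, zα/2 = 2.576 (Using StatCrunch)
Diagram: z-distribution with critical regions to the left of –2.576 and to the right of 2.576; z = 7.602 falls in the right critical region.
Initial Conclusion: Since z is in the critical region, reject H0.
Final Conclusion: We reject the claim that the proportion of subjects experiencing insomnia is the same in both groups.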

Example 3a (Using StatCrunch)
Given: x1 = 19  n1 = 129  x2 = 13  n2 = 805  α = 0.01  Claim: p1 = p2
H0: p1 = p2 (H0 = Claim)
H1: p1 ≠ p2 (Two-tailed)
Stat → Proportions → Two sample → With summary
Sample 1: Number of successes: 19  Number of observations: 129
Sample 2: Number of successes: 13  Number of observations: 805
● Hypothesis Test
  Null: prop. diff. = 0
  Alternative: ≠
P-value < 0.0001 (i.e. the P-value is very small)
Initial Conclusion: Since the P-value is less than α (0.01), reject H0.
Final Conclusion: We reject the claim that the proportion of subjects experiencing insomnia is the same in both groups.

Example 3b
Use the same sample data as in Example 3 to construct a 99% Confidence Interval Estimate of the difference between the two population proportions (p1 – p2).
x1 = 19  n1 = 129  p̂1 = 0.14729
x2 = 13  n2 = 805  p̂2 = 0.01615
CI = (0.0500, 0.2123)
Note: The CI does not contain 0, which implies p1 and p2 differ significantly. This agrees with the result of the test in Example 3a.

Example 3b (Using StatCrunch)
Use the same sample data as in Example 3 to construct a 99% Confidence Interval Estimate of the difference between the two population proportions (p1 – p2).
x1 = 19  n1 = 129  x2 = 13  n2 = 805
Stat → Proportions → Two sample → With summary
Sample 1: Number of successes: 19  Number of observations: 129
Sample 2: Number of successes: 13  Number of observations: 805
● Confidence Interval
  Level: 0.99
CI = (0.0500, 0.2123)
Note: The CI does not contain 0, which implies p1 and p2 differ significantly.

Section 9.3 Inferences About Two Means (Independent)
Objective
Compare the means of two populations using two independent samples, one from each population.
Hypothesis Tests and Confidence Intervals of two means use the t-distribution.

Definitions
Two samples are independent if the sample values selected from one population are not related to, or somehow paired or matched with, the sample values from the other population.
Examples:
  Flipping two coins (independent)
  Drawing two cards (not independent)

Notation
First Population
μ1  First population mean
σ1  First population standard deviation
n1  First sample size
x̄1  First sample mean
s1  First sample standard deviation

Second Population
μ2  Second population mean
σ2  Second population standard deviation
n2  Second sample size
x̄2  Second sample mean
s2  Second sample standard deviation

Requirements
(1) Have two independent random samples
(2) σ1 and σ2 are unknown and no assumption is made about their equality
(3) Either or both of the following hold:
    Both sample sizes are large (n1 > 30, n2 > 30)
    Both populations have normal distributions
All requirements must be satisfied to make a Hypothesis Test or to find a Confidence Interval.

Tests for Two Independent Means
The goal is to compare the two means.
Two-tailed:    H0: μ1 = μ2   H1: μ1 ≠ μ2
Left-tailed:   H0: μ1 = μ2   H1: μ1 < μ2
Right-tailed:  H0: μ1 = μ2   H1: μ1 > μ2
Note: We only test the relation between μ1 and μ2 (not the actual numerical values).

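Finding the Test Statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Note: μ1 – μ2 = 0 according to H0.
Degrees of freedom: df = smaller of n1 – 1 and n2 – 1.
This equation is an altered form of the test statistic for a single mean when σ is unknown (see Ch. 8-5).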

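Test Statistic
With μ1 – μ2 = 0 under H0:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Degrees of freedom: df = min(n1 – 1, n2 – 1)
Note: Hypothesis Tests are done in the same way as in Ch. 8 (but with different test statistics).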

Steps for Performing a Hypothesis Test on Two Independent Means
• Write what we know
• State H0 and H1
• Draw a diagram
• Find the Test Statistic
• Find the Degrees of Freedom
• Find the Critical Value(s)
• State the Initial Conclusion and Final Conclusion
Note: Same process as in Chapter 8
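The test statistic and degrees-of-freedom steps can be sketched in Python as follows. The function names are hypothetical, the statistic is the unpooled one with t = (x̄1 – x̄2) / sqrt(s1²/n1 + s2²/n2) and df = min(n1 – 1, n2 – 1), and the critical value is passed in by hand (from a t table or StatCrunch) because the Python standard library has no t-distribution.

```python
from math import sqrt


def two_mean_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Unpooled two-sample t statistic for H0: mu1 = mu2, with the
    conservative degrees of freedom df = min(n1 - 1, n2 - 1)."""
    std_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (mean1 - mean2) / std_error
    df = min(n1 - 1, n2 - 1)
    return t, df


def in_critical_region(t, critical_value, tail="two"):
    """critical_value is the positive t value looked up for the chosen tail."""
    if tail == "left":
        return t < -critical_value
    if tail == "right":
        return t > critical_value
    return abs(t) > critical_value


# Word-count summary statistics from Example 1 of this section (two-tailed, alpha = 0.05).
t, df = two_mean_t_statistic(15668.5, 8632.5, 186, 16215.0, 7301.2, 210)
reject_h0 = in_critical_region(t, 1.973, tail="two")
print(round(t, 3), df, reject_h0)
```

Here `1.973` is the two-tailed critical value for df = 185 quoted in this section; `reject_h0` is the initial conclusion.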

Example 1 A headline in USA Today proclaimed that “Men, women are equal talkers.” That headline referred to a study of the numbers of words that men and women spoke in a day. Use a 0.05 significance level to test the claim that men and women speak the same mean number of words in a day.

Example 1
n1 = 186  x̄1 = 15668.5  s1 = 8632.5
n2 = 210  x̄2 = 16215.0  s2 = 7301.2
α = 0.05  Claim: μ1 = μ2
H0: µ1 = µ2 (H0 = Claim)
H1: µ1 ≠ µ2 (Two-tailed)
Test Statistic: t = (15668.5 – 16215.0) / sqrt(8632.5²/186 + 7301.2²/210) = –546.5 / 809.006 = –0.676
Degrees of Freedom: df = min(n1 – 1, n2 – 1) = min(185, 209) = 185
Critical Values: ±tα/2 = ±t0.025 = ±1.973 (Using StatCrunch)
Diagram: t-distribution (df = 185) with critical regions to the left of –1.973 and to the right of 1.973; t = –0.676 falls between them.
Initial Conclusion: Since t is not in the critical region, fail to reject H0.
Final Conclusion: There is not sufficient evidence to reject the claim that men and women speak the same mean number of words in a day.

Example 1 (Using StatCrunch)
n1 = 186  x̄1 = 15668.5  s1 = 8632.5
n2 = 210  x̄2 = 16215.0  s2 = 7301.2
α = 0.05  Claim: μ1 = μ2
H0: µ1 = µ2 (H0 = Claim)
H1: µ1 ≠ µ2 (Two-tailed)
Stat → T statistics → Two sample → With summary
Sample 1: Mean: 15668.5  Std. Dev.: 8632.5  Size: 186
Sample 2: Mean: 16215.0  Std. Dev.: 7301.2  Size: 210
(Be sure to not use pooled variance)
● Hypothesis Test
  Null: mean diff. = 0
  Alternative: ≠
P-value = 0.4998
Initial Conclusion: Since the P-value > α (0.05), fail to reject H0.
Final Conclusion: There is not sufficient evidence to reject the claim that men and women speak the same mean number of words in a day.

Confidence Interval Estimate
We can observe how the two means relate by looking at the Confidence Interval Estimate of μ1 – μ2:
CI = ( (x̄1 – x̄2) – E, (x̄1 – x̄2) + E )
Where
$$E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad df = \min(n_1 - 1,\; n_2 - 1)$$
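A minimal Python sketch of this interval, assuming the margin of error E = tα/2 · sqrt(s1²/n1 + s2²/n2). The function name `two_mean_ci` is hypothetical, and the critical value `t_crit` has to be looked up separately for df = min(n1 – 1, n2 – 1), since the standard library does not provide the t-distribution.

```python
from math import sqrt


def two_mean_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Confidence interval estimate of mu1 - mu2.

    t_crit is t_{alpha/2} for df = min(n1 - 1, n2 - 1), supplied by the caller.
    """
    margin = t_crit * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # E
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Word-count data from Example 1 of this section, 95% level, t_{0.025} = 1.973 for df = 185.
lower, upper = two_mean_ci(15668.5, 8632.5, 186, 16215.0, 7301.2, 210, t_crit=1.973)
print(round(lower, 1), round(upper, 1))
```

Because the interval for the word-count data contains 0, it is consistent with not rejecting μ1 = μ2.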

Example 2
Use the same sample data as in Example 1 to construct a 95% Confidence Interval Estimate of the difference between the two population means (µ1 – µ2).
n1 = 186  x̄1 = 15668.5  s1 = 8632.5
n2 = 210  x̄2 = 16215.0  s2 = 7301.2
df = min(n1 – 1, n2 – 1) = min(185, 209) = 185
tα/2 = t0.05/2 = t0.025 = 1.973
x̄1 – x̄2 = 15668.5 – 16215.0 = -546.5
E = tα/2 · sqrt(s1²/n1 + s2²/n2) = 1.973 × 809.006 = 1596.17
(x̄1 – x̄2) – E = -546.5 – 1596.17 = -2142.67
(x̄1 – x̄2) + E = -546.5 + 1596.17 = 1049.67
CI = (-2142.7, 1049.7)

Example 2 (Using StatCrunch)
Use the same sample data as in Example 1 to construct a 95% Confidence Interval Estimate of the difference between the two population means (µ1 – µ2).
n1 = 186  x̄1 = 15668.5  s1 = 8632.5
n2 = 210  x̄2 = 16215.0  s2 = 7301.2
Stat → T statistics → Two sample → With summary
Sample 1: Mean: 15668.5  Std. Dev.: 8632.5  Size: 186
Sample 2: Mean: 16215.0  Std. Dev.: 7301.2  Size: 210
(No pooled variance)
● Confidence Interval
  Level: 0.95
CI = (-2137.4, 1044.4)
Note: This is slightly different from the hand calculation, mainly because StatCrunch uses a more precise degrees-of-freedom value (about 364 here) instead of min(n1 – 1, n2 – 1) = 185, which gives a slightly smaller critical value.
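The StatCrunch interval is consistent with the Welch-Satterthwaite approximation for the degrees of freedom, which statistical software commonly uses when the variances are not pooled. The sketch below computes it; the function name `welch_df` is hypothetical, and attributing this formula to StatCrunch is an inference from the reported numbers rather than something stated in the slides.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Word-count data from this section: conservative df is min(185, 209) = 185.
df_welch = welch_df(8632.5, 186, 7301.2, 210)
df_conservative = min(186 - 1, 210 - 1)
print(round(df_welch, 1), df_conservative)
```

The Welch value is never smaller than min(n1 – 1, n2 – 1), so the hand method gives a critical value that is at least as large and an interval that is at least as wide, which errs on the cautious side.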

Example 3
Consider two different classes. The students in the first class are thought to generally be older than those in the second. The students' ages for this semester are summarized as follows:
n1 = 93  x̄1 = 21.2  s1 = 2.42
n2 = 67  x̄2 = 19.8  s2 = 4.77
(a) Use a 0.1 significance level to test the claim that the average age of students in the first class is greater than the average age of students in the second class.
(b) Construct a 90% confidence interval estimate of the difference in average ages.

Example 3a
n1 = 93  x̄1 = 21.2  s1 = 2.42
n2 = 67  x̄2 = 19.8  s2 = 4.77
α = 0.1  Claim: µ1 > µ2
H0: µ1 = µ2
H1: µ1 > µ2 (Right-tailed, H1 = Claim)
Test Statistic: t = (21.2 – 19.8) / sqrt(2.42²/93 + 4.77²/67) = 1.4 / 0.6345 = 2.207
Degrees of Freedom: df = min(n1 – 1, n2 – 1) = min(92, 66) = 66
Critical Value: tα = t0.1 = 1.295 (a right-tailed test puts all of α in one tail)
Diagram: t-distribution (df = 66) with the critical region to the right of 1.295; t = 2.207 falls inside it.
Initial Conclusion: Since t is in the critical region, reject H0.
Final Conclusion: The sample data support the claim that the average age of students in the first class is greater than that in the second.

Example 3a (Using StatCrunch)
n1 = 93  x̄1 = 21.2  s1 = 2.42
n2 = 67  x̄2 = 19.8  s2 = 4.77
α = 0.1  Claim: µ1 > µ2
H0: µ1 = µ2
H1: µ1 > µ2 (Right-tailed, H1 = Claim)
Stat → T statistics → Two sample → With summary
Sample 1: Mean: 21.2  Std. Dev.: 2.42  Size: 93
Sample 2: Mean: 19.8  Std. Dev.: 4.77  Size: 67
(Be sure to not use pooled variance)
● Hypothesis Test
  Null: mean diff. = 0
  Alternative: >
P-value ≈ 0.015 (half of the two-tailed value 0.0299 that the ≠ alternative gives)
Initial Conclusion: Since the P-value < α (0.1), reject H0.
Final Conclusion: The sample data support the claim that the average age of students in the first class is greater than that in the second.

Example 3b (90% Confidence Interval)
n1 = 93  x̄1 = 21.2  s1 = 2.42
n2 = 67  x̄2 = 19.8  s2 = 4.77
α = 0.1
df = min(n1 – 1, n2 – 1) = min(92, 66) = 66
tα/2 = t0.1/2 = t0.05 = 1.668
x̄1 – x̄2 = 21.2 – 19.8 = 1.4
E = tα/2 · sqrt(s1²/n1 + s2²/n2) = 1.668 × 0.63448 = 1.058
(x̄1 – x̄2) – E = 1.4 – 1.058 = 0.342
(x̄1 – x̄2) + E = 1.4 + 1.058 = 2.458
CI = (0.34, 2.46)

Example 3b (Using StatCrunch)
n1 = 93  x̄1 = 21.2  s1 = 2.42
n2 = 67  x̄2 = 19.8  s2 = 4.77
α = 0.1
Stat → T statistics → Two sample → With summary
Sample 1: Mean: 21.2  Std. Dev.: 2.42  Size: 93
Sample 2: Mean: 19.8  Std. Dev.: 4.77  Size: 67
(No pooled variance)
● Confidence Interval
  Level: 0.9
CI = (0.35, 2.45)
