Chapter 10

Statistics for Business (Env), Chapter 10: Statistical Inferences Based on Two Samples

Statistical Inferences Based on Two Samples
10.1 Comparing Two Population Means by Using Independent Samples: Variances Known
10.2 Comparing Two Population Means by Using Independent Samples: Variances Unknown
10.3 Paired Difference Experiments
10.4 Comparing Two Population Proportions by Using Large, Independent Samples

Comparing Two Population Means by Using Independent Samples: Variances Known
• Suppose a random sample has been taken from each of two different populations
• Suppose that the populations are independent of each other
• Then the random samples are independent of each other
• Then the sampling distribution of the difference in sample means, $\bar{x}_1 - \bar{x}_2$, is normal if both populations are normal, and approximately normal if both samples are large

Do the achievement scores for children taught by method A differ from the scores for children taught by method B?

A research design that uses a separate sample for each treatment condition (or for each population) is called an independent-measures research design or a between-subjects design. The goal of an independent-measures research study is to evaluate the mean difference between two populations (or between two treatment conditions).

Sampling Distribution of the Difference of Two Sample Means #1
• Suppose population 1 has mean $\mu_1$ and variance $\sigma_1^2$
• From population 1, a random sample of size $n_1$ is selected, which has mean $\bar{x}_1$ and variance $s_1^2$
• Suppose population 2 has mean $\mu_2$ and variance $\sigma_2^2$
• From population 2, a random sample of size $n_2$ is selected, which has mean $\bar{x}_2$ and variance $s_2^2$
• Then the sampling distribution of the difference of the two sample means, $\bar{x}_1 - \bar{x}_2$…

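Sampling Distribution of the Difference of Two Sample Means #2
• Is normal, if each of the sampled populations is normal
• Is approximately normal if the sample sizes $n_1$ and $n_2$ are large
• Has mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
• Has standard deviation
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$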
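The claim that $\bar{x}_1 - \bar{x}_2$ has mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ can be checked by simulation. The sketch below is only an illustration: the population means, standard deviations, and sample sizes are hypothetical values chosen for the demo, and both populations are assumed normal.

```python
import math
import random
import statistics

def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Draw `reps` pairs of independent normal samples and return the
    list of differences xbar1 - xbar2 (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

# Hypothetical populations (illustrative values, not from the text)
mu1, sigma1, n1 = 50.0, 4.0, 30
mu2, sigma2, n2 = 45.0, 3.0, 25

diffs = simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000)
theory_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # formula from the slide

print(f"simulated mean {statistics.fmean(diffs):.3f} vs mu1 - mu2 = {mu1 - mu2}")
print(f"simulated sd   {statistics.stdev(diffs):.3f} vs formula   = {theory_sd:.3f}")
```

With 5,000 repetitions the simulated mean and standard deviation should land close to the two formulas; more repetitions tighten the agreement.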

If you select one score from each of these two populations, the closest two values are $X_1 = 50$ and $X_2 = 30$. The two values that are farthest apart are $X_1 = 70$ and $X_2 = 20$.
(Figure: the two population distributions, labeled $\mu_1$ and $\mu_2$.)

Sampling Distribution of the Difference of Two Sample Means #3

z-Based Confidence Interval for the Difference in Means (Variances Known)
• Let $\bar{x}_1$ be the mean of a sample of size $n_1$ that has been randomly selected from a population with mean $\mu_1$ and standard deviation $\sigma_1$
• Let $\bar{x}_2$ be the mean of a sample of size $n_2$ that has been randomly selected from a population with mean $\mu_2$ and standard deviation $\sigma_2$
• Suppose each sampled population is normally distributed or that the sample sizes $n_1$ and $n_2$ are large
• Suppose the samples are independent of each other; then…

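z-Based Confidence Interval for the Difference in Means (continued)
• A $100(1-\alpha)$ percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is
$$\left[(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right]$$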

Example 10.1 The Bank Customer Waiting Time Case #1
• A random sample of 100 waiting times observed under the current system of serving customers has a sample mean of 8.79 minutes
• Call this population 1
• Assume population 1 is normal or the sample size is large
• The variance is 4.7
• A random sample of 100 waiting times observed under the new system has a sample mean of 5.14 minutes
• Call this population 2
• Assume population 2 is normal or the sample size is large
• The variance is 1.9
• Then, if the samples are independent…

Example 10.1 The Bank Customer Waiting Time Case #2
• At 95% confidence, $z_{\alpha/2} = z_{0.025} = 1.96$, and the interval is
$$(8.79 - 5.14) \pm 1.96\sqrt{\frac{4.7}{100} + \frac{1.9}{100}} = 3.65 \pm 0.50 = [3.15,\ 4.15]$$
• According to the calculated interval, the bank manager can be 95% confident that the new system reduces the mean waiting time by between 3.15 and 4.15 minutes
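The interval arithmetic in Example 10.1 can be reproduced with a short function. This is a sketch, not a library routine: `z_interval_diff` is a hypothetical helper name, and it uses Python's standard `statistics.NormalDist` to obtain $z_{\alpha/2}$ rather than a printed table.

```python
import math
from statistics import NormalDist

def z_interval_diff(xbar1, var1, n1, xbar2, var2, n2, confidence=0.95):
    """z-based CI for mu1 - mu2 when the population variances are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}; about 1.96 at 95%
    se = math.sqrt(var1 / n1 + var2 / n2)      # sd of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se

# Example 10.1: current system (population 1) vs new system (population 2)
low, high = z_interval_diff(8.79, 4.7, 100, 5.14, 1.9, 100)
print(f"95% CI for mu1 - mu2: [{low:.2f}, {high:.2f}]")
```

Passing the variances (not the standard deviations) mirrors how the example states its data.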

z-Based Test About the Difference in Means (Variances Known)
• Test the null hypothesis $H_0: \mu_1 - \mu_2 = D_0$
• $D_0 = \mu_1 - \mu_2$ is the claimed difference between the population means
• $D_0$ is a number whose value varies depending on the situation
• Often $D_0 = 0$, and the null hypothesis means that there is no difference between the population means

z-Based Test About the Difference in Means (Variances Known)
• Use the notation from the confidence interval statement on a prior slide
• Assume that each sampled population is normal or that the sample sizes $n_1$ and $n_2$ are large

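Test Statistic (Variances Known)
• The test statistic is
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• If $H_0$ is true, the populations are normal (or the samples are large), and the samples are independent, the sampling distribution of this statistic is the standard normal distribution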

z-Based Test About the Difference in Means (Variances Known)
• Reject $H_0: \mu_1 - \mu_2 = D_0$ in favor of a particular alternative hypothesis at level of significance $\alpha$ if the appropriate rejection point rule holds (i.e., the calculated z is in the rejection region)
• Rules are on the next slide…

Hypothesis Tests for Two Population Means (Known Population Variances)
Lower-tail test: $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$
Upper-tail test: $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$
Two-tail test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, i.e., $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$

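Hypothesis Tests for $\mu_1 - \mu_2$ (Known Population Variances)
Lower-tail test: $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ if $z < -z_\alpha$ (area $\alpha$ in the lower tail).
Upper-tail test: $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $z > z_\alpha$ (area $\alpha$ in the upper tail).
Two-tail test: $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. Reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ (area $\alpha/2$ in each tail).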
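The three rejection-point rules can be written as one small decision function. This is a sketch under stated assumptions: `z_decision` and its `tail` strings ("lower", "upper", "two") are hypothetical names, and the critical values come from the standard library's `statistics.NormalDist` instead of a z table.

```python
from statistics import NormalDist

def z_decision(z, alpha, tail):
    """Apply the rejection-point rule for a z test about mu1 - mu2.

    tail: "lower" (H1: mu1 - mu2 < D0), "upper" (H1: > D0) or "two" (H1: != D0).
    Returns True when H0 should be rejected."""
    std = NormalDist()  # standard normal distribution
    if tail == "lower":
        return z < -std.inv_cdf(1 - alpha)         # reject if z < -z_alpha
    if tail == "upper":
        return z > std.inv_cdf(1 - alpha)          # reject if z > z_alpha
    if tail == "two":
        crit = std.inv_cdf(1 - alpha / 2)          # z_{alpha/2}
        return z < -crit or z > crit
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print(z_decision(1.98, 0.01, "upper"))  # False: 1.98 does not exceed z_0.01 (about 2.326)
```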

EXAMPLE
Boston and Kingston are both in Massachusetts. A sample of 40 households in Boston has a mean household income of $38,000; the population standard deviation is known to be $6,000. A sample of 35 households in Kingston has a mean household income of $35,000; the population standard deviation is known to be $7,000. At the .01 significance level, can we conclude that the mean household income in Boston is higher?

EXAMPLE
Step 1 State the null and alternate hypotheses. $H_0: \mu_B \le \mu_K$, $H_1: \mu_B > \mu_K$
Step 2 Select the level of significance. The .01 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Because the population standard deviations are known (and both samples are larger than 30), we use z as the test statistic.
Step 4 State the decision rule. The null hypothesis is rejected if z is greater than 2.326, or equivalently if p < .01.

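EXAMPLE
Step 5 Compute the value of z and make a decision.
$$z = \frac{38{,}000 - 35{,}000}{\sqrt{\dfrac{6{,}000^2}{40} + \dfrac{7{,}000^2}{35}}} = \frac{3{,}000}{1{,}516.6} = 1.98$$
Because the computed z of 1.98 is less than the critical z of 2.326, and the p-value of .0239 is greater than $\alpha = .01$, the decision is not to reject $H_0$. We cannot conclude that the mean household income in Boston is larger.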
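The Boston/Kingston calculation can be sketched end to end with the standard library, computing z, the upper-tail p-value, and the critical value together. Because the code does not round z to 1.98 before finding the tail area, the p-value it prints may differ from the slide's .0239 in the last digit.

```python
import math
from statistics import NormalDist

# Boston (B) vs Kingston (K); population standard deviations are known
xbar_b, sigma_b, n_b = 38_000, 6_000, 40
xbar_k, sigma_k, n_k = 35_000, 7_000, 35
alpha = 0.01

se = math.sqrt(sigma_b**2 / n_b + sigma_k**2 / n_k)  # sqrt(900,000 + 1,400,000)
z = (xbar_b - xbar_k - 0) / se                        # D0 = 0 under H0
p_value = 1 - NormalDist().cdf(z)                     # upper-tail test
z_crit = NormalDist().inv_cdf(1 - alpha)              # about 2.326

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit else "do not reject H0")
```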

Comparing Two Population Means by Using Independent Samples: Variances Unknown
• In general, the true values of the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known
• They have to be estimated from the sample variances $s_1^2$ and $s_2^2$, respectively

Comparing Two Population Means by Using Independent Samples: Variances Unknown #2
• Also need to estimate the standard deviation of the sampling distribution of the difference between sample means
• Two approaches:
• If it can be assumed that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then calculate the "pooled estimate" of $\sigma^2$
• If $\sigma_1^2 \ne \sigma_2^2$, then use approximate methods

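Pooled Estimate of $\sigma^2$
• Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$
• The pooled estimate of $\sigma^2$ is a weighted average of the two sample variances, $s_1^2$ and $s_2^2$, each weighted by its degrees of freedom
• The pooled estimate of $\sigma^2$ is denoted by $s_p^2$:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• The estimate of the standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$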
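The pooling formula is a degrees-of-freedom-weighted average of the two sample variances. A minimal sketch follows; `pooled_variance` and `pooled_se` are hypothetical helper names, and the inputs are sample standard deviations (they are squared inside).

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate s_p^2: sample variances weighted by their df (n - 1)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_se(s1, n1, s2, n2):
    """Estimated standard deviation of xbar1 - xbar2, assuming equal variances."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

# Boys (s = 3 kg, n = 15) and girls (s = 2 kg, n = 10) from the weight example
sp2 = pooled_variance(3, 15, 2, 10)
print(f"s_p^2 = {sp2:.2f}, s_p = {math.sqrt(sp2):.2f}")
```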

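One sample compared with two samples: statistics (assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$)
(Table: compares the mean, sum of squares (SS), and degrees of freedom (df) used with one sample against those used with two samples.)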

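t-Based Confidence Interval for the Difference in Means (Variances Unknown)
• Select independent random samples from two normal populations with equal variances
• A $100(1-\alpha)$ percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is
$$\left[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right]$$
• where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• and $t_{\alpha/2}$ is based on $n_1 + n_2 - 2$ degrees of freedom (df)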
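The pooled t interval can be sketched the same way as the z interval, except that $t_{\alpha/2}$ must be supplied: Python's standard library has no t distribution, so `t_crit` is read from a t table. As an illustration only (the slides do not compute this interval), the sketch applies it to the EPA data that follow, using the table value $t_{0.025} \approx 2.060$ with 25 df; `pooled_t_interval` is a hypothetical name.

```python
import math

def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Equal-variance t interval for mu1 - mu2.

    t_crit is t_{alpha/2} with n1 + n2 - 2 df, taken from a t table."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# EPA data (domestic = 1, imported = 2); t_{0.025} with 25 df is about 2.060
low, high = pooled_t_interval(33.7, 2.4, 15, 35.7, 3.9, 12, t_crit=2.060)
print(f"95% CI for mu_D - mu_I: [{low:.2f}, {high:.2f}]")
```

For these data the 95% interval contains 0.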

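Finding the value of the test statistic requires two steps:
Step One: Pool the sample variances to obtain $s_p^2$.
Step Two: Determine the value of t from the following formula, which has $n_1 + n_2 - 2$ degrees of freedom:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$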
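The two steps (pool, then compute t) fit in one function that works from summary statistics, which is how every pooled-t example in this chapter gives its data. A sketch: `pooled_t_statistic` is a hypothetical name, and it returns only t and its df; comparing t with a tabled critical value is left to the decision rule.

```python
import math

def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # Step One: pool
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((xbar1 - xbar2) - d0) / se                     # Step Two: t
    return t, df

t, df = pooled_t_statistic(33.7, 2.4, 15, 35.7, 3.9, 12)  # EPA example
print(f"t = {t:.2f} with {df} df")
```

The same call with the boys/girls, reading-activity, and NYSE/NASDAQ summaries reproduces the t values reported in those examples.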

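Hypothesis Tests for $\mu_1 - \mu_2$ (Unknown Population Variances)
Lower-tail test: $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ if $t < -t_\alpha$ (area $\alpha$ in the lower tail).
Upper-tail test: $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $t > t_\alpha$ (area $\alpha$ in the upper tail).
Two-tail test: $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ (area $\alpha/2$ in each tail).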

Example: A recent EPA study compared the highway fuel economy of domestic and imported passenger cars. A sample of 15 domestic cars revealed a mean of 33.7 mpg with a sample standard deviation of 2.4 mpg. A sample of 12 imported cars revealed a mean of 35.7 mpg with a sample standard deviation of 3.9 mpg. At the .05 significance level, can the EPA conclude that the mean mpg is higher for the imported cars?

Example: (continued)
Step 1 State the null and alternate hypotheses. $H_0: \mu_D \ge \mu_I$, $H_1: \mu_D < \mu_I$
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. The population standard deviations are unknown and both samples are smaller than 30, so, assuming normal populations with equal variances, we use the t distribution.

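Example: (continued)
Step 4 The decision rule is to reject $H_0$ if $t < -1.708$. There are $n_1 + n_2 - 2 = 25$ degrees of freedom.
Step 5 We compute the pooled variance.
$$s_p^2 = \frac{(15-1)(2.4)^2 + (12-1)(3.9)^2}{15 + 12 - 2} = \frac{80.64 + 167.31}{25} = 9.918$$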

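Example: (continued) We compute the value of t as follows.
$$t = \frac{33.7 - 35.7}{\sqrt{9.918\left(\dfrac{1}{15} + \dfrac{1}{12}\right)}} = \frac{-2.0}{1.22} = -1.64$$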

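Example: (continued) Since the computed t of −1.64 is greater than the critical t of −1.71, $H_0$ cannot be rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.
(Figure: t distribution with the critical value −1.71 and the computed value −1.64 marked.)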

Example: Comparing Mean Weights
To test whether boys are heavier than girls of the same age, a survey is conducted in which a sample of 15 boys shows a mean weight of 41 kg and a standard deviation of 3 kg. A group of 10 girls of the same age shows a mean weight of 38 kg and a standard deviation of 2 kg. Assume that the weights of both boys and girls follow a normal distribution, with equal variances. At the 0.05 level of significance, test whether the average weight of boys is greater than the average weight of girls of the same age.
Step 1 State the null and alternate hypotheses. $H_0: \mu_g \ge \mu_b$, $H_1: \mu_g < \mu_b$

Example: (continued)
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. The population standard deviations are unknown and both samples are smaller than 30, so we use the t distribution.
Step 4 The decision rule is to reject $H_0$ if $t > t_{0.05} = 1.714$. There are $n_1 + n_2 - 2 = 23$ degrees of freedom.

Example: (continued)
Step 5 Compute the pooled variance and t.
$$s_p^2 = \frac{(15-1)(3)^2 + (10-1)(2)^2}{15 + 10 - 2} = 7.04, \qquad s_p = 2.65$$
$$t = \frac{41 - 38}{\sqrt{7.04\left(\dfrac{1}{15} + \dfrac{1}{10}\right)}} = 2.77$$
Since $t = 2.77 > t_{0.05} = 1.714$, we reject $H_0$. We conclude that the mean weight of boys is larger than the mean weight of girls of the same age.

Example: Directed reading activities in the classroom
A class of 21 third-graders participates in these activities for 8 weeks, while a control classroom of 23 third-graders follows the same curriculum without the activities. After the 8 weeks, all children take a reading test (scores in table). At a level of significance of 0.05, can we conclude that directed reading activities help improve reading ability?
Step 1 State the null and alternate hypotheses. $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$

Example: Directed reading activities (continued)
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. The population standard deviations are unknown and both samples are smaller than 30, so we use the t distribution.
Step 4 The decision rule is to reject $H_0$ if $t > t_{0.025} = 2.018$ or $t < -t_{0.025}$. There are $n_1 + n_2 - 2 = 42$ degrees of freedom.

Example: Directed reading activities (continued)
Step 5 Compute the pooled variance and t.
$$s_p^2 = \frac{(21-1)(11.01)^2 + (23-1)(17.15)^2}{21 + 23 - 2} = 211.79$$
$$t = \frac{51.48 - 41.52}{\sqrt{211.79\left(\dfrac{1}{21} + \dfrac{1}{23}\right)}} = \frac{9.96}{4.39} = 2.27$$
Since $t = 2.27 > t_{0.025} = 2.018$, we reject $H_0$. So there is a significant difference between the two groups.

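Example: Directed reading activities (continued, one-tailed version)
Step 1 State the null and alternate hypotheses. $H_0: \mu_2 \ge \mu_1$, $H_1: \mu_2 < \mu_1$
Step 5 Compute the pooled variance and t.
$$s_p^2 = \frac{(21-1)(11.01)^2 + (23-1)(17.15)^2}{21 + 23 - 2} = 211.79$$
$$t = \frac{51.48 - 41.52}{\sqrt{211.79\left(\dfrac{1}{21} + \dfrac{1}{23}\right)}} = \frac{9.96}{4.39} = 2.27$$
There are $n_1 + n_2 - 2 = 42$ degrees of freedom. The rule is to reject $H_0$ if $t > t_{0.05} = 1.682$. Since $t = 2.27 > 1.682$, we reject $H_0$ and conclude that the directed reading activities improve the mean reading score.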

Pooled Variance t Test: Example
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:

                  NYSE    NASDAQ
Number              21        25
Sample mean       3.27      2.53
Sample std dev    1.30      1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha = 0.05$)?

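Calculating the Test Statistic
The test statistic is:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$
where
$$s_p^2 = \frac{(21-1)(1.30)^2 + (25-1)(1.16)^2}{21 + 25 - 2} = 1.5021$$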

Solution
$H_0: \mu_1 - \mu_2 = 0$ (i.e., $\mu_1 = \mu_2$)
$H_1: \mu_1 - \mu_2 \ne 0$ (i.e., $\mu_1 \ne \mu_2$)
$\alpha = 0.05$
df = 21 + 25 − 2 = 44
Critical values: $t = \pm 2.0154$ (area .025 in each tail)
Test statistic: $t = 2.040$
Decision: Reject $H_0$ at $\alpha = 0.05$, since 2.040 > 2.0154.
Conclusion: There is evidence of a difference in means.

Two kinds of studies
• So far, we have studied two sets of sample data that come from two independent populations (e.g., women and men, or students from program A and from program B): independent samples.
• However, sometimes we want to study two sets of sample data that come from related populations (e.g., "before treatment" and "after treatment"): paired samples.

Paired/Dependent Samples
Independent samples are samples that are not related in any way. Dependent samples are samples that are paired or related in some fashion:
• The same subjects measured at two different points in time (repeated measures)
• Matched or paired observations
• The hypothesis test proceeds just as in the one-sample case, applied to the differences between the paired observations

Paired-Sample t Test: Example
Assume you work in the finance department. Is the new financial package faster ($\alpha = 0.05$ level)? You collect the following processing times (in seconds) for the same set of jobs:

Existing System (1)   New Software (2)   Difference D_i
9.98                  9.88                0.10
9.88                  9.86                0.02
9.84                  9.75                0.09
9.99                  9.80                0.19
9.94                  9.87                0.07
9.84                  9.84                0.00
9.86                  9.87               -0.01
10.12                 9.98                0.14
9.90                  9.83                0.07
9.91                  9.86                0.05

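Paired-Sample t Test: Example
Is the new financial package faster ($\alpha = 0.05$ level)?
$H_0: \mu_D \le 0$
$H_1: \mu_D > 0$
$\bar{D} = 0.072$, df = n − 1 = 9, critical value = 1.8331 (rejection region of area $\alpha = .05$ in the upper tail)
Test statistic: $t = \dfrac{\bar{D} - 0}{s_D/\sqrt{n}} = \dfrac{0.072}{0.0621/\sqrt{10}} = 3.66$
Decision: Reject $H_0$; the t statistic (3.66 > 1.8331) is in the rejection region.
Conclusion: The new software package is faster.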
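Because a paired test is a one-sample t test on the differences, the processing-time example reduces to a few lines of standard-library Python. This sketch hardcodes the critical value 1.8331 from the slide (the standard library has no t distribution); the variable names are illustrative.

```python
import math
import statistics

existing = [9.98, 9.88, 9.84, 9.99, 9.94, 9.84, 9.86, 10.12, 9.90, 9.91]
new_sw = [9.88, 9.86, 9.75, 9.80, 9.87, 9.84, 9.87, 9.98, 9.83, 9.86]

d = [x1 - x2 for x1, x2 in zip(existing, new_sw)]  # D_i = existing - new
n = len(d)
d_bar = statistics.fmean(d)
s_d = statistics.stdev(d)               # sample s.d. of the differences (n - 1 divisor)
t = d_bar / (s_d / math.sqrt(n))        # H0: mu_D <= 0, so mu_D = 0 at the boundary
t_crit = 1.8331                         # t_{0.05} with 9 df, from the slide

print(f"D-bar = {d_bar:.3f}, s_D = {s_d:.4f}, t = {t:.2f}")
print("reject H0" if t > t_crit else "do not reject H0")
```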

Paired-Sample Example: Twins
Suppose we collect 8 pairs of twins. The first twin in each pair is healthy; the second is not. For each twin, we measure grey matter density (gmd). Is mean grey matter density significantly different between the two populations? Processed data from the 8 pairs is shown below (units not given). Consider the population differences, $D = X_1 - X_2$.

Hypothesis Testing Involving Paired Observations (continued)
If $\sigma_D$ is unknown, we can estimate the unknown population standard deviation with the sample standard deviation:
$$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
where $\bar{D}$ is the mean of the differences, $s_D$ is the (sample) standard deviation of the differences, and n is the number of pairs (differences).
The test statistic for $\bar{D}$ is now a t statistic with $n - 1$ d.f.:
$$t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}}$$
