
The Analysis of Variance


dolan

Presentation Transcript


  1. 10 The Analysis of Variance

  2. 10.1 Single-Factor ANOVA

  3. Single-Factor ANOVA • Single-factor ANOVA focuses on a comparison of more than two population or treatment means. Let • I = the number of populations or treatments being compared • μ1 = the mean of population 1 or the true average response when treatment 1 is applied • … • μI = the mean of population I or the true average response when treatment I is applied

  4. Single-Factor ANOVA • The relevant hypotheses are • H0: μ1 = μ2 = ··· = μI • versus • Ha: at least two of the μi’s are different • If I = 4, H0 is true only if all four μi’s are identical. Ha would be true, for example, if μ1 = μ2 ≠ μ3 = μ4, if μ1 = μ3 = μ4 ≠ μ2, or if all four μi’s differ from one another.

  5. The Idea of ANOVA • The sample means for the three samples are the same in each set, (a) and (b). • The variation among the sample means is therefore identical for (a) and (b). • The variation among the individuals within the three samples is much less for (b). • CONCLUSION: the samples in (b) show more variation among the sample means relative to the variation within the samples, so ANOVA will find stronger evidence of differences among the means in (b), assuming equal sample sizes for (a) and (b). • Note: for the same sample means and within-sample variation, larger samples give stronger evidence of differences.

  6. Comparing Several Means • Do SUVs, trucks and midsize cars have the same gas mileage? • Response variable: gas mileage (mpg) • Groups: vehicle classification • 31 midsize cars • 31 SUVs • 14 standard-size pickup trucks • Data from the Environmental Protection Agency’s Model Year 2003 Fuel Economy Guide, www.fueleconomy.gov.

  7. Comparing Several Means Means: Midsize: 27.903 SUV: 22.677 Pickup: 21.286 • Mean gas mileage for SUVs and pickups appears less than for midsize cars. • Are these differences statistically significant?

  8. Comparing Several Means Means: Midsize: 27.903 SUV: 22.677 Pickup: 21.286 • Null hypothesis: the true mean gas mileage is the same for all three vehicle classifications. • We could run separate t tests to compare each pair of means: 27.903 vs. 22.677, 27.903 vs. 21.286, and 22.677 vs. 21.286 • H0: μ1 = μ2 • H0: μ1 = μ3 • H0: μ2 = μ3 • However, this gives rise to the problem of multiple comparisons: every additional test raises the chance of at least one false rejection.
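The multiple-comparisons problem can be made concrete with a small calculation. The sketch below assumes the pairwise tests are independent, which is an idealization (pairwise t tests that share samples are not independent), so the result is only a rough indication. The function name `familywise_error_rate` is a hypothetical helper, not something from the slides.

```python
# Chance of at least one false rejection across several tests,
# ASSUMING the tests are independent (an idealization: pairwise
# t tests that share samples are correlated).
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    return 1.0 - (1.0 - alpha) ** num_tests


num_groups = 3
# Number of distinct pairs among the groups: 3 for three groups.
num_pairs = num_groups * (num_groups - 1) // 2
fwer = familywise_error_rate(0.05, num_pairs)
print(f"{num_pairs} tests at alpha = 0.05: familywise rate about {fwer:.3f}")
```

Under this assumption, three tests at the 5% level already carry a noticeably larger overall error rate than a single test, which is the motivation for one overall F test.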

  9. The One-Way ANOVA Model • Random sampling always produces chance variations. Any “factor effect” would thus show up in our data as the factor-driven differences plus chance variations (“error”): • Data = fit + residual • The one-way ANOVA model analyzes situations where chance variations are normally distributed N(0, σ), so that each observation is its group mean (the fit) plus a normal chance deviation (the residual).
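The model equation itself did not survive in the transcript. A common way to write "data = fit + residual" for the one-way model is sketched below; the symbols x_ij (the jth observation in group i), ε_ij, and n_i (the size of group i) are assumed notation, and N(0, σ) follows the slide's convention of giving the standard deviation.

```latex
x_{ij} = \mu_i + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \sim N(0, \sigma) \ \text{independently}, \qquad
i = 1, \dots, I, \quad j = 1, \dots, n_i
```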

  10. The ANOVA F Test • To determine statistical significance, we need a test statistic that we can calculate: the ANOVA F statistic. • The analysis of variance F statistic for testing the equality of several means has this form: F = (variation among the sample means) / (variation among individuals within the samples) • Difference in means small relative to overall variability: F tends to be small • Difference in means large relative to overall variability: F tends to be large • Larger F-values typically yield more significant results. How large depends on the degrees of freedom (I − 1 and N − I).

  11. The ANOVA F Test • The measures of variation in the numerator and denominator are mean squares: • Numerator: Mean Square for Treatments (MSTr) • Denominator: Mean Square for Error (MSE)
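As a minimal sketch of how the two mean squares combine into F, the code below uses the standard one-way formulas: MSTr is the between-group sum of squares divided by I − 1, and MSE is the within-group sum of squares divided by N − I. The function name `one_way_f` and the toy data are hypothetical, invented only for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Return (MSTr, MSE, F) for a list of samples (lists of numbers)."""
    num_groups = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    group_means = [mean(g) for g in groups]
    # Between-group (treatment) sum of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares.
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)
    mstr = sstr / (num_groups - 1)   # numerator, df = I - 1
    mse = sse / (total_n - num_groups)  # denominator, df = N - I
    return mstr, mse, mstr / mse


# Hypothetical toy data, invented for illustration only.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
mstr, mse, f_stat = one_way_f(toy_groups)
print(f"MSTr = {mstr:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
```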

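  12. Notation • The individual sample means will be denoted by X̄1, X̄2, . . ., X̄I. • That is, X̄i is the average of the observations in the ith sample, for i = 1, …, I. • Similarly, the average of all N observations, called the grand mean, is the sum of all the observations divided by N.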
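For the gas mileage example, the grand mean can be recovered from the group sizes and sample means reported on the earlier slides, because the grand mean is the size-weighted average of the group means. The sketch below also computes the between-group sum of squares with the standard formula; the within-group part (MSE) cannot be computed because the slides do not report the group variances. Variable names are illustrative.

```python
# Group sizes and sample means as reported on the gas mileage slides.
sizes = {"midsize": 31, "suv": 31, "pickup": 14}
means = {"midsize": 27.903, "suv": 22.677, "pickup": 21.286}

total_n = sum(sizes.values())  # N, the total number of vehicles

# Grand mean = weighted average of the group means (weights = group sizes).
grand_mean = sum(sizes[k] * means[k] for k in sizes) / total_n

# Between-group (treatment) sum of squares and its mean square (df = I - 1).
sstr = sum(sizes[k] * (means[k] - grand_mean) ** 2 for k in sizes)
mstr = sstr / (len(sizes) - 1)

print(f"N = {total_n}, grand mean = {grand_mean:.3f} mpg")
print(f"SSTr = {sstr:.2f}, MSTr = {mstr:.2f}")
```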

  13. Notation • Additionally, let S1², S2², . . ., SI² denote the sample variances: • Si² is the sample variance of the observations in the ith sample, for i = 1, …, I
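The formulas on the notation slides were lost in the transcript. The usual definitions are sketched below, assuming X_ij denotes the jth observation in sample i and n_i the size of sample i; both symbols are assumptions, and the original slides may have used a common sample size instead.

```latex
\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}, \qquad
\bar{X} = \frac{1}{N} \sum_{i=1}^{I} \sum_{j=1}^{n_i} X_{ij}, \qquad
S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_i \right)^2,
\qquad i = 1, \dots, I, \quad N = n_1 + \dots + n_I
```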

  14. The ANOVA Table • The computations are often summarized in a tabular format called an ANOVA table, as in the table below. • Tables produced by statistical software customarily include a P-value column to the right of f. • An ANOVA Table
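The table itself is missing from the transcript. The sketch below builds the conventional layout (source of variation, degrees of freedom, sum of squares, mean square, f), which is an assumption about what the slide showed. The function `anova_table` and the toy data are hypothetical; no P-value column is produced.

```python
def anova_table(groups):
    """Return rows (source, df, SS, MS, f) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    mstr, mse = sstr / (k - 1), sse / (n - k)
    return [
        ("Treatments", k - 1, sstr, mstr, mstr / mse),
        ("Error", n - k, sse, mse, None),
        # Total sum of squares equals SSTr + SSE.
        ("Total", n - 1, sstr + sse, None, None),
    ]


def cell(value):
    """Format a table cell, leaving missing entries blank."""
    return "" if value is None else f"{value:.3f}"


# Hypothetical toy data, invented for illustration only.
toy_groups = [[10, 12, 11, 9], [9, 8, 10, 7], [5, 6, 4, 5]]
rows = anova_table(toy_groups)

print(f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>10}{'f':>10}")
for source, df, ss, ms, f in rows:
    print(f"{source:<12}{df:>4}{cell(ss):>10}{cell(ms):>10}{cell(f):>10}")
```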

  15. F Distributions and the F Test • An F distribution has two parameters, the numerator degrees of freedom ν1 and the denominator degrees of freedom ν2. Both ν1 and ν2 are positive integers. • Figure 10.3 pictures an F density curve and the corresponding upper-tail critical value Fα,ν1,ν2. Appendix Table A.9 gives these critical values for α = .10, .05, .01, and .001. • Values of ν1 are identified with different columns of the table, and the rows are labeled with various values of ν2. • An F density curve and critical value • Figure 10.3

  16. Nematodes and plant growth • Do nematodes affect plant growth? A botanist prepares 16 identical planting pots and adds different numbers of nematodes into the pots. Seedling growth (in mm) is recorded two weeks later. • Hypotheses: all μi are the same (H0) versus not all μi are the same (Ha)

  17. Output for the one-way ANOVA • (The output labels the numerator and denominator mean squares of F.) • Here, the calculated F-value (12.08) is larger than Fcritical (3.49) for α = 0.05. Thus, the test is significant at the 5% level: not all mean seedling lengths are the same; the number of nematodes is an influential factor.

  18. Using the F table • The F distribution is asymmetrical and has two distinct degrees of freedom. It is labeled “F” in honor of Fisher. • Once again, what we do is calculate the value of F for our sample data and then look up the corresponding area under the curve in the F table.

  19. Fcritical for α = 5% is 3.49 • F = 12.08 > 10.80, the critical value for α = 0.001 • Thus p < 0.001
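The table lookup can be cross-checked numerically. The sketch below approximates the upper-tail area of an F distribution with the standard library only, using the identity P(F > f) = I_w(ν2/2, ν1/2) with w = ν2 / (ν2 + ν1·f), where I is the regularized incomplete beta function, evaluated here by Simpson's rule. The degrees of freedom 3 and 12 are an assumption: they are consistent with 16 pots in four groups and with the tabled values 3.49 and 10.80, but the slides do not state them. The function `f_upper_tail` is a hypothetical helper and only handles ν2 ≥ 2.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F(df1, df2) distribution."""
    if f_value <= 0:
        return 1.0
    if df2 < 2:
        raise ValueError("this sketch needs df2 >= 2")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    a, b = df2 / 2.0, df1 / 2.0
    w = df2 / (df2 + df1 * f_value)  # upper limit of integration, 0 < w < 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(u):
        # Unnormalized Beta(a, b) density.
        return u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)

    h = w / steps
    total = integrand(0.0) + integrand(w)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return (h / 3.0) * total / math.exp(log_beta)


# ASSUMED degrees of freedom (not stated on the slides): 3 and 12.
p_at_critical = f_upper_tail(3.49, 3, 12)
p_observed = f_upper_tail(12.08, 3, 12)
print(f"P(F > 3.49)  is about {p_at_critical:.4f}")
print(f"P(F > 12.08) is about {p_observed:.5f}")
```

Under these assumed degrees of freedom, the area beyond 3.49 should come out close to 0.05 and the area beyond 12.08 below 0.001, in line with the table lookup on the slide.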
