
Introduction to the Analysis of Variance



Presentation Transcript


  1. Introduction to the Analysis of Variance Chapter 14

  2. Chapter Topics • Basic ANOVA Concepts • The omnibus null hypothesis • Why ANOVA? • Linear Model Equation • Sums of Squares • Mean Squares & Degrees of Freedom • The Completely Randomized Design • Computational Formulae and Procedures • Assumptions

  3. Chapter Topics • Multiple Comparison Procedures • Contrasts • A Posteriori Multiple Comparison Tests • A Priori Multiple Comparison Tests • Practical Significance

  4. The Omnibus Null Hypothesis • H0: μ1 = μ2 = ⋯ = μp • Each element μj denotes the mean of a different population • The alternative states that at least two of the population means are not equal • If we reject, we don’t know which of the means are different

  5. Why ANOVA? • Why not tons of t tests? • Consider the case when we have five means. Our omnibus null hypothesis looks like H0: μ1 = μ2 = μ3 = μ4 = μ5 • We would have to compute ten different t tests in order to compare each pair of means in the group. • The number of two-sample tests that can be formed from p means is given by p(p − 1)/2 • If each test is run at α = .05, the Type I error rate for these ten tests collectively is close to 1 − (1 − .05)^10 ≈ .40 (see the sketch below)
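
A quick way to check those two numbers is sketched below in Python. The sketch assumes the ten t tests are treated as independent and each is run at α = .05; `comb` is the standard-library binomial coefficient.

```python
from math import comb

p = 5          # number of group means
alpha = 0.05   # per-test Type I error rate

# Number of two-sample t tests that can be formed from p means: p(p - 1)/2
num_tests = comb(p, 2)                       # 10 when p = 5

# Treating the tests as independent, the probability of at least one
# Type I error somewhere in the set is 1 - (1 - alpha)^C
familywise_alpha = 1 - (1 - alpha) ** num_tests

print(num_tests)                   # 10
print(round(familywise_alpha, 3))  # about 0.401
```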

  6. Why ANOVA? • ANOVA allows us to . . . • . . . Test as many means as we would like in a single, quick test • . . . Control the Type I error rate at an acceptable level

  7. The Linear Model Equation • Each score can be thought of as a composite score consisting of three separate parameters: Xij = μ + αj + εi(j) • Xij is the score for person i in population j • μ is the grand mean of all scores in an experiment • αj is the effect of population j on this subject’s score • εi(j) is the random error effect on this subject’s score • This may make more sense when we see how to estimate these parameters

  8. The Linear Model Equation (continued) • Estimating the parameters: the sample grand mean estimates μ, the deviation of a group mean from the grand mean estimates the population (treatment) effect αj, and the deviation of a score from its group mean estimates the error effect εi(j) • Example: Suppose I was one of 40 people taking an IQ test, and I received a score of 131. • The average of all 40 people was 100. • The average of people in my group (grad students) was 124. • Then 131 = 100 + (124 − 100) + (131 − 124), where (124 − 100) = 24 is the “treatment” effect and (131 − 124) = 7 is the error effect (see the sketch below)
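
The decomposition in the IQ example can be re-done in a few lines; this minimal Python sketch simply repeats the slide’s arithmetic (the variable names are illustrative):

```python
# One score decomposed into grand mean + treatment effect + error effect,
# using the numbers from the IQ example above.
score      = 131   # my observed score
grand_mean = 100   # mean of all 40 test takers
group_mean = 124   # mean of my group (grad students)

treatment_effect = group_mean - grand_mean   # 124 - 100 = 24
error_effect     = score - group_mean        # 131 - 124 = 7

# The score is recovered exactly from the three parameter estimates
assert score == grand_mean + treatment_effect + error_effect
print(grand_mean, treatment_effect, error_effect)   # 100 24 7
```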

  9. Sums of Squares • For this design, we are going to have three terms for sums of squares: • SSTO • Sums of Squares Total • Measures total variability among scores in an experiment • SSBG • Sums of Squares Between Groups • Measures variability between treatment levels • SSWG • Sums of Squares Within Groups • Measures variability within treatment levels

  10. Sums of Squares (continued) • SSTO = ΣjΣi (Xij − X̄··)², with df = np − 1 • SSBG = n Σj (X̄·j − X̄··)², with df = p − 1 • SSWG = ΣjΣi (Xij − X̄·j)², with df = p(n − 1)

  11. Mean Squares • MSTO = SSTO/(np − 1) – this is just another name for the total sample variance! • MSBG = SSBG/(p − 1) – the average variability between treatment levels • MSWG = SSWG/[p(n − 1)] – the average variability within treatment levels (see the sketch below)
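
A minimal Python sketch of these sums of squares and mean squares, using a small invented data set (p = 3 treatment levels, n = 4 scores per level; the numbers are made up for illustration, not taken from the chapter):

```python
import numpy as np

# Invented scores for p = 3 treatment levels, n = 4 scores per level
groups = [np.array([3., 5., 4., 4.]),
          np.array([6., 7., 5., 6.]),
          np.array([9., 8., 10., 9.])]

p = len(groups)
n = len(groups[0])                 # equal n in every level
scores = np.concatenate(groups)
grand_mean = scores.mean()

ss_to = ((scores - grand_mean) ** 2).sum()                      # total
ss_bg = n * sum((g.mean() - grand_mean) ** 2 for g in groups)   # between groups
ss_wg = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within groups

ms_to = ss_to / (n * p - 1)        # total sample variance
ms_bg = ss_bg / (p - 1)            # average variability between levels
ms_wg = ss_wg / (p * (n - 1))      # average variability within levels

assert np.isclose(ss_bg + ss_wg, ss_to)   # the pieces add up to the total
print(ms_to, ms_bg, ms_wg)
```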

  12. Mean Square Between Groups • Interpreting MSBG • If MSBG is close to zero, it indicates that there is not a lot of variability between the treatment levels. • If there is little or no variation between the treatment levels but a great deal of overall variation (MSTO), it must be that the variability within treatment levels accounts for most of the total variation. • An increase in MSBG indicates a higher amount of variability between treatment levels. • If there is high variation between the treatment levels relative to the overall variation, it must be that the variability between treatment levels accounts for most of the total variation.

  13. Mean Square Within Groups • Interpreting MSWG • If MSWG is close to zero, it indicates that there is not a lot of variability within the treatment levels. • If there is little or no variation within the treatment levels relative to overall variation (MSTO), it must be that the variability between treatment levels accounts for most of the total variation. • An increase in MSWG indicates a higher amount of variability within treatment levels. • If there is high variation within the treatment levels relative to the overall variation, it must be that the variability between treatment levels does not account for an appreciable portion of the total variation.

  14. Summarizing SS & MS – ANOVA Table
  Source            SS     df         MS                 F
  Between groups    SSBG   p − 1      SSBG/(p − 1)       MSBG/MSWG
  Within groups     SSWG   p(n − 1)   SSWG/[p(n − 1)]
  Total             SSTO   np − 1

  15. Measuring Variability • What we want as researchers is for our “treatment levels” to account for more variation than does random error. • Enter the F statistic • Recall from previous chapters that the F statistic is defined as the ratio of two independent variances. • MSBG and MSWG are both measures of variance, and they are independent of one another. • Forming a ratio of these two numbers yields an F statistic. • Because we want our “treatment levels” to account for most of the variation, we want this statistic to be as large as possible. • How large is large enough?

  16. Measuring Variability (continued) • Expected Mean Squares • Recall the assumptions: • Random sampling / random assignment • Normally distributed populations • Equal variances • Equal means? • If the means are equal in each of the populations, it can be shown that the expected value of BOTH of the mean squares terms is σε², the population error variance. • If the means are not equal in each of the populations, it can be shown that the expected values of the mean squares are E(MSBG) = σε² + nΣjαj²/(p − 1) and E(MSWG) = σε²

  17. Measuring Variability (continued) • If the population means are not equal . . . • . . . we have the expected values displayed above. • . . . when we form the F statistic, we see that the ratio will become larger than one as the size of the “treatment effects” grows. • If the population means are equal . . . • . . . the F ratio will be close to one • So . . . how far away from one is far enough away to say the means are different? • Enter the F table with (p − 1) degrees of freedom in the numerator and p(n − 1) degrees of freedom in the denominator.

  18. The Completely Randomized Design • So far, we’ve been talking about ANOVA in general • The Completely Randomized Design (CR-p) is one of many designs. It is characterized by: • One treatment with p levels • N=n*p participants • Participants are randomly assigned to the treatment levels • We usually want to restrict this so that each treatment level has the same number of participants • Differences from other designs • Each participant is randomly assigned to only one treatment level. Participants are not administered more than one treatment • We don’t have to worry about participant matching, independent samples, repeated measures, etc.

  19. Computational Procedures for CR-p • Consider an example where we are interested in the effects of sleep deprivation on hand-steadiness. That is, we want to know if the amount of experienced sleep deprivation has an effect on hand-steadiness. We, as researchers, decide to have four treatment levels. • a1 – 12 hours of sleep deprivation • a2 – 18 hours of sleep deprivation • a3 – 24 hours of sleep deprivation • a4 – 30 hours of sleep deprivation • We have a total of 32 subjects, so we need to randomly assign them to one of these four groups.
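
One way to carry out that random assignment is sketched below; the use of numpy’s permutation and the group labels are illustrative choices, not a procedure prescribed by the chapter.

```python
import numpy as np

rng = np.random.default_rng()      # random number generator
n_subjects, p = 32, 4              # 32 subjects, 4 treatment levels

# Shuffle the subject IDs, then split them into 4 groups of 8
shuffled = rng.permutation(n_subjects)
assignments = shuffled.reshape(p, n_subjects // p)

labels = ["a1 (12 h)", "a2 (18 h)", "a3 (24 h)", "a4 (30 h)"]
for label, subjects in zip(labels, assignments):
    print(label, sorted(subjects))
```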

  20. Computational Procedures for CR-p (continued) • We did the experiment with 32 subjects randomly assigned to each group so that eight were in each, and we recorded as our dependent variable for each subject the number of times during a two-minute interval that a stylus makes contact with the side of a half-inch hole. The data are:

  21. Computational Procedures for CR-p (continued) • We compute the sums of squares as follows:

  22. Computational Procedures for CR-p (continued) • We use these numbers to begin our ANOVA table • A few things to note about the ANOVA table • SSBG + SSWG = SSTO; if it doesn’t, you’ve made a mistake. • dfBG + dfWG = dfTO; if it doesn’t, you’ve made a mistake. • MSBG = SSBG/dfBG; MSWG = SSWG/dfWG; • MSBG + MSWG ≠ MSTO; if it does, you’ve probably made a mistake. • For a CR-p design, F = MSBG/MSWG
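
For readers who want to reproduce this kind of analysis outside JMP, here is a minimal Python sketch of a CR-4 ANOVA. The hand-steadiness counts below are invented (so the resulting F will not match the 7.50 reported later); scipy’s f_oneway returns the same F = MSBG/MSWG described above.

```python
import numpy as np
from scipy import stats

# Invented hand-steadiness counts, 8 subjects per sleep-deprivation level
rng = np.random.default_rng(0)
a1, a2, a3, a4 = (rng.poisson(lam, 8) for lam in (3, 4, 6, 7))

# CR-4 one-way ANOVA: F = MSBG / MSWG with df = (p - 1, p(n - 1)) = (3, 28)
f_stat, p_value = stats.f_oneway(a1, a2, a3, a4)
print(round(f_stat, 2), round(p_value, 4))
```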

  23. CR-p Procedures in JMP • Analyze | Fit Y by X | Measurement in Y box (Continuous) | Grouping Variable in X box (Nominal) | Means/Anova • This is the exact same sequence as for a t test for independent samples

  24. CR-p Procedures in JMP

  25. More on the F statistic • Using the critical value approach, we need to find the point of the F distribution with 3 and 28 degrees of freedom that cuts off an area of α = .05 in the upper tail: F.05; 3, 28 = 2.95 • If our computed F exceeds this number, we reject the null hypothesis. • Our F = 7.50 exceeds 2.95, so we reject the null.

  26. More on the F statistic (continued) • We determine the p-value in the same manner • Through JMP (exact): p = 0.0008 • Through tables (approximate): p < .01 • Either way, p is less than α = .05, so we reject the null hypothesis (see the sketch below)
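
Assuming SciPy is available, both numbers can be recovered from the F distribution with 3 and 28 degrees of freedom; this sketch should reproduce the critical value and a p-value close to the JMP output.

```python
from scipy import stats

f_obs, df_num, df_den = 7.50, 3, 28

# Exact p-value: area of the F(3, 28) distribution beyond the observed F
p_value = stats.f.sf(f_obs, df_num, df_den)
print(round(p_value, 4))                            # close to 0.0008

# The alpha = .05 critical value used in the table comparison
print(round(stats.f.ppf(0.95, df_num, df_den), 2))  # about 2.95
```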

  27. Assumptions Associated with the CR-p Design • The model equation reflects all the sources of variation that affect each score. • If an experiment contains two treatments, the CR-p design is not appropriate. • Participants are random samples from the respective populations, or the participants have been randomly assigned to treatment levels. • This helps distribute idiosyncratic characteristics of participants randomly over the treatment levels. • The p populations are normally distributed. • The F test is robust with respect to departures from normality, especially when the populations are symmetric and the n’s are equal. • The variances of the p populations are equal. • The F test is robust with respect to heterogeneity of variances provided there is an equal number of observations in each treatment level, the populations are normal, and the ratio of the largest to smallest sample variance does not exceed three.

  28. Multiple Comparisons • If an omnibus null hypothesis is rejected, we don’t know which means differ. • Multiple comparison procedures are methods for determining which means differ. • Definitions: • Contrast – a difference among the means • A priori tests – when a researcher wishes to test a specific set of null hypotheses prior to gathering the data • A posteriori tests – when the data suggest sets of null hypotheses that are of interest to the researcher • Also called post-hoc tests

  29. Contrasts • Contrasts are typically denoted by ψ (population) and ψ̂ (sample estimate) • Contrasts can take a number of forms • Those in the left column are pairwise contrasts; that is, they compare two means. • Those in the right column are non-pairwise contrasts. • Each contrast has coefficients associated with it – the numbers multiplying the means in the contrast • The coefficients sum to 0; the sum of the absolute values of the coefficients is 2

  30. A Posteriori Multiple Comparison Tests • Tukey’s Multiple Comparison Test and Confidence Interval • Tukey’s HSD (Honestly Significant Difference) • Used for pairwise contrasts (two-tailed) when the n’s are equal • Test statistic: q = ψ̂ / √(MSWG/n) = (X̄·j − X̄·j′) / √(MSWG/n) • A test of the omnibus null hypothesis is not required beforehand • If |q| exceeds the critical value qα; p, ν in Table D.10, we reject the null hypothesis and conclude that the two population means are significantly different. • The confidence interval for the contrast is ψ̂ ± qα; p, ν √(MSWG/n) (see the sketch below)
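
A minimal sketch of the q statistic and confidence interval, using invented summary values (two group means, MSWG, n) rather than the chapter’s results, and SciPy’s studentized-range distribution (SciPy 1.7+) in place of Table D.10:

```python
import numpy as np
from scipy import stats

# Invented summary values for illustration
mean_1, mean_4 = 2.0, 6.0          # the two group means being compared
ms_wg, n = 4.0, 8                  # within-groups mean square, per-group n
p, df_wg = 4, 28                   # number of means, within-groups df

# Tukey's q statistic for the pairwise contrast mean_1 - mean_4
contrast = mean_1 - mean_4
q = contrast / np.sqrt(ms_wg / n)

# Studentized-range critical value (plays the role of Table D.10)
q_crit = stats.studentized_range.ppf(0.95, p, df_wg)

print(abs(q) > q_crit)             # True here, so these two means differ

# 95% confidence interval for the contrast
half_width = q_crit * np.sqrt(ms_wg / n)
print(contrast - half_width, contrast + half_width)
```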

  31. A Posteriori Multiple Comparison Tests (continued) • Tukey’s Multiple Comparison Test and Confidence Interval • Inspection of our data suggests we might be interested in a contrast between the first and fourth means. • Because |q| > 2.92, we reject the hypothesis and conclude that the two population means are different. • Some of the other pairwise contrasts from these data are:

  32. A Posteriori Multiple Comparison Tests (continued) • Tukey’s Multiple Comparison Test and Confidence Interval • Assumptions of Tukey’s Multiple Comparison Test • Random sampling or random assignment of participants • The p populations are normally distributed • The variances of the p populations are homogeneous • The sample n’s are equal • When the sample n’s are unequal . . . • . . . use the Fisher-Hayter Multiple Comparison Test • When the populations have heterogeneous variances . . . • . . . use another procedure (another class) • When the populations are not normal . . . • . . . use another procedure (another class)

  33. A Posteriori Multiple Comparison Tests (continued) • Fisher-Hayter Multiple Comparison Test • Two-step procedure for pairwise comparisons • First test the omnibus null hypothesis • If it is rejected, continue to the multiple comparisons • Two advantages over Tukey’s test • Does not require equal n’s • More powerful than Tukey’s test for most data • Test statistic: qFH = ψ̂ / √[(MSWG/2)(1/nj + 1/nj′)] • Reject the null hypothesis if |qFH| exceeds qα; p−1, ν, the studentized range critical value for p − 1 means

  34. A Posteriori Multiple Comparison Tests (continued) • Fisher-Hayter Multiple Comparison Test • Assumptions • Random sampling or random assignment of participants • The p populations are normally distributed • The variances of the p populations are homogeneous • Scheffé’s Multiple Comparison Test • Should be used if any of the contrasts are non-pairwise • It is not necessary to precede these tests with a test of the omnibus null hypothesis • Test statistic: FS = ψ̂² / [MSWG Σ(cj²/nj)] • Hypotheses are rejected if FS > (p − 1) Fα; p−1, ν (see the sketch below)
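
A short sketch of the Scheffé cutoff, assuming α = .05 and, as in the running example, p = 4 treatment levels with ν = 28 within-groups degrees of freedom; SciPy supplies the F quantile.

```python
from scipy import stats

p, df_wg, alpha = 4, 28, 0.05

# Scheffe cutoff: FS is compared against (p - 1) * F(alpha; p - 1, df_wg)
f_crit = stats.f.ppf(1 - alpha, p - 1, df_wg)
scheffe_crit = (p - 1) * f_crit
print(round(scheffe_crit, 2))      # roughly 3 * 2.95 = 8.84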

  35. A Priori Multiple Comparison Tests • Dunn-Šidák Test • A priori test statistic for pairwise or non-pairwise, one- or two-sided contrast hypotheses • Divides the level of significance equally among a set of C tests: each test is performed at α′ = 1 − (1 − α)^(1/C) • The probability of one or more Type I errors across the set is then less than α • Test statistic: tDS = ψ̂ / √[MSWG Σ(cj²/nj)] • A non-directional hypothesis is rejected if |tDS| exceeds the two-tailed Dunn-Šidák critical value for C tests and ν degrees of freedom • A directional hypothesis is rejected if tDS exceeds the one-tailed critical value in the predicted direction (see the sketch below)
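
The per-test significance level implied by the Dunn-Šidák adjustment is easy to compute directly; the value C = 6 below is an arbitrary illustration, not a count taken from the chapter.

```python
alpha, C = 0.05, 6   # overall alpha and a hypothetical family of C = 6 contrasts

# Dunn-Sidak: run each of the C tests at this reduced level so that the
# probability of one or more Type I errors stays below alpha
alpha_per_test = 1 - (1 - alpha) ** (1 / C)
print(round(alpha_per_test, 4))    # about 0.0085
```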

  36. Multiple Comparison Tests – Summary of Assumptions • All multiple comparison tests we’ve discussed require • Normal populations • Populations with homogeneous variances • Random sampling or random assignment • Tukey’s procedure requires equal sample n’s • For pairwise comparisons, we can use • Tukey’s procedure (a posteriori – non-directional) • Fisher-Hayter procedure (a posteriori – non-directional) • Scheffé’s procedure (a posteriori – non-directional) • Dunn-Šidák procedure (a priori – directional or non-directional) • For nonpairwise comparisons, we can use • Scheffé’s procedure (a posteriori – non-directional) • Dunn-Šidák procedure (a priori – directional or non-directional)

  37. Multiple Comparison Tests – Summary of Assumptions (continued) • We can compute confidence intervals for • Tukey’s (a posteriori – non-directional) • Scheffé’s (a posteriori – non-directional) • Dunn-Šidák (a priori – directional or non-directional) • The omnibus null hypothesis must be tested before using • Fisher-Hayter (a posteriori – non-directional)

  38. Practical Significance • Recall the difference between practical significance and statistical significance • Strength of association for the ANOVA F test can be measured with omega-squared, which is given by ω̂² = [SSBG − (p − 1)MSWG] / (SSTO + MSWG) • The values are interpreted as follows: • 0.010 – small association • 0.059 – medium association • 0.138 – large association • We can define the effect size for a contrast as the contrast divided by the pooled within-groups standard deviation: g = ψ̂ / √MSWG (see the sketch below)
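
A short sketch of the omega-squared computation, with made-up sums of squares in place of the chapter’s values:

```python
# Made-up ANOVA results for illustration
ss_bg, ss_to, ms_wg, p = 120.0, 400.0, 10.0, 4

# Omega-squared: strength of association for the ANOVA F test
omega_sq = (ss_bg - (p - 1) * ms_wg) / (ss_to + ms_wg)
print(round(omega_sq, 3))   # 0.220 -> a large association by the guidelines above
```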

  39. Chapter Review • Basic ANOVA Concepts • The omnibus null hypothesis • Why ANOVA? • Linear Model Equation • Sums of Squares • Mean Squares & Degrees of Freedom • The Completely Randomized Design • Computational Formulae and Procedures • Assumptions

  40. Chapter Review • Multiple Comparison Procedures • Contrasts • A Posteriori Multiple Comparison Tests • A Priori Multiple Comparison Tests • Practical Significance
