
MT2004

MT2004. Olivier GIMENEZ Telephone: 01334 461827 E-mail: olivier@mcs.st-and.ac.uk Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html. 13. Analysis of variance.


Presentation Transcript


  1. MT2004 Olivier GIMENEZ Telephone: 01334 461827 E-mail: olivier@mcs.st-and.ac.uk Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html

  2. 13. Analysis of variance • So far, we’ve investigated the relationship between a response variable and one or several continuous explanatory variables • The objective here is to study the relationship between a response variable Y and one or two discrete explanatory variables

  3. 13. Analysis of variance 13.1 One-way ANOVA • Example: a standard measurement of the flammability of fabric is the length of the burnt portion of a piece of the fabric which has been held over a flame for a given time. An investigation to see whether or not the measurements obtained by 5 laboratories differ produced the following data.

  4. 13. Analysis of variance 13.1 One-way ANOVA
     Measurements of length obtained by 5 laboratories
     laboratory     1     2     3     4     5
                  2.9   2.7   3.3   3.3   4.1
                  3.1   3.4   3.3   3.2   4.1
                  3.1   3.6   3.5   3.4   3.7
                  3.7   3.2   3.5   2.7   4.2
                  3.1   4.0   2.8   2.7   3.1
                  4.2   4.1   2.8   3.3   3.5
                  3.7   3.8   3.2   2.9   2.8
                  3.9   3.8   2.8   3.2   3.5
                  3.1   4.3   3.8   2.9   3.7
                  3.0   3.4   3.5   2.6   3.5
                  2.9   3.3   3.8   2.8   3.9

  5. 13.1 One-way ANOVA • The problem here is to compare several populations • The technique we will use is the one-way analysis of variance • This is a special case of the ANOVA introduced in the Regression Section • Consider k distributions (or populations) with means $\mu_1, \dots, \mu_k$, and suppose we wish to test: • $H_0: \mu_1 = \dots = \mu_k$ • against • $H_1$: $\mu_1, \dots, \mu_k$ are not all equal • WARNING: the alternative hypothesis does not require all the $\mu_i$ to be different, only that at least one pair differs. E.g. with k = 3, $\mu_1 = \mu_2 \neq \mu_3$ would satisfy $H_1$.

  6. 13.1 One-way ANOVA • In the example, we wish to test the null hypothesis that the mean lengths obtained by the k = 5 laboratories are the same. • Suppose that we have random samples of sizes $n_1, \dots, n_k$ from the k distributions. Note that the random samples do not need to have the same size. • $y_{ij}$ denotes the jth observation on the ith distribution, i = 1,…, k and j = 1,…, $n_i$

  7. 13. Analysis of variance 13.1 One-way ANOVA • In the table of measurements above, $y_{23} = 3.6$ is the 3rd observation from laboratory 2 and $y_{57} = 2.8$ is the 7th observation from laboratory 5.

  8. 13.1 One-way ANOVA • We will assume that $y_{ij}$ is an observation of a random variable $Y_{ij}$, where: • $Y_{ij} \sim N(\mu_i, \sigma^2)$, i = 1,…, k and j = 1,…, $n_i$, with the $Y_{ij}$ independent • We thus have that $E(Y_{ij}) = \mu_i$

  9. 13.1 One-way ANOVA • Actually, this model is a particular case of a multiple regression • Define indicator variables $x_1, \dots, x_k$ by: $x_i = 1$ if the observation comes from population i, and $x_i = 0$ otherwise • Then the equation $E(Y_{ij}) = \mu_i$ can be rewritten as: • $E(Y_{ij}) = \mu_1 x_1 + \dots + \mu_k x_k$ • This equation defines a multiple regression without intercept • Now, to test the null hypothesis, we can apply the results of the end of the Regression Section (we place equality restrictions on the full model)
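
The indicator-variable formulation can be checked numerically. The sketch below is only an illustration: the names `len`, `lab`, `X` and `fit_ind` are choices made here, not taken from the slides. It regresses the fabric measurements on the five indicators without an intercept and compares the coefficients with the group sample means.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

# Indicator variables x_1, ..., x_5: column i equals 1 for observations from lab i
X <- model.matrix(~ 0 + lab)

# Multiple regression on the indicators, without intercept
fit_ind <- lm(len ~ 0 + X)
mu_hat <- unname(coef(fit_ind))

# The fitted coefficients coincide with the group sample means
group_means <- as.vector(tapply(len, lab, mean))
print(rbind(mu_hat, group_means))
```

Each row of `X` contains exactly one 1, which is why the model needs no separate intercept.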

  10. 13.1 One-way ANOVA • The full model has k parameters ($\mu_1, \dots, \mu_k$), thus $p_1 = k$. • The submodel under $H_0$ is $E(Y_{ij}) = \mu$, thus $p_0 = 1$. • We have $n = n_1 + \dots + n_k$ observations. • So an appropriate statistic to test the null hypothesis is: $F = \frac{(rss_0 - rss_1)/(k-1)}{rss_1/(n-k)}$, where $rss_0$ and $rss_1$ are the residual sums of squares of the submodel and of the full model • If $H_0$ is false (i.e. $\mu_1, \dots, \mu_k$ are not all equal), then this statistic will tend to take values too large to be consistent with an F distribution with k-1 and n-k degrees of freedom.
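
As a hedged illustration of this test, the sketch below computes the statistic by hand for the fabric data, using the residual sums of squares of the submodel and the full model. The object names (`len`, `lab`, `fit0`, `fit1`, `F_stat`) are choices made here for the example.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

fit0 <- lm(len ~ 1)    # submodel under H0: one common mean (p0 = 1)
fit1 <- lm(len ~ lab)  # full model: one mean per laboratory (p1 = k)

rss0 <- sum(residuals(fit0)^2)
rss1 <- sum(residuals(fit1)^2)
n <- length(len)
k <- nlevels(lab)

# F statistic with k - 1 and n - k degrees of freedom
F_stat <- ((rss0 - rss1) / (k - 1)) / (rss1 / (n - k))

# Upper-tail probability under the F distribution
p_value <- pf(F_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
print(c(F = F_stat, p = p_value))
```

The values should agree with the `anova(reglab)` output shown later in the slides (F value 4.5346, p-value 0.003337).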


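  14. 13.1 One-way ANOVA • We provide other expressions for $rss_0$ and $rss_1$, much easier to manipulate • Let $\bar{y} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}$ denote the overall sample mean • Let $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ denote the sample mean of the ith random sample (pop.) • It can be shown that the total variability is the sum of the between and within variability: $\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2 = \sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2$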
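
The decomposition of the total variability can be verified numerically on the fabric data. This is a minimal sketch; the names `SST`, `SSB` and `SSW` (total, between and within sums of squares) are labels assumed here for the three terms.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

ybar   <- mean(len)                             # overall sample mean
ybar_i <- as.vector(tapply(len, lab, mean))     # group sample means
n_i    <- as.vector(tapply(len, lab, length))   # group sizes

SST <- sum((len - ybar)^2)            # total variability
SSB <- sum(n_i * (ybar_i - ybar)^2)   # between-group variability
SSW <- sum((len - ave(len, lab))^2)   # within-group variability (ave gives each obs its group mean)

print(c(SST = SST, SSB = SSB, SSW = SSW, SSB_plus_SSW = SSB + SSW))
```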

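  15. 13.1 One-way ANOVA • It can also be shown that the maximum likelihood estimates are given: • For the full model by: $\hat{\mu}_i = \bar{y}_i$ (the sample mean of group i) • For the submodel by: $\hat{\mu} = \bar{y}$ (the overall sample mean) • And that: $rss_1 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2$ and $rss_0 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2$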

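  16. 13.1 One-way ANOVA • If we define the between and within sums of squares: $SSB = \sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2 = rss_0 - rss_1$ and $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2 = rss_1$ • Then $F = \frac{(rss_0 - rss_1)/(k-1)}{rss_1/(n-k)}$ • Becomes: $F = \frac{SSB/(k-1)}{SSW/(n-k)}$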

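  17. 13.1 One-way ANOVA • Most often, the sums of squares, mean squares, F values, p-values are displayed in an ANOVA table
      Source            Df      Sum Sq       Mean Sq   F value
      Between groups    k - 1   SSB          MSB       MSB/MSW
      Within groups     n - k   SSW          MSW
      Total             n - 1   SSB + SSW
      • With $MSB = SSB/(k-1)$ and $MSW = SSW/(n-k)$, where SSB and SSW are the between-group and within-group sums of squares • Note that the within mean square MSW is an unbiased estimator of the variance $\sigma^2$; its square root is called the residual s.e.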
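
The link between the within mean square and the residual standard error can be illustrated as follows. This is a sketch on the fabric data; `len`, `lab`, `fit1`, `MSW` and `sigma_hat` are names assumed here.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

fit1 <- lm(len ~ lab)

# Within mean square: residual sum of squares divided by n - k
MSW <- sum(residuals(fit1)^2) / df.residual(fit1)

# Residual standard error as reported by summary.lm
sigma_hat <- summary(fit1)$sigma

print(c(MSW = MSW, sqrt_MSW = sqrt(MSW), sigma_hat = sigma_hat))
```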

  18. 13. Analysis of variance 13.1.1 One-way ANOVA in R • Example: a standard measurement of the flammability of fabric is the length of the burnt portion of a piece of the fabric which has been held over a flame for a given time. An investigation to see whether or not the measurements obtained by 5 laboratories differ produced the following data.

  19. 13. Analysis of variance 13.1.1 One-way ANOVA in R
      Measurements of length obtained by 5 laboratories
      laboratory     1     2     3     4     5
                   2.9   2.7   3.3   3.3   4.1
                   3.1   3.4   3.3   3.2   4.1
                   3.1   3.6   3.5   3.4   3.7
                   3.7   3.2   3.5   2.7   4.2
                   3.1   4.0   2.8   2.7   3.1
                   4.2   4.1   2.8   3.3   3.5
                   3.7   3.8   3.2   2.9   2.8
                   3.9   3.8   2.8   3.2   3.5
                   3.1   4.3   3.8   2.9   3.7
                   3.0   3.4   3.5   2.6   3.5
                   2.9   3.3   3.8   2.8   3.9

  20. 13. Analysis of variance 13.1.1 One-way ANOVA in R • We wish to test the null hypothesis: • $H_0: \mu_1 = \dots = \mu_5$ • Against the alternative hypothesis • $H_1$: at least one pair of $\mu_i$'s is not equal • Where $\mu_i$ is the mean length of burnt fabric in measurements from laboratory i (i = 1,…, 5)

  21. 13.1.1 One-way ANOVA in R
      > lengthlab1<-c(2.9,3.1,3.1,3.7,3.1,4.2,3.7,3.9,3.1,3.0,2.9)
      > lengthlab2<-c(2.7,3.4,3.6,3.2,4.0,4.1,3.8,3.8,4.3,3.4,3.3)
      > lengthlab3<-c(3.3,3.3,3.5,3.5,2.8,2.8,3.2,2.8,3.8,3.5,3.8)
      > lengthlab4<-c(3.3,3.2,3.4,2.7,2.7,3.3,2.9,3.2,2.9,2.6,2.8)
      > lengthlab5<-c(4.1,4.1,3.7,4.2,3.1,3.5,2.8,3.5,3.7,3.5,3.9)
      > lab1 <- rep(1,11)
      > lab2 <- rep(2,11)
      > lab3 <- rep(3,11)
      > lab4 <- rep(4,11)
      > lab5 <- rep(5,11)
      > fabric<-data.frame(lab=c(lab1,lab2,lab3,lab4,lab5),
      +   length=c(lengthlab1,lengthlab2,lengthlab3,lengthlab4,lengthlab5))
      > plot(fabric$lab,fabric$length)

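  22. 13.1.1 One-way ANOVA in R [Figure: plot of length against lab for the fabric data]

  23. 13.1.1 One-way ANOVA in R [Figure: plot of length against lab for the fabric data] $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu$ ?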

  24. 13.1.1 One-way ANOVA in R
      > reglab <- lm(length~as.factor(lab), data=fabric)
      > reglab

      Call:
      lm(formula = length ~ as.factor(lab), data = fabric)

      Coefficients:
          (Intercept)  as.factor(lab)2  as.factor(lab)3  as.factor(lab)4
              3.33636          0.26364         -0.03636         -0.33636
      as.factor(lab)5
              0.30909

      The lm command fits the model $E(Y_{ij}) = \mu_1 x_1 + \dots + \mu_5 x_5$ in a reparameterised form: the intercept is the estimate of $\mu_1$, and each as.factor(lab)i coefficient is the estimate of $\mu_i - \mu_1$ (e.g. $\hat{\mu}_2 = 3.33636 + 0.26364 = 3.6$)

  25. 13.1.1 One-way ANOVA in R
      > anova(reglab)
      Analysis of Variance Table

      Response: length
                     Df Sum Sq Mean Sq F value   Pr(>F)
      as.factor(lab)  4 2.9865  0.7466  4.5346 0.003337 **
      Residuals      50 8.2327  0.1647
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      • The R command anova applied to the regression object reglab produces the ANOVA table • The p-value (0.003337) is small, so we reject $H_0$ that $\mu_1 = \dots = \mu_5$
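
The same test can also be phrased as an explicit comparison between the submodel and the full model, which mirrors the "equality restrictions on the full model" argument. The sketch below assumes the objects `len`, `lab`, `fit0` and `fit1`, which are names chosen here rather than ones used in the slides.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

fit0 <- lm(len ~ 1)    # submodel: common mean
fit1 <- lm(len ~ lab)  # full model: one mean per laboratory

# anova() with two nested fits tests the submodel against the full model
comp <- anova(fit0, fit1)
print(comp)
```

The second row of `comp` carries the residual sum of squares of the full model, the F statistic and its p-value.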

  26. 13.1.1 One-way ANOVA in R Checking the assumptions • It is crucial to check the assumptions of the ANOVA model, in particular: • The observations in each group come from a normal distribution • The variances are equal (= $\sigma^2$)

  27. 13.1.1 One-way ANOVA in R Checking the assumptions • It is crucial to test the assumptions of the ANOVA model, in particular: • Normality: use QQplot on residuals • Homogeneity of variance: inspect the variances

  28. 13.1.1 One-way ANOVA in R Checking the assumptions 1. The observations in each group come from a normal distribution. We check normality of the residuals (each measurement minus the sample mean of its laboratory):
      > resfab1<-lengthlab1-mean(lengthlab1)
      > resfab2<-lengthlab2-mean(lengthlab2)
      > resfab3<-lengthlab3-mean(lengthlab3)
      > resfab4<-lengthlab4-mean(lengthlab4)
      > resfab5<-lengthlab5-mean(lengthlab5)
      > resfab<-c(resfab1,resfab2,resfab3,resfab4,resfab5)
      > qqnorm(resfab)
      > qqline(resfab)
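
The group-centred residuals are the same quantities that `residuals()` returns for the fitted one-way model, so the Q-Q plot can be drawn from the model object directly. The following sketch uses names assumed here (`len`, `lab`, `fit`, `res_by_hand`, `res_lm`).

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

# Residuals by hand: each measurement minus its laboratory's sample mean
res_by_hand <- len - ave(len, lab)

# Residuals extracted from the fitted one-way model
fit <- lm(len ~ lab)
res_lm <- unname(residuals(fit))

print(all.equal(res_by_hand, res_lm))

# Draw the normal Q-Q plot only in an interactive session
if (interactive()) {
  qqnorm(res_lm)
  qqline(res_lm)
}
```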

  29. 13.1.1 One-way ANOVA in R [Figure: normal Q-Q plot of the residuals with the qqline reference line] Normality is OK…

  30. 13.1.1 One-way ANOVA in R Checking the assumptions 2. The variances are equal ($\sigma^2$). We inspect the sample variances of the measurements from each laboratory:
      > var(lengthlab1)
      [1] 0.2045455
      > var(lengthlab2)
      [1] 0.212
      > var(lengthlab3)
      [1] 0.138
      > var(lengthlab4)
      [1] 0.082
      > var(lengthlab5)
      [1] 0.1867273
      Variances are roughly equal
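
The five variances can be obtained in one call, and a formal test can complement the visual inspection. Note that Bartlett's test is not part of the slides; it is added here only as one possible check, and the names `len`, `lab`, `v` and `bt` are assumptions of this sketch.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))

# Sample variance of the measurements within each laboratory
v <- tapply(len, lab, var)
print(v)

# Optional formal check (not in the slides): Bartlett's test of equal variances.
# It relies on normality, so it is best read alongside the Q-Q plot.
bt <- bartlett.test(len ~ lab)
print(bt)
```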

  31. 13. Analysis of variance 13.1.2 Least Significant Differences • When performing an ANOVA, we wish to test the null hypothesis: • $H_0: \mu_1 = \dots = \mu_5$ • Against the alternative hypothesis • $H_1$: at least one pair of $\mu_i$'s is not equal • So if the null hypothesis is rejected, the question is which differences between groups are most important • In other words, we wish to test $H_0: \mu_i = \mu_j$ for each pair $i \neq j$

  32. 13. Analysis of variance 13.1.2 Least Significant Differences • An appropriate test to compare the groups in pairs is the 2-sample t-test • Under $H_0: \mu_i = \mu_j$, we have that $\bar{Y}_i - \bar{Y}_j \sim N\left(0, \sigma^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)\right)$ • But $\sigma^2$ is unknown • We will replace $\sigma^2$ by $s^2 = rss_1/(n-k) = MSW$ (given in the ANOVA table)

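  33. 13.1.2 Least Significant Differences • On one hand, we have that $s^2$ is independent of $\bar{Y}_i - \bar{Y}_j$ • On the other hand: $(n-k)s^2/\sigma^2 \sim \chi^2_{n-k}$ • So, an appropriate test statistic is: $T = \frac{\bar{y}_i - \bar{y}_j}{s\sqrt{1/n_i + 1/n_j}}$, which follows a t distribution with n-k degrees of freedom under $H_0$ • $|T|$ is to be compared with the quantile $t_{0.025;\,n-k}$ for a 2-sided test at the 5% level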

  34. 13. Analysis of variance 13.1.2 Least Significant Differences • WARNING: This is not exactly the same formula as for the 2-sample t-test since: • $s^2$ is calculated using all the data, and not just groups i and j • the number of degrees of freedom is $n - k$ rather than $n_i + n_j - 2$
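
A pairwise comparison with the pooled variance can be sketched as follows for the fabric data. The helper `pairwise_t()` and the other object names are hypothetical, introduced only for this illustration; the pooled $s^2$ uses all five groups and n - k degrees of freedom, as the warning above stresses.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))
n <- length(len)
k <- nlevels(lab)

# Pooled variance estimate s^2 = rss1 / (n - k), computed from ALL groups
s2 <- sum((len - ave(len, lab))^2) / (n - k)

# Hypothetical helper: t statistic comparing the means of groups i and j
pairwise_t <- function(i, j) {
  yi <- len[lab == i]
  yj <- len[lab == j]
  (mean(yi) - mean(yj)) / sqrt(s2 * (1 / length(yi) + 1 / length(yj)))
}

t_24 <- pairwise_t(2, 4)            # laboratory 2 versus laboratory 4
crit <- qt(0.975, df = n - k)       # two-sided 5% critical value, n - k df
print(c(t = t_24, critical = crit, reject = abs(t_24) > crit))
```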

  35. 13. Analysis of variance 13.1.2 Least Significant Differences • If the samples are of equal size, i.e. $n_1 = \dots = n_k$, it is easier to calculate the smallest difference in sample means leading to rejection of the null hypothesis that the 2 groups have equal means • This is called the Least Significant Difference (LSD) • If there are k groups, each with m observations (n = mk), the LSD for significance level $\alpha$ is: $LSD = t_{\alpha/2;\,n-k}\sqrt{2s^2/m}$ • Once the LSD is calculated, look for the pairs of groups with sample means differing by more than the LSD

  36. 13.1.2 Least Significant Differences Example (Fabric data): The Least Significant Difference for significance level $\alpha = 0.05$ is:
      > LSD <- qt(0.975,55-5)*sqrt(2*0.1647/11)
      > LSD
      [1] 0.3475762

  37. 13.1.2 Least Significant Differences We calculate the sample mean for each group:
      > mean(lengthlab5)
      [1] 3.645455
      > mean(lengthlab2)
      [1] 3.6
      > mean(lengthlab1)
      [1] 3.336364
      > mean(lengthlab3)
      [1] 3.3
      > mean(lengthlab4)
      [1] 3

  38. 13.1.2 Least Significant Differences And then look for the pairs of groups with sample means differing by more than the LSD = 0.3475762

  41. 13.1.2 Least Significant Differences The sample means of laboratories 2 and 5 exceed that of laboratory 4 by 0.6 and 0.645, both more than the LSD. This suggests that $\mu_4 < \mu_2$ and $\mu_4 < \mu_5$, but does not suggest any other differences between the $\mu_i$
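
The search for pairs differing by more than the LSD can be automated. The sketch below uses names assumed here (`len`, `lab`, `means`, `diffs`, `signif_pairs`) and the unrounded $s^2$, so its LSD differs from the slide value in the fourth decimal place.

```r
# Fabric data from the slides: 11 burnt-length measurements per laboratory
len <- c(2.9, 3.1, 3.1, 3.7, 3.1, 4.2, 3.7, 3.9, 3.1, 3.0, 2.9,  # lab 1
         2.7, 3.4, 3.6, 3.2, 4.0, 4.1, 3.8, 3.8, 4.3, 3.4, 3.3,  # lab 2
         3.3, 3.3, 3.5, 3.5, 2.8, 2.8, 3.2, 2.8, 3.8, 3.5, 3.8,  # lab 3
         3.3, 3.2, 3.4, 2.7, 2.7, 3.3, 2.9, 3.2, 2.9, 2.6, 2.8,  # lab 4
         4.1, 4.1, 3.7, 4.2, 3.1, 3.5, 2.8, 3.5, 3.7, 3.5, 3.9)  # lab 5
lab <- factor(rep(1:5, each = 11))
n <- length(len)
k <- nlevels(lab)
m <- n / k                      # common group size (equal-size design)

means <- tapply(len, lab, mean)
s2 <- sum((len - ave(len, lab))^2) / (n - k)   # MSW from the ANOVA table

# Least significant difference at the 5% level
LSD <- qt(0.975, df = n - k) * sqrt(2 * s2 / m)

# Absolute differences between all pairs of sample means
diffs <- abs(outer(means, means, "-"))

# Keep each pair once (upper triangle) and flag those exceeding the LSD
signif_pairs <- which(diffs > LSD & upper.tri(diffs), arr.ind = TRUE)
print(LSD)
print(signif_pairs)
```

Each row of `signif_pairs` gives the two laboratory indices of a pair whose sample means differ by more than the LSD.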

  42. 13.2 Two-way ANOVA • So far, we've considered only one explanatory discrete variable (lab in the fabric data example) • Let's assume now that each observation belongs to 2 groups • This is a 2-way ANOVA • Example: consider a reading comprehension test given to pupils of age 9, 10 and 11 from 4 schools (A, B, C and D), giving the scores:

  43. 13.2 Two-way ANOVA • Example: consider a reading comprehension test given to pupils of age 9, 10 and 11 from 4 schools (A, B, C and D), giving the scores: • So each observation (score) $y_i$ belongs to school j (j = 1,..., J) and age k (k = 1,..., K) • The $y_i$ are observations of independent r.v.s $Y_i \sim N(\mu_i, \sigma^2)$

  44. 13.2 Two-way ANOVA • There are four models of potential interest. • Model 3: the expected comprehension score $E(Y_i) = \mu_i$ is the sum of a school effect and an age effect.

  45. 13.2 Two-way ANOVA • There are four models of potential interest. • Model 1: the expected comprehension score $E(Y_i) = \mu_i$ is the result of a school effect only (the age effect is zero for all k = 1,..., K).

  46. 13.2 Two-way ANOVA • There are four models of potential interest. • Model 2: the expected comprehension score $E(Y_i) = \mu_i$ is the result of an age effect only (the school effect is zero for all j = 1,..., J).

  47. 13.2 Two-way ANOVA • There are four models of potential interest. • Model 0: the expected comprehension score $E(Y_i) = \mu_i$ is the result of neither a school effect nor an age effect.
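
The four models can be written out as follows. The notation is an assumption made for this sketch, since the original equations did not survive in the transcript: $\mu$ denotes an overall level, $\alpha_j$ the effect of school j and $\beta_k$ the effect of age k, for an observation i belonging to school j and age k.

```latex
\begin{align*}
\text{Model 3:} \quad & E(Y_i) = \mu_i = \mu + \alpha_j + \beta_k \\
\text{Model 1:} \quad & E(Y_i) = \mu_i = \mu + \alpha_j
    && (\beta_k = 0 \text{ for all } k = 1, \dots, K) \\
\text{Model 2:} \quad & E(Y_i) = \mu_i = \mu + \beta_k
    && (\alpha_j = 0 \text{ for all } j = 1, \dots, J) \\
\text{Model 0:} \quad & E(Y_i) = \mu_i = \mu
    && (\alpha_j = 0 \text{ and } \beta_k = 0 \text{ for all } j, k)
\end{align*}
```

Models 0, 1 and 2 are all obtained from Model 3 by setting some effects to zero, which is what allows them to be compared with Model 3 by F tests.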

  48. 13.2 Two-way ANOVA • The ANOVA table for comparing models 0, 1, 2 to model 3 is:

  49. 13.2 Two-way ANOVA • The ANOVA table for comparing models 0, 1, 2 and 3 is: • Compare model 2 vs model 3

  50. 13.2 Two-way ANOVA • The ANOVA table for comparing models 0, 1, 2 and 3 is: • Compare model 1 vs model 3
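
These model comparisons can be sketched in R. The scores from the slides are not available in this transcript, so the data below are SIMULATED purely for illustration (two hypothetical pupils per school and age, scores drawn at random); only the structure of the comparison is meaningful, not the numerical results.

```r
# SIMULATED data, not the scores from the slides: 4 schools x 3 ages x 2 pupils
set.seed(1)
dat <- expand.grid(school = factor(c("A", "B", "C", "D")),
                   age    = factor(c(9, 10, 11)),
                   pupil  = 1:2)
dat$score <- round(rnorm(nrow(dat), mean = 50, sd = 10))

m0 <- lm(score ~ 1,            data = dat)  # Model 0: no school or age effect
m1 <- lm(score ~ school,       data = dat)  # Model 1: school effect only
m2 <- lm(score ~ age,          data = dat)  # Model 2: age effect only
m3 <- lm(score ~ school + age, data = dat)  # Model 3: school effect + age effect

# Model 2 vs model 3: is a school effect needed once age is accounted for?
school_test <- anova(m2, m3)

# Model 1 vs model 3: is an age effect needed once school is accounted for?
age_test <- anova(m1, m3)

print(school_test)
print(age_test)
```

The school comparison uses J - 1 = 3 numerator degrees of freedom and the age comparison uses K - 1 = 2.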
