F73DA2 INTRODUCTORY DATA ANALYSIS ANALYSIS OF VARIANCE
regression: x is a quantitative explanatory variable
type is a qualitative variable (a factor)
Illustration
Company 1: 36 28 32 43 30 21 33 37 26 34
Company 2: 26 21 31 29 27 35 23 33
Company 3: 39 28 45 37 21 49 34 38 44
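The later commands refer to vectors `company1`, `company2` and `company3`, a combined response `company` and a factor `type`. The slides do not show how these were created, so the following is a sketch that assumes exactly those names:

```r
# Data from the illustration: one vector per company
company1 <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34)
company2 <- c(26, 21, 31, 29, 27, 35, 23, 33)
company3 <- c(39, 28, 45, 37, 21, 49, 34, 38, 44)

# All 27 results stacked into one response vector
company <- c(company1, company2, company3)

# Company label for each result; factor() makes "type" qualitative
type <- factor(rep(1:3, times = c(10, 8, 9)))

table(type)                  # group sizes 10, 8, 9
tapply(company, type, mean)  # company means
```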
Explanatory variable qualitative, i.e. categorical (a factor)
• Analysis of variance
• Linear models for comparative experiments
Using Factor Commands • The display is different if “type” is declared as a factor.
We could check for significant differences between pairs of companies using t tests:
• t.test(company1,company2)
• This performs a two-sample t test and calculates a 95% confidence interval for the difference between the two means
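A minimal sketch of one such pairwise comparison, assuming `company1` and `company2` hold the Company 1 and Company 2 results. With three companies there are three pairs, each needing its own test, which is one reason to analyse all the results together instead.

```r
company1 <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34)
company2 <- c(26, 21, 31, 29, 27, 35, 23, 33)

# Two-sample t test of Company 1 against Company 2.
# t.test() uses Welch's version by default (var.equal = FALSE);
# add var.equal = TRUE for the classical pooled-variance test.
res <- t.test(company1, company2)

res$estimate   # sample means: 32 and 28.125
res$conf.int   # 95% confidence interval for mu1 - mu2
res$p.value
```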
Taking all the results together, we calculate the total variation for the system: the sum of squared deviations of the individual values from the overall mean, 32.59259. This total sum of squares is 1470.519.
We can also work out the sum of squares within each company (squared deviations from that company's own mean). These sum to 1114.431.
The total sum of squares must be made up of a contribution from variation WITHIN the companies and variation BETWEEN the companies.
• This means that the variation between the companies equals 1470.519 - 1114.431 = 356.088
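The same three quantities can be computed directly from their definitions. This is a sketch assuming the response vector `company` and the factor `type` as used elsewhere in the slides:

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

grand_mean <- mean(company)                 # 32.59259
sst <- sum((company - grand_mean)^2)        # total SS about the overall mean: 1470.519

# Within-company SS: deviations from each company's own mean
within_by_group <- tapply(company, type, function(y) sum((y - mean(y))^2))
ss_within <- sum(within_by_group)           # 1114.431

# Between-company SS by subtraction
ss_between <- sst - ss_within               # 356.088
```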
This can all be shown in an analysis of variance table which has the format:
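For reference, the general layout is sketched below for $k$ treatments and $n$ observations in total ($k$ and $n$ are notation introduced here; in the illustration $k = 3$ and $n = 27$, giving 2, 24 and 26 degrees of freedom).

```latex
\begin{array}{lcccc}
\text{Source of variation} & \text{Degrees of freedom} & \text{Sum of squares} & \text{Mean squares} & F \\
\text{Between treatments} & k-1 & SSB & MSB = SSB/(k-1) & MSB/MSRES \\
\text{Residual} & n-k & SSRES & MSRES = SSRES/(n-k) & \\
\text{Total} & n-1 & SST & &
\end{array}
```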
In R, the command is similar to that for linear regression
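A sketch of that command, assuming `company` holds all 27 results and `type` is the company factor. The model is fitted with `lm()` exactly as a regression would be, and `anova()` then prints the analysis of variance table:

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

fit <- lm(company ~ type)   # same formula interface as a regression
tab <- anova(fit)           # rows: type (between companies) and Residuals
tab
```

`summary(aov(company ~ type))` produces the same table.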
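Theory
Data: $y_{ij}$ is the $j$th observation using treatment $i$.
Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where the errors $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma^2)$.
The response variables $Y_{ij}$ are independent, with $Y_{ij} \sim N(\mu + \tau_i, \sigma^2)$.
Constraint: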
Partitioning the observed total variation: $SST = SSB + SSRES$ (total = between treatments + residual)
Fitted values:
Company 1: 320/10 = 32
Company 2: 225/8 = 28.125
Company 3: 335/9 = 37.222
Residuals:
Company 1: $\hat\varepsilon_{1j} = y_{1j} - 32$
Company 2: $\hat\varepsilon_{2j} = y_{2j} - 28.125$
Company 3: $\hat\varepsilon_{3j} = y_{3j} - 37.222$
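The fitted values are just the company means, so they and the residuals can be obtained without fitting a model. A sketch, assuming the same `company` and `type` vectors as in the R commands of these slides:

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

group_means <- tapply(company, type, mean)  # 32, 28.125, 37.222
fitted_vals <- ave(company, type)           # each value replaced by its company mean
resid_vals  <- company - fitted_vals        # residuals y_ij minus company mean

round(group_means, 3)
```

Within each company the residuals sum to zero, and `fitted_vals` agrees with `fitted(lm(company ~ type))`.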
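$SST = 30152 - 880^2/27 = 1470.52$
$SSB = (320^2/10 + 225^2/8 + 335^2/9) - 880^2/27 = 356.09$
$SSRES = 1470.52 - 356.09 = 1114.43$
(Here 30152 is the sum of the squares of all 27 observations, 880 is the grand total, and 320, 225 and 335 are the company totals.)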
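The shortcut formulas above use only totals and sums of squares. A sketch reproducing them, assuming `company` and `type` as before:

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

totals <- tapply(company, type, sum)      # company totals: 320, 225, 335
sizes  <- tapply(company, type, length)   # sample sizes: 10, 8, 9
grand_total <- sum(company)               # 880
n <- length(company)                      # 27
correction <- grand_total^2 / n           # 880^2/27

sst   <- sum(company^2) - correction      # 30152 minus correction
ssb   <- sum(totals^2 / sizes) - correction
ssres <- sst - ssb

round(c(SST = sst, SSB = ssb, SSRES = ssres), 2)  # 1470.52, 356.09, 1114.43
```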
ANOVA table
Source of variation    Degrees of freedom    Sum of squares    Mean squares    F
Between treatments     2                     356.09            178.04          3.83
Residual               24                    1114.43           46.43
Total                  26                    1470.52
Testing $H_0: \tau_i = 0,\ i = 1, 2, 3$ v $H_1$: not $H_0$ (i.e. $\tau_i \neq 0$ for at least one $i$).
Under $H_0$, $F = 3.83$ on 2, 24 df. P-value $= P(F_{2,24} > 3.83) = 0.036$, so we can reject $H_0$ at levels of testing down to 3.6%.
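The P-value is an upper-tail F probability, which R's `pf()` computes. A sketch using the sums of squares from the table:

```r
# Values from the ANOVA table
ss_between <- 356.088;  df_between <- 2
ss_res     <- 1114.431; df_res     <- 24

f_obs   <- (ss_between / df_between) / (ss_res / df_res)      # about 3.83
p_value <- pf(f_obs, df_between, df_res, lower.tail = FALSE)   # about 0.036

# 5% critical value of F on 2, 24 df (about 3.40)
f_crit <- qf(0.95, df_between, df_res)

# With 2 numerator df the upper tail has the closed form (1 + 2f/df_res)^(-df_res/2)
p_closed <- (1 + 2 * f_obs / df_res)^(-df_res / 2)
```

Since `f_obs` exceeds the 5% critical value, the test rejects $H_0$ at the 5% level, consistent with a P-value of about 3.6%.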
Conclusion Results differ among the three companies (P-value 3.6%)
The fit of the model can be investigated by examining the residuals: the residual for response $y_{ij}$ is $\hat\varepsilon_{ij} = y_{ij} - \bar{y}_{i\cdot}$; this is just the difference between the response and its fitted value (the appropriate sample mean).
Plotting the residuals in various ways may reveal:
● a pattern (e.g. lack of randomness, suggesting that an additional, uncontrolled factor is present)
● non-normality (a transformation may help)
● heteroscedasticity (error variance differs among treatments; for example, it may increase with the treatment mean, and again a transformation, perhaps log, may be required)
In this example, samples are small, but one might question the validity of the assumptions of normality (Company 2) and homoscedasticity (equality of variances, Company 2 v Companies 1/3).
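One simple way to look at the homoscedasticity question is to compare the sample variances of the three companies. A sketch, assuming `company` and `type` as before (this comparison is an illustration, not a formal test from the slides):

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

vars <- tapply(company, type, var)  # sample variance within each company
round(vars, 2)                      # 38.22, 23.27, 75.94
round(sqrt(vars), 2)                # corresponding standard deviations
```

With samples this small, a threefold difference in variance is only weak evidence, which matches the cautious wording above.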
plot(residuals(lm(company ~ type)) ~ fitted.values(lm(company ~ type)), pch = 8)
abline(h = 0, lty = 2)
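A self-contained version of the residual plot that fits the model once and reuses it. The normal Q-Q plot at the end is an addition not shown in the slides, included as one way to look for the non-normality mentioned above:

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

fit <- lm(company ~ type)   # fit once, reuse below

# Residuals against fitted values (the company means), dashed line at zero
plot(fitted(fit), residuals(fit), pch = 8,
     xlab = "Fitted value (company mean)", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals (not in the original slides)
qqnorm(residuals(fit))
qqline(residuals(fit))
```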
It is also possible to compare with an analysis that treats “type” as a quantitative explanatory variable
• type=c(rep(1,10),rep(2,8),rep(3,9))
• No “factor” command, so type is treated as numeric and a straight-line regression is fitted
Note the low R² (about 0.08). The fitted equation is company = 27.666 + 2.510 × type
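A sketch of this regression, using the numeric `type` coding exactly as given above. Treating the codes as numbers forces the three company means to lie on a straight line in the order 1, 2, 3, which the company labels do not justify; the factor model instead estimates each company mean freely.

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)

# Numeric codes with no factor() call, so lm() fits a straight line in type
type <- c(rep(1, 10), rep(2, 8), rep(3, 9))
reg <- lm(company ~ type)

coef(reg)               # intercept about 27.666, slope about 2.510
summary(reg)$r.squared  # about 0.081
anova(reg)              # 1 df for the slope, 25 residual df
```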