Experimental design and statistical analyses of data

Lesson 5: Mixed models, nested anovas, split-plot designs

Randomized block design
• All treatments are allocated to the same experimental units
• Treatments are allocated at random
• Treatments (a = 4), blocks (b = 3)

[Figure: blocks (patients) by treatments (drugs)]

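An alternative way of writing a GLM

$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

• $y_{ij}$: response of patient j receiving drug i
• $\mu$: overall mean
• $\alpha_i = \mu_i - \mu$: effect of drug i
• $\beta_j = \mu_j - \mu$: effect of patient j
• $\varepsilon_{ij}$: residual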

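$$y_{ij} = \hat{y}_{ij} + \varepsilon_{ij}, \qquad \hat{y}_{ij} = \mu + \alpha_i + \beta_j$$

• $y_{ij}$: response of patient j receiving drug i
• $\hat{y}_{ij}$: predicted value of y
• $\alpha_i = \mu_i - \mu$
• $\beta_j = \mu_j - \mu$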

Effects of drugs: $\alpha_i$
Effects of patients: $\beta_j$
Ex: Patient 2 receiving treatment C: $\hat{y}_{C2} = \mu + \alpha_C + \beta_2$
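The prediction for patient 2 receiving treatment C can be worked through numerically. The sketch below assumes the example refers to the data of Example 5.1 (listed in the SAS program later in this lesson) and estimates the effects by simple row and column means; the variable names are illustrative only.

```python
# Data of Example 5.1: response y for each (patient, drug) combination
y = {
    (1, "A"): 5.17, (2, "A"): 6.23, (3, "A"): 4.93,
    (1, "B"): 5.21, (2, "B"): 7.34, (3, "B"): 4.55,
    (1, "C"): 4.91, (2, "C"): 6.18, (3, "C"): 4.64,
    (1, "D"): 4.74, (2, "D"): 6.31, (3, "D"): 4.61,
}
patients = sorted({p for p, _ in y})
drugs = sorted({d for _, d in y})

# Overall mean, drug effects (alpha_i) and patient effects (beta_j)
mu = sum(y.values()) / len(y)
alpha = {d: sum(y[p, d] for p in patients) / len(patients) - mu for d in drugs}
beta = {p: sum(y[p, d] for d in drugs) / len(drugs) - mu for p in patients}


def predict(patient, drug):
    """Predicted value: overall mean + drug effect + patient effect."""
    return mu + alpha[drug] + beta[patient]


y_hat = predict(2, "C")
residual = y[2, "C"] - y_hat
print(f"mu = {mu:.4f}, alpha_C = {alpha['C']:.4f}, beta_2 = {beta[2]:.4f}")
print(f"predicted = {y_hat:.4f}, observed = {y[2, 'C']:.2f}, residual = {residual:.4f}")
```

The residual is what remains of the observation once the overall mean, the drug effect and the patient effect have been removed.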

Consider the two questions:
• Are the three patients different?
• Are patients in general different?
In the first case, "patients" is considered a fixed factor. In the second case, "patients" is considered a random factor.

"Patients" is a random effect
If patients are randomly chosen, $\beta_j$ will be a stochastic variable.
$\beta_j$ is assumed to be iid ND(0, $\sigma_b^2$), i.e. independently and identically normally distributed with zero mean and variance $\sigma_b^2$.
[Figure: probability density of $\beta$]

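Variances

$$V(y) = V(\mu + \alpha_i + \beta_j + \varepsilon) = V(\mu) + V(\alpha_i) + V(\beta_j) + V(\varepsilon) = \sigma_a^2 + \sigma_b^2 + \sigma^2$$

• $\sigma_a^2$: variance due to drug (factor a)
• $\sigma_b^2$: variance due to patient (factor b)
• $\sigma^2$: residual variance
$V(\mu) = 0$ because $\mu$ is a constant.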

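Both factors are fixed

$$V(y) = V(\mu + \alpha_i + \beta_j + \varepsilon) = V(\mu) + V(\alpha_i) + V(\beta_j) + V(\varepsilon) = \sigma_a^2 + \sigma_b^2 + \sigma^2$$

With both factors fixed, $\alpha_i$ and $\beta_j$ are constants, so $\sigma_a^2$ and $\sigma_b^2$ drop out.
Variance of a single observation: $V(y) = \sigma^2$
Variance of an average: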

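"Patients" is a random factor (mixed anova)

$$V(y) = V(\mu + \alpha_i + \beta_j + \varepsilon) = V(\mu) + V(\alpha_i) + V(\beta_j) + V(\varepsilon) = \sigma_a^2 + \sigma_b^2 + \sigma^2$$

With drugs fixed and patients random, only $\sigma_a^2$ drops out.
Variance of a single observation: $V(y) = \sigma_b^2 + \sigma^2$
Variance of an average: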

Both factors are random

$$V(y) = V(\mu + \alpha_i + \beta_j + \varepsilon) = V(\mu) + V(\alpha_i) + V(\beta_j) + V(\varepsilon) = \sigma_a^2 + \sigma_b^2 + \sigma^2$$

Variance of a single observation: $V(y) = \sigma_a^2 + \sigma_b^2 + \sigma^2$
Variance of an average:
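One way to make the variance of an average concrete is sketched below. It assumes that the average in question is the mean response $\bar{y}_{i\cdot}$ of one drug over the $b$ patients, and that all random terms are independent. Both are assumptions made for illustration; the slides do not say which average is meant.

```latex
\begin{align*}
\bar{y}_{i\cdot} &= \mu + \alpha_i + \frac{1}{b}\sum_{j=1}^{b}\beta_j
                   + \frac{1}{b}\sum_{j=1}^{b}\varepsilon_{ij} \\
\text{both factors fixed:}\quad
  V(\bar{y}_{i\cdot}) &= \frac{\sigma^2}{b} \\
\text{patients random:}\quad
  V(\bar{y}_{i\cdot}) &= \frac{\sigma_b^2 + \sigma^2}{b} \\
\text{both factors random:}\quad
  V(\bar{y}_{i\cdot}) &= \sigma_a^2 + \frac{\sigma_b^2 + \sigma^2}{b}
\end{align*}
```

In each case the components of the random terms that are averaged over are divided by the number of values averaged, while a random drug effect, which is shared by all observations in the mean, is not.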

Expected Mean Squares

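$E[MS_a] = b\sigma_a^2 + \sigma^2$, df = a - 1
$E[MS_b] = a\sigma_b^2 + \sigma^2$, df = b - 1
$E[MS_e] = \sigma^2$, df = (a - 1)(b - 1)

$H_0: \alpha_A = \alpha_B = \alpha_C = \alpha_D = 0 \rightarrow \sigma_a^2 = 0 \rightarrow E[MS_a] = \sigma^2$
$H_0: \beta_1 = \beta_2 = \beta_3 = 0 \rightarrow \sigma_b^2 = 0 \rightarrow E[MS_b] = \sigma^2$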

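If "patients" is a random factor, $\sigma_b^2$ is estimated from $E[MS_b] = a\sigma_b^2 + \sigma^2$:

$$s_b^2 = \frac{MS_b - MS_e}{a} = \frac{3.824 - 0.117}{4} = 0.927$$

Variance of a single observation: $V(y) = \sigma_b^2 + \sigma^2 = 0.927 + 0.117 = 1.044$
Variance of the average: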

How to do it with SAS

DATA eks5_1;
  INPUT pat $ treat $ y;   /* reads the data */
  CARDS;                   /* the data follow; they can also be read from a file */
1 A 5.17
2 A 6.23
3 A 4.93
1 B 5.21
2 B 7.34
3 B 4.55
1 C 4.91
2 C 6.18
3 C 4.64
1 D 4.74
2 D 6.31
3 D 4.61
;
PROC GLM;                  /* General Linear Models procedure */
  TITLE 'Eksempel 5.1';    /* include if a title is wanted */
  CLASS pat treat;         /* pat and treat are class (qualitative) variables */
  MODEL y = pat treat;
  RANDOM pat;              /* patients are a random factor */
RUN;

Eksempel 5.1                          13:18 Monday, November 5, 2001
General Linear Models Procedure
Dependent Variable: Y

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5        8.09475000     1.61895000      13.80    0.0031
Error               6        0.70401667     0.11733611
Corrected Total    11        8.79876667

R-Square        C.V.      Root MSE        Y Mean
0.919987    6.341443    0.34254359    5.40166667

Source    DF      Type I SS    Mean Square    F Value    Pr > F
PAT        2     7.64831667     3.82415833      32.59    0.0006
TREAT      3     0.44643333     0.14881111       1.27    0.3666

Source    DF    Type III SS    Mean Square    F Value    Pr > F
PAT        2     7.64831667     3.82415833      32.59    0.0006
TREAT      3     0.44643333     0.14881111       1.27    0.3666

MSe = 0.11733611 (Error), MSb = 3.82415833 (PAT), MSa = 0.14881111 (TREAT)
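The mean squares in the output above can be reproduced by hand from the twelve observations. The following is a minimal sketch of the sums-of-squares arithmetic for a balanced randomized block design, not a replacement for PROC GLM; the helper name `level_mean` is illustrative.

```python
# (patient, drug, y) for the twelve observations of Example 5.1
rows = [
    (1, "A", 5.17), (2, "A", 6.23), (3, "A", 4.93),
    (1, "B", 5.21), (2, "B", 7.34), (3, "B", 4.55),
    (1, "C", 4.91), (2, "C", 6.18), (3, "C", 4.64),
    (1, "D", 4.74), (2, "D", 6.31), (3, "D", 4.61),
]
patients = sorted({p for p, _, _ in rows})
drugs = sorted({d for _, d, _ in rows})
a, b = len(drugs), len(patients)          # a = 4 treatments, b = 3 blocks
grand = sum(v for _, _, v in rows) / len(rows)


def level_mean(index, level):
    """Mean of y over all rows whose column `index` equals `level`."""
    values = [r[2] for r in rows if r[index] == level]
    return sum(values) / len(values)


# Sums of squares for a balanced design with one observation per cell
ss_total = sum((v - grand) ** 2 for _, _, v in rows)
ss_a = b * sum((level_mean(1, d) - grand) ** 2 for d in drugs)
ss_b = a * sum((level_mean(0, p) - grand) ** 2 for p in patients)
ss_e = ss_total - ss_a - ss_b

# Mean squares and F ratios with MSe as denominator
ms_a = ss_a / (a - 1)
ms_b = ss_b / (b - 1)
ms_e = ss_e / ((a - 1) * (b - 1))
f_a, f_b = ms_a / ms_e, ms_b / ms_e
print(f"MSa = {ms_a:.8f}, MSb = {ms_b:.8f}, MSe = {ms_e:.8f}")
print(f"F(TREAT) = {f_a:.2f}, F(PAT) = {f_b:.2f}")
```

P-values are left out because they need the F distribution, which is not in the Python standard library.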

Eksempel 5.1                          09:00 Friday, November 16, 2001
General Linear Models Procedure

Source    Type III Expected Mean Square
PAT       Var(Error) + 4 Var(PAT)
TREAT     Var(Error) + Q(TREAT)
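The expected mean square for PAT, Var(Error) + 4 Var(PAT), can be solved for Var(PAT) by inserting the observed mean squares. The sketch below does this arithmetic with the values printed by PROC GLM; the variable names are illustrative.

```python
# Mean squares taken from the PROC GLM output for Example 5.1
ms_pat = 3.82415833      # MSb, estimates Var(Error) + 4 Var(PAT)
ms_error = 0.11733611    # MSe, estimates Var(Error)
coef_pat = 4             # coefficient of Var(PAT) = number of treatments a

# Solve MSb = MSe + 4 * Var(PAT) for Var(PAT)
var_pat = (ms_pat - ms_error) / coef_pat

# Variance of a single observation when patients are random
var_single = var_pat + ms_error

print(f"Var(PAT) = {var_pat:.3f}")
print(f"V(y)     = {var_single:.3f}")
```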

Nested designs

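Crossed design: patient j is the same for all drugs.
Factor A (drug):     A, B, C, D
Factor B (patient):  1, 2, 3 (the same three patients under every drug)
Replicate:           1, 2 for each drug-patient combination
Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$

Nested design: patient j is not the same for all drugs. Patients are said to be nested within drugs.
Factor A (drug):     A, B, C, D
Factor B (patient):  1, 2, 3 (three different patients under each drug)
Replicate:           1, 2 for each patient
Model: $y_{ijk} = \mu + \alpha_i + \beta_{(i)j} + \varepsilon_{(ij)k}$

Replicates can also be regarded as nested within drugs and patients.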

Rules for finding the EMS (after Dunn and Clark)
(1) For each effect, write down every possible variance component containing every letter of the effect name. For example, in a two-way design with r replicates per cell, the EMS for factor A includes $\sigma_a^2$, $\sigma_{ab}^2$ and $\sigma_{(ab)e}^2$, but not $\sigma_b^2$.
(2) For any nested factor, add in parentheses to the effect name the name(s) of the factor(s) within which it is nested, e.g. if B is nested in A, $\sigma_{(a)b}^2$ is the variance of $\beta_{(i)j}$.
(3) For the coefficient of each variance component, use all letters not in the subscripts of the variance component.
(4) For each variance component, look at any subscripts outside parentheses that are not in the effect name; if any of these letters corresponds to a fixed effect, omit that variance component.
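The rules above are mechanical enough to be written as a short function. The sketch below is one possible reading of them for balanced designs, not code from the lecture: each term is described by the letters outside and inside its parentheses, the replicate letter `e` carries the number of replicates r, and the function name and data layout are assumptions made for illustration. For a fixed effect the component carrying its own name stands for the fixed-effect term that SAS prints as Q(...), not for a variance.

```python
from math import prod


def expected_mean_squares(terms, levels, fixed):
    """Apply rules (1), (3) and (4) to every effect of a balanced design.

    terms  : dict mapping a term label to (live, nested), the subscript
             letters outside and inside the parentheses (rule 2)
    levels : dict mapping each letter to its number of levels
    fixed  : set of letters that denote fixed factors
    Returns a dict: effect label -> list of (coefficient, component label).
    """
    ems = {}
    for effect, (e_live, e_nested) in terms.items():
        effect_letters = set(e_live) | set(e_nested)
        kept = []
        for comp, (c_live, c_nested) in terms.items():
            comp_letters = set(c_live) | set(c_nested)
            # Rule 1: the component must contain every letter of the effect
            if not effect_letters <= comp_letters:
                continue
            # Rule 4: drop the component if a letter outside its parentheses
            # that is not in the effect name belongs to a fixed factor
            if (set(c_live) - effect_letters) & fixed:
                continue
            # Rule 3: coefficient = product of the levels of all letters
            # that do not appear in the component's subscripts
            coef = prod(n for letter, n in levels.items()
                        if letter not in comp_letters)
            kept.append((coef, comp))
        ems[effect] = kept
    return ems


# Two-way crossed design, A fixed and B random, a = 4, b = 3, r = 2
crossed = {"a": ("a", ""), "b": ("b", ""), "ab": ("ab", ""), "(ab)e": ("e", "ab")}
mixed = expected_mean_squares(crossed, {"a": 4, "b": 3, "e": 2}, fixed={"a"})

# Nested design of the nitrogen example: treatment fixed, a = 3, b = 2, c = 3, r = 2
nested_terms = {"a": ("a", ""), "(a)b": ("b", "a"),
                "(ab)c": ("c", "ab"), "(abc)e": ("e", "abc")}
nested = expected_mean_squares(nested_terms, {"a": 3, "b": 2, "c": 3, "e": 2},
                               fixed={"a"})

for name, result in (("crossed, A fixed, B random", mixed), ("nested", nested)):
    print(name)
    for effect, parts in result.items():
        print(f"  {effect}: " + " + ".join(f"{c}*[{t}]" for c, t in parts))
```

For the nested design the coefficients come out as 12, 6, 2, 1 for the treatment row, which agrees with the table of coefficients printed by PROC NESTED at the end of this lesson.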

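Two-way anova (A and B fixed)
Design: factor A (drug) with levels A, B, C, D; factor B (patient) with levels 1, 2, 3 under each drug; replicates 1, 2 for each drug-patient combination.
Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
• $(\alpha\beta)_{ij}$: interaction between drug and patient
• $\varepsilon_{(ij)k}$: residual of the kth replicate nested within drug i and patient j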

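Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
(1) For each effect, write down every possible variance component containing every letter of the effect name. For example, in a two-way design with r replicates per cell, the EMS for factor A includes $\sigma_a^2$, $\sigma_{ab}^2$ and $\sigma_{(ab)e}^2$, but not $\sigma_b^2$.
Factor A: $\sigma_a^2 + \sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor B: $\sigma_b^2 + \sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor AB: $\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual: $\sigma_{(ab)e}^2$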

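Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
(2) For any nested factor, add in parentheses to the effect name the name(s) of the factor(s) within which it is nested, e.g. if B is nested in A, $\sigma_{(a)b}^2$ is the variance of $\beta_{(i)j}$.
Factor A: $\sigma_a^2 + \sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor B: $\sigma_b^2 + \sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor AB: $\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual: $\sigma_{(ab)e}^2$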

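Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
(3) For the coefficient of each variance component, use all letters not in the subscripts of the variance component.
Factor A: $br\sigma_a^2 + r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor B: $ar\sigma_b^2 + r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor AB: $r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual: $\sigma_{(ab)e}^2$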

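Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
(4) For each variance component, look at any subscripts outside parentheses that are not in the effect name; if any of these letters corresponds to a fixed effect, omit that variance component.
Factor A: $br\sigma_a^2 + r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor B: $ar\sigma_b^2 + r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Factor AB: $r\sigma_{ab}^2 + \sigma_{(ab)e}^2$
Residual: $\sigma_{(ab)e}^2$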

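Two-way anova (A and B fixed)
Design: factor A (drug) with levels A, B, C, D; factor B (patient) with levels 1, 2, 3 under each drug; replicates 1, 2 for each drug-patient combination.
Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$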

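Two-way anova (A fixed, B random)
Design: factor A (drug) with levels A, B, C, D; factor B (patient) with levels 1, 2, 3 under each drug; replicates 1, 2 for each drug-patient combination.
Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
$\beta_j$ is ND(0, $\sigma_b^2$)
NB! $(\alpha\beta)_{ij}$ is ND(0, $\sigma_{ab}^2(1 - 1/a)$)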

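Two-way anova (A and B random)
Design: factor A with levels A, B, C, D; factor B with levels 1, 2, 3 under each level of A; replicates 1, 2 for each combination.
Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{(ij)k}$
$\alpha_i$ is ND(0, $\sigma_a^2$)
$\beta_j$ is ND(0, $\sigma_b^2$)
$(\alpha\beta)_{ij}$ is ND(0, $\sigma_{ab}^2$)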

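Nested anova (A fixed, B random)
Design: factor A (drug) with levels A, B, C, D; factor B (patient) with three different patients 1, 2, 3 nested within each drug; replicates 1, 2 for each patient.
Model: $y_{ijk} = \mu + \alpha_i + \beta_{(i)j} + \varepsilon_{(ij)k}$
$\beta_{(i)j}$ is ND(0, $\sigma_{(a)b}^2$)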

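Nested anova (A and B random)
Design: factor A (doctor) with levels A, B, C, D; factor B (patient) with three different patients 1, 2, 3 nested within each doctor; replicates 1, 2 for each patient.
Model: $y_{ijk} = \mu + \alpha_i + \beta_{(i)j} + \varepsilon_{(ij)k}$
$\alpha_i$ is ND(0, $\sigma_a^2$)
$\beta_{(i)j}$ is ND(0, $\sigma_{(a)b}^2$)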

Four-level nested anova
Treatment (a = 3): 40%, 20%, 0%
Tree (b = 2): trees 1, 2 within each treatment
Leaf (c = 3): leaves 1, 2, 3 within each tree
Replicate (r = 2): replicates 1, 2 within each leaf
Model: $y_{ijkl} = \mu + \alpha_i + \beta_{(i)j} + \gamma_{(ij)k} + \varepsilon_{(ijk)l}$
$\beta_{(i)j}$ is ND(0, $\sigma_{(a)b}^2$)
$\gamma_{(ij)k}$ is ND(0, $\sigma_{(ab)c}^2$)

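Setting each mean square equal to its expectation, with $s^2$ denoting estimates of the variance components:

$$MS_{(ab)c} = r s_{(ab)c}^2 + s^2 \rightarrow s_{(ab)c}^2 = \frac{MS_{(ab)c} - s^2}{r}$$

$$MS_{(a)b} = cr s_{(a)b}^2 + r s_{(ab)c}^2 + s^2 = cr s_{(a)b}^2 + MS_{(ab)c} \rightarrow s_{(a)b}^2 = \frac{MS_{(a)b} - MS_{(ab)c}}{cr}$$

$$MS_a = bcr s_a^2 + cr s_{(a)b}^2 + r s_{(ab)c}^2 + s^2 = bcr s_a^2 + MS_{(a)b} \rightarrow s_a^2 = \frac{MS_a - MS_{(a)b}}{bcr}$$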

How to do it with SAS

DATA nested;   /* Nested anova (eks 6-4 in the lecture notes) */
  INFILE 'H:\lin-mod\eks6x.prn' FIRSTOBS=2;
  INPUT treat $ tree $ leaf $ disc $ Nitro;
PROC GLM;
  CLASS treat tree leaf disc;
  MODEL Nitro = treat tree(treat) leaf(tree treat);
  /* treatment is a fixed factor, while trees and leaves are random */
  RANDOM tree(treat) leaf(tree treat);   /* gives the expected mean squares */
RUN;

General Linear Models Procedure
Dependent Variable: NITRO

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              17      134.04000000     7.88470588       8.00    0.0001
Error              18       17.75000000     0.98611111
Corrected Total    35      151.79000000

R-Square        C.V.      Root MSE      NITRO Mean
0.883062    3.271932    0.99303127     30.35000000

Source              DF       Type I SS    Mean Square    F Value    Pr > F
TREAT                2     71.78000000    35.89000000      36.40    0.0001
TREE(TREAT)          3     36.04666667    12.01555556      12.18    0.0001
LEAF(TREAT*TREE)    12     26.21333333     2.18444444       2.22    0.0618

Source              DF     Type III SS    Mean Square    F Value    Pr > F
TREAT                2     71.78000000    35.89000000      36.40    0.0001
TREE(TREAT)          3     36.04666667    12.01555556      12.18    0.0001
LEAF(TREAT*TREE)    12     26.21333333     2.18444444       2.22    0.0618

NB! These F values are all based on MSe as the error term, which is wrong!


General Linear Models Procedure

Source              Type III Expected Mean Square
TREAT               Var(Error) + 2 Var(LEAF(TREAT*TREE)) + 6 Var(TREE(TREAT)) + Q(TREAT)
TREE(TREAT)         Var(Error) + 2 Var(LEAF(TREAT*TREE)) + 6 Var(TREE(TREAT))
LEAF(TREAT*TREE)    Var(Error) + 2 Var(LEAF(TREAT*TREE))
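The expected mean squares above can be combined with the observed mean squares from the PROC GLM output to estimate the variance components for leaves and trees. This is a small arithmetic sketch using the printed values; the variable names are illustrative, and the coefficients 2 and 6 are read off the expected mean squares.

```python
# Mean squares from the PROC GLM output of the nested example
ms_error = 0.98611111    # estimates Var(Error)
ms_leaf = 2.18444444     # estimates Var(Error) + 2 Var(LEAF)
ms_tree = 12.01555556    # estimates Var(Error) + 2 Var(LEAF) + 6 Var(TREE)

# Each component is the difference between two successive mean squares
# divided by the coefficient of that component
var_leaf = (ms_leaf - ms_error) / 2
var_tree = (ms_tree - ms_leaf) / 6

print(f"Var(Error)             = {ms_error:.4f}")
print(f"Var(LEAF(TREAT*TREE))  = {var_leaf:.4f}")
print(f"Var(TREE(TREAT))       = {var_tree:.4f}")
```

Treatment is a fixed factor, so its row contains Q(TREAT) and not a variance component; no estimate is computed for it here.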

PROC GLM;
  CLASS treat tree leaf disc;
  MODEL Nitro = treat tree(treat) leaf(tree treat);
  /* treatment is a fixed factor, while trees and leaves are random */
  RANDOM tree(treat) leaf(tree treat);   /* gives the expected mean squares */
  /* tests for differences between treatments with MS for tree(treat) as denominator */
  TEST H=treat E=tree(treat);
  /* tests for differences between trees with MS for leaf(tree treat) as denominator */
  TEST H=tree(treat) E=leaf(tree treat);
RUN;

General Linear Models Procedure
Dependent Variable: NITRO

Tests of Hypotheses using the Type III MS for TREE(TREAT) as an error term

Source         DF    Type III SS    Mean Square    F Value    Pr > F
TREAT           2    71.78000000    35.89000000       2.99    0.1933

Tests of Hypotheses using the Type III MS for LEAF(TREAT*TREE) as an error term

Source         DF    Type III SS    Mean Square    F Value    Pr > F
TREE(TREAT)     3    36.04666667    12.01555556       5.50    0.0130
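The F values in these tests are ratios of mean squares already printed in the ANOVA table; only the denominator changes. The short sketch below recomputes them, together with the F value for TREAT that results from using MSe as the error term; the variable names are illustrative.

```python
# Mean squares from the PROC GLM output of the nested example
ms_treat = 35.89
ms_tree = 12.01555556
ms_leaf = 2.18444444
ms_error = 0.98611111

# Each effect is tested against the mean square of the level nested below it
f_treat = ms_treat / ms_tree      # TREAT against TREE(TREAT)
f_tree = ms_tree / ms_leaf        # TREE(TREAT) against LEAF(TREAT*TREE)

# For comparison: the F value obtained when MSe is used as the error term
f_treat_vs_error = ms_treat / ms_error

print(f"F(TREAT)       = {f_treat:.2f}  (MSe as denominator gives {f_treat_vs_error:.2f})")
print(f"F(TREE(TREAT)) = {f_tree:.2f}")
```

With the correct denominator the treatment F value drops from 36.40 to 2.99, which is why the treatment effect is no longer significant.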

PROC GLM;
  CLASS treat tree leaf disc;
  MODEL Nitro = treat tree(treat) leaf(tree treat);
  /* treatment is a fixed factor, while trees and leaves are random */
  RANDOM tree(treat) leaf(tree treat);   /* gives the expected mean squares */
  /* tests for differences between treatments with MS for tree(treat) as denominator */
  TEST H=treat E=tree(treat);
  /* tests for differences between trees with MS for leaf(tree treat) as denominator */
  TEST H=tree(treat) E=leaf(tree treat);
  /* finds possible significant differences between all pairs of treatments (Tukey)
     and between each treatment and the control (Dunnett) */
  MEANS treat / TUKEY DUNNETT('Control') E=tree(treat) CLDIFF;
RUN;

Tukey's Studentized Range (HSD) Test for variable: NITRO
NOTE: This test controls the type I experimentwise error rate.
Alpha= 0.05  Confidence= 0.95  df= 3  MSE= 12.01556
Critical Value of Studentized Range= 5.910
Minimum Significant Difference= 5.9134
Comparisons significant at the 0.05 level are indicated by '***'.

                    Simultaneous                   Simultaneous
                    Lower          Difference      Upper
TREAT               Confidence     Between         Confidence
Comparison          Limit          Means           Limit
20% - 40%           -3.663          2.250           8.163
20% - Control       -2.513          3.400           9.313
40% - 20%           -8.163         -2.250           3.663
40% - Control       -4.763          1.150           7.063
Control - 20%       -9.313         -3.400           2.513
Control - 40%       -7.063         -1.150           4.763

Dunnett's T tests for variable: NITRO
NOTE: This tests controls the type I experimentwise error for comparisons of all treatments against a control.
Alpha= 0.05  Confidence= 0.95  df= 3  MSE= 12.01556
Critical Value of Dunnett's T= 3.866
Minimum Significant Difference= 5.4714
Comparisons significant at the 0.05 level are indicated by '***'.

                    Simultaneous                   Simultaneous
                    Lower          Difference      Upper
TREAT               Confidence     Between         Confidence
Comparison          Limit          Means           Limit
20% - Control       -2.071          3.400           8.871
40% - Control       -4.321          1.150           6.621
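The minimum significant differences in the Tukey and Dunnett outputs follow from the printed critical values, the error mean square MS for TREE(TREAT), and the number of observations per treatment. The sketch below assumes the usual equal-sample-size formulas and n = 12 observations per treatment (36 observations over 3 treatments); the critical values are copied from the output, not computed.

```python
from math import sqrt

mse = 12.01556        # MS for TREE(TREAT), used as error term
n = 12                # observations per treatment (36 / 3), assumed balanced

q_tukey = 5.910       # critical value of the studentized range (from the output)
d_dunnett = 3.866     # critical value of Dunnett's T (from the output)

# Tukey: the studentized range is scaled by the standard error of one mean
msd_tukey = q_tukey * sqrt(mse / n)

# Dunnett: the critical value is scaled by the standard error of a difference
msd_dunnett = d_dunnett * sqrt(2 * mse / n)

print(f"Tukey MSD   = {msd_tukey:.3f}")
print(f"Dunnett MSD = {msd_dunnett:.3f}")
```

Small deviations from the printed 5.9134 and 5.4714 are expected because the critical values are rounded to three decimals in the output. The confidence limits in both tables are the differences between means plus or minus the corresponding minimum significant difference.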

PROC NESTED;
  CLASS treat tree leaf;
  VAR Nitro;
RUN;

Coefficients of Expected Mean Squares

Source    TREAT    TREE    LEAF    ERROR
TREAT        12       6       2        1
TREE          0       6       2        1
LEAF          0       0       2        1
ERROR         0       0       0        1
