1 / 46

Revealing the riddle of REML

Revealing the riddle of REML. Mick O’Neill Faculty of Agriculture, Food & Natural Resources, University of Sydney. Background. Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty Biometry 3 is a Third Year elective

shae
Télécharger la présentation

Revealing the riddle of REML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Revealing the riddle of REML Mick O’Neill Faculty of Agriculture, Food & Natural Resources, University of Sydney

  2. Background • Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty • Biometry 3 is a Third Year elective • Biometry 4 is (still) a possible major • All students are now expected to design and analyse their fourth year experiments with little or no help from the Biometry Unit

  3. Third year Biometry students can: • Design and analyse multi-strata factorial experiments (split-plots, strip-plots) • Perform binomial & ordinal logistic regression, Poisson regression, … • Analyse repeated measures data using REML

  4. Step 1. What is Maximum Likelihood? • The likelihood is the prior probability of obtaining the actual data in your sample • This requires you to assume that the data follow some distribution, typically: • Binomial or Poisson for count data • Normal or LogNormal for continuous data

  5. Step 1. What is Maximum Likelihood? The likelihood is the prior probability of obtaining the actual data in your sample Each of these distributions involves at least one unknown parameter (probability, mean, standard deviation, …) which must be estimated from the data.

  6. Step 1. What is Maximum Likelihood? The likelihood is the prior probability of obtaining the actual data in your sample Estimation is achieved by finding that parameter valuewhich maximises the likelihood (or equivalently the log-likelihood)

  7. Example 1. Binomial data • Guess p = P(seed germinates) • Evaluate LogL • Maximise LogL by varying p

  8. Example 2. Normal data • Guess m and s • Evaluate LogL • Maximise LogL by varying m and s

  9. Step 2. What is REML? It is possible to partition the likelihood into two terms: • a likelihood that involves m (as well as s2) • and • a residual likelihood that involves only s2

  10. Step 2. What is REML? It is possible to partition the likelihood into two terms, in such a way that: • the first likelihood can be maximised to estimate m (and its solution does not depend on the value of s2) • the residual likelihood can be maximised to estimates2 REML estimate

  11. =  How?

  12. Solutions • ML estimate of variance is • REML estimate is • In each case the estimate of m isthe sample mean

  13. sn-1 sn ML estimate REML estimate

  14. =  Extensions

  15. Example 3. One-way (no blocking)Fixed effects

  16. ANOVA vs REML ANOVA: Source of variation d.f. s.s. m.s. v.r. F pr. Chick stratum Diet 3 0.01847 0.00616 0.10 0.958 Residual 12 0.73230 0.06103 Total 15 0.75078 REML Variance Components Analysis: Fixed model : Constant+Diet Random model : Chick Chick used as residual term *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. Chick Identity Sigma20.0610 0.02491 *** Wald tests for fixed effects *** Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob Diet 0.30 3 0.100.960

  17. ANOVA: • ***** Tables of means ***** Grand mean 3.129 Diet Diet_1 Diet_2 Diet_3 Diet_4 3.107 3.188 3.112 3.107 *** Standard errors of differences of means *** Table Diet rep. 4 d.f. 12 s.e.d. 0.1747 REML Variance Components Analysis: *** Table of predicted means for Diet *** Diet Diet_1 Diet_2 Diet_3 Diet_4 3.107 3.187 3.112 3.107 Standard error of differences: 0.1747

  18. Example 4a. One-way (in randomised blocks) – fixed treatments ANOVA: Source of variation d.f. s.s. m.s. v.r. F pr. Block stratum 5 5.410 1.082 0.29 Block.*Units* stratum Spacing 4 125.661 31.415 8.50 <.001 Residual 20 73.919 3.696 REML Variance Components Analysis (a) With Block + Spacing both fixed effects: Term Factor Model(order) Parameter Estimate S.e. Residual Identity Sigma2 3.696 1.169 Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob Block 1.46 5 0.29 0.917 Spacing 34.00 4 8.50 <0.001

  19. Random blocks, fixed treatments ANOVA: Source of variation d.f. s.s. m.s. v.r. F pr. Block stratum 5 5.410 1.082 0.29 Block.*Units* stratum Spacing 4 125.661 31.415 8.50 <.001 Residual 20 73.919 3.696 REML Variance Components Analysis (b) With Spacing fixed and Block random: *** Estimated Variance Components *** Random term Component S.e. Block 0.000 BOUND *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. Residual Identity Sigma2 3.173 0.897 *** Wald tests for fixed effects *** Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob Spacing 39.60 4 9.90 <0.001 Estimated Source Value Block -0.5228 Error 3.6959

  20. Model for RCBD Yield of soybean = Overall mean + Block effect + Spacing effect + Error • Overall mean and Spacing effects are fixedeffects • Block effect is a random term • Error is a random term

  21. General Linear Mixed Model Yield of soybean = Overall mean + Block effect + Spacing effect + Error Y = Fixed effects + Random effects + Error term Y = Xt + Zu + e • The random effects can be correlated • The error term can be correlated • The random effects are uncorrelated with the error term

  22. General Linear Mixed Model Y = Fixed effects + Random effects + Error term Y = Xt + Zu + e is a scaling factor, often set to 1

  23. REML is used as the default to estimate to variance and covariance parameters of the model • The algorithm does not depend on the data being balanced

  24. Furthermore, different choices for the variance matrices allow for : • an appropriate repeated measures analysis for normal data • an appropriate spatial analysis for field trials Nested models can be compared using the change in deviance which is approximately c2 with df = change in number of parameters

  25. Example 5. Adjusting thesis marks for random markers

  26. For students: For markers:

  27. Example 6. Use of devianceWidths (in mm) of the dorsal shield of larvae of ticks found on 4 rabbits

  28. Minitab’s analysis Source DF Adj MS F P Rabbit 3 602.6 5.26 0.004 Error 33 114.5 Estimated Source Term Source Value 1 Rabbit (2) + 9.0090 (1) Rabbit 54.18 2 Error (2) Error 114.48 Rabbit Mean 1 372.3 2 354.4 3 355.3 4 361.3 these are sample means

  29. GenStat’s Linear Mixed Models analysis Random term Component S.e. Rabbit 55.0 55.8 *** Residual variance model *** Term Model(order) Parameter Estimate S.e. Residual Identity Sigma2 114.4 28.2 Table of predicted means for Rabbit (these are BLUPs) Rabbit 1 2 3 4 369.9 355.5 356.0 361.2 Standard error of differences: Average 4.613 Maximum 5.055 Minimum 4.133 Average variance of differences: 21.38 Deviance d.f. 215.22 34

  30. Test H0: Method: drop Rabbit as a random term Deviance d.f. 221.21 35 for reduced model 215.22 34 Change in deviance = 6.0 with 1 df P-value = 0.014 The variation in the widths of the dorsal shield of larvae of ticks found among rabbits differs significantly across rabbits (P = 0.014) The variance among rabbits is estimated to be 55.0 ( 55.7) compared to the variance within rabbits, namely 114.4 ( 28.2)

  31. Example 7 - Repeated Measures Growth of a fungus (in cm) over time

  32. Growth of fungus

  33. Split plot without Greenhouse-Geisser adjustmment (assumes equi-correlation structure among times) Source of variation d.f. m.s. v.r. F pr. Rep.Fungus stratum Fungus 2 8.104 97.30 <.001 Residual 9 0.083 3.37 Rep.Fungus.Time stratum Time 5 55.231 2233.21 <.001 Fungus.Time 10 0.933 37.71 <.001 Residual 45 0.025 Estimated stratum variances Stratum variance d.f. variance component Rep.Fungus 0.0833 9 0.0098 Rep.Fungus.Time 0.0247 45 0.0247

  34. Split plot with Greenhouse-Geisser adjustmment (tests equi-correlation structure among times) Source of variation d.f. m.s. v.r. F pr. Rep.Fungus stratum Fungus 2 8.104 97.30 <.001 Residual 9 0.083 3.37 Rep.Fungus.Time stratum Time 5 55.231 2233.21 <.001 Fungus.Time 10 0.933 37.71 <.001 Residual 45 0.025 (d.f. are multiplied by the correction factors before calculating F probabilities) Box's tests for symmetry of the covariance matrix: Chi-square 57.47 on 19 df: probability 0.000 F-test 2.93 on 19, 859 df: probability 0.000 Greenhouse-Geisser epsilon 0.3206

  35. Split plot via REML – ignoring changing variances Fixed model : Constant+Fungus+Time+Fungus.Time Random model : Rep.Fungus+Rep.Fungus.Time Estimated Variance Components Random term Component S.e. Rep.Fungus 0.00976 0.00660 Residual variance model Term Model(order) Parameter Estimate S.e. Rep.Fungus.Time Identity Sigma2 0.0247 0.0052 Deviance d.f. -109.90 52 Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob Fungus 194.60 2 97.30 <0.001 Time 11166.05 5 2233.21 <0.001 Fungus.Time 377.08 10 37.71 <0.001

  36. Split plot via REML – accounting for changing variances (a) Fixed model : Constant+Fungus+Time+Fungus.Time Random model : Rep.Fungus+Rep.Fungus.Time Estimated Variance Components Random term Component S.e. Rep.Fungus 0.01053 0.00539 Residual variance model Term Model(order) Parameter Estimate S.e. Rep.Fungus.Time Identity Sigma2 0.0082 .00428 Rep Identity - - - Fungus Identity - - - Time Diagonal d_1 1.000 FIXED d_2 1.102 0.815 d_3 0.227 0.215 d_4 0.262 0.253 d_5 1.965 1.443 d_6 13.550 9.580

  37. Split plot via REML – accounting for changing variances (b) Fixed model : Constant+Fungus+Time+Fungus.Time Random model : Rep.Fungus+Rep.Fungus.Time Estimated Variance Components Random term Component S.e. Rep.Fungus 0.01053 0.00539 Residual variance model Term Model(order) Parameter Estimate S.e. Rep.Fungus.Time Identity Sigma2 1.000 FIXED Rep Identity - - - Fungus Identity - - - Time Diagonal d_1 0.0082 0.0043 d_2 0.0091 0.0047 d_3 0.0019 0.0015 d_4 0.0022 0.0017 d_5 0.0162 0.0081 d_6 0.1116 0.0530

  38. Split plot via REML – accounting for changing variances and an AR(1) correlation structure Fixed model : Constant+Fungus+Time+Fungus.Time Random model : Rep.Fungus+Rep.Fungus.Time Estimated Variance Components Random term Component S.e. Rep.Fungus 0.010616 0.005572 Residual variance model Term Model(order) Parameter Estimate S.e. Rep.Fungus.Time Identity Sigma2 0.0085 0.0045 Rep Identity - - - Fungus Identity - - - Time AR(1) hetphi_10.148 0.209 d_1 1.000 FIXED d_2 1.202 0.895 d_3 0.260 0.249 d_4 0.264 0.266 d_5 1.829 1.347 d_6 13.560 9.620

  39. Split plot via REML – accounting for changing variances and an AR(1) correlation structure Deviance d.f. Change d.f. Same variance, uncorrelated -109.90 52 Different variances over time -151.03 47 41.13 5 + AR(1) correlation structure -151.59 46 0.56 1

  40. Example 8 – Spatial analysis RCBD (fixed) fertilisers Potato yields (t/ha)

  41. Source of variation d.f. m.s. v.r. F pr. Block stratum 3 2.929 1.43 Block.Treatment stratum Treatment 5 92.359 45.07 <.001 Residual 15 2.049 Treatment A B C D E F 17.55 25.60 27.67 25.94 30.51 30.63 *** Standard errors of differences of means *** Table Treatment rep. 4 d.f. 15 s.e.d. 1.012

  42. Contour plot of residuals

  43. REML Random term Component S.e. Block 0.395 0.500 Residual variance model Term Factor Model Parameter Estimate S.e. Y.X Sigma2 2.849 1.739 Y AR(1) phi_10.7054 0.2078 X AR(1) phi_1 -0.2508 0.3397 Deviance d.f. 36.54 14 Treatment A B C D E F 17.74 26.29 26.79 26.34 30.41 29.37 Standard error of differences: Average 0.7749 Maximum 0.8942 Minimum 0.6465 Average variance of differences: 0.6050

More Related