Lecture 2 Estimation and Inference for the marginal model



  1. Lecture 2: Estimation and Inference for the marginal model. Ziad Taib, Biostatistics, AZ, April 2011

  2. Outline of lecture 2
• A reminder
• Estimation for the marginal model: ML and REML estimation
• Inference for the mean structure
• Inference for the variance components
• Fitting linear mixed models in SAS

  3. 1. A reminder: the 2-stage model formulation

  4. Stage 1
• Response Yij for the ith subject, measured at time tij, i = 1, . . . , N, j = 1, . . . , ni (possibly after some convenient transformation)
• Response vector Yi for the ith subject: Yi = Zi βi + εi
• Zi is an (ni × q) matrix of known covariates and βi is a q-dimensional vector of subject-specific regression parameters
• Note that the above model describes the observed variability within subjects

  5. Stage 2
• Between-subject variability can now be studied by relating the parameters βi to known covariates: βi = Ki β + bi
• Ki is a (q × p) matrix of known covariates and β is a p-dimensional vector of unknown regression parameters
• Finally, the subject-specific random effects bi are assumed to be N(0, D)

  6. The General Linear Mixed-effects Model
• The two stages of the 2-stage approach can now be combined into one model (sketched below), in which Xiβ describes the average evolution and Zibi the subject-specific deviations from it.
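
Written out (a sketch of the standard formulation, using the notation of the two previous slides and setting Xi = Zi Ki), the combined model is

\[
Y_i = X_i\beta + Z_i b_i + \varepsilon_i,
\qquad
b_i \sim N(0, D),
\qquad
\varepsilon_i \sim N(0, \Sigma_i),
\]

with b_1, ..., b_N and ε_1, ..., ε_N independent; X_iβ is the average evolution and Z_i b_i the subject-specific part.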

  7. The hierarchical versus the marginal model. The general mixed model is given by Yi = Xiβ + Zibi + εi. It can be written as a two-level specification, Yi | bi ~ N(Xiβ + Zibi, Σi) with bi ~ N(0, D); it is therefore also called a hierarchical model.

  8. Marginally, we have that Yi is distributed as Yi ~ N(Xiβ, Zi D Zi' + Σi). Hence the marginal density is obtained by integrating the random effects out of f(yi | bi) f(bi): f(yi) = ∫ f(yi | bi) f(bi) dbi.

  9. Example

  10. The prostate data: a model for prostate cancer (Stage 1)

  11. The prostate data: a model for prostate cancer (Stage 2). The diagnostic groups could not be matched on age, so age is included as a covariate. Ci, Bi, Li, Mi are indicators of the classes: control, BPH, local cancer or metastatic cancer. Agei is the subject’s age at diagnosis. The parameters in the first row are the average intercepts for the different classes.

  12. The prostate data. This gives the following model, with error terms eij (a reconstruction is sketched below).
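
The model itself appeared as a formula on the original slide. The following is only a reconstruction of its usual form for these data (class-specific intercepts adjusted for age, class-specific linear and quadratic time trends, and a random intercept, slope and curvature per subject); the exact terms should be checked against the original slide:

\[
\ln(1 + \mathrm{PSA}_{ij}) = \beta_1 \mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_6 C_i + \beta_7 B_i + \beta_8 L_i + \beta_9 M_i)\,t_{ij}
+ (\beta_{10} C_i + \beta_{11} B_i + \beta_{12} L_i + \beta_{13} M_i)\,t_{ij}^2
+ b_{1i} + b_{2i} t_{ij} + b_{3i} t_{ij}^2 + e_{ij}.
\]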

  13. 2. Estimation in the Marginal Model: ML and REML Estimation

  14. ML and REML estimates

  15. ML and REML estimates (cont’d)

  16. Estimation based on the marginal model, with marginal covariance Vi = Zi D Zi' + Σi

  18. ML estimation
• For fixed α, maximise the likelihood with respect to β; this gives the generalised least squares estimator β̂(α)
• Replace β by β̂(α) in the likelihood function (the profile likelihood)
• Maximise the resulting function with respect to α
• One can use the EM algorithm or Newton-Raphson (a sketch of the quantities involved follows below)
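
As a worked sketch of these steps (standard results rather than a transcription of the slide formulas): writing Vi(α) = Zi D Zi' + Σi for the marginal covariance matrix, the first step gives the generalised least squares estimator, and the third step maximises the resulting profile log-likelihood,

\[
\hat\beta(\alpha) = \Bigl(\sum_{i=1}^{N} X_i' V_i^{-1}(\alpha) X_i\Bigr)^{-1}\sum_{i=1}^{N} X_i' V_i^{-1}(\alpha)\, y_i ,
\]
\[
\ell_{\mathrm{ML}}(\alpha) = -\tfrac{1}{2}\sum_{i=1}^{N}\Bigl\{\log\lvert V_i(\alpha)\rvert + \bigl(y_i - X_i\hat\beta(\alpha)\bigr)' V_i^{-1}(\alpha)\bigl(y_i - X_i\hat\beta(\alpha)\bigr)\Bigr\} + \text{constant}.
\]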

  20. ML estimation

  21. REML estimation
• The restricted (or residual, or reduced) maximum likelihood (REML) approach is a particular form of maximum likelihood estimation which does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that the nuisance parameters have no effect.
• In the case of variance component estimation, the likelihood function is calculated from the probability distribution of a set of contrasts. In particular, REML is used as a method for fitting linear mixed models. In contrast to maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters.
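
Concretely (a standard construction, added here and not taken from the transcribed slide text): stack the Yi into Y and the Xi into X, let n = Σ ni and p = dim(β), and take any full-rank n × (n − p) matrix A with A'X = 0. The error contrasts U = A'Y then satisfy

\[
U = A'Y \sim N\bigl(0,\; A' V(\alpha) A\bigr),
\]

so their distribution does not involve β, and REML estimates α by maximising the likelihood of U; the resulting estimates do not depend on the particular choice of A.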

  22. Analysis of Contrast Variables. Contrast variables in repeated measures data are linear combinations of the responses over time for an individual. In longitudinal studies it is of interest to consider the set of differences between responses at consecutive time points, that is, changes from time 1 to time 2, from time 2 to time 3, and so forth. A set of contrast variables can be used to analyse trends over time and to make comparisons between times. The original repeated measures for each individual are thus transformed into a new set of variables, each defined by a contrast (see the example below).
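
For instance (an illustration assuming four measurement occasions), the consecutive-difference contrasts are obtained by premultiplying a subject's response vector by

\[
C = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix},
\qquad
C\,Y_i = \begin{pmatrix} Y_{i2}-Y_{i1} \\ Y_{i3}-Y_{i2} \\ Y_{i4}-Y_{i3} \end{pmatrix}.
\]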

  23. REML estimation
• Given an i.i.d. sample Yi, i = 1, . . . , N, with mean μ and variance σ², we can estimate the variance using the maximum likelihood estimator (with μ known)
• But since μ is unknown, we replace it by the sample mean
• Based on this we can define the restricted (REML) estimator, which corrects the resulting bias (see the formulas below)
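
The estimators referred to above appeared as formulas on the slide; the standard versions are

\[
\hat\sigma^2_{\mu\ \mathrm{known}} = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\mu)^2,
\qquad
\hat\sigma^2_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\bar Y)^2,
\qquad
\hat\sigma^2_{\mathrm{REML}} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar Y)^2 .
\]

Replacing μ by the sample mean makes the ML estimator biased downwards by the factor (N − 1)/N; REML removes this bias by, in effect, basing the likelihood on the N − 1 contrasts Yi − Ȳ.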

  24. REML

  26. 3. Inference for the mean structure

  27. Approximate Wald tests
• In the Wald test, named after Abraham Wald, the maximum likelihood estimate of the parameter(s) of interest β is compared with the proposed value β0, under the assumption that the difference between the two is approximately normally distributed. Typically the square of the difference is compared to a chi-squared distribution.
• In the univariate case, the Wald statistic is (β̂ − β0)² / var(β̂), which is compared against a chi-square distribution with one degree of freedom. Alternatively, the difference can be compared to a normal distribution; in this case the test statistic is (β̂ − β0) / s.e.(β̂).
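
For a general linear hypothesis H0: Lβ = 0 in the mixed model (the standard multivariate form, added here for completeness), the Wald statistic is

\[
W = (L\hat\beta)'\Bigl[\,L\Bigl(\sum_{i=1}^{N} X_i' \hat V_i^{-1} X_i\Bigr)^{-1} L'\,\Bigr]^{-1}(L\hat\beta),
\]

which is compared to a chi-squared distribution with rank(L) degrees of freedom.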

  28. Approximate Wald tests. Note that α is estimated, which introduces extra variability that the Wald test does not account for, and hence a bias. This is addressed by using approximate t- or F-tests instead.
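
For a single contrast, the corresponding approximate t-statistic (the standard form, not transcribed from the slide) is

\[
t = \frac{L\hat\beta}{\sqrt{L\Bigl(\sum_{i=1}^{N} X_i' \hat V_i^{-1} X_i\Bigr)^{-1} L'}},
\]

compared to a t-distribution whose degrees of freedom must be estimated from the data, for example by the Satterthwaite approximation; a multi-row L leads to an approximate F-test in the same way.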

  29. Approximate t- and F-tests

  30. Robust inference

  36. Likelihood ratio tests

  37. Likelihood ratio tests

  39. 4. Inference for the Variance Components

  40. 5. Fitting linear mixed models in SAS

  41. Statistical software

  42. Software (cont’d)
• SAS, SPSS, BMDP 5V, ML3, HLM, S-PLUS and R can all handle correlated data, but some are more restricted than others.
• Most packages offer a choice between ML and REML, and optimisation is often based on Newton-Raphson, the EM algorithm or Fisher scoring.
• The user has to specify a model for the mean response that is linear in the fixed effects, and a covariance structure. The user can select a full parameterisation of the covariance structure (unstructured) or choose among given covariance structures.
• The covariance structure is also influenced by the inclusion of random effects and their covariance structure.

  43. Software (cont’d)
• Output often includes:
  • history of the optimisation iterations
  • estimates of the fixed effects
  • estimates of the covariance parameters with standard errors
  • estimates of user-specified contrasts
• Graphics facilities are often limited, but plots can be produced in other software.

  44. SAS PROC MIXED and Repeated Measures
• PROC MIXED of SAS offers greater flexibility for the modelling of repeated measures data than PROC GLM. Firstly, the procedure provides a mechanism for modelling the covariance structure associated with the repeated measures. Secondly, it can handle some forms of missing data without discarding an entire subject’s worth of data. Thirdly, it has some capability to handle the situation where subjects are measured at different times and at different time intervals.
• In PROC GLM, repeated measures are handled in a multivariate framework, which requires a multivariate (wide) view of the data. PROC MIXED, on the other hand, requires a univariate or stacked-data view: there is only a single response variable, and the repeated-measures information, including all of the information about the subjects, is contained in other variables. The repeated measures analysis in PROC GLM also assumes that the covariance matrix satisfies a sphericity condition (compound symmetry being a sufficient special case).

  45. SAS PROC MIXED
• PROC MIXED was designed to handle mixed models. It has a large choice of covariance structures (unstructured, random effects, autoregressive, Diggle, etc.).
• PROC MIXED can be used not only to estimate the fixed parameters, but also the covariance parameters.
• By default, PROC MIXED estimates the covariance parameters using the method of restricted maximum likelihood (REML).
• PROC MIXED provides empirical Bayes estimates of the random effects.
• Separate analyses for separate groups can be run using the BY statement.
• Approximate F tests for class variables are obtained using Wald’s test.
• All components of the output can be saved as a SAS data set for further manipulation using other internal (SAS) or external procedures.

  46. PROC MIXED: Syntax

PROC MIXED < options > ;
   BY variables ;
   CLASS variables ;
   ID variables ;
   MODEL dependent = < fixed-effects > < / options > ;
   RANDOM random-effects < / options > ;
   REPEATED < repeated-effect > < / options > ;
   PARMS (value-list) ... < / options > ;
   PRIOR < distribution > < / options > ;
   CONTRAST 'label' < fixed-effect values ... > < | random-effect values ... > , ... < / options > ;
   ESTIMATE 'label' < fixed-effect values ... > < | random-effect values ... > < / options > ;
   LSMEANS fixed-effects < / options > ;
   MAKE 'table' OUT=SAS-data-set ;
   WEIGHT variable ;

  47. In PROC MIXED, the mixed model is specified by means of a number of statements such as CLASS, MODEL, RANDOM and REPEATED (a minimal example follows below).
• The CLASS statement identifies the classification variables (for example, gender, person, age, etc.).
• The MODEL statement specifies the model’s fixed-effects equation, Xiβ. Thus, the design matrix Xi is defined; the model’s intercept is included by default.
• The RANDOM statement is used to specify the random effects and the form of the covariance matrix D. (Useful option: SOLUTION, which prints the random-effects solution.)
• The REPEATED statement models the intra-individual variation and specifies the structure of Si = Cov(ei), where Si is the block of the block-diagonal error covariance matrix belonging to subject i. (If the REPEATED statement is not included, it is assumed that Si = σ²I.)
• The LSMEANS statement calculates least squares mean estimates of specified fixed effects.
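
As an illustration of these statements (a minimal sketch with a hypothetical data set and variable names, not code from the lecture): a random-intercept-and-slope model for a response y measured repeatedly over time within id, with a treatment factor group, could be specified as

proc mixed data=mydata method=reml;
   class id group;                                 /* classification variables */
   model y = group time group*time / solution;     /* fixed effects Xi*beta     */
   random intercept time / type=un subject=id g solution;  /* unstructured D   */
run;

Here TYPE=UN requests an unstructured covariance matrix D for the random intercept and slope, and SUBJECT=ID tells PROC MIXED which records belong to the same individual; because no REPEATED statement is given, Si = σ²I is assumed.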

  48. Modelling the Covariance Structure Using the RANDOM and REPEATED Statements in PROC MIXED. Measures on different individuals are independent, so covariance needs attention only for measures on the same individual. The covariance structure refers to variances at individual times and to correlation between measures at different times on the same individual. There are basically two aspects of the correlation.
• First, two measures on the same individual are correlated simply because they share common contributions from that individual. This is due to variation between individuals.
• Second, measures on the same individual close in time are often more highly correlated than measures far apart in time. This is covariation within individuals.
Usually, when using PROC MIXED, the variation between individuals is specified by the RANDOM statement, and covariation within individuals is specified by the REPEATED statement (see the sketch below).
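
For example (again a sketch with hypothetical variable names), between-individual variation can be captured by a random intercept while serial, within-individual correlation is modelled with a first-order autoregressive REPEATED structure:

proc mixed data=mydata method=reml;
   class id group time;
   model y = group time group*time / solution;
   random intercept / subject=id;            /* between-individual variation        */
   repeated time / type=ar(1) subject=id;    /* within-individual serial correlation */
run;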

  49. PROC MIXED fits many different covariance structures (the slide listed a number of TYPE= options, such as VC, CS, UN, AR(1) and TOEP). Note also that a particular structure may be fitted using more than one TYPE= designation, and with combinations of the RANDOM and REPEATED statements (see the example below).
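
As an illustration of the last point (a sketch; this equivalence is a standard PROC MIXED result, not taken from the slide), a compound-symmetry structure can be requested either through a random intercept or directly with TYPE=CS on the REPEATED statement, and the two specifications imply the same marginal covariance whenever the intercept variance is non-negative:

/* compound symmetry via a random intercept */
proc mixed data=mydata;
   class id time;
   model y = time;
   random intercept / subject=id;
run;

/* the same marginal structure via the REPEATED statement */
proc mixed data=mydata;
   class id time;
   model y = time;
   repeated time / type=cs subject=id;
run;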

  50. Data structure of PROC MIXED
• Consider the example where arm strength is measured on 8 patients at 3 different times and where patients have been randomised to one of 2 treatment groups. The multivariate (wide) view of the data associated with PROC GLM was shown on the slide; a sketch of both layouts follows below.
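
A sketch of the two layouts (hypothetical variable names and values; only two of the eight patients are shown, and the actual data appeared as a table on the slide): the wide view used by PROC GLM has one record per patient, while the stacked view used by PROC MIXED has one record per measurement:

/* wide (multivariate) view, one record per patient: PROC GLM style */
data wide;
   input patient trt strength1 strength2 strength3;
   datalines;
1 1 85 86 88
2 2 80 82 81
;
run;

/* stacked (univariate) view, one record per measurement: PROC MIXED style */
data long;
   set wide;
   array s{3} strength1-strength3;
   do time = 1 to 3;
      strength = s{time};
      output;
   end;
   keep patient trt time strength;
run;

proc mixed data=long;
   class patient trt time;
   model strength = trt time trt*time;
   repeated time / type=un subject=patient;
run;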
