Multilevel Data in Outcomes Research

Multilevel Data in Outcomes Research • Types of multilevel data common in outcomes research • Random versus fixed effects • Statistical Model Choices • “Shrinkage Estimates” versus Fixed Effects • Example of CA State CABG data

What are multilevel data? • Gathering individual observations into larger groups does not create clustered data • Individual observations from a simple, random sample are never multilevel • Multilevels are a result of sampling/design • Usually from stages/levels in obtaining the individual units of observation • Repeated measures is a type of multilevel data

Other Names for Multilevel Data • Hierarchical models • Clustered data (but different from cluster analysis) • Components of Variance models • Contextual Models • Micro and macro level data

Multilevel Data in Outcomes Research • Two levels: • Hospitals and patients • Physicians and patients • Three levels: • Hospitals, physicians, and patients • Physicians, patients, and repeated measures • Four levels: • National Health Interview Survey

National Health Interview Survey • Highest level: Select Primary Sampling Units (MSA’s, counties, groups of counties) • Next level: Stratify PSU’s by Census blocks and select Secondary Sampling Units (clusters of households) • Next level: Select Households within SSU’s • Lowest level: Interview individuals in the households (some all, others a sample)

Characteristics of Multilevel Data • Measurements within a level are correlated (eg, measures on the same person are more alike than measurements across persons) • Variables can be measured at each level • Standard statistical models and tests, which assume independent observations, are incorrect • The variance of the outcome can be attributed to each level

Two Parts of Multilevel Data Variance • Outcome = Patient Satisfaction Score • Level 2: Physicians: MD1: mean=81, MD2: mean=58, MD3: mean=74 • Level 1: Patients: 55 61 68 74 75 79 81 85 77 • Variance in the patient score divides into two parts: (1) the variance between physicians = $\sigma^2_B$ (2) the variance within the physicians = $\sigma^2_W$ • So the total variance = $\sigma^2_B + \sigma^2_W$

Intraclass Correlation Coefficient (ICC) • The intraclass correlation coefficient (ICC) is a measure of the correlation among the individual observations within the clusters • It is calculated as the ratio of the between-cluster variance to the total variance: $\sigma^2_B / (\sigma^2_B + \sigma^2_W)$

Intraclass Correlation Coefficient (ICC) • MD1: mean=81, MD2: mean=58, MD3: mean=74 • Patient scores: 58 58 74 74 74 74 81 81 81 • Take the extreme case where each MD’s patients have the same score = no variance within the physicians. • So, ICC = $\sigma^2_B / (\sigma^2_B + \sigma^2_W) = \sigma^2_B / (\sigma^2_B + 0) = 1$ = perfect correlation within the clusters.

Intraclass Correlation Coefficient (ICC) MD1: mean=71 MD2: mean=68 MD3: mean=74 58 78 54 94 84 64 81 61 71 A different case where each MD’s patients have very different scores = most of the variance is within the physicians (ie, between patients, not physicians). ICC is close to 0.
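The two cases above can be checked numerically. What follows is a minimal Python sketch, assuming a balanced design (the same number of patients per MD) and the one-way ANOVA estimator of the variance components, which the slides do not specify. The function name `icc_oneway` and the individual patient scores are hypothetical; the scores were chosen only so that the MD means match the ones shown on the slides.

```python
def icc_oneway(groups):
    """Estimate the ICC for balanced groups from one-way ANOVA variance components."""
    k = len(groups)          # number of clusters (MDs)
    n = len(groups[0])       # observations (patients) per cluster
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = sum(sum(g) for g in groups) / (k * n)
    group_means = [sum(g) / n for g in groups]

    # Mean squares between and within clusters.
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # Variance components; a negative between-cluster estimate is truncated to 0.
    var_within = ms_within
    var_between = max((ms_between - ms_within) / n, 0.0)

    total = var_between + var_within
    if total == 0:
        raise ValueError("no variation in the data")
    return var_between / total


# Hypothetical scores: every patient of an MD gives the same score (means 81, 58, 74).
identical_within_md = [[81, 81, 81], [58, 58, 58], [74, 74, 74]]

# Hypothetical scores: wide spread within each MD (means 71, 68, 74).
spread_within_md = [[51, 71, 91], [48, 68, 88], [54, 74, 94]]

print(icc_oneway(identical_within_md))  # all variance is between MDs
print(icc_oneway(spread_within_md))     # nearly all variance is within MDs
```

In the second data set the MD means differ by less than the within-MD noise alone would predict, so the between-MD variance estimate is negative and is truncated to 0.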

Implications of ICC for Analysis • When the ICC is close to 0, most of the variation is explained by patient level measures • Less difference between results from ordinary regression and multilevel models • May be less important to use a statistical model that allows variables for physician characteristics

Implications of ICC for Analysis • When the ICC is close to 1, most of the variation is explained by physician level measures • Using a statistical model that removes physician effects leaves little variation to explain • Important to use a statistical model that allows variables for physician characteristics

Methods of Analyzing Multilevel Data • Regression model ignoring higher level variables • Regression model with an indicator variable for each level 2 unit (minus one) • Conditional regression model • Regression model with generalized estimating equations (GEE model) • Random or mixed effects regression model

Choice of Analysis Model: Three Main Considerations • What is the research question? • How many observations are there at each level of the data? • How important is controlling unmeasured confounding at the higher level?

Fixed versus Random Effects • Effects are random when the units are a sample of a larger population • have variation because sampled; another sample would give different data • Effects are fixed if they represent all possible members of a population: • eg, male/female; treatment groups; all the regions of the U.S.

Fixed versus Random Effects • Effects treated as fixed or random depending on the research question • Random effects: generalize from the sample to a larger population • Random effects: reduce variation due to small sample size by fitting a distribution • Fixed effects: Control for unmeasured confounding at the higher level
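The last point, that fixed effects control for unmeasured confounding at the higher level, can be illustrated with a small sketch. It assumes a linear outcome, a single patient-level predictor, and made-up, noise-free data in which each hospital has an unmeasured effect that rises with its average predictor value. Centering each variable on its hospital mean gives the same slope as a linear regression with an indicator variable for each hospital. All names and numbers here are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical data: three hospitals, three patients each.
hospital = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Unmeasured hospital-level effect, correlated with the hospital's average x.
hospital_effect = {"A": 0.0, "B": 10.0, "C": 20.0}

# True patient-level slope is 2.
y = [2.0 * xi + hospital_effect[h] for xi, h in zip(x, hospital)]

# Ordinary regression ignoring hospitals absorbs the hospital effect into the slope.
pooled_slope = ols_slope(x, y)

# Fixed effects (within-hospital) regression removes it.
within_slope = ols_slope(center_within(x, hospital), center_within(y, hospital))

print(pooled_slope, within_slope)
```

The pooled slope is inflated by the hospital-level effect, while the within-hospital slope recovers the patient-level relationship. The cost is that no hospital-level variable can be estimated once the hospital effects are removed.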

Methods of Analyzing Multilevel Data Fixed Effects Models: - Regression model with an indicator variable for each level 2 unit (minus one) - Conditional regression model Random Effects Models: - Regression model with generalized estimating equations (GEE) - Random or mixed effects regression model

What are “shrinkage estimates”? • Also called Bayesian or empiric Bayesian estimates (Iezzoni text) or Best linear unbiased prediction estimates (SAS) • Can only be obtained from a random effects (not GEE) regression model • Variance of the higher level variable is modeled as if from a specified distribution (usually normal, but others are possible)

A Simple Random Effects Model • A simple random effects model is: $y_{ij} = \mu + \alpha_j + e_{ij}$, where $\mu$ = overall mean, $\alpha_j$ = difference for MD $j$, and $e_{ij}$ = individual error • Model says there is random variation from the mean score at the level of MD’s plus variation at the level of patients • Bayesian estimates are the individual $\alpha_j$’s obtained from the overall distribution
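To make the model concrete, the sketch below simulates patient scores as an overall mean plus an MD-level difference plus an individual error, with both random terms drawn from normal distributions. The function name, the number of MDs and patients, and the standard deviations (6 between MDs, 10 within MDs) are assumptions chosen for illustration, not values from any study.

```python
import random
from statistics import mean, variance


def simulate_scores(n_mds, n_patients, overall_mean, sd_between, sd_within, seed=0):
    """Simulate scores as overall mean + MD-level difference + individual error."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_mds):
        md_effect = rng.gauss(0.0, sd_between)  # one draw per MD
        data.append(
            [
                overall_mean + md_effect + rng.gauss(0.0, sd_within)  # one draw per patient
                for _ in range(n_patients)
            ]
        )
    return data


# Hypothetical settings: 200 MDs with 30 patients each.
scores = simulate_scores(200, 30, overall_mean=70.0, sd_between=6.0, sd_within=10.0)

all_scores = [s for md in scores for s in md]
md_means = [mean(md) for md in scores]

# Total variance should be near 6**2 + 10**2 = 136.
print(variance(all_scores))

# Variance of the raw MD means should be near 6**2 + 10**2 / 30, about 39.3:
# raw means vary more than the true MD effects because of patient-level noise.
print(variance(md_means))
```

The second printed quantity is the reason raw MD means are spread more widely than the underlying MD effects, which is what shrinkage estimates correct for.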

Example of Shrinkage Estimates • In Patient Outcomes Research Team study of patient satisfaction with MD treatment for diabetes, raw mean patient scores by MD ranged from 53.4 to 87.1 • The random effects shrinkage estimates of the mean patient scores by MD ranged from 60.4 to 78.6 • Random effects shrinkage estimates are closer to the overall mean

Controversy in Outcomes Research • Report Cards rank hospitals or physicians • Data used has at least two levels (hospitals or physicians and their patients) • Controversy is over the choice of statistical model for evaluating variation at the hospital or physician level

Methods of Analyzing Hospital (or MD) Mortality Variance • Ignore hospital, run ordinary regression then predict average for each hospital • Remove hospital effect with indicator variables for hospitals (fixed effects model) then predict average for each hospital • Run random effects regression and obtain the Bayesian/shrinkage estimates for each hospital

Shrinkage estimates and CA State CABG Data • Each hospital’s unadjusted estimate is treated as a draw from a normal distribution • More weight is given to hospitals with more CABG patients • Hospitals with smaller numbers move closer to the mean in modeling a normal distribution • Estimates somewhat software dependent
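The weighting by hospital volume can be sketched for the simplest case: a linear random intercept model with normal distributions and variance components treated as known. This is not the model fitted to the CA State CABG data (a mortality outcome would need a non-linear model); the function name and all numbers below are hypothetical and only illustrate how the weight depends on the number of patients.

```python
def shrinkage_estimate(raw_mean, n, grand_mean, var_between, var_within):
    """Shrink a cluster's raw mean toward the grand mean.

    The weight on the raw mean is the share of the raw mean's variance that
    reflects true between-cluster differences rather than sampling noise.
    """
    weight = var_between / (var_between + var_within / n)
    return grand_mean + weight * (raw_mean - grand_mean)


# Hypothetical values: grand mean 70, between-hospital variance 4,
# within-hospital variance 100.
grand_mean, var_between, var_within = 70.0, 4.0, 100.0

# Two hospitals with the same raw mean (80) but very different volumes.
small_hospital = shrinkage_estimate(80.0, 20, grand_mean, var_between, var_within)
large_hospital = shrinkage_estimate(80.0, 500, grand_mean, var_between, var_within)

print(small_hospital)  # pulled most of the way toward 70
print(large_hospital)  # stays close to its raw mean of 80
```

With 20 patients the weight on the raw mean is 4 / (4 + 5), so more than half of the distance to the grand mean is removed; with 500 patients the weight is 4 / 4.2 and the estimate barely moves.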

Shrinkage Estimates: Software • Obtaining shrinkage estimates involves some software choices: not all software provides them (STATA by itself doesn’t), and packages use different likelihood methods of fitting models • STATA add-on GLLAMM (free download) • SAS: for linear outcomes, PROC MIXED; for non-linear outcomes, PROC NLMIXED and GLIMMIX • Some other software for multilevel data
