Validation of predictive regression models

Ewout W. Steyerberg, PhD, clinical epidemiologist • Frank E. Harrell, PhD, biostatistician

Personal background • Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands • Frank Harrell: Health Evaluation Sciences, Univ. of Virginia, Charlottesville, VA, USA • “Validation of predictions from regression models is of paramount importance”

Learning objectives: knowledge of • common types of regression models • fundamental assumptions of regression models • performance criteria of predictive models • principles of different types of validation

Performance objectives • To be able to explain why validation is necessary for predictive models • To be able to judge the adequacy of a validation procedure

Predictive models provide quantitative estimates of an outcome, e.g. • Quality of life one year after surgery • Death at 30 days after surgery • Long term survival

Predictive models are often based on regression analysis • y ~ a + sum(b_i * x_i) • y: outcome variable • a: intercept • b_i: regression coefficient for predictor i • x_i: predictor variable i • i in [1, many], usually 2 to 20

3 examples of regression • Quality of life one year after surgery: continuous outcome, linear regression • Death at 30 days after surgery: binary outcome, logistic regression • Long term survival: time-to-outcome, Cox regression

Predictive models make assumptions • Distribution • Linearity of continuous variables • Additivity of effects

Example: a simple logistic regression model • 30-day mortality ~ a + b1*sex + b2*age • Assumptions: • Distribution of 30-day mortality is binomial • Age has a linear effect (on the log odds scale) • The effects of sex and age can be added
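As a minimal sketch of how such a model turns sex and age into a predicted probability: the coefficient values below are hypothetical and chosen only for illustration, not taken from the presentation; `predicted_risk` is likewise an illustrative name.

```python
import math

def predicted_risk(intercept, b_sex, b_age, sex, age):
    """Predicted 30-day mortality from a logistic model with additive effects."""
    # Linear predictor on the log odds scale: a + b1*sex + b2*age
    linear_predictor = intercept + b_sex * sex + b_age * age
    # Inverse logit maps the log odds to a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients (assumption, for illustration only)
A, B_SEX, B_AGE = -6.0, 0.3, 0.05

risk_70_male = predicted_risk(A, B_SEX, B_AGE, sex=1, age=70)
risk_50_female = predicted_risk(A, B_SEX, B_AGE, sex=0, age=50)
print(risk_70_male, risk_50_female)
```

With positive coefficients for sex and age, the older male patient receives the higher predicted risk.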

Assessing model assumptions • Examine model residuals • Perform specific tests: add nonlinear terms, e.g. age + age^2; add interaction terms, e.g. sex*age
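A small sketch of what adding such terms means for the model input: the extended model gets extra columns, and one would then refit and compare it with the simple model (for instance with a likelihood ratio test). The function name `design_row` and its flags are illustrative assumptions, not part of any library.

```python
def design_row(sex, age, nonlinear=False, interaction=False):
    """Build one row of predictors: intercept, sex, age, plus optional extra terms."""
    row = [1.0, float(sex), float(age)]
    if nonlinear:
        # Quadratic term relaxes the assumption that age has a linear effect
        row.append(float(age) ** 2)
    if interaction:
        # Product term relaxes the assumption that sex and age effects add up
        row.append(float(sex) * float(age))
    return row

simple_row = design_row(sex=1, age=60)
extended_row = design_row(sex=1, age=60, nonlinear=True, interaction=True)
print(simple_row, extended_row)
```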

Model assumptions and predictions • Better predictions if assumptions are met • Some violation inherent in empirical data • Evaluate predictions in new data

Evaluation of predictions • Calibration: is the average of the predictions correct? Are low and high predictions correct? • Discrimination: can the model distinguish low-risk from high-risk patients?
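Both criteria can be sketched for a binary outcome as follows. This is one possible operationalisation, assuming calibration of the average is measured as observed event rate minus mean predicted risk, and discrimination as the concordance (c) statistic, which equals the area under the ROC curve. The outcomes and predictions are made-up toy values.

```python
def mean_calibration(y, p):
    """Observed event rate minus mean predicted risk; 0 means correct on average."""
    return sum(y) / len(y) - sum(p) / len(p)

def c_statistic(y, p):
    """Fraction of event/non-event pairs in which the event has the higher
    prediction; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

# Toy data (assumption): 1 = died within 30 days, p = predicted probability
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.8]

calibration = mean_calibration(y, p)
discrimination = c_statistic(y, p)
print(calibration, discrimination)
```

Whether low and high predictions are each correct would additionally require comparing observed and predicted risk within groups of patients, which this sketch leaves out.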

Example: predicted probabilities

3 types of validation • Apparent: performance on sample used to develop model • Internal: performance on population underlying the sample • External: performance on related but slightly different population

Apparent validity • Easy to calculate • Results in optimistic performance estimates

Apparent estimates optimistic since same data used for: • Definition of model structure: e.g. selection and coding of variables • Estimation of model parameters: e.g. regression coefficients • Evaluation of model performance: e.g. calibration and discrimination

Internal validity • More difficult to calculate • Test model in new data, randomly drawn from the underlying population

Why internal validation? • An honest estimate of performance should be obtained, at least for a population similar to the development sample • Internally validated performance sets an upper limit to what may be expected in other settings (external validity)

External validity • Moderately easy to calculate when new data are available • Test model in new data, different from development population

Why external validation? • Various factors may differ from development population, including • different selection of patients • different definitions of variables • different diagnostic or therapeutic procedures

Internal validation techniques • Split-sample: development / validation • Cross-validation: alternating development / validation; extreme case: develop on n-1, validate on 1 (‘jack-knife’) • Bootstrap
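The alternating development / validation scheme of cross-validation can be sketched as below. The helper name `k_fold_indices` is an illustrative assumption; setting k equal to n gives the extreme case of developing on n-1 patients and validating on 1.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (develop, validate) index lists; each patient is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # random assignment of patients to folds
    folds = [idx[i::k] for i in range(k)]  # k folds of (nearly) equal size
    for i in range(k):
        validate = folds[i]
        develop = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield develop, validate

for develop, validate in k_fold_indices(n=10, k=5):
    print(sorted(develop), sorted(validate))
```

A split-sample validation corresponds to using only one such develop / validate pair, so part of the data never contributes to model development.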

Bootstrap is the preferred internal validation technique • bootstrap sample for model development: n patients drawn with replacement • original sample for validation: n patients • difference in performance between bootstrap sample and original sample: optimism • efficiency: development and validation on n patients
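The bootstrap steps above can be sketched as follows. To stay short and dependency-free, this sketch swaps in a one-predictor linear regression with R-squared as the performance measure, rather than a logistic model with the area under the ROC curve; the data are simulated and all function names are illustrative assumptions.

```python
import random

def fit_line(data):
    """Least-squares fit of y = a + b*x on a list of (x, y) pairs; returns (a, b)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def r_squared(model, data):
    """Performance of a fitted model (a, b) when applied to the given data."""
    a, b = model
    my = sum(y for _, y in data) / len(data)
    sse = sum((y - (a + b * x)) ** 2 for x, y in data)
    sst = sum((y - my) ** 2 for _, y in data)
    return 1.0 - sse / sst

def optimism_corrected(data, n_boot=200, seed=1):
    """Return (apparent, optimism, optimism-corrected) performance."""
    rng = random.Random(seed)
    apparent = r_squared(fit_line(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # n patients drawn with replacement
        model = fit_line(boot)                   # develop in the bootstrap sample
        # Performance in the bootstrap sample minus performance in the original sample
        optimism += r_squared(model, boot) - r_squared(model, data)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism

# Simulated data (assumption): 40 patients, one predictor, continuous outcome
sim = random.Random(0)
data = []
for _ in range(40):
    x = sim.uniform(0, 10)
    data.append((x, 0.5 * x + sim.gauss(0, 1)))

apparent, optimism, corrected = optimism_corrected(data)
print(apparent, optimism, corrected)
```

In a realistic application every modeling step, including any selection and coding of variables, would be repeated inside the loop, since those steps also contribute to optimism.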

Example: bootstrap results for logistic regression model • 30-day mortality ~ a + b1*sex + b2*age • Apparent area under the ROC curve: 0.77 • Mean area of 200 bootstrap samples: 0.772 • Mean area of 200 tests in original: 0.762 • Optimism in apparent performance: 0.01 • Optimism-corrected area: 0.76
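The arithmetic behind these numbers can be retraced in a few lines; the variable names are illustrative, the values are those reported in the example.

```python
apparent_auc = 0.77         # model developed and tested in the original sample
mean_auc_bootstrap = 0.772  # mean area in the 200 bootstrap samples
mean_auc_original = 0.762   # mean area when those 200 models are tested in the original sample

# Optimism: how much better the models look in the data they were developed on
optimism = mean_auc_bootstrap - mean_auc_original
# Optimism-corrected area: apparent performance minus estimated optimism
corrected_auc = apparent_auc - optimism

print(round(optimism, 3), round(corrected_auc, 2))  # prints: 0.01 0.76
```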

External validation techniques • Temporal validation: same investigators, validate in recent years • Spatial validation (other place): same investigators, cross-validate in centers • Fully external: other investigators, other centers

Example: external validity of logistic regression model • 30-day mortality ~ a + b1*sex + b2*age • Apparent area in 785 patients: 0.77 • Tested in 20,318 other patients: 0.74 • Tested by other investigators: ?

Example: external validation

Summary • Apparent validity gives an optimistic estimate of model performance • Internal validity may be estimated by bootstrapping • External validity should be determined in other populations

Key references • tutorial and book on multivariable models (Harrell 1996, Stat Med 15:361-87; Harrell: Regression modeling strategies, Springer 2001) • empirical evaluations of strategies (Steyerberg 2000, Stat Med 19:1059-79) • internal validation (Steyerberg 2001, JCE 54:774-81) • external validation (Justice 1999, Ann Intern Med 130:515-24; Altman 2000, Stat Med 19:453-73)

Links • Interactive text book on predictive modeling: http://www.neri.org/symptom/mockup/Chapter_8/ • Harrell’s Regression modeling strategies: http://hesweb1.med.virginia.edu/biostat/rms/
