Prognostic Model Building with Biomarkers in Pharmacogenomics Trials

Li-an Xu & Douglas Robinson. Statistical Genetics & Biomarkers, Exploratory Development, Global Biometric Sciences, Bristol-Myers Squibb. 2006 FDA/Industry Statistics Workshop. Theme: Statistics in the FDA and Industry: Past, Present, and Future. Washington, DC, September 27-29, 2006.

Outline • Statistical Challenges in Prognostic Model Building • Data quantity and quality across multiple platforms • Dimension reduction in model building process • Model performance measures • Realistic assessment of model performance • Handling correlated predictors: when p >> n

Data Quantity and Quality Across Platforms • Tumor samples for mRNA • Trial A Sample Size : 161 Subjects • 134 usable (sufficient quality and quantity) mRNA samples (83%) • Trial B Sample Size : 110 Subjects • 83 usable mRNA samples (75%) • Plasma protein profiling (Liquid Chromatography / Mass Spectrometry) • Trial B Sample Size : 110 Subjects • 90 usable plasma samples (82%) • Even if sample collection is mandatory, usable sample size < subject sample size • Need to design studies based on expected usable sample size
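The last point on this slide can be turned into simple arithmetic: divide the number of usable samples a study needs by the fraction of samples expected to be usable. The sketch below is only an illustration. The function name `required_enrollment` and the target of 100 usable samples are assumptions and do not come from the trials; the usable fractions are computed from the counts reported above.

```python
import math

def required_enrollment(target_usable: int, usable_fraction: float) -> int:
    """Subjects to enroll so that the expected number of usable samples reaches the target."""
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(target_usable / usable_fraction)

# Usable fractions from the counts on the slide (usable samples / subjects).
observed = {
    "Trial A mRNA": 134 / 161,
    "Trial B mRNA": 83 / 110,
    "Trial B plasma": 90 / 110,
}

# Hypothetical target (an assumption): 100 usable samples for model building.
plan = {name: required_enrollment(100, frac) for name, frac in observed.items()}
for name, n_subjects in plan.items():
    print(f"{name}: enroll at least {n_subjects} subjects")
```

Under these assumptions, a platform with a lower usable fraction needs noticeably more enrolled subjects for the same number of usable samples.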

Dimension Reduction in Prognostic Model Building • Number of potential predictors is greater than number of subjects (p>>n) in high throughput biomarker studies • No unique solutions in prognostic model fitting with traditional methods • Regularized methods can provide some possible solutions • Penalized logistic regression (PLR) + Recursive Feature Elimination (RFE) • Threshold gradient descent + RFE • Further dimension reduction may still be needed • Incorporate prior information (e.g. results from preclinical studies as the starting point for p) • Intersection of single-biomarker results from multiple statistical methods

Dimension Reduction Through Penalized Logistic Regression with Recursive Feature Elimination to Select Genes • [Figure: training set (genes by patients); average cross-validation error versus number of predictors in the model, from ~22,000 genes down to 1 gene] • Choose the model with the smallest cross-validation error and fewest genes
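A minimal sketch of the selection rule on this slide, using NumPy on simulated data. It is not the authors' implementation. The gradient-descent fit, the penalty `lam`, the schedule of model sizes, and the simulated data set (60 patients, 200 genes, 5 informative) are all assumptions made for illustration. Recursive feature elimination is rerun inside every cross-validation fold, and the chosen model is the one with the smallest cross-validation error, with ties broken in favor of fewer genes.

```python
import numpy as np

def fit_plr(X, y, lam=1.0, lr=0.1, n_iter=200):
    """L2-penalized logistic regression fitted by plain gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) + lam * w) / n
        b -= lr * np.mean(prob - y)
    return w, b

def rfe_path(X, y, sizes):
    """Map each size in `sizes` (descending) to the gene indices kept by RFE."""
    keep = np.arange(X.shape[1])
    path = {}
    for size in sizes:
        w, _ = fit_plr(X[:, keep], y)
        keep = keep[np.argsort(-np.abs(w))[:size]]  # drop the smallest |weights|
        path[size] = keep
    return path

def cv_errors(X, y, sizes, n_folds=5, seed=0):
    """Cross-validated misclassification rate per model size; RFE is rerun in every fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    wrong = {size: 0 for size in sizes}
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for size, genes in rfe_path(X[train], y[train], sizes).items():
            w, b = fit_plr(X[train][:, genes], y[train])
            pred = (X[test][:, genes] @ w + b > 0).astype(int)
            wrong[size] += int(np.sum(pred != y[test]))
    return {size: count / len(y) for size, count in wrong.items()}

# Simulated stand-in for expression data (an assumption, not trial data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[:, :5] += np.where(y == 1, 1.0, -1.0)[:, None]  # first 5 genes carry signal

sizes = [200, 100, 50, 25, 12, 6, 3, 1]
errors = cv_errors(X, y, sizes)
# Smallest cross-validation error first, fewest genes as the tie-breaker.
best_size = min(sizes, key=lambda s: (errors[s], s))
final_genes = rfe_path(X, y, sizes)[best_size]
print(best_size, errors[best_size])
```

With real expression data, a tested library implementation of penalized regression would be preferable to this hand-written optimizer.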

Dimension Reduction Through Preclinical Studies • Predicting cell line sensitivity to a compound • 18 cancer cell lines (12 sensitive, 6 resistant) • Identified top 200 genes associated with in vitro sensitivity/resistance • [Figure: expression level (low to high) in sensitive versus resistant cell lines; example of one gene's expression across the 18 cancer cell lines]

Predicting Response in Trial A • Dimension reduction by using prior preclinical results seemed to help in this trial

Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods • [Venn diagram: logistic regression, 297 probesets; t-test, 396 probesets; Cox proportional hazards, 446 probesets; overlap counts shown: 46, 97, 51] • Intersection resulted in 51 potential candidates • It may be more beneficial to start model building with this set than the complete set of potential predictors (work currently in progress)
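The intersection step can be sketched with plain Python sets. The probeset identifiers, the p-values, and the 0.05 cutoff below are hypothetical; the slide does not state which selection rule each method used.

```python
def significant(p_values, alpha=0.05):
    """Probesets whose p-value falls below the (assumed) cutoff."""
    return {probe for probe, p in p_values.items() if p < alpha}

# Hypothetical per-probeset p-values from three single-biomarker analyses.
logistic_p = {"ps_a": 0.01, "ps_b": 0.20, "ps_c": 0.03, "ps_d": 0.04}
t_test_p = {"ps_a": 0.02, "ps_b": 0.01, "ps_c": 0.04, "ps_d": 0.30}
cox_p = {"ps_a": 0.03, "ps_b": 0.04, "ps_c": 0.01, "ps_d": 0.02}

# Keep only the probesets flagged by every method.
candidates = significant(logistic_p) & significant(t_test_p) & significant(cox_p)
print(sorted(candidates))
```

The resulting set would then serve as the starting pool of predictors for model building, in place of the full set of probesets.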

Model Performance Measures • Sensitivity, Specificity, Positive and Negative Predictive Value are common measures of model performance • Dependent on the threshold • Area under the ROC curve (AUC) may be a better measure for comparing models • These figures are from simulated perfect predictors • All three models yield complete separation between responders and non-responders • Arbitrary threshold of 0.5 probability may lead one to believe that model 2 is superior • AUC correctly shows equivalence
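The point about thresholds can be reproduced with a few invented scores. The three score lists below are assumptions chosen so that each model separates responders from non-responders completely; they are not the simulation behind the slide's figures. Sensitivity and specificity at a 0.5 cutoff differ between the models, while the AUC, computed here as the fraction of responder/non-responder pairs ranked correctly, is identical.

```python
def auc(scores, labels):
    """AUC as the share of (responder, non-responder) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when predicting 'responder' at or above the threshold."""
    tp = sum(s >= threshold for s, l in zip(scores, labels) if l == 1)
    tn = sum(s < threshold for s, l in zip(scores, labels) if l == 0)
    return tp / labels.count(1), tn / labels.count(0)

labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = responder
models = {  # invented predicted probabilities, each perfectly separating
    "model 1": [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45],
    "model 2": [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90],
    "model 3": [0.55, 0.60, 0.65, 0.70, 0.80, 0.85, 0.90, 0.95],
}

for name, scores in models.items():
    print(name, sens_spec(scores, labels), auc(scores, labels))
```

At the 0.5 cutoff, model 1 has zero sensitivity and model 3 has zero specificity, yet all three have an AUC of 1.0, so only the threshold-free measure reflects their equivalence.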

Realistic Assessment of Model Performance • When sample size is reasonably large • Split sample into a training set and an independent test set • Build the model on the training set and test the model performance on the test set • Pro: One independent test of model performance for the model picked in the training set • Cons: • When sample size is small, the estimate of performance may have a large variance • Reduced sample size for training may yield a sub-optimal model • Christophe Ambroise & Geoffrey J. McLachlan, PNAS 99(10), 2002: Entire model building procedure should be cross-validated
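The reason the entire procedure must be cross-validated can be shown on pure noise, where no classifier should beat chance. In the sketch below, selecting genes once on all samples and then cross-validating only the classifier typically reports a low error, while repeating the selection inside each fold typically gives an error near 50%. The nearest-centroid classifier, the t-like ranking statistic, and the simulated dimensions (40 subjects, 2,000 genes, top 10 selected) are assumptions made for illustration.

```python
import numpy as np

def top_genes(X, y, k):
    """Indices of the k genes with the largest absolute standardized mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    sd = X.std(axis=0, ddof=1) + 1e-12
    return np.argsort(-np.abs(diff / sd))[:k]

def centroid_predict(X_train, y_train, X_test):
    """Assign each test subject to the class with the nearer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_error(X, y, k, select_inside, n_folds=5, seed=0):
    """Cross-validated error with gene selection inside or outside the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    genes_all = top_genes(X, y, k)  # uses every subject, including future test folds
    wrong = 0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        genes = top_genes(X[train], y[train], k) if select_inside else genes_all
        pred = centroid_predict(X[train][:, genes], y[train], X[test][:, genes])
        wrong += int(np.sum(pred != y[test]))
    return wrong / len(y)

# Pure-noise data: labels are unrelated to every gene.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2000))
y = np.repeat([0, 1], 20)

biased = cv_error(X, y, k=10, select_inside=False)  # selection outside the folds
honest = cv_error(X, y, k=10, select_inside=True)   # selection repeated in each fold
print(f"selection outside CV: {biased:.2f}, selection inside CV: {honest:.2f}")
```

The gap between the two numbers is the selection bias: the held-out subjects already influenced which genes were chosen.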

Realistic Assessment of Model Performance • When sample size is small, one cannot split data into training / test set • Cross-validation alone is a reasonable alternative • Warning: Initial performance estimate may be misleading • [Figure: cross-validated AUC versus number of predictors, showing individual runs and their average] • Cross-validation should be repeated multiple times • Allows one to observe effects of sampling variability • The average of replicate estimators gives a more accurate assessment of model performance
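Repeating cross-validation with different random fold assignments can be sketched as follows. The simulated data (40 subjects, 20 predictors, 3 with a modest shift), the nearest-centroid score, the choice of 50 repeats, and the pooling of out-of-fold scores before computing the AUC are all assumptions. No feature selection is performed here, so nothing needs to be nested inside the folds.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the share of (responder, non-responder) pairs ranked correctly; ties count 0.5."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def cv_auc(X, y, n_folds, seed):
    """One cross-validation run: pool the out-of-fold scores, then compute the AUC."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        # Higher score means closer to the responder centroid.
        scores[test] = ((X[test] - c0) ** 2).sum(axis=1) - ((X[test] - c1) ** 2).sum(axis=1)
    return auc(scores, y)

# Simulated data with a modest signal in the first 3 predictors.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 20))
X[y == 1, :3] += 0.8

# Same data, 50 different random fold assignments.
runs = np.array([cv_auc(X, y, n_folds=5, seed=seed) for seed in range(50)])
print(f"single run: {runs[0]:.3f}")
print(f"range over runs: {runs.min():.3f} to {runs.max():.3f}")
print(f"average over runs: {runs.mean():.3f}")
```

The spread across runs shows how much a single cross-validation estimate depends on the fold assignment, and the average over runs is the more stable summary.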

Handling Correlated Predictors: When p >> n • Complex correlation structure (mRNA as example) • Multiple probe sets interrogate the same gene • Multiple genes function together in pathways • Not all pathways are known • Multiple response definitions that are interrelated • False positive genes may be correlated with true positives • Most prognostic modeling techniques do not handle this well • Recursive feature elimination may remove important predictors because of correlations • This is an open research problem

Summary • Need to design studies based on expected usable sample size • Dimension reduction in the model building process • Overfitting problem can be mitigated by regularized methods • To further reduce the candidate set of predictors • Preclinical information can be useful • Intersection of single-biomarker results by different statistical methods may also be useful • Model performance • Independent test set may be important for validation purposes. When sample size is small, cross-validation is a viable alternative. • Cross-validation should include biomarker selection procedures and needs to be performed appropriately • Cross-validation should be repeated multiple times • Performance measures should be carefully chosen when comparing multiple models. AUC often is a good choice. • Handling correlated predictors is still an open research problem

Acknowledgments • Haolan Lu, David Mauro, Shelley Mayfield, Oksana Mokliatchouk, Relekar Padmavathibai, Barry Paul, Lynn Ploughman, Amy Ronczka, Katy Simonsen, Eric Strittmatter, Dana Wheeler, Shujian Wu, Shuang Wu, Kim Zerba, Renping Zhang, Can Cai, Scott Chasalow, Ed Clark, Mark Curran, Ashok Dongre, Matt Farmer, Alexander Florczyk, Shirin Ford, Susan Galbraith, Ji Gao, Nancy Gustafson, Ben Huang, Tom Kelleher, Christiane Langer, Hyerim Lee
