
Consistent Assessment of Biomarker and Subgroup Identification Methods H.D. Hollins Showalter


Presentation Transcript


  1. Consistent Assessment of Biomarker and Subgroup Identification Methods H.D. Hollins Showalter

  2. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  3. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  4. Tailored Therapeutics Adapted from slides presented by William L. Macias, MD, PhD, Eli Lilly • *Opening Remarks at 2009 Investor Meeting, John C. Lechleiter, Ph.D. A medication for which treatment decisions are based on the molecular profile of the patient, the disease, and/or the patient’s response to treatment. A tailored therapeutic allows the sponsor to make a regulatory-approved claim of an expected treatment effect (efficacy or safety). “Tailored therapeutics can significantly increase value—first, for patients—who achieve better outcomes with less risk and, second, for payers—who more frequently get the results they expect.”*

  5. Achieving Tailored Therapeutics Data source: clinical trials (mostly) Objective: identify biomarkers and subgroups Challenges: complexity, multiplicity Need: modern statistical methods

  6. Prognostic vs. Predictive Markers Prognostic Marker Single trait or signature of traits that identifies different groups of patients with respect to the risk of an outcome of interest in the absence of treatment Predictive Marker Single trait or signature of traits that identifies different groups of patients with respect to the outcome of interest in response to a particular treatment

  7. Statistical Interactions [Figure: treatment response plotted against marker status (−/+) for treated and untreated patients, illustrating the marker effect, the treatment effect, and the treatment-by-marker effect] Y = β0 + β1*M + β2*T + β3*M*T + ε
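A minimal sketch of this interaction model in R, assuming a continuous response, a binary marker M, and a binary treatment indicator; the coefficient values are hypothetical, not taken from the slides:

```r
# Simulate Y = b0 + b1*M + b2*T + b3*M*T + error and test the
# treatment-by-marker interaction (hypothetical coefficient values).
set.seed(1)
n  <- 400
M  <- rbinom(n, 1, 0.5)              # binary marker
T_ <- rbinom(n, 1, 0.5)              # treatment indicator (T_ avoids masking TRUE)
Y  <- 0.1 - 0.1 * M - 0.2 * T_ - 1.0 * M * T_ + rnorm(n)
fit <- lm(Y ~ M * T_)                # main effects plus the M:T_ interaction
summary(fit)$coefficients["M:T_", ]  # estimate and p-value for the predictive effect
```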

  8. Types of Predictive Markers [Figure: four panels of treatment response versus marker status (−/+), each showing a treated and an untreated curve, illustrating the different qualitative patterns a predictive marker can take]

  9. Predictive Marker Example Two scenarios over the entire population, with subgroups defined by candidate markers x1 and x2. Scenario 1: subgroup of interest M+ (x1 = 1 and x2 = 1), group size 25%, Trt response -1.39, Pl response -0.19, treatment effect -1.20; complement M-, group size 75%, Trt response -0.23, Pl response -0.13, treatment effect -0.10. Scenario 2: subgroup of interest M+ (x1 = 1), group size 50%, Trt response -1.17, Pl response -0.09, treatment effect -1.08; complement M-, group size 50%, Trt response -0.33, Pl response -0.20, treatment effect -0.13.

  10. BSID vs. “Traditional” Analysis • Traditional subgroup analysis • Interaction testing, one at a time • Many statistical issues • Many gaps for tailoring • Biomarker and subgroup identification (BSID) • Utilizes modern statistical methods • Addresses issues with subgroup analysis • Maximizes tailoring opportunities

  11. Simulation to Assess BSID Methods Objective Consistent, rigorous, and comprehensive calibration and comparison of BSID methods Value • Further improve methodology • Identify the gaps (where existing methods perform poorly) • Synergy/combining ideas from multiple methods • Optimize application for specific clinical trials

  12. BSID Simulation: Three Components • Data generation • Key is consistency • BSID • “Open” and comprehensive application of analysis method(s) • Performance measurement • Key is consistency

  13. BSID Simulation: Visual Representation [Diagram: Data Generation produces the Truth together with Dataset 1, Dataset 2, …, Dataset n; BSID is applied to each dataset, giving Results 1, 2, …, n; Performance Measurement compares each result against the Truth, giving Performance Metrics 1, 2, …, n, which are combined into Overall Performance Metrics]

  14. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  15. Data Generation • Creating virtual trial data • Make assumptions in order to emulate real trial data • Knowledge of disease and therapies, including historical data • Specific to BSID: must embed markers and subgroups • In order to measure the performance of BSID methodology the “truth” is needed • This is challenging/impossible to discern using real trial data

  16. Data Generation: Survey Models used for data generation in the surveyed work include: logit model (without and with subject-specific effects), “contribution model”, linear model (on the probability scale), “tree model”, and exponential model
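As one hedged illustration of the first entry, a binary response can be generated from a logit model with an embedded predictive marker; every parameter value below is hypothetical, not taken from the survey:

```r
# Sketch: binary response from a logit model with a prognostic marker (x2)
# and a predictive marker (x1); all coefficients are hypothetical.
set.seed(2)
n   <- 300
x1  <- rbinom(n, 1, 0.5)           # predictive marker
x2  <- rbinom(n, 1, 0.5)           # prognostic marker
trt <- rbinom(n, 1, 0.5)           # randomized treatment
lin <- -0.5 + 0.4 * x2 + 0.2 * trt + 0.8 * x1 * trt   # linear predictor
p   <- plogis(lin)                 # inverse logit
y   <- rbinom(n, 1, p)
dat <- data.frame(y, trt, x1, x2)
head(dat)
```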

  17. Data Generation: Recommendations • Clearly identify attributes and models • Transparency • Traceability of analysis • Make sure to capture the “truth” in a way that facilitates performance measurement • Derive efficiency and synergistic value (more on this later!)

  18. Data Generation: Specifics • Identify key attributes • Sample size • Number of predictors • Response type • Predictor type/correlation • Subgroup size • Sizes of effects: placebo response, overall treatment effect, predictive effect(s), prognostic effect(s) • Others: Missing data, treatment assignment • Specify model
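A sketch of how these attributes might be bundled into a single generator function; the function name, arguments, and defaults are illustrative only, not the framework's actual interface:

```r
# Hypothetical generator parameterized by the key attributes on this slide.
generate_trial <- function(n = 200,             # sample size
                           n_pred = 10,         # number of candidate predictors
                           subgroup_frac = 0.5, # size of the predictive subgroup
                           placebo_resp = 0,    # placebo response
                           overall_eff = -0.1,  # overall treatment effect
                           pred_eff = -1,       # extra effect in the subgroup
                           prog_eff = 0.3,      # prognostic effect of x2
                           seed = 1) {
  set.seed(seed)
  x <- matrix(rbinom(n * n_pred, 1, 0.5), n, n_pred,
              dimnames = list(NULL, paste0("x", 1:n_pred)))
  x[, 1] <- rbinom(n, 1, subgroup_frac)         # x1 defines the true subgroup
  trt <- rbinom(n, 1, 0.5)
  y <- placebo_resp + prog_eff * x[, 2] + overall_eff * trt +
       pred_eff * x[, 1] * trt + rnorm(n)
  data.frame(y, trt, x)
}
dat <- generate_trial()
```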

  19. Data Generation: Recommendations • Clearly identify attributes and models • Transparency • Traceability of analysis • Make sure to capture the “truth” in a way that facilitates performance measurement • Derive efficiency and synergistic value (more on this later!)

  20. Data Generation: Requirements • Format data consistently • Make code flexible enough to accommodate any/all attributes and models • Ensure that individual datasets can be reproduced (e.g., by recording the random number generation seed for each dataset) • The resulting dataset(s) should always have the same look and feel
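A minimal sketch of the reproducibility point, reusing the hypothetical generate_trial() above: record one seed per dataset so any individual dataset can be regenerated on demand.

```r
# Generate n_sim datasets with recorded seeds so dataset i is reproducible.
n_sim <- 100
seeds <- 1000 + seq_len(n_sim)                   # one recorded seed per dataset
sims  <- lapply(seeds, function(s) generate_trial(seed = s))
# Later, e.g. dataset 21 can be recreated exactly from its seed alone:
identical(generate_trial(seed = seeds[21]), sims[[21]])   # TRUE
```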

  21. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  22. Performance Measurement Quantifying the ability of BSID methodology to recapture the “truth” underlying the (generated) data If done consistently, allows calibration and comparison of BSID methods

  23. Performance Measurement: Survey [Table: performance measures reported in the simulation studies of SIDES (2011) [1], SIDES (2014) [2], VT [3], GUIDE [4], QUINT [5], and IT [6]. The surveyed measures include: probability of type I errors (RP1a) and type II errors (RP1b); selection rate; frequencies of the final tree sizes; probabilities of a complete match, a partial match, selecting a subset, and selecting a superset; complete and partial match rates; finding the correct X’s; frequency of (predictor) “hits”; probability of selection at 1st or 2nd level splits of trees; probability of a nontrivial tree; recovery of tree complexity (RP2); recovery of splitting variables and split points (RP3); recovery of assignments of observations to partition classes (RP4); closeness of the identified subgroup to the true subgroup; closeness of the size of the identified subgroup to the size of the true subgroup; accuracy; treatment effect fraction (original and updated definitions); confirmation rate; power; bias assessment via likelihood ratio and logrank tests; and properties of the estimated subgroup treatment effect as an estimator of the true effect]

  24. Performance Measurement: Recommendations Organize the recommended measures by statistical objective: testing, estimation, and prediction

  25. Perf. Measurement: Survey Revisited [Table: the same surveyed measures for SIDES (2011) [1], SIDES (2014) [2], VT [3], GUIDE [4], QUINT [5], and IT [6], reorganized by the level at which they operate (marker level, subgroup level, or subject level) and by statistical objective (testing, estimation, prediction)]

  26. Contingency Table: Marker Level
      Truly predictive biomarker?      True             False
      Identified as predictive: Yes    True Positive    False Positive
      Identified as predictive: No     False Negative   True Negative
      • Sensitivity = True Positives / True Predictive Biomarkers
      • Specificity = True Negatives / Non-Predictive (False) Biomarkers
      • PPV = True Positives / Identified as Predictive
      • NPV = True Negatives / Not Identified as Predictive
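A small sketch of these marker-level metrics, given the set of truly predictive markers and the set a method identified; the marker names and sets below are hypothetical:

```r
# Marker-level sensitivity/specificity/PPV/NPV from true vs. identified markers.
all_markers <- paste0("x", 1:10)
true_pred   <- c("x1", "x3")                 # truly predictive (hypothetical)
identified  <- c("x1", "x7")                 # identified as predictive (hypothetical)
tp <- sum(identified %in% true_pred)
fp <- sum(!(identified %in% true_pred))
fn <- sum(!(true_pred %in% identified))
tn <- length(all_markers) - tp - fp - fn
c(sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  ppv         = tp / (tp + fp),
  npv         = tn / (tn + fn))
```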

  27. Performance Measures: Marker Level # and % of predictors: true vs. identified Sensitivity Specificity PPV NPV

  28. Performance Measures: Subgroup Level • Size of identified subgroup • Treatment effect in the identified subgroup • Average the true “individual” treatment effects under potential outcomes framework • Accuracy of estimated treatment effect • Difference (both absolute and direction) between estimate and true effect
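A sketch of the potential-outcomes calculation: if the generator stores both potential responses per subject, the true effect in any identified subgroup is the average of the subject-level differences. Column names (y1, y0) and the helper functions are hypothetical, not part of the framework.

```r
# True treatment effect in an identified subgroup under potential outcomes:
# average the subject-level differences y1 - y0 over the subgroup members.
true_effect_in <- function(dat, in_subgroup) {
  mean(dat$y1[in_subgroup] - dat$y0[in_subgroup])   # y1, y0 = potential outcomes
}
# Accuracy of the estimated effect: signed and absolute difference from truth.
effect_accuracy <- function(estimate, truth) {
  c(bias = estimate - truth, abs_error = abs(estimate - truth))
}
```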

  29. Perf. Measures: Subgroup Level, cont. • Implications on sample size/time/cost of future trials • Given true treatment effect, what is the number of subjects needed in the trial for 90% power? • What is the cost of the trial? (mainly driven by # enrolled) • How much time will the trial take? (mainly driven by # screened)
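For a continuous endpoint, the sample-size implication can be sketched with base R's power.t.test(); the effect size, SD, and prevalence below are placeholders:

```r
# Subjects per arm for 90% power at the true subgroup treatment effect
# (placeholder effect size and SD; two-sided alpha = 0.05).
power.t.test(delta = 1.08, sd = 2, sig.level = 0.05, power = 0.90)$n
# Screened subjects scale this up by 1 / (subgroup prevalence), which is
# what mainly drives the time and cost comparisons on this slide.
ceiling(power.t.test(delta = 1.08, sd = 2, power = 0.90)$n) / 0.5   # 50% prevalence
```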

  30. Contingency Table: Subject Level
      Potential to realize enhanced treatment effect?*   True             False
      Membership classification: M+                      True Positive    False Positive
      Membership classification: M-                      False Negative   True Negative
      *at a meaningful or desired level
      • Sensitivity = True Positives / Subjects with a True Enhanced Treatment Effect
      • Specificity = True Negatives / Subjects without an Enhanced Treatment Effect
      • PPV = True Positives / Classified as M+
      • NPV = True Negatives / Classified as M-

  31. Performance Measures: Subject Level Compare subgroup membership on the individual level: true vs. identified Sensitivity Specificity PPV NPV
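The same calculation at the subject level compares true and classified membership vectors; the 0/1 vectors here are hypothetical:

```r
# Subject-level metrics from true vs. classified M+ membership (0/1 vectors).
true_mplus  <- c(1, 1, 0, 0, 1, 0, 0, 1)   # hypothetical truth
class_mplus <- c(1, 0, 0, 1, 1, 0, 0, 1)   # hypothetical classification
tab <- table(classified = class_mplus, truth = true_mplus)
tp <- tab["1", "1"]; tn <- tab["0", "0"]
fp <- tab["1", "0"]; fn <- tab["0", "1"]
c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp),
  ppv = tp / (tp + fp), npv = tn / (tn + fn))
```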

  32. Conditional Performance Measures • Same metrics with Null submissions removed • Markers/subgroups can be very difficult to find. When a method DOES find something, how accurate is it? • Hard(er) to compare multiple methods when all performance measures are washed out by Null submissions

  33. Cond. Subgroup Level Measures Example (1000 simulations)
      Truth (but x1 is very hard to find): M+ = {x1 = 1}, group size 50%, treatment effect 10; M- = {x1 = 0}, group size 50%, treatment effect 0. x2 splits the population 50/50 but is not predictive, so the effect in {x2 = 1} is the overall effect of 5.
      BSID Method A: 900/1000 Null, 100/1000 propose x1 = 1. Conditional: size 0.5, effect 10. Unconditional: size 0.95, effect 5.5.
      BSID Method B: 900/1000 Null, 50/1000 propose x1 = 1, 50/1000 propose x2 = 1. Conditional: size 0.5, effect 7.5. Unconditional: size 0.95, effect 5.25.
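A short sketch reproducing the arithmetic of this example, under the assumption (reconstructed from the numbers, not stated on the slide) that a Null submission is scored as the whole population (size 1, effect 5) for the unconditional metrics and dropped for the conditional ones:

```r
# Conditional vs. unconditional subgroup-level metrics for the example above.
summarize <- function(size, effect, is_null) {
  c(cond_size   = mean(size[!is_null]), cond_effect   = mean(effect[!is_null]),
    uncond_size = mean(size),           uncond_effect = mean(effect))
}
# Method A: 900 Null, 100 x1 = 1 (size 0.5, true effect 10)
a <- summarize(size    = c(rep(1, 900), rep(0.5, 100)),
               effect  = c(rep(5, 900), rep(10, 100)),
               is_null = rep(c(TRUE, FALSE), c(900, 100)))
# Method B: 900 Null, 50 x1 = 1 (effect 10), 50 x2 = 1 (size 0.5, effect 5)
b <- summarize(size    = c(rep(1, 900), rep(0.5, 100)),
               effect  = c(rep(5, 900), rep(10, 50), rep(5, 50)),
               is_null = rep(c(TRUE, FALSE), c(900, 100)))
rbind(A = a, B = b)   # A: 0.5/10 and 0.95/5.5; B: 0.5/7.5 and 0.95/5.25
```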

  34. Performance Measurement: Requirements For each application of BSID, the user proposes: • List of predictive biomarkers • The one subgroup for designing the next study • Estimated treatment effect in this subgroup • In conjunction with the “truth” underlying the generated data, all of the recommended performance measures can be calculated from these elements
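The standardized submission could be as small as a three-field record per analyzed dataset; a hypothetical sketch whose field names are illustrative, not the framework's actual schema:

```r
# Hypothetical standardized "proposal" returned for each analyzed dataset.
proposal <- list(
  predictive_markers = c("x1"),      # list of identified predictive biomarkers
  subgroup_rule      = "x1 == 1",    # the one subgroup proposed for the next study
  estimated_effect   = -1.08         # estimated treatment effect in that subgroup
)
# Combined with the stored truth, the marker-, subgroup-, and subject-level
# measures above can all be computed from these three elements plus the dataset.
str(proposal)
```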

  35. Considering the “Three Levels” What are the most important and relevant measures of a result? Depends on the objective…

  36. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  37. Data Generation Example: linear model

  38. Data Generation Example, cont.

  39. Data Generation Example, concl. Summaries for two generated datasets, split by x_1_1_1: Dataset 1 — Trt 0: -0.141, Trt 1: -0.407, Effect: -0.266; Dataset 21 — Trt 0: -0.018, Trt 1: -0.427, Effect: -0.409

  40. BSID Methods Applied to Example • Alpha controlled at 0.1

  41. Performance Measurement Example Proposal + Truth = Performance Measures

  42. Perf. Measurement Example, cont.

  43. Perf. Measurement Example, concl.

  44. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  45. Strategy • Develop framework (done/ongoing) • Present/get input (current) • Internal and external forums • Workshop • Establish an open environment (future) • R package on CRAN • Web portal repository

  46. Predictive Biomarker Project: Vision • Access Web Portal • Reads open description (objective, models, formats etc.) • Access web interface for Data Generation • Generate data under specified scenarios, or utilize “standard”/pre-existing scenarios • Apply BSID methodology to datasets • Express results in the specified format • Access web interface for Performance Measurement • Compare performance • Encouraged to contribute to Repository • Open sharing of results, descriptions, programs

  47. Pros and Cons
      Pros: • More convenient and useful simulation studies to aid research • Direct comparisons of performance across methods • Optimization of methods for scenarios that are relevant and important for drug development • New insights and collaborations • Data sets could be applied to other statistical problems
      Cons: • Need to develop infrastructure to support simulated data • Access and upkeep • Need experts to explicitly define the scope

  48. Outline Background Data Generation Performance Measurement Example Operationalization Conclusion

  49. Conclusion Simulation studies are a common approach to assessing BSID methods, but there is a lack of consistency in data generation and performance measurement. The presented framework enables consistent, rigorous, and comprehensive calibration and comparison of BSID methods. Collaborating on this effort will result in efficiency and synergistic value.

  50. Acknowledgements Richard Zink Lei Shen Chakib Battioui Steve Ruberg Ying Ding Michael Bell
