Advanced Machine Learning Approaches for Personalized Healthcare in Warfarin Dosing

Machine Learning for Healthcare • David Page • Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences • University of Wisconsin-Madison

Electronic Medical Record

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

Predictive Personalized Medicine • Genetic, clinical, and environmental data (G + C + E), repeated for thousands of patients → state-of-the-art machine learning → predictive model for disease susceptibility and treatment response • Individual patient's G + C + E → predictive model → personalized treatment • Repeat for hundreds of diseases and treatments

Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data • International Warfarin Pharmacogenetics Consortium (IWPC) • NEJM, February 19, 2009, vol. 360, no. 8

Motivation • “In Milestone, FDA Pushes Genetic Tests Tied to Drug” • Where: front-page article, Wall Street Journal, August 16, 2007 • Why: FDA released new warfarin product labeling with pharmacogenomic dosing recommendations • What: a new pharmacogenetics section, changes to the initial-dosage section, and pharmacogenetic information in the warnings section • http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf

“In Milestone, FDA Pushes Genetic Tests Tied to Drug” • Initial dosing (warfarin package insert): “The dosing of COUMADIN must be individualized according to patient’s sensitivity to the drug as indicated by the PT/INR…. It is recommended that COUMADIN therapy be initiated with a dose of 2 to 5 mg per day with dosage adjustments based on the results of PT/INR determinations. The lower initiation doses should be considered for patients with certain genetic variations in CYP2C9 and VKORC1 enzymes as well as for elderly and/or debilitated patients….” • http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf

Clinicians’ responses to FDA labeling change for warfarin • How, exactly, would I use this information? • Nice science, but prove to me that it’s better than what we already do • i.e., I have to see a randomized trial comparing genotype-guided versus usual dosing • Summer 2009: the NHLBI Clarification of Optimal Anticoagulation through Genetics (COAG) trial (PI: Stephen Kimmel, MD)

Limitations of current warfarin pharmacogenetics information • Clinical utility (or a randomized trial) will require a dosing equation that incorporates both genetic and non-genetic (e.g., demographic) information. • Numerous such equations have been proposed, but: • most are highly geographically confined • none were developed from robust data in Asians, Caucasians, and Africans • Thus, an equation derived from a large, geographically and ethnically diverse population was needed to help ensure global clinical utility.

IWPC: 21 research groups, 4 continents, 9 countries • Asia • Israel, Japan, Korea, Taiwan, Singapore • Europe • Sweden, United Kingdom • North America • USA (11 states: Alabama, California, Florida, Illinois, Missouri, North Carolina, Pennsylvania, Tennessee, Utah, Washington, Wisconsin) • South America • Brazil

Dataset • 5,700 patients treated with warfarin • Demographic characteristics • Primary indication for warfarin treatment • Stable therapeutic dose of warfarin • Treatment INR • Target INR • 5,052 patients with a target INR of 2-3 • Concomitant medications • Grouped by increased or decreased effect on INR • Presence of genotype variants • CYP2C9(*1, *2 and *3) • VKORC1 (one of seven SNPs in linkage disequilibrium) • blinded re-genotyping for quality control

Age, height and weight

Average warfarin doses for stable INR (median – 2.5)

Race, inducers and amiodarone

CYP2C9 and VKORC1 genotypes

Weekly dose by CYP2C9 genotype

CYP2C9 genotype by race

Weekly dose by VKORC1 -1639 genotype

VKORC1 -1639 genotype by race

Modeling of VKORC1 SNPs • Missing values of VKORC1 -1639 G>A (rs9923231) • Imputed based on race and VKORC1 SNP data at 2255C>T (rs2359612), 1173C>T (rs9934438), or 1542G>C (rs8050894) • If the VKORC1 genotype could not be imputed, it was treated as “missing” (a distinct variable) in the model.
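A minimal sketch of this imputation rule. The genotype correspondences (2255C, 1173C and 1542G pairing with -1639G), the choice of which proxies are skipped for which race groups, the race labels, and the function name are all assumptions for illustration; the IWPC supplementary methods give the actual rules.

```python
from typing import Mapping, Optional

# ASSUMED correspondences between each proxy SNP and VKORC1 -1639 G>A
# (rs9923231), based on strong linkage disequilibrium. Illustrative only.
PROXY_TO_1639 = {
    "rs9934438": {"CC": "GG", "CT": "AG", "TT": "AA"},  # 1173C>T
    "rs2359612": {"CC": "GG", "CT": "AG", "TT": "AA"},  # 2255C>T
    "rs8050894": {"GG": "GG", "CG": "AG", "CC": "AA"},  # 1542G>C
}

# HYPOTHETICAL race restriction: these proxies are skipped for these groups,
# on the assumption that their LD with -1639 is too weak there.
RACE_RESTRICTED_PROXIES = {"rs2359612", "rs8050894"}
LOW_LD_RACES = {"Black", "Mixed", "Unknown"}


def impute_vkorc1_1639(
    race: str,
    rs9923231: Optional[str] = None,
    proxies: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the -1639 genotype ('GG', 'AG', 'AA') or 'missing'."""
    if rs9923231 is not None:
        # Directly genotyped: normalize allele order (e.g. 'GA' -> 'AG').
        return "".join(sorted(rs9923231))
    for snp in ("rs9934438", "rs2359612", "rs8050894"):
        genotype = (proxies or {}).get(snp)
        if genotype is None:
            continue
        if snp in RACE_RESTRICTED_PROXIES and race in LOW_LD_RACES:
            continue
        imputed = PROXY_TO_1639[snp].get("".join(sorted(genotype)))
        if imputed is not None:
            return imputed
    # Not imputable: keep "missing" as its own category in the model.
    return "missing"


print(impute_vkorc1_1639("White", proxies={"rs2359612": "TT"}))  # AA
print(impute_vkorc1_1639("Black", proxies={"rs2359612": "TT"}))  # missing
```

Keeping "missing" as a distinct category, rather than dropping the patient, lets the regression learn its own coefficient for unknown genotype.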

Data Analysis Methodology • Derivation cohort • 4,043 patients with a stable dose of warfarin and a target INR of 2-3 • Used for developing dose prediction models • Validation cohort • 1,009 patients (20% of dataset) • Used for testing the final selected model • The analysis group did not have access to the validation set until after the final model was selected
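A minimal sketch of the hold-out design, assuming a simple random split (the slide gives the cohort sizes but not how patients were assigned). The discipline it supports is the one on the slide: model selection uses only the derivation cohort, and the validation cohort is scored once, after the final model is fixed. The patient IDs, seed, and function name are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_cohorts(
    patient_ids: Sequence[str], n_validation: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly hold out n_validation patients; the rest form the derivation cohort."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    validation = shuffled[:n_validation]
    derivation = shuffled[n_validation:]
    return derivation, validation


# Hypothetical IDs; only the counts (4,043 + 1,009 = 5,052) come from the slide.
ids = [f"P{i}" for i in range(5052)]
derivation, validation = split_cohorts(ids, n_validation=1009, seed=42)
print(len(derivation), len(validation))  # 4043 1009
```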

Real-valued prediction methods used • Included, among others: • Support vector regression • Regression trees • Model trees • Multivariate adaptive regression splines • Least-angle regression • Lasso • Logarithmic and square-root dose transformations, as well as direct prediction of dose • Support vector regression and ordinary least-squares linear regression gave the lowest mean absolute error • The selected model predicted the square root of the dose and incorporated both genetic and clinical data
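A minimal sketch of the selected approach: ordinary least squares on the square root of weekly dose, evaluated by mean absolute error on the original mg/week scale. The features and coefficients are synthetic stand-ins, not IWPC data, the helper names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fit_sqrt_dose_ols(X: np.ndarray, weekly_dose: np.ndarray) -> np.ndarray:
    """Least-squares fit of sqrt(weekly dose) on features X (intercept added)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.sqrt(weekly_dose), rcond=None)
    return coef


def predict_weekly_dose(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predict on the square-root scale, then square to return mg/week."""
    design = np.column_stack([np.ones(len(X)), X])
    return (design @ coef) ** 2


def mean_absolute_error(pred: np.ndarray, actual: np.ndarray) -> float:
    """MAE in mg/week, the criterion used to compare methods."""
    return float(np.mean(np.abs(pred - actual)))


# Synthetic data whose square-root dose is exactly linear in three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([6.0, -0.3, 0.2, 0.1])
dose = (np.column_stack([np.ones(200), X]) @ true_coef) ** 2

coef = fit_sqrt_dose_ols(X, dose)
mae = mean_absolute_error(predict_weekly_dose(coef, X), dose)
```

One practical property of modeling the square root: squaring the prediction can never produce a negative dose.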

IWPC pharmacogenetic dosing algorithm • The output of this algorithm must be squared to compute the weekly dose in mg • All references to VKORC1 refer to the genotype for rs9923231
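The algorithm table itself was an image and is not reproduced here. As a sketch, the equation can be written in Python using the coefficients as they are commonly reproduced from the IWPC supplementary appendix; treat every number as a transcription to check against the published table, and the function and constant names as hypothetical. It illustrates how the squared output works and is not a clinical tool.

```python
# Coefficients of the IWPC pharmacogenetic algorithm as commonly reproduced
# from the NEJM 2009 supplementary appendix; verify before any use.
# The linear predictor estimates sqrt(weekly dose in mg).
INTERCEPT = 5.6044
PER_AGE_DECADE = -0.2614   # age in decades, e.g. 50-59 years -> 5
PER_HEIGHT_CM = 0.0087
PER_WEIGHT_KG = 0.0128
VKORC1_1639 = {"G/G": 0.0, "A/G": -0.8677, "A/A": -1.6974, "unknown": -0.4854}
CYP2C9 = {
    "*1/*1": 0.0, "*1/*2": -0.5211, "*1/*3": -0.9357, "*2/*2": -1.0616,
    "*2/*3": -1.9206, "*3/*3": -2.3312, "unknown": -0.2188,
}
RACE = {"White": 0.0, "Asian": -0.1092, "Black": -0.2760, "Missing or Mixed": -0.1032}
ENZYME_INDUCER = 1.1816     # e.g. carbamazepine, phenytoin, rifampin
AMIODARONE = -0.5503


def iwpc_pgx_weekly_dose(age_decades: int, height_cm: float, weight_kg: float,
                         vkorc1: str, cyp2c9: str, race: str,
                         enzyme_inducer: bool = False,
                         amiodarone: bool = False) -> float:
    """Hypothetical helper: evaluate the equation, then square it (mg/week)."""
    sqrt_dose = (INTERCEPT
                 + PER_AGE_DECADE * age_decades
                 + PER_HEIGHT_CM * height_cm
                 + PER_WEIGHT_KG * weight_kg
                 + VKORC1_1639[vkorc1]
                 + CYP2C9[cyp2c9]
                 + RACE[race]
                 + ENZYME_INDUCER * enzyme_inducer
                 + AMIODARONE * amiodarone)
    return sqrt_dose ** 2


dose = iwpc_pgx_weekly_dose(5, 175, 80, "G/G", "*1/*1", "White")
print(round(dose, 1))  # 46.8
```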

IWPC clinical dosing algorithm • The output of this algorithm must be squared to compute the weekly dose in mg
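A matching sketch of the clinical (no-genotype) equation, again using coefficients as commonly reproduced from the IWPC supplementary appendix; verify them against the published table. The function and constant names are hypothetical.

```python
# IWPC clinical (no-genotype) algorithm coefficients as commonly reproduced
# from the NEJM 2009 supplementary appendix; verify before any use.
CLINICAL = {
    "intercept": 4.0376,
    "age_decades": -0.2546,
    "height_cm": 0.0118,
    "weight_kg": 0.0134,
    "enzyme_inducer": 1.2799,   # e.g. carbamazepine, phenytoin, rifampin
    "amiodarone": -0.5695,
}
# White is the reference category, so it contributes nothing.
CLINICAL_RACE = {"White": 0.0, "Asian": -0.6752, "Black": 0.4060, "Missing or Mixed": 0.0443}


def iwpc_clinical_weekly_dose(age_decades: int, height_cm: float, weight_kg: float,
                              race: str, enzyme_inducer: bool = False,
                              amiodarone: bool = False) -> float:
    """Hypothetical helper: linear predictor of sqrt(dose), squared to mg/week."""
    c = CLINICAL
    sqrt_dose = (c["intercept"]
                 + c["age_decades"] * age_decades
                 + c["height_cm"] * height_cm
                 + c["weight_kg"] * weight_kg
                 + CLINICAL_RACE[race]
                 + c["enzyme_inducer"] * enzyme_inducer
                 + c["amiodarone"] * amiodarone)
    return sqrt_dose ** 2


print(round(iwpc_clinical_weekly_dose(5, 175, 80, "White"), 1))                   # 34.8
print(round(iwpc_clinical_weekly_dose(5, 175, 80, "White", amiodarone=True), 1))  # 28.4
```

Because no genotype enters, a 50-year-old White male (175 cm, 80 kg) receives about 34.8 mg/wk whatever his CYP2C9 and VKORC1 genotypes, which is how the clinical algorithm can overestimate or underestimate the dose for carriers of low- or high-dose genotypes.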

Results • Including genotypes for CYP2C9 and VKORC1, in addition to clinical variables, gave estimates significantly closer to the appropriate initial dose of warfarin than a clinical-only or fixed-dose approach • The 46.2% of the population requiring ≤21 mg/wk or ≥49 mg/wk benefit the most • These are the patients for whom an underdose or overdose could have adverse clinical consequences. • Patients requiring an intermediate dose are likely to obtain little benefit from including genotypes

Model comparisons

Warfarin doses predicted for the clinical and PGx algorithms with and without amiodarone (50-year-old White male, 175 cm, 80 kg) • Genotypes can change the recommended dose from >45 mg/wk to <10 mg/wk when all other factors are equal!
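A worked check of this claim, assuming the commonly reproduced IWPC pharmacogenetic coefficients (intercept 5.6044; age in decades −0.2614; height 0.0087 per cm; weight 0.0128 per kg; VKORC1 A/A −1.6974; CYP2C9 *3/*3 −2.3312; amiodarone −0.5503) and assuming the extremes in the figure correspond to the highest- and lowest-dose genotype combinations. White race is the reference category and adds no term; D is the weekly dose in mg.

```latex
\begin{aligned}
\sqrt{D_{\text{G/G},\ *1/*1}} &= 5.6044 - 0.2614(5) + 0.0087(175) + 0.0128(80) = 6.8439,
  & D &\approx 46.8\ \text{mg/wk} \\
\sqrt{D_{\text{A/A},\ *3/*3}} &= 6.8439 - 1.6974 - 2.3312 = 2.8153,
  & D &\approx 7.9\ \text{mg/wk} \\
\sqrt{D_{\text{A/A},\ *3/*3,\ \text{amiodarone}}} &= 2.8153 - 0.5503 = 2.2650,
  & D &\approx 5.1\ \text{mg/wk}
\end{aligned}
```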

Warfarin doses predicted for the clinical and PGx algorithms based on race and genotype (50-year-old male, 175 cm, 80 kg) • Racial differences in the estimated dose are insignificant when genotypes are included. • The clinical algorithm may substantially overestimate or underestimate the dose.

Percentage of patients with dose estimates within 20% of actual dose • Comparison of PGx, clinical, and fixed-dose approaches • 3 dose groups shown (mg/wk): • low (≤21) • intermediate (>21 to <49) • high (≥49) • Fixed dose (35 mg/wk): none of the estimates for the low and high dose groups were within 20% of the actual dose
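A minimal sketch of this accuracy measure, assuming "within 20%" means |predicted − actual| ≤ 0.2 × actual. It also shows why the fixed 35 mg/wk dose can never qualify in the low or high group: 35 is within 20% only when 35/1.2 ≤ actual ≤ 35/0.8, i.e. about 29.2 to 43.75 mg/wk, which lies entirely inside the intermediate group. The actual doses below are made-up values and the function names are hypothetical.

```python
from typing import Sequence


def dose_group(weekly_dose_mg: float) -> str:
    """Dose groups from the slide (mg/week)."""
    if weekly_dose_mg <= 21:
        return "low"
    if weekly_dose_mg >= 49:
        return "high"
    return "intermediate"


def within_20_percent(predicted: float, actual: float) -> bool:
    """True if the prediction is within 20% of the actual stable dose."""
    return abs(predicted - actual) <= 0.2 * actual


def percent_within_20(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Share of patients (in %) whose estimate is within 20% of actual."""
    hits = sum(within_20_percent(p, a) for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Made-up actual doses spanning all three groups, scored against a fixed 35 mg/wk.
actual = [14.0, 21.0, 30.0, 40.0, 49.0, 70.0]
fixed = [35.0] * len(actual)
print([dose_group(a) for a in actual])
print([within_20_percent(p, a) for p, a in zip(fixed, actual)])
print(percent_within_20(fixed, actual))
```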

Limitations of this study • Did not address whether a precise initial dose of warfarin translates into improved clinical end points: • reduction in time needed to achieve a stable therapeutic INR, fewer INRs out of range, reduced incidence of bleeding or thromboembolic events • Did not have sufficient data across the 21 groups to include potentially important factors such as • smoking status, vitamin K intake, alcohol consumption, other genetic factors (e.g., CYP4F2, ApoE, GGCX), environmental factors

New England Journal of Medicine, Feb 2009 • Data available at PharmGKB • www.pharmgkb.org • Accession number: PA162355460

Writing committee: Teri E. Klein, Russ B. Altman, Niclas Eriksson, Brian F. Gage, Stephen E. Kimmel, Ming-Ta M. Lee, Nita A. Limdi, David Page, Dan M. Roden, Michael J. Wagner, Michael D. Caldwell, Julie A. Johnson

Data Contributors:
Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Yuan-Tsong Chen
Chang Gung Memorial Hospital, Chang Gung University, Taiwan, ROC: Ming-Shien Wen
China Medical University, Graduate Institute of Chinese Medical Science, Taichung, Taiwan, ROC: Ming-Ta M. Lee
Hadassah Medical Organization, Israel: Yoseph Caraco, Idit Achache, Simha Blotnick, Mordechai Muszkat
Inje University, Korea: Jae-Gook Shin, Ho-Sook Kim
Instituto Nacional de Câncer, Brazil: Guilherme Suarez-Kurtz, Jamila Alessandra Perini
Instituto Nacional de Cardiologia Laranjeiras, Brazil: Edimilson Silva-Assunção
Intermountain Healthcare, USA: Jeffrey L. Anderson, Benjamin D. Horne, John F. Carlquist
Marshfield Clinic, USA: Michael D. Caldwell, Richard L. Berg, James K. Burmester
National University Hospital, Singapore: Boon Cher Goh, Soo-Chin Lee
Newcastle University, United Kingdom: Farhad Kamali, Elizabeth Sconce, Ann K. Daly
University of Alabama, USA: Nita A. Limdi
University of California, San Francisco, USA: Alan H.B. Wu
University of Florida, USA: Julie A. Johnson, Taimour Y. Langaee, Hua Feng
University of Illinois, Chicago, USA: Larisa Cavallari, Kathryn Momary
University of Liverpool, United Kingdom: Munir Pirmohamed, Andrea Jorgensen, Cheng Hok Toh, Paula Williamson
University of North Carolina, USA: Howard McLeod, James P. Evans, Karen E. Weck
University of Pennsylvania, USA: Stephen E. Kimmel, Colleen Brensinger
University of Tokyo and RIKEN Center for Genomic Medicine, Japan: Yusuke Nakamura, Taisei Mushiroda
University of Washington, USA: David Veenstra, Lisa Meckley, Mark J. Rieder, Allan E. Rettie
Uppsala University, Sweden: Mia Wadelius, Niclas Eriksson, Håkan Melhus
Vanderbilt University, USA: C. Michael Stein, Dan M. Roden, Ute Schwartz, Daniel Kurnik
Washington University in St. Louis, USA: Brian F. Gage, Elena Deych, Petra Lenzini, Charles Eby
Wellcome Trust Sanger Institute, United Kingdom: Leslie Y. Chen, Panos Deloukas

IWPC Authors

Statistical Analysis:
University of Alabama, USA: Nita A. Limdi
Marshfield Clinic, USA: Michael D. Caldwell
North Carolina State University, USA: Alison Motsinger-Reif
Stanford University, USA: Russ B. Altman, Hersh Sagreiya, Teri E. Klein, Balaji S. Srinivasan
Uppsala University, Uppsala Clinical Research Center, Sweden: Niclas Eriksson
University of California, San Francisco, USA: Alan H.B. Wu
University of North Carolina, USA: Michael J. Wagner
University of Florida, USA: Julie A. Johnson
University of Pennsylvania, USA: Stephen E. Kimmel
University of Wisconsin-Madison, USA: David Page, Eric Lantz, Tim Chang
Vanderbilt University, USA: Marylyn Ritchie
Washington University in St. Louis, USA: Brian F. Gage, Elena Deych

Genotyping QC of IWPC Samples:
Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Liang-Suei Lu

Genotype and Phenotype QC:
Inje University, Korea: Jae-Gook Shin
Marshfield Clinic, USA: Michael D. Caldwell
Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan
University of Alabama, USA: Nita A. Limdi
University of Florida, USA: Julie A. Johnson
University of Pennsylvania, USA: Stephen E. Kimmel
University of North Carolina, USA: Michael J. Wagner
University of Wisconsin-Madison, USA: David Page
Washington University in St. Louis, USA: Brian F. Gage
Vanderbilt University, USA: Marylyn Ritchie

Data Curation:
Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan
University of North Carolina, USA: Michael J. Wagner
Washington University in St. Louis, USA: Elena Deych

Application: Mammography • Provide decision support for radiologists • Performance varies with training and experience; detecting 90% of cancers comes with a high false-positive rate • Experts have higher cancer detection rates and fewer benign biopsies • Shortage of experts

Bayes Net for Mammography • Kahn, Roberts, Wang, Jenks, Haddawy (1995) • Kahn, Roberts, Shaffer, Haddawy (1997) • Burnside, Rubin, Shachter (2000) • Note: this is not CAD (computer-assisted diagnosis), which circles abnormalities in an image; it is based on data entered by radiologists into the National Mammography Database schema

Bayes net structure (figure): class node Benign v. Malignant; other nodes: Age, FHx, HRT, Breast Density, Skin Lesion, LN, Architectural Distortion, Asymmetric Density, Tubular Density, Mass Shape, Mass Size, Mass Margins, Mass Density, Mass Stability, Mass P/A/O, Milk of Calcium, and calcification types Ca++ Lucent Centered, Ca++ Dermal, Ca++ Round, Ca++ Dystrophic, Ca++ Popcorn, Ca++ Fine/Linear, Ca++ Eggshell, Ca++ Pleomorphic, Ca++ Punctate, Ca++ Amorphous, Ca++ Rod-like

Mammography Database

Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
P1       1            5/02  No                         …  0.03       RU4  B
P1       2            5/04  Yes                        …  0.05       RU4  M
P1       3            5/04  No                         …  0.04       LL3  B
P2       4            6/00  No                         …  0.02       RL2  B
…        …            …     …                          …  …          …    …

Level 1: Parameters • Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size • P(Benign) = .99 • P(Calc Fine Linear = Yes | Benign) = .01; P(Calc Fine Linear = Yes | Malignant) = .55 • P(Mass Size > 5 | Benign) = .33; P(Mass Size > 5 | Malignant) = .42
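A minimal sketch of how these Level 1 parameters combine into a diagnosis, assuming the two findings are conditionally independent given Benign v. Malignant (a naive Bayes reading of the three-node network) and P(Malignant) = 1 − P(Benign). The probabilities are the slide's; the function name is hypothetical.

```python
# Level 1 parameters as given on the slide.
P_CLASS = {"benign": 0.99, "malignant": 0.01}
P_CALC_FINE_LINEAR_YES = {"benign": 0.01, "malignant": 0.55}
P_MASS_SIZE_GT_5 = {"benign": 0.33, "malignant": 0.42}


def posterior_malignant(calc_fine_linear: bool, size_gt_5: bool) -> float:
    """P(Malignant | findings), assuming the findings are independent given the class."""
    joint = {}
    for cls, prior in P_CLASS.items():
        p_calc = P_CALC_FINE_LINEAR_YES[cls]
        p_size = P_MASS_SIZE_GT_5[cls]
        joint[cls] = (prior
                      * (p_calc if calc_fine_linear else 1 - p_calc)
                      * (p_size if size_gt_5 else 1 - p_size))
    return joint["malignant"] / sum(joint.values())


print(round(posterior_malignant(True, True), 3))  # 0.414
```

Even from a 1% prior, a fine/linear calcification with a mass larger than 5 raises the malignancy probability to about 41%, mostly because the calcification likelihood ratio is 0.55 / 0.01 = 55.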

Data • Structured data from actual practice • National Mammography Database • Standard for reporting all abnormalities • Our dataset contains • 435 malignancies • 65,365 benign abnormalities • Link to biopsy results • Obtain disease diagnosis – our ground truth

Hypotheses • Learn relationships that are useful to radiologists • Improve performance by moving up the learning hierarchy

Results (Radiology, 2009) • The trained (Level 2, TAN) Bayesian network model achieved an AUC of 0.966, significantly better than the radiologists’ AUC of 0.940 (P = 0.005) • The trained BN demonstrated significantly better sensitivity than the radiologists (89.5% vs. 82.3%, P = 0.009) at a specificity of 90% • The trained BN demonstrated significantly better specificity than the radiologists (93.4% vs. 86.5%, P = 0.007) at a sensitivity of 85%
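These comparisons are read off the ROC curve, either as the area under it or at a fixed operating point. A small sketch of both computations from model scores; the scores are toy numbers, not study data, the helper names are hypothetical, and the paper's significance tests are not reproduced.

```python
from typing import Sequence


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC: probability that a malignant case scores above a benign one (ties = 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(pos: Sequence[float], neg: Sequence[float],
                               min_specificity: float) -> float:
    """Best sensitivity among thresholds whose specificity >= min_specificity.

    A case is called malignant when its score is strictly above the threshold."""
    best = 0.0
    for t in sorted(set(pos) | set(neg)):
        specificity = sum(n <= t for n in neg) / len(neg)
        if specificity >= min_specificity:
            best = max(best, sum(p > t for p in pos) / len(pos))
    return best


# Toy scores, not data from the study.
malignant = [0.9, 0.8, 0.7, 0.3]
benign = [0.1, 0.2, 0.4, 0.5, 0.6]
print(auc(malignant, benign))                              # 0.85
print(sensitivity_at_specificity(malignant, benign, 0.8))  # 0.75
```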

ROC: Level 2 (TAN) vs. Level 1

Precision-Recall Curves


Statistical Relational Learning • Learn a probabilistic model, but don’t assume i.i.d. data: there may be relevant data in other rows or even other tables • Database schema: defines the set of features

SRL Aggregates Information from Related Rows or Tables • Extend probabilistic models to relational databases • Probabilistic Relational Models (Friedman et al. 1999, Getoor et al. 2001) • Tricky issue: one-to-many relationships • Approach: use aggregation • PRMs cannot capture all relevant concepts

Aggregate Illustration • Applied to the mammography table (Patient, Abnormality, Date, Calcification Fine/Linear, …, Mass Size, Loc, Benign/Malignant) • Aggregation Function: Min, Max, Average, etc.

New Schema

Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Avg Size this Date  Loc  Benign/Malignant
P1       1            5/02  No                         …  0.03       0.03                RU4  B
P1       2            5/04  Yes                        …  0.05       0.045               RU4  M
P1       3            5/04  No                         …  0.04       0.045               LL3  B
P2       4            6/00  No                         …  0.02       0.02                RL2  B
…        …            …     …                          …  …          …                   …    …
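A minimal sketch of the aggregation step that produces the new column, assuming "this date" means the same patient's abnormalities on the same exam date (the four example rows are consistent with that reading). The aggregation function is a parameter, so Min or Max can replace Average; all names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, int, str, float]  # (patient, abnormality, date, mass size)

# The four example rows from the mammography table.
ROWS: List[Row] = [
    ("P1", 1, "5/02", 0.03),
    ("P1", 2, "5/04", 0.05),
    ("P1", 3, "5/04", 0.04),
    ("P2", 4, "6/00", 0.02),
]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def add_size_aggregate(rows: Sequence[Row],
                       agg: Callable[[Sequence[float]], float] = mean) -> List[tuple]:
    """Append agg(mass sizes) over all rows sharing the same (patient, date)."""
    groups = defaultdict(list)
    for patient, _, date, size in rows:
        groups[(patient, date)].append(size)
    return [row + (agg(groups[(row[0], row[2])]),) for row in rows]


for row in add_size_aggregate(ROWS):
    print(row[:2], round(row[-1], 3))
```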

Level 3: Aggregates • Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size, Avg Size this Date • Note: Learn parameters for each node

Database Notion of View • New tables or fields defined in terms of existing tables and fields are known as views • A view corresponds to an alteration in the database schema • Goal: automate the learning of views
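A minimal sketch of a view, using SQLite through Python's standard sqlite3 module. The view is hypothetical: it defines a new field, the number of the same patient's earlier abnormalities at the same location, purely from existing rows. Dates such as 5/02 are assumed to mean May 2002 and are stored as YYYY-MM so they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE abnormality (
        patient TEXT, abnormality INTEGER, exam_month TEXT,  -- 'YYYY-MM'
        calc_fine_linear TEXT, mass_size REAL, loc TEXT, label TEXT
    );
    INSERT INTO abnormality VALUES
        ('P1', 1, '2002-05', 'No',  0.03, 'RU4', 'B'),
        ('P1', 2, '2004-05', 'Yes', 0.05, 'RU4', 'M'),
        ('P1', 3, '2004-05', 'No',  0.04, 'LL3', 'B'),
        ('P2', 4, '2000-06', 'No',  0.02, 'RL2', 'B');

    -- Hypothetical view: a new field defined only from existing rows.
    CREATE VIEW abnormality_with_history AS
    SELECT a.*,
           (SELECT COUNT(*) FROM abnormality b
             WHERE b.patient = a.patient AND b.loc = a.loc
               AND b.exam_month < a.exam_month) AS prior_same_loc
    FROM abnormality a;
    """
)
rows = conn.execute(
    "SELECT abnormality, prior_same_loc FROM abnormality_with_history ORDER BY abnormality"
).fetchall()
print(rows)  # [(1, 0), (2, 1), (3, 0), (4, 0)]
```

In these example rows only abnormality 2, the malignant one, has an earlier abnormality at the same location (RU4); searching for views like this that improve prediction is the goal stated on the slide.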

Possible View • Illustrated on the mammography table (Patient, Abnormality, Date, Calcification Fine/Linear, …, Mass Size, Loc, Benign/Malignant)
