
Electronic Medical Record

Machine Learning for Healthcare. David Page, Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences, University of Wisconsin-Madison.


Presentation Transcript


  1. Machine Learning for Healthcare • David Page • Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences • University of Wisconsin-Madison

  2. Electronic Medical Record

     PatientID  Gender  Birthdate
     P1         M       3/22/63

     PatientID  Date    Physician  Symptoms      Diagnosis
     P1         1/1/01  Smith      palpitations  hypoglycemic
     P1         2/1/03  Jones      fever, aches  influenza

     PatientID  Date    Lab Test       Result
     P1         1/1/01  blood glucose  42
     P1         1/9/01  blood glucose  45

     PatientID  SNP1  SNP2  …  SNP500K
     P1         AA    AB    …  BB
     P2         AB    BB    …  AA

     PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
     P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

  3. Predictive Personalized Medicine • Genetic, Clinical, & Environmental Data • State-of-the-Art Machine Learning • Predictive Model for Disease Susceptibility & Treatment Response • Individual Patient G + C + E • Personalized Treatment • Repeat for thousands of patients • Repeat for hundreds of diseases and treatments

  4. Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data • International Warfarin Pharmacogenetics Consortium (IWPC) • NEJM, February 19, 2009, vol. 360, no. 8

  5. Motivation • “In Milestone, FDA Pushes Genetic Tests Tied to Drug” • Where: Front-page article, Wall Street Journal, August 16, 2007 • Why: FDA released new warfarin product labeling with pharmacogenomics dosing recommendations • What: New pharmacogenetics section and changes in initial dosage section, with pharmacogenetics in the warnings section • http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf

  6. “In Milestone, FDA Pushes Genetic Tests Tied to Drug” • Initial dosing (warfarin package insert): “The dosing of COUMADIN must be individualized according to patient’s sensitivity to the drug as indicated by the PT/INR… It is recommended that COUMADIN therapy be initiated with a dose of 2 to 5 mg per day with dosage adjustments based on the results of PT/INR determinations. The lower initiation doses should be considered for patients with certain genetic variations in CYP2C9 and VKORC1 enzymes as well as for elderly and/or debilitated patients…” • http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf

  7. Clinicians’ responses to FDA labeling change for warfarin • How, exactly, would I use this information? • Nice science, but prove to me that it’s better than what we already do • i.e., I have to see a randomized trial comparing genotype-guided versus usual dosing • Summer 2009: the NHLBI Clarification of Optimal Anticoagulation through Genetics (COAG) trial (PI: Stephen Kimmel, MD)

  8. Limitations of current warfarin pharmacogenetics information • Clinical utility (or a randomized trial) will require a dosing equation that incorporates genetic and non-genetic, demographic information. • Numerous such equations have been proposed, but: • most are highly geographically confined • none were developed from robust data in Asians, Caucasians, and Africans • Thus, an equation derived from a large, geographically and ethnically diverse population was needed to help ensure global clinical utility.

  9. IWPC: 21 research groups, 4 continents, and 9 countries • Asia • Israel, Japan, Korea, Taiwan, Singapore • Europe • Sweden, United Kingdom • North America • USA (11 states: Alabama, California, Florida, Illinois, Missouri, North Carolina, Pennsylvania, Tennessee, Utah, Washington, Wisconsin) • South America • Brazil

  10. Dataset • 5,700 patients treated with warfarin • Demographic characteristics • Primary indication for warfarin treatment • Stable therapeutic dose of warfarin • Treatment INR • Target INR • 5,052 patients with a target INR of 2-3 • Concomitant medications • Grouped by increased or decreased effect on INR • Presence of genotype variants • CYP2C9 (*1, *2, and *3) • VKORC1 (one of seven SNPs in linkage disequilibrium) • Blinded re-genotyping for quality control

  11. Age, height and weight

  12. Average warfarin doses for stable INR (median – 2.5)

  13. Race, inducers and amiodarone

  14. CYP2C9 and VKORC1 genotypes

  15. Weekly dose by CYP2C9 genotype

  16. CYP2C9 genotype by race

  17. Weekly dose by VKORC1 -1639 genotype

  18. VKORC1 -1639 genotype by race

  19. Modeling of VKORC1 SNPs • Missing values of VKORC1 -1639 G>A (rs9923231) • Imputed based on race and VKORC1 SNP data at 2255 C>T (rs2359612), 1173 C>T (rs9934438), or 1542 G>C (rs8050894) • If the VKORC1 genotype could not be imputed, it was treated as “missing” (a distinct variable) in the model.
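
Treating an unimputable genotype as a distinct variable can be sketched as an indicator encoding with an explicit "unknown" level. This is only an illustration: the function name, the variable names, and the choice of G/G as the reference level are assumptions, not details taken from the slides.

```python
# Indicator (one-hot) encoding of the VKORC1 -1639 G>A genotype.
# Assumption: G/G is the reference level; all names here are illustrative.
VKORC1_LEVELS = ("A/G", "A/A", "unknown")


def encode_vkorc1(genotype):
    """Return 0/1 indicators; None or unrecognized input maps to 'unknown'."""
    known = {"G/G", "A/G", "A/A"}
    level = genotype if genotype in known else "unknown"
    return {f"vkorc1_{name}": int(level == name) for name in VKORC1_LEVELS}


print(encode_vkorc1("A/G"))   # heterozygous carrier
print(encode_vkorc1("G/G"))   # reference level: all indicators are 0
print(encode_vkorc1(None))    # genotype that could not be imputed
```

With this encoding a regression model can learn a separate coefficient for patients whose genotype is unknown, rather than dropping them.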

  20. Data Analysis Methodology • Derivation Cohort • 4,043 patients with a stable dose of warfarin and a target INR of 2-3 • Used for developing dose prediction models • Validation Cohort • 1,009 patients (20% of dataset) • Used for testing final selected model • Analysis group did not have access to validation set until after the final model was selected
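
A hold-out split of this kind can be sketched as follows. The slide does not say how patients were assigned to the two cohorts, so the simple seeded random shuffle below is an assumption, and `holdout_split` is a hypothetical helper. A plain 20% cut of 5,052 patients gives slightly different counts from the 4,043 / 1,009 reported on the slide.

```python
import random


def holdout_split(patient_ids, validation_fraction=0.2, seed=0):
    """Shuffle once, then set aside a fixed fraction as the validation cohort."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_validation = round(len(ids) * validation_fraction)
    return ids[n_validation:], ids[:n_validation]


derivation, validation = holdout_split(range(5052))
print(len(derivation), len(validation))
```

The validation list would then be stored away and left untouched until the final model has been chosen on the derivation cohort.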

  21. Real-valued prediction methods used • Included, among others • Support vector regression • Regression trees • Model trees • Multivariate adaptive regression splines • Least-angle regression • Lasso • Logarithmic and square-root transformations • Direct prediction of dose • Support vector regression and ordinary least-squares linear regression gave the lowest mean absolute error • Predicted the square root of the dose • Incorporated both genetic and clinical data
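
As a minimal sketch of the winning recipe (least-squares regression on the square root of the dose, scored by mean absolute error), the example below fits a single-predictor line. The data are hypothetical toy values and the single predictor (age in decades) is a simplification; the IWPC models used many clinical and genetic variables.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: age in decades and stable weekly dose in mg.
age_decades = [3, 4, 5, 6, 7, 8]
weekly_dose = [49.0, 42.0, 36.0, 30.0, 25.0, 20.0]

# Fit on the square root of the dose rather than on the dose itself.
intercept, slope = fit_line(age_decades, [math.sqrt(d) for d in weekly_dose])

# Predictions are on the square-root scale and must be squared to get mg/week.
predicted = [(intercept + slope * a) ** 2 for a in age_decades]
mae = mean_absolute_error(weekly_dose, predicted)
print(round(mae, 2))
```

Note that the error is measured on the original mg/week scale after squaring, which is the scale a clinician cares about.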

  22. IWPC pharmacogenetic dosing algorithm • **The output of this algorithm must be squared to compute weekly dose in mg • ^All references to VKORC1 refer to genotype for rs9923231

  23. IWPC clinical dosing algorithm • **The output of this algorithm must be squared to compute weekly dose in mg

  24. Results • Including CYP2C9 and VKORC1 genotypes in addition to clinical variables yields estimates significantly closer to the appropriate initial dose of warfarin than a clinical-only or fixed-dose approach • The 46.2% of the population requiring ≤21 mg/wk or ≥49 mg/wk benefit the most • These are the patients for whom an underdose or overdose could have adverse clinical consequences • Patients requiring an intermediate dose are likely to obtain little benefit from including genotypes

  25. Model comparisons

  26. Warfarin doses predicted for the clinical and PGx algorithms with and without amiodarone • 50 yr old White Male, 175 cm, 80 kg • Genotypes can change the recommended dose from >45 mg/wk to <10 mg/wk when all other factors are equal!

  27. Warfarin doses predicted for the clinical and PGx algorithms based on race and genotype • 50 yr old Male, 175 cm, 80 kg • Racial differences in the estimated dose are insignificant when genotypes are included. • The clinical algorithm may substantially overestimate or underestimate the dose.

  28. % Patients with dose estimates within 20% of actual dose • Comparison of PGx, clinical, and fixed-dose approaches • 3 dose groups shown (mg/wk) • low (≤21) • intermediate (>21 to <49) • high (≥49) • Fixed dose (35 mg/wk): none of the estimates for the low and high dose groups were within 20% of actual dose
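
The "within 20% of actual dose" metric can be sketched as below. The arithmetic also explains the fixed-dose result: 35 mg/wk is within 20% of an actual dose d only when 35/1.2 ≤ d ≤ 35/0.8, roughly 29.2 to 43.75 mg/wk, so it can never be a hit for the low (≤21) or high (≥49) groups. The patient doses in the example are hypothetical.

```python
def within_20_percent(actual, estimate):
    """True when the estimate is within 20% of the actual dose."""
    return abs(estimate - actual) <= 0.20 * actual


def dose_group(actual):
    """Dose groups from the slide, in mg/week."""
    if actual <= 21:
        return "low"
    if actual >= 49:
        return "high"
    return "intermediate"


def percent_within(actual_doses, estimates):
    """Percentage of patients per dose group whose estimate is within 20%."""
    hits, totals = {}, {}
    for actual, estimate in zip(actual_doses, estimates):
        group = dose_group(actual)
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(within_20_percent(actual, estimate))
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


# Hypothetical actual doses (mg/week), compared with a fixed 35 mg/week.
actual = [14, 21, 28, 35, 42, 49, 63]
fixed = [35] * len(actual)
result = percent_within(actual, fixed)
print(result)
```

The same `percent_within` call can be repeated with clinical or pharmacogenetic estimates in place of `fixed` to compare the three approaches group by group.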

  29. Limitations of this study • Did not address whether a precise initial dose of warfarin translates into improved clinical end points: reduction in time needed to achieve a stable therapeutic INR, fewer INRs out of range, reduced incidence of bleeding or thromboembolic events • Did not have sufficient data across the 21 groups to include potentially important factors such as smoking status, vitamin K intake, alcohol consumption, other genetic factors (e.g., CYP4F2, ApoE, GGCX), and environmental factors

  30. New England Journal of Medicine, Feb 2009 Data available at PharmGKB • www.pharmgkb.org • Accession number: PA162355460

  31. IWPC Authors

      Writing committee: Teri E. Klein, Russ B. Altman, Niclas Eriksson, Brian F. Gage, Stephen E. Kimmel, Ming-Ta M. Lee, Nita A. Limdi, David Page, Dan M. Roden, Michael J. Wagner, Michael D. Caldwell, Julie A. Johnson

      Data Contributors:
      • Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Yuan-Tsong Chen
      • Chang Gung Memorial Hospital, Chang Gung University, Taiwan, ROC: Ming-Shien Wen
      • China Medical University, Graduate Institute of Chinese Medical Science, Taichung, Taiwan, ROC: Ming-Ta M. Lee
      • Hadassah Medical Organization, Israel: Yoseph Caraco, Idit Achache, Simha Blotnick, Mordechai Muszkat
      • Inje University, Korea: Jae-Gook Shin, Ho-Sook Kim
      • Instituto Nacional de Câncer, Brazil: Guilherme Suarez-Kurtz, Jamila Alessandra Perini
      • Instituto Nacional de Cardiologia Laranjeiras, Brazil: Edimilson Silva-Assunção
      • Intermountain Healthcare, USA: Jeffrey L. Anderson, Benjamin D. Horne, John F. Carlquist
      • Marshfield Clinic, USA: Michael D. Caldwell, Richard L. Berg, James K. Burmester
      • National University Hospital, Singapore: Boon Cher Goh, Soo-Chin Lee
      • Newcastle University, United Kingdom: Farhad Kamali, Elizabeth Sconce, Ann K. Daly
      • University of Alabama, USA: Nita A. Limdi
      • University of California, San Francisco, USA: Alan H.B. Wu
      • University of Florida, USA: Julie A. Johnson, Taimour Y. Langaee, Hua Feng
      • University of Illinois, Chicago, USA: Larisa Cavallari, Kathryn Momary
      • University of Liverpool, United Kingdom: Munir Pirmohamed, Andrea Jorgensen, Cheng Hok Toh, Paula Williamson
      • University of North Carolina, USA: Howard McLeod, James P. Evans, Karen E. Weck
      • University of Pennsylvania, USA: Stephen E. Kimmel, Colleen Brensinger
      • University of Tokyo and RIKEN Center for Genomic Medicine, Japan: Yusuke Nakamura, Taisei Mushiroda
      • University of Washington, USA: David Veenstra, Lisa Meckley, Mark J. Rieder, Allan E. Rettie
      • Uppsala University, Sweden: Mia Wadelius, Niclas Eriksson, Håkan Melhus
      • Vanderbilt University, USA: C. Michael Stein, Dan M. Roden, Ute Schwartz, Daniel Kurnik
      • Washington University in St. Louis, USA: Brian F. Gage, Elena Deych, Petra Lenzini, Charles Eby
      • Wellcome Trust Sanger Institute, United Kingdom: Leslie Y. Chen, Panos Deloukas

      Statistical Analysis:
      • University of Alabama, USA: Nita A. Limdi
      • Marshfield Clinic, USA: Michael D. Caldwell
      • North Carolina State University, USA: Alison Motsinger-Reif
      • Stanford University, USA: Russ B. Altman, Hersh Sagrieya, Teri E. Klein, Balaji S. Srinivasan
      • Uppsala University, Uppsala Clinical Research Center, Sweden: Niclas Eriksson
      • University of California, San Francisco, USA: Alan H.B. Wu
      • University of North Carolina, USA: Michael J. Wagner
      • University of Florida, USA: Julie A. Johnson
      • University of Pennsylvania, USA: Stephen E. Kimmel
      • University of Wisconsin-Madison, USA: David Page, Eric Lantz, Tim Chang
      • Vanderbilt University, USA: Marylyn Ritchie
      • Washington University in St. Louis, USA: Brian F. Gage, Elena Deych

      Genotyping QC of IWPC Samples:
      • Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Liang-Suei Lu

      Genotype and Phenotype QC:
      • Inje University, Korea: Jae-Gook Shin
      • Marshfield Clinic, USA: Michael D. Caldwell
      • Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan
      • University of Alabama, USA: Nita A. Limdi
      • University of Florida, USA: Julie A. Johnson
      • University of Pennsylvania, USA: Stephen E. Kimmel
      • University of North Carolina, USA: Michael J. Wagner
      • University of Wisconsin-Madison, USA: David Page
      • Washington University in St. Louis, USA: Brian F. Gage
      • Vanderbilt University, USA: Marylyn Ritchie

      Data Curation:
      • Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan
      • University of North Carolina, USA: Michael J. Wagner
      • Washington University in St. Louis, USA: Elena Deych

  32. Application: Mammography • Provide decision support for radiologists • Variability due to differences in training and experience… to get 90% of cancers, have high false positive rate • Experts have higher cancer detection and fewer benign biopsies • Shortage of experts

  33. Bayes Net for Mammography • Kahn, Roberts, Wang, Jenks, Haddawy (1995) • Kahn, Roberts, Shaffer, Haddawy (1997) • Burnside, Rubin, Shachter (2000) • Note: not CAD (computer-assisted diagnosis), which circles abnormalities in an image… this is based on data entered into National Mammography Database schema by radiologists

  34. Bayes net figure; node labels: Benign v. Malignant • Age • HRT • FHx • LN • Breast Density • Mass Stability • Mass Margins • Mass Density • Mass Shape • Mass Size • Mass P/A/O • Skin Lesion • Tubular Density • Architectural Distortion • Asymmetric Density • Milk of Calcium • Ca++ Lucent Centered • Ca++ Dermal • Ca++ Round • Ca++ Dystrophic • Ca++ Popcorn • Ca++ Fine/Linear • Ca++ Eggshell • Ca++ Pleomorphic • Ca++ Punctate • Ca++ Amorphous • Ca++ Rod-like

  35. Mammography Database

      Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
      P1       1            5/02  No                         …  0.03       RU4  B
      P1       2            5/04  Yes                        …  0.05       RU4  M
      P1       3            5/04  No                         …  0.04       LL3  B
      P2       4            6/00  No                         …  0.02       RL2  B
      …        …            …     …                          …  …          …    …

  36. Level 1: Parameters • Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size • Each parameter is shown as unknown (??) and then filled in: • P(Benign) = .99 • P(Yes | Benign) = .01 • P(Yes | Malignant) = .55 • P(size > 5 | Benign) = .33 • P(size > 5 | Malignant) = .42
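
Learning parameters for a fixed structure amounts to estimating each probability from counts. The sketch below uses plain relative frequencies with no smoothing, which is an assumption (the slides do not say which estimator was used), and the four records are hypothetical, not the presentation's data.

```python
def learn_naive_bayes(records, class_key, feature_keys):
    """Estimate P(class) and P(feature=True | class) by relative frequency."""
    n = len(records)
    classes = sorted({r[class_key] for r in records})
    prior, conditional = {}, {}
    for c in classes:
        rows = [r for r in records if r[class_key] == c]
        prior[c] = len(rows) / n
        conditional[c] = {
            f: sum(1 for r in rows if r[f]) / len(rows) for f in feature_keys
        }
    return prior, conditional


# Hypothetical toy records, one per abnormality.
records = [
    {"label": "benign", "calc_fine_linear": False, "size_gt_5": False},
    {"label": "benign", "calc_fine_linear": False, "size_gt_5": True},
    {"label": "benign", "calc_fine_linear": False, "size_gt_5": False},
    {"label": "malignant", "calc_fine_linear": True, "size_gt_5": True},
]
prior, conditional = learn_naive_bayes(
    records, "label", ["calc_fine_linear", "size_gt_5"]
)
print(prior)
print(conditional)
```

On real data with rare classes, some form of smoothing is usually added so that no conditional probability is estimated as exactly 0 or 1.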

  37. Level 2: Structure + Parameters • Nodes: Benign v. Malignant, Calc Fine Linear, Mass Size • P(Benign) = .99 • P(Yes | Benign) = .01 • P(Yes | Malignant) = .55 • P(Yes) = .02 • P(size > 5) = .1 • P(size > 5 | Benign) = .33 • P(size > 5 | Malignant) = .42 • P(size > 5 | Benign ^ Yes) = .4 • P(size > 5 | Malignant ^ Yes) = .6 • P(size > 5 | Benign ^ No) = .05 • P(size > 5 | Malignant ^ No) = .2
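
To see what the extra structure changes, the probabilities on the Level 1 and Level 2 slides can be plugged into Bayes' rule for an abnormality with Calc Fine Linear = Yes and mass size > 5. This is a worked illustration only; it assumes that in the Level 2 structure Mass Size depends on both the class and Calc Fine Linear, which is one reading of the conditional probabilities listed on the slide.

```python
def posterior_malignant(prior_benign, like_benign, like_malignant):
    """P(Malignant | evidence) from the prior and the two class likelihoods."""
    joint_benign = prior_benign * like_benign
    joint_malignant = (1.0 - prior_benign) * like_malignant
    return joint_malignant / (joint_benign + joint_malignant)


# Evidence: Calc Fine Linear = Yes and mass size > 5.
# Level 1: the two features are independent given the class.
level1 = posterior_malignant(0.99, 0.01 * 0.33, 0.55 * 0.42)

# Level 2 (assumed structure): mass size also depends on Calc Fine Linear.
level2 = posterior_malignant(0.99, 0.01 * 0.4, 0.55 * 0.6)

print(round(level1, 3), round(level2, 3))  # 0.414 0.455
```

Under these numbers the richer structure raises the posterior probability of malignancy from about 0.41 to about 0.45 for the same findings.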

  38. Data • Structured data from actual practice • National Mammography Database • Standard for reporting all abnormalities • Our dataset contains • 435 malignancies • 65,365 benign abnormalities • Link to biopsy results • Obtain disease diagnosis – our ground truth

  39. Hypotheses • Learn relationships that are useful to radiologist • Improve by moving up learning hierarchy

  40. Results (Radiology, 2009) • Trained (Level 2, TAN) Bayesian network model achieved an AUC of 0.966 which was significantly better than the radiologists’ AUC of 0.940 (P = 0.005) • Trained BN demonstrated significantly better sensitivity than the radiologist (89.5% vs. 82.3%—P = 0.009) at a specificity of 90% • Trained BN demonstrated significantly better specificity than the radiologist (93.4% versus 86.5%—P = 0.007) at a sensitivity of 85%
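
The two kinds of figures reported here, AUC and sensitivity at a fixed specificity, can be computed from model scores as sketched below. The scores are hypothetical, and the pairwise (Mann-Whitney) form of AUC is used for clarity rather than speed; the slides do not state how the published values were calculated.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_at_specificity(scores_pos, scores_neg, specificity):
    """Highest sensitivity over thresholds whose specificity meets the target."""
    best = 0.0
    for t in sorted(set(scores_pos) | set(scores_neg)):
        # A case is called positive when its score is at least the threshold.
        spec = sum(1 for n in scores_neg if n < t) / len(scores_neg)
        sens = sum(1 for p in scores_pos if p >= t) / len(scores_pos)
        if spec >= specificity:
            best = max(best, sens)
    return best


# Hypothetical model scores (higher means more suspicious).
malignant = [0.9, 0.8, 0.6, 0.3]
benign = [0.7, 0.4, 0.2, 0.1, 0.05]
print(auc(malignant, benign))                              # 0.85
print(sensitivity_at_specificity(malignant, benign, 0.8))  # 0.75
```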

  41. ROC: Level 2 (TAN) vs. Level 1

  42. Precision-Recall Curves
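
A short calculation suggests why precision-recall curves are informative for this task. It combines the class counts from the Data slide (435 malignancies, 65,365 benign abnormalities) with the trained network's operating point from the Results slide (89.5% sensitivity at 90% specificity). Applying that operating point to the full counts is an assumption made for illustration.

```python
def precision(sensitivity, specificity, n_positive, n_negative):
    """Fraction of positive calls that are true positives."""
    true_positives = sensitivity * n_positive
    false_positives = (1.0 - specificity) * n_negative
    return true_positives / (true_positives + false_positives)


# Counts from the Data slide; operating point from the Results slide.
value = precision(0.895, 0.90, 435, 65365)
print(round(value, 3))  # 0.056
```

Even at a point that looks strong on a ROC curve, only about 1 in 18 flagged abnormalities would be malignant under these assumptions, because benign findings outnumber malignancies by roughly 150 to 1.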

  43. Mammography Database

      Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
      P1       1            5/02  No                         …  0.03       RU4  B
      P1       2            5/04  Yes                        …  0.05       RU4  M
      P1       3            5/04  No                         …  0.04       LL3  B
      P2       4            6/00  No                         …  0.02       RL2  B
      …        …            …     …                          …  …          …    …

  44. Statistical Relational Learning • Learn probabilistic model, but don’t assume iid data: there may be relevant data in other rows or even other tables • Database schema: defines set of features

  45. SRL Aggregates Information from Related Rows or Tables • Extend probabilistic models to relational databases • Probabilistic Relational Models (Friedman et al. 1999, Getoor et al. 2001) • Tricky issue: one-to-many relationships • Approach: use aggregation • PRMs cannot capture all relevant concepts

  46. Aggregate Illustration • Aggregation Function: Min, Max, Average, etc.

      Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
      P1       1            5/02  No                         …  0.03       RU4  B
      P1       2            5/04  Yes                        …  0.05       RU4  M
      P1       3            5/04  No                         …  0.04       LL3  B
      P2       4            6/00  No                         …  0.02       RL2  B
      …        …            …     …                          …  …          …    …

  47. New Schema (adds the field "Avg Size this date")

      Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Avg Size this date  Loc  Benign/Malignant
      P1       1            5/02  No                         …  0.03       0.03                RU4  B
      P1       2            5/04  Yes                        …  0.05       0.045               RU4  M
      P1       3            5/04  No                         …  0.04       0.045               LL3  B
      P2       4            6/00  No                         …  0.02       0.02                RL2  B
      …        …            …     …                          …  …          …                   …    …
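
The new field is an average over all abnormalities that share a patient and a date. A minimal sketch of that aggregation, using the four rows shown in the table (the tuple layout and the function name are illustrative choices):

```python
from collections import defaultdict

# Rows from the slide's table: (patient, abnormality, date, mass_size).
rows = [
    ("P1", 1, "5/02", 0.03),
    ("P1", 2, "5/04", 0.05),
    ("P1", 3, "5/04", 0.04),
    ("P2", 4, "6/00", 0.02),
]


def add_average_size(rows):
    """Append the mean mass size over rows sharing the same patient and date."""
    sizes = defaultdict(list)
    for patient, _, date, size in rows:
        sizes[(patient, date)].append(size)
    augmented = []
    for row in rows:
        group = sizes[(row[0], row[2])]
        augmented.append(row + (sum(group) / len(group),))
    return augmented


augmented = add_average_size(rows)
for row in augmented:
    print(row[:4], round(row[4], 3))
```

Swapping the mean for `min` or `max` gives the other aggregation functions named in the aggregate illustration.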

  48. Level 3: Aggregates • Nodes: Avg Size this date, Benign v. Malignant, Calc Fine Linear, Mass Size • Note: Learn parameters for each node

  49. Database Notion of View • New tables or fields defined in terms of existing tables and fields are known as views • A view corresponds to an alteration in the database schema • Goal: automate the learning of views
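
In database terms, a hand-written view that adds an aggregate field might look like the sketch below, which uses Python's built-in `sqlite3` module. The table, column, and view names are hypothetical, and the rows are the four shown in the mammography table. The goal stated on the slide is to learn such definitions automatically rather than write them by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abnormality "
    "(patient TEXT, abnormality_id INTEGER, exam_date TEXT, mass_size REAL)"
)
conn.executemany(
    "INSERT INTO abnormality VALUES (?, ?, ?, ?)",
    [
        ("P1", 1, "5/02", 0.03),
        ("P1", 2, "5/04", 0.05),
        ("P1", 3, "5/04", 0.04),
        ("P2", 4, "6/00", 0.02),
    ],
)

# The view adds a derived field without changing the stored table.
conn.execute(
    """
    CREATE VIEW abnormality_with_avg AS
    SELECT a.patient, a.abnormality_id, a.exam_date, a.mass_size,
           (SELECT AVG(b.mass_size)
              FROM abnormality AS b
             WHERE b.patient = a.patient
               AND b.exam_date = a.exam_date) AS avg_size_this_date
      FROM abnormality AS a
    """
)

view_rows = conn.execute(
    "SELECT abnormality_id, avg_size_this_date "
    "FROM abnormality_with_avg ORDER BY abnormality_id"
).fetchall()
print(view_rows)
```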

  50. Possible View

      Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
      P1       1            5/02  No                         …  0.03       RU4  B
      P1       2            5/04  Yes                        …  0.05       RU4  M
      P1       3            5/04  No                         …  0.04       LL3  B
      P2       4            6/00  No                         …  0.02       RL2  B
      …        …            …     …                          …  …          …    …
