Explore the development of high-throughput automated techniques for phenotyping based on EHR data in the U.S. healthcare system, driven by the increasing prevalence of EHRs and Meaningful Use. Learn about algorithms, components, rules evaluation, visualization, and data transformation in phenotyping. Discover key lessons learned and the algorithm development process. Collaborators include CDISC, Harvard/MIT, Mayo Clinic, and more.
Strategic Health IT Advanced Research Projects (SHARP)
Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Jyotishman Pathak, PhD, Assistant Professor of Biomedical Informatics
June 11, 2012
Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor
Phenotyping is still a bottleneck… [Image from Wikipedia]
EHR systems: United States 2002–2011 [Millwood et al. 2012]
Electronic health record (EHR)-driven phenotyping
• EHRs are becoming increasingly prevalent within the U.S. healthcare system
  • Meaningful Use is one of the major drivers
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
EHR-driven Phenotyping Algorithms - I
• Typical components
  • Billing and diagnosis codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific covariates (e.g., demographics, vitals, smoking status, CASI scores)
  • Pathology
  • Imaging?
• Organized into inclusion and exclusion criteria
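The inclusion/exclusion structure above can be sketched in a few lines of Python. This is a minimal illustration, not an eMERGE algorithm. The `Patient` fields, the codes `DX_A`, `LAB_X` and `DRUG_Z`, and the thresholds are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Patient:
    """Hypothetical, flattened view of one patient's EHR data."""
    patient_id: str
    dx_codes: set[str] = field(default_factory=set)       # billing/diagnosis codes
    medications: set[str] = field(default_factory=set)    # normalized drug names
    labs: dict[str, float] = field(default_factory=dict)  # lab name -> latest value
    age: int = 0


Criterion = Callable[[Patient], bool]


def is_eligible(p: Patient, inclusion: list[Criterion], exclusion: list[Criterion]) -> bool:
    """Eligible when every inclusion criterion holds and no exclusion criterion does."""
    return all(c(p) for c in inclusion) and not any(c(p) for c in exclusion)


# Illustrative criteria only; codes, drug names, and thresholds are placeholders.
inclusion = [
    lambda p: "DX_A" in p.dx_codes,            # diagnosis code present
    lambda p: p.labs.get("LAB_X", 0.0) > 5.0,  # abnormal lab value
    lambda p: p.age >= 18,                     # demographic covariate
]
exclusion = [
    lambda p: "DRUG_Z" in p.medications,       # confounding medication
]

cohort = [
    Patient("p1", {"DX_A"}, set(), {"LAB_X": 7.2}, 54),
    Patient("p2", {"DX_A"}, {"DRUG_Z"}, {"LAB_X": 6.1}, 61),
    Patient("p3", set(), set(), {"LAB_X": 8.0}, 40),
]
eligible = [p.patient_id for p in cohort if is_eligible(p, inclusion, exclusion)]
print(eligible)  # ['p1']
```

Keeping each criterion a separate predicate mirrors how phenotype definitions list criteria one by one. It also makes any single criterion easy to swap out during iterative refinement.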
EHR-driven Phenotyping Algorithms - II [eMERGE Network]
[Diagram labels: Phenotype Algorithm, Rules, Evaluation, Visualization, Data Transform, Mappings, NLP, SQL]
Example: Hypothyroidism Algorithm [Denny et al., 2012]
[Flowchart; the branching that leads to each outcome is not recoverable from the extracted text]
• No thyroid-altering medications (e.g., phenytoin, lithium)
• 2+ non-acute visits in 3 years
• ICD-9 codes for hypothyroidism / no ICD-9 codes for hypothyroidism
• Abnormal TSH/FT4 / no abnormal TSH/FT4
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroid peroxidase) / no antibodies for TTG/TPO
• Thyroid replacement medications / no thyroid replacement medications
• No secondary causes (e.g., pregnancy, ablation)
• No history of myasthenia gravis
• Outcomes: Case 1, Case 2, Control
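Two pieces of the flowchart can be sketched in Python under stated assumptions: the "2+ non-acute visits in 3 years" check and a simplified control branch. This is a hypothetical reading, not the published algorithm. The following are all assumptions:

- the flag names;
- approximating 3 years as 1,095 days;
- applying the visit and thyroid-altering-medication criteria to controls.

The case branches and the secondary-cause and myasthenia gravis exclusions are deliberately left out.

```python
from datetime import date, timedelta

# Approximates "3 yrs" as 3 * 365 days (ignores leap days).
THREE_YEARS = timedelta(days=3 * 365)


def has_two_visits_within(visit_dates: list[date], window: timedelta = THREE_YEARS) -> bool:
    """True if at least two visits fall within `window` of each other.

    Assumes `visit_dates` was already filtered to non-acute visits.
    """
    ordered = sorted(visit_dates)
    # After sorting, the closest pair of visits is always an adjacent pair.
    return any(later - earlier <= window for earlier, later in zip(ordered, ordered[1:]))


def is_control(p: dict) -> bool:
    """Simplified control check (hypothetical flag names; case branches and the
    secondary-cause / myasthenia gravis exclusions are not modeled)."""
    gate = has_two_visits_within(p["non_acute_visits"]) and not p["thyroid_altering_meds"]
    any_thyroid_evidence = (
        p["hypothyroid_icd9"]
        or p["abnormal_tsh_ft4"]
        or p["thyroid_replacement_meds"]
        or p["ttg_tpo_antibodies"]
    )
    return gate and not any_thyroid_evidence


patient = {
    "non_acute_visits": [date(2008, 1, 10), date(2010, 6, 2)],
    "thyroid_altering_meds": False,
    "hypothyroid_icd9": False,
    "abnormal_tsh_ft4": False,
    "thyroid_replacement_meds": False,
    "ttg_tpo_antibodies": False,
}
print(is_control(patient))  # True
```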
Hypothyroidism Algorithm: Validation [Denny et al., 2012]
Genotype-Phenotype Association Results [Ritchie et al. 2010]
[Figure: published vs. observed odds ratios for each marker; odds-ratio axis ticks at 0.5, 1.0, 2.0, 5.0]
• Atrial fibrillation: rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25)
• Crohn's disease: rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22)
• Multiple sclerosis: rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA)
• Rheumatoid arthritis: rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22)
• Type 2 diabetes: rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)
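For reference, an odds ratio compares the odds of disease among subjects with a marker to the odds among subjects without it. Consider an unadjusted 2×2 table with these cells:

- a = cases with the marker
- b = controls with the marker
- c = cases without the marker
- d = controls without the marker

The odds ratio is then given below, followed by a worked example with hypothetical counts.

```latex
\mathrm{OR} \;=\; \frac{a/b}{c/d} \;=\; \frac{a\,d}{b\,c},
\qquad \text{e.g. } a=30,\; b=70,\; c=20,\; d=80:\;
\mathrm{OR} = \frac{30 \cdot 80}{70 \cdot 20} = \frac{2400}{1400} \approx 1.71
```

An OR above 1.0 means higher odds of disease among marker carriers. Published association studies often report covariate-adjusted odds ratios from regression models, and this simple formula does not reproduce those.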
Key lessons learned from eMERGE
• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of "phenotype logic" for transportability is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing a wrong code because it is easier to find)
  • Natural Language Processing (NLP) is critical
Algorithm Development Process - Modified [eMERGE Network]
[Diagram labels: Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization, Data Transform, Mappings, NLP, SQL]
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
[Diagram labels: Rules, Semi-Automatic Execution, Evaluation, Phenotype Algorithm, Visualization, Data Transform, Mappings, NLP, SQL]
[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
The SHARPn "phenotyping funnel"
[Figure labels: Intermountain EHR, Mayo Clinic EHR]
[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
Clinical Element Models: Higher-Order Structured Representations [Stan Huff, IHC]
Pre- and Post-Coordination [Stan Huff, IHC]
CEMs available for patient demographics, medications, lab measurements, procedures, etc. [Stan Huff, IHC]
SHARPn data normalization flow - I [Welch et al. 2012]
[Figure label: CEM MySQL database with normalized patient information]
SHARPn data normalization flow - II
[Figure label: CEM MySQL database with normalized patient information]
Our task: from human-readable to machine-computable [Thompson et al., submitted 2012]
NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
• A structure and grammar for representing quality measures in a standardized format
• Groups of codes from a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as a set of XML schemas
• Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)
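The two QDM ideas above, value-set membership and a temporal qualifier, can be sketched in Python. Only the OID is taken from the slide. The member codes `CODE_1` and `CODE_2` are placeholders, not the published contents of that value set. The temporal function encodes one possible reading of "> 1 year(s) starts before or during"; the QDM specification gives the normative semantics.

```python
from datetime import date, timedelta

# The OID is the one quoted on the slide; the member codes are placeholders,
# NOT the published contents of that value set.
STEROID_DM_OID = "2.16.840.1.113883.3.464.0001.113"
VALUE_SETS: dict[str, set[str]] = {STEROID_DM_OID: {"CODE_1", "CODE_2"}}


def in_value_set(code: str, oid: str) -> bool:
    """True if `code` belongs to the value set (code grouping) identified by `oid`."""
    return code in VALUE_SETS.get(oid, set())


def starts_before_or_during_by_more_than(
    event_start: date, reference_end: date, gap: timedelta
) -> bool:
    """One possible reading of '> 1 year(s) starts before or during <reference>':
    the event starts before the reference end by more than `gap`."""
    return reference_end - event_start > gap


print(in_value_set("CODE_1", STEROID_DM_OID))  # True
print(starts_before_or_during_by_more_than(
    date(2010, 3, 1), date(2011, 12, 31), timedelta(days=365)))  # True
```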
Example: Diabetes & Lipid Mgmt. - I: human-readable HTML
Example: Diabetes & Lipid Mgmt. - II: computable XML
JBoss® open-source Drools rule-based management system (RBMS)
• Represents knowledge with declarative production rules
• Origins in artificial intelligence expert systems
• Simple "when <pattern> then <action>" rules specified in text files
• Separates data and logic into distinct components
• Forward-chaining inference model (Rete algorithm)
• Domain-specific languages (DSLs)
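Example Drools rule

    rule "Glucose <= 40, Insulin On"
    when
        $msg : GlucoseMsg( glucoseFinding <= 40, currentInsulinDrip > 0 )
    then
        glucoseProtocolResult.setInstruction( GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG );
    end

The slide callouts label the parts of the rule:
• Rule name: "Glucose <= 40, Insulin On"
• Binding: $msg
• Java classes: GlucoseMsg, GlucoseInstructions
• Class getter methods: glucoseFinding, currentInsulinDrip
• Class setter method: setInstruction
• Parameter: the GlucoseInstructions constant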
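To show what "forward chaining" means operationally, here is a deliberately naive Python sketch of when/then rules firing against facts until nothing new fires. It is not the Rete algorithm that Drools uses: every pass re-tests every rule against every fact. `GlucoseMsg` mirrors the example rule. The `Alert` fact and the "Notify on alert" rule are hypothetical, added to show one rule's action triggering another.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GlucoseMsg:
    """Python stand-in for the GlucoseMsg Java fact class in the rule above."""
    glucose_finding: float
    current_insulin_drip: float


@dataclass(frozen=True)
class Alert:
    """Hypothetical fact inserted by a rule action, to show chaining."""
    message: str


@dataclass
class Rule:
    name: str
    when: Callable[[object], bool]              # pattern tested against one fact
    then: Callable[[object, list, dict], None]  # action: may insert facts or set results


def run(rules: list[Rule], facts: list, result: dict) -> list[str]:
    """Naive forward chaining: keep matching rules against facts, firing each
    (rule, fact) pair at most once, until a full pass fires nothing new.
    Unlike Rete, every pass re-tests every rule against every fact."""
    fired: set[tuple[str, int]] = set()
    log: list[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i, fact in list(enumerate(facts)):  # snapshot; actions may append
                if (rule.name, i) not in fired and rule.when(fact):
                    rule.then(fact, facts, result)
                    fired.add((rule.name, i))
                    log.append(rule.name)
                    changed = True
    return log


def low_glucose_action(fact, facts, result):
    result["instruction"] = "GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG"
    facts.append(Alert("hypoglycemia on insulin drip"))  # new fact -> may trigger rules


def notify_action(fact, facts, result):
    result.setdefault("notified", []).append(fact.message)


rules = [
    Rule("Glucose <= 40, Insulin On",
         lambda f: isinstance(f, GlucoseMsg)
         and f.glucose_finding <= 40 and f.current_insulin_drip > 0,
         low_glucose_action),
    Rule("Notify on alert", lambda f: isinstance(f, Alert), notify_action),
]
result: dict = {}
log = run(rules, [GlucoseMsg(35.0, 2.0), GlucoseMsg(120.0, 2.0)], result)
print(log)  # ['Glucose <= 40, Insulin On', 'Notify on alert']
print(result)
```

Rete avoids the repeated full re-matching shown here by caching partial matches in a network. That caching is why it scales to large rule sets.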
Automatic translation from NQF QDM criteria to Drools [Li et al., submitted 2012]
[Diagram: from non-executable to executable]
• Measures (XML-based structured representation, from the Measure Authoring Toolkit) → converting measures to Drools scripts → Drools scripts → Drools engine
• Data types (XML-based structured representation) → mapping data types and value sets → fact models
• Value sets saved in XLS files
Automatic translation from NQF QDM criteria to Drools [Li et al., submitted 2012]
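A toy Python sketch of the "converting measures to Drools scripts" step: it renders one already-parsed criterion as DRL text. It is not the Li et al. translator. The `Criterion` fields, the `Diagnosis` fact class, the `status`/`valueSetOid`/`patientId` properties, and the `results` global are all hypothetical. The sketch also assumes facts were tagged with their value set upstream.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical, simplified stand-in for one parsed QDM data criterion."""
    title: str          # e.g. 'Diagnosis, Active: steroid induced diabetes'
    fact_class: str     # fact-model class the QDM data type maps to (assumed)
    status: str         # e.g. 'active'
    value_set_oid: str  # value set the coded element must belong to


def to_drl(c: Criterion) -> str:
    """Render one criterion as a Drools rule (illustrative template only).

    Assumes facts carry `status`, `valueSetOid`, and `patientId` properties
    (value-set tagging done upstream) and that a DRL global named `results`
    collects matching patient ids.
    """
    name = c.title.replace('"', '\\"')  # keep the rule name a valid DRL string
    return (
        f'rule "{name}"\n'
        "when\n"
        f'    $f : {c.fact_class}(status == "{c.status}", '
        f'valueSetOid == "{c.value_set_oid}")\n'
        "then\n"
        "    results.add($f.getPatientId());\n"
        "end\n"
    )


drl = to_drl(Criterion(
    title="Diagnosis, Active: steroid induced diabetes",
    fact_class="Diagnosis",
    status="active",
    value_set_oid="2.16.840.1.113883.3.464.0001.113",
))
print(drl)
```

Generating rule text from a template keeps the measure logic in the structured QDM source. The DRL then becomes a derived, regenerable artifact.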
Phenotype library and workbench - I
http://phenotypeportal.org
• Converts QDM to Drools
• Executes rules by querying the CEM database
• Generates summary reports
Phenotype library and workbench - II http://phenotypeportal.org
Phenotype library and workbench - III http://phenotypeportal.org
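The "query the CEM database, then summarize" step might look like the Python sketch below. In-memory SQLite stands in for the CEM MySQL database. The single `lab_observation` table and its columns are hypothetical and are not the actual CEM schema. LOINC 4548-4 (hemoglobin A1c) and the 6.5 threshold are used only as an illustrative criterion.

```python
import sqlite3

# In-memory SQLite stands in for the CEM MySQL database; this flattened table
# and its columns are hypothetical, not the actual CEM schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lab_observation ("
    " patient_id TEXT, loinc_code TEXT, value REAL, unit TEXT)"
)
conn.executemany(
    "INSERT INTO lab_observation VALUES (?, ?, ?, ?)",
    [
        ("p1", "4548-4", 7.1, "%"),      # hemoglobin A1c
        ("p1", "4548-4", 6.8, "%"),
        ("p2", "4548-4", 5.4, "%"),
        ("p3", "2345-7", 180.0, "mg/dL"),  # serum/plasma glucose
    ],
)


def patients_with_lab_above(conn: sqlite3.Connection, loinc_code: str, threshold: float) -> list[str]:
    """Distinct patients with at least one result for `loinc_code` above `threshold`."""
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM lab_observation"
        " WHERE loinc_code = ? AND value > ? ORDER BY patient_id",
        (loinc_code, threshold),
    ).fetchall()
    return [r[0] for r in rows]


def summary_report(conn: sqlite3.Connection, loinc_code: str, threshold: float) -> dict[str, int]:
    """Counts of the kind a summary report might show (illustrative)."""
    total = conn.execute(
        "SELECT COUNT(DISTINCT patient_id) FROM lab_observation"
    ).fetchone()[0]
    matched = len(patients_with_lab_above(conn, loinc_code, threshold))
    return {"patients_in_db": total, "patients_matching": matched}


print(patients_with_lab_above(conn, "4548-4", 6.5))  # ['p1']
print(summary_report(conn, "4548-4", 6.5))
```

Parameterized queries (`?` placeholders) keep criterion values out of the SQL text. The MySQL drivers support the same pattern.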
Additional ongoing research efforts - I
• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the "hard work"
  • Validate against expert-developed algorithms
[Carroll et al. 2011]
Additional ongoing research efforts - I (continued)
• Origins in sales (market-basket) data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients
  • One of those conditions should be DM (diabetes mellitus)
• Support: the number of transactions in which itemset I appears
  • e.g., Support({TB, DLM, ND}) = 3
• Frequent: an itemset I is frequent if support(I) > minsup
[Figure legend: X marks infrequent itemsets]
[Simon et al. 2012]
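The definitions above (support, minsup, frequent itemsets containing DM) can be sketched as a small Apriori-style search in Python. The transactions are toy data, not the study's. The condition labels HTN, HLD and OBESITY are illustrative. Following the slide, "frequent" means support strictly greater than minsup.

```python
from itertools import combinations

# Toy transactions: one set of co-morbid conditions per patient (not real data).
transactions = [
    {"DM", "HTN", "HLD"},
    {"DM", "HTN", "HLD", "OBESITY"},
    {"DM", "HTN"},
    {"HTN", "HLD"},
    {"DM", "HLD", "OBESITY"},
]


def support(itemset: frozenset, transactions: list[set]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions: list[set], minsup: int) -> dict[frozenset, int]:
    """Apriori-style level-wise search; frequent means support > minsup (as on the slide)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    found: dict[frozenset, int] = {}
    k = 1
    while level:
        counted = {s: support(s, transactions) for s in level}
        frequent = {s: n for s, n in counted.items() if n > minsup}
        found.update(frequent)
        k += 1
        # Candidates of size k: unions of frequent (k-1)-itemsets, kept only if
        # every (k-1)-subset is frequent (Apriori pruning).
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        ]
    return found


result = frequent_itemsets(transactions, minsup=2)
# Keep only itemsets that include DM, as in the stated goal.
with_dm = {s: n for s, n in result.items() if "DM" in s}
for s, n in sorted(with_dm.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), n)
```

The pruning step relies on the fact that every subset of a frequent itemset is also frequent. Candidates with an infrequent subset can therefore be skipped without counting them.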
Additional ongoing research efforts - II: TRALI/TACO sniffer
Active Surveillance for TRALI and TACO
Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) were reported to the blood bank by the clinical service. Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
Additional ongoing research efforts - III
• Phenome-wide association scan (PheWAS)
  • Do a "reverse GWAS" using EHR data: start from a genetic variant and test its association with many EHR-derived phenotypes
  • Facilitate hypothesis generation
[Pathak et al. submitted 2012]
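A crude Python sketch of the "reverse GWAS" idea: fix one variant and compute a carrier-vs-non-carrier odds ratio for every phenotype in the data. The subjects and the phenotype labels PH1 and PH2 are hypothetical. This is a simplification: a full PheWAS would typically fit a regression model per phenotype with covariates and correct for multiple testing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    carrier: bool         # carries the variant being scanned (hypothetical)
    phenotypes: set[str]  # EHR-derived phenotype labels (hypothetical)


def odds_ratio(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Unadjusted OR = (a*d)/(b*c); None when a zero in b or c makes it undefined."""
    if b == 0 or c == 0:
        return None
    return (a * d) / (b * c)


def phewas(subjects: list[Subject]) -> dict[str, Optional[float]]:
    """'Reverse GWAS' for one variant: a crude carrier-vs-non-carrier OR per phenotype."""
    codes = sorted({code for s in subjects for code in s.phenotypes})
    out: dict[str, Optional[float]] = {}
    for code in codes:
        a = sum(1 for s in subjects if s.carrier and code in s.phenotypes)          # carrier cases
        b = sum(1 for s in subjects if s.carrier and code not in s.phenotypes)      # carrier non-cases
        c = sum(1 for s in subjects if not s.carrier and code in s.phenotypes)      # non-carrier cases
        d = sum(1 for s in subjects if not s.carrier and code not in s.phenotypes)  # non-carrier non-cases
        out[code] = odds_ratio(a, b, c, d)
    return out


subjects = [
    Subject(True, {"PH1", "PH2"}),
    Subject(True, {"PH1"}),
    Subject(True, set()),
    Subject(False, {"PH2"}),
    Subject(False, set()),
    Subject(False, {"PH1"}),
]
results = phewas(subjects)
print(results)  # {'PH1': 4.0, 'PH2': 1.0}
```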
Mayo projects and collaborations
• Ongoing
  • Transfusion-related acute lung injury (Kor)
  • Drug-induced liver injury (Talwalkar)
  • Drug-induced thrombocytopenia and neutropenia (Al-Kali)
  • Active surveillance for celiac disease (Murray)
  • Warfarin dose response & heart valve replacements (Pereira)
  • Phenotype definition standardization (HCPR/Quality)
• Getting started/planning
  • Pharmacogenomics of systolic heart failure (Bielinski/Pereira)
  • Pharmacogenomics of SSRIs (Mrazek/Weinshilboum)
  • Lumbar image reporting with epidemiology (Kallmes)
  • Active clinical trial alerting (CTMS/Cancer Center)
HTP-related presentations
• June 11th, 2012
  • Using EHRs for clinical research (Vitaly Herasevich)
  • Association rule mining and T2D risk prediction (Gyorgy Simon)
  • Scenario-based requirements engineering for developing EHR add-ons to support CER in patient care settings (Junfeng Gao)
• June 12th, 2012
  • Exploring patient data in context clinical research studies: Research Data Explorer (Adam Wilcox et al.)
  • Utilizing previous result sets as criteria for new queries with FURTHeR (Dustin Schultz et al.)
  • Semantic search engine for clinical trials (Yugyung Lee)
  • Knowledge-driven workbench for predictive modeling (Peter Haug et al.)
  • Clinical analytics driven care coordination for 30-day readmission: demonstration from 360 Fresh.com (Ramesh Sairamesh)