
The Brave New World of Data Harmonization in the Age of Informatics

Naveen Ashish, UC Irvine, Dec 1 2012.


Presentation Transcript


  1. The Brave New World of Data Harmonization in the Age of Informatics. Naveen Ashish, UC Irvine, Dec 1 2012

  2. Introduction • Investigators will greatly benefit from integrated datasets from multiple institutions • Each institution has few subjects (tests) • A certain cohort size is needed • Why do we have heterogeneity at all? • Illustrated through actual datasets • True alignment is a challenge • The Data Harmonization approach to the problem

  3. Disparate Datasets

  4. Examples • Why is AcKcal reported in one dataset but not the other? Similarly for SpO2. • VO2 units are different

  5. Examples • HR values are in an entirely different range • VO2 and VCO2 values are in an entirely different range
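Differences like those on slides 4 and 5 are often partly a matter of units. The following is a minimal sketch of normalizing VO2 readings to one unit before comparing sites. The unit labels, the function name, and the choice of mL/min as the target are assumptions made for illustration; the deck does not say which units the actual datasets use.

```python
from typing import Optional


def vo2_to_ml_per_min(value: float, unit: str,
                      weight_kg: Optional[float] = None) -> float:
    """Convert a VO2 reading to mL/min (hypothetical unit labels)."""
    if unit == "mL/min":
        return value
    if unit == "L/min":
        return value * 1000.0
    if unit == "mL/kg/min":
        # Weight-normalized values need the subject's body weight.
        if weight_kg is None:
            raise ValueError("weight_kg is required to convert mL/kg/min")
        return value * weight_kg
    raise ValueError(f"unknown VO2 unit: {unit!r}")
```

Raising on an unknown unit, rather than passing the value through, keeps a mislabeled column from silently ending up in the integrated dataset.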

  6. Domain Semantics

  7. Domain Description • Key entities are: • Subjects • Tests [Diagram: SUBJECT, TEST]

  8. Subject • Issues • Reported/not reported • Set of attributes • Units [Diagram: SUBJECT with attributes HEIGHT, WEIGHT, AGE, SEX, RACE, BSA]

  9. Tests • Multiple aspects • Time • Work • Ventilation • Cardiac [Diagram: TEST with VARIABLES grouped under TIME, WORK, VENTILATION, CARDIAC; variable labels: Time, Work, VO2, VCO2, RER, O2, RR, Vt, VE, BR, V/Q]
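The two key entities can be sketched as simple record types. This is an illustrative model only: the class and field names are assumptions, and attributes are stored in a dictionary so that "not reported" is represented by a missing key instead of a placeholder value.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Measurement:
    value: float
    unit: str  # unit as reported by the site, e.g. "cm" or "in"


@dataclass
class Subject:
    subject_id: str
    sex: Optional[str] = None
    race: Optional[str] = None
    # Numeric attributes such as HEIGHT, WEIGHT, AGE, BSA.
    # A missing key means the site did not report that attribute.
    attributes: Dict[str, Measurement] = field(default_factory=dict)


@dataclass
class ExerciseTest:
    subject_id: str
    # Variable name (e.g. "VO2", "VE") -> samples in time order.
    series: Dict[str, List[Measurement]] = field(default_factory=dict)


def unreported(subject: Subject, expected: Iterable[str]) -> List[str]:
    """Return the expected attribute names that this subject lacks."""
    return sorted(set(expected) - set(subject.attributes))
```

Keeping the unit next to every value means a later alignment step does not have to guess which convention a site followed.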

  10. Approach • “Standard” set of attributes • Recipe for failure • Proposed

  11. Documented Information • Choice of variables • Why do we have this particular set of attributes? • Reason for excluding attributes • Units • State • Key points anchored to a variable • E.g., when heart rate reaches a value of …. • Demographics (subjects) • Special conditions • ….

  12. Detail on Each Parameter • Details • Definition • Synonyms • Explanation • …. Example: RER, also known as Respiratory Quotient (RQ). This value, which is also sometimes known as the Respiratory Quotient (RQ), is the ratio of CO2 production to oxygen consumption. At an RER of 0.8 fat is the primary fuel source. As exercise intensity increases, more carbohydrates are burned for energy. At an RER of 1.0 the individual is burning mostly carbohydrate. It is important for a good max test that an RER of 1.1 is reached to signify a good effort by the athlete.
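As a small worked example of documenting a derived parameter, the sketch below computes RER using the conventional definition, CO2 produced divided by O2 consumed (VCO2 / VO2, both in the same units). The function names are hypothetical; the 1.1 threshold is the "good effort" value quoted on the slide.

```python
from typing import Iterable


def rer(vco2: float, vo2: float) -> float:
    """Respiratory exchange ratio: VCO2 / VO2, both in the same units."""
    if vo2 <= 0:
        raise ValueError("vo2 must be positive")
    return vco2 / vo2


def reached_good_effort(rer_values: Iterable[float],
                        threshold: float = 1.1) -> bool:
    """True if any sample reaches the threshold quoted on the slide."""
    return any(value >= threshold for value in rer_values)
```

For instance, `rer(2.0, 2.5)` gives 0.8, the value the slide associates with fat as the primary fuel source.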

  13. Units • Related efforts in caBIG • CDE (Common Data Elements) • UCUM • Unified Code for Units of Measure • Approved 2008 • Goal • Harmonize existing caDSR value domains • Lab and Agent Unit of Measure • Forms Curation and UML Model applications • Purpose of Standard • To capture units of measure using UCUM expressions that are associated with a qualitative laboratory outcome or agent dose administration.

  14. UCUM

  15. Data Semantics

  16. Data • The information above can help in alignment [Diagram: VO2 with UNIT, RANGE, SALIENT, PROGRESSION, CORRELATION]

  17. Issues • Range • Is the data in the “expected” range? • Progression • Do the values change in the expected fashion? • Salient points • “..stop at heart rate of 150 and ….” • Correlation • Expected correlation with other variables
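The four checks on slide 17 can be sketched as small functions over a time series. This is a rough illustration, not the deck's method: the function names are hypothetical, and any thresholds a caller passes in (such as a plausible heart-rate range) are assumptions that would have to come from domain experts.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def in_expected_range(values: Sequence[float], low: float, high: float) -> bool:
    """Range: every sample lies within [low, high]."""
    return all(low <= v <= high for v in values)


def fraction_non_decreasing(values: Sequence[float]) -> float:
    """Progression: share of consecutive steps that do not decrease."""
    steps = list(zip(values, values[1:]))
    if not steps:
        return 1.0
    return sum(b >= a for a, b in steps) / len(steps)


def first_index_reaching(values: Sequence[float], target: float) -> Optional[int]:
    """Salient point: index of the first sample at or above target."""
    for i, v in enumerate(values):
        if v >= target:
            return i
    return None


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Correlation: Pearson coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

A dataset whose heart-rate column fails the range check but passes the progression and correlation checks is a likely sign of a unit or scaling difference, not of bad data.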

  18. Process Semantics

  19. Environment ORGANIZATION LOCATION LAB TEST DEGREE TEST PROTOCOL DEVICE INSP. TEMPERATURE ENVIRONMENT EXP. TEMPERATURE PRESSURE INSP O2 INSP CO2 FLOWMETER STPD To BTPS Base O2
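The "STPD To BTPS" item is one place where environment metadata directly changes reported values. The sketch below applies the standard textbook gas-law conversion from STPD (0 °C, 760 mmHg, dry) to BTPS (37 °C, ambient pressure, saturated with water vapour at 47 mmHg). It is an assumption that contributing labs use this formula; the deck does not say, and the function name is hypothetical.

```python
WATER_VAPOUR_MMHG_37C = 47.0  # saturated water vapour pressure at 37 °C


def stpd_to_btps(volume_stpd: float, barometric_mmhg: float = 760.0) -> float:
    """Convert a gas volume from STPD to BTPS conditions."""
    if barometric_mmhg <= WATER_VAPOUR_MMHG_37C:
        raise ValueError("barometric pressure must exceed 47 mmHg")
    temperature_factor = 310.0 / 273.0  # 37 °C over 0 °C, in kelvin
    pressure_factor = 760.0 / (barometric_mmhg - WATER_VAPOUR_MMHG_37C)
    return volume_stpd * temperature_factor * pressure_factor
```

At sea-level pressure the factor is roughly 1.21, so two sites reporting the same ventilation under different conventions would differ by about 21 percent.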

  20. An “ontology”? • Concepts • Variables • Characteristics • Relationships • Check for similarity • Outliers • Progression • ….
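A very small in-memory stand-in for such an ontology might look like the sketch below: concepts with definitions, synonyms and units, plus named relationships between them. The class names, the relation name `derived_from`, and the example entries are all assumptions for illustration; a real model would be built in a tool such as Protégé.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    name: str
    definition: str = ""
    synonyms: List[str] = field(default_factory=list)
    unit: Optional[str] = None


class Ontology:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        # (subject, predicate, object) triples between concept names.
        self.relations: List[Tuple[str, str, str]] = []
        self._labels: Dict[str, str] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept
        for label in [concept.name, *concept.synonyms]:
            self._labels[label.strip().lower()] = concept.name

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.append((subject, predicate, obj))

    def resolve(self, label: str) -> Optional[str]:
        """Map a site-specific label or synonym to its concept name."""
        return self._labels.get(label.strip().lower())


onto = Ontology()
onto.add(Concept("RER", "Respiratory exchange ratio",
                 synonyms=["RQ", "Respiratory Quotient"]))
onto.add(Concept("VO2", "Oxygen consumption", unit="mL/min"))
onto.relate("RER", "derived_from", "VO2")
```

With this in place, `onto.resolve("rq")` returns `"RER"`, which is the kind of synonym lookup an alignment step needs when two sites name the same variable differently.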

  21. Tools

  22. Modeling • Protégé (http://protege.stanford.edu) • Concepts • Attributes • Descriptions • Units • Relationships

  23. Transformation • Information Mediator • Transformation of data (sets) • For alignment • Data driven mappings • KARMA
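The idea of mapping-driven transformation can be illustrated in plain Python. This sketch does not use or imitate the KARMA API; the site column names, the target names, and the `align` function are hypothetical. Each site supplies a mapping from its own column names to a common name plus a conversion function, and the data is transformed by applying that mapping.

```python
from typing import Any, Callable, Dict, List, Tuple

# Source column -> (common-model name, converter). Hypothetical examples.
Mapping = Dict[str, Tuple[str, Callable[[Any], float]]]

SITE_A: Mapping = {
    "VO2 (L/min)": ("VO2_mL_min", lambda v: float(v) * 1000.0),
    "HR": ("HR_bpm", float),
}
SITE_B: Mapping = {
    "vo2_ml": ("VO2_mL_min", float),
    "heart_rate": ("HR_bpm", float),
}


def align(rows: List[Dict[str, Any]], mapping: Mapping) -> List[Dict[str, float]]:
    """Rename and convert mapped columns; unmapped columns are dropped."""
    aligned = []
    for row in rows:
        out: Dict[str, float] = {}
        for column, value in row.items():
            if column in mapping:
                target, convert = mapping[column]
                out[target] = convert(value)
        aligned.append(out)
    return aligned
```

Because the mappings are data, not code paths, adding a new institution means writing one more dictionary; the transformation function itself stays unchanged.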

  24. Scientific Variables • IICurate • Burns et al. (USC Information Sciences Institute) • System for curation and documentation of scientific variables • Focus on information integration!

  25. Vision COMMON DATA MODEL ALIGNMENT ENGINE ALIGNED DATA DATABASE FEEDBACK LOOP Data Source Administrator
