
Data Normalization



Presentation Transcript


  1. Data Normalization Dr. Stan Huff

  2. Acknowledgements: Tom Oniki, Joey Coyle, Craig Parker, Yan Heras, Cessily Johnson, Roberto Rocha, Lee Min Lau, Alan James, and many, many others…

  3. What are detailed clinical models? Why do we need them?

  4. A diagram of a simple clinical model: Clinical Element Model for Systolic Blood Pressure
      SystolicBP (SystolicBPObs)
          data: 138 mmHg
          quals:
              BodyLocation (BodyLocation)
                  data: Right Arm
              PatientPosition (PatientPosition)
                  data: Sitting

  5. Need for a standard model: a stack of coded items is ambiguous (SNOMED CT)
      • Numbness of right arm and left leg: Numbness (44077006), Right (24028007), Arm (40983000), Left (7771000), Leg (30021000)
      • Numbness of left arm and right leg: Numbness (44077006), Left (7771000), Arm (40983000), Right (24028007), Leg (30021000)
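
The ambiguity on this slide can be shown in a few lines. This is a minimal sketch, not part of the original deck: the SNOMED CT codes are the ones listed above, while the `finding` helper and its "sites" structure are hypothetical, used only to illustrate what a model adds over a flat list of codes.

```python
# SNOMED CT codes as listed on the slide.
NUMBNESS, RIGHT, LEFT = "44077006", "24028007", "7771000"
ARM, LEG = "40983000", "30021000"

# A flat "stack of codes" has no agreed structure, so compare it as a set.
flat_a = {NUMBNESS, RIGHT, ARM, LEFT, LEG}  # numbness of right arm and left leg
flat_b = {NUMBNESS, LEFT, ARM, RIGHT, LEG}  # numbness of left arm and right leg


def finding(code, sites):
    """Hypothetical structured form: each site pairs a body part with a laterality."""
    return {"finding": code, "sites": frozenset(sites)}


model_a = finding(NUMBNESS, [(ARM, RIGHT), (LEG, LEFT)])
model_b = finding(NUMBNESS, [(ARM, LEFT), (LEG, RIGHT)])

print(flat_a == flat_b)    # True: the two statements are indistinguishable
print(model_a == model_b)  # False: the model keeps them apart
```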

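  6. What if there is no model?
      [Figure: data entry screens from two sites. Site #1 has two fields, "Hct, manual:" (37 %) and "Hct, auto:" (35 %). Site #2 has one field, "Hct:" (37 %), with the choices Estimated, Auto, Manual.]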

  7. HL7 V2.X Messages
      Site 1:
          OBX|1|CE|4545-0^Hct, manual||37||%|
          OBX|1|CE|4544-3^Hct, auto||35||%|
      Site 2:
          OBX|1|CE|20570-8^Hct||37||%|….|manual|
          OBX|1|CE|20570-8^Hct||35||%|….|auto|

  8. Too many ways to say the same thing
      • A single name/code and value: "Hct, manual" is 37 %
      • Two names/codes and values: Hct is 37 %; Method is manual (spun)

  9. Model fragment in XML
      Pre-coordinated representation:
          <observation>
            <cd> Hct, manual (LOINC 4545-0) </cd>
            <value> 37 % </value>
          </observation>
      Post-coordinated (compositional) representation:
          <observation>
            <cd> Hct (LOINC 20570-8) </cd>
            <qualifier>
              <cd> Method </cd>
              <value> Manual </value>
            </qualifier>
            <value> 37 % </value>
          </observation>

  10. Isosemantic Models
      Precoordinated Model: HematocritManual (LOINC 4545-0), HematocritManualModel
          data: 37 %
      Post coordinated Model (Storage Model): Hematocrit (LOINC 20570-8), HematocritModel
          data: 37 %
          quals: Hematocrit Method, HematocritMethodModel
              data: Manual
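
Because the two models are isosemantic, one can be converted to the other mechanically. The sketch below assumes a plain dictionary representation of an observation; the LOINC codes come from the slides, but the field names (`code`, `value`, `unit`, `quals`) and the function `to_post_coordinated` are illustrative assumptions rather than CEM tooling.

```python
# Pre-coordinated LOINC code -> (base code, method), using the codes from the slides.
PRECOORDINATED = {
    "4545-0": ("20570-8", "manual"),  # Hct, manual
    "4544-3": ("20570-8", "auto"),    # Hct, auto
}


def to_post_coordinated(obs):
    """Return the observation in post-coordinated (storage model) form."""
    code = obs["code"]
    quals = dict(obs.get("quals", {}))  # copy so the input is not mutated
    if code in PRECOORDINATED:
        code, method = PRECOORDINATED[code]
        quals["method"] = method
    return {"code": code, "value": obs["value"], "unit": obs["unit"], "quals": quals}


site1 = {"code": "4545-0", "value": 37, "unit": "%"}
site2 = {"code": "20570-8", "value": 37, "unit": "%", "quals": {"method": "manual"}}

# Both sites' data normalize to the same storage form.
print(to_post_coordinated(site1) == to_post_coordinated(site2))  # True
```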

  11. Relational database implications
      Table with the method in the observation type:
          Patient Identifier | Date and Time | Observation Type | Observation Value | Units
          123456789          | 7/4/2005      | Hct, manual      | 37                | %
          123456789          | 7/19/2005     | Hct, auto        | 35                | %
      Table with the method in its own column (headed "Weight type" on the slide):
          Patient Identifier | Date and Time | Observation Type | Weight type | Observation Value | Units
          123456789          | 7/4/2005      | Hct              | manual      | 37                | %
          123456789          | 7/19/2005     | Hct              | auto        | 35                | %
      If the patient’s hematocrit is <= 35 then ….
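
The rule "if the patient's hematocrit is <= 35" has to be written differently depending on which table layout is in use. The following sketch uses an in-memory SQLite database as a stand-in; the table names, the column name `method`, and the ISO-formatted dates are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs_pre (patient_id TEXT, observed_on TEXT, "
             "observation_type TEXT, value REAL, units TEXT)")
conn.execute("CREATE TABLE obs_post (patient_id TEXT, observed_on TEXT, "
             "observation_type TEXT, method TEXT, value REAL, units TEXT)")
conn.executemany("INSERT INTO obs_pre VALUES (?, ?, ?, ?, ?)", [
    ("123456789", "2005-07-04", "Hct, manual", 37, "%"),
    ("123456789", "2005-07-19", "Hct, auto", 35, "%"),
])
conn.executemany("INSERT INTO obs_post VALUES (?, ?, ?, ?, ?, ?)", [
    ("123456789", "2005-07-04", "Hct", "manual", 37, "%"),
    ("123456789", "2005-07-19", "Hct", "auto", 35, "%"),
])


def low_hct(table, type_filter):
    """Dates on which a hematocrit of 35 or less was recorded."""
    sql = f"SELECT observed_on FROM {table} WHERE {type_filter} AND value <= 35"
    return [row[0] for row in conn.execute(sql)]


# Pre-coordinated storage: asking for 'Hct' finds nothing; every variant must be listed.
naive = low_hct("obs_pre", "observation_type = 'Hct'")
enumerated = low_hct("obs_pre", "observation_type IN ('Hct, manual', 'Hct, auto')")
# Post-coordinated storage: one code covers all methods.
post = low_hct("obs_post", "observation_type = 'Hct'")

print(naive, enumerated, post)  # [] ['2005-07-19'] ['2005-07-19']
```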

  12. More complicated items:
      • Signs, symptoms
      • Diagnoses
      • Problem list
      • Family History
          • Use of negation: “No Family Hx of Cancer”
      • Description of a heart murmur
      • Description of breath sounds
          • “Rales in right and left upper lobes”
          • “Rales, rhonchi, and egophony in right lower lobe”

  13. What do we model?
      All health care data, including:
      • Allergies
      • Problem lists
      • Laboratory results
      • Medication and diagnostic orders
      • Medication administration
      • Physical exam and clinical measurements
      • Signs, symptoms, diagnoses
      • Clinical documents
      • Procedures
      • Family history, medical history and review of symptoms

  14. How are the models used?
      • EMR: data entry screens, flow sheets, reports, ad hoc queries
          • Basis for application access to clinical data
      • Data normalization
          • Creation of maps from models in the local system to the standard model
          • Target for the output of structured data from NLP
          • Validation of data as it is stored in the database
      • Phenotype algorithms (decision logic)
          • Basis for referencing data in phenotype definitions
      • Does NOT dictate physical storage strategy

  15. Model Source Expression (CDL)
          model BloodPressurePanel is panel {
            key code(BloodPressurePanel_KEY_ECID);
            statement SystolicBloodPressureMeas systolicBloodPressureMeas optional
              systolicBloodPressureMeas.methodDevice.conduct(methodDevice)
              systolicBloodPressureMeas.bodyLocationPrecoord.conduct(bodyLocationPrecoord)
              systolicBloodPressureMeas.bodyPosition.conduct(bodyPosition)
              systolicBloodPressureMeas.relativeTemporalContext.conduct(relativeTemporalContext)
              systolicBloodPressureMeas.subject.conduct(subject)
              systolicBloodPressureMeas.observed.conduct(observed)
              systolicBloodPressureMeas.reportedReceived.conduct(reportedReceived)
              systolicBloodPressureMeas.verified.conduct(verified);
            statement DiastolicBloodPressureMeas diastolicBloodPressureMeas optional ….
            statement MeanArterialPressureMeas meanArterialPressureMeas optional ….
            qualifier MethodDevice methodDevice optional;
              md.code.domain(BloodPressureMeasurementDevice_DOMAIN_ECID);
            qualifier BodyLocationPrecoord bodyLocationPrecoord optional;
              blp.code.domain(BloodPressureBodyLocationPrecoord_DOMAIN_ECID);
            modifier Subject subject optional;
            attribution Observed observed optional;
            attribution ReportedReceived reportedReceived optional;
            attribution Verified verified optional;
          }

  16. [Diagram: a CE Source File is read by a Compiler into an “In Memory” Form, which feeds a CE Translator. Outputs shown: HTML, XML Template (.xsd), Java Class. Possible further targets: UML?, openEHR Archetype?, HL7 RIM Static Models?, OWL?, SMArt RDF?]

  17. Artifacts Used
      • CDL Model Definition
      • CEM XML Schema
      • HL7 Data Source
      • CEM XML Instance

  18. StandardLabObsQuantitative - CDL Definition
          import StandardLabObs;
          import ReferenceRangeNar;
          model StandardLabObsQuantitative is statement extends StandardLabObs {
            key domain(StandardLabObsQuantitative_KEY_VALUESET_ECID);
            data PQ primaryPQValue unit.domain (UnitsOfMeasure_VALUESET_ECID) alternate {
              match CD secondaryCDValue code.domain(LabValue_VALUESET_ECID);
              match CD altCDValue code.domain(LabValue_VALUESET_ECID);
              otherwise ST altSTValue;
            };
            qualifier ReferenceRangeNar referenceRangeNar card(0..1);
            constraint primaryPQValue.isNullReasonCode.domain(LabNullFlavor_VALUESET_ECID);
            constraint abnormalInterpretation.CD.code.domain (AbnormalInterpretationNumericNom_VALUESET_ECID);
            constraint deltaFlag.CD.code.domain (DeltaFlagNumericNom_VALUESET_ECID);
          }

  19. StandardLabObsQuantitative - Schema Snippet
          <xs:complexType name="StandardLabObsQuantitative">
            <xs:sequence>
              <xs:element name="key" minOccurs="0" maxOccurs="1" type="CD"/>
              <xs:element name="primaryPQValue" type="PQ"/>
              <xs:element name="referenceRangeNar" minOccurs="0" maxOccurs="1" type="ReferenceRangeNar"/>
              <xs:element name="accessionNumber" minOccurs="0" maxOccurs="1" type="AccessionNumber"/>
              <xs:element name="fillerOrderNumber" minOccurs="0" maxOccurs="1" type="FillerOrderNumber"/>
              <xs:element name="placerOrderNumber" minOccurs="0" maxOccurs="1" type="PlacerOrderNumber"/>
              <xs:element name="resultStatus" minOccurs="0" maxOccurs="1" type="ResultStatus"/>
              <xs:element name="reportingPriority" minOccurs="0" maxOccurs="1" type="ReportingPriority"/>
              <xs:element name="abnormalInterpretation" minOccurs="0" maxOccurs="1" type="AbnormalInterpretation"/>
              <xs:element name="ordinalInterpretation" minOccurs="0" maxOccurs="1" type="OrdinalInterpretation"/>
              <xs:element name="deltaFlag" minOccurs="0" maxOccurs="1" type="DeltaFlag"/>
              <xs:element name="responsibleObserver" minOccurs="0" maxOccurs="unbounded" type="ResponsibleObserver"/>
              <xs:element name="performingLaboratory" minOccurs="0" maxOccurs="1" type="PerformingLaboratory"/>
              <xs:element name="comment" minOccurs="0" maxOccurs="unbounded" type="Comment"/>
              <xs:element name="subject" minOccurs="0" maxOccurs="1" type="Subject"/>
              <xs:element name="specimenCollected" minOccurs="0" maxOccurs="1" type="SpecimenCollected"/>
              <xs:element name="specimenReceivedByLab" minOccurs="0" maxOccurs="1" type="SpecimenReceivedByLab"/>
              <xs:element name="resulted" minOccurs="0" maxOccurs="1" type="Resulted"/>
              <xs:element name="patientId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
              <xs:element name="status" minOccurs="0" maxOccurs="1" type="anonymous"/>
              <xs:element name="instanceId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
              <xs:element name="typeId" minOccurs="0" maxOccurs="1" type="anonymous.3"/>
            </xs:sequence>
            <xs:attribute name="class" type="statement.type" default="statement"/>
            <xs:attribute name="type" type="ecid.type" default="b1ceaebb-dd15-4317-3f99-67ef3af81778"/>
          </xs:complexType>

  20. HL7 Source Instance
          MSH|^~\&|OADD|153|DADD|XNEPHA|20110208000109||ORU^R01|20110207000036|T|2.2||||
          EVN|R01|201102080000|
          PID||1234567|274382554|007261|WHYLING^KAYLIE^O'TEST||19460413|F||W|||(801)224-1528|(866)772-3150||||21443041|535194412|
          PV1||O|XNEPHA^XNEPHA^^IM||||28826^Allyson^Josephine^ O'TEST |^||||||||||OP||||||||||||||||||||||||||201102070000||||||||
          ORC|RE||F506556|||||||||28826^Allyson^Josephine^ O'TEST ||||^|
          OBR||^|F506556^|HCT^HEMATOCRIT|R||201102071554|||70011^ROSEN,AUBRY^ O'TEST |||20110207161200|^|28826^Allyson^Josephine^ O'TEST ||||M2415648||||C|F|RFP^RFP|^^^^^R|^~^~^|||||||
          OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|IM^Performed at Inte|58528^ANDERSON^MARK|
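
As a rough illustration of what the parsing step has to do with this message, the OBX segment can be split on the field separator and read by position. This is a hand-rolled sketch that assumes `|` and `^` as delimiters and ignores escaping and repetition; `parse_obx` and the output keys are hypothetical names, not the pipeline's own parser.

```python
OBX = ("OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|"
       "IM^Performed at Inte|58528^ANDERSON^MARK|")


def parse_obx(segment):
    """Pull the commonly used OBX fields out of one pipe-delimited segment."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")

    def field(n):
        # Missing trailing fields are treated as empty.
        return fields[n] if n < len(fields) else ""

    code, _, text = field(3).partition("^")  # OBX-3: identifier^text
    raw_value = field(5)                     # OBX-5: observation value
    return {
        "value_type": field(2),              # OBX-2
        "code": code,
        "text": text,
        "value": float(raw_value) if field(2) == "NM" else raw_value,
        "units": field(6),                   # OBX-6
        "status": field(11),                 # OBX-11
        "observed": field(14),               # OBX-14
    }


obs = parse_obx(OBX)
print(obs["code"], obs["value"], obs["units"])  # HCT 48.0 %
```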

  21. LabObsQuantitative - XML Instance Snippet
          <labObsQuantitative type="b1ceaebb-dd15-4317-3f99-67ef3af81778">
            <key>
              <code> <value>20570-8</value> </code>
              <codeSystem> <value>LOINC</value> </codeSystem>
              <originalText>HCT</originalText>
            </key>
            <primaryPQValue>
              <operator> <value>equals</value> </operator>
              <unit> <value>%</value> </unit>
              <value>48</value>
            </primaryPQValue>
            <referenceRangeNar type="6f422ce6-7bc6-2cc2-8c96-58c137b5c9fc"> … </referenceRangeNar>
            <abnormalInterpretation type="9a3c3c60-18f7-5a91-c10c-c15532a96303"> … </abnormalInterpretation>
          </labObsQuantitative>
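
An instance with this shape can be assembled with a standard XML library. The sketch below uses Python's `xml.etree.ElementTree` and covers only the `key` and `primaryPQValue` elements shown on the slide; the helper names `wrapped` and `build_instance` are assumptions for illustration, and the elided elements are left out rather than guessed.

```python
import xml.etree.ElementTree as ET

MODEL_TYPE = "b1ceaebb-dd15-4317-3f99-67ef3af81778"  # type id shown on the slide


def wrapped(parent, tag, text):
    """Append <tag><value>text</value></tag> under parent."""
    ET.SubElement(ET.SubElement(parent, tag), "value").text = text


def build_instance(loinc, original_text, value, unit):
    """Build the key and primaryPQValue parts of a labObsQuantitative instance."""
    root = ET.Element("labObsQuantitative", {"type": MODEL_TYPE})
    key = ET.SubElement(root, "key")
    wrapped(key, "code", loinc)
    wrapped(key, "codeSystem", "LOINC")
    ET.SubElement(key, "originalText").text = original_text
    pq = ET.SubElement(root, "primaryPQValue")
    wrapped(pq, "operator", "equals")
    wrapped(pq, "unit", unit)
    ET.SubElement(pq, "value").text = str(value)
    return root


root = build_instance("20570-8", "HCT", 48, "%")
print(ET.tostring(root, encoding="unicode"))
```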

  22. Issues
      • Different groups use models differently
          • NLP versus EMR
      • Structuring the models to meet more than one use
      • Options for different granularities of models
          • Hematocrit model, model of pneumonia
          • Quantitative lab result model, x-ray finding
      • Terminology integration: use of standards and terminology services
      • Models for “rare” kinds of data
          • Medication being taken by a friend, not recommended by the physician

  23. Questions?

  24. Data Normalization Dr. Christopher Chute

  25. IHC-Medication, Mayo, IHC LAB to CEM
      [Diagram: two pipelines writing to SharpDb.
      Medications: HL7 (Meds), Mirth, HL7 Initializer, IHC-GCN-TO-RXNORM Annotator (with IHC RXNORM resource), Drug CEM CAS Consumer.
      Labs: HL7 (Labs), Mirth, HL7 Initializer, Generic-LAB-Annotator (with Mayo LOINC resource and IHC LOINC resource), LAB CEM CAS Consumer.]

  26. UIMA Normalization Pipeline
      • Convert HL7 V2.x Lab / Med Order Messages into CEM XML instances
      • Load SofA with HL7 message
      • Create Segment Objects in CAS
      • Normalize Segments in CAS
      • Transform Segments into CEM instances

  27. Mayo, IHC LAB to CEM
      One of the new pipelines created to normalize HL7 2.x Lab Messages into CEM instances.
      We pre-processed the HL7 messages with Mirth, converting from HL7 pipe syntax into HL7 XML format.
      [Diagram: HL7 Pipe Delimited, Mirth, HL7 (XML), HL7 Initializer, Generic-LAB-Annotators (with Mayo LOINC resource and IHC LOINC resource), LAB CEM CAS Consumer, SharpDb]

  28. Mayo, IHC LAB to CEM UIMA Pipeline Flow
      [Diagram: HL7 message -> Initialize -> CAS (SOFA=HL7-XML) -> Parse -> CAS with PID, PV1, and OBX segments -> Normalize -> Transform -> CEM. Example codes shown: 10109 and 45373-3.]

  29. Normalization Anatomy
      Lab Annotators:
      • Date-Time To ISO Format
      • HL7 Segment Parser
      • Syntactic Integrity
      • LOINC lookups
          • IHC codes to LOINC table
          • Mayo codes to LOINC table
          • LexGrid/CTS2 Terminology Services
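
Two of the annotator tasks listed here, date-time conversion and LOINC lookup, are simple enough to sketch. The code below is illustrative only: the lookup table holds a single hypothetical entry (site label and local code are assumptions), whereas the real mappings come from the site LOINC resources and terminology services named on the slide.

```python
from datetime import datetime

# Hypothetical (site, local code) -> LOINC table with one example entry.
LOCAL_TO_LOINC = {("IHC", "HCT"): "20570-8"}

# HL7 v2 timestamps without a time zone, keyed by their length.
_TS_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def hl7_ts_to_iso(ts):
    """Convert an HL7 v2 timestamp such as 201102080000 to ISO 8601."""
    fmt = _TS_FORMATS.get(len(ts))
    if fmt is None:
        raise ValueError(f"unsupported HL7 timestamp: {ts!r}")
    return datetime.strptime(ts, fmt).isoformat()


def to_loinc(site, local_code):
    """Return the LOINC code for a site-local code, or None if unmapped."""
    return LOCAL_TO_LOINC.get((site, local_code.upper()))


print(hl7_ts_to_iso("201102080000"))    # 2011-02-08T00:00:00
print(hl7_ts_to_iso("20110207161200"))  # 2011-02-07T16:12:00
print(to_loinc("IHC", "hct"))           # 20570-8
```

Returning `None` for an unmapped code, rather than raising, leaves it to the caller to decide whether the message goes to an error queue or is stored with its original code.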

  30. Architectural Opportunities
      [Diagram: HL7 2.x and CDA inputs (Mayo and other sources) pass through Mirth; processing stages labeled “Time, Syntax, Etc.”, “Semantic”, and “CAS To XML”; outputs in CEM format.]

  31. Tactical Next Step Enhancements
      • Single CEM for multiple OBX segments
      • Efficiently utilize terminology services
      • Incorporate a library for HL7 clean-up routines
      • Increase scope of vocabulary standardization
      • Enhancements for the Drug Annotator
          • Context enhancement issue
          • Drug name surprises

  32. Additional Vocabularies
      • Review sources used for normalization opportunities, e.g.:
          • In HL7 OBR Segments
              • Standardize Service ID (Codes)
          • In HL7 OBX Segments
              • Standardize Units
              • Standardize Reference Ranges
              • Standardize Normal Flags

  33. Drug Name Disambiguation
      Real patient data presented a unique case in drug names: “ToDAY” is a brand name for cephapirin sodium. This makes an interesting named entity disambiguation use case.

  34. Where Persistence Fits In…
      [Diagram: numbered data flow (steps 1 to 10, including 6a) among the Mayo EDT System and IHC backend systems, each with Mirth Connect and an NwHIN Aurion Gateway; the SHARP UIMA Pipeline; the CDR; and the CEM Instance Database.]

  35. Persistence Channels
      • One Channel per model
      • Data stored as an XML Instance of the model
      • Fields extracted from XML to use as indices
      • XML Schema defined for each model
      • Stored using database transactions
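
The storage pattern described here (whole XML instance stored, selected fields pulled out as indices, all within a transaction) can be sketched as follows. SQLite stands in for the MySQL database used in the project, and the table, column, and function names are assumptions for the example, not the SharpDB schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = ("<labObsQuantitative><key><code><value>20570-8</value></code></key>"
          "<primaryPQValue><value>48</value></primaryPQValue></labObsQuantitative>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_obs (id INTEGER PRIMARY KEY, "
             "patient_id INTEGER NOT NULL, loinc_code TEXT NOT NULL, "
             "instance_xml TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_lab_obs ON lab_obs (patient_id, loinc_code)")


def store_instance(conn, patient_id, xml_text):
    """Store one CEM XML instance together with the fields used as indices."""
    root = ET.fromstring(xml_text)          # malformed XML fails before any write
    code = root.findtext("key/code/value")  # extracted index field (None if absent)
    with conn:                              # commit on success, roll back on error
        conn.execute(
            "INSERT INTO lab_obs (patient_id, loinc_code, instance_xml) "
            "VALUES (?, ?, ?)",
            (patient_id, code, xml_text),
        )


store_instance(conn, 1, SAMPLE)
rows = conn.execute(
    "SELECT instance_xml FROM lab_obs WHERE patient_id = ? AND loinc_code = ?",
    (1, "20570-8"),
).fetchall()
print(len(rows))  # 1
```

An instance without a key code violates the `NOT NULL` constraint, so the transaction is rolled back and nothing is stored.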

  36. General Channel Design
      [Diagram: CEM XML Instance -> Input Message Directory -> Connector -> Channel -> Connector -> Persistence Store; messages then move to a Processed Message Directory or an Error Message Directory.]

  37. SharpDB: a CEM Instance Database

  38. Database Tables

  39. Patient Demographics
      • Each message contains patient demographics
      • Demographics created on first received message based on site patient ID
      • Internal Patient ID is created and cross mapped to site patient ID
      • SharpDB is keyed off internally generated Patient ID
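
The cross-mapping between site patient IDs and an internally generated ID might look like the sketch below. It is an in-memory illustration under stated assumptions: the class `PatientIndex`, its method names, and the site labels are hypothetical, and SharpDB itself keeps this mapping in database tables.

```python
import itertools


class PatientIndex:
    """Maps (site, site patient ID) pairs to internally generated patient IDs."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_site_id = {}
        self.demographics = {}

    def resolve(self, site, site_patient_id, demographics):
        """Return the internal ID, creating it on the first message for this patient."""
        key = (site, site_patient_id)
        if key not in self._by_site_id:
            internal_id = next(self._next_id)
            self._by_site_id[key] = internal_id
            # Demographics are recorded only when the patient is first seen.
            self.demographics[internal_id] = dict(demographics)
        return self._by_site_id[key]


index = PatientIndex()
first = index.resolve("IHC", "1234567", {"sex": "F", "birth_date": "1946-04-13"})
again = index.resolve("IHC", "1234567", {"sex": "F"})
other = index.resolve("Mayo", "1234567", {})
print(first, again, other)  # 1 1 2
```

Keying on the pair of site and site ID keeps identical numbers issued by different sites from colliding.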

  40. Running in a Cloud…
      • Various images were installed:
          • NwHIN Gateway, provided by Aurion
          • MIRTH Connect, our interface engine
          • UIMA Pipelines of various sorts
          • MySQL database for persistence
          • JBOSS / Drools rules engine
      All open source, running in an Ubuntu Cloud!

  41. SHARP Hardware Infrastructure
      [Diagram: a Cloud Server (Cloud Controller, Walrus Controller, Cluster Controller, Storage Controller, Admin Client Interface, Persistence Storage, Image Storage) and Node Servers 1, 2, 3, … 11, each running a Node Controller that hosts VMs, joined by a Private Switch. A Build/Backup Server connects over VPN/LAN to manage the cloud; users connect over VPN/LAN to reach instances.]

  42. Data Normalization Summary
      • Initial “tracer shot” at Data Normalization
      • Cloud based processing using open source tools
      • Proof of concept, UIMA for Data Normalization
      • Move on to new problems / solutions…
      • Opportunities exist:
          • Add new annotators (modules) to the pipelines
          • Widen usage and scope of vocabulary services
          • Switch to real live flows and add HOSS clean up routines
          • Various tweaks in NLP algorithms
