Leveraging Natural Language Processing for Enhanced Clinical Decision Support Systems

Computational Intelligence in Biomedical and Health Care Informatics
HCA 590 (Topics in Health Sciences)
Rohit Kate
Clinical Natural Language Processing

Reading
• Paper: What can natural language processing do for clinical decision support? Dina Demner-Fushman, Wendy Chapman, Clement McDonald. Journal of Biomedical Informatics 42 (2009) 760-772
• Paper: 2010 i2b2/VA Challenge on Concepts, Assertions, and Relations in Clinical Text. Uzuner Ö., South B., Shen S., DuVall S. Journal of the American Medical Informatics Association 2011;18(5):552-556

Clinical Decision Support Systems
• A clinical decision support (CDS) system is any computer program designed to help healthcare professionals make clinical decisions or to present them with patient-specific assessments and recommendations
• Suggest diagnosis and medications
• Trigger reminders
• Flag abnormal values
• Alert about drug interactions
• Remind the user of overlooked diagnoses
• Provide advice based on patient-specific data

CDS Systems and Narrative Text
• Such computer-based support is much more effective if the computer system has access to electronic medical records (EMRs) and has the ability to process them
• A major portion of patient records, including radiology reports, operative notes, discharge summaries etc., is recorded as narrative text (dictated, transcribed or directly entered) in a natural language such as English
• Facts that should activate a CDS system are often found only in free text

CDS Systems and NLP
• Much of the data that could support CDS is textual and therefore cannot be leveraged by a CDS system without natural language processing (NLP)
• NLP to be used for CDS needs to be:
  • Reliable
  • High-quality
  • Modular and flexible
  • Fast

Active and Passive CDS with NLP
• Active: The system leverages available information and pushes patient-specific information to the user
• Passive: The users themselves seek the support
• Users: Depending upon the application, users could be patients, researchers, administrators, students, and coders as well as clinicians
• Besides free text in the clinical records, NLP for CDS could also process biomedical literature, web pages etc.

Active and Passive CDS with NLP
[Figure from the paper; surviving label: Users]

Example of an Idealized NLP-CDS System
• It would monitor the EMR for insertions of new data
• When free text is entered, for example “Right lower lung opacity, which could be contusion or pneumonia”, the NLP system will kick in to analyze it
• The NLP system will extract the information that the disorder could be “pulmonary contusion” or “pneumonia” and pass it to the CDS system
• The CDS system will look up decision rules for suspected pneumonia, retrieve the results of the blood test, and evaluate the white blood cell count
• If the count is high, the system will suggest in a reminder message (possibly in natural language) that the patient is more likely to have pneumonia than pulmonary contusion, and why
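The flow on this slide can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the keyword table, the function names, and the white blood cell cutoff of 11,000 cells per microliter are all hypothetical stand-ins, not rules taken from the paper or from any real CDS product.

```python
from typing import List, Optional

# Hypothetical keyword table standing in for a real clinical NLP component.
CANDIDATE_TERMS = {
    "contusion": "pulmonary contusion",
    "pneumonia": "pneumonia",
}

# Assumed cutoff for an elevated white blood cell count (cells per microliter).
# A real system would read this from its decision rules.
WBC_HIGH_THRESHOLD = 11000


def extract_candidate_disorders(note: str) -> List[str]:
    """Toy NLP step: report each disorder whose keyword occurs in the note."""
    lowered = note.lower()
    return [name for term, name in CANDIDATE_TERMS.items() if term in lowered]


def suggest(note: str, wbc_count: int) -> Optional[str]:
    """Toy CDS step: apply a single rule for suspected pneumonia."""
    disorders = extract_candidate_disorders(note)
    if "pneumonia" in disorders and wbc_count > WBC_HIGH_THRESHOLD:
        others = [d for d in disorders if d != "pneumonia"]
        message = f"Reminder: white blood cell count is high ({wbc_count}); pneumonia is likely"
        if others:
            message += " (more likely than " + ", ".join(others) + ")"
        return message + "."
    return None


NOTE = "Right lower lung opacity, which could be contusion or pneumonia"
print(suggest(NOTE, 15000))
```

With a count below the assumed cutoff, `suggest` returns `None` and no reminder is produced.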

Example of an Idealized NLP-CDS System
• Idealizing further, the NLP system may look up medical literature, solicit more information, and present natural language summaries
• For example, present summaries of best approaches to manage both disorders
• May look up medical publications, guidelines, and actionable recommendations available in free text
• This idealized system will have to deal with all the challenges of clinical NLP

Challenges of Clinical Language Processing
• Good Performance
  • Performance should be good enough to be used for clinical applications and should not be significantly worse than that of medical experts
  • System should have the flexibility to trade off precision and recall
• Recovery of Implicit Information
  • NLP system should contain enough medical knowledge to make appropriate inferences
  • “rupture” means “rupture of membranes”
  • “patchy opacity” and “focal infiltrate” may indicate “pneumonia”
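One common way to get the precision/recall flexibility mentioned above is to have the NLP system emit a confidence score and let the application choose the cutoff. The sketch below assumes such scores exist; the scores and gold labels are made-up illustration data, not results from any system.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold counts as a positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    # Convention assumed here: with nothing predicted (or nothing to find),
    # the corresponding measure is reported as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up confidence scores and gold labels for five extracted findings.
SCORES = [0.95, 0.80, 0.60, 0.40, 0.20]
LABELS = [True, True, False, True, False]

for threshold in (0.7, 0.3):
    p, r = precision_recall(SCORES, LABELS, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data the stricter cutoff gives higher precision and lower recall, and the looser cutoff gives the reverse, which is the trade-off an alerting application would tune.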

Challenges of Clinical Language Processing
• Interoperability: NLP system should seamlessly integrate into clinical information systems
  • Many different interchange formats (e.g. HL7)
  • Different types of reports with different formats; text may contain tables, structured fields etc.
  • Output of NLP system should be mapped to an appropriate controlled vocabulary, e.g. UMLS, SNOMED or ICD

Challenges of Clinical Language Processing
• Training set availability
  • Patient records are confidential, and using them requires approval of an institutional review board (IRB)
  • There are methods to de-identify records by removing names etc., but identifying the names etc. is not easy
  • These issues do not arise when processing literature
• Limited availability in electronic form
  • Many clinical documents are still written on paper
  • Optical Character Recognition (OCR) is not accurate, especially with physicians’ notes

Challenges of Clinical Language Processing
• Expressiveness
  • More than 200 different expressions for severity information: faint, mild, borderline, 3rd degree, mild to moderate etc.
  • Complex modifiers: “no improvement in pneumonia” in text will match a query “improvement in pneumonia”
  • A lot of abbreviations, which could be ambiguous
    • pvc may mean pulmonary vascular congestion in a chest X-ray report and premature ventricular complexes in an electrocardiogram report
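Both problems on this slide are easy to reproduce. The sketch below shows the naive substring match going wrong on the slide's own example, next to a simplified check that looks for a negation cue in the few words before the match, and a context lookup for "pvc" keyed on report type. The cue list, the window size, and the function names are illustrative assumptions, and this is not a full negation-detection algorithm.

```python
import re
from typing import Optional

# Small, assumed list of negation cues; real systems use much larger lexicons.
NEGATION_CUES = ("no", "not", "without", "denies")


def naive_match(query: str, text: str) -> bool:
    """Plain substring search, which ignores modifiers such as 'no'."""
    return query.lower() in text.lower()


def negation_aware_match(query: str, text: str, window: int = 3) -> bool:
    """Match the query unless a negation cue appears in the words just before it."""
    lowered = text.lower()
    idx = lowered.find(query.lower())
    if idx == -1:
        return False
    preceding = re.findall(r"[a-z]+", lowered[:idx])[-window:]
    return not any(token in NEGATION_CUES for token in preceding)


# Sense of "pvc" by report type, as given on the slide.
PVC_SENSES = {
    "chest x-ray": "pulmonary vascular congestion",
    "electrocardiogram": "premature ventricular complexes",
}


def expand_pvc(report_type: str) -> Optional[str]:
    """Pick the expansion of 'pvc' from the kind of report it occurs in."""
    return PVC_SENSES.get(report_type.lower())


QUERY = "improvement in pneumonia"
print(naive_match(QUERY, "no improvement in pneumonia"))           # True (wrong)
print(negation_aware_match(QUERY, "no improvement in pneumonia"))  # False
print(expand_pvc("Electrocardiogram"))
```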

Challenges of Clinical Language Processing
• Compactness of text
  • Very compact, containing many abbreviations
  • Sentence boundaries poorly delineated, for example: Admit 10/23 71 yo woman h/o DM, HTN, Dilated CM/CHF, Afib s/p embolic event, chronic diarrhae, admitted with SOB.
• Rare events
  • Medical errors and adverse events are not reported frequently, so it is difficult to train a system to detect them

Shared Tasks in Clinical Language Processing
• Evaluation
  • Difficult to obtain gold-standard data; time-consuming for medical experts to annotate data
  • Evaluation competitions or Shared Tasks are very useful; they help compare different systems on the same data
• i2b2 shared tasks 2008-2012:
  • https://www.i2b2.org/NLP/Obesity/
  • https://www.i2b2.org/NLP/Medication/
  • https://www.i2b2.org/NLP/Relations/
  • https://www.i2b2.org/NLP/Coreference/
  • https://www.i2b2.org/NLP/TemporalRelations/
• ShARe/CLEF eHealth 2013-2014:
  • https://sites.google.com/site/shareclefehealth/
  • http://clefehealth2014.dcu.ie/
• SemEval 2014 Task 7 - Analysis of Clinical Text:
  • http://alt.qcri.org/semeval2014/task7/
• TREC Medical Records task

i2b2 2010: Concepts
• Concepts:
  • Medical Problems
  • Treatments
  • Tests
• System input: raw text of medical records
• System output: a plain text file that contains entries of the form c="concept text" offset || t="concept type" (offset indicates line and token numbers of the document)
• For example:
  • c="cancer" 5:8 5:8 || t="problem"
  • c="chemotherapy" 5:4 5:4 || t="treatment"
  • c="chest x-ray" 6:12 6:13 || t="test"
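A concept entry in this format can be read back with a short regular expression. The sketch below is a minimal reader written from the examples on this slide only; the `Concept` type and `parse_concept` function are hypothetical names, and the pattern also tolerates typographic quotes in case entries are copied from formatted text.

```python
import re
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    text: str
    start_line: int
    start_token: int
    end_line: int
    end_token: int
    concept_type: str


# Accept straight or typographic double quotes around values.
QUOTE = '["“”]'
CONCEPT_RE = re.compile(
    r'c=' + QUOTE + r'(?P<text>.*?)' + QUOTE
    + r' (?P<sl>\d+):(?P<st>\d+) (?P<el>\d+):(?P<et>\d+)'
    + r'\s*\|\|\s*t=' + QUOTE + r'(?P<type>.*?)' + QUOTE
)


def parse_concept(line: str) -> Optional[Concept]:
    """Parse one concept entry; return None if the line does not fit the pattern."""
    match = CONCEPT_RE.search(line)
    if match is None:
        return None
    return Concept(
        text=match["text"],
        start_line=int(match["sl"]),
        start_token=int(match["st"]),
        end_line=int(match["el"]),
        end_token=int(match["et"]),
        concept_type=match["type"],
    )


print(parse_concept('c="chest x-ray" 6:12 6:13 || t="test"'))
```

Here the offsets 6:12 and 6:13 are read as line 6, tokens 12 through 13, following the slide's description of the offset field.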

i2b2 2010: Assertions
• Assertions (attributes of medical problems):
  • Present
  • Absent
  • Possible
  • Conditional
  • Hypothetical
  • Not associated with the patient
• System input: raw text of medical records and given concepts
• System output: assertions on all problem concepts (and only problem concepts), in the form c="concept text" offset || t="concept type" || a="assertion value"
• For example:
  • c="hypertension" 5:4 5:4 || t="problem" || a="absent"
  • c="diabetes" 6:12 6:12 || t="problem" || a="possible"
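Assertion values matter downstream because a CDS rule usually should not react to a problem that is asserted as absent. As a rough sketch of that idea, the code below pulls the quoted c, t and a fields out of each entry and keeps only the problems with a wanted assertion value. The function names are hypothetical, and the filtering policy is an assumption for illustration rather than part of the challenge.

```python
import re
from typing import Dict, List, Set

# Single-letter key, '=', then a quoted value (straight or typographic quotes).
FIELD_RE = re.compile(r'(\w)=["“](.*?)["”]')

EXAMPLE_LINES = [
    'c="hypertension" 5:4 5:4 || t="problem" || a="absent"',
    'c="diabetes" 6:12 6:12 || t="problem" || a="possible"',
]


def parse_entry(line: str) -> Dict[str, str]:
    """Return the quoted fields of one entry, e.g. {'c': ..., 't': ..., 'a': ...}."""
    return dict(FIELD_RE.findall(line))


def filter_by_assertion(lines: List[str], wanted: Set[str]) -> List[str]:
    """Concept texts of problem entries whose assertion value is in `wanted`."""
    kept = []
    for line in lines:
        entry = parse_entry(line)
        if entry.get("t") == "problem" and entry.get("a") in wanted:
            kept.append(entry["c"])
    return kept


print(filter_by_assertion(EXAMPLE_LINES, {"possible"}))  # ['diabetes']
```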

i2b2 2010: Relations
• Extract the relations that exist between the concepts:
  • medical problems and treatments: 6 possible relations
  • medical problems and tests: 3 possible relations
  • medical problems and other medical problems: 2 possible relations
• System input: raw text medical records with given concepts and assertions (optional)
• System output: relations of pairs of concepts in the following format:
  • c="a cardiac catheterization" 9:12 9:14 || r="TeCP" || c="chest pain" 9:5 9:6
  • c="a cardiac catheterization" 9:12 9:14 || r="TeRP" || c="an occluded right coronary artery" 9:23 9:27
  • c="a cardiac catheterization" 9:12 9:14 || r="TeRP" || c="a 40-50% proximal stenosis" 9:29 9:32
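Relation entries pair two concepts, so a natural first step for a consumer is to group them by their first concept. The following sketch does that for the three example lines on this slide; `Relation`, `parse_relation` and `group_by_first` are hypothetical helper names, and the pattern is inferred from these examples alone.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Relation(NamedTuple):
    first: str
    relation: str
    second: str


RELATION_RE = re.compile(
    r'c="(?P<first>.*?)" \d+:\d+ \d+:\d+\s*\|\|\s*'
    r'r="(?P<rel>.*?)"\s*\|\|\s*'
    r'c="(?P<second>.*?)" \d+:\d+ \d+:\d+'
)

EXAMPLE_LINES = [
    'c="a cardiac catheterization" 9:12 9:14 || r="TeCP" || c="chest pain" 9:5 9:6',
    'c="a cardiac catheterization" 9:12 9:14 || r="TeRP" || c="an occluded right coronary artery" 9:23 9:27',
    'c="a cardiac catheterization" 9:12 9:14 || r="TeRP" || c="a 40-50% proximal stenosis" 9:29 9:32',
]


def parse_relation(line: str) -> Optional[Relation]:
    """Parse one relation entry; return None if the line does not fit the pattern."""
    match = RELATION_RE.search(line)
    if match is None:
        return None
    return Relation(match["first"], match["rel"], match["second"])


def group_by_first(lines: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each first concept to the (relation, second concept) pairs it takes part in."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        rel = parse_relation(line)
        if rel is not None:
            grouped[rel.first].append((rel.relation, rel.second))
    return dict(grouped)


for concept, pairs in group_by_first(EXAMPLE_LINES).items():
    print(concept, pairs)
```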

i2b2 2010: Data
• 349 training reports
  • 97 discharge summaries from Partners
  • 73 discharge summaries from Beth Israel Deaconess Medical Center
  • 98 discharge summaries from University of Pittsburgh Medical Center
  • 81 progress notes from University of Pittsburgh Medical Center
• 477 test reports
  • 133 discharge summaries from Partners
  • 123 discharge summaries from Beth Israel Deaconess Medical Center
  • 102 discharge summaries from University of Pittsburgh Medical Center
  • 119 progress notes from University of Pittsburgh Medical Center

i2b2 2010: Best Results
• A total of 41 teams participated (22 for concepts, 21 for assertions and 16 for relations)
• Best F-measures (harmonic mean of precision and recall):
  • Concepts: 85% F-measure
  • Assertions: 92.6% F-measure
  • Relations: 73.7% F-measure
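For reference, the F-measure named on this slide is the harmonic mean of precision and recall. A small helper makes the definition concrete; the `beta` parameter is an addition here (beta = 1 gives the plain harmonic mean), and the input values in the example are arbitrary, not challenge results.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure; with beta = 1 this is the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Arbitrary illustration values: the harmonic mean sits closer to the lower input.
print(round(f_measure(1.0, 0.5), 4))  # 0.6667
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high F-measure by maximizing only precision or only recall.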
