Computational Intelligence in Biomedical and Health Care Informatics
HCA 590 (Topics in Health Sciences)
Rohit Kate
Clinical Natural Language Processing 2
Reading • Chapter 15, Text 6 • Paper: Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, Component Evaluation and Applications. Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute. Journal of the American Medical Informatics Association 2010;17:507-513
Clinical NLP Systems • General-purpose clinical NLP systems: can be applied to different tasks • MetaMap • cTAKES • MedLEE • Specialized clinical NLP systems • Detecting clinical events • Processing radiology, pathology and other reports • General-purpose systems can be used to build specialized systems, so the distinction lies mostly in the purpose for which a system is built
MetaMap http://metamap.nlm.nih.gov/ (MetaMap slides adapted from: http://skr.nlm.nih.gov/papers/references/10.11.14.MetaMapTutorial.pptx) • A tool to identify concepts in clinical text • Identifying a concept means mapping it to an entry in a terminology/ontology, such as the UMLS Metathesaurus • Concept identification is useful or essential for many tasks, including • Information extraction/Data mining • Classification/Categorization • Text summarization • Question answering • Literature-based knowledge discovery
Concept Identification Programs • Selected programs that map biomedical text to a thesaurus • SAPHIRE (Hersh et al., 1990) • CLARIT (Evans et al., 1991) • MetaMap (Aronson et al., 1994) • Metaphrase (Tuttle et al., 1998) • MMTx (2001) • KnowledgeMap (Denny et al., 2003) • Mgrep (Meng, 2009, unpublished) • Characteristics of MetaMap • Linguistic rigor • Flexible partial matching • Emphasis on thoroughness rather than speed • Restricted to English syntax and vocabulary
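To make "mapping text to a thesaurus" concrete, here is a minimal Python sketch of the simplest possible approach: greedy longest-match lookup of word n-grams against a tiny, hypothetical thesaurus (`THESAURUS` and `identify_concepts` are illustrative names, not part of any program listed above). It deliberately lacks what distinguishes MetaMap, such as linguistic analysis, lexical variant generation and flexible partial matching, so it only finds exact, case-insensitive matches.

```python
# Hypothetical mini-thesaurus: lowercased term -> preferred concept name.
THESAURUS = {
    "lung cancer": "Malignant neoplasm of lung",
    "lung": "Entire lung",
    "cancer": "Malignant Neoplasms",
    "sleep apnea": "Sleep Apnea Syndromes",
}


def identify_concepts(text, thesaurus=THESAURUS, max_len=4):
    """Greedy left-to-right longest-match lookup of word n-grams."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so "lung cancer" wins over "lung".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                found.append((phrase, thesaurus[phrase]))
                i += n
                break
        else:
            i += 1  # nothing matches at this position; move on one token
    return found


print(identify_concepts("History of lung cancer and sleep apnea."))
```

Preferring the longest match is one common way to avoid reporting "lung" and "cancer" separately when the compound term exists in the vocabulary.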
Example (best mappings) • PMID – 9339686 • AB – Cerebral blood flow (CBF) in newborn infants is often below levels necessary to sustain brain viability in adults. • Mapped concepts: Frequent; Levels (qualifier value); Cerebrovascular Circulation; Adult; Infant, Newborn; Entire brain; Viable; Brain; CEREBRAL BLOOD FLOW IMAGING; Sustained
Example (best mappings with WSD) • PMID – 9339686 • AB – Cerebral blood flow (CBF) in newborn infants is often below levels necessary to sustain brain viability in adults. • Mapped concepts: Frequent; Levels (qualifier value); Cerebrovascular Circulation; Adult; Infant, Newborn; Entire brain; Viable; Brain; CEREBRAL BLOOD FLOW IMAGING; Sustained
MetaMap Examples • “inferior vena caval stent filter” maps to • ‘Inferior Vena Cava Filter’ (‘Vena Cava Filters’) and • ‘Stent’ • “medicine” with --allow_overmatches maps to • ‘Alternative Medicine’ or • ‘Medical Records’ or • ‘Nuclear medicine procedure, NOS’ or ... • “pain on the left side of the chest” with --quick_composite_phrases maps to • ‘Left sided chest pain’ (under development)
Example: Normal Processing
Phrase: “lung cancer.”
Meta Candidates (8):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
  861 Cancer (Malignant Neoplasms) [Neoplastic Process]
  861 Lung [Body Part, Organ, or Organ Component]
  861 Cancer (Cancer Genus) [Invertebrate]
  861 Lung (Entire lung) [Body Part, Organ, or Organ Component]
  861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]
  768 Pneumonia [Disease or Syndrome]
Meta Mapping (1000):
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
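In the output above, both 1000-point candidates cover the whole phrase, so each becomes its own best mapping. The hypothetical helper below sketches only that final tie-keeping step on the candidate list copied from the example; it does not reproduce MetaMap's candidate scoring or its construction of mappings from several candidates.

```python
# Candidates for "lung cancer", copied from the example: (score, concept, semantic type).
CANDIDATES = [
    (1000, "Lung Cancer (Malignant neoplasm of lung)", "Neoplastic Process"),
    (1000, "Lung Cancer (Carcinoma of lung)", "Neoplastic Process"),
    (861, "Cancer (Malignant Neoplasms)", "Neoplastic Process"),
    (861, "Lung", "Body Part, Organ, or Organ Component"),
    (861, "Cancer (Cancer Genus)", "Invertebrate"),
    (861, "Lung (Entire lung)", "Body Part, Organ, or Organ Component"),
    (861, "Cancer (Specialty Type - cancer)", "Biomedical Occupation or Discipline"),
    (768, "Pneumonia", "Disease or Syndrome"),
]


def top_scoring(candidates):
    """Return every candidate that shares the highest score (ties are all kept)."""
    best = max(score for score, _, _ in candidates)
    return [c for c in candidates if c[0] == best]


for score, concept, semtype in top_scoring(CANDIDATES):
    print(score, concept, f"[{semtype}]")
```

Keeping all ties, rather than picking one arbitrarily, leaves the choice between "Carcinoma of lung" and "Malignant neoplasm of lung" to a later step such as word sense disambiguation.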
Example: Compound Mappings (with --compute_all_mappings)
Phrase: “obstructive sleep apnea.”
Meta Candidates (8): ...
Meta Mapping (1000):
  1000 Obstructive sleep apnoea (Sleep Apnea, Obstructive) [Disease or Syndrome]
Meta Mapping (901):
  827 Obstructive (Obstructed) [Functional Concept]
  901 Apnea, Sleep (Sleep Apnea Syndromes) [Disease or Syndrome]
Meta Mapping (851):
  827 Obstructive (Obstructed) [Functional Concept]
  827 Sleep [Organism Function]
  827 APNOEA (Apnea) [Pathologic Function]
…
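The three mappings listed above are different ways of covering the phrase with non-overlapping candidates. The sketch below enumerates such combinations by always covering the leftmost uncovered word next, which yields each combination exactly once. It assumes, for simplicity, that a mapping must cover every word, and it omits scoring entirely; the candidate spans are read off the example, and `all_mappings` is a hypothetical name.

```python
# Phrase words: 0 = "obstructive", 1 = "sleep", 2 = "apnea".
# Each candidate: (preferred concept name, word positions it covers).
OSA_CANDIDATES = [
    ("Sleep Apnea, Obstructive", frozenset({0, 1, 2})),
    ("Obstructed", frozenset({0})),
    ("Sleep Apnea Syndromes", frozenset({1, 2})),
    ("Sleep", frozenset({1})),
    ("Apnea", frozenset({2})),
]


def all_mappings(candidates, n_words):
    """Every combination of non-overlapping candidates that covers all words."""
    everything = frozenset(range(n_words))
    results = []

    def extend(chosen, covered):
        if covered == everything:
            results.append(chosen)
            return
        # The leftmost uncovered word must be covered by exactly one candidate,
        # so branching on it never produces the same combination twice.
        first_gap = min(everything - covered)
        for name, span in candidates:
            if first_gap in span and not (span & covered):
                extend(chosen + [name], covered | span)

    extend([], frozenset())
    return results


for mapping in all_mappings(OSA_CANDIDATES, 3):
    print(mapping)
```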
Example: Show Sources (-G)
Phrase: “scorpion sting.”
Meta Candidates (4):
  1000 Scorpion sting {MDR,DXP} [Injury or Poisoning]
  861 Sting (Sting Injury {MTH,MSH,MDR,RCD,SNM,SNOMEDCT,SNMI,WHO}) [Injury or Poisoning]
  694 Scorpion (Scorpions {LCH,MSH,MTH,SNM,SNOMEDCT,SNMI,CSP,RCD,NCBI}) [Invertebrate]
  694 SCORPION (Scorpion antigen {MTH,LNC}) [Immunologic Factor]
Meta Mapping (1000):
  1000 Scorpion sting {MDR,DXP} [Injury or Poisoning]
Example: Restrict to Sources (-GR LCH)
Phrase: “scorpion sting.”
Meta Candidates (1):
  694 Scorpion (Scorpions {LCH}) [Invertebrate]
Meta Mapping (694):
  694 Scorpion (Scorpions {LCH}) [Invertebrate]
Example: Restrict to Semantic Types (-J neop)
Phrase: “lung cancer.”
Meta Candidates (3):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
  861 Cancer (Malignant Neoplasms) [Neoplastic Process]
Meta Mapping (1000):
  1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
  1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
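Source and semantic-type restriction can be thought of as filters on the candidate list. The sketch below replays the “scorpion sting” candidates from the -G example as hypothetical `Candidate` records and filters them. It only imitates the visible effect of -R and -J on this output: MetaMap itself may apply the restrictions earlier in its processing, and -J takes abbreviations such as neop (Neoplastic Process), whereas this hypothetical `restrict` function takes full semantic type names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    score: int
    name: str
    sources: frozenset
    semtype: str


# Candidates for "scorpion sting", copied from the -G example.
SCORPION_STING = [
    Candidate(1000, "Scorpion sting", frozenset({"MDR", "DXP"}), "Injury or Poisoning"),
    Candidate(861, "Sting Injury",
              frozenset({"MTH", "MSH", "MDR", "RCD", "SNM", "SNOMEDCT", "SNMI", "WHO"}),
              "Injury or Poisoning"),
    Candidate(694, "Scorpions",
              frozenset({"LCH", "MSH", "MTH", "SNM", "SNOMEDCT", "SNMI", "CSP", "RCD", "NCBI"}),
              "Invertebrate"),
    Candidate(694, "Scorpion antigen", frozenset({"MTH", "LNC"}), "Immunologic Factor"),
]


def restrict(candidates, sources=None, semtypes=None):
    """Keep candidates from the given sources and/or semantic types.

    When sources are given, each kept candidate's source set is narrowed to them,
    imitating how the restricted output above shows only {LCH}.
    """
    kept = []
    for cand in candidates:
        if sources is not None:
            shared = cand.sources & frozenset(sources)
            if not shared:
                continue
            cand = replace(cand, sources=shared)
        if semtypes is not None and cand.semtype not in semtypes:
            continue
        kept.append(cand)
    return kept


print(restrict(SCORPION_STING, sources={"LCH"}))
print([c.name for c in restrict(SCORPION_STING, semtypes={"Injury or Poisoning"})])
```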
cTAKES http://incubator.apache.org/ctakes/ • cTAKES: Mayo clinical Text Analysis and Knowledge Extraction System • Developed at Mayo Clinic with external collaborators • Publicly available • Core components • Sentence boundary detection (OpenNLP technology) • Tokenization (rule-based) • Morphologic normalization (NLM's LVG) • POS tagging (OpenNLP technology) • Shallow parsing (OpenNLP technology) • Named Entity Recognition: dictionary mapping (lookup algorithm); semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications • Assertion module • Dependency parser • Constituency parser • Semantic Role Labeler • Coreference resolver • Drug Profile module • Smoking status classifier
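cTAKES is organized as a pipeline of components that each add annotations to a shared document representation (the real system is written in Java on Apache UIMA). The Python sketch below mirrors only that architectural idea with three toy stages; the regex sentence splitter, the tokenizer, `DICTIONARY` and all function names are hypothetical and are not cTAKES components.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Shared container that every pipeline stage reads from and writes to."""
    text: str
    annotations: dict = field(default_factory=dict)


def sentence_annotator(doc):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    doc.annotations["sentences"] = re.split(r"(?<=[.!?])\s+", doc.text.strip())


def token_annotator(doc):
    # Words and punctuation marks become separate tokens, sentence by sentence.
    doc.annotations["tokens"] = [
        re.findall(r"\w+|[^\w\s]", s) for s in doc.annotations["sentences"]
    ]


# Hypothetical dictionary: lowercased term -> semantic group.
DICTIONARY = {"chest pain": "sign/symptom", "aspirin": "medication"}


def dictionary_annotator(doc):
    # Look up every one- and two-token span of each sentence in the dictionary.
    found = []
    for tokens in doc.annotations["tokens"]:
        words = [t.lower() for t in tokens]
        for n in (1, 2):
            for i in range(len(words) - n + 1):
                term = " ".join(words[i:i + n])
                if term in DICTIONARY:
                    found.append((term, DICTIONARY[term]))
    doc.annotations["entities"] = found


# Stages run in order; later stages depend on annotations from earlier ones.
PIPELINE = [sentence_annotator, token_annotator, dictionary_annotator]


def run(text):
    doc = Document(text)
    for stage in PIPELINE:
        stage(doc)
    return doc


print(run("Patient reports chest pain. Started aspirin.").annotations["entities"])
```

Because each stage only reads and writes the shared document, stages can be swapped or appended (for example, an assertion or negation stage after entity lookup) without changing the others.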
cTAKES Example From: http://informatics.mayo.edu/sharp/images/2/25/CTAKES.ppt
MedLEE (MedLEE slides adapted from: https://cdc.confex.com/cdc/phin2008/recordingredirect.cgi/id/4165) • Medical Language Extraction and Encoding • Extracts, structures, and encodes clinical information in narrative patient reports • Comprehensive coverage • Can be used for diverse clinical applications • Development started in 1991 • Used at Columbia University Medical Center since 1995 • Numerous independent evaluations • Rule-based system, constructed manually • Now commercialized by the company Health Fidelity
MedLEE Applications
Patient report: ....New maculopapular rash on trunk ….
MedLEE → Coded data: Problem: rash • Status: new • Descriptor: maculopapular • Bodyloc: trunk • Code: C0460005 (trunk) • Code: C0241488 (trunk maculopapular rash)
Analytics on the coded data:
• Clinical Guidance: Indicate potential notifiable disease for reporting; inform of local outbreak and indicate appropriate tests; indicate need for vaccination.
• Surveillance: Indicate potential bioterrorist event; transmit syndromic event to health dept for surveillance.
• Quality Assurance: Detect potential cases of medication reaction.
• Clinical Research: Detect cases of rash for inclusion in trial of new treatment; find genetic associations with atopic rash.
Text Reports Processed • Radiology Reports • Cardiology Reports • Pathology Reports • Admission notes • Discharge Summaries • Resident sign-out notes • Office Visits • Telephone encounters
Applications using MedLEE • Biosurveillance • Syndromic surveillance • Adverse Drug Event detection • Decision Support • Clinical Research • Clinical Trials • Quality Assurance • Automated Encoding • Patient Management • Data mining – finding trends and associations • Linking patient record to the literature • Summarization
MedLEE Example “New maculopapular rash on trunk.”
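MedLEE turns this sentence into a structured frame (Problem: rash, Status: new, Descriptor: maculopapular, Bodyloc: trunk, as shown on the applications slide). The sketch below produces the same four slots with a simple lexicon-driven slot filler. MedLEE's actual semantic grammar, lexicon and UMLS coding are far richer; the word lists and the `encode` function are hypothetical.

```python
import re

# Hypothetical word lists standing in for a real clinical lexicon.
STATUS = {"new", "old", "recurrent"}
DESCRIPTORS = {"maculopapular", "vesicular", "erythematous"}
PROBLEMS = {"rash", "lesion", "pain"}
BODY_LOCATIONS = {"trunk", "arm", "face"}
LOCATION_PREPOSITIONS = {"on", "of", "in"}


def encode(sentence):
    """Fill problem/status/descriptor/bodyloc slots from a short finding sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    frame = {}
    for i, word in enumerate(words):
        if word in PROBLEMS:
            frame["problem"] = word
        elif word in STATUS:
            frame["status"] = word
        elif word in DESCRIPTORS:
            frame["descriptor"] = word
        elif word in BODY_LOCATIONS and i > 0 and words[i - 1] in LOCATION_PREPOSITIONS:
            # Only treat a body part as a location when a preposition introduces it.
            frame["bodyloc"] = word
    return frame


print(encode("New maculopapular rash on trunk."))
```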
Detecting Clinical Events • Find clinical events, such as adverse reactions and drug interactions, in medical records for research purposes or to trigger alerts • Simple keyword search is not sufficient • An NLP-based system does a better job: • Get a structured representation, for example, using MedLEE • Query this representation • Hripcsak et al. [2003] detected 45 types of adverse events, such as pulmonary embolism and medication errors, in discharge summaries (sensitivity: 0.15-0.37, specificity: 0.99)
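Why keyword search falls short can be shown with a toy example. The report texts, the structured records (a hypothetical stand-in for MedLEE-style output carrying a certainty value) and the helper names below are all made up for illustration; sensitivity and specificity use their standard definitions, TP/(TP+FN) and TN/(TN+FP).

```python
# Hypothetical report texts.
REPORTS = {
    "r1": "Pulmonary embolism confirmed on CT angiography.",
    "r2": "No evidence of pulmonary embolism.",
    "r3": "Patient discharged home in stable condition.",
}

# Hypothetical structured output for the same reports: one record per finding,
# with a certainty value.
STRUCTURED = {
    "r1": [{"problem": "pulmonary embolism", "certainty": "positive"}],
    "r2": [{"problem": "pulmonary embolism", "certainty": "negative"}],
    "r3": [],
}

TRUTH = {"r1"}            # reports that really describe the event
UNIVERSE = set(REPORTS)   # all reports examined


def keyword_hits(term):
    """Reports whose text contains the term anywhere, negated or not."""
    return {rid for rid, text in REPORTS.items() if term in text.lower()}


def structured_hits(problem):
    """Reports with a positively asserted finding for the problem."""
    return {
        rid
        for rid, findings in STRUCTURED.items()
        if any(f["problem"] == problem and f["certainty"] == "positive" for f in findings)
    }


def sensitivity_specificity(predicted, actual, universe):
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    fp = len(predicted - actual)
    tn = len(universe - predicted - actual)
    return tp / (tp + fn), tn / (tn + fp)


for name, hits in [("keyword", keyword_hits("pulmonary embolism")),
                   ("structured", structured_hits("pulmonary embolism"))]:
    print(name, sorted(hits), sensitivity_specificity(hits, TRUTH, UNIVERSE))
```

Here the keyword query also flags the negated mention in r2, which lowers specificity; the structured query can filter on certainty instead.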
Processing Radiology Reports • Radiology reports are the genre of clinical report to which most NLP systems have been applied • The Special Purpose Radiology Understanding System (SPRUS) extracted and coded findings and radiologists’ interpretations [Haug et al. 1990] • SymText identified pneumonia-related concepts and detected the presence or absence of bacterial pneumonia in radiology reports [Fiszman et al. 1998] • It evolved into the MPLUS system, which could classify various brain conditions and chief complaints [Christensen et al. 2002; Chapman et al. 2005] • It has since evolved into Onyx, which is being applied to dental exams [Christensen et al. 2007]
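Deciding whether pneumonia is present or absent depends heavily on negation. The sketch below uses a simplified NegEx-style rule (a negation trigger within a few words before the target term) purely as an illustration; it is not SymText's method, and the trigger list, the window size and the `pneumonia_status` function are assumptions.

```python
import re

# Hypothetical negation triggers, matched as whole words or phrases.
NEGATION_TRIGGERS = ["no evidence of", "no", "without", "negative for", "free of"]


def pneumonia_status(sentence, target="pneumonia", window=5):
    """Classify the target as present, absent, or not mentioned in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if target not in words:
        return "not mentioned"
    idx = words.index(target)
    # Only the few words just before the target can negate it.
    preceding = " ".join(words[max(0, idx - window):idx])
    for trigger in NEGATION_TRIGGERS:
        if re.search(r"\b" + re.escape(trigger) + r"\b", preceding):
            return "absent"
    return "present"


for s in ["Findings consistent with bacterial pneumonia.",
          "No evidence of pneumonia.",
          "No fever, but the film shows a right lower lobe pneumonia."]:
    print(s, "->", pneumonia_status(s))
```

The window keeps a distant "No fever" from negating the pneumonia mention in the last sentence; a wider window would misclassify it.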
Other Clinical NLP Tasks • Summarization: Provide an overview of a patient record or of the scientific literature • For clinicians • For patients • Question-Answering: Provide short answers or short summaries in response to questions asked in natural language
Summarization • A well-known task in general NLP • Provide clinicians with a succinct summary of patient records • Necessary categories: labs & tests, problems & treatments, history, findings, allergies, meds, plan, and identifying information • Meng et al. [2005] used semantic patterns to extract the information needed to generate a summary • Summarizing scientific publications: Fiszman et al. [2004] presented findings from the biomedical literature as graphical structures
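One simple way to organize a record into the categories listed above is to assign each sentence to the first category whose cue pattern it matches. The patterns below are hypothetical regular expressions, a crude stand-in for semantic patterns; they do not reproduce the method of Meng et al., and only four of the categories are covered.

```python
import re

# Hypothetical cue patterns per summary category, checked in this order.
CATEGORY_PATTERNS = [
    ("Allergies", r"\ballerg"),
    ("Meds", r"\b(mg|tablet|daily|prescribed)\b"),
    ("Labs & tests", r"\b(hemoglobin|creatinine|x-ray|ct|mri)\b"),
    ("Plan", r"\b(follow up|will|plan)\b"),
]


def summarize(note):
    """Group the note's sentences under the first category whose pattern matches."""
    sections = {name: [] for name, _ in CATEGORY_PATTERNS}
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        for name, pattern in CATEGORY_PATTERNS:
            if re.search(pattern, sentence, re.IGNORECASE):
                sections[name].append(sentence)
                break  # each sentence goes to at most one category
    # Drop empty categories so the summary stays short.
    return {name: sents for name, sents in sections.items() if sents}


NOTE = ("Allergic to penicillin. Metformin 500 mg daily. "
        "Creatinine 1.1 today. Follow up in 2 weeks.")
for category, sentences in summarize(NOTE).items():
    print(f"{category}: {' '.join(sentences)}")
```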
Question-Answering • A well-known task in general NLP • A CDS system could support clinical decision making by answering clinical questions that call for: • Information on particular patients • Data on health and sickness within the local population • Medical knowledge • Answers to other legal, social or ethical questions • Currently these questions are answered by manually browsing EMRs • NLP can help answer questions automatically when: • Questions are in natural language • Answers can be found in natural language text • The MedQA system [Lee et al. 2006] answers definitional questions by integrating information retrieval, extraction and summarization techniques to generate paragraph-level answers
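Definitional questions (“What is X?”) suggest a simple extract-then-match strategy. The hypothetical `definitional_answer` below pulls the term out of the question and returns sentences of the form “X is/are/refers to …” from a small made-up collection. It is only a toy illustration of the extraction step; MedQA combines retrieval, extraction and summarization, and none of that pipeline is reproduced here.

```python
import re

# Hypothetical sentence collection standing in for retrieved documents.
SENTENCES = [
    "Atrial fibrillation is an irregular and often rapid heart rhythm.",
    "Patients with atrial fibrillation often receive anticoagulants.",
    "Hypertension refers to persistently elevated arterial blood pressure.",
]

QUESTION = re.compile(r"\s*what (?:is|are) (?:an? |the )?(.+?)\s*\?\s*$", re.IGNORECASE)


def definitional_answer(question, sentences):
    """Return sentences that look like definitions of the term asked about."""
    match = QUESTION.match(question)
    if not match:
        return []
    term = re.escape(match.group(1).lower())
    definition = re.compile(rf"\b{term}\s+(?:is|are|refers to)\b", re.IGNORECASE)
    return [s for s in sentences if definition.search(s)]


print(definitional_answer("What is atrial fibrillation?", SENTENCES))
print(definitional_answer("What is hypertension?", SENTENCES))
```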
Direct Applications of NLP in Healthcare • Analyzing text written by patients to gauge their mental status • Monitoring medication compliance and drug abuse • These applications are currently experimental and not fully deployed.
Analyzing Patient Text • The Linguistic Inquiry and Word Count (LIWC) tool [Pennebaker et al. 2003] was used to analyze text written by patients for: • Predicting post-bereavement improvements in mental and physical health • Predicting adjustment to cancer • Distinguishing suicidal from non-suicidal individuals • Roark et al. [2007] applied parsing methods to analyze sentences produced by patients in order to diagnose mild cognitive impairment
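LIWC-style analysis reports, for each word category, the percentage of words in a text that fall into it. The sketch below computes such percentages with three tiny, hypothetical categories; the real LIWC dictionaries are much larger and also match word stems, and `category_percentages` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical category word lists.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine"},
    "negative_emotion": {"sad", "afraid", "hurt", "worried"},
    "positive_emotion": {"happy", "hope", "good", "love"},
}


def category_percentages(text):
    """Percentage of all words in the text that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in CATEGORIES}


print(category_percentages("I am worried but I have hope."))
```

Normalizing by the total word count makes texts of different lengths comparable, which matters when comparing writing from different patients.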
Monitoring Medical Compliance and Drug Abuse • Butler et al. [2007] used NLP methods for post-marketing surveillance of Internet chatter (message board postings, blogs, etc.) about pharmaceutical products to detect abuse of certain drugs • Malouf et al. [2006] found associations, such as side effects, risks and dosage-related issues, for certain medications in Internet discussion groups for epilepsy patients and their caregivers • Researchers are currently working on determining adverse drug effects from Internet postings using NLP methods
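A basic building block for mining such associations from posts is counting how often drug names and side-effect terms co-occur in the same post. The lexicons, the placeholder drug names and the posts below are hypothetical; note that the naive count also credits “no rash”, which is the kind of error that negation handling and other NLP methods are meant to remove.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicons with placeholder drug names.
DRUGS = {"drug_a", "drug_b"}
EFFECTS = {"dizziness", "rash", "nausea"}


def cooccurrences(posts):
    """Count (drug, effect) pairs that appear together in the same post."""
    counts = Counter()
    for post in posts:
        words = set(re.findall(r"\w+", post.lower()))
        for drug, effect in product(words & DRUGS, words & EFFECTS):
            counts[(drug, effect)] += 1
    return counts


POSTS = [
    "Started drug_a last week, constant dizziness.",
    "Dizziness again on drug_a, plus nausea.",
    "drug_b has been fine, no rash.",
]
print(cooccurrences(POSTS).most_common())
```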
Future Work and Conclusions • CDS systems in general are not yet in wide use, and the same is true of NLP-based CDS systems, despite demonstrated benefits and local successes; however, interest has been renewed now that health-care data are becoming electronic • Improving the future use of NLP in CDS will require: • Adapting to clinicians and earning their trust • Progress in clinical NLP • Evaluating the impact on health care, not just performance on the specific NLP task