
Andreea Bodnari (1), Peter Szolovits (1), Ozlem Uzuner (2); (1) MIT, CSAIL, Cambridge, MA, USA

MCORES: a system for noun phrase coreference resolution for clinical records. 2012 SHARPn Summit “Secondary Use”. Andreea Bodnari (1), Peter Szolovits (1), Ozlem Uzuner (2). (1) MIT, CSAIL, Cambridge, MA, USA; (2) Department of Information Studies, University at Albany SUNY, Albany, NY, USA.


Presentation Transcript


  1. MCORES: a system for noun phrase coreference resolution for clinical records • 2012 SHARPn Summit “Secondary Use” • Andreea Bodnari (1), Peter Szolovits (1), Ozlem Uzuner (2) • (1) MIT, CSAIL, Cambridge, MA, USA • (2) Department of Information Studies, University at Albany SUNY, Albany, NY, USA • 10.16.2012, Rochester, MN

  2. Outline • Medical coreference resolution system (MCORES) • Experimental results • Conclusion

  3. Why coreference resolution? • Electronic Medical Records (EMRs) – large information repositories • Clinical information requires processing • Lower level: sentence parsing, tokenization • Higher level: coreference resolution, semantic disambiguation • Coreference resolution: a fundamental step in text processing

  4. Data: i2b2/VA corpus • English medical corpus provided by i2b2 National Center for Biomedical Computing • De-identified medical discharge summaries • Source: PH & BIDMC • Content: 230 (PH) + 196 (BIDMC) discharge summaries • Annotated concepts and coreference chains • Concept types: Persons, Problems, Treatments, Tests, Pronouns

  5. Coreference resolution algorithm • NP Instance Creation → Feature Generation → Classification → Output Clustering

  6. 1. NP instance creation • Markables of the same semantic category are paired together • MCORES creates positive instances only from neighboring markable pairs in a chain • Note: instance creation is akin to McCarthy and Lehnert
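The pairing rule on slide 6 can be sketched roughly as follows. The `Markable` record, the chain representation (lists of markable ids in document order), and the function name are hypothetical. The slide does not say how negative instances are chosen, so this sketch assumes that same-category pairs from different chains are negatives and that non-neighboring pairs inside one chain are skipped.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Markable:      # hypothetical record, not taken from the slides
    mid: int         # position of the markable in the document
    text: str
    category: str    # e.g. "problem", "treatment", "test", "person"


def create_instances(markables, chains):
    """Return (antecedent, anaphor, label) triples for same-category pairs."""
    chain_of = {mid: i for i, chain in enumerate(chains) for mid in chain}
    # Positive instances: only neighboring markables within a chain.
    neighbors = {(a, b) for chain in chains for a, b in zip(chain, chain[1:])}

    instances = []
    ordered = sorted(markables, key=lambda m: m.mid)
    for ante, ana in combinations(ordered, 2):
        if ante.category != ana.category:
            continue  # pair markables only within one semantic category
        ante_chain = chain_of.get(ante.mid)
        if (ante.mid, ana.mid) in neighbors:
            instances.append((ante, ana, True))
        elif ante_chain is None or ante_chain != chain_of.get(ana.mid):
            # Assumption: pairs that do not share a chain are negatives.
            instances.append((ante, ana, False))
        # Assumption: non-neighboring pairs inside one chain are skipped.
    return instances
```

With a chain such as [1, 2, 4], the pairs (1, 2) and (2, 4) become positive instances, while (1, 4) produces no instance at all under these assumptions.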

  7. 1. NP instance creation Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.

  8. 2. Feature Generation • Multi-perspective features • Antecedent perspective • Anaphor perspective • Greedy perspective • Stingy perspective • Phrase-level lexical • Sentence-level lexical • Syntactic • Semantic • Miscellaneous
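The slide names four perspectives but does not define them. One plausible reading, assumed here purely for illustration, is that a feature such as token overlap is normalized by the antecedent (antecedent perspective) or by the anaphor (anaphor perspective), and that the greedy and stingy perspectives take the larger and the smaller of those two values. The function name and these definitions are assumptions, not the authors' specification.

```python
def multi_perspective(antecedent_tokens, anaphor_tokens):
    """Token overlap from four perspectives (the definitions are assumptions)."""
    ante, ana = set(antecedent_tokens), set(anaphor_tokens)
    shared = len(ante & ana)
    from_antecedent = shared / len(ante) if ante else 0.0
    from_anaphor = shared / len(ana) if ana else 0.0
    return {
        "antecedent": from_antecedent,
        "anaphor": from_anaphor,
        "greedy": max(from_antecedent, from_anaphor),  # most favorable view
        "stingy": min(from_antecedent, from_anaphor),  # least favorable view
    }


values = multi_perspective("left lower lobe pneumonia".split(), ["pneumonia"])
# values["antecedent"] == 0.25 and values["anaphor"] == 1.0
```

Under this reading, a short anaphor fully contained in a long antecedent scores high from the anaphor and greedy perspectives and low from the antecedent and stingy perspectives.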

  9. 2. Feature Generation (lexical) • Phrase-level lexical: • Token overlap* • Normalized token overlap • Edit-distance • Normalized edit-distance • Sentence-level lexical: • Sentence-level token overlap* • Filtered sentence-level token overlap* • Left and right mention overlap (stingy and greedy perspectives only) • (* = multi-perspective feature)
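As a rough illustration of the phrase-level lexical features, the sketch below computes token overlap and Levenshtein edit distance for a pair of mention strings. The normalizations are assumptions, since the slide does not give them: token overlap is divided by the size of the token union, and edit distance by the length of the longer string.

```python
def token_overlap(a, b):
    """Number of distinct lowercase tokens shared by two mentions."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def normalized_token_overlap(a, b):
    """Assumption: normalize by the size of the token union (Jaccard)."""
    union = set(a.lower().split()) | set(b.lower().split())
    return token_overlap(a, b) / len(union) if union else 0.0


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Assumption: normalize by the length of the longer string."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

For example, `token_overlap("chest pain", "the chest pain")` is 2, and `edit_distance("coumadin", "cumadin")` is 1, which is the kind of near-match a misspelled medication name would produce.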

  10. 2. Feature Generation (syntactic & semantic) • Syntactic: • Number agreement • Noun overlap* • Surname match • Semantic: • UMLS CUI overlap* • UMLS CUI token overlap* • UMLS semantic type overlap* • Anaphor UMLS semantic type • (* = multi-perspective feature)

  11. 2. Feature Generation (miscellaneous) • Token distance • Mention distance • All-mention distance • Sentence distance • Section match • Section distance
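A few of the distance features listed above can be sketched with simple offsets. The `Mention` record and its fields are hypothetical, and the slide does not explain how mention distance differs from all-mention distance, so only a single count of intervening markables is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:        # hypothetical record, not taken from the slides
    index: int        # ordinal among all markables in the record
    token_start: int  # offset of the first token
    token_end: int    # offset one past the last token
    sentence: int     # ordinal of the containing sentence
    section: int      # ordinal of the containing section


def distance_features(antecedent, anaphor):
    """Distances between an antecedent and a later anaphor."""
    return {
        "token_distance": max(0, anaphor.token_start - antecedent.token_end),
        "mention_distance": anaphor.index - antecedent.index - 1,
        "sentence_distance": anaphor.sentence - antecedent.sentence,
        "section_match": antecedent.section == anaphor.section,
        "section_distance": anaphor.section - antecedent.section,
    }
```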

  12. 3. Classification • C4.5 decision tree algorithm • Flexible • Readable prediction model • Classify pairs of markables based on values of the feature vectors
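C4.5 chooses each split by gain ratio, that is, information gain divided by the split information of the candidate attribute. The sketch below shows only that criterion for a discrete feature; it is not a full C4.5 implementation (no thresholds for continuous features, no pruning), and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of splitting `labels` on a discrete feature."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected entropy of the labels after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

A feature that separates coreferent from non-coreferent pairs perfectly gets a gain ratio of 1.0 on a balanced sample, while a feature that is independent of the label gets 0.0. A tree built from such splits can be read as a chain of feature tests, which is the readability the slide refers to.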

  13. 4. Output Clustering • Classifier makes pairwise predictions only • Pairwise predictions are clustered into coreference chains • Aggressive-merge clustering algorithm: a prediction [M1] - [M2] is merged with all preceding pairwise predictions linked to [M1] or [M2] • Note: aggressive-merge algorithm proposed by McCarthy and Lehnert
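Merging each positive pair with every earlier prediction that touches either markable amounts to taking connected components over the positive pairs. A minimal sketch using union-find follows; the function name and the input format (pairs of markable ids) are assumptions, not the MCORES implementation.

```python
def aggressive_merge(positive_pairs):
    """Cluster positive pairwise predictions into coreference chains."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in positive_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two chains

    chains = {}
    for markable in list(parent):
        chains.setdefault(find(markable), []).append(markable)
    return sorted(sorted(chain) for chain in chains.values())


chains = aggressive_merge([("M1", "M2"), ("M3", "M4"), ("M2", "M5")])
# chains == [["M1", "M2", "M5"], ["M3", "M4"]]
```

One consequence of this strategy is that a single false positive link joins two otherwise separate chains, so pairwise precision matters a great deal for chain quality.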

  14. Evaluation • Feature set evaluation • Perspectives evaluation • Performance evaluation against • In-house baseline • Third party systems (RECONCILE ACL09 & BART) • Evaluation metric: unweighted averages of Recall, Precision, and F-measures of • MUC • B³ • CEAF • BLANC
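Of the four metrics listed, B³ is the simplest to state: for every mention, precision and recall are computed from the overlap between the chain that contains it in the response and the chain that contains it in the key, and the per-mention scores are then averaged. The sketch below assumes key and response share the same mention set and treats unchained mentions as singletons; it is not the official scorer.

```python
def b_cubed(key_chains, response_chains):
    """Return (precision, recall, f_measure) under the B-cubed metric."""
    key_of = {m: frozenset(c) for c in key_chains for m in c}
    resp_of = {m: frozenset(c) for c in response_chains for m in c}
    mentions = set(key_of) | set(resp_of)
    if not mentions:
        return 0.0, 0.0, 0.0

    precision = recall = 0.0
    for m in mentions:
        key = key_of.get(m, frozenset([m]))    # unchained mention: singleton
        resp = resp_of.get(m, frozenset([m]))
        overlap = len(key & resp)
        precision += overlap / len(resp)
        recall += overlap / len(key)
    precision /= len(mentions)
    recall /= len(mentions)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


p, r, f = b_cubed([[1, 2, 3], [4, 5]], [[1, 2], [3, 4, 5]])
# p and r both equal 11/15 for this example
```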

  15. Discussion • MCORES’ advantage comes from linking markables with no token overlap • Phrase-level sub-MCORES performs similarly to MCORES • The greedy perspective system is the most favorable single-perspective system • The multi-perspective system performs as well as or better than the single-perspective systems • Error analysis • MCORES fails to classify misspelled person pairs • Problem false positives arise from the difference between new and recurring events • Treatment false positives arise from medications with different routes of administration • Test false positives arise from the large number of full-overlap instances that did not corefer

  16. Conclusion • Developed coreference resolution system for the medical domain (MCORES) • MCORES innovates through a multi-perspective and knowledge-based feature set • MCORES outperforms third party systems and an in-house baseline, improving coreference resolution on clinical records
