
Infrastructure for Semantic Expansion and Curation of the RadLex Ontology

Rebecca Hazen & Alexander van Esbroeck, Northwestern University. Mentor: Dr. David Channin.


Presentation Transcript


  1. Infrastructure for Semantic Expansion and Curation of the RadLex Ontology
     Rebecca Hazen & Alexander van Esbroeck, Northwestern University
     Dr. David Channin, Mentor

  2. Background
     • RadLex - Radiology Lexicon
     • Reduce variation and improve clarity in radiology reports
     • 11,962 terms over 12 categories

  3. Establishing the need…
     • Missing many terms
     • Imaging Observations
     • Imaging Observation Characteristics
     • Committee-dependent development process
     • Manual, time-consuming, expensive
     • Larger lexicons are harder to manage
     • Difficult to sustain

  4. Proposed Solution
     • Develop an automatic term extraction system
     • Focusing on Imaging Observation and Characteristics
     • Accelerate the expansion of RadLex
     • Decrease the demands on committees
     • Propose lists of strong candidates for inclusion
     • Reduce development costs

  5. Processing System Description
     • Collect free full-text articles from medical journals
     • Identify new terms using LexEVS and NLP techniques
     • Create ranked lists of imaging observations and characteristics

  6. Processing System Overview
     [Diagram. Labels: LexEVS; Concepts/Relationships; Article Text; Article Finder; Candidate Term Identification; Data/Annotations; Context Processing; Ranked Lists of Imaging Observations & Characteristics]

  7. LexEVS
     • Developed by NCI, NIH, caBIG, and Mayo Clinic
     • Designed to fulfill a community need for standards in storing, accessing, managing, and distributing controlled vocabularies
     • Combination of LexBIG, LexGrid, and EVS
     • Programmable interfaces for accessing and distributing controlled vocabularies
     • Provides a common API

  8. UIMA Architecture
     • Framework for processing large collections of documents
     • Processing modules can be connected into pipelines

  9. Article Finder
     • Locates and retrieves scientific articles
     • Searches PubMed
     • Returns free full-text, English, HTML articles
     • Removes tags and extracts the article text
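
The tag-removal step can be illustrated with a minimal sketch. This is not the authors' Article Finder; it assumes the article is already available as an HTML string and uses only Python's standard `html.parser`. The names `TextExtractor` and `extract_article_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the runs of whitespace left behind by removed markup.
        return " ".join(" ".join(self._chunks).split())


def extract_article_text(html):
    """Hypothetical helper: strip tags and return the plain article text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A real retrieval step would also need to query PubMed and isolate the article body from navigation and reference sections, which this sketch does not attempt.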

  10. Articles Processed
     • 1,128 documents
     • Search: {Imaging|CT|MR|PET|X-ray|US|angiography|tomography} findings [Title]

  11. Candidate Phrase Identification
     • Identifies a list of candidate phrases from the articles
     • Tokenizer
     • Part-of-speech Tagger
     • Linguistic Filter
     • Extracts sequences of words matching a specific pattern
     • Example: “Increased renal enhancement” (“-ed” verb, adjective, noun)
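
A linguistic filter of this kind can be sketched as a regular expression over part-of-speech tags. The sketch below assumes tokens have already been tagged upstream with Penn Treebank tags. The pattern itself (an optional "-ed" verb, any number of adjectives, one or more nouns) is an assumption generalized from the single example on the slide, and the function names are hypothetical.

```python
import re

# Hypothetical pattern: optional "-ed" verb (V), adjectives (A), then nouns (N).
PATTERN = re.compile(r"V?A*N+")


def tag_class(word, tag):
    """Map a (word, Penn Treebank tag) pair to a one-letter class."""
    if tag in ("VBN", "VBD") and word.lower().endswith("ed"):
        return "V"
    if tag == "JJ":
        return "A"
    if tag in ("NN", "NNS"):
        return "N"
    return "x"  # anything else breaks a candidate phrase


def candidate_phrases(tagged_tokens, min_len=2):
    """Return word sequences whose tag classes match PATTERN.

    tagged_tokens is a list of (word, tag) pairs for one sentence.
    """
    classes = "".join(tag_class(w, t) for w, t in tagged_tokens)
    phrases = []
    for match in PATTERN.finditer(classes):
        if match.end() - match.start() >= min_len:
            words = [w for w, _ in tagged_tokens[match.start():match.end()]]
            phrases.append(" ".join(words))
    return phrases
```

Because each token maps to exactly one character, match offsets in the class string are also token offsets, which keeps the filter short.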

  12. LexEVS Annotator
     • Use LexEVS to access vocabularies
     • RadLex 2.0; NCI Thesaurus; HL7; CTCAE
     • Determine if phrases exist in RadLex as a single concept
     • Retrieve vocabulary metadata
     • What is that…
     • Annotate the document
     • Build database of annotations
     • Develop inclusion/exclusion criteria
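
The lookup-and-annotate idea can be sketched without the real service. The example below does not call the LexEVS API; plain in-memory dictionaries stand in for the vocabularies, and every term, concept ID, and function name in it is invented for illustration.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(phrase.lower().split())


def annotate(phrases, vocabularies):
    """Annotate each phrase with the vocabularies that contain it.

    vocabularies maps a vocabulary name to {term: concept id}. These
    dictionaries are stand-ins for what a terminology service would return.
    """
    indexes = {
        name: {normalize(term): cid for term, cid in terms.items()}
        for name, terms in vocabularies.items()
    }
    annotations = []
    for phrase in phrases:
        key = normalize(phrase)
        matches = {name: idx[key] for name, idx in indexes.items() if key in idx}
        annotations.append({
            "phrase": phrase,
            "matches": matches,
            # Phrases already in RadLex are known terms, not new candidates.
            "in_radlex": "RadLex" in matches,
        })
    return annotations
```

Phrases with `in_radlex` set to false are the ones that would continue on as candidates for inclusion.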

  13. LexEVS Annotator

  14. Context Processing
     • Find “indicator” words that are associated with existing RadLex terms
     • Assign weights to those words as a function of the number of RadLex terms with which they are associated
     Example text: Focal confluent fibrosis can occur in the cirrhotic liver as a hepatic mass in approximately 14% of cases [ ]. This fibrosis is accompanied by atrophy of the affected liver parenchyma and retraction of the overlying liver capsule (Figure 9).
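
One possible weighting function is sketched below. The slides do not give the formula; this sketch assumes the NC-value style weight from reference [2], where a word's weight is the number of distinct known terms it co-occurs with divided by the total number of known terms seen. The stop list, the sentence-level co-occurrence window, and all names are assumptions.

```python
import re
from collections import defaultdict

# Small illustrative stop list (an assumption, not the authors' list).
STOP = {"a", "an", "the", "of", "in", "as", "by", "is", "and", "this",
        "can", "was", "were", "to", "that", "with"}


def tokenize(text):
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def contains(tokens, sub):
    """True if the token list `sub` occurs contiguously inside `tokens`."""
    n = len(sub)
    return n > 0 and any(tokens[i:i + n] == sub
                         for i in range(len(tokens) - n + 1))


def indicator_weights(sentences, known_terms):
    """Weight each context word by the share of known terms it appears with."""
    terms_by_word = defaultdict(set)
    seen_terms = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for term in known_terms:
            term_tokens = tokenize(term)
            if not contains(tokens, term_tokens):
                continue
            seen_terms.add(term)
            for tok in tokens:
                if tok not in STOP and tok not in term_tokens:
                    terms_by_word[tok].add(term)
    total = len(seen_terms)
    if total == 0:
        return {}
    return {word: len(terms) / total for word, terms in terms_by_word.items()}
```

Words that appear alongside many different known terms receive weights close to 1, which marks them as strong indicators.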

  15. Context Processing
     • Use those “indicator” words to identify new phrases
     • Score new phrases as a function of the strength of their association with the “indicator” words
     Example text: Less extensive findings included interlobular septal thickening. Interlobular septal thickening was seen in 32 patients (89%). A luminal mass was considered to be present if there was a soft-tissue mass in the lumen that arose from the bowel wall.
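
Scoring a candidate phrase against indicator words can be sketched as follows. The exact scoring function is not given in the slides; this sketch assumes the score is the sum of indicator weights over every sentence in which the phrase occurs, in the spirit of the NC-value context factor [2]. The weights passed in and all function names are hypothetical.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def contains(tokens, sub):
    """True if the token list `sub` occurs contiguously inside `tokens`."""
    n = len(sub)
    return n > 0 and any(tokens[i:i + n] == sub
                         for i in range(len(tokens) - n + 1))


def context_score(phrase, sentences, weights):
    """Sum indicator-word weights over sentences that contain the phrase.

    weights maps an indicator word to its weight; unknown words count as 0.
    """
    phrase_tokens = tokenize(phrase)
    score = 0.0
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not contains(tokens, phrase_tokens):
            continue
        for tok in tokens:
            if tok not in phrase_tokens:
                score += weights.get(tok, 0.0)
    return score
```

A phrase that repeatedly appears next to strong indicators such as "seen" or "included" accumulates a higher score than one that appears in neutral contexts.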

  16. Phrase Ranking
     • Calculate a termhood value for each phrase
     • Termhood is based on a combination of:
       • Nesting
       • Context Scores
       • Length
       • Orthography
       • Stop List
     Note: “termhood” refers to the likelihood that a candidate is a real term [2].
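
The slides do not state how the factors are combined. As an illustration of the nesting and length components only, the sketch below computes a simplified C-value from reference [2]: log2 of the phrase length times its frequency, reduced by the average frequency of the longer candidates that contain it. Context, orthography, and stop-list factors are not modeled, and the function names are hypothetical.

```python
import math


def contains(tokens, sub):
    """True if the token list `sub` occurs contiguously inside `tokens`."""
    n = len(sub)
    return n > 0 and any(tokens[i:i + n] == sub
                         for i in range(len(tokens) - n + 1))


def c_values(freq):
    """Simplified C-value for each candidate phrase.

    freq maps a phrase to its corpus frequency. Single-word phrases score 0
    because log2(1) == 0; C-value targets multi-word terms.
    """
    tokenized = {phrase: phrase.split() for phrase in freq}
    scores = {}
    for a, a_tokens in tokenized.items():
        # Longer candidates in which `a` is nested.
        longer = [b for b, b_tokens in tokenized.items()
                  if len(b_tokens) > len(a_tokens) and contains(b_tokens, a_tokens)]
        adjusted = freq[a]
        if longer:
            adjusted -= sum(freq[b] for b in longer) / len(longer)
        scores[a] = math.log2(len(a_tokens)) * adjusted
    return scores
```

With toy counts, a phrase that only ever occurs inside a longer candidate scores 0, while one that also occurs on its own keeps a positive score.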

  17. Term Splitting
     • Phrases typically consist of an observation accompanied by one or more characteristics of that observation
     • Term splitting splits phrases into component characteristics and observations
     • Based on frequency ratios
     • Makes two new ranked lists
     • Candidate Term: “mediastinal soft tissue infiltration”
       • mediastinal
       • soft tissue
       • infiltration
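
The slides say only that splitting is "based on frequency ratios", so the following is a guess at one way such a ratio could be used, not the authors' method. It keeps two adjacent words together when the bigram frequency divided by the first word's frequency reaches a threshold, and splits otherwise. The threshold, the counts in the usage example, and the function name are all invented for illustration.

```python
def split_phrase(phrase, unigram_freq, bigram_freq, threshold=0.5):
    """Split a phrase at word boundaries with a low bigram/unigram ratio.

    unigram_freq maps word -> count; bigram_freq maps (word, next_word) -> count.
    """
    words = phrase.split()
    if not words:
        return []
    parts = [[words[0]]]
    for prev, word in zip(words, words[1:]):
        pair_count = bigram_freq.get((prev, word), 0)
        ratio = pair_count / max(unigram_freq.get(prev, 0), 1)
        if ratio >= threshold:
            parts[-1].append(word)   # cohesive pair: keep together
        else:
            parts.append([word])     # weak pair: start a new component
    return [" ".join(part) for part in parts]


# Toy counts, invented for this example only.
unigrams = {"mediastinal": 20, "soft": 50, "tissue": 60}
bigrams = {("mediastinal", "soft"): 2, ("soft", "tissue"): 45,
           ("tissue", "infiltration"): 3}
components = split_phrase("mediastinal soft tissue infiltration", unigrams, bigrams)
```

Deciding which components are observations and which are characteristics would need a further step that this sketch leaves out.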

  18. Results
     • # imaging observations
     • # imaging observation characteristics
     • % precision
     • Precision is defined as ….

  19. Conclusions
     • LexEVS is a powerful tool for exploiting a variety of controlled vocabularies
     • Automatic term extraction can identify new imaging observations and observation characteristics
     • Adjusting context and processing can lead to other kinds of terms
     • Broader searches for articles will lead to larger collections of terms

  20. Future Work
     • Use syntactic structure to improve extraction
     • Automatic identification of relationships
     • Infrastructure for distributed editing
     • Semantic Wiki

  21. Selected References
     1. Langlotz CP. RadLex: a new method for indexing online educational materials. Radiographics. 2006 Nov-Dec;26(6):1595-7.
     2. Frantzi K, Ananiadou S, Mima H. Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries. 2000;3(2):115-130.
     3. Baneyx A, Charlet J, Jaulent M. Building an ontology of pulmonary diseases with natural language processing tools using textual corpora. International Journal of Medical Informatics. 2007;76(2-3):208-215.
     4. Zhou L, Tao Y, Cimino J, Chen E, Liu H, Lussier Y, Hripcsak G, Friedman C. Terminology model discovery using natural language processing and visualization techniques. Journal of Biomedical Informatics. 2006;39(6):626-636.
     5. Church K, Hanks P. Word association norms, mutual information, and lexicography. Computational Linguistics. 1990;16(1):22-29.
     6. Snow R, Jurafsky D, Ng A. Learning syntactic patterns for automatic hypernym discovery. Advances in Neural Information Processing Systems. 2005;17:1297-1304.
