Introducing ODIE

NCBO Seminar Series, February 18, 2009.

Presentation Transcript


  1. Introducing ODIE NCBO Seminar Series February 18, 2009

  2. Example

  3. IE using ontologies

  4. OE using documents: punch biopsy, junctional component, pagetoid spread, dermal melanocytes, Breslow depth, lymphocytic infiltrates, regression, microscopic satellites, vascular invasion, tumor infiltrating lymphocytes, Spitz nevus, epithelioid nevus

  5. Two Tasks ~ One problem
     • Information Extraction: Uses concepts as source of concepts and relationships to enrich and validate ontology
     • Specific Aims 2,3,4
     • Ontology, Text
     • Ontology Enrichment: Uses concepts as source of concepts and relationships to enrich and validate ontology
     • Specific Aims 1,3,5

  6. Specific Aims
     • Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:
       - Named Entity Recognition (NER)
       - Co-reference Resolution (CR)
       - Discourse Reasoning (DR)
       - Attribute Value Extraction (AVE)
     • Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:
       - Concept Discovery (CD)
       - Concept Clustering (CC)
       - Taxonomic Positioning (TP)
     • Specific Aim 3: Develop reusable software for performing information extraction and ontology development leveraging existing NCBO tools and compatible with NCBO architecture.
     • Specific Aim 4: Enhance National Cancer Institute Thesaurus Ontology using the ODIE toolkit.
     • Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.

  7. Ontology Enrichment
     • Machine assisted
       - Extraction
       - Filtering and Organization
       - Visualization
       - Suggestions
     • Human decision-maker (developer, curator)
     • Feedback and improvement of OE

  8. Project Organization
     • ODIE 0.5 (Rebecca Crowley, Kevin Mitchell, Girish Chavan, Eugene Tseytlin): Develop and implement architecture and UI; create framework for using results of research; implement work of research groups
     • Coreference Resolution (Wendy Chapman, Guergana Savova, Melissa Castine): Develop annotation scheme; create Reference Standard, consider and test existing algorithms; design, implement & test new algorithms
     • Concept Discovery (Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell): Study and compare methods for ontology enrichment; design methods for evaluation

  9. Domain
     • Will attempt to develop general tools whenever possible
     • Priorities for evaluation of components in:
       - Radiology and pathology reports
       - NCIT as well as clinically relevant OBO ontologies (e.g. RadLex, FMA)
       - Cancer domains (including hematologic oncology)

  10. Progress • ODIE 0.5 pre-release on NCBO SourceForge • Annotation software and document sets • Res Proj #1: LSP annotation project • Res Proj #2: Coreference resolution annotation • Starting Res Proj #3: Discourse Reasoning

  11. ODIE Software • Toolkit for developers of NLP applications and ontologies • Pre-released on NCBO SourceForge as ODIE 0.5 • Current release focuses on NER and CD • Support interaction and experimentation • Package systems at the conclusion of working with ODIE • Foster cycle of enrichment and extraction needed to advance development of NLP systems • Ontology enrichment as opposed to de novo development • Human-machine collaboration as opposed to fully automated learning

  12. ODIE Download/Info
      • ODIE Installer: http://caties.cabig.upmc.edu/ODIE/odieinstaller.exe
      • GForge Site: https://bmir-gforge.stanford.edu/gf/project/odie/
      • User Forums: https://bmir-gforge.stanford.edu/gf/project/odie/forum/
      • ODIE on NCBO Tools Page: http://bioontology.org/tools/ODIE.html

  13. Users/Workflow ODIE is intended for: • users who want to use NCBO ontologies to perform various NLP tasks (+/- may need to add concepts locally to achieve sufficient performance) • users who want to enrich ontologies using concepts derived from documents (very early in process of ontology development)

  14. Plans for ODIE 1.0
      • Ability to import additional ontologies from BioPortal or from OWL files
      • Ability to export proposal/enriched ontologies
      • Ability to add and configure new processing resources (UIMA or GATE based)
      • Ability to build processing pipelines using processing resources
      • Will come out of the box with a processing pipeline and processing resources for NER, CD and COREF

  15. Research Project 1: Ontology Enrichment
      • Nearly completed survey of lexical, statistical and hybrid methods for ontology enrichment
      • Methodology to study “utility” of various approaches (Liu, PhD Thesis in progress)
      • First project underway involves the simplest of the methods to be studied – Lexicosyntactic Patterns (LSP) – regular expressions over POS
      • Concept Discovery (Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell): Study and compare methods for ontology enrichment; design methods for evaluation

  16. LSP Patterns
      • The presence of certain “lexico-syntactic patterns” can indicate a particular semantic relationship between two nouns
      • Example: DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL MALIGNANT MELANOMA
      • “such as” indicates a hyponym relationship between two noun phrases

  17. Technique 1 - LSP • PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS) • COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
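
The two examples above can be matched with a very small surface-pattern sketch in Python. This is only an illustration, not ODIE's implementation: the cue list, the `find_lsp` name, and the "synonym" label for "aka" are assumptions (the slides state only that "such as" signals a hyponym relationship). The slides describe regular expressions over POS, whereas this sketch matches raw text and takes everything on either side of the cue instead of a properly chunked noun phrase.

```python
import re

# Assumed cue inventory. Only "such as" -> hyponym comes from the slides;
# the "aka" -> synonym label is an illustrative guess.
CUES = {"such as": "hyponym", "aka": "synonym"}

CUE_RE = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in CUES) + r")\b",
    re.IGNORECASE,
)


def find_lsp(sentence):
    """Return (before, cue, relation, after) for each cue found in the sentence."""
    hits = []
    for match in CUE_RE.finditer(sentence):
        cue = match.group(1).lower()
        # Trim the punctuation that typically surrounds the cue in report text.
        before = sentence[:match.start()].strip(" ,(")
        after = sentence[match.end():].strip(" ,().")
        hits.append((before, cue, CUES[cue], after))
    return hits


if __name__ == "__main__":
    for text in (
        "PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
        "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA",
    ):
        print(find_lsp(text))
```

The text on each side of the cue is a candidate span only; in the workflow described in the slides, a domain expert still marks which phrases are meaningful.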

  18. LSP distribution result: number of sentences containing lexico-syntactic patterns

  19. Step 1 - Domain Expert annotation
      • Annotation tasks: Meaningful medical phrases (MMP) that can stand alone before LSP and after LSP. The phrases before and after LSP have to be related
      • Annotation labels: LSP, Before LSP, After LSP
      • Examples:
        - PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
        - COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
      • Calculate: total # of MMP, # of MMP per LSP
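
The two counts named in this step can be sketched as follows. The record layout, the `mmp_counts` name, and the spans in `annotations` are hypothetical, chosen only to make the arithmetic concrete. The slide does not say whether "# of MMP per LSP" means per pattern type or per occurrence, so the sketch reports both.

```python
from collections import Counter

# Hypothetical annotation records: one per LSP occurrence, holding the MMPs
# an expert marked before and after the pattern. The spans are illustrative.
annotations = [
    {"lsp": "aka",
     "before": ["PRURIGO NODULE"],
     "after": ["LICHEN SIMPLEX CHRONICUS"]},
    {"lsp": "such as",
     "before": ["BENIGN ECCRINE NEOPLASIA"],
     "after": ["NODULAR HIDROADENOMA"]},
]


def mmp_counts(records):
    """Return (total MMP, MMP per LSP type, mean MMP per LSP occurrence)."""
    per_lsp = Counter()
    for rec in records:
        per_lsp[rec["lsp"]] += len(rec["before"]) + len(rec["after"])
    total = sum(per_lsp.values())
    mean_per_occurrence = total / len(records) if records else 0.0
    return total, dict(per_lsp), mean_per_occurrence


total, per_lsp, mean_per_occurrence = mmp_counts(annotations)
print(total, per_lsp, mean_per_occurrence)
```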

  20. Step 2 - Curator Judgment
      • For each term:
        - Is the concept in the ontology?
        - If not, should it be added into the ontology?
        - If not, what is the reason?
      • For each pair of terms:
        - What is the relationship between them?
        - Does this relationship exist in the ontology?
        - If not, should it be added into the ontology?
        - If not, what is the reason?
      • New Concept and Relationship Suggestion Rates
      • New Concept and Relationship Acceptance Rates
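
The slides name suggestion and acceptance rates without defining them. One plausible reading, sketched below purely as an assumption, is: suggestion rate = extracted items not already in the ontology divided by all extracted items, and acceptance rate = suggested items the curator would add divided by all suggested items. The field names, the `enrichment_rates` function, and the sample judgments are hypothetical.

```python
def enrichment_rates(judgments):
    """Compute (suggestion_rate, acceptance_rate) from curator judgments.

    Each judgment is a dict with two booleans:
      in_ontology: the concept or relationship already exists in the ontology
      should_add:  the curator would add it (only meaningful if not in_ontology)
    These definitions are an assumed reading of the slide, not the study's own.
    """
    extracted = len(judgments)
    suggested = [j for j in judgments if not j["in_ontology"]]
    accepted = [j for j in suggested if j["should_add"]]
    suggestion_rate = len(suggested) / extracted if extracted else 0.0
    acceptance_rate = len(accepted) / len(suggested) if suggested else 0.0
    return suggestion_rate, acceptance_rate


# Placeholder judgments for four extracted items (not real curator data).
sample = [
    {"item": "term_a", "in_ontology": True, "should_add": False},
    {"item": "term_b", "in_ontology": False, "should_add": True},
    {"item": "term_c", "in_ontology": False, "should_add": True},
    {"item": "term_d", "in_ontology": False, "should_add": False},
]

suggestion_rate, acceptance_rate = enrichment_rates(sample)
print(suggestion_rate, acceptance_rate)
```

The same function can be applied separately to concept judgments and to relationship judgments to obtain the two pairs of rates.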

  21. First experiment result – concept enrichment

  22. First experiment result – concept enrichment (NCIT)

  23. First experiment – extracted relationships

  24. First experiment – extracted relationships

  25. First experiment – Concept Enrichment for RadLex

  26. Research Project 2: Coreference Resolution
      • Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)
      • Examples of types of anaphoric relations:
        - Identity (or coreference)
        - Set/subset
        - Part/whole
      • Anaphora resolution is a computational technique for the discovery of anaphoric relations
      • Coreference Resolution (Wendy Chapman, Guergana Savova, Melissa Castine): Develop annotation scheme; create Reference Standard, consider and test existing algorithms; design, implement & test new algorithms

  27. Definitions
      • Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)
      • Types of anaphoric relations:
        - Identity (or coreference)
        - Set/subset
        - Part/whole
        - Other
      • Anaphora resolution is a computational technique for the discovery of anaphoric relations

  28. Progress
      • Completed and Ongoing:
        - Annotation schema Development Guidelines
        - Training of annotators: 4 training sessions
        - IAA: after session 1 – in the 40’s
        - IAA: after session 3 – in the 60’s
      • Planned:
        - Complete Reference Standard (RS)
        - Algorithm testing and further development
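
The slides do not say which inter-annotator agreement (IAA) measure produced the figures above. As one common option for span or link annotation, and purely as an assumption here, agreement between two annotators can be scored as a pairwise F-measure over the sets of items each marked. The `pairwise_f1` name and the mention identifiers below are hypothetical.

```python
def pairwise_f1(annotator_a, annotator_b):
    """F-measure agreement between two annotators' sets of annotations.

    Each argument is an iterable of hashable annotations, for example
    (anaphor_id, antecedent_id) pairs. Treating one annotator as the
    reference and the other as the response gives precision and recall;
    their harmonic mean simplifies to 2 * |A & B| / (|A| + |B|).
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Made-up coreference links, for illustration only.
links_a = {("m1", "m2"), ("m3", "m4"), ("m5", "m6")}
links_b = {("m1", "m2"), ("m3", "m4"), ("m7", "m8"), ("m9", "m10")}

agreement = pairwise_f1(links_a, links_b)
print(round(agreement * 100, 1))
```

F-measure is often preferred over chance-corrected statistics for this kind of task because the set of possible "negative" annotations is not well defined.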

  29. Data Sets for RS
      • 50 clinical notes (named entities annotated)
      • 50 Pathology (disorders, tumors)
      • 20 Pathology (conditions)
      • 20 Radiology (conditions)
      • 20 Discharge summaries (conditions)
      • 20 ED (conditions)
      • 20 ED (respiratory conditions)
      • Sources: Mayo, Pitt

  30. QUESTIONS ?

  31. Visualization of document set

  32. NER – viewing concepts

  33. Multiple Ontologies

  34. OE – Concept Suggestion

  35. Ranked Suggestions

  36. Adding Proposals
