
Formalizing heritage?

László van den Hoek, Barend Mons, Erik van Mulligen (Erasmus MC Rotterdam); Elena Beißwanger, Stefan Schulz, Holger Stenzhorn (Freiburg/Jena/IFOMIS). Formalizing heritage? Bridging UMLS and BioTop for text mining.


Presentation Transcript


  1. Formalizing heritage? Bridging UMLS and BioTop for text mining • László van den Hoek, Barend Mons, Erik van Mulligen (Erasmus MC Rotterdam) • Elena Beißwanger, Stefan Schulz, Holger Stenzhorn (Freiburg/Jena/IFOMIS)

  2. Example UMLS SN BioTop

  3. Semantic tagging

  4. Semantic tagging

  5. Improving the state of the art • State-of-the-art NLP offers 85% precision/recall • In case of multiple possibilities, help eliminate options • Improvement over the state of the art through incorporating additional information sources • Factual links from GO • Factual / “Factual” links from SwissProt • Our goal: see if UMLS SN can help in a similar manner

  6. WikiProteins • Can accommodate multiple datasets • Stores concepts, and expressions describing them

  7. External information Allows concepts to be annotated

  8. Metadata

  9. Ferns (Pteridophyta)

  10. Wiki Filtering • OK: “Fern is in kingdom plant”

  11. Wiki Filtering • Error: “Fern is in kingdom bicycle”

  12. Wiki Filtering • “Fern is in kingdom plant” • Did you mean… (manufacturing plant)
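
The three filtering outcomes on slides 10 to 12 can be pictured with a small type check. This is a minimal sketch under stated assumptions, not the Wikifier implementation: the sense inventory, the semantic types attached to each sense, and the rule that the object of "is in kingdom" must be typed as Plant are all hypothetical illustration data.

```python
# Hypothetical toy data: these senses and semantic types are illustrative,
# not taken from UMLS or WikiProteins.
SENSES = {
    "plant": [("plant (organism)", "Plant"),
              ("manufacturing plant", "Manufactured Object")],
    "bicycle": [("bicycle", "Manufactured Object")],
}

# Assumed constraint: the object of "is in kingdom" must be typed as Plant.
ALLOWED_OBJECT_TYPES = {"is in kingdom": {"Plant"}}


def filter_statement(subject, relation, obj):
    """Check the object of a statement against the types the relation allows."""
    allowed = ALLOWED_OBJECT_TYPES[relation]
    senses = SENSES.get(obj, [])
    accepted = [name for name, st in senses if st in allowed]
    ruled_out = [name for name, st in senses if st not in allowed]
    return {
        "statement": f"{subject} {relation} {obj}",
        "status": "OK" if accepted else "Error",
        "accepted": accepted,
        # Senses that the type constraint eliminates (the "did you mean" case).
        "ruled_out": ruled_out if accepted else [],
    }


print(filter_statement("Fern", "is in kingdom", "plant"))
print(filter_statement("Fern", "is in kingdom", "bicycle"))
```

In this toy setup the ambiguous word "plant" passes because one of its senses has an allowed type, and the manufacturing sense is reported as ruled out.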

  13. Background • UMLS Semantic Network • BioTop/OWL-DL

  14. Background: UMLS Semantic Network (SN) • Classification scheme • Interface terminology for UMLS • Based on frame-based logic • Classes: Semantic Types (ST) • Relations are defined between ST’s

  15. Portion of the UMLS Semantic Network

  16. Issues with the UMLS ST’s • Some ST’s are defined ambiguously or vaguely • Some arbitrary divisions are present • Categories have relatively low granularity (as intended, to maintain usability)

  17. Why UMLS? • Often used for named entity recognition (Peregrine, MetaMap, etc.) • Widely used in practice • 1.2M UMLS concepts are tagged with one or more of the 135 ST’s • Coding system, e.g. for electronic health records • Adding ontological rigor may extend its applicability

  18. Background: BioTop • Mid-to-upper-level domain ontology • Rooted at BFO, expands into biomedicine • Intended umbrella for OBO Foundry • Written in OWL-DL • Formal rigor from BFO • Unambiguous • Allows reasoning/consistency checking • Still undergoing development

  19. Top: Basic Formal Ontology • Middle downwards: BioTop (or DOLCE)

  20. Prototype procedure • Map UMLS ST’s onto BioTop • Translate relationships to properties • Answer questions with reasoner • What ST’s are related to “plant” (fern)? • (How) are “plant” and “manufactured object” (bicycle) related? • Evaluate disambiguation using ontology
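
The question "(how) are plant and manufactured object related?" can be illustrated with a plain walk up a class hierarchy. This is only a sketch: the prototype uses an OWL reasoner over the mapped ontology, whereas the code below uses a small hand-written fragment of an is-a tree (the `PARENT` table is an illustrative assumption) and finds the closest shared ancestor.

```python
# Illustrative is-a fragment, child -> parent. Assumed for this sketch only.
PARENT = {
    "Plant": "Organism",
    "Organism": "Physical Object",
    "Manufactured Object": "Physical Object",
    "Physical Object": "Entity",
}


def ancestors(st):
    """Return the type itself followed by its ancestors, nearest first."""
    chain = [st]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def closest_common_ancestor(a, b):
    """Return the nearest type that subsumes both a and b, or None."""
    others = set(ancestors(b))
    for node in ancestors(a):
        if node in others:
            return node
    return None


print(ancestors("Plant"))
print(closest_common_ancestor("Plant", "Manufactured Object"))
```

If the only thing two candidate types share is a very general ancestor, that is weak evidence that a statement linking them is a tagging error.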

  21. Mapping • Mapping file contains owl:imports for both UMLS ST tree and BioTop [Diagram: Mapping imports BioTop and UMLS ST]

  22. Mapping • Within the mapping file, equivalent classes are defined by owl:equivalentClass • umls:plant ≡ biotop:plant, biotop:plant ≡ umls:plant

  23. Mapping • If no equivalent class can be found, confer with BioTop authors; either… • umls:machine activity ≡ ?

  24. Mapping • …appropriate class is added to BioTop core and equivalence is stated, or… • umls:machine activity ≡ biotop:MachineAction [Diagram: biotop:MachineAction added to BioTop]

  25. Mapping • …helper class is added to mapping itself • umls:Chlamydia or Rickettsia ≡ mapping:ChlamydialCell U mapping:Rickettsialcell • Helper classes are subclasses of BioTop:BacterialCell
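
The three mapping cases on slides 22 to 25 (direct equivalence, a class added to BioTop core, and a union of helper classes) can be held in one small table. The sketch below is an assumption about how such a table might look, not the format of the actual mapping file; it reuses the class names from the slides and prints the axiom in the slides' own notation.

```python
# Each UMLS ST maps to the list of classes whose union it is equivalent to.
# A single-element list is a plain owl:equivalentClass statement.
MAPPING = {
    "umls:plant": ["biotop:plant"],
    "umls:machine activity": ["biotop:MachineAction"],
    "umls:Chlamydia or Rickettsia": ["mapping:ChlamydialCell",
                                     "mapping:Rickettsialcell"],
}

# Helper classes live in the mapping file and hang under a BioTop class.
HELPER_SUPERCLASS = {
    "mapping:ChlamydialCell": "BioTop:BacterialCell",
    "mapping:Rickettsialcell": "BioTop:BacterialCell",
}


def equivalence_axiom(st):
    """Render the equivalence for one ST, or '?' if it is still unmapped."""
    targets = MAPPING.get(st)
    if targets is None:
        return f"{st} ≡ ?"
    return f"{st} ≡ " + " U ".join(targets)


for st in list(MAPPING) + ["umls:Event"]:
    print(equivalence_axiom(st))
```

Keeping unmapped types visible as "≡ ?" makes it easy to list what still has to be discussed with the BioTop authors.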

  26. Considerations • Equal name doesn’t mean equivalence • Different name doesn’t mean difference • Some things can’t be translated into classes • A logically sound ontology may still contain real-world contradictions

  27. Mapping results • Initially: • 10 ST’s match directly with a BioTop class • 14 classes were “close enough” • After iterative revisions of BioTop: • 3 ST’s defined as conjunctions • Many classes added to core BioTop • Some are straight matches, others are not • ~70 ST’s remain unmapped • Mainly “Event” and “Phenomenon or Process” trees

  28. Many things can be defined by their role (*Role): ArtefactRole, DiagnosticRole, FindingRole, FoodRole, OccupationalRole, PoisonRole, ResearchRole, SignallingRole, SignOrSymptomRole, TherapeuticRole, DrugRole, VitaminRole

  29. Mapping SN relationships • Some reinterpretation is necessary due to underspecification • Implicit semantics, domain expert required • What does the presence of a link mean? • Some/some, some/all, all/some, all/only, all/each • Naïve approach: add relationships as properties of classes
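
To see why the quantifier reading matters, the sketch below evaluates one SN-style link ("A affects B") under five readings over a toy set of instances. The precise meaning given to each label (for example, "all/only" as "every A affects nothing outside B") is an assumption made for this illustration, and the instance names are invented.

```python
def readings(A, B, links):
    """Evaluate 'A affects B' under several quantifier readings.

    A and B are sets of instances; links is a set of (x, y) pairs
    meaning 'x affects y'.
    """
    targets = {a: {y for (x, y) in links if x == a} for a in A}
    return {
        "some/some": any(targets[a] & B for a in A),   # some A affects some B
        "some/all": any(B <= targets[a] for a in A),   # some A affects every B
        "all/some": all(targets[a] & B for a in A),    # every A affects some B
        "all/only": all(targets[a] <= B for a in A),   # every A affects only Bs
        "all/each": all(B <= targets[a] for a in A),   # every A affects every B
    }


abnormalities = {"abn1", "abn2"}
organisms = {"org1", "org2"}
links = {("abn1", "org1"), ("abn1", "something_else")}
print(readings(abnormalities, organisms, links))
```

With these toy links only the weakest reading (some/some) holds, which is why stating the link as a class-level axiom needs a domain expert to pick the intended reading.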

  30. Biologic Function | affects | Organism
      Cell Component | affects | Physiologic Function
      … …
      Anatomical Abnormality | affects | Organism
      Anatomical Abnormality | affects | Physiologic Function

      <owl:ObjectProperty rdf:ID="affects_anatomical_abnormality">
        <rdfs:domain rdf:resource="&umls;#Anatomical_Abnormality"/>
        <rdfs:range>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <rdf:Description rdf:about="&umls;#Organism"/>
              <rdf:Description rdf:about="&umls;#Physiologic_Function"/>
            </owl:unionOf>
          </owl:Class>
        </rdfs:range>
        <rdfs:subPropertyOf>
          <owl:ObjectProperty rdf:ID="affects"/>
        </rdfs:subPropertyOf>
      </owl:ObjectProperty>
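
The step from SN triples to one property per domain with a union range can be sketched as a simple grouping. This is an illustration under assumptions, not the authors' conversion script: it uses only the four triples visible on the slide, and the naming rule in `property_id` is inferred from the single example `affects_anatomical_abnormality`.

```python
from collections import defaultdict

# The four SN triples shown on the slide (domain, relation, range).
SN_TRIPLES = [
    ("Biologic Function", "affects", "Organism"),
    ("Cell Component", "affects", "Physiologic Function"),
    ("Anatomical Abnormality", "affects", "Organism"),
    ("Anatomical Abnormality", "affects", "Physiologic Function"),
]


def property_id(relation, domain):
    """Build an ID such as 'affects_anatomical_abnormality' (assumed rule)."""
    return f"{relation}_{domain.lower().replace(' ', '_')}"


def group_by_domain(triples):
    """Collect, per (relation, domain) property, the classes in its range."""
    grouped = defaultdict(list)
    for domain, relation, rng in triples:
        grouped[property_id(relation, domain)].append(rng)
    return dict(grouped)


for prop, ranges in group_by_domain(SN_TRIPLES).items():
    print(prop, "-> range: unionOf(" + ", ".join(ranges) + ")")
```

A property whose list has more than one entry is the case that needs `owl:unionOf` in its range.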

  31. SPARQL woes

      <owl:ObjectProperty rdf:ID="affects_anatomical_abnormality">
        <rdfs:domain rdf:resource="&umls;#Anatomical_Abnormality"/>
        <rdfs:range>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <rdf:Description rdf:about="&umls;#Organism"/>
              <rdf:Description rdf:about="&umls;#Physiologic_Function"/>
            </owl:unionOf>
          </owl:Class>
        </rdfs:range>
        <rdfs:subPropertyOf>
          <owl:ObjectProperty rdf:ID="affects"/>
        </rdfs:subPropertyOf>
      </owl:ObjectProperty>

  32. SPARQL query

      SELECT DISTINCT ?d ?p ?r
      WHERE {
        ?p rdfs:domain ?d .
        ?rs rdfs:subClassOf ?d .
        ?p rdfs:range ?r
        FILTER ( ?rs = umls:Organism )
      }
      ORDER BY ?d

      Or: what classes are in the domain of a property that also has Organism in its range?

  33. Current status • Can’t follow the owl:unionOf approach because of current reasoner limitations • Stop-gap solution: make one property for each SN relationship • Avoids owl:unionOf • Allows us to get actual results • Not in the spirit of OWL • Makes maintenance more difficult
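
The stop-gap can be pictured as follows. This sketch assumes that "one property for each SN relationship" means one property per (domain, relation, range) triple, so every range is a single named class; the property naming scheme is hypothetical and the triples are the four shown on slide 30.

```python
# The four SN triples shown on slide 30 (domain, relation, range).
SN_TRIPLES = [
    ("Biologic Function", "affects", "Organism"),
    ("Cell Component", "affects", "Physiologic Function"),
    ("Anatomical Abnormality", "affects", "Organism"),
    ("Anatomical Abnormality", "affects", "Physiologic Function"),
]


def per_triple_properties(triples):
    """Create one property record per triple, each with a single-class range."""
    props = []
    for domain, relation, rng in triples:
        # Hypothetical naming scheme: relation_Domain_Range.
        name = "_".join([relation, domain.replace(" ", "_"), rng.replace(" ", "_")])
        props.append({"id": name, "subPropertyOf": relation,
                      "domain": domain, "range": rng})
    return props


def domains_with_range(props, rng):
    """Which classes are in the domain of a property whose range is rng?"""
    return sorted({p["domain"] for p in props if p["range"] == rng})


props = per_triple_properties(SN_TRIPLES)
print(len(props), "properties")
print(domains_with_range(props, "Organism"))
```

The lookup becomes a flat match on named classes, at the cost of one property per triple, which is the maintenance burden the slide mentions.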

  34. Issues • “Untranslatable” classes • Hidden semantics • Interpretation • Relations: properties or classes? • Lack of a proper query language

  35. Discussion • BioTop used as a framework for formalizing UMLS Semantic Network • Tap into existing resources • Mapping has not yet been put to the test

  36. Future directions • Evaluate effect of ontology on precision/recall of tagging • User interface aid • Extend ontology as needed • Feed back findings to improve UMLS SN
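
For the planned evaluation, precision and recall of semantic tagging could be computed as below. This is a generic sketch of the standard definitions, not an evaluation the authors report; the tag tuples are invented examples.

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for two sets of (mention, tag) pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example annotations: (mention, semantic type).
gold = {("fern", "Plant"), ("bicycle", "Manufactured Object"),
        ("plant", "Plant"), ("cell", "Cell")}
predicted = {("fern", "Plant"), ("plant", "Manufactured Object"),
             ("cell", "Cell")}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same comparison with and without the ontology-based filter would show whether it moves either figure.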

  37. Special thanks • Ronald Cornet (AMC) • Olivier Bodenreider (NIH) • Jeen Broekstra (WUR)

  38. URLs • Mapping: http://purl.org/biotop/umls-mapping • BioTop: http://purl.org/biotop/ • Wikifier: http://wikifier.wikiprofessional.org/ • WikiProteins: http://proteins.wikiprofessional.org/ • Biosemantics group: http://www.biosemantics.org/
