Building Ontologies from the Ground Up: When users set out to model their professional activity

Mark A. Musen, Professor of Medicine and Computer Science, Stanford University. v 1.00.


Presentation Transcript


  1. Building Ontologies from the Ground Up: When users set out to model their professional activity • Mark A. Musen, Professor of Medicine and Computer Science, Stanford University • v 1.00

  2. “An ontology is a specification of a conceptualization” (T. Gruber) • A conceptualization is the way we think about a domain • A specification provides a formal way of writing it down

  3. Porphyry’s depiction of Aristotle’s Categories • Supreme genus: SUBSTANCE • Differentiae: material, immaterial • Subordinate genera: BODY, SPIRIT • Differentiae: animate, inanimate • Subordinate genera: LIVING, MINERAL • Differentiae: sensitive, insensitive • Proximate genera: ANIMAL, PLANT • Differentiae: rational, irrational • Species: HUMAN, BEAST • Individuals: Socrates, Plato, Aristotle, …
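As a rough illustration of what "writing down" such a hierarchy in machine-processable form can look like, the tree on this slide can be sketched as a small lookup structure. The dictionary layout and the `classify` helper are assumptions made for this sketch, not part of the original talk, and the pairing of each differentia with a genus follows the order shown on the slide.

```python
# Illustrative encoding of the Porphyrian tree from the slide.
# Each genus maps its two differentiae to the subordinate genus they yield.
# The data layout and function name are assumptions made for this sketch.
PORPHYRY_TREE = {
    "SUBSTANCE": {"material": "BODY", "immaterial": "SPIRIT"},
    "BODY": {"animate": "LIVING", "inanimate": "MINERAL"},
    "LIVING": {"sensitive": "ANIMAL", "insensitive": "PLANT"},
    "ANIMAL": {"rational": "HUMAN", "irrational": "BEAST"},
}


def classify(differentiae, root="SUBSTANCE"):
    """Walk down from the supreme genus, applying one differentia per level.

    Returns the list of genera visited, ending at the most specific one.
    """
    node = root
    path = [node]
    for differentia in differentiae:
        children = PORPHYRY_TREE.get(node)
        if children is None or differentia not in children:
            raise ValueError(f"{differentia!r} does not divide {node!r}")
        node = children[differentia]
        path.append(node)
    return path


# An individual such as Socrates is placed by choosing one differentia per level.
socrates = classify(["material", "animate", "sensitive", "rational"])
print(" > ".join(socrates))
```

Even this toy version shows the point of a formal specification: a program can reject a classification that the hierarchy does not allow, for example applying "rational" directly to SUBSTANCE.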

  4. Creating Ontologies in Machine-Processable Form • Provides a mechanism for developers to codify salient distinctions about the world or some application area • Provides a structure for knowledge bases that can enable • Information retrieval • Information integration • Automated translation • Decision support

  5. The New Philosophers • Categorizing “what exists” in machine-understandable form • Providing a structure that enables • Developers to locate and update relevant descriptions • Computers to infer relationships and properties • Creating new abstractions to facilitate the creation of this structure

  6. Part of the CYC Upper Ontology

  7. There is a misconception … • That people building ontologies are all well versed in metaphysics, computer science, knowledge representation, and the content domain • That ontologies in the real world are as “clean” as SUMO, DOLCE, and other upper-level ontologies • That most people who are creating ontologies understand all the ramifications of what they are doing!

  8. Lots of ontology builders are not very good philosophers • Nearly always, ontologies are created to address pressing professional needs • The people who have the most insight into professional knowledge may have little appreciation for metaphysics, principles of knowledge representation, or computational logic • There simply aren’t enough good philosophers to go around

  9. Practical Problems BioInformatics

  10. The pressing need to standardize the names of human genes

  11. But the human genome is only part of the problem … • Scientists maintain huge databases of gene sequences and gene expression for a wide range of “model organisms” (e.g., mouse, rat, yeast, fruit fly, round worm, slime mold) • Database entries are annotated with information such as the name of a gene, the function of the gene, and so on • How do you ensure uniformity in the nature of these annotations?

  12. Gene Ontology Consortium • Founded in 1998 as a collaboration among scientists responsible for developing different databases of genomic data for model organisms (fruit fly, yeast, mouse) • Now, essentially all developers of all model-organism databases participate • Goal: To produce a dynamic, controlled vocabulary that can be applied to all organism databases even as knowledge of gene and protein roles in cells is accumulating and changing

  13. Gene Ontology (GO) • Comprises three independent “ontologies” • molecular function of gene products • cellular component of gene products • biological process representing the gene product’s higher order role • Uses these terms as attributes of gene products in the collaborating databases (gene product associations) • Allows queries across databases using GO terms, providing linkage of biological information across species

  14. GO = Three Ontologies • Molecular Function • elemental activity or task • example: DNA binding • Cellular Component • location or complex • example: cell nucleus • Biological Process • goal or objective within cell • example: secretion
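The idea of using shared terms as attributes of gene products, and then querying across databases by term, can be sketched in a few lines. Everything below is hypothetical: the database names, the gene product names, and the record layout are invented for illustration, and only the three branch names and the example terms come from the slides.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, gene product, GO branch, GO term).
# Database and gene product names are made up for this sketch.
ANNOTATIONS = [
    ("fly_db", "gene_a", "molecular_function", "DNA binding"),
    ("yeast_db", "gene_b", "molecular_function", "DNA binding"),
    ("mouse_db", "gene_c", "cellular_component", "cell nucleus"),
    ("yeast_db", "gene_b", "biological_process", "secretion"),
]


def index_by_term(annotations):
    """Group gene products from all databases under the shared term they carry."""
    index = defaultdict(list)
    for database, product, branch, term in annotations:
        index[(branch, term)].append((database, product))
    return index


index = index_by_term(ANNOTATIONS)

# One query on a shared term returns gene products from several organism databases.
hits = index[("molecular_function", "DNA binding")]
for database, product in hits:
    print(database, product)
```

The cross-species linkage works only because every database uses the same controlled vocabulary; if one database wrote "binds DNA" instead, its gene products would be missed by this query.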

  15. GO has been wildly successful!! • Dozens of biologists around the world contribute to GO on a regular basis • The ontology is updated every 30 minutes! • It’s now impossible to work in most areas of computational biology without making use of GO terms

  16. But GO has real problems … • Ontologies are represented in an idiosyncratic format that is not compatible with standard knowledge-representation systems • The format is based on directed acyclic graphs of concepts, without the general ability to specify machine-interpretable properties of concepts or definitions of concepts • Because of the informal knowledge-representation system, lots of errors have crept into GO • Terms that are duplicated in different places • Terms with no superclasses • Uncertain relationships between terms
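The kinds of errors listed on this slide are mechanical enough that even a simple script can flag some of them in a graph of terms. The following is a minimal sketch over made-up data, not a description of any actual GO tooling: the term identifiers, the record layout, and the `lint` function are assumptions, and "dangling parent" is used here as one simple stand-in for an uncertain relationship.

```python
from collections import Counter

# Hypothetical term records: id -> (name, list of parent ids).
TERMS = {
    "T1": ("process", []),         # the declared root
    "T2": ("transport", ["T1"]),
    "T3": ("secretion", ["T2"]),
    "T4": ("secretion", ["T1"]),   # same name as T3, placed elsewhere
    "T5": ("binding", []),         # no superclass and not a declared root
    "T6": ("export", ["T9"]),      # parent id that is not defined anywhere
}
ROOTS = {"T1"}


def lint(terms, roots):
    """Report duplicated names, non-root terms without parents, and undefined parents."""
    name_counts = Counter(name for name, _ in terms.values())
    duplicates = sorted(name for name, count in name_counts.items() if count > 1)
    orphans = sorted(
        term_id
        for term_id, (_, parents) in terms.items()
        if not parents and term_id not in roots
    )
    dangling = sorted(
        (term_id, parent)
        for term_id, (_, parents) in terms.items()
        for parent in parents
        if parent not in terms
    )
    return {"duplicates": duplicates, "orphans": orphans, "dangling": dangling}


report = lint(TERMS, ROOTS)
for problem, items in report.items():
    print(problem, items)
```

Checks like these catch only structural slips. Whether two terms with the same name really mean the same thing still requires formal definitions, which is what the slide says the format lacks.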

  17. Tension in the GO Community • Biologists around the world with pressing needs to integrate research databases work together to add terms to GO nearly continuously • Using an impoverished, nonstandard knowledge-representation system • Using no standards to assure uniform modeling conventions from one part of GO to another • Computer scientists bemoan all this ad-hoc-ery and condemn GO as a hack that will become increasingly unusable and unmaintainable

  18. A wonderful keynote talk from the recent meeting on Standards and Ontologies for Functional Genomics • The Capulets and Montagues: A plague on both your houses? • Professor Carole Goble, University of Manchester, UK • Warning: This talk contains sweeping generalisations

  19. Prologue © Carole Goble Two households, both alike in dignity, In fair genomics, where we lay our scene, (One, comforted by its logic’s rigour, Claims ontology for the realm of pure, The other, with blessed scientist’s vigour, Acts hastily on models that endure), From ancient grudge break to new mutiny, When “being” drives a fly-man to blaspheme. From forth the fatal loins of these two foes Researchers to unlock the book of life; Whole misadventured piteous overthrows Can with their work bury their clans’ strife. The fruitful passage of their GO-mark'd love, And the continuance of their studies sage, Which, united, yield ontologies undreamed-of, Is now the hours' traffic of our stage; The which if you with patient ears attend, What here shall miss, our toil shall strive to mend. Based on an idea by Shakespeare

  20. © Carole Goble The Montagues • One, comforted by its logic’s rigour, Claims ontology for the realm of pure • Computer Science, Knowledge engineering, AI • Logic and Languages • Theory • Top down, well-behaved neatness • Generic and lots of toys • Methodologies & patterns • Tools and standards • Technology push • Academic pursuit

  21. © Carole Goble The Capulets • The other, with blessed scientist’s vigour, Acts hastily on models that endure • Life Scientists • Practice • Bottom up, real-world • Specific and many of them • Methodologies, community practice • Tools and standards • Application pull • Practical pursuit – build ‘n’ use it

  22. © Carole Goble The Philosophers • One, comforted by its logic’s rigour, Claims ontology for the realm of pure • Philosophers • Theory • Truth • Generic – the one true ontology? • Methodologies, patterns & foundational ontologies • Not really into tools • No push or pull • Academic pursuit

  23. Endurants, Perdurants, Being, Substance, Event © Carole Goble Philosophers Spiritual guides Aesthetics KR Montagues Life Scientists Capulets Theoreticians Pragmatists The end Mechanism providers A means to an end Content providers

  24. © Carole Goble The Princes of Genomics Rebellious subjects, enemies to peace, Profaners of this neighbour-stained steel,-- Will they not hear? What, ho! you men, you beasts, That quench the fire of your pernicious rage With purple fountains issuing from your veins, On pain of torture, from those bloody hands Throw your mistemper'd weapons to the ground, And hear the sentence of your moved prince. Three civil brawls, bred of an airy word, By thee, old Capulet, and Montague, Have thrice disturb'd the quiet of our streets, And made genomics's ancient citizens Cast by their grave beseeming ornaments, To wield old partisans, in hands as old, Canker'd with peace, to part your canker'd hate:

  25. A tragedy? As in Romeo and Juliet, the threats are political and sociological

  26. Creating ontologies has become a widespread cottage industry • Professional Societies • MGED: Microarray Gene Expression Data Society • HUPO: Human Proteome Organization • Government • NCI Thesaurus • NIST: Process Specification Language • Open Biological Ontologies • GO • Three dozen (and growing) other ontologies • Mostly in DAG-Edit, some in Protégé format

  27. Government Continues to be a Major Driving Force • Highly visible intramural initiatives to create public ontologies at many agencies, including NIST, NIH, VA, CDC • Notable variation in these ontologies’ • Scope • Representational sophistication • “Openness” of content • Opportunities for peer review

  28. NCI Enterprise Vocabulary Services • 1997: R. Klausner, Director of NCI, wanted a “science management system” • Know about everything funded by NCI • Goals and results – “bench to bedside” • Thereby improve and speed translation of research • Approach: • Create integrative terminology • Evolve terminology scope from supporting grants management to supporting science • Build Web-accessible infrastructure – caCORE

  29. More than 37,000 concepts are represented with extremely detailed granularity in many areas

  30. Definitions may include considerable detail with respect to properties that establish relationships with other concepts

  31. NCI Thesaurus is in Active Use • nciterms.nci.nih.gov • ncicb.nci.nih.gov/core/EVS (more info) • Website: 1500-4000 page hits daily, 14K unique visitors (2004) • API: NCICB & external applications • Fulfills NCI and collaborators’ needs for controlled vocabulary • Public domain, open content license

  32. NCI Thesaurus Guidelines • Develop content model (based on Ontylog description logic from Apelon, Inc.) • Leverage existing sources as appropriate • MeSH, VA NDF-RT, MedDRA … • Develop unique content where needed • Cancer genes, gene products, cancer diagnoses, drugs, chemotherapies, molecular abnormalities etc., and relationships among them • Link to other standards using URLs where possible • OMIM, Swissprot, GO

  33. NCI uses an Elaborate Process for Editing and Maintenance

  34. The NCI Thesaurus is not without its problems • Upper level concepts are sometimes used inconsistently or not at all • Textual definitions of concepts may not always reflect the meaning implied by the concepts’ position in the ontology • Reliance on a proprietary knowledge-representation system • Prevents the ability to disseminate the ontology freely • Adds an unfortunate degree of uncertainty to the semantics

  35. Throughout this cottage industry • Lots of ontology development, principally by content experts with little training in conceptual modeling • Use of development tools and ontology-definition languages that may be • Extremely limited in their expressiveness • Useless for detecting potential errors and guiding correction • Nonadherent to recognized standards • Proprietary and expensive

  36. But the world is beginning to change! • The Montagues do want to get the modeling right! • The Capulets do want to see their work used by others! • Useful, open tools and standards are now available that make it hard to justify closed, proprietary approaches

  37. Some signs the world is changing … • Developers of several overlapping and incompatible ontologies of anatomy suddenly are trying to understand why their models do not agree • Philosopher Barry Smith suddenly is camping out at biomedical informatics meetings to get the attention of ontology developers • NCI is piloting the use of OWL and Protégé to encode and manage the NCI Thesaurus • MGED and several other biomedical ontologies are being authored in OWL and Protégé from the beginning • Downloads of the Protégé system continue to escalate

  38. Protégé’s main features • Simplified editing of ontologies and knowledge bases • Open-source distribution to encourage development by a world-wide community of users • A plug-in architecture that enables developers to add new features easily • Support for a wide range of representation formats • CLIPS/COOL • XML Schema • UML • RDF • OWL

  39. Protégé is ecumenical in its support for formal languages • Open Knowledge Base Connectivity Protocol • CLIPS/COOL • UML • XML Schema • RDF and RDFS • Topic Maps • Web Ontology Language (OWL)

  40. Protégé remains successful because of its user community • There are now 89 plug-ins available for use with Protégé • Collaboration with our users enables rapid debugging and code fixes • Some development, such as the creation of extensions to our basic OWL capabilities, has been a major collaborative experience • Annual user group meetings provide great opportunities for developers to share strategies, principles, and war stories • Members of the international Protégé community are a huge support base for new users and for fledgling projects

  41. The NCI Thesaurus

  42. Moving from cottage industry to the industrial age • There must be widely available tools that are open-source, that are easy to use, and that adhere to knowledge-representation standards: Protégé certainly is a candidate • There must be a large user community of developers who use the tools and who can provide feedback to one another and to the core team of tool builders
