The Joy of Ontology Suzanna Lewis SMI Colloquium April 20th, 2006
Sections • Why make an ontology • What is an ontology • How to create an ontology • Logically • Technically • Organizationally • National Center for Biomedical Ontology • Case study: Phenotypes, our current work on OBD
Why make an ontology? What is the motivation?
The Problem(s) with data • Inaccessibility of widely distributed data • Overabundance of information • Speed and performance • Interpreting the data syntactically • Interpreting the data semantically
We started the GO • To develop a shared language adequate for the annotation of molecular characteristics across organisms. • To agree on a mutual understanding of the definition and meaning of any word used, and thus to support cross-database queries. • To provide database access via these common terms to gene product annotations and associated sequences.
Annotation of Yeast Microarray Clusters Using GO Microarray data from Figure 2K of Eisen et al. (1998). Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. 95 (25): 14863-14868.
Motivation: to capture biological reality • Inferences and decisions we make are based upon what we know of the biological reality. • An ontology is a computable representation of this underlying biological reality. • Enables a computer to reason over the data in (some of) the ways that we do.
Ontology (as a branch of philosophy) • The science of what exists in every area of reality • The classification of entities: what kinds of things exist • The relations between these entities • Defines a scientific field's vocabulary and the canonical formulations of its theories. • Seeks to solve problems which arise in these domains.
A biological ontology is: a machine-interpretable representation of some aspect of biological reality • What kinds of things exist? • What are the relationships between these things? (Diagram labels: sense organ, eye, eye disc, ommatidium; relations: is_a, part_of, develops from)
Entity: a definition anything which exists, including things and processes, functions and qualities, beliefs and actions, software and images
Representation: a definition • An image, idea, map, picture, name, description ... which refers to, or is intended to refer to, some entity or entities in reality • this ‘or is intended to refer to’ should always be assumed
Ontologies represent types in reality (diagram labels: ontology, reality)
Two kinds of representational artifact • Databases, inventories, images: represent what is particular in reality = instances • Ontologies, terminologies, catalogs: represent what is general in reality (exists in multiple instances) = types (universals, kinds)
Ontologies are not for representing concepts in people’s heads (diagram labels: ontology, reality)
The researcher has a cognitive representation of what is general, based on his knowledge of the science (diagram labels: cognitive representation, ontology)
Diagram labels: types (substance, organism, animal, mammal, cat, frog, siamese) and instances
Ontology = a representation of types • An ontology is like a scientific text; it is a representation of types in reality (diagram labels: cognitive representation, ontology, reality)
Atomic representational unit: a definition • terms, icons, bar codes, alphanumeric identifiers ... which • refer, or are intended to refer, to entities in reality, and • are not built out of further sub-representations • Representational units are the atoms in the domain of representations
Modular representational unit: a definition A representation which is built out of other representational units, which together form a structure that mirrors a corresponding structure in reality
The Periodic Table
Ontology: a definition • A modular, representational artifact whose representational units are intended to represent • types in reality • the relations between these types which are true universally (i.e. for all instances) • lung is_a anatomical structure • lobe of lung part_of lung
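A minimal sketch of the definition above, assuming a toy in-memory representation: types are plain strings and each relation is a labelled edge between two types. The `Ontology` class and its method names are illustrative only and do not belong to any OBO tool.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: types as strings, relations as labelled edges between types."""

    def __init__(self):
        # relation name -> {subject type -> set of object types}
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, subject, relation, obj):
        """Assert `subject relation obj`, e.g. lung is_a anatomical structure."""
        self.edges[relation][subject].add(obj)

    def ancestors(self, term, relation="is_a"):
        """All types reachable from `term` by following `relation` transitively."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.edges[relation][current]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# The two assertions from the slide.
onto = Ontology()
onto.add("lung", "is_a", "anatomical structure")
onto.add("lobe of lung", "part_of", "lung")
```

Here `onto.ancestors("lung")` follows is_a edges, while `onto.ancestors("lobe of lung", "part_of")` follows part_of edges; keeping the relations separate is what lets each one carry its own meaning.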
How to create an ontology Part 1 The logic and science
In computer science, there is an information handling problem • Different groups of data-gatherers develop their own idiosyncratic terms in which they represent information. • To put this information together, methods must be found to resolve terminological and conceptual incompatibilities. • Again, and again, and again…
The Reality Do not assume that data integration can be brought about by somehow ‘mapping’ incompatible, low quality ontologies built for different purposes
Two flavors of ontology • Application ontology • Reference ontology
Application Ontology An application ontology is comparable to an engineering artifact such as a software tool. It is constructed for specific practical purposes.
Reference Ontology A reference ontology is analogous to a scientific theory; it seeks to optimize representational adequacy to its subject matter
Assumptions • There are best practices in ontology development which should be followed to create stable high-quality ontologies • Shared high quality ontologies foster cross-disciplinary and cross-domain re-use of data, and create larger communities
Why do we need rules/standards for good ontology? • Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking) • Unintuitive rules lead to errors in classification • Simple, intuitive rules facilitate training of curators and annotators • Common rules allow alignment with other ontologies (and thus cross-domain exploitation of data) • Logically coherent rules enhance harvesting of content through automatic reasoning systems
Ontologies built according to common logically coherent rules • Will make entry easier and yield a safer growth path • You can start small, annotating your data with initial fragments of a well-founded ontology, confident that the results will still be usable when the ontology grows larger and richer
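The "start small" point can be sketched as follows, assuming a toy is_a hierarchy stored as a child-to-parent dictionary and made-up sample names. Annotations written against the initial fragment are never edited, yet a query through the hierarchy keeps returning them after the ontology grows.

```python
def descendants(is_a, term):
    """Return `term` plus every type below it in the is_a hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def query(annotations, is_a, term):
    """Items annotated with `term` or with any of its subtypes."""
    wanted = descendants(is_a, term)
    return sorted(item for item, t in annotations.items() if t in wanted)


# Initial fragment: child -> parent (single inheritance keeps the sketch short).
is_a = {"cat": "mammal", "mammal": "animal"}
annotations = {"sample_1": "cat", "sample_2": "mammal"}  # hypothetical data
before = query(annotations, is_a, "animal")

# The ontology grows; the existing annotations are left untouched.
is_a["siamese"] = "cat"
is_a["frog"] = "animal"
annotations["sample_3"] = "siamese"
after = query(annotations, is_a, "animal")
```

Everything in `before` is still present in `after`; the richer ontology only adds results.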
The OBO Foundry A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure • intelligibility to biologist curators, annotators, users • formal robustness • stability • compatibility • interoperability • support for logic-based reasoning
The OBO Foundry • The ontology is open and available to be used by all. • The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. • The ontology is in, or can be instantiated in, a common formal language. • The ontology possesses a unique identifier space within OBO. • The ontology provider has procedures for identifying distinct successive versions.
The OBO Foundry • The ontology has a clearly specified and clearly delineated content. • The ontology includes textual definitions for all terms. • The ontology is well-documented. • The ontology has a plurality of independent users. • The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
Orthogonality • Ontology groups who choose to be part of the OBO Foundry thereby commit themselves to collaborating to resolve disagreements which arise where their respective domains overlap
Agreed-on relations • The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned. • See “Relations in Biomedical Ontologies”, Genome Biology, May 2005, by Barry Smith, Werner Ceusters, Bert Klagges, Jacob Köhler, Anand Kumar, Jane Lomax, Chris Mungall, Fabian Neuhaus, Alan L Rector, and Cornelius Rosse
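One way to pin down a shared meaning is to define a type-level relation through its instances, in the spirit of the earlier "true universally (i.e. for all instances)" wording: `A part_of B` holds only if every instance of A is part of some instance of B. The sketch below is a simplified reading that ignores time; the function name and the instance data are hypothetical.

```python
def holds_universally(type_a, type_b, instance_of, part_of_pairs):
    """True if every instance of type_a is part of some instance of type_b."""
    instances_a = [i for i, t in instance_of.items() if t == type_a]
    instances_b = {i for i, t in instance_of.items() if t == type_b}
    return all(
        any((a, b) in part_of_pairs for b in instances_b)
        for a in instances_a
    )


# Hypothetical instance data: which particular is an instance of which type.
instance_of = {
    "lobe_1": "lobe of lung",
    "lobe_2": "lobe of lung",
    "lung_1": "lung",
}
# Instance-level parthood between particulars.
part_of_pairs = {("lobe_1", "lung_1"), ("lobe_2", "lung_1")}

lobe_part_of_lung = holds_universally("lobe of lung", "lung", instance_of, part_of_pairs)
lung_part_of_lobe = holds_universally("lung", "lobe of lung", instance_of, part_of_pairs)
```

With this data the first check succeeds and the reverse check fails, so the direction of part_of is fixed by the instances and not by convention. Note that the check is vacuously true for a type with no recorded instances.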
Three fundamental dichotomies • continuants vs. occurrents • dependent vs. independent • types vs. instances • ONTOLOGIES ARE REPRESENTATIONS OF TYPES IN REALITY
For example, in the GO • Molecules, cell components, and organisms are independent continuants which have functions • Functions are dependent continuants which become realized through special sorts of processes we call functionings • Processes are occurrents, which include: functionings, side-effects, stochastic processes
Continuants (aka endurants) • have continuous existence in time • preserve their identity through change • exist in toto whenever they exist at all
Occurrents (aka processes) • have temporal parts • unfold themselves in successive phases • exist only in their phases
Continuants vs. Occurrents • Anatomy vs. Physiology • Snapshot vs. Video • Stocks vs. Flows • Commodities vs. Services • Products vs. Processes
Dependent entities • require independent continuants as their bearers • There is no grin without a cat
Dependent vs. independent continuants • Independent continuants (organisms, cells, molecules, environments) • Dependent continuants (qualities, shapes, roles, propensities, functions) • E.g. the acidity of this gut
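The dependence constraint can be sketched in code, assuming a toy model in which a dependent continuant cannot be constructed without an independent continuant as its bearer. The class names mirror the slide vocabulary but are otherwise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndependentContinuant:
    name: str


@dataclass(frozen=True)
class DependentContinuant:
    name: str
    bearer: IndependentContinuant

    def __post_init__(self):
        # A dependent entity cannot exist without a bearer.
        if not isinstance(self.bearer, IndependentContinuant):
            raise TypeError(f"{self.name!r} needs an independent continuant as bearer")


gut = IndependentContinuant("this gut")
acidity = DependentContinuant("acidity", bearer=gut)

try:
    DependentContinuant("grin", bearer=None)  # no grin without a cat
    grin_without_cat = True
except TypeError:
    grin_without_cat = False
```

The acidity is accepted because it is borne by the gut, while the bearerless grin is rejected at construction time.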