
Some thoughts on PATO

This article discusses the ontology PATO and its use in annotation for integration across domains, levels of granularity, and perspectives. It explores the importance of rigorous definitions and proposes a policy for definitions in PATO. It also examines the concept of qualities and their relationship to measurements, providing examples for clarity.


Presentation Transcript


  1. Some thoughts on PATO • Chris Mungall, BBOP • Hinxton, May 2006

  2. Outline • Motivation revisited • The Ontology: PATO • OBD & using PATO for annotation

  3. Who should use PATO? • Originally: • model organism mutant phenotypes • But also: • ontology-based evolutionary systematics • neuroscience; BIRN • clinical uses • OMIM • clinical records • to define terms in other ontologies • e.g. diploid cell; invasive tumor, engineered gene, condensed chromosome

  4. Unifying goal: integration • Integrating data • within and across these domains • across levels of granularity • across different perspectives • Requires • Rigorous formal definitions in both ontologies and annotation schemas

  5. Some thoughts on the ontology itself • Outline • Definitions • how do we define PATO terms? • what exactly is it we’re defining? • is_a hierarchy • what are the top-level distinctions? • what are the finer grained distinctions? • shapes and colors

  6. It’s all about the definitions • Everything is doomed to failure without rigorous definitions • even more so with PATO than other ontologies • OBO Foundry Principle • Definitions should describe things in reality, not how terms are used • def should not use the word ‘describing’ • Should we come up with a policy for definitions in PATO? • currently: 19 defs (2.5 are circular) • proposed breakout session: examine all these

  7. consistency: the property of holding together and retaining shape • amplitude: The size of the maximum displacement from the 'normal' position, when periodic motion is taking place • placement: The spatial property of the way in which something is placed • pointed value: A sharp or tapered end • epinastic value: A downward bending of leaves or other plant parts • oblong value: Having a somewhat elongated form with approximately parallel sides • elliptic value: Elliptic shape • hearted value: Heart shaped • fasciated value: Abnormally flattened or coalesced • opacity: The property of not permitting the passage of electromagnetic radiation • opaque value: Not clear; not transmitting or reflecting light or radiant energy • undulate value: Having a sinuate margin and rippled surface • permeability: The property of something that can be pervaded by a liquid (as by osmosis or diffusion) • porosity: The property of being porous; being able to absorb fluids • porous value: able to absorb fluids • viscosity: a property of fluids describing their internal resistance to flow • viscous value: a relatively high resistance to flow. • latency: The time that elapses between a stimulus and the response to it • power: The rate at which work is done

  8. Proposal: genus-differentia definitions • An S is a G which D • Each def should refine the is_a parent • Single is_a parent • Example: (non-PATO) • binucleate cell def= a cell which has two nuclei • Example (proposed PATO def): • convex shape def= a shape which has no indentations • opacity def= an optical quality which exists by virtue of the bearer’s capacity to block the passage of electromagnetic radiation • very similar to existing def
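
The "An S is a G which D" pattern can be sketched in a few lines of Python. This is only an illustration of the proposal, not part of PATO or any OBO tooling: the `Term` class, its field names and the crude circularity check are all assumptions made for the example, and the "elliptic" term at the end is a hypothetical circular definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical container for one genus-differentia definition."""
    name: str         # S: the term being defined
    genus: str        # G: the single is_a parent
    differentia: str  # D: what sets S apart from other kinds of G

    def definition(self) -> str:
        # Render the definition in the "a G which D" form.
        article = "an" if self.genus[0].lower() in "aeiou" else "a"
        return f"{article} {self.genus} which {self.differentia}"

    def is_circular(self) -> bool:
        # Crude check (an assumption of this sketch): the term's own
        # name reappears inside its differentia.
        return self.name.lower() in self.differentia.lower()


convex = Term("convex shape", "shape", "has no indentations")
opacity = Term(
    "opacity",
    "optical quality",
    "exists by virtue of the bearer's capacity to block "
    "the passage of electromagnetic radiation",
)
# Hypothetical circular definition, for contrast.
elliptic = Term("elliptic", "shape", "is elliptic")

print(convex.definition())
print(opacity.definition())
print(elliptic.is_circular())
```

Because every term names exactly one genus, the definitions and the is_a hierarchy cannot drift apart: the parent is read straight off the definition.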

  9. This policy will reap benefits • Advantages: • Helps avoid circularity • Ensures precision • Consistency in wording is user-friendly • Considerations: • Sometimes leads to awkward phrasing • -ity suffix - “an opacity which…” • Solution: • allow shortened gerund form • having…, being…, … • most of the existing defs conform already • implicit prefix “A G which exists by virtue of the bearer…”

  10. From the top down • First, the fake term ‘pato’ must be removed • How do we define ‘attribute’? • Note: I prefer the term ‘quality’ or ‘property’ • attribute implies attribution • length_in_centimetres is an attribute • we can of course continue to say ‘attribute’ but I use ‘quality’ in these slides • most of the new PATO defs are phrased as ‘a property of…’ which I like, but this is inconsistent with calling the root ‘attribute’ • Well then, what is a quality/property?

  11. What a quality is NOT • Qualities are not measurements • Instances of qualities exist independently of their measurements • Qualities can have zero or more measurements • These are not the names of qualities: • percentage • process • abnormal • high

  12. Some examples of qualities • The particular redness of the left eye of a single individual fly • An instance of a quality type • The color ‘red’ • A quality type • Note: the eye does not instantiate ‘red’ • PATO represents quality types • PATO definitions can be used to classify quality instances by the types they instantiate

  13. (Diagram) • an instance of an eye (in a particular fly) instantiates the type “eye” • the particular case of redness (of a particular fly eye) instantiates the type “red” • the case of redness inheres in (is a quality of, has_bearer) the instance of the eye

  14. Qualities are dependent entities • Qualities require bearers • Bearers can be physical objects or processes • Example: • A shape requires a physical object to bear it • If the physical object ceases to exist (e.g. it decomposes), then the shape ceases to exist • Some qualities are relational • they relate a bearer with other entities • e.g. sensitivity (to) • Compare with: functions
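
The type/instance distinction and the dependence of a quality on its bearer can be sketched as a small data structure. The class and field names below (`Bearer`, `QualityInstance`, `type_name`, `inheres_in`) are invented for illustration and do not come from PATO or OBD.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Bearer:
    """An instance of a physical object or process, e.g. one fly's eye."""
    type_name: str  # the type it instantiates, e.g. "eye"
    qualities: List["QualityInstance"] = field(default_factory=list)


@dataclass(eq=False)
class QualityInstance:
    """A particular quality, e.g. the redness of one particular eye."""
    type_name: str      # the quality type it instantiates, e.g. "red"
    inheres_in: Bearer  # required: a quality cannot exist without a bearer

    def __post_init__(self) -> None:
        # Register the quality on its bearer so that it lives and dies
        # with the bearer.
        self.inheres_in.qualities.append(self)


eye = Bearer("eye")
redness = QualityInstance("red", inheres_in=eye)

# The eye instantiates "eye"; only the redness instantiates "red".
print(eye.type_name, redness.type_name, redness.inheres_in is eye)
```

Making `inheres_in` a required constructor argument is the design point: a quality instance with no bearer cannot be built at all.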

  15. The PATO hierarchy • Proposal for a new top level division • Proposal for granular divisions

  16. Proposal 1: top level division • Spatial quality • Definition: A quality which has a physical object as bearer • Examples: color, shape, temperature, velocity, ploidy, furriness, composition, texture • Spatiotemporal quality • Definition: A quality which has a process as bearer • Examples: rate, periodicity, regularity, duration

  17. Proposal 2: subsequent divisions • Based on granularity (i.e. size scale) • a good account of granularity is vital for inferences from molecular (gene) level to organismal (disease) level • How do we partition the levels? • Some qualities are realised at certain levels of granularity • Others can be realised across levels • shape, porosity • Sum-of-parts vs emergent

  18. Granular hierarchy • quality • spatial quality • spatial physical and physico-chemical quality • mass, concentration • spatial biological quality • spatial molecular quality • spatial cellular quality • spatial organismal quality • spatial quality, multiple scales • morphology/form • optical quality • color, opacity, fluorescence

  19. Advantages of dividing by granularity • Modular • strategic question • should we focus on biological qualities and work with others on morphology, physics-based qualities etc? • Good for annotation • easy to constrain at high level • e.g. organismal qualities cannot be borne by molecules • Mirrors GO and OBO Foundry divisions • Easier to find terms • to be proved, but I believe so
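
The "easy to constrain at high level" point can be illustrated with a lookup from branch of the quality hierarchy to the granular levels allowed to bear qualities in that branch. The branch names follow the proposed hierarchy; the bearer levels ("molecule", "cell", "organism") and the function name `can_bear` are assumptions made for this sketch.

```python
from typing import Dict, Set

# Which granular levels may bear qualities from each branch (assumed mapping).
ALLOWED_BEARER_LEVELS: Dict[str, Set[str]] = {
    "spatial molecular quality": {"molecule"},
    "spatial cellular quality": {"cell"},
    "spatial organismal quality": {"organism"},
    # Cross-granular qualities such as shape or porosity.
    "spatial quality, multiple scales": {"molecule", "cell", "organism"},
}


def can_bear(bearer_level: str, quality_branch: str) -> bool:
    """Return True if a bearer at this level may bear a quality from this branch."""
    return bearer_level in ALLOWED_BEARER_LEVELS[quality_branch]


# Organismal qualities cannot be borne by molecules.
print(can_bear("molecule", "spatial organismal quality"))
# Multi-scale qualities can be borne at any level.
print(can_bear("molecule", "spatial quality, multiple scales"))
```

An annotation tool could run a check of this kind once per branch rather than once per term, which is what makes the granular split convenient.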

  20. Considerations • Possible objection: • The upper level of an ontology is what the user sees first • terms such as “cross-granular quality” may be perceived as undesirable and/or abstruse by some users • Counter-argument • Solvable using ontology views • aka subsets, slims

  21. Relative and absolute • Currently PATO terms often come in 3s: • e.g. mass, relative mass, absolute mass • Why do we need these?

  22. PATO: One or two hierarchies? • Currently two hierarchies • attribute • value • My position: • there should be one hierarchy of qualities • My compromise: • it should be possible to transform PATO automatically into a single hierarchy

  23. Current PATO (diagram) • attribute hierarchy: color; hue, sat., var. • value hierarchy: colorV; hueV, sat.V, var.V; … blackV, blueV, darkV, paleV • link labels: is_a, range

  24. Proposed change (diagram) • both hierarchies are rooted at attribute: color; hue, sat., var.; … black, blue, dark, pale • the ‘V’ suffixes and the ‘range’ links are gone • link label: is_a

  25. Arguments for a single hierarchy • Practical • elimination of redundancy • no clear line for deciding what should be A and what should be V • shape, bumpy vs bumpiness • Ontological • what kind of thing is a ‘value’? Diederich 1997: [quote here]

  26. Arguments against • Two hierarchies reflect cognitive and linguistic structures • e.g. the color of the rose changed from red to brown • 3 cognitive artifacts • we want to present data in a way that is natural to users • …but this can be solved with a single collapsed hierarchy • Two are useful for cross-products • see later - distinguish modifiers from values • EAV is common database pattern • so…?

  27. Compromise: transformations • The Two Hierarchies approach is workable if they can be automatically collapsed • Prerequisite: univocity • Each ‘value’ must be defined to mean exactly one thing only • i.e. Each ‘value’ must be the ‘range’ of a single attribute • Example • having a value ‘fast’ that could be applied to both the spatial quality ‘velocity’ and the process quality ‘duration’ would be forbidden

  28. Collapse on ‘ranges’ (diagram) • attribute hierarchy: color; hue, sat., var. • value hierarchy: colorV; hueV, sat.V, var.V; … blackV, blueV, darkV, paleV • link labels: is_a, range
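
A minimal sketch of the univocity prerequisite and the collapse it enables. It assumes the ‘range’ links are available as a mapping from each value to the set of attributes it is the range of. The function names are hypothetical, and the assignments of blue, pale and dark to particular attributes are illustrative guesses rather than PATO content; the ‘fast’ case is the forbidden example from the talk.

```python
from typing import Dict, List, Set


def univocity_violations(ranges: Dict[str, Set[str]]) -> List[str]:
    """Values that are not the range of exactly one attribute."""
    return sorted(v for v, attrs in ranges.items() if len(attrs) != 1)


def collapse(ranges: Dict[str, Set[str]]) -> Dict[str, str]:
    """Turn each value into an is_a child of its single attribute."""
    bad = univocity_violations(ranges)
    if bad:
        raise ValueError(f"cannot collapse, not univocal: {bad}")
    return {value: next(iter(attrs)) for value, attrs in ranges.items()}


# Illustrative ranges only.
color_ranges = {
    "blue": {"hue"},
    "pale": {"saturation"},
    "dark": {"variation"},
}
print(collapse(color_ranges))

# 'fast' applied to both velocity and duration breaks univocity.
ambiguous = {"fast": {"velocity", "duration"}}
print(univocity_violations(ambiguous))
```

The collapse is a plain relabelling of ‘range’ as is_a, so it is fully automatic once every value has exactly one attribute.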

  29. Shapes and colors

  30. How many types of shape are there? • notched, T-shaped, Y-shaped, branched, unbranched, antrorse, retrorse, curled, curved, wiggly, squiggly, round, flat, square, oblong, elliptical, ovoid, cuboid, spherical, egg-shaped, rod-shaped, heart-shaped, … • How do we define them? • How do we compare them? • Is it worth the effort?

  31. Shape types need precise definitions to be useful • Real shapes are not mathematical entities • but mathematical definitions can help • Axes of classification: • Dimensionality • 2-4D (process “shapes”) • concave vs convex • angular vs non-angular • number of • sides • corners • Primitive and composed shapes • Work with morphometrics community?

  32. Shape likeness • We can post-coordinate some shape types • egg-shaped • head-shaped • A2-segment-shaped • Dangers of circularity • Only for genuine likeness (e.g. homeotic transformation) • not “heart-shaped leaf” • See annotation section of this presentation

  33. Color • Keep PATO HSV model • but is black a color hue? • We should allow overlapping partitions of color space • different domains have ‘sub-terminologies’ of color • Is color relational? • Humans vs tetrachromatic UV-seeing animals • Composition • using has_part

  34. Color hierarchy • Physical quality • Optical quality: a physical quality which exists in virtue of the bearer interacting with visible electromagnetic radiation • Chromatic quality: an optical quality which exists in virtue of the bearer emitting, transmitting or reflecting visible electromagnetic radiation • Color hue • Color saturation • Color variation • Color • Opacity: an optical quality which exists in virtue of the bearer absorbing visible electromagnetic radiation • opaque • translucent • transparent
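
The color hierarchy can be written down as a single-parent is_a map and walked upwards. The transcript has lost the slide's indentation, so the nesting used here is an assumption read off the genus of each definition: Chromatic quality and Opacity sit under Optical quality, the four color terms under Chromatic quality, and opaque, translucent and transparent under Opacity.

```python
from typing import Dict, List, Optional

# child -> single is_a parent (None marks the root); nesting is assumed.
IS_A: Dict[str, Optional[str]] = {
    "Physical quality": None,
    "Optical quality": "Physical quality",
    "Chromatic quality": "Optical quality",
    "Color hue": "Chromatic quality",
    "Color saturation": "Chromatic quality",
    "Color variation": "Chromatic quality",
    "Color": "Chromatic quality",
    "Opacity": "Optical quality",
    "opaque": "Opacity",
    "translucent": "Opacity",
    "transparent": "Opacity",
}


def ancestors(term: str) -> List[str]:
    """Walk the is_a chain from a term up to the root."""
    chain: List[str] = []
    parent = IS_A[term]
    while parent is not None:
        chain.append(parent)
        parent = IS_A[parent]
    return chain


print(ancestors("opaque"))
print(ancestors("Color hue"))
```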

  35. Part 2: Annotation using PATO • Annotation scheme desiderata • OBD Dataflow • Proposed annotation scheme

  36. Annotation scheme desiderata • Rigour • There is a subset of the scheme which is simple • The entire scheme is expressive

  37. It should have an unambiguous mapping to real world entities • Even if PATO is completely unambiguous, an ill-defined annotation scheme may leave room for ambiguity • Example: • Annotation: • E=eye, Q=red • What does this mean? • both eyes are red in this one fly instance • at least one eye is red in this one fly instance • a typical eye is red in this many-eyed spider • both eyes are red in this one fly at some point in time • both eyes are red in this one fly at all times • all eyes are red in all flies in this experiment • some eyes are red in some flies in this experiment

  38. There should be a certain usable subset that is simple • Rationale - MODs have limited resources: • building entry tools for simple subsets is easier • building databases and query/search engines is easier • curating with a less expressive formalism is easier, faster and requires less training • MODs’ primary use case is search, for which expressivity is less useful • Specifics • Tools should have an (optional) simple facade • Simple annotations should be expressible in a simple syntax that is understood by users with relatively little training • There should be an exchange format and/or database schemas that use traditional technology as might be used in a MOD • e.g. XML, relational tables

  39. The scheme must be highly expressive • Rationale • May be required by other NCBCs (BIRN) • May be required for cbio 200 gene list • Will be required in future • Specifics • Expressive superset will be optional • MODs can ‘pick and choose’ their subset • Native exchange and storage format will be logic-based • Details outwith scope of this presentation

  40. Dataflow • How will various kinds of phenotypic data get into OBD? • what kinds of data suppliers will use different formalisms? • 3 scenarios… (more possible)

  41. Example dataflow 1 • Generic MOD curator annotates phenotypes using Phenote • Annotations stored directly in MOD’s central DB • MOD periodically submits to OBD • e.g. using Phenote to create pheno-xml • OBD converts pheno-xml to native logic-based formalism • Users can query MOD directly, or OBD • OBD will allow more expressive queries and have more data integrated

  42. Example dataflow 2 • Non-MOD generates complex annotations and stores them locally • e.g. BIRN group? • Periodic submissions to OBD • e.g. as OWL or Obo-format instance data • OBD converts to native logic-based formalism • Users can query OBD using more complex queries

  43. Example dataflow 3 • cBio MOD curates 200 genes using Phenote • Annotations may be stored outside normal MOD schema • schema may not be expressive enough for complicated phenotypes • TBD - up to MOD • Periodic submissions to OBD • Phenote can be used to submit pheno-xml, OWL or OBO • MOD doesn’t have to worry about format • OBD converts to native formalism • Users can query OBD using relatively complex queries • Is this (should it be) different from #1?

  44. (Dataflow diagram; labels: MOD A, MOD B, MOD C, Non-MOD, pheno-detailed XML file, OBD)

  45. Proposed annotation schema • The schema will be described informally using a simple syntax • I use ‘E’ for entity and ‘Q’ for quality • Pretend it is EAV if you like • with implicit superfluous ‘A’ • The schema has (will have) a formal interpretation • aim: database exchange and removal of ambiguities • can be expressed using logical language • OBD will use an internal logic-based representation

  46. Outline of annotation schema • ‘EAV’ or ‘EQ’ is not enough • Fine for (very) simple subset • Extensions: • time • relational qualities • post-coordination of entity types • count qualities • measurements • …

  47. Standard case: monadic qualities • Examples • E=kidney, Q=hypertrophied • autodef: a kidney which is hypertrophied • We assume that there is more contextual data (not shown) • e.g. genotype, environment, number of organisms in study that showed phenotype • Interpretation (with the rest of the database record): • all fish in this experiment with a particular genotype had a hypertrophied kidney at some point in time
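
The informal "E=…, Q=…" syntax and the autodef can be sketched as follows. The parser and the function names are assumptions made for illustration; they are not Phenote's or OBD's actual format handling.

```python
from typing import Dict


def parse_annotation(text: str) -> Dict[str, str]:
    """Split 'E=kidney, Q=hypertrophied' into {'E': 'kidney', 'Q': 'hypertrophied'}."""
    fields: Dict[str, str] = {}
    for part in text.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def autodef(fields: Dict[str, str]) -> str:
    """Generate the automatic definition for a monadic E/Q annotation."""
    entity, quality = fields["E"], fields["Q"]
    article = "an" if entity[0].lower() in "aeiou" else "a"
    return f"{article} {entity} which is {quality}"


annotation = parse_annotation("E=kidney, Q=hypertrophied")
print(autodef(annotation))
```

The autodef covers only the entity and quality; the contextual data (genotype, environment, number of organisms) would live elsewhere in the database record.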

  48. Quantification • long thick thoracic bristles • 2 statements • E=thoracic bristle, Q=long • E=thoracic bristle, Q=thick • Default interpretation • A typical thoracic bristle is long and thick • Optional entity quantifiers • EQuant={some,all,most,<percentage>,<count>} • E=thoracic bristle, Q=long, EQuant=80% • 80% of the thoracic bristles in this one individual fly
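
A sketch of the optional entity quantifier, assuming `EQuant` is carried as a plain string. The `EQ` class, the validation rules and the English glosses are illustrative assumptions; only the allowed quantifier kinds (some, all, most, percentage, count) and the default "typical" reading come from the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional

KEYWORD_QUANTIFIERS = {"some", "all", "most"}


@dataclass(frozen=True)
class EQ:
    entity: str
    quality: str
    equant: Optional[str] = None  # None selects the default "typical" reading

    def __post_init__(self) -> None:
        q = self.equant
        if q is None or q in KEYWORD_QUANTIFIERS:
            return
        # Accept a percentage such as "80%" or a bare count such as "3".
        if re.fullmatch(r"\d+(\.\d+)?%", q) or re.fullmatch(r"\d+", q):
            return
        raise ValueError(f"unrecognised entity quantifier: {q!r}")

    def gloss(self) -> str:
        if self.equant is None:
            return f"a typical {self.entity} is {self.quality}"
        # Naive pluralisation by appending 's'; good enough for the example.
        return f"{self.equant} of the {self.entity}s are {self.quality}"


# "long thick thoracic bristles" becomes two statements.
statements = [EQ("thoracic bristle", "long"), EQ("thoracic bristle", "thick")]
for s in statements:
    print(s.gloss())

print(EQ("thoracic bristle", "long", equant="80%").gloss())
```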
