Existing Standards in Systems Biology • Anatoly Sorokin • Computational Systems Biology Group • University of Edinburgh
Standard • 2000-2010 is the decade of standards in biology • 31 MIBBI standards • 56 OBO ontologies • About 80 exchange formats • Scope of interest • Language • Controlled vocabulary
Standards and Languages • CML – description of chemical structure • MathML – representation of mathematical formulas • PSI – standard description of protein interaction data • AnatML – language to describe interaction at organ level • GeneOntology – standard and ontology to describe gene function and regulation
Standards for Computational Systems Biology • BioPAX – language for exchanging biological network data between databases • SBML – language for exchanging biochemical models • CellML – language to describe mathematical models • SBGN – visual language for biological model description
MI (Minimum Information) standards • Reporting guidelines specify the minimum amount of metadata and data required to meet a specific aim • The aim is to provide enough metadata and data to enable the unambiguous reproduction and interpretation of an experiment • Normally informal, human-readable specifications that inform the development of formal data models (e.g. XML or UML) and data exchange formats
Exchange format • Strict structure for exchanging data or models • Mainly XML • Well-defined meta-model, often supported by a software API
Ontologies • “ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences” Wikipedia • Often used as controlled vocabulary and description support framework • GeneOntology
BioPAX • “Biological PAthway eXchange - A data exchange ontology and format for biological pathway integration, aggregation and inference”
BioPAX Goals • BioPAX = Biological PAthway eXchange • Data exchange format for pathway data • Include support for these pathway types: • Metabolic pathways • Signaling pathways • Protein-protein, molecular interactions • Gene regulatory pathways • Genetic interactions • Accommodate representations used in existing databases such as BioCyc, BIND, WIT, aMAZE, KEGG, Reactome, etc. • PathwayCommons – collection of pathways in BioPAX • http://www.pathwaycommons.org
BioPAX • BioPAX ontology and format in OWL (XML) • Ontology built using GKB Editor and Protégé • Semantic mapping is still an issue • Level 1 represents metabolic pathway data • Level 2 adds support for molecular interactions, post-translational modifications and experimental description from the PSI-MI model (backwards compatible) • Level 3 adds support for generics and protein states, and rearranges the reaction representation
BioPAX Ontology: Top Level • [Diagram: Entity with Pathway, Interaction and Physical Entity; legend: Subclass (is a), Contains (has a)] • Pathway: a set of interactions, e.g. Glycolysis, MAPK, Apoptosis • Interaction: a set of entities and some relationship between them, e.g. Reaction, Molecular Association, Catalysis • Physical Entity: a building block of simple interactions, e.g. Small molecule, Protein, DNA, RNA
BioPAX Ontology: Interactions • Interaction • Physical Interaction • Control: Catalysis, Modulation • Conversion: ComplexAssembly, BiochemicalReaction, Transport, TransportWithBiochemicalReaction
BioPAX Ontology: Physical Entities • PhysicalEntity: Protein, Small Molecule, Complex, RNA, DNA
BioPAX and other standards • [Figure: scope of Database Exchange Formats (BioPAX) and Simulation Model Exchange Formats (SBML, CellML), with PSI-MI 2 also shown. Areas covered: Molecular Interactions (Pro:Pro, All:All); Interaction Networks (Molecular, Non-molecular; Pro:Pro, TF:Gene, Genetic); Metabolic Pathways, Regulatory Pathways and Small Molecules (each from Low Detail to High Detail); Genetic Interactions; Biochemical Reactions; Rate Formulas.]
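Simulation-related standards • [Diagram: a grid with columns Model, Simulation and Result and rows Minimal Requirements, Exchange format and Ontology. Exchange formats implement the minimal requirements; ontologies make sense of the exchange formats. Labels recoverable from the figure: SED-ML, SBRML, and a "?" marking a missing standard.]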
SBML • “The Systems Biology Markup Language (SBML) is a computer-readable format for representing models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, regulatory networks, and many others. ”
SBML • Reaction: container for rate law • Species: reactants, products, or modifiers of reaction • Compartment: container for species • Parameter, Rule, Event
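To make the component list above concrete, here is a minimal sketch that parses a hand-written toy document with Python's standard `xml.etree.ElementTree`. The model (`toy`, `cell`, `S`, `P`, `k`, `conv`) is hypothetical, the namespace assumes SBML Level 2 Version 4, and the helper `summarize` is an illustrative name, not part of any SBML library.

```python
import xml.etree.ElementTree as ET

# Assumption: SBML Level 2 Version 4 namespace.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical toy model: one compartment, two species, one reaction S -> P.
DOC = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cell" initialConcentration="10"/>
      <species id="P" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k" value="0.1"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="conv" reversible="false">
        <listOfReactants><speciesReference species="S"/></listOfReactants>
        <listOfProducts><speciesReference species="P"/></listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply><times/><ci>cell</ci><ci>k</ci><ci>S</ci></apply>
          </math>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize(doc):
    """Return (compartment ids, species -> compartment, reaction summary)."""
    ns = {"s": SBML_NS}
    model = ET.fromstring(doc).find("s:model", ns)
    compartments = [
        c.get("id") for c in model.findall("s:listOfCompartments/s:compartment", ns)
    ]
    species = {
        sp.get("id"): sp.get("compartment")
        for sp in model.findall("s:listOfSpecies/s:species", ns)
    }
    reactions = {}
    for r in model.findall("s:listOfReactions/s:reaction", ns):
        reactions[r.get("id")] = {
            "reactants": [
                x.get("species")
                for x in r.findall("s:listOfReactants/s:speciesReference", ns)
            ],
            "products": [
                x.get("species")
                for x in r.findall("s:listOfProducts/s:speciesReference", ns)
            ],
            # The reaction is the container for the rate law (kineticLaw).
            "has_rate_law": r.find("s:kineticLaw", ns) is not None,
        }
    return compartments, species, reactions


compartments, species, reactions = summarize(DOC)
print(compartments, species, reactions)
```

Real models are better handled with a dedicated SBML library; plain XML parsing is used here only to keep the sketch self-contained.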
Characteristics of SBML • Many top-level types, little nesting • Units, Compartment, Species, Parameter, Reaction, Rule, Function, Event • Non-modular structure • Next SBML ‘Level’ (3) will introduce modularity • Emphasis on reactions • Some math implicit • Explicit rate equations; implicit integration • Implicit concentration conversion between compartments • Compartments are physical containers for species • Spatial dimensions (volume, surface)
Structure of SBML • The notes field of SBase is intended to store information for humans to read • The annotation field of SBase provides a container for software-generated annotations that are not intended to be seen by humans • The id field is required for most structures and is used to identify a component within the model definition • The name field is optional and provides a human-readable label for the component
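The four SBase fields can be seen on a single element. The sketch below reads them from a hand-written species element, again with the standard library only. The element content, the `tool` namespace `http://example.org/tool` and the helper `describe` are hypothetical; the SBML namespace assumes Level 2 Version 4.

```python
import xml.etree.ElementTree as ET

# Assumption: SBML Level 2 Version 4 namespace, in ElementTree's {uri}tag form.
SBML = "{http://www.sbml.org/sbml/level2/version4}"

# Hypothetical species carrying all four SBase fields discussed above.
SPECIES = """<species xmlns="http://www.sbml.org/sbml/level2/version4"
         id="glc" name="Glucose" compartment="cell">
  <notes><p xmlns="http://www.w3.org/1999/xhtml">Imported from a hypothetical curated list.</p></notes>
  <annotation><tool:layout xmlns:tool="http://example.org/tool" x="10" y="20"/></annotation>
</species>"""


def describe(xml_text):
    """Collect id, name, human-readable notes and annotation tags of one element."""
    el = ET.fromstring(xml_text)
    notes = el.find(SBML + "notes")
    annotation = el.find(SBML + "annotation")
    return {
        "id": el.get("id"),      # machine identifier within the model
        "name": el.get("name"),  # optional human-readable label
        # notes hold XHTML for human readers; keep only the text content
        "notes": "".join(notes.itertext()).strip() if notes is not None else None,
        # annotation holds tool-specific XML; list the qualified tag names
        "annotation_tags": [child.tag for child in annotation]
        if annotation is not None
        else [],
    }


info = describe(SPECIES)
print(info)
```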
MIRIAM • A model description requires extra information • Biological: description of the elements of the model • Mathematical: definition of math concepts • Referential: author name, paper reference, etc. • http://www.ebi.ac.uk/compneur-srv/miriam/
Reference correspondence • The model must be encoded in a public, standardized, machine-readable format (SBML, CellML, GENESIS ...) • The model must comply with the standard in which it is encoded! • The model must be clearly related to a single reference description. If a model is composed from different parts, there should still be a description of the derived/combined model. • The encoded model structure must reflect the biological processes listed in the reference description. • The model must be instantiated in a simulation: all quantitative attributes have to be defined, including initial conditions. • When instantiated, the model must be able to reproduce all results given in the reference description within an epsilon (differences in algorithms, rounding errors)
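The last rule, reproduction "within an epsilon", can be sketched as a tolerance check between published and re-simulated values. MIRIAM does not prescribe a numerical tolerance, so the `rel_tol` and `abs_tol` defaults below are assumptions, and `reproduces` is a hypothetical helper name.

```python
import math


def reproduces(reference, simulated, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if every simulated value matches its reference value
    within the given tolerances (assumed values, not set by MIRIAM)."""
    if len(reference) != len(simulated):
        return False
    return all(
        math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, s in zip(reference, simulated)
    )


# Hypothetical numbers: small differences pass, a 10% deviation does not.
print(reproduces([1.0, 0.5], [1.0000001, 0.5]))
print(reproduces([1.0], [1.1]))
```

In practice the tolerance would be chosen per model, since stiff or stochastic simulations can legitimately differ more between solvers.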
Attribution annotation • The model has to be named. • A citation of the reference description must be attached (complete citation, unique identifier, unambiguous URL). The citation should make it possible to identify the authors of the model. • The name and contact details of the model creators must be attached. • The date and time of creation and last modification should be specified. A history is useful but not required. • The model should be linked to a precise statement about the terms of distribution. MIRIAM does not require “freedom of use” or “no cost”.
External resource annotation • The annotation must make it possible to unambiguously relate a piece of knowledge to a model constituent. • The referenced information should be described using a triplet {data-type, identifier, qualifier} • The data-type should be written as a Uniform Resource Identifier (URI) • The identifier is analysed within the framework of the data-type. • Data-type and identifier can be combined in a single URI, e.g. http://www.myResource.org/#myIdentifier or urn:lsid:myResource.org:myIdentifier • Qualifiers (optional) should refine the link between the model constituent and the piece of knowledge: “has a”, “is version of”, “is homolog to” etc.
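The {data-type, identifier, qualifier} triplet maps naturally onto a small record type. The sketch below covers only the `#`-separated URL form shown on the slide and does not handle the LSID form. The class name `Annotation` and the function `parse_uri` are hypothetical, and the example values are the placeholders from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    data_type: str                   # URI of the resource, e.g. "http://www.myResource.org/"
    identifier: str                  # interpreted within the data-type
    qualifier: Optional[str] = None  # optional refinement, e.g. "is version of"

    def as_uri(self) -> str:
        """Combine data-type and identifier into a single URI."""
        return f"{self.data_type}#{self.identifier}"


def parse_uri(uri: str, qualifier: Optional[str] = None) -> Annotation:
    """Split a combined URI at its last '#' back into data-type and identifier."""
    data_type, sep, identifier = uri.rpartition("#")
    if not sep or not data_type or not identifier:
        raise ValueError(f"not a combined data-type#identifier URI: {uri!r}")
    return Annotation(data_type, identifier, qualifier)


ann = Annotation("http://www.myResource.org/", "myIdentifier", "is version of")
print(ann.as_uri())
print(parse_uri(ann.as_uri(), ann.qualifier) == ann)
```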
SBO (Systems Biology Ontology) • Part of the OBO Foundry • Assigns meanings to the mathematical elements of SBML • Allows automatic validation of the semantic consistency of the math part of a model • http://www.ebi.ac.uk/sbo
SBO • Types and roles of reaction participants, including terms like “substrate”, “catalyst” etc., but also “macromolecule” or “channel”. • Parameters used in quantitative models. This vocabulary includes terms like “Michaelis constant”, “forward unimolecular rate constant” etc. A term may contain a precise mathematical expression stored as a MathML lambda function. The variables refer to other parameters. • Mathematical expressions. Examples of terms are “mass action kinetics”, “Henri-Michaelis-Menten equation” etc. A term may contain a precise mathematical expression stored as a MathML lambda function. The variables refer to the other vocabularies. • Modelling framework, which specifies how to interpret the rate law, e.g. “continuous modelling”, “discrete modelling” etc. • Event type, such as “catalysis” or “addition of a chemical group”.
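The idea of a term that carries its own mathematical expression as a lambda function can be mimicked in Python. This is only an analogy to the MathML lambdas mentioned above. The dictionary keys reuse the term names from the slide but are not SBO identifiers, the formulas are the textbook first-order mass action and Michaelis-Menten forms (which may be parameterised differently in SBO itself), and `RATE_LAWS` and `evaluate` are hypothetical names.

```python
# Each term maps to a function whose arguments are the parameters it refers to.
RATE_LAWS = {
    # Textbook first-order, irreversible mass action: v = k * S
    "mass action kinetics": lambda k, S: k * S,
    # Textbook Michaelis-Menten form: v = Vmax * S / (Km + S)
    "Henri-Michaelis-Menten equation": lambda Vmax, Km, S: Vmax * S / (Km + S),
}


def evaluate(term, **values):
    """Look up the expression attached to a term and evaluate it."""
    return RATE_LAWS[term](**values)


# At S == Km the Michaelis-Menten rate is half of Vmax.
print(evaluate("Henri-Michaelis-Menten equation", Vmax=2.0, Km=0.5, S=0.5))
print(evaluate("mass action kinetics", k=0.1, S=10.0))
```

Attaching the expression to the term is what lets software check that a rate law annotated with a given term actually has the expected mathematical form.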
MIASE • Minimum Information About a Simulation Experiment • Which base model to use and which modifications to apply • Which simulation task to run on those models (algorithms, see KiSAO; simulation parameters) • How to post-process the numerical results and how to present them • http://www.ebi.ac.uk/compneur-srv/miase/ • A subset of MIASE could be encoded in SED-ML
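The MIASE items listed above can be read as a checklist. The sketch below is not SED-ML and not an official MIASE validator: it is a hypothetical record type with a helper that reports which items are still empty. The field names, the choice of which fields count as required, and all example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimulationExperiment:
    model_source: str = ""                                              # which base model to use
    model_changes: Dict[str, float] = field(default_factory=dict)      # modifications (may be empty)
    algorithm: str = ""                                                 # which simulation algorithm
    algorithm_settings: Dict[str, float] = field(default_factory=dict)  # simulation parameters
    post_processing: List[str] = field(default_factory=list)           # how results are processed
    outputs: List[str] = field(default_factory=list)                   # how results are presented


# Assumption: every item except model_changes must be non-empty.
REQUIRED = ("model_source", "algorithm", "algorithm_settings", "post_processing", "outputs")


def missing_information(exp: SimulationExperiment) -> List[str]:
    """Return the names of required items that are still empty."""
    return [name for name in REQUIRED if not getattr(exp, name)]


complete = SimulationExperiment(
    model_source="toy_model.xml",  # hypothetical file name
    model_changes={"k": 0.2},
    algorithm="fixed-step Euler",
    algorithm_settings={"step": 0.01, "end_time": 10.0},
    post_processing=["normalise S by its initial value"],
    outputs=["plot S against time"],
)
print(missing_information(complete))
print(missing_information(SimulationExperiment(model_source="toy_model.xml")))
```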
Description of models
Simulations
Simulation task
Data generation
Production of results
KiSAO • Kinetic Simulation Algorithm Ontology • Classification of simulation algorithms & methods • Definition, literature references • Relations between different simulation algorithms & methods • http://www.ebi.ac.uk/compneur-srv/kisao/index.html
KiSAO • http://bioportal.bioontology.org/visualize/40844
SBRML • Systems Biology Results Markup Language • A new markup language for specifying the results from operations on SBML models • http://www.comp-sys-bio.org/tiki-index.php?page=SBRML
Dimension example