
Annotating SABIO-RK: Integration of MIRIAM and SBO

Martin Golebiewski, Scientific Databases and Visualization Group, EML Research, Heidelberg. 2nd BioModels.net Training Camp, 13-15 January 2007, Manchester, UK.


Presentation Transcript


  1. Annotating SABIO-RK: Integration of MIRIAM and SBO. Martin Golebiewski, Scientific Databases and Visualization Group, EML Research, Heidelberg. 2nd BioModels.net Training Camp, 13-15 January 2007, Manchester, UK

  2. Why did we develop SABIO-RK?
     • Biochemical model simulations need experimental reaction kinetics data
     • Kinetic parameter values depend strongly on environmental conditions (temperature, pH, concentrations of reactants and modifiers, etc.)
     • Enzyme characteristics vary between organisms, tissues and cellular locations
     • Kinetic parameters are only interpretable together with their corresponding kinetic laws
     • Most databases do not link experimental kinetic data for single reactions to a complete set of the information listed above
     • Data must be easily accessible and interchangeable (data export for exchange)
     We aimed to create a database that collects and standardizes kinetic data, relates the data to its biochemical, environmental and experimental context, cross-links corresponding data and associates it with external resources, to make the data comparable and accessible in standard formats.

  3. Database population and access
     SABIO-RK:
     • Merges information about biochemical reactions and pathways, mainly collected from other databases (e.g. KEGG), with corresponding kinetic data manually extracted from literature (including the environmental context)
     • Is curated manually, assisted by semi-automatic tools (e.g. lists of values)
     • Unifies, systematically structures and interrelates the data
     • Can be accessed through a web-based user interface and through web services
     • Supports export of the data in SBML for exchange
     • Links entities and expressions to complementary databases and ontologies

  4. Database population: data extraction
     • Data source:
       • Kinetic data contained in publications
       • Text with non-local, highly scattered information
       • Tables, formulas, graphs, pictures
       • Some information is only given as a reference
     • Problems:
       • No 1:1 relation between the paper and the input mask!
       • No controlled vocabulary (e.g. different names for one compound or enzyme); fuzziness of descriptions
     (Figure labels: full-text publication; SABIO-RK input interface)

  5. Problems in the database population
     • Missing or only partial information in the data source:
       • Incomplete reactions (products not mentioned)
       • Assay conditions missing or given only as a reference to another paper
       • Kinetic law equation (or fitting equation) not described
     • Multiplicity of kinetic law types:
       • No real standard used in publications (or even available, except SBO)
       • Varying notations referring to several kinetic theories
     • Parameter units:
       • Multiple definitions (e.g. katal or Unit for enzyme activities)
       • Different compositions (e.g. µmol/s or µmol/(s*mg) for Vmax)
       • Wrong parameter unit (e.g. 1/s for Vmax)
     • Identification of compounds, reactions and enzymes:
       • Ambiguous descriptions of chemical compounds or enzymes (e.g. missing stereochemical information for stereoisomers, simplifying trivial names, ...)

  6. Data integration problems, e.g. parameter units
     • Units shown in the examples: nmol/(min*mg) and U/mg
     • 1 U = the amount of enzyme which catalyses the transformation of 1 µmol of the substrate per minute under standard conditions
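
The unit relationships on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not SABIO-RK code: the table and the function name `to_nmol_per_min_per_mg` are assumptions. It relies on the definition 1 U = 1 µmol/min quoted on the slide and on the SI definition 1 kat = 1 mol/s for the katal mentioned on the previous slide.

```python
# Illustrative sketch: bring specific enzyme activities to one common unit.
# Each factor expresses one unit of the key in nmol/(min*mg).
NMOL_PER_MIN_PER_MG = {
    "nmol/(min*mg)": 1.0,
    "µmol/(min*mg)": 1000.0,
    "U/mg": 1000.0,      # 1 U = 1 µmol/min = 1000 nmol/min
    "nkat/mg": 60.0,     # 1 nkat = 1 nmol/s = 60 nmol/min
}


def to_nmol_per_min_per_mg(value, unit):
    """Convert a specific activity to nmol/(min*mg); reject unknown units."""
    if unit not in NMOL_PER_MIN_PER_MG:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * NMOL_PER_MIN_PER_MG[unit]
```

With such a table, a value reported as 2.5 U/mg and one reported as 2500 nmol/(min*mg) become directly comparable, while a unit outside the table is rejected instead of being silently accepted.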

  7. Annotations and controlled vocabularies
     (Data model diagram; legend: annotations to external resources, controlled vocabulary)
     • Infosource: PubMed ID, title, authors, journal
     • Environment: buffer, pH, temperature
     • General Information: organism (NCBI-ID), tissue, pathway, comments
     • Kinetic Law: type (SBO), equation
     • Kinetic Parameter: name, type (e.g. Km, kcat) (SBO), value (range), standard deviation, comment, SBO-ID
     • Unit: SBML unit
     • Reaction: stoichiometry, EC classification, enzyme variant
     • Protein complex: UniProt IDs
     • Compound: recommended name, synonymic names, IDs in external databases (e.g. KEGG, ChEBI), additional information
     • Reactant, Modifier (Species): compound name (given in publication), role (e.g. substrate, inhibitor) (SBO), cellular location (Gene Ontology), comments (modifications etc.)
     Relationship labels in the diagram (their exact endpoints are not recoverable from the transcript): belongs to, determined under, defined as, reported for, catalyzes, corresponding species, participate in, refers to
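
A part of the data model on this slide can be sketched as plain record types. This is a minimal illustration under stated assumptions, not the SABIO-RK schema: all class and field names are hypothetical, only a subset of the listed fields is shown, and the example values at the end are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Link from a database field to an external resource or ontology."""
    resource: str      # e.g. "SBO", "ChEBI", "UniProt", "NCBI taxonomy"
    identifier: str    # identifier within that resource


@dataclass
class KineticParameter:
    name: str
    type: str                       # controlled term, e.g. "Km" or "kcat"
    value: float
    unit: str                       # controlled term, e.g. "mM"
    species: Optional[str] = None   # compound the parameter is reported for
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class KineticLaw:
    type: str                       # controlled term, annotated to SBO
    equation: str
    parameters: List[KineticParameter] = field(default_factory=list)


@dataclass
class Entry:
    pubmed_id: str                  # information source
    organism: str                   # annotated to the NCBI taxonomy
    reaction: str
    ph: Optional[float]             # environment
    temperature_c: Optional[float]  # environment
    kinetic_law: KineticLaw


# Invented placeholder values, for illustration only.
km = KineticParameter(name="Km_A", type="Km", value=0.5, unit="mM", species="A")
law = KineticLaw(type="Michaelis-Menten", equation="Vmax*A/(Km_A+A)",
                 parameters=[km])
entry = Entry(pubmed_id="<PubMed ID>", organism="<organism>", reaction="A = B",
              ph=7.5, temperature_c=25.0, kinetic_law=law)
```

Keeping the parameter, its kinetic law and the environment in one entry reflects the point made earlier in the talk: a parameter value is only interpretable together with its law and its experimental conditions.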

  8. Annotations of entities in SABIO-RK
     • Annotations shown to the user:
       • Chemical compounds to KEGG compound and ChEBI
       • Enzymatic activities to Expasy, KEGG, IntEnz, IUBMB and Reactome (query links in the user interface based on the enzyme classification EC)
       • Enzyme protein complexes to UniProt/Swiss-Prot
       • Cellular locations (compartments etc.) to Gene Ontology (as query link)
       • Publications (data sources) to PubMed
     • Annotations integrated in SABIO-RK, not yet implemented for the output:
       • Organisms to NCBI taxonomy
       • Kinetic law types and parameter types to SBO (Systems Biology Ontology)
       • Species role (substrate, product, modifier, etc.) to SBO
       • Reactions to KEGG reactions
     • More annotations following the MIRIAM standard are planned ...

  9. Controlled vocabularies in SABIO-RK
     • Unambiguously identify entities or terms
     • Facilitate the search, interpretation and comparison of the data
     • Permit matching with other database resources based on shared vocabulary
     • Facilitate the integration of different database entries into kinetic models
     • Lists of values (LOV) in the input interface:
       • Species (compounds) and species roles (e.g. substrate, product, modifier ...)
       • Biochemical reactions and pathways
       • Organisms (NCBI taxonomy), tissues and cellular locations
       • Kinetic law types (e.g. 'Competitive inhibition' or 'Sequential ordered Bi Bi')
       • Parameter types (e.g. Km, kcat, Vmax, Ki, Kd, rate constant, pH, pK ...)
       • Parameter units (e.g. mM, µM, 1/s, nmol/min, U/(h*mg) ...)
       • Corresponding species for kinetic parameters (e.g. for Km, Ki or concentrations)
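
How a list of values constrains input can be sketched as follows. This is a hypothetical illustration, not the SABIO-RK input interface: the function `check_term` is an assumption, the vocabulary contains only the example terms named on the slide, and the case-insensitive matching is a design choice of this sketch.

```python
# Example terms taken from the slide; a real list of values would be longer.
PARAMETER_TYPES = ["Km", "kcat", "Vmax", "Ki", "Kd", "rate constant", "pH", "pK"]


def check_term(term, vocabulary):
    """Return the canonical spelling of `term` from the vocabulary.

    Matching ignores case and surrounding whitespace; a term that is not
    in the vocabulary raises ValueError instead of being stored as free text.
    """
    lookup = {entry.lower(): entry for entry in vocabulary}
    key = term.strip().lower()
    if key not in lookup:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    return lookup[key]
```

Storing only the canonical spelling is what makes later searches and comparisons across entries reliable.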

  10. Other notation standards in SABIO-RK
      • Semi-controlled notation standards:
        • Kinetic law equation (analyzed for mathematical correctness when entered)
        • Enzyme variants (e.g. wildtype, mutant E540K, wildtype isoenzyme PFKL ...)
        • Protein complex of the enzyme: e.g. (Q6UG02)*4 for a homotetramer
        • Recombinant enzymes: e.g. 'expressed in Escherichia coli BL21(DE3)'
        • Buffer composition in the experimental setup
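
The protein complex notation can be read mechanically. The sketch below parses the homotetramer example from the slide; the function name `parse_complex` is hypothetical, and the handling of several components written one after another is an assumption, since the slide shows only the single-component case.

```python
import re

# One component: a UniProt accession in parentheses, "*", and a copy number.
COMPONENT = re.compile(r"\(([A-Z0-9]+)\)\*(\d+)")


def parse_complex(notation):
    """Return a list of (accession, copy number) pairs.

    Raises ValueError if no component in the expected form is found.
    """
    components = [(acc, int(n)) for acc, n in COMPONENT.findall(notation)]
    if not components:
        raise ValueError(f"cannot parse protein complex: {notation!r}")
    return components
```

For the slide's example, `parse_complex("(Q6UG02)*4")` yields one accession with four copies, i.e. a homotetramer.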

  11. Controlled vocabularies in SABIO-RK (screenshot: list of values (LOV) in the SABIO-RK input interface)

  12. Identifying chemical compounds
      Every chemical compound can have multiple synonymic descriptions, e.g.:
      • Trivial name and systematic chemical description:
        Valproic acid = 2-Propylpentanoic acid
      • Different parts of the molecule can be considered as the lead structure:
        Acetylphenol = Phenylacetate
      • Aberrant order of the substituents of a lead structure (prefixes):
        2-Amino-6-methyl-4-pyrimidol = 6-Methyl-2-amino-4-pyrimidol
      • Description of substituents as prefix (like amino-) or suffix (like -amine):
        1-(4-Iodo-2,5-dimethoxyphenyl)-2-aminopropane = 1-(4-iodo-2,5-dimethoxy-phenyl)propan-2-amine
        3,17-Dioxoandrost-4-ene = 4-Androstene-3,17-dione
      • Different nomenclature systems (e.g. aberrant order of the morphemes):
        2-Amino-6-methyl-4-pyrimidol = 2-Amino-6-methylpyrimidin-4-ol
        2-Methylpropan-2-ol = 2-Hydroxy-2-methyl-propane

  13. Normalization of compound names
      • Goals:
        • Comparing and linking databases with names of chemical compounds, i.e. synonym detection disregarding orthographic and (minor) morpho-syntactic variance in naming
        • Matching chemical compound names against existing synonym lists (e.g. ChEBI, PubChem) to identify synonyms whose differences in naming do not arise from orthographic variations, like trivial names and systematic names

  14. Normalization of compound names
      CompoundID: 10296
      IUPAC Name: 2-phenylpropanoic acid
      Canonical SMILES: CC(C1=CC=CC=C1)C(=O)O
      Synonyms:
      • Hydratropic acid
      • 2-Phenylpropionic acid
      • 2-Phenylpropanoic acid
      • alpha-Phenylpropioic acid
      • alpha-Methylphenylacetic acid
      • .alpha.-Phenylpropionic acid
      • alpha-Methylbenzeneacetic acid
      • Benzeneacetic acid, .alpha.-methyl-
      • .alpha.-Methylphenylacetic acid
      • .alpha.-Methylbenzeneacetic acid
      • ALPHA-PHENYLPROPIONIC ACID
      • Benzeneacetic acid, alpha-methyl-
      • (S)-alpha-Methylbenzeneacetic acid
      • Benzeneacetic acid, .alpha.-methyl-, (S)-
      • Benzeneacetic acid, .alpha.-methyl-, (R)-
      • Benzeneacetic acid, alpha-methyl-, (R)-
      • Benzeneacetic acid, alpha-methyl-, (S)-
      Normalized Name: alpha-phenylpropionate
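
The orthographic part of such a normalization can be sketched in a few lines. This is not the algorithm used for SABIO-RK, which the slides do not describe; the function `normalize_name` and its three rules (lower-casing, rewriting the `.alpha.` spelling, turning an "-ic acid" ending into "-ate") are assumptions chosen so that some of the synonyms listed above collapse to the normalized name shown on the slide.

```python
import re


def normalize_name(name):
    """Reduce purely orthographic variation in a compound name."""
    s = name.strip().lower()            # ignore case
    s = s.replace(".alpha.", "alpha")   # ".alpha." and "alpha" are one locant
    s = re.sub(r"\s+", " ", s)          # collapse runs of whitespace
    s = re.sub(r"ic acid$", "ate", s)   # "...ic acid" -> "...ate"
    return s


variants = [
    "alpha-Phenylpropionic acid",
    ".alpha.-Phenylpropionic acid",
    "ALPHA-PHENYLPROPIONIC ACID",
]
# All three spellings collapse to a single key.
keys = {normalize_name(v) for v in variants}
```

Names such as "Hydratropic acid" or "2-Phenylpropionic acid" do not reduce to the same key under these rules. That is the second goal named on the previous slide: synonyms that differ by more than spelling have to be resolved against synonym lists such as ChEBI or PubChem.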

  15. Linguistically assisted compound analysis (figure labels: systematic compound name; structure; classification)

  16. Access to SABIO-RK
      • Available interfaces:
        • Web-based user interface for browsing and searching the data manually
        • Web services (API access) that can be called automatically by external tools, e.g. by other databases or simulation programs for biochemical network models
      Both interfaces support the export of the data in SBML

  17. SABIO-RK user interface: Query

  18. SABIO-RK user interface: Query result

  19. SABIO-RK user interface: Reaction

  20. SABIO-RK user interface: Enzyme

  21. SABIO-RK user interface: database entry with kinetic data

  22. SBML export from SABIO-RK

  23. SBML export from SABIO-RK
      • Reactions are coupled in exported SBML files: every species is defined only once in the exported SBML file if several reactions refer to the same species
      • Export of layout information in SBML, using the SBML layout extension, to draw reaction maps
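
The coupling of reactions through shared species can be sketched with plain dictionaries. This illustrates the idea only and does not reproduce the SABIO-RK export code or the SBML format itself: the function `merge_reactions`, the record layout and the species identifiers are all hypothetical.

```python
def merge_reactions(reactions):
    """Combine reactions into one model in which each species appears once.

    Each reaction is a dict with an "id" and a "species" mapping from a
    species id to its name. Species referenced by several reactions are
    stored a single time; reactions only keep references to the shared ids.
    """
    species = {}
    merged = []
    for rxn in reactions:
        for sid, name in rxn["species"].items():
            species.setdefault(sid, name)   # keep the first definition
        merged.append({"id": rxn["id"], "species_refs": list(rxn["species"])})
    return {"species": species, "reactions": merged}


# Two consecutive glycolysis steps that share glucose 6-phosphate.
r1 = {"id": "R1", "species": {"glc": "D-glucose", "atp": "ATP",
                              "g6p": "D-glucose 6-phosphate", "adp": "ADP"}}
r2 = {"id": "R2", "species": {"g6p": "D-glucose 6-phosphate",
                              "f6p": "D-fructose 6-phosphate"}}
model = merge_reactions([r1, r2])
```

Because both reactions reference the same `g6p` entry, a simulation tool reading the merged model sees them as connected steps of one network rather than as two isolated reactions.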

  24. SABIO-RK
      • API access
      • Integration in simulation tools
      • Cross-linking with other databases
      • Several possible entry points
      • Supports data export in SBML
      (Figure label: web service methods)

  25. Data in SABIO-RK: statistics (as of 09/01/2007)
      • PubMed records: 923
      • Organisms: 312
      • Pathways: 90
      • Reactions: 9600
      • Enzymes: 416
      • Measured parameters:
        • Enzyme activities (rate constant, kcat or Vmax): 8118
        • Km (Michaelis constant): 8701
        • Ki (inhibition constant): 1774

  26. Data in SABIO-RK: statistics

  27. Conclusions
      • SABIO-RK is a web-accessible database containing biochemical reaction kinetics data for systems biologists and experimenters
      • Merges general reaction information retrieved from external databases with kinetic data manually extracted from literature
      • Manual curation of the data with some semi-automatic support
      • High degree of interrelation within the database
      • Type of kinetics, modes of inhibition or activation and corresponding equations are shown with their parameters, measured values and experimental conditions
      • Access through a web-based user interface or through web services (API)
      • Export of the data in SBML from both interfaces
      • Controlled vocabulary used and content annotated to ontologies and external resources

  28. Future goals
      • Information about detailed reaction mechanisms (elementary reaction steps)
      • Expansion of the data export functions (more data, more annotations)
      • Tools for information extraction and data integration
      • Expand the usage of annotations and controlled vocabularies
      • Extension of the database model to store signaling reactions
      • Convince scientists to directly insert their kinetic data into SABIO-RK

  29. SABIO-RK project team and many more: students, colleagues at EML Research and other collaborators…. Financial support:

  30. Workshop invitation
      Workshop "Storage and Annotation of Reaction Kinetics Data", May 21-23, 2007, Heidelberg, Germany
      http://projects.eml.org/sdbv/projects/events/workshop2007/index_html
      Topics:
      • Data generation
      • Data storage and integration
      • Data annotation
      • Data usage

  31. http://sabio.villa-bosch.de/SABIORK
