
Describing Bioinformatic Metadata at EBI

Describing Bioinformatic Metadata at EBI. James Malone malone@ebi.ac.uk. Cross-Domain Data available from EBI. Literature and ontologies. Genomes. Protein sequence. DNA & RNA sequence. Protein structure. Gene expression. Chemical entities. Protein families, motifs and domains.


Presentation Transcript


  1. Describing Bioinformatic Metadata at EBI
     James Malone, malone@ebi.ac.uk

  2. Cross-Domain Data available from EBI
     • Literature and ontologies
     • Genomes
     • Protein sequence
     • DNA & RNA sequence
     • Protein structure
     • Gene expression
     • Chemical entities
     • Protein families, motifs and domains
     • Protein interactions
     • Pathways
     • Systems

  3. The Sorts of Data we Serve
     • We manage databases of biological data such as nucleic acid and protein sequences and macromolecular structures
     • ENA: nucleotide sequencing information
     • UniProt: protein sequence and functional information
     • ArrayExpress: functional genomics data repository
     • Ensembl: genome information for vertebrates and other eukaryotes
     • InterPro: database of predictive protein "signatures"
     • PDBe: data resource on biological macromolecular structures

  4. Sorts of Metadata we Need
     • Low complexity – high volume (genome sequencing)
     • High complexity – low volume (mouse phenotyping)
     • 1000 Genomes: data volumes within an order of magnitude of physics data (see slide 13)
     • Provenance models
     • Experimental variables
     • Publication details
     • Synonyms and domain-specific language
     • Cross-domain mappings
     • Metadata has existed and been captured for a while, e.g. InterPro IDs

  5. [Slide with no transcribed text]

  6. Metadata: Minimum Information Standards
     • Minimum Information Standards specify the minimum amount of metadata (and data) required to meet a specific aim (usually reporting data or submitting to a public repository)
     • MIAME: Minimum Information About a Microarray Experiment
     • MIARE: Minimum Information About an RNAi Experiment
     • MIAPE: Minimum Information About a Proteomics Experiment
     • MIFlowCyt: Minimum Information about a Flow Cytometry Experiment
     • ISA: cross-domain experiment reporting
     • Some public repositories require some degree of conformance, e.g. ArrayExpress – MIAME scoring
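A checklist-style standard can be read as a list of required items that a submission either provides or omits. The sketch below scores a submission against such a list. The field names are hypothetical placeholders and the scoring rule is an assumption made for illustration; it is not the actual MIAME checklist or the ArrayExpress scoring method.

```python
# Hypothetical checklist: these field names are illustrative assumptions,
# not the actual MIAME or ArrayExpress requirements.
REQUIRED_FIELDS = [
    "raw_data",
    "processed_data",
    "sample_annotation",
    "experimental_design",
    "array_annotation",
    "protocols",
]


def checklist_score(submission):
    """Return (number of required fields present, list of missing fields).

    A field counts as present only if it exists and is non-empty.
    """
    missing = [field for field in REQUIRED_FIELDS if not submission.get(field)]
    return len(REQUIRED_FIELDS) - len(missing), missing


if __name__ == "__main__":
    # A made-up, partially complete submission.
    example = {"raw_data": "raw.zip", "protocols": "hybridisation protocol text"}
    score, missing = checklist_score(example)
    print(f"{score}/{len(REQUIRED_FIELDS)} items present; missing: {missing}")
```

A repository could use such a score to flag incomplete submissions for curation rather than reject them outright; that policy choice is outside the scope of this sketch.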

  7. Ontologies
     • A method of representing knowledge in which concepts are described both by their meaning and by their relationships to each other
     • An increasingly important component for formalising metadata
     • Thriving bio-ontology community
     • e.g. Gene Ontology: ‘project to standardise the representation of gene and gene product attributes’
     • e.g. ChEBI: ‘ontology of molecular entities focused on small chemical compounds’
     • e.g. Ontology for Biomedical Investigations: ‘ontology to describe experimental protocols from inception to analysis’

  8. Metadata that is Interoperable
     • Chemical Entities of Biological Interest (ChEBI)
     • Relation Ontology
     • Cell Type Ontology
     • Various Species Anatomy Ontologies
     • Anatomy Reference Ontology
     • Disease Ontology
     • Goal: a community-wide, interoperable set of reference ontologies
     • Consumed by application ontologies for specific needs
     • E.g. Experimental Factor Ontology @ www.ebi.ac.uk/efo

  9. Applying Ontologies in Data Curation @ www.ebi.ac.uk/gxa
     • Query for cell adhesion genes in all ‘organism parts’
     • ‘View on EFO’
     Ontologically Modeling Sample Variables in Gene Expression Data, malone@ebi.ac.uk
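A query over "all organism parts" relies on the ontology hierarchy: samples annotated with a specific term are returned when the query names any of its ancestors. The sketch below shows that expansion on a toy is-a hierarchy. The terms, gene names and annotations are invented for illustration and are not taken from EFO or the Gene Expression Atlas.

```python
# Toy is-a hierarchy (child -> parent). Terms are illustrative, not taken from EFO.
PARENT = {
    "liver": "organism part",
    "brain": "organism part",
    "cerebellum": "brain",
    "HeLa": "cell line",
}

# Hypothetical annotations: (gene, term the sample was annotated with).
ANNOTATIONS = [
    ("geneA", "liver"),
    ("geneB", "cerebellum"),
    ("geneC", "HeLa"),
]


def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following parent links."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def genes_under(ancestor):
    """Genes whose annotation is the given term or any of its descendants."""
    return sorted(gene for gene, term in ANNOTATIONS if is_a(term, ancestor))


if __name__ == "__main__":
    print(genes_under("organism part"))
```

Without the hierarchy, a query for "organism part" would match none of these annotations, since no sample is annotated with that exact term.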

  10. Strategies for Integrating Multi-Domain Data
     • Consuming reference ontologies, and mapping to multiple ontologies where overlap exists, offers us maximum interoperability
     [Diagram: a QUERY over RDF triples linking Atlas, the Amino Acid Ontology and SwissProt]
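One way to picture the integration strategy on this slide is as triples from several sources merged into one graph, with mapping triples bridging terms from overlapping ontologies. The sketch below uses plain Python tuples rather than an RDF library. Every identifier, prefix and predicate name in it is made up for illustration and does not reflect the real Atlas, SwissProt or Amino Acid Ontology data.

```python
# Triples as (subject, predicate, object). All identifiers are invented.
ATLAS = [
    ("atlas:gene1", "expressedIn", "efo:liver"),
    ("atlas:gene1", "encodes", "sp:protein1"),
]
SWISSPROT = [
    ("sp:protein1", "containsResidue", "aao:lysine"),
]
# Mapping triples link equivalent terms in different ontologies.
MAPPINGS = [
    ("efo:liver", "sameAs", "anat:liver"),
]
GRAPH = ATLAS + SWISSPROT + MAPPINGS


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def genes_for_anatomy_term(term):
    """Find genes expressed in a term or in any term mapped to it."""
    equivalents = {term}
    equivalents |= {s for s, _, _ in match(GRAPH, p="sameAs", o=term)}
    equivalents |= {o for _, _, o in match(GRAPH, s=term, p="sameAs")}
    return sorted(
        {s for e in equivalents for s, _, _ in match(GRAPH, p="expressedIn", o=e)}
    )


def residues_for_gene(gene):
    """Join across sources: gene -> encoded protein -> residues."""
    proteins = [o for _, _, o in match(GRAPH, s=gene, p="encodes")]
    return sorted(
        {o for prot in proteins for _, _, o in match(GRAPH, s=prot, p="containsResidue")}
    )


if __name__ == "__main__":
    print(genes_for_anatomy_term("anat:liver"))
    print(residues_for_gene("atlas:gene1"))
```

The query for `anat:liver` succeeds only because of the mapping triple, which is the point of mapping to multiple ontologies where they overlap.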

  11. ELIXIR Report
     • Data Integration & Interoperability Recommendations – Jul 2009
     • ELIXIR should build a distributed data infrastructure based on a Service Oriented Architecture using WS technology
     • Ontologies needed in areas of disease, anatomy and taxon
     • Annotation systems for associating data to metadata
     • Pan-domain coordination and funding for reporting standards

  12. Current Challenges
     [Diagram labels: Experiments, Assays]
     • Literature – data gap
     • Curation relatively slow, more advanced tooling required
     • Ontologies not interoperable yet and more needed
     • Bio-ontology funding
     • New high-throughput methods

  13. Challenges: Scaling
     • World-wide sequencing data production is now just an order of magnitude behind CERN
     • The Large Hadron Collider produces 15 petabytes per year from a single point source
     • The LHC grid is 140 computer centres in 33 countries, centred at CERN (Tier 0)
     • Sequencing is producing data in hundreds of centres in dozens of countries, with Tier 0 sites at EBI and NCBI
     • More than 150 terabytes of 1000 Genomes data are in the Short Read Archive, which represents more than half of all the data in the archive
     Slide: Laura Clarke, EBI
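The figures on this slide imply some rough magnitudes, worked through below as back-of-envelope arithmetic. Two readings are assumptions rather than figures from the slide: "an order of magnitude behind" is taken as a factor of 10, and the 1000 Genomes holding is taken as approximately 150 TB (the slide says "more than").

```python
# Figures quoted on the slide.
LHC_PB_PER_YEAR = 15
THOUSAND_GENOMES_TB = 150

# Assumption: "an order of magnitude behind" read as a factor of 10.
sequencing_pb_per_year = LHC_PB_PER_YEAR / 10

# If roughly 150 TB is more than half of the archive, the archive
# total is below roughly twice that amount.
archive_upper_bound_tb = 2 * THOUSAND_GENOMES_TB

if __name__ == "__main__":
    print(f"Implied sequencing output: about {sequencing_pb_per_year} PB per year")
    print(f"Implied archive size: below about {archive_upper_bound_tb} TB")
```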

  14. Summary
     • EBI uses a combination of metadata strategies
     • Minimum Information checklists are useful as reporting standards
     • Ontologies provide a powerful method for describing domain knowledge
     • Ontologies also allow community consensus to be built, as well as strategies for data integration
     • ELIXIR suggests:
       – Infrastructures should be WS compatible
       – Annotation tools are required
       – Pan-domain coordination is essential

  15. Acknowledgements
     • Ontology creation: James Malone, Tomasz Adamusiak, Ele Holloway, Helen Parkinson, Jie Zheng (U Penn)
     • Atlas GUI development: Misha Kapushesky, Pasha Kurnosov, Anna Zhukova, Nikolay Kolesinkov
     • External review and anatomy: Jonathan Bard, Jie Zheng
     • ArrayExpress Production Staff
     • EBI Rebholz Group (Whatizit text mining tool)
     • Many source ontologies for terms and definitions, esp. Disease Ontology, Cell Type Ontology, FMA, NCIT, OBI
     • Funders: EC (Gen2Phen, FELICS, MUGEN, EMERALD, ENGAGE, SLING), EMBL, NIH
     • Eric Neumann, Joanne Luciano and Alan Ruttenberg
     • HCLS Group: Eric Prud'hommeaux and Scott Marshall
     Developing an Ontology from the Application Up, malone@ebi.ac.uk
