SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium. Carole Goble, Uni of Manchester, UK; Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa; Isabel Rojas, EML Research gGmbH, Germany. 2nd Evaluation Conference, 19-20 May 2009, Vienna, Austria.
SysMO-DB • Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites • Sensitively retrofit a data access, model handling and data integration platform • Support and manage the diversity of data, models and competencies • Web-based solution: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.
SysMO-DB Team EML Research gGmbH, Germany Sergejs Aleksejevs Wolfgang Müller Carole Goble Isabel Rojas Olga Krebs Katy Wolstencroft University of Manchester, UK Stuart Owen Jacky Snoep University of Stellenbosch, South Africa University of Manchester, UK
Own solutions: own data solutions and collaboration environments. wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets. Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: • Many do not have data, do not follow the standards that exist, or do not know who is doing what. • Much of the data cannot be compared • Different organisms, different strains. Resource issues: no extra resources for the consortia. 91 institutes, 11 consortia, some overlapping
Principles… • A series of small victories • Realistic • Don't reinvent • Sustainable and extensible • Migrate to standards • Provide instant gratification • Address doubt and anxiety • Build it rather than write about it.
Another view on the goal • Public: specialist public databases (BRENDA, PDB, BioModels, WikiPathways, KEGG, UniProt, GenBank, SGD, PubMed); reference data sets; community supported data sets • SysMO Project: file management systems (Plone, Alfresco, PHProjekt, eGroupWare, Wikis); specialist databases that you make your own (BASE, maxD, myExperiment) • Personal: specialist public databases you have a bit of (SABIO, JWS Online, myExperiment); pile of spreadsheets on my hard drive
Some numbers & some consequences • 1 Software Engineer, 1 Bioinformatician, 1 Bio-database specialist • 11 projects, 91 institutes • 20 person days/year/project • 2.5 person days/year/institute • “just in case” approach impossible • Focus on real needs • “just in time”, “just enough” • The right 20% • Help people help themselves • Communication! 80-20 rule: 80% of the features won't be used anyway. Useful features
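The two per-project and per-institute figures can be cross-checked with a little arithmetic. This is a minimal sketch using only the numbers on the slide; the implied yearly total is an inference and is not stated in the talk, and the variable names are illustrative.

```python
# Figures taken from the slide.
projects = 11
institutes = 91
days_per_project = 20  # person days / year / project

# Implied total support budget (an inference, not stated on the slide).
total_days = projects * days_per_project

# Spreading the same budget over institutes instead of projects.
days_per_institute = total_days / institutes

print(f"total: {total_days} person days/year")
print(f"per institute: {days_per_institute:.2f} person days/year")
```

The per-institute result lands a little under the slide's 2.5, so the slide figure appears to be rounded. Either way, the budget is small enough that a "just in case" feature set is out of reach.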
Social Approach • Questionnaires • PALS • 19 Postdocs and PhD students • All three kinds of people • Our design and technical collaboration team • Very intense face to face and virtual collaboration • UK and Continental PALS Chapters • Audits and Sharing • Methods, data, models, standards, software, schemas, spreadsheets, SOPs…..
Communication via PALs • Show what is there • Suggest what is possible • Ask for requirements • Double check • Transmit • Disseminate • Give requirements • Tell priorities • Rate outcomes • Suggest improvements • Collect answers [Diagram: DB team, PALs, Projects]
SysMO-DB PALs Meeting statistics 10 months • 2 PAL all hands meetings • 2 PAL chapter meetings • 9 visits to 6 SysMO projects • Numerous Skype chats, mails, telcons Impact on development? See later in talk
“I need a kind of “yellow pages” that tells me who is in what project and what they are working on” “Excel spreadsheets are our most common way of collecting and processing data” “We need a way of collecting, structuring and sharing Standard Operating Procedures”
[Diagram: exchange between experimentalists, modellers and bioinformaticians]
SysMO Approach JERM SBML Workflow Management System Yellow Pages SysMO-SEEK web portal interface Assets Catalogue Search SysMO DB Workflow Repository Nature Protocols SOP Repository Consortium Data Public data Models Processes Sops and Workflows SBML Models Repository JWS Online Spreadsheet Repository
Discovery SysMO-SEEK • Single, web based, access point • Access control & Versioning management • Yellow pages (“who is who”) • People, Expertise, Equipment • Assets catalogue (“who has what”) • SOPs, Spreadsheets, pre-published models • Metadata about Data held by projects • Access to other repositories • Models (JWS Online), • Workflows (myExperiment), • Public web services (BioCatalogue) • Call out to external resources • e.g. PubMed Does not hold results. Holds metadata on results and links to results A component for SysMO groups to incorporate in their own environments and applications
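The idea that the catalogue holds metadata and links rather than the results themselves can be sketched as follows. This is a minimal illustration, not the SysMO-SEEK implementation: the class names, fields, owners and example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """Metadata about an asset; the asset itself stays at the project site."""
    title: str
    asset_type: str   # e.g. "SOP", "spreadsheet", "model"
    project: str
    owner: str
    location: str     # link to where the project hosts the asset
    keywords: tuple = ()


class AssetsCatalogue:
    """Hypothetical in-memory catalogue of asset metadata."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, term):
        """Case-insensitive match on title or keywords."""
        term = term.lower()
        return [r for r in self._records
                if term in r.title.lower()
                or any(term in k.lower() for k in r.keywords)]

    def who_has(self, asset_type):
        """'Who has what': owners grouped by project for one asset type."""
        result = {}
        for r in self._records:
            if r.asset_type == asset_type:
                result.setdefault(r.project, []).append(r.owner)
        return result


catalogue = AssetsCatalogue()
catalogue.register(AssetRecord(
    title="Glucose pulse sampling", asset_type="SOP", project="COSMIC",
    owner="researcher_a", location="https://example.org/cosmic/sop/1",
    keywords=("sampling",)))
catalogue.register(AssetRecord(
    title="Growth curves", asset_type="spreadsheet", project="MOSES",
    owner="researcher_b", location="https://example.org/moses/data/7"))

print([r.location for r in catalogue.search("glucose")])
print(catalogue.who_has("SOP"))
```

A search returns links back to the host project, so the project keeps control of the data itself.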
Data Comparison and Exchange • Data types: microarray, metabolomics, proteomics, single cell data, metadata • Public data sources • model organism databases (e.g. SGD) • BRENDA … • Data produced by SysMO • SABIO-RK, iChiP, MeMo … • Local databases & files • Excel spreadsheets: the most common experimental data format.
SysMO LAB Spreadsheet Our Extra Work!!
Challenge Aim: Maintain the independence of the projects • Data registered in the SEEK Assets Catalogue • Data remains at the host project site • Data pulled from host project site on request 1. Need to map to a common metadata model for each data type (microarray, metabolomic…) so data can be found, understood and compared. Just Enough Results Models (JERM) 2. Need to create software that interfaces with the different existing project data management setups (Alfresco, eGroupWare, MediaWiki, BASE, Excel…) JERM Adapters and Extractors
JERM: Just Enough Results Model • Way to “wrap” data sources to match our agreed common data model for each data type (microarray, metabolomics, proteomics, single cell data, metadata) • Minimum information needed to exchange data of each type • Sources: databases, content management systems, Excel spreadsheets, data file store • Operations on the JERM: extract, export, import
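Wrapping project-specific sources to a common model can be sketched as a per-project column mapping plus a check for the minimum fields. This is only an illustration under stated assumptions: the slides do not list the actual JERM fields, so the field names, project names and column headers below are all hypothetical.

```python
# Hypothetical minimum field set for one data type; the real JERM
# fields are not given in the slides.
JERM_MICROARRAY_FIELDS = ("organism", "strain", "condition", "platform")

# Hypothetical per-project mappings: local column header -> JERM field.
PROJECT_MAPPINGS = {
    "project_a": {"Species": "organism", "Strain ID": "strain",
                  "Treatment": "condition", "Array type": "platform"},
    "project_b": {"org": "organism", "strain": "strain",
                  "growth_condition": "condition", "chip": "platform"},
}


def to_jerm(project, row):
    """Map one local spreadsheet row to the common field names."""
    mapping = PROJECT_MAPPINGS[project]
    record = {mapping[col]: value
              for col, value in row.items() if col in mapping}
    missing = [f for f in JERM_MICROARRAY_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"missing JERM fields: {missing}")
    return record


row_a = {"Species": "S. cerevisiae", "Strain ID": "X1",
         "Treatment": "glucose pulse", "Array type": "chip-1",
         "Lab notebook page": "42"}  # extra local column is ignored
row_b = {"org": "S. cerevisiae", "strain": "X1",
         "growth_condition": "glucose pulse", "chip": "chip-1"}

# Two differently laid-out rows become directly comparable records.
print(to_jerm("project_a", row_a) == to_jerm("project_b", row_b))
```

Keeping the mapping as data rather than code means a new project layout needs a new dictionary entry, not a new extractor.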
What is Metadata? • Information, additional to the raw/processed data itself. • What a potential user of the data would need to know to be able to make full and accurate use of the data in a subsequent scientific analysis. • Machine readable descriptions of Data, Models, Services, Resources, Applications [COSMIC]
Minimum Information Initiatives
• CIMR: Core Information for Metabolomics Reporting
• MIABE: Minimal Information About a Bioactive Entity
• MIACA: Minimal Information About a Cellular Assay
• MIAME: Minimum Information About a Microarray Experiment
• MIAME/Env: MIAME / Environmental transcriptomic experiment
• MIAME/Nutr: MIAME / Nutrigenomics
• MIAME/Plant: MIAME / Plant transcriptomics
• MIAME/Tox: MIAME / Toxicogenomics
• MIAPA: Minimum Information About a Phylogenetic Analysis
• MIAPAR: Minimum Information About a Protein Affinity Reagent
• MIAPE: Minimum Information About a Proteomics Experiment
• MIARE: Minimum Information About a RNAi Experiment
• MIASE: Minimum Information About a Simulation Experiment
• MIENS: Minimum Information about an ENvironmental Sequence
• MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
• MIGen: Minimum Information about a Genotyping Experiment
• MIGS: Minimum Information about a Genome Sequence
• MIMIx: Minimum Information about a Molecular Interaction Experiment
• MIMPP: Minimal Information for Mouse Phenotyping Procedures
• MINI: Minimum Information about a Neuroscience Investigation
• MINIMESS: Minimal Metagenome Sequence Analysis Standard
• MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
• MIPFE: Minimal Information for Protein Functional Evaluation
• MIQAS: Minimal Information for QTLs and Association Studies
• MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
• MIRIAM: Minimal Information Required In the Annotation of biochemical Models
• MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
• STRENDA: Standards for Reporting Enzymology Data
• TBC: Tox Biology Checklist
• BioPAX: Biological Pathways Exchange, http://www.biopax.org/
• FuGE: Functional Genomics Experiment
• MGED: Microarray Experimental Conditions
• MIBBI: Minimum Information for Biological and Biomedical Investigations, http://www.mibbi.org/index.php/MIBBI_portal
Just Enough Results Model JERM • Inspired by MCISB Key Results initiative and SBRML [Paton et al] • Harvested standards • Analysed current practice and consortium schemas and spreadsheets • Designing the corresponding JERMs • Mapping data sources of the projects to JERMs.
Where is it used? JERM Experimental Data Metadata Assay People Investigation Homogenised terminology and values in the datasets themselves Projects Study Experimental conditions Models Factors studied SOPs Workflows ISA-TAB compliant
Just Enough Results Model • Minimum metadata for SysMO exchange • What an experiment is. • Find • Extract metadata from datasets for the Assets catalogue • Access • Expose data results through a JERM interface • Access controlled by consortiums, groups and individuals [Diagram labels: Access Control; JERM Web Service Access Interface; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor; BRENDA; SABIO-RK; Metadata; myDB; mySpreadSheet]
JERM over project data stores: SysMOLab (Wiki), COSMIC (Alfresco), MOSES (Wiki), BaCell-SysMO (Alfresco), another project (a data store)
In Practice for Spreadsheets • JERM • Native + JERM Template • JERMed
Now • browse • search • Register • Extract • Matched to the JERM • Adding metadata • Whole record
Near future • JERM • browse • search • Register • Extract • Matched to the JERM • Adding metadata here • Whole record • Filtered record • Enriched record
Future • Collections of Records • browse • search • Register • Extract • Matched to the JERM • Adding metadata here • Meta-analysis
Just Enough Results Model Tools • JERM Source Extractor Generator • New spreadsheets adopt JERM template • Legacy spreadsheet JERM mapper. • Databases have JERM mapper • Spreadsheet Ontology Annotator • Restrict the values that a range of fields can have. [Diagram labels: Access Control; JERM Web Service Access Interface; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor; BRENDA; SABIO-RK; Metadata; myDB; mySpreadSheet]
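Restricting the values that a range of fields can have amounts to checking each field against a controlled vocabulary. The sketch below shows the idea only; the field names and allowed terms are hypothetical stand-ins for ontology terms, and this is not the Spreadsheet Ontology Annotator itself.

```python
# Hypothetical controlled vocabularies per field; a real annotator
# would draw these terms from an ontology.
ALLOWED_VALUES = {
    "organism": {"Saccharomyces cerevisiae", "Bacillus subtilis"},
    "unit": {"mM", "uM", "g/L"},
}


def check_row(row):
    """Return (field, value) pairs that fall outside the vocabulary."""
    problems = []
    for field_name, allowed in ALLOWED_VALUES.items():
        value = row.get(field_name)
        if value not in allowed:
            problems.append((field_name, value))
    return problems


good = {"organism": "Bacillus subtilis", "unit": "mM", "value": 2.5}
bad = {"organism": "B. subtilis", "unit": "millimolar", "value": 2.5}

print(check_row(good))
print(check_row(bad))
```

Fields without a vocabulary, such as `value` here, pass through unchecked, so only the agreed terms are constrained.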
Model • JWS Online - database of curated models and a model simulator. • ToBiN - platform for storage and analysis of genome scale metabolic networks (PSYSMO) • BioModels - database of curated models (EMBL-EBI) • COPASI - Complex Pathway Simulator (Mendes et al) • Pre-publication SEEK store • Semantic SBML (TRANSLUCENT); SBRML (MCISB) More After the Demo!
Experimental Processes • Protocols and SOPs • SOPs assets deposited or linked to • SOP gathering • Nature Protocols format recommendation • High level classification for indexing and tagging • Got a few, need more. Protocol Title Authors Keywords Abstract Materials Reagents Reagent Set Up Equipment Time Taken Procedure Troubleshooting Critical Steps Anticipated Results References
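A recommended format makes it easy to check how complete a gathered SOP is. The sketch below uses the section names listed on the slide; treating them as a flat list of required sections, and storing an SOP as a dictionary, are assumptions made for illustration.

```python
# Section names as listed on the slide for the recommended format.
SOP_SECTIONS = [
    "Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results",
    "References",
]


def missing_sections(sop):
    """List the sections that are absent or empty in an SOP record."""
    return [s for s in SOP_SECTIONS if not str(sop.get(s, "")).strip()]


# Hypothetical, partly filled SOP record.
draft = {
    "Title": "Sampling for metabolite analysis",
    "Authors": "researcher_a",
    "Procedure": "1. Pre-cool the tubes. 2. Take the sample.",
    "Keywords": "   ",  # whitespace only counts as empty
}

print(missing_sections(draft))
```

A report like this gives contributors a concrete list of gaps instead of a rejected upload.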
Experimental Processes (continued) • http://openwetware.org • http://www.molmeth.org
Bioinformatics Processes: Workflows • Data preparation, annotation and analysis pipelines • SBML model construction and population • Linking together Data sets, Web Services, R scripts, BioMART, Java libraries, Grid Services, (MATLAB in beta) Workflow Management System Free and Open Source
Building models using workflows Data integration: workflows for model parameterisation and validation. Manipulation of SBML models in workflows LibSBML: data integration & constructing and annotating SBML models [Li et al]
Ramp up when more data resources become workflow accessible • Libraries of SysMO workflows • Spreadsheet Smart.
http://myexperiment.org • Microarray Analysis • SBML Model manipulation • Pathway Analysis • Chemical structure analysis • Protein structure analysis • Kinetic data • Excel Spreadsheet handling • Controlled vocabulary look-ups
Now… Demo! Everyone contributed, but obviously we only have time for a few examples
Models: JWS Online model interface http://jjj.mib.ac.uk http://jjj.bio.vu.nl http://jjj.biochem.sun.ac.za • SysMO models interface at JWS Online • SBML upload and web services • JWS update, new interface (to be released soon), SBGN schemas
JWS Online SysMO home ~/sysmo
JWS Online interface MOSES model link to localhost /sysmo