
dbgg – database for genetical genomics update

dbgg – database for genetical genomics update. Morris Swertz (m.a.swertz@rug.nl), Braunschweig CASIMIR meeting, July 2, 2008. Objective: share genotype/phenotype data and tools.

Presentation Transcript


  1. dbgg – database for genetical genomics update. Morris Swertz (m.a.swertz@rug.nl), Braunschweig CASIMIR meeting, July 2, 2008

  2. Objective • Share genotype/phenotype data and tools:

  3. Complicated experiments [Figure: workflow diagram. Nodes: strains, genome, markers, individuals, genotypes, microarrays, probes, expressions, norm exprs., QTL profiles, network. Processes: inbreed, genotype, hybridize, preprocess, map, correlate. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]

  4. Barriers to sharing data • Collaborator 1, Collaborator 2, Collaborator 3 • Incompatible data! • Incomplete data! [Figure: the workflow diagram of slide 3, annotated with the three collaborators.]

  5. Barriers to sharing data • Investigation 1, Investigation 2, Investigation 3 • Incomplete and/or incompatible data! [Figure: the workflow diagram of slide 3, repeated once per investigation.]

  6. Barriers to sharing software tools [Figure: the workflow diagram of slide 3, shown three times.]

  7. Barriers to sharing software tools [Figure: the workflow diagram of slide 3, shown three times.]

  8. Barriers to sharing software tools • Hard to find and reuse tools [Figure: "10,000 QTL profiles", shown three times.]

  9. Use a standard tool? [Figure: the workflow diagram of slide 3.]

  10. More biotechnologies, more protocols • Yes, if it could be easily adapted! (and they can’t) [Figure: workflow diagram. Nodes: strains, genome, SNP arrays, individuals, genotypes, LC/MS, mass peaks, aligned peaks, QTL profiles, network. Processes: inbreed, genotype, preprocess, map, correlate. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]

  11. Objectives • Share genotype/phenotype data and tools: • Interoperable software • Simple flat file exchange format • Database server • R/web-service interfaces • A procedure to extend the software • Build on extensible data model • Data • Annotations • Investigations • Integration references • Next steps

  12. The software • Share genotype/phenotype data and tools: • Interoperable software • Simple flat file exchange format • Database server • R/web-service interfaces • A procedure to extend the software • Build on extensible data model • Data • Annotations • Investigations • Integration references • Next steps

  13. Software: flat file exchange format • Raw and processed data in matrix form. E.g. microarray data: rows = individuals, columns = Affy probes.

  14. Software: flat file exchange format • Annotation info in tabular form. E.g. probe annotation data: rows = probes, columns = attributes of each probe.
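
The two flat-file shapes described on slides 13 and 14 can be illustrated with a minimal Python sketch. The tab delimiter, the column names (`name`, `gene`, `chromosome`) and all values below are assumptions made for illustration; the actual dbGG file layout is described on the project website and may differ.

```python
import csv
import io

# Hypothetical file contents (not real dbGG files): tab-delimited text.
# Matrix: first row holds probe names, first column holds individual names.
MATRIX_TXT = "\tprobe1\tprobe2\nind1\t7.1\t8.3\nind2\t6.9\t8.0\n"
# Annotation table: one row per probe, one column per attribute.
ANNOTATION_TXT = "name\tgene\tchromosome\nprobe1\tGeneA\t1\nprobe2\tGeneB\t4\n"


def read_matrix(text):
    """Return (row_names, column_names, values) from a named matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    column_names = rows[0][1:]
    row_names = [row[0] for row in rows[1:]]
    values = [[float(cell) for cell in row[1:]] for row in rows[1:]]
    return row_names, column_names, values


def read_annotations(text):
    """Return {name: attributes} from a table that has a 'name' column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["name"]: row for row in reader}


individuals, probes, values = read_matrix(MATRIX_TXT)
annotations = read_annotations(ANNOTATION_TXT)

# Every matrix column should have a matching annotation row.
missing = [probe for probe in probes if probe not in annotations]
```

Keeping the data matrix and the annotation table in separate files means the same probe annotations can be reused across many data sets.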

  15. Software: exchange an experiment • Described on http://gbic.biol.rug.nl/dbgg [Figure: annotations and raw and processed data, dbGG import tool, dbGG database, dbGG export tool.]

  16. Software: web user interface http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do

  17. Software: interface to R
      source("http://localhost:8080/molgenis4gg/R")
      #download data
      use.experiment(name="metanetwork") #set default
      traits <- get.metabolitedata(name="mytraits")
      genotypes <- get.markerdata(name="mygenotypes")
      #calculate mQTLs
      library("MetaNetwork")
      qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)
      #upload results for others to use
      add.mqtldata(qtls, name="myqtls")
      inspect MetaNetwork protocol: Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.

  18. Software: interface to Taverna add dbGG interface

  19. Software: interface to Taverna Use data in dbGG

  20. This enables automatic processing (see also CASIMIR ‘use case 1’) [Figure: dbGG.] Smedley, Swertz, Wolstencroft et al., submitted.

  21. Use BioMART and MOLGENIS to access data and Taverna to automate the workflows [Figure: web services (ws) for gene symbols, SNPs, strain SNP alleles and pathways, together with your dbGG.] Smedley, Swertz, Wolstencroft et al., submitted.

  22. Software: extension procedure (using MOLGENIS) • Little language (domain specific language):
      <!-- entity organization -->
      <entity name="Experiment" label="Experiment">
        <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
        <field name="Medium" type="xref" xref_field="Medium.name"/>
        <field name="Protocol" label="Experiment Protocol"/>
        <field name="Temperature" type="int" ...
      • + Reusable assets and generator/interpreter • dbGG v1: for microarrays • dbGG v2: for mass spectrometry
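
To make the idea of "little language plus generator" on slide 22 concrete, here is a minimal Python sketch that turns an entity description into a SQL table definition. It is not the MOLGENIS generator: the function name `generate_create_table`, the type mapping, the default field type, and the closing tags and `type="int"` key added to the XML are all assumptions made so that the example is complete and runnable.

```python
import xml.etree.ElementTree as ET

# Simplified, completed variant of the slide's fragment (closing tags and the
# key's type are assumptions; the slide's XML is truncated).
MODEL_XML = """
<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" type="int"/>
  <field name="Protocol" label="Experiment Protocol"/>
  <field name="Temperature" type="int"/>
</entity>
"""

# Hypothetical mapping from model types to SQL types.
SQL_TYPES = {"int": "INTEGER", "string": "VARCHAR(255)"}


def generate_create_table(xml_text):
    """Generate one CREATE TABLE statement from one <entity> element."""
    entity = ET.fromstring(xml_text)
    columns = []
    for field in entity.findall("field"):
        # Assumption: fields without an explicit type are strings.
        sql_type = SQL_TYPES[field.get("type", "string")]
        column = f"{field.get('name')} {sql_type}"
        if field.get("key") == "1":
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {entity.get('name')} (" + ", ".join(columns) + ");"


ddl = generate_create_table(MODEL_XML)
print(ddl)
```

Adding a field to the model and rerunning the generator updates the table definition, which is the kind of extension step the slide attributes to the "reusable assets and generator/interpreter".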

  23. Software: extension procedure

  24. Website: demos and downloads http://gbic.biol.rug.nl/dbgg

  25. Outline • To share genotype/phenotype data and tools: 1. Interoperable software • Flat file exchange format • Database server • R/web-service interfaces • A procedure to extend the software 2. Build on extensible data model • Data • Annotations • Investigations • Integration references • Next steps

  26. ❶ Data • Simple and close to current practice: Genotype data [Figure: matrix of DATA ELEMENTS; subjects: STRAINS; traits: MARKERS.] TRAIT × SUBJECT

  27. ❶ Data • Simple and close to current practice: Genotype data, Expression data [Figure: matrix of DATA ELEMENTS; subjects: INDIVIDUALS; traits: PROBES.] TRAIT × SUBJECT

  28. ❶ Data • Simple and close to current practice: • Genotype data • Expression data • Classic phenotype data • Metabolite abundance data • Protein abundance data • And so on… TRAIT × SUBJECT

  29. ❶ Data with any Dimension Type • SUBJECT: Individual, Strain, Sample, … • TRAIT: Probe, Marker, Mass Peak, … • DATA ELEMENT: TRAIT × SUBJECT

  30. ❶ Data • Simple and close to current practice: What about QTL data? [Figure: matrix of DATA; traits: MARKERS; traits: PROBES.]

  31. ❶ Data • Simple and close to current practice: What about QTL data? Probe association data? Interaction network data? [Figure: matrix of DATA; traits: MARKERS; traits: PROBES.] TRAIT × TRAIT, SUBJECT × SUBJECT

  32. ❶ Data with any Dimension Type • Minimal data model [Figure: SUBJECT and TRAIT generalized to dimension ELEMENT; each DATA ELEMENT linked to dimension ELEMENTs via rows and columns.]
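
The minimal data model of slide 32 can be sketched in a few lines of Python. The class and attribute names below mirror the slide's terms, but the concrete fields (`name`, `value`) and the helper `kind` are assumptions for illustration, not the dbGG schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionElement:
    """Anything that can label a row or a column of a data matrix."""
    name: str


@dataclass(frozen=True)
class Subject(DimensionElement):
    """E.g. an individual, strain or sample."""


@dataclass(frozen=True)
class Trait(DimensionElement):
    """E.g. a probe, marker or mass peak."""


@dataclass(frozen=True)
class DataElement:
    """One cell of a matrix, addressed by its row and column elements."""
    row: DimensionElement
    column: DimensionElement
    value: float


def kind(element):
    """Describe which dimension types a data element combines."""
    return f"{type(element.row).__name__} x {type(element.column).__name__}"


strain = Subject("strain1")
marker = Trait("marker1")
probe = Trait("probe1")

genotype = DataElement(row=strain, column=marker, value=1.0)  # subject x trait
qtl = DataElement(row=marker, column=probe, value=3.2)        # trait x trait

print(kind(genotype))
print(kind(qtl))
```

Because rows and columns both point at the general dimension element, the same structure holds genotype data (subject by trait) and QTL data (trait by trait) without a separate table per data type.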

  33. The data model • To share genotype/phenotype data and tools: • Extensible data model • Data • Annotations • Investigations • Integration references

  34. ❷ Annotations • Simple and close to current practice: probe annotations • PROBE IS A VARIANT OF TRAIT, HAVING: • Name • Gene • Chromosome • Locus

  35. ❷ Annotation extends Trait or Subject • SUBJECT: • STRAIN: Name; Type: CSS, RIL..; Parent Strains • INDIVIDUAL: Name, Strain, Mother, Father, Sex • SAMPLE: Name, Individual, Tissue • And so on … • TRAIT: • PROBE: Name, Gene, Chromosome, Locus • MARKER: Name, Allele, Chromosome, Locus • MASSPEAK: Name, MZ, RetentionTime • And so on … [Figure: SUBJECT and TRAIT as kinds of dimension ELEMENT; DATA ELEMENT linked to them by row and column.]

  36. ❷ Annotation simple in practice • Genotype data: DATA ELEMENT by STRAIN and MARKER • Expression data: DATA ELEMENT by INDIVIDUAL and PROBE • QTL data: DATA ELEMENT by MARKER and PROBE • Extensions are automatic “under the hood”: PROBE isa TRAIT isa DIMENSION ELEMENT
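
The "PROBE isa TRAIT isa DIMENSION ELEMENT" chain of slide 36 maps naturally onto class inheritance. The sketch below uses the attribute lists of slide 35; the Python types of the attributes, the example values and the helper `lineage` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DimensionElement:
    name: str


@dataclass
class Trait(DimensionElement):
    pass


@dataclass
class Subject(DimensionElement):
    pass


@dataclass
class Probe(Trait):
    # Annotation attributes from slide 35; the types are assumptions.
    gene: str
    chromosome: str
    locus: int


@dataclass
class Strain(Subject):
    strain_type: str  # e.g. "CSS" or "RIL"


def lineage(obj):
    """List the class chain of an object, most specific first."""
    return [cls.__name__ for cls in type(obj).__mro__ if cls is not object]


# Hypothetical example records.
probe = Probe(name="probe1", gene="GeneA", chromosome="1", locus=12345)
strain = Strain(name="strain1", strain_type="RIL")

print(lineage(probe))
print(lineage(strain))
```

Code written against the general dimension element keeps working when a new annotation type such as a mass peak is added as another subclass.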

  37. ❶ Data and ❷ annotations [Figure: DATA ELEMENTS and PROBES.]

  38. The data model • To share genotype/phenotype data and tools: • Extensible data model • Data • Annotations • Investigations • Integration references

  39. ❸ Investigation workflow in the lab [Figure: genotype data (DATA: STRAIN, MARKER), expression data (DATA: INDIVIDUAL, PROBE) and QTL data (DATA: MARKER, PROBE), connected by steps marked "?".]

  40. ❸ Investigation building on FuGE [Figure: genotype data, expression data and QTL data (each DATA) connected by the steps SNP Array (Illumina Protocol, Illumina Bead Studio), Affy Array (Affy M430 Protocol, Affy M430 platform), Bioconductor Norm. and QTL Mapping (Mapping Protocol, R Software). FuGE terms shown: Protocol application, Protocol, Equipment, Software.] FuGE: Jones et al., Nature Biotech 25, 1127-1133
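
The FuGE-style terms on slide 40 (protocol, protocol application, data) can be sketched as a small provenance structure. This is an illustrative Python model, not the FuGE object model itself: the class layout, the `provenance` method and the way inputs and outputs are linked are assumptions, and only the protocol names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    name: str


@dataclass
class Data:
    name: str


@dataclass
class ProtocolApplication:
    """One run of a protocol that turns input data into output data."""
    protocol: Protocol
    inputs: list
    output: Data


@dataclass
class Investigation:
    name: str
    applications: list = field(default_factory=list)

    def provenance(self, data_name):
        """Return the protocol names that led to a data set, earliest first."""
        for app in self.applications:
            if app.output.name == data_name:
                steps = []
                for item in app.inputs:
                    steps.extend(self.provenance(item.name))
                return steps + [app.protocol.name]
        return []  # no recorded step produced this data set


genotypes = Data("genotypes")
expressions = Data("expressions")
qtls = Data("qtls")

investigation = Investigation("example investigation", [
    ProtocolApplication(Protocol("Illumina Protocol"), [], genotypes),
    ProtocolApplication(Protocol("Affy M430 Protocol"), [], expressions),
    ProtocolApplication(Protocol("Mapping Protocol"), [genotypes, expressions], qtls),
])

print(investigation.provenance("qtls"))
```

Recording each step as a protocol application is what replaces the "?" links of slide 39: every data set can be traced back to the protocols that produced it.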

  41. Summary of data model • ❶ DATA: DATA ELEMENT with row and column pointing to dimension ELEMENT (SUBJECT, TRAIT) • ❷ PROBE, MARKER, STRAIN, INDIVIDUAL, … • ❸ INVESTIGATION, PROTOCOL APPLICATION, PROTOCOL, Software, Equipment

  42. The data model • To share genotype/phenotype data and tools: • Extensible data model • Data • Annotations • Investigations • Integration references

  43. ❹ References for integration • Ontology references and database references • INVESTIGATION 1, INVESTIGATION 2 • Incompatible naming ☹: GENE Name = Mip1alpha; GENE Name = Mip1a • Map mouse on human ontologies: ONTOLOGY ENTRY Id = 0005615, Term = ABC, Ontology = GO; ONTOLOGY ENTRY Id = MP:0005385, Term = cardiovascular, Ontology = MP • Compatible identifiers ☺: DATABASE REFERENCE Id = ENSMUS098, Db = ENSEMBL; DATABASE REFERENCE Id = ENSMU0S98, Db = ENSEMBL; DATABASE REFERENCE Id = ENSMUS98, Db = ENSEMBL; DATABASE REFERENCE Id = 1419561_AT, Db = AFFY 430 • Hyperlink … FuGE: Jones et al., Nature Biotech 25, 1127-1133
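
The point of slide 43, that shared database references reconcile incompatible names, can be illustrated with a short Python sketch. The gene names Mip1alpha and Mip1a come from the slide; the identifiers, the third gene, the assignment of names to investigations and the function `group_by_reference` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: which investigation uses which name and identifier
# is an assumption; the identifiers are placeholders, not real accessions.
genes = [
    {"investigation": "Investigation 1", "name": "Mip1alpha",
     "db": "ENSEMBL", "id": "EXAMPLE_ID_1"},
    {"investigation": "Investigation 2", "name": "Mip1a",
     "db": "ENSEMBL", "id": "EXAMPLE_ID_1"},
    {"investigation": "Investigation 2", "name": "OtherGene",
     "db": "ENSEMBL", "id": "EXAMPLE_ID_2"},
]


def group_by_reference(records):
    """Group gene names by their (database, identifier) reference."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["db"], record["id"])].append(record["name"])
    return dict(groups)


groups = group_by_reference(genes)

# References shared by more than one name reveal synonyms across investigations.
synonyms = {ref: names for ref, names in groups.items() if len(names) > 1}
print(synonyms)
```

The same grouping works for ontology entries: two investigations that annotate traits with the same ontology term can be compared even when their free-text labels differ.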

  44. Summary of data model • ❶ DATA: DATA ELEMENT with row and column pointing to dimension ELEMENT (SUBJECT, TRAIT) • ❷ PROBE, MARKER, STRAIN, INDIVIDUAL, extensible to more experiments… • ❸ INVESTIGATION, PROTOCOL APPLICATION, PROTOCOL, Software, Equipment • ❹ ONTOLOGY ENTRY, DATABASE REFERENCE, Hyperlink, …

  45. What is on the todo

  46. Todo • Publication: submitted ☺ • Building a catalog of tools on top of dbGG • Experiments: in Braunschweig and Groningen • Illumina, Affy, Metabolites • Tool ‘plug-ins’ • QTL graphs, import of annotations etc. • Exploit interoperability • E.g. integrate mouse & human with ontologies • Load annotations from other dbGG/BioMARTs • Build on and extend R/Taverna interaction

  47. Summary and questions • Share genotype/phenotype data and tools: • Interoperable software • Simple flat file exchange format • Database server • R/web-service interfaces • A procedure to extend the software • Build on extensible data model • Data • Annotations • Investigations • Integration references • Next steps

  48. Thank you • m.a.swertz@rug.nl • Morris A. Swertz, Bruno M. Tesson, Richard A. Scheltema, Gonzalo Vera, Rudi Alberts, Damian Smedley, Katy Wolstencroft, Andrew R. Jones, Klaus Schughart, John M. Hancock, Helen E. Parkinson, Engbert O. de Brock, Carole Goble, Paul Schofield, Ritsert C. Jansen • the GEN2PHEN consortium • the CASIMIR consortium

  49. Appendix: Procedure to (re)generate a MOLGENIS

  50. MOLGENIS for data
