MAGE-TAB: a simple tab-delimited format for describing microarray (and potentially other) experiments in a MIAME-compliant way. MAGE-TAB workshop, NCI, January 24, 2008.
What is needed to describe a microarray (and potentially any) experiment adequately (MIAME)?
1. Description of biological sample (research subject, aliquot – i.e., biomaterial) properties
2. Description of the assay (e.g., microarray design)
3. Data from the assays – raw and processed
4. Material and data processing protocols
5. Experiment design – which sample went to which assay and produced which data file
How to do this?
1. Description of biological sample (research subject, aliquot – i.e., biomaterial) properties
   – a list of properties – free text or ontology entries – table (spreadsheet)
2. Description of the assay (e.g., microarray design)
   – a list of features on the array and their properties – sequence, annotation, location – table (spreadsheet)
3. Data from the assays – raw and processed
   – files or tables
4. Material and data processing protocols
   – free text
5. Experiment design – which sample went to which assay and produced which data file
   – a graph (normally a DAG)
5. Experiment design graph
[Figure: Sample 1 (Cy3) and Sample 2 (Cy5) feed one Hybridisation, which produces Data.]
5. Experiment design graph
[Figure: Sample 1 (Homo S., Brain) and Sample 2 (Homo S., Kidney) feed Hybridisation (SMD array 1), which produces Data. Edges carry protocol labels: Protocol (Cy3), Protocol (Cy5) and two unnamed Protocol labels.]
[Figure: experiment design graph in layers. Samples: liver 1, liver 2, Kidney 1, Kidney 2, Brain 1, Brain 2 (all Homo sap.). Hybridisations: Hybridisation 1, 2 and 3 (HG_U95A). Raw data files: Data1.cel, Data2.cel, Data3.cel. Processed data: FGDM.txt. The material processing protocols (P-XMPL-1) label the sample side of the graph and the normalisation protocol (P-XMPL-2) labels the step producing FGDM.txt.]
Important observation
• In high-throughput experiments the experiment design graphs are
   – regular (similar subgraphs repeated many times)
   – sparse: most nodes have only a small number of incoming and outgoing edges
• They can be presented in ‘layers’ in a natural way (in fact any DAG can be represented in layers)
• This makes their representation as a spreadsheet simple and natural
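The layering claim can be illustrated with a short sketch. It assumes the design graph is available as a list of (source, target) edges; the edge list below is hypothetical, loosely modelled on the liver example, and the function name `layer_dag` is made up for this illustration. Each node is placed in the layer equal to the length of the longest path reaching it, so every edge points from a lower layer to a higher one.

```python
from collections import defaultdict, deque

def layer_dag(edges):
    """Assign each node of a DAG to a layer (longest-path layering).

    A node's layer is the length of the longest path leading to it,
    so every edge goes from a lower layer to a strictly higher one.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    layer = {n: 0 for n in nodes}
    # Start from nodes with no incoming edges (Kahn's topological order).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return layer

# Hypothetical edges, for illustration only.
edges = [
    ("liver 1", "Hybridisation 1"),
    ("liver 2", "Hybridisation 1"),
    ("Hybridisation 1", "Data1.cel"),
    ("Data1.cel", "FGDM.txt"),
]
layers = layer_dag(edges)
```

Nodes that share a layer would then share a column group in the spreadsheet view, which is why a regular, sparse graph maps onto a table so easily.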
[Figure: Layers in a more complex DAG. Nodes A, B, C, E, F, G, H, I and W are arranged in layers 0 to 5.]
[Figure: the layered graph written as rows of a table. Row i (i = 1, ..., k) holds the nodes vi1, vi2, vi3 of layers l1, l2, l3, each with its tuple of characteristics, e.g. (c111, ..., c11n) for v11 and (c121, ..., c12m) for v12.]
Real-world examples
• Simplified E-TABM-234.sdrf
Ontology usage
• Characteristics can be either free text or an ‘ontology entry’
• An ontology entry is identified by a ‘source’ column following it
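As a rough sketch of this rule, the snippet below reads a small tab-delimited table and reports, for each characteristic, its value and its ontology source (or `None` for free text). It assumes the MAGE-TAB column headings `Characteristics[...]` and `Term Source REF`; the rows, the source name `NCBITaxon` and the function name `read_characteristics` are hypothetical.

```python
import csv
import io

# Hypothetical SDRF fragment: Organism is an ontology entry (it is followed
# by a source column), Age is free text (no source column follows it).
SDRF_TEXT = (
    "Source Name\tCharacteristics[Organism]\tTerm Source REF\tCharacteristics[Age]\n"
    "liver 1\tHomo sapiens\tNCBITaxon\t35 years\n"
    "kidney 1\tHomo sapiens\tNCBITaxon\tunknown\n"
)

def read_characteristics(sdrf_text):
    """Return one dict per row mapping characteristic -> (value, source)."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    rows = []
    for record in reader:
        entry = {}
        for i, column in enumerate(header):
            if not column.startswith("Characteristics["):
                continue
            name = column[len("Characteristics["):-1]
            # The value is an ontology entry only if the next column is a source.
            has_source = i + 1 < len(header) and header[i + 1] == "Term Source REF"
            source = record[i + 1] if has_source and record[i + 1] else None
            entry[name] = (record[i], source)
        rows.append(entry)
    return rows

characteristics = read_characteristics(SDRF_TEXT)
```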
Elements to describe an experiment
1. Description of biological sample (research subject, aliquot – i.e., biomaterial) properties
   – a list of properties – free text or ontology entries – table (spreadsheet)
2. Description of the assay (e.g., microarray design)
   – a list of features on the array and their properties – sequence, annotation, location – table (spreadsheet)
3. Data from the assays – raw and processed
   – files or tables
4. Material and data processing protocols
   – free text
5. Experiment design – which sample went to which assay and produced which data file
   – a graph (normally a DAG) – also a spreadsheet!
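To make the "graph as a spreadsheet" idea concrete, here is a minimal sketch that turns a tab-delimited table back into graph edges. Each row is read as one path through the layers, and consecutive node columns in a row give one edge. The column headings follow MAGE-TAB naming, but the rows, the choice of node columns and the function name `sdrf_edges` are assumptions made for this illustration.

```python
import csv
import io

# Hypothetical table: one row per path from a sample to the final data matrix.
TABLE_TEXT = (
    "Source Name\tHybridization Name\tArray Data File\tDerived Array Data Matrix File\n"
    "liver 1\tHybridisation 1\tData1.cel\tFGDM.txt\n"
    "liver 2\tHybridisation 1\tData1.cel\tFGDM.txt\n"
    "Kidney 1\tHybridisation 2\tData2.cel\tFGDM.txt\n"
)

NODE_COLUMNS = [
    "Source Name",
    "Hybridization Name",
    "Array Data File",
    "Derived Array Data Matrix File",
]

def sdrf_edges(table_text, node_columns):
    """Collect the distinct (from, to) edges implied by the rows of the table.

    Assumes node identifiers are unique across columns; empty cells are skipped.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    edges = set()
    for row in reader:
        path = [row[c] for c in node_columns if row.get(c)]
        # Consecutive nodes on a row are joined by an edge.
        edges.update(zip(path, path[1:]))
    return edges

edges = sdrf_edges(TABLE_TEXT, NODE_COLUMNS)
```

Repeated identifiers such as `Hybridisation 1` collapse into a single node, which is how pooling and shared data files are expressed without any extra syntax.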
MAGE-TAB
• 1., 5. Sample properties and experiment design – SDRF (Sample and Data Relationship Format)
• 2. Array design – ADF (Array design file)
• 4. Protocols – IDF (Investigation design file)
• 3. Data files and data matrices
Array Design (ADF)
• ADF has been around since MAGE-ML times as a way to describe an array
• A table (spreadsheet) – one row per array feature, listing feature coordinates, the sequence, biological annotation, etc.
Investigation design file (IDF)
• Lists general information about the experiment and gives a free-text description of all the protocols
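A minimal sketch of reading such a file follows. It assumes the IDF layout in which each row starts with a field name followed by one or more tab-separated values, and it assumes the field names `Investigation Title`, `Protocol Name` and `Protocol Description`; the values and the function name `parse_idf` are hypothetical.

```python
def parse_idf(idf_text):
    """Parse row-oriented IDF text into a dict of field name -> list of values."""
    fields = {}
    for line in idf_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tag, *values = line.split("\t")
        values = [v.strip() for v in values]
        # Drop trailing empty cells only, so that positions stay aligned.
        while values and not values[-1]:
            values.pop()
        fields[tag.strip()] = values
    return fields

# Hypothetical IDF fragment using the protocol identifiers from the example graph.
IDF_TEXT = (
    "Investigation Title\tExample organ comparison\n"
    "Protocol Name\tP-XMPL-1\tP-XMPL-2\n"
    "Protocol Description\tMaterial processing (free text)\tNormalisation (free text)\n"
)

idf = parse_idf(IDF_TEXT)
# Values in the same position of different rows describe the same protocol.
protocols = dict(zip(idf["Protocol Name"], idf["Protocol Description"]))
```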
Data files
• Raw data – native formats (CEL, GenePix)
• Normalised, summarised data – columns may be individual references
Experimental Factors
• The most important experimental variables (e.g., organs – liver, kidney, brain – in the examples above)
• Any column in the experiment design file (SDRF) can be marked as an experimental factor
• These can be propagated down the edges of the graph to columns in the FGEM
• In the FGEM they serve as concise annotation
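The propagation step can be sketched as a simple graph traversal. The snippet assumes the design graph is a list of (source, target) edges and that factor values are known for the sample nodes; the edges, the factor values and the function name `propagate_factors` are hypothetical.

```python
from collections import defaultdict, deque

def propagate_factors(edges, factor_values):
    """Push factor values from annotated nodes down the edges of the graph.

    Returns a dict mapping each reached node to the sorted list of factor
    values inherited from its ancestors (and from itself, if annotated).
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    inherited = defaultdict(set)
    queue = deque()
    for node, value in factor_values.items():
        inherited[node].add(value)
        queue.append(node)

    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            before = len(inherited[nxt])
            inherited[nxt] |= inherited[node]
            # Revisit a node only when it has gained new values.
            if len(inherited[nxt]) > before:
                queue.append(nxt)
    return {n: sorted(v) for n, v in inherited.items()}

# Hypothetical design graph and organ factor values.
edges = [
    ("liver 1", "Hybridisation 1"),
    ("liver 2", "Hybridisation 1"),
    ("Hybridisation 1", "Data1.cel"),
    ("Kidney 1", "Hybridisation 2"),
    ("Hybridisation 2", "Data2.cel"),
]
organ = {"liver 1": "liver", "liver 2": "liver", "Kidney 1": "kidney"}
annotation = propagate_factors(edges, organ)
```

Here the data column for `Data1.cel` ends up annotated with `liver` and the one for `Data2.cel` with `kidney`, which is the kind of concise annotation the data matrix needs.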
MAGE-TAB
• Investigation design file – IDF
• Array design file – ADF
• Experiment (sample) design file – SDRF
• Data files and data matrices (FGEM)
Standard design templates
• simple iterated design;
• iterated design with technical replicates;
• iterated design with pooling;
• iterated designs for dual channel experiments;
• dual channel iterated designs with dye swap;
• dual channel iterated designs with a reference sample;
• dual channel iterated design with a reference and dye swap;
• dual channel iterated design with a pooled reference;
• loop design;
• loop design with dye swap;
• time series experiments.
http://www.mged.org/Workgroups/MAGE/mage.html#mage-tab
MAGE-TAB
• Any experiment can be represented in MAGE-TAB in a structured way to MIAME granularity
• Large experiments with a regular design can be represented in a natural way
• It is possible to create MAGE-TAB files using generic spreadsheet software
• It is flexible – the granularity of the experiment description can vary
Some points for later discussion
• What should be the level of prescription on ID column names (source, sample, extract, ..., summary data)?
   – For instance, I often find it confusing what counts as a source and what counts as a sample
• Do we need the concept of Experimental Factors at all?
   – An alternative could be optional labelling of variable characteristics as ‘intentional’
Acknowledgements
• Tim Rayner, Helen Parkinson
• Cathy Ball, Don Maier
• Philippe Rocca-Serra (ADF)
• Michael Miller, Ugis Sarkans, Paul Spellman, Anna Farne
• Mohammad Shojatalab – MAGE-TAB export from ArrayExpress
• MAGE working group
Funding – NHGRI/NIBIB, MGED sponsors