
ArrayExpress and Expression Atlas: Mining Functional Genomics data


Presentation Transcript


  1. ArrayExpress and Expression Atlas: Mining Functional Genomics data. Gabriella Rustici, PhD, Functional Genomics Team, EBI-EMBL, gabry@ebi.ac.uk

  2. ArrayExpress What is functional genomics (FG)? • The aim of FG is to understand the function of genes and other parts of the genome • FG experiments typically utilize genome-wide assays to measure and track many genes (or proteins) in parallel under different conditions • High-throughput technologies such as microarrays and high-throughput sequencing (HTS) are frequently used in this field to interrogate the transcriptome

  3. What biological questions is FG addressing? • When and where are genes expressed? • How do gene expression levels differ in various cell types and states? • What are the functional roles of different genes and in what cellular processes do they participate? • How are genes regulated? • How do genes and gene products interact? • How is gene expression changed in various diseases or following a treatment?

  4. Components of a FG experiment

  5. FG public repositories: ArrayExpress • Is a public repository for FG data, which provides easy access to well annotated data in a structured and standardized format • Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ • Facilitates the sharing of experimental information associated with the data, such as microarray designs and experimental protocols • Based on community standards: MIAME guidelines & MAGE-TAB format for microarray, MINSEQE guidelines for HTS data (http://www.mged.org/minseqe/)

  6. Community standards for data requirements • MIAME = Minimum Information About a Microarray Experiment • MINSEQE = Minimum Information about a high-throughput Nucleotide SEQuencing Experiment • The checklist:

  7. Standards for microarray & sequencing: MAGE-TAB format • MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray or HTS experiment
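Because MAGE-TAB files are tab-delimited spreadsheets, they can be read with ordinary tools. The sketch below is only an illustration: it parses a tiny, invented SDRF-style fragment (the MAGE-TAB file that links samples to data files) and groups data files by the value of one experimental factor. The column headers follow common MAGE-TAB naming as far as I am aware; the sample names, file names and the helper functions `read_sdrf` and `files_by_factor` are hypothetical.

```python
import csv
import io

# Hypothetical three-sample SDRF fragment (tab-delimited).
# Real SDRF files carry many more columns; these values are invented.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[disease]\tArray Data File\n"
    "sample1\tHomo sapiens\tnormal\tsample1.CEL\n"
    "sample2\tHomo sapiens\tsarcoma\tsample2.CEL\n"
    "sample3\tHomo sapiens\tsarcoma\tsample3.CEL\n"
)


def read_sdrf(text):
    """Parse tab-delimited SDRF text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_by_factor(rows, factor_column, file_column="Array Data File"):
    """Group data file names by the value of one factor column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[factor_column], []).append(row[file_column])
    return grouped


rows = read_sdrf(SDRF_TEXT)
groups = files_by_factor(rows, "Factor Value[disease]")
for value, files in groups.items():
    print(value, files)
```

Grouping files by factor value is typically the first step when re-analyzing a downloaded experiment, since it tells you which raw files belong to which condition.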

  8. ArrayExpress – two databases

  9. What is the difference between them? ArrayExpress Archive: • Central object: experiment • Query to retrieve experimental information and associated data Expression Atlas: • Central object: gene/condition • Query for gene expression changes across experiments and across platforms

  10. ArrayExpress – two databases

  11. ArrayExpress Archive – when to use it? • Find FG experiments that might be relevant to your research • Download data and re-analyze it. Data deposited in public repositories can often be used to answer biological questions different from the one asked in the original experiment. • Submit microarray or HTS data that you want to publish. Major journals require data to be submitted to a public repository like ArrayExpress as part of the peer-review process.

  12. How much data in AE Archive? (as of mid-September 2012)

  13. HTS data in AE Archive (as of mid-September 2012) • Microarray vs HTS • RNA-, DNA-, ChIP-seq breakdown

  14. ArrayExpress: www.ebi.ac.uk/arrayexpress/

  15. Browsing the AE Archive • The date when the data were loaded in the Archive • Number of assays • Species investigated • Curated title of experiment • AE unique experiment ID • ‘Loaded in Atlas’ flag • Raw sequencing data available in ENA • The direct link to raw and processed data; an icon indicates that this type of data is available • The total number of experiments and assays retrieved • The list of experiments retrieved can be printed, saved in tab-delimited format, or exported to Excel or as an RSS feed

  16. Browsing the AE Archive

  17. Searching AE with the Experimental Factor Ontology (EFO) • Application-focused ontology modeling the relationships between experimental factors (EFs) in AE • Developed to increase the richness of the annotations currently made in the AE Archive, to promote consistency, and to facilitate automatic annotation and the integration of external data • EFs are transformed into an ontological representation, forming classes and relationships between those classes • Combines terms from a subset of well-maintained and compatible ontologies, e.g. Gene Ontology, NCBI Taxonomy

  18. Building EFO: an example • Take all experimental factors: disease, neoplasm, cancer, sarcoma, Kaposi’s sarcoma • Find the logical connection between them: disease is the parent term; neoplasm is a type of disease; cancer is a synonym of neoplasm; sarcoma is a type of cancer; Kaposi’s sarcoma is a type of sarcoma • Organize them in an ontology: disease [-] neoplasm (synonym: cancer) [-] sarcoma [-] Kaposi’s sarcoma
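To make the idea concrete, here is a minimal sketch of how "is a type of" and "is a synonym of" relationships like the ones on this slide could be stored and used to expand a search term into its synonyms and child terms. It is a toy model built from the slide's five terms only, not the real EFO or the way ArrayExpress implements its search; the names `IS_A`, `SYNONYMS`, `descendants` and `expand_query` are all hypothetical.

```python
# Toy ontology built from the slide's example terms (not the real EFO).
# child term -> parent term ("is a type of")
IS_A = {
    "neoplasm": "disease",
    "sarcoma": "neoplasm",
    "Kaposi's sarcoma": "sarcoma",
}
# alternative label -> preferred label ("is a synonym of")
SYNONYMS = {"cancer": "neoplasm"}


def canonical(term):
    """Map a synonym to its preferred label; other terms are returned as-is."""
    return SYNONYMS.get(term, term)


def descendants(term):
    """Return every term below `term` in the is-a hierarchy, sorted by name."""
    found = []
    frontier = [canonical(term)]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in IS_A.items():
            if child_parent == parent:
                found.append(child)
                frontier.append(child)
    return sorted(found)


def expand_query(term):
    """Split a query into the exact term, its synonyms and its child terms."""
    root = canonical(term)
    labels = {root} | {alt for alt, pref in SYNONYMS.items() if pref == root}
    return {
        "exact": term,
        "synonyms": sorted(labels - {term}),
        "children": descendants(root),
    }


print(expand_query("cancer"))
```

With this structure, a search for "cancer" also retrieves experiments annotated with "neoplasm", "sarcoma" or "Kaposi's sarcoma", which is the practical benefit of annotating experiments with ontology terms rather than free text.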

  19. Exploring EFO: an example • More information at: http://www.ebi.ac.uk/efo

  20. Searching AE Archive: simple query

  21. AE Archive query output • Matches to exact terms are highlighted in yellow • Matches to synonyms are highlighted in green • Matches to child terms in the EFO are highlighted in pink

  22. AE Archive – experiment view

  23. SDRF file – sample & data relationship

  24. ArrayExpress – two databases

  25. Expression Atlas – when to use it? • Find out whether the expression of a gene (or of a group of genes sharing a gene attribute, e.g. a GO term) changes across the experiments available in the Expression Atlas • Discover which genes are differentially expressed in a particular biological condition that you are interested in

  26. Expression Atlas construction: experiment selection criteria during curation • Array (platform) designs relating to the experiment must be provided. Probe annotation must be adequate to enable re-annotation of external references (e.g. Ensembl gene ID, UniProt ID) • At least 3 replicates for each value of the experimental factor • Maximum 4 experimental factors • Adequate sample annotation using EFO terms • Presence of data files: CEL raw data files for Affymetrix assays, processed data files for non-Affymetrix ones
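Two of these criteria (at least 3 replicates per factor value, at most 4 experimental factors) are mechanical enough to check automatically. The following is a rough sketch under the assumption that each sample is described as a mapping from factor name to factor value; the function name and data layout are invented for illustration and do not describe the curators' actual tooling.

```python
from collections import Counter

MIN_REPLICATES = 3  # from the slide: at least 3 replicates per factor value
MAX_FACTORS = 4     # from the slide: maximum 4 experimental factors


def passes_replicate_criteria(samples):
    """Check the two numeric selection criteria.

    `samples` is a list of dicts mapping factor name -> factor value
    (a hypothetical layout chosen for this sketch).
    """
    factors = {name for sample in samples for name in sample}
    if not factors or len(factors) > MAX_FACTORS:
        return False
    for name in factors:
        # A sample missing this factor is counted under the value None.
        counts = Counter(sample.get(name) for sample in samples)
        if any(n < MIN_REPLICATES for n in counts.values()):
            return False
    return True


good = [{"disease": "normal"}] * 3 + [{"disease": "sarcoma"}] * 3
bad = [{"disease": "normal"}] * 3 + [{"disease": "sarcoma"}] * 2
print(passes_replicate_criteria(good), passes_replicate_criteria(bad))
```

The remaining criteria (adequate probe annotation, EFO-based sample annotation, presence of the right data files) involve curator judgment and are not modeled here.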

  27. Expression Atlas construction: analysis pipeline • Input data (Affy CEL, non-Affy processed) • Linear model* (Bioconductor limma) • Output: 2-D matrix of genes × conditions (Cond.1, Cond.2, Cond.3), where 1 = differentially expressed and 0 = not differentially expressed * More information about the statistical methodology: http://nar.oxfordjournals.org/content/38/suppl_1/D690.full

  28. Expression Atlas construction: analysis pipeline • “Is gene X differentially expressed in condition 1 in this experiment?” • Each point in the diagram is a single expression value for gene X • The Cond.1 mean, Cond.2 mean and Cond.3 mean are each compared with the mean of all samples, and a statistic is calculated
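The comparison on this slide can be illustrated with a deliberately simplified calculation. The sketch below computes, for one gene, a t-like statistic for each condition mean against the mean of all samples, then converts it into an up/down/no-change call. This is not the limma linear model the Atlas pipeline uses; the formula, the threshold of 2.0, the function names and the expression values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def condition_vs_overall(values_by_condition):
    """Return a t-like statistic per condition for a single gene.

    Each condition mean is compared with the mean of all samples and
    scaled by a rough standard error (overall SD / sqrt(n in condition)).
    """
    all_values = [v for vals in values_by_condition.values() for v in vals]
    overall_mean = mean(all_values)
    overall_sd = stdev(all_values)
    stats = {}
    for condition, vals in values_by_condition.items():
        standard_error = overall_sd / sqrt(len(vals))
        if standard_error > 0:
            stats[condition] = (mean(vals) - overall_mean) / standard_error
        else:
            stats[condition] = 0.0
    return stats


def call_direction(stat, threshold=2.0):
    """Turn a statistic into +1 (up), -1 (down) or 0 (not differential)."""
    if stat >= threshold:
        return 1
    if stat <= -threshold:
        return -1
    return 0


# Invented expression values for gene X, three replicates per condition.
gene_x = {
    "cond1": [10.0, 11.0, 10.5],
    "cond2": [5.0, 5.5, 5.2],
    "cond3": [5.1, 4.9, 5.3],
}
stats = condition_vs_overall(gene_x)
calls = {cond: call_direction(s) for cond, s in stats.items()}
print(stats)
print(calls)
```

A real analysis would also correct for testing thousands of genes at once, which this single-gene sketch ignores.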

  29. Expression Atlas construction • Exp. 1 (Cond.1, Cond.2, Cond.3), Exp. 2 (Cond.4, Cond.5, Cond.6), …, Exp. n (Cond.X, Cond.Y, Cond.Z): a statistical test is run on the genes in each experiment • Each experiment has its own “verdict” or “vote” on whether a gene is differentially expressed or not under a certain condition

  30. Expression Atlas construction • Summary of the “verdicts” from different experiments
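One plausible way to picture this summary is as a tally of per-experiment calls for each gene and condition pair. The sketch below assumes each verdict is a tuple of experiment, gene, condition and a call of +1 (up), -1 (down) or 0 (not differentially expressed); that encoding, the experiment and gene names, and the `summarize_votes` helper are hypothetical and are not taken from the Atlas itself.

```python
from collections import defaultdict


def summarize_votes(verdicts):
    """Count up/down/no-change verdicts per (gene, condition) pair.

    `verdicts` is an iterable of (experiment, gene, condition, call)
    tuples, with call in {1, -1, 0}.
    """
    labels = {1: "up", -1: "down", 0: "none"}
    summary = defaultdict(lambda: {"up": 0, "down": 0, "none": 0})
    for _experiment, gene, condition, call in verdicts:
        summary[(gene, condition)][labels[call]] += 1
    return dict(summary)


# Invented verdicts from three hypothetical experiments.
verdicts = [
    ("exp1", "geneX", "sarcoma", 1),
    ("exp2", "geneX", "sarcoma", 1),
    ("exp3", "geneX", "sarcoma", -1),
    ("exp1", "geneY", "sarcoma", 0),
]
summary = summarize_votes(verdicts)
for key, counts in summary.items():
    print(key, counts)
```

Keeping the three counts separate, rather than netting them into one score, preserves the information that experiments sometimes disagree about a gene.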

  31. Expression Atlas

  32. Atlas home page: http://www.ebi.ac.uk/gxa/ • Restrict query by direction of differential expression • Query for genes • Query for conditions • The ‘advanced query’ option allows building more complex queries

  33. Atlas home page: the ‘Genes’ and ‘Conditions’ search boxes

  34. Atlas home page: a single gene query

  35. Atlas gene summary page

  36. Atlas experiment page

  37. Atlas experiment page – HTS data

  38. Atlas home page: a ‘Conditions’ only query

  39. Atlas heatmap view

  40. Atlas gene-condition query

  41. Atlas advanced search

  42. Atlas advanced search

  43. Atlas advanced search

  44. A glimpse of what’s coming… “Differential atlas” • “Is gene X differentially expressed in condition 1 in this experiment?” • Each point in the diagram is a single expression value for gene X • The Cond.1 mean, Cond.2 mean and Cond.3 mean are each compared with the mean of all samples, and a statistic is calculated

  45. A glimpse of what’s coming… “Differential atlas” mock-up (1)

  46. A glimpse of what’s coming… “Differential atlas” mock-up (2)

  47. A glimpse of what’s coming… “Baseline atlas” • Gene expression in normal tissues, not looking for differentially expressed genes based on different conditions • E.g. “Give me all the genes expressed in normal human kidney” • Can also filter genes by expression level (e.g. FPKM values) • Start with Illumina Body Map 2.0 RNA-seq data • 16 tissues: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells • We are working on something similar for mouse
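A baseline query such as "all the genes expressed in normal human kidney" amounts to filtering a gene-by-tissue table on an expression cutoff. The sketch below shows that idea with invented FPKM values and placeholder gene names; the cutoff of 1.0 FPKM, the data layout and the `expressed_genes` function are assumptions for illustration, not the planned Atlas interface.

```python
def expressed_genes(fpkm_by_gene, tissue, min_fpkm=1.0):
    """Return the genes whose FPKM in `tissue` is at least `min_fpkm`.

    `fpkm_by_gene` maps gene -> {tissue: FPKM}; a gene with no value
    for the tissue is treated as not expressed.
    """
    return sorted(
        gene
        for gene, by_tissue in fpkm_by_gene.items()
        if by_tissue.get(tissue, 0.0) >= min_fpkm
    )


# Invented FPKM values for three placeholder genes.
fpkm = {
    "geneA": {"kidney": 25.0, "liver": 0.2},
    "geneB": {"kidney": 0.4, "liver": 80.0},
    "geneC": {"kidney": 3.1},
}
print(expressed_genes(fpkm, "kidney"))
print(expressed_genes(fpkm, "kidney", min_fpkm=10.0))
```

Raising `min_fpkm` narrows the result to more highly expressed genes, which corresponds to the slide's option of filtering by expression level.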

  48. A glimpse of what’s coming… “Baseline atlas” mock-up display

  49. Data submission to AE

  50. Data submission to AE: www.ebi.ac.uk/microarray/submissions.html
