
ENCODE and GTEx Comparison for EN-TEx Project

Explore the comparison between the ENCODE and GTEx projects in the EN-TEx dataset, including regulatory element annotations, personal genome reconstructions, and gene expression analysis.

aespinosa


Presentation Transcript


  1. ENCODE updates. Anna Vlasova, 08/02/2017, group meeting

  2. Outline • EN-TEx project • Comparison with GTEx • Personal genomes • ENCODE Encyclopedia v.3 • Assay matrix • Annotations, candidate regulatory elements • Data access

  3. EN-TEx project • A collaboration between the GTEx and ENCODE projects • Sequencing: 4 donors x ~20 tissues • EN-TEx assays: • total RNAseq • small RNAs, microRNAs • RAMPAGE • histone marks • ATAC-Seq • Genotyping arrays • Methylation arrays • DNA-Seq: Illumina, PacBio, 10x Genomics

  4. EN-TEx project

  5. EN-TEx project: EN-TEx datasets; comparison with GTEx; regulatory elements annotation; personal genome reconstruction

  6. EN-TEx project. GTEx comparison • EN-TEx: total RNAseq, stranded • GTEx: polyA+, non-stranded • Non-polyA+ RNA patterns in different tissues • Circular RNAs • Antisense expression • Retained introns • Novel isoforms

  7. EN-TEx project. GTEx comparison. PCA plot for EN-TEx and GTEx v.7 samples, RPKMs, 46,093 long genes
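
A PCA of this kind can be sketched as follows. This is a minimal illustration, not the analysis behind the slide: the log2 transform, the pseudocount, the per-gene centering and the simulated matrix are all assumptions, and `pca_scores` is a hypothetical helper name.

```python
import numpy as np

def pca_scores(expr, n_components=2, pseudocount=1.0):
    """PCA of samples from a genes x samples matrix of RPKM-like values."""
    logged = np.log2(expr + pseudocount)
    # Center every gene across samples so PCs describe between-sample variation.
    centered = logged - logged.mean(axis=1, keepdims=True)
    # SVD of the samples x genes matrix; rows of u * s are the sample scores.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated data (assumption): 200 genes, 6 samples, and the last three
# samples carry a strong shift in half of the genes, mimicking two groups.
rng = np.random.default_rng(0)
expr = rng.gamma(shape=2.0, scale=5.0, size=(200, 6))
expr[:100, 3:] *= 16.0

scores, explained = pca_scores(expr)
print(scores[:, 0])   # PC1 coordinates, one per sample
print(explained)      # fraction of variance carried by PC1 and PC2
```

With real data, the columns would be the EN-TEx and GTEx samples and the rows the quantified genes; whether samples separate by project or by tissue is then read off the leading components.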

  8. EN-TEx project. GTEx comparison. PCA clustering after batch correction with limma. Genuine EN-TEx vs GTEx differences might also be removed. Differences between tissues dominate.
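
The idea of the correction can be sketched with a plain linear model: fit tissue and batch terms per gene, then subtract only the fitted batch term. This is a Python illustration in the spirit of limma's batch removal, not the limma code itself; `remove_batch_effect`, the toy labels and the toy values are assumptions.

```python
import numpy as np

def remove_batch_effect(log_expr, batch, tissue):
    """Subtract the fitted batch term from a genes x samples log-expression matrix."""
    batch = np.asarray(batch)
    tissue = np.asarray(tissue)
    tissues = np.unique(tissue)
    batches = np.unique(batch)
    # One indicator column per tissue (these columns also absorb the intercept).
    tissue_cols = (tissue[:, None] == tissues[None, :]).astype(float)
    # Sum-to-zero coding for batch: (number of batches - 1) columns.
    batch_ind = (batch[:, None] == batches[None, :]).astype(float)
    batch_cols = batch_ind[:, :-1] - batch_ind[:, [-1]]
    design = np.hstack([tissue_cols, batch_cols])
    # Least-squares fit for all genes at once (samples x genes on the right-hand side).
    coef, *_ = np.linalg.lstsq(design, log_expr.T, rcond=None)
    batch_coef = coef[tissue_cols.shape[1]:]
    return log_expr - (batch_cols @ batch_coef).T

# Toy example (assumption): one gene, lung is 4 units above liver,
# and the GTEx batch is 2 units above the EN-TEx batch.
tissue = np.array(["lung", "lung", "liver", "liver"])
batch = np.array(["ENTEx", "GTEx", "ENTEx", "GTEx"])
log_expr = np.array([[5.0, 7.0, 1.0, 3.0]])

corrected = remove_batch_effect(log_expr, batch, tissue)
print(corrected)
```

The sketch also makes the caveat on the slide concrete: anything that coincides exactly with the batch label, such as total RNA vs polyA+ library preparation, is indistinguishable from batch in this model and is removed together with it.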

  9. Gene expression examples. ENSG00000259001.2, gene_name=RPPH1, gene_type=antisense; biggest RC value. Panels: before batch correction; after batch correction with limma. The high expression level of this gene in the total RNAseq samples was previously reported. Variability between the protocols (total RNA vs polyA+) was removed because it completely overlaps with the batch.

  10. EN-TEx project. GTEx comparison. Differential gene expression analysis between EN-TEx and 80 samples from GTEx. Distribution of the up-regulated genes by categories (EN-TEx, GTEx). Does it make sense to compare datasets in such a way?

  11. EN-TEx project. Personal genome reconstruction. 60x, 55x, 35x (from Michael Schatz)

  12. EN-TEx project. Personal genome (from Alex Dobin)

  13. EN-TEx project. Personal genome (from Alex Dobin)

  14. Diploid annotation. Transcript ENST00000513158.1 (gene ENSG00000251314.2) differs significantly between the two haplotypes. Tracks: Haplotype 2, Haplotype 1, Reference annotation.

  15. Diploid annotation
      chr5_1 HAVANA gene 95266864 95889235 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENSG00000251314.2_1";
      chr5_1 HAVANA transcript 95266864 95684208 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000502645.2_1";
      chr5_1 HAVANA transcript 95266865 95623568 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000511775.1_1";
      chr5_1 HAVANA transcript 95888695 95889235 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000513158.1_1";
      chr5_2 HAVANA gene 95264535 95933349 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENSG00000251314.2_2";
      chr5_2 HAVANA transcript 95264535 95933349 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000502645.2_2";
      chr5_2 HAVANA transcript 95264536 95621206 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000511775.1_2";
      chr5_2 HAVANA transcript 95886762 95933264 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000513158.1_2";
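
The haplotype differences in these records can be checked with a short script. This is a sketch under stated assumptions: the records are taken from the slide with whitespace instead of tabs, the `_1`/`_2` suffix on `transcript_id` is read as the haplotype label, and the reported value is the genomic span (end - start + 1, since GTF coordinates are 1-based and inclusive), not the spliced transcript length. `transcript_spans` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Transcript records from the slide; separators are spaces here because the
# tabs of the original GTF are not preserved in the transcript.
GTF_TEXT = '''
chr5_1 HAVANA transcript 95266864 95684208 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000502645.2_1";
chr5_1 HAVANA transcript 95266865 95623568 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000511775.1_1";
chr5_1 HAVANA transcript 95888695 95889235 . + . gene_id "ENSG00000251314.2_1"; transcript_id "ENST00000513158.1_1";
chr5_2 HAVANA transcript 95264535 95933349 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000502645.2_2";
chr5_2 HAVANA transcript 95264536 95621206 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000511775.1_2";
chr5_2 HAVANA transcript 95886762 95933264 . + . gene_id "ENSG00000251314.2_2"; transcript_id "ENST00000513158.1_2";
'''

ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def transcript_spans(gtf_text):
    """Return {transcript_id without suffix: {haplotype: genomic span in bp}}."""
    spans = defaultdict(dict)
    for line in gtf_text.strip().splitlines():
        fields = line.split(None, 8)  # 8 fixed columns plus the attribute column
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(ATTR_RE.findall(fields[8]))
        base_id, haplotype = attrs["transcript_id"].rsplit("_", 1)
        spans[base_id][haplotype] = end - start + 1
    return spans

spans = transcript_spans(GTF_TEXT)
for tx, by_hap in sorted(spans.items()):
    print(tx, by_hap, "difference:", abs(by_hap["1"] - by_hap["2"]))
```

For ENST00000513158.1 the span is 541 bp on haplotype 1 and 46,503 bp on haplotype 2, which is the discrepancy highlighted on the previous slide.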

  16. Diploid quantification. Proportion of allelic expression within an isoform: haplotype _1 or _2 (maternal/paternal). Proportion of allelic expression within a gene.

  17. Diploid quantification. Gene quantification = sum of all reads assigned to all isoforms in both haplotypes
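
The two diploid quantification slides can be illustrated with a small calculation. This is one possible reading, not the pipeline's definition: the gene total is the sum of reads over all isoforms and both haplotypes, the allelic proportion within an isoform is that isoform's reads on one haplotype divided by its reads on both, and the allelic proportion within a gene is the same ratio on the summed reads. The read counts and transcript names are hypothetical.

```python
from collections import defaultdict

def summarize_gene(counts):
    """counts maps (transcript_id, haplotype) to the reads assigned to it."""
    gene_total = sum(counts.values())
    per_haplotype = defaultdict(float)
    per_transcript = defaultdict(float)
    for (tx, hap), reads in counts.items():
        per_haplotype[hap] += reads
        per_transcript[tx] += reads
    # Share of the gene's reads coming from each haplotype.
    gene_allelic = (
        {hap: reads / gene_total for hap, reads in per_haplotype.items()}
        if gene_total else {}
    )
    # Share of each isoform's reads coming from each haplotype.
    isoform_allelic = {
        (tx, hap): reads / per_transcript[tx]
        for (tx, hap), reads in counts.items()
        if per_transcript[tx] > 0
    }
    return gene_total, gene_allelic, isoform_allelic

# Hypothetical counts for one gene with two isoforms on haplotypes 1 and 2.
counts = {
    ("txA", "1"): 30, ("txA", "2"): 10,
    ("txB", "1"): 20, ("txB", "2"): 40,
}
gene_total, gene_allelic, isoform_allelic = summarize_gene(counts)
print(gene_total)
print(gene_allelic)
print(isoform_allelic)
```

In this toy case the gene looks balanced between haplotypes while each isoform is skewed, which is why the isoform-level and gene-level proportions are worth reporting separately.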

  18. EN-TEx project. Personal genome. These data are not ready yet! Another personal genome is available for training and analysis: GM12878 (Mark Gerstein)

  19. Diploid annotation. GM12878. Number of genomic features calculated for chr1-22 and chrX; chrY, chrM and scaffolds are excluded. (Figure labels: hap1, hap2; 5, 4, 9)

  20. Diploid annotation. GM12878. Transcript lengths, gene lengths, exon lengths

  21. Gene expression. GM12878, total RNAseq x 2 replicates. Correlation between diploid and reference mappings. Correlation between replicates in each mapping. Panels: Replicate 1, Replicate 2

  22. Gene expression difference. GM12878. 147 genes are DE (edgeR, log(FC) >= 2, FDR <= 0.01). Panels: up-regulated in diploid quantifications; up-regulated in reference quantifications
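
The selection criteria on this slide can be sketched as a filter on per-gene statistics. Several points are assumptions: the fold-change threshold is applied to the absolute log2 fold change (the slide shows genes in both directions), positive values are taken to mean higher expression in the diploid quantification, and the FDR is computed here with a Benjamini-Hochberg adjustment. The input values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    fdr = np.empty(n)
    fdr[order] = np.minimum(ranked, 1.0)
    return fdr

def select_de(log_fc, pvalues, min_abs_log_fc=2.0, max_fdr=0.01):
    """Boolean masks of genes passing both thresholds, split by direction."""
    log_fc = np.asarray(log_fc, dtype=float)
    fdr = benjamini_hochberg(pvalues)
    keep = (np.abs(log_fc) >= min_abs_log_fc) & (fdr <= max_fdr)
    up_diploid = keep & (log_fc > 0)     # sign convention is an assumption
    up_reference = keep & (log_fc < 0)
    return up_diploid, up_reference

# Hypothetical statistics for four genes.
log_fc = [3.0, -2.5, 0.5, 2.2]
pvalues = [1e-6, 1e-5, 1e-6, 0.2]
up_diploid, up_reference = select_de(log_fc, pvalues)
print(up_diploid, up_reference)
```

The third gene fails on fold change and the fourth on FDR, so only the first two are reported, one in each direction.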

  23. Allele-specific gene expression. GM12878. Allele-specific expression statistics

  24. EN-TEx project. Personal genome

  25. Acknowledgments

  26. ENCODE Encyclopedia v.3 https://www.encodeproject.org/

  27. ENCODE Encyclopedia v.3 • In total there are >13,000 experiments • ENCODE, modENCODE, Roadmap (REMC), Genomics of Gene Regulation (GGR) and model organism Encyclopedia of Regulatory Networks (modERN) • Human, mouse, Drosophila, C. elegans • >40 different assay types • Among others: single-cell experiments, 3D chromatin interaction, shRNA/CRISPR genome editing, ... • Uniform data processing • Pipelines are available on GitHub

  28. ENCODE Encyclopedia v.3 https://www.encodeproject.org/data/annotations/

  29. ENCODE Encyclopedia v.3. Middle level • A number of annotation file sets: expression matrices, promoter/enhancer-like regions, blacklisted regions • Registry of candidate Regulatory Elements (cREs) • List of candidate enhancers/promoters based on DNase and H3K27ac/H3K4me3 signals • ~2.6M human cREs and ~1.6M mouse cREs • Cell-type specific; data are in BED format • Web-based tool to access cREs: SCREEN http://zlab-annotations-v4.umassmed.edu/

  30. ENCODE Encyclopedia v.3. Top level • Complete set of chromatin states for well-characterized epigenomes • Human cell types and mouse embryonic tissues • ChromHMM models for >260 experiments • Segway tool: semi-automated genomic annotation • Cell type-specific annotations and encyclopedia (164 human cell types) • Contiguous regions of high functionality score • Functional labels: inactive region, transcribed, promoter, bivalent, ... http://noble.gs.washington.edu/proj/encyclopedia/

  31. ENCODE Encyclopedia v.3. Segway annotation. "A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types", https://doi.org/10.1101/086025

  32. ENCODE Encyclopedia v.3 • Metadata for the files can be found: • in the ENCODE portal • in Julien's index file: index=/users/rg/jlagarde/projects/encode/scaling/whole_genome/3ncod3_production_files/files_local_fullpath_dcc_list.txt

  33. Thank you!
