Experimental validation. Integration of transcriptome and genome sequencing uncovers functional variation in human populations. Tuuli Lappalainen et al. Geuvadis Analysis Group Meeting II July 11, 2012, Barcelona. How to do RNAseq in a distributed setting?
Integration of transcriptome and genome sequencing uncovers functional variation in human populations Tuuli Lappalainen et al. Geuvadis Analysis Group Meeting II July 11, 2012, Barcelona
How to do RNAseq in a distributed setting?
• What are the most important covariates in RNAseq? What kind of factors affect replication? What causes lab effects? -> companion paper?
• How do different features of the transcriptome vary in populations?
• mRNA levels, different types of splicing, N-TARs, conjoined genes, …
• How does genome variation affect transcriptome variation?
• catalogue of tQTLs
• functional mechanisms of regulatory variants
• frequency spectra of variants, genome-wide view to regulatory variation
• loss-of-function -> companion paper
• Data availability and visualization
Integrating everything into a good story! “…and then we looked at X just because we could…”
Basics
• Material and methods
• it will be a lot easier if we keep on writing these to the wiki during the analysis
• Basic descriptive stats of the data and things that we measure
How to do RNAseq
• Main paper: data is good. Detailed descriptions in the companion
• Rank correlation to measure sample similarity
• Micha's script, PCA
• Technical covariates of quantifications
• What are the most important covariates in RNAseq?
• How do different QC measures correlate with each other, and what are the diagnostic measures of different problems?
• What kind of factors affect replication rate of samples and genes?
• What causes lab effects?
Companion paper: coordinated by Peter, contributions from Olof, Jonas, Tuuli, Micha, Marc
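As an illustration of the rank-correlation bullet above, here is a minimal sketch of pairwise Spearman correlation between samples, written with the Python standard library only. It is not Micha's script, which is not shown in these slides. The sample names, the toy quantification values, and the function names (`ranks`, `pearson`, `spearman`, `similarity_matrix`) are all hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def similarity_matrix(expression):
    """Map each (sample, sample) pair to its Spearman correlation."""
    names = list(expression)
    return {(a, b): spearman(expression[a], expression[b])
            for a in names for b in names}

# Hypothetical gene quantifications (same gene order in every sample).
toy_expression = {
    "sample_A": [10.0, 200.0, 35.0, 0.5, 80.0],
    "sample_B": [12.0, 180.0, 40.0, 0.7, 95.0],
    "sample_C": [90.0, 5.0, 60.0, 30.0, 1.0],
}

matrix = similarity_matrix(toy_expression)
for (a, b), rho in matrix.items():
    if a < b:
        print(f"{a} vs {b}: rho = {rho:.2f}")
```

One way to use such a matrix is to flag a sample whose correlation with the others is consistently low. Any threshold for that would be a choice made during the analysis, not something stated in the slides.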
How does the transcriptome vary?
• Descriptions of different types of transcriptome characteristics – whatever stats/descriptions make most sense and are most interesting for each type
• quantitative and qualitative mRNA variation, differential expression
• Splicing events
• N-TARs
• fusion genes
• RNA editing
• miRNA levels
• Novel miRNAs
• How does miRNA expression regulate mRNA levels?
• Focus on variation and on the gain of sequencing populations
• Annotation bias: Do we see that rare and non-European features are underrepresented in the existing annotations?
How does genetic variation link to transcriptome variation?
• Transcriptome QTLs
• common variants (also repeat genotypes)
• integrating different features of transcriptome variation in the same model
• miRNA variants > mRNA
• Rare regulatory variants
• ASE approach
• Transcriptome effects of loss-of-function variation
• validating predicted LOF variants
• compensatory mechanisms
• better predictions of NMD and splicing
• unannotated LOF effects – novel splice variants
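The ASE approach listed above can be sketched as follows, assuming the common setup in which reference and alternative allele read counts at a heterozygous site are tested against an expected 50:50 ratio with an exact two-sided binomial test. The slides do not specify the test, so this is an assumption. The read counts, individual labels, function names, and the 0.05 cutoff are hypothetical, and the sketch ignores mapping bias and multiple-testing correction.

```python
from math import comb

def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads carrying the reference allele."""
    return ref_reads / (ref_reads + alt_reads)

def binomial_two_sided_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes that are no more likely than the observed one."""
    n = ref_reads + alt_reads
    p = expected_ref_fraction
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so that equally likely outcomes are included.
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)

def proportion_with_bias(counts, alpha=0.05):
    """Share of individuals whose site shows significant allelic bias."""
    significant = sum(
        1 for ref, alt in counts.values()
        if binomial_two_sided_p(ref, alt) < alpha
    )
    return significant / len(counts)

# Hypothetical (ref, alt) read counts at one heterozygous site.
toy_counts = {
    "ind1": (10, 0),
    "ind2": (5, 5),
    "ind3": (30, 28),
    "ind4": (2, 18),
}

for name, (ref, alt) in toy_counts.items():
    print(name, round(allelic_ratio(ref, alt), 2),
          binomial_two_sided_p(ref, alt))
print("proportion with significant bias:", proportion_with_bias(toy_counts))
```

A per-site proportion like this is only meaningful if enough individuals are heterozygous and well covered at the site. A real analysis would need a minimum read-count filter, which is left out here.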
How do tQTLs affect gene expression?
• Are we finding causal regulatory variants?
• Functional annotations of tQTLs
• Different types of genetic variants: SNPs, indels, SVs
Genome-wide landscape of regulatory variation
• Partitioning individual variation in allelic expression to rare and common variants
• How much of transcriptome variation can we explain by the tQTLs that we discover?
• Do genes with different conservation / ontology have different amounts of (genetic) transcriptome variation?
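One simple way to put a number on "how much variation a tQTL explains" is the R² from regressing a gene's expression on genotype dosage (0, 1, or 2 copies of the alternative allele) at a single variant. The sketch below assumes that simplification, with no covariates and one variant per gene. The slides do not commit to this model, and the function name and toy values are hypothetical.

```python
from statistics import mean

def variance_explained(genotypes, expression):
    """R^2 of a simple linear regression of expression on genotype dosage.
    With one predictor this equals the squared Pearson correlation."""
    mg, me = mean(genotypes), mean(expression)
    cov = sum((g - mg) * (e - me) for g, e in zip(genotypes, expression))
    var_g = sum((g - mg) ** 2 for g in genotypes)
    var_e = sum((e - me) ** 2 for e in expression)
    if var_g == 0 or var_e == 0:
        return 0.0  # monomorphic variant or constant expression
    return cov ** 2 / (var_g * var_e)

# Hypothetical dosages and expression values for six individuals.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

r2 = variance_explained(toy_genotypes, toy_expression)
print(f"fraction of expression variance explained: {r2:.3f}")
```

Summarising such per-gene values across all genes with a discovered tQTL gives a rough genome-wide picture. Effect sizes estimated in the discovery sample tend to be inflated, so the result is better read as an upper bound.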
Data availability and visualization
• Accessibility will bring citations
• Data files that we’ll make available: BAMs, quantifications of genes, transcripts, exons, junctions, introns. ASE results? tQTLs? miRNA fastqs and quantifications
• Browser
• how to display data from 464 samples? 5-number summary tracks and data available for all individual values?
• RNAseq coverage
• quantitative track
• Quantifications of exons, junctions…
• ?
• tQTL p-values
• quantitative track
• ASE
• allelic ratios as a quantitative track
• proportion of individuals with significant bias
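The 5-number summary idea for displaying many samples in a browser can be sketched as below: per-sample coverage vectors are collapsed into five per-position tracks (minimum, first quartile, median, third quartile, maximum). The quartile convention (the `inclusive` method of `statistics.quantiles`), the function names, and the toy coverage values are assumptions made for illustration, not choices taken from the slides.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one genomic position."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def summary_tracks(per_sample_coverage):
    """Collapse per-sample coverage vectors into five per-position tracks."""
    per_position = zip(*per_sample_coverage)  # transpose: samples -> positions
    summaries = [five_number_summary(list(column)) for column in per_position]
    names = ("min", "q1", "median", "q3", "max")
    return {name: [s[i] for s in summaries] for i, name in enumerate(names)}

# Hypothetical coverage for 5 samples (rows) at 3 positions (columns).
toy_coverage = [
    [1, 10, 0],
    [2, 20, 0],
    [3, 30, 5],
    [4, 40, 0],
    [5, 50, 0],
]

tracks = summary_tracks(toy_coverage)
for name, values in tracks.items():
    print(name, values)
```

Five tracks stay readable whether there are 5 samples or 464, while the individual values can still be offered as separate downloadable files.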