Transcriptome analysis

BIT 815: Analysis of Deep Sequencing Data

Transcriptome analysis
• With a reference
• Challenging due to size and complexity of datasets
• Many tools available, driven by biomedical research
• GATK and R/Bioconductor offer many options
• Start by mapping reads to the reference genome with a mapping/alignment tool
– deal with exon-intron junctions
• Reconstruct transcripts from mapped reads
– deal with alternative splicing products
• Calculate relative abundance of different transcripts
• Estimate biological significance based on annotation
• Example tools: Bowtie/TopHat, Cufflinks, Myrna
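The "calculate relative abundance" step can be illustrated with a small sketch. It assumes read counts per transcript are already available and converts them to transcripts per million (TPM), one common length- and depth-normalized unit; Cufflinks itself reports the related FPKM unit. The transcript names, counts, and lengths are made up for illustration.

```python
def tpm(counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Reads per kilobase: longer transcripts yield more reads at equal abundance,
    # so divide each count by the transcript length in kb.
    rate = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    total = sum(rate.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Rescale so the values sum to one million within the sample.
    return {t: r / total * 1e6 for t, r in rate.items()}


# Hypothetical example data (not from the slides).
counts = {"txA": 100, "txB": 100, "txC": 0}
lengths = {"txA": 1000, "txB": 2000, "txC": 1500}
abund = tpm(counts, lengths)
```

Here txA and txB receive the same number of reads, but txB is twice as long, so txA ends up with twice the TPM of txB.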

Workflow summary from a review “From RNA-seq reads to differential expression results”, by Oshlack et al, Genome Biol 11:220, 2010. Note emphasis on statistical analysis methods; an equal emphasis should be placed on experimental design.

The ‘Tuxedo’ suite of programs: Bowtie, TopHat, Cufflinks and CummeRbund. See Trapnell et al, Nature Protocols 7:562–578, 2012 for details.

• TopHat maps reads
• Cufflinks assembles transcripts
• Cuffmerge merges transcript data detected in different treatments
• Cuffdiff evaluates differential expression
• CummeRbund provides visualization tools
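The order of the steps above can be sketched as a list of command lines. This is a minimal sketch, not the protocol itself: it only builds argument lists (it does not run anything), the file names, index name, and output directory names are placeholders, and it assumes single-end reads, TopHat's default `accepted_hits.bam` output, and cuffmerge's default `merged_asm` output directory. Consult Trapnell et al for the full options.

```python
def tuxedo_commands(samples, index="genome", gtf="genes.gtf",
                    fasta="genome.fa", threads=4):
    """Build Tuxedo command lines.

    samples: dict mapping condition label -> list of FASTQ paths (replicates).
    All file and directory names here are hypothetical placeholders.
    """
    cmds = []
    bams = {}
    for cond, fastqs in samples.items():
        bams[cond] = []
        for i, fq in enumerate(fastqs, start=1):
            name = f"{cond}_R{i}"
            bam = f"{name}_thout/accepted_hits.bam"
            # 1. TopHat maps reads, splitting them across exon-intron junctions.
            cmds.append(["tophat", "-p", str(threads), "-G", gtf,
                         "-o", f"{name}_thout", index, fq])
            # 2. Cufflinks assembles transcripts from the mapped reads.
            cmds.append(["cufflinks", "-p", str(threads),
                         "-o", f"{name}_clout", bam])
            bams[cond].append(bam)
    # 3. Cuffmerge merges the per-sample assemblies; assemblies.txt must list
    #    each sample's transcripts.gtf, one path per line (written separately).
    cmds.append(["cuffmerge", "-g", gtf, "-s", fasta,
                 "-p", str(threads), "assemblies.txt"])
    # 4. Cuffdiff tests for differential expression; replicates of one
    #    condition are comma-separated, conditions are space-separated.
    cmds.append(["cuffdiff", "-o", "diff_out", "-b", fasta,
                 "-p", str(threads), "-L", ",".join(samples),
                 "-u", "merged_asm/merged.gtf"]
                + [",".join(bams[c]) for c in samples])
    return cmds


cmds = tuxedo_commands({"C1": ["C1_R1.fq"], "C2": ["C2_R1.fq"]})
for cmd in cmds:
    print(" ".join(cmd))
```

CummeRbund is an R package that reads the Cuffdiff output directory, so it is not part of this command list.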

Why merge data across treatments?

Differential transcript abundance mechanisms

Transcriptome analysis
• Without a reference
• First step is assembly
• Transcriptome assembly pipelines
– Velvet/Oases: Oases is a post-assembly processor for Velvet
– Trans-ABySS (BCGSC): based on the ABySS parallel assembler
– Rnnotator: based on Velvet
– Trinity (Broad Institute): a set of three programs
• Common strategy: assembly at multiple k-values, then merging of the resulting contigs, followed by refinement
• Once an assembly is available, continue with analysis as before
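The merging step of the multiple-k strategy can be sketched in a very reduced form. The sketch below is a toy stand-in, not what Oases, Trans-ABySS, or the other pipelines actually do: it pools contigs from every k-value and drops any contig that is contained, on either strand, in a longer one. The sequences are invented for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_assemblies(assemblies):
    """Pool contigs from several k-values and remove contained duplicates.

    assemblies: dict mapping k -> list of contig strings.
    Returns the non-redundant contigs, longest first.
    """
    # Deduplicate exact copies, then visit longest contigs first so that
    # shorter contained contigs are compared against what is already kept.
    pooled = sorted({c for contigs in assemblies.values() for c in contigs},
                    key=lambda c: (-len(c), c))
    kept = []
    for contig in pooled:
        rc = revcomp(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue  # redundant: contained in a longer contig
        kept.append(contig)
    return kept


# Hypothetical contigs from three assemblies at different k-values.
by_k = {
    21: ["ACGTACGTTAGC", "GGGCCC"],
    25: ["ACGTACGTTAGCAAT", "TTTTGGGG"],
    31: ["GCTAACGTACGT"],  # reverse complement of the first k=21 contig
}
merged = merge_assemblies(by_k)
```

Real merging tools also join overlapping contigs and handle sequencing errors and isoforms, which this containment check ignores.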

After Transcriptome Assembly…
• Some amount of analysis of differential splicing versus differential promoter activity is possible, but conclusions may be less robust in the absence of a reference
• The fraction of the total number of genes that can be discovered by RNA-seq depends on the diversity of tissue types and developmental stages analyzed, as well as the depth of sequencing

330 million SOLiD reads from a human cell line detect only about 67% of all annotated transcripts in the human genome. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Labaj et al, Bioinformatics 27:i383-91, 2011
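Why depth limits discovery can be seen in a toy model. Assume each read is drawn independently from transcripts in proportion to their relative abundance p; a transcript is then hit by at least one of N reads with probability 1 − (1 − p)^N. The sketch below averages that probability over a made-up abundance distribution. It ignores transcript length, mapping failures, and unexpressed genes, so it is not a model of the Labaj et al result.

```python
import math


def detection_probability(p, n_reads):
    """Probability that a transcript with relative abundance p gets >= 1 read."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_reads * math.log1p(-p))


def expected_fraction_detected(rel_abundances, n_reads):
    """Expected fraction of transcripts detected at a given sequencing depth."""
    probs = [detection_probability(p, n_reads) for p in rel_abundances]
    return sum(probs) / len(probs)


# Hypothetical abundance classes: a few highly expressed transcripts,
# many rare ones (weights are illustrative, not measured).
weights = [1e6] * 10 + [1e3] * 40 + [1.0] * 50
total = sum(weights)
abundances = [w / total for w in weights]

shallow = expected_fraction_detected(abundances, 1_000_000)
deep = expected_fraction_detected(abundances, 100_000_000)
```

In this toy setting the rare class is mostly missed at the shallower depth and mostly recovered at the deeper one, which is the qualitative point of the slide.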

Transcriptome analysis with RSEM (RNA-Seq by Expectation Maximization), Li & Dewey, BMC Bioinformatics 12:323, 2011
(a) Allows estimation of transcript abundance without a reference genome, based on alignments to assembled transcripts, although the transcripts can be taken from a reference genome sequence if it is available.
(b) Uses the Bowtie aligner by default, but also considers reads that map to multiple locations in the reference transcript collection.
(c) For each sample, files of estimated transcript and isoform abundance are produced, along with SAM files of alignments.
(d) The files of transcript and isoform abundance can be used to evaluate differential expression using tools from R and Bioconductor.
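How expectation maximization handles reads that map to multiple transcripts can be shown with a toy version. This is a simplified sketch, not RSEM's model: it ignores transcript length, base qualities, and the other terms a full model includes, and simply alternates between splitting each read across its candidate transcripts (E-step) and re-estimating the proportions (M-step). The isoform names and read counts are invented.

```python
def em_abundance(read_hits, transcripts, n_iter=200):
    """Estimate the fraction of reads originating from each transcript.

    read_hits: list of sets; each set holds the transcripts one read aligns to.
    transcripts: list of all transcript names.
    """
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: assign each read fractionally, in proportion to the
        # current abundance estimates of the transcripts it maps to.
        for hits in read_hits:
            denom = sum(theta[t] for t in hits)
            if denom == 0.0:
                continue
            for t in hits:
                expected[t] += theta[t] / denom
        # M-step: new estimates are the normalized expected read counts.
        total = sum(expected.values())
        if total == 0.0:
            break
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Hypothetical data: 30 reads unique to iso1, 10 unique to iso2,
# and 60 reads that align equally well to both isoforms.
reads = [{"iso1"}] * 30 + [{"iso2"}] * 10 + [{"iso1", "iso2"}] * 60
theta = em_abundance(reads, ["iso1", "iso2"])
```

The uniquely mapping reads determine how the ambiguous reads are shared out: the 3:1 ratio among unique reads carries over, so the estimates settle near 0.75 for iso1 and 0.25 for iso2 rather than splitting the shared reads evenly.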
