The Transcriptome • Gene Discovery • Quantitation of Gene Expression • Reading: Ch 15.1 • BIO520 Bioinformatics • Jim Lund
WHY? • The genes (proteins) expressed determine the state of the cell. • Signaling. • Metabolic capabilities. • Differentiation state (cell type). • Response to changes in environment. • Verifies gene predictions. • Transcriptional regulation • Normal vs. abnormal • Conditional expression
Transcriptome Analysis • Gene (transcript) discovery • transcripts • alternative splicing/processing • Transcript assays • Promoter analysis • Transcription Factors • Cellular control networks
Gene Discovery • Inference from genomic DNA • Prokaryotes & fungi OK • cDNA characterization • EST • SAGE
EST (Expressed Sequence Tag) • Sequence cDNA libraries • proportional libraries • subtracted or normalized libraries • Which end? • 5’ or 3’ or Whole
Library Type (figure) • Tissue • Primer: dT vs. random • “Regular” or proportional • Subtracted • Normalized • Caution: can miss alternate transcripts
Which end? • Whole cDNA • BEST & HARDEST (long) • 3’ end • Technically consistent, limited information • 5’ end • Coding “identity” highest • 5’ AND 3’ • Good, but a technical & informatic challenge
EST Data Analyses • Clustering Analysis • Assemble ESTs into genes. • Alternative splicing forms • Find coding SNPs. • Truncated, unspliced, and junk ESTs can be misleading • Project: Unigene • Program: stackPACK • Frequency analysis • Digital Differential Display • DDD is a computational method for comparing sequence-based gene representation profiles among individual cDNA libraries or pools of libraries.
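The frequency analysis above can be sketched as a 2×2 comparison of EST counts for one gene cluster between two libraries. The block below is a minimal sketch that assumes a two-sided Fisher exact test is an acceptable test for such counts; the function name and the example counts are illustrative, and this is not the code behind the DDD tool itself.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: ESTs from the gene in library 1, b: all other ESTs in library 1
    c: ESTs from the gene in library 2, d: all other ESTs in library 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene ESTs falling in library 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Made-up counts: 30 of 10,000 ESTs in library 1 vs 5 of 10,000 in library 2
p = fisher_exact_two_sided(30, 9970, 5, 9995)
```

A small p-value suggests the gene is represented differently in the two libraries, subject to the usual caveats about library construction (normalized or subtracted libraries do not preserve transcript proportions).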
EST Results (old) • Known genes (30%) • Similarities to other ORFs, ESTs (30%) • Infer Function? • Novel Class (30%, w/ time)
Typical Progress/Results • Humans • 6,694,833 ESTs • 124,179 clusters (“sets”) • 29,000 sets contain EST and mRNA seqs. • CGAP EST library ”plateau” broken by: • different tissues, different states • normalized libraries
Data Quality Considerations • 99% correct data (1% errors!). • Frameshifts: effects depend on tools • BLASTX tool to “find” frameshifts • How sensitive? • TBLASTX, TBLASTN to “use” in other projects • How sensitive?
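To see why a 1% error rate matters, here is a minimal sketch of the arithmetic. It assumes errors are independent across bases and uses a 400-base read length purely as an illustrative figure (the slides do not give one).

```python
def expected_errors(read_length, error_rate=0.01):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate

def prob_error_free(read_length, error_rate=0.01):
    """Probability that a read has no errors, assuming independent errors."""
    return (1.0 - error_rate) ** read_length

# Hypothetical 400-base EST read at the 1% error rate quoted above
errors = expected_errors(400)
clean = prob_error_free(400)
```

Under these assumptions a 400-base read carries about four errors on average and is error-free less than 2% of the time, which is why frameshift-tolerant searches are worth considering.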
Gene Expression Assays • EST (Poor method) • SAGE • Microarray Hybridization • Next Gen Sequencing. • Transcriptional Fusions • GFP, LacZ fusions
Serial Analysis of Gene Expression (SAGE) • Collect mRNA • Isolate short oligomers from each transcript. • Ligate together the oligomers and clone them. • Sequence thousands of clones. • Map the 1×10^4 – 1×10^5 oligomers to their genes. • Find which genes are transcribed and their relative expression levels. • http://www.sagenet.org (Vogelstein at JHU)
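The last two steps (mapping tags to genes and estimating relative expression) can be sketched as a simple counting exercise. The tag-to-gene table, the tag sequences, and the function name below are hypothetical; a real analysis would use a reference tag database built from known transcripts.

```python
from collections import Counter

def relative_expression(tags, tag_to_gene):
    """Count tags per gene and convert counts to fractions of mapped tags.

    tags: iterable of tag sequences read from the sequenced clones
    tag_to_gene: dict mapping a tag sequence to a gene name (hypothetical)
    Returns (fractions per gene, number of tags with no gene assignment).
    """
    counts = Counter()
    unmapped = 0
    for tag in tags:
        gene = tag_to_gene.get(tag)
        if gene is None:
            unmapped += 1
        else:
            counts[gene] += 1
    mapped = sum(counts.values())
    fractions = {g: n / mapped for g, n in counts.items()} if mapped else {}
    return fractions, unmapped

# Made-up example data
table = {"AAAAACCCCC": "geneA", "GGGGGTTTTA": "geneB"}
observed = ["AAAAACCCCC"] * 3 + ["GGGGGTTTTA"] + ["TTTTTTTTTT"]
fractions, unmapped = relative_expression(observed, table)
```

Tags that match no gene are reported separately rather than dropped silently, since they may come from sequencing errors or from transcripts not yet in the table.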
SAGE technique • Prepare biotin labeled cDNA • Cleave with anchoring enzyme (NlaIII)
SAGE technique • Ligate on linkers • Cleave with tagging enzyme (BsmFI)
SAGE technique • Ligate, PCR, and gel purify ditags (102bp). • Recleave with anchoring enzyme (NlaIII), ligate to form concatemers. • Size select, clone and sequence concatemers.
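Once a concatemer has been sequenced, the tags have to be recovered from it. The sketch below assumes the classic layout in which ditags are separated by the NlaIII site (CATG), each ditag holds two tags joined tail to tail, and each tag is 10 bases long; the tag length and the accepted ditag size window are assumptions, not values taken from the slides.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))

def tags_from_concatemer(concatemer, tag_len=10, min_ditag=20, max_ditag=26):
    """Split a concatemer at CATG sites and return the tags in each ditag.

    The first tag is read directly from the start of the ditag; the second
    tag lies on the opposite strand, so the end of the ditag is
    reverse-complemented. Fragments outside the size window are skipped.
    """
    tags = []
    for ditag in concatemer.upper().split("CATG"):
        if not (min_ditag <= len(ditag) <= max_ditag):
            continue
        tags.append(ditag[:tag_len])
        tags.append(reverse_complement(ditag[-tag_len:]))
    return tags

# Made-up tags assembled into a one-ditag concatemer
tag1, tag2 = "AAAAACCCCC", "GGGGGTTTTA"
concatemer = "CATG" + tag1 + reverse_complement(tag2) + "CATG"
recovered = tags_from_concatemer(concatemer)
```

Discarding fragments of unexpected length is a simple guard against linker remnants and partially digested material.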
Microarray Hybridization • Determine gene expression by parallel hybridization of labeled cDNA to DNA attached to a fixed support. • http://cmgm.stanford.edu/pbrown/
Microarray Hybridization • Producing chips • Producing probes / reading arrays • Analyzing and interpreting data
Transcriptional Array (figure, shown in three builds) • ORFs (orf 1, orf 2, orf 3, ...) are spotted in a numbered grid, 200 spots along each 3 cm side. • 40,000 spots per 9 cm² (> all human genes). • Labeled mRNA from Condition 1 and Condition 2 is hybridized to the array.
Microarray Technologies • Spotted arrays (Brown et al.) • Spot arrays on glass slides • PCR fragments • Long (50-70bp) oligo arrays • Synthesis • Affymetrix (www.affymetrix.com) • High density array of 25 bp oligos • Made using light directed oligonucleotide synthesis and photolithography • Agilent, CombiMatrix • Made using light directed oligonucleotide synthesis and mirrors.
Affymetrix photolithographic technology • Lithographic masks are used to either block or transmit light onto specific locations of the array. • The surface is then flooded with a solution containing either adenine, thymine, cytosine, or guanine, and coupling occurs only in those regions on the glass that have been deprotected through illumination. • The coupled nucleotide also bears a light-sensitive protecting group, so the cycle can be repeated. • The microarray is built as the probes are synthesized through repeated cycles of deprotection and coupling. • Synthesis typically ends at 25 bases. • Current arrays have 1.3 million unique features per array.
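The deprotect-and-couple cycle described above can be simulated in a few lines. This is a toy sketch, not Affymetrix's mask-design procedure: it assumes the four nucleotides are flooded in a fixed repeating order (A, C, G, T) and that each mask exposes exactly the features whose next base matches the nucleotide being added. The function name and the probe sequences are illustrative.

```python
def synthesis_cycles(probes, base_order="ACGT"):
    """Simulate light-directed synthesis of a set of probes.

    Returns a list of (nucleotide, exposed_feature_indices), one entry per
    deprotection/coupling cycle, until every probe is complete.
    """
    for probe in probes:
        if any(base not in base_order for base in probe):
            raise ValueError(f"probe contains an unsupported base: {probe}")

    progress = [0] * len(probes)  # bases coupled so far at each feature
    masks = []
    while any(done < len(p) for done, p in zip(progress, probes)):
        base = base_order[len(masks) % len(base_order)]
        # The mask transmits light only where the next base equals `base`
        exposed = [
            i for i, p in enumerate(probes)
            if progress[i] < len(p) and p[progress[i]] == base
        ]
        for i in exposed:
            progress[i] += 1  # coupling happens only at deprotected features
        masks.append((base, exposed))
    return masks

masks = synthesis_cycles(["AC", "CA"])
```

With a fixed four-base order, every round of four cycles extends each unfinished probe by at least one base, so a set of 25-base probes needs at most 100 cycles in this model. A real design could skip cycles in which no feature is exposed.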
Affymetrix GeneChips: Expression Analysis • Available for humans and model organisms. • Made only by Affymetrix. • Chip designs change slowly. • GeneChips: • Human: 50,000 RefSeq genes and ESTs • C. elegans: 22,500 genes (12/00 genome annotation) • Rat 230: 30,000 genes, ESTs • Yeast: 6100 gene set • Tiling arrays for model organisms • http://affymetrix.com
Quantitation of fluorescence signals (Image to data) • Hybridization, scan in chip image. • Gridding • Determine where the spots are. • Spot intensity and local background determination. • Normalization • Adjust to make the red and green total signal intensities the same. • Gene expression ratio. • Red channel/green channel. • Programs: • ScanAlyze, http://rana.lbl.gov/EisenSoftware.htm • GenePix, http://www.moleculardevices.com/pages/instruments/microarray_main.html
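The background, normalization, and ratio steps above can be sketched as follows. This is a minimal illustration of total-intensity normalization only; the input layout, the function name, and the floor applied to background-subtracted values are assumptions, and programs such as ScanAlyze or GenePix may do these steps differently.

```python
def normalized_ratios(spots, floor=1.0):
    """Compute red/green expression ratios for a two-color array.

    spots: list of (red, red_background, green, green_background) tuples
    floor: minimum allowed background-subtracted signal, to avoid
           zero or negative intensities (an assumption of this sketch)
    """
    red = [max(r - r_bg, floor) for r, r_bg, g, g_bg in spots]
    green = [max(g - g_bg, floor) for r, r_bg, g, g_bg in spots]

    # Scale the red channel so both channels have the same total signal
    scale = sum(green) / sum(red)

    return [(r * scale) / g for r, g in zip(red, green)]

# Made-up spot intensities: (red, red background, green, green background)
spots = [(2100, 100, 600, 100), (300, 100, 600, 100)]
ratios = normalized_ratios(spots)
```

Total-intensity normalization assumes that most genes do not change between the two samples, so the overall signal in the two channels should match.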
Microarray data • Big tables of numbers!
Viewing microarray data • Clustergram • Scatter plot: log(ch1) vs. log(ch2) • M vs. A: expression change vs. expression level • Volcano plot: log(expression ratio) vs. p-value
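The plot coordinates listed above are simple transforms of the two channel intensities. The sketch below uses the conventional definitions, with M as the log2 ratio of the channels and A as their average log2 intensity, and plots p-values on a negative log10 scale for the volcano plot; the function names are illustrative, and the p-value is assumed to come from a separate statistical test.

```python
from math import log2, log10

def ma_values(ch1, ch2):
    """Return (M, A) for one spot.

    M: log2 ratio of the channels (expression change)
    A: mean of the log2 intensities (overall expression level)
    """
    m = log2(ch1 / ch2)
    a = 0.5 * (log2(ch1) + log2(ch2))
    return m, a

def volcano_point(ratio, p_value):
    """Return (x, y) for a volcano plot: log2 ratio vs. -log10 p-value."""
    return log2(ratio), -log10(p_value)

# Made-up intensities and p-value
m, a = ma_values(8.0, 2.0)
x, y = volcano_point(4.0, 0.01)
```

Working in log space makes up- and down-regulation symmetric around zero, which is why a twofold increase and a twofold decrease appear at M = 1 and M = -1.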