Interrogating the transcriptome in all its diversity. Joel H Graber. Why were so many predictions of the number of genes in a mammalian genome wrong? Nature Genetics, June 2000, v25, n2. Mammalian genomes contain far more transcript variants than protein variants.
Interrogating the transcriptome in all its diversity Joel H Graber
Why were so many predictions of the number of genes in a mammalian genome wrong? • Nature Genetics, June 2000, v25, n2.
Mammalian genomes contain far more transcript variants than protein variants
• Average protein products per locus = 1.7
• Average distinct transcripts per locus = 5.7
Genome Biology (2009) 10:201.
A processed, protein-coding mRNA molecule includes distinct functional regions
• Genomic sequence
• 5’-untranslated region (5’-UTR)
• Protein-coding sequence
• 3’-untranslated region (3’-UTR)
Pieces of a (Eukaryotic) Protein-Coding Gene (on the genome)
[Figure: gene schematic on double-stranded DNA (5’ to 3’ and 3’ to 5’ strands), with scale labels ~1-100 Mbp and ~1-1000 kbp]
• promoter (~10^3 bp)
• enhancers (~10-100 bp)
• other regulatory sequences (~10-100 bp)
• polyadenylation site (~10-100 bp)
• exons (CDS and UTR) (~10^2-10^3 bp) / introns (~10^2-10^5 bp)
Alternative mRNA processing can lead to multiple transcript and/or protein products
[Figure: example locus yielding 3 transcripts and 1 protein product]
Carolyn demonstrates gene regulation
• DNA = water in pipes
• mRNA = water in hose
• Protein = water in pool
• Control points: transcription control, mRNA degradation, mRNA localization, translation control, protein degradation
A somewhat more formal view of regulation in the various stages of gene expression
Systematic changes to mRNA processing can significantly change the regulatory program of a cell
• Changes can be in a single gene or systemic
• Regulatory control during transcript generation
  • Transcription initiation site
  • Splicing pattern
  • 3’-processing (polyadenylation and cleavage) site
  • RNA editing
• Subsequent isoform-specific regulatory control
  • Stability
  • Translational efficiency
  • Localization
Implications of transcript variation for gene expression measurement
• Most large-scale expression studies report one level per gene per sample
• Microarrays:
  • One reported value of expression per probeset
  • Duplicate probesets are either averaged or discarded
• mRNAseq:
  • RPKM (reads per kilobase of transcript per million mapped reads)
• For many genes, summarization to one expression level in a given cell type is inadequate
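As a rough illustration of the RPKM definition above, the following Python sketch computes the value per transcript. The function name, isoform names, and read counts are hypothetical and chosen only for the example; they do not come from any particular pipeline or dataset.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_of_reads


# Hypothetical example: two isoforms of one gene, library of 10 million mapped reads.
isoforms = {
    "long_isoform": {"reads": 500, "length_bp": 2_000},
    "short_isoform": {"reads": 50, "length_bp": 1_000},
}
library_size = 10_000_000

# One RPKM value per isoform rather than one per gene.
per_isoform = {
    name: rpkm(info["reads"], info["length_bp"], library_size)
    for name, info in isoforms.items()
}
```

Under these assumed numbers the two isoforms differ several-fold in RPKM, a difference that a single gene-level value would not show.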
Every time we find a new way to measure RNA, we find previously unknown types Mattick et al, Trends Genet 2009
Classes of alternative transcripts
• Alternative splicing
• Alternative transcript initiation sites
• Alternative cleavage and polyadenylation (3’-processing)
• Combinations of one or more of these
The cascade of alternative mRNA processing in gene regulation
mRNA processing selections during mRNA generation can have a profound effect on downstream regulation of the resulting transcript
Processing, and specifically alternative processing, is controlled by cis-elements and trans-factors
• mRNA processing signals are typically constrained in both sequence content and positioning
• Activity of a specific site is a function of the strength of the local signals and the cell- and environment-specific concentrations and activities of trans-factors
Alternative splicing can occur in several ways http://www.wormbook.org/
Cis elements required for splicing
[Figure: consensus splice signals in yeast, vertebrates, and plants, with per-position conservation percentages]
• Yeast: 5’ss GUAUGU; BP UACUAAC; 3’ss YAG
• Vertebrates: 5’ss AG|GUAAGU; BP CURAY; polypyrimidine tract YYYY(10-15); 3’ss NCAG|GU; exons marked ESE
• Plants: 5’ss AG|GUAAGU; BP CURAY; 3’ss UGYAG|GU; UA-rich introns; exons marked ESE?
Legend:
• 5’ss: 5’ splice site (donor site)
• 3’ss: 3’ splice site (acceptor site)
• BP: branch point (A is the branch point base)
• YYYY(10-15): polypyrimidine tract
• Y: pyrimidine; R: purine; N: any base
Frequency of bases in each position of the splice sites

Donor sequences: 5’ splice site (exon | intron)
Position    -4  -3  -2  -1 |  +1  +2  +3  +4  +5  +6  +7  +8
%A          30  40  64   9 |   0   0  62  68   9  17  39  24
%U          20   7  13  12 |   0 100   6  12   5  63  22  26
%C          30  43  12   6 |   0   0   2   9   2  12  21  29
%G          19   9  12  73 | 100   0  29  12  84   9  18  20
Consensus            A   G |   G   U   A   A   G   U

Acceptor sequences: 3’ splice site (intron | exon)
Position   -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1 |  +1  +2
%A          15  10  10  15   6  15  11  19  12   3  10  25   4 100   0 |  22  17
%U          51  44  50  53  60  49  49  45  45  57  58  29  31   0   0 |   8  37
%C          19  25  31  21  24  30  33  28  36  36  28  22  65   0   0 |  18  22
%G          15  21  10  10  10   6   7   9   7   7   5  24   1   0 100 |  52  25
Consensus    Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   N   Y   A   G |   G

Positions -15 to -5 of the acceptor form the polypyrimidine tract (Y = U or C; N = any nucleotide)
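One common way to use per-position base frequencies like the donor-site values above is as a position weight matrix that scores candidate sites. The sketch below is a minimal illustration, not the method used in the talk: the log-odds scoring, the uniform 25% background, the pseudocount, and the function name are all assumptions made for the example. The percentages are copied from the donor frequencies above, with the first four columns taken as exon positions and the remaining eight as intron positions.

```python
import math

# Donor (5' splice site) base percentages copied from the slide:
# 4 exon positions followed by 8 intron positions.
DONOR_PERCENT = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}
SITE_LENGTH = 12


def donor_log_odds(site, pseudocount=1.0, background=0.25):
    """Sum of per-position log2(frequency / background) for a 12-nt site.

    The pseudocount (an assumption) keeps zero-frequency bases from
    producing log(0); the uniform background is also an assumption.
    """
    site = site.upper().replace("T", "U")
    if len(site) != SITE_LENGTH:
        raise ValueError(f"expected {SITE_LENGTH} nt, got {len(site)}")
    score = 0.0
    for i, base in enumerate(site):
        if base not in DONOR_PERCENT:
            raise ValueError(f"unexpected base {base!r} at position {i}")
        column_total = sum(DONOR_PERCENT[b][i] for b in "AUCG")
        p = (DONOR_PERCENT[base][i] + pseudocount) / (column_total + 4 * pseudocount)
        score += math.log2(p / background)
    return score


# A consensus-like site versus the same site with the invariant GU replaced.
consensus_like = donor_log_odds("CAAGGUAAGUAC")
gu_disrupted = donor_log_odds("CAAGCCAAGUAC")
```

The consensus-like site scores well above the disrupted one because the two invariant intron positions dominate the sum.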
Example 1: Insulin-like growth factor 1 (Igf1)
• AKA somatomedin C or mechano growth factor
• Produced primarily by the liver as an endocrine hormone
• Primary action is mediated by binding to IGF1R
• Natural activator of the AKT pathway
• A primary mediator of the effects of growth hormone
• Expression has been
  • Negatively correlated with lifespan
  • Positively correlated with body size
• Its regulatory control remains poorly understood after 30 years
IGF1 is subject to extensive alternative mRNA processing ~83,000 nt
IGF1 mRNA data indicate at least 15 transcript isoforms
Salient features of IGF1 expression
• Mature, circulating IGF1 protein is a cleavage product, coded entirely in exons 3 and 4
• Exon 5 contains an additional peptide cleavage product, with demonstrated independent functionality
• Exons 1 and 2 are mutually exclusive, and likely not the only upstream, transcript-initiating exons
• Exon 5 can be skipped, included, or 3’-terminal
• Exon 6’s reading frame changes depending on whether it is spliced from exon 4 or 5
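To see how quickly these choices multiply, the Python sketch below enumerates exon chains using only the rules listed above: one of two leader exons, exons 3 and 4 always present, and three options for exon 5. It is an illustrative simplification with hypothetical function names; it ignores other initiating exons and alternative polyadenylation sites, so it yields fewer structures than the isoforms observed in the mRNA data.

```python
from itertools import product

LEADER_EXONS = (1, 2)                               # mutually exclusive
EXON5_MODES = ("skipped", "included", "terminal")   # options for exon 5


def exon_chain(leader, exon5_mode):
    """Build one exon chain from a leader exon and an exon 5 choice."""
    chain = [leader, 3, 4]          # exons 3 and 4 encode mature IGF1
    if exon5_mode == "skipped":
        chain.append(6)             # exon 6 spliced directly from exon 4
    elif exon5_mode == "included":
        chain.extend([5, 6])        # exon 6 spliced from exon 5
    elif exon5_mode == "terminal":
        chain.append(5)             # transcript ends in exon 5
    else:
        raise ValueError(f"unknown exon 5 mode: {exon5_mode!r}")
    return tuple(chain)


def exon6_spliced_from(chain):
    """Return the exon preceding exon 6 (4 or 5), or None if exon 6 is absent."""
    if 6 not in chain:
        return None
    return chain[chain.index(6) - 1]


chains = [exon_chain(leader, mode) for leader, mode in product(LEADER_EXONS, EXON5_MODES)]
for chain in chains:
    print(chain, "exon 6 spliced from:", exon6_spliced_from(chain))
```

Two leader choices and three exon 5 choices already give six exon structures, and in four of them the exon preceding exon 6 determines its reading frame.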
Alternative 3’-processing can arise in several ways with varying consequences
Adapted from Yan J, et al., Genome Research. 2005; 15(3):369-75.
PolyA site selection depends on sequence elements and abundance/stoichiometry of trans-factors
[Figure: 3’-processing complex assembled on the pre-mRNA, 5’ to 3’]
• Labeled sequence elements: UGUA, AAUAAA (PAS), UG-rich and U-rich downstream elements (DSE), G-rich
• Labeled factors: CPSF, CSTF, PAPOL, Symplekin, hnRNP H, with subunits marked by size (25 kD to 160 kD)
• The complex can include more than 80 proteins
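A first pass at locating candidate polyadenylation signals is a plain motif scan. The Python sketch below looks for the AAUAAA hexamer and notes whether a UGUA element lies upstream of it. It is a hedged illustration only: the function names and the 50-nt upstream window are assumptions for the example, and a real predictor would also weigh the downstream elements and trans-factor levels named above.

```python
def find_motif(seq, motif):
    """Return all start positions (0-based) of motif in seq, allowing overlaps."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def candidate_pas(seq, upstream_window=50):
    """List AAUAAA hits and whether UGUA occurs in the upstream window.

    upstream_window is an illustrative assumption, not a value from the talk.
    """
    rna = seq.upper().replace("T", "U")   # accept DNA or RNA input
    hits = []
    for pos in find_motif(rna, "AAUAAA"):
        upstream = rna[max(0, pos - upstream_window):pos]
        hits.append({"pas_start": pos, "has_upstream_UGUA": "UGUA" in upstream})
    return hits


# Hypothetical sequence made up for the example.
example_hits = candidate_pas("GGCUGUACCGGCCAAUAAAGGCCGGUGUGUUU")
```

A scan like this only flags sequence candidates; which site is used in a given cell still depends on signal strength and factor abundance.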
NMF defines patterns of signals that control 3’-processing (cleavage and polyadenylation)
Example 2: Insulin-like growth factor 2 mRNA binding protein 1 (Igf2bp1)
• Contains four K homology domains and two RNA recognition motifs
• Binds to the 5’-UTR of IGF2 mRNA, regulating translation
• Can act as an oncogene if misregulated
• Evolutionarily conserved, with a critical role in mRNA localization and translational control
Consequences: Igf2bp1 has transforming potential only when expressed in its truncated isoform
[Figure: transcript schematic, 5’ to 3’, with scale labels ~50,000 nt and ~6,500 nt and two polyadenylated isoforms]
Mayr and Bartel, Cell 2009
Inclusion (or exclusion) of regulatory sequences in the 3’-UTR fine-tunes expression and response • Spicher et al., Mol Cell Biol 1998
Example 3: Regulated control of polyA site selection for antibodies during B-cell maturation
Alternative transcription initiation can arise in several ways with varying consequences
CAGE tags showed an unexpectedly high frequency in the 3’-UTR
3’-UTR CAGE tags occur in evolutionarily conserved contexts with a common local sequence
The definition of a gene becomes much more fluid: Ins2-IGF2
• Two genes with a spurious connection?
• One large gene with distinct, disjoint transcripts?
Cleaved 3’-UTR RNA products (uaRNAs) are often tissue-specific and can localize differentially
Next time: Details of measuring transcript differences at large scale