Prediction of Gene Expression in Yeast using Conserved Sequence Templates

Prediction of Gene Expression in Yeast using Conserved Sequence Templates. Derek Chiang & Alan Moses, Molecular & Cell Biology, UCB, Eisen Lab. PROBLEM: What information controlling gene expression is encoded in genome sequences?


Presentation Transcript


  1. Prediction of Gene Expression in Yeast using Conserved Sequence Templates. Derek Chiang & Alan Moses, Molecular & Cell Biology, UCB, Eisen Lab

  2. PROBLEM: What information controlling gene expression is encoded in genome sequences?
     >Saccharomyces cerevisiae chr V
     CGTCTCCTCCAAGCCCTGTTGTCTCTTACCCGGATGTTCAACCAAAAGCTACTTACTACCTTTATTTTATGTTTACTTTTTATAGATTGTCTTTTTATCCTACTCTTTCCCACTTGTCTCTCGCTACTGCCGTGCAACAAACACTAAATCAAAACAGTGAAATACTACTACATCAAAACGCATATTCCCTAGAAAAAAAAATTTCTTACAATATACTATACTACACAATACATAATCACTGACTTTCGTAACAACAATTTCCTTCACTCTCCAACTTCTCTGCTCGAATCTCTACATAGTAATATTATATCAAATCTACCGTCTGGAACATCATCGCTATCCAGCTCTTTGTGAACCGCTACCATCAGCATGTACAGTGGTACCTTCGTGTTATCTGCAGCGAGAACTTCAACGTTTGCCAAATCAAGCCAATGTGGTAACAACCACACCTCCGAAATCTGCTCCAAAAGATACTCCAGTTTCTGCCGAAATGTTT
     Features?

  3. Gene Expression: Experiment

  4. Gene Expression: Mechanism. Promoter Structure

  5. Conserved Sequence Rules. Hidden variables: sites S1, S2, …; positions P11, P12, …, P21, P22, …

  6. Conserved Sequence Rules. Hidden variables: sites S1, S2, …; positions P11, P12, …, P21, P22, …. PRIORS from sequence conservation

  7. Conserved Sequence Rules. Hidden variables: sites S1, S2, …; positions P11, P12, …, P21, P22, …. Rule: { S1 > 0; S2 > 0; min_j |P1j – P2j| < 30 }

  8. Conserved Word Pairs • Finding Conserved Words • Evaluate Word Pairs 1) Joint Conservation 2) Close Spacing • Validate with Gene Expression

  9. Finding Conserved Words • 3860 CLUSTALW alignments from MIT • Conserved word Cw: found in same position in 3+ genomes, within 600 bp of gene start
     > MET28_YAP5
     Scer AACCTAAAACCAAAAAAAA-A-AAATAAGTCACGTGCACT
     Spar AATAAAAAATAGACTAACA-A-ATTGCGGTCACGTGCACT
     Smik AATCCCAGGCCAAAAACCAGA-AATTGAGTCACGTGCAGT
     Sbay GTCACGTGCCCGACGGCCCCACAACTGTGGCATCCATCTT
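The "same position in 3+ genomes" criterion on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `conserved_words` and its parameters are hypothetical, the word length of 6 is taken from a later slide, and the 600 bp upstream limit and reverse-complement handling are left out.

```python
from collections import Counter

def conserved_words(alignment, k=6, min_genomes=3):
    """Return (column, word) pairs where the same ungapped k-mer occupies the
    same alignment column in at least `min_genomes` sequences.
    Hypothetical helper; the slides do not show the actual code."""
    length = min(len(seq) for seq in alignment.values())
    hits = []
    for col in range(length - k + 1):
        windows = [seq[col:col + k] for seq in alignment.values()]
        # Windows containing alignment gaps are not counted as words.
        counts = Counter(w for w in windows if "-" not in w)
        for word, n in counts.items():
            if n >= min_genomes:
                hits.append((col, word))
    return hits

# The MET28_YAP5 alignment shown on the slide.
alignment = {
    "Scer": "AACCTAAAACCAAAAAAAA-A-AAATAAGTCACGTGCACT",
    "Spar": "AATAAAAAATAGACTAACA-A-ATTGCGGTCACGTGCACT",
    "Smik": "AATCCCAGGCCAAAAACCAGA-AATTGAGTCACGTGCAGT",
    "Sbay": "GTCACGTGCCCGACGGCCCCACAACTGTGGCATCCATCTT",
}
hits = conserved_words(alignment)
for col, word in hits:
    print(col, word)
```

On this alignment the hits cluster around the CACGTG core shared by Scer, Spar and Smik; Sbay carries the same motif at a different column, so it does not count toward the threshold under this positional rule.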

  10. Conserved Word Pairs • Finding Conserved Words • Evaluate Word Pairs 1) Joint Conservation 2) Close Spacing • Validate with Gene Expression

  11. TEST: Joint Word Conservation. 2×2 contingency table: Word 1 (S1) conserved Y / N × Word 2 (S2) conserved Y / N

  12. TEST: Joint Word Conservation. 2×2 contingency table: Word 1 (S1) conserved Y / N × Word 2 (S2) conserved Y / N. Cell counts, in the order extracted: 32, 162, 3226, 134. Chi-square test for independence (Yates adjustment)
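For reference, the Yates-adjusted chi-square statistic for a 2×2 table can be computed as below. This is a sketch of the standard textbook formula; the function name and the cell labels `a, b, c, d` (row-major order) are assumptions, and the counts in the usage line are made up for illustration rather than taken from the slide.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a = both words conserved, b = only word 1, c = only word 2, d = neither
    (this labelling is an assumption made for the sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    return n * diff ** 2 / denom

# Hypothetical counts, for illustration only.
print(yates_chi_square(10, 20, 30, 40))
```

The statistic is compared against a chi-square distribution with 1 degree of freedom; larger values indicate stronger departure from independent conservation of the two words.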

  13. TEST: Joint Word Conservation • 2090 words (Length 6: Word-Rev complement) • 2.06 × 10^6 word PAIRS (exclude overlap). HISTOGRAM: -log10( Frequency ) vs. Yates-adjusted χ². Empirical χ²_0.005 = 30; Bonferroni χ²_0.05 = 31
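A Bonferroni cutoff of this size can be checked with a short calculation. The sketch below assumes a chi-square test with 1 degree of freedom (as for a 2×2 table), for which the critical value is the square of a two-sided standard normal quantile; the function name is hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist

def bonferroni_chi2_cutoff(n_tests, family_alpha=0.05):
    """Chi-square (1 d.f.) critical value after Bonferroni correction.

    Assumes 1 degree of freedom, so that chi2 = z**2 with z standard normal.
    """
    per_test_alpha = family_alpha / n_tests
    # Two-sided normal quantile for the per-test significance level.
    z = NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)
    return z * z

# About 2.06 million word pairs, as stated on the slide.
cutoff = bonferroni_chi2_cutoff(2.06e6)
print(round(cutoff, 1))
```

With roughly two million tests the per-test significance level is on the order of 10^-8, which is why the cutoff sits so far above the familiar single-test value of about 3.84.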

  14. TEST: Close Spacing. For conserved genes g = 1 … N, Jg = ( P1g, P2g ) sampled jointly from position distributions P1, P2. Test statistic: X. NULL: X obtained from random sampling. ALT: X smaller than expected from sampling

  15. TEST: Close Spacing, Nonparametric Bootstrap
      1 2 3 4 5 6 7 8
      1 2 3 4 5 6 7 8
      1 2 3 4 5 6 7 8
      1 2 3 4 5 6 7 8
      Avg. Min. Distance X = 32 bp

  16. TEST: Close Spacing, Nonparametric Bootstrap
      1 2 3 4 5 6 7 8
      1 2 3 4 5 6 7 8
      7 8 2 4 1 6 3 5
      2 3 6 8 1 4 7 5
      Bootstrap Distance X*b = 65 bp

  17. TEST: Close Spacing, Nonparametric Bootstrap
      • Position distributions: P1 = { P11, … , P1v }, P2 = { P21, … , P2w }
      • Data: J1 = ( P11, P21 ), … , Jn = ( P1n, P2n )
      • Bootstrap: J1*b = ( P11*b, P21*b ), … , Jn*b = ( P1n*b, P2n*b ), resampled with replacement from P1, P2
      • Record quantile of X in 100000 samples of X*b (empirical null distribution)
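The bootstrap procedure listed above can be sketched as follows. This is a simplified illustration under stated assumptions: each gene contributes exactly one position per word, so the statistic is the mean absolute distance rather than the average minimum distance over multiple sites; the function names, the default of 2000 resamples (instead of 100000) and the fixed seed are all choices made for the example.

```python
import random

def mean_distance(pairs):
    """Average |P1 - P2| over genes (simplified stand-in for the statistic X)."""
    return sum(abs(p1 - p2) for p1, p2 in pairs) / len(pairs)

def bootstrap_quantile(pairs, pool1, pool2, n_boot=2000, seed=0):
    """Fraction of bootstrap statistics X*b that are <= the observed X.

    Positions are drawn independently, with replacement, from the two
    position pools, which breaks any within-gene coupling (the null).
    """
    rng = random.Random(seed)
    observed = mean_distance(pairs)
    n = len(pairs)
    at_or_below = 0
    for _ in range(n_boot):
        resampled = [(rng.choice(pool1), rng.choice(pool2)) for _ in range(n)]
        if mean_distance(resampled) <= observed:
            at_or_below += 1
    return at_or_below / n_boot

# Hypothetical position pools and gene pairs, for illustration only.
pool1 = list(range(0, 600, 10))
pool2 = list(range(5, 600, 10))
close_pairs = [(p, p + 5) for p in range(0, 200, 10)]   # always 5 bp apart
far_pairs = [(0, 405)] * 20                              # always 405 bp apart
q_close = bootstrap_quantile(close_pairs, pool1, pool2)
q_far = bootstrap_quantile(far_pairs, pool1, pool2)
print(q_close, q_far)
```

A small quantile means the observed spacing is tighter than almost all randomly re-paired spacings, which is the direction of the alternative hypothesis.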

  18. TEST: Close Spacing HISTOGRAM of Bootstrap Quantiles for Word Pairs

  19. Conserved Word Pairs • Finding Conserved Words • Evaluate Word Pairs 1) Joint Conservation 2) Close Spacing • Validate with Gene Expression

  20. Validating Expression Subsets. Conserved sequence (SUBSETTING) rules: { S1 > 0; S2 > 0; min_j |P1j – P2j| < d }, applied to the genome (6000 genes) to give a SUBSET (N genes). Assess gene expression
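The subsetting rule on this slide can be written as a small filter. This is a sketch under assumptions: site positions are stored as per-gene lists in two dictionaries, and the names `select_genes`, `word1_sites` and `word2_sites`, as well as the gene labels, are hypothetical.

```python
def min_site_distance(sites1, sites2):
    """Smallest |P1 - P2| over all pairs of sites for the two words."""
    return min(abs(p1 - p2) for p1 in sites1 for p2 in sites2)

def select_genes(word1_sites, word2_sites, d):
    """Genes with at least one site for each word (S1 > 0, S2 > 0)
    and a minimum site-to-site distance below d."""
    subset = []
    for gene in sorted(word1_sites.keys() & word2_sites.keys()):
        s1, s2 = word1_sites[gene], word2_sites[gene]
        if s1 and s2 and min_site_distance(s1, s2) < d:
            subset.append(gene)
    return subset

# Hypothetical upstream site positions (bp), for illustration only.
word1_sites = {"geneA": [100, 300], "geneB": [50], "geneC": [10], "geneD": [200]}
word2_sites = {"geneA": [120], "geneB": [400], "geneC": []}
subset = select_genes(word1_sites, word2_sites, d=30)
print(subset)
```

The expression values of the selected genes can then be compared with those of the rest of the genome.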

  21. Validating Expression Subsets Some Nonparametric Tests • Subset mean (Mann-Whitney) • Subset distribution (Kolmogorov-Smirnov) • Subset weighted correlation ( ? ) • Kernel density classification (Mixture of normals) • Find optimal word pair distance d* for each test …

  22. Validating Expression Subsets: Kolmogorov-Smirnov. [Histogram: Frequency vs. log2(Gene expression)]
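For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distributions of the two samples. The sketch below computes only that statistic D, not a p-value; the function name and the sample values are made up for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the maximum absolute
    difference between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    if na == 0 or nb == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties).
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

# Hypothetical log2 expression values: subset genes vs. all other genes.
subset_expr = [1.0, 2.0, 3.0, 4.0]
rest_expr = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(subset_expr, rest_expr))
```

Because D depends only on the ordering of the values, the test makes no assumption about the shape of the expression distribution.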

  23. Conserved Word Pairs – EXAMPLES. AACTGT-CACGTG (Cbf1-Met31): N = 29; Pair χ² = 55; Spacing p = 0.01. Conditions: aa / N starv, Cadmium, Heat shock. 63 bp MET28_YAP5; 189 bp YHR112C_YHR113. p < 10^-7; p < 10^-8

  24. Conserved Word Pairs – EXAMPLES. AAACGA-CCGATA (Upc2-Hap1): N = 32; Pair χ² = 39.5; Spacing p = 0. Conditions: Other DNA damage, MMS. 34 bp; 101 bp. p < 10^-5

  25. Pipeline Summary: # Genes > 10; Spacing < 0.05; K-S < 10^-5 & Improve > 10; Known TF

  26. Future Directions • Better gene expression subset tests (Timecourse) • More flexible sequence models (IUPAC, Self-dimer) • Automate distance cutoff (Distance d) • Parameter optimization: 8 threshold values! (Conserved: # aligned genomes & # upstream bp, Joint conservation χ², Bootstrap quantile, K-S probs, Min gene #, Distance d)

  27. Acknowledgements: Michael Eisen, Eisen Lab, Justin Fay, Audrey Gasch, Hunter Fraser, Venky Nandagopal, Dan Pollard, Ben Lewis

  28. Validating Expression Subsets Kolmogorov-Smirnov

  29. Comparing Expression Samples

  30. Comparing Expression Samples
