Welcome to Research Simulation 1: PSSMs & Search for Repeated Sequences. Monday, 9 June 2003
Presentation Transcript

  1. Welcome to Research Simulation 1: PSSMs & Search for Repeated Sequences. Monday, 9 June 2003

  2. [Figure: Cyanobacteria. Free-living Nostoc; Anabaena/Nostoc grown on NO3-, air. Labels: heterocysts, sucrose, N2, CO2, O2.] Matveyev and Elhai (unpublished)

  3. [Figure: Cyanobacteria. Free-living Nostoc; Anabaena/Nostoc grown on NO3-, air. Labels: heterocysts, sucrose, NH3, N2, O2, CO2.] Matveyev and Elhai (unpublished)

  4. Tandem Heptameric Repeats: Do they come in complementary pairs?

  5. Tandem Heptameric Repeats: Do they come in complementary pairs?
     C2. Consider the sequence below of one strand of a DNA fragment.
         5'-AGAGAGAGCTAAGGTCTCTCC-3'
     Which of the following is a likely structure for the single-stranded fragment to assume?
     [Figure: candidate structures A, B, C]

  6-13. Tandem Heptameric Repeats: Do they come in complementary pairs?
         5'-AGAGAGAGCTAAGGTCTCTCC-3'
     [Figure: candidate structures A, B, C. Slides 6 through 13 carry identical text.]
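
One way to approach the question about the structure of the single strand is to look for segments that can base-pair with one another. The sketch below is only an illustration, not the method used in the presentation: the helper names `reverse_complement` and `longest_stem`, and the minimum loop size of 3 bases, are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_stem(seq: str, min_loop: int = 3):
    """Find the longest pair of segments that could pair as a hairpin stem.

    Returns (length, i, j) such that seq[i:i+length] is the reverse
    complement of seq[j:j+length], with at least `min_loop` unpaired
    bases between the two segments. Returns None if no stem of
    length >= 2 exists.
    """
    n = len(seq)
    for k in range(n // 2, 1, -1):              # try long stems first
        for i in range(n - 2 * k - min_loop + 1):
            partner = reverse_complement(seq[i:i + k])
            j = seq.find(partner, i + k + min_loop)
            if j != -1:
                return k, i, j
    return None

seq = "AGAGAGAGCTAAGGTCTCTCC"
print(reverse_complement(seq))   # GGAGAGACCTTAGCTCTCTCT
print(longest_stem(seq))         # (6, 1, 14): GAGAGA can pair with TCTCTC
```

Under these assumptions, the 5' GAGAGA run can pair with the 3' TCTCTC run, which is what a stem-loop (hairpin) structure requires.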

  14. Tandem Heptameric Repeats: Do they come in complementary pairs? IF:

  15-16. Tandem Heptameric Repeats: Do they come in complementary pairs?
         TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT
         AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC
     [Slides 15 and 16 carry identical text.]
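
The question of complementary pairs can be explored by cutting a repeat array into 7-base units and asking whether the reverse complement of any unit occurs in the other array. This is a rough sketch under strict assumptions: it demands exact matches and a fixed reading frame starting at the first base, neither of which the slides specify. The function names are made up for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def heptamers(seq, size=7):
    """Split seq into consecutive full-length units, dropping any short tail."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

first = "TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT"
second = "AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC"

# Tally the repeat units found in the first array.
units = Counter(heptamers(first))
print(units.most_common())

# Units of the first array whose exact reverse complement occurs in the second.
shared = [u for u in units if reverse_complement(u) in second]
print(shared)
```

An exact-match test is deliberately conservative. A scoring approach that tolerates mismatches, such as the position-specific scoring matrices covered later in the presentation, would be a natural next step.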

  17. Regulatory Proteins and Their Binding Sites
      hetQ
      5'-GTA ..(8).. TACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
      3'-CAT ..(8).. ATGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
      [Figure labels: NtcA, N, RNA Polymerase; GTA ..(8).. TAC]

  18. Differentiation in cyanobacteria: What does NtcA bind to?
      GTA…(8)…TAC …(20-24)… TAnnnT … mRNA
      Herrero et al (2001) J Bacteriol 183:411-425
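
The consensus shown on the slide, GTA…(8)…TAC followed 20 to 24 bases later by TAnnnT, can be written as a regular expression. The sketch below is one possible reading of that layout: treating "(8)" and "(20-24)" as runs of arbitrary bases is an assumption, and the test sequence is synthetic, not a real promoter.

```python
import re

# GTA-N8-TAC, then 20-24 arbitrary bases, then TAnnnT.
NTCA_PROMOTER = re.compile(r"GTA[ACGT]{8}TAC[ACGT]{20,24}TA[ACGT]{3}T")

def find_sites(seq):
    """Return (start, end) spans of non-overlapping pattern matches in seq."""
    return [m.span() for m in NTCA_PROMOTER.finditer(seq.upper())]

# Synthetic sequence built for illustration only.
demo = "CC" + "GTA" + "A" * 8 + "TAC" + "G" * 22 + "TAGGGT" + "CC"
print(find_sites(demo))   # [(2, 44)]
```

An exact pattern like this accepts or rejects a site outright. It cannot rank near-matches, which is the limitation that position-specific scoring matrices address.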

  19. Position-specific scoring matrices
      Table 1: Examples of position-specific scoring matrices from sequence alignment
      A. Sequence alignment
           A T T T A G T A T C A A A A A T A A C A A T T C
           G T T C T G T A A C A A A G A C T A C A A A A C
           A T T T T G T A G C T A C T T A T A C T A T T T
           A A G C T G T A A C A A A A T C T A C C A A A T
           C A T T T G T A C A G T C T G T T A C C T T T A

  20. Position-specific scoring matrices
      A. Sequence alignment
           A T T T A G T A T C A A A A A T A A C A A T T C
           G T T C T G T A A C A A A G A C T A C A A A A C
           A T T T T G T A G C T A C T T A T A C T A T T T
           A A G C T G T A A C A A A A T C T A C C A A A T
           C A T T T G T A C A G T C T G T T A C C T T T A
      B. Table of occurrences
         A 3 2 0 0 1 0 0 5 2 1 3 4 3 2 2 1 1 5 0 2 4 2 2 1
         C 1 0 0 2 0 0 0 0 1 4 0 0 2 0 0 2 0 0 5 2 0 0 0 2
         G 1 0 1 0 0 5 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0
         T 0 3 4 3 4 0 5 0 1 0 1 1 0 2 2 2 4 0 0 1 1 3 3 2
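
The step from alignment to table of occurrences is a per-column tally. A minimal sketch follows, assuming the five aligned 24-base sequences shown on the slide; the function name `count_table` is chosen for this example.

```python
ALIGNMENT = [
    "ATTTAGTATCAAAAATAACAATTC",
    "GTTCTGTAACAAAGACTACAAAAC",
    "ATTTTGTAGCTACTTATACTATTT",
    "AAGCTGTAACAAAATCTACCAAAT",
    "CATTTGTACAGTCTGTTACCTTTA",
]

def count_table(alignment):
    """Return {base: [count in column 1, column 2, ...]} for aligned sequences."""
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all sequences must have the same length")
    return {
        base: [sum(s[col] == base for s in alignment) for col in range(width)]
        for base in "ACGT"
    }

counts = count_table(ALIGNMENT)
for base, row in counts.items():
    print(base, *row)
```

Each column of the result sums to 5, the number of sequences in the alignment.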

  21. Position-specific scoring matrices
      B. Table of occurrences
         A 0 1 0 0 5 2 1 3 4 3
         C 2 0 0 0 0 1 4 0 0 2
         G 0 0 5 0 0 1 0 1 0 0
         T 3 4 0 5 0 1 0 1 1 0
      C. Position-specific scoring matrix (B = 0)
         A 0   .20 0   0   1.0 .40 .20 .60 .80 .60
         C .40 0   0   0   0   .20 .80 0   0   .40
         G 0   0   1.0 0   0   .20 0   .20 0   0
         T .60 .80 0   1.0 0   .20 0   .20 .20 0
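
With no pseudocounts (B = 0), each matrix entry is the count divided by the number of sequences in that column. The following sketch reproduces matrix C from the ten-column table of occurrences; the function name is an assumption for illustration.

```python
COUNTS = {
    "A": [0, 1, 0, 0, 5, 2, 1, 3, 4, 3],
    "C": [2, 0, 0, 0, 0, 1, 4, 0, 0, 2],
    "G": [0, 0, 5, 0, 0, 1, 0, 1, 0, 0],
    "T": [3, 4, 0, 5, 0, 1, 0, 1, 1, 0],
}

def to_frequencies(counts):
    """Convert per-column counts to per-column frequencies (no pseudocounts)."""
    n_cols = len(counts["A"])
    totals = [sum(counts[b][c] for b in counts) for c in range(n_cols)]
    return {b: [counts[b][c] / totals[c] for c in range(n_cols)] for b in counts}

pssm = to_frequencies(COUNTS)
for base, row in pssm.items():
    print(base, " ".join(f"{x:.2f}" for x in row))
```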

  22. Position-specific scoring matrices
      Table 2: Scoring a sequence with a PSSM
      urt-71            T    A    G    T    A    T    C    A    A    A
      Score            .60  .20  1.0  1.0  1.0  .20  .80  .60  .80  .60
      w/ pseudocounts  .51  .24  .75  .79  .79  .24  .61  .51  .65  .51
      Normalized       1.6  .75  4.2  2.5  2.5  .75  3.4  1.6  2.0  1.6
      Score = .60 * .20 * 1.0 * …
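
The overall score is the product of the per-position entries. The sketch below scores the urt-71 sequence against the B = 0 matrix; the function name `score` is illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

# Frequencies with no pseudocounts (B = 0), one list entry per position.
PSSM = {
    "A": [0.0, 0.2, 0.0, 0.0, 1.0, 0.4, 0.2, 0.6, 0.8, 0.6],
    "C": [0.4, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.0, 0.0, 0.4],
    "G": [0.0, 0.0, 1.0, 0.0, 0.0, 0.2, 0.0, 0.2, 0.0, 0.0],
    "T": [0.6, 0.8, 0.0, 1.0, 0.0, 0.2, 0.0, 0.2, 0.2, 0.0],
}

def score(seq, pssm):
    """Multiply the matrix entries selected by each base of seq."""
    return math.prod(pssm[base][i] for i, base in enumerate(seq))

print(score("TAGTATCAAA", PSSM))   # about 0.0055
print(score("AAGTATCAAA", PSSM))   # 0.0, because A was never seen at position 1
```

The second call shows why a matrix without pseudocounts is brittle: a single base that was never observed at a position drives the whole product to zero.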

  23. Position-specific scoring matrices: Introduction of pseudocounts
      A. Sequence alignment
           A T T T A G T A T C A A A A A T A A C A A T T C
           G T T C T G T A A C A A A G A C T A C A A A A C
           A T T T T G T A G C T A C T T A T A C T A T T T
           A A G C T G T A A C A A A A T C T A C C A A A T
           C A T T T G T A C A G T C T G T T A C C T T T A
      A?
      q(G,6) = 5   real counts
      p(G) = ?     pseudocounts

  24. Position-specific scoring matrices: Introduction of pseudocounts
      Score(position, nucleotide) = (q + p) / (N + B)
      p = pseudocounts = B * (overall frequency of nucleotide)
      [A] = 0.32   [T] = 0.32   [C] = 0.18   [G] = 0.18
      B = total number of pseudocounts = square root of N? or = 0.1?
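
The pseudocount formula can be checked numerically. This sketch assumes N = 5 sequences and background frequencies of 0.32 for A and T and 0.18 for C and G. The 0.32 value is the one used in the normalization example (0.79 / 0.32), and it is also the value that reproduces the tabulated matrix entries. The function name `pseudo_score` is made up for the example.

```python
import math

# Assumed background frequencies; they sum to 1.0.
BACKGROUND = {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32}

def pseudo_score(q, base, n, b):
    """Score = (q + p) / (N + B), where p = B * background frequency of base."""
    p = b * BACKGROUND[base]
    return (q + p) / (n + b)

n = 5                  # sequences in the alignment
b_sqrt = math.sqrt(n)  # B = sqrt(N), about 2.2

print(round(pseudo_score(5, "A", n, b_sqrt), 2))   # 0.79
print(round(pseudo_score(0, "C", n, b_sqrt), 3))   # 0.056
print(round(pseudo_score(0, "A", n, 0.1), 3))      # 0.006
```

A larger B pulls every entry toward the background frequency, while a small B such as 0.1 leaves observed frequencies almost untouched and only lifts the zeros slightly.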

  25. Position-specific scoring matrices: Introduction of pseudocounts
      C. Position-specific scoring matrix (B = 0)
         A 0   .20 0   0   1.0 .40 .20
         C .40 0   0   0   0   .20 .80
         G 0   0   1.0 0   0   .20 0
         T .60 .80 0   1.0 0   .20 0
      D. Position-specific scoring matrix (B = square root of N = 2.2)
         A .099 .24  .099 .099 .79  .38  .24
         C .33  .056 .056 .056 .056 .19  .61
         G .056 .056 .75  .056 .056 .19  .056
         T .51  .65  .099 .79  .099 .24  .099

  26. Position-specific scoring matrices: Normalization
      A. Sequence alignment
           A T T T A G T A T C A A A A A T A A C A A T T C
           G T T C T G T A A C A A A G A C T A C A A A A C
           A T T T T G T A G C T A C T T A T A C T A T T T
           A A G C T G T A A C A A A A T C T A C C A A A T
           C A T T T G T A C A G T C T G T T A C C T T T A
      B. Table of occurrences
         A 3 2 0 0 1 0 0 5 2 1 3 4 3 2 2 1 1 5 0 2 4 2 2 1
         C 1 0 0 2 0 0 0 0 1 4 0 0 2 0 0 2 0 0 5 2 0 0 0 2
         G 1 0 1 0 0 5 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0
         T 0 3 4 3 4 0 5 0 1 0 1 1 0 2 2 2 4 0 0 1 1 3 3 2
      How to account for similarity due to similar base composition?
      Compare Score(PSSM) / Score(background frequency)
      0.79 / 0.32 = 2.5
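
Normalization divides each matrix entry by the background frequency of the same base, so that values above 1 mean "more common here than in the genome at large." The sketch below applies that ratio to the pseudocount-adjusted scores for urt-71 from Table 2, assuming background frequencies of 0.32 for A and T and 0.18 for C and G.

```python
# Assumed background frequencies for this illustration.
BACKGROUND = {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32}

def normalize(score, base):
    """Return the ratio of a PSSM entry to the background frequency of base."""
    return score / BACKGROUND[base]

bases = "TAGTATCAAA"
with_pseudocounts = [.51, .24, .75, .79, .79, .24, .61, .51, .65, .51]

normalized = [normalize(s, b) for s, b in zip(with_pseudocounts, bases)]
print(" ".join(f"{x:.2f}" for x in normalized))
```

The ratio for an A scoring 0.79 comes out near 2.47, which rounds to the 2.5 listed in Table 2.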

  27. Position-specific scoring matrices: Log odds form
      E. Position-specific scoring matrix (B = 0.1)
         A .006 .20  .006 .006 .99  .40  .20  .59
         C .40  .004 .004 .004 .004 .20  .79  .004
         G .004 .004 .98  .004 .004 .20  .004 .20
         T .59  .79  .006 .99  .006 .20  .006 .20
      F. Position-specific scoring matrix: Log-odds form (B = 0.1)
         A 2.2 0.7 2.2 2.2 0.0 0.4 0.7 0.2
         C 0.4 2.5 2.5 2.5 2.5 0.7 0.1 2.5
         G 2.5 2.5 0.0 2.5 2.5 0.7 2.5 0.7
         T 0.2 0.1 2.2 0.0 2.2 0.7 2.2 0.7
      Log odds = -log(score)
      Score * score * score …  becomes  log + log + log …
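
Taking logarithms turns the product of scores into a sum, which is easier to compute and avoids very small numbers. The tabulated values are consistent with a base-10 logarithm, so the sketch below assumes base 10. The example scores are the matrix E entries selected by the first eight bases of urt-71 (TAGTATCA).

```python
import math

def neg_log(score):
    """Return -log10(score), the transformed entry used in matrix F."""
    return -math.log10(score)

# Matrix E entries picked out by T, A, G, T, A, T, C, A at positions 1-8.
entries = [.59, .20, .98, .99, .99, .20, .79, .59]

product_form = math.prod(entries)
sum_form = sum(neg_log(x) for x in entries)

# The two forms carry the same information.
print(-math.log10(product_form), sum_form)
```

In this form a lower total means a better match, because rare bases contribute large positive terms.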

  28. Position-specific scoring matrices: Decrease complexity through info analysis
      Uncertainty: $H_c = -\sum_i p_{ic} \log_2(p_{ic})$
      $H_1 = -\left[\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right] = 1.87$
      $H_{31} = -\left[\tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{8}{11}\log_2\tfrac{8}{11}\right] = 1.28$
      Information content $= \sum_c (H_{max} - H_c)$ (summed over all columns)
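
The two worked uncertainties can be reproduced directly from the column counts. In this sketch the function names are illustrative, and `h_max = 2.0` bits (the uncertainty of four equally likely bases) is an assumption, since the slide does not state the value of Hmax.

```python
import math

def uncertainty(counts):
    """Shannon uncertainty (bits) of one column, given its base counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_content(columns, h_max=2.0):
    """Sum of (h_max - H_c) over all columns; h_max = 2 bits is assumed."""
    return sum(h_max - uncertainty(col) for col in columns)

col_1 = [4, 3, 1, 3]    # counts behind H1 on the slide (11 sequences)
col_31 = [1, 1, 1, 8]   # counts behind H31 on the slide

print(round(uncertainty(col_1), 2))    # 1.87
print(round(uncertainty(col_31), 2))   # 1.28
print(round(information_content([col_1, col_31]), 2))
```

Column 31 is dominated by one base, so its uncertainty is lower and it contributes more information than column 1.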