## Welcome to Research Simulation 1: PSSMs & Search for Repeated Sequences (Monday, 9 June 2003)


**Cyanobacteria: free-living Nostoc**

[Figure: Anabaena/Nostoc grown on NO3-, air. Labels: heterocysts, sucrose, NH3, N2, CO2, O2. Matveyev and Elhai (unpublished)]

**Tandem Heptameric Repeats: Do they come in complementary pairs?**

C2. Consider the sequence below of one strand of a DNA fragment.

`5'-AGAGAGAGCTAAGGTCTCTCC-3'`

Which of the following is a likely structure for the single-stranded fragment to assume?

[Figure: candidate structures A, B, and C]

`TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT`

`AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC`

**Differentiation in cyanobacteria: What does NtcA bind to?**

Regulatory proteins and their binding sites (hetQ):

`5'-GTA ..(8).. TACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN`

`3'-CAT ..(8).. ATGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN`

[Figure labels: NtcA, N, RNA Polymerase, mRNA]

Binding-site pattern: `GTA…(8)…TAC …(20-24)…TAnnnT`

Herrero et al (2001) J Bacteriol 183:411-425

**Position-specific scoring matrices**

Table 1: Examples of position-specific scoring matrices from sequence alignment

A. Sequence alignment

| Sequence | Positions 1-24 |
|---|---|
| 1 | `ATTTAGTATCAAAAATAACAATTC` |
| 2 | `GTTCTGTAACAAAGACTACAAAAC` |
| 3 | `ATTTTGTAGCTACTTATACTATTT` |
| 4 | `AAGCTGTAACAAAATCTACCAAAT` |
| 5 | `CATTTGTACAGTCTGTTACCTTTA` |

B. Table of occurrences

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 5 | 2 | 1 | 3 | 4 | 3 | 2 | 2 | 1 | 1 | 5 | 0 | 2 | 4 | 2 | 2 | 1 |
| C | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 5 | 2 | 0 | 0 | 0 | 2 |
| G | 1 | 0 | 1 | 0 | 0 | 5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T | 0 | 3 | 4 | 3 | 4 | 0 | 5 | 0 | 1 | 0 | 1 | 1 | 0 | 2 | 2 | 2 | 4 | 0 | 0 | 1 | 1 | 3 | 3 | 2 |

The matrices below cover a window of the alignment that starts at position 4.

C. Position-specific scoring matrix (B = 0)

| Position | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | .20 | 0 | 0 | 1.0 | .40 | .20 | .60 | .80 | .60 |
| C | .40 | 0 | 0 | 0 | 0 | .20 | .80 | 0 | 0 | .40 |
| G | 0 | 0 | 1.0 | 0 | 0 | .20 | 0 | .20 | 0 | 0 |
| T | .60 | .80 | 0 | 1.0 | 0 | .20 | 0 | .20 | .20 | 0 |

Table 2: Scoring a sequence with a PSSM

| urt-71 | T | A | G | T | A | T | C | A | A | A |
|---|---|---|---|---|---|---|---|---|---|---|
| Score | .60 | .20 | 1.0 | 1.0 | 1.0 | .20 | .80 | .60 | .80 | .60 |
| With pseudocounts | .51 | .24 | .75 | .79 | .79 | .24 | .61 | .51 | .65 | .51 |
| Normalized | 1.6 | .75 | 4.2 | 2.5 | 2.5 | .75 | 3.4 | 1.6 | 2.0 | 1.6 |

Score = .60 × .20 × 1.0 × …

**Position-specific scoring matrices: Introduction of pseudocounts**

In column 6 of the alignment, G accounts for every observation ($q_{G,6} = 5$ real counts) and A never occurs, so a PSSM without pseudocounts gives A a score of 0 there. How many pseudocounts ($p_G = ?$) should be added?

$$\text{Score}(\text{position}, \text{nucleotide}) = \frac{q + p}{N + B}$$

- $q$ = real counts of the nucleotide at that position
- $p$ = pseudocounts = $B \times$ (overall frequency of nucleotide)
- $N$ = number of aligned sequences
- Overall frequencies: [A] = 0.32, [T] = 0.32, [C] = 0.18, [G] = 0.18
- $B$ = total number of pseudocounts = $\sqrt{N}$? or = 0.1?

D. Position-specific scoring matrix (B = √N = 2.2)

| Position | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|
| A | .099 | .24 | .099 | .099 | .79 | .38 | .24 |
| C | .33 | .056 | .056 | .056 | .056 | .19 | .61 |
| G | .056 | .056 | .75 | .056 | .056 | .19 | .056 |
| T | .51 | .65 | .099 | .79 | .099 | .24 | .099 |

**Position-specific scoring matrices: Normalization**

How to account for similarity due to similar base composition? Compare the PSSM score with the background frequency of the same nucleotide:

$$\frac{\text{Score}_{\text{PSSM}}}{\text{Score}_{\text{background frequency}}} = \frac{0.79}{0.32} \approx 2.5$$

E. Position-specific scoring matrix (B = 0.1)

| Position | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|
| A | .006 | .20 | .006 | .006 | .99 | .40 | .20 | .59 |
| C | .40 | .004 | .004 | .004 | .004 | .20 | .79 | .004 |
| G | .004 | .004 | .98 | .004 | .004 | .20 | .004 | .20 |
| T | .59 | .79 | .006 | .99 | .006 | .20 | .006 | .20 |

**Position-specific scoring matrices: Log odds form**

F. Position-specific scoring matrix: Log-odds form (B = 0.1)

| Position | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|
| A | 2.2 | 0.7 | 2.2 | 2.2 | 0.0 | 0.4 | 0.7 | 0.2 |
| C | 0.4 | 2.5 | 2.5 | 2.5 | 2.5 | 0.7 | 0.1 | 2.5 |
| G | 2.5 | 2.5 | 0.0 | 2.5 | 2.5 | 0.7 | 2.5 | 0.7 |
| T | 0.2 | 0.1 | 2.2 | 0.0 | 2.2 | 0.7 | 2.2 | 0.7 |

Log odds = $-\log_{10}(\text{score})$. Taking logarithms turns the product score × score × score … into the sum log + log + log …

**Position-specific scoring matrices: Decrease complexity through info analysis**

Uncertainty of column $c$:

$$H_c = -\sum_i p_{ic} \log_2 p_{ic}$$

$$H_1 = -\left[\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right] = 1.87$$

$$H_{31} = -\left[\tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{8}{11}\log_2\tfrac{8}{11}\right] = 1.28$$

$$\text{Information content} = \sum_c \left(H_{\max} - H_c\right) \quad \text{(summed over all columns)}$$
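
The calculations behind Table 1 and Table 2 can be sketched in a few lines of Python. This is only an illustration, not the code used for the slides: the function names (`count_table`, `pssm`, `per_position_scores`, `uncertainty`) are made up for this example, the scored window is assumed to start at alignment position 4, and the background frequencies are assumed to be A = T = 0.32 and C = G = 0.18, which are the values that reproduce the tabulated scores.

```python
import math

# The five aligned sequences from Table 1A of the slides.
ALIGNMENT = [
    "ATTTAGTATCAAAAATAACAATTC",
    "GTTCTGTAACAAAGACTACAAAAC",
    "ATTTTGTAGCTACTTATACTATTT",
    "AAGCTGTAACAAAATCTACCAAAT",
    "CATTTGTACAGTCTGTTACCTTTA",
]
# Assumed background frequencies (chosen to match the tabulated scores).
BACKGROUND = {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32}


def count_table(seqs):
    """Count each nucleotide in each alignment column (Table 1B)."""
    return [{nt: col.count(nt) for nt in "ACGT"} for col in zip(*seqs)]


def pssm(counts, total_pseudo):
    """Score = (q + p) / (N + B), where p = B * background frequency."""
    n = sum(counts[0].values())  # N: number of aligned sequences
    return [
        {nt: (col[nt] + total_pseudo * BACKGROUND[nt]) / (n + total_pseudo)
         for nt in "ACGT"}
        for col in counts
    ]


def per_position_scores(matrix, seq, start):
    """Look up the matrix value for each base of seq, from column index start."""
    return [matrix[start + i][nt] for i, nt in enumerate(seq)]


def uncertainty(col_counts):
    """Shannon uncertainty H = -sum(p * log2(p)) of one column; zero counts skipped."""
    n = sum(col_counts.values())
    return -sum((q / n) * math.log2(q / n) for q in col_counts.values() if q)


counts = count_table(ALIGNMENT)
query = "TAGTATCAAA"   # urt-71 from Table 2
window = 3             # zero-based index of alignment position 4 (assumed)

raw = per_position_scores(pssm(counts, 0.0), query, window)              # B = 0
smoothed = per_position_scores(pssm(counts, math.sqrt(5)), query, window)  # B = sqrt(N)
normalized = [s / BACKGROUND[nt] for s, nt in zip(smoothed, query)]
# Log form: summing -log10 of each score replaces multiplying the scores.
neg_log_total = -sum(math.log10(s) for s in smoothed)

print([round(s, 2) for s in raw])
print([round(s, 2) for s in smoothed])
print([round(s, 2) for s in normalized])
print(round(neg_log_total, 2))
```

Because the normalized values here come from unrounded scores, a few of them differ in the last digit from Table 2, which appears to divide the rounded scores. The `uncertainty` helper takes any column of counts, so it can also be applied to the 11-sequence columns used in the uncertainty examples.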