1 / 28

280 likes | 408 Vues

Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003. heterocysts. sucrose. Cyanobacteria. Free-living Nostoc. Anabaena/Nostoc grown on NO 3 - , air. N 2. CO 2. O 2. Matveyev and Elhai (unpublished). heterocysts. sucrose. NH 3. Cyanobacteria.

Télécharger la présentation
## Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

**An Image/Link below is provided (as is) to download presentation**
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.
Content is provided to you AS IS for your information and personal use only.
Download presentation by click this link.
While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

**Welcome toResearch Simulation 1PSSMs & Search for Repeated**SequencesMonday, 9 June 2003**heterocysts**sucrose Cyanobacteria Free-living Nostoc Anabaena/Nostoc grown on NO3-, air N2 CO2 O2 Matveyev and Elhai (unpublished)**heterocysts**sucrose NH3 Cyanobacteria Free-living Nostoc Anabaena/Nostoc grown on NO3-, air NH3 N2 O2 CO2 Matveyev and Elhai (unpublished)**Tandem Heptameric RepeatsDo they come in complementary**pairs?**Tandem Heptameric RepeatsDo they come in complementary**pairs? C2. Consider the sequence below of one strand of a DNA fragment. 5'-AGAGAGAGCTAAGGTCTCTCC-3' Which of the following is a likely structure for the single-stranded fragment to assume? A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? 5'-AGAGAGAGCTAAGGTCTCTCC-3' A B C**Tandem Heptameric RepeatsDo they come in complementary**pairs? IF:**Tandem Heptameric RepeatsDo they come in complementary**pairs? TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC**Tandem Heptameric RepeatsDo they come in complementary**pairs? TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC**hetQ**5’-GTA ..(8).. TACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN3’-CAT ..(8).. ATGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN NtcA N RNA Polymerase Regulatory Protein and their Binding Sites GTA ..(8).. TAC**mRNA**GTA…(8)…TAC …(20-24)…TAnnnT Differentiation in cyanobacteriaWhat does NtcA bind to? Herrero et al (2001) J Bacteriol 183:411-425**Table 1: Examples of position-specific scoring matrices from**sequence alignment A. Sequence alignmenta A T T T A G T A T C A A A A A T A A C A A T T C G T T C T G T A A C A A A G A C T A C A A A A C A T T T T G T A G C T A C T T A T A C T A T T T A A G C T G T A A C A A A A T C T A C C A A A T C A T T T G T A C A G T C T G T T A C C T T T A Position-specific scoring matrices**A. Sequence alignmenta**A T T T A G T A T C A A A A A T A A C A A T T C G T T C T G T A A C A A A G A C T A C A A A A C A T T T T G T A G C T A C T T A T A C T A T T T A A G C T G T A A C A A A A T C T A C C A A A T C A T T T G T A C A G T C T G T T A C C T T T A B. Table of occurrencesa A 3 2 0 0 1 0 0 5 2 1 3 4 3 2 2 1 1 5 0 2 4 2 2 1 C 1 0 0 2 0 0 0 0 1 4 0 0 2 0 0 2 0 0 5 2 0 0 0 2 G 1 0 1 0 0 5 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 T 0 3 4 3 4 0 5 0 1 0 1 1 0 2 2 2 4 0 0 1 1 3 3 2 Position-specific scoring matrices**B. Table of occurrencesa**A 0 1 0 0 5 2 1 3 4 3 C 2 0 0 0 0 1 4 0 0 2 G 0 0 5 0 0 1 0 1 0 0 T 3 4 0 5 0 1 0 1 1 0 C. Position-specific scoring matrix (B = 0)b A 0 .20 0 0 1.0 .40 .20 .60 .80 .60 C .40 0 0 0 0 .20 .80 0 0 .40 G 0 0 1.0 0 0 .20 0 .20 0 0 T .60 .80 0 1.0 0 .20 0 .20 .20 0 Position-specific scoring matrices**Table 2: Scoring a sequence with a PSSM**urt-71 T A G T A T C A A A Scorea .60 .20 1.0 1.0 1.0 .20 .80 .60 .80 .60 w/ps’countsb .51 .24 .75 .79 .79 .24 .61 .51 .65 .51 Normal’db 1.6 .75 4.2 2.5 2.5 .75 3.4 1.6 2.0 1.6 Position-specific scoring matrices Score = .60 * .20 * 1.0 * …**A. Sequence alignmenta**A T T T A G T A T C A A A A A T A A C A A T T C G T T C T G T A A C A A A G A C T A C A A A A C A T T T T G T A G C T A C T T A T A C T A T T T A A G C T G T A A C A A A A T C T A C C A A A T C A T T T G T A C A G T C T G T T A C C T T T A Position-specific scoring matricesIntroduction of pseudocounts A? qG,6 = 5 real counts pG = ? pseudocounts**Position-specific scoring matricesIntroduction of**pseudocounts Score(position,nucleotide) = (q + p) / (N + B) p = pseudocounts = B * (overall frequency of nucleotide) [A] = 0.36[T] = 0.36[C] = 0.18[G] = 0.18 B = Total number of pseudocounts = Square root (N) ? or = 0.1 ?**C. Position-specific scoring matrix (B = 0)b**A 0 .20 0 0 1.0 .40 .20 C .40 0 0 0 0 .20 .80 G 0 0 1.0 0 0 .20 0 T .60 .80 0 1.0 0 .20 0 D. Position-specific scoring matrix (B = N = 2.2)c A .099 .24 .099 .099 .79 .38 .24 C .33 .056 .056 .056 .056 .19 .61 G .056 .056 .75 .056 .056 .19 .056 T .51 .65 .099 .79 .099 .24 .099 Position-specific scoring matricesIntroduction of pseudocounts**A. Sequence alignmenta**A T T T A G T A T C A A A A A T A A C A A T T C G T T C T G T A A C A A A G A C T A C A A A A C A T T T T G T A G C T A C T T A T A C T A T T T A A G C T G T A A C A A A A T C T A C C A A A T C A T T T G T A C A G T C T G T T A C C T T T A B. Table of occurrencesa A 3 2 0 0 1 0 0 5 2 1 3 4 3 2 2 1 1 5 0 2 4 2 2 1 C 1 0 0 2 0 0 0 0 1 4 0 0 2 0 0 2 0 0 5 2 0 0 0 2 G 1 0 1 0 0 5 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 T 0 3 4 3 4 0 5 0 1 0 1 1 0 2 2 2 4 0 0 1 1 3 3 2 Position-specific scoring matricesNormalization How to account for similarity due to similar base composition? Compare ScorePSSM / Scorebackground frequency 0.79 / 0.32 = 2.2**E. Position-specific scoring matrix (B = 0.1)c**A .006 .20 .006 .006 .99 .40 .20 .59 C .40 .004 .004 .004 .004 .20 .79 .004 G .004 .004 .98 .004 .004 .20 .004 .20 T .59 .79 .006 .99 .006 .20 .006 .20 F. Position-specific scoring matrix: Log-odds form (B = 0.1)c,d A 2.2 0.7 2.2 2.2 0.0 0.4 0.7 0.2 C 0.4 2.5 2.5 2.5 2.5 0.7 0.1 2.5 G 2.5 2.5 0.0 2.5 2.5 0.7 2.5 0.7 T 0.2 0.1 2.2 0.0 2.2 0.7 2.2 0.7 Position-specific scoring matricesLog odds form Log odds = -log(score) Score * score * score … log + log + log …**Position-specific scoring matricesDecrease complexity**through info analysis Uncertainty (Hc) = - Sum [piclog2(pic)] H1= -{[4/11 log2(4/11)] + [3/11 log2(3/11)] + [1/11 log2(1/11)] + [3/11 log2(3/11)]} = 1.87 H31= -{[1/11 log2(1/11)] + [1/11 log2(1/11)] + [1/11 log2(1/11)] + [8/11 log2(8/11)]} = 1.28 Information content = Sum (Hmax– Hc) (summed over all columns)

More Related