
Comparative Sequence Analysis in Molecular Biology

Comparative Sequence Analysis in Molecular Biology. Martin Tompa Computer Science & Engineering Genome Sciences University of Washington Seattle, Washington, U.S.A. Outline. What genome data is available? What is phylogenetic footprinting?


Presentation Transcript


  1. Comparative Sequence Analysis in Molecular Biology. Martin Tompa, Computer Science & Engineering and Genome Sciences, University of Washington, Seattle, Washington, U.S.A.

  2. Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment

  3. Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment

  4. DNA: the cell’s program Cell DNA Nucleotide (A, C, G, or T)

  5. DNA, Genes, and Proteins
      [Figure: a stretch of DNA (TCCAACGGTGCTGAGGTGCAC) containing a gene, and the protein it encodes]
      DNA: program for cell processes
      Proteins (and RNA): execute cell processes

  6. How Much DNA in a Cell? An organism’s genome is the total DNA in one of its cells.
      • How many nucleotides in a genome?
          M. tuberculosis   bacterium    4,000,000
          D. melanogaster   fruit fly    200,000,000
          H. sapiens        human        3,000,000,000
          P. nudum          whisk fern   250,000,000,000
      • How can we understand the genome’s program?
      • Lab benchwork is costly and time-consuming.
      • We will return to this question.

  7. How Many Genomes Are Available? • 46 vertebrate genomes sequenced (primates to rodents to marsupials to birds to fishes) • 1025 bacterial genomes sequenced (as of 4/6/2010) • Insects, fungi, worms, plants, … • Many more will be finished very soon • Fertile ground for comparative genomics

  8. 1982-2003: number of nucleotides in GenBank doubled every 18 months Since 2003: doubled every 3 years
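To make the two doubling rates concrete, here is a small arithmetic sketch. The function name `growth_factor` is hypothetical, and the only inputs are the figures quoted on the slide (doubling every 18 months for 1982-2003, every 3 years since).

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_time_years`."""
    return 2.0 ** (years / doubling_time_years)

# 1982-2003 is 21 years; doubling every 18 months (1.5 years) means 14 doublings.
factor_fast = growth_factor(2003 - 1982, 1.5)

# At the slower post-2003 rate, the same 21 years would give only 7 doublings.
factor_slow = growth_factor(21, 3.0)

print(f"{factor_fast:.0f}x versus {factor_slow:.0f}x over 21 years")
```

Fourteen doublings amount to a factor of 16,384, whereas seven doublings amount to a factor of 128.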

  9. Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment

  10. Phylogenetic Footprinting (Tagle et al. 1988) • Functional regions of DNA (regions under “purifying constraint”) evolve more slowly than nonfunctional ones. • Consider a set of corresponding DNA sequences from related species. • Identify unusually well conserved subsequences (i.e., ones that have not mutated much over the course of evolution): “motifs”

  11. Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment

  12. How to Find Conserved Motifs
      ACTAACCGGGAGATTTCAGA    human
      AAGTTCCGGGAGATTTCCA     chimp
      TAGTTATCCGGGAGATTAGA    mouse
      AAAACCGGTAGATTTCAGG     rat

  13. Multiple Sequence Alignment
      AC--TAACCGGGAGATTTCAGA    human
      AAGTT--CCGGGAGATTTCC-A    chimp
      TAGTTATCCGGGAGATT--AGA    mouse
      AA---AACCGGTAGATTTCAGG    rat
      (Finding the optimal alignment is NP-complete.)

  14. Phylogenetic Footprinting • Use whole-genome multiple alignment such as provided by UCSC Genome Browser. • Search for regions of well conserved alignment. • Regulatory elements [Cliften; Kellis; Kolbe; Prakash; Woolfe; Xie (2)] • RNA elements [Pedersen; Washietl] • General conservation & constraint [Bejerano; Boffelli; Cooper; Margulies (4); Pollard; Prabhakar; Siepel]
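The "search for regions of well conserved alignment" step can be illustrated on the four-species alignment from the Multiple Sequence Alignment slide. This is only a minimal sketch: the functions `column_identity` and `conserved_runs` and their thresholds are hypothetical, and the published methods cited above use far more careful statistical models of conservation.

```python
from collections import Counter

# The four-species alignment from the "Multiple Sequence Alignment" slide.
ALIGNMENT = {
    "human": "AC--TAACCGGGAGATTTCAGA",
    "chimp": "AAGTT--CCGGGAGATTTCC-A",
    "mouse": "TAGTTATCCGGGAGATT--AGA",
    "rat":   "AA---AACCGGTAGATTTCAGG",
}

def column_identity(rows, col):
    """Fraction of rows sharing the most common non-gap base in one column."""
    bases = Counter(r[col] for r in rows if r[col] != "-")
    return max(bases.values()) / len(rows) if bases else 0.0

def conserved_runs(rows, min_identity=1.0, min_len=4):
    """Half-open (start, end) column ranges in which every column is conserved."""
    runs, start = [], None
    width = len(rows[0])
    for col in range(width + 1):  # the extra step flushes a run that ends at the last column
        ok = col < width and column_identity(rows, col) >= min_identity
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs

rows = list(ALIGNMENT.values())
for start, end in conserved_runs(rows):
    print(start, end, ALIGNMENT["human"][start:end])
```

With the default thresholds (perfect identity, at least 4 columns) the two runs found are the blocks on either side of the single G/T difference in rat. Lowering `min_identity` to 0.75 merges them into one longer region.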

  15. Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment

  16. Which Alignment Columns to Trust? • Vertebrate alignment has 3.8 billion columns • Automatically generated • Recent comparison (Margulies et al., 2007) of 4 whole-mammal alignment methods revealed widespread disagreement

  17. Which Alignment Columns to Trust? (with Amol Prakash, generalizing Karlin and Altschul 1990) Goal: label each alignment column with a confidence measure of alignment correctness • Identify sequences that do not belong • Users forewarned about regions of interest • Genome browser designers consider realigning • Alignment tool designers get feedback for possible improvements

  18. Sample Suspicious Alignment
      Human     -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Chimp     -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Rhesus    -----------GTTGCCATGC-AAAAATATTATGTCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Mouse     -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CGTGTCAA----------TTAACAC
      Rat       -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CGTGTCAA----------TTAACAC
      Dog       -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Cow       -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Elephant  -----------GTTGCTATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Tenrec    -----------GTTGCCATAC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
      Opossum   -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATATCAA----------TTAACAC
      Chicken   -----------GTTGCCATGCAAAAAATAATATGGCTTTACTAAAATTTACACAAC---CCTGACAA----------TTAACAC
      Zebrafish GAACATATCCGAGTGCTGTAA-AATACTACTGGGA----ACCAGAAATG--ACAAGTTCCATGACAGCTTTGCCTTTTTGGCTC

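  19. Scoring Function
      Pairwise: $\mathrm{score}(1,2) = \log \dfrac{\Pr(1,2)}{\Pr(1)\,\Pr(2)}$
      Multiple (1 = Human, 2 = Chimp, 3 = Mouse, 4 = Rat, 5 = Chicken; each probability is conditioned on a tree that appeared as a figure on the slide, written $\cdot$ here): $\mathrm{sc}(12345 \mid \cdot) = \log \dfrac{\Pr(12345 \mid \cdot)}{\Pr(125 \mid \cdot)\,\Pr(34 \mid \cdot)}$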
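As a hedged illustration of the pairwise score log(Pr(1,2) / (Pr(1) Pr(2))), the sketch below evaluates it for a single alignment column. The Jukes-Cantor substitution model, the uniform base frequencies, and the branch length 0.1 are all assumptions made for this example; the slide does not say which evolutionary model is used.

```python
import math

def jc_transition(a: str, b: str, t: float) -> float:
    """Jukes-Cantor probability that base a is observed as base b after
    branch length t (expected substitutions per site). Assumed model."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def pairwise_score(a: str, b: str, t: float) -> float:
    """log( Pr(a, b) / (Pr(a) Pr(b)) ) with uniform base frequencies of 1/4."""
    joint = 0.25 * jc_transition(a, b, t)  # Pr(1,2): the two bases are related by a branch
    independent = 0.25 * 0.25              # Pr(1) Pr(2): the two bases are unrelated
    return math.log(joint / independent)

match_score = pairwise_score("A", "A", 0.1)
mismatch_score = pairwise_score("A", "G", 0.1)
print(f"match {match_score:+.3f}, mismatch {mismatch_score:+.3f}")
```

Under these assumptions a matching column scores positive and a mismatching column scores negative, which is what lets runs of columns be treated as scored segments.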

  20. Outline of Computation
      Input: multiple sequence alignment A
      For each branch k of the tree {
          Compute scoring function sc_k (Felsenstein)
          Find all maximally scoring segments of A using sc_k (Ruzzo & Tompa)
          Compute K, λ using sc_k (Karlin & Altschul)
          Compute p-value p_k of each segment score using K, λ (Karlin & Altschul)
      }
      Output: discordance max_k p_k
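Two steps of this outline can be sketched in a few lines. The first function is a straightforward (quadratic, not linear-time) rendering of the list procedure for finding all maximal scoring segments; the second is the usual Karlin-Altschul tail estimate for a segment score. The function names, the toy score list, and the values of K and λ (`lam`) are illustrative assumptions, not values from the talk.

```python
import math

def maximal_scoring_segments(scores):
    """All maximal scoring segments as (start, end, score), end exclusive."""
    segments = []  # entries [start, end, L, R]: cumulative totals before and after the segment
    total = 0
    for i, s in enumerate(scores):
        left = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a segment
        cur = [i, i + 1, left, total]
        while True:
            # Rightmost earlier segment whose left total is strictly below ours.
            j = next((idx for idx in range(len(segments) - 1, -1, -1)
                      if segments[idx][2] < cur[2]), None)
            if j is None or segments[j][3] >= cur[3]:
                segments.append(cur)
                break
            # Otherwise merge: extend left to the start of segment j, drop j onward, retry.
            cur = [segments[j][0], cur[1], segments[j][2], cur[3]]
            del segments[j:]
    return [(a, b, r - l) for a, b, l, r in segments]

def segment_p_value(score, n, K, lam):
    """Approximate P(some segment among n columns scores >= score)."""
    return 1.0 - math.exp(-K * n * math.exp(-lam * score))

toy_scores = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
found = maximal_scoring_segments(toy_scores)
for start, end, score in found:
    print(start, end, score, round(segment_p_value(score, 1000, 0.1, 1.0), 3))
```

On the toy list the segments are the lone 4, the lone 3, and the final seven scores (total 7). Note that the last segment is not simply the highest-scoring prefix extension: the merge step is what joins the smaller positive runs into it.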

  21. Suspicious Alignment Regions • Case study: human chromosome 1 alignment to 16 other vertebrates in UCSC Genome Browser • Identify suspicious alignment regions: • Length ≥ 50 bp • p-value ≥ 0.1 at each position, all with respect to the same branch k • At most 50% gapped columns

  22. Proposed Track on the UCSC Browser

  23. 247,000,000 9.7% 15% 3.3% 2.3% 26% 1.3% 29% 24% .004%

  24. Genomic Locations of Suspicious Regions 6% of chromosome 1 alignments containing mouse are exonic 35% of chromosome 1 alignments containing zebrafish are exonic

  25. Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment

  26. DNA, Genes, and Proteins
      [Figure: a stretch of DNA (TCCAACGGTGCTGAGGTGCAC) containing a gene, and the protein it encodes]
      DNA: program for cell processes
      Proteins: execute cell processes

  27. Regulation of Genes • What turns genes on and off? • When is a gene turned on or off? • Where (in which cells) is a gene turned on? • How many copies of the gene product are produced?

  28. Regulation of Genes Transcription Factor RNA polymerase DNA Gene Regulatory Element

  29. Regulation of Genes Transcription Factor RNA polymerase DNA Gene Regulatory Element

  30. Goal • Identify regulatory elements in DNA sequences. These are: • Binding sites for proteins • Short subsequences (5-25 nucleotides) • Up to 1000 nucleotides (or farther) from gene • Inexactly repeating patterns (“motifs”)

  31. CLUSTALW multiple sequence alignment (rbcS gene)
      Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
      Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
      Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
      Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
      Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
      Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
      Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
      Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

      Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
      Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
      Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
      Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
      Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
      Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
      Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
      Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

      Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
      Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
      Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
      Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
      Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
      Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
      Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
      Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

      Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
      Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
      Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
      Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
      Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
      Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
      Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
      Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG

  32. Finding Short Motifs
      AGTCGTACGTGAC... (Human)
      AGTAGACGTGCCG... (Chimp)
      ACGTGAGATACGT... (Rabbit)
      GAACGGAGTACGT... (Mouse)
      TCGTGACGGTGAT... (Rat)
      Size of motif sought: k = 4

  33. Most Parsimonious Solution
      AGTCGTACGTGAC...
      AGTAGACGTGCCG...
      ACGTGAGATACGT...
      GAACGGAGTACGT...
      TCGTGACGGTGAT...
      Labels shown on the tree: ACGT, ACGT, ACGT, ACGG
      “Parsimony score”: 1 mutation
      (Finding the most parsimonious motif is NP-complete.)

  34. Substring Parsimony Problem • Given: • phylogenetic tree T, • set of orthologous sequences at leaves of T, • length k of motif, • threshold d • Problem: • Find each set S of k-mers, one k-mer from each leaf, such that the parsimony score of S in T is at most d. • This problem is NP-hard.

  35. FootPrinter’s Exact Algorithm (with Mathieu Blanchette, generalizing Sankoff and Rousseau 1975)
      W_u[s] = best parsimony score for subtree rooted at node u, if u is labeled with string s.
      [Figure: phylogenetic tree with leaf sequences AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC, TCGTGACGGTG. Every node carries a table of 4^k entries, one per k-mer s. The entries shown are those for ACGG and ACGT: 0 or +∞ at the leaves (for example ACGG: +∞, ACGT: 0 and ACGG: 0, ACGT: +∞), and small integer scores at internal nodes (for example ACGG: 1, ACGT: 0 and ACGG: 2, ACGT: 1).]

  36. Running Time
      $W_u[s] = \sum_{v:\ \text{child of } u} \min_t \bigl( W_v[t] + d(s,t) \bigr)$
      Total time $O(n\,k\,(4^k + l))$, where n = number of species, k = motif length, l = average sequence length.
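The recurrence W_u[s] = sum over children v of min over t of (W_v[t] + d(s, t)) can be tried out on the five example sequences from the Finding Short Motifs slide. This sketch uses the naive evaluation (every pair of k-mers per tree edge), not the faster algorithm whose running time is quoted above. The tree topology ((Human, Chimp), Rabbit, (Mouse, Rat)), the use of Hamming distance for d, and all function names are assumptions made for illustration.

```python
from itertools import product

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INF = float("inf")

SEQS = {  # the visible prefixes of the slide's example sequences
    "Human": "AGTCGTACGTGAC", "Chimp": "AGTAGACGTGCCG", "Rabbit": "ACGTGAGATACGT",
    "Mouse": "GAACGGAGTACGT", "Rat": "TCGTGACGGTGAT",
}
TREE = (("Human", "Chimp"), "Rabbit", ("Mouse", "Rat"))  # assumed topology

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

def leaf_table(seq):
    """W[s] = 0 if s occurs in the leaf sequence, else infinity."""
    present = {seq[i:i + K] for i in range(len(seq) - K + 1)}
    return {s: (0 if s in present else INF) for s in KMERS}

def node_table(tree):
    """Tables W_u for the subtree `tree`: a species name or a tuple of subtrees."""
    if isinstance(tree, str):
        return leaf_table(SEQS[tree])
    table = dict.fromkeys(KMERS, 0)
    for child in tree:
        child_w = node_table(child)
        # Only child labels with a finite score can achieve the minimum.
        finite = [(t, w) for t, w in child_w.items() if w != INF]
        for s in KMERS:
            table[s] += min(w + hamming(s, t) for t, w in finite)
    return table

root_table = node_table(TREE)
best = min(root_table.values())
print(best, sorted(s for s, w in root_table.items() if w == best))
```

For these sequences the best achievable score is 1, in agreement with the Most Parsimonious Solution slide: no 4-mer occurs in all five sequences, while ACGT occurs in four of them and Rat contains ACGG, one mutation away.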

  37. Improvements • Better algorithm reduces time from $O(n\,k\,(4^{2k} + l))$ to $O(n\,k\,(4^k + l))$ • By restricting to motifs with parsimony score at most d, greatly reduce the number of table entries computed (exponential in d, polynomial in k) • Amenable to many useful extensions (e.g., allow insertions and deletions)

  38. Application to -actin Gene
      Species on the tree (sequence length):
      Gilthead sea bream (678 bp)
      Medaka fish (1016 bp)
      Common carp (696 bp)
      Grass carp (917 bp)
      Chicken (871 bp)
      Human (646 bp)
      Rabbit (636 bp)
      Rat (966 bp)
      Mouse (684 bp)
      Hamster (1107 bp)

  39. Common carp ACGGACTGTTACCACTTCACGCCGACTCAACTGCGCAGAGAAAAACTTCAAACGACAACATTGGCATGGCTTTTGTTATTTTTGGCGCTTGACTCAGGATCTAAAAACTGGAACGGCGAAGGTGACGGCAATGTTTTGGCAAATAAGCATCCCCGAAGTTCTACAATGCATCTGAGGACTCAATGTTTTTTTTTTTTTTTTTTCTTTAGTCATTCCAAATGTTTGTTAAATGCATTGTTCCGAAACTTATTTGCCTCTATGAAGGCTGCCCAGTAATTGGGAGCATACTTAACATTGTAGTATTGTATGTAAATTATGTAACAAAACAATGACTGGGTTTTTGTACTTTCAGCCTTAATCTTGGGTTTTTTTTTTTTTTTGGTTCCAAAAAACTAAGCTTTACCATTCAAGATGTAAAGGTTTCATTCCCCCTGGCATATTGAAAAAGCTGTGTGGAACGTGGCGGTGCAGACATTTGGTGGGGCCAACCTGTACACTGACTAATTCAAATAAAAGTGCACATGTAAGACATCCTACTCTGTGTGATTTTTCTGTTTGTGCTGAGTGAACTTGCTATGAAGTCTTTTAGTGCACTCTTTAATAAAAGTAGTCTTCCCTTAAAGTGTCCCTTCCCTTATGGCCTTCACATTTCTCAACTAGCGCTTCAACTAGAAAGCACTTTAGGGACTGGGATGC Chicken ACCGGACTGTTACCAACACCCACACCCCTGTGATGAAACAAAACCCATAAATGCGCATAAAACAAGACGAGATTGGCATGGCTTTATTTGTTTTTTCTTTTGGCGCTTGACTCAGGATTAAAAAACTGGAATGGTGAAGGTGTCAGCAGCAGTCTTAAAATGAAACATGTTGGAGCGAACGCCCCCAAAGTTCTACAATGCATCTGAGGACTTTGATTGTACATTTGTTTCTTTTTTAATAGTCATTCCAAATATTGTTATAATGCATTGTTACAGGAAGTTACTCGCCTCTGTGAAGGCAACAGCCCAGCTGGGAGGAGCCGGTACCAATTACTGGTGTTAGATGATAATTGCTTGTCTGTAAATTATGTAACCCAACAAGTGTCTTTTTGTATCTTCCGCCTTAAAAACAAAACACACTTGATCCTTTTTGGTTTGTCAAGCAAGCGGGCTGTGTTCCCCAGTGATAGATGTGAATGAAGGCTTTACAGTCCCCCACAGTCTAGGAGTAAAGTGCCAGTATGTGGGGGAGGGAGGGGCTACCTGTACACTGACTTAAGACCAGTTCAAATAAAAGTGCACACAATAGAGGCTTGACTGGTGTTGGTTTTTATTTCTGTGCTGCGCTGCTTGGCCGTTGGTAGCTGTTCTCATCTAGCCTTGCCAGCCTGTGTGGGTCAGCTATCTGCATGGGCTGCGTGCTGGTGCTGTCTGGTGCAGAGGTTGGATAAACCGTGATGATATTTCAGCAAGTGGGAGTTGGCTCTGATTCCATCCTGAGCTGCCATCAGTGTGTTCTGAAGGAAGCTGTTGGATGAGGGTGGGCTGAGTGCTGGGGGACAGCTGGGCTCAGTGGGACTGCAGCTGTGCT Human 
GCGGACTATGACTTAGTTGCGTTACACCCTTTCTTGACAAAACCTAACTTGCGCAGAAAACAAGATGAGATTGGCATGGCTTTATTTGTTTTTTTTGTTTTGTTTTGGTTTTTTTTTTTTTTTTGGCTTGACTCAGGATTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTTCACAATGTGGCCGAGGACTTTGATTGCATTGTTGTTTTTTTAATAGTCATTCCAAATATGAGATGCATTGTTACAGGAAGTCCCTTGCCATCCTAAAAGCCACCCCACTTCTCTCTAAGGAGAATGGCCCAGTCCTCTCCCAAGTCCACACAGGGGAGGTGATAGCATTGCTTTCGTGTAAATTATGTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTTGAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTGTCCCCCAACTTGAGATGTATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTAAAAATGAGGCCAAGTGTGACTTTGTGGTGTGGCTGGGTTGGGGGCAGCAGAGGGTG Parsimony score over 10 vertebrates: 0 1 2

  40. Motifs Absent from Some Species • Find motifs • with small parsimony score • that span a large part of the tree • Example: in tree of 10 species spanning 760 Myrs, find all motifs with • score 0 spanning at least 250 Myrs • score 1 spanning at least 350 Myrs • score 2 spanning at least 450 Myrs • score 3 spanning at least 550 Myrs
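The score-versus-span trade-off in this example amounts to a simple lookup. The sketch below encodes the four thresholds quoted on the slide; the names `MIN_SPAN_BY_SCORE` and `is_reported` are hypothetical and not part of FootPrinter.

```python
# Minimum span (in Myrs) required for each parsimony score, as listed on the slide.
MIN_SPAN_BY_SCORE = {0: 250, 1: 350, 2: 450, 3: 550}

def is_reported(parsimony_score: int, span_myrs: float) -> bool:
    """True if a motif with this score spans enough of the tree to be reported."""
    required = MIN_SPAN_BY_SCORE.get(parsimony_score)
    return required is not None and span_myrs >= required

print(is_reported(0, 300))  # perfectly conserved over 300 Myrs
print(is_reported(2, 300))  # two mutations need a wider span
```

The design choice is that a motif with more mutations must be present across more of the tree before it counts as unusually well conserved.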

  41. Application to c-fos Gene
      [Figure: phylogenetic tree of Puffer fish, Chicken, Pig, Mouse, Hamster, Human; branch labels shown: 10, 7, 2, 2, 1, 2, 2, 1, 0, 1]
      Asked for motifs of length 10, with
      • 0 mutations over tree of size 6
      • 1 mutation over tree of size 11
      • 2 mutations over tree of size 16
      • 3 mutations over tree of size 21
      • 4 mutations over tree of size 26
      Found:
      • 0 mutations over tree of size 8
      • 1 mutation over tree of size 16
      • 3 mutations over tree of size 21
      • 4 mutations over tree of size 28

  42. Application to c-fos Gene
      Motif                 Score   Conserved in          Known?
      CAGGTGCGAATGTTC       0       4 mammals
      TTCCCGCCTCCCCTCCCC    0       4 mammals             yes
      GAGTTGGCTGcagcc       3       puffer + 4 mammals
      GTTCCCGTCAATCcct      1       chicken + 4 mammals   yes
      CACAGGATGTcc          4       all 6                 yes
      AGGACATCTG            1       chicken + 4 mammals   yes
      GTCAGCAGGTTTCCACG     0       4 mammals             yes
      TACTCCAACCGC          0       4 mammals
      metK in B. subtilis

  43. Microbial Footprinting • 1105 prokaryotes with genomes completely sequenced (as of 4/6/2010) • For any prokaryotic gene of interest, plenty of close genes in other species available • Relatively simple genomes • MicroFootPrinter (with Shane Neph) • Designed specifically for phylogenetic footprinting in microbial genomes • undergraduate Computational Biology Capstone project • User specifies species and gene of interest • Automates collection of orthologous genes, cis-regulatory sequences, gene tree, parameters

  44. Demo • MicroFootPrinter home • Examples: Agrobacterium tumefaciens genes regulated by ChvI (with Eugene Nester) • chvI (two-component response regulator) • ropB (outer membrane protein)

  45. Sample chvI motif
      Parsimony score: 2   Span: 41.10   Significance score: 4.22
      Species            Position   Motif
      B. henselae        -151       GCTACAATTT
      R. etli            -90        GCCACAATTT
      R. leguminosarum   -106       GCCACAATTT
      S. meliloti        -119       GCCACAATTT
      S. medicae         -118       GCCACAATTT
      A. tumefaciens     -105       GCCACAATTT
      M. loti            -80        GCCACATTTT
      M. sp.             -87        GCCACATTTT
      O. anthropi        -158       GCCACATTTT
      B. suis            -38        GCCACATTTT
      B. melitensis      -156       GCCACATTTT
      B. abortus         -156       GCCACATTTT
      B. ovis            -156       GCCACATTTT
      B. canis           -38        GCCACATTTT

  46. Sample ropB motif
      Parsimony score: 1   Span: 20.70   Significance score: 1.34
      Species            Position   Motif
      Jannaschia sp.     -151       CACATTTTGG
      R. etli            -134       CACAATTTGG
      R. leguminosarum   -135       CACAATTTGG
      A. tumefaciens     -131       CACATTTTGG
      S. meliloti        -128       CACATTTTGG
      S. medicae         -128       CACATTTTGG

  47. Combined ChvI Motif
      ropB:         CACATTTTGG
      chvI:       GCCACAATTT
      Atu1221:  TTGTCACAAT
      ultimate:   GYCACAWTTTGG
      Y={C,T} W={A,T}
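One way to check that the three sites are compatible with the combined consensus is to slide each site along it and compare base by base, treating Y and W as ambiguity codes. This is a minimal sketch: `site_offset` and `max_overhang` are hypothetical names, and allowing a site to overhang the consensus by up to two bases is an assumption made so that the Atu1221 site can be placed.

```python
# Ambiguity codes used on the slide: Y = {C, T}, W = {A, T}.
IUPAC = {"A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
         "Y": {"C", "T"}, "W": {"A", "T"}}

def site_offset(consensus: str, site: str, max_overhang: int = 2):
    """Offset at which `site` is compatible with `consensus`, or None.
    A negative offset means the site starts to the left of the consensus."""
    for offset in range(-max_overhang, len(consensus)):
        overlap, ok = 0, True
        for i, base in enumerate(site):
            pos = offset + i
            if 0 <= pos < len(consensus):
                overlap += 1
                if base not in IUPAC[consensus[pos]]:
                    ok = False
                    break
        # Require that at most `max_overhang` bases fall outside the consensus.
        if ok and overlap >= len(site) - max_overhang:
            return offset
    return None

CONSENSUS = "GYCACAWTTTGG"
SITES = {"ropB": "CACATTTTGG", "chvI": "GCCACAATTT", "Atu1221": "TTGTCACAAT"}
offsets = {name: site_offset(CONSENSUS, site) for name, site in SITES.items()}
print(offsets)
```

The offsets show how the three 10-mers tile the 12-base consensus: chvI starts at its first base, ropB two bases in, and Atu1221 two bases before it.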

  48. References and Acknowledgments
      • Amol Prakash & Martin Tompa, Measuring the Accuracy of Genome-Size Multiple Alignments. Genome Biology, June 2007, R124.
      • Mathieu Blanchette & Martin Tompa, Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting. Genome Research, May 2002, 739-748.
      • Shane Neph & Martin Tompa, MicroFootPrinter: a Tool for Phylogenetic Footprinting in Prokaryotic Genomes. Nucleic Acids Research, July 2006, W366-W368.
      • All software available at bio.cs.washington.edu/software.html
