
Tools for Comparative Sequence Analysis

Tools for Comparative Sequence Analysis. www.dcode.org. Ivan Ovcharenko Lawrence Livermore National Laboratory. A set of problems: http://www.dcode.org/bioquest.php. 1. Browsing genomes using synteny links 2. Aligning sequences to vertebrate genomes


Presentation Transcript


  1. Tools for Comparative Sequence Analysis. www.dcode.org. Ivan Ovcharenko, Lawrence Livermore National Laboratory

  2. A set of problems: http://www.dcode.org/bioquest.php
     1. Browsing genomes using synteny links
     2. Aligning sequences to vertebrate genomes
     3. Aligning sequences to identify evolutionary conserved regions
     4. Assigning function to regulatory elements
     5. Decoding gene regulation using microarray data

  3. zPicture: Dynamic Alignment of Megabase-long Sequences and Genomes http://zpicture.dcode.org

  4. zPicture: automated sequence extraction and gene annotation. http://zpicture.dcode.org/ Reference: I. Ovcharenko, G. Loots, R.C. Hardison, W. Miller, and L. Stubbs, Genome Research, 14(3), 472-477 (2004)

  5. Automated sequence and gene annotation extraction http://zpicture.dcode.org/ chr16:55,400,000-…

     Gene annotation:

     > 69149 115179 SLC6A2
     69149 69197 UTR
     69198 69471 exon
     82066 82197 exon
     84439 84676 exon
     97643 97781 exon
     104518 104652 exon
     106610 106713 exon
     107878 108002 exon
     108825 108937 exon
     110497 110625 exon
     111069 111168 exon
     112154 112254 exon
     112739 112906 exon
     114463 114534 exon
     114923 114946 exon
     114947 115179 UTR

     > 173279 186382 CESR
     173279 173321 UTR
     173322 173373 exon
     177416 177623 exon
     180095 180239 exon
     182703 182836 exon
     184865 185018 exon
     185907 186077 exon
     186078 186382 UTR

     > 173303 203537 CES1
     173303 173321 UTR
     173322 173373 exon
     177419 177623 exon
     180095 180239 exon
     182703 182836 exon
     184865 185018 exon
     185907 186014 exon
     186747 186851 exon
     189424 189462 exon
     193343 193483 exon
     195380 195460 exon
     195723 195870 exon
     199927 200058 exon
     202790 202862 exon
     203159 203342 exon
     203343 203537 UTR

     < 212212 242464 CES1
     212212 212406 UTR
     212407 212590 exon
     212887 212959 exon
     215691 215822 exon
     219879 220026 exon
     220289 220369 exon
     222266 222406 exon
     226287 226325 exon
     228898 229002 exon
     229735 229842 exon
     230731 230884 exon
     232913 233046 exon
     235514 235658 exon
     238133 238337 exon
     242394 242445 exon
     242446 242464 UTR

     < 229367 242488 CESR
     229367 229671 UTR
     229672 229842 exon
     230731 230884 exon
     232913 233046 exon
     235514 235658 exon
     238133 238340 exon
     242394 242445 exon
     242446 242488 UTR

     < 255598 284772 FLJ31547
     255598 255832 UTR
     255833 256064 exon
     256150 256222 exon
     262265 262412 exon
     265761 265829 exon
     268931 269071 exon
     270794 270898 exon
     272730 272834 exon
     275344 275497 exon
     279013 279146 exon
     281027 281165 exon
     283235 283439 exon

     Extracted sequence (first lines only; the slide shows a longer excerpt):

     >hg16_dna range=chr16:55400000-55800000
     Tataatggctacctatttggagtgcctaccatgtattagtcattgtgcta
     actgatgtataggcatctcatttacagttcaactcatttgaacctaaatg
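The gene annotation listing on slide 5 appears to follow a simple layout: a header line with a strand marker (">" or "<"), start, end, and gene name, followed by one "start end type" line per UTR or exon. A minimal Python sketch of a reader for that layout follows. The layout is inferred from the slide rather than from a format specification, the `Gene` class and function names are hypothetical, and coordinates are assumed to be inclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    strand: str  # ">" or "<", copied from the header line
    start: int
    end: int
    features: list = field(default_factory=list)  # (start, end, kind) tuples


def parse_annotation(text):
    """Parse header lines ('> start end name') and feature lines ('start end kind')."""
    genes = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] in (">", "<") and len(parts) >= 4:
            genes.append(Gene(parts[3], parts[0], int(parts[1]), int(parts[2])))
        elif len(parts) == 3 and genes:
            genes[-1].features.append((int(parts[0]), int(parts[1]), parts[2]))
    return genes


def exon_length(gene):
    """Total exon length, assuming inclusive coordinates (an assumption)."""
    return sum(e - s + 1 for s, e, kind in gene.features if kind == "exon")


# The CESR record from slide 5.
example = """\
> 173279 186382 CESR
173279 173321 UTR
173322 173373 exon
177416 177623 exon
180095 180239 exon
182703 182836 exon
184865 185018 exon
185907 186077 exon
186078 186382 UTR
"""

genes = parse_annotation(example)
for g in genes:
    print(g.name, g.strand, g.start, g.end, len(g.features), exon_length(g))
```

Keeping features as plain tuples keeps the sketch short; a real reader would also want to validate that each feature lies within its gene's start and end.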

  6. zPicture: dynamic & interactive alignment visualization tool. Dynamic rotation from pip plots to smooth plots; interactive parameter changes. http://zpicture.dcode.org/

  7. zPicture: dynamic annotation

  8. zPicture: dynamic selection of conservation parameters (minimum length / minimum identity), e.g. 100 bp / 70% vs. 500 bp / 85%
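To illustrate how a length/identity pair such as 100 bp / 70% or 500 bp / 85% selects conserved regions, here is a minimal Python sketch that slides a fixed-length window over a pairwise alignment and merges the windows that meet the identity threshold. It is a simplification written for this transcript, not zPicture's actual algorithm; the function name and the merging rule are assumptions.

```python
def find_ecrs(aln_a, aln_b, min_len=100, min_identity=0.70):
    """Return merged (start, end) alignment-column ranges, end-exclusive,
    covered by windows of min_len columns with identity >= min_identity.
    Columns containing a gap ('-') count as mismatches."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    n = len(aln_a)
    if n < min_len:
        return []
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    window = sum(match[:min_len])  # matches in the first window
    for start in range(n - min_len + 1):
        if start > 0:
            # Slide the window by one column.
            window += match[start + min_len - 1] - match[start - 1]
        if window / min_len >= min_identity:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                regions[-1][1] = end  # overlaps or touches the previous region
            else:
                regions.append([start, end])
    return [tuple(r) for r in regions]


# Toy alignment: 100 identical columns followed by 100 divergent columns.
seq_a = "ACGT" * 50
seq_b = seq_a[:100] + "T" * 100

# Looser and stricter settings, scaled down for the toy input.
print(find_ecrs(seq_a, seq_b, min_len=20, min_identity=0.70))
print(find_ecrs(seq_a, seq_b, min_len=50, min_identity=0.85))
```

Raising either parameter can only shrink or remove regions, which is why changing the settings interactively changes how many conserved elements are displayed.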

  9. zPicture: Aligning complete microbial genomes. Mycobacterium leprae vs. Mycobacterium tuberculosis. Conservation of genes: non-hypothetical genes, 97% are conserved; hypothetical genes, ∼20% are conserved.

  10. rVista 2.0: Identification of Evolutionarily Conserved Transcription Factor Binding Sites http://rvista.dcode.org

  11. rVista 2.0: Identification of Evolutionarily Conserved Transcription Factor Binding Sites. http://rvista.dcode.org/ Other URLs on the slide: http://globin.cse.psu.edu/gala, http://zpicture.dcode.org, http://ecrbrowser.dcode.org

  12. Human/mouse alignment examples:

       Human ACTTTGATACATCTATCTATA
             ||||||||||||||:||||||
       Mouse ACTTTGATACATCTCTCTATA

       Human ACTTTGATACATCTATCTATA
             |||||
       Mouse ACTTT----------------

       Human ACTTTCCTACATCTATCTATA
             |||||::|||||||:||||||
       Mouse ACTTTGATACATCTCTCTATA

       Human -----GATACATCTATCTATA
                  |||||
       Mouse ACTTTGATAC-----------
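The four human/mouse panels on slide 12 differ in how well a short site is aligned and matched between the two species. A rough Python sketch of a conservation check for such a site follows. The rule used here (no gaps inside the site, identity at or above a threshold) and the 0.8 default are illustrative assumptions; the transcript does not state rVista's actual criteria.

```python
def site_identity(human_aln, mouse_aln, start, end):
    """Identity and gap count over alignment columns [start, end)."""
    cols = list(zip(human_aln[start:end], mouse_aln[start:end]))
    if not cols:
        raise ValueError("empty site range")
    matches = sum(1 for h, m in cols if h == m and h != "-")
    gaps = sum(1 for h, m in cols if "-" in (h, m))
    return matches / len(cols), gaps


def is_conserved(human_aln, mouse_aln, start, end, min_identity=0.8):
    """Hypothetical rule: fully aligned (no gaps) and identity >= min_identity."""
    identity, gaps = site_identity(human_aln, mouse_aln, start, end)
    return gaps == 0 and identity >= min_identity


# The four panels from slide 12, treating the whole 21-column block as the site.
panels = [
    ("ACTTTGATACATCTATCTATA", "ACTTTGATACATCTCTCTATA"),
    ("ACTTTGATACATCTATCTATA", "ACTTT----------------"),
    ("ACTTTCCTACATCTATCTATA", "ACTTTGATACATCTCTCTATA"),
    ("-----GATACATCTATCTATA", "ACTTTGATAC-----------"),
]
for human, mouse in panels:
    print(site_identity(human, mouse, 0, len(human)),
          is_conserved(human, mouse, 0, len(human)))
```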

  13. zPicture-rVista 2.0 interconnection (diagram labels: zPicture, rVista 2.0)

  14. ECR Browser: Tool for Browsing Genome Conservation Profiles http://ecrbrowser.dcode.org

  15-21. http://ecrbrowser.dcode.org

  22. http://ecrbrowser.dcode.org Grab ECR :: direct access to a conserved element

  23. Genome Alignment: Align your sequence to a vertebrate genome

  24. AC146831 Genome Alignment

  25. Genome alignment: Output page

  26. ECR Browser contains rVista portal

  27. eShadow: Phylogenetic Shadowing of Closely Related Species http://eshadow.dcode.org

  28. http://eshadow.dcode.org eShadow: Phylogenetic Shadowing

  29. Phylogenetic shadowing on multiple (10-14) primate sequences: Apo-B, Plasminogen, LXR-alpha, CETP. Boffelli et al., Science, 2003
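The idea behind shadowing across many closely related sequences can be sketched by counting, per alignment column, whether any sequence differs, and then averaging that over a sliding window; windows with little variation are candidates for conserved elements. The Python sketch below is a deliberately crude illustration of that idea. It is not the method of the cited paper or of eShadow, whose details are not in this transcript, and the function names are hypothetical.

```python
def column_variation(alignment):
    """Return 1 for columns where the aligned sequences are not all the
    same base (or contain a gap), else 0."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    variation = []
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        variation.append(0 if len(column) == 1 and "-" not in column else 1)
    return variation


def window_variation(variation, window):
    """Fraction of variable columns in each sliding window."""
    return [sum(variation[i:i + window]) / window
            for i in range(len(variation) - window + 1)]


# Toy alignment of three sequences (made-up data).
toy = ["ACGTAC", "ACGTTC", "ACGTAC"]
per_column = column_variation(toy)
print(per_column)
print(window_variation(per_column, 3))
```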

  30. CREME: Using Microarray Data to Decode Genome Regulation http://crem.dcode.org

  31. TFBS in Promoter ECRs of RefSeq genes: ~13k RefSeq loci; ~8k conserved promoters; 414 TRANSFAC PWMs; ~3M predicted TFBS

  32. TFBS in Promoter ECRs of RefSeq genes
      Testing Motif Abundances
      • Identify enriched motifs in a gene set relative to a background set.
      • Take into account the length of promoters.
      Filtering Similar PWMs
      • TRANSFAC contains many redundancies:
        • Different PWMs for the same TF.
        • Similar PWMs for TFs from the same family.
      • Filtering strategy:
        • For two PWMs that tend to co-occur in a very small window (4bp), remove the less enriched one.
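The two steps on slide 32 can be sketched in Python under stated assumptions. For enrichment, the sketch uses a hypergeometric tail probability over gene counts; this is a common choice but an assumption here, and it does not account for promoter length as the slide says CREME does. For filtering, "tend to co-occur" is interpreted as at least half of a PWM's sites lying within the window of a more enriched PWM's sites; that threshold, the function names, and the PWM labels in the example are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the motif."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def filter_similar_pwms(hits, pvalues, window=4, min_overlap=0.5):
    """hits: {pwm: set of (promoter_id, position)}; pvalues: {pwm: p-value}.
    Visit PWMs from most to least enriched and drop any PWM whose sites
    mostly fall within `window` bp of the sites of an already kept PWM."""
    kept = []
    for pwm in sorted(hits, key=lambda name: pvalues[name]):
        redundant = False
        for other in kept:
            near = sum(
                1 for prom, pos in hits[pwm]
                if any(p2 == prom and abs(pos - q) <= window
                       for p2, q in hits[other])
            )
            if hits[pwm] and near / len(hits[pwm]) >= min_overlap:
                redundant = True
                break
        if not redundant:
            kept.append(pwm)
    return kept


# Made-up example: two near-identical PWMs and one unrelated PWM.
hits = {
    "E2F_a": {("g1", 10), ("g2", 50)},
    "E2F_b": {("g1", 12), ("g2", 51)},
    "NFY": {("g1", 200)},
}
pvalues = {"E2F_a": 0.001, "E2F_b": 0.01, "NFY": 0.02}
print(filter_similar_pwms(hits, pvalues))
print(hypergeom_enrichment_p(5, 5, 5, 10))
```

Sorting by p-value first means the more enriched member of a redundant pair is always the one retained, which matches the "remove the less enriched one" rule on the slide.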

  33. Human Cell Cycle (336 genes, Whitfield et al. 02): 16 enriched PWMs (E2F, NFY, CREB…); 1089 modules; 7 significant modules; 5 coherently expressed

  34. Human Cell Cycle. Module DELTAEF1, EVI1, GR: 11 genes, p=0.01

  35. Validation on a known module
      • NFAT-AP1:
        • 10 known genes containing multiple regulatory elements. In all of them NFAT is upstream of AP1.
      • CREME reported only the correct module (p=0.01).
      • CREME identified the correct orientation of the TFBS.
      • The module was identified even after adding 10 random promoters to the gene set.

  36. Colleagues and collaborators: Roded Sharan, Gaby Loots, Lisa Stubbs, Asa Ben-Hur, Sha Hammond, Marcelo Nobrega, Ross Hardison, Webb Miller, Dario Boffelli. Institutions: Lawrence Livermore National Laboratory; UC, Berkeley; Stanford; Pennsylvania State University; Lawrence Berkeley National Laboratory. www.dcode.org
