Sequence Comparison

Sequence Comparison. Introduction Comparison Homology -- Analogy Identity -- Similarity Pairwise -- Multiple Scoring Matrices Gap -- indel Global -- Local Manual alignment, dot plot, visual inspection Dynamic programming Needleman-Wunsch exhaustive global alignment Smith-Waterman

vaughan

Presentation Transcript


  1. Sequence Comparison • Introduction • Comparison • Homology -- Analogy • Identity -- Similarity • Pairwise -- Multiple • Scoring Matrices • Gap -- indel • Global -- Local • Manual alignment, dot plot • visual inspection • Dynamic programming • Needleman-Wunsch • exhaustive global alignment • Smith-Waterman • exhaustive local alignment • Multiple alignment • Database search • BLAST • FASTA

  2. Sequence Comparison Multiple alignment (Multiple sequence alignment: MSA)

  3. Sequence Comparison Multiple alignment Multiple sequence alignment - Computational complexity V S N S _ S N A A N S V S N S

  4. Sequence Comparison Multiple alignment Multiple sequence alignment - Computational complexity
     Alignment of protein sequences with 200 amino acids using dynamic programming:
     # of sequences   CPU time (approx.)
     2                1 sec
     4                10^4 sec ≈ 2.8 hours
     5                10^6 sec ≈ 11.6 days
     6                10^8 sec ≈ 3.2 years
     7                10^10 sec ≈ 317 years
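The unit conversions in this table can be checked with a few lines of Python. This is only an arithmetic sketch: it reads the slide's CPU times as powers of ten (10^4, 10^6, 10^8, 10^10 seconds) and assumes a year of 365.25 days; the function name `humanize` is made up for the illustration.

```python
# Convert CPU times given in seconds into the larger units used on the slide.
HOUR = 3600
DAY = 24 * HOUR
YEAR = 365.25 * DAY  # assumption: Julian year

def humanize(seconds):
    """Return the duration in the largest unit that fits (hypothetical helper)."""
    for unit, size in (("years", YEAR), ("days", DAY), ("hours", HOUR)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} sec"

# Number of sequences -> approximate CPU time in seconds, as read from the slide.
cpu_seconds = {2: 1, 4: 10**4, 5: 10**6, 6: 10**8, 7: 10**10}

for n_sequences, seconds in cpu_seconds.items():
    print(n_sequences, humanize(seconds))
```

Each additional sequence multiplies the time by roughly 100 in this table, which is why exact dynamic programming stops being practical after a handful of sequences.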

  5. Sequence Comparison Multiple alignment Approximate methods for MSA • Multidimensional dynamic programming (MSA, Lipman 1988) • Progressive alignments (ClustalW, Higgins 1996; PileUp, Genetics Computer Group (GCG)) • Local alignments (e.g. DiAlign, Morgenstern 1996; lots of others) • Iterative methods (e.g. PRRP, Gotoh 1996) • Statistical methods (e.g. Bayesian Hidden Markov Models)

  6. Sequence Comparison Multiple alignment Multiple sequence alignment - Programs Progressive Multidimensional Dynamic programming Clustal Tree based T-Coffee DCA MSA Combalign Dialign OMA Interalign Prrp Non tree based GA SAGA Sam HMMER GAs Iterative HMMs

  7. Sequence Comparison Multiple alignment Multiple sequence alignment - Computational complexity
     Program    Seq type   Alignment      Method                  Comment
     ClustalW   Prot/DNA   Global         Progressive             No format limitation; runs on Windows too
     PileUp     Prot/DNA   Global         Progressive             Limited by the format and UNIX based
     MultAlin   Prot/DNA   Global         Progressive/Iterative   Limited by the format
     T-COFFEE   Prot/DNA   Global/local   Progressive             Can be slow

  8. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X) • ClustalW uses a progressive algorithm: instead of aligning all sequences at once, it adds them one at a time. • Pairwise comparison of all sequences to be aligned. • "Clustering by similarity", resulting in a dendrogram. • Following the dendrogram topology, ClustalW aligns the most similar pairs first. • Each alignment is replaced by a consensus sequence and further aligned as if it were a single sequence. • ClustalW treats multiple alignments like single sequences and aligns them progressively, two by two. • Thus, alignment errors made early in the procedure propagate throughout the whole MSA.

  9. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X) Principle: pairwise alignment (1 + 2, 1 + 3, 1 + 4, 2 + 3, 2 + 4, 3 + 4) → guide tree → multiple alignment by adding sequences

  10. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X) Pairwise comparison of all sequences: 1 : 2, 1 : 3, 1 : 4, 1 : 5, 2 : 3, 2 : 4, 2 : 5, 3 : 4, 3 : 5, 4 : 5. Similarity score of every pair → distance score of every pair
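A minimal sketch of this all-against-all step, under loud simplifications: the five toy sequences are made up, they are assumed to have equal length so that no pairwise alignment is needed, and the distance is simply 1 minus the fraction of identical positions. ClustalW's real scoring is more elaborate; this only illustrates the "similarity → distance" idea.

```python
from itertools import combinations

def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pairwise_distances(seqs):
    """Return {(name_i, name_j): distance} for every unordered pair."""
    return {
        (p, q): 1.0 - identity(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)
    }

# Hypothetical toy data, not taken from the slides.
toy = {
    "1": "ACGTACGT",
    "2": "ACGTACGA",
    "3": "ACGAACGA",
    "4": "TCGAACGA",
    "5": "TTTTTTTT",
}
distances = pairwise_distances(toy)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

For N sequences there are N(N-1)/2 pairs, so five sequences give the ten comparisons listed on the slide.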

  11. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X) Distance matrix (sequences 1 to 5): displays the distances of all sequence pairs. Guide tree (leaves: 1, 5, 2, 3, 4)

  12. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X) Guide Tree 1 5 2 3 4
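How a distance matrix turns into a guide tree can be sketched with a small clustering routine. Note the substitution: ClustalW builds its guide tree by neighbour-joining, whereas the sketch below uses the simpler UPGMA (average-linkage) clustering as a stand-in. The labels A to D and their distances are hypothetical.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster leaves by average linkage.

    names: list of leaf labels.
    dist:  {(a, b): distance} for every unordered pair of leaves.
    Returns the tree as nested tuples, e.g. (("A", "B"), ("C", "D")).
    """
    size = {name: 1 for name in names}              # cluster -> number of leaves
    d = {frozenset(k): v for k, v in dist.items()}  # order-free pair lookup
    while len(size) > 1:
        # Pick the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the new cluster to every other one: size-weighted mean.
        for c in size:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    (root,) = size
    return root

# Hypothetical distances: A/B are close, C/D are close, the two groups are far apart.
toy_dist = {
    ("A", "B"): 0.1, ("C", "D"): 0.2,
    ("A", "C"): 0.5, ("A", "D"): 0.5,
    ("B", "C"): 0.5, ("B", "D"): 0.5,
}
tree = upgma(["A", "B", "C", "D"], toy_dist)
print(tree)
```

The nesting of the returned tuples is the order in which a progressive aligner would merge sequences: the innermost pairs are aligned first.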

  13. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X)
      G T C C G - - C A G G
      T T - C G C - C - G G
      T T A C T T C C A G G
      ---
      G T C C G - C A G G
      T T - C G C C - G G
      T T A C T T C C A G G

  14. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X)
      G T C C G - - C A G G
      T T - C G C - C - G G
      T T A C T T C C A G G
      . . . . and new gaps are inserted.
      ---
      G T C C G - C A G G
      T T - C G C C - G G
      T T A C T T C C A G G
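The gap insertion shown here can be mimicked in a couple of lines. This is a sketch rather than ClustalW's procedure: the column position is chosen by hand (in the real program it comes out of the dynamic-programming step), and `insert_gap_column` is a made-up helper name. The sequences are the ones on the slide with the spaces removed.

```python
def insert_gap_column(alignment, position):
    """Insert one gap column into every row of an existing alignment."""
    return [row[:position] + "-" + row[position:] for row in alignment]

# The two already-aligned sequences (10 columns each) and the new, longer one.
pair = ["GTCCG-CAGG", "TT-CGCC-GG"]
new_sequence = "TTACTTCCAGG"

# Open a gap at column 6 in both rows at once, then append the new sequence.
widened = insert_gap_column(pair, 6)
msa = widened + [new_sequence]
for row in msa:
    print(row)
```

The gap goes into both rows of the earlier alignment at the same column, so the columns that were already aligned stay aligned.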

  15. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X)
      G T C C G - - C A G G
      T T - C G C - C - G G
      T T A C T T C C A G G
      A T C - T - - C A A T
      C T G - T C C C T A G
      ---
      G T C C G - - C A G G
      T T - C G C - C - G G
      T T A C T T C C A G G
      A T C T - - C A A T
      C T G T C C C T A G

  16. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X) (figure labels: core, loops)
      CLUSTAL W (1.74) multiple sequence alignment
      sp|P20472|PRVA_HUMAN EDIKKAVGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFILKG
      sp|P32848|PRVA_MOUSE EDIKKAIGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSILKG
      sp|P18087|PRVA_RANCA GDISKAVEAFAAPDS--FNHKKFFEMCG------LKSKGPDVMKQVFGILDQDRSGFIEEDELCLMLKG
      sp|P02629|PRVA_LATCH EDIDKALNTFKEAGS--FDHHKFFNLVG------LKGKPDDTLKEVFGILDQDKSGYIEEEELKFVLKG
      sp|P02616|PRVB_AMPME KDIEAALSSVKAAES--FNYKTFFTKCG------LAGKPTDQVKKVFDILDQDKSGYIEEDELQLFLKN
      sp|P51879|ONCO_MOUSE DDIAAALQECQDPDT--FEPQKFFQTSG------LSKMSASQLKDIFQFIDNDQSGYLDEDELKYFLQR
      sp|P56503|PRVB_MERBI ADVAAALKACEAADS--FNYKAFFAKVG------LTAKSADDIKKAFFVIDQDKSGFIEEDELKLFLQV
      sp|P59747|PRVB_SCOJP AEVTAALDGCKAAGS--FDHKKFFKACG------LSGKSTDEVKKAFAIIDQDKSGFIEEEELKLFLQN
      sp|P02620|PRVB_MERME ADITAALAACKAEGS--FKHGEFFTKIG------LKGKSAADIKKVFGIIDQDKSDFVEEDELKLFLQN
      sp|P02630|PRVA_RAJCL ADITKALEQCAAG----FHHTAFFKASG------LSKKSDAELAEIFNVLDGDQSGYIEVEELKNFLKC
      sp|P02586|TPCS_RABIT EELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEIFR-
      :: : :. *: : . * .:* : ..::: :** .::
      * A star indicates an entirely conserved column.
      : A colon indicates columns where all residues have roughly the same size and hydropathy.
      . A period indicates columns where the size or the hydropathy has been preserved in the course of evolution.
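The star row of such a consensus line is easy to reproduce. The sketch below marks only fully conserved columns with "*". The ":" and "." symbols depend on ClustalW's residue-group tables and are deliberately left out. The toy DNA alignment is the three-sequence example from the earlier ClustalW slides, and the function name is hypothetical.

```python
def star_line(alignment):
    """Return a string with '*' under every column in which all rows carry
    the same residue (gap-only columns do not count), and ' ' elsewhere."""
    marks = []
    for column in zip(*alignment):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)

toy_alignment = [
    "GTCCG--CAGG",
    "TT-CGC-C-GG",
    "TTACTTCCAGG",
]
for row in toy_alignment:
    print(row)
print(star_line(toy_alignment))
```

Printed under the alignment, the stars line up with columns 2, 4, 8, 10 and 11 (counting from 1), the only positions where all three rows agree.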

  24. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X)
      >hso
      MNTWKEAIGQEKQQPYFQHILQQVQQARQSGRTIYPPQEEVFSAFRLTEFDQVRVVILGQDPYHGV
      NQAHGLAFSVKPGIAPPPSLVNIYKELSTDIMGFQTPSHGYLVGWAKQGVLLLNTVLTVEQGLAHSHANF
      GWETFTDRVIHVLNEQRDHLVFLLWGSHAQKKGQFIDRTKHCVLTSPHPSPLSAHRGFFGCRHFSKTNQY
      LRHHNLTEINWQLPMTI
      >pmu
      MKTWKDVIGTEKTQPYFKHILDQVHQARASGKIVYPPPQEVFSAFQLTEFEAVKVVIIGQDPYHGPNQAH
      GLAFSVKPGVVPPPSLMNMYKELTQDIEGFQIPNHGYLVPWAEQGVLLLNTVLTVEQGKAHSHASFGWET
      FTDRVIAALNAQREKLVFLLWGSHAQKKGQFIDRQKHCVFTAPHPSPLSAHRGFLGCRHFSKTNAYLMAQ
      GLSPIQWQLASL
      >hdu
      MNSWTEAIGEEKVQPYFQQLLQQVYQARASGKIIYPPQHEVFSAFALTDFKAVKVVILGQDPYHGPNQAH
      GLAFSVKPSVVPPPSLVNIYKELAQDIAGFQVPSHGYLIDWAKQGVLLLNTVLTVQQGMAHSHATLGWEI
      FTDKVIAQLNDHRENLVFLLWGSHAQKKGQFINRSRHCVLTAPHPSPLSAHRGFFGCQHFSKANAYLQSK
      GIATINWQLPLVV
      >apl
      MNNWTEALGEEKQQPYFQHILQQVHQERMNGVTVFPPQKEVFSAFALTEFKDVKVVILGQDPYHGPNQAH
      GLAFSVKPPVAPPPSLVNMYKELAQDVEGFQIPNHGYLVDWAKQGVLLLNTVLTVRQGQAHSHANFGWEI
      FTDKVIAQLNQHRENLVFLLWGSHAQKKGQFIDRSRHCVLTAPHPSPLSAYRGFFGCKHFSKTNRYLLSK
      GIAPINWQLRLEIDY
      >hin
      MKNWTDVIGTEKAQPYFQHTLQQVHLARASGKTIYPPQEDVFNAFKYTAFEDVKVVILGQDPYHGPNQAH
      GLAFSVKPEVAIPPSLLNIYKELTQDISGFQMPSNGYLVKWAEQGVLLLNTVLTVERGMAHSHANLGWER
      FTDKVIAVLNEHREKLVFLLWGSHAQKKGQMIDRTRHLVLTAPHPSPLSAHRGFFGCRHFSKTNSYLESH
      GIKPIDWQI
      >sfl
      MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPG
      QAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLG
      WETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWL
      EQRGETPIDWMPVLPAECE
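Input of this kind (a ">" header line followed by wrapped sequence lines, i.e. FASTA format) can be read with a short parser. A minimal sketch follows. `parse_fasta` is a made-up helper, and the sample text uses only the first residues of the hso and pmu records for brevity.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]    # first word after '>' is the name
            records[name] = ""
        elif name is not None:
            records[name] += line         # continuation of the current record
    return records

sample = """>hso
MNTWKEAIGQ
EKQQPYFQHI
>pmu
MKTWKDVIGT
"""
records = parse_fasta(sample)
for name, sequence in records.items():
    print(name, len(sequence), sequence)
```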

  26. Sequence Comparison Multiple alignment Multiple sequence alignment – ClustalW (X)
      Clustal file format: vibrio.aln
      CLUSTAL X (1.81) multiple sequence alignment
      hdu ----------------------MN---SWTEAIGEEKVQPYFQQLLQQVYQARASGKIIY
      apl ----------------------MN---NWTEALGEEKQQPYFQHILQQVHQERMNGVTVF
      hso ----------------------MN---TWKEAIGQEKQQPYFQHILQQVQQARQSGRTIY
      pmu ----------------------MK---TWKDVIGTEKTQPYFKHILDQVHQARASGKIVY
      hin ----------------------MK---NWTDVIGTEKAQPYFQHTLQQVHLARASGKTIY
      sfl ----------------------MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIY
      eco -----------------------ANELTWHDVLAEEKQQPHFLNTLQTVASERQSGVTIY
      sen ----------------------MATELTWHDVLADEKQQPYFINTLHTVAGERQSGITVY
      vvu ----------------------MTQQLTWHDVIGAEKEQSYFQQTLNFVEAERQAGKVIY
      vpa ----------------------MNQSPTWHDVIGEEKKQSYFVDTLNFVEAERAAGKAIY
      vch ----------------------MSESLTWHDVIGNEKQQAYFQQTLQFVESQRQAGKVIY
      ype ----------------------MSPSLTWHDVIGQEKEQPYFKDTLAYVAAERRAGKTIY
      vfi ----------------------MA--LTWNSIISAEKKKAYYQSMSEKIDAQRSLGKSIF
      vsa ----------------------MN--TSWNDILETEKEKPYYQEMMTYINEARSQGKKIF
      son --------------------------MTWPAFIDHQRTQPYYQQLIAFVNQERQVGKVIY
      cbl --------------------MPK---LTWQLLLSQEKNLPYFKNIFTILNQQKKSGKIIY
      bap --------------------MDNRTLLNWSSILKNEKKKYYFINIINHLFFERQK-KMIF
      cbu -------------------MTTMAETQTWQTVLGEEKQEPYFQEILDFVKKERKAGKIIY
      dra --MTDQPDLFGLAPDAPRPIIPANLPEDWQEALLPEFSAPYFHELTDFLRQERKE-YTIY
      xax --MTE-------------GEGRIQLEPSWKARVGDWLLRPQMRELSAFLRQRKAAGARVF
      xca --MTE-------------GEGRIQLEPSWKARVGEWLLQPQMQELSAFLRQRKAANARVF
      xfa --MNEQGKAINSS-----AESRIQLESSWKAHVGNWLLRPEMRDLSSFLRARKVAGVSVY
      pfl MTMTA--------------DDRIKLEPSWKEALRAEFDQPYMTELRTFLQQERAAGKEIY
      psy --MTS--------------DDRIKLEPSWKEALRDEFEQPYMAQLREFLRQEHAAGKEIY
      ppu --MTD--------------DDRIKLEPSWKAALRGEFDQPYMHQLREFLRGEYAAGKEIY
      pae --MTDN-------------DDRIKLEASWKEALREEFDKPYMKQLGEFLRQEKAAGKAIF
      avi --MGRV-------------EDRVRLEASWKEALHDEFEKPYMQELSDFLRREKAAGKEIY
      mde --MQPN-------------GKHVQLCESWMQQIGQEFEQPYMAELKAFLLREKKAGKTIY
      * : : ::
      Clustal file format: vibrio.dnd
      ( hso:0.11940, ( hdu:0.08584, apl:0.08905) :0.03531) :0.00478, pmu:0.11739) :0.00668, hin:0.10800) :0.04106, ( ( ( sfl:0.00482, eco:0.00833) :0.03744, sen:0.05007) :0.11285, ( ype:0.12645, ( ( vvu:0.07310, vpa:0.07734) :0.03829, vch:0.09446) :0.02842) :0.00533) :0.01680) :0.01604,
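The .dnd file stores the guide tree in Newick notation: nested parentheses with `name:branch_length` entries. Pulling out the leaf names and their branch lengths does not need a full Newick parser. A regular expression is enough for a quick look, as in the sketch below, which is run on a small piece of the tree shown here; the function name is hypothetical.

```python
import re

# A leaf is a name immediately followed by ':' and a number; internal
# branches such as ') :0.03531' have no name in front of the colon.
LEAF = re.compile(r"([A-Za-z_]\w*):([0-9.]+)")

def leaf_branch_lengths(newick_text):
    """Return [(leaf_name, branch_length), ...] in order of appearance."""
    return [(name, float(length)) for name, length in LEAF.findall(newick_text)]

fragment = "( ( hdu:0.08584, apl:0.08905) :0.03531, hso:0.11940)"
leaves = leaf_branch_lengths(fragment)
for name, length in leaves:
    print(name, length)
```

This ignores the tree topology entirely. For real work a dedicated Newick reader from a phylogenetics library is the safer choice.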

  27. Sequence Comparison Multiple alignment Multiple sequence alignment – T-Coffee • T-Coffee uses a principle similar to that of ClustalW. • It yields more accurate alignments at the cost of computing time. • It builds a progressive alignment as ClustalW does, but • creates a library containing a complete collection of global (ClustalW) and local (Lalign) alignments, and thus • compares segments across the entire data set.

  29. Sequence Comparison Multiple alignment Multiple sequence alignment - T-Coffee Color code: RED: high-quality segments, then YELLOW, GREEN, down to BLUE: regions that you have no reason to trust
