
Multiple sequence alignment

Tutorial 5: Multiple sequence alignment. When is it used? For more than two DNA or protein sequences, to study evolutionary relations and homology (phylogenetic trees) and to detect motifs.

nerys

Presentation Transcript


  1. Tutorial 5: Multiple sequence alignment

  2. Multiple Sequence Alignment – When?
     • More than two sequences
     • DNA
     • Protein
     • Evolutionary relation
     • Homology → Phylogenetic tree
     • Detect motif
     (Figure: tree with leaves A, C, D, B.)
     Unaligned sequences:
       GTCGTAGTCGGCTCGAC
       GTCTAGCGAGCGTGAT
       GCGAAGAGGCGAGC
       GCCGTCGCGTCGTAAC
     Aligned sequences:
       GTCGTAGTCG-GC-TCGAC
       GTC-TAG-CGAGCGT-GAT
       GC-GAAG-AG-GCG-AG-C
       GCCGTCG-CG-TCGTA-AC

  3. Multiple Sequence Alignment – How?
     • Dynamic Programming
       • Finds the optimal alignment
       • Running time is exponential in the number of sequences
     • Progressive
       • Efficient
       • Heuristic (the result is not guaranteed to be optimal)
     (Figure: the same tree and the same unaligned and aligned sequences as on slide 2.)

  4. Hierarchical Clustering
     • A way to represent similarities graphically.
     • Sums up a pairwise distance matrix as a dendrogram.
     • Not all matrices can be embedded in a tree without error.
     Example sequences: TGTTAAC, TGT-AAC, TGT--AC, ATGT--C, ATGTGGC

  5. ClustalW: “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al.

  6. ClustalW
     • Progressive (incremental).
     • At each step, aligns two existing alignments or sequences.
     • Gaps present in older alignments remain fixed.
     • Uses the Neighbor Joining algorithm to build the guide tree that sets the order of alignment.

  7. Neighbor Joining Algorithm
     • An agglomerative hierarchical clustering method.
     • Constructs an unrooted tree.

  8. Neighbor Joining (not assuming equal divergence). Step-by-step summary:
     • Step 1. Calculate all pairwise distances.
     • Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
     • Step 3. Define a new node (x).
     • Step 4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
     • Step 5. Continue until two nodes remain, then connect them with an edge.
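The five steps above can be sketched in Python as follows. This is a minimal illustration, not the ClustalW implementation: the function name `neighbor_joining`, the taxon names and the distance matrix are hypothetical, and the formulas used for the relative distance and for the distances to the new node are the standard Neighbor Joining expressions, assumed here.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted tree from a symmetric distance matrix; return Newick text."""
    nodes = list(labels)                 # Newick text of each active node
    D = [list(row) for row in dist]      # work on a copy of the distance table
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]      # r[i]: distance of i from all other nodes
        # Step 2: pick the pair with the lowest relative distance.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                m = D[i][j] - (r[i] + r[j]) / (n - 2)
                if best is None or m < best[0]:
                    best = (m, i, j)
        _, i, j = best
        # Steps 3-4: new node x, its branch lengths to i and j,
        # and its distance to every remaining node.
        d_ix = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        d_jx = D[i][j] - d_ix
        keep = [k for k in range(n) if k not in (i, j)]
        to_x = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
        D = [[D[a][b] for b in keep] + [to_x[pos]] for pos, a in enumerate(keep)]
        D.append(to_x + [0.0])
        nodes = [nodes[k] for k in keep] + [
            f"({nodes[i]}:{d_ix:g},{nodes[j]}:{d_jx:g})"
        ]
    # Step 5: two nodes remain; connect them with a single edge.
    return f"({nodes[0]}:{D[0][1]:g},{nodes[1]}:0);"


# Hypothetical distances between five taxa (not the values from the slides).
labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, D))  # ((c:4,(a:2,b:3):3):2,(d:2,e:1):0);
```

The tree is unrooted, so the outermost pair of parentheses is only a writing convention: the last edge is placed entirely on one side of it. Ties in Step 2 are broken by taking the first pair found, which is an arbitrary choice.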

  9. Step 1. Calculate all pairwise distances. (Figure: nodes A, B, C, D and E.)

  10. Measuring Distance
      • Problem: as sequences diverge, the observed fraction of differences approaches the fraction expected by chance between unrelated sequences → the distance measure converges.
      • Correction: the Jukes-Cantor model.
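As an illustration of the Jukes-Cantor correction, the sketch below converts the observed fraction of differing sites p into the distance d = -(3/4) ln(1 - (4/3) p). It assumes DNA sequences that are already aligned; the function names are hypothetical, and skipping columns that contain a gap is a simplifying choice made here, not something stated on the slide.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance for DNA: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        # 0.75 is the fraction of differences expected by chance for four bases;
        # at or beyond it the corrected distance is undefined (saturation).
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGAACGTTC")  # 2 differences in 10 sites
print(p, jukes_cantor(p))
```

For small p the corrected distance stays close to p; it grows without bound as p approaches 0.75, which is how the correction counteracts the convergence described above.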

  11. Measuring Distance (cont.)
      • Euclidean distance: given a multiple sequence alignment, calculate the square root of the sum of the scores at every position between two sequences.
      • The score increases in proportion to the dissimilarity between the residues.

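  12. Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
      $M_{i,j} = D_{i,j} - \frac{r_i + r_j}{N - 2}$
      • $M_{i,j}$: relative distance between i and j
      • $D_{i,j}$: distance between i and j from the distance table
      • $r_i$: distance of i from all other sequences, $r_i = \sum_k D_{i,k}$ (likewise $r_j$ for j)
      • $N$: number of leaves (= sequences) left in the tree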

  13. Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).

  14. Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest). The remaining entries are computed in the same way. A,B is the pair with the minimal Mi,j value. The Mi,j table is used only to choose the closest pair (lowest value), not to calculate the distances.
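Step 2 can be illustrated with a short sketch. It assumes the standard Neighbor Joining relative distance, M(i,j) = D(i,j) - (r(i) + r(j)) / (N - 2), where r(i) is the sum of the distances from i to all other nodes and N is the number of nodes left. The function names and the distance table are hypothetical, not the values from the slides.

```python
def relative_distance_table(D):
    """Return the table M, with M[i][j] = D[i][j] - (r[i] + r[j]) / (N - 2)."""
    n = len(D)
    r = [sum(row) for row in D]          # distance of each node from all others
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                M[i][j] = D[i][j] - (r[i] + r[j]) / (n - 2)
    return M


def closest_pair(D):
    """Indices (i, j), i < j, of the pair with the lowest relative distance."""
    M = relative_distance_table(D)
    n = len(D)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda ij: M[ij[0]][ij[1]])


# Hypothetical distance table for five sequences A..E (rows and columns in that order).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(closest_pair(D))  # (0, 1), i.e. A and B
```

In this table D and E have the smallest raw distance (3), yet A and B have the lowest relative distance. This is why the M table, not the distance table, is used to choose the pair, while the distance table is still the one used to compute branch lengths.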

  15. Step 3. Define a new node (x). (Figure: nodes A, B, C, D and E with the new node X.)

  16. Step 4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
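As a sketch, Step 4 can be written with the standard Neighbor Joining expressions, which are assumed here. The notation is also an assumption and not taken from the slides: $r_i$ is the sum of the distances from i to all other nodes, $N$ is the number of nodes currently in the distance table, and $k$ is any node other than i and j.

```latex
\begin{align}
D_{ix} &= \frac{D_{ij}}{2} + \frac{r_i - r_j}{2\,(N - 2)} \\
D_{jx} &= D_{ij} - D_{ix} \\
D_{xk} &= \frac{D_{ik} + D_{jk} - D_{ij}}{2}
\end{align}
```

The first two lines give the branch lengths from i and j to the new node x. The third gives the row for x in the next distance table, after the rows and columns of i and j have been removed.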

  17. Step 5. Continue until two nodes remain. A new Mi,j table is calculated. (Figure: nodes A, B, C, D and E with the new nodes X and Y.)

  18. New Di,j table. Only 2 nodes are left. Let’s calculate all the distances to Z. (Figure: nodes A, B, C, D and E with the new nodes X, Y and Z.)

  19. The tree, and the same tree in Newick format:
      ((C,(D,E)),(A,B));
      (Figure: tree with leaves A, B, C, D, E, internal nodes X, Y, Z, and branch lengths 5, 9, 20, 6, 12, 4 and 10.)

  20. ClustalW - Input. http://www.ebi.ac.uk/Tools/clustalw2/index.html
      Form fields: input sequences, scoring matrix, gap scoring, output format, email address.

  21. ClustalW - Output. Match strength in decreasing order: * (identical residues in the column), : (conserved substitution), . (semi-conserved substitution).

  22. ClustalW - Output

  23. ClustalW - Output

  24. ClustalW - Output

  25. ClustalW - Output. Stages shown in the output: pairwise alignment scores, building the tree, building the alignment, final score.

  26. ClustalW - Output

  27. ClustalW - Output. The alignment shows sequence names, sequence positions, and match strength in decreasing order: * : .

  28. ClustalW - Output

  29. ClustalW - Output. Branch lengths.

  30. ClustalW - Output

  31. ClustalW - Output
