
Multiple Sequence Alignment


Presentation Transcript


  1. Multiple Sequence Alignment: Dynamic Programming

  2. Multiple Sequence Alignment
     VTISCTGSSSNIGAGNHVKWYQQLPG
     VTISCTGTSSNIGSITVNWYQQLPG
     LRLSCSSSGFIFSSYAMYWVRQAPG
     LSLTCTVSGTSFDDYYSTWVRQPPG
     PEVTCVVVDVSHEDPQVKFNWYVDG
     ATLVCLISDFYPGAVTVAWKADS
     ATLVCLISDFYPGAVTVAWKADS
     AALGCLVKDYFPEPVTVSWNSG-
     VSLTCLVKGFYPSDIAVEWESNG-
     • Goal: Bring the greatest number of similar characters into the same column of the alignment
     • Similar to alignment of two sequences.

  3. CLUSTALW MSA (figure): MSA of four oxidoreductase NAD binding domain protein sequences. Color key: Red: AVFPMILW. Blue: DE. Magenta: RHK. Green: STYHCNGQ. Grey: all others. Residue ranges are shown after sequence names. Chenna et al., Nucleic Acids Research, 2003, Vol. 31, No. 13, 3497-3500

  4. Multiple Sequence Alignment: Motivation • Correspondence. Find out which parts “do the same thing” • Similar genes are conserved across widely divergent species, often performing similar functions • Structure prediction • Use knowledge of structure of one or more members of a protein MSA to predict structure of other members • Structure is more conserved than sequence • Create “profiles” for protein families • Allow us to search for other members of the family • Genome assembly: Automated reconstruction of “contig” maps of genomic fragments such as ESTs • MSA is the starting point for phylogenetic analysis

  5. Multiple Sequence Alignment: Approaches • Optimal Global Alignments - Dynamic programming • Generalization of Needleman-Wunsch • Find the alignment that maximizes a score function • Computationally expensive: Time grows as the product of the sequence lengths • Global Progressive Alignments - Match closely-related sequences first using a guide tree • Global Iterative Alignments - Multiple re-building attempts to find the best alignment • Local alignments • Profiles, Blocks, Patterns

  6. Scoring a multiple alignment (figure: three scoring schemes) • Sum of pairs • Star • Tree

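  7. Sum of Pairs, example: five sequences AAA, AAA, AAA, AAC, ACC, scoring α for each matching pair and -β for each mismatching pair within a column • Column 1 (A A A A A): 10α • Column 2 (A A A A C): 6α - 4β • Column 3 (A A A C C): 4α - 6β • Sum of Pairs = 10α + (6α - 4β) + (4α - 6β) = 20α - 10β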

  8. Sum-of-Pairs Scoring Function: Score of multiple alignment = ∑_{i<j} score(Si, Sj), where score(Si, Sj) = score of the induced pairwise alignment of Si and Sj
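
The sum-of-pairs definition can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation: the names `pair_score` and `sum_of_pairs` are hypothetical, and the default scores (match = α = 1, mismatch = -β = -1, gap = -1, gap against gap = 0) are assumptions chosen only for the example.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one aligned pair of characters; '-' denotes a gap (assumed scores)."""
    if a == "-" and b == "-":
        return 0  # such a column vanishes from the induced pairwise alignment
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(alignment, **scores):
    """Sum the induced pairwise scores over all pairs of rows i < j."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for row_i, row_j in combinations(alignment, 2):
        total += sum(pair_score(a, b, **scores) for a, b in zip(row_i, row_j))
    return total

# The five-sequence example from the previous slide: 20α - 10β.
rows = ["AAA", "AAA", "AAA", "AAC", "ACC"]
print(sum_of_pairs(rows, match=1, mismatch=-1))  # 20*1 - 10*1 = 10
```

Summing row pair by row pair gives the same total as summing column by column, which is why the result agrees with the per-column arithmetic on the example slide.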

  9. Induced Pairwise Alignment
     S1  S - T I S C T G - S - N I
     S2  L - T I - C N G S S - N I
     S3  L R T I S C S G F S Q N I
     Induced pairwise alignment of S1, S2 (columns in which both rows have a gap are dropped):
     S1  S T I S C T G - S N I
     S2  L T I - C N G S S N I
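
As a small sketch of how an induced pairwise alignment can be extracted, the function below keeps two rows of a multiple alignment and removes every column in which both rows hold a gap. The function name `induced_pairwise` is hypothetical; the rows are the S1, S2, S3 example from this slide written without spaces.

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Project an MSA onto two of its rows, dropping gap-only columns."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

s1 = "S-TISCTG-S-NI"
s2 = "L-TI-CNGSS-NI"
s3 = "LRTISCSGFSQNI"

print(induced_pairwise(s1, s2))  # ('STISCTG-SNI', 'LTI-CNGSSNI')
```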

  10. MSA: Dynamic Programming • The two-sequence alignment algorithm can be generalized to any number of sequences. • E.g., for three sequences X, Y, W define C[i,j,k] = score of the optimum alignment among X[1..i], Y[1..j], W[1..k] • As for two sequences, divide the possible alignments into different classes, depending on how they end. • Use these classes to devise recurrence relations for C[i,j,k] • C[i,j,k] is the maximum over all possibilities

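  11. MSA: 7 ways an alignment can end for 3 sequences • The alignment covers X1 . . . Xi-1 Xi, Y1 . . . Yj-1 Yj, and W1 . . . Wk-1 Wk • Its last column (X, Y, W) is one of: (Xi, Yj, Wk), (-, Yj, Wk), (Xi, -, Wk), (Xi, Yj, -), (Xi, -, -), (-, Yj, -), (-, -, Wk)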

  12. Dynamic programming for three sequences • Each alignment is a path through the dynamic programming matrix, beginning at Start • Figure: the path for sequences VSNS, SNA, and AS that corresponds to the alignment
      V S N - S
      - S N A -
      - - - A S

  13. Dynamic Programming for Three Sequences • There are 7 ways to get to C[i,j,k]: from C[i-1,j-1,k-1], C[i-1,j-1,k], C[i-1,j,k-1], C[i,j-1,k-1], C[i-1,j,k], C[i,j-1,k], or C[i,j,k-1] • Enumerate all possibilities and choose the best one • For 3 seqs. of length n, time is proportional to n^3
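
The three-sequence recurrence can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: columns are scored by sum of pairs with assumed values (match +1, mismatch -1, gap -1, gap against gap 0), only the optimal score is returned (no traceback), and the names `pair_score`, `column_score`, `MOVES`, and `align3_score` are hypothetical.

```python
from itertools import product

def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Assumed pair scores; a gap aligned with a gap costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def column_score(x, y, w):
    """Sum-of-pairs score of a single three-row column."""
    return pair_score(x, y) + pair_score(x, w) + pair_score(y, w)

# The 7 ways a column can consume characters: every (di, dj, dk) in {0,1}^3
# except (0, 0, 0), which would be a column of three gaps.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]

def align3_score(X, Y, W):
    """C[i][j][k] = best score aligning X[:i], Y[:j], W[:k]; returns C[n][m][p]."""
    n, m, p = len(X), len(Y), len(W)
    neg_inf = float("-inf")
    C = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    C[0][0][0] = 0
    # Lexicographic order guarantees every predecessor cell is already filled.
    for i, j, k in product(range(n + 1), range(m + 1), range(p + 1)):
        if i == j == k == 0:
            continue
        best = neg_inf
        for di, dj, dk in MOVES:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue  # this move would step outside the table
            column = (X[pi] if di else "-",
                      Y[pj] if dj else "-",
                      W[pk] if dk else "-")
            best = max(best, C[pi][pj][pk] + column_score(*column))
        C[i][j][k] = best
    return C[n][m][p]

print(align3_score("VSNS", "SNA", "AS"))
```

Each of the (n+1)(m+1)(p+1) cells examines at most 7 predecessors, which is where the cubic running time for three sequences comes from.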

  14. Dynamic Programming MSA: General Case • For k sequences of length n, the dynamic programming algorithm does (2^k - 1) n^k operations • Example: 6 sequences of length 100 require about 6.3 × 10^13 calculations • Space for the table is n^k • Implementations (e.g., WashU MSA 2.1) use tricks and search only a subset of the dynamic programming table • Even this is expensive. E.g., Baylor CM Search launcher limits MSA to 8 sequences of 800 characters and 10 minutes processing time
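
To get a feel for how quickly the cost grows, the short sketch below evaluates the operation count, read here as (2^k - 1) * n^k, and the table size n^k. The function names `dp_operations` and `dp_table_cells` are hypothetical.

```python
def dp_operations(k, n):
    """Operation count for k sequences of length n: (2^k - 1) * n^k."""
    return (2**k - 1) * n**k

def dp_table_cells(k, n):
    """Number of cells in the k-dimensional table: n^k."""
    return n**k

for k in (2, 3, 6):
    print(k, f"{dp_operations(k, 100):.2e}", f"{dp_table_cells(k, 100):.2e}")
# For k = 6, n = 100 the count is 63 * 10^12, i.e. 6.30e+13 operations.
```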

  15. Problems with SP scoring • Pair-wise comparisons can over-score evolutionarily distant pairs. • Reason: For 3 or more sequences, SP scoring does not correspond to any evolutionary tree

  16. Overcoming problems with SP scoring • Use weights to incorporate evolution in sum of pairs scoring: • Some pair-wise alignments are more important than others • E.g., more important to have a good alignment between mouse and human sequences than between mouse and bird • Assign different weights to different pair-wise alignments. • Weight decreases with evolutionary distance. • Use star tree approach • One sequence is assigned as the ancestor and all others are compared against it.
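
A weighted sum of pairs can be sketched by multiplying each induced pairwise score by a per-pair weight. Everything specific here is an assumption for illustration: the function names, the pair scores (match +1, mismatch -1, gap -1), and especially the weight values, which are made up rather than derived from any real evolutionary distances.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Assumed pair scores; a gap aligned with a gap costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def weighted_sum_of_pairs(alignment, weights, default=1.0):
    """Sum of w(i, j) * score(Si, Sj) over all row pairs i < j.

    `weights` maps (i, j) with i < j to a weight; missing pairs get `default`.
    """
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        score = sum(pair_score(a, b) for a, b in zip(alignment[i], alignment[j]))
        total += weights.get((i, j), default) * score
    return total

rows = ["AAA", "AAC", "ACC"]
# Hypothetical weights: rows 0 and 1 are treated as close relatives,
# rows 0 and 2 as distant ones.
weights = {(0, 1): 2.0, (0, 2): 0.5, (1, 2): 1.0}
print(weighted_sum_of_pairs(rows, weights))  # 2.0*1 + 0.5*(-1) + 1.0*1 = 2.5
```

With all weights equal to 1 this reduces to the plain sum-of-pairs score.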

  17. Star Alignments • Construct multiple alignments using pair-wise alignment relative to a fixed sequence • Out of a set S = {S1, S2, . . . , Sr} of sequences, pick the sequence Sc that maximizes star_score(c) = ∑ {sim(Sc, Si) : 1 ≤ i ≤ r, i ≠ c}, where sim(Si, Sj) is the optimal score of a pair-wise alignment between Si and Sj

  18. Algorithm • Step 1: Compute sim(Si, Sj) for every pair (i,j) • Step 2: Compute star_score(i) for every i • Step 3: Choose the index c that maximizes star_score(c) and make it the center of the star • Step 4: Produce a multiple alignment M such that, for every i, the induced pairwise alignment of Sc and Si is the same as the optimum alignment of Sc and Si.
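
The first three steps can be sketched as below, picking the center with the highest total similarity, in line with the star_score definition on slide 17. The sketch assumes sim is a Needleman-Wunsch global alignment score with a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -1); the function name `star_center` and the toy sequences are hypothetical.

```python
def sim(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (linear gap penalty, assumed scores)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def star_center(seqs):
    """Return (index of the center sequence, list of star scores)."""
    scores = [
        sum(sim(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    center = max(range(len(seqs)), key=scores.__getitem__)
    return center, scores

center, scores = star_center(["AAAA", "AAAT", "TTTT"])
print(center, scores)  # 1 [-2, 0, -6]
```

In practice each sim(Si, Sj) would be computed once and reused, since sim is symmetric; the sketch recomputes it for brevity.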

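  19. Step 4: Detail
      Pairwise alignment of Sc and S2:
      Sc  A-ACC-TT
      S2  AGACCGT-
      Pairwise alignment of Sc and S1:
      Sc  AA--CCTT
      S1  AATGCC--
      Resulting multiple alignment:
      Sc  A-A--CC-TT
      S1  A-ATGCC---
      S2  AGA--CCGT-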
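
One way to carry out this merge is sketched below: for every position of the center sequence, take the largest number of gaps that any pairwise alignment places in front of it, then pad the other alignments up to that number. This is an illustrative sketch rather than the presenter's procedure; the names `gap_profile` and `merge_star` are hypothetical, and placing the padding after a row's existing characters within a gap run is an arbitrary choice.

```python
def gap_profile(center_row, gap="-"):
    """gaps[p] = gaps directly before center residue p; the last entry counts trailing gaps."""
    gaps = [0]
    for ch in center_row:
        if ch == gap:
            gaps[-1] += 1
        else:
            gaps.append(0)
    return gaps

def merge_star(pairwise, gap="-"):
    """Merge (center_row, other_row) alignments that share one center sequence.

    Returns the rows of the multiple alignment, center first.
    """
    center = pairwise[0][0].replace(gap, "")
    if any(c_row.replace(gap, "") != center for c_row, _ in pairwise):
        raise ValueError("all alignments must use the same center sequence")
    profiles = [gap_profile(c_row, gap) for c_row, _ in pairwise]
    need = [max(col) for col in zip(*profiles)]  # gaps required at each slot

    master = "".join(gap * g + ch for g, ch in zip(need, center)) + gap * need[-1]
    rows = [master]
    for (_, other), profile in zip(pairwise, profiles):
        out, pos = [], 0
        for p, wanted in enumerate(need):
            have = profile[p]
            out.append(other[pos:pos + have])   # characters opposite existing gaps
            out.append(gap * (wanted - have))   # padding up to the shared gap count
            pos += have
            if p < len(center):
                out.append(other[pos])          # character opposite center residue p
                pos += 1
        rows.append("".join(out))
    return rows

# The example from this slide, with S1 listed before S2.
print(merge_star([("AA--CCTT", "AATGCC--"), ("A-ACC-TT", "AGACCGT-")]))
# ['A-A--CC-TT', 'A-ATGCC---', 'AGA--CCGT-']
```

Because gaps are only ever added to the center row, each original pairwise alignment can be recovered from the result by dropping the columns in which both of its rows hold a gap.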
