Developing Pairwise Sequence Alignment Algorithms

Presentation Transcript


  1. Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez

  2. Outline
     • Group assignments for project
     • Overview of global and local alignment
     • References for sequence alignment algorithms
     • Discussion of Needleman-Wunsch iterative approach to global alignment
     • Discussion of Smith-Waterman recursive approach to local alignment
     • Discussion of LCS algorithm and how it can be extended for
       • Global alignment (Needleman-Wunsch)
       • Local alignment (Smith-Waterman)
       • Affine gap penalties

  3. Overview of Pairwise Sequence Alignment
     • Dynamic programming
       • Applied to optimization problems
       • Useful when
         • The problem can be recursively divided into sub-problems
         • The sub-problems are not independent
     • Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (it could be extended to a fixed gap penalty).
     • Smith-Waterman is a local alignment technique that uses a recursive algorithm and can use alternative gap penalties (such as affine). The Smith-Waterman algorithm is an extension of the Longest Common Subsequence (LCS) problem and can be generalized to solve both local and global alignment.
     • Note: "Needleman-Wunsch" is usually used to refer to global alignment regardless of the algorithm used.

  4. Project References
     • http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
     • Pavel Pevzner, Computational Molecular Biology: An Algorithmic Approach
     • Michael Waterman, Introduction to Computational Biology: Maps, Sequences, and Genomes
     • Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology

  5. Classic Papers
     • Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://www.cs.umd.edu/class/spring2003/cmsc838t/papers/needlemanandwunsch1970.pdf)
     • Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://www.cmb.usc.edu/papers/msw_papers/msw-042.pdf)

  6. Needleman-Wunsch (1 of 3)
     Scoring: Match = 1, Mismatch = 0, Gap = 0

  7. Needleman-Wunsch (2 of 3)

  8. Needleman-Wunsch (3 of 3)
     From page 446: "It is apparent that the above array operation can begin at any of a number of points along the borders of the array, which is equivalent to a comparison of N-terminal residues or C-terminal residues only. As long as the appropriate rules for pathways are followed, the maximum match will be the same. The cells of the array which contributed to the maximum match, may be determined by recording the origin of the number that was added to each cell when the array was operated upon."
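The scoring from slide 6 (Match = 1, Mismatch = 0, Gap = 0) can be tried out with a small dynamic-programming sketch. This is not the array operation exactly as written in the 1970 paper; it is a simplified three-neighbor recurrence, and the function name `max_match` and its parameters are assumptions made for illustration. The last line checks the quoted claim that starting from either end of the sequences gives the same maximum match.

```python
def max_match(a, b, match=1, mismatch=0, gap=0):
    """Global 'maximum match' score, filled row by row (iteratively)."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # Border cells: aligning a prefix against nothing costs only gaps.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + gap
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                          s[i - 1][j] + gap,       # gap in b
                          s[i][j - 1] + gap)       # gap in a
    return s[n][m]


forward = max_match("ATCTGAT", "TGCATA")
backward = max_match("ATCTGAT"[::-1], "TGCATA"[::-1])
print(forward, backward)  # both directions give the same maximum match
```

With a zero mismatch score and a zero gap penalty, the maximum match simply counts the largest number of residues that can be paired in order.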

  9. Smith-Waterman (1 of 3)
     Algorithm: The two molecular sequences will be $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$. A similarity $s(a,b)$ is given between sequence elements $a$ and $b$. Deletions of length $k$ are given weight $W_k$. To find pairs of segments with high degrees of similarity, we set up a matrix $H$. First set
     $$H_{k0} = H_{0l} = 0 \quad \text{for } 0 \le k \le n \text{ and } 0 \le l \le m.$$
     Preliminary values of $H$ have the interpretation that $H_{ij}$ is the maximum similarity of two segments ending in $a_i$ and $b_j$, respectively. These values are obtained from the relationship
     $$H_{ij} = \max\Bigl\{ H_{i-1,j-1} + s(a_i,b_j),\ \max_{k \ge 1}\{H_{i-k,j} - W_k\},\ \max_{l \ge 1}\{H_{i,j-l} - W_l\},\ 0 \Bigr\}, \quad 1 \le i \le n,\ 1 \le j \le m. \tag{1}$$

  10. Smith-Waterman (2 of 3)
     The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$.
     (1) If $a_i$ and $b_j$ are associated, the similarity is $H_{i-1,j-1} + s(a_i,b_j)$.
     (2) If $a_i$ is at the end of a deletion of length $k$, the similarity is $H_{i-k,j} - W_k$.
     (3) If $b_j$ is at the end of a deletion of length $l$, the similarity is $H_{i,j-l} - W_l$. (typo in paper)
     (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.

  11. Smith-Waterman (3 of 3)
     The pair of segments with maximum similarity is found by first locating the maximum element of H. The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of H equal to zero. This procedure identifies the segments as well as produces the corresponding alignment. The pair of segments with the next best similarity is found by applying the traceback procedure to the second largest element of H not associated with the first traceback.
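A minimal sketch of relationship (1) with a general gap weight W_k, followed by the first step of slide 11 (locating the maximum element of H). The names `smith_waterman`, `example_similarity`, and `example_gap_weight`, and the numeric scores they use, are hypothetical choices for illustration and are not taken from the paper. The traceback itself is left out to keep the sketch short.

```python
def example_similarity(x, y):
    # Hypothetical s(a, b): reward matches, penalize mismatches.
    return 2 if x == y else -1


def example_gap_weight(k):
    # Hypothetical W_k: a deletion of length k costs 1 + k.
    return 1 + k


def smith_waterman(a, b, s=example_similarity, w=example_gap_weight):
    """Fill H using relationship (1); return H, its maximum, and where it occurs."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # H[k][0] = H[0][l] = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            # a_i at the end of a deletion of length k, for every k >= 1
            del_a = max(H[i - k][j] - w(k) for k in range(1, i + 1))
            # b_j at the end of a deletion of length l, for every l >= 1
            del_b = max(H[i][j - l] - w(l) for l in range(1, j + 1))
            H[i][j] = max(diag, del_a, del_b, 0)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


H, best, best_pos = smith_waterman("TTACGTT", "GGACGGG")
print(best, best_pos)
```

Because every deletion length is tried at every cell, this direct form takes time proportional to n·m·(n + m); it is meant to mirror the formula, not to be efficient.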

  12. Longest Common Subsequence (LCS) Problem
     • Reference: Pevzner
     • Can have insertions and deletions but no substitutions (no mismatches)
     • Ex: V: ATCTGAT, W: TGCATA, LCS: TCTA

  13. LCS Problem (cont.)
     • Similarity score
       $$s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1, & \text{if } v_i = w_j \end{cases}$$
     • On board example: Pevzner Fig 6.1

  14. Indels: insertions and deletions (i.e., gaps)
     • Alignment of V and W
       • V = rows of similarity matrix (vertical axis)
       • W = columns of similarity matrix (horizontal axis)
     • Space (gap) in W → (UP): insertion
     • Space (gap) in V → (LEFT): deletion
     • Match (no mismatch in LCS) → (DIAG)

  15. LCS(V,W) Algorithm (n = length of V, m = length of W)
         for i = 0 to n
             s_{i,0} = 0
         for j = 0 to m
             s_{0,j} = 0
         for i = 1 to n
             for j = 1 to m
                 if v_i = w_j
                     s_{i,j} = s_{i-1,j-1} + 1; b_{i,j} = DIAG
                 else if s_{i-1,j} >= s_{i,j-1}
                     s_{i,j} = s_{i-1,j}; b_{i,j} = UP
                 else
                     s_{i,j} = s_{i,j-1}; b_{i,j} = LEFT

  16. PRINT-LCS(b, V, i, j)
         if i = 0 or j = 0
             return
         if b_{i,j} = DIAG
             PRINT-LCS(b, V, i-1, j-1)
             print v_i
         else if b_{i,j} = UP
             PRINT-LCS(b, V, i-1, j)
         else
             PRINT-LCS(b, V, i, j-1)
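One possible Python rendering of the LCS and PRINT-LCS pseudocode is sketched below. It assumes Python 3, 0-based strings padded by a zero row and column, and a variant of PRINT-LCS that returns the subsequence as a string instead of printing one character at a time; the names `lcs` and `lcs_string` are illustrative.

```python
DIAG, UP, LEFT = "DIAG", "UP", "LEFT"


def lcs(v, w):
    """Fill the similarity matrix s and the backtracking matrix b."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]     # row 0 and column 0 stay 0
    b = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:              # v_i = w_j (strings are 0-based)
                s[i][j] = s[i - 1][j - 1] + 1
                b[i][j] = DIAG
            elif s[i - 1][j] >= s[i][j - 1]:
                s[i][j] = s[i - 1][j]
                b[i][j] = UP
            else:
                s[i][j] = s[i][j - 1]
                b[i][j] = LEFT
    return s, b


def lcs_string(b, v, i, j):
    """Follow the backtracking pointers from (i, j) and collect matched characters."""
    if i == 0 or j == 0:
        return ""
    if b[i][j] == DIAG:
        return lcs_string(b, v, i - 1, j - 1) + v[i - 1]
    if b[i][j] == UP:
        return lcs_string(b, v, i - 1, j)
    return lcs_string(b, v, i, j - 1)


V, W = "ATCTGAT", "TGCATA"
s, b = lcs(V, W)
print(s[len(V)][len(W)], lcs_string(b, V, len(V), len(W)))  # prints: 4 TCTA
```

The recursion depth grows with the combined length of the sequences, so for long inputs an iterative traceback loop would be a safer choice.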

  17. Extend LCS to Global Alignment
     $$s_{i,j} = \max \begin{cases} s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}$$
     • $\delta(v_i, -) = \delta(-, w_j) = -\sigma$ = fixed gap penalty
     • $\delta(v_i, w_j)$ = score for match or mismatch; can be fixed or taken from PAM or BLOSUM
     • Modify LCS and PRINT-LCS algorithms to support global alignment (on board discussion)
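A hedged sketch of how the LCS code could be modified for global alignment with a fixed gap penalty. The scores (match = 1, mismatch = -1, sigma = 1) and the name `global_align` are assumptions for illustration; a PAM or BLOSUM lookup could replace the match/mismatch test. The border cells now accumulate gap penalties instead of staying at zero, and the traceback emits both aligned strings.

```python
def global_align(v, w, match=1, mismatch=-1, sigma=1):
    """Global alignment with fixed gap penalty sigma; returns (score, aligned v, aligned w)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    b = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                      # leading gaps in w
        s[i][0], b[i][0] = -sigma * i, "UP"
    for j in range(1, m + 1):                      # leading gaps in v
        s[0][j], b[0][j] = -sigma * j, "LEFT"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delta = match if v[i - 1] == w[j - 1] else mismatch
            options = [(s[i - 1][j - 1] + delta, "DIAG"),
                       (s[i - 1][j] - sigma, "UP"),
                       (s[i][j - 1] - sigma, "LEFT")]
            s[i][j], b[i][j] = max(options, key=lambda o: o[0])
    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if b[i][j] == "DIAG":
            top.append(v[i - 1]); bottom.append(w[j - 1]); i -= 1; j -= 1
        elif b[i][j] == "UP":
            top.append(v[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1]); j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, aligned_v, aligned_w = global_align("ACGT", "AGT")
print(score)
print(aligned_v)
print(aligned_w)
```

Ties between DIAG, UP, and LEFT are broken in that order here; a different order can produce a different, equally scoring alignment.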

  18. Extend to Local Alignment
     $$s_{i,j} = \max \begin{cases} 0 & \text{(no negative scores)} \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}$$
     • $\delta(v_i, -) = \delta(-, w_j) = -\sigma$ = fixed gap penalty
     • $\delta(v_i, w_j)$ = score for match or mismatch; can be fixed or taken from PAM or BLOSUM
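The local variant differs in three places: zero is added as an option in the recurrence, the traceback starts at the largest cell instead of the corner, and it stops when a zero is reached. The sketch below assumes the same illustrative scores as before (match = 1, mismatch = -1, sigma = 1); `local_align` is a hypothetical name. Instead of storing a pointer matrix, it rechecks which option produced each cell during traceback.

```python
def local_align(v, w, match=1, mismatch=-1, sigma=1):
    """Local alignment with fixed gap penalty; returns (score, segment of v, segment of w)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0

    def delta(i, j):
        return match if v[i - 1] == w[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(0,                              # no negative scores
                          s[i - 1][j - 1] + delta(i, j),
                          s[i - 1][j] - sigma,
                          s[i][j - 1] - sigma)
            if s[i][j] > best:
                best, bi, bj = s[i][j], i, j
    # Traceback from the maximum cell until a zero is reached.
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and s[i][j] > 0:
        if s[i][j] == s[i - 1][j - 1] + delta(i, j):
            top.append(v[i - 1]); bottom.append(w[j - 1]); i -= 1; j -= 1
        elif s[i][j] == s[i - 1][j] - sigma:
            top.append(v[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("TTACGTT", "GGACGGG"))  # prints: (3, 'ACG', 'ACG')
```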

  19. Gap Penalties
     • Gap penalties account for the introduction of a gap (on the evolutionary model, an insertion or deletion mutation) in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations.
     Source: http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance

  20. Discussion on adding affine gap penalties
     • Affine gap penalty
       • Score for a gap of length $x$: $-(\rho + \sigma x)$
       • Where
         • $\rho > 0$ is the insert gap penalty
         • $\sigma > 0$ is the extend gap penalty

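  21. Alignment with Gap Penalties
     Can apply to global or local (w/ zero) algorithms
     $$s^{\mathrm{UP}}_{i,j} = \max \begin{cases} s^{\mathrm{UP}}_{i-1,j} - \sigma \\ s_{i-1,j} - (\rho + \sigma) \end{cases}$$
     $$s^{\mathrm{LEFT}}_{i,j} = \max \begin{cases} s^{\mathrm{LEFT}}_{i,j-1} - \sigma \\ s_{i,j-1} - (\rho + \sigma) \end{cases}$$
     $$s_{i,j} = \max \begin{cases} s_{i-1,j-1} + \delta(v_i, w_j) \\ s^{\mathrm{UP}}_{i,j} \\ s^{\mathrm{LEFT}}_{i,j} \end{cases}$$
     Note: in keeping with the traversal order in Figure 6.1, the two gap matrices are labeled by traceback direction, UP and LEFT, as on slide 14.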
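A minimal sketch of the three-matrix recurrence for global alignment with an affine gap penalty, computing the score only. The parameter values (match = 1, mismatch = -1, rho = 2, sigma = 1) and the name `affine_global_score` are assumptions for illustration. Cells that cannot be reached are initialized to negative infinity so they never win a `max`.

```python
NEG_INF = float("-inf")


def affine_global_score(v, w, match=1, mismatch=-1, rho=2, sigma=1):
    """Global alignment score where a gap of length x costs rho + sigma * x."""
    n, m = len(v), len(w)
    s = [[NEG_INF] * (m + 1) for _ in range(n + 1)]     # best score overall
    up = [[NEG_INF] * (m + 1) for _ in range(n + 1)]    # alignment ends with a gap in w
    left = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in v
    s[0][0] = 0
    for i in range(1, n + 1):       # first column: one gap of length i
        up[i][0] = -(rho + sigma * i)
        s[i][0] = up[i][0]
    for j in range(1, m + 1):       # first row: one gap of length j
        left[0][j] = -(rho + sigma * j)
        s[0][j] = left[0][j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            up[i][j] = max(up[i - 1][j] - sigma,             # extend an open gap
                           s[i - 1][j] - (rho + sigma))      # open a new gap
            left[i][j] = max(left[i][j - 1] - sigma,
                             s[i][j - 1] - (rho + sigma))
            delta = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + delta, up[i][j], left[i][j])
    return s[n][m]


# Six matches and one gap of length 3: 6 - (2 + 3) = 1
print(affine_global_score("AAAGGGTTT", "AAATTT"))  # prints: 1
```

For the local version, a zero would be added to the `max` for `s[i][j]` and the borders of `s` would be set to zero, following slide 18.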

  23. Source: http://www.apl.jhu.edu/~przytyck/Lect03_2005.pdf

  29. Scopes
     • Scopes define the "visibility" of a variable
     • Variables defined outside of a function are visible to all of the functions within a module (file)
     • Variables defined within a function are local to that function
     • To make a variable that is assigned within a function global, use the global keyword

     Ex 1:
         x = 5
         def fnc():
             x = 2              # local x; the module-level x is unchanged
             print(x, end=" ")
         fnc()
         print(x)
     Output: 2 5

     Ex 2:
         x = 5
         def fnc():
             global x           # assignments now rebind the module-level x
             x = 2
             print(x, end=" ")
         fnc()
         print(x)
     Output: 2 2

  30. Modules
     • Why use?
       • Code reuse
       • System namespace partitioning (avoid name clashes)
       • Implementing shared services or data
     • How to structure a program
       • One top-level file
         • Main control flow of program
       • Zero or more supplemental files known as modules
         • Libraries of tools

  31. Modules - Import
     • Import: used to gain access to tools in modules

     Ex: contents of file b.py
         def spam(text):
             print(text, 'spam')

     contents of file a.py
         import b
         b.spam('gumby')

  32. Programming Workshop and Homework: Implement LCS
     • Workshop: Write a Python script to implement LCS(V, W). Prompt the user for 2 sequences (V and W) and display b and s.
     • Homework (due Tuesday, May 20th): Add the PRINT-LCS(b, V, i, j) function to your Python script. The script should prompt the user for 2 sequences and print the longest common subsequence.

  33. Project Teams and Presentation Assignments
     • Pre-Project (PAM/BLOSUM Matrix Creation): Ricardo Galdamez and Heather Ashley
     • Base Project (Global Alignment): Maria Ortega and Winta Stefanos
     • Extension 1 (Ends-Free Global Alignment): Mohammed Ali and Bingyan Wang
     • Extension 2 (Local Alignment): DeWayne Anderson and Yisel Tobar
     • Extension 3 (Database): John Tran and Tan Truong
     • Extension 4 (Local Alignment, print all alignments): Maria Ho and Aras Pirbadian
     • Extension 5 (Affine Gap Penalty): Jun Nakano and David Pachiden