Smith-Waterman Algorithm for Local Sequence Alignments
The Smith-Waterman algorithm computes local sequence alignments; the version presented first uses no affine gap penalties. It is a dynamic programming method: it fills a matrix in which matches are rewarded and mismatches and gaps are penalized, records the best local score, and recovers the alignment by trace-back. The presentation also covers sequence alignment variants (global, local, overlap), scoring schemes for matches, mismatches, and indels, and the motivation for affine gap penalties and the Gotoh algorithm.
Presentation Transcript
Local Alignments Without Affine Gap Penalties: Smith and Waterman
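The Smith and Waterman Algorithm
\[
F(i,j) = \text{best}
\begin{cases}
F(i-1,j-1) + Mat[i,j] & \text{1…i-1 vs 1…j-1, then X over X}\\
F(i-1,j) + Gep & \text{1…i-1 vs 1…j, then X over -}\\
F(i,j-1) + Gep & \text{1…i vs 1…j-1, then - over X}\\
0
\end{cases}
\]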
The Smith and Waterman Algorithm
Choosing 0 means: ignore the rest of the matrix, i.e. terminate a local alignment at this cell.
Filling up a SW matrix: borders
Easy: local alignments NEVER start/end with a gap…

*  -  A  N  I  C  E  C  A  T
-  0  0  0  0  0  0  0  0  0
C  0
A  0
T  0
A  0
N  0
D  0
O  0
G  0
Filling up a SW matrix
Match=2 MisMatch=-1 Gap=-1

*  -  A  N  I  C  E  C  A  T
-  0  0  0  0  0  0  0  0  0
C  0  0  0  0  2  0  2  0  0
A  0  2  0  0  0  0  0  4  0
T  0  0  0  0  0  0  0  2  6
A  0  2  0  0  0  0  0  0  4
N  0  0  4  2  0  0  0  0  2
D  0  0  2  2  0  0  0  0  0
O  0  0  0  0  0  0  0  0  0
G  0  0  0  0  0  0  0  0  0

Best local score: 6 (row T, column T). This cell is the beginning of the trace-back.
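Turning NW into SW

    # $smat is the score matrix and $tb the trace-back matrix. Row 0 and
    # column 0 are assumed to be initialised (0 / $stopcode) before this loop,
    # and $gep is the (negative) gap penalty.
    for ($i=1; $i<=$len0; $i++) {
      for ($j=1; $j<=$len1; $j++) {
        # Match/mismatch score for residue i of sequence 0 and residue j of sequence 1
        if ($res0[0][$i-1] eq $res1[0][$j-1]) {$s=2;} else {$s=-1;}

        # Read from the same matrix that is written below ($smat)
        $sub=$smat[$i-1][$j-1]+$s;
        $del=$smat[$i  ][$j-1]+$gep;
        $ins=$smat[$i-1][$j  ]+$gep;

        # Turning NW into SW: a cell never drops below 0; a 0 stops the trace-back
        if    ($sub>$del && $sub>$ins && $sub>0) {$smat[$i][$j]=$sub; $tb[$i][$j]=$subcode;}
        elsif ($del>$ins && $del>0)              {$smat[$i][$j]=$del; $tb[$i][$j]=$delcode;}
        elsif ($ins>0)                           {$smat[$i][$j]=$ins; $tb[$i][$j]=$inscode;}
        else                                     {$smat[$i][$j]=$zero;$tb[$i][$j]=$stopcode;}

        # Prepare trace-back: remember the best-scoring cell
        if ($smat[$i][$j]>$best_score) {
          $best_score=$smat[$i][$j];
          $best_i=$i;
          $best_j=$j;
        }
      }
    }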
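The slide stops at "prepare trace-back", so the following is a minimal sketch of the fill plus the trace-back itself. It is written in Python rather than the Perl of the slide, the function name `smith_waterman` and the labels `DIAG`, `UP`, `LEFT`, `STOP` are hypothetical, and the default scores are the Match=2, MisMatch=-1, Gap=-1 values quoted for the example matrix. Ties are resolved in favour of the diagonal move, which is a choice made here and not something the slides specify.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_s, aligned_t) for one optimal local alignment."""
    n, m = len(s), len(t)
    # Score and trace-back matrices; row 0 and column 0 stay at 0 / STOP.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    tb = [["STOP"] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # residue of s against a gap
            left = score[i][j - 1] + gap    # residue of t against a gap
            cell = max(0, sub, up, left)    # the 0 option makes the alignment local
            score[i][j] = cell
            if cell == 0:
                tb[i][j] = "STOP"
            elif cell == sub:
                tb[i][j] = "DIAG"
            elif cell == up:
                tb[i][j] = "UP"
            else:
                tb[i][j] = "LEFT"
            if cell > best:                 # remember where the trace-back starts
                best, best_i, best_j = cell, i, j

    # Trace back from the best cell until a STOP (score 0) cell is reached.
    out_s, out_t = [], []
    i, j = best_i, best_j
    while tb[i][j] != "STOP":
        if tb[i][j] == "DIAG":
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif tb[i][j] == "UP":
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))
```

Called as `smith_waterman("CATANDOG", "ANICECAT")`, the sketch returns a best score of 6 with `CAT` aligned to `CAT`, the same best local score as in the example matrix.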
Sequence Alignment Variants
Two basic variants of sequence alignment:
• Global alignment (Needleman-Wunsch)
• Local alignment (Smith-Waterman)
Further variants:
• Overlap alignment
• Affine cost for gaps
We'll use the ideas of dynamic programming presented in the lecture.
Overlap Alignment
Consider the following problem: find the most significant overlap between two sequences S and T.
[Figure: possible overlap relations (a) and (b)]
Difference from local alignment: here we require that the alignment reaches the endpoints of the two sequences.
Overlap Alignment
Formally: given S[1..n] and T[1..m], find i, j such that
d = max{ D(S[1..i], T[j..m]), D(S[i..n], T[1..j]), D(S[1..n], T[i..j]), D(S[i..j], T[1..m]) }
is maximal.
Solution: same as global alignment, except that we do not penalise overhanging ends.
Overlap Alignment
• Initialization: F[i,0] = 0, F[0,j] = 0
• Recurrence: as in global alignment
• Score: the maximum value found in the bottom row and the rightmost column
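The three rules above (zero borders, global recurrence, maximum over the last row and column) can be sketched as follows. This is a Python illustration under stated assumptions, not code from the slides: the function name `overlap_alignment_score` is hypothetical, a single linear `indel` penalty is assumed, and the default scores are those of the example scoring scheme (Match +4, Mismatch -1, Indel -5).

```python
def overlap_alignment_score(s, t, match=4, mismatch=-1, indel=-5):
    """Best overlap-alignment score of s and t (overhanging ends are free)."""
    n, m = len(s), len(t)
    # Initialization: F[i][0] = 0 and F[0][j] = 0, so leading overhangs cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Recurrence: exactly as in global alignment (no 0 option here).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            F[i][j] = max(sub, F[i - 1][j] + indel, F[i][j - 1] + indel)

    # Score: maximum over the bottom row and the rightmost column,
    # so trailing overhangs cost nothing either.
    bottom_row = max(F[n])
    rightmost_col = max(F[i][m] for i in range(n + 1))
    return max(bottom_row, rightmost_col)
```

For S = PAWHEAE and T = HEAGAWGHEE with the default scores, the function returns 11, which is the score of aligning the suffix HEAE of S with the prefix HEAG of T.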
Overlap Alignment (Example)
S = PAWHEAE
T = HEAGAWGHEE
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5
Overlap Alignment (Example)
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5
The best overlap is:
PAWHEAE------
---HEAGAWGHEE
Pay attention! A different scoring scheme, for example Indel: -2 instead of -5, could yield a different result, such as:
---PAW-HEAE
HEAGAWGHEE-
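To make the comparison concrete, the small Python sketch below scores an already gapped pair of rows while treating overhanging ends as free. The function name `score_overlap_rows` is hypothetical, and reading the "-2" on the slide as the alternative indel penalty is an assumption, although the arithmetic is consistent with it.

```python
def score_overlap_rows(row_s, row_t, match=4, mismatch=-1, indel=-5):
    """Score two equal-length gapped rows; overhanging ends are not penalised."""
    assert len(row_s) == len(row_t)
    cols = list(zip(row_s, row_t))

    # Drop the overhang at each end: the run of columns in which the same
    # row holds a gap, starting from the first (or last) column.
    for end in (0, -1):
        if cols and "-" in cols[end]:
            gapped_row = cols[end].index("-")
            while cols and cols[end][gapped_row] == "-":
                cols.pop(end)

    total = 0
    for a, b in cols:
        if a == "-" or b == "-":
            total += indel          # internal gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total
```

With Indel -5 the first alignment (PAWHEAE------ over ---HEAGAWGHEE) scores 11 and the second (---PAW-HEAE over HEAGAWGHEE-) scores 9. With Indel -2 the first still scores 11, since it has no internal gap, while the second rises to 12 and becomes the better overlap.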
Adding Affine Gap Penalties The Gotoh Algorithm
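Why Affine Gap Penalties Are Biologically Better
Affine gap penalty for a gap of length L: Cost = gop + L*gep, or Cost = gop + (L-1)*gep, where GOP is the gap opening penalty and GEP the gap extension penalty.
Parsimony: evolution takes the simplest path (so we think…).
[Figure: gap cost as a function of gap length L under an affine gap penalty]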
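The two cost formulas can be compared with a few lines of Python. The function name `affine_gap_cost`, the flag `first_residue_in_gop`, and the penalty values gop = -10 and gep = -1 are all hypothetical and chosen only for illustration.

```python
def affine_gap_cost(length, gop=-10, gep=-1, first_residue_in_gop=False):
    """Cost of one gap of the given length under an affine penalty.

    first_residue_in_gop=False: Cost = gop + L * gep
    first_residue_in_gop=True:  Cost = gop + (L - 1) * gep
    """
    if length <= 0:
        return 0
    extensions = length - 1 if first_residue_in_gop else length
    return gop + extensions * gep


# One gap of length 4 versus two separate gaps of length 2:
one_long_gap = affine_gap_cost(4)
two_short_gaps = 2 * affine_gap_cost(2)
```

With these illustrative values, `one_long_gap` is -14 and `two_short_gaps` is -24, so the single long gap is preferred. That preference for one event over several is the parsimony argument in code form.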
But Harder to Compute…
More than 3 ways to extend an alignment:
• Alignment: X over X
• Deletion: X over -
• Insertion: - over X
When a gap column is appended to an alignment such as X-XX over XXXX, is it an Opening or an Extension?
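More Questions Need to Be Asked
For instance, what is the cost of an insertion, i.e. of reaching 1…I ??- over 1…J ??X?
• Coming from 1…I-1 ??X over 1…J-1 ??X (previous column is an aligned pair): GOP
• Coming from 1…I ??- over 1…J-1 ??X (previous column is already a gap): GEP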
Solution: Maintain 3 Tables
• Ix: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an insertion in sequence X.
• Iy: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an insertion in sequence Y.
• M: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an alignment between sequence X and Y.
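The Algorithm (Global Alignment)
\[
M(i,j) = \text{best}
\begin{cases}
M(i-1,j-1) + Mat(i,j)\\
\mathrm{Ix}(i-1,j-1) + Mat(i,j)\\
\mathrm{Iy}(i-1,j-1) + Mat(i,j)
\end{cases}
\]
\[
\mathrm{Ix}(i,j) = \text{best}
\begin{cases}
M(i-1,j) + gop & \text{1…i-1 vs 1…j ends in X over X, then X over -}\\
\mathrm{Ix}(i-1,j) + gep & \text{1…i-1 vs 1…j ends in X over -, then X over -}
\end{cases}
\]
\[
\mathrm{Iy}(i,j) = \text{best}
\begin{cases}
M(i,j-1) + gop & \text{1…i vs 1…j-1 ends in X over X, then - over X}\\
\mathrm{Iy}(i,j-1) + gep & \text{1…i vs 1…j-1 ends in - over X, then - over X}
\end{cases}
\]
Initialization:
\[
M(0,0) = 0, \quad \mathrm{Ix}(0,0) = \mathrm{Iy}(0,0) = -\infty
\]
\[
M(i,0) = \mathrm{Ix}(i,0) = -Gop - (i-1)\,Gep, \quad \mathrm{Iy}(i,0) = -\infty, \quad \text{for } i = 1, \dots, n
\]
\[
M(0,j) = \mathrm{Iy}(0,j) = -Gop - (j-1)\,Gep, \quad \mathrm{Ix}(0,j) = -\infty, \quad \text{for } j = 1, \dots, m
\]
The recurrences add gop and gep as negative scores, while the initialization subtracts the corresponding magnitudes Gop and Gep.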
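A minimal Python sketch of the three-table scheme, following the recurrences and the initialization above as closely as possible. The function name `gotoh_global_score` and the default scores (match 2, mismatch -1, gop -4, gep -1) are hypothetical; gop and gep are passed as negative scores, so a gap of length L costs gop + (L-1)*gep, matching the initialization. Only the score is computed, without trace-back.

```python
NEG_INF = float("-inf")


def gotoh_global_score(x, y, match=2, mismatch=-1, gop=-4, gep=-1):
    """Global alignment score with affine gaps, using tables M, Ix and Iy."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Initialization as on the slide (gop and gep are negative here).
    M[0][0] = 0
    for i in range(1, n + 1):
        M[i][0] = Ix[i][0] = gop + (i - 1) * gep
    for j in range(1, m + 1):
        M[0][j] = Iy[0][j] = gop + (j - 1) * gep

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # M: the last column aligns x[i] with y[j].
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            # Ix: the last column puts x[i] against a gap (open or extend).
            Ix[i][j] = max(M[i - 1][j] + gop, Ix[i - 1][j] + gep)
            # Iy: the last column puts y[j] against a gap (open or extend).
            Iy[i][j] = max(M[i][j - 1] + gop, Iy[i][j - 1] + gep)

    # The optimal global alignment may end in any of the three states.
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

With the illustrative defaults, `gotoh_global_score("ACGT", "AT")` evaluates to -1: two matches (+4) and a single gap of length 2 (-4 + -1 = -5). Splitting the same two gap positions into two separate gaps would cost -8 instead.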
Linear-Space Alignment Banded Global Alignment, K-band Algorithm, …