Smith-Waterman Algorithm for Local Sequence Alignments

Local Alignments Without Affine Gap Penalties: Smith and Waterman

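The Smith and Waterman Algorithm

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + \mathrm{Mat}[i,j] & \text{(X aligned with X)} \\ F(i-1,j) + \mathrm{Gep} & \text{(X aligned with a gap)} \\ F(i,j-1) + \mathrm{Gep} & \text{(gap aligned with X)} \\ 0 \end{cases}$$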

The Smith and Waterman Algorithm
A score of 0 means: ignore the rest of the matrix, i.e. terminate the local alignment.

Filling Up a SW Matrix

Filling up a SW matrix: borders
Easy: local alignments NEVER start or end with a gap, so the first row and the first column are all 0.

     -  A  N  I  C  E  C  A  T
  -  0  0  0  0  0  0  0  0  0
  C  0
  A  0
  T  0
  A  0
  N  0
  D  0
  O  0
  G  0


Filling up a SW matrix
Match=2 MisMatch=-1 Gap=-1

     -  A  N  I  C  E  C  A  T
  -  0  0  0  0  0  0  0  0  0
  C  0  0  0  0  2  0  2  0  0
  A  0  2  0  0  0  0  0  4  0
  T  0  0  0  0  0  0  0  2  6
  A  0  2  0  0  0  0  0  0  4
  N  0  0  4  2  0  0  0  0  2
  D  0  0  2  2  0  0  0  0  0
  O  0  0  0  0  0  0  0  0  0
  G  0  0  0  0  0  0  0  0  0

Best local score: 6 (row T, column T). This cell is the beginning of the trace-back.

    for ($i=1; $i<=$len0; $i++) {
      for ($j=1; $j<=$len1; $j++) {
        # substitution score: match=2, mismatch=-1
        if ($res0[0][$i-1] eq $res1[0][$j-1]) {$s=2;} else {$s=-1;}

        $sub=$smat[$i-1][$j-1]+$s;
        $del=$smat[$i  ][$j-1]+$gep;
        $ins=$smat[$i-1][$j  ]+$gep;

        # Turning NW into SW: a move is kept only if it scores above 0
        if    ($sub>$del && $sub>$ins && $sub>0) {$smat[$i][$j]=$sub; $tb[$i][$j]=$subcode;}
        elsif ($del>$ins && $del>0)              {$smat[$i][$j]=$del; $tb[$i][$j]=$delcode;}
        elsif ($ins>0)                           {$smat[$i][$j]=$ins; $tb[$i][$j]=$inscode;}
        else                                     {$smat[$i][$j]=$zero;$tb[$i][$j]=$stopcode;}

        # Prepare trace-back: remember the best-scoring cell
        if ($smat[$i][$j]>$best_score) {
          $best_score=$smat[$i][$j];
          $best_i=$i;
          $best_j=$j;
        }
      }
    }
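The fill loop above stops at recording the best cell; the trace-back itself is not shown on the slide. The following is a minimal sketch of both steps, written in Python rather than the slide's Perl so that it runs on its own. The function name `smith_waterman`, the move codes, and the tie-breaking order (substitution first) are assumptions made for illustration, not part of the original code.

```python
SUB, DEL, INS, STOP = "sub", "del", "ins", "stop"  # illustrative trace-back codes


def smith_waterman(s, t, match=2, mismatch=-1, gep=-1):
    """Return (best_score, aligned_s, aligned_t) for one best local alignment."""
    n, m = len(s), len(t)
    # Borders stay at 0: a local alignment never starts or ends with a gap.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    tb = [[STOP] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            sub = score[i - 1][j - 1] + pair
            dele = score[i][j - 1] + gep   # gap in s, residue of t
            ins = score[i - 1][j] + gep    # residue of s, gap in t
            # Keep a move only if it beats 0 (this is what makes it local).
            cell, move = 0, STOP
            for cand, code in ((sub, SUB), (dele, DEL), (ins, INS)):
                if cand > cell:
                    cell, move = cand, code
            score[i][j], tb[i][j] = cell, move
            if cell > best:
                best, best_i, best_j = cell, i, j

    # Trace back from the best cell until a STOP (score 0) cell is reached.
    out_s, out_t = [], []
    i, j = best_i, best_j
    while tb[i][j] != STOP:
        if tb[i][j] == SUB:
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif tb[i][j] == DEL:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
        else:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))
```

Called as `smith_waterman("CATANDOG", "ANICECAT")`, the sketch returns a best score of 6 with `CAT` aligned to `CAT`, the same best cell as on the slide.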

Sequence Alignment Variants
Two basic variants of sequence alignment:
• Global alignment (Needleman-Wunsch)
• Local alignment (Smith-Waterman)
Two further variants:
• Overlap alignment
• Affine cost for gaps
We’ll use the ideas of dynamic programming presented in the lecture.

Overlap Alignment
Consider the following problem:
• Find the most significant overlap between two sequences S and T.
• Possible overlap relations: (a) the end of one sequence overlaps the start of the other; (b) one sequence is contained in the other.
Difference from local alignment: here we require an alignment between the endpoints of the two sequences.

Overlap Alignment
Formally: given S[1..n] and T[1..m], find i, j such that
d = max{ D(S[1..i], T[j..m]), D(S[i..n], T[1..j]), D(S[1..n], T[i..j]), D(S[i..j], T[1..m]) }
is maximal.
Solution: same as global alignment, except that we do not penalise overhanging ends.

Overlap Alignment
• Initialization: F[i,0] = 0, F[0,j] = 0
• Recurrence: as in global alignment
• Score: maximum value in the bottom row and the rightmost column

Overlap Alignment (Example)
S = PAWHEAE
T = HEAGAWGHEE
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5


Overlap Alignment (Example)
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5
The best overlap is:
  PAWHEAE------
  ---HEAGAWGHEE
Pay attention! A different scoring scheme could yield a different result, such as:
  ---PAW-HEAE
  HEAGAWGHEE-
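The overlap recipe (zero borders, the global recurrence, and the maximum taken over the bottom row and rightmost column) can be sketched as follows. This is a minimal Python illustration under the example's scoring scheme; the function name `overlap_score` is an assumption, and only the score is returned, not the alignment.

```python
def overlap_score(s, t, match=4, mismatch=-1, indel=-5):
    """Best overlap-alignment score of s and t (overhanging ends are free)."""
    n, m = len(s), len(t)
    # Initialization: F[i][0] = 0 and F[0][j] = 0, so leading overhangs cost nothing.
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            # Recurrence: exactly as in global alignment.
            f[i][j] = max(
                f[i - 1][j - 1] + pair,
                f[i - 1][j] + indel,
                f[i][j - 1] + indel,
            )
    # Score: maximum over the bottom row and the rightmost column,
    # so trailing overhangs cost nothing either.
    return max(max(f[n]), max(row[m] for row in f))
```

For S = PAWHEAE and T = HEAGAWGHEE the sketch gives 11, which is the score of the overlap shown above: HEAE against HEAG contributes 4 + 4 + 4 - 1.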

Adding Affine Gap Penalties: The Gotoh Algorithm

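Why Affine Gap Penalties Are Biologically Better
Cost = gop + L*gep, or Cost = gop + (L-1)*gep, where L is the length of the gap, gop the gap opening penalty and gep the gap extension penalty.
Parsimony: evolution takes the simplest path (so we think…).
[Figure: affine gap penalty, cost as a function of gap length L]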
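A small worked comparison may help show why the affine scheme favours one long gap over several short ones. The sketch below uses the second form of the cost, gop + (L-1)*gep; the function names and the penalty values are assumptions chosen for illustration.

```python
def linear_gap_cost(length, gep):
    """Linear scheme: every gap position costs the same."""
    return length * gep


def affine_gap_cost(length, gop, gep):
    """Affine scheme: the first position pays gop, each further one pays gep."""
    return gop + (length - 1) * gep


# One gap of length 3 versus three separate gaps of length 1.
one_long_linear = linear_gap_cost(3, gep=-2)
three_short_linear = 3 * linear_gap_cost(1, gep=-2)
one_long_affine = affine_gap_cost(3, gop=-4, gep=-1)
three_short_affine = 3 * affine_gap_cost(1, gop=-4, gep=-1)
```

With these values the linear scheme cannot tell the two scenarios apart (both cost -6), whereas the affine scheme charges -6 for the single event and -12 for three independent events.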

But Harder to Compute…
More than 3 ways to extend an alignment:
• Alignment: X over X
• Deletion: X over -, which is either a gap opening or a gap extension
• Insertion: - over X, which is either a gap opening or a gap extension

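More Questions Need to Be Asked
For instance, what is the cost of an insertion, i.e. of an alignment of 1…I vs 1…J that ends with - over X?
• If the preceding alignment ends with X over X, the insertion opens a gap and costs GOP.
• If the preceding alignment already ends with - over X, the insertion extends the gap and costs GEP.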

Solution: Maintain 3 Tables
• Ix: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an insertion in sequence X.
• Iy: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an insertion in sequence Y.
• M: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an alignment between sequence X and sequence Y.

The Algorithm (Global Alignment)

$$M(i,j) = \mathrm{Mat}(i,j) + \max \begin{cases} M(i-1,j-1) \\ I_x(i-1,j-1) \\ I_y(i-1,j-1) \end{cases}$$

$$I_x(i,j) = \max \begin{cases} M(i-1,j) + \mathrm{gop} \\ I_x(i-1,j) + \mathrm{gep} \end{cases}$$

$$I_y(i,j) = \max \begin{cases} M(i,j-1) + \mathrm{gop} \\ I_y(i,j-1) + \mathrm{gep} \end{cases}$$

Initialization (gop and gep are negative scores that are added, as in the recurrences):

$$M(0,0) = 0, \quad I_x(0,0) = I_y(0,0) = -\infty$$

$$M(i,0) = I_x(i,0) = \mathrm{gop} + (i-1)\,\mathrm{gep}, \quad I_y(i,0) = -\infty, \quad i = 1, \dots, n$$

$$M(0,j) = I_y(0,j) = \mathrm{gop} + (j-1)\,\mathrm{gep}, \quad I_x(0,j) = -\infty, \quad j = 1, \dots, m$$
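The three-table scheme can be sketched in a few lines. The code below is a minimal Python illustration that follows the recurrences and the initialization of this slide as written (including M(i,0) and M(0,j) holding the border gap cost); it returns only the score. The function name `gotoh_global_score`, the default penalty values, and the convention that gop and gep are negative numbers added to the score are assumptions.

```python
NEG_INF = float("-inf")


def gotoh_global_score(x, y, match=2, mismatch=-1, gop=-4, gep=-1):
    """Global alignment score with affine gaps: a gap of length L costs gop + (L-1)*gep."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # ends with x_i aligned to y_j
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i aligned to a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap aligned to y_j

    # Initialization, as on the slide.
    M[0][0] = 0
    for i in range(1, n + 1):
        M[i][0] = Ix[i][0] = gop + (i - 1) * gep
    for j in range(1, m + 1):
        M[0][j] = Iy[0][j] = gop + (j - 1) * gep

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + pair
            # Open a new gap from M, or extend the gap already open in the same table.
            Ix[i][j] = max(M[i - 1][j] + gop, Ix[i - 1][j] + gep)
            Iy[i][j] = max(M[i][j - 1] + gop, Iy[i][j - 1] + gep)

    return max(M[n][m], Ix[n][m], Iy[n][m])
```

As a check by hand, aligning ACGT with AT under the default values gives -1: two matches (+4) and a single gap of length 2 (-4 - 1), which is cheaper than two separate gaps of length 1.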

Linear-Space Alignment Banded Global Alignment, K-band Algorithm, …
