Global Alignment Summary
• If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then
$$
\mathrm{Score}(i, j) = \max \begin{cases}
\mathrm{Score}(i-1, j) + g & \text{align } A[i] \text{ with a gap} \\
\mathrm{Score}(i, j-1) + g & \text{align } B[j] \text{ with a gap} \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j]
\end{cases}
$$
with Score(i, 0) = i * g and Score(0, j) = j * g
• The actual alignment is identified by tracing back the pointers, starting at the lower-right corner
Global Alignment Algorithm
To compute the GLOBAL ALIGNMENT of two sequences A and B:
1. create a matrix with len(A) + 1 rows and len(B) + 1 columns
   # initialize only the cells of row 0 and column 0
2. for each column c, set cell(0, c) to c * gap
3. for each row r, set cell(r, 0) to r * gap
4. for each row r in the matrix, starting at 1:
5.     for each column c in the matrix, starting at 1:
6.         calculate option1 = cell(r-1, c) + gap, option2 = cell(r, c-1) + gap, and option3 = cell(r-1, c-1) + m if A[r] == B[c], else cell(r-1, c-1) + s
7.         set cell(r, c) to the largest of option1, option2, option3
8. return the matrix (or the score in its lower-right cell)
Global Alignment Example • Align CACTAG and GATTACA using g = -2, s = -1, m = 2
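The global recurrence and algorithm above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name global_align and its defaults (g = -2, s = -1, m = 2, taken from the example) are our own choices, and during traceback ties are broken in the order diagonal, vertical, horizontal, so other equally good alignments may exist.

```python
def global_align(a, b, g=-2, s=-1, m=2):
    """Global alignment with a uniform gap penalty g.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    rows, cols = len(a), len(b)

    def sub(i, j):
        # Match/mismatch score for A[i] vs B[j] (1-based, as on the slides).
        return m if a[i - 1] == b[j - 1] else s

    # (rows+1) x (cols+1) matrix; row 0 and column 0 hold the all-gap prefixes.
    score = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        score[i][0] = i * g
    for j in range(1, cols + 1):
        score[0][j] = j * g

    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            option1 = score[i - 1][j] + g  # A[i] aligned with a gap
            option2 = score[i][j - 1] + g  # B[j] aligned with a gap
            option3 = score[i - 1][j - 1] + sub(i, j)  # A[i] aligned with B[j]
            score[i][j] = max(option1, option2, option3)

    # Traceback from the lower-right corner. Instead of stored pointers,
    # re-check which option produced each cell.
    top, bottom = [], []
    i, j = rows, cols
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + g:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[rows][cols], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(global_align("CACTAG", "GATTACA"))  # (1, 'CACTA-G', 'GATTACA')
```

Recomputing the source of each cell during traceback gives the same result as storing explicit pointers; it just trades a little recomputation for not keeping a second matrix.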
Semi-Global Alignment
• Motivation
Global alignment:
CAGCACTTGGATTCTCGG
CAGC----G--T----GG
Semi-global alignment:
CAGCA-CTTGGATTCTCGG
---CAGCGTGG--------
• The second alignment may be preferable even though it has the lower score under global scoring
• Modify the algorithm so that terminal gaps (gaps at either end of a sequence) are not penalized
Semi-Global Alignment • Modify the algorithm so that terminal gaps are not penalized
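Semi-Global Alignment Summary
• If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then
$$
\mathrm{Score}(i, j) = \max \begin{cases}
\mathrm{Score}(i-1, j) + g & \text{align } A[i] \text{ with a gap} \\
\mathrm{Score}(i, j-1) + g & \text{align } B[j] \text{ with a gap} \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j]
\end{cases}
$$
with Score(i, 0) = 0 and Score(0, j) = 0
• The gap cost g is set to 0 for moves along the last row and the last column, so trailing gaps are also free
• The actual alignment is identified by traceback, the same way as in global alignment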
Semi-Global Alignment Algorithm
To compute the SEMI-GLOBAL ALIGNMENT of two sequences A and B:
1. create a matrix with len(A) + 1 rows and len(B) + 1 columns
   # initialize the cells of row 0 and column 0 to 0
2. for each column c, set cell(0, c) to 0 (no gap penalty)
3. for each row r, set cell(r, 0) to 0 (no gap penalty)
4. for each row r in the matrix, starting at 1:
5.     for each column c in the matrix, starting at 1:
6.         calculate option1, option2, option3 as in global alignment, but with a gap penalty of 0 for horizontal moves in the last row and vertical moves in the last column
7.         set cell(r, c) to the largest of option1, option2, option3
8. return the matrix (or the score in its lower-right cell)
Semi-Global Alignment Example • Align GACTATGA and ATTA using g = -2, s = -1, m = 2
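A Python sketch of the semi-global score, offered as an illustration under stated assumptions: the function name semi_global_align and its defaults (g = -2, s = -1, m = 2, from the example) are illustrative choices, and only the score matrix is built (traceback works as in global alignment). Compared with global alignment, only two things change: row 0 and column 0 stay at 0, and the gap cost drops to 0 for horizontal moves in the last row and vertical moves in the last column.

```python
def semi_global_align(a, b, g=-2, s=-1, m=2):
    """Semi-global alignment score: end gaps in either sequence are free.

    Returns (score, matrix); the score is read from the lower-right cell.
    """
    rows, cols = len(a), len(b)
    # Row 0 and column 0 stay 0, so leading gaps cost nothing.
    score = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            diag = score[i - 1][j - 1] + (m if a[i - 1] == b[j - 1] else s)
            # A vertical move in the last column (A[i] against a trailing gap) is free.
            up = score[i - 1][j] + (0 if j == cols else g)
            # A horizontal move in the last row (B[j] against a trailing gap) is free.
            left = score[i][j - 1] + (0 if i == rows else g)
            score[i][j] = max(diag, up, left)
    return score[rows][cols], score


if __name__ == "__main__":
    best, _ = semi_global_align("GACTATGA", "ATTA")
    print(best)  # 5
```

For GACTATGA and ATTA the score 5 corresponds, for example, to ATTA placed against ACTA inside GACTATGA (three matches, one mismatch), with the leading G and trailing TGA of GACTATGA covered by free end gaps.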
Local Alignment
• The goal is to find two substrings (common regions), one from each sequence, whose global alignment score is the highest
AAAACCCCCGGGGTTA
TTCCCGGGAACCAACC
• Similar to the previous two methods, except that a sub-alignment is abandoned as soon as its score would become negative: the cell is set to 0 and a new sub-alignment can start from there
Local Alignment • Modify the algorithm to identify a high-scoring common fragment
Local Alignment • Align GACTATGA and ATTA using g = -2, s = -1, m = 2
Local Alignment Example
      T  C  C  C  C  T  G  G  A  A  C  C  A  A  C  C
   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A  0  0  0  0  0  0  0  0  0  2  2  0  0  2  2  0  0
A  0  0  0  0  0  0  0  0  0  2  4  2  0  2  4  2  0
A  0  0  0  0  0  0  0  0  0  2  4  3  1  2  4  3  1
A  0  0  0  0  0  0  0  0  0  2  4  3  2  3  4  3  2
C  0  0  2  2  2  2  0  0  0  0  2  6  5  3  2  6  5
C  0  0  2  4  4  4  2  0  0  0  0  4  8  6  4  4  8
C  0  0  2  4  6  6  4  2  0  0  0  2  6  7  5  6  6
C  0  0  2  4  6  8  6  4  2  0  0  2  4  5  6  7  8
C  0  0  2  4  6  8  7  5  3  1  0  2  4  3  4  8  9
G  0  0  0  2  4  6  7  9  7  5  3  1  2  3  2  6  7
G  0  0  0  0  2  4  5  9 11  9  7  5  3  1  2  4  5
G  0  0  0  0  0  2  3  7 11 10  8  6  4  2  0  2  3
G  0  0  0  0  0  0  1  5  9 10  9  7  5  3  1  0  1
T  0  2  0  0  0  0  2  3  7  8  9  8  6  4  2  0  0
T  0  2  1  0  0  0  2  1  5  6  7  8  7  5  3  1  0
A  0  0  1  0  0  0  0  1  3  7  8  6  7  9  7  5  3
Local Alignment Summary
• If Score(i, j) denotes the best score of an alignment ending at A[i] and B[j] (i.e. of a suffix of A[1 : i] with a suffix of B[1 : j]), then
$$
\mathrm{Score}(i, j) = \max \begin{cases}
\mathrm{Score}(i-1, j) + g & \text{align } A[i] \text{ with a gap} \\
\mathrm{Score}(i, j-1) + g & \text{align } B[j] \text{ with a gap} \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j] \\
0 & \text{start a new alignment}
\end{cases}
$$
with Score(i, 0) = 0 and Score(0, j) = 0
• Recovering the alignment: find the entry with the highest value anywhere in the matrix and trace back from it until a cell containing 0 is reached
Local Alignment Algorithm
To compute the LOCAL ALIGNMENT of two sequences A and B:
1. create a matrix with len(A) + 1 rows and len(B) + 1 columns
   # initialize the cells of row 0 and column 0 to 0
2. for each column c, set cell(0, c) to 0
3. for each row r, set cell(r, 0) to 0
4. for each row r in the matrix, starting at 1:
5.     for each column c in the matrix, starting at 1:
6.         calculate option1, option2, option3 as in global alignment
7.         set cell(r, c) to the largest of 0, option1, option2, option3
8. return the matrix (or the highest score anywhere in it)
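The local variant can be sketched in Python along the same lines. This is a minimal illustration following the recurrence on the summary slide (a 0 option in every cell, the best cell searched over the whole matrix, traceback until a 0); the function name local_align and its defaults (g = -2, s = -1, m = 2) are illustrative assumptions. Because the maximum is taken over every cell, the sketch applies no special gap rule to the last row or column, and when several cells tie for the maximum, the first one found in row-major order is used.

```python
def local_align(a, b, g=-2, s=-1, m=2):
    """Local alignment: returns (best_score, aligned_a, aligned_b)."""
    rows, cols = len(a), len(b)

    def sub(i, j):
        # Match/mismatch score for A[i] vs B[j] (1-based).
        return m if a[i - 1] == b[j - 1] else s

    score = [[0] * (cols + 1) for _ in range(rows + 1)]  # row 0 / column 0 are 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            option1 = score[i - 1][j] + g
            option2 = score[i][j - 1] + g
            option3 = score[i - 1][j - 1] + sub(i, j)
            # The extra 0 lets a new sub-alignment start here instead of going negative.
            score[i][j] = max(0, option1, option2, option3)
            if score[i][j] > best:  # first maximum in row-major order wins ties
                best, best_i, best_j = score[i][j], i, j

    # Trace back from the best cell until a 0, where this sub-alignment began.
    top, bottom = [], []
    i, j = best_i, best_j
    while score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + g:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(local_align("GACTATGA", "ATTA"))  # (5, 'ACTA', 'ATTA')
```

If no cell is positive (for example, two sequences with no letters in common), the sketch returns a score of 0 and two empty strings.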
Global alignment: Needleman SB, Wunsch CD (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". J Mol Biol 48 (3): 443–453.
Semi-global alignment
Local alignment: Smith TF, Waterman MS (1981). "Identification of Common Molecular Subsequences". J Mol Biol 147 (1): 195–197.
Images from UMN CS5481
Gap Penalty Revisited
• So far we have used a uniform gap penalty, i.e. k gaps cost k*g
• Another possibility is to use two types of gap penalty:
  • gap opening penalty (go): charged for starting a gapped region
  • gap extension penalty (ge): charged for each further gap that continues the region
• Typically the gap opening penalty is set higher (biasing against starting gaps) and the gap extension penalty lower (once a gapped region has started, extending it is cheap)
• The penalty G for a run of k gaps then becomes G(k) = go + (k-1)*ge; this is called an affine gap penalty
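A small worked comparison shows what the affine form changes. The values go = -4 and ge = -1 are hypothetical (the slides do not fix them), g = -2 is the uniform penalty from the earlier examples, and penalties are written as negative scores:

$$
\begin{aligned}
\text{one run of 3 gaps (affine):} \quad & G(3) = \mathrm{go} + 2\,\mathrm{ge} = -4 + 2(-1) = -6 \\
\text{three separate single gaps (affine):} \quad & 3\,G(1) = 3\,\mathrm{go} = 3(-4) = -12 \\
\text{either pattern (uniform):} \quad & 3g = 3(-2) = -6
\end{aligned}
$$

Under the uniform penalty both gap patterns cost the same, while the affine penalty prefers the single contiguous gap.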
Affine Gap Penalty • Modify the algorithm to support separate gap opening and gap extension penalties
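Global Alignment, Affine Gap
• If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then
$$
\mathrm{Score}(i, j) = \max \begin{cases}
\max_{1 \le k \le i} \mathrm{Score}(i-k, j) + G(k) & \text{gap of length } k \text{ ending at } A[i] \\
\max_{1 \le k \le j} \mathrm{Score}(i, j-k) + G(k) & \text{gap of length } k \text{ ending at } B[j] \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j]
\end{cases}
$$
with Score(i, 0) = G(i), Score(0, j) = G(j) and Score(0, 0) = 0
• Vertically and horizontally, each cell must now try every earlier cell in its column and row as the possible start of a gap, so filling the matrix takes O(|A| |B| (|A| + |B|)) time instead of O(|A| |B|)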
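A direct, unoptimized transcription of this recurrence into Python is sketched below. The function name affine_global_score is hypothetical, only the score is computed (traceback omitted), and go/ge are passed as negative scores to match the sign convention of g = -2 in the earlier examples. Each cell scans its whole column and row for the gap start, which is exactly the extra cost noted on the slide.

```python
def affine_global_score(a, b, go, ge, s=-1, m=2):
    """Global alignment score with affine gaps, G(k) = go + (k-1)*ge.

    Direct transcription of the recurrence: every cell tries every gap
    length k, so this runs in O(|A| * |B| * (|A| + |B|)) time.
    Score only, no traceback.
    """
    def G(k):
        # Affine penalty for a run of k >= 1 consecutive gaps.
        return go + (k - 1) * ge

    rows, cols = len(a), len(b)
    score = [[0] * (cols + 1) for _ in range(rows + 1)]  # score[0][0] = 0
    for i in range(1, rows + 1):
        score[i][0] = G(i)
    for j in range(1, cols + 1):
        score[0][j] = G(j)

    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            diag = score[i - 1][j - 1] + (m if a[i - 1] == b[j - 1] else s)
            # Vertical gap of length k ending at A[i]: try every k from 1 to i.
            vert = max(score[i - k][j] + G(k) for k in range(1, i + 1))
            # Horizontal gap of length k ending at B[j]: try every k from 1 to j.
            horiz = max(score[i][j - k] + G(k) for k in range(1, j + 1))
            score[i][j] = max(diag, vert, horiz)
    return score[rows][cols]


if __name__ == "__main__":
    # Hypothetical penalties: opening -4, extension -1.
    print(affine_global_score("AAAT", "AT", go=-4, ge=-1))  # -1
```

In the usage example the best alignment places AT against AAAT with one run of two gaps (AAAT over A--T, or equivalently --AT): two matches plus G(2) gives 2 + 2 - 5 = -1. With go = ge the penalty becomes uniform, G(k) = k * go, and the result coincides with the uniform-gap recurrence.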