Alignment II Dynamic Programming

Alignment IIDynamic Programming

Pair-wise sequence alignments A:C A T - T C A - C | | | | | B:C - T C G C A G C Idea: Display one sequence above another with spaces inserted in both to reveal similarity

Two types of alignment S =CTGTCGCTGCACG T =TGCCGTG Global alignment Local alignment CTGTCG-CTGCACG -TGC-CG-TG---- CTGTCGCTGCACG-- -------TGC-CGTG

Global alignment: Scoring CTGTCG-CTGCACG -TGC-CG-TG---- Reward for matches:  Mismatch penalty:  Space penalty:  score(A) = w – x - y w = #matchesx = #mismatches y = #spaces

Global alignment: Scoring Reward for matches: 10 Mismatch penalty: 2 Space penalty: 5 C T G T C G – C T G C - T G C – C G – T G - -5 10 10-2 -5 -2 -5 -5 10 10 -5 Total = 11

Optimum Alignment • The score of an alignment is a measure of its quality • Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score • The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y

Alignment algorithms • Global: Needleman-Wunsch • Local: Smith-Waterman • NW and SW use dynamic programming • Variations: • Gap penalty functions • Scoring matrices

Global Alignment: Algorithm

Theorem.C(i,j) satisfies the following relationships: Initial conditions: Recurrence relation: For 1  i n, 1  j m:

S1 S2 . . . Si-1 Si S1 S2 . . . Si-1 Si T1 T2 . . . Tj-1 Tj T1 T2 . . . Tj — C(i-1,j-1) + w(Si,Tj) C(i-1,j)  S1 S2 . . . Si — T1 T2 . . . Tj-1 Tj C(i,j-1)  Justification

Example Case 1: Line up Si with Tj i i - 1 S: C A T T C A C T: C - T T C A G j j -1 Case 2: Line up Si with space i - 1 i S: C A T T C A - C T: C - T T C A G - j Case 3: Line up Tj with space i S: C A T T C A C - T: C - T T C A - G j -1 j

C(i-1,j-1) C(i-1,j) C(i,j-1) Computation Procedure C(0,0) C(i,j) C(n,m)

-5 -10 -15 -20 -25 -30 -35 λ C T C G C A G C 0 -5 -10 -15 -20 -25 -30 -35 -40 λ 10 5 C A T T C A C +10 for match, -2 for mismatch, -5 for space

* * λ C T C G C A G C λ C A T T C A C Traceback can yield both optimum alignments

End-gap free alignment • Gaps at the start or end of alignment are not penalized Match: +2 Mismatch and space: -1 Best global Best end-gap free Score = 1 Score = 9

Motivation: Shotgun assembly • Shotgun assembly produces large set of partially overlapping subsequences from many copies of one unknown DNA sequence. • Problem: Use the overlapping sections to ”paste” the subsequences together. • Overlapping pairs will have low global alignmentscore, but high end-space free score because of overlap.

Motivation: Shotgun assembly

Algorithm • Same as global alignment, except: • Initialize with zeros (free gaps at start) • Locate max in the last row/column (free gaps at end)

0 0 0 0 0 0 0 0 0 0 0 5 8 5 8 5 20 15 10 0 0 15 10 5 6 15 18 13 0 -2 10 13 8 3 10 13 16 0 10 5 20 15 18 13 8 23 5 8 15 18 13 28 23 18 0 0 0 3 10 25 20 23 38 33 λ C T C G C A G C λ 10 5 10 5 10 5 0 10 C A T T C A G +10 for match, -2 for mismatch, -5 for gap

Local Alignment: Motivation • Ignoring stretches of non-coding DNA: • Non-coding regions are more likely to be subjected to mutations than coding regions. • Local alignment between two sequencesis likely to be between two exons. • Locating protein domains: • Proteins of different kind and of different species often exhibit local similarities • Local similarities may indicate ”functional subunits”.

Local alignment: Example S =g g t c t g a g T =a a a c g a Match: +2 Mismatch and space: -1 Best local alignment: g g tc t g ag a a ac – g a - Score = 5

Local Alignment: Algorithm C [i, j] = Score of optimally aligning a suffix of s with a suffix of t. Initialize top row and leftmost column to zero.

λ C T C G C A G C λ C A T T C A C +1 for a match, -1 for a mismatch, -5 for a space

Some Results • Most pairwise sequence alignment problems can be solved in O(mn) time. • Space requirement can be reduced to O(m+n), while keeping run-time fixed [Myers88]. • Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].

Reducing space requirements • O(mn) tables are often the limiting factor in computing large alignments • There is a linear space technique that only doubles the time required [Hirschberg77]

0 5 8 5 8 5 20 15 10 λ C T C G C A G C 0 0 0 0 0 0 0 0 0 λ 0 10 5 10 5 10 5 0 10 C A T T C A G IDEA: We only need the previous row to calculate the next

Linear-space Alignments mn + ½ mn + ¼ mn + 1/8 mn + 1/16 mn + … = 2 mn

Affine Gap Penalty Functions Gap penalty = h + gk where k = length of gap h = gap opening penalty g = gap continuation penalty Can also be solved in O(nm) time using dynamic programming

Alignment II Dynamic Programming

Alignment II Dynamic Programming

Presentation Transcript

Dynamic Programming: Sequence alignment

Dynamic Programming II

Dynamic Programming Algorithms II

Dynamic Programming (II)

Dynamic Programming for Pairwise Alignment

Dynamic Programming for Sequence alignment

Advanced Dynamic Programming II

Sequence Alignment and Dynamic Programming

Dynamic Programming (II)

Dynamic Programming for Pairwise Alignment 2

Global Sequence Alignment by Dynamic Programming

Dynamic programming part II

Sequence Alignment by Dynamic Programming

Dynamic Programming (II)(III)

Dynamic Programming II

Dynamic Programming II

Alignment II Dynamic Programming

Dynamic programming part II

Dynamic Programming: Sequence alignment

Dynamic Programming (II)

Dynamic Programming II

Dynamic Programming II