
Space Efficient Alignment Algorithms and Affine Gap Penalties

Space Efficient Alignment Algorithms and Affine Gap Penalties. Dr. Nancy Warter-Perez. Outline: algorithm complexity; complexity of dynamic programming alignment algorithms; memory efficient algorithms; Hirschberg’s Divide and Conquer algorithm; affine gap penalty.



  1. Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez

  2. Outline
     • Algorithm complexity
     • Complexity of dynamic programming alignment algorithms
     • Memory efficient algorithms
     • Hirschberg’s Divide and Conquer algorithm
     • Affine gap penalty

  3. Algorithm Complexity
     • Indicates the space and time (computational) efficiency of a program
     • Space complexity refers to how much memory is required to execute the algorithm
     • Time complexity refers to how long it will take to execute (compute) the algorithm
     • Generally written in Big-O notation
        • O represents the complexity (order)
        • n represents the size of the data set
     • Examples
        • O(n) – “order n”, linear complexity
        • O(n^2) – “order n squared”, quadratic complexity
     • Constants and lower orders are ignored
        • O(2n) = O(n) and O(n^2 + n + 1) = O(n^2)

  4. Complexity of Dynamic Programming Algorithms for Global/Local Alignment
     • Time complexity – O(m*n)
        • For each cell in the score matrix, perform 3 operations
        • Compute Up, Left, and Diagonal scores
        • O(3*m*n) = O(m*n)
     • Space complexity – O(m*n)
        • Size of scoring matrix = m*n
        • Size of trace back matrix = m*n
        • O(2*m*n) = O(m*n)
     • Here m and n are the lengths of the sequences being aligned.
     • Since m ≈ n, this is O(n^2) – quadratic complexity!
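The quadratic costs above can be seen in a minimal Python sketch of full-matrix global alignment. The scoring values (match = 1, mismatch = -1, gap = -2) and the function name `global_align` are assumptions chosen for illustration and do not come from the slides.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment storing both full matrices: O(n*m) time and space."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]   # scoring matrix
    trace = [[""] * (m + 1) for _ in range(n + 1)]  # trace back matrix
    for i in range(1, n + 1):
        score[i][0], trace[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], trace[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The three operations per cell: Diagonal, Up, and Left scores.
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            trace[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Walk the trace back matrix from the sink (n, m) to the source (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = trace[i][j]
        if step == "D":
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif step == "U":
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Both `score` and `trace` hold (n+1)*(m+1) cells, which is where the O(m*n) space bound comes from.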

  5. Memory Requirements
     • For sequences of 200-500 amino acids or nucleotides, the matrices hold up to n^2 = 500^2 = 250,000 cells
     • If each score is stored as a 32-bit value (4 bytes), the scoring matrix requires 1,000,000 bytes!
     • If each trace back symbol is stored as a character (8-bit value), the trace back matrix requires 250,000 bytes
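The arithmetic on this slide can be checked with a few lines of Python. The helper name `matrix_bytes` is hypothetical, and like the slide it counts n*m cells and ignores the extra border row and column.

```python
def matrix_bytes(n, m, bytes_per_cell):
    """Bytes needed for an n-by-m matrix with a fixed cell size."""
    return n * m * bytes_per_cell

score_bytes = matrix_bytes(500, 500, 4)  # 32-bit scores
trace_bytes = matrix_bytes(500, 500, 1)  # 8-bit trace back symbols
print(score_bytes, trace_bytes)
```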

  6. Simple Improvement for Scoring Matrix
     • Computing the scores only needs the current and previous column (or row), so the space complexity of the scoring matrix is only linear, i.e., O(2*min(m,n)) = O(min(m,n))
     • O(min(m,n)) ≈ O(n) for sequences of comparable lengths
     • 2,000 bytes per stored column of 500 four-byte scores (instead of 1 million for the full matrix)
     • But the trace back matrix still has quadratic space complexity
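A minimal sketch of the linear-space score computation, assuming the same illustrative scoring values (match = 1, mismatch = -1, gap = -2) and a hypothetical function name `global_score`. It returns only the optimal score, since no trace back information is kept.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score using two rows of length min(n, m) + 1."""
    if len(b) > len(a):
        a, b = b, a  # the scoring is symmetric, so keep the shorter sequence along the row
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # the older row is no longer needed
    return prev[len(b)]
```

Only `prev` and `curr` are alive at any time, which matches the O(2*min(m,n)) figure on the slide.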

  7. Hirschberg’s “Divide and Conquer” Space Efficient Algorithm
     [Figure: the alignment grid from the source (0,0) to the sink (n,m), split at column m/2 through the middle vertex (i, m/2), then split again into successively smaller sub-rectangles]
     • Compute the score matrix (s) between the source (0,0) and (n, m/2). Save the m/2 column of s.
     • Compute the reverse score matrix (s_reverse) between the sink (n,m) and (0, m/2). Save the m/2 column of s_reverse.
     • Find the middle vertex (i, m/2) that satisfies max over 0 ≤ i ≤ n of {s(i, m/2) + s_reverse(n-i, m/2)}
     • Recursively partition the problem into 2 subproblems
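The middle-vertex step can be sketched in Python as follows. The names `last_column` and `middle_vertex` and the scoring values are assumptions for illustration. Sequence `a` runs along the rows (length n) and `b` along the columns (length m), following the slide's (row, column) convention.

```python
def last_column(a, b, match=1, mismatch=-1, gap=-2):
    """Scores of aligning every prefix a[:i] with all of b, keeping one column at a time."""
    col = [i * gap for i in range(len(a) + 1)]
    for j in range(1, len(b) + 1):
        new = [j * gap] + [0] * len(a)
        for i in range(1, len(a) + 1):
            diag = col[i - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            new[i] = max(diag, col[i] + gap, new[i - 1] + gap)
        col = new
    return col

def middle_vertex(a, b):
    """Return (i, m/2, score) for the row i where an optimal path crosses column m/2."""
    n, mid = len(a), len(b) // 2
    forward = last_column(a, b[:mid])               # s(i, m/2), from the source
    backward = last_column(a[::-1], b[mid:][::-1])  # s_reverse(k, m/2), from the sink
    totals = [forward[i] + backward[n - i] for i in range(n + 1)]
    best_i = max(range(n + 1), key=totals.__getitem__)
    return best_i, mid, totals[best_i]
```

The maximum of `totals` equals the optimal global alignment score, because every optimal path has to cross column m/2 somewhere.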

  8. Pseudo Code of Space-Efficient Alignment Algorithm
     Path(source, sink)
        if source and sink are in consecutive columns
           output the longest path from the source to the sink
        else
           middle ← middle vertex between source and sink
           Path(source, middle)
           Path(middle, sink)
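One possible Python rendering of this pseudocode is sketched below. It is not the author's implementation: the function names, the scoring values (match = 1, mismatch = -1, gap = -2), and the base case for a single remaining column are all assumptions. String slices are used for readability; a strictly linear-space version would pass index ranges instead.

```python
def last_column(a, b, match=1, mismatch=-1, gap=-2):
    """Scores of aligning every prefix a[:i] with all of b, one column at a time."""
    col = [i * gap for i in range(len(a) + 1)]
    for j in range(1, len(b) + 1):
        new = [j * gap] + [0] * len(a)
        for i in range(1, len(a) + 1):
            diag = col[i - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            new[i] = max(diag, col[i] + gap, new[i - 1] + gap)
        col = new
    return col

def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Return an optimal global alignment (top, bottom) of a and b."""
    n, m = len(a), len(b)
    if n == 0:
        return "-" * m, b
    if m == 0:
        return a, "-" * n
    if m == 1:
        # Base case (consecutive columns): pair b with its best partner in a,
        # unless leaving b against a gap scores higher.
        sub = [match if ch == b else mismatch for ch in a]
        k = max(range(n), key=sub.__getitem__)
        if sub[k] >= 2 * gap:
            return a, "-" * k + b + "-" * (n - k - 1)
        return a + "-", "-" * n + b
    mid = m // 2
    forward = last_column(a, b[:mid], match, mismatch, gap)
    backward = last_column(a[::-1], b[mid:][::-1], match, mismatch, gap)
    # Middle vertex: the row where an optimal path crosses column m/2.
    i = max(range(n + 1), key=lambda r: forward[r] + backward[n - r])
    left = hirschberg(a[:i], b[:mid], match, mismatch, gap)
    right = hirschberg(a[i:], b[mid:], match, mismatch, gap)
    return left[0] + right[0], left[1] + right[1]

def alignment_score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score an alignment column by column (helper for checking results)."""
    return sum(gap if "-" in (x, y) else match if x == y else mismatch
               for x, y in zip(top, bottom))
```

Comparing `alignment_score(*hirschberg(a, b))` with `last_column(a, b)[-1]` is a simple way to check that the recursion recovers an optimal alignment.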

  9. Complexity of Space-Efficient Alignment Algorithm
     • Time complexity
        • Equal to the sum of the areas of the rectangles: Area + ½ Area + ¼ Area + … ≤ 2*Area, where Area = n*m
        • O(2*n*m) = O(n*m)
        • Quadratic time/computation complexity (same as before)
     • Space complexity
        • Need to save a column of s and s_reverse for each computation (but can discard them after computing middle)
        • O(min(n,m)) – if m < n, switch the sequences (or save a row of s and s_reverse instead)
        • Linear space complexity!!
     • Reference: http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/
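The factor of 2 in the time bound follows from a geometric series. Once the middle vertex (i, m/2) is chosen, the two subproblems cover rectangles of size i × m/2 and (n-i) × m/2, which together make up half of the parent's area. Each level of recursion therefore does half the work of the level above it:

```latex
\text{Area} + \tfrac{1}{2}\,\text{Area} + \tfrac{1}{4}\,\text{Area} + \cdots
  = \text{Area} \sum_{k=0}^{\infty} \left(\tfrac{1}{2}\right)^{k}
  = 2\,\text{Area},
\qquad \text{Area} = n \cdot m
```

Any finite recursion sums only the first few terms of this series, so the total work is at most 2*Area.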

  10. Gap Penalties
     • Gap penalties account for the introduction of a gap (on the evolutionary model, an insertion or deletion mutation) in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations.
     • Source: http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance
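The outline lists affine gap penalties, in which opening a gap costs more than extending one. The sketch below assumes the common form w(k) = open + (k-1)*extend and a Gotoh-style three-matrix recurrence. The parameter values (gap_open = -5, gap_extend = -1) and function names are illustrative assumptions, not values from the slides. For clarity it stores full matrices and computes only the score.

```python
NEG_INF = float("-inf")

def affine_gap_cost(length, gap_open=-5, gap_extend=-1):
    """Score of a single gap of `length` symbols: open once, then extend."""
    return 0 if length == 0 else gap_open + (length - 1) * gap_extend

def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Optimal global alignment score under an affine gap penalty."""
    n, m = len(a), len(b)
    # M: last column pairs a[i-1] with b[j-1]
    # X: last column pairs a[i-1] with a gap
    # Y: last column pairs b[j-1] with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = affine_gap_cost(i, gap_open, gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = affine_gap_cost(j, gap_open, gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Opening a new gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open, Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these values a single gap of length 3 scores -7, whereas three separate one-symbol gaps would score -15, so the model favors fewer, longer gaps.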


  12. Source: http://www.apl.jhu.edu/~przytyck/Lect03_2005.pdf


  19. Project Verification – Use the EMBOSS Pairwise Alignment Tool: http://www.ebi.ac.uk/Tools/emboss/align/index.html

  20. Project Verification – LALIGN: http://www.ch.embnet.org/software/LALIGN_form.html
