
Sequence Alignment – Scoring Functions, N-W and S-W Affine Gap Penalties

Saurabh Sinha, 02/05/2008, Department of Computer Science, University of Illinois Urbana-Champaign. Scribed by: Chandrasekar Ramachandran.


Presentation Transcript


  1. Sequence Alignment – Scoring Functions, N-W and S-W Affine Gap Penalties
     Saurabh Sinha, 02/05/2008
     Department of Computer Science, University of Illinois Urbana-Champaign
     Scribed by: Chandrasekar Ramachandran

  2. Contents
     • Introduction
     • Interpretations
     • Types of Alignments
     • Techniques for Solving: Dynamic Programming, Probabilistic Methods
     • Scoring Functions
     • N-W and S-W Affine Gap Penalties

  3. Introduction
     • Sequence alignment: ways of arranging one sequence (DNA, RNA, protein) against another to determine whether a region has been conserved in evolution or has a common evolutionary origin
     • Sequences are represented as strings of letters
     • Matrix representation:
         - G G C C A G G A T T
         G G G C C - G G - T T

  4. Interpretations
     • Mismatches? Point mutations: replacement of a single nucleotide base, categorized as transitions and transversions
     • Gaps? Indels (insertion/deletion mutations), which can produce frameshift mutations unless their length is a multiple of 3, and which may be introduced in one or both lineages
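The two interpretations above can be sketched in a few lines of Python. This is only an illustration: the function names `classify_column` and `causes_frameshift` are hypothetical, and the frameshift rule is assumed to apply to an indel inside a coding region.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_column(a, b):
    """Classify one aligned column of two DNA characters ('-' marks a gap)."""
    if a == "-" or b == "-":
        return "indel"
    if a == b:
        return "match"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def causes_frameshift(gap_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return gap_length % 3 != 0
```

For example, `classify_column("A", "G")` is a transition, while `classify_column("A", "C")` is a transversion.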

  5. Interpretations (contd.)
     • What about amino acids?
     • The degree of similarity estimates conservation
     • If conservation is high: indicates a region of high importance
     • Estimating similar functional roles: by assessing similarity of base pairing

  6. Solving Sequence Alignment Problems
     • Dynamic programming
       • Initialization
       • Matrix fill (scoring)
       • Traceback
     • Probabilistic methods
       • Bayesian methods for HMMs
       • Likelihood derivatives and Fisher scores
       • Training and model comparison

  7. Needleman-Wunsch Algorithm (Global Alignment)
     • Scores for aligned characters are specified by a similarity matrix
     • Example:
         Sequence 1: -CCGCTTACCTA
         Sequence 2: TTCCGCTTATTA
     • Possible alignment:
         Sequence 1: -CCGCTTACCTA
         Sequence 2: -CCGCTTA----
     • Matches, mismatches and indels are scored separately

  8. Global Alignment (contd.)
     • The scoring matrix is called the F-matrix; each (i, j) entry is denoted by F_ij
     • Running time: for sequences of size a and b, O(ab)
     • Summary:
       • Initialization: fill in the base cases in the topmost row and leftmost column
       • Fill: compute the scores of partial alignments, cell by cell
       • Traceback: follow the pointer matrix back to the initial cell to recover the best solution
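The three steps in the summary above (initialization, fill, traceback) can be sketched as follows. This is a minimal illustration rather than the lecture's own code: the function name `needleman_wunsch`, the match/mismatch scores, and the linear gap score are all assumptions chosen for the example.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap score (all scores are assumed values)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Initialization: base cases in the leftmost column and topmost row.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Fill: each cell takes the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback: walk from F[n][m] back to F[0][0], re-deriving each move.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

The two nested loops over the (a+1) by (b+1) matrix are where the O(ab) running time comes from. The traceback here recomputes which move produced each cell instead of storing a separate pointer matrix, which is a design choice made to keep the sketch short.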

  9. Smith-Waterman Algorithm (Local Alignment)
     • Involves stretches shorter than the entire sequence length
     • Generally used for sequences that are significantly dissimilar overall
     • Scoring-matrix cells that would be negative are set to zero
     • Backtracking starts at the highest-scoring cell and continues to a cell with zero score
     • Prerequisite: the expected score of a random alignment must be negative
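A minimal sketch of the local-alignment rules listed above: cells are clamped at zero, and the traceback runs from the highest-scoring cell to a zero cell. The function name `smith_waterman` and the default scores are assumptions for illustration only.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap score (all scores are assumed values)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # Negative cells are set to zero, so an alignment can start anywhere.
            F[i][j] = max(0, F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Backtrack from the highest-scoring cell until a zero cell is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))
```

With these assumed scores, two sequences that share only the stretch `ACGT` align locally on exactly that stretch, and the dissimilar flanks are ignored.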

  10. Scoring Functions - Overview
     • Given two sequences, a number is associated with each alignment
     • E.g. matches: +x, mismatches: -y, gaps: -z
     • Scoring function: (x × #matches) - (y × #mismatches) - (z × #gaps)
     • Alignment score: sum of substitution scores and gap penalties
     • Residue-Based Substitution Matrices: Protein Evolutionary
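The scoring function above can be evaluated directly on a finished alignment. The sketch below assumes the alignment is given as two equal-length strings with `-` for gaps; the function name `alignment_score` and the default values of x, y and z are illustrative assumptions.

```python
def alignment_score(a, b, x=1, y=1, z=2):
    """Return x * #matches - y * #mismatches - z * #gaps for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for p, q in zip(a, b):
        if p == "-" or q == "-":
            gaps += 1          # any column containing a gap character
        elif p == q:
            matches += 1
        else:
            mismatches += 1
    return x * matches - y * mismatches - z * gaps
```

For the alignment `ACGT-A` over `AGGTCA` there are 4 matches, 1 mismatch and 1 gap, so with x = 1, y = 1, z = 2 the score is 4 - 1 - 2 = 1.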

  11. Simple Substitution Matrices
     • Express how one character in a sequence changes into other character states
     • N × N matrix, where N = 4 for DNA and N = 20 for amino acids
     • Another way would be to consider A, G as purines and T, C as pyrimidines
     • Changes within a class (transitions) are more likely than changes between purines and pyrimidines (transversions)
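As a sketch of the purine/pyrimidine idea above, a 4 × 4 DNA substitution matrix can penalize transversions more heavily than transitions. The function name `dna_substitution_matrix` and the specific score values are assumptions chosen only to illustrate the structure.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def dna_substitution_matrix(match=2, transition=-1, transversion=-2):
    """Build a 4 x 4 matrix as a dict keyed by (base, base) pairs."""
    matrix = {}
    for a in BASES:
        for b in BASES:
            if a == b:
                matrix[a, b] = match
            elif (a in PURINES) == (b in PURINES):
                matrix[a, b] = transition      # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[a, b] = transversion    # purine<->pyrimidine
    return matrix
```

A lookup such as `dna_substitution_matrix()["A", "G"]` then plays the role of the substitution score for an aligned pair of characters.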

  12. Minimum Entropy Scoring Function
     • Minimum entropy score: sum of the entropy scores computed for each column
     • In the column score, i is a column, $c_{ia}$ is the count of letter a in column i, and $p_{ia}$ is the inferred probability of letter a in column i
     • Gap characters: treated as residue symbols
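The transcript does not preserve the column-score formula itself. A common form consistent with the symbols defined above is $S_i = -\sum_a c_{ia} \log p_{ia}$ with $p_{ia} = c_{ia} / \sum_{a'} c_{ia'}$, and the sketch below assumes that form, a base-2 logarithm, and hypothetical function names. Gap characters are counted like any other symbol.

```python
import math
from collections import Counter


def column_entropy_score(column):
    """Assumed form: -sum_a c_ia * log2(p_ia), with p_ia = c_ia / column size."""
    counts = Counter(column)
    total = len(column)
    return -sum(c * math.log2(c / total) for c in counts.values())


def minimum_entropy_score(rows):
    """Sum the column scores over all columns of an alignment given as equal-length rows."""
    return sum(column_entropy_score(col) for col in zip(*rows))
```

Under this assumed form a fully conserved column scores 0, and more variable columns score higher, so a lower total corresponds to a better alignment.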

  13. Gap Functions
     • Gaps are more likely to occur in groups
     • Examples:
       • Convex gap scoring functions
       • Affine gap functions
     • Convex gap scoring functions:
       • The penalty for each additional gap position decreases (or stays the same) as gaps get longer
       • $\gamma(n)$: for all $n$, $\gamma(n+1) - \gamma(n) \le \gamma(n) - \gamma(n-1)$
     • Now
       $$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ \max_{k=0,\dots,i-1} F(k,j) - \gamma(i-k) \\ \max_{k=0,\dots,j-1} F(i,k) - \gamma(j-k) \end{cases}$$
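The recurrence above can be sketched directly for global alignment with an arbitrary gap function. The function name `align_general_gap` is hypothetical, and the boundary condition F(0,0) = 0 with all other boundary cells derived from the same recurrence is an assumption; `s` and `gamma` are passed in as callables.

```python
def align_general_gap(x, y, s, gamma):
    """Global alignment score under a general gap penalty gamma(length)."""
    n, m = len(x), len(y)
    neg_inf = float("-inf")
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                best = F[i - 1][j - 1] + s(x[i - 1], y[j - 1])
            # A gap of length i - k that ends at row i in column j.
            for k in range(i):
                best = max(best, F[k][j] - gamma(i - k))
            # A gap of length j - k that ends at column j in row i.
            for k in range(j):
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[n][m]
```

Each cell scans a whole row and a whole column, so for two sequences of length n the fill costs on the order of n³ operations instead of n².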

  14. Affine Gap Functions
     • Shortcomings of a general gap penalty function:
       • Different penalties for additional gaps
       • Cubic time for updating entries
     • Example:
       • The first gap position is penalized differently; subsequent gap positions are penalized linearly
       • 3 matrices are computed simultaneously
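A minimal sketch of the three-matrix scheme for affine gaps, in the style of Gotoh (1982) from the reference list. The names `align_affine`, `gap_open` and `gap_extend` are assumptions; a gap of length n is taken to cost `gap_open + (n - 1) * gap_extend`.

```python
def align_affine(x, y, s, gap_open, gap_extend):
    """Global alignment score with affine gaps, using three matrices."""
    neg_inf = float("-inf")
    n, m = len(x), len(y)
    # M: x[i] aligned to y[j]; Ix: x[i] aligned to a gap; Iy: y[j] aligned to a gap.
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Ix = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Iy = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) \
                + s(x[i - 1], y[j - 1])
            # Opening a gap pays gap_open; continuing one pays only gap_extend.
            Ix[i][j] = max(M[i - 1][j] - gap_open,
                           Ix[i - 1][j] - gap_extend,
                           Iy[i - 1][j] - gap_open)
            Iy[i][j] = max(M[i][j - 1] - gap_open,
                           Iy[i][j - 1] - gap_extend,
                           Ix[i][j - 1] - gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

Because each cell only looks at its three neighbours in each matrix, the running time returns to O(ab), at the cost of keeping three matrices instead of one.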

  15. References
     • http://webcourse.cs.technion.ac.il/236522/Winter2005-2006/ho/WCFiles/tutorial03.ppt
     • http://engr.smu.edu/~saad/courses/cse8354/lectures/lecture6.pdf
     • http://www.bioinfo.org.cn/lectures/index-13.html
     • Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.
     • Smith, T.F. and Waterman, M.S. (1981) Comparison of biosequences. Adv. Appl. Math., 2, 482-489.
     • Dayhoff, M.O., Barker, W.C. and Hunt, L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524-545.
     • Gotoh, O. (1982) An improved algorithm for matching biological sequences. J. Mol. Biol., 162, 705-708.
