
Dynamic Programming 6.5-6.9


  1. Dynamic Programming 6.5-6.9 • Brandon Andrews

  2. Topics • Longest Common Subsequences • Global Sequence Alignment • Scoring Alignments • Local Sequence Alignment • Alignment with Gap Penalties • Questions

  3. Longest Common Subsequences (LCS) • Goal: find sequence similarity between two sequences • The two sequences may differ in length • The sequences are denoted v and w and are treated as strings of characters, e.g. v = ATTGCTA

  4. Subsequences • A subsequence is an ordered sequence of characters taken from v or w, not necessarily consecutive • For example, if v = ATTGCTA, then AGCA and ATTA are subsequences of v • AGCA: ATTGCTA (positions 1, 4, 5, 7) • ATTA: ATTGCTA (e.g. positions 1, 2, 3, 7)
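The definition on this slide can be checked mechanically. The following is a minimal sketch, assuming plain Python strings; the function name `is_subsequence` is an illustrative choice and does not come from the slides.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if candidate can be read left to right inside text,
    skipping characters of text as needed."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to and including the first
    # occurrence of ch, so later characters must appear after earlier ones.
    return all(ch in remaining for ch in candidate)


v = "ATTGCTA"
print(is_subsequence("AGCA", v))
print(is_subsequence("ATTA", v))
print(is_subsequence("GTT", v))
```

The first two calls use the slide's own examples; the third is a made-up counterexample, since only one T follows the G in v.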

  5. Operations • The only operations we can perform are insertion and deletion • Insertion: ATCTGAT -> A-TCTGAT • The hyphen marks a gap, a position where a character of the other sequence is inserted • Deletion: equivalent to an insertion into the other sequence, offsetting the characters so that the longest common subsequence lines up • v = AT-C-TGAT • w = -TGCAT-A- • How do we find TCTA using dynamic programming?

  6. Review: Edit Distance • Edit distance: turning one sequence into another with the least number of operations. • Allowed operations: insertion, deletion, and substitution • The longest common subsequence problem is nearly identical, but allows only insertion and deletion. In the grid, insertion and deletion edges have weight 0 and diagonal edges for matching characters have weight 1; mismatches get no diagonal edge. Finding the LCS is then a longest-path problem (basically Manhattan with fixed weights)
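The grid view above can be sketched in code. This is a minimal illustration of the standard LCS dynamic program, not the presenter's own implementation; the names `lcs` and `s` are assumptions. Entry `s[i][j]` holds the length of the longest path to vertex (i, j), that is, the LCS length of the first i characters of v and the first j characters of w.

```python
def lcs(v: str, w: str) -> str:
    """Return one longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] = LCS length of v[:i] and w[:j]; row 0 and column 0 stay 0.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Weight-0 edges: skip a character of v or of w.
            s[i][j] = max(s[i - 1][j], s[i][j - 1])
            # Weight-1 diagonal edge exists only when the characters match.
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)

    # Walk back from the sink to recover one optimal path.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1] and s[i][j] == s[i - 1][j - 1] + 1:
            out.append(v[i - 1])
            i -= 1
            j -= 1
        elif s[i][j] == s[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


result = lcs("ATCTGAT", "TGCATA")
print(len(result), result)
```

For the sequences on the Operations slide the LCS has length 4. Several subsequences reach that length (TCTA is one of them), so the exact string returned depends on how ties are broken during backtracking.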

  7. Example • Example: see the other slides • Chapter 6: Edit Distance, Slides 54-58

  8. Global Sequence Alignment • Chapter 6: Alignment

  9. Scoring Alignments • Scoring matrices are based on biological evidence. • Certain amino acid mutations are more common than others. • For instance, Asn, Asp, Glu, and Ser are the most mutable amino acids • Ser is approximately three times as likely to mutate into Phe as Trp is
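To make the role of a scoring matrix concrete, here is a small sketch that scores an existing alignment column by column. The matrix `toy_delta` (+1 for a match, -1 for a mismatch over the DNA alphabet) and the indel penalty are made-up illustrative values, not a real biological matrix such as PAM; the function name `score_alignment` is likewise an assumption.

```python
def score_alignment(row_v: str, row_w: str, delta: dict, indel: int = -1) -> int:
    """Sum the column scores of an alignment given as two gapped rows."""
    if len(row_v) != len(row_w):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(row_v, row_w):
        if x == "-" or y == "-":
            total += indel          # insertion or deletion column
        else:
            total += delta[(x, y)]  # match or mismatch, looked up in the matrix
    return total


# Hypothetical toy matrix: +1 on the diagonal, -1 elsewhere.
toy_delta = {(a, b): (1 if a == b else -1) for a in "ACGT" for b in "ACGT"}

print(score_alignment("AT-C-TGAT", "-TGCAT-A-", toy_delta))
```

With these toy values the alignment from the Operations slide has 4 matching columns and 5 indel columns. A real amino acid matrix would replace `toy_delta` with a 20 x 20 table of evidence-based scores.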

  10. PAM • 1 PAM corresponds to 1 mutation for every 100 amino acids • This is a required condition that ensures the proteins being analyzed are closely related. • The scoring matrix is built from mutation probabilities, and those probabilities change if the proteins are not closely related. • That is, the probability that one amino acid mutates into another differs with evolutionary distance • 1 PAM is the average time for the “average” protein to mutate 1% of its amino acids • You end up with a family of scoring matrices: PAM 1, PAM 2, and so on
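In the usual construction, the mutation probabilities for n PAM units are obtained by multiplying the PAM 1 probability matrix by itself n times, and the scores are then derived from those probabilities. The sketch below shows only the matrix-power step, on a hypothetical 3-letter alphabet with invented probabilities (`toy_pam1`); a real PAM matrix is 20 x 20 and estimated from aligned, closely related proteins.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def pam_n(pam1, n: int):
    """Return the n-th power of the PAM 1 mutation probability matrix."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = pam1
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Invented values: each row sums to 1 and about 1% of residues change.
toy_pam1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]

for row in pam_n(toy_pam1, 2):
    print([round(p, 6) for p in row])
```

As n grows, the diagonal entries shrink and the off-diagonal entries grow, which is why a matrix tuned for closely related proteins is a poor fit for distant ones.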

  11. Local Sequence Alignment • Global alignment looks at two entire strings • Local alignment instead looks only for regions of high similarity within them • That is, it looks for short substrings that are similar inside larger sequences

  12. Smith-Waterman Local Alignment Algorithm • Set an edge weight of 0 from the source to every other vertex, so that an alignment can start anywhere in the grid at no cost.
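In the recurrence, the zero-weight edges from the source show up as an extra 0 inside the max, and the alignment ends at whichever vertex has the highest score. The following is a minimal sketch under assumed scores (match +1, mismatch -1, indel -1); these values and the name `local_alignment` are illustrative, not taken from the slides.

```python
def local_alignment(v: str, w: str, match: int = 1, mismatch: int = -1, indel: int = -1):
    """Return (best score, aligned piece of v, aligned piece of w)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            # The leading 0 is the free edge from the source to this vertex.
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            if s[i][j] > best:
                best, best_pos = s[i][j], (i, j)

    # Backtrack from the best vertex until the score drops to 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and s[i][j] > 0:
        diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
        if s[i][j] == diag:
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i -= 1
            j -= 1
        elif s[i][j] == s[i - 1][j] + indel:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_alignment("XXXACGTXXX", "YYACGTYY"))
```

The example strings are invented: the shared region ACGT is surrounded by unrelated characters, which a global alignment would be forced to pay for and a local alignment ignores.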

  13. Alignment with Gap Penalties • Gaps are expected in the sequences. • However, many very small gaps could indicate dissimilarity, so gaps are penalized according to a criterion, for example their number and length
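One common criterion is an affine gap penalty, where a gap of length x costs an opening penalty plus x times an extension penalty, so several short gaps cost more than one long gap. The sketch below only scores an alignment that is already given; it does not search for the best one. The penalty values and the name `affine_score` are assumptions chosen for illustration.

```python
def affine_score(row_v: str, row_w: str, match: int = 1, mismatch: int = -1,
                 gap_open: int = 2, gap_extend: int = 1) -> int:
    """Score a gapped alignment; a gap of length x costs gap_open + gap_extend * x."""
    if len(row_v) != len(row_w):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    open_side = None  # row holding the gap run in progress: "v", "w", or None
    for x, y in zip(row_v, row_w):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            side = "v" if x == "-" else "w"
            if side != open_side:
                total -= gap_open  # a new gap starts in this row
            total -= gap_extend    # every gap column pays the extension cost
            open_side = side
        else:
            total += match if x == y else mismatch
            open_side = None
    return total


# Same number of matches and gap columns, distributed differently.
print(affine_score("ACGATTT", "ACGA---"))  # one gap of length 3
print(affine_score("ATCTGTA", "A-C-G-A"))  # three gaps of length 1
```

Both made-up alignments have 4 matches and 3 gap columns, yet the single long gap scores higher because the opening penalty is paid only once.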

  14. References • An Introduction to Bioinformatics Algorithms • Related Slides
