
Advancements in Multiple Sequence Alignment Algorithms for DNA and Protein Sequences

This presentation describes input-sensitive algorithms for Multiple Sequence Alignment (MSA) of DNA and protein sequences. MSA quantifies similarities among sequences, detects conserved motifs and remote homologues, and supports evolutionary insights. The baseline method builds a score matrix and solves the problem by dynamic programming; the presented algorithms restrict the search using pairwise alignments, a segments matching graph, and extreme paths. Applications named in the slides include the representation of protein families and the transfer of annotation.


Presentation Transcript


  1. Input Sensitive Algorithms for Multiple Sequence Alignment • Pankaj Agarwal (Duke) • Yonatan Bilu (Hebrew University) • Rachel Kolodny (Stanford)

  2. Multiple Sequence Alignment • Quantifies similarities among [DNA, Protein] sequences • Detects highly conserved motifs & remote homologues • Evolutionary insights • Transfer of annotation • Representation of protein families

  3. Multiple Sequence Alignment • Input: k sequences • Output: optimal alignment • Gap-infused sequences (-), one per row • Restrictions: column pattern
     Input: (1) GARFIELD MET NERMAL (2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE (3) GARFIELD AND HIS ASSOCIATE NERMAL
     Alignment:
     ----GARFIELD MET----------------- NERMAL ------------------------------
     ODIE------------AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
     ----GARFIELD ---AND HIS ASSOCIATE NERMAL ------------------------------

  4. Multiple Sequence Alignment • Input: k sequences • Output: optimal alignment • Minimal width • Score function • Columns summation • e.g. sum of pairs (example sequences and alignment as on slide 3)
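The "columns summation, e.g. sum of pairs" score on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: the match, mismatch, and gap values are assumptions chosen for the example, and the names `pair_score` and `sum_of_pairs` are hypothetical.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (illustrative values, not from the slides)."""
    if a == GAP and b == GAP:
        return 0  # assumption: a gap facing a gap costs nothing
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows):
    """Sum the pairwise scores over every column of an alignment."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows of an alignment must have the same width")
    total = 0
    for column in zip(*rows):  # one tuple of symbols per alignment column
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total

# Two gap-infused rows, built so that both have the same width.
rows = [
    "GARFIELDMET" + GAP * 15 + "NERMAL",
    "GARFIELD" + GAP * 3 + "ANDHISASSOCIATE" + "NERMAL",
]
print(sum_of_pairs(rows))
```

Because the score is a sum over columns, it can be accumulated one column at a time, which is what makes a dynamic programming formulation possible.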

  5. DP solves MSA • Build a score matrix • k-dimensional hypercube • An alignment is a path • Time: (num of nodes) × (num neighbors per node)
     Input: GARFIELDMETNERMAL, GARFIELDANDHISASSOCIATENERMAL
     Alignment:
     GARFIELDMET---------------NERMAL
     GARFIELD---ANDHISASSOCIATENERMAL
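The textbook dynamic program behind this slide can be sketched as below: one node per point of the k-dimensional grid, and an alignment is a path from the origin to the far corner. This is a rough sketch under stated assumptions, not the presenters' implementation: the scoring values and the names `column_score` and `align` are hypothetical. In this sketch the number of nodes is the product of (length + 1) over the sequences and each node has up to 2^k - 1 incoming directions, so it is only practical for very small inputs.

```python
from itertools import combinations, product

GAP = "-"

def column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one column (illustrative values, not from the slides)."""
    total = 0
    for a, b in combinations(column, 2):
        if a == GAP and b == GAP:
            continue  # assumption: gap against gap is free
        total += gap if GAP in (a, b) else (match if a == b else mismatch)
    return total

def align(seqs):
    """Exhaustive DP over the k-dimensional grid; returns (score, aligned rows)."""
    k = len(seqs)
    # Every non-zero 0/1 vector is a direction: 2**k - 1 neighbors per node.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {(0,) * k: (0, None)}  # node -> (best score, move used to reach it)
    # product() yields nodes in lexicographic order, so predecessors come first.
    for node in product(*(range(len(s) + 1) for s in seqs)):
        if not any(node):
            continue
        candidates = []
        for move in moves:
            prev = tuple(n - m for n, m in zip(node, move))
            if min(prev) < 0:
                continue
            column = [s[p] if m else GAP for s, p, m in zip(seqs, prev, move)]
            candidates.append((best[prev][0] + column_score(column), move))
        best[node] = max(candidates)
    # Trace the optimal path back from the far corner to the origin.
    node = tuple(len(s) for s in seqs)
    score, columns = best[node][0], []
    while any(node):
        move = best[node][1]
        prev = tuple(n - m for n, m in zip(node, move))
        columns.append([s[p] if m else GAP for s, p, m in zip(seqs, prev, move)])
        node = prev
    rows = ["".join(col[i] for col in reversed(columns)) for i in range(k)]
    return score, rows

score, rows = align(["GARFIELDMETNERMAL", "GARFIELDANDHISASSOCIATENERMAL"])
print(score)
print("\n".join(rows))
```

The returned rows all have the same width, and removing the gaps from each row gives back the corresponding input sequence.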

  6. Previous Work

  7. Pairwise Restriction • The “true” information: the aligned subsequences and their relative positioning • Study pairwise alignment first and restrict the alignment • Time: [formula not preserved] • Focus efforts on “true” tradeoffs
     Example sequences: GARFIELDMETNERMAL, GARFIELDANDHISASSOCIATENERMAL

  8. Segments Matching Graph (SMG) • Sequences are partitioned into segments • Nodes: the segments • Edges: self edges; edges between two equal-length segments of different sequences; edges have scores • Defines allowed paths and their score
     Example partitions: ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE; GARFIELD | ANDHISASSOCIATE | NERMAL; GARFIELD | MET | NERMAL
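One possible in-memory representation of the graph described on this slide is sketched below. It is an illustration only. The slides do not say how edges and scores are derived, so this sketch assumes that an edge joins two segments of different sequences when their text is identical (which also makes them equal in length), that its score is the segment length, and that self edges carry a placeholder score of 0. The names `Segment`, `partition`, and `build_smg` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Segment:
    seq: int     # index of the sequence this segment belongs to
    start: int   # offset of the segment inside that sequence
    text: str    # the characters of the segment

def partition(seq_index, pieces):
    """Turn an ordered list of pieces into Segment nodes with offsets."""
    segments, offset = [], 0
    for piece in pieces:
        segments.append(Segment(seq_index, offset, piece))
        offset += len(piece)
    return segments

def build_smg(partitions):
    """Return (nodes, edges); edges maps a pair of segments to a score."""
    nodes = [seg for part in partitions for seg in part]
    # Self edges: one per node, with a placeholder score (assumption).
    edges = {(seg, seg): 0 for seg in nodes}
    for a, b in combinations(nodes, 2):
        # Assumption: identical text stands in for a pairwise-alignment match.
        if a.seq != b.seq and a.text == b.text:
            edges[(a, b)] = len(a.text)
    return nodes, edges

partitions = [
    partition(0, ["GARFIELD", "MET", "NERMAL"]),
    partition(1, ["ODIE", "ANDHISASSOCIATE", "NERMAL", "MET",
                  "GARFIELD", "ANDHISASSOCIATE"]),
    partition(2, ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"]),
]
nodes, edges = build_smg(partitions)
print(len(nodes), len(edges))
```

Keeping the start offset in each node distinguishes repeated segments, such as the two occurrences of ANDHISASSOCIATE in the second sequence.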

  9. Figure: ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE partitioned into ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE, and GARFIELDANDHISASSOCIATENERMAL partitioned into GARFIELD | ANDHISASSOCIATE | NERMAL

  10. Figure: the same two partitioned sequences as on slide 9

  11. Extreme paths: figure on the same two partitioned sequences as on slide 9

  12. Extreme paths: figure on the same two partitioned sequences as on slide 9

  13. Lemma: there is an optimal path that is extreme (diagram labels: All paths, Extreme paths, Optimal paths)

  14. Improved algorithm: DP on the segments (example sequences: GARFIELDANDHISASSOCIATENERMAL, ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE)

  15. Transitive PR-MSA • DNA sequences • No scores in SMG, only matches

  16. Maximal Directions • Transitivity implies that for any point in the hypercube, the directions are partitioned into cliques • Defines maximal directions • The shortest path can be taken over maximal directions. • Pushes down the work per node

  17. Obvious Directions • Figure panels: Obvious, Non-Obvious • Segments shown: ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE; GARFIELD | ANDHISASSOCIATE | NERMAL; GARFIELD | MET | NERMAL

  18. Obvious Directions • Lemma: an optimal path is found even when making obvious decisions • Not all nodes are relevant • Work for every node increases to [formula not preserved]

  19. Special Vertices • Straight junction • Corner junction • (0,0)

  20. Thank you
