This presentation discusses input-sensitive algorithms for Multiple Sequence Alignment (MSA) of DNA and protein sequences. MSA quantifies similarities among sequences, detects conserved motifs, and provides evolutionary insights. Starting from the dynamic-programming (DP) solution over a score matrix, the algorithms restrict the alignment using pairwise information encoded in a segments matching graph, and show that an optimal path can be found among the extreme paths. MSA also supports the transfer of annotation and the representation of protein families.
Input-Sensitive Algorithms for Multiple Sequence Alignment
Pankaj Agarwal (Duke), Yonatan Bilu (Hebrew University), Rachel Kolodny (Stanford)
Multiple Sequence Alignment
• Quantifies similarities among DNA and protein sequences
• Detects highly conserved motifs and remote homologues
• Evolutionary insights
• Transfer of annotation
• Representation of protein families
Multiple Sequence Alignment
• Input: k sequences
• Output: optimal alignment
• Gap-infused sequences (gaps shown as -), one per row
• Restrictions on the column pattern
Input:
(1) GARFIELD MET NERMAL
(2) ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
(3) GARFIELD AND HIS ASSOCIATE NERMAL
Alignment:
----GARFIELD MET----------------- NERMAL ------------------------------
ODIE------------AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE
----GARFIELD ---AND HIS ASSOCIATE NERMAL ------------------------------
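The definition above can be made concrete as a validity check. This is a minimal sketch under assumptions, not the authors' code: it treats the GARFIELD example at word level (each word is one symbol, "-" is a gap), it assumes the column restriction is the common one that forbids all-gap columns, and the function name `is_valid_alignment` is hypothetical.

```python
GAP = "-"


def is_valid_alignment(sequences, rows):
    """Check that `rows` is a well-formed alignment of `sequences`.

    Hypothetical helper. Works for strings (character symbols) or lists
    (e.g. word symbols).
    """
    # One gap-infused row per input sequence.
    if len(rows) != len(sequences):
        return False
    # Every row must have the same width.
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        return False
    # Deleting the gaps from a row must give back its input sequence.
    for row, seq in zip(rows, sequences):
        if [x for x in row if x != GAP] != list(seq):
            return False
    # Assumed column restriction: no column may consist of gaps only.
    for col in range(width):
        if all(row[col] == GAP for row in rows):
            return False
    return True


# The GARFIELD example, one word per symbol.
sequences = [s.split() for s in (
    "GARFIELD MET NERMAL",
    "ODIE AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE",
    "GARFIELD AND HIS ASSOCIATE NERMAL",
)]
rows = [r.split() for r in (
    "- GARFIELD MET - - - NERMAL - - - - -",
    "ODIE - - AND HIS ASSOCIATE NERMAL MET GARFIELD AND HIS ASSOCIATE",
    "- GARFIELD - AND HIS ASSOCIATE NERMAL - - - - -",
)]
print(is_valid_alignment(sequences, rows))  # True
```

The same function accepts character-level rows, since iterating a string yields its characters.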
Multiple Sequence Alignment
• Input: k sequences
• Output: optimal alignment
• Minimal width
• Score function: a summation over columns, e.g. sum of pairs
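A column-summation score such as sum of pairs can be sketched as follows. The match, mismatch and gap values (+1, -1, -1, and 0 for a gap facing a gap) are hypothetical placeholders, not scores from the talk; real protein alignments would use a substitution matrix instead.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Hypothetical symbol-pair score; a gap facing a gap costs nothing."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(rows, score=pair_score):
    """Alignment score as a summation over columns of all pairwise scores."""
    total = 0
    for column in zip(*rows):
        total += sum(score(a, b) for a, b in combinations(column, 2))
    return total


rows = ["GARFIELDMET" + GAP * 15 + "NERMAL",
        "GARFIELD" + GAP * 3 + "ANDHISASSOCIATENERMAL"]
print(sum_of_pairs(rows))  # 8 matches - 18 gaps + 6 matches = -4
```

Because the score is a plain sum over columns, it can be accumulated one column at a time, which is what makes a dynamic-programming formulation possible.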
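DP solves MSA
• Build a score matrix over a k-dimensional hypercube (one axis per sequence)
• An alignment is a path through it
• Time: O(n^k · 2^k) for k sequences of length n, i.e. the number of nodes (O(n^k)) times the number of neighbors per node (2^k - 1)
Example (k = 2): GARFIELDMETNERMAL and GARFIELDANDHISASSOCIATENERMAL align as
GARFIELDMET---------------NERMAL
GARFIELD---ANDHISASSOCIATENERMAL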
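The textbook DP described above can be sketched in a few lines: nodes are tuples of prefix lengths, and each of the 2^k - 1 non-zero 0/1 steps into a node emits one alignment column. This is a sketch of the standard exhaustive DP under the same hypothetical sum-of-pairs scores as before, not the input-sensitive algorithm of the talk; the name `msa_dp` is hypothetical.

```python
from itertools import combinations, product

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Hypothetical symbol-pair score (a gap facing a gap costs nothing)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def msa_dp(seqs, score=pair_score):
    """Best sum-of-pairs score via DP over the k-dimensional grid of prefixes."""
    k = len(seqs)
    origin = (0,) * k
    best = {origin: 0}
    # product() yields nodes in lexicographic order, so every predecessor
    # (componentwise <= node, and different) is already filled in.
    for node in product(*(range(len(s) + 1) for s in seqs)):
        if node == origin:
            continue
        candidates = []
        # The 2^k - 1 non-zero 0/1 steps are the neighbors of a node.
        for step in product((0, 1), repeat=k):
            if not any(step):
                continue
            prev = tuple(i - d for i, d in zip(node, step))
            if min(prev) < 0:
                continue
            # The edge emits one column: a symbol on advancing axes, a gap elsewhere.
            column = [seqs[a][prev[a]] if step[a] else GAP for a in range(k)]
            col_score = sum(score(x, y) for x, y in combinations(column, 2))
            candidates.append(best[prev] + col_score)
        best[node] = max(candidates)
    return best[tuple(len(s) for s in seqs)]


print(msa_dp(["GARFIELDMETNERMAL", "GARFIELDANDHISASSOCIATENERMAL"]))
```

Each node does 2^k - 1 edge evaluations, each costing O(k^2) pair scores, which is where the exponential dependence on k comes from.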
Pairwise Restriction
• The “true” information: the aligned subsequences and their relative positioning
• Study the pairwise alignments first, then restrict the multiple alignment accordingly
• Time:
• Focus efforts on the “true” tradeoffs
Example: GARFIELDANDHISASSOCIATENERMAL vs. GARFIELDMETNERMAL
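The slide does not say how the pairwise alignments are computed or stored. Assuming each one is given as two gapped rows, a minimal sketch (hypothetical function `aligned_blocks`) extracts the “true” information named above: the gap-free aligned subsequences and their relative positions in the two ungapped sequences.

```python
GAP = "-"


def aligned_blocks(row_a, row_b):
    """Summarize a pairwise alignment as (start_a, start_b, length) blocks.

    Hypothetical helper. A block is a maximal run of columns with no gap in
    either row; the starts are positions in the ungapped sequences.
    """
    blocks = []
    i = j = 0      # current positions in the ungapped sequences
    run = None     # block being extended, as [start_a, start_b, length]
    for x, y in zip(row_a, row_b):
        if x != GAP and y != GAP:
            if run is None:
                run = [i, j, 0]
            run[2] += 1
        elif run is not None:
            blocks.append(tuple(run))
            run = None
        if x != GAP:
            i += 1
        if y != GAP:
            j += 1
    if run is not None:
        blocks.append(tuple(run))
    return blocks


row_a = "GARFIELDMET" + GAP * 15 + "NERMAL"
row_b = "GARFIELD" + GAP * 3 + "ANDHISASSOCIATENERMAL"
print(aligned_blocks(row_a, row_b))  # [(0, 0, 8), (11, 23, 6)]
```

Here the two blocks are GARFIELD/GARFIELD and NERMAL/NERMAL, with their offsets in each sequence.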
Segments Matching Graph (SMG)
• Sequences are partitioned into segments; the segments are the nodes
  (1) GARFIELD | MET | NERMAL
  (2) ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE
  (3) GARFIELD | ANDHISASSOCIATE | NERMAL
• Edges:
  • self edges
  • between two equal-length segments of different sequences
  • edges have scores
• The SMG defines the allowed paths and their scores
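The SMG described above can be held in a small data structure. This sketch is illustrative only: the names `Segment` and `build_smg` are hypothetical, the self-edge score is a placeholder parameter because the slide does not give it, and the example matches and their length-based scores are made up for the GARFIELD sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    seq: int    # which input sequence the segment belongs to
    pos: int    # index of the segment within that sequence
    text: str


def build_smg(partitions, matches, self_score=0.0):
    """Build a segments matching graph (hypothetical helper).

    partitions[s] lists the segments of sequence s, in order.
    matches maps ((s1, p1), (s2, p2)) to the score of that segment pair.
    Returns (nodes, edges); edges maps (u, v) node keys to scores.
    """
    nodes = {(s, p): Segment(s, p, text)
             for s, parts in enumerate(partitions)
             for p, text in enumerate(parts)}
    # Every segment gets a self edge; its score is a placeholder parameter.
    edges = {(key, key): self_score for key in nodes}
    for (u, v), score in matches.items():
        a, b = nodes[u], nodes[v]
        if a.seq == b.seq:
            raise ValueError("an SMG edge must join different sequences")
        if len(a.text) != len(b.text):
            raise ValueError("an SMG edge must join equal-length segments")
        # Store both orientations so lookups need not care about order.
        edges[(u, v)] = edges[(v, u)] = score
    return nodes, edges


partitions = [
    ["GARFIELD", "MET", "NERMAL"],
    ["ODIE", "ANDHISASSOCIATE", "NERMAL", "MET", "GARFIELD", "ANDHISASSOCIATE"],
    ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"],
]
# Illustrative matches, scored here by segment length.
matches = {((0, 0), (2, 0)): 8.0,   # GARFIELD / GARFIELD
           ((0, 2), (2, 2)): 6.0,   # NERMAL / NERMAL
           ((1, 1), (2, 1)): 15.0}  # ANDHISASSOCIATE / ANDHISASSOCIATE
nodes, edges = build_smg(partitions, matches)
print(len(nodes), len(edges))  # 12 18
```

Rejecting unequal-length pairs at construction time enforces the edge rule from the slide, so later path searches can assume every edge joins segments of the same length.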
[Figure: alignment grid of (3) GARFIELDANDHISASSOCIATENERMAL against (2) ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE, with the axes divided into the segments GARFIELD | ANDHISASSOCIATE | NERMAL and ODIE | ANDHISASSOCIATE | NERMAL | MET | GARFIELD | ANDHISASSOCIATE; the last frames are titled “Extreme paths”.]
Lemma: there is an optimal path that is extreme.
[Figure: Venn diagram; within All paths, the sets Optimal paths and Extreme paths intersect.]
Improved algorithm: DP on the segments
[Figure: grid of GARFIELDANDHISASSOCIATENERMAL against ODIEANDHISASSOCIATENERMALMETGARFIELDANDHISASSOCIATE.]
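A DP on the segments can be sketched by replacing character positions with segment boundaries. This is a simplified illustration, not the authors' algorithm: it assumes every move advances whole segments, either one segment alone (facing gaps in all other sequences) or several segments together when every pair of them is joined by an SMG edge. The per-character `gap` cost, the function name `segment_dp`, and the omission of self-edge scores are all assumptions.

```python
from itertools import combinations, product


def segment_dp(partitions, edge_score, gap=-1.0):
    """Best sum-of-pairs score when the DP moves whole segments (hypothetical).

    partitions[s]: segments of sequence s, in order.
    edge_score: dict ((s1, p1), (s2, p2)) -> score for SMG edges (either key order).
    gap: assumed per-character cost of a symbol facing a gap.
    """
    k = len(partitions)

    def score_of(u, v):
        return edge_score.get((u, v), edge_score.get((v, u)))

    origin = (0,) * k
    best = {origin: 0.0}
    # Nodes are tuples of segment boundaries; lexicographic order is a topological order.
    for node in product(*(range(len(p) + 1) for p in partitions)):
        if node == origin:
            continue
        options = []
        for step in product((0, 1), repeat=k):
            movers = [s for s in range(k) if step[s]]
            prev = tuple(i - d for i, d in zip(node, step))
            if not movers or min(prev) < 0:
                continue
            segs = [(s, prev[s]) for s in movers]
            # Moving several segments together requires an SMG edge for every pair.
            pair_scores = [score_of(u, v) for u, v in combinations(segs, 2)]
            if None in pair_scores:
                continue
            # SMG edges join equal-length segments, so one length serves all movers.
            length = len(partitions[movers[0]][prev[movers[0]]])
            # Every moving character faces a gap in each sequence that stays put.
            gap_cost = gap * length * len(movers) * (k - len(movers))
            options.append(best[prev] + sum(pair_scores) + gap_cost)
        # Single-segment moves are always allowed, so options is never empty.
        best[node] = max(options)
    return best[tuple(len(p) for p in partitions)]


partitions = [["GARFIELD", "MET", "NERMAL"], ["GARFIELD", "ANDHISASSOCIATE", "NERMAL"]]
edges = {((0, 0), (1, 0)): 8.0, ((0, 2), (1, 2)): 6.0}
print(segment_dp(partitions, edges))  # -4.0
```

The grid now has one node per combination of segment boundaries rather than per combination of characters, which is where the input sensitivity comes from: long matched segments cost a single step.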
Transitive PR-MSA (pairwise-restricted MSA)
• DNA sequences
• No scores in the SMG, only matches
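Transitivity of the match relation can be checked directly: whenever u matches w and w matches v, u must also match v. In this sketch the matched elements are hypothetical labels such as "x1" (position 1 of sequence x), and the helper `is_transitive` is illustrative rather than part of the talk.

```python
from itertools import combinations


def is_transitive(matches):
    """True if the symmetric match relation is transitive (hypothetical helper).

    matches: iterable of 2-element frozensets {u, v} of matched elements.
    """
    neighbors = {}
    for pair in matches:
        u, v = tuple(pair)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # u ~ w and w ~ v must imply u ~ v: the neighbors of w must all match each other.
    for w, around in neighbors.items():
        for u, v in combinations(around, 2):
            if v not in neighbors[u]:
                return False
    return True


chain = {frozenset({"x1", "y1"}), frozenset({"y1", "z1"})}
print(is_transitive(chain))                                # False
print(is_transitive(chain | {frozenset({"x1", "z1"})}))    # True
```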
Maximal Directions
• Transitivity implies that, at any point in the hypercube, the directions are partitioned into cliques
• This defines the maximal directions
• The shortest path can be taken over maximal directions only
• Reduces the work per node
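At a single grid point, the partition of directions into cliques can be computed with union-find: two axes are joined when the elements they would consume next are matched, and under transitivity each connected group is a clique. The sketch returns one 0/1 step (a maximal direction) per clique; it only enumerates those directions and does not prove the shortest-path claim. The element labels and the name `maximal_directions` are hypothetical.

```python
def maximal_directions(next_elems, matches):
    """Group the axes at a grid point into cliques; return one step per clique.

    Hypothetical helper.
    next_elems[s]: the element sequence s would consume next (None if exhausted).
    matches: set of 2-element frozensets of matched elements, assumed transitive.
    """
    k = len(next_elems)
    parent = list(range(k))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s in range(k):
        for t in range(s + 1, k):
            a, b = next_elems[s], next_elems[t]
            if a is not None and b is not None and frozenset((a, b)) in matches:
                parent[find(s)] = find(t)
    # With a transitive relation, each connected group of axes is a clique.
    groups = {}
    for s in range(k):
        if next_elems[s] is not None:
            groups.setdefault(find(s), []).append(s)
    return [tuple(int(s in axes) for s in range(k)) for axes in groups.values()]


matches = {frozenset(("x1", "y1"))}
print(sorted(maximal_directions(["x1", "y1", "z1"], matches)))  # [(0, 0, 1), (1, 1, 0)]
```

Only one candidate step per clique is produced instead of all 2^k - 1 non-zero steps, which is the reduction in work per node that the slide refers to.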
Obvious Directions
[Figure: the segmented example sequences at a decision point marked “?”, with the segment pairs GARFIELD/GARFIELD, MET/MET and NERMAL/NERMAL; example moves are labeled Obvious and Non-Obvious.]
Obvious Directions
• Lemma: an optimal path is found even when making obvious decisions
• Not all nodes are relevant
• Work for every node increases to
Special Vertices
[Figure: a straight junction and a corner junction; the origin (0,0) is marked.]