This research develops a memory-efficient algorithm for multiple sequence alignment with constraints. Biological structures and consensus regions are incorporated into the alignment as constraints, and the constrained regions are not allowed to overlap. The problem formulation defines constraints as conserved sites of a protein or DNA/RNA family that must be aligned together. The constrained multiple sequence alignment (CMSA) aims for an optimal sum-of-pair score; it is built by first aligning two sequences and then progressively aligning more sequences, with a divide-and-conquer approach used for the pairwise step. The algorithm overview introduces notation for alignment scores, substitutions, deletions, insertions, semi-constrained alignment, and the recurrence of scores. The implementation and experimental results are discussed, highlighting the algorithm's efficiency and performance guarantee.
A memory-efficient algorithm for multiple sequence alignment with constraints. Chin Lung Lu and Yen Pin Huang, National Chiao Tung University, Taiwan, Republic of China. Bioinformatics, Vol. 21 no. 1, 2005. Yutu Liu -- CPSC 689 Algorithmic Techniques for Biology, Spring 2005
Motivation • Incorporate the biological structures and consensuses into sequence alignment • Memory efficient
Problem Formulation -- Constraints • What is multiple sequence alignment with constraints? • Constraints: conserved sites of a protein or DNA/RNA family • No overlapping between them [Figure: example gapped alignments built from sequences such as ATCTCGCT and TGCATAT]
Problem Formulation -- Constraints • "Approximately appears" is measured by Hamming distance [Figure: a constraint compared against the sequence TGCATAT; the value 0.5 is shown]
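The idea of a constraint "approximately appearing" in a sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and reading the threshold as an error ratio `eps` multiplied by the constraint length is an assumption, not something the slide states.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def approximately_appears(constraint: str, sequence: str, eps: float) -> bool:
    """True if some window of `sequence` has at most eps * len(constraint)
    mismatches against `constraint`.

    `eps` is an assumed error ratio; the slide only shows a value of 0.5.
    """
    k = len(constraint)
    limit = eps * k
    return any(
        hamming_distance(constraint, sequence[i:i + k]) <= limit
        for i in range(len(sequence) - k + 1)
    )
```

For example, `hamming_distance("TGCAT", "TGCCC")` counts two mismatches, so with `eps = 0.5` the string TGCCC would be accepted as an approximate occurrence of TGCAT, while with `eps = 0` only an exact occurrence would be accepted.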
Problem Formulation -- Constraints • Given S = {s1, s2, …, sx} and a string T = {t1, t2, …, tk}, T approximately appears in L [Figure: bands L and L', Subseq(S2, L'), and the example strings TGCAT and TGCCC inside an alignment]
Problem Formulation • Constrained Multiple Sequence Alignment (CMSA) • Optimal Sum-of-Pair Score [Figure: sequences S1, S2, S3 with constraints C1, C2, C3; CPSA]
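As a rough illustration of the sum-of-pair objective, the sketch below scores a finished multiple alignment by adding up the column-by-column scores of every pair of rows. The scoring values (`match`, `mismatch`, `gap`) and the function names are assumptions chosen for the example; the slides do not give a scoring scheme.

```python
from itertools import combinations


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of one aligned column for one pair of rows (assumed scheme)."""
    if x == "-" and y == "-":
        return 0  # two gaps facing each other contribute nothing
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(rows: list[str]) -> int:
    """Sum-of-pair score of an alignment given as equal-length gapped rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(x, y)
        for r1, r2 in combinations(rows, 2)  # every unordered pair of rows
        for x, y in zip(r1, r2)              # every column of that pair
    )
```

With the assumed scores, `sum_of_pairs(["AT-C", "ATGC", "A-GC"])` adds the three pairwise scores 1, -2 and 1.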
CMSA • Pick two sequences • Find the constrained pairwise sequence alignment (CPSA) • Use it as a kernel to progressively align more sequences [1] Progressive Multiple Alignment with Constraints, Gene Myers et al. [2] MuSiC: A Tool for Multiple Sequence Alignment with Constraints, Yin Te Tsai, Chin Lung Lu, Ching Ta Yu, Yen Pin Huang
Algorithm Overview • Divide-and-Conquer • Find recursive relationship: M(i,j) in terms of M(i-1,j-1) [Figure: ai aligned with bj, preceded by ai-1 and bj-1]
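The recursive relationship between M(i,j) and its neighbours can be illustrated with the standard unconstrained global alignment recurrence. The sketch below is not the paper's algorithm; it keeps only two rows of the table to show why the score alone needs memory linear in the length of one sequence, which is the observation that divide-and-conquer alignment methods build on. The scoring values and the function name are assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b using two rows of M.

    M(i, j) = max(M(i-1, j-1) + s(a_i, b_j),   # substitution
                  M(i-1, j)   + gap,           # deletion
                  M(i, j-1)   + gap)           # insertion
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row i-1, starting with row 0
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # row i; column 0 is all gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                              # discard the older row
    return prev[len(b)]
```

Recovering the alignment itself, and not just its score, from linear memory requires splitting the problem at a midpoint and recursing on the two halves; that step is omitted here.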
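Notation [Figure: constraints C1, …, Ck, …, Cγ]
Alignment Score [Figure: sequences A and B aligned across constraints C1, C2, …, Ck]
Alignment Score -- Substitution [Figure: ai aligned with bj]
Alignment Score -- Deletion [Figure: ai aligned with a gap in B]
Alignment Score -- Insertion [Figure: a gap in A aligned with bj]
Semi-Constrained Alignment [Figure: A and B aligned across C1, C2, …, Ck-1, Ck, with offset h marked in Ck]
Recurrence of Scores [Figure: cases for the last column, built from ai-1, bj and gaps, relative to constraint Ck]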
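[Figure: three-dimensional lattice with axes Sequence A, Sequence B and Constraints, running from (0, 0, 0) to (m, n, γ); cell (i, j, k) is shown with its neighbours (i-1, j-1, k), (i-1, j, k) and (i, j-1, k)]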
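A three-index table of this shape can be sketched for a deliberately simplified case. The assumptions are significant: each constraint is a single character that must appear, in order, in a column where both sequences carry exactly that character. The slides instead allow longer constraints with Hamming-distance errors, so this is an illustration of the lattice and not the paper's recurrence. It also stores the full table, which is exactly the memory cost a memory-efficient method aims to avoid.

```python
NEG_INF = float("-inf")


def constrained_alignment_score(a: str, b: str, constraints: str, match: int = 1,
                                mismatch: int = -1, gap: int = -2) -> float:
    """Best score of aligning a and b so that every character of
    `constraints` occurs, in order, in a column where a and b both match it.
    Returns -inf when no such alignment exists (simplified, assumed model).
    """
    m, n, g = len(a), len(b), len(constraints)
    # M[i][j][k]: best score for a[:i] vs b[:j] with the first k constraints placed
    M = [[[NEG_INF] * (g + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    M[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            for k in range(g + 1):
                best = NEG_INF
                if i > 0 and j > 0:
                    sub = match if a[i - 1] == b[j - 1] else mismatch
                    best = max(best, M[i - 1][j - 1][k] + sub)          # plain column
                    if k > 0 and a[i - 1] == b[j - 1] == constraints[k - 1]:
                        best = max(best, M[i - 1][j - 1][k - 1] + sub)  # places constraint k
                if i > 0:
                    best = max(best, M[i - 1][j][k] + gap)              # deletion
                if j > 0:
                    best = max(best, M[i][j - 1][k] + gap)              # insertion
                M[i][j][k] = best
    return M[m][n][g]
```

With an empty constraint string the function reduces to ordinary global alignment; adding a constraint can only keep or lower the score, and an unsatisfiable constraint gives negative infinity.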
Assignment Email: alinux@tamu.edu
[Figure: constraints C1, …, Ck, …, Cγ; Ck is split at offset h into pref(Ck, h) and suff(Ck, λk - h)]
[Figure: constraints C1, …, Ck, …, Cγ]
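The pref and suff notation can be read as splitting a constraint of length λk at offset h. The following sketch assumes that pref(Ck, h) is the first h characters and suff(Ck, λk - h) is the remaining λk - h characters; the Python function names mirror the slide notation but are otherwise hypothetical.

```python
def pref(c: str, h: int) -> str:
    """First h characters of constraint c (assumed meaning of pref)."""
    return c[:h]


def suff(c: str, length: int) -> str:
    """Last `length` characters of constraint c (assumed meaning of suff)."""
    return c[len(c) - length:]


def split_constraint(c: str, h: int) -> tuple[str, str]:
    """Split c at offset h into pref(c, h) and suff(c, len(c) - h)."""
    if not 0 <= h <= len(c):
        raise ValueError("h must lie between 0 and the constraint length")
    return pref(c, h), suff(c, len(c) - h)
```

Under this reading, the two parts always concatenate back to the full constraint, so a semi-constrained alignment that has placed pref(Ck, h) still has suff(Ck, λk - h) left to place.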
Discussion • No proof that the constraints are consistent • An optimal pairwise alignment of subsequences may prevent the overall alignment from being optimal
Discussion http://genome.life.nctu.edu.tw:8080/MUSICME/index.html
Reference • Efficient Constrained Multiple Sequence Alignment with Performance Guarantee, Francis Y.L. Chin, N.L. Ho, T.W. Lam, Prudence W.H. Wong, M.Y. Chan • Divide-and-conquer multiple alignment with segment-based constraints, Michael Sammeth, Burkhard Morgenstern and Jens Stoye • Multiple sequence alignment with the divide-and-conquer method, Jens Stoye • MuSiC: A Tool for Multiple Sequence Alignment with Constraints, Yin Te Tsai, Chin Lung Lu, Ching Ta Yu, Yen Pin Huang