SATCHMO, developed by Edgar and Sjölander, is a bioinformatics algorithm that simultaneously aligns unaligned protein sequences and constructs a phylogenetic tree using profile Hidden Markov Models (HMMs). It uses Dirichlet mixture densities to avoid the problems of small counts when estimating profiles, and it uses profile-profile scoring to find and align the closest pair of subtrees at each step. The algorithm is robust to extreme structural divergence and can align proteins with different overall folds. Because the true evolutionary histories of real protein families are unknown, its performance is evaluated by comparing its alignments with 3D structural alignments and by assessing the functional predictive power of its trees.
SATCHMO: Simultaneous Alignment and Tree Construction using Hidden Markov mOdels
Edgar, R., and Sjölander, K., Bioinformatics 2003
SATCHMO algorithm
• Input: unaligned sequences, each forming a separate subtree containing a single sequence
• Initialize: construct a profile HMM for each sequence using Dirichlet mixture densities
  • Dirichlet mixture densities avoid the problems of small counts
• While (#subtrees > 1) {
  • Use profile-profile scoring to select the closest pair of subtrees to join
    • Score: relative entropy between column distributions
  • Align the pair to each other, keeping columns fixed within each subtree
  • Mask columns with many gaps or high positional relative entropy
  • Construct a profile HMM for the new masked MSA
    • Use Dirichlet mixture densities
  }
• Output: tree and MSA
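The two quantitative ingredients in the loop above, Dirichlet mixture estimation of a column's residue distribution and relative entropy between columns, can be sketched in Python. This is a minimal illustration, not SATCHMO's implementation: the function names are hypothetical, the two-component mixture over a four-letter alphabet is invented for demonstration (real protein mixtures have 20-dimensional components), and using the plain D(p || q) form as the column score is an assumption, since the slide does not give the exact formula.

```python
import math

# Minimal sketch (not SATCHMO's code): Dirichlet mixture estimation of a
# column's residue distribution, plus relative entropy between two columns.

def log_dirichlet_multinomial(counts, alpha):
    """Log-probability of a column's count vector under one Dirichlet
    component (the Dirichlet-multinomial likelihood)."""
    n_total, a_total = sum(counts), sum(alpha)
    log_p = (math.lgamma(n_total + 1) + math.lgamma(a_total)
             - math.lgamma(n_total + a_total))
    for n_i, a_i in zip(counts, alpha):
        log_p += math.lgamma(n_i + a_i) - math.lgamma(n_i + 1) - math.lgamma(a_i)
    return log_p


def dirichlet_mixture_estimate(counts, weights, alphas):
    """Posterior-mean residue probabilities for one column.

    counts  -- observed residue counts in the column (may be tiny or all zero)
    weights -- mixture coefficients q_j (summing to 1)
    alphas  -- one positive Dirichlet parameter vector alpha_j per component
    """
    # P(component j | counts), normalized with the log-sum-exp trick.
    logs = [math.log(q) + log_dirichlet_multinomial(counts, a)
            for q, a in zip(weights, alphas)]
    top = max(logs)
    post = [math.exp(x - top) for x in logs]
    z = sum(post)
    post = [p / z for p in post]

    # Each component adds its pseudocounts alpha_j to the observed counts;
    # the results are averaged using the posterior component probabilities.
    n_total = sum(counts)
    probs = [0.0] * len(counts)
    for p_j, alpha in zip(post, alphas):
        a_total = sum(alpha)
        for i, (n_i, a_i) in enumerate(zip(counts, alpha)):
            probs[i] += p_j * (n_i + a_i) / (n_total + a_total)
    return probs


def relative_entropy(p, q):
    """D(p || q) in nats; 0 only when p and q are identical.
    Assumes every q_i > 0, which Dirichlet mixture estimates guarantee."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


# Hypothetical toy mixture over a 4-letter alphabet (real protein mixtures
# have 20-dimensional components estimated from curated alignments).
TOY_WEIGHTS = [0.6, 0.4]
TOY_ALPHAS = [[1.0, 1.0, 1.0, 1.0],   # broad, near-uniform component
              [4.0, 0.5, 0.5, 0.5]]   # component favoring the first letter

if __name__ == "__main__":
    # Only three observed residues per column: raw frequencies would give
    # probability 0 to unseen letters; the mixture estimate does not.
    col_a = dirichlet_mixture_estimate([3, 0, 0, 0], TOY_WEIGHTS, TOY_ALPHAS)
    col_b = dirichlet_mixture_estimate([0, 3, 0, 0], TOY_WEIGHTS, TOY_ALPHAS)
    print([round(x, 3) for x in col_a])
    print(relative_entropy(col_a, col_b))
```

Weighting each component's pseudocounts by how well that component explains the observed counts is what lets a column with only a handful of residues still receive sensible, nonzero probabilities for unseen residues, which is the "small counts" problem the slide refers to.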
SATCHMO performance evaluation
• Evaluating the accuracy of phylogenetic trees is difficult
  • Simulation studies are used to evaluate evolutionary tree methods
  • These rarely attempt to model the effects of duplication or of structural and functional change
  • We don’t know the evolutionary history of multi-gene families, so benchmark datasets of real protein family phylogenies are not available
• However, we can directly assess alignment accuracy using 3D structure
  • The structural alignment of two proteins is accepted as “ground truth” by the computational structural biology community
• We can also assess the functional predictive power of a phylogenetic tree against what is known about the functions of the proteins
  • This approach is not universally accepted
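To make the structure-based accuracy check concrete, the sketch below scores a test pairwise alignment against a reference structural alignment by the fraction of reference residue pairs it reproduces. Both the metric and the function names are assumptions for illustration; the slide does not say which accuracy measure the SATCHMO evaluation used. For a multiple alignment, the same comparison can be applied to each pair of sequences.

```python
# Hypothetical sketch: fraction of structurally aligned residue pairs that a
# test alignment reproduces. Alignments are gapped strings; '-' is a gap.

def aligned_pairs(row_a, row_b):
    """Set of (i, j) residue-index pairs placed in the same column."""
    pairs = set()
    i = j = 0  # residue indices (ungapped positions) in each sequence
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def fraction_correct(test_a, test_b, ref_a, ref_b):
    """Share of the reference (structural) alignment's residue pairs that
    also appear in the test alignment."""
    ref = aligned_pairs(ref_a, ref_b)
    if not ref:
        raise ValueError("reference alignment has no aligned residue pairs")
    test = aligned_pairs(test_a, test_b)
    return len(ref & test) / len(ref)


if __name__ == "__main__":
    # Toy reference: the two sequences align residue-for-residue.
    ref_a, ref_b = "ACDEF", "ACDEF"
    # A test alignment that misplaces the third residue of each sequence.
    print(fraction_correct("AC-DEF", "ACD-EF", ref_a, ref_b))
```

Scoring against the reference pairs rather than against column-by-column string identity means that alignments differing only in where gaps are placed within unaligned regions are not penalized, provided the same residues end up paired.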
SATCHMO is more robust to extreme structural divergence than other methods
SATCHMO succeeds at aligning proteins with different overall folds
[Figure panels: MAFFT, SATCHMO]