This document describes how to align a profile of a protein family to a sequence, a step used to identify new family members in biological databases. A profile of nucleotide or residue frequencies is computed from the family alignment and then aligned to an individual sequence by dynamic programming. Two ways of scoring a profile vector against a single nucleotide or residue are covered: the substitution-matrix score used by ClustalW and MUSCLE, and the PSI-BLAST log-odds score based on background and match/mismatch probabilities.
Pairwise profile alignment Usman Roshan BNFO 601
Protein families • PFAM: http://pfam.sanger.ac.uk/ • Family alignments can be used to search for new members in a database
Profile-sequence alignment • Given a family alignment, how can we align it to a sequence? • First, we compute a profile of the alignment. • We then align the profile to the sequence using standard dynamic programming. • However, we need to describe how to align a profile vector to a nucleotide or residue.
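The dynamic programming step can be sketched as follows. This is a minimal illustration, not the course's reference implementation: it assumes a global (Needleman-Wunsch style) alignment with a linear gap penalty, and it takes the column-versus-nucleotide score as a parameter. The names `align_profile_to_sequence`, `toy_score`, and the gap value are hypothetical.

```python
def align_profile_to_sequence(profile, seq, score, gap=-1.0):
    """Return the optimal global alignment score of a profile to a sequence.

    profile: list of frequency vectors (dicts), one per alignment column
    seq:     the sequence to align
    score:   function (profile_vector, nucleotide) -> float
    gap:     linear gap penalty (an assumption; affine gaps are also common)
    """
    m, n = len(profile), len(seq)
    V = [[0.0] * (n + 1) for _ in range(m + 1)]
    # First column and row: a prefix aligned entirely against gaps.
    for i in range(1, m + 1):
        V[i][0] = i * gap
    for j in range(1, n + 1):
        V[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + score(profile[i - 1], seq[j - 1]),  # column vs. nucleotide
                V[i - 1][j] + gap,                                    # column vs. gap
                V[i][j - 1] + gap,                                    # gap vs. nucleotide
            )
    return V[m][n]


def toy_score(f, x):
    """Placeholder score: expected value of +1 match / -1 mismatch under f."""
    return 2 * f.get(x, 0.0) - 1


# A three-column profile in which every column is fully conserved.
profile = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}]
full = align_profile_to_sequence(profile, "ACG", toy_score)
short = align_profile_to_sequence(profile, "AG", toy_score)
```

Passing the score as a function keeps the recurrence unchanged whichever profile-vector scoring scheme is plugged in.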
Profile • A profile can be described by a set of vectors of nucleotide/residue frequencies. • For each position i of the alignment, we compute the normalized frequency of nucleotides A, C, G, and T.
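As a rough sketch of the profile computation, the function below builds one frequency vector per alignment column. The function name `compute_profile` and the example alignment are made up for illustration, and ignoring gap characters when normalizing is an assumption, since the slide does not say how gaps are treated.

```python
from collections import Counter


def compute_profile(alignment, alphabet="ACGT"):
    """Return one normalized frequency vector (dict) per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        # Assumption: characters outside the alphabet (e.g. '-') are ignored.
        counts = Counter(row[i] for row in alignment if row[i] in alphabet)
        total = sum(counts.values())
        # A column containing only gaps gets an all-zero vector.
        profile.append({a: (counts[a] / total if total else 0.0) for a in alphabet})
    return profile


# Hypothetical four-sequence alignment.
alignment = ["ACGT", "ACGA", "AC-T", "TCGT"]
profile = compute_profile(alignment)
# Column 0 holds A, A, A, T, so profile[0] has A = 0.75 and T = 0.25.
```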
Aligning a profile vector to a nucleotide • ClustalW/MUSCLE • Let f be the profile vector • $\mathrm{Score}(f, j) = \sum_i f_i \, S(i, j)$ • where S(i,j) is the substitution scoring matrix
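A small sketch of this kind of score, assuming it is the frequency-weighted sum of substitution scores, i.e. Score(f, j) = sum over i of f_i * S(i, j). The +1/-1 matrix `S` and the vector `f` are hypothetical values chosen for the example, not matrices used by ClustalW or MUSCLE.

```python
def sum_score(f, j, S):
    """Frequency-weighted substitution score of profile vector f against nucleotide j."""
    return sum(f_i * S[i][j] for i, f_i in f.items())


# Hypothetical substitution matrix: +1 for a match, -1 for a mismatch.
alphabet = "ACGT"
S = {a: {b: (1 if a == b else -1) for b in alphabet} for a in alphabet}

# Hypothetical profile vector for one column.
f = {"A": 0.75, "C": 0.0, "G": 0.0, "T": 0.25}

score_A = sum_score(f, "A", S)  # 0.75 * 1 + 0.25 * (-1) = 0.5
score_C = sum_score(f, "C", S)  # 0.75 * (-1) + 0.25 * (-1) = -1.0
```

A nucleotide that is frequent in the column scores high, and one that never occurs there receives the average mismatch score.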
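Aligning a profile vector to a nucleotide • PSI-BLAST • $\mathrm{Score}(f, i) = \log(Q_i / P_i)$ • $P_i$ is the background probability of nucleotide i • $q_{ij}$ is a matrix of match/mismatch probabilities • Define $g_i$ as $g_i = \sum_j f_j \, q_{ij} / P_j$ • and $Q_i$ as $Q_i = (\alpha f_i + \beta g_i) / (\alpha + \beta)$, where $\alpha$ and $\beta$ weight the observed frequencies $f_i$ and the pseudocount frequencies $g_i$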
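The log-odds score can be sketched as below. The sketch assumes the usual PSI-BLAST pseudocount definitions, g_i = sum over j of f_j * q_ij / P_j and Q_i = (alpha * f_i + beta * g_i) / (alpha + beta), where alpha and beta weight the observed and pseudocount frequencies. The values of `P`, `q`, `alpha`, and `beta` are made up for the example and are not PSI-BLAST defaults.

```python
import math


def pseudocount_freqs(f, q, P):
    """g_i = sum_j f_j * q[i][j] / P[j] (assumed PSI-BLAST definition)."""
    return {i: sum(f[j] * q[i][j] / P[j] for j in f) for i in f}


def estimated_freqs(f, q, P, alpha, beta):
    """Q_i = (alpha * f_i + beta * g_i) / (alpha + beta)."""
    g = pseudocount_freqs(f, q, P)
    return {i: (alpha * f[i] + beta * g[i]) / (alpha + beta) for i in f}


def psiblast_score(f, i, q, P, alpha, beta):
    """Score(f, i) = log(Q_i / P_i); requires Q_i > 0."""
    Q = estimated_freqs(f, q, P, alpha, beta)
    return math.log(Q[i] / P[i])


alphabet = "ACGT"
# Hypothetical uniform background probabilities.
P = {a: 0.25 for a in alphabet}
# Hypothetical joint match/mismatch probabilities; each row sums to P[a].
q = {a: {b: (0.19 if a == b else 0.02) for b in alphabet} for a in alphabet}
# Hypothetical profile vector for one column.
f = {"A": 0.75, "C": 0.0, "G": 0.0, "T": 0.25}

Q = estimated_freqs(f, q, P, alpha=3.0, beta=1.0)
score_A = psiblast_score(f, "A", q, P, alpha=3.0, beta=1.0)
score_C = psiblast_score(f, "C", q, P, alpha=3.0, beta=1.0)
```

Because the pseudocount term g_i is positive whenever q has no zero entries, a nucleotide that never appears in the column still gets a finite (negative) score instead of log(0).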