Aligning Profile Vectors to Sequences in Protein Family Analysis
This guide discusses aligning a profile to a sequence using standard dynamic programming, focusing on protein families. It covers building a profile from a family alignment, which can then be used to search a database for new family members. The profile consists of one vector per alignment position, holding the normalized nucleotide or residue frequencies at that position. Each profile vector is then scored against a nucleotide or residue of the sequence using either the ClustalW/MUSCLE scheme, which relies on a substitution scoring matrix, or the PSI-BLAST scheme, which relies on background probabilities and a matrix of match/mismatch probabilities.
Presentation Transcript
Pairwise profile alignment Usman Roshan BNFO 601
Protein families • PFAM: http://pfam.sanger.ac.uk/ • Family alignments can be used to search for new members in a database
Profile-sequence alignment • Given a family alignment, how can we align it to a sequence? • First, we compute a profile of the alignment. • We then align the profile to the sequence using standard dynamic programming. • However, we need to describe how to align a profile vector to a nucleotide or residue.
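The dynamic programming step can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it assumes global (Needleman-Wunsch style) alignment with a linear gap penalty, and it takes the profile-vector-versus-character score as a pluggable function. The names `align_profile_to_sequence`, `expected_match_score`, and the gap value are hypothetical choices made for the example.

```python
def expected_match_score(f, x, match=1.0, mismatch=-1.0):
    """Placeholder column score: frequency-weighted match/mismatch (an assumption)."""
    return sum(freq * (match if a == x else mismatch) for a, freq in f.items())


def align_profile_to_sequence(profile, seq, column_score, gap=-2.0):
    """Return the optimal global alignment score of a profile against a sequence.

    profile      -- list of frequency vectors (dicts), one per alignment column
    seq          -- the sequence to align, as a string
    column_score -- function (frequency vector, character) -> float
    gap          -- linear gap penalty (assumed for this sketch)
    """
    m, n = len(profile), len(seq)
    # V[i][j] = best score aligning the first i columns with the first j characters
    V = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = i * gap
    for j in range(1, n + 1):
        V[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]),  # column vs character
                V[i - 1][j] + gap,  # column aligned to a gap
                V[i][j - 1] + gap,  # character aligned to a gap
            )
    return V[m][n]


toy_profile = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}]
full_score = align_profile_to_sequence(toy_profile, "ACG", expected_match_score)
gapped_score = align_profile_to_sequence(toy_profile, "AG", expected_match_score)
```

The only difference from ordinary pairwise alignment is the diagonal term, which scores a whole profile vector against one character instead of scoring two characters.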
Profile • A profile can be described by a set of vectors of nucleotide/residue frequencies. • For each position i of the alignment, we compute the normalized frequency of nucleotides A, C, G, and T.
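As a rough sketch of how such a profile could be computed, the function below builds one frequency vector per alignment column. The function name `build_profile`, the toy alignment, and the decision to ignore gap characters when normalizing are assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet; use the 20 amino acids for proteins


def build_profile(alignment, alphabet=ALPHABET):
    """Return one frequency vector (dict) per alignment column.

    Gap characters are ignored, so each column is normalized by its
    number of non-gap characters (an assumption of this sketch).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in alphabet)
        total = sum(counts.values())
        profile.append({a: (counts[a] / total if total else 0.0) for a in alphabet})
    return profile


alignment = ["ACGT", "ACGA", "AGGT", "A-GT"]  # toy alignment, not real data
profile = build_profile(alignment)
```

In this toy alignment the first column is all A, so its vector puts frequency 1 on A, while the last column splits 3:1 between T and A.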
Aligning a profile vector to a nucleotide • ClustalW/MUSCLE • Let f be the profile vector, with $f_i$ the frequency of nucleotide i at that position • $\mathrm{Score}(f, j) = \sum_i f_i \, S(i, j)$ • where S(i,j) is the substitution scoring matrix
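A small sketch of this kind of score, assuming it is the frequency-weighted sum of substitution scores, Score(f, j) = sum over i of f_i * S(i, j). The +1/-1 match/mismatch matrix and the function names are illustrative assumptions, not values taken from ClustalW or MUSCLE.

```python
def match_mismatch_matrix(alphabet="ACGT", match=1.0, mismatch=-1.0):
    """Toy substitution matrix S(i, j) stored as a dict keyed by (i, j)."""
    return {(a, b): (match if a == b else mismatch) for a in alphabet for b in alphabet}


def profile_score(f, j, S):
    """Score(f, j) = sum_i f[i] * S[(i, j)] for profile vector f and character j."""
    return sum(freq * S[(i, j)] for i, freq in f.items())


S = match_mismatch_matrix()
f = {"A": 0.25, "C": 0.0, "G": 0.0, "T": 0.75}  # one profile vector
score_T = profile_score(f, "T", S)  # 0.75 * 1 + 0.25 * (-1)
score_C = profile_score(f, "C", S)  # every observed nucleotide mismatches C
```

Intuitively, the score is the average substitution score between j and the characters observed in that alignment column.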
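Aligning a profile vector to a nucleotide • PSI-BLAST • $\mathrm{Score}(f, i) = \log(Q_i / P_i)$ • $P_i$ is the background probability of nucleotide i • $q_{ij}$ is a matrix of match/mismatch probabilities • Define $g_i$ as $g_i = \sum_j \frac{f_j}{P_j} \, q_{ij}$ • and $Q_i$ as $Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$, where $\alpha$ and $\beta$ weight the observed frequencies $f_i$ and the pseudocount term $g_i$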
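The sketch below illustrates a PSI-BLAST style score under stated assumptions: it uses the pseudocount form g_i = sum over j of (f_j / P_j) * q_ij and Q_i = (alpha * f_i + beta * g_i) / (alpha + beta) from the PSI-BLAST literature, a natural logarithm, a uniform background, and a toy q matrix chosen so that its entries sum to 1 and its marginals equal the background. The weights alpha and beta, the numeric values, and the function names are hypothetical.

```python
import math

ALPHABET = "ACGT"

# Assumed uniform background probabilities P_i.
P = {a: 0.25 for a in ALPHABET}

# Toy match/mismatch probabilities q_ij: 4 * 0.16 + 12 * 0.03 = 1.0,
# and each row sums to 0.25, matching the background.
q = {(a, b): (0.16 if a == b else 0.03) for a in ALPHABET for b in ALPHABET}


def estimated_probabilities(f, P, q, alpha=1.0, beta=1.0):
    """Return Q_i for every i, mixing observed frequencies with pseudocounts."""
    g = {i: sum(f.get(j, 0.0) / P[j] * q[(i, j)] for j in P) for i in P}
    return {i: (alpha * f.get(i, 0.0) + beta * g[i]) / (alpha + beta) for i in P}


def psiblast_score(f, i, P, q, alpha=1.0, beta=1.0):
    """Score(f, i) = log(Q_i / P_i), using the natural logarithm."""
    Q = estimated_probabilities(f, P, q, alpha, beta)
    return math.log(Q[i] / P[i])


f = {"A": 1.0}  # a column in which only A was observed
Q = estimated_probabilities(f, P, q)
score_A = psiblast_score(f, "A", P, q)
score_C = psiblast_score(f, "C", P, q)
```

Because of the pseudocount term, a nucleotide never seen in the column still gets a nonzero Q_i, so its score is negative but finite rather than undefined.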