This document discusses profile-sequence alignment, which is used to search a database for new members of a protein family, starting from a family alignment taken from a database such as PFAM. The method first computes a profile of the alignment and then aligns that profile to a sequence with dynamic programming. It describes how each alignment column is represented as a vector of normalized nucleotide frequencies, and how such a profile vector is scored against a nucleotide under the schemes used by ClustalW/MUSCLE and by PSI-BLAST.
Pairwise profile alignment
Usman Roshan, BNFO 601
Protein families
• PFAM: http://pfam.sanger.ac.uk/
• Family alignments can be used to search for new members in a database.
Profile-sequence alignment
• Given a family alignment, how can we align it to a sequence?
• First, we compute a profile of the alignment.
• We then align the profile to the sequence using standard dynamic programming.
• However, we first need to define how to score a profile vector against a nucleotide or residue.
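The dynamic programming step can be sketched as a global (Needleman-Wunsch style) alignment in which each row of the matrix is a profile column rather than a single residue. This is a minimal Python sketch under stated assumptions: a linear gap penalty of -2, and a hypothetical scoring function `simple_score` (+1 match, -1 mismatch, averaged over the column frequencies). The names `align_profile_to_sequence` and `simple_score` are illustrative only and are not taken from ClustalW, MUSCLE, or the slides.

```python
from typing import Callable, Dict, List, Sequence

# A profile column maps each nucleotide to its normalized frequency.
Column = Dict[str, float]


def align_profile_to_sequence(
    profile: Sequence[Column],
    seq: str,
    score: Callable[[Column, str], float],
    gap: float = -2.0,  # hypothetical linear gap penalty
) -> float:
    """Return the optimal global alignment score of a profile against a sequence."""
    n, m = len(profile), len(seq)
    # dp[i][j] = best score for aligning profile[:i] with seq[:j]
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # leading profile columns aligned to gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # leading residues aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(profile[i - 1], seq[j - 1]),  # column vs residue
                dp[i - 1][j] + gap,  # profile column vs gap
                dp[i][j - 1] + gap,  # residue vs gap
            )
    return dp[n][m]


def simple_score(col: Column, x: str) -> float:
    # Hypothetical score: +1 for a match, -1 for a mismatch, weighted by column frequency.
    return sum(f * (1.0 if a == x else -1.0) for a, f in col.items())
```

For a three-column profile whose columns are pure A, half C/half G, and pure T, aligning to "ACT" scores 1 + 0 + 1 = 2.0, while "AT" forces one gap and scores 1 - 2 + 1 = 0.0. Any of the column-scoring functions on the following slides can be passed in place of `simple_score`.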
Profile
• A profile can be described by a set of vectors of nucleotide/residue frequencies.
• For each position i of the alignment, we compute the normalized frequency of nucleotides A, C, G, and T.
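Computing the profile amounts to counting each nucleotide in every column and dividing by the column total. The slide does not say how gaps are handled; this minimal sketch assumes gap characters ('-') are simply left out of the counts, so each column's frequencies sum to 1 (or are all zero for an all-gap column). The function name `compute_profile` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"


def compute_profile(alignment: Sequence[str]) -> List[Dict[str, float]]:
    """Return one normalized nucleotide-frequency vector per alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(row[i].upper() for row in alignment)
        # Assumption: gaps and other non-ACGT symbols are excluded from the total.
        total = sum(counts[n] for n in NUCLEOTIDES)
        profile.append({n: (counts[n] / total if total else 0.0) for n in NUCLEOTIDES})
    return profile
```

For the alignment ACGT / AC-T / AGGA, the first column is all A (frequency 1.0), the second is C = 2/3 and G = 1/3, and the third is G = 1.0 because the gap is ignored.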
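Aligning a profile vector to a nucleotide
• ClustalW/MUSCLE
• Let $f$ be the profile vector, where $f_i$ is the frequency of nucleotide $i$.
• $\mathrm{Score}(f,j) = \sum_{i \in \{A,C,G,T\}} f_i \, S(i,j)$
• where $S(i,j)$ is a substitution scoring matrix.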
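In the ClustalW/MUSCLE-style score, a profile vector is scored against nucleotide j as the frequency-weighted average of substitution scores, Score(f, j) = Σ_i f_i S(i, j). The sketch below assumes a hypothetical substitution matrix with +2 for a match and -1 for a mismatch; the real programs use their own matrices, so only the weighted-sum structure should be taken from it.

```python
from typing import Dict

NUCLEOTIDES = "ACGT"

# Hypothetical substitution matrix S(i, j): +2 for a match, -1 for a mismatch.
S: Dict[str, Dict[str, float]] = {
    a: {b: (2.0 if a == b else -1.0) for b in NUCLEOTIDES} for a in NUCLEOTIDES
}


def profile_score(f: Dict[str, float], j: str) -> float:
    """Score(f, j) = sum over nucleotides i of f_i * S(i, j)."""
    return sum(f.get(i, 0.0) * S[i][j] for i in NUCLEOTIDES)
```

With f = (A: 0.5, C: 0.25, G: 0.25, T: 0), scoring against A gives 0.5·2 + 0.25·(-1) + 0.25·(-1) = 0.5, and scoring against T gives -1.0.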
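Aligning a profile vector to a nucleotide
• PSI-BLAST
• $\mathrm{Score}(f,i) = \log(Q_i / P_i)$
• $P_i$ is the background probability of nucleotide $i$.
• $q_{ij}$ is a matrix of match/mismatch probabilities.
• Define $g_i$ as $g_i = \sum_j \frac{f_j}{P_j} \, q_{ij}$
• and $Q_i$ as $Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$, where $\alpha$ and $\beta$ weight the observed frequencies $f_i$ against the pseudocounts $g_i$.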
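The PSI-BLAST-style score mixes the observed column frequencies with pseudocounts before taking a log-odds ratio. This sketch assumes the pseudocount formulas as commonly described for PSI-BLAST, g_i = Σ_j (f_j / P_j) q_ij and Q_i = (α f_i + β g_i) / (α + β), and uses the natural logarithm. The values below (uniform background 0.25, q_ii = 0.1, q_ij = 0.05 for i ≠ j, α = β = 1) and the name `psiblast_score` are hypothetical illustrations, not PSI-BLAST's actual parameters.

```python
import math
from typing import Dict

NUCLEOTIDES = "ACGT"


def psiblast_score(
    f: Dict[str, float],
    i: str,
    P: Dict[str, float],
    q: Dict[str, Dict[str, float]],
    alpha: float,
    beta: float,
) -> float:
    """Score(f, i) = log(Q_i / P_i), with Q_i smoothed by pseudocounts."""
    # Pseudocount g_i = sum over j of (f_j / P_j) * q_ij
    g_i = sum(f[j] / P[j] * q[i][j] for j in NUCLEOTIDES)
    # Mixed estimate Q_i = (alpha * f_i + beta * g_i) / (alpha + beta)
    Q_i = (alpha * f[i] + beta * g_i) / (alpha + beta)
    # Natural log assumed; Q_i > 0 whenever beta > 0 and all q_ij > 0.
    return math.log(Q_i / P[i])


# Hypothetical toy parameters: uniform background, joint probabilities summing to 1.
P = {a: 0.25 for a in NUCLEOTIDES}
q = {a: {b: (0.1 if a == b else 0.05) for b in NUCLEOTIDES} for a in NUCLEOTIDES}
```

For a column that is pure A, g_A = 0.4 and Q_A = (1 + 0.4) / 2 = 0.7, so Score(f, A) = log(2.8). For C, g_C = 0.2 and Q_C = 0.1, giving log(0.4) rather than log(0): the pseudocounts keep unseen nucleotides from receiving an infinitely negative score.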