Identification of Helix-Turn-Helix (HTH) DNA-Binding Motifs
Changhui Yan
Department of Computer Science, Utah State University
HTH Motifs
• Protein sequences with low sequence similarity can fold into similar HTH structures.
• Identifying HTH motifs from sequence is extremely challenging.
• Positive data set: 7 families containing HTH motifs from the Pfam database (2,198 proteins).
• Negative data set: 1,518 proteins.
Combination of Amino Acid Sequence and Predicted Secondary Structure
• HMM_AA (amino acid sequence): LQQITHIANQL-GLE----KDVVRVWF
• HMM_AA_SS (amino acid sequence with predicted secondary structure): LQQITHIANQL-GLE----KDVVRVWF / HHHEEHEEEHMHE----HHEEMMEH
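One way to picture the combination, offered only as a sketch: pair each aligned residue with its predicted secondary-structure state so that each alignment column becomes one composite symbol. The slides do not say how HMM_AA_SS actually merges the two tracks, so the pairing scheme and the name `combine_tracks` are assumptions, and the short strings in the usage line are made up for illustration.

```python
def combine_tracks(aa, ss):
    """Pair each aligned residue with its predicted secondary-structure state.

    Hypothetical helper: the slides do not specify the real encoding.
    Both tracks must be aligned to the same length; a column that is a
    gap ('-') in either track is emitted as a single gap symbol.
    """
    if len(aa) != len(ss):
        raise ValueError("amino acid and secondary-structure tracks differ in length")
    combined = []
    for residue, state in zip(aa, ss):
        if residue == "-" or state == "-":
            combined.append("-")
        else:
            combined.append(residue + state)  # e.g. 'L' + 'H' gives 'LH'
    return combined


# Made-up four-column example, not data from the slides.
symbols = combine_tracks("LQ-K", "HH-E")
```

A composite alphabet like this grows to (number of residues) x (number of states) symbols, which is one reason a reduced amino acid alphabet can be attractive alongside it.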
Reduced Alphabets
Schemes for reducing the amino acid alphabet, based on the BLOSUM50 matrix of Henikoff and Henikoff (1992) and derived by grouping and averaging the similarity matrix elements (Murphy et al. 2000).
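A minimal sketch of how a reduced alphabet is applied to a sequence: every residue is replaced by a representative letter for its group. The four groups below are an illustrative assumption; the slides do not state which alphabet sizes or groupings were used, so the actual groups should be taken from Murphy et al. (2000). The names `GROUPS` and `reduce_sequence` are hypothetical.

```python
# Illustrative four-group scheme (an assumption, not taken from the slides).
# Key = representative letter, value = residues mapped to it.
GROUPS = {
    "L": "LVIMC",
    "A": "AGSTP",
    "F": "FYW",
    "E": "EDNQKRH",
}

# Invert the grouping into a per-residue lookup table.
RESIDUE_TO_GROUP = {
    residue: representative
    for representative, members in GROUPS.items()
    for residue in members
}


def reduce_sequence(seq, gap_chars="-."):
    """Re-encode a protein sequence in the reduced alphabet.

    Gap characters are kept as-is; residues outside the 20 standard
    amino acids are mapped to 'X'.
    """
    reduced = []
    for ch in seq.upper():
        if ch in gap_chars:
            reduced.append(ch)
        else:
            reduced.append(RESIDUE_TO_GROUP.get(ch, "X"))
    return "".join(reduced)


reduced = reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF")
```

The output has the same length and gap positions as the input, so it can be used wherever the original aligned sequence was used.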
Results
Table 1. Cross-Family Evaluations
• True positive: an HTH motif that is correctly identified as such.
• False positive: a non-HTH motif that is identified as an HTH motif.
• Alphabet: the alphabet used to encode amino acid sequences.
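The two definitions above can be sketched as a simple count over labeled predictions. This is only an illustration of the definitions; the function name `count_outcomes` and the boolean encoding (True = HTH motif) are assumptions, and the example values are made up rather than taken from Table 1.

```python
def count_outcomes(labels, predictions):
    """Count true positives and false positives.

    labels:      True if the item really is an HTH motif.
    predictions: True if the method identified it as an HTH motif.
    Returns (true_positives, false_positives).
    """
    true_positives = 0
    false_positives = 0
    for is_hth, called_hth in zip(labels, predictions):
        if called_hth and is_hth:
            true_positives += 1      # HTH motif correctly identified
        elif called_hth and not is_hth:
            false_positives += 1     # non-HTH motif identified as HTH
    return true_positives, false_positives


# Made-up example: two real motifs, two non-motifs.
tp, fp = count_outcomes([True, True, False, False],
                        [True, False, True, False])
```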
Results
Table 2. Comparison with a method based on profile-profile comparisons
Table 3. Putative HTH motifs in Ureaplasma parvum