This presentation models the sequence specificity of transcription factors (TFs), using data for three human bHLH TFs (Mad, Max, and Myc) and 62 TFs from the Dream5 dataset. DNA shape is encoded with linear terms (MGW, Roll, ProT, and HelT at each position) and bilinear terms (products of each feature's values at adjacent positions). Five feature combinations are compared with support vector regression: PWM (1mer), 1mer+2mer, 1mer+2mer+3mer, 1mer+shape, and shape alone. The 1mer+shape model outperforms the other combinations, and its performance is less sensitive to sample size than that of 1mer+2mer and 1mer+2mer+3mer.
Modeling Sequence Specificity of Transcription Factors with DNA Structural Features
Tianyin Zhou, 11/06/2013
Data set
• Data:
  • 3 human bHLH TFs: Mad, Max and Myc
  • 62 TFs from Dream5
• Composition of shape features:
  • Linear terms: MGW_i, Roll_i, ProT_i, HelT_i
  • Bilinear terms: MGW_i * MGW_{i+1}, Roll_i * Roll_{i+1}, ProT_i * ProT_{i+1}, HelT_i * HelT_{i+1}
• Feature combinations:
  • PWM (1mer)
  • 1mer + 2mer
  • 1mer + 2mer + 3mer
  • 1mer + shape features
  • shape features
• Algorithms:
  • Support vector regression
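The feature composition above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names `one_hot_1mer` and `shape_features` are hypothetical, the shape values are random placeholders rather than real MGW/Roll/ProT/HelT predictions, and every shape feature is assumed to have one value per position.

```python
import numpy as np

BASES = "ACGT"
SHAPE_NAMES = ("MGW", "Roll", "ProT", "HelT")

def one_hot_1mer(seq):
    """Encode a DNA sequence as 4 binary features per position (1mer)."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x.ravel()

def shape_features(shape):
    """Build linear and bilinear shape terms.

    `shape` maps each name in SHAPE_NAMES to a 1-D sequence of
    per-position values (assumed to be supplied by the caller).
    """
    parts = []
    for name in SHAPE_NAMES:
        v = np.asarray(shape[name], dtype=float)
        parts.append(v)               # linear terms: v_i
        parts.append(v[:-1] * v[1:])  # bilinear terms: v_i * v_{i+1}
    return np.concatenate(parts)

# Toy example: placeholder shape values, NOT real structural predictions.
seq = "CACGTG"
rng = np.random.default_rng(0)
toy_shape = {name: rng.normal(size=len(seq)) for name in SHAPE_NAMES}

# "1mer + shape features" is the concatenation of the two encodings.
x = np.concatenate([one_hot_1mer(seq), shape_features(toy_shape)])
print(x.shape)  # (68,) = 24 one-hot + 4 * (6 linear + 5 bilinear)
```

A vector like `x`, paired with a measured binding signal, would then be passed to a support vector regression model. scikit-learn's `SVR` is one option, though the slides do not say which implementation was used.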
• 1mer+shape outperforms the other feature combinations.
• Shape alone performs as well as 1mer+2mer and 1mer+2mer+3mer.
• Performance of 1mer+shape is less sensitive to sample size than that of 1mer+2mer and 1mer+2mer+3mer.
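The slides do not say why 1mer+shape is less sensitive to sample size. One thing worth checking is how many features each encoding produces, since encodings with more features generally need more training data. The sketch below counts features for a sequence of length L, assuming one-hot k-mer encodings (4^k features per window) and one value per position for each of the four shape features. The helper names and the choice L = 10 are illustrative assumptions.

```python
def n_kmer_features(length, k):
    """One-hot k-mer encoding: 4**k features for each of the length-k+1 windows."""
    return 4 ** k * (length - k + 1)

def n_shape_features(length, n_shapes=4):
    """Per shape feature: `length` linear terms plus `length - 1` bilinear terms."""
    return n_shapes * (length + (length - 1))

L = 10  # hypothetical binding-site length, chosen only for illustration
counts = {
    "1mer": n_kmer_features(L, 1),
    "1mer+2mer": n_kmer_features(L, 1) + n_kmer_features(L, 2),
    "1mer+2mer+3mer": sum(n_kmer_features(L, k) for k in (1, 2, 3)),
    "1mer+shape": n_kmer_features(L, 1) + n_shape_features(L),
    "shape": n_shape_features(L),
}
for name, n in counts.items():
    print(f"{name:>15}: {n}")
```

Under these assumptions, 1mer+shape has 116 features at L = 10, against 184 for 1mer+2mer and 696 for 1mer+2mer+3mer. This is only a feature count, not evidence about the cause of the observed behavior.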
Dream5 data
• 1mer+shape outperforms 1mer.
Results
• 1mer+4shape performs comparably to 1mer+2mer.