The Struct-SVM classifier improves the identification of protein-RNA interface residues by incorporating domain knowledge, namely the structure of the protein. Its ROC curve dominates that of a standard Support Vector Machine (SVM) classifier. RNA molecules play essential roles in cellular processes, and these roles depend on interactions with specific proteins, so predicting those interactions accurately matters. The study uses an RNA-protein interface dataset (RB181) derived from known RNA-protein complexes solved by X-ray crystallography.
Predicting Protein-RNA Binding Sites Using Structural Information

Cornelia Caragea, Michael Terribilini, Jivko Sinapov, Jae-Hyung Lee, Fadi Towfic, Drena Dobbs and Vasant Honavar

Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science

Introduction

RNA molecules play diverse functional and structural roles in cells:
• messengers for transferring genetic information from DNA to proteins
• primary genetic material in many viruses
• enzymes important for protein synthesis and RNA processing
• essential and ubiquitous regulators of gene expression in living organisms
These functions depend on interactions between RNA molecules and specific proteins in cells.

Struct-SVM Classifier

A machine learning classifier that incorporates domain knowledge (that is, the structure of the protein) to improve classification.
• Developed a Struct-SVM classifier that takes domain knowledge into account to improve identification of protein-RNA interface residues
• Results show that the ROC curve of Struct-SVM dominates the ROC curve of the Support Vector Machine (SVM) classifier

[Figure: Struct-SVM workflow. Recoverable labels: Training Data; Collection of Surface Windows; Collection of Non-Surface Windows; Learning System L; Resulting Classifier; Test Data; test "$x_{test,j}$ = surface?" with branches yes: $h(x_{test,j}) = y$ and no: $h(x_{test,j}) = -1$; Final Predictions.]

[Figure: feature-extraction steps. Recoverable labels: Seq2SeqWins, SeqWins2TargetAA, SeqWins2ZeroOne, SeqWins2Blast, SeqWins2SS, SS2ZeroOne, TargetAA2Struct, Struct2Blast, SeqWins2CXValue, SeqWins2Roughness.]

Sequence: $x_i = (x_{i,1}, \ldots, x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k}, \ldots, x_{i,m})$
Label: $y_i = (y_{i,1}, \ldots, y_{i,j-k}, \ldots, y_{i,j}, \ldots, y_{i,j+k}, \ldots, y_{i,m})$

Windowise (full window of $2k+1$ residues, or target residue only):
…
$x'_{i,j-1} = (x_{i,j-1-k}, \ldots, x_{i,j-1}, \ldots, x_{i,j-1+k})$ or $x'_{i,j-1} = (x_{i,j-1})$
$x'_{i,j} = (x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k})$ or $x'_{i,j} = (x_{i,j})$
$x'_{i,j+1} = (x_{i,j+1-k}, \ldots, x_{i,j+1}, \ldots, x_{i,j+1+k})$ or $x'_{i,j+1} = (x_{i,j+1})$
…
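The windowing step and the surface test from the workflow diagram can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `sequence_windows` and `gated_predict`, the padding symbol `-` used at the sequence ends, the surface flags, and the toy classifier are all assumptions introduced here. The poster does not say how windows at the ends of a sequence are handled, nor how surface residues are determined.

```python
from typing import Callable, List, Sequence

PAD = "-"  # assumed padding symbol; the poster does not specify edge handling


def sequence_windows(seq: str, k: int, pad: str = PAD) -> List[str]:
    """Return one window of length 2k+1 per residue, centred on that residue."""
    if k < 0:
        raise ValueError("k must be non-negative")
    padded = pad * k + seq + pad * k
    return [padded[j:j + 2 * k + 1] for j in range(len(seq))]


def gated_predict(
    windows: Sequence[str],
    is_surface: Sequence[bool],
    classify: Callable[[str], int],
) -> List[int]:
    """Apply `classify` to surface windows only; label every other window -1."""
    if len(windows) != len(is_surface):
        raise ValueError("windows and is_surface must have the same length")
    return [classify(w) if s else -1 for w, s in zip(windows, is_surface)]


# Usage on the nine-residue fragment shown on the poster, with k = 1.
windows = sequence_windows("ANTPVLRKS", k=1)
# windows[0] == "-AN", windows[1] == "ANT", windows[-1] == "KS-"

# Hypothetical stand-ins: a toy rule instead of a trained SVM, made-up flags.
toy_classifier = lambda w: 1 if w[len(w) // 2] in "RK" else -1
surface_flags = [True, True, False, False, True, True, True, True, False]
predictions = gated_predict(windows, surface_flags, toy_classifier)
```

Under these assumptions, a non-surface residue is never passed to the classifier, which matches the "no" branch of the diagram, where $h(x_{test,j}) = -1$.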
Protein-RNA interface residue identification

1T0K_B
SINQKLALVIKSGKYTLGYKSTVKSLRQGKSKLIIIAANTPVLRKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVSILEAGDSDILTTLA

$x_i$: A N T P V L R K S
$y_i$: 0 0 1 1 0 0 1 0 0
$y_i \in \{0,1\}^*$

Dataset
• RNA-Protein Interface dataset, RB181: consists of RNA-binding protein sequences extracted from structures of known RNA-protein complexes solved by X-ray crystallography in the Protein Data Bank

Feature Extraction
Seq2SeqWins
SeqWins2TargetAA

Results
[Table 1. Accuracy, Correlation Coefficient and Area Under the ROC Curve for SVM and Struct-SVM]
[Fig. 1. Receiver Operating Characteristic (ROC) curves for SVM and Struct-SVM classifiers on the protein-RNA dataset]

References
[1] Chen, Y., Varani, G. (2005). Protein families and RNA recognition. FEBS J., 272:2088-2097.
[2] Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167.
[3] Towfic, F., Caragea, C., Dobbs, D., and Honavar, V. (2008). Struct-NB: Predicting protein-RNA binding sites using structural features. International Journal of Data Mining and Bioinformatics, in press.

Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM 066387) to Vasant Honavar & Drena Dobbs
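The three measures named in the caption of Table 1 (accuracy, correlation coefficient, area under the ROC curve) can be computed as in the following minimal Python sketch. Several things here are assumptions: the poster does not define "Correlation Coefficient", so the Matthews correlation coefficient is used as one common choice for binary predictions; AUC is computed with the pairwise rank formulation; and the predictions and scores are made-up illustrative values, not results from the study. Only the true labels are taken from the poster's nine-residue example.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of residues whose predicted label matches the true label."""
    tp, tn, _, _ = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def correlation_coefficient(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient (assumed definition); 0.0 if undefined."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# True labels from the poster's example; predictions and scores are hypothetical.
y_true = [0, 0, 1, 1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.9, 0.4, 0.3, 0.1, 0.8, 0.6, 0.2]

acc = accuracy(y_true, y_pred)
cc = correlation_coefficient(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and the correlation coefficient depend on a single decision threshold, whereas the area under the ROC curve summarises ranking quality over all thresholds, which is why a statement about one ROC curve dominating another is threshold-independent.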