
Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program





Presentation Transcript


  1. Predicting Protein-RNA Binding Sites Using Structural Information

Cornelia Caragea, Michael Terribilini, Jivko Sinapov, Jae-Hyung Lee, Fadi Towfic, Drena Dobbs and Vasant Honavar

Artificial Intelligence Research Laboratory; Bioinformatics and Computational Biology Program; Computational Intelligence, Learning, and Discovery Program; Department of Computer Science

• Developed a Struct-SVM classifier that takes domain knowledge into account to improve identification of protein-RNA interface residues.
• Results show that the ROC curve of Struct-SVM dominates the ROC curve of the Support Vector Machine (SVM) classifier.

Introduction

RNA molecules play diverse functional and structural roles in cells:
• messengers for transferring genetic information from DNA to proteins
• primary genetic material in many viruses
• enzymes important for protein synthesis and RNA processing
• essential and ubiquitous regulators of gene expression in living organisms
These functions depend on interactions between RNA molecules and specific proteins in cells.

Struct-SVM Classifier

A machine learning classifier that incorporates domain knowledge (that is, the structure of the protein) to improve classification.

Figure (classification workflow), labels: Training Data (Collection of Surface Windows, Collection of Non-Surface Windows); Learning System L; Resulting Classifier; Test Data $x_{test,j}$; decision "$x_{test,j}$ = surface?" with yes/no branches; outputs $h(x_{test,j}) = y$ and $h(x_{test,j}) = -1$; Final Predictions.

Figure (feature extraction pipeline), step labels: Seq2SeqWins, SeqWins2TargetAA, SeqWins2ZeroOne, SeqWins2Blast, SeqWins2SS, SS2ZeroOne, TargetAA2Struct, Struct2Blast, SeqWins2CXValue, SeqWins2Roughness.

Sequence: $x_i = (x_{i,1}, \ldots, x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k}, \ldots, x_{i,m})$
Label: $y_i = (y_{i,1}, \ldots, y_{i,j-k}, \ldots, y_{i,j}, \ldots, y_{i,j+k}, \ldots, y_{i,m})$

Each sequence is windowised. Two forms are shown for every position, a window of width $2k+1$ centered on the residue and the central residue alone:
$x'_{i,j-1} = (x_{i,j-1-k}, \ldots, x_{i,j-1}, \ldots, x_{i,j-1+k})$ and $x'_{i,j-1} = (x_{i,j-1})$
$x'_{i,j} = (x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k})$ and $x'_{i,j} = (x_{i,j})$
$x'_{i,j+1} = (x_{i,j+1-k}, \ldots, x_{i,j+1}, \ldots, x_{i,j+1+k})$ and $x'_{i,j+1} = (x_{i,j+1})$
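The windowing notation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `windowise`, the padding symbol `PAD`, and the choice to pad the sequence ends (so that every residue gets a full window of width 2k+1) are hypothetical, since the poster does not say how positions near the termini are handled. The input is the nine-residue fragment ANTPVLRKS and its labels from the poster's own example.

```python
from typing import List, Tuple

# Hypothetical padding symbol for window positions that fall off either end
# of the sequence; the poster does not specify how termini are treated.
PAD = "-"


def windowise(sequence: str, labels: str, k: int) -> List[Tuple[str, str, int]]:
    """Return one (window, target residue, label) triple per position j.

    window         corresponds to x'_{i,j} = (x_{i,j-k}, ..., x_{i,j}, ..., x_{i,j+k})
    target residue corresponds to x'_{i,j} = (x_{i,j})
    label          corresponds to y_{i,j}
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")

    padded = PAD * k + sequence + PAD * k
    examples = []
    for j, residue in enumerate(sequence):
        # In the padded string, residue j sits at index j + k, so the slice
        # [j, j + 2k + 1) is centered on it.
        window = padded[j : j + 2 * k + 1]
        examples.append((window, residue, int(labels[j])))
    return examples


# Fragment and labels taken from the poster's example.
examples = windowise("ANTPVLRKS", "001100100", k=2)
for window, residue, label in examples:
    print(window, residue, label)
```

Each triple keeps the full window and the central residue side by side, mirroring the two forms of $x'_{i,j}$ given in the notation.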
Protein-RNA interface residue identification

Example sequence (1T0K_B):
SINQKLALVIKSGKYTLGYKSTVKSLRQGKSKLIIIAANTPVLRKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVSILEAGDSDILTTLA

Every residue of a sequence $x_i$ carries a binary label, $y_i \in \{0,1\}^*$. For a fragment of the sequence above:
x_i: A N T P V L R K S
y_i: 0 0 1 1 0 0 1 0 0

Dataset

• RNA-Protein Interface dataset, RB181: consists of RNA-binding protein sequences extracted from structures of known RNA-protein complexes solved by X-ray crystallography in the Protein Data Bank.

Feature Extraction

Figure (feature extraction), step labels: Seq2SeqWins, SeqWins2TargetAA.

Results

Table 1. Accuracy, Correlation Coefficient and Area Under the ROC Curve for SVM and Struct-SVM.
Fig. 1. Receiver Operating Characteristic (ROC) curves for the SVM and Struct-SVM classifiers on the protein-RNA dataset.

Conclusions

References

[1] Chen, Y., Varani, G. (2005). Protein families and RNA recognition. FEBS J., 272:2088-2097.
[2] Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167.
[3] Towfic, F., Caragea, C., Dobbs, D., and Honavar, V. (2008). Struct-NB: Predicting protein-RNA binding sites using structural features. International Journal of Data Mining and Bioinformatics, in press.

Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM 066387) to Vasant Honavar & Drena Dobbs.
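The three quantities named in Table 1 (accuracy, correlation coefficient, and area under the ROC curve) could be computed per residue roughly as follows. This is a sketch under assumptions, not the evaluation code behind the poster: "correlation coefficient" is taken here to mean the Matthews correlation coefficient, the 0.5 decision threshold is arbitrary, and the classifier scores are made-up numbers used only to exercise the functions. Only the nine labels come from the poster's example.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def correlation_coefficient(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient (assumed meaning of 'correlation coefficient')."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in the number of residues, which is
    acceptable for an illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Labels from the poster's example fragment; scores are hypothetical.
y_true = [0, 0, 1, 1, 0, 0, 1, 0, 0]
scores = [0.1, 0.4, 0.8, 0.7, 0.2, 0.6, 0.9, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary 0.5 threshold

acc = accuracy(y_true, y_pred)
mcc = correlation_coefficient(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} correlation={mcc:.3f} auc={auc:.3f}")
```

Accuracy and the correlation coefficient depend on the chosen threshold, while the AUC summarizes the whole ROC curve. One ROC curve dominates another when its true positive rate is at least as high at every false positive rate.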
