
Application of Conditional Random Fields in Bioinformatics
Wei Liu and Sanjay Chawla, School of Information Technologies, the University of Sydney



1. Aims of the Project
• Predict the orientation and location of transmembrane helices in protein secondary structure.
• Explore the application of Conditional Random Fields (CRFs) to problems in bioinformatics.

2. Main Contributions
• We defined 18 efficient features that adapt the CRF model to the alpha-helix prediction problem.
• We compared CRFs against 28 other methods on a standard benchmark server; the accuracy of CRFs is the highest among them.
• We also compared CRFs with another conditional model, the MEMM, and showed that CRFs are significantly better than the MEMM.

3. Introduction
• Proteins play a central role in all aspects of cell structure and function, and the function of a protein is determined by its structure.
• There are 20 types of amino acids in protein primary sequences, and each amino acid belongs to a certain “secondary structure” type. One of the important “secondary structure” types is the alpha-helix.
• Alpha-helical segments are the targets of our prediction in this project.

4. Problem Definition
• Given the primary sequence of a protein, we map each amino acid to either alpha-helix (1) or not (0). The research problem can then be stated as follows.
  Given:
    Data X: LTTNMLTMYQWWRDVIR ……
    Data Y: 111111110000001111111 ……
  How do we predict Y for a new sequence?
    Data X: GNLAVADLFMVFGGFTTT ……
    Data Y: ????????????????????? ……
• This is a classification problem: every amino acid in the sequence receives a label.

5. Conditional Random Fields (CRFs)
• CRFs are discriminative, undirected graphical models.
• Features are a way of incorporating domain knowledge from protein science, together with information in the training data, into the CRF model. We defined 18 types of features, most of them based on domain knowledge of protein science.
• Each feature is assigned a weight during training by maximizing the conditional likelihood with quasi-Newton methods.
• New test sequences are labelled with these feature weights using the Viterbi algorithm.

6. Experiments
• Experiment design: we carried out 8 experiments, turning on more types of features from experiment 1 to experiment 8.
• Accuracy of the 8 experiments (assessed by the TMH benchmark server): the more types of features we turned on, the more accurate the predictions we obtained.
• Influence of features on prediction accuracy (per-segment accuracy as an example):
  [Figure: influence of feature groups (Sequence State, Hydrophobicity, Neighboring, and Basic Amino Acid features); plotted values 88%, 75%, 60%, 27%]
• Accuracy comparison between CRF and 28 other methods on the same standard benchmark server: CRF has the highest per-segment and per-residue accuracy.
• Accuracy comparison between CRF and MEMM (experiment 8 as an example): a t-test between the accuracy of the two models on every experiment shows that CRFs are significantly better than the MEMM.

7. Conclusions
• Conditional Random Fields (CRFs) are undirected graphical learning models that allow domain knowledge to be incorporated as constraints (features).
• The fact that CRFs outperform 28 other methods and are significantly better than the MEMM indicates that the CRF is a competent tool, and it may also be able to solve other sequential classification problems.
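Section 5 says each feature weight is learned by maximizing the conditional likelihood with quasi-Newton methods. The sketch below shows only that objective for an assumed two-label linear-chain CRF: log p(y | x) = score(x, y) - log Z(x), where log Z(x) is computed with the forward algorithm. The feature functions, weights and names (state, trans, HYDRO) are hypothetical; a real trainer would sum this quantity over all training sequences and pass its negative (and gradient) to a quasi-Newton optimizer such as L-BFGS, which is omitted here.

```python
import itertools
import math

LABELS = (0, 1)  # 0 = not alpha-helix, 1 = alpha-helix

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(x, y, state, trans):
    """Unnormalised log-score of labelling y: weighted state + transition features."""
    s = state(y[0], x[0])
    for t in range(1, len(x)):
        s += trans(y[t - 1], y[t]) + state(y[t], x[t])
    return s

def log_partition(x, state, trans):
    """log Z(x) by the forward algorithm, summing over all label sequences."""
    alpha = {y: state(y, x[0]) for y in LABELS}
    for t in range(1, len(x)):
        alpha = {
            y: logsumexp([alpha[p] + trans(p, y) for p in LABELS]) + state(y, x[t])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))

def conditional_log_likelihood(x, y, state, trans):
    """log p(y | x) = score(x, y) - log Z(x); training maximises the sum of these."""
    return sequence_score(x, y, state, trans) - log_partition(x, state, trans)

# Hypothetical toy features: hydrophobicity favours label 1 (values from the
# Kyte-Doolittle scale), and keeping the same label is rewarded.
HYDRO = {"L": 3.8, "I": 4.5, "D": -3.5}

def state(y, residue):
    return HYDRO[residue] - 1.0 if y == 1 else 0.0

def trans(prev, cur):
    return 2.0 if prev == cur else -2.0

x, y = "LLID", (1, 1, 1, 0)
# Sanity check: the forward algorithm agrees with brute-force enumeration
# over all 2**len(x) labellings.
brute = logsumexp([sequence_score(x, ys, state, trans)
                   for ys in itertools.product(LABELS, repeat=len(x))])
assert abs(brute - log_partition(x, state, trans)) < 1e-9
print(conditional_log_likelihood(x, y, state, trans))
```

The forward recursion costs O(length × labels²) instead of enumerating every labelling, which is what makes likelihood-based training practical for long protein sequences.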
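The labelling step in section 5 (Viterbi decoding with the learned feature weights) can be sketched for an assumed two-label linear-chain CRF as follows. The weights are hypothetical hand-set values, not trained ones, and the single hydrophobicity feature uses the Kyte-Doolittle scale as a stand-in for the poster's 18 feature types.

```python
# Kyte-Doolittle hydropathy scale, used as an illustrative hydrophobicity feature.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
LABELS = (0, 1)  # 0 = not alpha-helix, 1 = alpha-helix

# Hypothetical hand-set weights; a trained CRF would learn these.
W_HYDRO, B_HELIX = 1.0, -1.0  # state features for label 1
W_SAME, W_SWITCH = 2.0, -2.0  # transition features

def state_score(label, residue):
    """Weighted state features at one position."""
    return W_HYDRO * KD[residue] + B_HELIX if label == 1 else 0.0

def trans_score(prev, cur):
    """Weighted transition features between neighbouring labels."""
    return W_SAME if prev == cur else W_SWITCH

def viterbi(seq):
    """Highest-scoring 0/1 labelling of seq, returned as a string."""
    best = [{y: state_score(y, seq[0]) for y in LABELS}]  # best[t][y]: best prefix score ending in y
    back = []  # back[t-1][y]: best label at t-1 given label y at t
    for t in range(1, len(seq)):
        scores, ptrs = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + trans_score(p, y))
            scores[y] = best[-1][prev] + trans_score(prev, y) + state_score(y, seq[t])
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Trace back from the best final label.
    y = max(LABELS, key=lambda label: best[-1][label])
    path = [y]
    for ptrs in reversed(back):
        y = ptrs[y]
        path.append(y)
    return "".join(str(label) for label in reversed(path))

print(viterbi("GNLAVADLFMVFGGFTTT"))
```

Because the transition weights reward repeating a label, the decoder prefers contiguous runs of 1s over isolated helix calls, which matches the segment-like nature of helices.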
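Section 6 reports per-residue and per-segment accuracy as assessed by the TMH benchmark server. The sketch below illustrates the two ideas on 0/1 label strings like the ones in section 4. The overlap rule used for per-segment accuracy (an observed helix counts as found if some predicted helix overlaps it by at least `min_overlap` residues) is a simplified, hypothetical criterion, not the benchmark server's exact definition.

```python
def per_residue_accuracy(observed: str, predicted: str) -> float:
    """Fraction of residues whose 0/1 label is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)

def helix_segments(labels: str) -> list[tuple[int, int]]:
    """(start, end) index pairs, end exclusive, of maximal runs of '1'."""
    segments, start = [], None
    for i, label in enumerate(labels + "0"):  # sentinel '0' closes a trailing run
        if label == "1" and start is None:
            start = i
        elif label != "1" and start is not None:
            segments.append((start, i))
            start = None
    return segments

def per_segment_accuracy(observed: str, predicted: str, min_overlap: int = 3) -> float:
    """Fraction of observed helices matched by a predicted helix (hypothetical overlap rule)."""
    obs, pred = helix_segments(observed), helix_segments(predicted)
    if not obs:
        raise ValueError("no observed helices to evaluate")
    found = sum(
        any(min(e1, e2) - max(s1, s2) >= min_overlap for s2, e2 in pred)
        for s1, e1 in obs
    )
    return found / len(obs)
```

Per-residue accuracy can be high even when whole helices are missed, which is why a segment-level measure is reported alongside it.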
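The t-test in section 6 compares CRF and MEMM accuracy on every experiment. Assuming it is a paired test over the 8 experiments (one accuracy pair per experiment; the poster does not state which variant was used), the test statistic can be computed as below. `T_CRIT_DF7` is the standard two-sided 5% critical value of Student's t with 7 degrees of freedom.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t for 8 - 1 = 7 degrees of freedom.
T_CRIT_DF7 = 2.365

def paired_t_statistic(a: list[float], b: list[float]) -> float:
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a - b.

    Assumes the differences are not all identical (otherwise stdev is 0).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```

If the statistic computed from the CRF and MEMM accuracies exceeds `T_CRIT_DF7` in absolute value, the difference is significant at the 5% level under this paired-test assumption.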
