Application of Conditional Random Fields in Bioinformatics
Wei Liu and Sanjay Chawla, School of Information Technologies, the University of Sydney

1. Aims of the Project
• Predict the orientation and location of transmembrane helices in protein secondary structure.
• Explore the application of Conditional Random Fields (CRFs) to problems in bioinformatics.

2. Main Contributions
• We defined 18 efficient features to fit the CRF model to the alpha-helix prediction problem.
• We compared CRFs against 28 other methods on a standard benchmark server. CRFs achieved the highest accuracy among them.
• We also compared CRFs with another conditional model, the MEMM (maximum entropy Markov model), and showed that CRFs are significantly better than MEMMs.

3. Introduction
• Proteins play a central role in all aspects of cell structure and function. The functionality of a protein is determined by its structure.
• There are 20 types of amino acids in protein primary sequences, and each amino acid belongs to a certain "secondary structure" type. One of the important secondary structure types is the alpha-helix. For example: [figure labelled "Helices"; image not recovered]
• Alpha-helical segments are the targets of our prediction in this project.

4. Problem Definition
• Given the primary sequence of a protein, we map each amino acid to either alpha-helix (1) or not (0). The research problem can then be stated as follows.
Given:
Data X: LTTNMLTMYQWWRDVIR ……
Data Y: 111111110000001111111 ……
How do we predict Y for a new sequence?
Data X: GNLAVADLFMVFGGFTTT ……
Data Y: ????????????????????? ……
This is a classification problem.

5. Conditional Random Fields (CRFs)
• CRFs are discriminative, undirected graphical models. [model figure not recovered]
• Features are a way of incorporating domain knowledge from protein science, together with information in the training data, into the CRF model. We defined 18 types of features. Most of them are based on domain knowledge of protein science.
• Each feature is assigned a weight during training by maximizing the conditional likelihood using quasi-Newton methods.
• New test sequences are labelled with these feature weights using the Viterbi algorithm.

6. Experiments
• Experiment design: we carried out 8 experiments, with an increasing number of features from experiment 1 to experiment 8.
• Accuracy of the 8 experiments (assessed by the TMH benchmark server): [table not recovered] The more types of features we turned on in the experiments, the more accurate the predictions became.
• Influence of features on prediction accuracy (taking per-segment accuracy as an example): [Figure not recovered. Labels from the chart: Sequence State Features, Hydrophobicity Features, Neighboring Features, Basic Amino Acid Features. Values from the chart: 88%, 75%, 60%, 27%.]
• Accuracy comparison between CRF and the 28 other methods on the same standard benchmark server: CRF is the highest in both per-segment and per-residue accuracy.
• Accuracy comparison between CRF and MEMM (taking experiment 8 as an example): [table not recovered]
• A t-test on the accuracies of these two models across all experiments shows that CRFs are significantly better than MEMMs.

7. Conclusions
• Conditional Random Fields (CRFs) are undirected graphical learning models that allow domain knowledge to be incorporated as constraints (features).
• The fact that CRFs outperform 28 other methods and are significantly better than MEMMs indicates that the CRF is a competent tool, and it might also be able to solve other sequential classification problems.
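The labelling step described in the poster (scoring each residue with weighted features, then decoding the best 0/1 label sequence with the Viterbi algorithm) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hydrophobic residue set, the score values, and the function names `toy_emission`, `viterbi`, and `per_residue_accuracy` are all hypothetical stand-ins for the 18 trained features and their learned weights.

```python
from typing import List, Sequence

# Assumption: a toy hydrophobic set standing in for the poster's
# hydrophobicity features; the real feature definitions are not given.
HYDROPHOBIC = set("AILMFVW")


def toy_emission(sequence: str) -> List[List[float]]:
    """Score each residue for label 0 (non-helix) and label 1 (helix).

    Hypothetical scoring: hydrophobic residues favour the helix label.
    """
    return [[0.0, 1.0 if aa in HYDROPHOBIC else -1.0] for aa in sequence]


def viterbi(emission: Sequence[Sequence[float]],
            transition: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path.

    emission[t][y]   : score of label y at position t
    transition[p][y] : score of moving from label p to label y
    """
    if not emission:
        return []
    n_labels = len(transition)
    score = list(emission[0])          # best score ending in each label
    back: List[List[int]] = []         # back-pointers per position
    for t in range(1, len(emission)):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transition[p][y])
            new_score.append(score[best_prev] + transition[best_prev][y]
                             + emission[t][y])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda y: score[y])
    path = [last]
    for pointers in reversed(back):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


def per_residue_accuracy(predicted: Sequence[int],
                         actual: Sequence[int]) -> float:
    """Fraction of positions whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label sequences must be non-empty and equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    x = "LTTNMLTMYQWWRDVIR"                    # Data X prefix from the poster
    stay_bonus = [[0.5, -0.5], [-0.5, 0.5]]    # assumed transition scores
    print(viterbi(toy_emission(x), stay_bonus))
```

The transition scores reward staying in the same label, which is one simple way to encourage contiguous helical segments rather than isolated residues; a trained CRF would learn such weights from data instead.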
