Methods:

Training a Neural Network to Recognize Phage Major Capsid Proteins

Author: Michael Arnoult, San Diego State University
Mentors: Victor Seguritan, Anca Segall, and Peter Salamon, Department of Biology, San Diego State University
This research was funded in part by the NSF 0827278 UBM Interdisciplinary Training in Biology and Mathematics grant to AMS and PS.

Background:
Bacteriophages are the single most abundant biological entity on Earth and influence every environment in which bacteria exist. No current algorithm reliably analyzes phage structural protein sequences and predicts their function.
• This research classifies phage structural proteins using Artificial Neural Networks (ANNs), a computational method of analysis inspired by biological neurons.
• Features of phage protein sequences with known classifications were used to train ANNs, which then predict the specified function of unknown sequences.
• Analysis of the predictions will help biologists decide, with some accuracy, which proteins are the most appropriate candidates for their research needs.

Methods:
Phage Major Capsid Protein Sequence Collection
Major Capsid Proteins (MCPs) and Tail Proteins were obtained from the NCBI RefSeq database using the keywords Phage and Proteins together with the following terms:
Major, Shell, Coat, Capsid, Head, Prohead, Procapsid, Tail, Neck
Non-MCP/Tail sequences were also downloaded as negative examples.

Annotation Data Filtering
• Sequences with non-MCP/Tail annotations were removed from the positive data set.
• Sequences with MCP/Tail annotations were removed from the negative data set.

Training Set Manipulation
• Only positive MCP sequences longer than 300 amino acids and Tail sequences longer than 150 amino acids were used.

Conversion of Sequences to Quantitative Features
• One ANN input was one of four amino acid features translated into quantitative representations: masses, isoelectric points, hydrophobicity ratings, and volumes.
• A second input was that feature divided by the sequence length.
• 20 ANN inputs represented the amino acid percent compositions.

Training of ANNs
• ANNs were trained on each of the five features individually and on combinations of two or more features.
• The architecture included one hidden layer with 100 neurons.
• One input neuron was used for each feature.

Figure: Example of T4 Bacteriophage.
Figure: Diagram of an Artificial Neural Network with 5 input-layer neurons, 5 hidden-layer neurons, and 1 output-layer neuron (labels: Input Layer, Hidden Layer, Output Layer).
Source: http://www.cosmosmagazine.com/node/1024

Testing of ANNs using Phage Protein Sequences
Classification was performed on a test set consisting of a randomly selected 20% of the positive and negative protein sequences, which were not used in training.

Conclusions:
• Phage Major Capsid Proteins and Tail Proteins are distinguishable from other phage proteins by trained Artificial Neural Networks.
• Classification of the test sets shows that the ANNs distinguish phage Major Capsid Proteins more accurately than Tail Proteins.
• Protein sequence sets may be contrasted by analyzing their combinations of physical features from the perspective of ANNs.

Future Directions:
• Find a distribution of positive and negative protein sequence examples appropriate for phage or bacterial genomes to improve classification ability.
• Perform ClustalW analysis of Tail/Major Capsid proteins against ANN-tested false-positive proteins from bacterial genomes.
• Experimentally validate ANN predictions of unknown virus sequences:
  a. Build gene constructs of sequences predicted by ANNs to be MCPs or Tail proteins.
  b. Express the genes in bacterial cells.
  c. Visually verify potential Tail and Capsid proteins by electron microscopy.
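The annotation and length filters in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the author's pipeline. It assumes each downloaded RefSeq record has already been reduced to a (description, sequence) pair and that annotations are matched by case-insensitive substring search. The keyword tuples, constants, function names, and toy records are all hypothetical.

```python
# Hypothetical sketch: annotation and length filtering of phage protein records.
# Assumption: each record is a (description, sequence) tuple, and keywords are
# matched by case-insensitive substring search on the description.

MCP_KEYWORDS = ("major", "shell", "coat", "capsid", "head", "prohead", "procapsid")
TAIL_KEYWORDS = ("tail", "neck")

MCP_MIN_LENGTH = 300   # keep MCPs longer than 300 amino acids
TAIL_MIN_LENGTH = 150  # keep Tail proteins longer than 150 amino acids


def has_keyword(description, keywords):
    """Return True if any keyword occurs in the lower-cased annotation."""
    text = description.lower()
    return any(word in text for word in keywords)


def filter_positives(records, keywords, min_length):
    """Keep records annotated with one of `keywords` and longer than `min_length`."""
    return [
        (desc, seq)
        for desc, seq in records
        if has_keyword(desc, keywords) and len(seq) > min_length
    ]


def filter_negatives(records):
    """Drop negative records whose annotation mentions any MCP or Tail keyword."""
    return [
        (desc, seq)
        for desc, seq in records
        if not has_keyword(desc, MCP_KEYWORDS + TAIL_KEYWORDS)
    ]


# Toy records with placeholder sequences (not real RefSeq entries).
positive_candidates = [
    ("major capsid protein", "M" * 520),
    ("capsid protein fragment", "M" * 120),
    ("tail fiber protein", "A" * 200),
    ("hypothetical protein", "K" * 400),
]
negative_candidates = [
    ("DNA polymerase", "G" * 600),
    ("head-tail connector protein", "S" * 300),
]

mcps = filter_positives(positive_candidates, MCP_KEYWORDS, MCP_MIN_LENGTH)
tails = filter_positives(positive_candidates, TAIL_KEYWORDS, TAIL_MIN_LENGTH)
negatives = filter_negatives(negative_candidates)
print([d for d, _ in mcps], [d for d, _ in tails], [d for d, _ in negatives])
# → ['major capsid protein'] ['tail fiber protein'] ['DNA polymerase']
```

Substring matching is deliberately crude. For example, "major tail protein" would match the MCP keyword "major", so a real pipeline would need stricter annotation rules than this sketch.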
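The conversion of a sequence into ANN inputs (a property value, the same value divided by sequence length, and 20 percent-composition values) can be sketched as follows. The poster does not name its hydrophobicity scale or say how per-residue values were combined. This sketch therefore assumes the Kyte-Doolittle scale and a simple sum over residues, and it ignores non-standard residue symbols. Mass, isoelectric point, or volume tables could be swapped in for `scale` the same way. All function names are hypothetical.

```python
# Hypothetical sketch: converting a protein sequence into numeric ANN inputs.
# Assumptions: Kyte-Doolittle values stand in for the unnamed hydrophobicity
# scale, per-residue values are summed over the sequence, residues outside the
# 20 standard amino acids are ignored, and the sequence is non-empty.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standard_residues(sequence):
    """Upper-case the sequence and drop non-standard symbols such as X or B."""
    return [aa for aa in sequence.upper() if aa in AMINO_ACIDS]


def feature_total(sequence, scale):
    """First input: the per-residue property summed over the sequence."""
    return sum(scale[aa] for aa in standard_residues(sequence))


def feature_per_length(sequence, scale):
    """Second input: the same total divided by the sequence length."""
    residues = standard_residues(sequence)
    return feature_total(sequence, scale) / len(residues)


def percent_composition(sequence):
    """Twenty inputs: percentage of each standard amino acid, in AMINO_ACIDS order."""
    residues = standard_residues(sequence)
    return [100.0 * residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def encode(sequence, scale):
    """Concatenate the total, per-length, and composition inputs (22 values)."""
    return (
        [feature_total(sequence, scale), feature_per_length(sequence, scale)]
        + percent_composition(sequence)
    )


vector = encode("AAGx", KYTE_DOOLITTLE)
print(len(vector), round(vector[0], 2), round(vector[1], 3))
# → 22 3.2 1.067
```

Because the summed value grows with sequence length while the percentages do not, inputs on such different scales would usually be standardized before training.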
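The training and testing steps can be sketched in plain Python. The sketch uses a network with one hidden layer of 100 neurons and a single output neuron, as on the poster. Everything else is an assumption, since the poster does not specify it: sigmoid activations, cross-entropy loss, stochastic gradient descent, the learning rate, the epoch count, the 0.5 decision threshold, and holding out 20% of each class separately. The class and function names are hypothetical, and the toy data are synthetic rather than phage sequences.

```python
# Hypothetical sketch: a one-hidden-layer ANN (100 sigmoid hidden units, one
# sigmoid output) trained by plain stochastic gradient descent on cross-entropy,
# with a random 20% of each class held out for testing. Activation, loss,
# learning rate, and epoch count are assumptions; the poster does not give them.
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


class OneHiddenLayerNet:
    def __init__(self, n_inputs, n_hidden=100, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the output score between 0 and 1."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, data, epochs=30, lr=0.05, seed=0):
        """SGD over (features, label) pairs; label 1 = positive, 0 = negative."""
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                hidden, out = self.forward(x)
                d_out = out - y  # cross-entropy gradient w.r.t. output pre-activation
                for j, h in enumerate(hidden):
                    d_h = d_out * self.w2[j] * h * (1.0 - h)  # uses w2[j] before its update
                    self.w2[j] -= lr * d_out * h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_h * xi
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out


def hold_out(examples, fraction=0.2, seed=0):
    """Randomly split one class into (train, test), holding out `fraction` for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(net, data, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    return sum((net.predict(x) >= threshold) == (y == 1) for x, y in data) / len(data)


# Synthetic two-feature toy data, purely illustrative (not phage sequences).
rng = random.Random(1)
positives = [([rng.gauss(1.0, 0.2), rng.gauss(1.0, 0.2)], 1) for _ in range(25)]
negatives = [([rng.gauss(0.0, 0.2), rng.gauss(0.0, 0.2)], 0) for _ in range(25)]
pos_train, pos_test = hold_out(positives)
neg_train, neg_test = hold_out(negatives)

net = OneHiddenLayerNet(n_inputs=2)
net.train(pos_train + neg_train)
print("test accuracy:", accuracy(net, pos_test + neg_test))
```

To mirror the poster's experiments, `n_inputs` would be set to the number of inputs in whichever feature combination is being tested. For real data, a library implementation such as scikit-learn's `MLPClassifier`, whose default `hidden_layer_sizes` is `(100,)`, would be more practical than this hand-written loop.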