Training a Neural Network to Recognize Phage Major Capsid Proteins
Author: Michael Arnoult, San Diego State University
Mentors: Victor Seguritan, Anca Segall, and Peter Salamon, Department of Biology, San Diego State University

This research was funded in part by the NSF 0827278 UBM Interdisciplinary Training in Biology and Mathematics grant to AMS and PS.

Background:
Bacteriophages are the most abundant biological entities on Earth and influence every environment in which bacteria exist. No current algorithm reliably analyzes phage structural protein sequences and predicts their function.
• This research classifies phage structural proteins using Artificial Neural Networks (ANNs), a computational method of analysis inspired by biological neurons.
• Features of phage protein sequences with known classifications were used to train neural networks, which then predict the function of unknown sequences.
• Analyzing the predictions will allow biologists to judge, with some accuracy, which proteins are the most appropriate candidates for their research.

Methods:

Phage Major Capsid Protein Sequence Collection
Major Capsid Proteins (MCPs) and Tail Proteins were obtained from the NCBI RefSeq database using the keywords Phage and Proteins together with the terms Major, Shell, Coat, Capsid, Head, Prohead, Procapsid, Tail, and Neck. Non-MCP/Tail sequences were also downloaded as negative examples.

Annotation Data Filtering
• Sequences with non-MCP/Tail annotations were removed from the positive data set.
• Sequences with MCP/Tail annotations were removed from the negative data set.

Training Set Manipulation
• Only positive MCP sequences longer than 300 amino acids and Tail sequences longer than 150 amino acids were used.

Conversion of Sequences to Quantitative Features
• One ANN input was one of four amino acid features translated into quantitative representations: masses, isoelectric points, hydrophobicity ratings, and volumes.
• A second input was the same feature divided by the sequence length.
• 20 ANN inputs represented the amino acid percent compositions.

Training of ANNs
• ANNs were trained on each of the five features individually and on combinations of two or more features.
• The architecture included one hidden layer with 100 neurons.
• One input neuron was used for each feature.

Testing of ANNs using Phage Protein Sequences
• Classification was executed on a test set containing a randomly selected 20% of the positive and negative protein sequences, which were not used in training.

Figure: Example of T4 Bacteriophage (http://www.cosmosmagazine.com/node/1024)
Figure: Diagram of an Artificial Neural Network with 5 input layer neurons, 5 hidden layer neurons, and 1 output layer neuron (Input Layer, Hidden Layer, Output Layer).

Conclusions:
• Phage Major Capsid Proteins and Tail Proteins are distinguishable from other phage proteins by trained Artificial Neural Networks.
• Classification of the test sets shows that the ANNs distinguish phage Major Capsid Proteins more accurately than Tail proteins.
• Protein sequence sets may be contrasted by analyzing their combinations of physical features from the perspective of ANNs.

Future Directions:
• Find a distribution of positive and negative protein sequence examples appropriate for phage or bacterial genomes to improve classification ability.
• Perform ClustalW analysis of Tail/Major Capsid proteins against ANN-tested false-positive proteins from bacterial genomes.
• Experimentally validate ANN predictions for unknown virus sequences:
  a. Create gene constructs of sequences predicted by ANNs to be MCPs or Tail proteins.
  b. Express the genes in bacterial cells.
  c. Visually verify potential Tail and Capsid proteins by electron microscopy.
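The Annotation Data Filtering and Training Set Manipulation steps under Methods amount to keyword and length filters. The Python sketch below is a minimal illustration under stated assumptions, not the author's code: it assumes each record is an (annotation, sequence) pair, assumes that Tail and Neck mark tail proteins while the remaining search terms mark MCPs, and uses plain case-insensitive substring matching. The names `annotation_class`, `filter_positives`, and `filter_negatives` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical keyword groups built from the poster's search terms; splitting
# them into "tail" and "mcp" groups is an assumption.
TAIL_KEYWORDS = ("tail", "neck")
MCP_KEYWORDS = ("major", "shell", "coat", "capsid", "head", "prohead", "procapsid")

# Positive sequences must be strictly longer than these lengths (residues).
MIN_LENGTH = {"mcp": 300, "tail": 150}

Record = Tuple[str, str]  # (annotation, amino acid sequence)


def annotation_class(annotation: str) -> Optional[str]:
    """Return "tail", "mcp", or None using case-insensitive substring matching.

    Tail terms are checked first (an assumption) so that an annotation such as
    "major tail protein" is not counted as a capsid protein.
    """
    text = annotation.lower()
    if any(word in text for word in TAIL_KEYWORDS):
        return "tail"
    if any(word in text for word in MCP_KEYWORDS):
        return "mcp"
    return None


def filter_positives(records: List[Record], label: str) -> List[Record]:
    """Keep records annotated as `label` whose length exceeds that label's cutoff."""
    return [
        (annotation, seq)
        for annotation, seq in records
        if annotation_class(annotation) == label and len(seq) > MIN_LENGTH[label]
    ]


def filter_negatives(records: List[Record]) -> List[Record]:
    """Drop negative examples whose annotation looks like an MCP or tail protein."""
    return [
        (annotation, seq)
        for annotation, seq in records
        if annotation_class(annotation) is None
    ]
```

Real RefSeq annotations are messier than these keywords, so a practical filter would likely need manual review of borderline annotations.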
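The Conversion of Sequences to Quantitative Features step can be sketched as follows. This is a minimal sketch under assumptions: the poster does not give its per-residue tables, so the Kyte-Doolittle hydropathy scale stands in as one example, and mass, isoelectric point, or volume tables would be passed to the same hypothetical `scale_inputs` function. Each scale yields a summed value and that value divided by length; the hypothetical `percent_composition` yields the 20 composition inputs. Skipping non-standard residues is also an assumption.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as an example scale; the
# poster does not say which hydrophobicity, mass, pI, or volume tables it used.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_inputs(sequence: str, scale: Dict[str, float]) -> List[float]:
    """Return [summed feature, summed feature / length] for one per-residue scale.

    Residues missing from the scale (e.g. X or U) are skipped, and the length
    used for normalization is the number of residues actually scored.
    """
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    total = sum(values)
    return [total, total / len(values)]


def percent_composition(sequence: str) -> List[float]:
    """Return 20 values: the percentage of each standard amino acid, in AMINO_ACIDS order."""
    seq = sequence.upper()
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
```

For example, `scale_inputs("AAG", KYTE_DOOLITTLE)` gives a total of about 3.2 (1.8 + 1.8 - 0.4) and a length-normalized value of about 1.07.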
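Training on each of the five features and on every combination of two or more features (Training of ANNs) can be enumerated with `itertools.combinations`. The sketch assumes the five features are the four per-residue scales plus percent composition, and that each scale contributes two inputs (summed and length-normalized) while composition contributes 20; the dictionary and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple

# Hypothetical names for the five feature groups. Input counts assume each
# per-residue scale gives a total and a length-normalized value, and
# composition gives 20 percentages.
INPUTS_PER_FEATURE: Dict[str, int] = {
    "mass": 2,
    "isoelectric_point": 2,
    "hydrophobicity": 2,
    "volume": 2,
    "composition": 20,
}


def feature_sets(
    features: Tuple[str, ...] = tuple(INPUTS_PER_FEATURE),
) -> Iterator[Tuple[str, ...]]:
    """Yield every single feature, then every combination of two or more."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def input_count(feature_set: Tuple[str, ...]) -> int:
    """Number of input neurons a network needs for this feature set."""
    return sum(INPUTS_PER_FEATURE[name] for name in feature_set)
```

Under these assumptions there are 5 single-feature networks and 26 multi-feature combinations (10 + 10 + 5 + 1), 31 feature sets in total, and a network using all five features needs 28 input neurons.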
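The Training of ANNs section specifies one hidden layer of 100 neurons feeding one output, but not the activation function, loss, or training algorithm. The following pure-Python sketch assumes sigmoid units trained by per-sample backpropagation on binary cross-entropy, with inputs already scaled to roughly the 0 to 1 range (raw totals such as summed mass would saturate the sigmoids). `OneHiddenLayerNet`, `mean_log_loss`, and the learning rate, epoch count, and initialization are illustrative choices, not the poster's settings.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class OneHiddenLayerNet:
    """Fully connected net: n_inputs -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_inputs: int, n_hidden: int = 100, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def predict(self, x: Sequence[float]) -> float:
        """Score between 0 and 1; values near 1 mean "looks like a positive example"."""
        return self._forward(x)[1]

    def train(self, samples: Sequence[Sequence[float]], labels: Sequence[int],
              epochs: int = 500, lr: float = 0.05) -> None:
        """Per-sample gradient descent on binary cross-entropy (backpropagation)."""
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, out = self._forward(x)
                delta_out = out - y  # gradient w.r.t. the output pre-activation
                for j, h in enumerate(hidden):
                    # Compute the hidden error with the old output weight first.
                    delta_h = delta_out * self.w_out[j] * h * (1.0 - h)
                    self.w_out[j] -= lr * delta_out * h
                    row = self.w_hidden[j]
                    for i, xi in enumerate(x):
                        row[i] -= lr * delta_h * xi
                    self.b_hidden[j] -= lr * delta_h
                self.b_out -= lr * delta_out


def mean_log_loss(net: OneHiddenLayerNet, samples: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Average binary cross-entropy of the network's scores."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = net.predict(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)
```

Passing `n_inputs=5, n_hidden=5` gives the 5-5-1 network shown in the diagram; the poster's networks used 100 hidden neurons, which is the default here.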
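The Testing of ANNs step holds out a random 20% of the positive and negative sequences. The sketch below assumes the 20% is drawn separately from each class, so the test set keeps the overall class balance, and assumes a 0.5 score threshold for calling a sequence positive; `holdout_split`, `confusion_counts`, and `accuracy` are hypothetical helpers.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    positives: Sequence[T], negatives: Sequence[T],
    test_fraction: float = 0.2, seed: int = 0,
) -> Tuple[List[Tuple[T, int]], List[Tuple[T, int]]]:
    """Randomly hold out `test_fraction` of each class; label positives 1, negatives 0."""
    rng = random.Random(seed)
    train: List[Tuple[T, int]] = []
    test: List[Tuple[T, int]] = []
    for items, label in ((positives, 1), (negatives, 0)):
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend((item, label) for item in shuffled[:n_test])
        train.extend((item, label) for item in shuffled[n_test:])
    return train, test


def confusion_counts(scores: Sequence[float], labels: Sequence[int],
                     threshold: float = 0.5) -> Dict[str, int]:
    """Count true/false positives/negatives, calling a score >= threshold positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        if score >= threshold:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["fn" if label == 1 else "tn"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of test sequences classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())
```

Per-class rates such as sensitivity, tp / (tp + fn), are one way to compare MCP and Tail networks; the poster does not state which measure it used.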
