PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS Sudhakar Reddy Patrick Shih Chrissy Oriol Lydia Shih
Proteins And Secondary Structure Sudhakar Reddy
Project Goals • To predict the secondary structure of a protein using artificial neural networks.
STRUCTURES • Primary structure: the linear arrangement of amino acid (a.a.) residues that constitutes the polypeptide chain.
SECONDARY STRUCTURE • Localized organization of parts of a polypeptide chain, maintained through hydrogen bonds between different residues. • Without any stabilizing interactions, a polypeptide assumes a random-coil structure. • When stabilizing hydrogen bonds form, the polypeptide backbone folds periodically into one of a few regular geometric arrangements, viz. • ALPHA HELIX • BETA SHEET • TURNS
ALPHA HELIX • The polypeptide backbone is folded into a spiral held in place by hydrogen bonds between backbone oxygen and hydrogen atoms. • The carbonyl oxygen of each peptide bond is hydrogen-bonded to the amide hydrogen of the a.a. four residues toward the C-terminus. • Each alpha helix has 3.6 a.a. per turn. • Side chains point outward from the backbone. • The hydrophobic/hydrophilic character of the helix is determined entirely by the side chains, because the polar groups of the peptide backbone are already engaged in H-bonding within the helix and thus cannot affect it.
THE BETA SHEET • Consists of laterally packed beta strands. • Each beta strand is a short (5-8 residue), nearly fully extended polypeptide chain. • Hydrogen bonding between backbone atoms in adjacent beta strands, within either the same or different polypeptide chains, forms a beta sheet. • The orientation can be either parallel or anti-parallel; in both arrangements, side chains project from both faces of the sheet.
TURNS • Composed of 3-4 residues, turns are compact, U-shaped secondary structures stabilized by H-bonds between their end residues. • Located on the surface of the protein, they form a sharp bend that redirects the polypeptide backbone back toward the interior. • Glycine and proline are commonly present. • Without these turns, a protein would be large, extended, and loosely packed.
MOTIFS • Motifs: regular combinations of secondary structures. • Coiled-coil motif • Helix-loop-helix motif (Ca2+-binding) • Zinc-finger motif
FUTURE • Protein structure identification is key to understanding biological function and its role in health and disease. • Characterizing a protein's structure is helpful in the development of new agents and devices to treat disease. • The challenge of unraveling structure lies in developing methods for accurately and reliably understanding the sequence-structure relationship. • Most current protein structures have been characterized by NMR and X-ray diffraction. • The revolution in sequencing studies has produced a rapidly growing database of sequences, yet only about 3,000 structures are known.
ADVANTAGE • Because only a few conformations of a protein are possible, and structure and sequence are directly related, we can unravel secondary structure by developing an efficient algorithm that compares new sequences with those already available, and apply it in the health-care industry.
WHY SECONDARY STRUCTURE? • Prediction of secondary structure is an essential intermediate step on the way to predicting the full 3-D structure of a protein • If the secondary structure of a protein is known, it is possible to derive a comparatively small number of possible tertiary structures using knowledge about the ways that secondary structural elements pack
Artificial Neural Network (ANN) Peichung Shih
Artificial Neural Network • A single node k receives inputs X1 and X2 with weights W1k and W2k, plus a bias term X0k with weight W0k. • The weighted sum of the inputs is compared against a threshold Q and passed through a nonlinear function to produce the node's output qk.
Artificial Neural Network - Example • Inputs X0 = 1 (bias), X1 = 1, X2 = 2 with weights W0 = 2, W1 = 1, W2 = 2 give a weighted sum of 2 + 1 + 4 = 7. • Subtracting the threshold Q = 6 leaves a net input of 1, which is passed through the nonlinear function F(x) = (1 + e^-x)^-1 to produce the output.
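As a concrete illustration, here is a minimal Python sketch of this single-node computation (the function name and structure are ours, not from the slides):

```python
import math

def node_output(inputs, weights, theta):
    """Weighted sum of the inputs, minus the threshold, squashed by the sigmoid F."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net))   # F(x) = (1 + e^-x)^-1

# Slide values: X = (1, 1, 2), W = (2, 1, 2), threshold Q = 6.
# Weighted sum = 2*1 + 1*1 + 2*2 = 7; net input = 7 - 6 = 1; F(1) ~ 0.73.
print(node_output([1, 1, 2], [2, 1, 2], theta=6))
```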
Paradigms of ANN - Overview • Perceptron • Adaline & Madaline • Backpropagation (BP)
Perceptron • One of the earliest learning networks, proposed by Rosenblatt in the late 1950s. RULE: net = w1·I1 + w2·I2; if net > Q then output = 1, otherwise output = 0.
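In code, Rosenblatt's decision rule is a single comparison (a sketch; the names are illustrative):

```python
def perceptron_output(I1, I2, w1, w2, theta):
    """Fire (output 1) only if the weighted sum of the inputs exceeds the threshold Q."""
    net = w1 * I1 + w2 * I2
    return 1 if net > theta else 0
```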
Perceptron Example: AND Operation • Initial network: threshold Q = 1.5. • For each training pattern, compute the output O and compare it with the target T: • Output correct? If yes, leave the network unchanged (Q = Q, W = W). • If O = 1 and T = 0 (the unit fired when it should not): Q = Q + 1, and for each input, W = W - 1 where I = 1, W = W where I = 0. • If O = 0 and T = 1 (the unit failed to fire): Q = Q - 1, and for each input, W = W + 1 where I = 1, W = W where I = 0. • The slides step this rule through the four input patterns of AND, adjusting the weights and threshold (values such as 0.5 and 1.5 recur across the frames) until every pattern is classified correctly.
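A minimal sketch of the full training loop for AND, using the update rule above; the initial threshold Q = 1.5 is from the slides, while the zero initial weights are our assumption:

```python
def train_perceptron_and(epochs=20):
    """Train a perceptron on AND with the slide's update rule:
    false positive -> Q = Q + 1 and W = W - 1 for each input with I = 1;
    false negative -> Q = Q - 1 and W = W + 1 for each input with I = 1."""
    patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, theta = [0.0, 0.0], 1.5
    for _ in range(epochs):
        errors = 0
        for inputs, target in patterns:
            out = 1 if sum(wi * ii for wi, ii in zip(w, inputs)) > theta else 0
            if out == 1 and target == 0:       # fired when it should not
                theta += 1
                w = [wi - ii for wi, ii in zip(w, inputs)]
                errors += 1
            elif out == 0 and target == 1:     # failed to fire
                theta -= 1
                w = [wi + ii for wi, ii in zip(w, inputs)]
                errors += 1
        if errors == 0:                        # all four patterns correct
            break
    return w, theta

print(train_perceptron_and())   # e.g. ([2.0, 1.0], 2.5) -- computes AND correctly
```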
Hidden Layer • AND and OR are linearly separable, but XOR is not: no single line can separate (0, 1) and (1, 0) from (0, 0) and (1, 1), so no single perceptron can compute XOR. • Adding a hidden layer solves this: an OR-like hidden unit (weights 1, 1; threshold 0.5) and an AND-like hidden unit (weights 1, 1; threshold 1.5) feed an output unit with weights 1 and -2 and threshold 0.5.
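A sketch of that two-layer network, using the weights and thresholds recovered from the slide:

```python
def step(net, theta):
    """Threshold unit: fire if the net input exceeds theta."""
    return 1 if net > theta else 0

def xor_net(x1, x2):
    """OR-like unit (threshold 0.5) and AND-like unit (threshold 1.5),
    combined by an output unit with weights 1 and -2 (threshold 0.5)."""
    h_or  = step(1 * x1 + 1 * x2, 0.5)
    h_and = step(1 * x1 + 1 * x2, 1.5)
    return step(1 * h_or - 2 * h_and, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # reproduces XOR: 0, 1, 1, 0
```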
How Many Hidden Nodes? The previous examples indicate how many layers are needed, but give no indication of the optimal number of nodes per layer. There is no formal method to determine this optimal number; in practice, it is found by trial and error (see the results table and the sketch below).
Hidden Units    Q3 (%)
 0              62.50
 5              61.60
10              61.50
15              62.60
20              62.30
30              62.50
40              62.70
60              61.40

(Q3 = three-state per-residue prediction accuracy.)
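A minimal sketch of that trial-and-error search over hidden-layer sizes, using scikit-learn on random stand-in data; this is our illustration, not the experiment that produced the Q3 table above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))      # stand-in feature windows
y = rng.integers(0, 3, size=500)    # stand-in H/E/C labels

# Try each candidate hidden-layer size and compare cross-validated accuracy.
for n_hidden in (5, 10, 15, 20, 30, 40, 60):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=1000)
    score = cross_val_score(clf, X, y, cv=7).mean()
    print(f"{n_hidden:3d} hidden units: {score:.3f}")
```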
JNET AND JPRED CHRISSY ORIOL
JNET • Multiple Alignment • Neural Network • Consensus of methods
TRAINING AND TESTS • Training set: 480 proteins (1996 PDB) • Test set: 406 proteins (2000 PDB) • Blind test • 7-fold cross-validation test
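A minimal sketch of a 7-fold cross-validation split at the protein level (the protein IDs are placeholders, not the actual JNET training set):

```python
from sklearn.model_selection import KFold

proteins = [f"protein_{i}" for i in range(480)]   # stand-in for the 480-protein set
kf = KFold(n_splits=7, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(proteins)):
    print(f"fold {fold}: train on {len(train_idx)}, test on {len(test_idx)}")
```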
ALIGNMENTS • A multiple sequence alignment is constructed. • Generation of profiles: • Frequency counts of each residue / total residues in the column (expressed as a percentage); a sketch of this step follows below • Each residue is scored by its value from BLOSUM62, and the scores are averaged over the number of sequences in that column • A profile HMM is generated by HMMER2 • PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool): • Frequency of residues • PSSM (Position-Specific Scoring Matrix)
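A small sketch of the frequency-profile step for a single alignment column, as described above (the column string and function name are illustrative):

```python
from collections import Counter

def column_profile(column):
    """Percentage of each residue among the non-gap entries of one alignment column."""
    residues = [r for r in column if r != '-']
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}

print(column_profile("AAAVL-A"))   # {'A': 66.7, 'V': 16.7, 'L': 16.7} (approx.)
```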
HMM PROFILE • Uses: • Statistical description of a sequence family's consensus • Position-specific scores for residues, insertions, and deletions • Profiles: • Capture important information about the degree of conservation at different positions • Vary in the degree to which gaps, insertions, and deletions are permitted