PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS Sudhakar Reddy Patrick Shih Chrissy Oriol Lydia Shih
Proteins And Secondary Structure Sudhakar Reddy
Project Goals • To predict the secondary structure of a protein using artificial neural networks.
STRUCTURES • Primary structure: the linear arrangement of amino acid (a.a.) residues that constitutes the polypeptide chain.
SECONDARY STRUCTURE • Localized organization of parts of a polypeptide chain, maintained through hydrogen bonds between different residues. • Without any stabilizing interactions, a polypeptide assumes a random-coil structure. • When stabilizing hydrogen bonds form, the polypeptide backbone folds periodically into one of a few regular geometric arrangements, viz. • ALPHA HELIX • BETA SHEET • TURNS
ALPHA HELIX • The polypeptide backbone is folded into a spiral held in place by hydrogen bonds between backbone oxygen and hydrogen atoms. • The carbonyl oxygen of each peptide bond is hydrogen-bonded to the amide hydrogen of the a.a. four residues toward the C-terminus. • Each alpha helix has 3.6 a.a. per turn. • Side chains point outward from the backbone. • The hydrophobic/hydrophilic character of the helix is determined entirely by the side chains, because the polar groups of the peptide backbone are already engaged in H-bonding within the helix and thus cannot affect it.
THE BETA SHEET • Consists of laterally packed beta strands. • Each beta strand is a short (5-8 residue), nearly fully extended polypeptide chain. • Hydrogen bonding between backbone atoms in adjacent beta strands, within either the same or different polypeptide chains, forms a beta sheet. • The orientation can be either parallel or anti-parallel; in both arrangements, side chains project from both faces of the sheet.
TURNS • Composed of 3-4 residues, turns are compact, U-shaped secondary structures stabilized by H-bonds between their end residues. • Located on the surface of the protein, they form a sharp bend that redirects the polypeptide backbone back toward the interior. • Glycine and proline are commonly present. • Without these turns, a protein would be large, extended, and loosely packed.
MOTIFS • Motifs: regular combinations of secondary structures. • Coiled-coil motif • Helix-loop-helix motif (Ca2+-binding) • Zinc-finger motif
FUTURE • Protein structure identification is key to understanding biological function and its role in health and disease. • Characterizing a protein's structure is helpful in the development of new agents and devices to treat disease. • The challenge of unraveling structure lies in developing methods for accurately and reliably understanding the sequence-structure relationship. • Most current protein structures have been characterized by NMR and X-ray diffraction. • The revolution in sequencing studies has produced a rapidly growing database of sequences, yet only about 3,000 structures are known.
ADVANTAGE • Because only a few conformations of a protein are possible, and structure and sequence are directly related, we can unravel secondary structure by developing an efficient algorithm that compares new sequences with those already available, and apply it in the health-care industry.
WHY SECONDARY STRUCTURE? • Prediction of secondary structure is an essential intermediate step on the way to predicting the full 3-D structure of a protein • If the secondary structure of a protein is known, it is possible to derive a comparatively small number of possible tertiary structures using knowledge about the ways that secondary structural elements pack
Artificial Neural Network (ANN) Peichung Shih
Artificial Neural Network • A single node k receives inputs X1 and X2 with weights W1k and W2k, plus a bias term X0k with weight W0k. • The weighted sum of the inputs is compared against a threshold Q and passed through a nonlinear function to produce the node's output qk.
Artificial Neural Network - Example • Inputs X0 = 1 (bias), X1 = 1, X2 = 2 with weights W0 = 2, W1 = 1, W2 = 2 give a weighted sum of 2 + 1 + 4 = 7. • Subtracting the threshold Q = 6 leaves a net input of 1, which is passed through the nonlinear function F(x) = (1 + e^-x)^-1 to produce the output.
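As a concrete illustration, here is a minimal Python sketch of this single-node computation (the function name and structure are ours, not from the slides):

```python
import math

def node_output(inputs, weights, theta):
    """Weighted sum of the inputs, minus the threshold, squashed by the sigmoid F."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net))   # F(x) = (1 + e^-x)^-1

# Slide values: X = (1, 1, 2), W = (2, 1, 2), threshold Q = 6.
# Weighted sum = 2*1 + 1*1 + 2*2 = 7; net input = 7 - 6 = 1; F(1) ~ 0.73.
print(node_output([1, 1, 2], [2, 1, 2], theta=6))
```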
Paradigms of ANN - Overview • Perceptron • Adaline & Madaline • Backpropagation (BP)
Perceptron • One of the earliest learning networks, proposed by Rosenblatt in the late 1950s. RULE: net = w1·I1 + w2·I2; if net > Q then output = 1, otherwise output = 0.
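In code, Rosenblatt's decision rule is a single comparison (a sketch; the names are illustrative):

```python
def perceptron_output(I1, I2, w1, w2, theta):
    """Fire (output 1) only if the weighted sum of the inputs exceeds the threshold Q."""
    net = w1 * I1 + w2 * I2
    return 1 if net > theta else 0
```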
Perceptron Example: AND Operation • Initial network: threshold Q = 1.5. • For each training pattern, compute the output O and compare it with the target T: • Output correct? If yes, leave the network unchanged (Q = Q, W = W). • If O = 1 and T = 0 (the unit fired when it should not): Q = Q + 1, and for each input, W = W - 1 where I = 1, W = W where I = 0. • If O = 0 and T = 1 (the unit failed to fire): Q = Q - 1, and for each input, W = W + 1 where I = 1, W = W where I = 0. • The slides step this rule through the four input patterns of AND, adjusting the weights and threshold (values such as 0.5 and 1.5 recur across the frames) until every pattern is classified correctly.
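A minimal sketch of the full training loop for AND, using the update rule above; the initial threshold Q = 1.5 is from the slides, while the zero initial weights are our assumption:

```python
def train_perceptron_and(epochs=20):
    """Train a perceptron on AND with the slide's update rule:
    false positive -> Q = Q + 1 and W = W - 1 for each input with I = 1;
    false negative -> Q = Q - 1 and W = W + 1 for each input with I = 1."""
    patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, theta = [0.0, 0.0], 1.5
    for _ in range(epochs):
        errors = 0
        for inputs, target in patterns:
            out = 1 if sum(wi * ii for wi, ii in zip(w, inputs)) > theta else 0
            if out == 1 and target == 0:       # fired when it should not
                theta += 1
                w = [wi - ii for wi, ii in zip(w, inputs)]
                errors += 1
            elif out == 0 and target == 1:     # failed to fire
                theta -= 1
                w = [wi + ii for wi, ii in zip(w, inputs)]
                errors += 1
        if errors == 0:                        # all four patterns correct
            break
    return w, theta

print(train_perceptron_and())   # e.g. ([2.0, 1.0], 2.5) -- computes AND correctly
```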
Hidden Layer • AND and OR are linearly separable, but XOR is not: no single line can separate (0, 1) and (1, 0) from (0, 0) and (1, 1), so no single perceptron can compute XOR. • Adding a hidden layer solves this: an OR-like hidden unit (weights 1, 1; threshold 0.5) and an AND-like hidden unit (weights 1, 1; threshold 1.5) feed an output unit with weights 1 and -2 and threshold 0.5.
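A sketch of that two-layer network, using the weights and thresholds recovered from the slide:

```python
def step(net, theta):
    """Threshold unit: fire if the net input exceeds theta."""
    return 1 if net > theta else 0

def xor_net(x1, x2):
    """OR-like unit (threshold 0.5) and AND-like unit (threshold 1.5),
    combined by an output unit with weights 1 and -2 (threshold 0.5)."""
    h_or  = step(1 * x1 + 1 * x2, 0.5)
    h_and = step(1 * x1 + 1 * x2, 1.5)
    return step(1 * h_or - 2 * h_and, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # reproduces XOR: 0, 1, 1, 0
```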
How Many Hidden Nodes? The previous examples indicate how many layers are needed, but give no indication of the optimal number of nodes per layer. There is no formal method to determine this optimal number; in practice, it is found by trial and error (see the results table and the sketch below).
Hidden Units    Q3 (%)
 0              62.50
 5              61.60
10              61.50
15              62.60
20              62.30
30              62.50
40              62.70
60              61.40

(Q3 = three-state per-residue prediction accuracy.)
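A minimal sketch of that trial-and-error search over hidden-layer sizes, using scikit-learn on random stand-in data; this is our illustration, not the experiment that produced the Q3 table above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))      # stand-in feature windows
y = rng.integers(0, 3, size=500)    # stand-in H/E/C labels

# Try each candidate hidden-layer size and compare cross-validated accuracy.
for n_hidden in (5, 10, 15, 20, 30, 40, 60):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=1000)
    score = cross_val_score(clf, X, y, cv=7).mean()
    print(f"{n_hidden:3d} hidden units: {score:.3f}")
```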
JNET AND JPRED CHRISSY ORIOL
JNET • Multiple Alignment • Neural Network • Consensus of methods
TRAINING AND TESTS • Training set: 480 proteins (1996 PDB) • Test set: 406 proteins (2000 PDB) • Blind test • 7-fold cross-validation test
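A minimal sketch of a 7-fold cross-validation split at the protein level (the protein IDs are placeholders, not the actual JNET training set):

```python
from sklearn.model_selection import KFold

proteins = [f"protein_{i}" for i in range(480)]   # stand-in for the 480-protein set
kf = KFold(n_splits=7, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(proteins)):
    print(f"fold {fold}: train on {len(train_idx)}, test on {len(test_idx)}")
```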
ALIGNMENTS • A multiple sequence alignment is constructed. • Generation of profiles: • Frequency counts of each residue / total residues in the column (expressed as a percentage); a sketch of this step follows below • Each residue is scored by its value from BLOSUM62, and the scores are averaged over the number of sequences in that column • A profile HMM is generated by HMMER2 • PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool): • Frequency of residues • PSSM (Position-Specific Scoring Matrix)
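A small sketch of the frequency-profile step for a single alignment column, as described above (the column string and function name are illustrative):

```python
from collections import Counter

def column_profile(column):
    """Percentage of each residue among the non-gap entries of one alignment column."""
    residues = [r for r in column if r != '-']
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}

print(column_profile("AAAVL-A"))   # {'A': 66.7, 'V': 16.7, 'L': 16.7} (approx.)
```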
HMM PROFILE • Uses: • Statistical description of a sequence family's consensus • Position-specific scores for residues, insertions, and deletions • Profiles: • Capture important information about the degree of conservation at different positions • Vary in the degree to which gaps, insertions, and deletions are permitted