PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS


  1. PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS Sudhakar Reddy Patrick Shih Chrissy Oriol Lydia Shih

  2. Proteins And Secondary Structure Sudhakar Reddy

  3. Project Goals • To predict the secondary structure of a protein using artificial neural networks.

  4. STRUCTURES • Primary structure: the linear arrangement of amino acid (a.a.) residues that constitutes the polypeptide chain.

  5. SECONDARY STRUCTURE • Localized organization of parts of a polypeptide chain, stabilized through hydrogen bonds between different residues. • Without any stabilizing interactions, a polypeptide assumes a random-coil structure. • When stabilizing hydrogen bonds form, the polypeptide backbone folds periodically into regular geometric arrangements: • ALPHA HELIX • BETA SHEET • TURNS

  6. ALPHA HELIX • The polypeptide backbone is folded into a spiral held in place by hydrogen bonds between backbone oxygen and hydrogen atoms. • The carbonyl oxygen of each peptide bond is hydrogen-bonded to the amide hydrogen of the amino acid four residues toward the C-terminus. • Each alpha helix has 3.6 a.a. per turn. • Side chains point outward from the backbone. • The hydrophobic/hydrophilic quality of the helix is determined entirely by the side chains, because the polar groups of the peptide backbone are already engaged in H-bonding within the helix and thus cannot affect it.

  7. ALPHA HELIX

  8. THE BETA SHEET • Consists of laterally packed beta strands. • Each beta strand is a short (5-8 residues), nearly fully extended polypeptide chain. • Hydrogen bonding between backbone atoms in adjacent beta strands, within either the same or different polypeptide chains, forms the beta sheet. • The orientation can be either parallel or anti-parallel; in both arrangements side chains project from both faces of the sheet.

  9. THE BETA SHEET

  10. THE BETA SHEET

  11. TURNS • Composed of 3-4 residues; compact, U-shaped secondary structures stabilized by H-bonds between their end residues. • Located on the surface of the protein, forming a sharp bend that redirects the polypeptide backbone back toward the interior. • Glycine and proline are commonly present. • Without these turns, a protein would be large, extended, and loosely packed.

  12. TURNS

  13. MOTIFS • Motifs: regular combinations of secondary structure. • Coiled-coil motif • Helix-loop-helix (Ca2+) • Zinc-finger motif

  14. COILED-COIL MOTIF

  15. HELIX-LOOP-HELIX (Ca2+)

  16. ZINC-FINGER MOTIF

  17. FUTURE • Protein structure identification is key to understanding biological function and its role in health and disease. • Characterizing a protein structure is helpful in the development of new agents and devices to treat disease. • The challenge of unraveling structure lies in developing methods for accurately and reliably understanding the sequence-structure relationship. • Most current protein structures have been characterized by NMR and X-ray diffraction. • The revolution in sequencing studies is producing a rapidly growing database of sequences, yet only about 3000 structures are known.

  18. ADVANTAGE • Because only a few conformations of a protein are possible, and structure and sequence are directly related, we can unravel secondary structure by developing an efficient algorithm that compares new sequences with those already available, and apply it in the health care industry.

  19. WHY SECONDARY STRUCTURE? • Prediction of secondary structure is an essential intermediate step on the way to predicting the full 3-D structure of a protein • If the secondary structure of a protein is known, it is possible to derive a comparatively small number of possible tertiary structures using knowledge about the ways that secondary structural elements pack

  20. Artificial Neural Network (ANN) Peichung Shih

  21. Biological Neural Network

  22. Artificial Neural Network • Node k forms a weighted sum of its inputs, compares it with a threshold, and passes the result through a nonlinear function. • X1k: input from X1; W1k: weight of X1 • X2k: input from X2; W2k: weight of X2 • X0k: bias term; W0k: weight of the bias term • Q: threshold • S: weighted sum, fed to the nonlinear function • qk: output of node k

  23. Artificial Neural Network - Example • X0 = 1, W0 = 2; X1 = 1, W1 = 1; X2 = 2, W2 = 2; threshold Q = 6; nonlinearity F(x) = (1 + e^-x)^-1. • Weighted sum: 2·1 + 1·1 + 2·2 = 7; the sum exceeds the threshold by 7 - 6 = 1, and F(1) ≈ 0.73, so the node fires: output 1.
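A minimal Python sketch of this single node (not part of the original deck; feeding the sigmoid with the weighted sum minus the threshold is an assumption about how the slide's figure was wired):

```python
import math

def neuron(inputs, weights, theta):
    """Single artificial node: weighted sum of inputs compared
    against a threshold theta, squashed by a sigmoid."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return net, 1.0 / (1.0 + math.exp(-(net - theta)))  # F(x) = (1 + e^-x)^-1

# Values from the slide: bias X0 = 1 with W0 = 2, X1 = 1 with W1 = 1,
# X2 = 2 with W2 = 2, threshold Q = 6.
net, out = neuron([1, 1, 2], [2, 1, 2], theta=6)
print(net)          # 7
print(round(out))   # 1  (F(7 - 6) = F(1) ≈ 0.73, which rounds to 1)
```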

  24. Paradigms of ANN - Overview • Perceptron • Adaline & Madaline • Backpropagation (BP)

  25. Paradigms of ANN - Feedforward

  26. Paradigms of ANN - Feedback

  27. Paradigms of ANN - Supervised

  28. Paradigms of ANN - Unsupervised

  29. Paradigms of ANN - Overview • Perceptron • Adaline & Madaline • Backpropagation (BP)

  30. Perceptron • One of the earliest learning networks, proposed by Rosenblatt in the late 1950s. RULE: net = w1·I1 + w2·I2; if net > Q then output = 1, otherwise output = 0.

  31.-39. Perceptron Example: AND Operation • These slides step through training a two-input perceptron on the AND function, starting from an initial network with threshold Q = 1.5. The update rule applied after each pattern: • If the output is correct: Q = Q and W = W (no change). • If O = 1 but the target T = 0: Q = Q + 1; for each input, W = W - 1 if I = 1, otherwise W = W. • If O = 0 but the target T = 1: Q = Q - 1; for each input, W = W + 1 if I = 1, otherwise W = W. • Cycling through the four input patterns, the weights and threshold converge to values that implement AND.
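A runnable sketch of this training trace (not from the deck; the zero starting weights are an assumption, while the threshold update and weight update follow the rule on the slides):

```python
# Perceptron trained on AND with the slides' update rule:
#   output correct -> leave W and Q unchanged
#   O = 1, T = 0   -> Q = Q + 1, and W = W - 1 for every input with I = 1
#   O = 0, T = 1   -> Q = Q - 1, and W = W + 1 for every input with I = 1
patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = [0.0, 0.0], 1.5          # initial threshold from slide 31;
                                    # zero starting weights are assumed
for epoch in range(20):             # AND is linearly separable, so the
    errors = 0                      # perceptron rule converges quickly
    for inputs, target in patterns:
        net = sum(wi * xi for wi, xi in zip(w, inputs))
        output = 1 if net > theta else 0
        if output != target:
            errors += 1
            step = target - output                    # +1 or -1
            theta -= step                             # Q goes up or down by 1
            w = [wi + step * xi for wi, xi in zip(w, inputs)]
    if errors == 0:
        break

print(w, theta)   # converges to e.g. [2.0, 1.0] and Q = 2.5
```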

  40. Hidden Layer • AND and OR are linearly separable, but XOR is not: (0, 0) and (1, 1) map to 0 while (0, 1) and (1, 0) map to 1, so no single line separates the two classes.

  41. Hidden Layer • A single-layer perceptron therefore cannot compute XOR; a hidden layer is required.

  42. Hidden Layer • A two-layer network that computes XOR: both inputs feed two hidden units with weight 1 each, an OR unit (threshold 0.5) and an AND unit (threshold 1.5); the output unit (threshold 0.5) combines them with weights 1 and -2, so it fires only when OR is active and AND is not.
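A minimal sketch verifying this network (the weights and thresholds are read off the garbled slide, so treat them as a plausible reconstruction rather than the deck's exact figure):

```python
def step(net, theta):
    """Threshold unit: fires when the weighted sum exceeds theta."""
    return 1 if net > theta else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2, 0.5)               # hidden OR unit  (weights 1, 1)
    h_and = step(x1 + x2, 1.5)               # hidden AND unit (weights 1, 1)
    return step(1 * h_or - 2 * h_and, 0.5)   # output: OR minus 2 * AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))               # outputs 0, 1, 1, 0
```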

  43. How Many Hidden Nodes? We have indicated the number of layers needed. However, no indication is provided as to the optimal number of nodes per layer. There is no formal method to determine this optimal number; typically, one uses trial and error.
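The next slide reports Q3 accuracy for a range of hidden-layer sizes. Below is a minimal sketch of such a trial-and-error sweep; it uses scikit-learn and synthetic three-class stand-in data (the real task would use windows of encoded residues labeled H/E/C), neither of which comes from the original deck:

```python
# Trial-and-error search over hidden-layer sizes, scored by
# cross-validated accuracy on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)

for n_hidden in (5, 10, 15, 20, 30, 40, 60):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500,
                        random_state=0)
    score = cross_val_score(clf, X, y, cv=3).mean()
    print(f"{n_hidden:3d} hidden units: {100 * score:.2f}% accuracy")
```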

  44. Hidden Units vs. Q3

  Hidden Units   Q3 (%)
             0    62.50
             5    61.60
            10    61.50
            15    62.60
            20    62.30
            30    62.50
            40    62.70
            60    61.40

  45. JNET AND JPRED CHRISSY ORIOL

  46. JNET • Multiple Alignment • Neural Network • Consensus of methods

  47. TRAINING AND TESTS • 480 training proteins (1996 PDB) • 406 test proteins (2000 PDB) • Blind test • 7-fold cross-validation test
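A minimal sketch of the 7-fold cross-validation protocol (not from the deck; the placeholder protein IDs and the use of scikit-learn's KFold are assumptions): the set of proteins is split into 7 parts, and each part takes a turn as the held-out test set while the network trains on the other 6.

```python
from sklearn.model_selection import KFold

protein_ids = [f"protein_{i}" for i in range(480)]   # placeholder IDs

splitter = KFold(n_splits=7, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(splitter.split(protein_ids)):
    # train on ~6/7 of the proteins, evaluate on the remaining ~1/7
    print(f"fold {fold}: train on {len(train_idx)}, test on {len(test_idx)}")
```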

  48. MULTIPLE ALIGNMENTS

  49. ALIGNMENTS • Multiple sequence alignment constructed • Generation of profiles: • Frequency counts of each residue / total residues in the column (expressed as a percentage) • Each residue scored by its value from BLOSUM62, with the scores averaged over the number of sequences in that column • Profile HMM generated by HMMER2 • PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool): • Frequency of residues • PSSM (Position-Specific Scoring Matrix)
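A minimal sketch of the frequency-count profile described above (the three short aligned sequences are invented for illustration; JNET's real input profiles are built from full multiple alignments):

```python
# Per-column residue frequencies of a multiple alignment, expressed
# as percentages of the sequences in that column.
from collections import Counter

alignment = ["MKVLA",
             "MKILA",
             "MRVLS"]

n_seqs = len(alignment)
for col in range(len(alignment[0])):
    counts = Counter(seq[col] for seq in alignment)
    freqs = {res: round(100.0 * n / n_seqs, 1) for res, n in counts.items()}
    print(col, freqs)
# column 0 -> {'M': 100.0}; column 1 -> {'K': 66.7, 'R': 33.3}; ...
```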

  50. HMM PROFILE • Uses: • Statistical description of a sequence family's consensus • Position-specific scores for residues, insertions, and deletions • Profiles: • Capture important information about the degree of conservation at different positions • Vary in the degree to which gaps, insertions, and deletions are permitted
