
2d-3D Structure Modelling



Presentation Transcript


  1. 2d-3D Structure Modelling • S. Shahriar Arab

  2. Flow of information • DNA → RNA → protein sequence → protein structure → protein function → …

  3. Prediction in bioinformatics • Important prediction problems: • Protein sequence from genomic DNA • Protein 3D structure from sequence • Protein function from structure • Protein function from sequence

  4. Why predict protein structure? • The sequence-structure gap • Millions of known sequences, but only about 80,000 known structures • Structural knowledge brings understanding of function and mechanism of action • Can help in prediction of function

  5. Why predict protein structure? • Predicted structures can be used in structure based drug design • It can help us understand the effects of mutations on structure or function • It is a very interesting scientific problem • still unsolved in its most general form after more than 20 years of effort

  6. What is protein structure prediction? • In its most general form • a prediction of the (relative) spatial position of each atom in the tertiary structure generated from knowledge only of the primary structure (sequence)

  7. Methods of structure prediction • Ab initio protein folding approaches • Comparative (homology) modelling • Fold recognition/threading

  8. Prediction in one dimension • Secondary structure prediction • Surface accessibility prediction

  9. 2D Structure Identification • DSSP - Database of Secondary Structures for Proteins (http://swift.cmbi.kun.nl/gv/dssp/) • VADAR - Volume Area Dihedral Angle Reporter (http://redpoll.pharmacy.ualberta.ca/vadar/) • PDB - Protein Data Bank (www.rcsb.org) • Sequence: QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA • Secondary structure: HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC

  10. Secondary Structure • The DSSP code • H = alpha helix • B = residue in isolated beta-bridge • E = extended strand, participates in beta ladder • G = 3-helix (3/10 helix) • I = 5-helix (pi helix) • T = hydrogen bonded turn • S = bend • C = coil

  11. Simplifications • Eight states from DSSP • H: α-helix • G: 3₁₀ helix • I: π-helix • E: β-strand • B: bridge • T: β-turn • S: bend • C: coil • CASP Standard • H = (H, G, I) • E = (E, B) • C = (C, T, S) • Identification of secondary structures focuses on • α-helices • β-strands • others (turns, coils, other helices) are collectively called “coils”
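
The eight-to-three state reduction listed on this slide can be sketched in a few lines of Python. The function name `reduce_states` is hypothetical, and sending any unlisted symbol to coil is an assumption, not something the slides specify.

```python
# CASP-style reduction of the eight DSSP states to three, as listed on the slide:
# H = (H, G, I), E = (E, B), C = (C, T, S).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "C": "C", "T": "C", "S": "C",
}

def reduce_states(dssp_string):
    """Map an eight-state DSSP string to the three-state H/E/C alphabet.

    Assumption: any symbol not in the table (for example a blank) is treated as coil.
    """
    return "".join(DSSP_TO_3.get(symbol, "C") for symbol in dssp_string)

if __name__ == "__main__":
    print(reduce_states("HGIEBTSC"))
```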

  12. What is Secondary structure prediction? • Given a protein sequence (primary structure) GHWIATRGQLIREAYEDYRHFSSECPFIP • Predict its secondary structure content • (C=coils H=Alpha Helix E=Beta Strands) CEEEEECHHHHHHHHHHHCCCHHCCCCCC

  13. Why Secondary Structure Prediction? • A simpler problem than 3D structure prediction • Accurate secondary structure prediction provides important information for tertiary structure prediction • Improving alignment accuracy • Protein function prediction • Protein classification

  14. Secondary structure prediction • Less detailed results • Predicts only the H (helix), E (extended) or C (coil/loop) state of each residue; does not predict the full atomic structure • Accuracy of secondary structure prediction • The best methods have an average accuracy of about 73% (the percentage of residues predicted correctly)
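
The accuracy measure quoted here, the percentage of residues predicted correctly over the three states (usually called Q3), can be computed as in the following sketch. The function name `q3` is hypothetical, and the example strings are made up for illustration.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

if __name__ == "__main__":
    # Toy strings: three of the four positions agree.
    print(q3("HHEC", "HHCC"))
```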

  15. History of protein secondary structure prediction • First generation • How: single residue statistics • Examples: Chou-Fasman method, LIM method, GOR I, etc. • Accuracy: low • Second generation • How: segment statistics • Examples: ALB method, GOR III, etc. • Accuracy: ~60% • Third generation • How: long-range interaction, homology based • Examples: PHD • Accuracy: ~70%

  16. Chou-Fasman Method • Developed by Chou & Fasman in 1974 & 1978 • Based on frequencies of residues in α-helices, β-sheets and turns • Accuracy ~50 - 60% Q3

  17. Chou-Fasman statistics • R – amino acid, S – secondary structure • f(R,S) – frequency of occurrences of R in S • f(R) – frequency of occurrences of R overall • Ns – total number of amino acids in conformation S • N – total number of amino acids • P(R,S) – propensity of amino acid R to be in structure S • P(R,S) = (f(R,S)/f(R))/(Ns/N)

  18. Example • #residues=20,000, • #helix=4,000, • #Ala=2,000, • #Ala in helix=500 • f(Ala, α) = 500/20,000, • f(Ala) = 2,000/20,000 • p(α) = Να/Ν=4,000/20,000 • P = (500/2000) / (4,000/20000) = 1.25
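
The worked example above can be reproduced with a short Python sketch of the propensity formula P(R,S) = (f(R,S)/f(R))/(Ns/N). The function and parameter names are hypothetical; only the four counts come from the slide.

```python
def propensity(n_res_in_state, n_res, n_state, n_total):
    """Chou-Fasman propensity of a residue type for a conformation.

    n_res_in_state: occurrences of the residue in the conformation (e.g. Ala in helix)
    n_res:          occurrences of the residue overall (e.g. Ala)
    n_state:        residues in the conformation (e.g. helix)
    n_total:        residues in the whole data set
    """
    fraction_of_residue_in_state = n_res_in_state / n_res
    fraction_of_all_in_state = n_state / n_total
    return fraction_of_residue_in_state / fraction_of_all_in_state

if __name__ == "__main__":
    # Counts from the slide: 500 Ala in helix, 2,000 Ala, 4,000 helix residues, 20,000 residues.
    print(round(propensity(500, 2000, 4000, 20000), 2))
```

A value above 1 means the residue occurs in that conformation more often than the data set average; the scanning rules on the following slides use the same quantity scaled by 100.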

  19. Chou-Fasman Statistics

  20. Amino acid propensities

  21. Scan peptide for α-helix regions • 2. Identify regions where 4 of 6 contiguous residues have P(H) > 100 (“alpha-helix nucleus”)

  22. Extend α-helix nucleus 3. Extend helix in both directions until a set of four residues have an average P(H) <100. Repeat steps 1 – 3 for entire peptide

  23. Scan peptide for β-sheet regions • 4. Identify regions where 3 of 5 contiguous residues have P(E) > 100 (“β-sheet nucleus”) • 5. Extend the β-sheet until 4 contiguous residues have an average P(E) < 100 • 6. If the region average is > 105 and the average P(E) > average P(H), then call the region “β-sheet”
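
The nucleate-and-extend rules on the last three slides can be sketched as follows. This is one reading of the rules, not the authors' implementation: the function names are hypothetical, the propensities are passed in as per-residue lists already scaled by 100, the extension test is assumed to look at the four residues ending at the candidate position, and the score lists in the usage lines are toy values rather than real Chou-Fasman parameters.

```python
from statistics import mean

def find_nuclei(scores, window, needed, threshold=100):
    """Start indices of windows in which at least `needed` residues score above threshold.

    Helix nucleus: window=6, needed=4. Sheet nucleus: window=5, needed=3.
    """
    return [i for i in range(len(scores) - window + 1)
            if sum(s > threshold for s in scores[i:i + window]) >= needed]

def extend(scores, start, end, threshold=100, probe=4):
    """Grow the half-open region [start, end) in both directions.

    Growth stops on a side when the `probe` residues ending at the next
    candidate position average below the threshold (an assumed reading of the rule).
    """
    n = len(scores)
    while end < n and mean(scores[max(0, end - probe + 1):end + 1]) >= threshold:
        end += 1
    while start > 0 and mean(scores[start - 1:start - 1 + probe]) >= threshold:
        start -= 1
    return start, end

def is_sheet(p_e, p_h, start, end):
    """Step 6: region average P(E) > 105 and average P(E) > average P(H)."""
    region_e = mean(p_e[start:end])
    return region_e > 105 and region_e > mean(p_h[start:end])

if __name__ == "__main__":
    toy_p_h = [60] * 4 + [130] * 6 + [60] * 4   # toy values, not real parameters
    nuclei = find_nuclei(toy_p_h, window=6, needed=4)
    print(nuclei)
    print(extend(toy_p_h, 4, 10))
```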

  24. The GOR method • developed by Garnier, Osguthorpe & Robson • built on Pij values based on information theory • evaluates each residue PLUS the adjacent 8 N-terminal and 8 carboxyl-terminal residues • sliding window of 17 • GOR III method accuracy ~64% Q3

  25. Second generation

  26. GOR idea: Statistics that take into account the whole window • Each residue carries two different types of information: • Intra-residue information – information about its own secondary structure • Inter-residue information – the influence of this residue on other residues

  27. GOR… continued • Individual propensity of amino acid R to be in secondary structure S – same idea as in Chou-Fasman • Contribution of 16 neighbours • Take the window of radius 8 around the residue in question (8 before and 8 after the residue) • For each residue in the window, consider its contribution to the conformation of the middle residue and add its value to PH, PS, PC • As in Chou-Fasman, the values of all contributions are based on statistics
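
The window idea can be sketched as a sum of per-position contributions, with the state of the middle residue chosen as the one with the largest total. This is a simplified illustration rather than the published GOR parameterisation: the function name, the `info` table layout keyed by (state, residue, offset), and the toy values are all assumptions.

```python
def gor_predict(sequence, info, states="HEC", radius=8):
    """Predict one state per residue from summed window contributions.

    info[(state, residue, offset)] is the statistical contribution of `residue`
    found `offset` positions from the middle residue towards `state`.
    Missing entries count as 0; ties go to the first state in `states`.
    """
    prediction = []
    for j in range(len(sequence)):
        totals = {}
        for state in states:
            total = 0.0
            for offset in range(-radius, radius + 1):
                k = j + offset
                if 0 <= k < len(sequence):
                    total += info.get((state, sequence[k], offset), 0.0)
            totals[state] = total
        prediction.append(max(states, key=lambda s: totals[s]))
    return "".join(prediction)

if __name__ == "__main__":
    # Toy contribution table for illustration only; real tables come from known structures.
    toy_info = {
        ("H", "A", 0): 1.0, ("H", "A", 1): 0.5, ("H", "A", -1): 0.5,
        ("E", "V", 0): 1.0,
        ("C", "G", 0): 1.0,
    }
    print(gor_predict("AAGV", toy_info))
```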

  28. Third generation

  29. Nearest Neighbour Method • Idea: similar sequences are likely to have the same secondary structure. • Take a window around the amino acid whose conformation is to be predicted • Find several, say k, closest sequences of known structure (with respect to a similarity measure that differs between variants of the method). • Assign secondary structure based on the conformation of the sequence neighbours. • Use max (nα, nβ, nc) or max (sα, sβ, sc) • Key: scoring measure of evolutionary similarity. • Salamov, Solovyev NNSSP (1995) accuracy above 70%

  30. Neighbours • 1: L H H H H H H L L (score S1) • 2: L L H H H H H L L (score S2) • 3: L E E E E E E L L (score S3) • 4: L E E E E E E L L (score S4) • n: L L L L E E E E E (score Sn) • n+1: H H H L L L E E E (score Sn+1) • max (nα, nβ, nL) or max (Σsα, Σsβ, ΣsL) or something else…
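
A minimal sketch of the max (nα, nβ, nL) rule: rank windows of known structure by similarity to the query window, keep the k best, and take a majority vote on the state of the central residue. The similarity measure used here (count of identical positions) is a stand-in assumption; the slides note that real variants such as NNSSP use an evolutionary similarity score. The function name and the tiny database are hypothetical.

```python
from collections import Counter

def knn_predict(window, database, k=3):
    """Majority vote over the k database windows most similar to `window`.

    database: list of (window_sequence, central_residue_state) pairs.
    Similarity is the number of identical positions (an illustrative choice).
    """
    def similarity(a, b):
        return sum(x == y for x, y in zip(a, b))

    ranked = sorted(database, key=lambda item: similarity(window, item[0]), reverse=True)
    votes = Counter(state for _, state in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy database for illustration only.
    toy_db = [("AAAAA", "H"), ("AAAAL", "H"), ("VVVVV", "E"), ("VIVIV", "E"), ("GGPGG", "C")]
    print(knn_predict("AAALA", toy_db))
    print(knn_predict("VVVIV", toy_db))
```

Replacing the vote count with a sum of similarity scores per state gives the max (Σsα, Σsβ, ΣsL) variant.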

  31. Advantages • Information from structural neighbours can be used to provide details to predicted secondary structure (phi,psi angles) • Much higher accuracy than previous methods.

  32. Neural network models • machine learning approach • provide training sets of structures (e.g. α-helices, non α -helices) • computers are trained to recognize patterns in known secondary structures • provide test set(proteins with known structures) • accuracy ~ 70 –75%

  33. Neural Network Method Recall artificial neurone:
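
As a reminder of what a single artificial neurone computes, here is a minimal Python sketch: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation and the function name `neuron` are assumptions made for illustration; the slide's figure is not reproduced in this transcript.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Output of one artificial neurone: sigmoid(sum(w_i * x_i) + bias)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    # With all weights at zero the neurone is undecided and outputs 0.5.
    print(neuron([1.0, 0.0], [0.0, 0.0]))
```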

  34. How PHD works • Step 1. BLAST search with input sequence • Step 2. Perform multiple seq. alignment and calculate aa frequencies for each position

  35. How PHD works (cont.) • Step 3. Level 1: sequence to structure • Take window of 13 adjacent residues • Scores for helix, strand, loop in the output layer, for each residue
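
Steps 2 and 3 can be illustrated with a short sketch that turns a multiple sequence alignment into per-position amino acid frequencies and then cuts out the 13-residue window that would be fed to the network. This is not PHD's actual code: the function names are hypothetical, gaps are assumed to be written as "-" and ignored, and positions beyond the sequence ends are assumed to be padded with empty profiles.

```python
from collections import Counter

def column_frequencies(alignment):
    """Amino acid frequencies for each alignment column, ignoring gap characters."""
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile

def window_around(profile, center, size=13):
    """Profile columns in a window of `size` positions centred on `center`."""
    half = size // 2
    return [profile[i] if 0 <= i < len(profile) else {}
            for i in range(center - half, center + half + 1)]

if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "A-E"]   # toy alignment for illustration
    profile = column_frequencies(toy_alignment)
    print(profile[2])
    print(len(window_around(profile, 0)))
```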

  36. Prediction tools that use NNs • MACMATCH • (Presnell et al., 1993) • for Macintosh • PHD • (Rost & Sander, 1993) • http://www.predictprotein.org/ • NNPREDICT • (Kneller et al., 1990) • http://www.cmpharm.ucsf.edu/nomi/nnpredict.html

  37. PHD Prediction of rCD2

  38. Prediction Accuracy

  39. Best of the Best • PredictProtein-PHD (72%) • http://www.predictprotein.org/ • Jpred (73-75%) • http://jura.ebi.ac.uk:8888/ • PREDATOR (75%) • http://www.embl-heidelberg.de/cgi/predator_serv.pl • PSIpred (77%) • http://insulin.brunel.ac.uk/psipred

  40. Accessible Surface Area • [Figure: solvent probe, accessible surface, reentrant surface, van der Waals surface]

  41. ASA Calculation • DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp) • VADAR - Volume Area Dihedral Angle Reporter (http://redpoll.pharmacy.ualberta.ca/vadar/) • GetArea - www.scsb.utmb.edu/getarea/area_form.html QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE 1056298799415251510478941496989999999

  42. Other ASA sites • Connolly Molecular Surface Home Page • http://www.biohedron.com/ • Naccess Home Page • http://sjh.bi.umist.ac.uk/naccess.html • ASA Parallelization • http://cmag.cit.nih.gov/Asa.htm • Protein Structure Database • http://www.psc.edu/biomed/pages/research/PSdb/

  43. Accessibility • Accessibility = (Accessible Surface Area (ASA) in folded protein) / (Maximum ASA) • Two state = b (buried), e (exposed) • e.g. b ≤ 16%, e > 16% • Three state = b (buried), i (intermediate), e (exposed) • e.g. b ≤ 16%, 16% < i < 36%, e > 36%
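
The accessibility ratio and the example thresholds above can be sketched as follows. The function names are hypothetical, the maximum ASA for each residue type must be supplied by the caller (no reference values are given in the slides), and assigning exactly 36% to the intermediate class is an assumption, since the slide leaves that boundary open.

```python
def relative_accessibility(asa_folded, max_asa):
    """Relative accessibility in percent: ASA in the folded protein over the maximum ASA."""
    return 100.0 * asa_folded / max_asa

def two_state(rel):
    """b (buried) if rel <= 16%, otherwise e (exposed)."""
    return "b" if rel <= 16 else "e"

def three_state(rel):
    """b if rel <= 16%, i up to 36% (boundary assumed), otherwise e."""
    if rel <= 16:
        return "b"
    if rel <= 36:
        return "i"
    return "e"

if __name__ == "__main__":
    rel = relative_accessibility(20.0, 100.0)   # toy numbers for illustration
    print(rel, two_state(rel), three_state(rel))
```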

  44. Accessibility Prediction • QHTAWCLTSEQHTAAVIW → BBPPBEEEEEPBPBPBPB • PredictProtein-PHDacc (58%) • http://cubic.bioc.columbia.edu/predictprotein • PredAcc (70%?) • http://condor.urbb.jussieu.fr/PredAccCfg.html

  45. PHD Prediction of rCD2

  46. 3D structure prediction

  47. 3D structure prediction of proteins • [Figure: ab initio prediction, threading and building by homology placed along a sequence similarity axis (0 to 100%), spanning new folds to existing folds]

  48. Choice of prediction methods • If you can find similar sequences of known structure then comparative modelling is the best way to predict structure • all other methods are less reliable • Of course, you can’t always find similar sequences of known structure.

  49. When you can’t do comparative modelling? • Secondary structure prediction • Fold recognition/threading • Ab initio protein folding approaches

  50. Divergent evolution • Different proteins in different organisms have diverged from a common ancestor protein • Each copy of this ancestor in various organisms has been subject to mutations, deletions, and insertions of amino acids in its sequence • In general, its 3-D fold and function have remained similar
