
Jürgen Sühnel jsuehnel@fli-leibniz.de

-2010- 3D Structures of Biological Macromolecules, Part 5: Protein Structure Prediction. Jürgen Sühnel, jsuehnel@fli-leibniz.de. Leibniz Institute for Age Research, Fritz Lipmann Institute (FLI); Jena Centre for Bioinformatics (JCB)


Presentation Transcript


  1. -2010- 3D Structures of Biological Macromolecules, Part 5: Protein Structure Prediction. Jürgen Sühnel, jsuehnel@fli-leibniz.de. Leibniz Institute for Age Research, Fritz Lipmann Institute (FLI); Jena Centre for Bioinformatics (JCB); Jena Centre for Systems Biology of Ageing (JenAge); Jena, Germany. Supplementary material: http://www.fli-leibniz.de/www_bioc/3D/

  2. PDB Content Growth (experimental structures only; new structures that year / cumulative total) • 1993: 697 / 1,584 (~2 structures per day) • 2003: 4,168 / 23,605 (~11 per day) • 2005: 5,366 / 34,163 (~15 per day) • 2008: 6,986 / 54,844 (~19 per day) • 2009: 7,419 / 62,265 (~20 per day)
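The per-day figures are each year's new structures divided by the number of days. The sketch below reproduces them. Ignoring leap years is an assumption of the sketch, and the dictionary and function names are hypothetical.

```python
# Yearly PDB additions and cumulative totals as listed on the slide
# (experimental structures only).
PDB_GROWTH = {
    1993: (697, 1584),
    2003: (4168, 23605),
    2005: (5366, 34163),
    2008: (6986, 54844),
    2009: (7419, 62265),
}


def structures_per_day(new_structures: int, days: int = 365) -> float:
    """Average number of new structures per day; leap years are ignored."""
    return new_structures / days


for year, (new, total) in PDB_GROWTH.items():
    print(f"{year}: {new} new, {total} total, ~{round(structures_per_day(new))} per day")
```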

  3. PDB Content Statistics December 7, 2010 (Last Update: November 24)

  4. PDB Content Statistics January 19, 2010 (Last Update: January 5)

  5. UniProt/SwissProt: Growth Rate 19.01.2011

  6. UniProt/TrEMBL: Growth Rate 19.01.2011

  7. Swiss-Prot/TrEMBL: Amino Acid Composition (Swiss-Prot and TrEMBL, 15-Jan-2008)

  8. Structural Genomics: Structural genomics consists of the determination of the three-dimensional structure of all proteins of a given organism, either by experimental methods such as X-ray crystallography and NMR spectroscopy or by computational approaches such as homology modelling. Unlike in traditional structural biology, a protein structure determined in a structural genomics effort often (but not always) becomes available before anything is known about the protein's function. This raises a new challenge for structural bioinformatics: determining protein function from the 3D structure. An important aspect of structural genomics is its emphasis on high-throughput structure determination, which is carried out in dedicated structural genomics centers. Most structural biologists pursue the structures of individual proteins or protein groups, whereas structural genomics specialists pursue protein structures on a genome-wide scale. This requires large-scale cloning, expression and purification. One main advantage of this approach is economy of scale. On the other hand, the scientific value of some of the resulting structures has at times been questioned. en.wikipedia.org/wiki/Structural_genomics

  9. Structural Genomics

  10. Protein Structure Prediction

  11. Protein Structure Prediction

  12. A Good Protein Structure • Minimizes disallowed torsion angles • Maximizes number of hydrogen bonds • Minimizes interstitial cavities or spaces • Minimizes number of “bad” contacts • Minimizes number of buried charges

  13. Protein Structure Prediction – CAFASP Contest http://www.cs.bgu.ac.il/~dfischer/CAFASP5/

  14. Protein Structure Prediction – CASP Contest http://predictioncenter.gc.ucdavis.edu/

  15. Protein Structure Prediction – CASP Contest http://predictioncenter.gc.ucdavis.edu/

  16. Protein Structure Prediction • Secondary structure • 3D structure • Modeling by homology (comparative modeling) • Fold recognition (threading) • Ab initio prediction • Rule-based approaches • Lattice models • Simulating the time dependence of folding • Refinement • Exploring the effect of single amino acid substitutions • Ligand effects on protein structure and dynamics (induced fit)

  17. Lysozyme

  18. Lysozyme – 5lyz

  19. Lysozyme – 5lyz

  20. Lysozyme – 5lyz: Information from the JenaLib Atlas Page

  21. Lysozyme – 5lyz: Information from the JenaLib Atlas Page

  22. Lysozyme – 5lyz: Information from the JenaLib Atlas Page

  23. Lysozyme – 5lyz: Information from the JenaLib Atlas Page – PROSITE

  24. Lysozyme – 5lyz: PROSITE Signature

  25. PROMOTIF Secondary Structure Analysis – 5lyz

  26. Protein Backbone Torsion Angles D. W. Mount: Bioinformatics, Cold Spring Harbor Laboratory Press, 2001.

  27. Side-Chain Torsion/Dihedral Angles
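Backbone (phi, psi) and side-chain (chi) torsion angles are all dihedral angles, each defined by four consecutive atoms. The Python sketch below is not taken from the slides. It computes such an angle from Cartesian coordinates with the standard atan2 formula, which yields a signed angle between -180° and 180°. The function name `dihedral` and its helpers are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (-180..180) defined by four atom positions.

    phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1);
    chi1 uses N, CA, CB and the next side-chain heavy atom.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

A cis arrangement gives 0° and a trans arrangement gives ±180°.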

  28. PROMOTIF Secondary Structure Analysis – 5lyz

  29. PROMOTIF Secondary Structure Analysis – 5lyz

  30. PROMOTIF Secondary Structure Analysis – 5lyz

  31. Chou-Fasman Secondary Structure Prediction

  32. Amino Acid Propensities: From a database of experimental 3D structures, calculate the propensity of a given amino acid to adopt a certain type of secondary structure. • Example: N(Ala) = 2,000; N(tot) = 20,000; N(Ala, helix) = 568; N(helix) = 4,000. • P(Ala, helix) = [N(Ala, helix)/N(helix)] / [N(Ala)/N(tot)] • P(Ala, helix) = [568/4,000] / [2,000/20,000] = 0.142/0.1 = 1.42 • Used in the Chou-Fasman algorithm
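The propensity formula can be checked numerically. The Python sketch below also derives propensities directly from paired sequence and secondary-structure strings. The function names and the single-letter state code ("H" for helix) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def propensity(n_aa_ss: int, n_ss: int, n_aa: int, n_total: int) -> float:
    """P(aa, ss) = [N(aa, ss) / N(ss)] / [N(aa) / N(tot)]; values > 1 mean enrichment."""
    return (n_aa_ss / n_ss) / (n_aa / n_total)


def propensities_from_structures(
    pairs: Iterable[Tuple[str, str]], state: str = "H"
) -> Dict[str, float]:
    """pairs: (sequence, secondary-structure string) of equal length.

    Assumes `state` occurs at least once, otherwise N(ss) would be zero.
    """
    aa_counts: Counter = Counter()
    aa_state_counts: Counter = Counter()
    n_state = n_total = 0
    for seq, ss in pairs:
        for aa, s in zip(seq, ss):
            aa_counts[aa] += 1
            n_total += 1
            if s == state:
                aa_state_counts[aa] += 1
                n_state += 1
    return {
        aa: propensity(aa_state_counts[aa], n_state, aa_counts[aa], n_total)
        for aa in aa_counts
    }


print(round(propensity(568, 4000, 2000, 20000), 2))  # 1.42, the slide's example
```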

  33. Chou-Fasman Secondary Structure Prediction • Assign each residue in the peptide its set of parameters (here the P values are multiplied by 100, so 100 corresponds to 1.00). • Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. Declare such a region an alpha-helix. Extend the helix in both directions until you reach a set of four contiguous residues with an average P(a-helix) < 100. That set marks the end of the helix. • If the segment defined by this procedure is longer than 5 residues and its average P(a-helix) > P(b-sheet), the segment can be assigned as a helix. • Repeat this procedure to locate all of the helical regions in the sequence. • Scan through the peptide and identify regions where 3 out of 5 residues have P(b-sheet) > 100. Declare such a region a beta-sheet. Extend the sheet in both directions until you reach a set of four contiguous residues with an average P(b-sheet) < 100. That set marks the end of the beta-sheet. Assign any segment of the region located by this procedure as a beta-sheet if its average P(b-sheet) > 105 and its average P(b-sheet) > P(a-helix). • Any region with overlapping alpha-helical and beta-sheet assignments is taken to be helical if its average P(a-helix) > P(b-sheet), and a beta-sheet if its average P(b-sheet) > P(a-helix). • To identify a bend at residue number j, calculate p(t) = f(j) · f(j+1) · f(j+2) · f(j+3). Each f is the bend frequency for its position in the tetrapeptide: f(j) of residue j at the first position, f(j+1) of residue j+1 at the second, f(j+2) of residue j+2 at the third, and f(j+3) of residue j+3 at the fourth. • A beta-turn is predicted at that location if (1) p(t) > 0.000075, (2) the average P(turn) in the tetrapeptide is > 1.00 (i.e. > 100 on the scale used above), and (3) the tetrapeptide averages satisfy P(turn) > P(a-helix) and P(turn) > P(b-sheet).
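Below is a minimal Python sketch of the helix part of the procedure: nucleation, extension and assignment. The slide does not specify which four residues are tested during extension. As an assumption, the sketch uses the tetrapeptide at the growing boundary, i.e. three helix residues plus the candidate residue. The function name and the parameter dictionaries are hypothetical, and the toy values are for illustration only. The sheet and turn rules are not implemented.

```python
from statistics import mean
from typing import Dict, List, Tuple


def predict_helices(
    seq: str,
    p_helix: Dict[str, float],
    p_sheet: Dict[str, float],
    threshold: float = 100.0,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) helix segments, Chou-Fasman style.

    Nucleation: 4 of 6 contiguous residues with P(a-helix) > threshold.
    Extension (one possible reading of the rule): grow one residue at a time
    while the tetrapeptide at the new boundary averages >= threshold.
    Assignment: segment longer than 5 residues with mean P(a) > mean P(b).
    """
    n = len(seq)
    helices = []
    i = 0
    while i <= n - 6:
        formers = sum(p_helix[aa] > threshold for aa in seq[i:i + 6])
        if formers < 4:
            i += 1
            continue
        start, end = i, i + 6
        # Extend right: tetrapeptide ending at the candidate residue seq[end].
        while end < n and mean(p_helix[aa] for aa in seq[end - 3:end + 1]) >= threshold:
            end += 1
        # Extend left: tetrapeptide starting at the candidate residue seq[start - 1].
        while start > 0 and mean(p_helix[aa] for aa in seq[start - 1:start + 3]) >= threshold:
            start -= 1
        segment = seq[start:end]
        if len(segment) > 5 and mean(p_helix[aa] for aa in segment) > mean(
            p_sheet[aa] for aa in segment
        ):
            helices.append((start, end))
        i = end  # resume scanning after this segment
    return helices


# Illustrative parameters (x100) for a toy alphabet; check against the published table.
P_HELIX = {"A": 142, "E": 151, "L": 121, "G": 57}
P_SHEET = {"A": 83, "E": 37, "L": 130, "G": 75}

print(predict_helices("GGGGAEALEAGGGG", P_HELIX, P_SHEET))  # [(2, 12)]
```

In this toy run, some flanking glycines remain inside the predicted segment because they belong to the initial 4-of-6 nucleus. This shows how coarse the window rules are.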

  34. Lysozyme – 5lyz: Chou-Fasman Secondary Structure Prediction http://fasta.bioch.virginia.edu/fasta_www/chofas.htm

  35. Lysozyme – 5lyz: Chou-Fasman Secondary Structure Prediction • GRCE (0.57 | 0.98 | 0.70 | 1.39) → mean 0.91 • RCEL (0.98 | 0.70 | 1.39 | 1.41) → mean 1.12 • CELA (0.70 | 1.39 | 1.41 | 1.42) → mean 1.23 • ELAA (1.39 | 1.41 | 1.42 | 1.42) → mean 1.41 http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
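The tetrapeptide values are plain sliding-window averages of the per-residue parameters. The Python sketch below reproduces them from the numbers shown on the slide for GRCE, RCEL, CELA and ELAA. The function name is hypothetical, and the dictionary contains only the parameters visible on the slide.

```python
from typing import Dict, List

# Per-residue parameters as shown on the slide for the window G, R, C, E, L, A, A.
PARAMS: Dict[str, float] = {"G": 0.57, "R": 0.98, "C": 0.70, "E": 1.39, "L": 1.41, "A": 1.42}


def tetrapeptide_means(seq: str, params: Dict[str, float], width: int = 4) -> List[float]:
    """Average parameter of every window of `width` consecutive residues."""
    values = [params[aa] for aa in seq]
    return [sum(values[i:i + width]) / width for i in range(len(values) - width + 1)]


for i, m in enumerate(tetrapeptide_means("GRCELAA", PARAMS)):
    print("GRCELAA"[i:i + 4], f"{m:.2f}")
```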

  36. Lysozyme – 5lyz: PHD/PROF Structure Prediction • PROF_sec: PROF predicted secondary structure: H = helix, E = extended (sheet), blank = other (loop); PROF: Profile network prediction Heidelberg • Rel_sec: reliability index for the PROF_sec prediction (0 = low to 9 = high) • SUB_sec: subset of the PROF_sec prediction for all residues with an expected average accuracy > 82% (tables in header). In this subset, L is loop (shown as ' ' above), and '.' means that no prediction is made for the residue because its reliability is Rel < 5 • O3_acc: observed relative solvent accessibility (acc) in 3 states: b = 0-9%, i = 9-36%, e = 36-100% • P3_acc: PROF predicted relative solvent accessibility (acc) in 3 states: b = 0-9%, i = 9-36%, e = 36-100% • Rel_acc: reliability index for the PROF_acc prediction (0 = low to 9 = high) • SUB_acc: subset of the PROF_acc prediction for all residues with an expected average correlation > 0.69 (tables in header). In this subset, I is intermediate (shown as ' ' above), and '.' means that no prediction is made for the residue because its reliability is Rel < 4 http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top
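The output conventions above can be mimicked in a few lines of Python. This sketch maps a relative accessibility percentage to the b/i/e classes and builds a SUB_sec-style string from a PROF_sec string and its reliability indices. The function names are hypothetical, the input strings are invented, and the assignment of the boundary values 9% and 36% to the higher class is an assumption.

```python
from typing import List


def acc_state(rel_acc: float) -> str:
    """Map relative solvent accessibility (%) to b/i/e.

    Assumption: the boundary values 9% and 36% go to the higher class.
    """
    if rel_acc < 9:
        return "b"  # buried
    if rel_acc < 36:
        return "i"  # intermediate
    return "e"  # exposed


def sub_sec(prof_sec: str, rel_sec: List[int], min_rel: int = 5) -> str:
    """Reliable subset of a PROF_sec string: '.' where Rel < min_rel, loop blanks as 'L'."""
    out = []
    for state, rel in zip(prof_sec, rel_sec):
        if rel < min_rel:
            out.append(".")
        elif state == " ":
            out.append("L")
        else:
            out.append(state)
    return "".join(out)


print([acc_state(x) for x in (4.0, 20.0, 55.0)])  # ['b', 'i', 'e']
print(sub_sec("HH E  ", [9, 3, 7, 8, 2, 6]))        # H.LE.L
```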

  37. Lysozyme – 5lyz: PHD/PROF Structure Prediction, BLAST http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top

  38. Lysozyme – 5lyz: PHD/PROF Structure Prediction, BLAST http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top

  39. Lysozyme – 5lyz: PHD/PROF Structure Prediction • Perform a BLAST search to find local alignments • Remove alignments that are “too close” • Perform a multiple alignment of the sequences • Construct a profile (PSSM) of amino acid frequencies at each residue • Use this profile as input to the neural network • A second network performs “smoothing” • A third level computes a jury decision over several different instantiations of the first two levels. http://cubic.bioc.columbia.edu/predictprotein/submit_def.html#top
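The slide does not say how the jury combines the networks. Simple averaging of per-residue state scores, followed by picking the highest-scoring state, is one plausible scheme and is assumed in the Python sketch below. The function name, the (H, E, L) output order and the network outputs are invented for illustration.

```python
from typing import Sequence

STATES = "HEL"  # helix, extended (strand), loop; assumed output order


def jury_decision(outputs: Sequence[Sequence[Sequence[float]]]) -> str:
    """Average per-residue state scores over several networks and take the best.

    outputs[k][i] holds the (H, E, L) scores of network k for residue i.
    """
    n_nets = len(outputs)
    n_res = len(outputs[0])
    prediction = []
    for i in range(n_res):
        avg = [sum(net[i][s] for net in outputs) / n_nets for s in range(len(STATES))]
        prediction.append(STATES[avg.index(max(avg))])
    return "".join(prediction)


net_a = [(0.6, 0.1, 0.3), (0.2, 0.5, 0.3)]
net_b = [(0.4, 0.2, 0.4), (0.1, 0.3, 0.6)]
print(jury_decision([net_a, net_b]))  # HL
```

For the second residue, the two networks disagree (E versus L). The averaged scores (0.40 versus 0.45) settle it in favour of L.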

  40. Lysozyme – 5lyz: PSIPRED Structure Prediction http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

  41. PSIPRED: PSIPRED is a simple and reliable secondary structure prediction method that uses two feed-forward neural networks to analyze the output of PSI-BLAST (Position-Specific Iterated BLAST). Version 2.0 of PSIPRED includes a new algorithm that averages the output of up to 4 separate neural networks to further increase prediction accuracy. Evaluated with a very stringent cross-validation method, PSIPRED 2.0 achieves an average Q3 score of nearly 78%. PSIPRED predictions were also submitted to the CASP4 server and assessed at the CASP4 meeting, held in December 2000 at Asilomar. There, PSIPRED 2.0 achieved an average Q3 score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures in the PDB. This placed PSIPRED first out of 20 evaluated methods (an earlier version of PSIPRED was also ranked first in CASP3, held in 1998). http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

  42. PSI-BLAST: Position-Specific Iterated BLAST (PSI-BLAST) is a feature of BLAST 2.0. It automatically constructs a profile, or position-specific scoring matrix (PSSM), from a multiple alignment of the highest-scoring hits of an initial BLAST search. The PSSM assigns a score to each position in the alignment: highly conserved positions receive high scores, and weakly conserved positions receive scores near zero. The profile is then used for a second (and any further) BLAST search, and the results of each iteration are used to refine the profile. This iterative search strategy increases sensitivity.
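The Python sketch below illustrates the scoring idea with a greatly simplified position-specific log-odds profile. Each amino acid in each column gets the score log2(q/p), where q is its pseudocount-smoothed frequency in the column and p its background frequency. The uniform background, the pseudocount of 1 and the function name are assumptions of this sketch. The real PSI-BLAST additionally weights sequences and uses more elaborate pseudocounts.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_profile(
    alignment: Sequence[str],
    pseudocount: float = 1.0,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Per-column scores log2(q_a / p_a) for equal-length aligned sequences.

    q_a: pseudocount-smoothed frequency of amino acid a in the column;
    p_a: background frequency (uniform 1/20 unless given). Gap characters are skipped.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(s[col] for s in alignment if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: math.log2((counts[aa] + pseudocount) / total / background[aa]) for aa in AMINO_ACIDS}
        )
    return profile


# Invented toy alignment: the first column is fully conserved (A), the second is variable.
profile = log_odds_profile(["AC", "AD", "AE", "AF"])
print(round(profile[0]["A"], 2), round(profile[1]["C"], 2))
```

The conserved alanine column scores highest. Residues absent from a column get negative scores.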

  43. Comparing Secondary Structure Prediction Results: PSIPRED, Chou-Fasman, PHD/PROF

  44. Comparing Secondary Structure Prediction Results

  45. Protein Secondary Structure Prediction – Summary • 1st generation (1970s): Chou & Fasman, Q3 = 50-55% • 2nd generation (1980s): Qian & Sejnowski, Q3 = 60-65% • 3rd generation (1990s): PHD, PSIPRED, Q3 = 70-80% • Features of the new methods: use of evolutionary information; neural networks • Failures: nonlocal sequence interactions; wrong predictions at the ends of helices and strands (H/E) • Q3: percentage of correctly assigned amino acids in a test set
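Q3 follows directly from its definition. The Python sketch below assumes that both the predicted and the observed strings are already reduced to the three states H, E and L. The function name and the example strings are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose 3-state assignment (H, E, L) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equally long")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHHEEEL", "HHHLEEEE"))  # 75.0: 6 of 8 residues agree
```

For a test set, the residue counts are normally pooled over all proteins before dividing. Averaging per-protein Q3 values gives a slightly different number.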

  46. Protein Structure Prediction http://speedy.embl-heidelberg.de/gtsp/flowchart2.html

  47. Modeling by Homology (Comparative Modeling) http://salilab.org/modeller/

  48. Modeling by Homology (Comparative Modeling) http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi

  49. Modeling by Homology (Comparative Modeling) http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi

  50. Modeling by Homology (Comparative Modeling) http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
