Quality of Protein Crystal Structures in the PDB

Quality of Protein Crystal Structures in the PDB. Eric N. Brown, Lokesh Gakhar and S. Ramaswamy. Between objectivity and subjectivity. Carl-Ivar Brändén & T. Alwyn Jones, Department of Molecular Biology, Uppsala Biomedical Center, PO Box 590, S-751 24 Uppsala, Sweden.


Presentation Transcript


  1. Quality of Protein Crystal Structures in the PDB. Eric N. Brown, Lokesh Gakhar and S. Ramaswamy.

  2. Between objectivity and subjectivity. Carl-Ivar Brändén & T. Alwyn Jones, Department of Molecular Biology, Uppsala Biomedical Center, PO Box 590, S-751 24 Uppsala, Sweden. Protein crystallography is an exacting trade, and the results may contain errors that are difficult to identify. It is the crystallographer's responsibility to make sure that incorrect protein structures do not reach the literature. Nature 343, 687-689 (22 February 1990).

  3. Amplitudes and Phases - Bias. Animal stories - by Kevin Cowtan

  4. Amplitudes and Phases - Bias. More animal stories.

  5. Stolen from Bernhard Rupp's website without permission.

  6. How much of what we think? Stolen from James Holton, Berkeley, without permission.

  7. STRUCTURE VALIDATION • Validation based on fit to DATA: R-factor/R-free, real-space fit, etc. Problem: data-to-parameter ratio. ADD geometric restraints, or chemical knowledge. • VALIDATION based on geometry: WHATIF, PROCHECK, MOLPROBITY, RAMACHANDRAN PLOT. • COMPOSITE VALIDATION: ASTRAL - SPACI http://astral.Berkeley.edu/spaci.html

  8. WHY MORE? DON'T WE HAVE ENOUGH VALIDATION TOOLS? WHAT IS COMMON BETWEEN ALL EXISTING VALIDATION TECHNIQUES? THERE IS AN ABSOLUTE CORRECT ANSWER. WE KNOW THERE IS NO CORRECT ANSWER.

  9. THINK DIFFERENTLY • All crystallographers want to deposit the correct structure. • There is subjectivity and bias, all of which are random. AVERAGE IS BEST!!

  10. QUALITY & AVERAGE • How different you are from the average is a measure of quality. HOW DO YOU DESCRIBE THE AVERAGE?

  11. Quality of Model • Dependent variables: R-factor, R-free, real-space R-value, real-space CC, outliers, Ramachandran violations • Independent variables: date submitted to PDB, maximum resolution, X-ray source, number of atoms, similarity index, cross terms

  12. Predictive Models. Example: how to determine the weight of a 5'7" male: make up an equation; choose a group of males; fit the equation to their weights; evaluate the equation.
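
The four steps on this slide can be sketched in a few lines of Python. This is only an illustration: the straight-line equation, the function names `fit_line` and `rmse`, and the heights and weights are all made up for the example and do not come from the talk.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

def rmse(xs, ys, a, b):
    """Root-mean-square error of the fitted line on the given points."""
    return (sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Step 1: make up an equation (assumed here: weight = a * height + b).
# Step 2: choose a group of males (hypothetical heights in inches, weights in pounds).
heights = [64, 66, 67, 68, 70, 72, 74]
weights = [140, 150, 155, 160, 172, 185, 195]

# Step 3: fit the equation to their weights.
a, b = fit_line(heights, weights)

# Step 4: evaluate the equation, on the group and for a 5'7" (67 inch) male.
error = rmse(heights, weights, a, b)
predicted = a * 67 + b
print(f"weight = {a:.2f} * height + {b:.2f}, RMSE = {error:.2f}, predicted = {predicted:.1f}")
```

The same pattern carries over to the structure-quality setting, with a quality metric in place of weight and resolution, date and so on in place of height.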

  13. Open problems • What independent variables? Quality = f(resolution); Quality = f(resolution, date, x-ray source) • What equation? Quality = a x resolution + b x date + c; Quality = a x res + logb2(date) + c • How to fit it to observations? - Least squares vs. maximum likelihood - Outliers

  14. PICK ALL AVAILABLE METRICS (R-factor, R-free, etc.) and FOR EACH METRIC • Choose the model based on log-likelihood (LL) • Start with Metric = a x resolution + C • Add or remove terms iteratively to improve LL • Use BIC (Bayesian information criterion) to decide whether a new parameter improves LL significantly • RESULT: An equation that predicts a given metric… • Data are all structures in the PDB that have all independent and dependent variables (16,609)
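
A rough sketch of this kind of BIC-guided model building is given below. It is not the authors' procedure (the acknowledgements mention the R statistical package, and this sketch only adds terms, never removes them). It assumes Gaussian errors, so that BIC can be written as n ln(RSS/n) + k ln(n), and it runs on synthetic data. The names `solve`, `bic` and `forward_select`, and the column names `resolution`, `year` and `noise`, are invented for the example.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def bic(columns, y):
    """BIC of a least-squares fit of y on the columns plus an intercept (Gaussian errors assumed)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((y[r] - sum(beta[j] * X[r][j] for j in range(k))) ** 2 for r in range(n))
    return n * math.log(rss / n) + k * math.log(n)

def forward_select(candidates, y, start):
    """Greedily add the term that lowers BIC the most; stop when no term lowers it."""
    chosen = list(start)
    best = bic([candidates[c] for c in chosen], y)
    while True:
        trials = {name: bic([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        if not trials:
            break
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        chosen.append(name)
        best = trials[name]
    return chosen, best

# Synthetic data (hypothetical): the metric depends on resolution and year, not on "noise".
random.seed(0)
n_obs = 200
candidates = {
    "resolution": [random.uniform(1.0, 3.5) for _ in range(n_obs)],
    "year": [random.uniform(5.0, 18.0) for _ in range(n_obs)],  # years since an arbitrary origin
    "noise": [random.gauss(0.0, 1.0) for _ in range(n_obs)],
}
metric = [0.10 + 0.05 * r - 0.003 * t + random.gauss(0.0, 0.01)
          for r, t in zip(candidates["resolution"], candidates["year"])]

# Start with Metric = a x resolution + C, as on the slide.
chosen, score = forward_select(candidates, metric, start=["resolution"])
print(chosen, round(score, 1))
```

Because the BIC penalty grows with ln(n), a term is kept only when it reduces the residual sum of squares by more than chance alone would be expected to.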

  15. EQUATIONS FOR METRICS!

  16. INFORMATION INHERENT IN THE MODEL. The model can tell us immediately which independent variables affect which metrics (dependent variables), and by how much. Examples: R-factor vs. time; R-factor vs. source & resolution.

  17. UNEXPLORED QUESTIONS IN THE MODEL? Unexplored independent variables: • R-sym and redundancy • Space group and volume of unit cell? • Refinement protocol • Solvent modeling and B-factor modeling • Temperature of data collection • Complexity, as a function of the number of chains of macromolecules

  18. Nine metrics to ONE: Principal component analysis • We took the nine metrics and combined them into one metric, accounting for correlations and redundancy. Now we have one metric, which we can call the Quality-value (Q-value). • By CONSTRUCTION, the Q-value of the average is zero. Negative numbers mean better than average; positive numbers mean worse than average. The standard deviation is one.
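
One way to picture the construction is the sketch below: standardize each metric, project onto the first principal component, and rescale so the result has mean zero and standard deviation one. This is an assumption about the general idea, not the authors' exact recipe. It uses three synthetic metrics rather than nine, assumes every metric is oriented so that larger means worse (a metric such as real-space CC would need its sign flipped first), and the names `standardize`, `first_component` and `q_values` are invented for the example.

```python
import math
import random
from statistics import mean, pstdev

def standardize(col):
    """Shift and scale a column to mean 0 and (population) standard deviation 1."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]

def first_component(cols, iters=500):
    """Leading eigenvector of the covariance matrix of the columns, by power iteration."""
    k, n = len(cols), len(cols[0])
    cov = [[sum(cols[i][r] * cols[j][r] for r in range(n)) / n for j in range(k)]
           for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def q_values(metrics):
    """Combine several metrics (one list per metric) into one score with mean 0 and sd 1."""
    cols = [standardize(c) for c in metrics]
    v = first_component(cols)
    if sum(v) < 0:  # orient the axis so that larger metric values give larger (worse) scores
        v = [-x for x in v]
    n = len(cols[0])
    scores = [sum(v[j] * cols[j][r] for j in range(len(cols))) for r in range(n)]
    return standardize(scores)

# Synthetic example (hypothetical): three metrics driven by one shared "badness" factor.
random.seed(1)
latent = [random.gauss(0.0, 1.0) for _ in range(300)]
metrics = [[load * z + random.gauss(0.0, 0.5) for z in latent] for load in (1.0, 0.8, 0.6)]

q = q_values(metrics)
print(round(mean(q), 6), round(pstdev(q), 6))
```

With this orientation, a negative score means better than average and a positive score means worse than average, matching the sign convention on the slide.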

  19. The Q-value is now independent of all the independent variables used to make the model (resolution, number of atoms, date of data collection, novelty of structure, etc.). It is a better indicator of quality than any one of the dependent variables. USE OF THE MODEL • COMPARE STRUCTURES WITH THE AVERAGE - INDIVIDUALLY AND AS A GROUP.
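
A group comparison could be done in many ways; the sketch below shows one simple possibility, not necessarily what the authors did. It relies on the earlier statement that Q-values have mean zero and standard deviation one across the PDB, and it additionally assumes the structures in a group are independent, so the mean of n Q-values has standard error 1/sqrt(n). The function name `group_z` and the example Q-values are made up.

```python
import math

def group_z(q_values):
    """z-score of a group's mean Q-value against the PDB-wide average.

    Assumes Q-values have mean 0 and standard deviation 1 over the whole PDB and that
    the structures in the group are independent, so the group mean has standard
    error 1 / sqrt(n).
    """
    n = len(q_values)
    return (sum(q_values) / n) * math.sqrt(n)

# Hypothetical Q-values for structures from one group (negative = better than average).
group = [-0.4, -1.1, 0.2, -0.7, -0.3, -0.9, 0.1, -0.5]
z = group_z(group)
print(f"group mean = {sum(group) / len(group):.2f}, z = {z:.2f}")
```

An individual structure is compared directly through its own Q-value; for a group, a strongly negative z suggests the group as a whole is better than average under these assumptions.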

  20. STRUCTURAL GENOMICS (updated - Jan 2008)

  21. MCSG over Time!

  22. MORE-SG groups!

  23. Quality Vs. Journals

  24. WHAT CAN WE DO? • Beam lines. • Best practices. • Protocols and methodologies. • Countries. • Institutions. • Funding mechanisms. • Investigators.

  25. Is this the best we can do?

  26. WE CAN DO BETTER. We can improve the quality of structures through better design of experiments and refinement protocols if we know which independent variables affect which dependent variables, and how. • BEFORE WE DO THIS - FIX PROBLEMS THAT WE FOUND. • Too much dependence on external databases! • Problems with unknown atoms. • Develop methods for missing data correction.

  27. OTHER DATABASES - NMR. Some thoughts on independent variables. • Spectrometers • Samples - size, tags, buffers, etc. • Completeness of assignments - percentage of backbone assigned, etc. • Actual data used in structural calculations - NOE distance restraints, hydrogen bond distance restraints (experimental vs. inferred), torsion angle restraints, dipolar coupling restraints, paramagnetic restraints. • Structural statistics • Date of structure determination. • Relaxation measurements?

  28. OTHER DATABASES - NMR DEPENDENT VARIABLES. • RMS deviation of Ensemble • Packing (Molprobity score?) • Ramachandran violations • Recall, Precision, F-measure (Huang, Powers and Montelione). • Agreement with high resolution X-ray structures • Other??

  29. AFTER TODAY'S LECTURES: HOW ABOUT THE MODEL DATABASE? I am sure our modeling experts can think of the dependent and independent variables….

  30. THANK YOU. ACKNOWLEDGEMENTS: X-ray work - Eric N. Brown and Lokesh Gakhar. The R statistical package! NMR work - Liping Yu and Andrew Fowler. Thanks to Brian Fox for inviting me, though I am not a member of any SG initiative.

  31. Questions and Accusations.
