Model Building, Refinement, and Validation


  1. Model Building, Refinement, and Validation

  2. What can one see? • will determine what can be ascertained • will determine which parameters can be refined • resolution-dependent • note about maps: • contoured in standard deviations (σ) from the mean (which is 0.0) • experimental-type maps contoured at 1-1.25σ • difference maps contoured at ~±3σ
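
The contouring convention above can be sketched in a few lines: density values are rescaled to standard deviations from the map mean, and a contour level picks out the grid points at or above it. This is a minimal illustration on a made-up one-dimensional "map"; the function names and the toy values are assumptions, and real maps are 3-D grids read from a map file.

```python
import statistics


def to_sigma_units(density):
    """Rescale density values to standard deviations from the map mean."""
    mean = statistics.fmean(density)
    sigma = statistics.pstdev(density)
    if sigma == 0:
        raise ValueError("flat map: standard deviation is zero")
    return [(value - mean) / sigma for value in density]


def above_contour(density, level):
    """Indices of grid points at or above a contour level given in sigma."""
    return [i for i, z in enumerate(to_sigma_units(density)) if z >= level]


# Toy values chosen for illustration only (hypothetical, mean close to 0.0).
toy_map = [0.0, 0.1, -0.1, 0.0, 2.0, -0.2, 0.1, -0.1, 0.0, -1.8]

# Grid points that would show up in a map contoured at 1 sigma.
visible = above_contour(toy_map, 1.0)
```

For a difference map the same rescaling applies, but both positive and negative levels (about +3σ and -3σ) are inspected.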

  3. 1.8Å

  4. Fitting a Model into Density • start by tracing the backbone • α-helices easiest to identify, β-sheets are harder • sometimes loops might be untraceable until later in the process (or never) • side chains come later • check known rotamers first • many rounds of rebuilding might be necessary

  5. Skeletonization of the Density

  6. Adding Side Chains

  7. How does this Process Progress? • first build is usually mostly backbone, some side chains • cycles of building and refinement until model ceases to improve • waters, ions, ligands are typically added towards the end of the process

  8. Refinement • based on what I see, what can I refine?

  9. Constraints and Restraints • used to overcome a poor data:parameter ratio • atoms not "free" → improved convergence • geometric restraints: • bond lengths and angles found in protein structures are well known from small-molecule x-ray crystallography • penalize excessive deviations from these values • planarity restraints for rings and planar end groups
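
As a rough sketch of what "penalize excessive deviations" can mean in practice, a restraint is often written as a harmonic term: the squared deviation from the ideal value, divided by the squared tolerance. The function names and the numbers below are illustrative assumptions, not values from a real restraint library or from any particular refinement program.

```python
def restraint_penalty(observed, ideal, sigma):
    """Harmonic penalty ((observed - ideal) / sigma)^2 for one restrained quantity."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return ((observed - ideal) / sigma) ** 2


def total_geometry_penalty(restraints):
    """Sum of harmonic penalties over (observed, ideal, sigma) triples."""
    return sum(restraint_penalty(obs, ideal, sig) for obs, ideal, sig in restraints)


# Hypothetical bond lengths in Å: (model value, ideal value, tolerance).
bonds = [
    (1.35, 1.33, 0.02),  # one tolerance too long
    (1.50, 1.52, 0.02),  # one tolerance too short
    (1.46, 1.46, 0.02),  # exactly ideal, contributes nothing
]

penalty = total_geometry_penalty(bonds)
```

A restraint lets the value drift at a cost; a constraint would instead fix the value and remove it from the list of refined parameters.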

  10. Non-crystallographic Symmetry • restraint: each copy of molecule in asymmetric unit must have rmsd for all atoms below user-defined value when compared with each other • constraint: all copies in the asymmetric unit must be identical • are the molecules identical? • strong density in averaged map a clue • e.g. (1.5 + 1.8)/2 = 1.65; (1.5 + 0.0)/2 = 0.75

  11. B: The Temperature Factor • describes mean displacement from average position • higher B = more mobile = less well ordered • μx = μy = μz if B is isotropic; a 3x3 matrix is needed if B is anisotropic
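
To put a number on "mean displacement", the standard relation for an isotropic B is B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along any one direction. The short sketch below converts B (in Å²) to a root-mean-square displacement (in Å); the function name is an assumption made for illustration.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement in Å for an isotropic B in Å^2, from B = 8*pi^2*<u^2>."""
    if b_factor < 0:
        raise ValueError("B must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B = 20 Å^2 corresponds to roughly 0.5 Å; B = 80 Å^2 to roughly 1.0 Å.
u_20 = rms_displacement(20.0)
u_80 = rms_displacement(80.0)
```

Because the displacement scales with the square root of B, quadrupling B only doubles the displacement.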

  12. Evaluation of Refinement • R-factors (Rwork and Rfree): • Rfree is the same as Rwork but calculated for a percentage of the data (5-10%) not included in the refinement • if model really improves, Rfree should decrease along with Rwork
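
A minimal sketch of the bookkeeping, assuming the usual definition R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|. The same function gives Rwork or Rfree depending on which reflections are passed in. The function names, the 5% default, and the toy amplitudes are illustrative assumptions; in real work the free set is chosen once and kept fixed for the whole project.

```python
import random


def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over the reflections supplied."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly set aside a fraction of reflection indices as the free set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    free = set(indices[:n_free])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


# Toy amplitudes (hypothetical): differences of 10, 5, 5, 10 over a sum of 255.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 30.0, 70.0]
r_toy = r_factor(f_obs, f_calc)

work, free = split_work_free(100)
```

Only the work reflections enter the refinement target; the free reflections are used purely for monitoring.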

  13. Rules of Thumb for Rwork and Rfree • depends on resolution • for most structures, Rfree should be less than ~28%, and the spread between Rwork and Rfree should be ~5% or less • very low resolution structures (3.5Å and lower) might not conform to this • careful not to overstate conclusions
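
The rules of thumb above can be turned into a quick sanity check. This is only a sketch: the thresholds are the approximate values from the slide, the function name is an assumption, and a flagged structure is a prompt to look closer rather than a verdict, especially at very low resolution.

```python
def check_r_factors(r_work, r_free, max_r_free=0.28, max_gap=0.05):
    """Return warnings for R-factors (as fractions) that break the rules of thumb."""
    warnings = []
    if r_free > max_r_free:
        warnings.append("Rfree is above the rule-of-thumb limit")
    if r_free - r_work > max_gap:
        warnings.append("gap between Rwork and Rfree is larger than expected")
    return warnings


# Hypothetical examples: the first passes, the second breaks both rules.
ok = check_r_factors(0.20, 0.24)
suspicious = check_r_factors(0.20, 0.30)
```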

  14. Why are R-factors so High? • geometrical restraints not sophisticated enough to find true minimum (R ~ error in data, or 4-9%) • R-factors higher at lower resolution because less data but same number of geometrical parameters to be satisfied • result is numerous very small errors (0.01-0.1Å) in coordinate positions

  15. Mechanics of Refinement • perturb x, y, z, and B such that Fobs and Fcalc come into maximal agreement • the old way: least-squares minimization • assuming errors follow a Gaussian distribution, the function to minimize takes the form $\sum_j (x_j - \hat{x}_j)^2 / \sigma_j^2$, where $\hat{x}_j$ is the predicted value of $x_j$ and $\sigma_j$ is the standard deviation for the measurement $x_j$

  16. More Least Squares • general form for refinement would be: • add geometrical constraints and in practice: • real space equivalent of the x-ray part is:
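
The equations for this slide are not present in the transcript. The block below is a sketch of the forms these targets commonly take, not a copy of the original slide: the symbols $k$ (scale factor), $w_g$ (restraint weights), and $g$ (a restrained geometric quantity) are assumed notation.

```latex
% General sigma-weighted least-squares target on structure-factor amplitudes
\Phi_{\text{xray}} = \sum_{hkl} \frac{1}{\sigma_{hkl}^{2}}
    \left( |F_{\text{obs}}| - k\,|F_{\text{calc}}| \right)^{2}

% With geometrical terms added; in practice the sigma weighting is dropped
\Phi = \sum_{hkl} \left( |F_{\text{obs}}| - k\,|F_{\text{calc}}| \right)^{2}
     + \sum_{g} w_{g} \left( g_{\text{ideal}} - g_{\text{model}} \right)^{2}

% Real-space equivalent of the x-ray part
\Phi_{\text{real}} = \int_{V} \left( \rho_{\text{obs}}(\mathbf{r})
    - \rho_{\text{calc}}(\mathbf{r}) \right)^{2} \, dV
```

The real-space and reciprocal-space forms agree (up to a constant factor) only when $\rho_{\text{obs}}$ is computed with the model phases, which is one reason the model phases end up being treated as if they were error-free.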

  17. Still More Least Squares • two ways to minimize the residual: • improve the model • introduce systematic errors that obliterate difference density • note that the σ² weighting term is eliminated, since it was empirically shown to converge poorly • sign that least squares not appropriate • also have to incorporate higher resolution data later in refinement

  18. Why not Least Squares? • phases of model in term treated as error-free • model completeness not taken into account • leads to bias towards existing model • all measurements treated as having equal information content • i.e. an F with F/σ = 50 weighted the same as an F with F/σ = 2 • additional phase information not easy to incorporate

  19. Maximum Likelihood to the Rescue • if we move an atom, how it is moved depends on the position of all other atoms • if they're not in the right place and we assume they are, our choice of move for the target atom will not place it correctly • we need an estimate of model accuracy and completeness to help guide this process • maximum likelihood allows us to explicitly account for errors in the model, completeness of model, errors in data • additional phase info easily incorporated

  20. Some Mathematical Background • in these slides "|" means "given", P is probability, and L is likelihood • assume errors in observations independent:
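
With independent errors, the joint probability of all observations factorizes into a product over reflections. The sketch below uses assumed notation: the index $hkl$ runs over reflections and $|F_o|$ is an observed amplitude.

```latex
% Independence: the joint probability is a product of per-reflection terms
L = P(\text{obs} \mid \text{mod})
  = \prod_{hkl} P\!\left( |F_{o}(hkl)| \,\middle|\, \text{mod} \right)
```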

  21. Nasty Math Shown for Effect, not to be Fully Understood, let alone Memorized • need to know what joint conditional probability of observations given current model, "P(obs|mod)", looks like: • the above is for an acentric reflection • worked out in the '60s (i.e. before cable TV)

  22. Take-home Message from Previous Slide • Σq = amount of missing scattering matter • Σp = mass accounted for by current model • D reflects errors in current model • is the error of a given reflection • → P(obs|mod) depends upon the magnitude of Fo and Fc, the errors in the Fo, the completeness of the current model, and the accuracy of the current model

  23. Maximum Likelihood • want to maximize: • equivalent to minimizing negative logarithm(LLK is "log-likelihood"): • P0(mod) is our geometric restraints term ( ), and replaces • cast in similar form as LSQ, but with more complicated terms that reflect complexity of the problem • less biased towards model, data properly weighted
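
The expressions for this slide are missing from the transcript. A sketch of the usual Bayesian form, in the notation of the earlier slides ("|" means "given"), is shown here; the per-reflection sum assumes independent observations, and the exact layout of the original equations is an assumption.

```latex
% Quantity to maximize: prior on the model times likelihood of the data
P(\text{mod} \mid \text{obs}) \;\propto\; P_{0}(\text{mod}) \, P(\text{obs} \mid \text{mod})

% Equivalent quantity to minimize: the negative logarithm
-\log P_{0}(\text{mod}) \;-\; \text{LLK},
\qquad
\text{LLK} = \sum_{hkl} \log P\!\left( |F_{o}(hkl)| \,\middle|\, \text{mod} \right)
```

Written this way, the prior term plays the role of the geometry term in the least-squares target, and the log-likelihood sum plays the role of the x-ray term.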

  24. Additional Phase Information (e.g. MIR or MAD) • easily incorporated in maximum likelihood (unlike least-squares) as experimental constraints in the refinement process: becomes !

  25. Aside: Scaling Fo to Fc • not as easy as you think • Fc calculated essentially in vacuum, whereas real crystal (source of Fo) has bulk solvent (i.e. not ordered waters you can see) • bulk solvent tends to dampen low resolution reflections (pun intended) • poor scaling can mess up refinement • in olden times, would exclude all reflections at lower resolution than 8Å from refinement, despite the fact that they're the most accurately measured

  26. Bulk Solvent Correction: Exponential Scaling Model • assumes the protein and bulk-solvent structure factors have exactly opposite phases • only really true to ~15Å • for ksol=0.75, Bsol=200: • 15Å reflection: • 4Å reflection:
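
The numbers for the two reflections did not survive in the transcript. As a sketch, assuming the commonly used form in which the calculated amplitude is multiplied by 1 − ksol·exp(−Bsol/(4d²)), with d the resolution of the reflection in Å, the effect at 15Å and 4Å can be evaluated directly. The formula and the function name are assumptions; the slide's own expression may differ in detail.

```python
import math


def solvent_scale(d_spacing, k_sol=0.75, b_sol=200.0):
    """Factor applied to |Fc| at resolution d (Å): 1 - k_sol * exp(-b_sol / (4 d^2))."""
    if d_spacing <= 0:
        raise ValueError("d_spacing must be positive")
    return 1.0 - k_sol * math.exp(-b_sol / (4.0 * d_spacing ** 2))


# With ksol = 0.75 and Bsol = 200:
scale_15 = solvent_scale(15.0)  # about 0.40: strongly damped
scale_4 = solvent_scale(4.0)    # about 0.97: nearly unaffected
```

Under this assumed form, the correction matters mostly for low-resolution reflections and fades quickly towards higher resolution.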

  27. Bulk Solvent Correction: The Mask Model • mask out the protein, calculate structure factors for everything outside the protein mask: • no assumption about solvent phases • ksolv and Bsolv determined by LSQ fit • mask also optimized • more robust than exponential scaling model

  28. Rebuilding • after a round of refinement, model phases should be markedly improved, need for rebuilding evident • side chains added • loops built • waters/ions/ligands added • incorrectly-built areas remodeled

  29. The "2Fo-Fc" Map

  30. The "2Fo-Fc" Map • is our approximation to:

  31. Maximum Likelihood Maps • 2Fo-Fc type map: • m = figure of merit of model phases • D = weight reflective of errors in model • difference map:
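
The map coefficients themselves are missing from the transcript. The widely used weighted forms for acentric reflections are sketched here, with $m$ and $D$ as defined on the slide and $\varphi_{c}$ (an assumed symbol) denoting the phase calculated from the model.

```latex
% 2Fo-Fc type map coefficients
\left( 2m\,|F_{o}| - D\,|F_{c}| \right) e^{\,i\varphi_{c}}

% Difference map coefficients
\left( m\,|F_{o}| - D\,|F_{c}| \right) e^{\,i\varphi_{c}}
```

Setting $m = D = 1$ recovers the unweighted 2Fo-Fc and Fo-Fc maps.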

  32. Validation • most obvious validation is Rfree • SFCHECK checks structure against data • other methods are model-based • all involve comparing present structure to well-refined structures in a database • some deviations from "standard" parameters will be functionally and/or structurally necessary • others will be errors in building

  33. Procheck • very thorough check of a variety of geometry-based criteria • Ramachandran plot • main chain bond lengths and angles • planarity of rings and end groups (R,D,N,E,Q) • torsion angles, chirality • close non-bonded interactions, main chain H-bonds, disulfide bond geometry • residue by residue analysis of most of the above

  34. Errat • analyzes statistics of non-bonded interactions between different atom types • highlights unusual regions, giving "confidence level" that a region is in error • anything above the 99% confidence level in most cases needs to be rebuilt

  35. Verify3D • 3D-1D profile analysis of structure versus its own sequence • if residue is in an unusual chemical environment, it will receive a bad score and should be inspected • environment defined by: • area of residue buried • fraction covered by polar atoms • local secondary structure

  36. PROVE • analyzes departures from standard atomic volumes • presented as "Z-score" or RMS(Z-score): Z-score >3 → BAD!, RMS(Z-score) ≥2 → BAD!
