Two cases of chemometrics application in protein crystallography

Andrey Bogomolov, European Molecular Biology Laboratory (EMBL), Hamburg, Germany

Presentation Transcript


  1. Two cases of chemometrics application in protein crystallography. Andrey Bogomolov, European Molecular Biology Laboratory (EMBL), Hamburg, Germany

  2. Outline
     • Protein crystallography: a brief introduction
     • Case I: determination of protein secondary structure from the raw diffraction data using PLS-R
     • Case II: modeling of crystal radiation damage
     • Potential applications of chemometric techniques to crystallography (of biological macromolecules)

  3. Protein crystallography: introduction
     • Protein (macromolecular) crystallography is a scientific discipline that studies…
        • biological objects: proteins, DNA, RNA, etc. …
        • by physical means: X-ray diffraction, synchrotron radiation …
        • on the chemical level: 3D structure, complexes, interactions …
        • with the extensive use of mathematics: data analysis, modeling
     • The main objectives:
        • solve the 3D structure of a molecule
        • explain its biological function at the atomic level
     • Today's hot topics:
        • drug design
        • part of the global "-omics" project (genomics/proteomics)

  4. Protein crystallography workflow: expression & purification → crystallization → data collection → phasing → structure solution. Shown: protein (DNA, RNA) solution

  5. Protein crystallography workflow: expression & purification → crystallization → data collection → phasing → structure solution. Shown: protein crystal

  6. Protein crystallography workflow: expression & purification → crystallization → data collection → phasing → structure solution. Shown: diffraction pattern

  7. Protein crystallography workflow: expression & purification → crystallization → data collection → phasing → structure solution. Shown: electron density map

  8. Protein crystallography workflow: expression & purification → crystallization → data collection → phasing → structure solution. Shown: 3D structure

  9. Protein Data Bank (PDB): global data collection (>30000 records)
     • www.pdb.org
     • 3D structures
     • experimental data
     • biological and chemical information

  10. Crystallographic data collection: Wilson plot [Figure labels: X-ray beam; control; optimization; theoretical; experimental]
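For readers unfamiliar with the Wilson plot: in the usual formulation, the mean intensity in a resolution shell behaves roughly as ⟨I⟩ ≈ K·Σf²·exp(−2B·s²), with s = sinθ/λ, so ln(⟨I⟩/Σf²) plotted against s² is close to a straight line with slope −2B. The sketch below fits that line by ordinary least squares. It assumes shell-averaged inputs and this particular scale convention; the function and variable names are illustrative and do not come from the presentation.

```python
import math

def wilson_fit(s2, mean_intensity, sum_f2):
    """Fit ln(<I>/sum_f2) = ln K - 2*B*s2 by ordinary least squares.

    s2             : (sin(theta)/lambda)^2 for each resolution shell
    mean_intensity : mean observed intensity in each shell
    sum_f2         : sum of squared atomic scattering factors in each shell
    Returns (K, B): scale factor and overall isotropic B-factor.
    """
    y = [math.log(i / f) for i, f in zip(mean_intensity, sum_f2)]
    n = len(s2)
    mean_x = sum(s2) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in s2)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(s2, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope / 2.0

# Synthetic shells generated with K = 2 and B = 20 (hypothetical values).
shells_s2 = [0.01 * k for k in range(1, 8)]
shells_f2 = [1000.0 - 50.0 * k for k in range(7)]
shells_i = [2.0 * f * math.exp(-2.0 * 20.0 * x)
            for x, f in zip(shells_s2, shells_f2)]
scale, b_factor = wilson_fit(shells_s2, shells_i, shells_f2)
```

Real Wilson plots deviate from the straight line at low resolution, and those deviations are the part of the curve that carries structural information in Case I.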

  11. Case I: Determination of protein secondary structure
     Problem:
     • determine the contents (fractions of the polypeptide chain) of secondary structure elements in a protein molecule from the raw diffraction data (Wilson plot)
     • this is a well-established method for CD and IR spectra of protein solutions
     • PLS regression is one of the best methods
     • for the Wilson plot, only qualitative data on the existing correlation are available, and only for "theoretical" data
     [Figure labels: α-helix; β-sheet]

  12. Secondary structure determination: data
     Data preprocessing:
     • averaging with an optimal bin size*
     • special scaling (correction for anisotropic B-factor)*
     • taking the natural logarithm
     • conversion into a matrix (Wilson plots in rows)*
     • auto-scaling
     • outlier detection and removal*
     *) experimental data only
     [Figure labels: theoretical; experimental]
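Part of the preprocessing chain listed on this slide can be sketched as follows: bin averaging, natural logarithm, stacking the profiles as matrix rows, and auto-scaling of the columns. This is a minimal illustration only. The anisotropic B-factor correction and the outlier removal are omitted, the bin size is a free parameter rather than the "optimal" one, and all function names are hypothetical.

```python
import math

def bin_average(values, bin_size):
    """Average consecutive groups of bin_size points (a trailing partial bin is dropped)."""
    n_bins = len(values) // bin_size
    return [sum(values[i * bin_size:(i + 1) * bin_size]) / bin_size
            for i in range(n_bins)]

def autoscale(matrix):
    """Center each column to zero mean and scale it to unit sample standard deviation."""
    n = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(c) / n for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(columns, means)]
    # Constant columns (std == 0) carry no information and are set to zero.
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in matrix]

def preprocess(profiles, bin_size):
    """Bin-average each intensity profile, take ln, stack as rows, auto-scale columns."""
    logged = [[math.log(v) for v in bin_average(p, bin_size)] for p in profiles]
    return autoscale(logged)

# Three made-up intensity profiles, one per protein (values are not real data).
example_profiles = [
    [10.0, 12.0, 8.0, 6.0, 4.0, 2.0],
    [20.0, 18.0, 9.0, 7.0, 3.0, 1.0],
    [15.0, 15.0, 12.0, 8.0, 5.0, 5.0],
]
X = preprocess(example_profiles, bin_size=2)
```

Each row of `X` then corresponds to one protein and each column to one resolution bin, which is the layout a regression method such as PLS expects.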

  13. Secondary structure determination: data (2) [Figure labels: theoretical; experimental; 1hq3 (α); 1at0 (β); 1d5t (α+β)]

  14. Secondary structure determination: calibration results
     RMSEP & correlation coefficients for different methods [Table; surviving label: α-helix (theoretical)]
     *) Resolution (1/d) = 0.52 Å⁻¹ (~1.9 Å)
     • S. Navea, R. Tauler, A. de Juan, Elucidation of protein secondary structure, Anal. Biochem. 336 (2005) 231–242
     • K.A. Oberg, J.-M. Ruysschaert, E. Goormaghtigh, The optimization of protein secondary structure determination with infrared and circular dichroism spectra, Eur. J. Biochem. 271 (2004) 2937–2948
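To make the quantities on this slide concrete, here is a minimal pure-Python sketch of PLS1 regression (one response, NIPALS-style deflation) together with an RMSEP estimate. The leave-one-out validation scheme, the number of components, and every name below are assumptions made for illustration; the transcript does not say which PLS implementation or validation scheme produced the reported figures.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_components):
    """Fit PLS1 by sequential deflation; returns (x_mean, y_mean, components)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance between X and y.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt                                     # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps on it."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat

def rmsep_loo(X, y, n_components):
    """Root mean square error of prediction from leave-one-out cross-validation."""
    squared = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        squared += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(squared / len(X))

# Toy data: rows stand in for preprocessed Wilson plots, y for helix fractions.
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y_toy = [0.05 + 0.3 * a + 0.1 * b for a, b in X_toy]
error = rmsep_loo(X_toy, y_toy, n_components=2)
```

In practice the number of components would be chosen by comparing RMSEP across several values, and the correlation coefficient between predicted and reference fractions would be reported alongside it.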

  15. Case II: Modeling radiation damage
     • A biological crystal exposed to X-rays undergoes radiation damage
     • Modeling of radiation damage is important for:
        • understanding the effect on the protein
        • optimization of data collection
     • Present state of the problem:
        • no comprehensive theory of radiation damage (RD)
        • specific effects are well known, but the main changes are non-specific
     • Suggestion by Gleb Bourenkov:
        • radiation dose has a linear effect on atomic B-factors
     • Task:
        • check for linearity, find the reason(s) for deviation
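The linearity check described in the task can be sketched as a straight-line fit of B-factor against dose, followed by inspection of the correlation coefficient and the residuals. This is a simplified univariate illustration, not the multivariate model used in the talk, and the dose and B-factor values are invented.

```python
import math

def check_linearity(dose, b_factor):
    """Fit b_factor = intercept + slope * dose; return fit statistics and residuals."""
    n = len(dose)
    mean_d = sum(dose) / n
    mean_b = sum(b_factor) / n
    sdd = sum((d - mean_d) ** 2 for d in dose)
    sbb = sum((b - mean_b) ** 2 for b in b_factor)
    sdb = sum((d - mean_d) * (b - mean_b) for d, b in zip(dose, b_factor))
    slope = sdb / sdd
    intercept = mean_b - slope * mean_d
    r = sdb / math.sqrt(sdd * sbb)          # Pearson correlation coefficient
    residuals = [b - (intercept + slope * d) for d, b in zip(dose, b_factor)]
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    return {"slope": slope, "intercept": intercept, "r": r,
            "rmse": rmse, "residuals": residuals}

# Hypothetical B-factors of one atom over successive data sets (arbitrary dose units).
dose_steps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
b_values = [15.0, 17.1, 18.9, 21.2, 22.8, 25.1]
fit = check_linearity(dose_steps, b_values)
```

Atoms whose residuals show a systematic trend instead of random scatter would be the natural starting point when looking for reasons behind a deviation from linearity.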

  16. Radiation damage modeling: data (trypsin)

  17. Radiation damage modeling: results: r = 0.999, RMSEP = 9.4×10⁻³

  18. Conclusions
     • Multivariate data analysis has great potential for protein crystallography
        • currently its application is episodic
        • it rarely goes beyond PCA
     • A method-centric approach would be beneficial:
        • "I have a method, I am looking for problems"

  19. X-files
     Chemometric methods:
     • PCA, Factor Analysis
     • SIMCA, PLS-DA
     • Multivariate Regression
     • MSPC, Design of Experiments
     • Curve Resolution
     • Multivariate Image Analysis
     • Target Factor Analysis
     • PARAFAC, 3(multi)-way
     • Wavelet Transform
     Crystallographic tasks:
     • crystallization, HTPC
     • crystal screening
     • crystal auto-mounting
     • data collection
     • data reduction
     • radiation damage
     • phasing
     • structure solution
     • structure refinement

  20. Challenge: a critical re-assessment of the entire protein crystallographic workflow with a multivariate approach in mind. An ambitious project for chemometricians?

  21. Acknowledgements
     • Alexander Popov
     • Gleb Bourenkov
     • Victor Lamzin
