In silico Protein Design: Implementing Dead-End Elimination algorithm

CS273. Tyrone Anderson, Yu Bai & Caroline E. Moore-Kochlacs. May 31st, 2005.

Presentation Transcript


  1. In silico Protein Design: Implementing Dead-End Elimination algorithm. CS273. Tyrone Anderson, Yu Bai & Caroline E. Moore-Kochlacs. May 31st, 2005

  2. Computational protein design
     [Figure labels: Native structure, Backbone scaffold, New sequence, Iterative refinement]
     • Given backbone coordinates, find the best sequence(s) with which the protein is stable.

  3. Components of the problem
     The protein design problem can be roughly divided into a search procedure and a scoring function.
     • The search procedure samples sequence space AND side-chain conformational space to create conformations.
     • The scoring function evaluates each conformation created by the search procedure. The scores are used to rank the conformations (and therefore the sequences) and pick the best one as the final model.

  4. Why is the search procedure difficult?
     • Consider a short protein with 20 amino acids. Number of possible sequences: S = 20^20 ≈ 10^26
     • Each side chain has on average 2 dihedral angles (χ angles). Sampling every 40º in dihedral angle space gives N = (360/40)^(20×2) = 9^40 ≈ 10^38
     • The product S×N is too large to be sampled naively
     • Algorithms that find good solutions by screening only parts of the search space are needed
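The counts on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are illustrative and not from the presentation.

```python
import math

NUM_RESIDUES = 20        # chain length used on the slide
NUM_AMINO_ACIDS = 20     # size of the amino acid alphabet
CHI_PER_SIDE_CHAIN = 2   # average number of chi angles per side chain
STEP_DEGREES = 40        # sampling step in dihedral angle space


def search_space_sizes():
    """Return (S, N): the sequence count and the side-chain conformation count."""
    sequences = NUM_AMINO_ACIDS ** NUM_RESIDUES
    states_per_angle = 360 // STEP_DEGREES
    conformations = states_per_angle ** (NUM_RESIDUES * CHI_PER_SIDE_CHAIN)
    return sequences, conformations


if __name__ == "__main__":
    s, n = search_space_sizes()
    # Report orders of magnitude rather than the full integers.
    print(f"S is about 10^{math.log10(s):.1f}")
    print(f"N is about 10^{math.log10(n):.1f}")
```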

  5. Rotamer libraries
     • As early as the 1970s, Janin et al. showed that side-chain conformations are not evenly distributed over dihedral angle space but cluster in specific regions, much as backbone dihedrals do in the Ramachandran plot.
     • In the 1980s, this observation was used to improve modeling of side-chain conformations.
     • Today, essentially all programs that model side-chain conformations use rotamer libraries.

  6. What do rotamer libraries provide?
     • Rotamer libraries significantly reduce the number of conformations that need to be evaluated during the search.
     • This is done with almost no risk of missing the real conformations.
     • Even small libraries of about 100-150 rotamers cover about 96-97% of the conformations actually found in protein structures.
     • The probability of each rotamer in the library can be used, via the Boltzmann distribution, to estimate the potential energy due to interactions within the side chain and with the local backbone atoms: E ∝ −ln(P). (Not applied in this project)
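As a minimal sketch of the Boltzmann relation mentioned on this slide (which the authors note was not applied in their project), a rotamer probability can be turned into a pseudo-energy E = −kT ln(P). The function name, the default temperature and the kcal/mol unit choice are assumptions made for illustration.

```python
import math

K_B_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def rotamer_energy(probability, temperature=298.0):
    """Pseudo-energy E = -kT ln(P) for a rotamer with library probability P."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -K_B_KCAL * temperature * math.log(probability)
```

Rarer rotamers receive a higher (less favorable) energy, and a rotamer with probability 1 receives zero.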

  7. Rotamer Library Creation
     • Source: http://honiglab.cpmc.columbia.edu/programs/sidechain/rotamers.html
     • Parsing:
       • Select all nitrogens (N), oxygens (O), alpha carbons (CA), and all other carbons, e.g. CD, CZ, etc.
       • Exclude all other elements and the end-of-file marker
     • Store in a 3D array: Residues (1D) → Rotamers (2D)
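The parsing and storage steps can be sketched as follows. The record layout used here is hypothetical (the slides do not show the actual file format), as are the function names; only the atom selection rule (keep N, O and C atoms) and the residue → rotamer → atoms nesting come from the slides.

```python
from collections import defaultdict

KEPT_ELEMENTS = {"N", "O", "C"}  # per the slide: nitrogens, oxygens, carbons


def element_of(atom_name):
    """Guess the element from an atom name such as 'CA', 'NE' or '1HB'."""
    return atom_name.lstrip("0123456789")[:1]


def parse_rotamer_library(lines):
    """Build library[residue][rotamer_number] -> list of (atom_name, (x, y, z)).

    Assumed (hypothetical) record layout, whitespace separated:
        ATOM <atom_name> <residue_code> <rotamer_number> <x> <y> <z>
    """
    library = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.split()
        if len(fields) != 7 or fields[0] != "ATOM":
            continue  # skips END markers and anything that is not an atom record
        _, atom_name, residue, rotamer, x, y, z = fields
        if element_of(atom_name) not in KEPT_ELEMENTS:
            continue  # drop hydrogens and any other element
        library[residue][int(rotamer)].append(
            (atom_name, (float(x), float(y), float(z))))
    return library


def get_rotamer(library, residue, rotamer_number):
    """Return only the coordinates of one rotamer as a 2D list (atoms x 3)."""
    return [list(xyz) for _, xyz in library[residue][rotamer_number]]
```

A call such as `get_rotamer(library, "R", 1)` then returns the coordinates of the first arginine rotamer.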

  8. Rotamer Library Creation: example
     [Figure: color-coded excerpt of the rotamer library file]
     • Black: included in array
     • Red: excluded from array
     • Blue: not part of the array

  9. Aligning with the Backbone
     • Translate backbone and rotamer to the origin, i.e. CA atom of ‘R’, 1 and of the backbone = (0, 0, 0)
     • Rotate rotamer around the X-axis
     • Rotate rotamer around the Z-axis
     • Translate rotamer back, based on the original position of the CA atom, i.e. CA atom of ‘R’, 1 = (3.99, -5.511, 11.369)
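The translate, rotate, translate sequence can be sketched in plain Python. The slides do not say how the two rotation angles are obtained, so they are simply passed in as parameters here; all function names are illustrative.

```python
import math


def translate(points, offset):
    """Shift every point by the same offset vector."""
    return [[p[k] + offset[k] for k in range(3)] for p in points]


def rotate_x(points, angle):
    """Rotate points about the X-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[x, c * y - s * z, s * y + c * z] for x, y, z in points]


def rotate_z(points, angle):
    """Rotate points about the Z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in points]


def place_rotamer(rotamer, rotamer_ca, target_ca, angle_x, angle_z):
    """Center the rotamer on its CA atom, rotate about X then Z, and move it to target_ca."""
    centred = translate(rotamer, [-c for c in rotamer_ca])
    rotated = rotate_z(rotate_x(centred, angle_x), angle_z)
    return translate(rotated, target_ca)
```

Because the rotations are applied while the CA atom sits at the origin, the CA atom itself is unchanged by them and lands exactly on `target_ca`.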

  10. Rotamer Library Manipulation
     • Retrieve a specific rotamer by providing the residue and the rotamer number
     • e.g. ‘R’, 1 → gives the 1st rotamer of the arginine residue
     • The rotamer is already aligned with the backbone
     • Only the atom coordinates are returned, in a 2D array

  11. Now
     • Consider again our protein of 20 amino acids. Each side chain has on average 9 rotamers. Searching in the space of rotamers gives N = 9^20 ≈ 10^19
     • The search space is restricted and oriented, but the number of conformations is still too large for a naive search

  12. Algorithms for searching (side-chain) conformational space
     • Grid search (systematically scans the search space)
     • DEE (algorithmic approach to reduce the search space)
     • Self-consistent algorithms (iterative sequential procedure)
     • Monte Carlo algorithms (random search)

  13. DEE (Dead-End Elimination)
     • Aims to safely eliminate rotamers (or clusters of rotamers) without losing the GMEC (Global Minimum Energy Conformation).
     • Energy terms: rotamer i_r in the force field of the backbone only; rotamer i_r with rotamer(s) of other residues.
     • Given residue i, eliminate a rotamer i_r if the minimum energy it can obtain by interacting with the conformational background (j_s) is higher (worse) than the maximum possible energy that another rotamer i_t of the same residue can have.

  14. [Figure: E(i,j) versus rotamer background j_s, with curves labelled i_s and i_t] Desmet et al., 1992
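The equation itself did not survive in this transcript. For reference, the original DEE criterion of Desmet et al. (1992) is usually written as below, with E(i_r) the energy of rotamer i_r against the backbone and E(i_r, j_s) its pairwise energy with rotamer s of residue j. This is the standard textbook form, not a copy of the slide.

```latex
E(i_r) + \sum_{j \neq i} \min_{s} E(i_r, j_s)
\;>\;
E(i_t) + \sum_{j \neq i} \max_{s} E(i_t, j_s)
```

If the inequality holds for some rotamer i_t of the same residue, i_r cannot be part of the GMEC and can be eliminated.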

  15. The Goldstein improvement
     • Rotamer i_r can be safely eliminated when some other rotamer i_t has lower (better) energy even in the environment that most favors i_r.
     • This criterion is much less restrictive and therefore more powerful, though it requires more computation time.
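In the same notation, with E(i_r) the backbone-only energy of rotamer i_r and E(i_r, j_s) its pairwise energy with rotamer s of residue j, the Goldstein criterion is commonly written as follows. This is the standard form from the literature, given here because the slide's equation is not preserved in the transcript.

```latex
E(i_r) - E(i_t) + \sum_{j \neq i} \min_{s} \left[ E(i_r, j_s) - E(i_t, j_s) \right] \;>\; 0
```

The minimum is taken over the difference of the two pairwise energies, so both rotamers are compared in the same background rotamer j_s rather than in their separate best and worst cases.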

  16. The Goldstein improvement
     [Figure: E(i,j) versus rotamer background j_s, with curves labelled i_s and i_t]
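A minimal sketch of both elimination tests, assuming precomputed energy tables. The data layout (`self_e[i][r]` for backbone-only energies, `pair_e[i][j][r][s]` for pairwise energies) and the function names are illustrative choices, not the authors' implementation.

```python
def dee_original(self_e, pair_e, i, r, t):
    """Original DEE test: best case of rotamer r is worse than worst case of t."""
    best_r = self_e[i][r] + sum(min(pair_e[i][j][r]) for j in pair_e[i])
    worst_t = self_e[i][t] + sum(max(pair_e[i][j][t]) for j in pair_e[i])
    return best_r > worst_t


def dee_goldstein(self_e, pair_e, i, r, t):
    """Goldstein test: r is worse than t in every single background rotamer."""
    gap = self_e[i][r] - self_e[i][t]
    for j in pair_e[i]:
        # Compare r and t against the same rotamer s of residue j.
        gap += min(a - b for a, b in zip(pair_e[i][j][r], pair_e[i][j][t]))
    return gap > 0


def eliminate(self_e, pair_e, i, criterion):
    """Return the rotamers at position i that survive one pass of `criterion`."""
    rotamers = range(len(self_e[i]))
    return [r for r in rotamers
            if not any(criterion(self_e, pair_e, i, r, t)
                       for t in rotamers if t != r)]


# Toy energies: two rotamers at position 0, two background rotamers at position 1.
toy_self = {0: [0.0, 1.0]}
toy_pair = {0: {1: [[0.0, 5.0], [1.0, 6.0]]}}
```

On the toy tables, rotamer 1 is always 1 unit worse than rotamer 0 in each background, so the Goldstein test removes it, while the original test does not because it compares the best case of one rotamer with the worst case of the other.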

  17. Scoring function: Energy function
     Terms:
     • Van der Waals: represents packing specificity
     • Hydrogen bonding: typically represented by an angle-dependent, 12-10 hydrogen bond potential
     • Electrostatics: guards against destabilizing interactions between like-charged residues
     • Internal coordinate terms: ‘bonded’ energies
     • Solvation energy: protein-solvent interactions
     • Entropy: assumes conformational space is completely restricted in the folded state
     Gordon et al, 1999

  19. Van der Waals
     • Interaction between two uncharged atoms
     • Mildly attractive as two atoms approach from a distance
     • Repulsive as they approach too closely
     • Represents packing specificity
     • Prefers native-like folded states with well-organized cores over disordered or molten-globule states
     Gordon et al, 1999

  20. Van der Waals
     [Figure: http://employees.csbsju.edu/hjakubowski/classes/ch331/protstructure/ilennardjones2.gif]
     • 12-6 Lennard-Jones potential (standard approximation)
       • R = distance between atoms
       • R0 = van der Waals radii
       • Dij = well depth
     • Variation from Kuhlman and Baker, 2004: Erep is dampened to account for the fixed backbone and rotamer set being used.
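The slide's formula is not preserved in the transcript. A common way to write the 12-6 potential with the symbols listed above is E = D_ij [(R0/R)^12 − 2 (R0/R)^6], which has its minimum of −D_ij at R = R0. The sketch below assumes that form and takes R0 to be the distance at the energy minimum (the sum of the two van der Waals radii); it does not include the dampened repulsion of Kuhlman and Baker.

```python
def lennard_jones(r, r0, well_depth):
    """12-6 Lennard-Jones energy with a minimum of -well_depth at r == r0."""
    if r <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (r0 / r) ** 6
    # Repulsive (r0/r)^12 term minus twice the attractive (r0/r)^6 term.
    return well_depth * (ratio6 * ratio6 - 2.0 * ratio6)
```

The energy is strongly positive for r well below r0, reaches −well_depth at r0, and decays toward zero from below at larger separations.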

  21. Electrostatics
     • Stability
       • Moderate temperatures: favorable electrostatic interactions are not thought to be strong enough to compensate for the energy of desolvation
       • Extreme conditions: salt bridges may stabilize
     • Specificity
       • Folding and functional interactions
       • Possibly the more significant role of electrostatics
     • Currently, the term guards against destabilizing interactions between like-charged residues
     Gordon et al, 1999

  22. Electrostatics
     • Approximations:
       • Coulomb’s Law (Gordon et al, 1999)
         • Qi, Qj = charge on amino acid
         • R = distance
         • ε = dielectric constant = 40
       • Bayesian version (Kuhlman & Baker, 2004)
         • Probability of two amino acids being close together given environment and distance (from PDB)
         • aa = amino acid, d = distance, env = environment
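A minimal sketch of the Coulomb term with the dielectric constant of 40 quoted on the slide. The conversion factor and the unit convention (charges in elementary charges, distance in Å, energy in kcal/mol) are assumptions, as is the function name.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2); assumed unit convention


def coulomb_energy(q_i, q_j, distance, dielectric=40.0):
    """Coulomb energy E = k * q_i * q_j / (dielectric * distance)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * distance)
```

Like charges give a positive (unfavorable) energy and opposite charges a negative one, which is how the term penalizes like-charged neighbors.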

  23. Solvation
     • Hydrophobic effects drive folding, so modeling solvation effects is critical to a protein design force field
     • Computationally expensive
     • Solvent model from Lazaridis and Karplus, 1999
       • dij = distance between atoms, rij = van der Waals radii, Vi = atomic volume
       • ΔGref = reference solvation free energy, ΔGfree = solvation free energy of the free (isolated) group
       • λ = correlation length

  24. Energy Function: Incomplete model
     • Current standard models include Bayesian terms based on PDB statistics
     • Several terms have not been thoroughly validated as useful for design (Gordon et al, 1999)
       • Hydrogen bonding
       • Electrostatics
       • Internal coordinates
     • Current standard models are ad hoc: physical quantities and variables are weighted based on “what works best”

  25. Integrated algorithm schema
     [Figure: schematic of per-residue rotamer candidates (N1, N2, N3, ...) being pruned; stage labels: 1st order DEE, 2nd DEE, Exhaustive search, Best seq]

  26. Design cold-shock protein (core) & Trp-Cage protein
     • Trp-Cage (1L2Y.pdb): 20 residues
     • Cold-shock protein (1MJC.pdb): 10 residues (core)

  27. Cold-shock protein (core) after 1st-order DEE
     [Figure: core positions labelled 0 to 9]
     • Hydrophobic amino acids: A (1), F (3), I (3), L (2), V (2), W (7)

  28. Trp-Cage protein after 1st-order DEE
     Residue 9, all 20 amino acids (other residues not shown)

     Amino acid   Rotamers in library   Rotamers left after 1st-order DEE
     A            1                     1
     C            1, 2                  2
     D            1...7                 6
     E            1...23                6, 15
     F            1, 2, 3               1
     G            1                     1
     H            1...8                 7
     I            1, 2, 3               2
     K            1...87                18, 22, 59
     L            1, 2                  1
     M            1...17                1, 12
     N            1...9                 6
     P            1                     1
     Q            1...30                4
     R            1...114               7, 107
     S            1, 2                  2
     T            1                     1
     V            1, 2                  1, 2
     W            1...7                 6
     Y            1, 2, 3               1

  29. Results for cold-shock protein (core)
     Cold-shock protein (1MJC.pdb), 10 residues (core); N = native sequence

     Rank   Seq.                  EScore
     N      V F I V V I L V F V   -46.47
     1      I F I I I I L I F V   -53.58
     2      I F I I V I L I F V   -52.48
     3      V F I I I I L I F V   -51.70
     4      I F I I I I L V F V   -50.72
     5      V F I I V I L I F V   -50.53
     6      I F I I V I L V F V   -49.63
     7      I F V V I I L I F V   -49.34
     8      V F V I I I L I F V   -49.23
     9      I F V I I I L V F V   -48.92
     10     V F I I I I L V F V   -48.88

  30. Summary & Future
     • Speed
       • Achievement: Naïve ~10^7 sequences × 10^4 rotamers; DEE ~3000 sequences × 200 rotamers; BioX-cluster (~600 2.8 GHz Xeon CPUs), 26 hrs
       • Future: rotamer ordering by self-energies (Gordon 1998); comparison cluster focusing (Looger 2001); stronger elimination criteria (Looger 2001)
     • Accuracy
       • Achievement: 50% identical with native sequence; high similarity in total energy
       • Future: additional energy terms (H-bond, solvation); incorporate rigorous force field calculators (Gromacs); structure relaxation

  31. Thanks!
