In silico Protein Design: Implementing Dead-End Elimination algorithm

CS273 • Tyrone Anderson, Yu Bai & Caroline E. Moore-Kochlacs • May 31st, 2005

Computational protein design • Given backbone coordinates, find the best sequence(s) with which the protein is stable. • [Figure labels: Native structure, Backbone scaffold, New sequence, Iterative refinement]

Components of the problem The protein design problem can be roughly divided into a search procedure and a scoring function. • The search procedure samples sequence space AND side-chain conformational space to create conformations. • The scoring function evaluates each conformation created by the search procedure. The scores are used to rank the conformations (and therefore the sequences) and to pick the best one as the final model.

Why is the search procedure difficult? • Consider a short protein with 20 amino acids. Possible sequences: S = 20^20 ≈ 10^26 • Each side chain has on average 2 dihedral angles (χ angles). Assuming that we sample every 40° in dihedral angle space, N = (360/40)^(20×2) = 9^40 ≈ 10^38 • The product S × N is too large to be sampled naively • Algorithms that find good solutions by screening only part of the search space are needed
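The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `search_space_size` and its parameter names are illustrative and do not come from the slides.

```python
import math


def search_space_size(n_residues=20, n_amino_acids=20,
                      chi_per_residue=2, step_deg=40):
    """Return (S, N): number of sequences and of side-chain
    conformations per sequence for a naive grid search."""
    sequences = n_amino_acids ** n_residues
    bins_per_angle = 360 // step_deg          # 9 bins for a 40 degree step
    conformations = bins_per_angle ** (n_residues * chi_per_residue)
    return sequences, conformations


S, N = search_space_size()
print(f"S ~ 10^{math.log10(S):.0f}")
print(f"N ~ 10^{math.log10(N):.0f}")
print(f"S*N ~ 10^{math.log10(S * N):.0f}")
```

Python integers have arbitrary precision, so the exact counts are available and only the printed exponents are rounded.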

Rotamer libraries • Already in the 1970s, Janin et al. showed that side-chain conformations are not distributed evenly over dihedral angle space but tend to cluster in specific regions, much as in the Ramachandran plot. • In the 1980s, this observation was used to improve modeling of side-chain conformations. • Today, essentially all programs that model side-chain conformations use rotamer libraries.

What do rotamer libraries provide? • Rotamer libraries significantly reduce the number of conformations that need to be evaluated during the search. • This is done with almost no risk of missing the real conformations. • Even small libraries of about 100-150 rotamers cover about 96-97% of the conformations actually found in protein structures. • The probability of each rotamer in the library can be used to estimate the potential energy due to interactions within the side chain and with the local backbone atoms, using the Boltzmann distribution: E ∝ −ln(P). (Not applied in this project)
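The Boltzmann relation E ∝ −ln(P) can be sketched as follows. The slides state that this term was not used in the project, so the code is purely illustrative; the function name `rotamer_energies` and the choice of reporting energies in units of kT (default `kT=1.0`) are assumptions.

```python
import math


def rotamer_energies(probabilities, kT=1.0):
    """Convert rotamer library probabilities into pseudo-energies
    with E = -kT * ln(P). Rarer rotamers get higher energies."""
    return [-kT * math.log(p) for p in probabilities]


# Hypothetical probabilities for three rotamers of one residue.
energies = rotamer_energies([0.5, 0.25, 0.25])
print(energies)
```

Only energy differences between rotamers matter here, so any constant offset (for example the normalization of P) drops out when rotamers are ranked.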

Rotamer Library Creation • Source: http://honiglab.cpmc.columbia.edu/programs/sidechain/rotamers.html • Parsing: • Select all nitrogens (N), oxygens (O), alpha carbons (CA), and all other carbons, e.g. CD, CZ, etc. • Exclude all other elements and the end of file • Store in a 3D array: Residues (1D) → Rotamers (2D)

Rotamer Library Creation • Example [Figure: color-coded listing of a rotamer file] • Black: included in array • Red: excluded from array • Blue: not part of the array

Aligning with the Backbone • Translate backbone and rotamer to the origin • e.g. CA atom of rotamer (‘R’, 1) and CA atom of the backbone = (0, 0, 0) • Rotate rotamer around the X-axis • Rotate rotamer around the Z-axis • Translate rotamer back, based on the original position of the CA atom • e.g. CA atom of (‘R’, 1) = (3.99, -5.511, 11.369)
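A minimal sketch of these alignment steps, in pure Python, is given below. The slides do not say how the two rotation angles are obtained, so they are plain parameters here; the function names and the `ca_index` argument are hypothetical.

```python
import math


def translate(coords, shift):
    """Shift every (x, y, z) point by the vector `shift`."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]


def rotate_x(coords, angle):
    """Rotate points about the X-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in coords]


def rotate_z(coords, angle):
    """Rotate points about the Z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def place_rotamer(rotamer, ca_index, backbone_ca, angle_x, angle_z):
    """Move the rotamer CA to the origin, rotate about X then Z,
    and translate so that the CA sits on the backbone CA position."""
    ca = rotamer[ca_index]
    at_origin = translate(rotamer, (-ca[0], -ca[1], -ca[2]))
    rotated = rotate_z(rotate_x(at_origin, angle_x), angle_z)
    return translate(rotated, backbone_ca)


# Toy two-atom "rotamer": CA at (1, 1, 1) and one atom 1 unit along X.
placed = place_rotamer([(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)], 0,
                       (3.99, -5.511, 11.369), 0.0, math.pi / 2)
print(placed)
```

Because the rotations are applied while the CA atom sits at the origin, the CA atom itself is unaffected by them and lands exactly on the backbone position after the final translation.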

Rotamer Library Manipulation • Retrieve a specific rotamer: • Provide the residue and the rotamer number • e.g. ‘R’, 1 → gives you the 1st rotamer of the arginine residue • The rotamer is already aligned with the backbone • Only the coordinates of the atoms are returned, in a 2D array

Now, • Consider again our protein of 20 amino acids. Each side chain has on average 9 rotamers. Assuming that we now search in the space of rotamers: N = 9^20 ≈ 10^19 • The search space is restricted and oriented, but the number of conformations is still too large for a naive search

Algorithms for searching (side-chain) conformational space • Grid search (systematically scans the search space) • DEE (algorithmic approach to reduce the search space) • Self-consistent algorithms (iterative sequential procedure) • Monte Carlo algorithms (random search)

DEE (Dead-End Elimination) • Aims to safely eliminate rotamers (or clusters of rotamers) without losing the GMEC (Global Minimum Energy Conformation). • Given residue i, eliminate a rotamer i_r if the minimum energy it can obtain by interaction with the conformational background (j_s) is higher (worse) than the maximum possible energy that another rotamer i_t (of the same residue) can have: $E(i_r) + \sum_{j \neq i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \neq i} \max_s E(i_t, j_s)$ • E(i_r): rotamer i_r in the force field of the backbone only • E(i_r, j_s): rotamer i_r with rotamer(s) of other residues

[Figure: E(i,j) plotted against the rotamer background j_s for rotamers of residue i, including i_t] Desmet et al., 1992
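The original (Desmet et al., 1992) elimination criterion can be sketched in Python as below. This is a toy illustration, not the presenters' implementation: the data layout (`self_e[i][r]` for backbone-only energies, `pair_e[(i, j)][r][s]` with i < j for pairwise energies, `alive[i]` for the set of surviving rotamers) and all function names are assumptions.

```python
def pair(pair_e, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dee_original(self_e, pair_e, alive):
    """Repeatedly remove rotamer r at position i when its best case is
    still worse than the worst case of some other rotamer t at i."""
    alive = [set(a) for a in alive]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            others = [j for j in range(len(alive)) if j != i]
            for r in sorted(alive[i]):
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j])
                    for j in others)
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j])
                        for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)   # r is a dead end
                        changed = True
                        break
    return alive


# Toy system: 2 positions with 2 rotamers each (made-up energies).
self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.5, 0.2]]}
remaining = dee_original(self_e, pair_e, [{0, 1}, {0, 1}])
print(remaining)
```

In this toy case rotamer 1 at position 0 has a best-case energy of 5.2, which exceeds the worst-case energy of rotamer 0 (1.0), so it is removed; the loop then repeats until no further rotamer can be eliminated.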

The Goldstein improvement • Rotamer i_r can be safely eliminated when some other rotamer i_t exists that has lower (better) energy even in the environment that most favors i_r: $E(i_r) - E(i_t) + \sum_{j \neq i} \min_s \left[ E(i_r, j_s) - E(i_t, j_s) \right] > 0$ • This criterion is much easier to satisfy and therefore more powerful. It does, however, require more computational time.

The Goldstein improvement [Figure: E(i,j) plotted against the rotamer background j_s for rotamers of residue i, including i_t]
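A sketch of the Goldstein test in Python follows. It uses an assumed data layout (`self_e[i][r]` for backbone-only energies, `pair_e[(i, j)][r][s]` with i < j for pairwise energies, `alive[i]` for the surviving rotamers at position i); the function names and the toy energies are made up for illustration.

```python
def pair(pair_e, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def goldstein_eliminates(self_e, pair_e, alive, i, r, t):
    """True if rotamer r at position i is worse than rotamer t even in
    the background that is most favorable to r relative to t."""
    gap = self_e[i][r] - self_e[i][t]
    for j in range(len(alive)):
        if j == i:
            continue
        gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                   for s in alive[j])
    return gap > 0


def dee_goldstein(self_e, pair_e, alive):
    """Apply the Goldstein test until no more rotamers can be removed."""
    alive = [set(a) for a in alive]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            for r in sorted(alive[i]):
                if any(goldstein_eliminates(self_e, pair_e, alive, i, r, t)
                       for t in sorted(alive[i]) if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy energies: rotamer 1 at position 0 is always 2 units worse than
# rotamer 0, but both depend strongly on the neighbor's rotamer.
self_e = [[0.0, 1.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 10.0], [1.0, 11.0]]}
remaining = dee_goldstein(self_e, pair_e, [{0, 1}, {0, 1}])
print(remaining)
```

In this example the original criterion cannot remove rotamer 1 at position 0 (its best case, 2.0, is below the worst case of rotamer 0, 10.0), while the Goldstein test removes it because the energy difference is positive for every background rotamer.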

Scoring function: Energy function Terms: • Van der Waals • represents packing specificity • Hydrogen bonding • typically represented by an angle dependent, 12-10 hydrogen bond potential • Electrostatics • Guard against destabilizing interactions between like charged residues • Internal coordinate terms • ‘bonded’ energies • Solvation energy • Protein-solvent interactions • Entropy • Assumes conformational space is completely restricted in the folded state Gordon et al, 1999

Van der Waals • Interaction between two uncharged atoms • Mildly attractive as two atoms approach from a distance • Repulsive as they approach too close • Represents packing specificity • Prefers native-like folded states with well-organized cores over disordered or molten-globule states Gordon et al, 1999

Van der Waals • 12-6 Lennard-Jones potential • Standard approximation • R = distance between atoms • R_0 = van der Waals radii • D_ij = well depth • Variation from Kuhlman and Baker, 2004: E_rep is dampened to account for the fixed backbone and rotamer set being used. • Figure: http://employees.csbsju.edu/hjakubowski/classes/ch331/protstructure/ilennardjones2.gif
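The equation on this slide did not survive extraction. A common way to write a 12-6 Lennard-Jones term with the symbols listed above is $E_{vdW} = D_{ij}\left[(R_0/R)^{12} - 2\,(R_0/R)^{6}\right]$, where R_0 is taken as the separation at the energy minimum. The sketch below assumes that form; it may differ from the exact expression the presenters used, and it does not include the dampened repulsion of Kuhlman and Baker.

```python
def lennard_jones(r, r0, well_depth):
    """12-6 Lennard-Jones energy (assumed form).

    r          -- distance between the two atoms
    r0         -- separation at the energy minimum
    well_depth -- D_ij, depth of the energy well (positive number)
    """
    ratio6 = (r0 / r) ** 6
    return well_depth * (ratio6 * ratio6 - 2.0 * ratio6)


# Hypothetical parameters: r0 = 4.0 Angstrom, D_ij = 0.1 kcal/mol.
for r in (3.2, 4.0, 6.0):
    print(r, lennard_jones(r, 4.0, 0.1))
```

The three sample distances show the behavior described on the previous slide: strongly repulsive when the atoms are too close, most favorable (−D_ij) at R = R_0, and mildly attractive at larger separations.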

Electrostatics • Stability • Moderate temperatures: favorable electrostatic interactions not thought to be strong enough to compensate for the energy of desolvation • Extreme conditions: salt bridges may stabilize • Specificity • folding and functional interactions • maybe the more significant role of electrostatics • Currently, term guards against destabilizing interactions between like-charged residues Gordon et al, 1999

Electrostatics • Approximations: • Coulomb’s Law (Gordon et al, 1999) • Q_i, Q_j = charges on the amino acids • R = distance • ε = dielectric constant = 40 • Bayesian version (Kuhlman & Baker, 2004) • Probability of two amino acids being close together given environment and distance (from PDB) • aa = amino acid, d = distance, env = environment
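The Coulomb approximation with the dielectric constant from the slide (ε = 40) can be sketched as follows. The conversion factor 332.0637 (giving kcal/mol for charges in units of the elementary charge and distances in Å) and the function name are assumptions of this sketch; the slides do not state which units were used.

```python
COULOMB_CONSTANT = 332.0637  # kcal * Angstrom / (mol * e^2), assumed units


def coulomb_energy(q_i, q_j, distance, dielectric=40.0):
    """Coulomb's law: E = k * Q_i * Q_j / (epsilon * R)."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * distance)


# Like charges repel (positive energy); opposite charges attract.
print(coulomb_energy(+1.0, +1.0, 4.0))
print(coulomb_energy(+1.0, -1.0, 4.0))
```

With ε = 40 the interaction is heavily screened: two unit charges 4 Å apart contribute only about 2 kcal/mol, which fits the slide's description of this term as a guard against like-charge clashes rather than a dominant stabilizing force.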

Solvation • Because hydrophobic effects drive folding, modeling solvation effects is critical to a protein design force field • Computationally expensive • Solvent model from Lazaridis and Karplus, 1999 • d_ij = distance between atoms, r_ij = van der Waals radii, V_i = atomic volume • ΔG_ref = reference solvation free energy, ΔG_free = solvation free energy of the free (isolated) group • λ = correlation length

Energy Function: Incomplete model • Current standard models include Bayesian terms based on PDB statistics • Several terms have not been thoroughly validated as useful for design (Gordon et al, 1999) • Hydrogen bonding • Electrostatics • Internal coordinates • Current standard models are ad hoc: physical quantities and variables are weighted based on “what works best”

Integrated algorithm schema [Figure: flow diagram. Candidate residue/rotamer labels per position (N1, N2, N3, D1, D2, ...) → 1st order DEE → 2nd DEE → Exhaustive search → Best seq]
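The final stage of the schema, an exhaustive search over whatever survives elimination, can be sketched as below. The layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]` with i < j, `alive[i]` as the set of surviving choices at position i) and the function names are assumptions; each choice index stands for one amino-acid/rotamer combination such as N1 or D2.

```python
from itertools import product


def total_energy(self_e, pair_e, assignment):
    """Sum of self energies plus all pairwise energies for one
    choice of rotamer at every position."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def exhaustive_search(self_e, pair_e, alive, top=1):
    """Score every combination of surviving rotamers and return the
    `top` lowest-energy (energy, assignment) pairs."""
    choices = [sorted(a) for a in alive]
    scored = [(total_energy(self_e, pair_e, combo), combo)
              for combo in product(*choices)]
    scored.sort()
    return scored[:top]


# Toy system: 2 positions with 2 choices each (made-up energies).
self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.5, 0.2]]}
best = exhaustive_search(self_e, pair_e, [{0, 1}, {0, 1}], top=2)
print(best)
```

The cost grows as the product of the surviving rotamer counts at each position, which is why the elimination steps are run first.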

Design cold-shock protein (core) & Trp-Cage protein • Trp-Cage (1L2Y.pdb): 20 residues • Cold-shock protein (1MJC.pdb): 10 residues (core)

Cold-shock protein (core) • After 1st-order DEE • [Figure: core positions labeled 0 to 9] • Hydrophobic amino acids (number of rotamers): A (1), F (3), I (3), L (2), V (2), W (7)

Trp-Cage protein • After 1st-order DEE • All 20 AA
Rotamers at residue 9 (other residues not shown):
AA   Before DEE   After 1st-order DEE
A    1            1
C    1,2          2
D    1...7        6
E    1...23       6,15
F    1,2,3        1
G    1            1
H    1...8        7
I    1,2,3        2
K    1...87       18,22,59
L    1,2          1
M    1...17       1,12
N    1...9        6
P    1            1
Q    1...30       4
R    1...114      7,107
S    1,2          2
T    1            1
V    1,2          1,2
W    1...7        6
Y    1,2,3        1

Results for cold-shock protein (core) • Cold-shock protein (1MJC.pdb), 10 residues (core)
Rank  Seq.                  EScore
N     V F I V V I L V F V   -46.47
1.    I F I I I I L I F V   -53.58
2.    I F I I V I L I F V   -52.48
3.    V F I I I I L I F V   -51.70
4.    I F I I I I L V F V   -50.72
5.    V F I I V I L I F V   -50.53
6.    I F I I V I L V F V   -49.63
7.    I F V V I I L I F V   -49.34
8.    V F V I I I L I F V   -49.23
9.    I F V I I I L V F V   -48.92
10.   V F I I I I L V F V   -48.88

Summary & Future • Speed • Achievement: Naïve ~ 10^7 sequences × 10^4 rotamers; DEE ~ 3000 sequences × 200 rotamers; BioX-cluster (~600 2.8 GHz Xeon CPUs): 26 hrs • Future: Rotamer ordering (by self-energies) (Gordon 1998); Comparison cluster focusing (Looger 2001); Stronger elimination criteria (Looger 2001) • Accuracy • Achievement: 50% identical with native sequence; High similarity in total energy • Future: Additional energy terms (H-bond, solvation); Incorporate rigorous force field calculators (Gromacs); Structure relaxation

Thanks !
