How to approximate complex physical and thermodynamic interactions?

Automated Molecular Docking Issues
• How to approximate complex physical and thermodynamic interactions?
• Employ rigid or flexible structures for ligand and receptor (side chains or backbone flexible)?
• How to handle molecular motions?
• Treat with full atomic detail or simplified models?
• Which docking energy function is best?

The Molecular Docking Challenge
Given two molecules with known 3D conformations:
1) Can we predict whether they bind to each other? This is harder than it sounds!
2) If yes, can we accurately predict:
  • The binding affinity?
  • The shape of the molecule-molecule complex?
3) Can we at least rank-order the affinities of a range of ligands (virtual screening)?
Relevance to chemistry/biochemistry:
• Protein-small ligand docking (drug design, usually rigid protein, flexible ligand)
• DNA-small ligand docking (drug design, usually rigid DNA, flexible ligand)
• Protein-carbohydrate docking (usually rigid protein, flexible ligand)
• Protein-DNA docking (usually rigid protein, flexible ligand)
• Protein-protein docking (usually rigid body)

Factors Affecting Binding
• Electrostatic interactions (relatively long range, proportional to 1/R): hydrogen bonds, salt bridges, charge-charge
• Dispersive interactions (short range):
  • Van der Waals attractions (proportional to 1/R^6)
  • Van der Waals repulsions (proportional to 1/R^12)
• Hydrophobic contacts (depend on displacing solvent from the binding site, and are therefore short range)
• Tight binding requires both the correct shape of the interacting surfaces (shape complementarity) and the correct polarities (charge complementarity)
• The binding affinity is the energetic difference between the bound and free states, so solvation and entropy must be considered
• Specificity is driven by shape and hydrogen bond complementarity (easy to quantify)
• Affinity is driven by hydrophobic and entropic effects (hard to quantify)
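The distance dependences listed above can be compared numerically. The following is a minimal sketch, not any docking program's implementation: the partial charges, the constant dielectric of 4, and the Lennard-Jones parameters (r_min, well_depth) are all illustrative assumptions.

```python
def coulomb(q_i, q_j, r, dielectric=4.0):
    """Coulomb energy in kcal/mol; 332.06 converts e^2/Angstrom to kcal/mol."""
    return 332.06 * q_i * q_j / (dielectric * r)


def lennard_jones(r, r_min=3.8, well_depth=0.15):
    """12-6 energy: repulsion ~ 1/R^12, attraction ~ 1/R^6, minimum at r_min."""
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)


# Assumed charges of +0.4 and -0.4 e; distances in Angstrom.
for r in (3.0, 3.8, 5.0, 8.0):
    lj = lennard_jones(r)
    el = coulomb(0.4, -0.4, r)
    print(f"R = {r:4.1f} A   LJ = {lj:8.4f}   Coulomb = {el:8.4f} kcal/mol")
```

At 8 Å the van der Waals term has almost vanished while the Coulomb term is still sizeable, which is the sense in which electrostatics is "relatively long range".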

Applications of Docking
• Estimating the binding affinity:
  • Searching for lead structures (drug candidates) for protein targets
  • Comparing a set of inhibitors
  • Estimating the influence of modifications in lead structures
  • De novo ligand design
  • Design of targeted combinatorial libraries
• Predicting the molecule complex:
  • Understanding the binding mode / principle
  • Optimizing lead structures
  • Determining ligand positions in crystal structures

Approximations in Docking
To make docking practical:
• Eliminate explicit waters (what about desolvation?)
  • Approximate desolvation
• Eliminate dynamics (what about entropy?)
  • Approximate entropy
• Employ a general force field (what about precision?)
  • Treat force field energies as adjustable, not absolute
• Ignore the unbound state (what about ΔG?)
  • Approximate ΔG

Scoring Functions (the Ugly Side of Docking)
• Instead of using: ΔG_binding = ΔG_complex − ΔG_ligand − ΔG_receptor
• Develop a “scoring function” that takes part of the interaction energy from force field concepts and part from empirical fitting to experimental values.
• Use: ΔG_binding ≈ Σ_i f_i E_i
• The interactions (E_i) might include hydrogen bonds, electrostatic interactions, hydrophobic contacts, and solvent exclusion volume, among others.
• Each contribution has an adjustable weighting factor (f_i).
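The weighted-sum idea described above, with interaction terms E_i each scaled by a weighting factor f_i, can be sketched in a few lines. The term names, weights, and energies below are hypothetical values chosen for illustration and do not come from any published scoring function.

```python
def score(terms, weights):
    """Return the weighted sum of interaction terms: sum over i of f_i * E_i."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weighting factors f_i (dimensionless).
weights = {"hbond": 0.10, "electrostatic": 0.11, "hydrophobic": 0.15, "desolvation": 0.17}

# Hypothetical raw interaction terms E_i for one pose, in kcal/mol.
pose_terms = {"hbond": -12.0, "electrostatic": -8.0, "hydrophobic": -20.0, "desolvation": 6.0}

print(f"estimated binding score: {score(pose_terms, weights):.2f} kcal/mol")
```

Because the weights are adjustable, the same raw terms can yield quite different scores under different parametrizations, which is one reason scores are better treated as relative than absolute.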

Scoring Functions: General or Specific?
• In determining the weighting factors (f_i), the developer must choose how broadly or how narrowly the scoring function is to be applied.
• Is the function to be used for all classes of interactions, or only some? For protein-protein only, protein-drug only, or only for a particular class of drug?
• There are many scoring functions; the AutoDock 3 function is one example. Its f coefficients are determined empirically from a multi-linear regression (MLR) to a set of protein–ligand complexes with known binding constants.
• Because the f coefficients are fitted rather than derived from physics, such scoring functions are considered empirical.
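The multi-linear regression step can be illustrated with an ordinary least-squares fit. This is a sketch under stated assumptions: the energy-term matrix is invented, and the "experimental" free energies are synthesized from known weights so that the fit can be checked. A real calibration would use measured binding constants for a set of complexes.

```python
import numpy as np

# Hypothetical training set: one row per complex, one column per energy term E_i.
E = np.array([
    [-10.0, -4.0, 2.0],
    [-6.0, -9.0, 1.0],
    [-14.0, -2.0, 5.0],
    [-8.0, -7.0, 3.0],
    [-3.0, -1.0, 0.5],
])

# Weights used only to synthesize stand-in "experimental" binding free energies.
true_f = np.array([0.15, 0.10, 0.20])
dG_exp = E @ true_f

# Least-squares solution of E @ f = dG_exp (regression without an intercept).
f_fit, residuals, rank, _ = np.linalg.lstsq(E, dG_exp, rcond=None)
print("fitted weights:", np.round(f_fit, 3))
```

With noisy experimental data the fitted weights depend on which complexes are in the training set, which is the practical side of the general-versus-specific question.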

Scoring Function Details (AutoDock 3)
• The indices i and j correspond to ligand and protein atoms, respectively.
• The Coulombic term includes the partial charges (q) and a distance-dependent dielectric function ε(R).
• A, B, C and D are the Lennard–Jones parameters in the dispersion/repulsion 12-6 and H-bonding 12-10 formulas, and R denotes the distance between the atoms of a pair.
• ξ(t) is a directional weight that depends on the H-bond angle t.
• S and V denote the (empirical) solvation parameter and the fragmental volume, respectively, in the solvation function of Stouten et al.
• The AutoDock 4 scoring function uses a different parametrization of the desolvation term.
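Read together, the term descriptions above suggest a functional form along the following lines. This is a sketch assembled from those descriptions, not a verified transcription of the AutoDock 3 equation. The weight names (f_vdW, f_hbond, f_elec, f_sol) and the Gaussian distance weighting with width σ in the solvation term are assumptions, and any terms not described above (for example, a torsional entropy penalty) are left out.

```latex
\Delta G \approx
    f_{\mathrm{vdW}} \sum_{i,j} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right)
  + f_{\mathrm{hbond}} \sum_{i,j} \xi(t) \left( \frac{C_{ij}}{R_{ij}^{12}} - \frac{D_{ij}}{R_{ij}^{10}} \right)
  + f_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(R_{ij})\, R_{ij}}
  + f_{\mathrm{sol}} \sum_{i,j} S_i V_j \, e^{-R_{ij}^{2} / 2\sigma^{2}}
```

Each sum runs over ligand atoms i and protein atoms j, and each f is one of the empirically fitted coefficients.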

Finding Optimal Poses: Simulated Annealing Search Technique
AutoDock can use one of several optimization methods to search for the best placement of the ligand. Simulated annealing: at each step, the position and internal rotational state of the ligand are adjusted and the energy is calculated. If the energy decreases, the move is accepted. If not, it may still be accepted with a probability that depends on the current annealing temperature. As the search goes on, the temperature is decreased, and eventually the final state of the ligand is returned as the docked conformation. Because simulated annealing is a Monte Carlo (randomized) method, different runs will generally produce different solutions.
http://cnx.org/content/m11456/latest/
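The accept/reject loop described above can be sketched on a toy problem. This is an illustration of the general technique and not AutoDock's code: the one-dimensional "pose", the energy function, the Metropolis acceptance rule exp(-ΔE/T), and the geometric cooling schedule are all assumptions made for the example.

```python
import math
import random


def anneal(energy, start, n_steps=5000, t_start=2.0, t_end=0.01, step=0.5, seed=0):
    """Minimize energy(x) with Metropolis moves under geometric cooling."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    temperature = t_start
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step, step)  # random perturbation of the "pose"
        e_new = energy(candidate)
        delta = e_new - e
        # Downhill moves are always accepted; uphill moves only with
        # a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
        temperature *= cooling
    return x, e, best_x, best_e


def toy_energy(x):
    """Hypothetical 1D energy surface with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


final_x, final_e, best_x, best_e = anneal(toy_energy, start=4.0)
print(f"final state: x = {final_x:.3f}, E = {final_e:.3f}")
print(f"lowest seen: x = {best_x:.3f}, E = {best_e:.3f}")
```

Changing `seed` gives a different trajectory and possibly a different final state, which mirrors the run-to-run variation noted above. The sketch also tracks the lowest-energy state visited, a common convenience alongside the final state.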

Rigid or Flexible Protein?
A central paradigm used in the development of the first docking programs was the lock-and-key model first described by Fischer. In this model the three-dimensional structures of the ligand and the receptor complement each other in the same way that a key complements a lock. However, a more accurate view of this process was first presented by Koshland in the induced fit model. In this model the 3D structures of the ligand and the receptor adapt to each other during the binding process. It is important to note that not only the structure of the ligand but also the structure of the receptor changes during binding. This occurs because the introduction of a ligand modifies the chemical and structural environment of the receptor.
http://cnx.org/content/m11456/latest/

Treating Induced Fit: Soft Receptors
Soft receptors can be easily generated by reducing the van der Waals repulsive (1/R^12) contributions to the total energy score. This makes the receptor “softer”, thus allowing, for example, a larger ligand to fit in a binding site determined experimentally for a smaller molecule.
Figure: a) Van der Waals representation of a target receptor. b) Close-up image of a section of the binding site with normal van der Waals properties. c) Same section of the binding site as shown in b) but with reduced radii for the atoms in the receptor. This type of soft representation allows ligand atoms to enter the grey shaded area without incurring a high energetic penalty.
http://cnx.org/content/m11456/latest/

Treating Induced Fit: Soft Receptors
Soft receptors can be easily generated by reducing the van der Waals repulsive (1/R^12) contributions to the total energy score. This makes the receptor “softer”, thus:
• Allowing a slightly larger ligand to fit in a binding site determined experimentally for a smaller molecule.
• Allowing a ligand to fit into a binding site from a structure that was determined in the absence of any ligand.
The rationale behind this approach is that the receptor structure has some inherent flexibility, which allows it to adapt to slightly differently shaped ligands through small variations in the orientation of binding site side chains and backbone positions. It will not correct for a case in which ligand binding requires a significant change in the binding site, such as the flipping of a side chain into a different rotamer. The main advantages of using soft receptors are ease of implementation (docking algorithms stay unchanged) and speed (the cost of evaluating the scoring function is the same as for the normal, rigid case).
http://cnx.org/content/m11456/latest/
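The effect of down-weighting the repulsive term can be seen on a single atom pair. The sketch below assumes a generic 12-6 potential with illustrative parameters and a hypothetical `repulsion_scale` factor. Actual programs may soften the potential in other ways, for example by reducing atomic radii.

```python
def lj_energy(r, r_min=3.8, well_depth=0.15, repulsion_scale=1.0):
    """12-6 energy with the repulsive 1/R^12 term multiplied by repulsion_scale."""
    ratio = r_min / r
    return well_depth * (repulsion_scale * ratio ** 12 - 2.0 * ratio ** 6)


# Distances in Angstrom; 2.8 and 3.2 A are closer than the preferred contact distance.
for r in (2.8, 3.2, 3.8):
    hard = lj_energy(r)
    soft = lj_energy(r, repulsion_scale=0.25)
    print(f"R = {r:.1f} A   hard = {hard:8.3f}   soft = {soft:8.3f} kcal/mol")
```

The cost per evaluation is unchanged, since only a constant factor differs. This is why the soft-receptor approach is as fast as rigid docking.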

Treating Induced Fit: Side Chain Rotations
Rotation around single bonds, such as those in side chains, is a “natural” way to model induced fit. Selecting which torsion angles are permitted to rotate is usually the most difficult part of this method, because it requires a considerable amount of a priori knowledge of alternative binding modes for a given receptor. Alternatively, probable side chain orientations may be selected from rotamer libraries. The principal problem with this method is that it adds significantly to the time required for the calculation, because the number of combinations of side chain rotamers in a binding site grows exponentially with the number of flexible residues.
http://cnx.org/content/m11456/latest/
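A quick count shows how fast the number of receptor states grows. The residue names and rotamer counts below are hypothetical and stand in for whatever a rotamer library would supply for a given binding site.

```python
from itertools import product
from math import prod

# Hypothetical rotamer counts for four flexible binding-site residues.
rotamers = {"LEU45": 4, "PHE82": 4, "ARG120": 20, "TYR151": 4}

n_combinations = prod(rotamers.values())
print(f"{len(rotamers)} residues -> {n_combinations} receptor states per ligand pose")

# Enumerating the states explicitly makes the multiplicative cost concrete.
states = list(product(*(range(n) for n in rotamers.values())))
print("first state:", states[0], " last state:", states[-1])
```

Every additional flexible residue multiplies the total by its own rotamer count, so restricting flexibility to a few carefully chosen side chains keeps the search tractable.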

Treating Induced Fit: Side Chain Rotations
Figure: Stick representation of a section of a binding site. To approximate the flexibility of the receptor, it is possible to carefully select a few degrees of freedom. These are usually the torsional angles of side chains that have been determined to be critical to the induced fit effect for a specific receptor. In this example the selected torsional angles are represented by arrows.
http://cnx.org/content/m11456/latest/

Treating Induced Fit: Multiple Receptor Conformations
One possible way to represent a flexible receptor is to use multiple static receptor structures. This concept reflects the idea that proteins in solution do not exist in a single minimum-energy static conformation but are in fact constantly jumping between low-energy conformational sub-states. In this view, the best description of a protein structure is a conformational ensemble of slightly different structures coexisting in a low-energy region of the potential energy surface. Thus, the binding process can be thought of not as induced fit, as described by Koshland in 1958, but rather as the selection of the particular sub-state from the conformational ensemble that best complements the shape of a specific ligand.
http://cnx.org/content/m11456/latest/

Treating Induced Fit: Multiple Receptor Conformations
Figure: Superposition of multiple conformers of a section of a binding site. These can either be considered individually as rigid representatives of the conformational ensemble or be combined into a single representation that preserves the most relevant structural information.

Treating Induced Fit: Multiple Receptor Conformations
The use of multiple static conformations for docking gives rise to two critical questions.
1) How can we obtain a representative subset of the conformational ensemble typical of a given receptor? The structures can be determined experimentally, by X-ray crystallography or NMR, or generated via computational methods such as Monte Carlo or molecular dynamics simulations.
2) What is the best way of combining this large amount of structural information for a docking study? Should the multiple shapes be averaged in some way, or should independent docking be performed on each one? How many shapes should be used? These questions also remain open.
http://cnx.org/content/m11456/latest/
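One of the options raised above, independent docking against each structure followed by selection of the best result, can be sketched as follows. The `toy_dock` function is a hypothetical stand-in for a real docking run, and the conformer names and pocket widths are invented for the example.

```python
def dock_ensemble(ligand, conformers, dock):
    """Dock one ligand against every receptor conformer and keep the best score."""
    scores = {name: dock(ligand, receptor) for name, receptor in conformers.items()}
    best = min(scores, key=scores.get)  # lower (more negative) score is better
    return best, scores[best], scores


def toy_dock(ligand_size, pocket_width):
    """Hypothetical scoring: the score worsens as pocket and ligand sizes mismatch."""
    return -10.0 + 4.0 * abs(pocket_width - ligand_size)


# Each receptor conformer is reduced to a single pocket width in Angstrom.
conformers = {"xray_apo": 5.0, "md_frame_1": 6.0, "md_frame_2": 6.5}

best, score_value, all_scores = dock_ensemble(6.4, conformers, toy_dock)
print(f"best conformer: {best} (score {score_value:.2f})")
```

The cost grows only linearly with the number of conformers, but the result can be no better than the conformers supplied.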

Multiple Receptor Conformations versus Rotatable Side Chains
One of the main advantages of using multiple structures, rather than a selection of degrees of freedom, to represent protein flexibility is that the flexible region is not limited to a specific small region of the protein. The multiple-structure approach allows the full flexibility of the protein, including the backbone, to be considered without the exponential blow-up in computational cost that would result from including all the degrees of freedom of the protein. On the other hand, a limited number of shapes represents only a small fraction of the conformational space of the receptor.

Ligand Docking (Handle with Care!)
Accuracy
• Ability to discriminate binders from non-binders (scoring)
• Ability to identify the bound conformation (internal energies)
• Ability to identify the binding site (search algorithm)
Efficiency
• The efficiency of conformation searching and pose searching falls as ligand flexibility increases (smaller is better)
Caveats
• Scoring functions have not been tuned for glycans (aromatic stacking)
• Docking functions do not include appropriate internal energies
• Induced fit in the protein is ignored

Ligand Docking (Handle with Care!)
Docking is fast, fun, and cheap. But which pose is the winner?

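Docking Energies Should Distinguish Good from Bad Poses
[Figure: binding energy (positive to negative, zero marked) plotted against RMSD relative to the known 3D structure (better to worse), with the non-binder region indicated. Results from AutoDock 3.0.5.]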
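The RMSD used to judge a pose against a known structure can be computed as in the sketch below. It assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is performed. The coordinates are made up for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom ligand: reference pose and a docked pose shifted 1 A along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

print(f"RMSD = {rmsd(reference, docked):.2f} A")
```

A useful scoring function assigns its most favorable energies to the poses with the lowest RMSD.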

Docking Energies Should Distinguish Good from Bad Poses
[Figure: binding energy (positive to negative, zero marked) plotted against RMSD (better to worse), with the non-binder region indicated. Results from AutoDock (VINA-CARB) with carbohydrate internal energies.]

Inclusion of Glycosidic Energy in AutoDock VINA: AutoDock VINA-Carb
* Averaged over the top 20 poses; flexible glycan docked to a positive control antibody.
