Docking

Docking: modeling of the binding of macromolecules to each other or to small-molecule ligands. Factors that determine specific binding: • Phenomenologically: - shape complementarity (lock and key), the basis for early geometry-based docking algorithms - property complementarity: hydrophobic atoms to hydrophobic atoms, hydrogen-bond donor to hydrogen-bond acceptor, positively charged to negatively charged • Physically: - for a stable complex, the bound conformation is the global free-energy minimum, maximizing favorable and minimizing unfavorable interactions

Docking as a global energy optimization problem • Energy function: a fast approximation, but accurate enough that the global minimum corresponds to the (near-)native conformation. Potentials derived from: - molecular mechanics force fields: physical terms, parameters based on QM and/or experimental physical properties (ECEPP, MMFF etc.) - statistical/phenomenological: 1. ad hoc 2. mean-force: observed statistics of inter-atomic distances + Boltzmann • Search algorithm: must quickly locate the global minimum on a (typically) extremely rugged energy landscape: - geometry-based: rigid-body, possibly followed by local minimization - incremental construction: split into rigid fragments, dock, rebuild from ‘anchors’ - genetic algorithm: ‘chromosomes’ of variables, recombination/mutations, Darwinian evolution - Monte Carlo (+ local minimization)

MCM global optimization procedure
Monte Carlo minimization (MCM):
1. Random step: perturb one of the torsions or the position/orientation of the ligand.
2. Local gradient minimization.
3. Compare the new energy to the previous value: if it improved, accept the new conformation; otherwise apply the Metropolis criterion and accept with probability exp(-∆E/kT).
4. Go back to step 1.
Termination: adaptive heuristics set the optimal MC run length based on ligand size and flexibility.
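The loop above can be illustrated on a toy problem. The following is a minimal sketch, not ICM code: it assumes a one-dimensional rugged energy function in place of ligand torsions and position, and a crude derivative-free descent in place of gradient minimization. The names `toy_energy`, `local_minimize` and `mcm`, and all parameter values, are hypothetical.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1D landscape; global minimum at x = 0 (E = -1)."""
    return 0.05 * x * x - math.cos(3.0 * x)


def local_minimize(x, energy, step=0.01, iters=200):
    """Crude descent standing in for local gradient minimization (assumption)."""
    e = energy(x)
    for _ in range(iters):
        improved = False
        for trial in (x + step, x - step):
            e_trial = energy(trial)
            if e_trial < e:
                x, e, improved = trial, e_trial, True
        if not improved:
            step *= 0.5  # refine once neither neighbour is lower
    return x, e


def mcm(energy, x0, n_steps=500, kT=0.3, max_move=2.0, seed=0):
    """Monte Carlo minimization: random move, local minimization, Metropolis test."""
    rng = random.Random(seed)
    x, e = local_minimize(x0, energy)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Step 1: random perturbation; step 2: local minimization.
        x_new, e_new = local_minimize(x + rng.uniform(-max_move, max_move), energy)
        d_e = e_new - e
        # Step 3: accept improvements, otherwise accept with probability exp(-dE/kT).
        if d_e <= 0.0 or rng.random() < math.exp(-d_e / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = mcm(toy_energy, x0=6.0)
print(f"best x = {best_x:.3f}, best E = {best_e:.3f}")
```

Because the Metropolis test is applied to minimized energies, the walk moves between local minima rather than over the raw landscape, which is what lets it cross barriers that a plain Monte Carlo step of the same size would rarely pass.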

Fast Grid Protein/Flexible Ligand Docking in ICM • Global energy optimization: • ligand position and internal torsions optimized by stochastic Monte-Carlo search in the framework of Internal Coordinates Mechanics (ICM) • - local gradient minimization after each random move • - ligand is continuously flexible • - receptor represented by pre-calculated grid potentials • - energy terms include ligand internal force-field energy and grid receptor interaction potentials

Grid potentials • Continuously differentiable grid potential using spline interpolation for efficient gradient minimization • Terms: - Van der Waals: steric repulsion and dispersion attraction - Electrostatics - Directional (anisotropic) hydrogen bonding - Hydrophobic interaction • Acceleration: ~100-fold faster than explicit receptor. • Implicit minor receptor flexibility: smoothing grid potentials and truncating VW repulsion at a maximum value (EmaxVW) limit the adverse effect of minor steric clashes (‘soft’ docking). Soft potentials also make minimization more efficient. • Figure: VW energy E versus distance d, truncated at EmaxVW.
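To make the idea of a truncated (‘soft’) Van der Waals term concrete, here is a minimal sketch. The slides do not give the functional form of the truncation, so the smooth cap used below, E·Emax/(E + Emax) for E > 0, is an assumption chosen because it stays continuously differentiable at E = 0. The 12-6 parameters `eps` and `r_min` and the cap `e_max` are illustrative values, not ICM parameters.

```python
def lennard_jones(d, eps=0.2, r_min=3.8):
    """Plain 12-6 potential with well depth eps at distance r_min (illustrative values)."""
    ratio = (r_min / d) ** 6
    return eps * (ratio * ratio - 2.0 * ratio)


def soft_cap(e, e_max=4.0):
    """Smoothly limit repulsion to below e_max; attractive energies are unchanged.

    Assumed form: e * e_max / (e + e_max) for e > 0. Its slope at e = 0 is 1,
    so the capped potential joins the uncapped one without a kink.
    """
    if e <= 0.0:
        return e
    return e * e_max / (e + e_max)


for d in (1.0, 2.5, 3.0, 3.8, 5.0):
    hard = lennard_jones(d)
    print(f"d = {d:3.1f}  hard = {hard:12.3f}  soft = {soft_cap(hard):7.3f}")
```

At short distances the hard potential grows by orders of magnitude while the soft one saturates near `e_max`, so a slightly misplaced receptor atom penalizes a pose without excluding it outright.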

Method: Internal Coordinate Mechanics (ICM) • Internal coordinates • Large radius of convergence • Efficient global energy optimization algorithm • Applications: folding, protein modeling, docking, virtual screening • ICM references: - Abagyan, Mazur (1989) - Abagyan et al. (1994) “ICM - a new method for protein modeling..” J. Comp. Chem. 15, 488-506 - Abagyan and Totrov (1994) “Biased Probability Monte Carlo searches …” J. Mol. Biol. 235, 983-1002.

Global optimization procedure (2) Tricks to improve search efficiency: • Conformational stack: low-energy conformations are accumulated, and the trajectory is monitored by comparison with previously found minima. • Multiple start: if the simulation is ‘trapped’ in the vicinity of a certain conformation, or if the energy remains higher than the energy of a number of already found conformations, restart from another initial conformation. • ‘Grid annealing’: first dock into smoothed grid potentials, then into ‘hard’ exact grids. • ‘Reverse’ torsion steps, symmetry, Cartesian relaxation, etc.

Accuracy/Speed of Flexible Ligand Docking • Large benchmarks of 100-300 complexes: • For a ‘clean’ benchmark (high resolution, good X-ray density for both ligand and binding site, no obvious crystallography errors), typically ~80% of ligand/receptor complexes are reproduced within 2Å RMSD • For broader benchmarks: ~70% within 2Å, >75% within 3Å, ~60% within 1.5Å RMSD, and only ~45% within 1Å RMSD. For >85% of complexes, a pose within 2Å is found among the top 10 solutions. From: Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J Med Chem. 2003 Jul 3;46(14):3045-59.

Receptor analysis/issues affecting docking • Quality of the X-ray structure: - low resolution or NMR - missing residues - missing side-chain atoms - high B-factors - clashes • Special features: - covalent binding - rare residue charge state: protonated Asp or Glu (HIV protease), deprotonated Cys or Tyr - coupled ligand/ion binding (kinases: ATP/Mg) - tightly bound water molecules (coordinated by 2-3 hydrogen bonds) • ‘Druggability’: - binding pocket identification: PocketFinder

Receptor preparation • Conversion of pdb structure to ICM: • - hydrogens and missing heavy atoms added • - polar hydrogens optimized (possibly including water) • - atom types and partial charges assigned • Specific cases may involve: • regularization/refinement in cases of poor structure quality: idealized amino acid covalent geometry imposed, energy annealing, possibly in the presence of a ligand (site ‘molding’) • sampling of alternative side-chain conformations, loops • homology modelling

Potential pitfalls • Difficult cases: - Highly flexible ligands (more than 10-15 torsions) - Shallow pockets - Water-mediated binding - Covalent binding - Poor quality of the receptor structure (low resolution X-ray, NMR, homology models) - Receptor flexibility • Remedies: - longer simulations - include water - constrained docking - explicit receptor docking - multiple receptor structures

Receptor flexibility • Approaches: • Explicit continuously flexible receptor: - in principle, more comprehensive - slow even for side-chains, very slow for backbone movements - prone to artefacts: dozens of new variables, new local minima - large backbone movements are still generally beyond reach • Ensemble of pre-defined receptor conformations, either from multiple experimental structures or from simulations such as side-chain or loop sampling or homology modelling: - can be fast - any type of movement can be handled - success mostly defined by the quality of the ensemble

Hybrid grid/explicit protocol: SCARE • ‘SCARE’ (Bottegoni et al., JCAMD 2008, “A new method for ligand docking to flexible receptors by dual alanine scanning and refinement”) • Observation: typically, the steric clashes resolved by induced fit involve 1-2 side-chains • Grid docking is performed to multiple versions of the binding site, generated by systematic replacement of various pairs of side-chains by alanines • The top-scoring ligand pose from each grid simulation is refined with an explicit flexible receptor. The best-scoring refined conformation is selected as the final answer • On a benchmark of 30 cross-docking pairs, the top-ranking near-native solution was found in 80% of cases. The protocol takes ~2 h of CPU time

Receptor ensemble docking in ICM: 4D grids • Direct incorporation of discrete receptor flexibility in grid-based simulation: • Receptor conformations provided by user in a conformational stack • Displaceable bound water molecules can be included • Potentials are pre-calculated for each receptor conformation • Stored as ‘4D’ grids - 4th dimension is the receptor conformation • During MC simulation, additional type of stochastic step is included: the grid 4D layer switch • Benchmarking recently published: Bottegoni et al J Med Chem 2009 Four-dimensional docking: a fast and accurate account of discrete receptor flexibility in ligand docking. • 99 therapeutically relevant proteins and 300 diverse ligands. 77% complexes correctly reproduced. On average 4-fold faster than independent grid docking into all available receptor conformations.

Virtual Ligand Screening • Virtual ligand screening (VLS) algorithms make it possible to identify potential novel ligands from databases in silico. • Search large databases (100K-1M or more compounds) and select subsets enriched with hits using: - 2D pharmacophore - similarity measures (fingerprints, Tanimoto) - 3D pharmacophore - receptor structure: no prior ligand knowledge necessary; search is not biased to known chemistry • Receptor structure based VLS: - dock each ligand in the DB to the receptor structure - evaluate the quality of fit in the docked structures to select potential binders. • Figure: results of ICM docking of a virtual library of 200,000 compounds into an FGFR pocket

Measure of VLS efficiency: enrichment factors • Full screening collection of Ntotal compounds, containing Atotal active compounds • VLS is used to select Nsel compounds, containing Asel actives • Typically Nsel << Ntotal, but in real life also Asel < Nsel (false positives) and Asel < Atotal (false negatives) • Enrichment factor: EF = (Asel/Nsel)/(Atotal/Ntotal) • Choice of threshold cutoff: plot of Asel (up to 100% of actives) versus Nsel
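The enrichment factor defined above is simple to compute. The sketch below is a direct transcription of the formula; the function name and the example counts are made up for illustration and do not come from any benchmark in these slides.

```python
def enrichment_factor(a_sel, n_sel, a_total, n_total):
    """EF = (Asel / Nsel) / (Atotal / Ntotal)."""
    if n_sel <= 0 or n_total <= 0 or a_total <= 0:
        raise ValueError("n_sel, n_total and a_total must be positive")
    if a_sel > n_sel or a_sel > a_total or n_sel > n_total:
        raise ValueError("selected counts cannot exceed the totals they are drawn from")
    return (a_sel / n_sel) / (a_total / n_total)


# Hypothetical screen: 100 actives hidden in 100,000 compounds,
# top 1% (1,000 compounds) selected, 40 of the actives recovered.
ef = enrichment_factor(a_sel=40, n_sel=1000, a_total=100, n_total=100_000)
print(f"EF = {ef:.1f}")

# Upper bound at this cutoff: all 100 actives inside the selected 1,000.
ef_max = enrichment_factor(a_sel=100, n_sel=1000, a_total=100, n_total=100_000)
print(f"maximum possible EF at 1% = {ef_max:.1f}")
```

The second call shows why enrichment factors should be quoted together with the selection cutoff: the achievable maximum is Ntotal/Nsel when the selection can hold all actives, so values at 1% and at 10% of the database are not directly comparable.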

Receptor structure based VLS: Docking+Scoring Docking: Find a putative docked conformation for each compound, native-like for the binding ligands. * Efficient conformational search routine * Docking potential: - must be fast - has to rank top the native-like conformation among many different docking conformations of the same ligand Scoring: * One or few conformations per compound are evaluated * Potential (screening score) - must rank the binding ligands above large number of chemically diverse inactive compounds Binding energy?

Binding energy • Ligand/receptor binding energy in solution: - Van der Waals: favorable, but partially compensated by solvent - Electrostatics: mostly compensated by solvation, only becomes favorable for charged ligands (salt bridges) - Hydrogen bonds: mostly compensated by solvation, but a major determinant of specificity - Hydrophobic: often provides most of the affinity - Strain: unfavorable - Entropy loss: unfavorable, limits affinity for highly flexible ligands • Examples: - Fucose binding protein/fucose (1abf): Kd = 4 µM, 8 HB, ΔASAhp = 128 Å² - Antibody/progesterone (1dbb): Kd = 1 nM, 2 HB, ΔASAhp = 390 Å²

Binding energy calculations • Ligand/receptor binding energy in solution is a highly complex concept: E(van der Waals attraction) + E(steric repulsion) + E(electrostatic interaction) + E(electrostatic desolvation) + E(h-bond) + E(acceptor/donor desolvation) + E(hydrophobic) + E(strain) + E(entropy loss) • Multiple opposing (favorable and unfavorable) contributions largely compensate each other. Accumulation of errors: (-100±5) + (90±5) = -10±7. • Solvent (water) effects: hydrophobicity, electrostatic desolvation, solute-solvent hydrogen bonds. Explicit water is too computationally expensive and not always more accurate than implicit methods (continuum dielectric, surface tension). • Some contributions are extremely sensitive to the accuracy of the geometry. • Binding energy predictions remain, in general, only qualitatively accurate. Special model fitting for a specific receptor and/or chemotype of ligands is necessary to achieve quantitative agreement with experiment.
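The error-accumulation example on this slide follows from standard propagation of independent errors, where uncertainties add in quadrature. A minimal sketch, assuming the errors of the individual terms are uncorrelated (the function name is hypothetical):

```python
import math


def sum_with_error(terms):
    """Sum (value, error) pairs; independent errors combine in quadrature."""
    total = sum(value for value, _ in terms)
    error = math.sqrt(sum(err * err for _, err in terms))
    return total, error


total, error = sum_with_error([(-100.0, 5.0), (90.0, 5.0)])
print(f"{total:.0f} ± {error:.1f}")
```

The two contributions are each known to about 5%, but their sum of -10 carries an uncertainty of about 7, that is roughly 70% of the result. This is why a sum of large, opposing terms can be far less reliable than any one of its parts.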

Energy function optimization • In practice, Edocking, Escoring and EbindingEnergy are different approximations: - EbindingEnergy estimates the binding energy; it is typically slow and often does not work unless tuned for a specific system - Edocking and Escoring are not accurate estimates of binding energy, but Edocking discriminates the correct bound pose very fast, and Escoring discriminates active ligands reasonably fast • Traditional approach: fitting to experimental binding energy values, with no non-binders in the training set • Goal of VLS: discrimination between binders and non-binders • ICM Score: optimize Escoring performance for active ligand discrimination

ICM Scoring function • Components are physical terms: 1. Internal force-field energy of the ligand 2. Conformational entropy loss of the ligand 3. Receptor-ligand hydrogen-bond interaction 4. Solvation electrostatic energy change 5. Hydrogen-bond donor/acceptor desolvation 6. Hydrophobic energy • Due to imperfect term evaluation (errors in geometry, charges etc.), the components have to be adjusted/weighted to obtain the best performance. • Evaluation of discrimination potential performance on a benchmark (‘score the score’): five weighting factors optimized
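The idea of weighting score components and then ‘scoring the score’ can be sketched as follows. This is an illustration only: the component values are invented, the discrimination measure (fraction of active/decoy pairs ranked correctly) is an assumed stand-in for whatever measure was actually used, and a lower score is assumed to be better. No optimizer is included; the sketch only compares two hand-picked weight sets.

```python
def weighted_score(components, weights):
    """Linear combination of the per-term energies of one docked compound."""
    return sum(w * c for w, c in zip(weights, components))


def pairwise_discrimination(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs in which the active scores lower (better)."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # ties count as half
    return wins / (len(active_scores) * len(decoy_scores))


# Invented term values, ordered as the six components listed above.
actives = [(1.0, 2.0, -4.0, 1.0, 2.0, -6.0), (0.5, 3.0, -2.0, 0.5, 1.0, -8.0)]
decoys = [(1.0, 1.0, -8.0, 1.0, 2.0, -2.0), (2.0, 4.0, -1.0, 0.5, 0.5, -3.0)]

for weights in [(1, 1, 1, 1, 1, 1), (1, 1, 0.5, 1, 1, 1)]:
    a = [weighted_score(c, weights) for c in actives]
    d = [weighted_score(c, weights) for c in decoys]
    print(weights, pairwise_discrimination(a, d))
```

In this toy set one decoy scores well only because of an overestimated hydrogen-bond term, so halving that weight separates actives from decoys. A weight optimization would search for such adjustments systematically over a large benchmark.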

Training VLS scoring function • Benchmark set generation: - Diverse set of ligands and receptors - Structures generated by the docking procedure (not X-ray) - Artificial non-binding complexes are included - Structures of 25 receptors and 75 ligands extracted from high-resolution (<2Å) PDB structures of complexes - 10000 random ligands from ACD added - Exhaustive cross-docking: all ligands to all receptors • The score components are pre-calculated for all 25*10075 = 251875 putative complexes • Multiple runs of “Amoeba” simplex minimization to ensure convergence • Result: recognition significantly improved

Discrimination of active ligands

Virtual Database Screening Efficiency • Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J Med Chem. 2003 Jul 3;46(14):3045-59. • 19 structures for 10 nuclear hormone receptors • One structure used (glucocorticoid receptor) was a homology model • 5000 random molecules from CDL screening collection • A library of 78 known NR ligands, 3 to 8 per receptor

Virtual Database Screening Efficiency • Chen, H, Lyne, PD, Giordanetto, F, Lovell, T and Li, J; On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors. J. Chem. Inf. Model. (2005) • 12 protein targets of therapeutic interest, 17 to 622 active ligands per target, 20000 random compounds • Enrichment factors at 1% of database subsetting

Ligand pre-processing • 2D to 3D conversion, type and charge assignment: - using the MMFF force field • Pre-selection of ligands in the database for drug-likeness according to Lipinski-like criteria: - size/weight - number of h-bond donors and acceptors - number of flexible torsions (Veber) • Protonation/charge states for ligands: charge carboxyls and amino groups, possibly generate tautomers. Important, especially for correct scoring. • New: pKa prediction will allow automatic protonation of non-trivial chargeable groups.

Post-processing VLS hit lists • The top-scoring ~1% is still 1000-5000 compounds. Further tightening of the score cutoff typically does not improve hit rates. What is the optimal way to select 100-500 for experimental validation? • Rationale for further filtering: - improve hit rate - diversify hits - ensure desired effect (e.g. inhibition) - achieve specificity with respect to homologous receptors • Consensus scoring: while lowering the primary score cutoff beyond 1% typically does not improve enrichment, using a secondary scoring function may further enhance enrichment. • Selection according to additional criteria, such as: - formation of specific h-bonds - contact with certain parts of the receptor

Post-processing VLS hit lists: chem. clustering Chemical clustering for improved diversity and hit rates: -top scoring list often dominated by one/few compound families -to diversify final selection, cluster compounds by chemical similarity, select best scoring compounds from each cluster Once activity is confirmed for a chemotype, other compounds from the same cluster can be tested.
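The selection step described here, taking the best-scoring compound from each chemical cluster, might look like the following minimal sketch. It assumes cluster labels have already been assigned by some chemical-similarity clustering (not shown) and that a lower score is better; the compound identifiers, cluster labels and scores are invented.

```python
def best_per_cluster(hits):
    """Return the best-scoring (compound_id, score) from each cluster.

    hits: iterable of (compound_id, cluster_id, score); lower score is better
    (an assumption of this sketch).
    """
    best = {}
    for compound_id, cluster_id, score in hits:
        if cluster_id not in best or score < best[cluster_id][1]:
            best[cluster_id] = (compound_id, score)
    # Report cluster representatives from best to worst score.
    return sorted(best.values(), key=lambda item: item[1])


# Invented hit list: cluster "A" dominates the top of the ranking.
hits = [
    ("cmpd1", "A", -42.0),
    ("cmpd2", "A", -41.5),
    ("cmpd3", "A", -40.9),
    ("cmpd4", "B", -38.0),
    ("cmpd5", "C", -36.5),
    ("cmpd6", "B", -39.2),
]
for compound_id, score in best_per_cluster(hits):
    print(compound_id, score)
```

Picking by raw score alone would fill the first three slots with cluster "A"; picking one representative per cluster yields three different chemotypes, and the remaining members of a cluster can be tested once its representative is confirmed.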

Lead optimization: screening focused combinatorial library • Starting point: initial lead compound • Identify the scaffold and variable substituents (R1, R2 etc.) • Create substituent lists for each Ri position, e.g. R1 = H, CH3, Ph,…; R2 = H, CH3, Ph,…; R3 = H, CH3, Ph,… • Assemble a Markush virtual combinatorial library • VLS the fully enumerated Markush combinatorial library • Alternative, two-step procedure: 1. VLS each Ri with constant small (H?) substituents at the other positions, and select the best subset for each Ri. 2. Assemble the Markush library; VLS the full enumeration of this smaller combinatorial library. • For large Ri lists (>~1000 compounds), a dramatically larger virtual chemical space can be explored by the two-step procedure.
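The advantage of the two-step procedure is a matter of counting docking runs. The sketch below compares the two strategies; the list sizes and the number of substituents kept per position are illustrative assumptions, not values from the slides.

```python
from math import prod


def full_enumeration_cost(list_sizes):
    """Docking runs needed to screen every combination of substituents."""
    return prod(list_sizes)


def two_step_cost(list_sizes, keep):
    """Docking runs for the two-step procedure.

    Step 1 docks each substituent of each list once (other positions held small);
    step 2 enumerates only the `keep` best substituents per position.
    """
    step1 = sum(list_sizes)
    step2 = prod(min(keep, n) for n in list_sizes)
    return step1 + step2


sizes = [1000, 1000, 1000]  # hypothetical R1, R2, R3 list sizes
print("full enumeration:", full_enumeration_cost(sizes))
print("two-step, keep 30 per position:", two_step_cost(sizes, keep=30))
```

With these numbers the full library requires 10^9 docking runs, while the two-step route requires 30,000. The saving rests on the assumption that substituents ranked well individually also combine well, which step 2 then checks explicitly.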

Protein - Protein Docking

Protein-Protein Docking in ICM • First demonstration: global stochastic free-energy optimization with pseudo-Brownian moves and Biased Probability Monte Carlo (JCC, 1994) • Explicit all-atom docking and flexible side-chain refinement: lysozyme-antibody (Nature SB, 1994); beta-lactamase/inhibitor docking challenge (1995, 96) • Grid docking and refinement: 24 known protein-protein complexes (Protein Sci. 2002) • Global grid docking and refinement: CAPRI docking competition (ongoing, 2003 Proteins)

A faster model: Atoms to Grids • Atoms-to-Atoms docking: • Pros: symmetrical, explicit flexibility can be introduced for both molecules • Cons: extremely time-consuming, scales poorly with the size • Atoms-to-Grids docking • One molecule is static (receptor) and is represented by grid potentials • Pros: energy calculation time does not depend on the receptor size; induced fit can be partially approximated by soft grids • Cons: non-symmetrical, some energy terms have to be adapted/simplified for grid representation.

Multiple start MC global optimization • Stochastic search is good for local sampling, but diffusion gets slow on a larger scale (d ~ √t) • Pre-generate starting points spread evenly around the receptor and the ligand (Fig a) • Match each starting point on the receptor (Nr) with each starting point on the ligand (Nl) (Fig b) • Six rotations around the match axis, for a total of 6 Nr Nl starting configurations

Optimized scoring of docked solutions • Scoring function including the grid terms and three ASA-based solvation components - polar, aliphatic and aromatic • Term contributions weighted: • E=Evw+Eel+Ehb+Epol+Ear+Eal • For each of the 24 complexes in the benchmark, 6000-12000 docked conformations • Factors - optimized for best ranking of near-native solutions

ICM docking in CAPRI Top prediction used Molsoft’s ICM Protein-Protein Docking procedure Best result in the worldwide Critical Assessment of PRedicted Interactions (CAPRI). Proteins July 2003

Best Results for 3 targets • A: Target 3, hemagglutinin / Fab • B: Target 6, α-amylase / VHH • C: Target 7, TCR-β / SpeA • Improvement of the best rigid-body docking solution for Target 6 (in gray) after refinement (in red) • Figure labels: X-ray structure, predicted ligand

CAPRI Round 2 and 3 results • Good models for 8 out of 9 targets • One failure: T9, large hinge-bending movements • Successfully used new scoring function for T14, T18 & T19 • 64-71% of native contacts • 0.4-1Å interface RMSD • For T14: RMSD 0.6Å, rank 1 by energy • T19: antibody - prion. Used no CDR bias + NMR model for prion.
