Developing & Benchmarking Large-scale Docking (LSD) Pipeline

Niu Huang, 02/17/2004

LSD pipeline
• Model Building (ModBase/PDB)
• Binding Site Refinement (PLOP/Modeller)
• LigBase
• Post-docking Refinement (PLOP)
• Ligand Docking (DOCK3.5.5.4)
• Central Database System

Where are we now?

Docking pipeline
• Target Protein
• SPHGEN
• GRID
• DATABASE
• SCORING
• DATA ANALYSIS

Test case (from J. Med. Chem., McGovern & Shoichet, 2003): TS, DHFR, GART, Thrombin, PNP, SAHH, AChE, AR, PARP

Expert vs automated docking
• Enrichment plots comparing the performance of an expert (dark blue), the automated procedure (magenta, referred to as Test10), and random enrichment (black).
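To make the enrichment plots concrete, here is a minimal sketch of how such a curve can be computed from a score-ranked database. The function name and the toy ranking are illustrative assumptions, not the scripts used for the plots in this talk. The random-enrichment line is simply the diagonal, where the percentage of ligands found equals the percentage of the database screened.

```python
def enrichment_curve(ranked_is_ligand):
    """Return (pct_database_screened, pct_ligands_found) points.

    ranked_is_ligand: booleans ordered from best to worst docking score,
    True where the compound is a known ligand (hypothetical input layout).
    """
    total = len(ranked_is_ligand)
    n_ligands = sum(bool(flag) for flag in ranked_is_ligand)
    if total == 0 or n_ligands == 0:
        raise ValueError("need a non-empty ranking with at least one ligand")

    points = [(0.0, 0.0)]
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        if is_ligand:
            found += 1
        points.append((100.0 * rank / total, 100.0 * found / n_ligands))
    return points


# Toy ranking (made-up data): 10 compounds, 3 known ligands.
toy_ranking = [True, False, True, False, False,
               False, False, True, False, False]
toy_curve = enrichment_curve(toy_ranking)

# Random enrichment for comparison: y equals x at every point.
random_curve = [(x, x) for x, _ in toy_curve]
```

A curve that rises above the diagonal early indicates that known ligands are ranked ahead of decoys.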

Approach to “expert docking” limit? Missing atoms

Case analysis (DHFR)

DHFR cont. 1
110   64*   43**   128***   (29)
0.3   3.7*   2.9**   0.3***   (2.0)

DHFR cont. 2
Using a focused set of spheres appears to be essential for reducing the noise caused by an inaccurate scoring function that favors the wrong docking poses. Restricting the spheres to those that fill the hot-spot region alleviates the problem.
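One simple way to focus a sphere set is a distance filter around the hot-spot region. The sketch below is an assumption for illustration only: it keeps sphere centers that lie within a cutoff of a set of reference points (for example, atoms of a crystallographic ligand). It is not the SPHGEN procedure or the scheme used in these tests, and the function name, data layout and 2.0 Å cutoff are all hypothetical.

```python
import math


def focus_spheres(sphere_centers, hot_spot_points, cutoff=2.0):
    """Keep sphere centers within `cutoff` (in Å) of any hot-spot point.

    sphere_centers, hot_spot_points: lists of (x, y, z) tuples.
    The cutoff value is an arbitrary illustrative default.
    """
    kept = []
    for center in sphere_centers:
        # math.dist gives the Euclidean distance between two points.
        if any(math.dist(center, point) <= cutoff for point in hot_spot_points):
            kept.append(center)
    return kept


# Made-up coordinates: two spheres near the hot spot, one far away.
spheres = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
hot_spot = [(0.5, 0.0, 0.0)]
focused = focus_spheres(spheres, hot_spot)
```

Here `focused` retains only the two spheres close to the reference point, so matching would be directed toward that region.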

DHFR cont. 3
Figure panels: Test1 (docked ligands, top-scored MDDR decoys) and Test10 (docked ligands, top-scored MDDR decoys)

Case analysis (Aldose Reductase)
The conformational flexibility of the binding site, as implicated by crystal structures*, appears to contribute to the poor enrichment. Other factors may also play a role, such as the lack of a protein desolvation penalty in the scoring function.
* Structure, 1997, 5:601-612

AR cont. 1
• The correlation coefficients between electrostatic energy and total energy, and between vdW energy and total energy, are 0.74 and 0.66, respectively, for docked ligands, and 0.62 and -0.33 for the top 500 docked decoys. The electrostatic interaction appears to be too favorable and dominates the interaction energy score for docked decoys, which might be remedied by including a protein desolvation penalty.
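The energy component analysis above can be sketched as a Pearson correlation between each energy term and the total score. The dictionary keys (`elec`, `vdw`, `total`) and the toy numbers are assumptions made for illustration; they are not DOCK output fields or data from the AR runs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def component_correlations(poses):
    """Correlate each energy component with the total score.

    poses: list of dicts with hypothetical keys 'elec', 'vdw', 'total'.
    """
    total = [p["total"] for p in poses]
    return {
        "elec_vs_total": pearson([p["elec"] for p in poses], total),
        "vdw_vs_total": pearson([p["vdw"] for p in poses], total),
    }


# Made-up energies for three docked poses.
toy_poses = [
    {"elec": -10.0, "vdw": -5.0, "total": -15.0},
    {"elec": -20.0, "vdw": -4.0, "total": -24.0},
    {"elec": -30.0, "vdw": -6.0, "total": -36.0},
]
toy_result = component_correlations(toy_poses)
```

Running this separately on the docked ligands and on the top-scored decoys gives the kind of comparison reported on the slide: a decoy set whose total score tracks the electrostatic term far more closely than the vdW term points to electrostatics dominating the ranking.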

Case analysis (PARP)

PARP cont. 1
Figure panels: docked ligands; top-scored MDDR decoys

Case analysis (AChE)
• Poor enrichment (5.0% of the database must be screened to find 25% of the known ligands) appears to be caused by the large number of improbable docking poses. The AChE binding cavity is large, contains many waters, and has more than one clear binding region. No direct hydrogen bonds between the ligand and the protein have been observed, only water-bridged hydrogen bonds, which makes this a particularly hard case for docking (Jacobsson, JMC, 2004).
• Can we do something to improve our docking for such cases?
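The "percent of database to find 25% of known ligands" figure can be computed directly from a score-ranked list. The sketch below is illustrative: the function name, input layout and toy ranking are assumptions, and rounding the required ligand count up to a whole compound is one possible convention, not necessarily the one used for the numbers on this slide.

```python
import math


def pct_db_to_recover(ranked_is_ligand, fraction=0.25):
    """Percent of the ranked database needed to find `fraction` of ligands.

    ranked_is_ligand: booleans ordered from best to worst docking score,
    True where the compound is a known ligand (hypothetical input layout).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = len(ranked_is_ligand)
    n_ligands = sum(bool(flag) for flag in ranked_is_ligand)
    if n_ligands == 0:
        raise ValueError("ranking contains no known ligands")

    # Round up to a whole ligand; the epsilon guards against float noise.
    needed = max(1, math.ceil(fraction * n_ligands - 1e-9))
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        if is_ligand:
            found += 1
            if found >= needed:
                return 100.0 * rank / total
    raise AssertionError("unreachable: needed never exceeds n_ligands")


# Made-up ranking: 20 compounds with known ligands at ranks 2, 5, 11 and 20.
toy_ranking = [rank in (2, 5, 11, 20) for rank in range(1, 21)]
quarter = pct_db_to_recover(toy_ranking, 0.25)
half = pct_db_to_recover(toy_ranking, 0.50)
```

In this toy ranking the first of the four ligands appears at rank 2 of 20, so `quarter` is 10.0; smaller values mean better enrichment.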

Case analysis (Thrombin)
Multiple binding sub-sites? Is the result related to how the dockable database is generated and how the spheres are matched?

Preliminary Conclusion
• A fully automated docking procedure with a consistent parameter set for grid generation, docking and scoring appears to perform well across all the tested systems.
• Cofactors, iron and structural waters involved in ligand binding need to be carefully inspected, as do the protonation states of amino acid residues in the binding site.
• Larger binding pockets require more extensive sampling (INDOCK.3), as validated by the DHFR, TS, thrombin and GART test sets.
• Docking spheres and DelPhi spheres can be generated using different schemes. A focused set of matching spheres was shown to be critical for systems such as DHFR, TS and GART, which indicates that hot-spot information in the binding pocket will be important for directing docking.
• Careful interpretation of docking results (energy component analysis) should be employed routinely to identify possible sources of error.

High quality test sets
Enrichment data sets (known ligands and decoys datasets)
• Susan test set
• Enolase test set
• NCTR ER data set: 232 diverse compounds covering a 10^6-fold range in a validated ER competitive binding assay; NCTR AR data set: 202 diverse compounds (Tong et al., 2001)
• McMaster DHFR data set (http://hts.mcmaster.ca)
• Compumine ERalpha, MMP3, AChE and fXa data sets (http://www.compumine.com/research/scoring.html)
Docking and scoring test sets (experimental structures and binding affinities)
• CCDC/Astex validation test set: 308 crystal complexes (http://ccdc.cam.ac.uk)
• X-CScore dock set: 100 crystal complexes and binding affinities (Wang et al., 2003)

Suggestion
• What is the first, and possibly the second, major factor that, if fixed, would improve the enrichment?
• For each improvement that could be made, give your estimate of what should be done, how much effort it would take, and how likely it is to help.
• Look closely at the active site residues (ionization and protonation states), and use the top-scoring decoy compounds to identify the residues that contribute to overestimation of the docking energy.

Acknowledgement • John @ Shoichet • CK @ Jacobson • Ursula & Eswar @ Sali
