Drug Design / drug discovery

Jerome Baudry
Assistant Professor, BCMB, UT/ORNL Center for Molecular Biophysics
Two previous incarnations: research faculty at UIUC; research scientist at Transtech Pharma, Inc.

What’s a drug?
• A substance that treats or cures a disease.
• A small molecule that interacts with a target (often a protein involved in the disease process), acting as an activator or inhibitor.
Drug discovery: the process of finding such a small molecule, using a combination of approaches.
Drug discovery or drug design?
• In principle, “design” is more rational and targeted, and “discovery” is more serendipitous.
• In practice, design and discovery share a lot and are roughly synonymous in a pharmaceutical context.
5% of the human genome is “druggable” (Hopkins, Groom, Nat. Rev. Drug Discov. 2002, 1(9):727-30).

Drug discovery market
Gigantic economic importance:
• 10 years and $200 million to $1,900 million to develop a drug
• 25 new molecules per year
• > $340 billion (http://en.wikipedia.org/wiki/List_of_pharmaceutical_companies)
Intense scientific activity: a very interdisciplinary approach

The drug discovery and design workflow:
• Target identification
• Discovery and design (hit/lead/optimisation)
• Biology: assay (binding/activity; in vitro / in vivo)
• Chemistry: synthesis
• Drug development: pharmacology / testing

The long and winding road to drug discovery
Computational chemistry / molecular modeling is useful across the pipeline, but with very different techniques at each stage.
Aim for success, but if not: fail early, fail cheap.

Two pathways to drug discovery / drug design
• Ligand-based: don’t know the receptor, know ligands
  - Statistical analysis of which group(s) are important for biological activity
• Structure-based: know the receptor, don’t know ligands
  - What will be happy in there?
  - Protein/ligand interactions: structure/biophysics, docking

Structure-based approaches
Use knowledge of the structure to find something that 1) binds, and 2) has the desired biological activity.
• Structure modeling (homology / experimental: X-ray, NMR, neutron): get a structure
• High-throughput docking/screening: get a “hit” (anything at all)
• Focused library docking, fragment-based growth
• ‘Individual’ molecule simulations

Structure-based library screening
What do we need?
1) Compound libraries
2) Protein target
3) Binding site in the protein
4) Docking: generate many different possible conformations of the compounds in the binding site
5) Scoring: evaluate the strength of the protein/ligand interactions (score)
6) Selection: select preferred ligands to propose a prioritized list of compounds for experimental screening

Structure-based approaches: structure modeling
Best-case scenario: a high-quality experimental structure exists.
PDB: http://www.rcsb.org/pdb/
- collection of experimental structures (49,295), ~18,000 non-redundant sequences
- X-ray & NMR
- nucleic acids, proteins, carbohydrates

Structure-based approaches: structure modeling
~50,000 protein structures in the PDB: is that a lot?
• That’s ~1% of the 5.5 million protein sequences in Swiss-Prot (http://www.ebi.ac.uk/swissprot/sptr_stats/index.html)
• and, counting the ~18,000 non-redundant sequences, < ~0.00007% of Earth’s proteins (5E6 organisms, 5K genes/genome: a low-end estimate)
Must do for a new pharmaceutical target: protein sequence (…AQRTEVYTYRRS…) → protein structure (homology, ab initio folding…)
Structure-based drug discovery = “post-genomics challenge”: structural biology, functional genomics, chemical biology…
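The two percentages on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the first ratio uses the ~50,000 PDB structures and the second uses the ~18,000 non-redundant sequences quoted on the PDB slide, since that combination reproduces both figures; the variable names are illustrative only.

```python
# Figures quoted on the slides; all are approximate.
pdb_structures = 50_000        # PDB entries
pdb_nonredundant = 18_000      # non-redundant sequences in the PDB
known_sequences = 5_500_000    # protein sequences quoted for the sequence database

# Low-end estimate of Earth's proteins: organisms x genes per genome.
organisms = 5e6
genes_per_genome = 5e3
earth_proteins = organisms * genes_per_genome


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


seq_coverage = percent(pdb_structures, known_sequences)
earth_coverage = percent(pdb_nonredundant, earth_proteins)

print(f"Share of known sequences with a structure: {seq_coverage:.2f}%")
print(f"Share of Earth's proteins with a structure: {earth_coverage:.5f}%")
```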

Structure-based approaches: structure modeling
If no experimental structure is available: work on getting one, and in the meantime:
Homology modeling: use the structures of close (sequence-wise) proteins to build, by analogy, a model of the new protein.
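“Close (sequence-wise)” is usually quantified as percent sequence identity between the target and a candidate template. A minimal sketch, assuming the two sequences are already aligned (gaps written as "-") and that identity is counted over the columns where neither sequence has a gap; other conventions for the denominator exist. The template fragment is invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # gap column: not counted
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Target fragment from the slide; the template fragment is hypothetical.
target = "AQRTEVYTYRRS"
template = "AQKTEV-TYKRS"
print(f"{percent_identity(target, template):.1f}% identity")
```

The alignment itself is assumed to come from a dedicated alignment tool; this function only scores an alignment that already exists.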

Structure-based approaches: compound selection
• Databases of compounds
  - vendors
  - literature
  - corporate/laboratory
  - virtual compounds
• A priori anything, but we can be smarter than that
• Library designed against the protein target
  - based on hits from previous database screening
From more exploratory to more focused (figure: scaffold with R groups R1, R2, R3, R4).
Millions of compounds’ structures are available from public databases. Major NIH effort to fund & develop libraries:
http://nihroadmap.nih.gov/molecularlibraries/
http://blaster.docking.org/zinc/

Structure-based approaches: binding site
Locate cavities in the protein.
• When the site is not known: eraser/flooding techniques to find the binding site (3D) (figure labels: outside, inside, deleted)
• Or… make your life easier and build the site around a co-crystallized ligand, if available…

Structure-based approaches: docking
Most time-consuming part (by far). (Figure labels: YES, NO, OK, BETTER.)
High-throughput or low-throughput?
• fast (initial screening)
• accurate (on the best compounds from the initial screening)
Choices are based on the desired throughput: from 10 seconds to 10 minutes per compound.
For a 650,000-compound library on 10 processors, that is from about a week (650,000 × 10 s / 10 ≈ 7.5 days) to more than a year (650,000 × 10 min / 10 ≈ 450 days).
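A back-of-the-envelope estimate of screening time from the per-compound cost, the library size and the number of processors. This is a sketch that assumes perfect parallel scaling and no overhead (file I/O, queueing, failed runs), so real campaigns take longer; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def screening_days(n_compounds, seconds_per_compound, n_processors):
    """Wall-clock days for a docking run, assuming ideal parallel scaling."""
    total_seconds = n_compounds * seconds_per_compound / n_processors
    return total_seconds / SECONDS_PER_DAY


fast = screening_days(650_000, 10, 10)           # 10 s per compound
accurate = screening_days(650_000, 10 * 60, 10)  # 10 min per compound
print(f"fast: {fast:.1f} days, accurate: {accurate:.0f} days")
```

The 60-fold gap between the two settings is the reason for the two-stage strategy: screen everything quickly, then spend the accurate protocol only on the best compounds.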

Structure-based approaches: scoring
Scoring functions quantify the energy of protein/ligand interactions, such as:
• hydrogen bonds
• electrostatics
• van der Waals
• hydrophobic
• π/π
• etc.
Several scoring functions exist, more or less specialized, more or less fast, etc.

Scoring functions:
• Force-field based (CHARMM, AMBER, etc.). MMFF is a very popular one because of its “modular parametrisation”: it is easy to derive parameters from functional groups, and it is well adapted to organic molecules.
  - Physically ‘accurate’ but slow; parametrisation issues.
• Empirical: count the number of interactions and assign a score based on the number of occurrences, e.g.:
  - H-bonds, ionic interactions (easy because very directional and well quantified)
  - Hydrophobic interactions (more difficult to assess and quantify)
  - Number of rotatable bonds frozen (linked to the entropic cost of binding, quite difficult to estimate)
• Knowledge-based: observe known protein/ligand structures, and favor interactions and geometries that are seen often. Idea: a direct link to free energy, because this is the “real life” distribution (potential of mean force).
  - But: based on a small number of entries.
• Intense competition: “my scoring function is better than yours”
Future: force-field based, or even QM-based. Different approaches depending on size.
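To make the “empirical” family concrete: such a function is essentially a weighted sum of interaction counts. The sketch below is illustrative only; the weights are invented, not taken from any published scoring function, and real functions fit them against measured binding affinities.

```python
from dataclasses import dataclass


@dataclass
class InteractionCounts:
    """Interactions counted for one docked pose."""
    h_bonds: int
    ionic: int
    hydrophobic_contacts: int
    frozen_rotatable_bonds: int


# Hypothetical weights in arbitrary units; more negative = more favourable.
# The positive weight penalises the entropic cost of freezing rotatable bonds.
WEIGHTS = {
    "h_bonds": -1.0,
    "ionic": -1.5,
    "hydrophobic_contacts": -0.3,
    "frozen_rotatable_bonds": 0.4,
}


def empirical_score(counts: InteractionCounts) -> float:
    """Weighted sum of interaction counts (lower is better)."""
    return (
        WEIGHTS["h_bonds"] * counts.h_bonds
        + WEIGHTS["ionic"] * counts.ionic
        + WEIGHTS["hydrophobic_contacts"] * counts.hydrophobic_contacts
        + WEIGHTS["frozen_rotatable_bonds"] * counts.frozen_rotatable_bonds
    )


pose = InteractionCounts(h_bonds=3, ionic=1, hydrophobic_contacts=5,
                         frozen_rotatable_bonds=4)
print(f"score = {empirical_score(pose):.2f}")
```

Counting the interactions (deciding what qualifies as an H-bond or a hydrophobic contact) is the hard part and is assumed to be done upstream.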

Structure-based approaches: scoring
Often, consensus scoring: choose the few molecules that are ranked consistently well by many scoring functions.
1,000,000 molecules, 30 actives → 1,000 selected, 5 actives
Enrichment factor = (5/1,000) / (30/1,000,000) ≈ 167: HUGE SUCCESS
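One simple way to implement consensus scoring is a vote: a compound is kept if enough scoring functions place it in their own top N. This is a sketch of that one scheme (rank-sum or rank-average schemes are also common); the function name, the compound identifiers and the scores are all hypothetical, and lower scores are assumed to be better.

```python
def consensus_select(score_tables, top_n, min_votes):
    """Keep compounds ranked in the top_n by at least min_votes scoring functions.

    score_tables: dict mapping scoring-function name -> {compound_id: score},
                  where a lower score means a better predicted binder.
    """
    votes = {}
    for scores in score_tables.values():
        ranked = sorted(scores, key=scores.get)  # best (lowest score) first
        for compound in ranked[:top_n]:
            votes[compound] = votes.get(compound, 0) + 1
    return sorted(c for c, n in votes.items() if n >= min_votes)


# Invented scores for four compounds under three scoring functions.
tables = {
    "force_field": {"cmpd_A": -9.1, "cmpd_B": -7.0, "cmpd_C": -8.5, "cmpd_D": -3.2},
    "empirical": {"cmpd_A": -6.0, "cmpd_B": -6.5, "cmpd_C": -5.9, "cmpd_D": -1.0},
    "knowledge": {"cmpd_A": -4.4, "cmpd_B": -2.0, "cmpd_C": -4.9, "cmpd_D": -4.6},
}
print(consensus_select(tables, top_n=2, min_votes=2))
```

Voting on ranks rather than averaging raw scores avoids having to put scoring functions with different units on a common scale.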

Structure-based approaches: scoring
1,000,000 molecules, 30 actives → 1,000 selected, 3 actives
Enrichment factor = (3/1,000) / (30/1,000,000) = 100: HUGE SUCCESS
Possible to start the next round of iteration (or do ‘traditional’ modeling). Redock with improved accuracy (e.g. QM/MM).
Back into the workflow: discovery and design (hit/lead/optimisation); biology: assay (binding/activity; in vitro / in vivo); chemistry: synthesis.
COMPUTATIONAL DOCKING: GENERATE TESTABLE IDEAS
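The enrichment factor is the hit rate in the selected subset divided by the hit rate in the whole library. A minimal sketch of that ratio, applied to the two scenarios on the scoring slides; the function and argument names are illustrative.

```python
def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Hit rate among selected compounds divided by hit rate in the full library."""
    if n_selected <= 0 or n_total <= 0 or actives_total <= 0:
        raise ValueError("counts must be positive")
    hit_rate_selected = actives_selected / n_selected
    hit_rate_library = actives_total / n_total
    return hit_rate_selected / hit_rate_library


# 1,000,000 molecules with 30 actives; 1,000 selected.
ef_five = enrichment_factor(5, 1_000, 30, 1_000_000)   # 5 actives recovered
ef_three = enrichment_factor(3, 1_000, 30, 1_000_000)  # 3 actives recovered
print(f"5 actives: EF = {ef_five:.1f}; 3 actives: EF = {ef_three:.1f}")
```

An enrichment factor of 1 means the selection did no better than picking compounds at random.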

Examples (low-throughput)
Works great… in most publications.
Reproduce a known crystal structure: HIV protease and inhibitor.
Figure panels: ligand-based site; flood-based site; crystal structure; first round of docking (shape only); final result (after rigid-body minimizations: energetics taken into account).
Venkatachalam et al., J. Mol. Graph. Model. 2003, 289-307

Examples (low-throughput)
But also… fails miserably (rarely in publications!).
Figure panels: crystal structure; final results (rigid-body minimizations).
Illustrates issues with the binding site’s shape (there are workarounds).
Venkatachalam et al., J. Mol. Graph. Model. 2003, 289-307

Example II: discovery of ligand/function for a new P450
Ke et al., Archives of Biochemistry and Biophysics 436 (2005) 110–120

High-throughput docking: get a “hit” (anything at all)
• Development of a database of bio- and agrochemical compounds of relevance for P450 (currently ~14,000 structures): in-house compounds, KEGG database (http://www.genome.jp/kegg/ligand.html), Compendium of Pesticide Common Names (http://www.alanwood.net/pesticides/index.html).
• Development of a CYP120A1 model from a CYP107A template (23.6% identity).
• HT-docking (LigandFit) of the ~14,000 structures.
• Identify 99 compounds consistently predicted to be good binders.
• Confirmed: retinoic acid.
Ke et al., Arch. Biochem. Biophys. 2005

CONCLUSIONS
• In-silico combinatorial library design & structure-based screening: a fast, efficient and inexpensive tool to:
  - discover new possible ligands against a macromolecular target
  - test library design ideas
  - identify the most promising scaffolds and R groups prior to synthesis
HT-DOCKING IS A SUCCESS IF:
i) IT FINDS A FEW MOLECULES OF INTEREST
ii) IT IS MUCH QUICKER AND CHEAPER THAN “real” screening

Comparison model / crystal structure
Residues within 4 Å of heme
Green/blue: model; red/orange: crystal

Comparison model / crystal structure
Residues around the ligand’s β-ionone ring are very close in both structures (Phe182 & Trp76: same pharmacophore).
Green/blue: model; red/orange: crystal

De novo design
Fragment-based, “inside-out” approach:
• Put functional groups in the binding site (docking or manually, or a combination)
• Link these groups (docking or manually, or a combination): the result *must* be synthesizable, no molecular monsters
i) dock functional groups
ii) keep low-energy groups, link with scaffolds
iii) correct binding site, but different (≠) too: “lead hopping”
Caflisch, Miranker, Karplus, J. Med. Chem. 36, 2142-2167 (1993)
Eisen, Wiley, Karplus, Hubbard, Proteins: Structure, Function, and Genetics 19, 199-221 (1994).
