1 / 39

Dragos Horvath Laboratoire d’InfoChimie, UMR 7177 CNRS – Université de Strasbourg

Ligand- Based Virtual Screening: Extraction of Knowledge from Experimentally Confirmed Ligands, and the Quest for other Candidates Matching the Known . Dragos Horvath Laboratoire d’InfoChimie, UMR 7177 CNRS – Université de Strasbourg 6700 Strasbourg, France

ovid
Télécharger la présentation

Dragos Horvath Laboratoire d’InfoChimie, UMR 7177 CNRS – Université de Strasbourg

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ligand-Based Virtual Screening:Extraction of KnowledgefromExperimentallyConfirmed Ligands, and the Quest for other Candidates Matching the Known. Dragos Horvath Laboratoire d’InfoChimie, UMR 7177 CNRS – Université de Strasbourg 6700 Strasbourg, France horvath@chimie.u-strasbg.fr

  2. 1. « Ceci n’est pas une molécule » MolecularModelsand Descriptorsin Chemoinformatics: NumericalEncoding of Structural Information

  3. Molecular Descriptors or Fingerprints • Need to represent a structure by a characteristic bunch (vector) of numbers (descriptors). • Example: (Molecular Mass, Number of N Atoms, Total Charge, Number of Aromatic Rings, Radius of Gyration) • Should include property-relevant aspects: • the “nature” of atoms, including information on their neighbor-hood-induced properties, and their relative arrangement. • Number of N Atoms  (Primary Amino Groups, Secondary Amino Groups, … , … , Amide, … , Pyridine N, …) • … unless being a H bond acceptor is the key (O or N alike)! • Arrangement in space (3D, conformation-dependent distances in Å) or in the molecular graph (2D, topological distance = separating bond count)

  4. Example 1: ISIDA SequenceCounts O-C*C*C*C-N 1 1 2 O-C*N*C*C-N 1 … 0 … 0 … (1,1,…) (2,0,…)

  5. Example 2: Fuzzy mapping of pharmacophore triplets (2D-FPT) • Atoms labeled by their pharmacophore types: • Hydrophobic, Aromatic • Hydrogen Bond Donor, Cation • Hydrogen Bond Acceptor, Anion 5 4 3 3 3 4 5 3 4 7 5 5 3 4 6 … … … … … … … … … Ar4-Hp3-Hp4 Ar4-Hp3-Hp5 Hp7-Ar4-PC6 Hp3-Hp3-Hp3 Hp3-Hp3-Hp4 Hp3-Hp3-Hp5 Hp3-HA5-Ar5 Hp4-HA5-Ar5 Di(m) = total occupancy of basis triplet i in molecule m.

  6. Chemically Relevant Typing: Proteolytic equilibriumdependence of 2D-FPT Ar8-NC8-PC8 Ar5-NC5-PC8 12% 88% ?

  7. 3: A 3D Example: Overlay-Based ComPharm Pharmacophore Field Intensities Pharmacophoric Features Alk. Aro. HBA HDB (+) (-) 1 X X X X X X 11 12 13 14 15 16 2 X X X X X X 21 22 23 24 25 26 3 X X X X X X Reference Atoms 31 32 33 34 35 36 4 X X X X X X 41 42 43 44 45 46 5 X X X X X X 51 52 53 54 55 56 • A descriptor of the nature of the molecule’s pharmacophoric neigh-borhood “seen” by every reference atom, assuming an optimal overlay of the molecule on the reference...

  8. 2. Computer-Aided Ligand-Based Design: the « MedicinalChemistry » of Ligand Fingerprints « Similarmolecules have similarproperties » → « Moleculeswithsimilarfingerprints have similarproperties » « Structure-ActivityRelationships » → « Fingerprint-ActivityRelationships » (or Quantitative Structure-ActivityRelationships, QSAR)

  9. 2.1 Molecular Similarity in Chemoinformatics Molecular Similarity Expressed by Fingerprint Similarity

  10. The SimilarityPrinciple – NeighborhoodBehavior Molecule PairsM,m * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * PropertyDissimilarity * * * * * * * * * * * * * * * * * * * * * * * * * * * * Dissimilarity Cutoff s ? SomeRandomRankingCriterion for pairs (m,M) Calculated Structural DissimilarityS(m,M)

  11. Similarity-Based Virtual Screening… Active Reference Nearest Neighbors Reference Fingerprint Superposition-based Similarity Scoring Best Matching Candidates Automated Fingerprint Matching... Ligand Candidate Fingerprint Library

  12. Strenght & Limitations of Similarity-based VS • (+) Onlyneed ONE active ligand to seek for more likeit… • (+) Withappropriatedescriptors, calculatedsimilaritymaybecomplementaryto the scaffold-basedsimilarityperceived by medicinalchemists • → « ScaffoldHopping »: bypassingsyntheticbottlenecks and/or pharmacokineticpropertyproblems, patent spaceevasion, etc. • (--) Within the reference ligand, « all groups are equal, but some are more equalthanothers » whenitcomes to controllingactivity… sowhat if wemismatch the latter?? ?

  13. A QSAR model expresses observedcorrelationsbetweencertaindescriptors and activity å linear = = = a ´ ´ ´ A D i i i i i i neural net Model Fitting D 1 1 1 1 D A 2 2 2 2 D n n n n decision tree M (D1,D2,D3,…,Dn) D < ? oui oui oui oui non non non non i (M active) (M inactive) neighbor-hood model ? 2.2: So, weneed to LEARN the featuresthatreallymatter – building QSARs

  14. Correlations: The Cornerstone of QSAR Philosophy (or, perhaps, Religion?) • I always end up in this deplorable state, no matter whether I drink: • Vodka-Soda • Martini-Soda • Gin-Soda • Whisky-Soda… • … therefore, as of tomorrow, I decided to stop drinkin’ SODA ! More isbetter Less is better

  15. Correlation is not Causality - an Obvious, but Inconvenient Truth… • SAR sets are alwayslimited in diversity and thereforemay (and alwayswill) accomodatecoincidentalrelationshipsbetweendifferentfeatures: Diverse library of 16x6x10=960 compounds… with NPC=NHD

  16. Why Lucky Correlations may Outperform more Rigorous Modeling… • Rule-based pharmacophore feature assignment has a hard time with the imine group =N–. In rule-based triplets of benzodiazepine receptor ligands, it was flagged as cation. • Proper pH-sensitive flagging corrected this error… and dramatically reduced model quality! • Labelling as ‘cation’ was a way to ‘highlight’ that N group – and, since preferentially seen in actives, highlighting made QSAR learning easier

  17. CAUSAL QSAR… but not in the way you’d expect it! A Psychedelic µ Receptor Model… • Training set: small combinatorial carbamate library, of 240 compounds obtained by robotized synthesis, LC/MS purity control and µ receptor affinity (pIC50) measurement (proof-of-concept study, CEREP 1997) -OR, -NR2 • A successful ComPharm QSAR model (R2≈0.8) was built to explain the measured pIC50 values (btw. µ- and mM) • HB-acceptor in para of benzyl alcohol enhances µ receptor affinity

  18. …based on wrong experimental data! • The most “active” carbamates of the training set turned out to be contaminated with ‰ traces of decarboxylation product, featuring the opioid ligand specific tertiary amine and having nanomolar potencies! + -OR, -NR2 • Our QSAR actually explained… the decarboxylation mechanism: p-OR or –NR2 stabilizes the intermediate carbocation… thus rendering contamination possible!

  19. Not seeing the Scaffold because of the Molecules: why Water is a Thrombin Inhibitor. • Non-linear Thrombin affinity model, with R2train=0.92, Q2=0.84 and R2validation=0.61. • pKi= 2.2e-4Ar4Hp14PC122 –3.8e-5Ar4Ar12HA102 –1.4HD8Hp6PC42 +2.45exp[-(Ar6Ar10HD14-19.8)2/1362] +4.36/{1+exp[-(Ar10Ar12Hp8-71.3)/119]} +0.77exp[-3(Ar12HD8Hp12-13.3)2/1042] –1.05/{1+exp[-(Ar12Hp4Hp10-185.1)/327]} –2.26{1+exp[-(Ar12Hp6NC8-3.2)/13]} –4.71exp[-(HA4HD4Hp4-43.4)2/1222] +7.3e-4HA4Hp4Hp6 –2.2e-3HA10HD4Hp8 +5.94exp[-(HA12Hp14PC12-1.2)2/152] • If all population levels are zero, the calculated pKi is of 5.9, mostly due to contributions from the highlighted Gaussians. • Absence of pharmacophore triplets automatically qualifies any small molecule as micromolar thrombin inhibitor! • The model has learned that, for benzamidines – the chemical class represented in the training set, the presence of underlined triplets coincides with a loss of activity.

  20. Beware of “Antipharmacophores”! • Ar12-HD8-Hp12 is an “antipharmacophore” of the Thrombin model: • Absence of this pharmacophore triplet means an enhanced activity, its presence correlates with an observed affinity loss • The undesirable Presence, in the above context, implicitly means presence in specific points of the ubiquitous benzamidine scaffold • Fragments or pharmacophore elements that are genuinely “bad” for activity, no matter where they are located, are rare. An “antipharmacophore” rather reflects poor training set diversity! Actives Inactives Actives, but not available for training

  21. The Phantom Scaffold… materializes when adding diverse inactives to the training set • All the training molecules – both actives and inactives – being benzamidines, the QSAR model cannot possibly learn the importance of the benzamidine moiety! • After enriching the thrombin data set with inactives, one model out of ~2000 was able to predict the activity of unrelated thrombin ligands of known binding geometries. • Scaffold Hopping: Yes, we can! • Triplets entering the model successfully corroborate some of the ligand features involved in binding. • These include features entering the pockets P1 and P2, but not the aromatic moiety binding in pocket P3.

  22. Enhanced Training Set Diversity leads to Models with “Scaffold Hopping” abilities

  23. Missing P3: deleting a Phe does not lead to ‘the same molecule, but with one phenyl less’. • The training set included both examples of actives, with required aromatic P3 substituent and inactives, without this key moiety. So, why did the model not learn about the importance of P3? • In all training examples, however, the removal of the P3 moiety was always done by deacylating the phenylalanine… • thus, compounds missing P3 substitution were systematically compounds having one excess protonable group • the model actually chose to learn the ‘bogus’ rule that one more cation causes an activity loss.

  24. “Let s be a representative sample of the set S…” • It takes a sample of ~104 individuals to extrapolate the voting intentions of a population of ~107. What’s the representative subset size of 1025 drug-like compounds? • If we ever dared to publish QSARs trained on fewer compounds, shame on us! • If given N=3 coordinate pairs (Y,X), not even Carl Friedrich Gauss could come up with a model more sophisticated than Y=aX2+bX+c • Don’t listen when they say that Support Vector Machines have very few “tunable parameters”! • May your model apply to one million and one molecules – it may still fail for the one million and second! • One cannot validate QSAR – but just fail to invalidate it! DISCLAIMER THE AUTHOR STRONGLY DISAGREES WITH HIS PERSONAL OPINIONS OUTLINED IN THIS SLIDE.

  25. We are Medicinal Chemists – tell us about Pharmacophore Models, forget QSAR!! • Bad news: Pharmacophore models are just a peculiar type of 3D-QSAR: • use overlay models to “bind” descriptors to specific spots in space • Pharmacophore hot spots are defined by the consensual presence of groups of similar type, throughout the series of known actives • Descriptors are occupancy levels of these spots No knowledge of the active site – need alternative overlay hypotheses !

  26. Kama Sutra with Ligands: Match As Many Equivalent Pharmacophore Features You May! + ? + ? ? + ? + + + ? + methotrexate dihydrofolate Böhm, Klebe, Kubinyi, “Wirkstoffdesign” (1999)

  27. The Cox-2 Scenario: An Ideal Pharmacophore Model • As exhaustive and diverse as possible a set of CycloOxygenase-II (Cox2) inhibitors, including: • A set of 1914 marketeddrugs of the U.S. Pharmacopeia, tested on Cox2 by the Cerep laboratories (BioPrintTM). • A set of 326 inhibitorscompiledfromliterature by N. Baurin (thanks!), includingco-crystallized selective Cox2 ligand SC-558 and related medicinalchemistryseries. • Potencies, expressed as pIC50 = -log IC50[mol/l] canbedirectlycompared (cross-check on compounds present in bothsubsets)

  28. MinimalisticOverlay-based Model HipHop Hydrophobe HipHop Hydrophobe HipHop Acceptor! • Training RMS=0.712, R2=0.712 • validation RMS=0.698, Q2=0.724

  29. Furthermore, it supports « ScaffolfdHopping » ! • it manages to explain the Cox2 activities of the apparentlyunrelatednonspecific Cox1/Cox2 inhibitors: • This is an ideal scenario – scaffold-independent model trained on thousands of compounds: somaybethe overlay models are mechanisticallyrelevant !

  30. … or maybe not! ? ??  SC-558 !

  31. Predictive? Yes! Enlightening? No! • Overlay-basedmodelscorrectlyexplained the behavior of the two distinct Cox2 binder classes… on hand of an erroneousworkinghypothesis!! • The ‘correct’ overlay asks an ‘anion’ (flurbiprofene –COO-) to bealignedatop of a ‘hydrophobe’ (CF3) – a heresy in pharmacophore matching! (Well, is CF3 a hydrophobe??) • QSAR building is never safe from correlation artifacts, not even in models with 6 variables versus 2200 observables and excellent statistics! • Such model may be very successful in selecting database subsets enriched in new Actives - but QSAR alone would never have elucidated the binding mechanism to Cox2!

  32. QSAR – a Bookkeeping Tool !? • “Bookkeeping” QSAR: a quantitative way to wrap up the information contained in your training set compounds • Models are scaffold-bound, heavily populated by scores of bogus antipharmacophores • It will typically tell you things you’d notice by simply looking at the molecules • Sampling of all the possible models fitting the observations allows to enumerate all the alternative working hypotheses that still await to be discarded… • …or validated. The model might highlight not yet tried combinations of known features with better activities! • Think positive: this provides a rational plan to challenge these hypotheses, and thus learn more from better planned experiments.

  33. 3. Didweforgetsomething? Each Model has its Limited Applicability Domain… … even General Relativity and Quantum Physics! Defining the Conditions of Applicability of QSAR Models – and respectingthem – might help!

  34. The Applicability Domain (AD) – A Compromise… • Restrict the applicability of a QSAR model to a well-defined subset of the chemical space – the one populated by the training molecules. Insufficient sampling of chemotypes outside this AD is then irrelevant. • How do we define this subset of chemical space to be as large as possible, while nevertheless densely enough populated by training molecules? Feature count 1 Example: the Feature Control Approach * * * * * * * * * * * * * * * * Feature count 2

  35. … but no miraculous solution! A real-life inspired (Gedanken)Experiment • Modeling of metal ligation propensities, with a training set composed of three subfamilies, R being alkyl chains: • pKbind=aAnilin/AcidNAnil+ aPyr/AcidNPrd+ gsizeF(R) + CAcid • AD requirements: the molecule should contain • NAnil=0 or 1 aniline fragment • NAcid=0 or 1 benzoic acid fragment • NPrd=0 or 1 pyridine fragment • Alkyl chains of size as seen in training set

  36. Contributions of a good programmer, but lousy chemist, to the understanding of QSAR! • Would you like to know whether propane is a potent metal binder ? • Yes, it is: pKbind= gsizeF(C3) + CAcid (same as for p-propyl benzoic acid) • But it can’t possibly be within the AD, can it? • NAnil=0 or 1 aniline fragment • NAcid=0 or 1 benzoic acid fragment • NPrd=0 or 1 pyridine fragment • Alkyl chains of size as seen in training set Feature count 1 Example: the Feature Control Approach * * * * * * * * * * * * * * * * Feature count 2

  37. Building Up Trust from Consensus • pKbind= aAnil/AcidNAnil + aPyr/AcidNPrd+ gsizeF(R) + CAcid • pKbind= aAnil/PrdNAnil + aAcid/PrdNAcid+ gsizeF(R) + CPrd • pKbind= aAcid/AnilNAcid + aPrd/AnilNPrd+ gsizeF(R) + CAnil • These three alternative models are perfectly equivalent – or “redundant” – as far as training set molecules are concerned • Identical prediction for each training molecule, identical statistical parameters • However, they cease to be redundant when it comes to propane: CAcid ≠ CAnil ≠ CPrd • Divergent prediction by allegedly “redundant” models is a clear signal of AD violation!

  38. Conclusions… • In Ligand-based knowledge extraction, the single most important piece of hardware is a BRAIN • Correlation is not causality… it’s correlation! • So, if correlations observed within the training set do apply to other molecules, forget metaphysical afterthoughts and exploit them, in successful virtual screening • However, an in-depth analysis of the model – if feasible – may reveal intrinsic limitations and pitfalls, and help to better delimit the AD. • Training set diversity is the key! • Do not hesitate to add bogus inactives, in order to teach your model proper “border conditions” such as “cosmic vacuum is inactive”. Absence of features cannot provide activity…

  39. Conclusions… • If a big pharma manager asks you “So, is QSAR useful?”, please reply “Compared to what?” • A wrong QSAR model may nevertheless ring a bell in a medicinal chemist’s brain, and help to make right decisions • There are moments in life when one should rely on the accumulated knowledge, and use QSAR to discover new combinations of known features: • A properly trained model, within its AD, stands fair chances to discover new actives and sensibly decrease synthesis/testing effort, maybe find a new scaffold porting the known binding pharmacophore (lead hopping) • There are moments when one should put known things aside, and venture out for random search of new paradigm-breaking ligands – new scaffold, new binding mode, new action mechanism, but... … if you’re afraid to be branded a low-tech amateur for not using QSAR, count on your Marketing Department to sell random picking as “THE ultimate successful application of the Universal QSAR Model”: pKi = 4.0 + ħ x HosoyaIndex + 5.0 x rand( )n Proprietary & patented algorithm and fittable coefficient n Laymen will appreciate the exotic resonance of the name, evoking some oriental wisdom from high-tech Japan Planck’s constant – a flavor of fundamental science (and conveniently small)

More Related