
Luddite: An Information Theoretic Library Design Tool






Presentation Transcript


  1. Luddite: An Information Theoretic Library Design Tool Jennifer L. Miller, Erin K. Bradley, and Steven L. Teig July 18, 2002

  2. Outline • Overview • Search Strategy • Cost Function • Algorithms • Algorithm Extensions • Implementation Details • Results

  3. Overview • Genomics and proteomics provide many novel targets • Need to find drugs for these targets • Which compound should be screened against which target? • Methods for answering this (e.g., QSAR) have been debated for many years • Recently, combinatorial and parallel synthesis techniques have transformed the question from which single compound to analyze into which collection of compounds (library) to analyze.

  4. Overview • Develop an algorithm for designing libraries • Discrete – a collection of individual compounds • Combinatorial – a collection of compounds synthesized in a parallel or combinatorial fashion • Based on information-theoretic techniques

  5. Overview • Idea – use molecules to “interrogate” the target receptor about which chemical features are required for binding • Objective – compose a library that maximizes the conclusions that can be drawn from the “answers” across all possible experimental outcomes • Goal – design the library that allows discovery of the most information about the optimization target

  6. Search Strategy • Strategies used in “20 Questions” are applicable • Binary Search • With every guess eliminate half the search space • Codeword Search • Every outcome corresponds to a single codeword • Optimal set of questions can be asked simultaneously • Same set of optimal questions can be used every time
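The codeword idea above can be sketched in a few lines of Python. The point of the slide is that for a search space of 2^n items, the same fixed set of n yes/no questions can be asked simultaneously, and the vector of answers forms a codeword that uniquely identifies any item. The function names below are illustrative, not from the original tool.

```python
def codeword(secret: int, n_bits: int) -> tuple:
    """Answers to the fixed questions 'is bit k of the secret set?'."""
    return tuple((secret >> k) & 1 for k in range(n_bits))

def decode(answers: tuple) -> int:
    """Recover the secret from the answer vector (the codeword)."""
    return sum(bit << k for k, bit in enumerate(answers))

# Every outcome in 0..15 has a distinct 4-bit codeword, so the same
# four questions, asked at once, decode any secret.
n_bits = 4
codes = {codeword(s, n_bits) for s in range(2 ** n_bits)}
assert len(codes) == 2 ** n_bits           # all codewords distinct
assert decode(codeword(11, n_bits)) == 11  # the questions decode the secret
```

Unlike a sequential binary search, no question here depends on an earlier answer, which is what lets the whole question set run as one parallel experiment.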

  7. Search Strategy

  8. Search Strategy • Library design analogous to “20 Questions” • Searching for features required for ligand binding, desired phenotype, and/or good pharmacokinetic properties instead of a number • “feature” – four-point pharmacophore

  9. Search Strategy - Example

  10. Search Strategy - Assumptions • “20 Questions” Analogy useful but assumes • Every compound tests half of possible features • Can synthesize any compound in design space • Every assay value is accurate • Goal is a single feature

  11. Search Strategy - Remedies • Eliminating the assumptions • 1. A minimum of log2(F) bits is needed to decode F outcomes • Loose upper bound on the number of compounds • 2. The ability of a set of questions to decode a message is invariant to column reordering – therefore it is not necessary that every compound in the design space be obtainable in order to find a maximally efficient set of questions
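Remedy 1's log2(F) figure is easy to check numerically; a minimal sketch (the helper name is my own):

```python
import math

def min_bits(n_outcomes: int) -> int:
    """Minimum number of yes/no answers (bits) needed to distinguish
    n_outcomes different outcomes: ceil(log2(n_outcomes))."""
    return math.ceil(math.log2(n_outcomes))

assert min_bits(8) == 3      # 3 binary questions separate 8 outcomes
assert min_bits(1000) == 10  # 2**10 = 1024 >= 1000
```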

  12. Search Strategy - Remedies • 3. Error-correcting codes (ECC) based on Hamming distance • 4. Adjust the probability of features in an iterative process and prune unlikely features • Likely to lead to convergence • Enhances efficiency • Improves the probability of success
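Remedy 3 relies on the Hamming distance between codewords; the standard fact is that a code with minimum distance d can correct floor((d - 1) / 2) errors, which is what makes the design robust to an occasional inaccurate assay value. A minimal sketch:

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codewords differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# A code whose codewords are pairwise at distance >= 3 tolerates one
# flipped bit: the corrupted word is still closest to the true codeword.
assert hamming("10110", "10011") == 2
```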

  13. Cost Function • Given a set of features, search for a set of compounds that allows each individual feature to be decoded • If that is not possible, seek to decode as many features as possible with the flattest possible distribution of feature-class sizes • Feature class – the subset of features that share the same codeword • Entropy is well suited to this calculation

  14. Cost Function - Entropy • Entropy – a measure of uncertainty • All codewords the same – no uncertainty -> minimal entropy • All codewords different -> maximum entropy • Wish to optimize the following equation • M is the library measure • H is the entropy of the feature classes • C is the # of distinct classes • ||ci|| is the size of feature class i • F is the # of features
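The equation itself appears only as an image on the original slide, but the entropy H of the feature classes it defines is the standard Shannon entropy over the class-size fractions ||ci||/F. A sketch under that assumption (the exact library measure M in the paper may combine H with other terms):

```python
import math
from collections import Counter

def class_entropy(codewords: list) -> float:
    """Shannon entropy (bits) of the feature-class distribution, where a
    feature class is the set of features sharing one codeword:
    H = -sum_i (||ci||/F) * log2(||ci||/F)."""
    F = len(codewords)
    sizes = Counter(codewords).values()  # ||ci|| for each distinct class
    return -sum((s / F) * math.log2(s / F) for s in sizes)

# All codewords identical: one class, no uncertainty -> entropy 0.
assert class_entropy(["00", "00", "00", "00"]) == 0.0
# All codewords distinct: maximum uncertainty -> entropy log2(F) = 2 bits.
assert class_entropy(["00", "01", "10", "11"]) == 2.0
```

This matches the slide's two limiting cases: identical codewords give minimal entropy, fully distinct codewords give the maximum, log2(F).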

  15. Cost Function – Entropy Example

  16. Algorithm - Overview • Start with a list of synthesized compounds • Goal – select the subset that maximizes entropy • State – the set of compounds whose entropy can be calculated • Note: from the entropy calculation, the state is a function of the classes, but moves through the state space are a function of the compounds • In general the entropy cannot be updated incrementally and must be completely reevaluated whenever the state changes • Stark contrast with other library design methods • Despite this seeming limitation, the method is very efficient

  17. Algorithm - Details • The approaches to discrete and combinatorial designs are very similar • Both use a greedy build-up of the library to the desired number of compounds • Greedy – make the locally optimal choice at each step in the hope of reaching a global optimum • Followed by a second phase that reevaluates each of the library components, looking for a better selection • Repeat until no improvement
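The two phases above can be sketched for the discrete case as follows. This is an illustrative toy, not the paper's implementation: compounds are modeled as sets of features they test, a feature's codeword is the pattern of selected compounds containing it, and all names are my own.

```python
import math
from collections import Counter

def entropy_of(library, features):
    """Entropy of the feature classes induced by a candidate library: each
    feature's codeword is the pattern of compounds that contain it."""
    F = len(features)
    codes = Counter(tuple(f in c for c in library) for f in features)
    return -sum((s / F) * math.log2(s / F) for s in codes.values())

def greedy_design(pool, features, k):
    """Phase 1: greedy build-up to k compounds. Phase 2: revisit each pick
    looking for a better selection; repeat until no improvement."""
    lib = []
    for _ in range(k):  # greedy build-up
        lib.append(max(pool, key=lambda c: entropy_of(lib + [c], features)))
    improved = True
    while improved:     # reevaluation / swap phase
        improved = False
        for i in range(k):
            rest = lib[:i] + lib[i + 1:]
            best = max(pool, key=lambda c: entropy_of(rest + [c], features))
            if entropy_of(rest + [best], features) > entropy_of(lib, features) + 1e-12:
                lib[i] = best
                improved = True
    return lib

features = list(range(4))
pool = [frozenset(s) for s in ({0, 1}, {0, 2}, {1, 3}, {2, 3}, {0})]
lib = greedy_design(pool, features, 2)
# Two well-chosen compounds give all four features distinct codewords,
# reaching the maximum entropy log2(4) = 2 bits.
assert abs(entropy_of(lib, features) - 2.0) < 1e-9
```

Note how `entropy_of` is recomputed from scratch at every move, mirroring the slide's point that the entropy cannot in general be updated incrementally.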

  18. Algorithm - Extensions • 1. Guarantee that certain items are included in the library • 2. Subsample the source pool during the build-up and optimization phases • Dramatically decreases run time • Only slightly impacts the quality of the designs • 3. Define a minimum Tanimoto fingerprint similarity between any two compounds in a discrete library • 1 is implemented for both the discrete and combinatorial algorithms • 2 and 3 are implemented only for the discrete algorithm
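Extension 3's pairwise Tanimoto constraint can be sketched as below, treating fingerprints as sets of on-bits (the function names and the threshold-check helper are illustrative assumptions, not the tool's API):

```python
def tanimoto(fp1: set, fp2: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets:
    |intersection| / |union|."""
    if not fp1 and not fp2:
        return 1.0
    return len(fp1 & fp2) / len(fp1 | fp2)

def pairwise_ok(library, threshold):
    """True if every compound pair meets the Tanimoto constraint."""
    return all(tanimoto(a, b) >= threshold
               for i, a in enumerate(library) for b in library[i + 1:])

a, b = {1, 2, 3, 4}, {3, 4, 5, 6}
assert tanimoto(a, b) == 2 / 6  # 2 shared bits, 6 bits in the union
assert pairwise_ok([a, b], threshold=0.25)
```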

  19. Implementation Details • C++ • Microsoft Windows NT • 500 MHz Intel Pentium III • 500 MB RAM

  20. Results • 9 different libraries selected with the algorithm • 273,373-compound source pool • 3-component reaction A + B + C -> D • Monomer lists of length 33,436 • 19 4-point pharmacophore signatures calculated for all compounds in the source pool • Compared final measures to the optimal result and to a random result

  21. Results

  22. Results - Entropy • The combinatorial algorithm lags behind the discrete one in performance • A discrete library of 91 compounds has the same measure as the optimal combinatorial library of 250 compounds • It may still be more cost-effective to synthesize the combinatorial library • General rule – twice as many compounds are required in a combinatorial library to achieve the same information as a discrete library • Iterative setting • Use the combinatorial algorithm early in discovery • Use the discrete algorithm later to cherry-pick specific compounds
