
A Multiobjective Approach to Combinatorial Library Design

Val Gillet, University of Sheffield, UK


Presentation Transcript


  1. A Multiobjective Approach to Combinatorial Library Design Val Gillet University of Sheffield, UK

  2. Outline • SELECT • GA-based program for combinatorial library design • Combinatorial subset selection in product-space • Multiobjective optimisation via weighted-sum fitness function • Limitations of a weighted-sum approach • MoSELECT • Multiobjective optimisation via MOGA

  3. Library Design is a Multiobjective Optimisation Problem • Early high-throughput screening (HTS) results were disappointing • Low hit rates • Hits too lipophilic; too flexible; high molecular weights… • Diverse libraries • Distance-based/cell-based diversity • Bioavailability; cost; ease of synthesis… • Focused/targeted libraries • Similarity to known active; predicted active by QSAR model; fit to receptor site • Bioavailability; cost; …

  4. Product-Based Library Design • A two-component combinatorial library can be represented by a 2D array • A combinatorial subset can be defined by intersecting rows and columns of the array • Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
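A minimal Python sketch of the product-space view, assuming the virtual library is held as a nested list indexed `[R1 reactant][R2 reactant]`. The function names `combinatorial_subset` and `count_subsets` and the toy reactant labels are hypothetical. The count shows why exhaustively testing every row/column choice is impractical:

```python
from math import comb

def combinatorial_subset(library, rows, cols):
    """Products at the intersection of the chosen rows (R1 reactants)
    and columns (R2 reactants) of a 2D virtual-library array."""
    return [[library[r][c] for c in cols] for r in rows]

def count_subsets(n1, n2, k1, k2):
    """Number of distinct k1 x k2 combinatorial subsets of an n1 x n2 library."""
    return comb(n1, k1) * comb(n2, k2)

# Toy 4 x 3 virtual library with hypothetical reactant labels.
r1 = ["A1", "A2", "A3", "A4"]
r2 = ["B1", "B2", "B3"]
library = [[f"{a}-{b}" for b in r2] for a in r1]

print(combinatorial_subset(library, rows=[0, 2], cols=[1, 2]))
# [['A1-B2', 'A1-B3'], ['A3-B2', 'A3-B3']]
print(count_subsets(4, 3, 2, 2))  # 18
```

For a 100 × 100 library with 30 × 30 subsets, `count_subsets(100, 100, 30, 30)` exceeds 10^50, which is why a stochastic search such as a GA is used.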

  5. Selecting Combinatorial Subsets Using a GA • Chromosome encoding • each chromosome represents a combinatorial subset as an integer string • one partition for each reactant pool • the size of a partition equals the number of reactants required from the corresponding pool • Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions [Figure: a 6 × 4 subset of an R1 × R2 library encoded as the integer string 11 8 2 30 7 25 10 1 19 18]
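The slide does not give SELECT's exact operators, so the following is only a sketch under stated assumptions: partitions hold distinct reactant indices, crossover is one-point within each partition followed by a simple duplicate-repair step, and mutation substitutes an unused reactant. All function names are hypothetical; `random.Random.choices` provides the roulette-wheel draw.

```python
import random

def random_chromosome(pool_sizes, subset_sizes, rng):
    # One partition per reactant pool; its length is the number of reactants
    # required from that pool, and it holds distinct reactant indices.
    return [rng.sample(range(n), k) for n, k in zip(pool_sizes, subset_sizes)]

def mutate(chrom, pool_sizes, rng):
    # Point mutation (assumed): replace one reactant in one partition
    # with a reactant from the same pool that is not already selected.
    child = [list(p) for p in chrom]
    i = rng.randrange(len(child))
    unused = [r for r in range(pool_sizes[i]) if r not in child[i]]
    if unused:
        child[i][rng.randrange(len(child[i]))] = rng.choice(unused)
    return child

def crossover(a, b, rng):
    # One-point crossover inside each partition (assumed), then repair any
    # duplicate with a reactant from parent a that is not yet in the child.
    child = []
    for pa, pb in zip(a, b):
        cut = rng.randrange(1, len(pa)) if len(pa) > 1 else 1
        fixed, seen = [], set()
        for r in pa[:cut] + pb[cut:]:
            if r in seen:
                r = next(x for x in pa if x not in seen)
            fixed.append(r)
            seen.add(r)
        child.append(fixed)
    return child

def roulette(population, fitness, rng):
    # Roulette-wheel selection: probability proportional to non-negative fitness.
    return rng.choices(population, weights=fitness, k=1)[0]

rng = random.Random(42)
pools, sizes = (100, 100), (6, 4)  # a 6 x 4 subset of a 100 x 100 library
pop = [random_chromosome(pools, sizes, rng) for _ in range(10)]
child = mutate(crossover(roulette(pop, [1.0] * 10, rng), pop[0], rng), pools, rng)
print(child)
```

The repair step always succeeds because, at each position, fewer reactants have been placed than parent `a` contributes, so `a` still has an unused one.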

  6. Multiobjective Optimisation in SELECT • Weighted-sum fitness function • enumerate the combinatorial library represented by a chromosome • calculate descriptors for molecules in the library • Objectives are scaled and user-defined weights are applied
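A minimal sketch of a weighted-sum fitness. The slide says objectives are scaled but not how; min-max scaling over an assumed range is used here, and every objective is assumed to be oriented so that larger is better. Function names, objective names and values are hypothetical.

```python
def min_max_scale(value, lo, hi):
    # Map a raw objective value onto [0, 1] using its assumed range.
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def weighted_sum_fitness(objectives, ranges, weights):
    """f = w1*a' + w2*b' + ..., where a', b', ... are scaled objectives.
    Assumes larger is better for every objective (e.g. use
    1 - scaled difference for a property-profile term)."""
    return sum(w * min_max_scale(objectives[name], *ranges[name])
               for name, w in weights.items())

# Hypothetical raw scores for the library encoded by one chromosome.
objs = {"diversity": 0.58, "mw_match": 0.75}
ranges = {"diversity": (0.56, 0.60), "mw_match": (0.5, 1.0)}
weights = {"diversity": 0.5, "mw_match": 0.5}
print(weighted_sum_fitness(objs, ranges, weights))
```

Changing `weights` changes which library scores best, which is the sensitivity discussed under the limitations of this approach.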

  7. Multiobjective Optimisation in SELECT cont. • Diversity indices • distance-based (e.g. sum of pairwise dissimilarities calculated using Daylight fingerprints) • cell-based • Physical property terms • minimise the difference between the distribution in the library and some reference distribution, e.g. • “drug-like” profile derived from WDI • Cost (£) • minimise the cost of the library
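A sketch of two of these terms under stated assumptions: fingerprints are represented as Python sets of on-bit positions (a hypothetical stand-in for Daylight fingerprints), dissimilarity is 1 − Tanimoto, and the profile difference is taken as the sum of absolute per-bin differences in percentages. SELECT's actual formulas may differ.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    # Fingerprints as sets of on-bit positions; Tanimoto = |A & B| / |A | B|.
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def diversity(fps):
    # Sum of pairwise dissimilarities (1 - Tanimoto) over all pairs of products.
    return sum(1.0 - tanimoto(a, b) for a, b in combinations(fps, 2))

def profile(values, edges):
    # Percentage of compounds in each bin [edges[i], edges[i+1]); values
    # outside all bins are not counted but still contribute to the total.
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [100.0 * c / len(values) for c in counts]

def profile_difference(values, reference_pct, edges):
    # 0 means the library's distribution matches the reference (e.g. a WDI profile).
    return sum(abs(p - r) for p, r in zip(profile(values, edges), reference_pct))

fps = [{1, 2, 3}, {1, 2, 4}, {5, 6}]
print(diversity(fps))  # 2.5
print(profile([150, 250, 350, 450], [0, 200, 400, 600]))  # [25.0, 50.0, 25.0]
```

Dividing `diversity` by the number of pairs gives a mean dissimilarity, which keeps values comparable across subset sizes.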

  8. Library Enumeration in SELECT • Virtual library is enumerated upfront • ADEPT (A Daylight Enumeration and Profiling Tool) • Identify potential reactants • Filter out unwanted ones • Enumerate virtual library • Reaction Toolkit (Reaction transforms; MTZ language) • Descriptors are calculated upfront • Descriptors for each combinatorial subset are accessed via fast lookup
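A sketch of the "enumerate once, look up many times" idea. As an assumption for illustration, the descriptor is an approximate amide molecular weight (amine MW + acid MW − water, since amide coupling releases one water molecule); the reactant MWs and function names are hypothetical. Evaluating a chromosome then needs only index lookups, not re-enumeration.

```python
WATER_MW = 18.015  # amide coupling releases one water molecule

def enumerate_descriptors(amine_mw, acid_mw):
    # Upfront: one descriptor value per product of the full 2D virtual library.
    return [[a + c - WATER_MW for c in acid_mw] for a in amine_mw]

def subset_descriptors(table, rows, cols):
    # During the search: a combinatorial subset is read out by index.
    return [table[i][j] for i in rows for j in cols]

table = enumerate_descriptors([100.0, 150.0], [200.0, 250.0])
print(subset_descriptors(table, rows=[1], cols=[0, 1]))
```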

  9. Example: Amide Library • 10K virtual library: 100 amines × 100 carboxylic acids • 30 × 30 amide subsets • Reactant-based selection: diversity (Diversity 0.564) • Product-based selection: diversity & molecular weight profile (Diversity 0.573) • WDI – World Drug Index [Figure: molecular weight profiles (percentage of compounds vs. molecular weight, 0–800) of WDI and of the product-based and reactant-based selections]

  10. Limitations of a Weighted-Sum Fitness Function • Definition of fitness function difficult especially for different types of objectives • e.g. molecular weight profile and cost • Setting of weights is non-intuitive • Can result in regions of search space being obscured especially when objectives are in competition • Difficult to monitor progress since >1 objective to follow simultaneously • A single solution is found
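One well-known way a weighted sum obscures part of the search space: a non-dominated solution lying on a non-convex stretch of the trade-off curve is never the optimum of any weighted sum. A toy check with three hypothetical libraries scored on two objectives to be minimised, sweeping the weight from 0 to 1:

```python
# Three hypothetical candidate libraries, two objectives, both minimised.
candidates = {"A": (0.0, 1.0), "B": (0.6, 0.6), "C": (1.0, 0.0)}

def dominates(u, v):
    # u dominates v: no worse in every objective and strictly better in one.
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

non_dominated = [k for k, u in candidates.items()
                 if not any(dominates(v, u) for v in candidates.values())]

winners = set()
for step in range(101):
    w = step / 100
    scores = {k: w * f1 + (1 - w) * f2 for k, (f1, f2) in candidates.items()}
    winners.add(min(scores, key=scores.get))

print(non_dominated)    # ['A', 'B', 'C']
print(sorted(winners))  # ['A', 'C']
```

B always scores 0.6, while the better of A and C scores at most 0.5, so no choice of weights ever selects B even though it is a valid compromise.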

  11. Varying Weights in SELECT • Objectives are in competition, resulting in trade-offs • A family of alternative solutions exists that are all equivalent

  12. Multiobjective Optimisation • Evolutionary algorithms, e.g., GAs • operate with a population of individuals • well suited to search for multiple solutions in parallel • readily adapted to deal with multiobjective optimisation • MOGA: MultiObjective Genetic Algorithm • Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.

  13. MOGA • Multiple objectives are handled independently without summation and without weights • A hyper-surface is mapped out in the objective space • represents a continuum of solutions where all solutions are seen as equivalent • represents compromises or trade-offs between the various objectives • solutions are called non-dominated, or Pareto, solutions • A family of non-dominated solutions is sought rather than a single solution

  14. Dominance & Pareto Ranking • A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population • Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated [Figure: population plotted against objectives f1 and f2, each individual labelled with its rank (non-dominated individuals have rank 0); individuals A and B are marked]
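The ranking rule on this slide translates directly into code. This sketch assumes all objectives are minimised (negate an objective such as diversity to maximise it); the function names are hypothetical.

```python
def dominates(u, v):
    # Minimisation: u is no worse than v everywhere and strictly better somewhere.
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_ranks(population):
    # Rank = number of individuals in the current population that dominate it.
    return [sum(dominates(other, ind) for other in population) for ind in population]

pop = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 6)]
print(pareto_ranks(pop))  # [0, 0, 0, 1, 2, 5]
```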

  15. SELECT vs MoSELECT* • SELECT: Initialise population → Select parents → Apply genetic operators → Calculate objectives a, b, c, … → Apply fitness function f = w1a + w2b + w3c + … → Rank based on fitness → Test for convergence → Single solution • MoSELECT: Initialise population → Select parents → Apply genetic operators → Calculate objectives a, b, c, … → Calculate dominance over a, b, c → Rank using Pareto ranking (based on dominance) → Test for convergence → Family of solutions • * Patent applied for
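A compact sketch of the MoSELECT column of the flowchart, not MoSELECT itself. Assumptions: a toy one-variable problem (Schaffer's f1 = x², f2 = (x − 2)², both minimised) stands in for library design, fitness is taken as 1/(1 + Pareto rank), and blend crossover with Gaussian mutation replaces the combinatorial operators. The weighted-sum step is replaced by dominance and Pareto ranking, and the run returns a family of solutions.

```python
import random

def dominates(u, v):
    # Minimisation: no worse in every objective, strictly better in at least one.
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def objectives(x):
    # Toy bi-objective problem; its Pareto-optimal set is 0 <= x <= 2.
    return (x * x, (x - 2.0) ** 2)

def moga(pop_size=30, iterations=200, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-4.0, 4.0) for _ in range(pop_size)]  # initialise population
    for _ in range(iterations):
        objs = [objectives(x) for x in pop]                    # calculate objectives
        # Calculate dominance and rank: count the individuals dominating each one.
        ranks = [sum(dominates(o2, o1) for o2 in objs) for o1 in objs]
        fitness = [1.0 / (1 + r) for r in ranks]               # no weights needed
        children = []
        while len(children) < pop_size:
            a, b = rng.choices(pop, weights=fitness, k=2)      # roulette wheel
            children.append(0.5 * (a + b) + rng.gauss(0.0, 0.1))
        pop = children
    objs = [objectives(x) for x in pop]
    # Return the family of non-dominated solutions, not a single best one.
    return sorted({x for x, o in zip(pop, objs)
                   if not any(dominates(o2, o) for o2 in objs)})

family = moga()
print(len(family), family[:3])
```

A fixed iteration count stands in for the convergence test; a real implementation would also keep elites and apply niching.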

  16. MoSELECT: Search Progress [Figure panels: population after 0, 100, 1000 and 5000 iterations]

  17. Family of Solutions • Each run of MoSELECT results in a family of solutions • Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights • One run of MoSELECT takes the same CPU time as one run of SELECT [Figure: solutions after 5000 iterations, plotted as diversity vs. ΔMW]

  18. Focused Library: Aminothiazoles • α-bromoketones & thioureas extracted from ACD (Available Chemicals Directory) • ADEPT used to • filter reactants (MW < 300; rotatable bonds (RB) < 8) • enumerate virtual library => 12580 products (74 α-bromoketones × 170 thioureas) • MoSELECT used to design 15 × 30 subsets optimised on • Similarity to a target compound (Daylight fingerprints) • Cost ($/g)
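A sketch of the two objectives for this focused library, under assumptions the slide does not state: similarity is the sum of Tanimoto similarities of the subset's products to the target, with fingerprints represented as integer bitmasks (hypothetical stand-ins for Daylight fingerprints), and cost is the sum of per-gram prices of the selected reactants, each bought once. Function names and prices are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    # Fingerprints as integer bitmasks; Tanimoto = on-bits(a & b) / on-bits(a | b).
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

def similarity_objective(product_fps, target_fp):
    # Sum of similarities of every product in the subset to the target (maximise).
    return sum(tanimoto(fp, target_fp) for fp in product_fps)

def cost_objective(rows, cols, cost_r1, cost_r2):
    # Subset cost = per-gram prices of its selected R1 and R2 reactants (minimise).
    return sum(cost_r1[i] for i in rows) + sum(cost_r2[j] for j in cols)

print(similarity_objective([0b1110, 0b0110], 0b0110))
print(cost_objective([0, 2], [1], [10.0, 20.0, 30.0], [5.0, 7.0]))  # 47.0
```

The two objectives pull in different directions when the reactants closest to the target are also the most expensive, which is exactly the trade-off MoSELECT maps out.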

  19. MoSELECT Solutions: 1 [Figure panels: 0 iterations and 5000 iterations]

  20. MoSELECT Solutions: 2 • Running MoSELECT with niching [Figure: 5000 iterations]
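The slides do not describe how MoSELECT implements niching. As an illustration only, the sketch below uses fitness sharing, a common niching technique: each individual's fitness is divided by a niche count computed from distances in objective space, so solutions crowded into one region of the front are penalised and the family spreads out. `sigma_share`, `alpha` and the function name are assumptions.

```python
from math import dist

def shared_fitness(points, fitness, sigma_share, alpha=1.0):
    """Divide each fitness by its niche count; points are objective vectors."""
    shared = []
    for p, f in zip(points, fitness):
        niche = 0.0
        for q in points:
            d = dist(p, q)
            if d < sigma_share:
                # Triangular sharing function: 1 at d = 0, falling to 0 at sigma_share.
                niche += 1.0 - (d / sigma_share) ** alpha
        shared.append(f / niche)  # niche >= 1 because each point shares with itself
    return shared

print(shared_fitness([(0, 0), (0, 0), (1, 1)], [1.0, 1.0, 1.0], sigma_share=0.5))
# [0.5, 0.5, 1.0]
```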

  21. Moving to > 2 Objectives: Parallel Graph Representation • Each objective is scaled using the max and min values achieved when the objective is optimised independently [Figure: solutions after 5000 iterations]

  22. Focused Library: Amides • 100 × 100 virtual library • MoSELECT used to design 10 × 10 subsets • Objectives • Similarity to a target • Sum of similarities using Daylight fingerprints • Predicted bioavailability • Each compound rated from 1 to 4 • Sum of ratings • Hydrogen bond profile • Rotatable bond profile

  23. MoSELECT Solutions • Population size 50 • Iterations 5000 • Niching 30% • Number of solutions = 11 • CPU time 53 s (R12K, 360 MHz)

  24. Conclusions • Advantages of MoSELECT • a family of equivalent solutions is obtained in a single run with each solution representing one combinatorial library • this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT • no need to determine weights for objectives • optimisation of different types of objectives is readily achieved • visualisation of the search progress allows trade-offs between objectives to be observed • the user can make an informed choice on which solution(s) to explore

  25. Acknowledgements • Illy Khatib, Peter Willett; Information Studies, University of Sheffield • Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield • Darren Green, Andrew Leach; GlaxoSmithKline, UK • Funding by GlaxoSmithKline, UK • John Bradshaw; Daylight • Daylight for software support
