
Scientific Workshop Maximum Common Substructure

Scientific Workshop: Maximum Common Substructure. Miklós Vargyas. UGM 2006. Workshop overview: introduction, concepts, theory; clustering, the role of MCS; applications; future plans. Motivations. Automated reaction mapping. Mapping chemical reactions. ChemAxon’s automapper.


Presentation Transcript


  1. Scientific Workshop: Maximum Common Substructure • Miklós Vargyas • UGM 2006

  2. Workshop overview • Introduction, concepts, theory • Clustering, the role of MCS • Applications • Future plans

  3. Motivations • Automated reaction mapping

  4. Mapping chemical reactions

  5. ChemAxon’s automapper • Find parts common to both sides • Map common parts

  6. ChemAxon’s automapper • Map the rest • Score possible mappings • Find the one that scores the highest

  7. Concepts and theory • MCS/MCES/MOS • MCS complexity O(n^m)

  8. MCS search methods / Clique • Barrow and Burstall, 1976 • Raymond and Willett, RASCAL, 2002 • Details in brief • Construct the product graph of G1 and G2 • Node count: |V1|∙|V2| • Find the maximum clique; it corresponds to the largest matching • Why it is good • Very elegant, pure graph theory • MCES can also be found • Disconnected MCS/MCES can be found • Node and edge coloring fits easily • What the drawbacks are • Product graph is large and dense • Recent advances in clique detection
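
The clique approach on this slide can be illustrated with a minimal sketch. It is a generic textbook construction, not RASCAL and not ChemAxon's code: the function names `modular_product` and `max_clique`, the dict-of-sets graph encoding, and the toy molecules are assumptions made for illustration. Node labels (the slide's "node coloring") restrict the product graph to same-element atom pairs, so it has at most |V1|∙|V2| nodes.

```python
from itertools import combinations

def modular_product(g1, labels1, g2, labels2):
    """Build the product graph: one node per label-compatible atom pair."""
    nodes = [(u, v) for u in g1 for v in g2 if labels1[u] == labels2[v]]
    adj = {n: set() for n in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        # Two pairs are compatible when they use distinct atoms and agree
        # on adjacency (both bonded or both non-bonded).
        if u1 != u2 and v1 != v2 and ((u2 in g1[u1]) == (v2 in g2[v1])):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj

def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique as a list."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r + [n], p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand([], set(adj), set())
    return best

# Toy heavy-atom graphs (hypothetical input encoding): ethanol and 1-propanol.
ethanol = {0: {1}, 1: {0, 2}, 2: {1}}
ethanol_labels = {0: "C", 1: "C", 2: "O"}
propanol = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
propanol_labels = {0: "C", 1: "C", 2: "C", 3: "O"}

product = modular_product(ethanol, ethanol_labels, propanol, propanol_labels)
matching = max_clique(product)
print(len(product), sorted(matching))
```

Each clique member is an (atom in G1, atom in G2) pair, so the clique is the atom mapping of a maximum common induced subgraph. Even in this toy case the product graph is larger than either input, which is the drawback the slide names.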

  9. MCS search methods / Backtrack • Crandell and Smith, 1983 • Advantages • Flexible, easy to add constraints, incorporate chemical knowledge, heuristics • Dynamic programming • Various search strategies • Recent algorithms • Jun Xu, GMA, 1995
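
To make the backtracking idea concrete, here is a minimal sketch of a generic branch-and-bound search for a maximum common induced subgraph. It is not the Crandell and Smith algorithm, GMA, or ChemAxon's implementation; the name `mcs_backtrack` and the dict-of-sets graph encoding are assumptions for illustration. The `compatible` check is the place where extra chemical constraints could be added, which is the flexibility the slide refers to.

```python
def mcs_backtrack(g1, labels1, g2, labels2):
    """Return a largest atom mapping {node in g1: node in g2} found by backtracking."""
    nodes1 = list(g1)
    best = {}

    def compatible(u, v, mapping):
        # Same atom label, and adjacency must agree with every mapped pair.
        if labels1[u] != labels2[v]:
            return False
        return all((a in g1[u]) == (b in g2[v]) for a, b in mapping.items())

    def extend(i, mapping, used):
        nonlocal best
        # Bound: even mapping every remaining node cannot beat the best so far.
        if len(mapping) + (len(nodes1) - i) <= len(best):
            return
        if i == len(nodes1):
            best = dict(mapping)
            return
        u = nodes1[i]
        for v in g2:
            if v not in used and compatible(u, v, mapping):
                mapping[u] = v
                used.add(v)
                extend(i + 1, mapping, used)
                del mapping[u]
                used.remove(v)
        extend(i + 1, mapping, used)  # alternative: leave u unmapped

    extend(0, {}, set())
    return best

# Toy heavy-atom graphs (hypothetical input encoding): ethanol and 1-propanol.
ethanol = {0: {1}, 1: {0, 2}, 2: {1}}
ethanol_labels = {0: "C", 1: "C", 2: "O"}
propanol = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
propanol_labels = {0: "C", 1: "C", 2: "C", 3: "O"}

print(mcs_backtrack(ethanol, ethanol_labels, propanol, propanol_labels))
```

This sketch does not require the common substructure to be connected; a connectivity constraint would be one more condition in `compatible`.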

  10. Comparison of methods • Brint and Willett, 1986: Clique based substantially faster • Recent publication, 2006: backtracking is superior • We tested both approaches • Backtracking: 1.2 s (exhaustive search) • Clique based was stopped after 2 hours!!!

  11. ChemAxon MCS search approach (figure: graph with nodes numbered 1 to 15) • Based on Wang and Zhou, EMCSS, 1996 • Backtracking • Divide and conquer strategy • Create all spanning trees of the query graph

  12. ChemAxon MCS search approach (figure: numbered graphs) • Use this as a route plan to traverse the target graph

  13. An application of MCS • Reaction automapping (live demonstration) • Average mapping time: 320 ms • Complex structures cannot be mapped efficiently

  14. Product development philosophy • Sophisticated technology • High performance (speed, accuracy, features) • Rounded, industry relevant functionality • Client driven development • >300 active clients • Fast and reliable support • Customizable • Extendable • Long term relevance • Comprehensive API • Platform independence (Java)

  15. LibMCS motivations • “However, finding MCS from a pair of molecules has limited usage for our study. When we get hits from HTS, we cluster them into groups and the chemists will eye browse each group to find the scaffolds that are potentially good templates for later expansion. One main use of MCS will be to process multiple compounds of similar structures and automate what chemists have been doing by eyes now.” • “We expect to use MCS tools for two cases: 1) use to analyze hits from HTS screens. 2) use it as a sorting tool for data retrieval, i.e., whenever people export data from our database (compounds across assays), we run MCS so that structurally similar compounds are grouped together. Chemists like this very much (we currently do this by clustering based on overall Tanimoto similarity).” • “The typical hits from screens range from 2000-10000 (in few cases). In lead optimization phase, the compound list is around 3000-5000 in a typical project. So if MCS tools can process 5000 compound under 5 seconds, it can be integrated with online web tools. Otherwise, if it takes several minutes, it will be only used to analyze hits off-line based on user requests. If it takes more than an hour, its usage will be very limited.”

  16. LibMCS is a hard problem to solve • Exact solution • Requires the pair-wise comparison of each structure • n ∙ (n - 1) / 2 MCS computations • Next problem is larger!! • All CS (above a given size) have to be found • n ∙ (n - 1) / 2 CS computations • Partitioning O(n^3) CS

  17. Pair-wise MCS table

  18. Pair-wise MCS computation • Average MCS computation: 100 ms • First step: n ∙ (n - 1) / 2 MCS computations • 100 structures: 50 ∙ 99 ∙ 100 ms ≈ 8 min • 1000 structures: ≈ 14 hours • Second step: larger problem has to be solved • Not a feasible approach in practice
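
The estimates on this slide follow directly from the pair count. The short sketch below reproduces the arithmetic; the function name `pairwise_mcs_cost` is hypothetical, and the 100 ms average is the figure quoted on the slide.

```python
def pairwise_mcs_cost(n, ms_per_mcs=100):
    """Return (number of pairs, total seconds) for exhaustive pair-wise MCS."""
    pairs = n * (n - 1) // 2
    total_seconds = pairs * ms_per_mcs / 1000
    return pairs, total_seconds

for n in (100, 1000):
    pairs, seconds = pairwise_mcs_cost(n)
    print(f"{n} structures: {pairs} pairs, "
          f"{seconds / 60:.2f} min, {seconds / 3600:.2f} h")
```

For 100 structures this gives 4950 pairs and 8.25 minutes; for 1000 structures, 499500 pairs and just under 14 hours, matching the rounded values on the slide.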

  19. Known approaches / Products • Stahl and Mauser, 2004, 2005 • Cluster first (ES) • Find an MCS for each cluster • Wilkens, Janes and Su, 2004 • BioReason ClassPharmer • ChemTK • LeadScope • Tripos ? • Daylight ?

  20. ChemAxon’s approach • Goal • Reduce the number of MCS pair computations • Idea: guess which two structures give a significant MCS • Similar compounds are likely to share a large MCS • Similarity guided pair-wise MCS • Not: cluster by similarity first, then determine the MCS of each cluster • Which molecular descriptor gives the best correlation? • ChemAxon fingerprint • BCUT (Burden matrix) • Consequence • Approximate solution
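
A minimal sketch of the "similarity guided" idea: rank structure pairs by fingerprint similarity and pass only the most similar pairs on to the expensive MCS step. This is an illustration under stated assumptions, not LibMCS itself. Fingerprints are modelled as sets of "on" bit positions, Tanimoto is assumed as the similarity measure, and `tanimoto`, `candidate_pairs`, and the threshold value are hypothetical names and parameters.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def candidate_pairs(fingerprints, threshold):
    """Return (id_a, id_b, similarity) for pairs at or above the threshold,
    most similar first; only these pairs would go on to MCS computation."""
    scored = [
        (tanimoto(fa, fb), ia, ib)
        for (ia, fa), (ib, fb) in combinations(fingerprints.items(), 2)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(ia, ib, s) for s, ia, ib in scored if s >= threshold]

# Toy fingerprints (made-up bit positions, for illustration only).
fps = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 5},
    "c": {7, 8, 9},
}
print(candidate_pairs(fps, threshold=0.5))
```

All n ∙ (n - 1) / 2 similarities are still computed here, but a fingerprint comparison is far cheaper than an MCS search. Because pairs below the threshold are never examined, the result is approximate, as the slide notes.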

  21. LibMCS algorithm (flowchart) • Steps: Read input structures • Generate fingerprint • Calculate similarity matrix • Get two most similar • Compute MCS • Add to cluster • Create new cluster • Make singletons • Decisions (y/n): SSS found • MCS large • Similarity above threshold • More structures

  22. Applications • Screen analysis • Data visualization and profiling • Combinatorial library partitioning • Buying new compounds • ? • Suggest more!!!!

  23. Application 1 / Screen analysis

  24. Activity filtering

  25. Live demonstration • Partitioning mixed combinatorial library • Effect of parameters • Effect of modes • Benchmarks • Quality of clusters

  26. Combichem library scaffolds

  27. Combichem library scaffolds • Turbo mode distorts clusters

  28. Combichem benchmark • Influence of normal/fast/turbo mode • Worth it: distortion is not significant

  29. Development roadmap • Soon • R-Group decomposition • Stereo care MCS • Preserving rings • Lower bound pre-filtering • Disconnected MCS • Multi cluster members • Mid term • Integrate Ward/Jarvis-Patrick in the new GUI • Long term • Integrate molecular descriptors, metrics • Integrate virtual screening

  30. Coming soon – R-Group decomposition

  31. Coming soon – R-Group decomposition

  32. Coming soon – Multi cluster

  33. Summary • MCS developed for automatic reaction mapping • MCS based hierarchical clustering • Fast method • Chemical adequacy must be improved • Various uses, currently focusing on combinatorial library partitioning

  34. Acknowledgements • Developers • Péter Vadász • Nóra Máté • Ideas • Szabolcs Csepregi, Ferenc Csizmadia • Special thanks to
