Challenges in Computer-Assisted Molecular Discovery: Implications for System Design

Future CAMD Workloads and Their Implications for Computer System Design
IEEE 6th Annual Workshop on Workload Characterization

What is CAMD?
• Computer-Assisted Molecular Discovery, used in …
  • drug discovery
  • agrochemical discovery (herbicides, insecticides, etc.)
  • “cosmeceutical” discovery
• common objectives of all CAMD applications:
  • find a small molecule (“drug” or “ligand” or “active”) with the right chemical structure for optimal …
    • interaction with large biomolecule (“receptor” or “target” or “protein”)
    • ADMET properties (getting & keeping ligand near receptor in body)
  • decide which compounds (potential drugs) should be synthesized/purchased and tested (“screened”) next
  • decide by using computer to first do “virtual screening”

Molecular discovery process
• [Diagram] stages shown: Genomics, Proteomics; Target ID/Validation & Structure; Assay Development; Hits; Lead Identification; Lead Optimization; Preclinical/ADMET; Clinical Trials; Sales & Marketing
• [Diagram] CAMD contributions shown: Bioinformatics; Cheminformatics; Modeling & Simulation; Decision Support

Three types of CAMD problems
• Intensive computations on one structure or complex
  • getting 3D structure of target from genomic information
    • “protein folding problem” – a classic CAMD problem area
    • parallel/distributable algorithms exist but best done on a single processor
    • huge number of possible conformations → short cuts taken
  • refining 3D structure of target from X-ray/NMR data
  • performing protein-ligand docking & scoring
    • virtual receptor-ligand complexation → virtual screening
    • flexibility of ligand is currently addressed
    • flexibility of protein is rarely addressed due to high CPU time
    • scoring functions are crude due to high CPU time
  • faster CPUs and more memory (for protein folding) would enable better quality results
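
To give a feel for why a "huge number of possible conformations" forces short cuts, the following is a minimal sketch of how a systematic torsion-grid search scales. The three states per rotatable bond and the evaluation rate are illustrative assumptions, not figures from the presentation, and the count is only an upper bound because it ignores steric clashes and coupling between torsions.

```python
def conformer_count(rotatable_bonds: int, states_per_bond: int = 3) -> int:
    """Upper bound on conformations when every rotatable bond is sampled
    independently at `states_per_bond` torsion values (assumed grid)."""
    return states_per_bond ** rotatable_bonds


def exhaustive_search_seconds(rotatable_bonds: int,
                              states_per_bond: int = 3,
                              evals_per_second: float = 1e6) -> float:
    """Time to evaluate every grid conformation once.
    `evals_per_second` is a hypothetical throughput, not a measured one."""
    return conformer_count(rotatable_bonds, states_per_bond) / evals_per_second


if __name__ == "__main__":
    # Small ligand-sized and larger chain-sized bond counts, for comparison.
    for n in (5, 10, 20, 40):
        print(n, conformer_count(n), f"{exhaustive_search_seconds(n):.3g} s")
```

The exponential growth is the point: each additional rotatable bond multiplies the search space, so exhaustive enumeration stops being practical long before protein-sized systems are reached.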

Three types of CAMD problems
• Modest computations on MANY structures
  • millions of real compounds; billions of “virtual compounds”
  • many subtasks associated with virtual screening; e.g.:
    • convert 2D structure of ligand to 3D (Concord)
    • generate multiple conformations of each ligand 3D structure
    • perform various low-CPU tests to identify which ligands merit further attention using high-CPU methods (e.g., docking)
      • crude estimates of ADMET-related properties (e.g., solubility, membrane permeability, etc.)
      • crude shape-complementarity tests
    • perform docking (at increasing levels of accuracy)
  • large input stream → ideally suited for distributed processing
  • grid computing using many thousands of nodes (and faster nodes) would enable better quality results
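
The screening funnel described above (cheap tests first, costly docking only for survivors) can be sketched as follows. This is an illustration under stated assumptions, not the workflow of any particular CAMD package: the `Ligand` fields, the filter thresholds, and `expensive_dock` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Ligand:
    name: str
    est_solubility: float  # hypothetical crude ADMET-style estimate
    shape_fit: float       # hypothetical crude shape-complementarity score


Filter = Callable[[Ligand], bool]


def funnel(ligands: Iterable[Ligand], cheap_filters: List[Filter]) -> Iterator[Ligand]:
    """Yield only ligands that pass every inexpensive test."""
    for lig in ligands:
        if all(test(lig) for test in cheap_filters):
            yield lig


def expensive_dock(lig: Ligand) -> float:
    """Stand-in for a costly docking score (placeholder formula)."""
    return lig.shape_fit * 10.0


def screen(ligands: Iterable[Ligand], cheap_filters: List[Filter],
           top_n: int) -> List[Tuple[float, str]]:
    """Run the cheap filters, dock the survivors, keep the best `top_n`."""
    scored = [(expensive_dock(l), l.name) for l in funnel(ligands, cheap_filters)]
    return sorted(scored, reverse=True)[:top_n]


def chunks(items: List[Ligand], size: int) -> Iterator[List[Ligand]]:
    """Split the input stream into independent work units, one per node."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


if __name__ == "__main__":
    library = [Ligand("A", 0.9, 0.8), Ligand("B", 0.2, 0.9),
               Ligand("C", 0.7, 0.1), Ligand("D", 0.6, 0.5)]
    filters = [lambda l: l.est_solubility > 0.5, lambda l: l.shape_fit > 0.3]
    print(screen(library, filters, top_n=2))
```

Because each ligand is processed independently, the chunks can be screened on separate nodes and the per-chunk results merged and re-sorted afterwards, which is why a large input stream suits distributed processing.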

Three types of CAMD problems
• Storing data for virtual compounds
  • millions of real compounds; billions+ of virtual compounds
  • why store data for virtual compounds?
    • costs time & money to generate & regenerate data
    • science-related reasons
      • data generated for one project is often useful for another
      • must store data for each conformation of each structure
      • must store data for each structure that a compound can adopt (Optive Research will introduce technology early next year)
      • new technology will result in HUGE volumes of virtual data
    • IP-related, competition-related reasons
      • pharma industry is already planning for offensive and defensive needs in the coming virtual-screening and virtual-IP “wars”
  • need means to store and access huge volumes of data
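
The storage requirement multiplies across compounds, structures per compound, and conformations per structure. The sketch below shows that arithmetic only; every number in the usage example is a hypothetical placeholder chosen for illustration and does not come from the presentation.

```python
def storage_bytes(n_compounds: int,
                  structures_per_compound: int,
                  conformers_per_structure: int,
                  bytes_per_conformer: int) -> int:
    """Total bytes if data is kept for every conformation of every
    structure of every compound."""
    return (n_compounds * structures_per_compound
            * conformers_per_structure * bytes_per_conformer)


def human(n_bytes: int) -> str:
    """Format a byte count using decimal (SI) units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


if __name__ == "__main__":
    # Hypothetical inputs: 1 billion virtual compounds, 4 structures each,
    # 50 conformations per structure, 2000 bytes per stored conformation.
    total = storage_bytes(10**9, 4, 50, 2000)
    print(human(total))
```

Changing any one factor scales the total linearly, so a technology that multiplies the number of stored structures per compound multiplies the storage requirement by the same factor.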

Closing comments
• practitioners of CAMD are well aware that quality of current methods is limited by compute resources
• rate of discovery and quality of actives discovered would both improve if CAMD methods improved
• given that the sales of many actives each exceed $1 billion per year, the market for improved compute power (and improved CAMD software) is quite substantial
• I sure hope that you computer architects can help!! ;-)

Contact Info
• for questions about this short presentation, please feel free to contact me at:
  Dr. Robert S. Pearlman, Pres. & CSO
  Optive Research, Inc.
  512-514-6222
  bob.pearlman@optive.com
• for questions about Optive Research, Inc. and/or about the Computer-Assisted Molecular Discovery software which we develop, contact me as indicated above or visit our web-site at: www.optive.com
