How Novartis Leverages GRID Technology for Advanced Drug Discovery and Development

How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web
Prof. Manuel C. Peitsch, PhD
Global Head of Systems Biology

The Challenges of Drug Discovery
• Mechanism-based Drug Discovery
• Understanding Disease
• Pathways elucidation
• Target validation
• Clinical PoC
• New drug candidates (to be tested in PoC studies)
• Reduce project life cycle
• Increase PoS after D3 (Lead optimisation)
Systems Biology: Combination of *Omics & Mathematical Modelling

Organizational complexity
• United States: Diabetes, Infectious diseases, Cardiovascular, Oncology, Discovery Technologies, Discovery Chemistry, Animal Models, Pathways, Genome and Proteome Sciences
• Great Britain: Respiratory, Gastrointestinal
• Switzerland: Muscular and Bone, Nervous system, Oncology, Transplantation, Ophthalmology, Genome and Proteome Sciences, Discovery Technologies, Discovery Chemistry, Protease Platform, GPCR
• Austria: Autoimmunity
• Japan: Oncology, Diabetes, Cardiovascular

Data and Information complexity
• Molecular Structure
• Literature
• Raw data from instruments
• Genomics and Proteomics

Data Information and Knowledge GRID
[Diagram labels: Knowledge Space / Semantic Web; Computational life science and HPC GRIDs; People Networks]
The Vision
Enable and transform the Drug Discovery process through:
• Comprehensive and reliable Data and Information
• Seamless information integration for easy navigation
• Turning Data into Knowledge using in silico science
• Simulate biomolecular processes using in silico science
• E-Collaboration and v-communities

Computational Aspects in Drug Discovery
[Diagram labels: Macromolecular Structure & Function Lab; Computational Chemistry Lab; Bioinformatics Lab; Target finding; Target validation; Lead finding; Lead optim.]

[Figure: Signal Transduction Networks; Omics experiments; Mathematical Representation]

In Silico Drug Discovery
[Diagram labels: SAP; DB; SNP; QSAR; Model & Map; DNA Sample; Sequencing; Translate & Map/Align; Human data; Structures & Modelling templates; Kinases; NR; Proteases; Disease association; Validated Targets; Functional and Structural insights; Virtual Drug Discovery; In Silico Docking; In Silico “Chemogenomics”; Virtual Library Design; Predictive MedChem; Tox; PK/PK; ADME modelling; Compounds; Proteins]

In Silico Drug Discovery Pipeline: Can it be done?
[Timeline 1990 - 1995 - 2000 - 2005; milestone labels as they appear on the slide:]
• First PC-GRID at Novartis
• GeneCrunch
• In Silico Drug Discovery and Chemogenomics pipeline
• Productive Automated Protein modelling email server
• SETI@Home
• Docking in production at Novartis
• 3D-Crunch
• Productive Automated Protein modelling Web server
• Full Transcriptome Modelling at Novartis
• Genome scale Automated Protein modelling
• Protein Model Structure database
• First automated pipelines
• Automated ToxCheck and other CIx tools
Recognition:
• SETI@Home recognised as a leading new concept (ComputerWorld Award)
• GeneCrunch recognised as a leading new concept (ComputerWorld Award)
• SWISS-MODEL and 3D-Crunch recognised as a leading new concept (ComputerWorld Award)
• UD recognised for visionary use of information technology in the category of Medicine (ComputerWorld Award)

Novartis’ HPC Grid Strategy
[Diagram labels: Shared Servers; Linux Clusters; Job submission layer; PC GRID; External Collaborations]
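One way to picture a job submission layer is as a thin dispatcher that hides which resource actually runs a job. The sketch below is illustrative only: the `Backend` and `JobSubmissionLayer` names, the slot counts and the "first back end with a free slot" policy are all assumptions, not a description of the system shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """A compute resource behind the submission layer (hypothetical model)."""
    name: str
    free_slots: int
    jobs: list = field(default_factory=list)


class JobSubmissionLayer:
    """Routes each job to the first back end that still has a free slot."""

    def __init__(self, backends):
        self.backends = list(backends)

    def submit(self, job_id):
        for backend in self.backends:
            if backend.free_slots > 0:
                backend.free_slots -= 1
                backend.jobs.append(job_id)
                return backend.name
        raise RuntimeError("no capacity on any back end")


# Slot counts are invented for the example.
layer = JobSubmissionLayer([
    Backend("shared-servers", free_slots=1),
    Backend("linux-cluster", free_slots=2),
    Backend("pc-grid", free_slots=1000),
])
placements = [layer.submit(f"job-{i}") for i in range(5)]
print(placements)
```

The caller only ever talks to `submit`; when the smaller resources fill up, later jobs spill over to the PC grid without any change on the caller's side.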

Influencing Biomolecular Processes
[Diagram labels: Target; Ligand; Drug; ACTIVE; INACTIVE]
Target = enzyme, receptor, nucleic acid, …
Ligand = substrate, hormone, other messenger, ...

PC Grid Success Story: Protein Kinase CK2 Inhibition
• Target finding:
  • Protein Kinase CK2 has roles in cell growth, proliferation and survival.
  • Protein Kinase CK2 has a possible role in cancer, and its overexpression has been associated with lymphoma.
• Target validation:
  • To elucidate the different functions and roles of CK2 and confirm it as a drug target for oncology, one needs a potent and selective inhibitor.
• Approach:
  • The problem was addressed by in silico screening (docking).

Virtual Screening by in silico Docking
> 400,000 Compounds
Docking Process and Selection of possible hits
< 10 Compounds
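The funnel above parallelises naturally, because each compound can be scored independently of all the others. The following is a minimal sketch of that idea, assuming a library split into independent work units. `placeholder_score` is a deterministic stand-in and not a docking scoring function; the function names, the unit size and the cut-off of 10 are hypothetical.

```python
import heapq
import zlib


def placeholder_score(compound_id: str) -> int:
    """Stand-in for a docking score (lower is better); NOT a real scoring function."""
    return zlib.crc32(compound_id.encode("utf-8")) % 10_000


def make_work_units(compounds, unit_size):
    """Split the library into independent chunks, one per grid work unit."""
    return [compounds[i:i + unit_size] for i in range(0, len(compounds), unit_size)]


def run_work_unit(unit, keep):
    """Score every compound in one unit and return that unit's best candidates."""
    scored = [(placeholder_score(c), c) for c in unit]
    return heapq.nsmallest(keep, scored)


def screen(compounds, unit_size=1_000, keep=10):
    """Merge the per-unit results and keep the overall best `keep` candidates."""
    partial = []
    for unit in make_work_units(compounds, unit_size):
        # On a grid, each call would run on a different machine.
        partial.extend(run_work_unit(unit, keep))
    return heapq.nsmallest(keep, partial)


# Toy library, far smaller than the 400,000 compounds on the slide.
library = [f"CMPD-{i:06d}" for i in range(20_000)]
hits = screen(library)
print(len(hits), "candidates selected from", len(library))
```

Because every unit returns only its own best candidates, the amount of data sent back for merging stays small regardless of library size.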

Important results
Conclusion: We have identified a 7-substituted Indoloquinazoline compound as a novel inhibitor of protein kinase CK2 by virtual screening of 400,000 compounds, of which a dozen were selected for actual testing in a biochemical assay. The compound inhibits the enzymatic activity of CK2 with an IC50 value of 80 nM, making it the most potent inhibitor of this enzyme ever reported. Its high potency, associated with high selectivity, provides a valuable tool for the study of the biological function of CK2.
“The reported work clearly shows that large database docking in conjunction with appropriate scoring and filtering processes can be useful in medicinal chemistry. This approach has reached a maturation stage where it can start contributing to the lead finding process. At the time of this study, nearly one month was necessary to complete such a docking experiment in our laboratory settings. The Grid computing architecture recently developed by United Devices allows us to now perform the same task in less than five working days using the power of hundreds of desktop PC’s. High-throughput docking has therefore acquired the status of a routine screening technique.”

Major benefits of GRID computing
• Optimization of resource utilization:
  • HPC platform usage is maximized and technology expertise is shared.
  • Response to additional performance requirements is easier and faster.
  • No service downtime, because the same job can run on many platforms across different sites.
• Enable collaboration and synergies across business units:
  • Single efficient access path to Data and Compute resources.
  • Tools are easily exchanged between scientists/programs.
• Favor “out of the box” thinking:
  • Apply HPC to areas which one would not even have considered a year ago.
This has created fertile ground for new paradigms in Drug Discovery, leading to Business Process transformation.

Performance of the PC-GRID (today)
• Computing Power:
  • Theoretical >5 TeraFLOPS harvested from 3000 PCs in all geographical locations.
  • Acceleration of the in silico Docking process versus 1 standard 2002 PC (start of project): ~4000x
• Financial:
  • Immediate savings in excess of $2 million.
  • No need for an additional data centre to support this computing power.
  • Optimal use of existing hardware (associates’ PCs).
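The performance figures can be related to each other with some quick arithmetic. The sketch below uses only the three numbers quoted on the slide, treating ">5 TeraFLOPS" as a lower bound. The 4000-hour job passed to `grid_hours` is an invented input chosen to make the scaling easy to read, not a measured run time.

```python
TOTAL_FLOPS = 5e12   # ">5 TeraFLOPS" theoretical, taken here as a lower bound
PC_COUNT = 3000      # PCs contributing to the grid
SPEEDUP = 4000       # "~4000x" versus one standard 2002 PC

# Average theoretical throughput contributed by one PC.
flops_per_pc = TOTAL_FLOPS / PC_COUNT

# Share of the overall acceleration attributable to each PC.
speedup_per_pc = SPEEDUP / PC_COUNT


def grid_hours(single_pc_hours: float) -> float:
    """Wall-clock hours on the grid for a job of given length on one 2002 PC."""
    return single_pc_hours / SPEEDUP


print(f"{flops_per_pc / 1e9:.2f} GFLOPS per PC on average")
print(f"{speedup_per_pc:.2f}x acceleration per PC")
print(f"{grid_hours(4000):.1f} h on the grid for a 4000 h single-PC job")
```

A per-PC ratio above 1 would be consistent with the grid's PCs being faster on average than the 2002 baseline machine, although the slide does not break the figure down.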

Building a GRID: Management focus
• You need a champion!
• Do not punctuate every sentence with the GRID word and avoid the Hype!
• Demonstrate value through pilots:
  • Think “Iterative Improvement”. The conceptual layers are there, prototypes are emerging, improvement and optimization are essential, and maturity will follow.
• Leadership, transcendence, entrepreneurship and tenacity are the essence of transformation!
• Concepts are easy to draw on a napkin over beer!
• But new and great things are hard to achieve!
• Use external goodwill to create internal acceptance!

Peru Community projects help with acceptance

Building a GRID: User base
• You need a clearly defined and communicated HP Computing strategy.
• Address unmet computational needs.
• Apply HPC to areas which one would not even have considered two years ago. This has created fertile ground for new paradigms in Drug Discovery, leading to Business Process transformation.
Are all problems “GRIDable”?
• Further applications:
  • Sequence identification in proteomics from LC-MS/MS data
  • Text Mining and semantic Web infrastructure

Building a GRID: Software
• The Software licensing models will have to evolve.
• Do not stop because of software licensing issues.
• Show success with freeware and home-grown algorithms.
• Demonstrate business value and cost leadership.
• Opportunity to develop your own code?
• Unification of HPC applications environment:
  • Ensure that applications can run on the maximum number of systems.
• Introduce HPC software management:
  • Influence licensing models. The classical models do not fit the GRID and HPC paradigm.

Building a GRID: PC owners
• Education and awareness.
• Ensure that the HelpDesk is well trained and gives the right answers.
• Ensure that PC owners know about the REAL impacts, including network.
• The PCs are company and not personal assets!
• The strategy to use them when they are idle is a company decision, not a user decision.
• Address power saving policies in a transparent manner.

Knowledge Space - Vision
• The “Knowledge Space Portal” is a Drug Discovery oriented implementation of the Semantic Web. Through a single customizable interface it:
  • Federates heterogeneous data resources and provides precise organization of the content
  • Provides quick and intuitive access to information
  • Provides data extraction, analysis and exploration tools
  • Allows data integration, data exchange and interoperability of applications
  • Provides mechanisms for data capture and annotation
  • Provides knowledge sharing and collaborative tools

Basic principles behind the Knowledge Space
• The Knowledge Space consists of:
  • The collection of all types of data and information within the scope of interest defined by a particular business. There is no conceptual difference between internal and external data/information.
  • The Meta Data and the Knowledge Map which describe the collection in terms of content and location.
  • The Text Mining platform which allows the identification of entities (using vocabularies) and the concepts they belong to (using ontologies).
  • The Ultralinker, which associates identified entities and concepts with specific contextual rules.
  • A user interface.

What is an Ultralink?
• The Ultralink is an “intelligent” context-sensitive Hyperlink created at run time by the Ultralinker.
• The Ultralink is generally a menu of links instead of a single link.
• This menu offers only sensible actions/options:
  • No dead ends, thanks to a verification process ensuring that every link has a target.
• The Ultralink provides direct interaction between any type of entity (gene name, compound name, mode of action, disease name, company name, etc.) and an appropriate set of tools and resources, as defined by the rules encoded in the Ultralinker.
• The Ultralink functionality allows the selection of any portion of text in the Web browser and sends it as input to the Ultralinker for analysis and menu creation.
• The Ultralink allows easy navigation across the information domains contained in the Knowledge Space.

How the Ultralinker works
• The Ultralinker is a Web service which analyses any information it receives (such as complete web pages) for recognisable entities using text mining and pattern recognition methods.
• Each recognised item is mapped onto the ontologies and the Knowledge Map.
• The Expert System will define what can be done with the identified entities, e.g.:
  • If a gene name is recognised then Ultralinks are created to:
    • get its sequence and perform sequence similarity searches;
    • query genetic disorder databases and map it onto the chromosome;
    • produce a 3D structure by comparative modelling;
    • look for hits from High Throughput Screening;
    • etc.
  • Automated predefined processes can thus be activated by a single click (Ultraaction or work-flow).
• The Ultralinker will create a menu that will be sent to the User interface.
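The flow described here (recognise entities, apply rules per entity type, drop links without a target, return a menu) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Ultralinker itself: the vocabulary, the rule table, the action labels and the `AVAILABLE_TARGETS` set standing in for the verification step are all hypothetical.

```python
import re

# Hypothetical vocabulary: surface form -> entity type.
VOCABULARY = {
    "CK2": "gene",
    "lymphoma": "disease",
}

# Hypothetical rules: entity type -> actions that may be offered in the menu.
RULES = {
    "gene": ["get sequence", "similarity search", "3D model", "HTS hits"],
    "disease": ["literature search"],
}

# Stand-in for the verification step: (entity, action) pairs that have a target.
AVAILABLE_TARGETS = {
    ("CK2", "get sequence"),
    ("CK2", "similarity search"),
    ("CK2", "3D model"),
    ("lymphoma", "literature search"),
}


def recognise_entities(text):
    """Return (term, entity_type) pairs for vocabulary terms found as whole words."""
    found = []
    for term, entity_type in VOCABULARY.items():
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            found.append((term, entity_type))
    return found


def build_menus(text):
    """Build one menu per recognised entity, dropping actions with no target."""
    menus = {}
    for term, entity_type in recognise_entities(text):
        actions = [a for a in RULES.get(entity_type, [])
                   if (term, a) in AVAILABLE_TARGETS]
        if actions:
            menus[term] = actions
    return menus


menus = build_menus("Overexpression of CK2 has been associated with lymphoma.")
print(menus)
```

In this toy run the "HTS hits" action is left out of the CK2 menu because no target is registered for it, which mirrors the "no dead ends" principle.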

What constitutes the Knowledge Space
[Diagram labels, sources: Literature; Comp. Inf.; Bioinformatics; Biology; Chemistry; Internet; Research Documentation; Other]
[Diagram labels, components: Meta Data; K map; Defined workflows; Ultralinker; Text Mining; Analytics; Semantic Search; Thesauri; Ontologies; Rules]

Knowledge Space Search Modes
• Structure
• Concepts
• Text

Knowledge Space: Text search
Expansion: EMTREE + Novartis proprietary dictionary expansion for protease modulators + respective synonyms
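Query expansion of this kind can be pictured as walking a thesaurus from the search term to its narrower terms and synonyms, then searching for all of them at once. The sketch below assumes two small lookup tables. Their entries are invented placeholders and do not reproduce EMTREE or any proprietary dictionary; the function names are hypothetical.

```python
# Hypothetical thesaurus fragments, for illustration only.
NARROWER_TERMS = {
    "protease modulator": ["protease inhibitor", "protease activator"],
}
SYNONYMS = {
    "protease inhibitor": ["proteinase inhibitor", "peptidase inhibitor"],
    "protease modulator": ["proteinase modulator"],
}


def expand_query(term):
    """Return the term, its narrower terms and their synonyms, without duplicates."""
    expanded = []
    queue = [term]
    while queue:
        current = queue.pop(0)
        if current in expanded:
            continue
        expanded.append(current)
        queue.extend(NARROWER_TERMS.get(current, []))
        queue.extend(SYNONYMS.get(current, []))
    return expanded


def to_boolean_query(terms):
    """Join the expanded terms into a simple OR query with quoted phrases."""
    return " OR ".join(f'"{t}"' for t in terms)


terms = expand_query("protease modulator")
print(to_boolean_query(terms))
```

The user types one concept and the search engine receives every phrase the thesaurus associates with it.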

Display-Navigation-Ultralink
Protease modulator in Literature DB (Medline-Embase)
• Easy navigation in record titles
• Sort capabilities
• Ranking value and access to document
• Analysis tools
• Search report: Number of Docs, Key-words extracted

Document view
Take advantage of the full-text article provided by PubMed

Analysis Tools

Data Analysis
Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects
• Univariate - Companies
• Univariate - MOA
• Univariate - Diseases conditioned by Companies
• Clustering Diseases - MOAs
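A univariate analysis here amounts to counting how often each value of one field occurs, and the "conditioned by" variant repeats that count within each value of a second field. The sketch below shows both on a handful of made-up placeholder records. None of the values come from ADIS or Pharmaprojects, and the record layout and function names are assumptions.

```python
from collections import Counter, defaultdict

# Made-up placeholder records; not taken from any competitive-intelligence database.
records = [
    {"company": "Company A", "moa": "MOA 1", "disease": "Disease X"},
    {"company": "Company A", "moa": "MOA 2", "disease": "Disease Y"},
    {"company": "Company B", "moa": "MOA 1", "disease": "Disease X"},
    {"company": "Company A", "moa": "MOA 1", "disease": "Disease X"},
]


def univariate(rows, field):
    """Count how often each value of one field occurs."""
    return Counter(row[field] for row in rows)


def conditioned(rows, field, given):
    """Count values of `field` separately for each value of `given`."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[given]][row[field]] += 1
    return table


by_company = univariate(records, "company")
by_moa = univariate(records, "moa")
disease_by_company = conditioned(records, "disease", given="company")
print(by_company.most_common())
print(dict(disease_by_company["Company A"]))
```

The conditioned table is also a natural input for the clustering step, since each company (or MOA) becomes a vector of counts.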

Graph Navigator
Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects

Clustering

Chemistry, Chemoinformatics and Structural Biology
