1 / 24

Cloud Computing for Chemical Property Prediction

Microsoft Cloud Futures Conference, Redmond, 8 th April 2010. Cloud Computing for Chemical Property Prediction. Paul Watson School of Computing Science Newcastle University, UK Paul.Watson@ncl.ac.uk. The team: David Leahy, Jacek Cala, Hugo Hiden,

Télécharger la présentation

Cloud Computing for Chemical Property Prediction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Microsoft Cloud Futures Conference, Redmond, 8th April 2010 Cloud Computing for Chemical Property Prediction Paul Watson School of Computing Science Newcastle University, UK Paul.Watson@ncl.ac.uk • The team: David Leahy, Jacek Cala, Hugo Hiden, • Dominic Searson, Vladimir Sykora, Martyn Taylor, Simon Woodman • With thanks to: • Microsoft External Research for their financial support for Project Junior • Christophe Poulain, Savas Parastatidis

  2. Chemists want to know: Q1. What are the properties of this molecule? Toxicity Biological Activity Solubility Q2. What molecule would have aqueous solubility of 0.1 μg/mL?

  3. Answering the Question by performing experiments ..... time consuming, expensive, ethical Issues

  4. An alternative to experimentation: QSAR Quantitative Structure Activity Relationship - predict properties based on similar molecules Activity≈ f( ) quantifiablestructural attributes, e.g. #atoms logp shape .....

  5. Generating the models -Discovery Bus (Leahy et al) Data Model-Builders Models www.openqsar.com New Data or Model-Builders Model Generation New/ Improved Models

  6. Chemical Structures & their Activities Separate Training & Test Data Test Data Training Data Calculate Descriptors from Structures Descriptors + Responses Combine Descriptors Selected Descriptors + Responses Combined Descriptors + Responses Filter Descriptors Multiple Linear Regression Neural Network Partial Least Squares Classification Trees Build & Test Models Independently ..... Select Best Models Add to Model Database

  7. Increasing amounts of data for model building... CHEMBL : data on 622,824 compounds, collected from 33,956 publications WOMBAT : data on251,560 structures, for over 1,966targets WOMBAT-PK: data on 1230 compounds, for over 13,000 clinical measurements All contain structure information & numerical activity data  More models  Better models •  Computationally expensive: • 5 years for new datasets on existing server

  8. JUNIOR Project Aim Use Azure to generate models in weeks not years .... using as much of the available data as possible .... make models available on www.openqsar.com ... so that researchers can generate predictions for their own molecules

  9. Potential for concurrency... Chemical Structures & their Activities Separate Training & Test Data Test Data Training Data Calculate Descriptors from Structures Descriptors + Responses Combine Descriptors Combined Descriptors + Responses Filter Descriptors Selected Descriptors + Responses Multiple Linear Regression Neural Network Partial Least Squares Classification Trees Build & Test Models Independently ..... Select Best Models Add to Model Database

  10. Approach • avoid rewriting all existing Discovery Bus software • move existing Discovery Bus to Amazon Cloud • without parallelisation • move critical tasks to run concurrently on Azure • base solution around e-Science Central ....

  11. Clouds to the rescue? • Building scalable, dependable, science applications is still hard ..... • e-Science Central • Science Cloud Platform

  12. Science Cloud Options Science App 1 Science App n Users Users .... Science Cloud Platform Science App 1 Science App n .... Cloud Infrastructure: Storage & Compute Cloud Infrastructure: Storage & Compute

  13. e-Science Central Science as a Service for users Science App 1 Science App n Users .... Science Cloud Platform for developers Science Cloud Platform Cloud Infrastructure: Storage & Compute

  14. What should the Science Cloud Platform Include? Identify the common needs of our e-Science Users Data(instruments, experimental data, sensors...)

  15. .... App App Analysis Services e-Science Central App API Security Social Networking Science Cloud Platform Provenance Workflow Enactment Metadata Processing Cloud Infrastructure Storage

  16. Discovery Bus Planner Amazon Analysis Services e-Science Central App API Security Social Networking Provenance Workflow Enactment Metadata Processing Azure Storage

  17. 2 Workflow decomposed to Message Plan 1 Discovery Bus invokes e-Science Central Workflow via API Temporary workflow storage assigned, Message Plan queued for execution. 3 4 Message Plan Call Message Internal Service RMI / JMS NFS Response Message Workflow temporary storage Messages sent in sequence Call Message Azure Service HTTP HTTP Post Response Message 5 5 Workflow Execution Completes Discovery Bus notified with results Results data stored in e-Science Central folder

  18. e-Science Central Blob Storage • Web Node • Worker Node • Worker Node • Worker Node Results Queue Azure

  19. Current Status – Work in Progress • Running across up to 100 Azure nodes • Azure utilisation increasing (average ~60% over runs) • Moving more admin tasks to Azure / e-Science Central • model validation • co-ordination

  20. CPU Utilization

  21. Summary • Discovery Bus exemplifies a good Cloud pattern • large, variable, bursty requirements • proposal to apply to software verification • clouds do NOT make it easier to build complex, scalable, dependable distributed systems • we need higher-level “Science Cloud Platforms” • e-Science Central is our attempt at this • using the Azure Cloud we have a scalable system that can handle large new datasets • the models are being made freely available • www.openqsar.com

More Related