
e-Science Central: Doing Science on the Web, powered by the Cloud

e-Science Central: Doing Science on the Web, powered by the Cloud. Paul Watson Director, Digital Institute School of Computing Science Newcastle University, UK Paul.Watson@ncl.ac.uk. The team: Hugo Hiden, Simon Woodman, David Leahy, Jacek Cala

Presentation Transcript


  1. e-Science Central: Doing Science on the Web, powered by the Cloud
     Paul Watson, Director, Digital Institute, School of Computing Science, Newcastle University, UK (Paul.Watson@ncl.ac.uk)
     • The team: Hugo Hiden, Simon Woodman, David Leahy, Jacek Cala, Dominic Searson, Vladimir Sykora, Martyn Taylor, Joanna Berry
     • With thanks to: Microsoft External Research, EPSRC, OneNE, RCUK; Christophe Poulain, Savas Parastatidis

  2. Why Clouds? • Cloud computing can revolutionise e-science • give access to resources when needed • reduce time from idea to realisation • provide sustainable infrastructure

  3. Good Workload Patterns for Clouds (with acknowledgements to Dianne O’Brien)
     • Fast Growth: datasets / applications with rapidly growing popularity
     • Bursting: new data processed; new algorithm runs; event triggers computation (e.g. earthquake)
     [Figure: two plots of Compute against Time, one per pattern]

  4. Science in the Cloud: Option #1
     • Problem: building the complex, scalable, dependable systems researchers need is still hard:
       • high-level IT skills
       • on-going management costs
       • bespoke
     [Diagram: Users; Science App 1 .... Science App n; Cloud Infrastructure: Storage & Compute]

  5. The Long Tail of Scientists • individuals, research groups, SMEs • lack skills & access to resources • largely untouched by e-Science

  6. Cloud Challenge for e-Science • How can we increase the number of researchers who benefit? • x100,000 • across a wide range of research areas • in academia and industry

  7. Science Cloud Option #2
     [Diagram, two stacks side by side. First: Users; Science App 1 .... Science App n; Cloud Infrastructure: Storage & Compute. Second: Users; Science App 1 .... Science App n; Science Cloud Platform; Cloud Infrastructure: Storage & Compute]

  8. e-Science Central
     • Science as a Service for users
     • Science Cloud Platform for developers
     [Diagram: Users; Science App 1 .... Science App n; Science Cloud Platform; Cloud Infrastructure: Storage & Compute]

  9. North East Regional e-Science Centre
     • Aim: a regional centre of excellence in e-science
     [Map, 2001-: Edinburgh, Newcastle (North East Centre), Belfast, Manchester, Cambridge, Oxford, Cardiff, Imperial, Southampton]

  10. Research Areas: 25+ funded projects (€50M+) • Bioinformatics • Ageing & Health • Neuroscience • Chemical Engineering • Chemistry • Transport • Geomatics • Video Archives • Artistic Performance Analysis • Computer Performance Analysis • Computer Science

  11. Identify Common IT Needs of Research • Data (instruments, experimental data, sensors...)

  12. e-Science Central
      [Diagram: App, App, .... App on the e-Science Central API; Science Cloud Platform: Analysis Services, Security, Social Networking, Provenance, Workflow Enactment, Metadata; Cloud Infrastructure: Processing, Storage]

  13. Editing and Running a Workflow

  14. Blogs and links

  15. Case Study: Project Junior • Predicting Chemical Activity • A collaboration with Prof David Leahy’s Chemistry research group • Funded by Microsoft External Research

  16. Chemists want to know:
      • Q1. What are the properties of this molecule? (Toxicity, Biological Activity, Solubility)
      • Q2. What molecule would have aqueous solubility of 0.1 μg/mL?

  17. Answering the question by performing experiments: time consuming, expensive, ethical issues

  18. An alternative to experimentation: QSAR
      • Quantitative Structure Activity Relationship: predict properties based on similar molecules
      • Activity ≈ f(quantifiable structural attributes), e.g. #atoms, logP, shape, .....
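
A minimal sketch of the idea on slide 18, "predict properties based on similar molecules", written as a nearest-neighbour lookup in descriptor space. This is only one possible form of f; the slides do not say which form is used at this point. The descriptor values, the activities, and the names `TRAINING` and `predict_activity` are invented for illustration, and a real model would normally scale the descriptors first.

```python
import math

# Hypothetical training set: (descriptor vector, measured activity).
# The two descriptors stand for (#atoms, logP); all numbers are made up.
TRAINING = [
    ((10.0, 1.0), 0.5),
    ((12.0, 1.5), 0.75),
    ((30.0, 4.0), 3.0),
    ((32.0, 4.5), 3.5),
]


def predict_activity(descriptors, training=TRAINING, k=2):
    """Average the activity of the k training molecules nearest in descriptor space."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], descriptors))[:k]
    return sum(activity for _, activity in nearest) / k


if __name__ == "__main__":
    # A small molecule resembles the first two training rows.
    print(predict_activity((11.0, 1.2)))
    # A larger, more lipophilic molecule resembles the last two.
    print(predict_activity((31.0, 4.2)))
```

The prediction for an unseen molecule is driven entirely by molecules with similar structural attributes, which is why more training data tends to give more and better models.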

  19. Generating the models: Discovery Bus (Leahy et al)
      • Data + Model-Builders → Models (www.openqsar.com)
      • New Data or Model-Builders → Model Generation → New / Improved Models

  20. [Workflow diagram]
      Chemical Structures & their Activities
      → Separate Training & Test Data (Training Data, Test Data)
      → Calculate Descriptors from Structures (Descriptors + Responses)
      → Combine Descriptors (Combined Descriptors + Responses)
      → Filter Descriptors (Selected Descriptors + Responses)
      → Build & Test Models Independently (Multiple Linear Regression, Neural Network, Partial Least Squares, Classification Trees, .....)
      → Select Best Models
      → Add to Model Database
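
The shape of the workflow on slide 20 can be sketched in a few lines: separate training and test data, build several candidate models independently, test each one, and keep the best. This is an illustrative toy, not the Discovery Bus implementation. It assumes descriptors are already calculated, and it swaps the slide's model builders for two trivial ones (a mean predictor and a single-descriptor linear fit). All function names are hypothetical.

```python
import math
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Separate training and test data with a reproducible shuffle."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build_mean_model(train):
    """Baseline builder: always predict the mean training response."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def build_linear_model(train, j):
    """Least-squares line through descriptor j (a toy stand-in for MLR)."""
    xs = [x[j] for x, _ in train]
    ys = [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    if sxx == 0:
        return lambda x: my
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x[j] - mx)


def rmse(model, rows):
    """Root-mean-square error of a model on held-out rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))


def build_and_select(rows, n_best=1):
    """Build every candidate independently, test it, return the best (error, name) pairs."""
    train, test = split_train_test(rows)
    candidates = {"mean": build_mean_model(train)}
    for j in range(len(rows[0][0])):
        candidates[f"linear[{j}]"] = build_linear_model(train, j)
    scored = sorted((rmse(model, test), name) for name, model in candidates.items())
    return scored[:n_best]


if __name__ == "__main__":
    # Invented data: the response depends only on descriptor 0.
    rows = [((float(i), float((i * 7) % 5)), 2.0 * i + 1.0) for i in range(12)]
    print(build_and_select(rows, n_best=3))
```

Because each candidate is built and tested independently, the builders can run in parallel, which is what makes this kind of workload a natural fit for many cloud workers.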

  21. Increasing amounts of data for model building...
      • CHEMBL: data on 622,824 compounds, collected from 33,956 publications
      • WOMBAT: data on 251,560 structures, for over 1,966 targets
      • WOMBAT-PK: data on 1,230 compounds, for over 13,000 clinical measurements
      • All contain structure information & numerical activity data
      → More models
      → Better models
      → Computationally expensive: 5 years for new datasets on existing server

  22. [Workflow diagram]
      Chemical Structures & their Activities
      → Separate Training & Test Data (Training Data, Test Data)
      → Calculate Descriptors from Structures (Descriptors + Responses)
      → Combine Descriptors (Combined Descriptors + Responses)
      → Filter Descriptors (Selected Descriptors + Responses)
      → Build & Test Models Independently (Multiple Linear Regression, Neural Network, Partial Least Squares, Classification Trees, .....)
      → Select Best Models
      → Add to Model Database

  23. Discovery Bus: Good Workload Patterns for Clouds (with acknowledgements to Dianne O’Brien)
      • Fast Growth: datasets / applications with rapidly growing popularity
      • Bursting: new data processed; new algorithm runs; event triggers computation (e.g. earthquake)
      • Discovery Bus triggers: New Model Builder, New Data
      [Figure: two plots of Compute against Time, one per pattern]

  24. Project JUNIOR • Aim: to use Azure & e-Science Central to generate models in weeks not years • make models available on the web, so that researchers can generate predictions for their own molecules

  25. [Diagram: Discovery Bus Planner as the App on the e-Science Central API; e-Science Central: Analysis Services, Security, Social Networking, Provenance, Workflow Enactment, Metadata; cloud infrastructure: Processing, Storage (Amazon AWS, Windows Azure)]

  26. [Diagram: workflow enactment]
      1. Discovery Bus invokes e-Science Central Workflow via API
      2. Workflow decomposed to Message Plan
      3. Temporary workflow storage assigned, Message Plan queued for execution
      4. Messages sent in sequence:
         • Call Message → Internal Service (RMI / JMS, NFS) → Response Message
         • Call Message → Azure Service (HTTP, HTTP Post) → Response Message
      5. Workflow Execution Completes: Discovery Bus notified with results; results data stored in e-Science Central folder
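
A rough sketch of the enactment pattern on slide 26: a workflow is decomposed into a message plan, call messages are sent to services in sequence, each service returns a response message, and intermediate data lives in temporary workflow storage. Everything named here (`CallMessage`, `decompose`, `execute`, the two stand-in services) is hypothetical. Real services would be reached over RMI / JMS or HTTP rather than called as local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CallMessage:
    """One entry in the message plan: which service to call next."""
    service: str


def decompose(workflow: List[str]) -> List[CallMessage]:
    """Turn an ordered list of service names into a message plan."""
    return [CallMessage(service=name) for name in workflow]


def calculate_descriptors(msg: CallMessage, storage: dict) -> str:
    # Stand-in for an internal service: writes its output to temporary storage.
    storage["descriptors"] = [len(s) for s in storage["structures"]]
    return f"{msg.service}: ok"


def build_model(msg: CallMessage, storage: dict) -> str:
    # Stand-in for a remote service: reads what the previous step wrote.
    values = storage["descriptors"]
    storage["model"] = sum(values) / len(values)
    return f"{msg.service}: ok"


SERVICES: Dict[str, Callable[[CallMessage, dict], str]] = {
    "calculate_descriptors": calculate_descriptors,
    "build_model": build_model,
}


def execute(plan: List[CallMessage], storage: dict) -> List[str]:
    """Send the call messages in sequence and collect the response messages."""
    return [SERVICES[msg.service](msg, storage) for msg in plan]


def run_workflow(workflow: List[str], structures: List[str]):
    storage = {"structures": structures}  # temporary workflow storage
    responses = execute(decompose(workflow), storage)
    return responses, storage             # results handed back to the caller


if __name__ == "__main__":
    print(run_workflow(["calculate_descriptors", "build_model"], ["CCO", "C"]))
```

Keeping the plan as plain data means the same sequence can be queued, logged for provenance, or dispatched to a different back end without changing the workflow definition.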

  27. [Diagram: e-Science Central; Azure: Web Node, Queue, Worker Node, Worker Node, Worker Node, Blob Storage, Results]
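
Slide 27 shows several worker nodes alongside a queue and storage. One common way to wire such components is a work queue: tasks are enqueued, each worker pulls the next task when it is free, and results are collected separately. The sketch below imitates that pattern locally with threads and in-memory queues. It is an assumption about how the pieces fit together, not the deployed Azure code, and `build_model` is a placeholder.

```python
import queue
import threading


def build_model(task):
    """Placeholder for the real model-building work."""
    return f"model-for-{task}"


def worker(tasks, results):
    """Pull tasks until a None sentinel arrives, pushing each result."""
    while True:
        task = tasks.get()
        if task is None:
            break
        results.put((task, build_model(task)))


def run(task_names, n_workers=3):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for name in task_names:
        tasks.put(name)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        name, model = results.get()
        collected[name] = model
    return collected


if __name__ == "__main__":
    print(run(["dataset-a", "dataset-b", "dataset-c", "dataset-d"]))
```

Because workers only share the queues, adding capacity is a matter of starting more workers.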

  28. Result
      • Successfully used Windows Azure to generate models quickly
        • 100 workers gave result in 3 weeks (not 5 years!)
      • 750K new models available (50x more than previously available)
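
The figures on slide 28 imply a rough speedup, worked through below. The calculation assumes 52 weeks per year and treats the 5-year estimate for the existing server as the baseline. It is a back-of-envelope reading of the slide, not a measured benchmark, and the efficiency figure only makes sense if one Azure worker is taken to be comparable to the existing server.

```python
WEEKS_PER_YEAR = 52  # assumption used to convert the 5-year estimate

baseline_weeks = 5 * WEEKS_PER_YEAR  # existing server: 5 years
cloud_weeks = 3                      # 100 Azure workers: 3 weeks
workers = 100

speedup = baseline_weeks / cloud_weeks
efficiency = speedup / workers  # fraction of ideal 100x scaling

print(f"speedup ~ {speedup:.0f}x, parallel efficiency ~ {efficiency:.0%}")
```

Under these assumptions the run is roughly 87 times faster, close to the ideal for 100 workers, which is consistent with model builders that run independently of one another.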

  29. Current e-Science Central Status • 40+ regular users (and growing) • 200K workflows enacted • exploring business models to provide sustainable science as a service : www.inkspotscience.com • In Venus-C • enhancing and moving workflow engine into Azure • exploring competitive workflow as a generic cloud pattern

  30. Summary
      • Cloud computing can revolutionise e-science
        • provide sustainable infrastructure
        • reduce time from idea to realisation
      • but clouds do NOT make it easier to build the complex, scalable, dependable systems that science needs
      • e-Science Central offers a Science Cloud Platform
        • reduces complexity of developing cloud applications
        • hides cloud entirely from end-users
      • demo available
