Directions in eScience Interoperability and Science Clouds

Directions in eScience Interoperability and Science Clouds. Geoffrey Fox, gcf@indiana.edu. Director, Digital Science Center, Pervasive Technology Institute; Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington. June 19 2012

Presentation Transcript


  1. Directions in eScience Interoperability and Science Clouds
     Geoffrey Fox, gcf@indiana.edu
     Director, Digital Science Center, Pervasive Technology Institute
     Associate Dean for Research and Graduate Studies, School of Informatics and Computing
     Indiana University Bloomington
     June 19 2012
     Interoperability in Action – Standards Implementation in VENUS-C & the context of the SIENA Roadmap
     OGF35 at HPDC 2012, Delft

  2. Successes in eScience I
     • Basic Supercomputer Architecture now being extended to Exascale
     • Grand Challenge activity 1990-2000 produced consensus
     • Basic OGF standards such as JSDL, BES, SAGA, GridFTP
     • Software as a Service
     • Use of Services
     • Use of Workflow
     • Use of Portals
     • Say “use of” as details not agreed

  3. More on Successes
     • Appliances/Roles in Clouds (see Venus-C later)
     • Images defined explicitly (by construction) or implicitly by content
     • Value-added Platforms such as MPI, parallel domain-specific Libraries, (Iterative) MapReduce, Queues, Tables and other NoSQL data models, Object Stores, HDFS/GFS-style file systems
     • PaaS delivered by tools/libraries/roles?
     • Other good, important general standards in security, OVF, accounting, networking

  4. What Platforms to use in Clouds
     • HDFS-style file system to collocate data and computing
     • Or Object Stores as basic scalable storage
     • Queues to manage multiple tasks
     • Tables to track job information
     • MapReduce and Iterative MapReduce for parallelism
     • Services for everything
     • Portals as User Interface
     • Appliances and Roles as customized images
     • Software environments/tools like Google App Engine, memcached
     • Workflow to link multiple services (functions)
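
The way three of these platform pieces fit together (a queue holding tasks, a table tracking job state, and MapReduce providing the parallel pattern) can be illustrated with a small, single-process sketch. It is not taken from the presentation: the names `run_jobs`, `map_phase`, `reduce_phase` and the word-count workload are assumptions chosen for illustration, and a real cloud deployment would use a hosted queue service, a table store and many workers instead of in-memory Python objects.

```python
from collections import defaultdict
from queue import Queue


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_jobs(documents):
    """Hypothetical driver: queue the tasks, track them in a table, then reduce."""
    task_queue = Queue()  # stands in for a cloud queue service (assumption)
    job_table = {}        # stands in for a cloud table store (assumption)

    for task_id, document in enumerate(documents):
        task_queue.put((task_id, document))
        job_table[task_id] = "queued"

    intermediate = []
    # A single loop plays the role of the worker pool in this sketch.
    while not task_queue.empty():
        task_id, document = task_queue.get()
        job_table[task_id] = "running"
        intermediate.extend(map_phase(document))
        job_table[task_id] = "done"

    return reduce_phase(intermediate), job_table


counts, table = run_jobs(["cloud grid cloud", "grid HPC"])
```

In this sketch the table is only a status record; in a multi-worker setting it would also be the place to detect tasks that stay "running" for too long and put them back on the queue.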

  5. What to use in Grids and Supercomputers?
     • Portals, Services and Workflow as in clouds
     • MPI and GPU/multicore threaded parallelism
     • Wonderful libraries supporting parallel linear algebra, particle evolution, partial differential equation solution
     • Parallel I/O for high performance in an application
     • Wide-area File System (e.g. Lustre) supporting file sharing
     • This is a rather different style of PaaS from clouds – should we unify them?

  6. Comments
     • No agreement on problem to solve, e.g. what is architecture for data-intensive problems, role of clouds(!)
     • Certainly no agreement on even style of workflow
     • Services can be WSDL or REST
     • Confusion as to architecture level being standardized
     • User or developer?
     • e.g. clouds may be built on federated infrastructure; that must be hidden from user

  7. Some Standards Futures
     • In general look for a few key SIMPLE concepts
     • From past, SQL and MPI standardization very successful – suggesting that Cloud PaaS standards should be looked at
     • MapReduce
     • NoSQL data models
     • Needs to be done at right time
     • De facto standard “Hadoop” versus “real” standard
     • What “roles” are important: Worker, Web, Grid, Worker + I/O, MPI, MapReduce, GPU – need a study?
     • Roles v. Libraries v. Standard Interfaces
     • GPU related standards: OpenACC extends OpenMP

  8. Using Science Clouds in a Nutshell
     • High Throughput Computing; pleasingly parallel; grid applications
     • Multiple users (long tail of science) and usages (parameter searches)
     • Internet of Things (Sensor nets) as in cloud support of smart phones
     • (Iterative) MapReduce including “most” data analysis
     • Exploiting elasticity and platforms (HDFS, Object Stores, Queues ..)
     • Use worker roles, services, portals (gateways) and workflow
     • Good Strategies:
       • Build the application as a service;
       • Build on existing cloud deployments/roles such as Hadoop;
       • Use PaaS if possible; (This is not clearly eScience strategy – uses IaaS?)
       • Design for failure; (Not much work on what this means. Are there tools?)
       • Use as a Service (e.g. SQLaaS) where possible; (WHAT should be Provided)
       • Address Challenge of Moving Data (Need Production large scale Science Cloud)
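
The slide notes that "design for failure" has not been spelled out. One common, narrow reading is that every remote call or task is assumed to fail occasionally and is wrapped in a bounded retry with backoff. The sketch below shows only that reading; `run_with_retries` and `make_flaky_task` are hypothetical helpers invented for this example, not tools named in the presentation, and the simulated `ConnectionError` stands in for a lost node or dropped request.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call task() until it succeeds, waiting longer after each failure.

    Raises RuntimeError once max_attempts calls have all failed.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_task(failures_before_success):
    """Build a hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("simulated node failure")
        return state["calls"]

    return task


# Two simulated failures, then success on the third call.
result = run_with_retries(make_flaky_task(2), sleep=lambda seconds: None)
```

Retrying is only safe when the task is idempotent, which is one reason the "build the application as a service" strategy above pairs naturally with designing for failure.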

  9. Cosmic Comments
     • Does Cloud + MPI Engine cover the future?
     • Will current High throughput computing and cloud concepts merge?
     • Need Data analytics libraries for HPC and Clouds
     • Does a “modest-size private science cloud” make sense?
     • Too small to be elastic
     • Should governments fund use of commercial clouds (or build their own)?
     • Most science doesn’t have privacy issues motivating some private clouds
     • Most interest in clouds from “new” applications such as life sciences
     • Recent cloud infrastructure (Eucalyptus 3, OpenStack Essex) much improved
     • More employment opportunities in clouds than HPC and Grids; so cloud related activities popular with students