Environments for eScience on Distributed Infrastructures

Environments for eScienceon Distributed Infrastructures Marian Bubak Department of Computer Science and Cyfronet AGH University of Science and Technology Krakow, Poland http://dice.cyfronet.pl Informatics Institute, System and Network Engineering University of Amsterdam www.science.uva.nl/~gvlam/wsvlam/

Coauthors • BartoszBalis • Tomasz Bartynski • ErykCiepiela • WlodekFunika • Tomasz Gubala • Daniel Harezlak • Marek Kasztelnik • MaciejMalawski • Jan Meizner • Piotr Nowakowski • KatarzynaRycerz • Bartosz Wilk • Adam Belloum • Mikolaj Baranowski • Reggie Cushing • Spiros Koulouzis • Michael Gerhards • Jakub Moscicki www.science.uva.nl/~gvlam/wsvlam dice.cyfronet.pl

Motivation and main goal • Recent trends • Enhanced scientific discovery is becoming collaborative and analysis focused; in-silico experiments are more and more complex • Available compute and data resources are distributed and heterogeneous • Main goal • Optimal usage of distributed resources (e-infrastructures, ubiquitous) for complex collaborative scientific applications

Collaborative eScience experiments • (2)Experiment • Prototyping: • Design experiment workflows • Develop necessary components • (4) Results • Publication: • Annotate data • Publish data • Problem • investigation: • Look for relevant problems • Browse available tools • Define the goal • Decompose into steps Shared repositories • (3) Experiment • Execution: • Execute experiment processes • Control the execution • Collect and analysis data A. Belloum, M.A. Inda, D. Vasunin, V. Korkhov, Z. Zhao, H. Rauwerda, T. M. Breit, M. Bubak, L.O. Hertzberger: Collaborative e-Science Experiments and Scientific Workflows, Internet Computing, July/August 2011 (Vol. 15, No. 4),pp. 39-47

Applications Stream oriented applications Data parallel application Parameter sweep applications Infrastructure Desktops Clusters Grids Clouds Storage Federated Cloud Storage Hbase Scaling Automatic Task farming for grid jobs and web services MapReduce Provenance Open Provenance model Xml history Tracing System under research Repository Provenance workflow Cloud Cloud www.science.uva.nl/~gvlam/wsvlam/

Research objectives Investigating applicability of distributed computing infrastructures (DCI; clusters, grids, clouds) for complex scientific applications Optimization of resource allocation for applications on DCI Resource management for services on heterogeneous resources Urgent computing scenarios on distributed infrastructures Billing and accounting models Procedural and technical aspects of ensuring efficient yet secure data storage, transfer and processing Methods for component dependency management, composition and deployment Information representation model for DCI federation platforms, their components and operating procedures

Spatial and temporal dynamics in grids • Grids increase research capabilities for science • Large-scale federation of computing and storage resources • 300 sites, 60 countries, 200 Virtual Organizations • 10^5 CPUs, 20 PB data storage, 10^5 jobs daily • However operational and runtime dynamics have a negative impact on reliability and efficiency ~95% 1 job asynchronous and frequent failures and hardware/software upgrades 100 jobs <10% long and unpredictable job waiting times 3 hours seconds J.T.Moscicki:Understanding and mastering dynamics in Computing Grids,UvA PhD thesis, promoter: M. Bubak, co-promoter: P. Sloot; 12.04.2011

User-level overlay with late binding scheduling • Improved job execution characteristics • HTC-HPC Interoperability • Heuristic resource selection • Application awaretaskscheduling 1.5 hours 40 hours Completion time with late binding. Completion time with early binding. J.T.Moscicki, M.Lamanna, M.Bubak, P.M.A.Sloot: Processing moldable tasks on the Grid: late job binding with lightweight user-level overlay, FGCS 27(6) pp 725-736, 2011

Cloud performance evaluation • Performance of VM deployment times • Virtualization overhead Evaluation of open source cloud stacks (Eucalyptus, OpenNebula, OpenStack) • Survey of European public cloud providers • Performance evaluation of top cloud providers (EC2, RackSpace, SoftLayer) • A grant from Amazon has been obtained M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski and S. Varma:Evaluation of Cloud Providers for VPH Applications, posterat CCGrid2013 - 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013

Resource allocationmanagement The Atmosphere Cloud Platform is a one-stop management service for hybridcloud resources, ensuringoptimaldeployment of application services on the underlying hardware. Admin External application VPH-Share Master Int. OpenStack/Nova Computational Cloud Site VPH-ShareCore Services Host Amazon EC2 Other CS Atmosphere Management Service (AMS) CloudFacade (secureRESTful API ) Developer Scientist Cloud Manager AtmosphereInternal Registry (AIR) Cloud stackplugins(Fog) Development Mode Generic Invoker Workflow management WorkerNode Worker Node Worker Node Worker Node Worker Node Worker Node Worker Node Worker Node Head Node Cloud Facade client CustomizedapplicationsmaydirectlyinterfaceAtmosphere via itsRESTfulAPI calledthe Cloud Facade Image store (Glance) P. Nowakowski, T. Bartynski, T. Gubala, D. Harezlak, M. Kasztelnik, M. Malawski, J. Meizner, M. Bubak: Cloud Platform for Medical Applications, eScience 2012 (2012)

Costoptimization of applications on clouds • Infrastructure model • Multiple compute and storage clouds • Heterogeneous instance types • Application model • Bag of tasks • Leyered workflows • Modeling with AMPL (A Modeling Language for Mathematical Programming) • Cost optimization underdeadline constraints • Mixed integer programming • Bonmin, Cplex solvers M. Malawski, K.Figiela, J.Nabrzyski:Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, Volume 29, Issue 7, September 2013, Pages 1786-1794, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2013.01.004

Workflow management systems in eScience “are key technology to integratecomputing and data analysis components, and to controlthe execution and logical sequences among them. By hidingthe complexity in an underlying infrastructure, SWMSs allow scientists to design complex scientific experiments, access geographically distributed data files, and execute the experiments using computing resources at multiple organizations.“ Report of the NSF/Mellon Workshop on Scientific and Scholarly Workflow. Oct 3-5, 2007, Baltimore, MD

Auto-scaling workflows • Automatic scaling of workflow components based • Resource load • Application load • provenance data • Scaling across various infrastructures • desktop • Grids • Clouds R. Cushing, S.Koulouzis, A. S. Z. Belloum, M.Bubak:Dynamic Handling for Cooperating Scientific Web Services, 7th IEEE International Conference on e-Science, December 2011, Stockholm, Sweden

Auto-scaling workflows Service Load Running Service instances R. Cushing, S.Koulouzis, A. S. Z. Belloum, M.Bubak:Dynamic Handling for Cooperating Scientific Web Services, 7th IEEE International Conference on e-Science, December 2011, Stockholm, Sweden

Auto-scaling workflows R. Cushing, S.Koulouzis, A. S. Z. Belloum, M.Bubak:Prediction-based Auto-scaling of Scientific Workflows, Proceedings of the 9th International Workshop on Middleware for Grids, Clouds and e-Science, ACM/IFIP/USENIX December 12th, 2011, Lisbon, Portugal

Workflow as a Service • Once a workflow is initiated on the resources it stays alive and process data/jobs continuously • Reduce the scheduling overhead R. Cushing, Adam S. Z. Belloum, V. Korkhov, D. Vasyunin, M.T. Bubak, C. Leguy:Workflow as a Service: An Approach to Workflow Farming, ECMLS’12, June 18, 2012, Delft, The Netherlands

Provenance in Practice: Blast Application [Department of Clinical Epidemiology, Biostatistics and Bioinformatics (KEBB), AMC ] The aim of the application is the alignment of DNA sequence data with a given reference database. A workflow approach is used to run this application on distributed computing resources. • For Each workflow run • The provenance data is collected an stored following the XML-tracing system • User interface allows to reproduce events that occurred at runtime (replay mode) • User Interface can be customized (User can select the events to track) • User Interface show resource usage on-going work UvA-AMC-fh-aachen

Semanticworkflowcomposition • GworkflowDL language (with A. Hoheisel) • Dynamic, ad-hoc refinement of workflows based on semantic description in ontologies • Novelty • Abstract, functional blocks translated automatically into computation unit candidates (services) • Expansion of a single block into a subworkflow with proper concurrency and parallelism constructs (based on Petri Nets) • Runtime refinement: unknown or failed branches are re-constructed with different computation unit candidates T. Gubala, D. Harezlak, M. Bubak, M. Malawski:Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism.In:"The 2nd IEEE International Conference on e-Science and Grid Computing", IEEE Computer Society Press, http://doi.ieeecomputersociety.org/10.1109/E-SCIENCE.2006.127, 2006

Semantic integration for science domains • Concept of describing scientific domains for in-silico experimentation and collaboration within laboratories • Based on separation of the domain model, containing concepts of the subject of experimentation from the integration model, regarding the method of (virtual) experimentation (tools, processes, computations) • Facets defined in integration model are automatically mixed-in concepts from domain model: any piece of data may show any desired behavior • Proposed, designed and deployed themethod for 3 domains of science: • Computational chemistry inside InSilicoLab chemistry portal • Sensor processing for early warning and crisis simulation in UrbanFlood EWS • Processing of results of massive bioinformatic computations for protein folding method comparison • Composition and execution of multiscale simulations • Setup and management of VPH applications T. Gubala, K. Prymula, P. Nowakowski, M. Bubak: Semantic Integration for Model-based Life Science Applications. In: SIMULTECH 2013 Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications, Reykjavik, Iceland 29 - 31 July, 2013, pp. 74-81

Cooperative virtual laboratory for e-Science • Design of a laboratory for virologists, epidemiologists and clinicians investigating the HIV virus and the possibilities of treating HIV-positive patients • Based on notion of in-silico experiments built and refined by cooperating teams of programmers, scientists and clinicians • Novelty • Employed full concept-prototype-refinement-production circle for virology tools • Set of dedicated yet interoperable tools bind together programmers and scientists for a single task • Support for system-level science with concept of result reuse between different experiments T. Gubala, M. Bubak, P.M.A. Sloot:Semantic Integration of Collaborative Research Environments, chapter XXVI in “Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare”, Information Science Reference IGI Global 2009, ISBN: 978-1-60566-374-6, pages 514-530

GridSpace - platform for e-Science applications • Experiment: an e-science application composed of code fragments (snippets), expressed in either general-purpose scripting programming languages, domain-specific languages or purpose-specific notations. Each snippet is evaluated by a corresponding interpreter. • GridSpace2 Experiment Workbench: a web application - an entry point to GridSpace2. It facilitates exploratory development, execution and management of e-science experiments. • Embedded Experiment: a published experiment embedded in a web site. • GridSpace2 Core: a Java library providing an API for development, storage, management and execution of experiments. Records all available interpreters and their installations on the underlying computational resources. • Computational Resources: servers, clusters, grids, clouds and e-infrastructures where the experiments are computed. E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: ExploratoryProgramming in the Virtual Laboratory. In: Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628, October 2010, thebestpaper award.

Goal: Extending the traditionalscientificpublishing model with computationalaccess and interactivitymechanisms; enablingreaders (includingreviewers) to replicate and verifyexperimentationresults and browselarge-scaleresultspaces. Collage - executable e-Science publications Challenges: Scientific: A commondescriptionschema for primary data (experimental data, algorithms, software, workflows, scripts) as part of publications; deploymentmechanisms for on-demandreenactment of experiments in e-Science. Technological: Anintegratedarchitecture for storing, annotating, publishing, referencing and reusingprimary data sources. Organizational: Provisioning of executablepaper services to a largecommunity of usersrepresentingvariousbranches of computational science; fosteringfurtheruptakethroughinvolvement of major players in the field of scientificpublishing. P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Proceedings of the International Conference on Computational Science, ICCS 2011 (2011), Winner of the Elseview/ICCS Executable Paper Grand Challenge E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Servicein ProcediaComputer Science, vol. 18, 2013

GridSpace2 / Collage - Executable e-Science Publications Jun 2012 Dec 2011 Jun 2011 23 • Goal: Extend the traditionalway of authoring and publishingscientificmethods with computationalaccess and interactivitymechanismsthus bringing reproducibility to scientific computationalworkflows and publications • Scientific challenge: Conceive a model and methodology to embracereproducibility in scientificworflows and publications • Technological challenge: supportthese by modern Internet technologies and availablecomputinginfrastructures • Solution proposed: • GridSpace2 – web-orienteddistributedcomputing platform • Collage – authoring environment for executablepublications

GridSpace2 / Collage - Executable e-Science Publications • Results: • GridSpace2/Collage won Executable Paper Grand Challenge in 2011 • Collage was integrated with Elsevier ScienceDirect portal so papers can be linked and presented with corresponding computational experiments • Special Issue of Computers & Graphics journal featuring Collage-basedexecutable papers was released in May 2013 • GridSpace2/Collage hasbeen applied to multiplecomputationalworkflows in the scope of PL-Grid, PL-Grid Plus and Mapperprojects E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service.In: ProcediaComputer Science, vol. 18, 2013 E. Ciepiela, P. Nowakowski, J. Kocot, D. Harężlak, T. Gubała, J. Meizner, M. Kasztelnik, T. Bartyński, M. Malawski, M. Bubak: Managing entire lifecycles of e-science applications in the GridSpace2 virtual laboratory–from motivation through idea to operable web-accessible environment built on top of PL-grid e-infrastructure. In: Building a National Distributed e-Infrastructure–PL-Grid, 2012 P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: ProcediaComputer Science, vol. 4, 2011

Cookery – framework for buildingDSLs • Workflowsbased on graphrepresentationsarewidelyused to developscientificapplications. Howevertheyencountercertainissues, theyare not easy to share, to trackchagnes and to performtests. • Applications developedusinggeneral-purposeprogramminglangaugesdon’tmeettheseissues – a widerange of toolsweredeveloped for software development for codesharing and trackingchanges (version controll, codereviews). • We propose a solutionbased on Rubyprogramminglanguagethatcombinesadvanteges from twoworlds, itis not morecomplex for the end-userthansolutionsbased on graphicalrepresentations and itenables the widerange of tools for software development • Applications can be written in DSL thatisclose to English: Read file /tmp/test_data.gzip. Count words. Print result.

Transformingscripts intoworkflows • Scientific workflowsareconsidered to be a convinient high-levelalternative to solutionsbased on programminglanguages • We investigateGridSpacecollaborative and execution environment based on Rubylanguagethatenablesacces to GridinfrastructureusingAPIs • We describehow to addressissues of analysingRubysorucecode to buildworkflowrepresentations a = GObj.create b = a.async_do_sth("") c = b.get_result d = a.async_do_sth(c) e = d.get_result M. Baranowski, A. Belloum, M. Bubak and M. Malawski: Constructing workflows from script applications,Scientific Programming, 2012, doi:10.3233/SPR-120358

HyperFlow: model & execution engine HyperFlow model JSON serialization { "name":"...", name of the app "processes":[...], processes of the app "functions":[...],functions used by processes "signals":[...], exchanged signals info "ins":[...],inputs of the app "outs":[...]outputs of the app } • Supports a rich set of workflow patterns • Suitable for various application classes • Abstracts from other distributed app aspects (service model, data exchange model, communication protocols, etc.) Simple yet expressive model for complex scientific apps App = set of processes performing well-defined functions and exchanging signals

Scalable data access • Storage federation • In service orchestration, all data is passed to the workflow engine • Data transfers are made through SOAP, which is unfit for large data transfers S.Koulouzis, R.Cushing, K.Karasavvas, A. Belloum, M.Bubak: Enabling web services to consume and produce large distributed datasets, to be published JAN/FEB, IEEE Internet Computing, 2012

Data reliability and integrity DRI is a toolwhichcankeepstrack of binary data storedina cloudinfrastructure, monitor data availability and faciliateoptimaldeployment of application services in a hybridcloud (bringingcomputations to data ortheotherwayaround). LOBCDER DRI Service Metadata extensions for DRI A standaloneapplication service, capable of autonomousoperation. It periodicallyverifiesaccess to anydatasetssubmitted for validation and iscapable of issuingalerts to datasetowners and system administrators in case of irregularities. Validation policy Register files Get metadata Migrate LOBs Get usage stats (etc.) Configurable validation runtime (registry-driven) Runtime layer Extensible resource client layer End-user features (browsing, querying, direct access to data, checksumming) Binary data registry Store and marshal data VPH Master Int. OpenStack Swift Cumulus Amazon S3 Data management portlet (with DRI management extensions) Distributed Cloudstorage

Data security in clouds • Data should be secure stored and realiable deleted when no longer needed • Clouds not secureenough, data optimisationspreventingensuringthat data weredeleted • A solution: • end-to-end encryption (decryption key stays in protected/private zone) • data dispersal (portion of data, dispersed between nodes so it’s non-trivial/impossible to recover whole message) Jan Meizner, Marian Bubak, Maciej Malawski, and Piotr Nowakowski: Secure storage and processing of confidential data onpublic clouds.In: Proceedings of the International Conference On Parallel Processing and Applied Mathematics(PPAM) 2013, Springer LNCS • To ensuresecurity of data in transit • Modern applicationsusesecuretranportprotocols (e.g.TLS) • For legacyunencryptedprotocolsifabsolutlyneeded, or as additionalsecuritymeasure: • Site-to-Site VPN, e.g. between cloud sites is outside of the instance, might use • Remote access – for individual users accessing e.g. from their laptops

Colaborativemetadatamanagement Objectives • Provide means for ad-hoc metadata model creation and deployment of corresponding storage facilities • Create a research space for metadata model exchange and discovery with associated data repositories with access restrictions in place • Support different types of storage sites and data transfer protocols • Support the exploratory paradigm by making the models evolve together with data Architecture • Web Interface is used by users to create, extend and discover metadata models • Model repositories are deployed in the PaaS Cloud layer for scalable and reliable access from computing nodes through REST interfaces • Data items from Storage Sites are linked from the model repositories

MapReduce specificlanguage • We provide a domainspecificlanguagefor definingMapReduceoperations • It allowes to executeoncespecifiedqueries on manyMapReduceengines • Applications canswitch data sourceseasier • Applications canhaveseparatedenvironmenatsfor differentstages of development (development, testing, production) – morerobustcode

Separation of concerns • Scientific applications are constructed from 3 types of components • We strictly define their concerns • Tasks is the place where we define computations • Resource is where we define used resources • In Mapping we join resources with • We limit interactions by defining relations • Tasks use constructs determined by Resource (e.g. MapReduce constructs • Mapping maps corresponding Tasks to Resources

Towards ecosystem of data and processes Is it possible to create an ecosystem where scientific dataand processescan be linked through semanticsand used as alternative to the current manual composition of eScience applications? • How to implement adaptive scheduling needed for workflow enactment across multiple domains? • How to achieve QoS for data centric application workflows that have special requirements on network connections? • How to achieve robustness and fault tolerance for workflow running across distributed resources? • How to increase re-usability of workflows, workflow components, and refine workflow execution?

WorkflowlesseScience

Self-organizing linked process ecosystem A Networked Open Processes. built from an RDF store describing SADI services. • Vertexesare operations described in BioMoby Semantics. • Edgesshow a semantic match between output and input

Computing on browsers R. Cushing, G.a Putra, S. Koulouzis, A.S.Z Belloum, M.T. Bubak, C. de Laat: Distributed computing on an Ensemble of Browsers, IEEE Internet Computing, PrePress 10.1109/MIC.2013.3, January 2013

Automata-based dynamic data processing • Data processing schema can be considered as a state transformation graph • The graph facilitates data processing in many ways • Data state can be easily tracked • Using the graph as a protocol header, a virtual data processing network layer is achieved • Data becomes self routable to processing nodes • Collaboration can be achieved by joining the virtual network State Graph describing a filtering state machine for tweets which is mapped to 11 VMs R.Cushing, A.Belloum, M.Bubaket al.: Automata-based Dynamic Data Processing for Clouds, BigDataClouds 2014

Building scientific software based on Feature Model Research on Feature Modeling: • modelling eScience applications family component hierarchy • modelling requirements • methods of mapping Feature Models to Software Product Line architectures Research on adapting Software Product Line principles in scientific software projects: • automatic composition of distributed eScienceapplications based on Feature Model configuration • architectural design of Software Product Line engine framework B. Wilk, M. Bubak, M. Kasztelnik: Software for eScience: from feature modeling to automatic setup of environments, Advances in Software Development, Scientific Papers of the Polish Informations Processing, Society Scientific Council, 2013 pp. 83-96

Common Information Space (CIS) • Facilitatecreation, deployment and robustoperation of EarlyWarning Systems in virtualizedcloud environment • EarlyWarning System (EWS): any system workingaccording to foursteps: monitoring, analysis, judgment, action (e.g. environmental monitoring) • Common Information Space • connectsdistributed component into EWS and deployitoncloud • optimizesresourceusagetakingintoacount EWS importancelevel • provides EWS and selfmonitoring • equipped with autohealing B. Balis, M. Kasztelnik, M. Bubak, T. Bartynski, T. Gubala, P. Nowakowski, J. Broekhuijsen: The UrbanFlood Common Information Space for Early Warning Systems. In: Elsevier Procedia Computer Science, vol 4, pp 96-105, ICCS 2011.

Multiscaleprogrammingand executiontools • A method and an environment for composing multiscaleapplications from single-scale models • Validation of the themethodagainst real applicationsstructuredusingtools • Extension of applicationcompositiontechniquesto multiscalesimulations • Support for multisite execution of multiscalesimulations • Proof-of-concepttransformation of high-levelformaldescriptionsintoactualexecutionusing e-infrastructures MaMe MAD GridSpace K. Rycerz, E. Ciepiela, G. Dyk, D. Groen, T. Gubala, D. Harezlak, M. Pawlik, J. Suter, S. Zasada, P. Coveney, M. Bubak: Support for Multiscale Simulations with Molecular Dynamics, ProcediaComputer Science, Volume 18, 2013, pp. 1116-1125, ISSN 1877-0509 K. Rycerz, M. Bubak, E. Ciepiela, D. Harezlak, T. Gubala, J. Meizner, M. Pawlik, B.Wilk: Composing, Execution and Sharing of Multiscale Applications, submitted to FutureGenerationComputer Systems, after 1st review (2013) K. Rycerz, M.Bubak, E.Ciepiela, M. Pawlik, O. Hoenen, D. Harezlak, B. Wilk, T. Gubala, J. Meizner, and D. Coster: Enabling Multiscale Fusion Simulations onDistributed Computing Resources, submitted to PLGrid PLUS book 2014 • MAPPER Memory (MaMe) a semantics-aware persistence store to record metadata about models and scales • Multiscale Application Designer (MAD) visual composition tool transforming high level description into executable experiment • GridSpace Experiment Workbench (GridSpace) execution and result management of experiments

PL-Grid Project Results • First working NGI in Europe in the framework of EGI.eu (since March 31, 2010) • Number of users (March 2012): 900+ • Number of jobs per month: 750,000 - 1,500,000 • Resourcesavailable: • Computing power: ca.230 TFlops • Storage:ca. 3600 TBytes • High level of availiability and realibility of the resources • Facilitating effective use of these resources by providing: • innovative grid services and end-user toolslikeEfficient Resource Allocation, Experimental Workbenchand GridMiddleware • Scientific Software Packages • User support: helpdesk system, broadtrainingoffer • Various, well-performeddissemination activities, carried out at national and internationallevels, whichcontributed significantly to increasing of awareness and knowledge about the Project and the grid technology in Poland.

PLGrid Plus Project Results • New domain-specific services for 13 identified scientific domains • Extension of the resources available in the PL-Grid Infrastructureby ca. 500 TFlops of computing power and ca. 4.4 PBytes of storage capacity • Design and start-up of support for new domain grids • Deployment of Quality of Service system for users • by introducing SLA agreement • Deployment of new infrastructure services • Deployment of Cloud infrastructure for users • Broadconsultancy, training and disseminationoffer

Summary • Modelling of complex collaborative scientific applications • domain-oriented semantic descriptions of modules, patterns, and data to automate composition of applications • Studying the dynamics of distributed resources • investigating temporal characteristics, dynamics, and performance variations to run applications with a given quality • Modelling and designing a software layer to access and orchestrate distributed resources • mechanisms for aggregating multi-format/multi-source data into a single coherent schema • semantic integration of compute/data resources • data aware mechanisms for resource orchestration • enabling reusability based on provenance data

Topics for collaboration • Optimization of service deployment on clouds • Constraint satisfaction and optimization of multiple criteria (cost, performance) • Static deployment planning and dynamic auto-scaling • Billing and accounting model • Adapted for the federatedcloud infrastructure • Handle multiple billing models • Supporting system-level (e)Science • tools for effective scientific research and collaboration • advanced scientific analyses using HPC/HTC resources • Cloud security • security of data transfer • reliable storage and removal of the data • Cross-cloud service deployment based on container model dice.cyfronet.pl www.science.uva.nl/~gvlam/wsvlam

Environments for eScience on Distributed Infrastructures