Innovations in Data Management for Cloud Workflow Systems
This work by Dong Yuan from Swinburne University explores the intricate interplay between cloud computing and workflow systems, emphasizing data management's essential role. The outline covers an introduction to cloud workflow systems, challenges in data management, and the impact of emerging technologies like virtualisation and SOA. Key topics include data placement, replication strategies, and intermediate data storage, aiming to optimize system costs and improve data accessibility. This research presents a comprehensive overview of current features and future directions within cloud workflow management.
Presentation Transcript
Data Management in Cloud Workflow Systems Dong Yuan Faculty of Information and Communication Technology Swinburne University of Technology
Outline • Cloud Computing & Cloud Workflow Systems • Introduction to cloud workflow systems. A brief overview of grid workflow systems. • Data Management in Cloud Workflow Systems • New features and research issues • Cloud Computing Environment and SwinDeW-C • Our simulation environment and cloud workflow system
Cloud Computing • Some new features of cloud computing • Large data centres with cheap hardware • Virtualisation • Internet based and SOA • SaaS, PaaS, IaaS • Market driven and cost model • Research on cloud computing has emerged in many areas • Data mining, Database, Parallel computing & Scientific application, Content delivery
Cloud Workflow Systems • Grid workflow systems • Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON • Gridbus, GridFlow • Build-time: focus on data modelling. • Kepler: actor-oriented data modelling. Taverna - Scufl. ASKALON - AGWL • Runtime: adopt Data Grid systems • Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS), GSB, DaltOn
Cloud Workflow Systems • Architecture • Based on Internet • Platform as a Service • More distributed
Data Management in Cloud Workflow Systems • New features and challenges • Independent of users and automatic • Cost driven • computation cost, storage cost, data transfer cost • Data dependency • Task – data, data – data, derivation • Some research issues • Data partition, placement, replication, synchronisation, provenance, catalogue, meta-data, consistency, reduction, storage, movement, etc.
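The three cost components named on this slide can be combined into a single figure. The following is a minimal sketch only: the `Rates` class, its field names, and the unit prices in the usage note are hypothetical and do not come from the presentation or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices (assumption, not from the slides)."""
    cpu_per_hour: float          # price of one CPU hour
    storage_per_gb_month: float  # price of keeping 1 GB for one month
    transfer_per_gb: float       # price of moving 1 GB between data centres


def total_cost(rates, cpu_hours, gb_months, gb_transferred):
    """Sum of computation, storage, and data transfer cost."""
    computation = rates.cpu_per_hour * cpu_hours
    storage = rates.storage_per_gb_month * gb_months
    transfer = rates.transfer_per_gb * gb_transferred
    return computation + storage + transfer
```

For example, `total_cost(Rates(1.0, 2.0, 3.0), 1, 1, 1)` adds one unit of each component. A cost-driven strategy would compare such totals across alternative storage and placement decisions.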
Data Placement in Cloud Workflow Systems • Data Placement: to decide where to store the application data in the distributed data centres • Aims: • Reduce data movement • Reduce task waiting time • Strategy: • Data dependency: dataset – dataset • Build-time: existing data, runtime: generated data (also intermediate data)
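One way to read the dataset – dataset dependency idea is to count how many tasks use both datasets and to co-locate datasets that share many tasks. The sketch below is a simple greedy heuristic written for illustration; it is not the placement algorithm of the presentation, and the names `dependency`, `place`, `tasks_using`, `sizes`, and `capacity` are assumptions.

```python
def dependency(tasks_using, a, b):
    """Number of tasks that use both dataset a and dataset b."""
    return len(tasks_using[a] & tasks_using[b])


def place(tasks_using, sizes, capacity):
    """Greedily assign each dataset to a data centre.

    tasks_using: dataset -> set of task ids that read it
    sizes:       dataset -> size (same unit as capacity)
    capacity:    data centre -> free space
    A dataset goes to the centre, among those with enough free space,
    where its total dependency with already placed datasets is highest.
    """
    placement = {}
    free = dict(capacity)
    # Largest datasets first (ties broken by name) so they still fit.
    for ds in sorted(sizes, key=lambda d: (-sizes[d], d)):
        candidates = [c for c in free if free[c] >= sizes[ds]]
        if not candidates:
            raise ValueError(f"no data centre can hold {ds}")

        def score(centre):
            return sum(
                dependency(tasks_using, ds, other)
                for other, where in placement.items()
                if where == centre
            )

        # Prefer high dependency, then the centre with more free space.
        best = max(candidates, key=lambda c: (score(c), free[c]))
        placement[ds] = best
        free[best] -= sizes[ds]
    return placement
```

Datasets used by the same tasks end up in the same centre when space allows, which reduces data movement when those tasks run. Datasets generated at runtime could be placed by calling the same scoring step for each new dataset.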
Data Replication in Cloud Workflow Systems • Data replication: for one dataset, store several copies in different places (data centres) • Aims: • Increase data security • Fast data access • Reduce data movement • Strategy: • Dynamic replication.
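Dynamic replication can be illustrated with an access-count trigger: once a data centre has read a dataset remotely often enough, it receives its own copy. This is a minimal sketch under that assumption; the slide does not specify the trigger, and `ReplicaManager`, `threshold`, and the method names are hypothetical.

```python
from collections import defaultdict


class ReplicaManager:
    """Creates a replica after `threshold` remote reads from one centre."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.replicas = defaultdict(set)      # dataset -> centres with a copy
        self.remote_reads = defaultdict(int)  # (dataset, centre) -> count

    def store(self, dataset, centre):
        """Register an initial copy of the dataset at a data centre."""
        self.replicas[dataset].add(centre)

    def read(self, dataset, centre):
        """Return 'local' or 'remote' for this read.

        A remote read is counted; when the count reaches the threshold,
        a replica is added so that later reads from this centre are local.
        """
        holders = self.replicas.get(dataset)
        if not holders:
            raise KeyError(f"unknown dataset: {dataset}")
        if centre in holders:
            return "local"
        self.remote_reads[(dataset, centre)] += 1
        if self.remote_reads[(dataset, centre)] >= self.threshold:
            holders.add(centre)
        return "remote"
```

With `threshold=2`, the first two reads from a centre without a copy are remote and the third is local. A fuller design would also retire rarely used replicas, since each copy adds storage cost.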
Intermediate Data Storage in Cloud Workflow Systems • Intermediate data storage is especially important in scientific workflows • Aim: • Reduce system cost • Strategy: • Intermediate data can be regenerated with data provenance information • Selectively store some key intermediate datasets
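The store-or-regenerate trade-off can be sketched for a linear chain of intermediate datasets, where provenance says each dataset is computed from the one before it. The rule below (store a dataset when keeping it is cheaper than regenerating it at its expected usage rate) is a simplified illustration, not the strategy of the presentation; the `Dataset` fields and `choose_stored` are assumed names.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    gen_cost: float      # cost to compute it from its direct predecessor
    storage_rate: float  # cost per day of keeping it stored
    uses_per_day: float  # expected number of uses per day


def choose_stored(chain):
    """Pick which datasets in a linear provenance chain to keep.

    Regenerating a dataset also requires regenerating every deleted
    predecessor back to the nearest stored one, so that upstream cost
    is carried along the chain.
    """
    stored = []
    upstream = 0.0  # cost to rebuild the current run of deleted predecessors
    for ds in chain:
        regen = upstream + ds.gen_cost
        if ds.storage_rate < regen * ds.uses_per_day:
            stored.append(ds.name)  # cheaper to keep than to recompute
            upstream = 0.0
        else:
            upstream = regen        # deleted: successors inherit its cost
    return stored
```

An expensive, frequently used dataset is kept, while a cheap, rarely used one is deleted and rebuilt on demand. Note that deleting a dataset raises the regeneration cost of its successors, which is why a later dataset may be stored even if its own generation cost is low.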
End • Questions? Thanks!