
Workflows and Scheduling in Grids





Presentation Transcript


  1. Workflows and Scheduling in Grids
  Ramin Yahyapour, University Dortmund
  Leader, CoreGRID Institute on Resource Management and Scheduling
  CoreGRID – Summer School, Budapest, 05 September 2007

  2. CoreGRID RMS Institute Objective
  • Objectives:
  • Development of a common and generic solution for Grid resource management/scheduling in Next Generation Grids
  • Development of new algorithms for coordinated scheduling for all resource types, including data, network, etc.
  • Support of Grid business models in the scheduling process
  • Goal: linking theoretical foundation (architecture, algorithms) and practical implementation on the different levels of resource management
  European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies

  3. Current Institute Roadmap
  • Common scheduling/brokerage architecture model
  • Support for SLA management and negotiation
  • Algorithms for coordinated scheduling/negotiation
  • Solutions for evaluation, testing, prediction
  • Domain-specific solutions for Computational Grids

  4. Participants
  • CETIC, Belgium
  • IPP-BAS, Bulgaria
  • CNR-ISTI, Italy
  • CNRS, France
  • Delft University, Netherlands
  • EPFL, Switzerland
  • Fraunhofer Gesellschaft, Germany
  • Research Center Jülich, Germany
  • PSNC, Poland
  • MTA SZTAKI, Hungary
  • University of Münster, Germany
  • University of Calabria, Italy
  • University of Cyprus
  • University of Dortmund, Germany
  • University of Manchester, UK
  • EAI-FR, Switzerland
  • University of Westminster, UK
  • Technical University of Catalonia, Spain
  • Zuse Institute Berlin, Germany
  • University of Innsbruck, Austria
  20 participating institutes; 89 researchers

  5. Grid Scheduling

  6. Key Question
  • “Which services/resources to use for an activity, when, where, how?”
  • Typically: a particular user, business application, or component application needs one or several services/resources for an activity, under given constraints:
  • Trust & Security
  • Timing & Economics
  • Functionality & Service level
  • Application-specifics & Inter-dependencies
  • Scheduling and Access Policies
  • This question has to be answered in an automatic, efficient, and reliable way.
  • Part of the invisible and smart infrastructure!

  7. Motivation
  • Resource management for Future/Next Generation Grids!
  • But what are Future Generation Grids? It depends on who you ask!
  • HPC computing: parallel computing, cluster computing, desktop computing
  • Enterprise Grids: business services, application servers, SOA/web services
  • Ambient intelligence / ubiquitous computing: PDAs, mobile devices

  8. Resource Definition
  • Concluding from the different interpretations of “Grid”: for broad acceptance, a Grid RMS should probably cover the whole scope.
  • Resources:
  • Compute
  • Network
  • Storage
  • Data
  • Software (components, licenses)
  • Services (functionality, ability)
  • Management of some resources is less complex, while other resources require coordination and orchestration to be effective (e.g. HW and SW).

  9. Resource Management Layer
  • A Grid resource management system consists of:
  • Local resource management system (Resource Layer)
  • basic resource management unit
  • provides a standard interface for using remote resources, e.g. GRAM
  • Global resource management system (Collective Layer)
  • coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)
  • provides high-level functionality to use all resources efficiently: job submission, resource discovery and selection, scheduling, co-allocation, job monitoring, etc.
  • e.g. meta-scheduler, resource broker

  10. Core Grid Infrastructure Services
  [Diagram: a layered Grid RMS architecture. The user/application sits on top of higher-level services (resource broker); the Grid middleware provides information, monitoring and security services; Grid resource managers interface the local resource management systems (e.g. PBS, LSF) that control the actual resources.]

  11. Grid Scheduling
  [Diagram: a Grid user submits to a Grid scheduler, which dispatches jobs to the local schedulers, job queues and schedules of Machines 1–3.]

  12. Select a Resource for Execution
  • Most systems do not provide advance information about future job execution:
  • user information is not accurate, as mentioned before
  • newly arriving jobs may surpass current queue entries due to higher priority
  • A Grid scheduler might consider the current queue situation, but this does not give reliable information about future executions:
  • a job may wait long in a short queue while it would have been executed earlier on another system
  • Available information:
  • the Grid information service gives the state of the resources and possibly authorization information
  • prediction heuristics: estimate a job’s wait time for a given resource, based on the current state and the job’s requirements
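The prediction heuristic in the last bullet can be sketched in a few lines. The queue model and the functions `estimate_wait` and `select_resource` below are illustrative assumptions, not part of any actual grid middleware:

```python
# Sketch of a wait-time prediction heuristic for resource selection.
# All data structures here are illustrative assumptions.

def estimate_wait(queue, free_cpus, job_cpus):
    """Estimate the wait time of a new job by simulating FCFS execution
    of the current queue.

    queue:     list of (cpus, runtime_estimate) for the queued jobs
    free_cpus: currently idle CPUs on the resource
    job_cpus:  CPUs requested by the new job
    """
    wait = 0.0
    for cpus, runtime in queue:
        if free_cpus >= job_cpus:
            return wait            # the new job could start now
        # assume the next queued job runs and frees its CPUs afterwards
        wait += runtime
        free_cpus += cpus
    return wait if free_cpus >= job_cpus else float("inf")

def select_resource(resources, job_cpus):
    """Pick the resource with the smallest predicted wait time."""
    return min(resources,
               key=lambda r: estimate_wait(r["queue"], r["free"], job_cpus))
```

As the slide notes, such an estimate is only as good as the queue information and runtime estimates it is fed; with inaccurate user estimates the prediction can be far off.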

  13. Co-allocation
  • It is often requested that several resources are used for a single job,
  • that is, a scheduler has to assure that all resources are available when needed,
  • in parallel (e.g. visualization and processing)
  • or with time dependencies (e.g. a workflow)
  • The task is especially difficult if the resources belong to different administrative domains.
  • The actual allocation time must be known for co-allocation,
  • or the different local resource management systems must synchronize with each other (wait for availability of all resources).

  14. Example: Multi-Site Job Execution
  • A multi-site job uses several resources at different sites in parallel. Network communication is an issue.
  [Diagram: the Grid scheduler places parts of one job into the schedules of Machines 1–3 simultaneously.]

  15. Advance Reservation
  • Co-allocation and other applications require a priori information about the precise resource availability.
  • With the concept of advance reservation, the resource provider guarantees a specified resource allocation.
  • includes a two- or three-phase commit for agreeing on the reservation
  • Implementations:
  • GARA/DUROC/SNAP provide interfaces for Globus to create advance reservations
  • implementations for network QoS are available, e.g. setup of a dedicated bandwidth between endpoints
  • “WS-Agreement” defines a protocol for agreement management
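The two-phase commit mentioned above can be illustrated with a minimal sketch. The `Provider` class and its methods are hypothetical (real systems such as GARA or WS-Agreement define their own interfaces): phase one tentatively holds the resources, phase two confirms, and a failed hold triggers rollback, which is exactly what co-allocation across providers needs:

```python
# Sketch of a two-phase commit for advance reservations.
# The Provider interface is a hypothetical illustration.

class Provider:
    def __init__(self, capacity):
        self.capacity = capacity
        self.held = {}        # reservation id -> (start, end, cpus)
        self.next_id = 0

    def prepare(self, start, end, cpus):
        """Phase 1: tentatively hold the resources, or refuse."""
        used = sum(c for s, e, c in self.held.values()
                   if s < end and start < e)     # overlapping holds
        if used + cpus > self.capacity:
            return None
        self.next_id += 1
        self.held[self.next_id] = (start, end, cpus)
        return self.next_id

    def commit(self, rid):
        """Phase 2: the client confirms; the hold becomes a guarantee."""
        return rid in self.held

    def cancel(self, rid):
        self.held.pop(rid, None)

def co_allocate(providers, start, end, cpus):
    """Reserve on all providers, or roll back every tentative hold."""
    holds = []
    for p in providers:
        rid = p.prepare(start, end, cpus)
        if rid is None:
            for q, r in holds:
                q.cancel(r)   # roll back on failure
            return None
        holds.append((p, rid))
    for p, rid in holds:
        p.commit(rid)
    return holds
```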

  16. Using Service Level Agreements
  • The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs).
  • SLA: a contract negotiated between
  • resource provider, e.g. local scheduler, and
  • resource consumer, e.g. grid scheduler, application
  • SLAs provide a uniform approach for the client to specify resource and QoS requirements, while hiding from the client details about the resources, such as queue names and current workload.

  17. Execution Alternatives
  • Time sharing:
  • the local scheduler starts multiple processes per physical CPU with the goal of increasing resource utilization (multi-tasking)
  • the scheduler may also suspend jobs to keep the system load under control (preemption)
  • Space sharing:
  • the job uses the requested resources exclusively; no other job is allocated to the same set of CPUs
  • the job has to be queued until sufficient resources are free

  18. Job Classifications
  • Batch jobs vs. interactive jobs
  • batch jobs are queued until execution
  • interactive jobs need immediate resource allocation
  • Parallel vs. sequential jobs
  • a parallel job requires several processing nodes in parallel
  • The majority of HPC installations are used to run batch jobs in space-sharing mode!
  • a job is not influenced by other co-allocated jobs
  • the assigned processors, node memory, caches etc. are exclusively available for a single job
  • overhead for context switches is minimized
  • these are important aspects for parallel applications

  19. Parallel Application Types
  • Rigid: requires a fixed number of processors
  • Moldable: the number of processors can be adapted only at the start of the execution
  • Malleable: the number of assigned processors can be changed during runtime (i.e., grow/shrink)
  [Diagram: processor allocation over time for rigid, moldable and malleable jobs.]
  D. G. Feitelson and L. Rudolph, “Toward convergence in job schedulers for parallel supercomputers,” in JSSPP’96

  20. Preemption
  • A job is preempted by interrupting its current execution:
  • the job might be put on hold on a CPU set and later resumed; the job stays resident on those nodes (consuming memory)
  • alternatively, a checkpoint is written and the job is migrated to another resource where it is restarted later
  • Preemption can be useful to reallocate resources due to new job submissions (e.g. with higher priority),
  • or if a job is running longer than expected.

  21. Job Scheduling
  • A job is assigned to resources through a scheduling process that is responsible for
  • identifying available resources
  • matching job requirements to resources
  • making decisions about job ordering and priorities
  • HPC resources are typically subject to high utilization,
  • therefore resources are not immediately available and jobs are queued for future execution
  • the time until execution is often quite long (many production systems have an average delay until execution of >1h)
  • jobs may run for a long time (several hours, days or weeks)

  22. Typical Scheduling Objectives
  • Minimizing the average weighted response time: AWRT = Σ w·(t − r) / Σ w, summed over all jobs
  • Maximizing machine utilization / minimizing idle time
  • these are conflicting objectives
  • the criterion is usually static for an installation and implicitly given by the scheduling algorithm
  • r: submission time of a job
  • t: completion time of a job
  • w: weight/priority of a job
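With r, t and w as defined above, the AWRT objective reduces to one line of arithmetic; a minimal sketch:

```python
def awrt(jobs):
    """Average weighted response time.

    jobs: list of (r, t, w) with submission time r, completion time t,
    and weight/priority w, as defined on the slide.
    AWRT = sum(w * (t - r)) / sum(w)
    """
    return (sum(w * (t - r) for r, t, w in jobs)
            / sum(w for _, _, w in jobs))
```

Weighting lets an installation bias the objective, e.g. giving high-priority jobs a larger influence on the average than low-priority ones.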

  23. Job Steps
  • A user job enters the job queue; the scheduler (its strategy) decides on start time and resource allocation of the job.
  [Diagram: a Grid user's job description enters the local job queue of an HPC machine; the scheduler places it in the schedule, and execution management starts it on the nodes.]

  24. Example of Grid Scheduling Decision Making
  • Where to put the Grid job?
  [Diagram: the Grid scheduler must choose among Machine 1 (40 jobs running, 80 queued), Machine 2 (5 running, 2 queued) and Machine 3 (15 running, 20 queued).]

  25. Available Information from the Local Schedulers
  • Decision making is difficult for the Grid scheduler:
  • only limited information about local schedulers is available
  • available information may not be reliable
  • Possible information:
  • queue length, running jobs
  • detailed information about the queued jobs: execution length, process requirements, …
  • tentative schedule of future job executions
  • This information is often not technically provided by the local scheduler.
  • In addition, this information may be subject to privacy concerns!

  26. Grid-Level Scheduler
  • Discovers and selects the appropriate resource(s) for a job.
  • If the selected resources are under the control of several local schedulers, a meta-scheduling action is performed.
  • Architecture:
  • Centralized: all lower-level schedulers are under the control of a single Grid scheduler (not realistic in global Grids)
  • Distributed: lower-level schedulers are under the control of several Grid scheduler components; a local scheduler may receive jobs from several components of the Grid scheduler

  27. Towards Grid Scheduling
  • Grid scheduling methods:
  • support for individual scheduling objectives and policies
  • multi-criteria scheduling models
  • economic scheduling methods for Grids
  • Architectural requirements:
  • generic job description
  • negotiation interface between higher- and lower-level schedulers
  • economic management services
  • workflow management
  • integration of data and network management

  28. Scheduling Objectives in the Grid
  • In contrast to local computing, there is no single general scheduling objective anymore:
  • minimizing response time, minimizing cost
  • trade-off between quality, cost, response time etc.
  • Cost and different service qualities come into play:
  • the user will introduce individual objectives
  • the Grid can be seen as a market where resources are competing alternatives
  • Similarly, the resource provider has individual scheduling policies.

  29. Workflow Scheduling

  30. Workflows
  • What is a workflow? Example: a simple job chain: Task 1 → Task 2 → Task 3 → Task 4
  • Dependencies between tasks/job steps: control and/or data dependencies

  31. Example of a Workflow
  • A simple workflow from climate research with data dependencies:
  • Task 1: select interesting data from the climate archive (producing a data subset)
  • Task 2: simulate
  • Task 3: visualize (producing new results)

  32. Communication/Data Dependencies
  • Workflows can cover different communication models:
  • synchronous (e.g. streaming between multiple active jobs), or
  • asynchronous (e.g. via files)
  • Synchronous communication requires co-allocation of jobs and data streaming management.
  • Asynchronous communication requires file/data management in distributed Grid environments.

  33. Impact of Coordinated Scheduling (1)
  • Consider an application example with a simple workflow consisting of 4 consecutive tasks/steps, each running 5 minutes: Task 1 → Task 2 → Task 3 → Task 4
  • Consider also a Grid resource with a batch queuing system (e.g. Torque) that has an average queue waiting time of 60 minutes.
  • We apply just-in-time scheduling. How long will it take to execute the whole workflow?
  • Task 1 waits for 1h and runs for 5 minutes; Task 2 waits for Task 1 to complete, all other tasks analogously
  • = 4·1h + 4·5min = 4h 20min

  34. Impact of Coordinated Scheduling (2)
  • How to improve?
  • put several steps in the queue and keep them on hold if the preceding step is not finished (might produce idle times on resources)
  • or use advance reservations (= planning)
  • How long will it ideally take to execute the whole workflow?
  • Task 1 waits for 1h and runs for 5 minutes; Task 2 starts immediately after Task 1, all other tasks analogously
  • = 1h + 4·5min = 1h 20min
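The two estimates on slides 33 and 34 reduce to simple arithmetic; a sketch using the example's numbers (60-minute average queue wait, four 5-minute tasks):

```python
def just_in_time(n_tasks, wait, run):
    """Each task is submitted only after its predecessor finishes,
    so every task pays the full average queue wait."""
    return n_tasks * (wait + run)

def with_reservation(n_tasks, wait, run):
    """Only the first task waits; successors start back-to-back
    thanks to advance reservations (or pre-queued, held steps)."""
    return wait + n_tasks * run

assert just_in_time(4, 60, 5) == 260      # minutes = 4h 20min
assert with_reservation(4, 60, 5) == 80   # minutes = 1h 20min
```

The gap grows linearly with the number of sequential workflow steps, which is why coordinated scheduling matters so much for long chains.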

  35. More complex workflows (1)
  • Concurrent activities
  [Diagram: Task 1 fans out into concurrent tasks 2.1–2.3, joins at Task 3, fans out again into tasks 4.1–4.3, and joins at Task 5.]

  36. More complex workflows (2)
  • Using loops
  [Diagram: a chain of Tasks 1–4 containing a loop.]

  37. Example: DAGMan
  • Directed Acyclic Graph Manager
  • DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you
  • (e.g., “Don’t run job B until job A has completed successfully.”)
  • A DAG is defined by a .dag file, listing each of its nodes and their dependencies:
  # diamond.dag
  Job A a.sub
  Job B b.sub
  Job C c.sub
  Job D d.sub
  Parent A Child B C
  Parent B C Child D
  [Diagram: the diamond DAG with Job A on top, Jobs B and C in the middle, and Job D at the bottom.]
  Source: Miron Livny
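The ordering constraint DAGMan enforces can be illustrated with a small topological sort over the diamond DAG's parent/child edges; this is a sketch of the scheduling constraint, not DAGMan's actual implementation:

```python
from collections import defaultdict

def topo_order(deps):
    """deps: list of (parent, child) edges, e.g. taken from a .dag file.
    Returns one execution order respecting all dependencies
    (Kahn's algorithm)."""
    children = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for p, c in deps:
        children[p].append(c)
        indeg[c] += 1
        nodes.update((p, c))
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
        ready.sort()   # deterministic order for the example
    return order

# Edges of the diamond DAG above: Parent A Child B C; Parent B C Child D
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
```

Here B and C have no dependency on each other, so DAGMan may run them concurrently; only D must wait for both.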

  38. Dynamic Workflows vs. Static Workflows
  • Some workflows are not known in advance, and their structure might be determined during run time: dynamic workflows.
  • Static workflows are known in advance.
  • This has a major impact on planning and scheduling workflows.

  39. Promoter Identification Workflow
  Source: Matt Coleman (LLNL)

  40. Source: NIH BIRN (Jeffrey Grethe, UCSD)

  41. Ecology: GARP Analysis Pipeline for Invasive Species Prediction
  [Diagram: a workflow that queries registered EcoGrid databases for species presence/absence points and environmental layers, integrates the layers, computes a GARP rule set from training and test samples, and generates and validates native-range and invasion-area prediction maps, archiving results and metadata back to the EcoGrid.]
  Source: NSF SEEK (Deana Pennington et al., UNM)

  42. http://www.gridlab.org/

  43. GEO 600 Coalescing Binary Search Triana Prototype

  44. Workflow Taxonomy
  [Diagram: a workflow system comprises workflow design and specification (structure, composition, model/spec), scheduling and enactment, operational attributes, data management, and component/service discovery.]
  Source: Omer Rana

  45. Workflow Composition
  [Diagram: workflow composition taxonomy. Composition is automated (planner, templates, factory) or user-directed; user-directed composition is graph-based (Petri nets, DAGs, UML, user-defined design patterns) or language-based (markup, logic, functional, process calculi, scripting), possibly with sub-workflows.]
  Source: Omer Rana

  46. Taxonomy of Workflow Scheduling
  • Scheduling criteria: single vs. multiple
  • Number of workflows considered during a scheduling step:
  • single (optimizing a single workflow) vs.
  • multiple (optimizing several or all workflows at the same time)
  • Dynamicity:
  • full-ahead vs.
  • just-in-time vs.
  • hybrid
  Source: CoreGRID report by U. Innsbruck, FhG FIRST Berlin

  47. Taxonomy of Workflow Scheduling (2)
  • Optimization model:
  • workflow-oriented (considering the benefit of a single workflow/user) vs.
  • Grid-wide (overall optimization goal)
  • Advance reservation:
  • with AR (using reservations/SLAs)
  • or without
  Source: CoreGRID report by U. Innsbruck, FhG FIRST Berlin

  48. Taxonomy of Workflow Scheduling Systems
  Source: Jia Yu, Rajkumar Buyya

  49. Workflow Languages
  • Plenty of them, see the Grid Workflow Forum; scientific and industrial workflow languages include:
  • AGWL, BPEL4WS, BPML, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, WSCL, WSCI, WSFL, XLANG, YAWL, SCUFL/XScufl, WPDL, PIF, PSL, OWL-S, xWFL
  Source: Grid Workflow Forum (www.gridworkflow.org)

  50. Excerpt of Workflow Scheduling Systems
  • DAGMan, Pegasus, Triana, ICENI, Taverna, GridAnt, GrADS, GridFlow, Unicore, Gridbus workflow, Askalon, Karajan, Kepler
  Source: Grid Workflow Forum (www.gridworkflow.org)
