
MULTICRITERIA SCHEDULING ON THE GRID Jan Węglarz



Presentation Transcript


  1. MULTICRITERIA SCHEDULING ON THE GRID • Jan Węglarz • Institute of Computing Science, Poznań Univ. of Technology • Poznań Supercomputing & Networking Center

  2. Outline • Introduction to Grids • Resource Management in Grids • Multicriteria Approach for Job Scheduling in Grids • Model (i) with unknown time characteristics • Model (ii) with known time characteristics • Summary

  3. Grids • Grids are currently the subject of intensive research • The main objective is to provide a heterogeneous, dynamically scalable, shared resource environment • In the context of computations, this develops the well-known idea of a metacomputer in a wide area network

  4. When does a distributed system become a grid? • Checklist (given by Ian Foster): a grid • coordinates resources that are not subject to centralized control; • uses standard, open, general-purpose protocols and interfaces; • delivers nontrivial qualities of service.

  5. Grid Problem • Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (based on “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”) • Enabling various groups (virtual organizations) to share geographically distributed resources in order to solve common complex problems, assuming the lack of: • centralized location of resources, • centralized management, • knowledge of the global state of the system, • full trust among users.

  6. Virtual Organizations • A Virtual Organization (VO) joins users belonging to various real institutions (RI) • Each RI has its own resource management policies • Users share resources (computing, information, scientific devices, software licenses etc.) Source (figure): www.globus.org

  7. Grid environments • Main characteristics: • applications may execute on geographically distributed resources, taking advantage of different specialized system architectures and scientific instruments • applications may require a VO consisting of several supercomputers, clusters of workstations, network connectivity between them, access to remote datasets and various scientific devices such as microscopes and telescopes • Such an emerging infrastructure is called a Grid, by analogy to the electric power grid. (Figure labels: VO, Administrator, A, B, C)

  8. Research Problems • Security • Mobile users’ support • preparation of the off-line task • submitting task after connecting to the Grid system using mobile device (e.g. PDA) • Assurance of a quality of service (QoS) • Network bandwidth • Computational resources • Maximal transparency of the system • Users’ management, accounting • Resource management • scheduling of users’ jobs, • resource load balancing, • multicriteria approaches, • user’s preference modeling

  9. Multicriteria GRM - motivations • Evaluation of resources: • static and dynamic performance characteristics of resources (e.g. memory, storage, flops) • historical information (availability, stability, reliability) • Evaluation of resource co-allocations: • consequences of the assignment of resources to tasks (e.g. application execution time, mean response time, mean tardiness, cost of computations) • Several groups of stakeholders of the Grid resource management (GRM) process: • different points of view • focus on different criteria • various preferences concerning criteria

  10. Stakeholders of the GRM problem • End-users: • make use of Grid applications and portals • have requirements concerning their applications • Resource Administrators and Owners: • administrate and share resources to achieve some benefits • VO Administrator: • manages and controls the VO • makes global policies

  11. Criteria • Concerning particular resources (e.g. memory, flops) or co-allocations of resources (e.g. estimated processing time, maximum lateness) • Specific for end-users (e.g. mean response time, mean tardiness, cost of computations), resource owners (e.g. machine idleness) and the VO administrator (e.g. throughput, makespan) • Time criteria (e.g. mean response time, makespan, mean tardiness), cost criteria (e.g. weighted resource consumption, cost of computations) and resource utilization criteria (e.g. load balancing, machine idleness)

  12. Preference modelling • As a function: • by means of parameters used in a utility function in order to model the relative importance of criteria • expressed using the multi-criteria resource specification language or gathered from stakeholders during an interactive process • As a relation: • a relation holds between two solutions if one solution is not worse than the other (the so-called outranking relation) • input parameters such as weights and thresholds are provided by stakeholders • As logic statements (decision rules): • often preferences of a VO admin. and resource owners concerning e.g. load of resources or users’ access rules • may also express preferences of end-users

  13. The models of MC-GRM • We investigate two different models • Model (i): MC-GRM with unknown time characteristics • In huge grids, with a large number (millions) of users and nodes, it is difficult to predict waiting and execution times of jobs. The variability of resources, users, jobs etc. is a strong factor here. • Such environments, in the single-criterion case, are the most common among current Grid infrastructures managed by relatively simple grid brokers such as Condor-G, CSF, EGEE Workload Manager and Nimrod-G (they use mostly “best effort” techniques and strategies) • Model (ii): MC-GRM with known time characteristics (based on prediction and resource reservation) • E.g. in so-called Enterprise Grids many manufacturing jobs are well described, since the same jobs run repeatedly. Processing nodes in such environments usually offer some reservation mechanisms

  14. The models of MC-GRM: illustration • Grid scheduling problems with unknown time characteristics • Grid scheduling in presence of time characteristics achieved by prediction techniques, and by resource reservation mechanisms available on resources.

  15. Common assumptions for both examples • A Grid Virtual Organization is managed and controlled according to resource and security policies defined by a VO administrator • Additionally, local administrators can define their own local constraints that must be obeyed. • End users (usually the largest group of stakeholders) need to get access to remote resources and run their applications according to domain-specific constraints and their preferences. • End users’ requirements and preferences can vary from detailed properties of applications and resources to advanced criteria such as cost or QoS. (Figure labels: VO, Administrator, A, B, C; Hard constraints; Soft constraints (preferences))

  16. Model (i): Problem description • The Grid broker does not possess any knowledge about execution and waiting times • Decisions concerning the allocation of resources are made based on the hard constraints (e.g. security and resource management policies) and also on soft constraints concerning resource and job characteristics • Our goal is to meet all hard constraints and to maximize the global satisfaction of all end-users.

  17. Model (i) - Example • Criteria: • w: the average waiting time in a queue on an RP (Resource Provider) • s: the speed of the processor unit • m: the size of available memory • Resource Providers' characteristics: • The CPU of RP1 is around two times faster than those of the other RPs (jobs are executed faster) • RP3 has a lot of memory available for applications • There are some jobs running on RP1 and RP3 when the Grid broker makes a scheduling decision. Therefore, all jobs have to wait in both the RP1 and RP3 queues (the average estimated waiting times equal 50 and 70 units respectively for all jobs) • We assume that all RPs meet the hard constraints.
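The decision process described for Model (i) can be sketched in a few lines of Python: discard providers that violate a hard constraint, then pick the best remaining one under some scoring rule. This is only an illustrative sketch, not the broker's actual logic. The waiting times 50 and 70 come from the slide; the speed and memory values, the zero waiting time of RP2, the extra provider `RPx`, the `policy_ok` flag and the weights are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    policy_ok: bool  # hypothetical flag: all hard constraints are satisfied
    wait: float      # w: average queue waiting time (lower is better)
    speed: float     # s: relative processor speed (higher is better)
    memory: float    # m: available memory (higher is better)


def select_provider(
    providers: Iterable[Provider], score: Callable[[Provider], float]
) -> Optional[Provider]:
    """Drop providers violating hard constraints, then maximize the score."""
    feasible = [p for p in providers if p.policy_ok]
    return max(feasible, key=score, default=None)


providers = [
    Provider("RP1", True, wait=50.0, speed=2.0, memory=4.0),
    Provider("RP2", True, wait=0.0, speed=1.0, memory=4.0),
    Provider("RP3", True, wait=70.0, speed=1.0, memory=16.0),
    # RPx is a hypothetical provider that fails a hard constraint.
    Provider("RPx", False, wait=0.0, speed=4.0, memory=64.0),
]

# Single-criterion strategy: the shortest queue waiting time.
shortest_queue = select_provider(providers, lambda p: -p.wait)

# Multi-criteria strategy for a user who mostly cares about memory
# (assumed weights: 0.2 for waiting time, 0.8 for memory).
memory_hungry = select_provider(
    providers, lambda p: 0.2 * (-p.wait / 70.0) + 0.8 * (p.memory / 16.0)
)
```

With these assumed numbers the single-criterion rule picks RP2, while the memory-oriented user is sent to RP3 despite its longer queue, which is the kind of difference the two strategies are meant to expose.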

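  18. Model (i) – Example: the shortest queue waiting time on RP (Figure: jobs JA, JB and JC placed on VO sites RP1–RP4 along a time axis, with Cmax marked; legend: resource not available; table: values of criteria)

  19. Model (i) – Example: multi-criteria approach (Figure: jobs JA, JB and JC placed on VO sites RP1–RP4 along a time axis, with Cmax marked for both approaches; table: values of criteria)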

  20. Model (i): Example

  21. Calculation of satisfaction • For each user u from the set of users U: • satisfaction of user u: • wi – weight of criterion i • ci – value of criterion i • ci* – nadir point • Cu – values of criteria of user u • s(x) – scaling function • for cost criteria: • for gain criteria: • Total satisfaction:
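The formulas of this slide are not present in the transcript, so the following Python sketch shows only one common way such a calculation can be set up; it should not be read as the authors' exact definition. It assumes that a user's satisfaction is a weighted sum of scaled criterion values, that the scaling maps the nadir point to 0 and a best (`ideal`) value to 1, and that total satisfaction is the plain sum over users. The `ideal` values, the weights and the sample numbers are hypothetical.

```python
from typing import Dict, Iterable


def scale(value: float, nadir: float, ideal: float) -> float:
    """Map a criterion value to [0, 1]: nadir -> 0, ideal -> 1.

    For a cost criterion the nadir is the largest value, for a gain
    criterion it is the smallest; the same expression covers both.
    """
    if ideal == nadir:
        return 1.0  # all candidates are equal on this criterion
    return (value - nadir) / (ideal - nadir)


def satisfaction(
    values: Dict[str, float],
    weights: Dict[str, float],
    nadir: Dict[str, float],
    ideal: Dict[str, float],
) -> float:
    """Weighted sum of scaled criterion values for one user (assumed form)."""
    return sum(w * scale(values[k], nadir[k], ideal[k]) for k, w in weights.items())


def total_satisfaction(per_user: Iterable[float]) -> float:
    """Aggregate over all users; a plain sum is assumed here."""
    return sum(per_user)


# Criteria: w (waiting time, cost), s (speed, gain), m (memory, gain).
nadir = {"w": 70.0, "s": 1.0, "m": 4.0}
ideal = {"w": 0.0, "s": 2.0, "m": 16.0}

user_a = satisfaction(
    {"w": 50.0, "s": 2.0, "m": 4.0}, {"w": 0.5, "s": 0.3, "m": 0.2}, nadir, ideal
)
user_b = satisfaction(
    {"w": 70.0, "s": 1.0, "m": 16.0}, {"w": 0.1, "s": 0.1, "m": 0.8}, nadir, ideal
)
total = total_satisfaction([user_a, user_b])
```

Scaling every criterion to the same range before weighting keeps a criterion measured in large units, such as waiting time, from dominating one measured in small units, such as relative speed.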

  22. Model (ii): Motivations • Easier expression of user preferences: • non-skilled users do not need to evaluate technical details of resources (they can focus on time and cost only) • Providing users with a priori information about waiting and execution times: • essential, e.g. for interactive applications • Realization of a quality of service (QoS): • e.g. handling jobs with deadlines • Efficient (synchronized) co-allocation of resources: • e.g. for MPI applications • Reliable calculation of resource utilization costs: • users are aware of what they are charged for

  23. Model (ii) - Problems • Advance reservation mechanism: • exclusive assignment of resources to a certain user for a specific time • this functionality must be supported by the Grid Scheduler (globally) as well as by Resource Providers (locally) • may have a negative influence on the waiting times of other jobs • Prediction of waiting and execution times: • based on historical information about submitted jobs • may be very imprecise due to the heterogeneity and size of grids • Accounting and billing of resource utilization: • various policies based on specific business cases • imposes the need for QoS guarantees
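To make the reservation idea concrete, here is a minimal Python sketch that finds the earliest free time slot on a single node, given its existing reservations. It assumes that reservations are non-overlapping `(start, end)` intervals and that a job needs one contiguous slot; the function name and the sample intervals are hypothetical and do not describe any particular scheduler's interface.

```python
from typing import List, Tuple


def earliest_start(
    reservations: List[Tuple[float, float]], duration: float, not_before: float = 0.0
) -> float:
    """Return the earliest time at which a slot of `duration` fits on the node."""
    t = not_before
    for start, end in sorted(reservations):
        if start - t >= duration:
            return t        # the gap before this reservation is large enough
        t = max(t, end)     # otherwise continue searching after it
    return t                # the node is free after the last reservation


booked = [(0.0, 10.0), (15.0, 30.0)]
fits_in_gap = earliest_start(booked, duration=5.0)
must_wait = earliest_start(booked, duration=6.0)
```

In this example a 5-unit job fits into the gap starting at time 10, while a 6-unit job has to wait until time 30, which illustrates how existing reservations can lengthen the waiting times of other jobs.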

  24. Model (ii) – Problem Definition • Problem: • find a compromise schedule (maximizing users’ satisfaction) of jobs J of users U on nodes N • Evaluation criteria (soft constraints): • ts – guaranteed start time (start time of the reservation for this job) • tc – estimated completion time (start time + estimated execution time) • c – cost of resource utilization • Hard constraints: • resource requirements, e.g. operating system, CPU architecture and speed, amount of memory etc. • time requirements, e.g. time slots, deadlines, days of week etc. • Solutions: • sets of assignments of jobs to resources in certain time periods {(j_i, n_j, R_nj, t_start, t_end), i=1..n, j=1..k} • modeled as a multicriteria choice problem
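The assignment tuple and the three evaluation criteria of the problem definition map naturally onto a small data structure. The Python sketch below is one possible encoding under stated assumptions: the resource set R is reduced to a CPU count, and the cost is taken to be CPUs × reserved time × a unit price. The slide does not specify a cost model, so that formula, the field names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Assignment:
    job: str        # j_i
    node: str       # n_j
    cpus: int       # R_nj, simplified here to a number of reserved CPUs
    t_start: float  # start of the reserved period
    t_end: float    # end of the reserved period


def criteria(a: Assignment, est_exec_time: float, unit_price: float) -> Dict[str, float]:
    """Compute ts, tc and c for one assignment (the cost model is an assumption)."""
    return {
        "ts": a.t_start,
        "tc": a.t_start + est_exec_time,
        "c": a.cpus * (a.t_end - a.t_start) * unit_price,
    }


def meets_time_constraints(a: Assignment, est_exec_time: float, deadline: float) -> bool:
    """Hard constraint: the job must finish inside its slot and before its deadline."""
    tc = a.t_start + est_exec_time
    return tc <= a.t_end and tc <= deadline


a = Assignment(job="JA", node="RP1", cpus=4, t_start=10.0, t_end=40.0)
values = criteria(a, est_exec_time=25.0, unit_price=0.5)
on_time = meets_time_constraints(a, est_exec_time=25.0, deadline=36.0)
too_late = meets_time_constraints(a, est_exec_time=25.0, deadline=30.0)
```

A schedule is then a collection of such assignments, and the criteria values feed whichever multicriteria method is used to choose among candidate schedules.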

  25. Model (ii) – Example: multi-criteria approach (Figure: jobs JA, JB and JC to be placed on VO sites RP1–RP4 along a time axis; existing reservations J1–J7 and Jk; free time slots; estimated job execution times; MST; table: values of criteria)

  26. Model (ii) – Example: multi-criteria approach (Figure: jobs JA, JB and JC placed on VO sites RP1–RP4 along a time axis; existing reservations J1–J7; free time slots; estimated job execution times; MST; the two schedules shown have Makespan = 100 and Makespan = 125; table: values of criteria)

  27. Model (ii) – Example • The total evaluation of solutions (schedules) is obtained through aggregation of the evaluations of all users • Solution A1 B2 C3, although the worst in terms of makespan and mean start time, satisfies the users’ preferences best

  28. Summary • Job scheduling in Grids is a multicriteria process by nature: • different preferences of stakeholders • hard and soft constraints • Two different situations can be distinguished: • model without knowledge about job waiting and execution times • model with knowledge about job waiting and execution times • They can be solved using similar decision support approaches but with different criteria.

  29. GridLab Project • GridLab was an EU-funded project (2002-2005) developing tools and services for Grids: • 11 EU partners and 3 US contractors, budget: 5 MEuro • the GridLab testbed, consisting of resources in Europe, the US and Korea, still exists (development, testing) • GRMS (Grid Resource Management System) was one of the services developed by PSNC within GridLab • Today GRMS manages many Grid infrastructures world-wide: • Clusterix • Virtual Laboratory (Vlab) • SURA (US) • ACGT Grid Cancer Project • InteliGrid (European VO for engineering applications)

  30. Example Grid Broker: Grid Resource Management System (GRMS) • GRMS is a job scheduling and grid resource management framework • GRMS is based on dynamic resource discovery and selection, mapping and advanced multicriteria scheduling methodologies • Both models (i) and (ii) have been implemented within GRMS, together with other multicriteria models (MC-Evaluator Module) • GRMS manages the whole process of remote job submission and control • Our preliminary tests have shown that, based on the hard/soft constraint concept and multicriteria analysis, GRMS is more efficient than other existing grid brokers

  31. MCEvaluator (Figure labels: MC Evaluator, GAS)

  32. MCEvaluator • MCEvaluator is a module of GRMS responsible for making decisions concerning the selection of resources and schedules. • It consists of a set of classes providing abstractions of the entities used in multicriteria models (criteria, constraints, solutions etc.) • Main entities: • Criteria (objectives, soft constraints: the user is interested in their greatest or lowest possible values) • Constraints (hard constraints: if one is not satisfied, the solution is not taken into consideration) • Solutions (e.g. resources, schedules etc. along with description parameters) • Evaluator (decision point) • Other specific entities derived from those above • Various multi-criteria methods: • Evaluation function (e.g. weighted sum) • Non-dominated solutions • Lexicographic order • Other ...
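Two of the methods listed for MCEvaluator, non-dominated solutions and lexicographic order, can be sketched generically in Python. The code below is a plain illustration of these two textbook techniques and is not the MCEvaluator API; the function names, the criterion names `tc` and `c`, and the sample values are hypothetical.

```python
from typing import Dict, List

Criteria = Dict[str, float]


def _oriented(value: float, sense: str) -> float:
    """Turn every criterion into 'lower is better' by negating gain criteria."""
    return value if sense == "min" else -value


def dominates(a: Criteria, b: Criteria, senses: Dict[str, str]) -> bool:
    """a dominates b if it is no worse on every criterion and better on one."""
    pairs = [(_oriented(a[k], s), _oriented(b[k], s)) for k, s in senses.items()]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def non_dominated(solutions: Dict[str, Criteria], senses: Dict[str, str]) -> List[str]:
    """Names of solutions that no other solution dominates (the Pareto set)."""
    return [
        name
        for name, crit in solutions.items()
        if not any(
            dominates(other, crit, senses)
            for other_name, other in solutions.items()
            if other_name != name
        )
    ]


def lexicographic_best(
    solutions: Dict[str, Criteria], order: List[str], senses: Dict[str, str]
) -> str:
    """Best solution when criteria are compared strictly in priority order."""
    return min(
        solutions,
        key=lambda name: tuple(_oriented(solutions[name][k], senses[k]) for k in order),
    )


senses = {"tc": "min", "c": "min"}  # completion time and cost, both minimized
solutions = {
    "S1": {"tc": 100.0, "c": 30.0},
    "S2": {"tc": 125.0, "c": 20.0},
    "S3": {"tc": 130.0, "c": 35.0},
}
pareto = non_dominated(solutions, senses)
cheapest_first = lexicographic_best(solutions, ["c", "tc"], senses)
fastest_first = lexicographic_best(solutions, ["tc", "c"], senses)
```

Here S3 is dominated by S1 and drops out, while the choice between S1 and S2 depends on which criterion is given priority; a weighted-sum evaluation function would settle the same trade-off with weights instead of a strict ordering.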

  33. GRMS as a main grid component/service controls complex applications (workflows) on behalf of end users (live demos during the iGrid2005 and Supercomputing2005 conferences) (Figure: a workflow controlled by GRMS in which computing nodes run Extracting, Encoding, Merging and Displaying stages connected by data flows)
