Local Resource Management System & State Estimation
This document explores the capabilities of local resource management systems, focusing on Condor and Maui. Condor is presented as a flexible batch job system providing computing resources through a matchmaking process, while stressing the importance of preemption and checkpointing for reliability. Maui, designed for high-performance scenarios, enhances resource allocation with features like reservations and guaranteed completion times. Together, these systems improve resource selection and job scheduling efficiency, ensuring robust performance in diverse computing environments.
Local Resource Management System & State Estimation • Local resource management systems • Condor, Maui, LSF, PBS • Prediction techniques • example NWS • improve resource selection
Condor - Introduction • Batch job system that allows usage of both dedicated and non-dedicated systems. • Provides users with extra computing power • Introduces complexities • remove jobs before they are finished (preemption) • run on a wide array of machines (matchmaking)
Condor – Preemptive Resume Scheduling • Advantages • use resources that are only available occasionally by the use of checkpoints, preemption and allocation • no backfilling (take advantage of holes in the schedule to run more jobs, and hereby increase efficiency) • fair sharing among jobs and among users • compute on demand (low vs high priority)
Condor – Scheduling • Submit jobs to local computer queue • Interact with matchmaker to run job (1 cpu/job) • Run appropriate (ClassAd) job by claiming it
Triumvirate • User agent – make sure job finishes, on failure resubmit, etc. • Owner agent – ensure owner's policy of how computer is used, responsible for running submitted jobs • Matchmaker – find matches between user and owner agent and implement system-wide policies
Condor – Matchmaking & Claiming • User submits job to queue, unique identification • User agent sends a ClassAd (every 5 min) as long as there are jobs that are not running • Owner agent sends a ClassAd (every 5 min) to describe the computer it is responsible for • Matchmaker accepts ClassAds and attempts to find matches – negotiation • On match, user and owner agent work out the details independently of the matchmaker (using up-to-date information) • User agent sends job to owner agent, and it runs
Condor – Matchmaking & Claiming (2) • On problems outside the process, redo matchmaking; on program error, record the problem and inform the user • When the program starts, another process (shadow) is started on the user agent that is responsible for Condor’s remote I/O capabilities • Running jobs continue even if the matchmaker fails
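The negotiation step can be sketched in a few lines. This is a simplified stand-in, not Condor's ClassAd language: the dictionary fields, the `requirements` and `rank` callables, and the function names are all assumptions made for illustration. It shows the two-sided nature of a match: the job must accept the machine, and the owner's policy must accept the job.

```python
# Hypothetical, simplified stand-in for ClassAd matching (not Condor's real syntax).

def matches(job, machine):
    """A match requires that both sides accept each other."""
    return job["requirements"](machine) and machine["requirements"](job)

def negotiate(job_ads, machine_ads):
    """Pair each idle job with the free machine it ranks highest, one CPU per job."""
    free = list(machine_ads)
    pairs = []
    for job in job_ads:
        candidates = [m for m in free if matches(job, m)]
        if not candidates:
            continue  # job stays idle and is advertised again later
        best = max(candidates, key=job["rank"])
        free.remove(best)
        pairs.append((job["id"], best["name"]))
    return pairs

jobs = [
    {"id": "job-1", "owner": "alice",
     "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 512,
     "rank": lambda m: m["mips"]},
    {"id": "job-2", "owner": "bob",
     "requirements": lambda m: m["memory_mb"] >= 1024,
     "rank": lambda m: m["mips"]},
]
machines = [
    {"name": "ws1", "os": "LINUX", "memory_mb": 1024, "mips": 300,
     "requirements": lambda j: True},                  # owner accepts any job
    {"name": "ws2", "os": "LINUX", "memory_mb": 2048, "mips": 500,
     "requirements": lambda j: j["owner"] != "alice"}, # owner policy excludes one user
]

pairs = negotiate(jobs, machines)
```

In this sketch the matchmaker only proposes pairs; in the process described above, the user and owner agents would still confirm the details between themselves before the job is sent.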
Condor - preemption • Preemption is necessary to respect interests of all parties • Key to success is checkpoint creation • when preempted from a machine • manual checkpoint creation • periodic checkpoint creation to safeguard against failures • Crashes/disruptions happen frequently in grids • Checkpointing and reacting to preemptions is an essential part of Condor’s approach to reliability.
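Periodic checkpointing can be illustrated with an application-level sketch. Condor's own checkpointing works on the process image; the version below only saves a small state dictionary to a file, and the function names, the JSON format and the `stop_after` parameter (used to simulate a preemption) are assumptions for illustration.

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    """Write to a temporary file first so a crash never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def run_with_checkpoints(total_steps, path, checkpoint_every=100, stop_after=None):
    """Sum 0..total_steps-1, resuming from the checkpoint at `path` if one exists."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None  # simulated preemption: work since the last checkpoint is lost
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state["acc"]

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_with_checkpoints(1000, ckpt, stop_after=250)  # preempted after 250 steps
second = run_with_checkpoints(1000, ckpt)                 # resumes from step 200
```

The second run repeats only the 50 steps done after the last checkpoint, which is the trade-off the checkpoint interval controls.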
Condor – user preemption • Manual preemption • Automation of above process (eg. running time) • Preemption on behalf of Condor • eg. check if job can run on a better machine • not supported in current version of Condor • needs consideration such as ‘thrashing’ (always look for better computer, not being able to do any jobs)
Condor – owner / matchmaker preemption • Owner removes job running on his machine • automated by Condor (eg. check keyboard inactivity) • manually by running a command • Matchmaker can enforce administrator policies to increase efficiency • eg. run a better job on a machine already running one • Condor strongly prefers however not to preempt jobs if they can be run on an idle machine.
Condor - conclusion • Condor can balance the desires of all stakeholders • Condor can both take advantage of sporadically available resources and react to problems such as failures • This flexibility and robustness is its key to success
Maui Scheduler - Introduction • High performance scheduler for local clusters • Includes resource reservation, availability estimation and allocation management • External manager, extends and enhances the capabilities and performance of existing scheduler
Maui – Allocation properties • Concept of reservation to maintain resource allocations • most important feature is future allocations • set aside a block of resources for various purposes such as cluster maintenance, guaranteed job start time • resource expression: resource quantity and type conditions which must be met for a resource to be included • access control list (ACL): which consumers may utilize the reserved resources • timeframe: time period over which reservation actually blocks resources
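The three parts of a reservation listed above (resource expression, ACL, timeframe) map naturally onto a small record. The sketch below is not Maui's data model or configuration syntax; the class, its field names and `free_procs` are hypothetical, chosen only to show how the three parts interact when deciding what a given user may still use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reservation:
    procs: int        # resource expression: quantity
    node_type: str    # resource expression: type condition
    acl: frozenset    # consumers allowed to use the reserved resources
    start: int        # timeframe start (seconds)
    end: int          # timeframe end (seconds, exclusive)

    def blocks(self, node_type, user, start, end):
        """True if the reservation withholds its processors from `user` in [start, end)."""
        overlaps = start < self.end and end > self.start
        return node_type == self.node_type and overlaps and user not in self.acl

def free_procs(total, reservations, node_type, user, start, end):
    """Processors of `node_type` that `user` may use during [start, end)."""
    blocked = sum(r.procs for r in reservations
                  if r.blocks(node_type, user, start, end))
    return total - blocked

maintenance = Reservation(procs=16, node_type="compute",
                          acl=frozenset({"admin"}), start=3600, end=7200)

during_for_alice = free_procs(64, [maintenance], "compute", "alice", 3000, 4000)
during_for_admin = free_procs(64, [maintenance], "compute", "admin", 3000, 4000)
after_for_alice = free_procs(64, [maintenance], "compute", "alice", 7200, 8000)
```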
Maui – Allocation properties (2) • Revocation of allocation • support for revocable and irrevocable reservations • eg. strict time constraints on data availability or job completion • default is irrevocable; reservations maintained until timeframe has expired or explicitly removed • Guaranteed completion time of allocations • locked to exact time, guaranteed to complete before certain time or guaranteed to start after given time • scheduler regularly tries to optimize
Maui – Allocation properties (3) • Guaranteed number of attempts to complete a job • don’t attempt to start job until all prerequisites are met • using the defer mechanism, Maui can specify how many times to try to locate resources for a job before giving up or putting it on hold • Allocation run-to-completion • configure to disable all or subset of preemptions thus guaranteeing a job to complete without interference • Exclusive allocations • request dedicated resources to guarantee exclusive access
Maui – Allocation properties (4) • Malleable Allocations • all aspects can be dynamically modified • if job consumes excessive resources, Maui can preempt or even cancel job depending on the resource utilization policy
Maui - Access to available scheduling info • Access to the tentative scheduler • provide information to all possible availability times • scheduler can request single estimated start time for job • Exclusive control • Maui maintains exclusive control over the execution • Event notification • generalized event management interface; respond immediately to changes in the environment
Maui – Requesting resources • Allocation offers • full contextual information regarding the request and if and how Maui can satisfy this request • Allocation cost or objective information • interface with allocation management systems that assist to assign costs to resource consumption • Advance reservation • allows full control to peers over the scheduling of jobs through time • Requirement for providing maximum allocation time in advance • credential-based walltime limits can be configured based on various criteria
Maui – Requesting resources (2) • Deallocation policy • support for single-step resource allocation requests; create resource allocation valid until job completion • two-phase courtesy reservation; after courtesy is sent, needs to receive a reservation commit; otherwise remove job • Remote co-scheduling • stage remote jobs to a local cluster • Consideration of job dependencies • offer basic job dependency support to block certain job steps until specific prerequisites are met
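The two-phase courtesy reservation can be sketched as a small state tracker: a tentative reservation is created first and is dropped unless a commit follows. The class, its method names and the explicit `commit_timeout` are assumptions for illustration; the slides do not say how long Maui waits for the commit.

```python
class CourtesyReservations:
    """Hypothetical tracker for two-phase (reserve, then commit) reservations."""

    def __init__(self, commit_timeout):
        self.commit_timeout = commit_timeout  # assumed: seconds allowed before commit
        self.pending = {}                     # job_id -> time the courtesy was sent
        self.committed = set()

    def reserve(self, job_id, now):
        """Phase 1: create a courtesy reservation."""
        self.pending[job_id] = now

    def commit(self, job_id, now):
        """Phase 2: confirm the reservation; fails if it is unknown or too late."""
        created = self.pending.pop(job_id, None)
        if created is None or now - created > self.commit_timeout:
            return False
        self.committed.add(job_id)
        return True

    def expire(self, now):
        """Remove courtesy reservations that never received a commit."""
        expired = [j for j, t in self.pending.items()
                   if now - t > self.commit_timeout]
        for j in expired:
            del self.pending[j]
        return expired

r = CourtesyReservations(commit_timeout=30)
r.reserve("a", now=0)
r.reserve("b", now=0)
ok = r.commit("a", now=10)     # committed in time
dropped = r.expire(now=60)     # "b" never committed, so it is removed
late = r.commit("b", now=61)   # too late: the reservation is already gone
```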
Maui – Manipulating the allocation execution • Preemption • suspend operations are supported as far as that capability is available in the underlying manager • Checkpointing • ‘checkpoint and terminate’ & ‘checkpoint and continue’ are supported • Migration • support for intra-domain job migration, but no support for QoS, load balancing, or other optimization • Restart • checkpoints used if available
LSF - Introduction • As a low-level scheduler • Load Sharing Facility
LSF – Available-information attributes • Access to the tentative scheduler • often impractical in real-world applications, no support • Exclusive control • LSF executes in user space, so its control is not exclusive; it can only provide the necessary measures • Event notification • supplies an event-notification service for high-level schedulers
LSF – Requesting resources • Allocation offers • doesn’t expose potential resource allocations • Allocation cost or objective information • unsupported • Advance reservation • provides built-in and Maui-integrated capabilities • Requirement for providing maximum allocation time in advance • high regard
LSF – Requesting resources (2) • Deallocation policy • automatic • Remote co-scheduling • supported by higher-order scheduling instances • Consideration of job dependencies • built-in support for job dependencies by logical expressions based on 15 dependency conditions
LSF – Allocation properties • Revocation of allocation • not needed because of resource shortness, etc. • Guaranteed completion time of allocations
LSF – Allocation properties (2) • Guaranteed number of attempts to complete a job • distinguish between attempts that are execution pre-condition and execution condition with complete flexibility • Allocation run-to-completion • with implicit assumptions that allocations don’t exceed resource limits for example • Exclusive allocations • can dispatch jobs to hosts where no other LSF job is running
LSF – Allocation properties (3) • Malleable Allocations • built-in mechanisms allow allocations to decay consumption over time on a per-resource basis
LSF – Manipulating the allocation execution • Preemption • support since 1995, preempted workloads retain resources • Checkpointing • assuming application supports it, LSF provides interface • Migration • provide mechanism to be done by high-level scheduler • Restart • provides interface
LSF - Conclusion • Supports most attributes of a low-level scheduler that can be exploited by a high-level scheduler
PBS – Introduction • Portable Batch System • Flexible workload management and batch job scheduling system • Covers the entire Grid computing space: security, information, compute and data • Middleware technology that sits between compute-intensive or data-intensive applications and the network, hardware and OS • All jobs go to a single virtual pool which is scheduled and distributed on the grid
PBS – Security • Fundamental capabilities are secure authentication and authorization • Internally it makes use of user-name based auth • Support for X.509 Grid standard identification • certificate lifetime (expire/renew) • Identity mapping between sites is handled by a mapping function
PBS - Information • Information management with access to the state of the infrastructure • Collect real-time data on state with job executor daemon process (MOMs) • Easy integration with larger Grid information databases
PBS - Compute • Advance reservation support • check for conflicts • eg. reserve resources for car-crash test including computer cycles, network, database, facility • Cycle harvesting • expand available computing resources by using idle workstations • Peer scheduling • enable a site or sites with different PBS installations to automatically run jobs from each other • no job will be moved if it cannot run immediately
PBS - Data • Most basic capability of data Grid: file staging • automatic handling of copying files onto execution nodes (stage-in) prior to running job • copying files off execution nodes (stage-out) after job completes • PBS will not run jobs until stage-in is fully done • Support for Globus Toolkit, scp, Gridftp, etc.
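The stage-in, run, stage-out order can be sketched with local file copies. This does not use PBS or any of its transfer back ends; `run_staged` and `add_numbers` are hypothetical names, and `shutil.copy` stands in for scp or GridFTP. The point shown is that the job only starts once every stage-in copy has succeeded.

```python
import shutil
import tempfile
from pathlib import Path

def run_staged(job, stage_in, stage_out, workdir):
    """Copy inputs in, run the job, copy outputs back.

    Any failed stage-in copy raises before `job` is called, so the job
    never starts with incomplete input.
    """
    workdir = Path(workdir)
    for src in stage_in:
        shutil.copy(src, workdir / Path(src).name)   # stage-in
    job(workdir)
    for name, dest in stage_out:
        shutil.copy(workdir / name, dest)            # stage-out

def add_numbers(workdir):
    """Toy job: read two integers from input.txt and write their sum."""
    a, b = (workdir / "input.txt").read_text().split()
    (workdir / "result.txt").write_text(str(int(a) + int(b)))

submit_dir = Path(tempfile.mkdtemp())   # stands in for the submission host
exec_dir = Path(tempfile.mkdtemp())     # stands in for the execution node
(submit_dir / "input.txt").write_text("3 4")

run_staged(add_numbers,
           stage_in=[submit_dir / "input.txt"],
           stage_out=[("result.txt", submit_dir / "result.txt")],
           workdir=exec_dir)
result = (submit_dir / "result.txt").read_text()
```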
PBS – Available-information attributes • Access basic information by typing qstat • Email notification
PBS – Requesting resources • Single resource solution to a job request • Estimated completion time is configurable • absence of this information however hampers performance (needed by backfilling for example) • Job dependencies • Co-scheduling by simply configuring the queues of the system
PBS – Allocation properties • Revoke any allocation, whether the job is queued or running • Preemption by the scheduler is also possible; choice of suspension, checkpointing, requeuing, termination • Configurable job completion attempts • Configurable exclusive allocation, etc. • No support for malleable allocation (eg. allows addition or revocation of resources during runtime)
PBS - Manipulating the allocation execution • Support for requeue, restart • On preemption checkpoint generation and migration
Prediction techniques • The problems of scheduling and resource allocation are central to Grid performance • Applications must balance performance against the communication overhead that parallelism produces • Grid resources differ widely in performance • A resource allocator must choose the right combination of resources from a pool that is constantly changing
Prediction techniques (2) • Categorization into static and dynamic performance characteristics based on speed of change • static: clock speed (CPU) for example • dynamic: CPU load, network throughput
Grid resource performance prediction • For a grid scheduler two characteristics can be exploited to overcome the complexities introduced by the dynamics of Grid performance response • Observable Forecast Accuracy • predictions for future performance measurements can be evaluated by recording the accuracy once the measurements are actually gathered • Near-term Forecasting Epochs • scheduler can make decisions dynamically, just before execution begins. Since accuracy usually degrades into the future, make decision at last possible moment
Prediction – an example (NWS) • Provide 3 fundamental functionalities • Monitoring, Forecasting, Reporting • NWS – Network Weather Service • grid monitoring and forecasting tool designed to support dynamic resource allocation and scheduling • sensor control subsystem • historical data for future performance prediction • multiple reporting interfaces • convenient methodology for replication and caching
Prediction – an example (NWS) (2) • Performance monitoring and forecasting system must be able to execute on all platforms available to the user • written in C; highest portability with standard libs • Two types of monitors (CPU probe) • passive: read measurement gathered through some other means (eg. local OS) eg. UNIX load average • non-intrusive • inaccurate? • active: load own resource and observe performance response • know exact performance • intrusive
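The difference between the two monitor types can be sketched in Python, even though NWS itself is written in C. The function names and the probe duration are assumptions; this is not NWS's sensor code. The passive probe reads the load average the OS already maintains, while the active probe consumes CPU for a short time and reports the share it actually received.

```python
import os
import time

def passive_probe():
    """Read the 1-minute load average kept by the OS (costs almost nothing).

    Returns None on platforms where the load average is not available.
    """
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None

def active_probe(duration=0.05):
    """Spin for `duration` wall-clock seconds and return the CPU share obtained.

    A value near 1.0 means the CPU was essentially free; a lower value means
    the probe had to share it. The probe itself loads the machine (intrusive).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    while time.perf_counter() - wall_start < duration:
        pass
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return min(1.0, cpu / wall)   # clamp small clock-resolution overshoot

load = passive_probe()
share = active_probe()
```

A longer `duration` gives a steadier measurement but takes more CPU away from real jobs, which is the intrusiveness trade-off named above.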
Prediction – an example (NWS) (3) • Intrusiveness vs Scalability (Network probe) • probe the network by timing packet travel duration • for more hosts, probe collision will occur, resulting in loss of bandwidth • NWS uses a token-passing method to prevent such problems
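The token-passing idea can be sketched as a schedule generator: only the host holding the token probes, then it hands the token on. This is a simulation under assumed names (`probe_schedule`, a simple ring order), not the NWS protocol itself, which also has to deal with lost tokens and failed hosts.

```python
from collections import deque

def probe_schedule(hosts, rounds):
    """Return the ordered list of (prober, target) network probes.

    Only the current token holder probes, and probes run one after another,
    so no two probes are on the network at the same time.
    """
    ring = deque(hosts)
    schedule = []
    for _ in range(rounds * len(hosts)):
        holder = ring[0]
        for target in hosts:
            if target != holder:
                schedule.append((holder, target))
        ring.rotate(-1)   # pass the token to the next host in the ring
    return schedule

schedule = probe_schedule(["a", "b", "c"], rounds=1)
```

The cost of avoiding collisions is visible in the schedule length: with n hosts, one full round takes n * (n - 1) sequential probes, so each pair is measured less often as the ring grows.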
Prediction – an example (NWS) (4) • Forecasting • an inherent problem of prediction. • assumptions made on what resources will be when the job runs • in Grid settings, available resource performance can fluctuate dynamically • NWS uses statistical methods to attempt to mechanize and automate forecasting based on historical data
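One way to automate forecasting from historical data is to run several simple predictors side by side and use whichever has been most accurate so far. The sketch below follows that idea; the three predictors, the absolute-error measure and the names are assumptions for illustration, not the NWS forecaster set.

```python
from statistics import mean, median

# Candidate predictors: each maps the history so far to a guess for the next value.
FORECASTERS = {
    "last": lambda h: h[-1],
    "mean": lambda h: mean(h),
    "median5": lambda h: median(h[-5:]),   # median of the 5 most recent values
}

def forecast(history):
    """Return (name, prediction) from the predictor with the lowest past error."""
    errors = dict.fromkeys(FORECASTERS, 0.0)
    for i in range(1, len(history)):
        for name, predict in FORECASTERS.items():
            # Replay: what would this predictor have said just before measurement i?
            errors[name] += abs(predict(history[:i]) - history[i])
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](history)

# CPU availability with one short dip.
history = [0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9]
best, prediction = forecast(history)
```

On this series the windowed median is selected, since it is the only predictor that is not misled by the single dip on the following measurement.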
Prediction - Conclusions • Effective resource allocation and scheduling are critical to performance • Immediate performance history data is used to make implicit prediction • To be truly effective the performance-gathering system must be robust, portable and non-intrusive • Overhead introduced by the performance-gathering system must be carefully controlled • Using fast, robust techniques it is possible to improve accuracy of performance predictions
Improve resource selection with prediction • Run time predictions • statistical analysis of applications that have already run • automatic code analysis or instrumentation • Explanation of two techniques, both using statistical data with information provided to scheduler upon run
Categorization prediction technique • Derive run time predictions from historical information based on previous similar runs • many ways to look at similar applications; application name, user, arguments, submission time, etc. • use of genetic algorithm to identify good templates (eg user+time) for a given workload • use a mean prediction type • results are an average error of 39%
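The template idea can be sketched as follows: a template is a set of job attributes, past jobs that agree with the new job on those attributes count as similar, and the prediction is the mean of their run times. The record layout and function name are assumptions, the history is made up for the example, and the template is picked by hand here rather than searched for with a genetic algorithm.

```python
from statistics import mean

def predict_runtime(history, job, template):
    """Mean run time of past jobs matching `job` on every attribute in `template`.

    Returns None when no similar job has been seen yet.
    """
    key = tuple(job[attr] for attr in template)
    similar = [past["runtime"] for past in history
               if tuple(past[attr] for attr in template) == key]
    return mean(similar) if similar else None

# Made-up history records, run times in seconds.
history = [
    {"user": "alice", "app": "blast", "runtime": 100},
    {"user": "alice", "app": "blast", "runtime": 140},
    {"user": "alice", "app": "render", "runtime": 900},
    {"user": "bob", "app": "blast", "runtime": 400},
]
new_job = {"user": "alice", "app": "blast"}

by_user_app = predict_runtime(history, new_job, ("user", "app"))
by_app = predict_runtime(history, new_job, ("app",))
unseen = predict_runtime(history, {"user": "carol", "app": "blast"}, ("user", "app"))
```

A narrower template gives fewer but more similar past runs, while a broader one gives more data points that are less alike; that trade-off is what the template search has to resolve for a given workload.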