
Job scheduling: Workload Management System and Job Submission

Tiziana.Ferrari@cnaf.infn.it, INFN – CNAF. Corso di Laurea Specialistica in Informatica (graduate degree course in Computer Science), academic year 2005/2006.


Presentation Transcript


  1. Job scheduling: Workload Management System and Job Submission. Tiziana.Ferrari@cnaf.infn.it, INFN – CNAF. Corso di Laurea Specialistica in Informatica, Anno Acc. 2005/2006. Job scheduling

  2. CE and Workload Management System [Figure: Grid Architecture layers (Application, Collective, Resource, Connectivity, Fabric), showing where the Workload Management System and the Computing Element sit.]

  3. Workload Management System (WMS)
     • The user interacts with the Grid via a Workload Management System for job submission.
     • The goal of the WMS is distributed job scheduling and resource management in a Grid environment.
     • What does it allow Grid users to do?
        • Find the list of resources suitable to run a specific job
        • Submit a job/DAG for execution on a remote Computing Element
        • Check the status of a submitted job/DAG
        • Cancel one or more submitted jobs/DAGs
        • Retrieve the output files of a completed job/DAG (output sandbox)
        • Retrieve and display bookkeeping information about submitted jobs/DAGs
        • Retrieve and display logging information about submitted jobs/DAGs
        • Retrieve checkpoint states of a submitted checkpointable job
        • Start a local listener for an interactive job
     • The WMS tries to optimize the usage of resources.

  4. Submission
     • For a computation job there are two main types of request: submission and cancellation. A submission request passes responsibility for the job to the Workload Manager (WM).
     • The WM then passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description.
     • The choice of resource is the outcome of a matchmaking process between the submission requests and the available resources.
     • The availability of resources for a particular task depends not only on the state of the resources, but also on the utilization policies put in place by the resource administrators and/or by the administrator of the VO the user belongs to.

  5. WMS Components
     • The WMS is currently composed of the following parts:
        • User Interface (UI): the access point to the WMS, where the user interacts with it.
        • Resource Broker (RB): the broker of Grid resources, responsible for finding the “best” resources to which jobs are submitted.
        • Job Submission Service (JSS): provides a reliable submission system, i.e. it delivers jobs to the computing elements chosen by the Resource Broker; in case of failure, resubmission is attempted according to the job owner's request.
        • Information cache: a repository of resource information that is available in read-only mode to the matchmaking engine and that is updated by notifications, by active polling of resources, or by some combination of both.
        • Logging and Bookkeeping service (LB): supports job monitoring by storing logging and bookkeeping information about events generated by the various WMS components. Using this information, the LB service keeps a state machine view of each job.
        • Task Queue
        • Proxy renewal: a Proxy Renewal Service ensures that a valid user proxy exists within the WMS for the whole lifetime of a job; it relies on the MyProxy service to renew the credentials associated with the request.

  6. Task queue
     • It makes it possible to hold a submission request for a while if no resources matching the job requirements are immediately available.
     • Non-matching requests are retried either periodically (an eager scheduling approach) or as soon as notifications of available resources appear in the ISM (a lazy scheduling, i.e. pull, approach).
     • Without a task queue, such situations could only lead to an immediate abort of the job for lack of a matching resource.
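The holding and retry behaviour of a task queue can be sketched in a few lines of Python. This is only an illustration: the `TaskQueue` class, the `min_cpus` and `free_cpus` fields and the matching rule are hypothetical and are not part of the WMS.

```python
from collections import deque


class TaskQueue:
    """Toy task queue: holds requests that currently match no resource."""

    def __init__(self):
        self.pending = deque()

    def submit(self, job, resources):
        """Return a matching resource, or queue the job and return None."""
        match = self._first_match(job, resources)
        if match is None:
            self.pending.append(job)
        return match

    def retry(self, resources):
        """Re-run matching for every queued job.

        Eager scheduling would call this on a timer; lazy (pull) scheduling
        would call it when a resource notification arrives.
        """
        dispatched = []
        for _ in range(len(self.pending)):
            job = self.pending.popleft()
            match = self._first_match(job, resources)
            if match is None:
                self.pending.append(job)  # still no match: keep waiting
            else:
                dispatched.append((job["name"], match["name"]))
        return dispatched

    @staticmethod
    def _first_match(job, resources):
        # Hypothetical matching rule: enough free CPUs.
        for res in resources:
            if res["free_cpus"] >= job["min_cpus"]:
                return res
        return None


queue = TaskQueue()
job = {"name": "job1", "min_cpus": 4}
first_try = queue.submit(job, [{"name": "ce-a", "free_cpus": 2}])
later = queue.retry([{"name": "ce-a", "free_cpus": 8}])
```

Here the first submission finds no match and the job is held; the later `retry` call, made once the resource reports more free CPUs, dispatches it instead of aborting.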

  7. Proxy certificate
     • A job is associated with a valid proxy certificate (that of the submitting user) when it is submitted from the WMS User Interface.
     • The validity of such a certificate is 12 hours by default, unless the user explicitly requests a longer validity when generating the proxy. Problems can occur if the job spends more time on the CE (queued or running) than the lifetime of its proxy certificate.
     • To submit long-running jobs, users can either generate proxy credentials with an appropriate lifetime or (more safely) rely on a MyProxy server. The user registers a valid long-term proxy certificate in a MyProxy server, and the WMS uses it to renew the credentials of the submitted job periodically; the user is then no longer obliged to create very long-lived proxies for long jobs.
     • The MyProxy credential repository system consists of a server and a set of client tools used to delegate credentials to, and retrieve them from, the server. Normally, a user would:
        1. start by using the myproxy-init client program along with the permanent credentials necessary to contact the server;
        2. delegate a set of proxy credentials to the server, along with authentication information and retrieval restrictions.
     • Commands: myproxy-init / myproxy-info / myproxy-destroy / myproxy-get-delegation / myproxy-change-pass-phrase

  8. Job Preparation
     • Information to be specified:
        • Job characteristics (e.g. executable, stdin, etc.)
        • Requirements and preferences regarding the computing system (e.g. CPU speed, multi-processor machines, …)
        • Software dependencies (i.e. software that must already be installed on the machines that will eventually execute the job)
        • Job data requirements (e.g. input data, output storage element, etc.)
        • Optimization criteria
     • All this is specified using a Job Description Language (JDL).

  9. Job Description Language (JDL) 1/5
     • Based on Condor's CLASSified ADvertisement language (ClassAd): a simple expression-based language used to describe both resources and requests.
     • ClassAd is a fully extensible language.
     • A ClassAd is built with the ClassAd construction operator [].
     • It is a sequence of attributes separated by semicolons.
     • An attribute is a (key, value) pair, where the value can be a Boolean, an integer, a list of strings, …
        • <attribute> = <value>;
        • e.g. [ attr1 = value1; attr2 = value2; ... attrn = valuen; ]
     • The JDL therefore lets the user define a set of attributes that the WMS takes into account when making its scheduling decision.
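To make the `[ attr = value; ... ]` shape concrete, the following Python sketch renders a dictionary in that form. The `to_classad` helper is hypothetical and handles only literal values (strings, numbers, Booleans and lists); it does not escape special characters and does not cover ClassAd expressions such as those used in Requirements or Rank.

```python
def to_classad(attributes):
    """Render a dict as a ClassAd-style record: [ key = value; ... ]."""
    def render(value):
        if isinstance(value, bool):          # check bool before int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(render(v) for v in value) + "}"
        return '"' + str(value) + '"'        # strings are double-quoted

    body = " ".join(f"{key} = {render(val)};" for key, val in attributes.items())
    return "[ " + body + " ]"


ad = to_classad({
    "Executable": "gridTest",
    "RetryCount": 3,
    "OutputSandbox": ["stderr.log", "stdout.log"],
})
# ad is now:
# [ Executable = "gridTest"; RetryCount = 3; OutputSandbox = {"stderr.log", "stdout.log"}; ]
```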

  10. Job Description Language (JDL) 2/5
     • The supported attributes are grouped in two categories:
        • Job attributes
           • Define the job itself
        • Resource attributes
           • Taken into account by the RB when carrying out the matchmaking algorithm
           • Computing resource attributes: used by the user to build the expressions of the Requirements and/or Rank attributes; they have to be prefixed with “other.”
           • Data and storage resource attributes: input data to process, the SE where output data are to be stored, and the protocols spoken by the application when accessing SEs

  11. Job Description Language (JDL): relevant attributes 3/5
     • Executable (mandatory)
        • The command name
     • Arguments (optional)
        • Job command line arguments
     • StdInput, StdOutput, StdErr (optional)
        • Standard input/output/error of the job
     • Environment (optional)
        • List of environment settings
     • InputSandbox (optional)
        • List of files on the UI local disk needed by the job for running
        • The listed files are automatically staged to the remote resource
     • OutputSandbox (optional)
        • List of files, generated by the job, which have to be retrieved

  12. Job Description Language (JDL): relevant attributes 4/5
     • Requirements
        • Job requirements on computing resources
        • Specified using attributes of resources published in the Information Service
        • If not specified, the default value defined in the UI configuration file is used
        • Default: other.Active (the resource has to be able to accept jobs)
     • Rank
        • Expresses preference (how to rank resources that have already met the Requirements expression)
        • Specified using attributes of resources published in the Information Service
        • If not specified, the default value defined in the UI configuration file is used
        • Default: -other.EstimatedTraversalTime (the lowest estimated traversal time)

  13. Job Description Language (JDL): “data” attributes 5/5
     • InputData (optional)
        • Refers to data used as input by the job (these data are published in the Replica Catalog and stored in the SEs)
        • PFNs and/or LFNs
     • ReplicaCatalog (mandatory if InputData has been specified with at least one Logical File Name)
        • The Replica Catalog identifier
     • DataAccessProtocol (mandatory if InputData has been specified)
        • The protocol, or the list of protocols, that the application is able to use for accessing InputData on a given SE
     • OutputSE (optional)
        • The Uniform Resource Identifier of the output SE
        • The RB uses it to choose a CE that is compatible with the job and is close to the SE

  14. Examples JDL File (1)
     Executable = "gridTest";
     StdError = "stderr.log";
     StdOutput = "stdout.log";
     InputSandbox = {"home/joda/test/gridTest"};
     OutputSandbox = {"stderr.log", "stdout.log"};
     InputData = "LF:testbed0-00019";
     ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/ \
        lc=test, rc=WP2 INFN Test, dc=infn, dc=it";
     DataAccessProtocol = "gridftp";
     Requirements = other.Architecture == "INTEL" && \
        other.OpSys == "LINUX" && other.FreeCpus >= 4;
     Rank = other.MaxCpuTime;

  15. Examples JDL File (2)
     • Such a JDL would cause the myexe executable to be transferred to a remote CE whose queue is managed by the PBS batch system, and to be run with the myinput.txt file (also copied from the UI node) as input.

  16. DAG description
     • where n1.jdl, n2.jdl and n3.jdl are in turn job descriptions representing the nodes of the DAG, and the dependencies attribute states that nodeB and nodeC cannot start before nodeA has been successfully executed.

  17. Input/output sandbox
     • The input and output sandboxes are intended for relatively small files (a few megabytes), such as scripts and the standard input and standard output streams.
     • If you use large input files or generate large output files, you should instead read from or write to a storage element directly.
     • Each submitting user is assigned a limited disk quota on the WMS machine, so abuse of the input and output sandboxes will quickly fill the quota, and the WMS will then refuse further job submissions from that user.
     • Input sandbox:
        • Suppose a job needs a certain set of small files that are available on the submitting machine, and that for performance reasons it is preferable not to go through the data transfer services to stage these files on the executing node. The user can then use the InputSandbox attribute to specify the files that have to be copied from the submitting machine to the execution CE via the WMS.
     • Output sandbox:
        • For the standard output and standard error of the job, the user must always specify just file names (without any directory path) through the StdOutput and StdError JDL attributes. To have them copied back to the WMS-UI machine, it suffices to list them in the OutputSandbox and, after job completion, to use the job-output command described later in this document.
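One way to apply this rule before writing the JDL is to sort files by size. The Python sketch below is purely illustrative: the 10 MB threshold and the `split_by_size` helper are assumptions, not WMS settings or tools, and the real per-user quota is set by the WMS administrator.

```python
SANDBOX_LIMIT_BYTES = 10 * 1024 * 1024  # hypothetical threshold, not a WMS setting


def split_by_size(file_sizes, limit=SANDBOX_LIMIT_BYTES):
    """Partition files into sandbox candidates and storage-element candidates."""
    sandbox, storage = [], []
    for name, size in sorted(file_sizes.items()):
        (sandbox if size <= limit else storage).append(name)
    return sandbox, storage


sandbox_files, se_files = split_by_size({
    "run.sh": 2_000,
    "stdin.txt": 150_000,
    "events.dat": 3_000_000_000,
})
```

With these example sizes, the script and the standard input file stay in the sandbox, while the multi-gigabyte data file is flagged for a storage element.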

  18. Requirements and rank
     • The Requirements and Rank parameters control the resource matching for the job.
     • The Requirements expression specifies the constraints that a resource must satisfy for the job to run.
     • If more than one resource matches the job requirements, the rank determines the most desirable resource, i.e. the one to which the job is submitted (the higher the rank value, the better the resource).
     • Both the Requirements and the Rank attributes can be arbitrary expressions that use the parameters published by the resources in the Information System.
     • Examples:
        • To express that a job requires at least 25 minutes of CPU time and 100 minutes of real time, the expression is: Requirements = other.GlueCEPolicyMaxCPUTime >= 1500 && other.GlueCEPolicyMaxWallClockTime >= 6000;
        • GlueHostApplicationSoftwareRunTimeEnvironment is usually used to describe application software packages installed on a site. For example: Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ALICE-3.07.01");
        • Rank = -other.GlueCEStateEstimatedResponseTime;
        • Rank = other.GlueCEStateFreeCPUs;
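The two-step selection (filter by requirements, then take the highest rank) can be sketched in Python as follows. The `matchmake` function and the resource records are hypothetical; only the Glue attribute names are taken from the examples on this slide, and the real Resource Broker evaluates ClassAd expressions rather than Python callables.

```python
def matchmake(resources, requirements, rank):
    """Return the matching resource with the highest rank, or None."""
    candidates = [res for res in resources if requirements(res)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical resource records; keys reuse the Glue attribute names.
resources = [
    {"id": "ce-a", "GlueCEPolicyMaxCPUTime": 3000,
     "GlueCEPolicyMaxWallClockTime": 7200, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueCEPolicyMaxCPUTime": 1000,
     "GlueCEPolicyMaxWallClockTime": 9000, "GlueCEStateFreeCPUs": 50},
    {"id": "ce-c", "GlueCEPolicyMaxCPUTime": 2000,
     "GlueCEPolicyMaxWallClockTime": 6000, "GlueCEStateFreeCPUs": 12},
]

best = matchmake(
    resources,
    requirements=lambda r: (r["GlueCEPolicyMaxCPUTime"] >= 1500
                            and r["GlueCEPolicyMaxWallClockTime"] >= 6000),
    rank=lambda r: r["GlueCEStateFreeCPUs"],
)
```

In this example `ce-b` has the most free CPUs but fails the CPU-time requirement, so it is never ranked; of the two remaining candidates, `ce-c` has the higher rank and is selected.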

  19. WMS main commands (1/2)
     • job-list-match: displays the list of identifiers of the resources (and the corresponding ranks, if requested) that the user is authorized to use and that satisfy the job requirements included in the JDL. This only works for jobs; for DAGs, the command has to be issued on the JDLs of the individual nodes.
     • job-submit: submits a job/DAG to the Grid. It requires a JDL file as input and returns a job/DAG identifier.
     • job-status: prints the status of a job/DAG previously submitted using glite-job-submit.
        • The job status request is sent to the LB (Logging and Bookkeeping service), which provides the requested information.
        • When issued for a DAG, it provides the status information for the DAG itself and for all of its nodes. The status of individual nodes of a DAG can also be retrieved by passing their own identifiers to the command.
        • The LB service keeps a state machine view of each job/DAG, using the job/DAG-related events sent by each WMS component that handles the request.

  20. WMS main commands (2/2)
     • job-output: the glite-job-output command retrieves the output files of a job/DAG that was submitted with a job description file including the OutputSandbox attribute.
        • After the job/DAG has terminated, the user can retrieve the files that it generated and that are temporarily stored on the Resource Broker machine, as specified by the OutputSandbox attribute, by issuing job-output with the ID returned by the job submission command as input.
        • As a DAG does not have its own output sandbox, when the command is issued for a DAG it retrieves the output sandboxes of all the DAG nodes.
     • job-cancel: cancels a job previously submitted using glite-job-submit. Before cancellation, it prompts the user for confirmation.
        • A cancel request cannot be issued for a single node of a DAG: the whole DAG has to be cancelled using the provided handle instead.

  21. Job States
     • SUBMITTED: the user has submitted the job via the UI
     • WAITING: the RB has received the job
     • READY: a CE that matches the job requirements has been selected, and the job is transferred to the JSS
     • SCHEDULED: the JSS has sent the job to the CE
     • RUNNING: the job is running on the CE
     • DONE: this state has different meanings:
        • DONE (ok): the execution has terminated on the CE (WN) successfully
        • DONE (failure): the execution has terminated on the CE (WN) with some problems
        • DONE (cancelled): the job has been cancelled successfully
     • OUTPUTREADY: the output sandbox is ready to be retrieved by the user
        • Reflects the time difference between the end of the computation on the CE and the moment the RB receives the notification of job termination.
     • CLEARED: the user has retrieved all output files successfully; the job bookkeeping information is purged some time after the job enters this state.
     • ABORTED: the job has failed
        • The job may fail for several reasons, one of which is external to its execution (no resource found).
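The state descriptions suggest a simple state machine, sketched below in Python. The transition table is an assumption inferred from the descriptions above: the real Logging and Bookkeeping state machine may allow other transitions (for example, cancellation from earlier states), and the three DONE variants are collapsed into a single state here.

```python
# Assumed transitions, inferred from the state descriptions; the real
# Logging and Bookkeeping state machine may allow others.
TRANSITIONS = {
    "SUBMITTED": {"WAITING", "ABORTED"},
    "WAITING": {"READY", "ABORTED"},
    "READY": {"SCHEDULED", "ABORTED"},
    "SCHEDULED": {"RUNNING", "ABORTED"},
    "RUNNING": {"DONE", "ABORTED"},
    "DONE": {"OUTPUTREADY"},
    "OUTPUTREADY": {"CLEARED"},
    "CLEARED": set(),
    "ABORTED": set(),
}


def replay(events, start="SUBMITTED"):
    """Apply a sequence of states, rejecting any transition not in the table."""
    state = start
    for nxt in events:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {nxt}")
        state = nxt
    return state


final = replay(["WAITING", "READY", "SCHEDULED", "RUNNING",
                "DONE", "OUTPUTREADY", "CLEARED"])
```

Replaying the events of a successful job ends in `CLEARED`; a sequence that skips a state, such as going from `SUBMITTED` directly to `RUNNING`, raises `ValueError`.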

  22. State Diagram [Figure: job state diagram with the states SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, DONE (ok), DONE (failed), DONE (cancelled), OUTPUTREADY, CLEARED and ABORTED.]

  23. Job identifier
     • The job (and DAG) identifiers produced by the workload management software are of the form: https://edt003.cnaf.infn.it:9000/NyIYrqE_a8igk4f0CLXNKA
     • The first part of the identifier (https://edt003.cnaf.infn.it:9000 in the example above) is the endpoint URL of the LB server that holds the logging and bookkeeping information of the job/DAG; it tells the WMS which LB server to contact to monitor a given job/DAG.
     • The second part (NyIYrqE_a8igk4f0CLXNKA) is generated by the WMS-UI from some client-local information and ensures that the identifier is unique across the Grid.
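Because the identifier is an ordinary URL, its two parts can be separated with the Python standard library. The `split_job_id` helper below is a hypothetical illustration, not a WMS tool.

```python
from urllib.parse import urlsplit


def split_job_id(job_id):
    """Split a job identifier into (LB endpoint URL, unique string)."""
    parts = urlsplit(job_id)
    endpoint = f"{parts.scheme}://{parts.netloc}"   # netloc keeps the port
    unique = parts.path.lstrip("/")
    return endpoint, unique


endpoint, unique = split_job_id(
    "https://edt003.cnaf.infn.it:9000/NyIYrqE_a8igk4f0CLXNKA"
)
```

For the identifier in the example, `endpoint` is the LB server URL `https://edt003.cnaf.infn.it:9000` and `unique` is `NyIYrqE_a8igk4f0CLXNKA`.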

  24. Job Submission Scenario [Figure: the components involved: UI (with the JDL), Replica Catalogue (RC), Information Service (IS), Resource Broker (RB), Storage Element (SE), Logging & Bookkeeping (LB), Job Submission Service (JSS), Compute Element (CE).]

  25. A Job Submission Example [Figure: same components; the UI submits the JDL and the Input Sandbox, and a Job Submit Event is logged. Job status: submitted.]

  26. A Job Submission Example [Figure: same components. Job status: waiting (previous: submitted).]

  27. A Job Submission Example [Figure: same components. Job status: ready (previous: submitted, waiting).]

  28. A Job Submission Example [Figure: same components, with BrokerInfo. Job status: scheduled (previous: submitted, waiting, ready).]

  29. A Job Submission Example [Figure: same components, with the Input Sandbox. Job status: running (previous: submitted, waiting, ready, scheduled).]

  30. A Job Submission Example [Figure: same components, with Job Status messages. Job status: running (previous: submitted, waiting, ready, scheduled).]

  31. A Job Submission Example [Figure: same components, with Job Status messages. Job status: done (previous: submitted, waiting, ready, scheduled, running).]

  32. A Job Submission Example [Figure: same components, with the Output Sandbox and Job Status messages. Job status: outputready (previous: submitted, waiting, ready, scheduled, running, done).]

  33. A Job Submission Example [Figure: same components, with the Output Sandbox. Job status: cleared (previous: submitted, waiting, ready, scheduled, running, done, outputready).]

  34. Job resubmission
     • If something goes wrong, the RB tries to reschedule and resubmit the job (possibly to a different resource).
     • Maximum number of resubmissions (considering all the resources matching the requirements): min(RetryCount, RB_submission_retries)
        • RetryCount: JDL attribute
        • RB_submission_retries: attribute in the RB configuration file
     • E.g., to disable job resubmission for a particular job, set RetryCount = 0; in the JDL file.
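The resubmission limit can be illustrated with a short Python sketch. The function names and the `attempt` callback are hypothetical; the sketch assumes that the limit counts resubmissions after the initial submission, which is how the slide's wording reads but is not confirmed by it.

```python
def max_resubmissions(retry_count, rb_submission_retries):
    """Effective limit: the smaller of the JDL and RB configuration values."""
    return min(retry_count, rb_submission_retries)


def run_with_resubmission(attempt, retry_count, rb_submission_retries):
    """Call attempt() once, then resubmit on failure up to the limit.

    Returns the number of submissions made, or raises if all of them fail.
    """
    limit = max_resubmissions(retry_count, rb_submission_retries)
    for tries in range(1, limit + 2):   # 1 initial submission + `limit` retries
        if attempt():
            return tries
    raise RuntimeError("job aborted after exhausting resubmissions")


outcomes = iter([False, False, True])   # fails twice, then succeeds
submissions = run_with_resubmission(lambda: next(outcomes),
                                    retry_count=3, rb_submission_retries=2)
```

With RetryCount = 3 in the JDL and RB_submission_retries = 2 on the broker, the effective limit is 2, so the job in this example succeeds on its third and last allowed submission. With RetryCount = 0 the limit is 0 and a failed job is never resubmitted.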

  35. User Interface configuration file
     • Can be customized if the user is not happy with the default one
     • Most relevant attributes:
        • RB(s)
           • When submitting a job, the first specified RB is tried; if the operation fails, the second one is tried, and so on.
        • LBserver(s)
           • The LB to be used for a job is chosen by the RB
           • So when a dg-job-status <dg-jobid> is issued, the LB to contact is the one specified in the dg-jobid
           • This list specifies the LB(s) that must be contacted when issuing a dg-job-status –all / dg-job-get-logging-info –all (to get information about all the jobs belonging to that user)
        • Default JDL Requirements
           • other.Active
        • Default JDL Rank
           • -other.EstimatedTraversalTime
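The "try the first RB, then the next" rule is an ordinary failover loop, sketched below in Python. The host names, the `fake_submit` stand-in and the use of `ConnectionError` are all assumptions made for the illustration; the real UI performs the submission itself.

```python
def submit_with_failover(brokers, submit):
    """Try each configured RB in order and return the first that accepts."""
    errors = {}
    for rb in brokers:
        try:
            return rb, submit(rb)
        except ConnectionError as exc:
            errors[rb] = str(exc)     # remember why this RB was skipped
    raise RuntimeError(f"all resource brokers failed: {errors}")


def fake_submit(rb):
    """Stand-in for the real submission call; the first broker is down."""
    if rb == "rb1.example.org":
        raise ConnectionError("unreachable")
    return "job-0001"


used_rb, job_id = submit_with_failover(
    ["rb1.example.org", "rb2.example.org"], fake_submit
)
```

Here the first broker in the list is unreachable, so the submission falls through to the second one, which returns the job identifier.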

  36. Job Submission Phases
     • The user logs in on the User Interface.
     • The user issues a grid-proxy-init and enters the password of their certificate, obtaining a valid proxy certificate.
     • The user writes the Job Description Language file describing the submission profile.
     • The user issues: job-submit HelloWorld.jdl
        • and gets back from the system a unique job identifier (JobId).
     • The user issues: job-status JobId
        • to get logging information about the current status of the job.
     • When the “OutputReady” status is reached, the user can issue a job-get-output JobId to retrieve the generated output. The system returns the name of the temporary directory on the User Interface machine where the job output can be found.

  37. Scheduling (1/3)
     • Scheduling is the core functionality of the WMS: it has to find the most suitable computing resource (CE) on which the job will be executed.
     • It interacts with the Data Management service and the Information Service, which supply the job scheduler with all the information required to resolve the matches.
     • The CE chosen by the job scheduler has to match the job requirements (e.g. runtime environment, data access requirements, and so on).
     • If two or more CEs satisfy all the requirements, the one with the best rank is chosen.
     • The job scheduler has to deal with three possible scenarios.

  38. Scheduling (2/3)
     Scenario 1: Direct job submission
     • The job is scheduled on a given CE (specified in the dg-job-submit command via the -r option).
     • The RB does not perform any matchmaking.
     Scenario 2: Job submission without data-access requirements
     • Neither a CE nor input data are specified.
     • The scheduler starts the matchmaking algorithm, which consists of two phases:
        • Requirements check (the RB contacts the local cache or the Information Service to check which CEs satisfy all the requirements)
        • Rank computation: if more than one CE satisfies the job requirements, the scheduler chooses the CE with the best rank

  39. Scheduling (3/3)
     Scenario 3: Job submission with data-access requirements as well
     • The CE is not specified in the JDL.
     • The scheduler interacts with the Data Management service to find the most suitable CE, also taking into account the SEs where the input data sets are physically stored and where the output data sets should be staged on completion of the job.
     • The scheduler's strategy is to submit jobs close to the data.
     • The two main phases of the matchmaking algorithm remain unchanged:
        • Requirements check
        • Rank computation
     • What changes with respect to the second scenario? The scheduler now restricts the search to the CEs that satisfy the data-access requirements (for example, those that are close to the data, i.e. part of the same Grid site).
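The three scenarios differ only in how the candidate list is built before ranking, which the following Python sketch tries to show. Everything in it is hypothetical: the `choose_ce` function, the `close_ses` field (the SEs assumed to be at the same site as a CE) and the sample data are illustrations, not the Resource Broker's actual data model.

```python
def choose_ce(ces, requirements, rank, forced_ce=None, input_ses=None):
    """Pick a CE following the three submission scenarios.

    ces:        list of dicts with "id" and "close_ses" (SEs at the same site)
    forced_ce:  CE id requested directly by the user (scenario 1)
    input_ses:  SEs holding the job's input data (scenario 3), else None
    """
    if forced_ce is not None:                       # scenario 1: no matchmaking
        return forced_ce
    candidates = [ce for ce in ces if requirements(ce)]
    if input_ses:                                   # scenario 3: stay close to data
        candidates = [ce for ce in candidates
                      if set(ce["close_ses"]) & set(input_ses)]
    if not candidates:
        return None
    return max(candidates, key=rank)["id"]          # scenarios 2 and 3: best rank


ces = [
    {"id": "ce-a", "free_cpus": 40, "close_ses": ["se-1"]},
    {"id": "ce-b", "free_cpus": 10, "close_ses": ["se-2"]},
]
has_cpus = lambda ce: ce["free_cpus"] >= 4
by_free_cpus = lambda ce: ce["free_cpus"]

direct = choose_ce(ces, has_cpus, by_free_cpus, forced_ce="ce-b")
no_data = choose_ce(ces, has_cpus, by_free_cpus)
with_data = choose_ce(ces, has_cpus, by_free_cpus, input_ses=["se-2"])
```

Without data requirements the better-ranked `ce-a` wins; once the input data are known to sit on `se-2`, the search is restricted to CEs close to that SE and `ce-b` is chosen despite its lower rank.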

  40. References
     • EGEE User's Guide: WMS Service, EGEE-JRA1-TEC-572489 (https://edms.cern.ch/document/572489/1), see Sections 1 and 2.
     • EGEE Middleware Architecture and Planning, EU Deliverable DJRA1.4, EGEE-DJRA1.1-594698-v1.0 (https://edms.cern.ch/document/594698/), see Section 8.
     • ClassAd (https://www.cs.wisc.edu/condor/classad)
