Learn how to submit jobs on the grid using the EGEE Resource Broker, monitor job statuses, retrieve outputs, and utilize APIs for seamless job management. Understand JDL file creation and essential grid services.
EGEE Middleware: The Resource Broker
EGEE project members
Contents
• Short review of concepts
• Requirements of the applications communities
• Overview of the main grid services
• A closer look
EGEE ResourceBroker
Current production middleware
[Diagram] Components: "User interface", Resource Broker, Information Service, LCG File Catalogue (LFC), Authorization & Authentication, Logging & Book-keeping, Computing Element, Storage Element.
Labelled flows: Job Submit Event; Input "sandbox"; Input "sandbox" + Broker Info; Output "sandbox"; DataSets info; SE & CE info; Publish; Job Query; Job Status.
Building on basic tools and Information Service
Submit job to grid via the "resource broker": edg-job-submit my.jdl
Example JDL file:
  Executable = "gridTest";
  StdError = "stderr.log";
  StdOutput = "stdout.log";
  InputSandbox = {"/home/joda/test/gridTest"};
  OutputSandbox = {"stderr.log", "stdout.log"};
  …
The user's interface to the Grid
User Interface (UI) node: command-line interface to
• Proxy server
• Job operations: submit a job, monitor its status, retrieve output
• Data operations: upload file to SE, create replica, discover replicas
• Other grid services
Also C++ and Java APIs.
To run a job, the user creates a JDL (Job Description Language) file.
Building on basic tools and Information Service
Submit job to grid via the "resource broker (RB)": edg-job-submit my.jdl
Returns a "job-id" used to monitor the job and retrieve its output.
Example JDL file:
  Executable = "gridTest";
  StdError = "stderr.log";
  StdOutput = "stdout.log";
  InputSandbox = {"/home/joda/test/gridTest"};
  OutputSandbox = {"stderr.log", "stdout.log"};
  InputData = "lfn:/grid/VOname/mydir/testbed0.00019";
  DataAccessProtocol = "gridftp";
  Requirements = other.Architecture=="INTEL" && \
                 other.OpSys=="LINUX" && other.FreeCpus >=4;
  Rank = other.GlueHostBenchmarkSF00;
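The example JDL file is a flat list of "Name = value;" attributes, so it can be generated from a script before calling edg-job-submit. The following is a minimal sketch, not part of the EGEE tools: the Expr class and the render_value and render_jdl helpers are hypothetical names introduced here, and the sketch assumes that strings are quoted, lists are written in braces, and expressions such as Requirements are written unquoted.

```python
class Expr(str):
    """Hypothetical marker: a value written unquoted (a JDL expression)."""


def render_value(value):
    # Expressions such as Requirements are emitted verbatim, without quotes.
    if isinstance(value, Expr):
        return str(value)
    # Python lists and tuples become brace-delimited JDL lists.
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(render_value(item) for item in value) + "}"
    # Everything else is treated as a quoted string.
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render a dict of attributes as one 'Name = value;' line each."""
    return "\n".join(
        f"{name} = {render_value(value)};" for name, value in attributes.items()
    )


# Attribute values taken from the example JDL file on the slide.
job = {
    "Executable": "gridTest",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "InputSandbox": ["/home/joda/test/gridTest"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "Requirements": Expr(
        'other.Architecture=="INTEL" && other.OpSys=="LINUX" && other.FreeCpus >=4'
    ),
}

jdl_text = render_jdl(job)
print(jdl_text)
```

The printed text could be saved as my.jdl and passed to edg-job-submit. The sketch does no validation of attribute names or expressions.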
Building on basic tools and Information Service (continued)
  InputData = "lfn:/grid/VOname/mydir/testbed0-00019";
lfn: logical file name. The RB uses the catalogue to find replica locations.
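The lookup from a logical file name to its replica locations can be pictured as a dictionary query. This is only an illustrative sketch: the catalogue contents, the srm:// replica URLs, the example.org hostnames and the replica_hosts function are all assumptions made for the example, and the real LFC is queried through its own clients rather than a Python dict.

```python
from urllib.parse import urlparse

# Hypothetical catalogue contents: logical file name -> replica URLs.
CATALOGUE = {
    "lfn:/grid/VOname/mydir/testbed0-00019": [
        "srm://se1.example.org/data/VOname/testbed0-00019",
        "srm://se2.example.org/data/VOname/testbed0-00019",
    ],
}


def replica_hosts(lfn, catalogue=CATALOGUE):
    """Return the sorted Storage Element hostnames that hold a replica of lfn."""
    return sorted({urlparse(url).hostname for url in catalogue.get(lfn, [])})


print(replica_hosts("lfn:/grid/VOname/mydir/testbed0-00019"))
```

A broker that knows which SEs hold the input data can then favour Computing Elements close to those SEs.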
Building on basic tools and Information Service (continued)
  InputData = "lfn:testbed0-00019";
  Requirements = other.Architecture=="INTEL" && \
                 other.OpSys=="LINUX" && other.FreeCpus >=4;
  Rank = other.GlueHostBenchmarkSF00;
Uses the BDII Information System.
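Job submission
[Diagram, repeated on the following slides] The UI connects to the RB node, which contains the Network Server, the Workload Manager and the Job Controller (CondorG). The RB node is linked to the LFC and to the Information Service, which holds CE and SE characteristics and status. Below them sit the Computing Element and the Storage Element.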
Job Status
UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs).
WMS: Workload Management System.
edg-job-submit myjob.jdl
Job status: submitted
Job Description Language (JDL) to specify job characteristics and requirements.
Myjob.jdl:
  JobType = "Normal";
  Executable = "$(CMS)/exe/sum.exe";
  InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
  OutputSandbox = {"sim.err", "test.out", "sim.log"};
  Requirements = other.GlueHostOperatingSystemName == "linux" &&
                 other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
                 other.GlueCEPolicyMaxCPUTime > 10000;
  Rank = other.GlueCEStateFreeCPUs;
Job status: submitted, then waiting
NS (Network Server): network daemon responsible for accepting incoming requests. The job and its Input Sandbox files are copied to RB storage.
WM (Workload Manager): responsible for taking the appropriate actions to satisfy the request.
Job submission: matchmaking (job status: waiting)
Where must this job be executed?
Matchmaker/Broker: responsible for finding the "best" CE to which to submit the job.
• Where are the needed data (on which SEs)? Answered by the LFC.
• What is the status of the Grid? Answered by the Information Service.
Result: CE choice.
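The matchmaking step can be sketched as a filter followed by a sort: keep the CEs that satisfy Requirements, then order them by Rank, highest first. The sketch below reuses the Glue attribute names from Myjob.jdl, but the CE records, their values, and the match, requirements and rank functions are hypothetical. It is not the RB's actual algorithm, and it leaves out data location and any tie-breaking.

```python
def match(ces, requirements, rank):
    """Return the ids of the CEs that satisfy requirements, best rank first."""
    eligible = [ce for ce in ces if requirements(ce)]
    return [ce["id"] for ce in sorted(eligible, key=rank, reverse=True)]


# Hypothetical snapshot of what the Information Service publishes.
ces = [
    {"id": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 5000, "GlueCEStateFreeCPUs": 50},
    {"id": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 40000, "GlueCEStateFreeCPUs": 12},
]


def requirements(ce):
    # Mirrors part of the Requirements expression in Myjob.jdl.
    return (ce["GlueHostOperatingSystemName"] == "linux"
            and ce["GlueCEPolicyMaxCPUTime"] > 10000)


def rank(ce):
    # Mirrors Rank = other.GlueCEStateFreeCPUs: more free CPUs is better.
    return ce["GlueCEStateFreeCPUs"]


ranking = match(ces, requirements, rank)
print(ranking)  # ['ce-c', 'ce-a']
```

Here ce-b has the most free CPUs but is excluded because its maximum CPU time is too low, which shows that Requirements are applied before Rank.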
Job submission (job status: waiting, then ready)
JA (Job Adapter): responsible for the final "touches" to the job before submission (e.g. creation of the wrapper script).
JC (Job Controller): responsible for the actual job management operations (done via CondorG).
Job submission (job status: ready, then scheduled, running, done)
• scheduled: the job and its Input Sandbox files are sent to the Computing Element.
• running: the job performs "Grid enabled" data transfers/accesses against the Storage Element.
• done: the Output Sandbox files are returned to RB storage.
Retrieving output (job status: done, then cleared)
edg-job-get-output <dg-job-id>
The Output Sandbox files are transferred from RB storage to the UI, after which the job is cleared.
Full sequence of states shown: submitted, waiting, ready, scheduled, running, done, cleared.
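The sequence of states walked through above can be written down as a simple linear lifecycle. This is a sketch of the success path only, under the assumption that a job advances one state at a time; failure or cancellation states are not modelled. The LIFECYCLE list and the next_state and is_valid_history functions are illustrative names, not part of any EGEE API.

```python
# Success-path states, in the order the slides introduce them.
LIFECYCLE = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]


def next_state(state):
    """Return the state that follows `state`, or None after the last one."""
    position = LIFECYCLE.index(state)  # raises ValueError for unknown states
    if position + 1 < len(LIFECYCLE):
        return LIFECYCLE[position + 1]
    return None


def is_valid_history(history):
    """True if history starts at 'submitted' and advances one step at a time."""
    if not history or history[0] != LIFECYCLE[0]:
        return False
    return all(next_state(a) == b for a, b in zip(history, history[1:]))


print(next_state("running"))
print(is_valid_history(["submitted", "waiting", "ready"]))
```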
Job monitoring
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
LB (Logging & Bookkeeping): receives and stores job events; processes the corresponding job status.
LM (Log Monitor): parses the CondorG log file (where CondorG logs info about jobs) and notifies the LB.
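The division of labour described here, where components report events and the LB derives a status from them, can be illustrated with a toy event store. Everything in the sketch is an assumption made for illustration: the Bookkeeping class, its method names, the integer timestamps and the "job-1" identifier are hypothetical, and real LB events carry far more information than a state name.

```python
from collections import defaultdict


class Bookkeeping:
    """Toy event store: keeps every event and derives the current status."""

    def __init__(self):
        self._events = defaultdict(list)

    def log_event(self, job_id, timestamp, state):
        # Called by components such as the Log Monitor when something happens.
        self._events[job_id].append((timestamp, state))

    def status(self, job_id):
        """Return the state of the most recent event, or None if unknown."""
        events = self._events.get(job_id)
        if not events:
            return None
        return max(events, key=lambda event: event[0])[1]

    def logging_info(self, job_id):
        """Return all stored events for the job in timestamp order."""
        return sorted(self._events.get(job_id, []), key=lambda event: event[0])


lb = Bookkeeping()
# Events may arrive out of order; the status still follows the timestamps.
lb.log_event("job-1", 3, "ready")
lb.log_event("job-1", 1, "submitted")
lb.log_event("job-1", 2, "waiting")
print(lb.status("job-1"))
```

In this picture, status plays the role of edg-job-status and logging_info the role of edg-job-get-logging-info.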
Possible job states