
Improving ARC backends: Condor and SGE/GE LRMS interface




Presentation Transcript


  1. Improving ARC backends: Condor and SGE/GE LRMS interface. Adrian Taga, University of Oslo

  2. LRMS backends in ARC
     • Supported: PBS and variants, LSF, Condor, SGE, LL
     • Implemented as perl/sh wrappers around the command-line interface of the LRMS
     • Advantage: easy to modify
     • No clear structure
     Scripts for:
     • Populating the InfoSystem
     • Job control (submit, get status, kill)

  3. Queues in Condor
     Why have queues?
     • Inhomogeneous clusters
     Problem: Condor has no notion of queues.
     Solution: use the ClassAd mechanism to partition the cluster.
     Example: queues defined by memory size (each queue's two requirements lines combine into a single expression)
       [queue/large]
       requirements="(Opsys == "linux" && Arch == "intel")"
       requirements=" && (Disk > 30000000 && Memory > 2000)"
       [queue/small]
       requirements="(Opsys == "linux" && Arch == "intel")"
       requirements=" && (Disk > 30000000 && Memory <= 2000 && Memory > 1000)"
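The partitioning idea on this slide can be illustrated with a minimal Python sketch. It does not use Condor or evaluate real ClassAd expressions: the two queue conditions from the example are rewritten by hand as plain Python predicates, and the machine list (`MACHINES`, `node1` to `node3`) is invented purely for illustration.

```python
# Hypothetical machine descriptions; the keys mirror the attribute names
# used in the slide's requirements expressions.
MACHINES = [
    {"Name": "node1", "Opsys": "linux", "Arch": "intel", "Disk": 40000000, "Memory": 4096},
    {"Name": "node2", "Opsys": "linux", "Arch": "intel", "Disk": 40000000, "Memory": 1536},
    {"Name": "node3", "Opsys": "linux", "Arch": "intel", "Disk": 40000000, "Memory": 512},
]


def base_ok(machine):
    """Conditions shared by both queues in the slide's example."""
    return (
        machine["Opsys"] == "linux"
        and machine["Arch"] == "intel"
        and machine["Disk"] > 30000000
    )


# Hand-written equivalents of the two queue definitions (not parsed from config).
QUEUES = {
    "large": lambda m: base_ok(m) and m["Memory"] > 2000,
    "small": lambda m: base_ok(m) and 1000 < m["Memory"] <= 2000,
}


def partition(machines):
    """Return a mapping of queue name to the names of the machines it matches."""
    result = {name: [] for name in QUEUES}
    for machine in machines:
        for name, matches in QUEUES.items():
            if matches(machine):
                result[name].append(machine["Name"])
    return result


if __name__ == "__main__":
    print(partition(MACHINES))
```

Because the memory ranges do not overlap, no machine lands in both queues, and a machine below the lower bound (here `node3`) belongs to neither.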

  4. Error reporting
     Many LRMS backends:
     • Inconsistent reporting of LRMS errors
     • Clear diagnosis is needed for ATLAS prodsys
     Scenario: job exceeded resource limits
     • PBS & Co: reported natively
       • WallTime, CpuTime or Memory limit exceeded
       • Exit code < 256: comes from the job
       • Exit code ≥ 256: a resource limit was hit
     • Condor, SGE: no clear diagnosis is provided, so workarounds are needed
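The exit-code convention described for PBS can be sketched as a small Python helper. This is an illustration only: the function name `diagnose` and the returned strings are hypothetical and are not taken from the ARC scripts. The slide does not say how a specific limit is encoded, so the sketch does not try to tell WallTime, CpuTime and Memory apart.

```python
def diagnose(exitcode):
    """Classify an exit code using the convention stated on the slide.

    Codes below 256 are attributed to the job itself; codes of 256 or
    more mean the LRMS stopped the job because a resource limit was hit.
    """
    if exitcode < 0:
        raise ValueError("exit code must be non-negative")
    if exitcode < 256:
        return "job exited with code %d" % exitcode
    return "resource limit hit (LRMS code %d)" % exitcode


if __name__ == "__main__":
    for code in (0, 1, 255, 256, 271):
        print(code, "->", diagnose(code))
```

For backends such as Condor and SGE, which the slide says give no clear diagnosis, a helper like this would need input from whatever workaround the backend script uses.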

  5. Thank you
