
ATLAS DC2 seen from Prague Tier2 center - some remarks



Presentation Transcript


  1. ATLAS DC2 seen from Prague Tier2 center - some remarks
     ATLAS sw workshop, September 2004
     chudoba@fzu.cz

  2. Hardware in Prague available for ATLAS
     • Golias:
       • 32 dual-CPU nodes, PIII 1.13 GHz, 1 GB RAM
       • upgraded since July: + 49 dual-CPU Xeon 3.06 GHz, 2 GB RAM (WN)
       • 3 TB disk space reserved for ATLAS
       • PBSPro batch system
       • lcgatlasprod queue reserved for ATLAS VO members, high priority
     • Skurut:
       • 16 dual-CPU nodes, PIII 700 MHz, 1 GB RAM
       • OpenPBS batch system
       • queues: lcgpbs-short, long, infinite, used mainly by ATLAS
     • 2 independent CEs in LCG2

  3. Jobs waiting for input or output replication, sometimes hanging 'forever'.
     Example:

     Job Id        Queue         User      Node       CPUTime   WallTime
     34031.golias  lcgatlasprod  atlas001  golias30   03:09:28  43:30:39
     34035.golias  lcgatlasprod  atlas002  golias03   04:17:38  43:19:18
     34113.golias  lcgatlasprod  atlas002  golias10   03:00:41  41:52:11
     34127.golias  lcgatlasprod  atlas001  golias11   04:19:11  41:21:46
     34583.golias  lcgatlasprod  atlassgm  goliasx56  00:00:17  26:01:14
     ...

     Not yet cured: running jobs, 20.9.2004:

     Job Id        Queue         User      Node       CPUTime   WallTime
     55162.golias  lcgatlasprod  atlassgm  goliasx42  00:00:03  102:19:45
     58528.golias  lcgatlasprod  atlas001  golias02   11:22:40  11:33:13
     58529.golias  lcgatlasprod  atlas001  golias03   00:00:16  11:33:49
     ...

     Usually such long jobs are killed either by the administrator or by the PBS time limit.
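The stuck jobs in the listings above share a signature: a large wall time with almost no CPU time. A minimal sketch of how such jobs could be flagged automatically from qstat-style records; the record format, the 24-hour wall-time floor and the 5% CPU/wall ratio threshold are illustrative assumptions, not the site's actual tooling:

```python
# Flag batch jobs whose CPU/wall-time ratio suggests they are hung
# waiting on input/output replication rather than computing.

def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def find_hung_jobs(records, min_wall_hours=24, max_cpu_ratio=0.05):
    """Return ids of jobs with long wall time but negligible CPU usage.

    Thresholds are assumptions for illustration: a job must have run
    for at least `min_wall_hours` and used less than `max_cpu_ratio`
    of its wall time as CPU time to be flagged.
    """
    hung = []
    for job_id, cpu, wall in records:
        wall_s = to_seconds(wall)
        if wall_s >= min_wall_hours * 3600 and to_seconds(cpu) / wall_s < max_cpu_ratio:
            hung.append(job_id)
    return hung

# Sample records taken from the listing above: (Job Id, CPUTime, WallTime)
jobs = [
    ("34031.golias", "03:09:28", "43:30:39"),
    ("34583.golias", "00:00:17", "26:01:14"),
    ("55162.golias", "00:00:03", "102:19:45"),
    ("58528.golias", "11:22:40", "11:33:13"),
]
print(find_hung_jobs(jobs))  # → ['34583.golias', '55162.golias']
```

With these thresholds the atlassgm jobs that sat for 26 and 102 hours with seconds of CPU time are flagged, while genuinely working jobs pass.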

  4. July 1 – September 21
     • number of jobs in DQ: 1349 done, 1231 failed = 2580 jobs
     • number of jobs in DQ: 362 done, 572 failed = 934 jobs
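The success rates implied by those DQ counts can be checked directly:

```python
# Success rates for the two DQ job samples quoted above.

def success_rate(done: int, failed: int) -> float:
    """Fraction of jobs that finished successfully."""
    return done / (done + failed)

r1 = success_rate(1349, 1231)   # 1349 + 1231 = 2580 jobs
r2 = success_rate(362, 572)     # 362 + 572 = 934 jobs
print(f"{r1:.1%} {r2:.1%}")     # → 52.3% 38.8%
```

Roughly half of the jobs in the larger sample failed, and over 60% in the smaller one.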

  5. Job distribution
     • almost always not enough ATLAS jobs on GOLIAS
     • SKURUT usage much better

  6. Memory usage
     [Figure: memory usage of ATLAS jobs on GOLIAS, July – September (part) 2004]

  7. CPU Time
     [Figures: CPU time distributions in hours on Xeon 3.06 GHz, PIII 1.13 GHz and PIII 700 MHz nodes]
     queue limit: 48 hours, later changed to 72 hours

  8. Miscellaneous
     • no job name in the local batch system – jobs are difficult to identify
     • no (?) documentation on where to look for log files and which logs are relevant
     • lost jobs due to the CPU time limit – no warning
     • lost jobs due to one misconfigured node – spotted from local logs and by Simone too
     • some jobs loop forever – where should this information be sent?
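The "lost jobs due to the CPU time limit – no warning" point could be mitigated by a periodic check inside a job wrapper that warns before the limit is hit, giving the job a chance to checkpoint or exit cleanly. A minimal sketch; the 72-hour limit is the queue setting noted on the CPU Time slide, while the warning margin and the function itself are assumptions for illustration:

```python
# Warn a running job before it reaches the queue's CPU-time limit,
# so it can checkpoint or exit cleanly instead of being killed.

QUEUE_CPU_LIMIT_H = 72   # lcgatlasprod limit after the change noted above
WARN_MARGIN_H = 2        # illustrative margin, not a site setting

def should_warn(cpu_hours_used: float,
                limit_h: float = QUEUE_CPU_LIMIT_H,
                margin_h: float = WARN_MARGIN_H) -> bool:
    """True once the job is within `margin_h` hours of the CPU limit."""
    return cpu_hours_used >= limit_h - margin_h

print(should_warn(69.5))  # → False
print(should_warn(70.5))  # → True
```

A wrapper would call this between event batches and trigger a checkpoint when it first returns True.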
