ALICE computing – Focus on STEP09 and analysis activities Latchezar Betev Réunion LCG-France, LAPP Annecy May 19, 2009
Outline • General resources/T2s • User analysis and storage on the Grid (LAF is covered by Laurent’s presentation) • WMS • Software distribution • STEP 09 • Operations support
French computing centres contribution for ALICE • T1 – CCIN2P3 • 6 T2s + T2 federation (GRIF)
Relative CPU share • Last 2 months: ~1/2 from T2s!
Relative contribution – T2s • The T2 share of the resources is substantial • Globally, T2s provide ~50% of the CPU capacity for ALICE; they should also provide ~50% of the disk capacity • The T0/T1 disk is mostly MSS buffer, so it serves a completely different function • T2 role in the ALICE computing model • MC production • User analysis • Replicas of MC and RAW ESDs are kept on T2 disk storage
Focus on analysis • Grid responsiveness for user analysis • ALICE uses a common Task Queue for all Grid jobs, with internal prioritization • Pilot jobs are an indispensable part of the scheme • They check the ‘sanity’ of the WN environment (and die if something is wrong) • Pull the ‘top priority’ jobs for execution first
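A minimal sketch of the pilot-job idea described above, in Python. The function names, sanity checks, paths and the Task Queue interface are hypothetical; the real AliEn pilot is considerably more involved.

```python
import os
import shutil
import sys

def wn_is_sane():
    """Illustrative 'sanity' checks of the worker-node environment."""
    enough_scratch = shutil.disk_usage("/tmp").free > 5 * 1024**3   # e.g. 5 GB of scratch space
    software_visible = os.path.isdir("/opt/exp_software/alice")     # hypothetical software area path
    return enough_scratch and software_visible

def pull_top_priority_job(task_queue):
    """Ask the central Task Queue for the highest-priority job matching this WN (hypothetical API)."""
    return task_queue.match(hostname=os.uname().nodename)

def run_pilot(task_queue):
    # The pilot dies immediately if the WN is broken, so no real job is lost
    if not wn_is_sane():
        sys.exit(1)
    job = pull_top_priority_job(task_queue)
    if job is not None:
        job.execute()
```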
Grid response time – user jobs • Type 1 – jobs with little input data (MC) • Average waiting time – 11 minutes • Average running time – 65 minutes • 62% probability of waiting time <5 minutes • Type 2 – large input data, ESDs/AODs (analysis) • Average waiting time – 37 minutes • Average running time – 50 minutes • Response time proportional to the number of replicas
Grid response time – user jobs (2) • Type 1 (MC) can be regarded as ‘maximum Grid response efficiency’ • Type 2 (ESDs/AODs) can be improved • Trivial – more data replication (not an option – not enough storage capacity) • Analysis train – grouping many analysis tasks on a common data set – improves task efficiency and resource utilization (CPU/Wall + storage load) • Non-local data access through the xrootd global redirector • Inter-site SE cooperation, common file namespace • Off-site access to storage from a job – is that really ‘off limits’?
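One way to picture the non-local access through the xrootd global redirector: the client opens a single redirector URL and is forwarded to whichever cooperating SE actually holds a replica. A hedged sketch using the standard xrdcp client; the redirector host and file name below are made up.

```python
import subprocess

# Hypothetical global redirector and logical file name; xrdcp follows the
# redirection to the storage element that actually holds a replica.
REDIRECTOR = "root://alice-global-redirector.example.org/"
LFN = "/alice/data/2009/ESDs/run12345/AliESDs.root"

def fetch_via_redirector(destination="/tmp/AliESDs.root"):
    # The job never needs to know which SE the file came from.
    subprocess.run(["xrdcp", REDIRECTOR + LFN, destination], check=True)
```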
Storage stability • Critical for the analysis – nothing helps if the storage is down • A site can have half of the WNs off, but not half of the storage servers… • Impossible to know before the client tries to access the data • Unless we allow the off-site access… • ALICE computing model foresees 3 active replicas of all ESDs/AODs
Storage stability (2) • T2 storage stability test under load (user tasks + production)
Storage availability scores • Storage type 1 – average 73.9% • Probability of all three alive (3 replicas) = 41% • This defines the job waiting time and success rate • xrootd native – average 92.8% • Probability of all three alive (3 replicas) = 87% • The above underlines the importance of extremely reliable storage, in the absence of infinite storage resources as compensation
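The ‘probability of all three alive’ figures follow from treating the three replica SEs as independent; a back-of-the-envelope check using the quoted average availabilities (the 87% figure above presumably reflects the measured per-site values rather than the average):

```python
# Assuming three independent SEs, each at the quoted average availability
p_type1  = 0.739
p_xrootd = 0.928

print(f"all 3 replicas up (storage type 1): {p_type1**3:.0%}")   # ~40%, matching the ~41% quoted
print(f"all 3 replicas up (xrootd native):  {p_xrootd**3:.0%}")  # ~80% with the average; per-site values give ~87%
```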
Storage continued • Storage availability/stability remains one of the top priorities for ALICE • For strategic directions see Fabrizio’s talk • All other parameters being equal (protocol access speed and security): ALICE recommends wherever feasible a pure xrootd installation • Ancillary benefit from site admin point of view – no databases to worry about + storage cooperation through global redirector
Workload management: WMS and CREAM • WMS + gLite CE • Relatively long period of understanding the service parameters • Big effort by GRIF experts to provide a French WMS, now with high stability and reliability • Similar installations at other T1s (several at CERN) • Still ‘inherits’ the gLite CE limitations • CREAM CE • The future, gLite CE days are numbered • Strategic direction of the WLCG
Workload management (2) • CREAM CE (cont’d) • ALICE requires a CREAM CE at every centre, to be deployed before start of data taking • Much better scalability, shown by extensive tests • Hands-off operation after initial (still time-consuming) installation • Excellent support by CNAF developers
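For illustration, direct submission to a CREAM CE goes through the glite-ce-job-submit client with a JDL job description. A hedged sketch wrapping the call from Python; the CE endpoint, queue and JDL file name are placeholders.

```python
import subprocess

# Placeholder CREAM endpoint of the form <host>:8443/cream-<batch system>-<queue>
CREAM_ENDPOINT = "cream-ce.example.org:8443/cream-pbs-alice"

def submit(jdl_path="pilot.jdl"):
    # -a : delegate a proxy automatically for this submission
    # -r : target CREAM CE endpoint
    result = subprocess.run(
        ["glite-ce-job-submit", "-a", "-r", CREAM_ENDPOINT, jdl_path],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()   # job identifier returned by the CE
```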
Software deployment • General need for improvement of software deployment tools • Software distribution is a ‘Class 1 service’ – a shared software area between the WNs and the VO-box • Always a (security-related) point of critique • Heterogeneous queues: mixed 32- and 64-bit hosts, various Linux flavors, other system library differences, hence the need for multiple application software versions • In addition, the shared area (typically NFS) • Is often overloaded • Is a single point of failure • One ‘bad installation’ is fatal for the entire site operation
Packaging & size • Combine all the required grid packages into distributions • Full installation: 155 MB, mysql, ldap, perl, java... • VO-box: 122 MB, monitor, perl, interfaces, • User: 55 MB, API client, gsoap, xrootd • Worker node: 34 MB, min perl, openssl, xrootd • Experiment software: • AliRoot: 160 MB • ROOT: 60 MB • GEANT3: 25MB 300 MB to run jobs 16
Use existing technology • BitTorrent – more than 150 million users!! • http://bittorrent.com
Torrent technology • Diagram: a central tracker and seeder at alitorrent.cern.ch; worker nodes at Site A and Site B download & seed among themselves within each site • No inter-site seeding
Application software path • Torrent files created from the build system • One seeder at CERN • Standard tracker and seeder • Get the torrent client from the ALICE web server • aria2c • Download the files and install them • Seed the files while the job runs
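A minimal sketch of the per-job deployment sequence above, assuming the aria2c client; the torrent file name and installation path are made up, and the real AliEn wrapper differs in detail.

```python
import subprocess

# Hypothetical torrent published by the build system and served by the CERN tracker/seeder
TORRENT_URL = "http://alitorrent.cern.ch/torrents/alice-software.torrent"

def deploy_software(install_dir="/tmp/alice_sw", seed_minutes=120):
    # aria2c fetches the .torrent over HTTP, downloads the payload via BitTorrent
    # and then keeps seeding it for other worker nodes at the same site; started
    # in the background so the pilot can continue with its own work.
    return subprocess.Popen(
        ["aria2c",
         "--dir=" + install_dir,
         "--seed-time=" + str(seed_minutes),   # keep seeding this long after the download
         TORRENT_URL])
```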
ALICE activities calendar – STEP’09 • Timeline (Dec 2008 – Dec 2009) of: Cosmics data taking, WMS and CREAM CE, Reprocessing Pass 2, Analysis train, Replication of RAW, Data taking, RAW deletion • STEP’09 window marked on the same timeline
ALICE STEP’09 activities • Replication T0->T1 • Planned together with cosmics data taking, so it must be moved forward, or • We can repeat last year’s exercise: same rates (~100 MB/sec), same destinations • Re-processing with data recalls from tape at T1 • Highly desirable exercise; the data is already in the T1 MSS • The CCIN2P3 MSS/xrootd setup is being organized; we can export fresh RAW data into the buffer
ALICE STEP’09 activities (2) • Non-Grid activity – transfer rate tests (up to 1.25 GB/sec) from DAQ@P2 to CASTOR • Validation of the new CASTOR and of the xrootd transfer protocol for RAW • Will run just before, or overlap with, STEP’09 • CASTOR v.2.1.8 already deployed • The transfer rate test will be coupled with first-pass reco@T0 and second-pass reco@T1
Grid operation – site support • We need more help from the regional experts and site administrators • Proactively looking at local service problems • With data taking around the corner, the pressure to identify and fix problems will be mounting • STEP09 will hopefully demonstrate this (albeit for a short time) • The data taking will be 9 months of uninterrupted operation!
Grid operation – site support (2) • Two-day training session on 26/27 May • VO-box setup and operation (gLite and AliEn services) • Common problems and solutions • Monitoring • Storage • The training will also be available on EVO • All regional experts and site administrators are strongly encouraged to participate • More than 40 people have registered already
Summary • Grid operation/middleware • The main focus is on reliable storage – not there yet • After the initial ‘teething’ pains, the WMS is under control • CREAM CE must be everywhere and operational before data taking • In general, everyone needs services which ‘just run’, with minimal intervention and debugging • Grid operation/expert support • STEP’09 is the last ‘large’ exercise before data taking • Still, it will only show whether there are big holes • The long LHC run will put extraordinary load on all experts • Training is organized for all – current status of software and procedures