
Distributed Analysis Experience using GANGA on an ATLAS Tier2 infrastructure



Presentation Transcript


  1. Distributed Analysis Experience using GANGA on an ATLAS Tier2 infrastructure F. Fassi, S. Cabrera, R. Vives, S. González de la Hoz, Á. Fernández, J. Sánchez, L. March, J. Salt, A. Lamas IFIC-CSIC-UV, Valencia, Spain Third EELA conference, 4 December 2007, Catania, Italy

  2. Outline: • ATLAS Grid computing and facilities • Distributed Analysis model in ATLAS • GANGA overview • IFIC Tier-2 infrastructure • Resources and services • Data transfers • Job priority • A demonstration of using GANGA • Conclusion

  3. ATLAS Grid computing • ATLAS computing is based on a hierarchical architecture of Tiers • ATLAS computing operates uniformly on a heterogeneous grid environment based on three grid infrastructures • The grids have different middleware, replica catalogs and tools to submit jobs • [Diagram: CERN – Tier-1s – Tier-2s hierarchy]

  4. ATLAS facilities • Event Filter Farm at CERN • Located near the experiment; assembles data into a stream to the Tier-0 • Tier-0 at CERN • Derives first-pass calibration within 24 hours • Reconstruction of the data keeping up with data taking • Tier-1s distributed worldwide (10 centers) • Reprocessing of the full data with improved calibrations 2 months after data taking • Managed tape access: RAW, ESD • Disk access: AOD, fraction of ESD • Tier-2s distributed worldwide (40+ centers) • MC simulation, producing ESD, AOD • User physics analysis, disk store (AOD) • CERN Analysis Facility • Primary purpose: calibration • Limited access to ESD and RAW • Tier-3s distributed worldwide: physics analysis

  5. Distributed Analysis model in ATLAS • The Distributed Analysis model is based on the ATLAS computing model: • Data for analysis will be available, distributed across all Tier-1 and Tier-2 centers • Tier-2s are open for analysis jobs • The computing model foresees 50% of grid resources to be allocated for analysis • User jobs are sent to the data • Large input datasets (100 GB up to several TB) • Results must be made available to the user (NTuples or similar) • Data is complemented with metadata and bookkeeping in catalogs

  6. Distributed Analysis model in ATLAS • The ATLAS strategy is based on making use of all available resources • The solution must deal with the challenge of heterogeneous grid infrastructures • NorduGrid: a backend for ARC submission is integrated • OSG/Panda: a backend for Panda was recently integrated • The GANGA front-end supports all ATLAS Grid flavors

  7. The idea behind GANGA • The naive approach to submitting jobs to the Grid involves the following steps: • Prepare the “Job Description Language” (JDL) file for job configuration • Find a suitable Athena software application • Locate the datasets on the different storage elements • Job splitting, monitoring and book-keeping • GANGA combines several different components, providing a front-end client for interacting with grid infrastructures • It is a user-friendly job definition and management tool • It allows simple switching between testing on a local batch system and large-scale data processing on distributed Grid resources (see the sketch below)
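To illustrate how GANGA hides these steps behind a single job object, the minimal sketch below defines a job in the GANGA interactive (IPython-based) shell, runs it locally, and then re-targets the same description to the LCG grid by only changing the backend. The executable and arguments are placeholders, not taken from the talk.

    # Minimal sketch, assuming the GANGA shell where plugin classes such as
    # Job, Executable, Local and LCG are pre-loaded (no imports needed).
    j = Job()
    j.application = Executable(exe='/bin/echo', args=['Hello from GANGA'])
    j.backend = Local()          # first test on the local machine
    j.submit()

    # The same job description can be sent to the LCG grid
    # simply by swapping the backend building block.
    j2 = j.copy()
    j2.backend = LCG()
    j2.submit()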

  8. GANGA features • GANGA is based on a simple but flexible job abstraction • A job is constructed from a set of building blocks, not all of which are required for each job • Support for several applications: • Generic Executable • ATLAS Athena software • ROOT • Support for several back-ends: • LCG/gLite Resource Broker • OSG/PANDA • NorduGrid/ARC middleware • Batch (LSF, PBS, etc.) • A job-splitting example is sketched below
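As a hedged illustration of the building-block idea, the sketch below adds a splitter so that one job definition fans out into several sub-jobs on the LCG backend; the arguments are dummy values and the ArgSplitter class is assumed to be available in the GANGA shell.

    # Sketch only: one logical job split into three sub-jobs.
    j = Job()
    j.application = Executable(exe='/bin/echo')
    j.splitter = ArgSplitter(args=[['run1'], ['run2'], ['run3']])  # one sub-job per argument list
    j.backend = LCG()
    j.submit()            # GANGA creates and tracks the three sub-jobs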

  9. Resources and services • Equipment (see Santiago González de la Hoz's talk): • CPU  132 kSI2k • Disk  34 TB • Tape  tape robot of 140 TB • Services: • 2 SRM interfaces, 2 CE, 4 UI, 1 BDII, 1 RB • 1 PROXY, 1 MON, 2 GridFTP, 2 QUATTOR • QUATTOR: installs and configures the resources • Network: • Connectivity from the site to the network is about 1 Gbps • The facility serves the dual purpose of producing simulated data and supporting data analysis • [Photos: CPU racks and tape robot]

  10. Data Transfers (I) • Data management is a crucial aspect of Distributed Analysis • It is managed by the DDM system, known as DQ2 • Data is being distributed to Tier-1 and Tier-2 centers for analysis • through several exercises organized by the ATLAS collaboration • IFIC is participating in this effort with the aim to: • have datasets available at the IFIC site for analysis • test the functionality and performance of the data transfer mechanisms • IFIC's contribution to the data transfer activities is the following: • SC4 (Service Challenge 4; October 2006) • Functional Tests (August 2007) • M4 Cosmics Run: August 23 – September 3 • M5 Cosmics Run: scheduled for October 16-23

  11. Data Transfers (II) • The datasets exported to IFIC are stored in the Lustre-based Storage Element • They are made available in a distributed manner through: • registration in the local LFC catalog • publication throughout the whole grid using the DDM central catalog • In addition, information on the stored datasets is provided on the IFIC web page: • http://ific.uv.es/atlas-t2-es/ific/main.html

  12. Job Priority • Analysis jobs run in parallel with the production jobs • A mechanism is needed to steer the resource consumption of the ATLAS community  Job Priority • Objective: allow enforcement of job priorities based on VOMS groups/roles, using the Priorities Working Group schema • Development and deployment done at IFIC: • Define local mappings for groups/roles and Fair Share (FS): • atlas: atlas  50% for all ATLAS VO users • atlb: /atlas/Role=production  50% for ATLAS production activity • atlc: /atlas/Role=software  no FS (but higher priority, only 1 job at a time) • atld: /atlas/Role=lcgadmin  no FS (but higher priority, only 1 job at a time)

  13. Introduction • Objective: • Test the IFIC Tier-2 analysis facility • Produce Top N-tuples from a large ttbar dataset (6M events, ~217 GB) • Useful for performing Top analysis studies • Requirements: • Fast and easy large-scale production  grid environment • Runs everywhere  use ATLAS resources • Easy user interface • Hide the grid infrastructure • Our setup (a sketch of the corresponding GANGA job is given below): • GANGA version 4.4.2 • Athena 12.0.6, TopView-00-12-13-03 • EventView Group Area 12.0.6.8
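For illustration, a hedged sketch of what such a GANGA Athena job definition could look like is shown below. The class and attribute names (Athena, DQ2Dataset, DQ2OutputDataset, AthenaSplitterJob, atlas_release, option_file, numsubjobs) are assumptions about the GangaAtlas plugin of that period, and the job-options file and dataset name are placeholders rather than the ones actually used in this exercise.

    # Hedged sketch of an Athena/TopView analysis job in the GANGA shell.
    j = Job()
    j.application = Athena()
    j.application.atlas_release = '12.0.6'                      # assumed attribute name
    j.application.option_file = 'TopViewNtuple_jobOptions.py'   # placeholder job options
    j.application.prepare()                                     # pack the user work area

    j.inputdata = DQ2Dataset()
    j.inputdata.dataset = 'some.ttbar.AOD.dataset'              # placeholder dataset name
    j.outputdata = DQ2OutputDataset()

    j.splitter = AthenaSplitterJob(numsubjobs=100)              # fan out over the input files
    j.backend = LCG()
    j.submit()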

  14. Observations and issues • Before sending jobs to the Grid, some operations have to be done: • Find out where the dataset is complete (this dataset has 2383 files) • Make sure that the selected site is a good one • Jobs are then sent correctly to the selected sites (good ones with a complete replica) • General issues: • In general, jobs can fail even on good sites  re-submissions are necessary until successful (see the sketch below) • GANGA submission failures due to missing CE-SE correspondence • Often jobs fail because the site on which they are executed is not properly configured • Speed issue in submitting sub-jobs using the LCG RB  gLite WMS bulk submission is needed • At some sites jobs end up in a long queue  job priority missing • Currently no solution  kill and restart the jobs
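Since re-submitting until success was the practical workaround, the small hedged sketch below shows how failed sub-jobs can be resubmitted from the GANGA shell; the job id is a placeholder.

    # Sketch: resubmit only the failed sub-jobs of job 42 (placeholder id),
    # using the jobs registry available in the GANGA shell.
    j = jobs(42)
    for sj in j.subjobs:
        if sj.status == 'failed':
            sj.resubmit()      # re-run just this sub-job on the grid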

  15. Performance • General: • Jobs at IFIC finish within 1-2 hours  fast execution: about 1 hour to run over 1M events • Some jobs also ran successfully at other sites (Lyon, FZK) • Very high efficiency running at those sites where the dataset is available and there is no site configuration problem

  16. Results • Some of the recombined output N-tuples were analyzed with the ROOT framework to reconstruct the top quark mass from the hadronic decay

  17. Conclusion • Experience in configuring and deploying the IFIC site has been shown • Lessons learned from using GANGA: • GANGA is a lightweight, easy grid job submission tool • GANGA does a great job of configuring and splitting jobs and scheduling input and output files • Distributed Analysis using GANGA depends strongly on the data distribution and on the quality of the site configuration • Speed of submission was a major issue with the LCG RB • need for gLite WMS deployment  bulk submission feature
