
ATLAS Distributed Analysis






  1. ATLAS Distributed Analysis

DISTRIBUTED ANALYSIS JOBS WITH THE ATLAS PRODUCTION SYSTEM
International Conference on Computing in High Energy and Nuclear Physics

S. González (Santiago.GonzalezdelaHoz@cern.ch), D. Liko (Dietrich.Liko@cern.ch), A. Nairz (Armin.Nairz@cern.ch), G. Mair (Gregor.Mair@cern.ch), F. Orellana (Frederik.Orellana@cern.ch), L. Goossens (Luc.Goossens@cern.ch)
CERN, European Organization for Nuclear Research, 1211 Genève 23, Switzerland
Silvia Resconi (Silvia.Resconi@mi.infn.it), INFN - Istituto Nazionale di Fisica Nucleare, Sezione di Milano, Italy
Alessandro De Salvo (Alessandro.DeSalvo@roma1.infn.it), INFN - Istituto Nazionale di Fisica Nucleare, Sezione di Roma, Italy

ATLAS Production System

In order to handle the task of the ATLAS Data Challenges, an automated production system was designed. The ATLAS production system consists of four components:
- The production database, which contains abstract job definitions;
- The Eowyn supervisor, which reads the production database for job definitions and presents them to the different Grid executors in an easy-to-parse XML format;
- The executors (e.g. CondorG, Panda for OSG), one for each Grid flavour, which receive the job definitions in XML format and convert them to the job description language of that particular Grid;
- Don Quijote II (DQ2), the ATLAS Distributed Data Management System (DDM), which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid.

The ATLAS production system has been successfully used to run production of ATLAS jobs at an unprecedented scale: on successful days, more than 10000 jobs were processed by the system. The experience gained operating the system, which includes several Grid systems, is considered essential also for performing analysis using Grid resources.

Introduction

ATLAS is a detector for the study of high-energy proton-proton collisions.
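The supervisor-to-executor handoff described above can be sketched in a few lines. The XML layout, field names, and the Condor-G-style output below are hypothetical illustrations; the actual ProdDB schema and the job description languages of the individual Grids are not shown in this poster.

```python
# Sketch of the Eowyn-supervisor -> executor flow: an abstract XML job
# definition is converted into a Grid-specific job description.
# The XML schema and field names here are invented for illustration.
import xml.etree.ElementTree as ET

JOB_XML = """
<job id="4711">
  <transformation>AODAnalysis</transformation>
  <inputfiles>
    <file>aod.pool.root.1</file>
    <file>aod.pool.root.2</file>
  </inputfiles>
  <outputfile>ntuple.root</outputfile>
</job>
"""

def to_condor_g(xml_text: str) -> str:
    """Turn an abstract XML job definition into a Condor-G-style
    submit description, as a Grid executor would."""
    job = ET.fromstring(xml_text)
    inputs = ",".join(f.text for f in job.find("inputfiles"))
    return "\n".join([
        "universe   = grid",
        f"executable = {job.find('transformation').text}",
        f"arguments  = {inputs}",
        f"output     = {job.find('outputfile').text}",
        "queue",
    ])

print(to_condor_g(JOB_XML))
```

The point of the intermediate XML format is exactly this separation: the supervisor never needs to know any Grid's job description language, and adding a new Grid flavour only requires a new executor.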
The offline computing will have to deal with an output event rate of 200 Hz, i.e. 10^9 events per year, with an average event size of 1.6 MB.

Storage:
- Raw recording rate: 320 MB/s
- Accumulating at 5-8 PB/year
- 20 PB of disk
- 10 PB of tape

Processing:
- 40,000 of today's fastest PCs

In 2002 ATLAS computing planned a first series of Data Challenges (DCs) in order to validate its:
- Computing Model
- Software
- Data Model

The ATLAS collaboration decided to perform the DCs using the Grid middleware developed in several Grid projects (Grid flavours):
- LHC Computing Grid project (LCG), to which CERN is committed
- GRID3/OSG
- NorduGrid

Setup for Distributed Analysis

Using the latest version of the production system:
- Supervisor: Eowyn
- Executors: Condor-G and Lexor (currently under testing)
- Data management: plain LCG tools with the LFC catalog; DQ2 currently being integrated
- Development database (devdb); a dedicated DA database is already in place

A generic analysis transformation has been created, which:
- compiles the user code/package on the worker node
- processes Analysis Object Data (AOD) input files
- produces a histogram and an n-tuple file as outputs

User Interface: AtCom4

The ATLAS Commander (AtCom) is used as a graphical user interface, currently for task and job definitions:
- task: contains summary information about the jobs to be run (input/output datasets, transformation parameters, resource and environment requirements, etc.)
- job: the concrete parameters needed for running, but no Grid specifics
- follows the ProdDB schema and XML description

Experience Running the Analysis

Two datasets were used:
- 50 events per file; a total of 7000 files for the first dataset and 1000 for the second.
- 80 jobs (70 for the first dataset and 10 for the second) with 100 input files each were defined with AtCom.
- These 80 jobs were submitted with Eowyn-Lexor/CondorG and ran at several LCG sites.
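The storage figures quoted above are internally consistent, as a quick back-of-the-envelope check shows. The interpretation of the 5-8 PB/year figure in the comment is my assumption, not stated in the poster.

```python
# Sanity check of the offline-computing figures quoted above.
event_rate_hz = 200        # output event rate
event_size_mb = 1.6        # average event size
events_per_year = 1e9      # figure quoted in the poster

# Raw recording rate: 200 Hz x 1.6 MB/event = 320 MB/s
recording_rate = event_rate_hz * event_size_mb
assert abs(recording_rate - 320.0) < 1e-6

# Raw data volume implied by the quoted event count (MB -> PB).
# This gives ~1.6 PB/year of raw events; the accumulation figure of
# 5-8 PB/year presumably also counts simulated and derived data.
raw_data_pb = events_per_year * event_size_mb / 1e9
print(f"{recording_rate:.0f} MB/s, {raw_data_pb:.1f} PB/year of raw events")
```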
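The job bookkeeping described above (two datasets of 7000 and 1000 AOD files, split into jobs of 100 input files each) amounts to simple chunking. A minimal sketch, with invented placeholder file names:

```python
def split_into_jobs(files, files_per_job=100):
    """Group a dataset's input files into job-sized chunks,
    as done when defining the analysis jobs with AtCom."""
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

# The two AOD datasets used in the analysis: 7000 and 1000 files,
# 50 events per file (the file names here are invented placeholders).
dataset1 = [f"aod.ds1.{n:05d}.pool.root" for n in range(7000)]
dataset2 = [f"aod.ds2.{n:05d}.pool.root" for n in range(1000)]

jobs = split_into_jobs(dataset1) + split_into_jobs(dataset2)
print(len(jobs))  # 80 jobs in total: 70 for the first dataset, 10 for the second
```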
Each job produced three output files (n-tuple, histogram and log), stored at Castor (CERN, IFIC-Valencia, Taiwan).
- ROOT has been used to merge the histogram output files in a post-processing step.
- The analysis has been done running our own supervisor and Lexor/CondorG instance.
- Delays due to data transfer are no longer an issue, because the AOD input is available on-site and jobs are sent to those sites only.
- The system setup is not yet able to support long queues (simulation) and short queues (analysis) in parallel:
  - queues are filled with simulation jobs
  - long pending times for analysis jobs
- VO-based user authentication in AtCom:
  - user certificate and proxy are used to enable DB access
  - no proxy delegation, following the current ProdSys strategy (jobs still run under (a few) production managers' IDs)
- AtCom communicates directly with the Oracle database and provides support to define jobs and monitor their status.

The two algorithms of choice have been a ttbar and a SUSY reconstruction algorithm provided by the physics groups. In the case of the ttbar reconstruction, the algorithm is already part of the release and has been used as installed on the Grid sites. In the case of the SUSY algorithm, the source code was transferred together with the job and the shared library was obtained by compilation on the worker node.
- The analysis has been launched over 350000 events in the ttbar reconstruction and over 50000 events in the SUSY case.
- W->jj and Z reconstructed masses, after merging the histogram files, were produced through the ATLAS production system ("a la Grid").
- With free resources, the system was able to process 10k-event jobs in about 10 min (total).

13-17 February 2006, Mumbai, India
http://uimon.cern.ch/twiki/bin/view/Atlas/DA-Activities
