
Data Challenge Needs


Presentation Transcript


  1. Data Challenge Needs RWL Jones ATLAS UK Physics Meeting

  2. Data challenges
  • Goal
    • validate our computing model and our software
  • How?
    • Iterate on a set of DCs of increasing complexity
    • start with data which looks like real data
    • Run the filtering and reconstruction chain
    • Store the output data into our database
    • Run the analysis
    • Produce physics results
  • To understand our computing model
    • performance, bottlenecks, etc.
  • To check and validate our software
  (A toy sketch of this processing chain is shown below.)
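As a rough illustration of the chain sketched on this slide, here is a toy Python pipeline. All stage names (generate, simulate, reconstruct, store, analyse) are hypothetical placeholders rather than the actual ATLAS tools, and Python is used only for illustration.

    # Toy sketch of the Data Challenge processing chain described above.
    # Stage names and data structures are placeholders for the real ATLAS
    # tools (generators, Dice/Geant, Atrecon/Athena, the database).

    def generate(n_events):
        # stand-in for producing "data which looks like real data"
        return [{"id": i, "stage": "generated"} for i in range(n_events)]

    def simulate(events):
        for ev in events:
            ev["stage"] = "simulated"
        return events

    def reconstruct(events):
        for ev in events:
            ev["stage"] = "reconstructed"
        return events

    def store(events, database):
        # stand-in for writing the output into the database
        database.extend(events)

    def analyse(database):
        # stand-in for the analysis step that produces physics results
        return sum(1 for ev in database if ev["stage"] == "reconstructed")

    database = []
    store(reconstruct(simulate(generate(100))), database)
    print("events available for analysis:", analyse(database))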

  3. But:
  • Today we don't have 'real data'
  • Need to produce 'simulated data' first, so:
    • physics event generation
    • simulation
    • pile-up
    • detector response
  • Plus reconstruction and analysis will be part of the first Data Challenges

  4. ATLAS Kits
  • Each DC will have an associated kit
  • Current kit is 1.3.0 for tests; it will be replaced for DC0, with 2.0.3 being tested in October
  • Default kit excludes compilers, but an 'all-in' kit exists – more intrusive in the OS but usable with more OS versions
  • So far, no Grid/Globus tools included

  5. ATLAS kit 1.3.0
  • Tar file with ATLAS software to be sent to remote GRID sites and used in DC0
  • Main requirements:
    • NO AFS
    • NO root privileges to install the software
    • possibility to COMPILE the code
    • NOT too big a tar file
    • should run on the Linux platform

  6. First version of the ATLAS kit
  • It installs:
    • SRT (Software Release Tools) version 0.3.2
    • a subset of ATLAS release 1.3.1:
      • main ATLAS application code + Makefiles: DiceMain, DicePytMain, AtreconMain
        (Dice = G3-based ATLAS simulation program; Atrecon = ATLAS reconstruction program)
      • ATLAS packages and libraries needed for compilation
    • CLHEP version 1.6.0.0

  7. It requires:
  • Linux OS (at the moment tested on RedHat 6.1, 6.2, 7.1); Mandrake has problems
  • CERNLIB 2000 installed
    • if you need CERNLIB 2000, use kit1
    • if you are on RedHat 7.1, you need the compilers in kit2
  It provides:
  • all instructions to install / compile / run in a README file
  • example jobs to run full simulation and reconstruction, plus example datacards (DICE, Atrecon)
  • some scripts to set environment variables, to compile and run

  8. It can be downloaded:
  • http://pcatl0a.mi.infn.it/~resconi/kit/atlas_kit.html
    • ATLAS_kit.tar.gz (~ 90 MB)
    • then execute: gtar -xvzf ATLAS_kit.tar.gz
    • it will unpack into a directory /ATLAS of ~ 500 MB
  • It has been installed and tested on GRID machines + non-ATLAS machines
  • Sites involved in the first tests of the ATLAS kit: Milan, Rome, Glasgow, Lund – providing feedback…
  • Lacks a verification kit and analysis tools – WORK IN PROGRESS

  9. What about Globus?
  • Not needed in current kit (but needed if you want to be part of the DataGrid Test Bed)
  • Will be needed for DC1 (?!) – should be in the Kit if so
  • If installing now, take the version from the GridPP website – the Hey CD-ROM is out of date (RPMs will be available)

  10. DC0 – start: 1 November 2001, end: 12 December 2001
  • 'continuity' test through the software chain
  • aim is primarily to check the state of readiness for Data Challenge 1
  • 100k Z+jet events, or similar – several times
  • check that the software works; issues to be checked include:
    • G3 simulation on a PC farm
    • 'pile-up' handling
    • what trigger simulation is to be run (ATRIG?)
    • reconstruction running
  • data must be written/read to/from the database

  11. DC1 – start: 1 February 2002, end: 30 July 2002
  • scope increases significantly beyond DC0
  • several samples of up to 10^7 events
  • should involve CERN & outside-CERN sites
  • as a goal, be able to run:
    • O(1000) PCs
    • 10^7 events
    • simulation
    • pile-up
    • reconstruction
    • in 10-20 days

  12. Aims of DC1 (1)
  • Provide a sample of 10^7 events for HLT studies
    • improve previous statistics by a factor of 10
  • Study performance of Athena and algorithms for use in HLT
    • HLT TDR due for the end of 2002

  13. Aims of DC1 (2)
  • Try out running 'reconstruction' and 'analysis' on a large scale
  • Learn about our data model
    • I/O performance
    • bottlenecks
  • Note: simulation and pile-up will play an important role

  14. Aims of DC1 (3)
  • Understand our 'distributed' computing model
  • GRID
    • use of GRID tools
  • Data management
    • database technologies
    • N events with different technologies
  • Distributed analysis
    • access to data

  15. Aims of DC1 (4)
  • Provide samples of physics events to check and extend some of the Physics TDR studies
    • data generated will be mainly 'standard model'
  • Checking Geant3 versus Geant4
    • understand how to do the comparison
    • understand 'same' geometry

  16. DC2 – start: January 2003, end: September 2003
  • scope depends on the 'success' of DC0/1
  • goals:
    • use of 'Test-Bed'
    • 10^8 events, complexity at ~50% of the 2006-07 system
    • Geant4 should play a major role
    • 'hidden' new physics
    • test of calibration procedures
    • extensive use of GRID middleware
  • Do we want to add part or all of:
    • DAQ
    • Lvl1, Lvl2, Event Filter?

  17. The ATLAS Data Challenges – project structure / organisation
  [Organisation chart: a DC Overview Board, a DC Execution Board and a DC Definition Committee (DC2) with an RTAG and several work packages (WPs) cover work-plan definition and execution; reporting and reviews go to the CSG and NCB; resource matters involve the Tiers; interfaces to the DataGrid project and other computing Grid projects.]

  18. Event generation
  • The type of events has to be defined
  • Several event generators will probably be used
    • for each of them we have to define the version, in particular Pythia – robust?
  • Event type & event generators have to be defined by:
    • HLT group (for HLT events)
    • Physics community
  • Depending on the output, we can use the following frameworks:
    • ATGEN/GENZ for ZEBRA output format
    • Athena for output in OO-db (HepMC)
  • A ZEBRA-to-HepMC convertor already exists

  19. Simulation
  • Geant3 or Geant4?
    • DC0 and DC1 will still rely on Geant3 – the G4 version is not ready
    • urgently need Geant4 experience
  • Geometry has to be defined (same as G3 for validation)
    • use standard events for validation
    • the 'physics' is improved
  • For Geant3 simulation, the "Slug/Dice" or "Atlsim" framework
    • in both cases the output will be ZEBRA
  • For Geant4 simulation, probably use the FADS/Goofy framework
    • output will be 'hits collections' in an OO-db

  20. Pile-up
  • Add to the 'physics event' "N" 'minimum bias events'
  • N depends on the luminosity – suggested:
    • 2-3 at L = 10^33
    • 6 at L = 2 x 10^33
    • 24 at L = 10^34
  • N depends on the detector
    • in the calorimeter N is ~10 times bigger than for other detectors
    • matching events for different pile-up in different detectors is a real headache!
  • The 'minimum bias' events should be generated first; they will then be picked up randomly when the merging is done (a sketch follows below)
    • this will be a high-I/O operation
    • efficiency is technology dependent (sequential or random access files)
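A minimal sketch of the pile-up merging step, assuming a pre-generated pool of minimum-bias events sampled at random and a Poisson-distributed N with the mean values quoted above; the event representation and the Poisson choice are illustrative assumptions only.

    # Illustrative sketch of pile-up merging: overlay N randomly chosen
    # minimum-bias events on a physics event, with <N> set by the luminosity.
    import random

    MEAN_N = {1e33: 2.5, 2e33: 6.0, 1e34: 24.0}   # suggested <N> per luminosity

    def sample_poisson(mean, rng):
        # simple Poisson sampler (Knuth); a real job would use a library routine
        limit, k, p = 2.718281828459045 ** (-mean), 0, 1.0
        while p > limit:
            k += 1
            p *= rng.random()
        return k - 1

    def pile_up(physics_event, min_bias_pool, luminosity, rng=random):
        """Merge the physics event with N randomly picked minimum-bias events."""
        n = sample_poisson(MEAN_N[luminosity], rng)
        overlay = [rng.choice(min_bias_pool) for _ in range(n)]
        return {"signal": physics_event, "min_bias": overlay}

    pool = [{"mb_id": i} for i in range(1000)]    # pre-generated minimum bias
    merged = pile_up({"signal_id": 42}, pool, 2e33)
    print(len(merged["min_bias"]), "minimum-bias events overlaid")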

  21. Reconstruction
  • Run in the Athena framework
    • input should be from the OO-db
    • output in the OO-db: ESD, AOD, TAG
  • Atrecon could be a back-up possibility
    • to be decided

  22. Data management
  • Many 'pieces' of infrastructure still to be decided
    • everything related to the OO-db (Objy and/or ORACLE)
    • tools for creation, replication, distribution
    • what do we do with ROOT I/O? which fraction of the events will be done with ROOT I/O?
  • Thousands of files will be produced and need "bookkeeping" and a "catalog" (a sketch follows below)
    • where is the "HepMC" truth data?
    • where is the corresponding "simulated" or AOD data?
    • selection and filtering?
    • correlation between different pieces of information?
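A minimal sketch of the kind of file-level bookkeeping this implies, recording for each dataset where the generator-level (HepMC truth), simulated and AOD files live. The schema, field and dataset names, and the use of SQLite are illustrative assumptions, not the actual ATLAS catalogue.

    # Toy file catalogue for the bookkeeping problem described above.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE file_catalog (
            dataset  TEXT,    -- e.g. 'dc1.zjet.lumi2e33' (hypothetical name)
            step     TEXT,    -- 'evgen' (HepMC truth), 'simul', 'recon', 'aod'
            run      INTEGER,
            filename TEXT,
            site     TEXT
        )""")

    def register(dataset, step, run, filename, site):
        con.execute("INSERT INTO file_catalog VALUES (?,?,?,?,?)",
                    (dataset, step, run, filename, site))

    def locate(dataset, step):
        """Answer 'where is the truth / simulated / AOD data?' for a dataset."""
        cur = con.execute("SELECT run, filename, site FROM file_catalog "
                          "WHERE dataset=? AND step=?", (dataset, step))
        return cur.fetchall()

    register("dc1.zjet.lumi2e33", "evgen", 1, "zjet.0001.hepmc", "CERN")
    register("dc1.zjet.lumi2e33", "simul", 1, "zjet.0001.zebra", "Lancaster")
    print(locate("dc1.zjet.lumi2e33", "simul"))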

  23. Data management
  • Several technologies will be evaluated, so we will have to duplicate some data
    • same data in ZEBRA & OO-db
    • same data in ZEBRA FZ and ZEBRA random-access (for pile-up)
    • we need to quantify this overhead
  • We also have to realize that the performance will depend on the technology
    • sequential versus random access files

  24. DC0 planning
  • For DC0, probably at the September software week, decide on the strategy to be adopted:
    • software to be used
    • Dice geometry
    • reconstruction adapted to this geometry
    • database
    • infrastructure
  • Gilbert hopes (hmmm) that we will have in place 'tools' for:
    • automatic job submission
    • data catalog and bookkeeping
    • allocation of "run numbers" and of "random numbers" (bookkeeping) – a sketch follows below
  • The 'validation' of components must be done now
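As an illustration of the run-number and random-number bookkeeping mentioned above, here is a minimal sketch; the allocation scheme (sequential run numbers, a per-run seed derived from the run number) is an assumption for illustration, not an agreed DC0 tool.

    # Toy run-number and random-seed allocator with a bookkeeping ledger.
    import json

    class RunAllocator:
        def __init__(self, first_run=1):
            self.next_run = first_run
            self.ledger = {}                   # run number -> bookkeeping record

        def allocate(self, site, n_events):
            run = self.next_run
            self.next_run += 1
            seed = 1000003 * run + 7           # unique, reproducible per run
            self.ledger[run] = {"site": site, "n_events": n_events, "seed": seed}
            return run, seed

    alloc = RunAllocator(first_run=2001)
    run, seed = alloc.allocate(site="Lancaster", n_events=5000)
    print("run", run, "uses random seed", seed)
    print(json.dumps(alloc.ledger, indent=2))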

  25. Currently available software, June-Sept 2001 (overview diagram)
  • Particle-level simulation: atgen-b (Fortran, HP only) – Pythia 5.7 / Jetset 7.4 + code dedicated to B-physics; Lujets -> GENZ bank (ZEBRA)
  • Fast detector simulation: Atlfast++ in ATHENA – reads GENZ, converts to HepMC, produces ntuples
  • Detector simulation: Dice (Slug + Geant3, Fortran) – produces GENZ + KINE banks; new geometry and TDR geometry
  • Reconstruction in ATHENA: C++ – reads GENZ + KINE, converts to HepMC, produces ntuples
  • Reconstruction in ZEBRA: Atrecon (Fortran, C++) – reads GENZ + KINE, produces ntuples

  26. Simulation software to be available, Nov-Dec 2001 (overview diagram; HepMC interface marked "??")
  • Particle-level simulation: GeneratorModules in ATHENA (C++, Linux) – Pythia 6 + code dedicated to B-physics; PYJETS -> HepMC; EvtGen (BaBar package) later
  • Fast detector simulation: Atlfast++ in ATHENA – reads HepMC, produces ntuples
  • Detector simulation: Dice (Slug + Geant3, Fortran) – produces GENZ + KINE banks
  • Reconstruction in ZEBRA: C++ – reads GENZ + KINE, converts to HepMC, produces ntuples

  27. Analysis
  • Analysis tools evaluation should be part of the DC
    • required for tests of the Event Data Model
    • essential for tests of the computing models
  • Output for HLT studies will be only a few hundred events
    • 'physics events' would be more appropriate for this study
  • The ATLAS kit must include analysis tools

  28. Storage and CPU issues in DC1
  • Testing storage technology will inflate the data volume per event (easiest to re-simulate)
  • Testing software chains will inflate the CPU usage per event
  • The size of the events with pile-up depends on the luminosity:
    • 4 MB per event @ L = 2 x 10^33
    • 15 MB per event @ L = 10^34
  • The time to do the pile-up also depends on the luminosity:
    • 55 s (HP) per event @ L = 2 x 10^33
    • 200 s (HP) per event @ L = 10^34
  (A rough scaling of these numbers is sketched below.)
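To get a feel for the scale, here is a rough back-of-the-envelope scaling that combines these per-event numbers with the DC1 goal of 10^7 events on O(1000) PCs (slide 11). It ignores the inflation factors mentioned in the first two bullets and assumes the quoted HP-unit rates apply to each CPU, so it is an estimate only.

    # Rough scaling of the quoted DC1 pile-up numbers (illustrative only).
    N_EVENTS = 1e7          # DC1 sample size (slide 11)
    N_CPUS = 1000           # O(1000) PCs assumed available

    for lumi, mb_per_event, sec_per_event in [("2 x 10^33", 4, 55),
                                              ("10^34", 15, 200)]:
        volume_tb = N_EVENTS * mb_per_event / 1e6              # MB -> TB
        wall_days = N_EVENTS * sec_per_event / N_CPUS / 86400  # s -> days
        print(f"L = {lumi}: ~{volume_tb:.0f} TB with pile-up, "
              f"~{wall_days:.0f} days of pile-up on {N_CPUS} CPUs")

This gives roughly 40 TB and about a week at L = 2 x 10^33, and roughly 150 TB and about three weeks at L = 10^34, which can be compared with the 10-20 day window quoted for DC1.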

  29. Issues for DC1
  • Manpower is the most precious resource; coherent generation will be a significant job for each participating site
  • Do we have enough hardware resources in terms of CPU, disk space, tapes, data servers…? Looks OK
    • entry requirement for generation: O(100) CPUs (NCB) – clouds
  • What will we do with the data generated during the DC?
    • keep it on CASTOR? tapes?
    • how will we exchange the data?
    • do we want to have all the information at CERN? everywhere?
    • what are the networking requirements?

  30. ATLAS interim manpower request from GridPP
  • Requested another post for DC co-ordination and management tools, running DCs, and Grid integration and verification
    • looking at the declared manpower, this is insufficient in the pre-Grid era!
  • A further post for replication, catalogue and MSS integration for ATLAS

  31. Interim ATLAS request
  • Grid-aware resource discovery and job submission for ATLAS; essential that all this be programmatic by DC2 – overlap with LHCb?
  • Should add to this post(s) for verification activities, which is a large part of the work
  • Should also ask for manpower for verification packages

  32. Joint meeting with LHCb
  • Common project on Grid-based tools for experiment code installation
  • Event selection and data discovery tools (GANGA is an LHCb-proposed prototype layer between Gaudi/Athena and data stores and catalogues)
