
ATLAS Data Challenges



  1. ATLAS Data Challenges NorduGrid Workshop, Uppsala, November 11-13, 2002 Gilbert Poulard ATLAS DC coordinator CERN EP-ATC

  2. Outline • Introduction • DC0 • DC1 • Grid activities in ATLAS • DCn’s • Summary DC web page: http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/DC/index.html G. Poulard - NorduGrid Workshop

  3. Data challenges • Why? • In the context of the CERN Computing Review it was recognized that computing for the LHC is very complex and requires a huge amount of resources. • Several recommendations were made, among them: • Create the LHC Computing Grid (LCG) project • Ask the experiments to launch a set of Data Challenges to understand and validate • Their computing model; data model; software suite • Their technology choices • The scalability of the chosen solutions G. Poulard - NorduGrid Workshop

  4. ATLAS Data challenges • In ATLAS it was decided • To foresee a series of DCs of increasing complexity • Start with data which look like real data • Run the filtering and reconstruction chain • Store the output data in the ‘ad-hoc’ persistent repository • Run the analysis • Produce physics results • To study • Performance issues, persistency technologies, analysis scenarios, ... • To identify • weaknesses, bottlenecks, etc. (but also good points) • Using the hardware (prototype), the software and the middleware developed and/or deployed by the LCG project G. Poulard - NorduGrid Workshop

  5. ATLAS Data challenges • But it was also acknowledged that: • Today we don’t have ‘real data’ • We need to produce ‘simulated data’ first • So: • Physics event generation • Detector simulation • Pile-up • Reconstruction and analysis • will be part of the first Data Challenges • We also need to satisfy the requirements of the ATLAS communities • HLT, Physics groups, ... G. Poulard - NorduGrid Workshop

  6. ATLAS Data challenges • In addition it is understood that the results of the DCs should be used to • Prepare a computing MoU in due time • Produce a new Physics TDR ~one year before real data taking • The retained schedule was to: • start with DC0 in late 2001 • considered at that time as a preparation exercise • continue with one DC per year G. Poulard - NorduGrid Workshop

  7. DC0 • Was defined as • A readiness and continuity test • Have the full chain running from the same release • A preparation for DC1; in particular • One of the main emphases was to put in place the full infrastructure with Objectivity (which was the baseline technology for persistency at that time) • It should also be noted that there was a strong request from the physicists to be able to reconstruct and analyze the “old” Physics TDR data within the new Athena framework G. Poulard - NorduGrid Workshop

  8. DC0: Readiness & continuity tests (December 2001 – June 2002) • “3 lines” for “full” simulation • 1) Full chain with new geometry (as of January 2002): Generator->(Objy)->Geant3->(Zebra->Objy)->Athena rec.->(Objy)->Analysis • 2) Reconstruction of ‘Physics TDR’ data within Athena: (Zebra->Objy)->Athena rec.->(Objy)->Simple analysis • 3) Geant4 robustness test: Generator->(Objy)->Geant4->(Objy) • “1 line” for “fast” simulation: Generator->(Objy)->Atlfast->(Objy) • Continuity test: everything from the same release for the full chain (3.0.2) G. Poulard - NorduGrid Workshop

  9. Schematic View of Task Flow for DC0 (H → 4 mu sample): Pythia 6 event generation (HepMC) → data conversion into Objectivity/DB → detector simulation with Atlsim/G3 (hits/digits and MC truth, Zebra → Objectivity/DB) → Athena reconstruction → Athena analysis producing AOD in Objectivity/DB G. Poulard - NorduGrid Workshop
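To make the DC0 task flow above concrete, here is a minimal, purely illustrative Python sketch of the full chain. The stage functions and record fields are invented for the example; they are not the ATLAS production scripts.

```python
# Minimal sketch of the DC0 "full chain" task flow, not the actual ATLAS
# production scripts: each stage is a function that consumes and produces
# a small record describing the data it handles.

def generate(n_events):
    # Pythia 6 event generation, stored as HepMC in Objectivity/DB
    return {"stage": "generation", "format": "HepMC/Objy", "events": n_events}

def simulate(gen):
    # Atlsim/Geant3 detector simulation: hits/digits + MC truth, Zebra -> Objy
    return {"stage": "simulation", "format": "Zebra->Objy",
            "events": gen["events"], "content": "hits/digits + MC truth"}

def reconstruct(sim):
    # Athena reconstruction reading the simulated hits/digits
    return {"stage": "reconstruction", "format": "Objy",
            "events": sim["events"], "content": "ESD"}

def analyse(rec):
    # Athena analysis producing AOD, e.g. for the H -> 4 mu sample
    return {"stage": "analysis", "format": "Objy",
            "events": rec["events"], "content": "AOD"}

if __name__ == "__main__":
    print(analyse(reconstruct(simulate(generate(1000)))))
```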

  10. DC0: Readiness & continuity tests (December 2001 – June 2002) • Took longer than foreseen, for several reasons • Introduction of new tools • Change of the baseline for persistency • which had the major consequence of diverting some of the manpower • Under-estimation of what “have everything from the same release” implied • Nevertheless we learnt a lot • Was completed in June 2002 G. Poulard - NorduGrid Workshop

  11. ATLAS Data Challenges: DC1 • Original goals (November 2001): • reconstruction & analysis on a large scale • learn about data model; I/O performance; identify bottlenecks … • data management • Use/evaluate persistency technology (AthenaRoot I/O) • Learn about distributed analysis • Need to produce data for HLT & Physics groups • HLT TDR has been delayed to mid 2003 • Study performance of Athena and algorithms for use in HLT • High statistics needed • Scale: a few samples of up to 10^7 events in 10-20 days, O(1000) PCs • Simulation & pile-up will play an important role • Introduce new Event Data Model • Checking of Geant4 versus Geant3 • Involvement of CERN & outside-CERN sites: worldwide exercise • use of Grid middleware as and when possible and appropriate • To cope with different sets of requirements, and for technical reasons (including software development and access to resources), it was decided to split DC1 into two phases G. Poulard - NorduGrid Workshop

  12. ATLAS DC1 • Phase I (April – August 2002) • Primary concern is delivery of events to the HLT community • Put in place the MC event generation & detector simulation chain • Put in place the distributed Monte Carlo production • Phase II (October 2002 – January 2003) • Provide data with (and without) ‘pile-up’ for HLT studies • Introduction & testing of new Event Data Model (EDM) • Evaluation of new persistency technology • Use of Geant4 • Production of data for Physics and Computing Model studies • Testing of computing model & of distributed analysis using AOD • Wider use of Grid middleware G. Poulard - NorduGrid Workshop

  13. DC1 preparation • First major issue was to get the software ready • New geometry (compared to the December DC0 geometry) • New persistency mechanism • … validated • … distributed • “ATLAS kit” (rpm) to distribute the software • And to put in place the production scripts and tools (monitoring, bookkeeping) • Standard scripts to run the production • AMI bookkeeping database (Grenoble) • Magda replica catalog (BNL) G. Poulard - NorduGrid Workshop
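The production infrastructure listed above combines three pieces: standard production scripts, the AMI bookkeeping database and the Magda replica catalog. The sketch below is a hypothetical illustration of how a job wrapper might tie them together; the function, the file-naming convention and the in-memory stand-ins are assumptions, not the real AMI or Magda interfaces.

```python
# Illustrative sketch only: neither the real AMI schema nor the real Magda API
# is shown. It only reflects the division of labour described on the slide:
# a standard production script that, for each job, records bookkeeping
# metadata and registers the output file replica.

bookkeeping = []        # stand-in for the AMI bookkeeping database
replica_catalog = {}    # stand-in for the Magda replica catalog: LFN -> sites

def run_partition(dataset, partition, release, site):
    lfn = f"{dataset}._{partition:05d}.zebra"   # hypothetical naming convention
    # ... here the real script would run Atlsim/Athena for this partition ...
    bookkeeping.append({"dataset": dataset, "partition": partition,
                        "release": release, "site": site, "lfn": lfn,
                        "status": "done"})
    replica_catalog.setdefault(lfn, []).append(site)
    return lfn

if __name__ == "__main__":
    run_partition("dc1.002000.dijet", 17, "3.2.1", "uppsala")
    print(bookkeeping[0])
    print(replica_catalog)
```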

  14. DC1 preparation: software (1) • New geometry (compared to the December DC0 geometry) • Inner Detector • Beam pipe • Pixels: services; material updated; more information in hits; better digitization • SCT tilt angle reversed (to minimize clusters) • TRT barrel: modular design • Realistic field • Calorimeter • ACBB: material and readout updates • ENDE: dead material and readout updated (last-minute update, to be avoided if possible) • HEND: dead material updated • FWDC: detailed design • End-cap calorimeters shifted by 4 cm • Cryostats split into Barrel and End-cap • Muon • AMDB p.03 (more detailed chamber cutouts) • Muon shielding update G. Poulard - NorduGrid Workshop

  15. ATLAS Geometry • Inner Detector • Calorimeters • Muon System G. Poulard - NorduGrid Workshop

  16. ATLAS/G3: A Few Numbers at a Glance • 25.5 million distinct volume copies • 23 thousand different volume objects • 4,673 different volume types • A few hundred pile-up events possible • About 1 million hits per event on average G. Poulard - NorduGrid Workshop

  17. DC1 preparation: software (2) • New persistency mechanism • AthenaROOT/IO • Used for generated events • Readable by Atlfast and Atlsim • Simulation still using Zebra G. Poulard - NorduGrid Workshop

  18. DC1/Phase I preparation: kit, scripts & tools • Kit • “ATLAS kit” (rpm) to distribute the software • It installs release 3.2.1 (all binaries) without any need of AFS • It requires: • Linux OS (RedHat 6.2 or RedHat 7.2) • CERNLIB 2001 (from the DataGrid repository) cern-0.0-2.i386.rpm (~289 MB) • It can be downloaded: • from a multi-release page (22 RPMs; total size ~250 MB) • “tar” file also available • Scripts and tools (monitoring, bookkeeping) • Standard scripts to run the production • AMI bookkeeping database G. Poulard - NorduGrid Workshop

  19. DC1/Phase I Task Flow • As an example, for one sample of di-jet events: • Event generation: 1.5 × 10^7 events in 150 partitions (10^5 events per partition), Pythia 6 → HepMC, written with Athena-Root I/O • Detector simulation: 3000 Atlsim/Geant3 + filter jobs, each reading 5000 generated events and writing ~450 filtered events of hits/digits and MC truth in Zebra format G. Poulard - NorduGrid Workshop
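The task-flow numbers above are internally consistent; a small back-of-envelope check, using only the slide's figures (the ~450 surviving events per job is approximate), is sketched below.

```python
# Back-of-envelope check of the di-jet task-flow numbers quoted above.

generated  = 1.5e7                            # generated di-jet events
partitions = 150                              # generator output partitions
per_part   = generated / partitions           # -> 1e5 events per partition

evts_per_job = 5000                           # events read by one Atlsim job
kept_per_job = 450                            # events surviving the filter (approx.)
n_jobs       = generated / evts_per_job       # -> 3000 simulation jobs, as quoted

print(f"events per partition : {per_part:.0f}")
print(f"simulation jobs      : {n_jobs:.0f}")
print(f"filter efficiency    : {kept_per_job / evts_per_job:.1%}")
print(f"simulated events out : {n_jobs * kept_per_job:.2e}")
```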

  20. DC1 preparation: validation & quality control • We defined two types of validation • Validation of the sites: • We processed the same data in the various centres and made the comparison • To ensure that the same software was running in all production centres • We also checked the random number sequences • Validation of the simulation: • We used both “old” generated data & “new” data • Validation datasets: di-jets, single particles (e, γ, μ, π), H → 4e/2γ/2e2μ/4μ • About 10^7 events reconstructed in June, July and August • We made the comparison between “old” and “new” simulated data G. Poulard - NorduGrid Workshop
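As an illustration of the site-validation step above (the same sample processed at different centres and then compared), here is a minimal sketch of a per-bin chi-square comparison between two histograms. The bin contents are made up, and the test is deliberately simpler than whatever was actually used in DC1.

```python
# Sketch of a site-validation comparison: the same reference sample is
# simulated at two centres and a simple chi-square is computed between the
# resulting histograms (bin contents below are invented for the example).

def chi2(h_ref, h_test):
    """Chi-square between two histograms, assuming Poisson errors."""
    total = 0.0
    for a, b in zip(h_ref, h_test):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total

# hypothetical jet-eta histograms from a reference site and a test site
reference = [120, 340, 560, 610, 580, 330, 110]
test_site = [118, 352, 549, 620, 571, 341, 105]

print(f"chi2/ndf = {chi2(reference, test_site) / len(reference):.2f}")
```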

  21. DC1 preparation: validation & quality control • This was a very “intensive” activity • Many findings: simulation problems or software-installation problems at sites (all eventually solved) • We should increase the number of people involved • It is a “key issue” for success! G. Poulard - NorduGrid Workshop

  22. Example: jet η distribution (di-jet sample), new simulation sample vs. old simulation sample, χ² comparison • One finding: reappearance of an old DICE version in the software installed at one site G. Poulard - NorduGrid Workshop

  23. Data Samples I • Validation samples (740k events) • single particles (e, γ, μ, π), jet scans, Higgs events • Single-particle production (30 million events) • single π (low pT; pT = 1000 GeV with 2.8 < η < 3.2) • single μ (pT = 3, …, 100 GeV) • single e and γ • different energies (E = 5, 10, …, 200, 1000 GeV) • fixed η points; η scans (|η| < 2.5); η crack scans (1.3 < η < 1.8) • standard beam spread (σz = 5.6 cm); fixed vertex z-components (z = 0, 4, 10 cm) • Minimum-bias production (1.5 million events) • different η regions (|η| < 3, 5, 5.5, 7) G. Poulard - NorduGrid Workshop

  24. Data Samples II • QCD di-jet production (5.2 million events) • different cuts on ET (hard scattering) during generation • large production of ET > 11, 17, 25, 55 GeV samples, applying particle-level filters • large production of ET > 17, 35 GeV samples, without filtering, full simulation within |η| < 5 • smaller production of ET > 70, 140, 280, 560 GeV samples • Physics events requested by various HLT groups (e/γ, Level-1, jet/ETmiss, B-physics, b-jet, μ; 4.4 million events) • large samples for the b-jet trigger simulated with default (3 pixel layers) and staged (2 pixel layers) layouts • B-physics (PL) events taken from old TDR tapes G. Poulard - NorduGrid Workshop

  25. ATLAS DC1/Phase I: July-August 2002 • Goals: produce the data needed for the HLT TDR; get as many ATLAS institutes involved as possible • Worldwide collaborative activity • Participation: 39 institutes in 18 countries • Australia, Austria, Canada, CERN, Czech Republic, Denmark, France, Germany, Israel, Italy, Japan, Norway, Russia, Spain, Sweden, Taiwan, UK, USA G. Poulard - NorduGrid Workshop

  26. ATLAS DC1 Phase I: July-August 2002 • CPU resources used: • Up to 3200 processors (5000 PIII/500 equivalent) • 110 kSI95 (~50% of one Regional Centre at LHC startup) • 71,000 CPU-days • To simulate one di-jet event: 13,000 SI95·s • Data volume: • 30 TB • 35,000 files • Output size for one di-jet event: 2.4 MB • Data kept at production site for further processing • Pile-up • Reconstruction • Analysis G. Poulard - NorduGrid Workshop
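A rough cross-check of the resource figures above; all inputs are taken from the slide, and the arithmetic is order-of-magnitude only, since single-particle events are much cheaper to simulate than di-jets.

```python
# Rough consistency check of the Phase I resource numbers quoted above.

total_power_si95 = 110e3      # ~110 kSI95 across the production sites
n_cpus           = 3200
cpu_days         = 71e3
si95_per_dijet   = 13e3       # SI95*sec to simulate one di-jet event
mb_per_dijet     = 2.4        # output size per di-jet event, MB

per_cpu = total_power_si95 / n_cpus          # ~34 SI95 per processor
work    = cpu_days * 86400 * per_cpu         # total SI95*sec delivered

print(f"equivalent di-jet events : {work / si95_per_dijet:.1e}")        # O(10^7)
print(f"10^7 di-jet events       : {1e7 * mb_per_dijet / 1e6:.0f} TB")  # vs ~30 TB quoted
```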

  27. ATLAS DC1 Phase I: July-August 2002 • 3200 CPUs • 110 kSI95 • 71,000 CPU-days • 39 institutions in 18 countries • 5 × 10^7 events generated • 1 × 10^7 events simulated • 3 × 10^7 single particles • 30 TB • 35,000 files G. Poulard - NorduGrid Workshop

  28. ATLAS DC1 Phase I: July-August 2002 G. Poulard - NorduGrid Workshop

  29. ATLAS DC1 Phase II • Provide data with and without ‘pile-up’ for HLT studies • new data samples (a huge number of requests) • Pile-up in Atlsim • “Byte stream” format to be produced • Introduction & testing of new Event Data Model (EDM) • This should include the new Detector Description • Evaluation of new persistency technology • Use of Geant4 • Production of data for Physics and Computing Model studies • Both ESD and AOD will be produced from Athena reconstruction • We would like to get the ‘large scale reconstruction’ and the ‘data-flow’ studies ready, but they are not part of Phase II • Testing of computing model & of distributed analysis using AOD • Wider use of Grid middleware (a test in November) G. Poulard - NorduGrid Workshop

  30. Pile-up • First issue is to produce the pile-up data for HLT • We intend to do this now • Code is ready • Validation is in progress • No “obvious” problems G. Poulard - NorduGrid Workshop

  31. Luminosity Effect Simulation • Aim: study interesting processes at different luminosity L • Separate simulation of physics events & minimum-bias events • Merging of the primary stream (physics) and the background stream (pile-up): both streams are stored as KINE/HITS; at digitization, N(L) minimum-bias events per bunch crossing are overlaid on each physics event to produce the DIGI output G. Poulard - NorduGrid Workshop
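The merging scheme described above can be sketched in a few lines. This is a toy illustration, not the Atlsim pile-up code, and the mean of ~23 minimum-bias interactions per 25 ns bunch crossing at L = 10^34 cm^-2 s^-1 is an assumed number that does not appear on the slides.

```python
import math
import random

# Toy sketch of luminosity-effect (pile-up) merging: each physics event from
# the primary stream is overlaid with N(L) minimum-bias events drawn from the
# background stream at digitization time.

def poisson(mean):
    """Knuth's algorithm; fine for small means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def digitize(physics_event, minbias_stream, mean_interactions=23.0):
    # mean_interactions ~23 per crossing at design luminosity is an assumption
    n = poisson(mean_interactions)
    overlay = [next(minbias_stream) for _ in range(n)]
    return {"primary": physics_event, "pileup": overlay}

if __name__ == "__main__":
    minbias = iter(range(10_000))               # stand-in background stream
    event = digitize("H->gamma gamma #42", minbias)
    print(len(event["pileup"]), "minimum-bias events overlaid")
```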

  32. Pile-up features • Different detectors have different memory times, requiring very different numbers of minimum-bias events to be read in • Silicon detectors, Tile calorimeter: t < 25 ns • Straw tracker (TRT): t < ~40-50 ns • LAr calorimeters: 100-400 ns • Muon drift tubes: 600 ns • Still we want the pile-up events to be the same in the different detectors! G. Poulard - NorduGrid Workshop
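To see why the memory times above matter, a rough back-of-envelope follows, again assuming ~23 interactions per 25 ns crossing at design luminosity (an assumption, not a slide number): the longer the memory time, the more bunch crossings, and hence minimum-bias events, must be overlaid.

```python
# Why sub-detectors need very different numbers of minimum-bias events:
# memory time / bunch spacing gives the number of crossings to overlay.

bunch_spacing_ns = 25.0
interactions_per_crossing = 23.0       # assumed, L = 1e34 cm^-2 s^-1

memory_time_ns = {                     # approximate values from the slide
    "silicon / Tile cal": 25,
    "TRT (straw tracker)": 45,
    "LAr calorimeters": 400,
    "muon drift tubes": 600,
}

for det, t in memory_time_ns.items():
    crossings = max(1, round(t / bunch_spacing_ns))
    print(f"{det:20s}: ~{crossings:3d} crossings, "
          f"~{crossings * interactions_per_crossing:.0f} min-bias events")
```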

  33. Higgs into two photons, no pile-up G. Poulard - NorduGrid Workshop

  34. Higgs into two photons, with pile-up at L = 10^34 G. Poulard - NorduGrid Workshop

  35. Pile-up production • Scheduled for October-November 2002 • Both low-luminosity (2 × 10^33) and high-luminosity (10^34) data will be prepared • Resources estimate: • 10,000 CPU-days (NCU) • 70 TB of data • 100,000 files G. Poulard - NorduGrid Workshop

  36. ATLAS DC1 Phase II (2) • Next steps will be to • run the reconstruction within the Athena framework • Most functionality should be there with release 5.0.0 • Probably not ready for ‘massive’ production • Reconstruction ready by the end of November • produce the “byte-stream” data • perform the analysis of the AOD • In parallel, the dedicated code for HLT studies is being prepared (PESA release 3.0.0) • Geant4 tests with a fairly complete geometry should be available by mid-December • Large-scale Grid test is scheduled for December • Expected “end” date: 31 January 2003 • “Massive” reconstruction is not part of DC1 Phase II G. Poulard - NorduGrid Workshop

  37. ATLAS DC1 Phase II (3) • Compared to Phase I • More automated production • “Pro-active” use of the AMI bookkeeping database to prepare the jobs and possibly to monitor the production • “Pro-active” use of the “Magda” replica catalog • We intend to run the “pile-up” production as much as possible where the data are • But we already have newcomers (countries and institutes) • We do not intend to send all the pile-up data to CERN • Scenarios to access the data for reconstruction and analysis are being studied • Use of Grid tools is ‘seriously’ considered G. Poulard - NorduGrid Workshop

  38. ATLAS DC1/Phase II: October 2002 - January 2003 • Goals: produce the data needed for the HLT TDR; get as many ATLAS institutes involved as possible • Worldwide collaborative activity • Participation: 43 institutes • Australia • Austria • Canada • CERN • China • Czech Republic • Denmark • France • Germany • Greece • Israel • Italy • Japan • Norway • Russia • Spain • Sweden • Taiwan • UK • USA G. Poulard - NorduGrid Workshop

  39. ATLAS Planning for Grid Activities • Advantages of using the Grid: • Possibility to do worldwide production in a perfectly coordinated way, using identical software, scripts and databases. • Possibility to distribute the workload adequately and automatically, without logging in explicitly to each remote system. • Possibility to execute tasks and move files over a distributed computing infrastructure by using one single personal certificate (no need to memorize dozens of passwords). • Where we are now: • Several Grid toolkits are on the market. • EDG – probably the most elaborate, but still in development. • This development goes much faster with the help of users running real applications. G. Poulard - NorduGrid Workshop

  40. Present Grid Activities • ATLAS already used Grid test-beds in DC1/1 • 11 out of 39 sites (~5% of the total production) used Grid middleware: • NorduGrid (Bergen, Grendel, Ingvar, ISV, NBI, Oslo, Lund, LSCF) • all production done on the Grid • US Grid test-bed (Arlington, LBNL, Oklahoma; more sites will join in the next phase) • used for ~10% of US DC1 production (10% = 900 CPU-days) G. Poulard - NorduGrid Workshop

  41. .... in addition • ATLAS-EDG task force • with 40 members from ATLAS and EDG (led by Oxana Smirnova) • used the EU-DataGrid middleware to rerun 350 DC1 jobs at some Tier1 prototype sites: CERN, CNAF, Lyon, RAL, NIKHEF and Karlsruhe (CrossGrid site) • done in the first half of September • Good results have been achieved: • A team of hard-working people across Europe • ATLAS software is packed into relocatable RPMs, distributed and validated • DC1 production script is “gridified”, a submission script is produced • Jobs are run at a site chosen by the resource broker • Still work needed (in progress) to reach sufficient stability and ease of use • ATLAS-EDG continues until end 2002; an interim report with recommendations is being drafted G. Poulard - NorduGrid Workshop

  42. Grid in ATLAS DC1/1: US-ATLAS test-bed, EDG test-bed, and NorduGrid production G. Poulard - NorduGrid Workshop

  43. Plans for the near future • In preparation for the reconstruction phase (spring 2003) we foresee further Grid tests in November • Perform more extensive Grid tests. • Extend the EDG to more ATLAS sites, not only in Europe. • Test a basic implementation of a worldwide Grid. • Test the inter-operability between the different Grid flavours. • Inter-operation = submit a job in region A; the job is run in region B if the input data are in B; the produced data are stored; the job log is made available to the submitter. • The EU project DataTag has a Work Package devoted specifically to inter-operation, in collaboration with the US iVDGL project; the results of the work of these projects are expected to be taken up by LCG (GLUE framework). G. Poulard - NorduGrid Workshop
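The inter-operation scenario defined above (submit in region A, run in region B where the input data sit, return the log to the submitter) can be caricatured in a few lines. The catalogue entries, dataset names and site names below are invented for the example; this is not any real broker's logic.

```python
# Toy illustration of data-driven brokering across Grid flavours: the job is
# sent to the region that already holds its input data, and the log is
# reported back to the submitting region.

replica_locations = {                      # hypothetical catalogue entries
    "dc1.002000.dijet._00017": "NorduGrid",
    "dc1.002001.minbias._00004": "US-ATLAS",
}

def broker(job_input, submitted_from):
    run_site = replica_locations.get(job_input, submitted_from)
    log = f"job on {job_input} ran at {run_site}, log returned to {submitted_from}"
    return run_site, log

if __name__ == "__main__":
    _, log = broker("dc1.002000.dijet._00017", submitted_from="EDG/CERN")
    print(log)
```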

  44. Plans for the near future (continued) • ATLAS is collaborating with DataTag-iVDGL on interoperability demonstrations in November • How far we can go will become clear during the next week(s), when we discuss with the technical experts. • The DC1 data will be reconstructed (using ATHENA) early in 2003: the scope and the way of using Grids for distributed reconstruction will depend on the results of the November/December tests. • ATLAS is fully committed to LCG and to its Grid middleware selection process • our “early tester” role has been recognized to be very useful for EDG. • We are confident that it will be the same for LCG. G. Poulard - NorduGrid Workshop

  45. Long Term Planning • Worldwide Grid tests are essential to define in detail the ATLAS distributed Computing Model. • ATLAS members are already involved in various Grid activities and also take part in inter-operability tests. In the forthcoming DCs this will become an important issue. • All these tests will be done in close collaboration with the LCG and the different Grid projects. G. Poulard - NorduGrid Workshop

  46. DC2-3-4-… • DC2: Q3/2003 – Q2/2004 • Goals • Full deployment of EDM & Detector Description • Geant4 replacing Geant3 (fully?) • Pile-up in Athena • Test the calibration and alignment procedures • Use LCG common software (POOL; …) • Make wide use of Grid middleware • Perform large-scale physics analysis • Further tests of the computing model • Scale • As for DC1: ~10^7 fully simulated events • DC3: Q3/2004 – Q2/2005 • Goals to be defined; Scale: 5 × DC2 • DC4: Q3/2005 – Q2/2006 • Goals to be defined; Scale: 2 × DC3 G. Poulard - NorduGrid Workshop

  47. Summary (1) • We learnt a lot from DC0 and the preparation of DC1 • The involvement of all the people concerned is a success • The full production chain has been put in place • The validation phase was “intensive” and “stressing”, but it is a “key issue” in the process • We have in hand the simulated events required for the HLT TDR • Use of Grid tools looks very promising G. Poulard - NorduGrid Workshop

  48. Summary (2) • For DC1/Phase II: • Pile-up preparation is in good shape • The introduction of the new EDM is a challenge by itself • Release 5 (November 12) should provide the requested functionality • Grid tests are scheduled for November/December • Geant4 tests should be ready by mid-December G. Poulard - NorduGrid Workshop

  49. Summary (3) • After DC1 • New Grid tests are foreseen in 2003 • ATLAS is fully committed to LCG • As soon as LCG-1 is ready (June 2003) we intend to participate actively in the validation effort • Dates for the next DCs should be aligned with the deployment of the LCG and of the Grid software and middleware G. Poulard - NorduGrid Workshop

  50. Summary (4): thanks to all DC-team members • A-WP1: Event generation • A-WP2: Geant3 simulation • A-WP3: Geant4 Simulation • A-WP4: Pile-up • A-WP5: Detector response • A-WP6: Data Conversion • A-WP7: Event filtering • A-WP8: Reconstruction • A-WP9: Analysis • A-WP10: Data Management • A-WP11: Tools • A-WP12: Teams (Production, Validation, …) • A-WP13: Tier Centres • A-WP14: Fast Simulation G. Poulard - NorduGrid Workshop
