
ATLAS Data Challenge on NorduGrid


Presentation Transcript


  1. ATLAS Data Challenge on NorduGrid CHEP2003 – UCSD Anders Wäänänen waananen@nbi.dk

  2. NorduGrid project • Launched in the spring of 2001 with the aim of creating a Grid infrastructure in the Nordic countries • Idea: a MONARC-style architecture with a common Tier-1 center • Partners from Denmark, Norway, Sweden, and Finland • Initially meant to be the Nordic branch of the EU DataGrid (EDG) project • 3 full-time researchers, plus a few externally funded people

  3. Motivations • NorduGrid was initially meant to be a pure deployment project • One goal was to have the ATLAS data challenge running by May 2002 • Should be based on the Globus Toolkit™ • Available Grid middleware: • The Globus Toolkit™ • A toolbox – not a complete solution • European DataGrid software • Not mature for production at the beginning of 2002 • Architecture problems

  4. A Job Submission Example [Diagram: the User Interface (UI) submits a job description (JDL) with an input sandbox to the Resource Broker, which consults the Information Service and the Replica Catalogue and passes the job via the Job Submission Service to a Compute Element, with Authorization & Authentication and Logging & Book-keeping along the way; job status queries, Brokerinfo, the output sandbox, and Storage Elements complete the picture.]

  5. Architecture requirements • No single point of failure • Should be scalable • Resource owners should have full control over their resources • As few site requirements as possible: • Local cluster installation details should not be dictated • Method, OS version, configuration, etc… • Compute nodes should not be required to be on the public network • Clusters need not be dedicated to the Grid

  6. User interface • The NorduGrid user interface provides a set of commands for interacting with the grid • ngsub – for submitting jobs • ngstat – for states of jobs and clusters • ngcat – to see stdout/stderr of running jobs • ngget – to retrieve the results from finished jobs • ngkill – to kill running jobs • ngclean – to delete finished jobs from the system • ngcopy – to copy files to, from and between file servers and replica catalogs • ngremove – to delete files from file servers and RC’s
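
  As an illustration only (not part of the original slides), a typical command-line session with these tools might look like the sketch below; the job file name myjob.xrsl, the <jobid> placeholder, and the -f option are assumptions, and option spellings may differ between client versions.

    ngsub -f myjob.xrsl     # submit the job described in the (hypothetical) xRSL file myjob.xrsl
    ngstat <jobid>          # query the state of the submitted job
    ngcat <jobid>           # look at stdout of the job while it is running
    ngget <jobid>           # retrieve the results once the job has finished
    ngclean <jobid>         # remove the finished job from the system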

  7. ATLAS Data Challenges • A series of computing challenges within Atlas of increasing size and complexity. • Preparing for data-taking and analysis at the LHC. • Thorough validation of the complete Atlas software suite. • Introduction and use of Grid middleware as fast and as much as possible.

  8. Data Challenge 1 • Main goals: • Need to produce data for High Level Trigger & Physics groups • Study performance of Athena framework and algorithms for use in HLT • High statistics needed • Few samples of up to 10^7 events in 10-20 days, O(1000) CPUs • Simulation & pile-up • Reconstruction & analysis on a large scale • learn about data model; I/O performances; identify bottlenecks etc • Data management • Use/evaluate persistency technology (AthenaRoot I/O) • Learn about distributed analysis • Involvement of sites outside CERN • use of Grid as and when possible and appropriate

  9. DC1, phase 1: Task Flow • Example: one sample of di-jet events • PYTHIA event generation: 1.5 × 10^7 events split into partitions (read: ROOT files) • Detector simulation: 20 jobs per partition, ZEBRA output • [Diagram: di-jet events generated with Pythia6 are written as HepMC partitions via Athena-Root I/O; each partition feeds Atlsim/Geant3 + Filter jobs producing ZEBRA files with Hits/Digits and MCTruth (figures on the slide: 10^5 events, 5000 evts, ~450 evts).]

  10. DC1, phase 1: Summary • July-August 2002 • 39 institutes in 18 countries • 3200 CPUs, approx. 110 kSI95 – 71 000 CPU-days • 5 × 10^7 events generated • 1 × 10^7 events simulated • 30 Tbytes produced • 35 000 files of output

  11. DC1, phase 1 for NorduGrid • Simulation • Datasets 2000 & 2003 (different event generation) assigned to NorduGrid • Total number of fully simulated events: • 287296 (from 1.15 × 10^7 input events) • Total output size: 762 GB • All files uploaded to a Storage Element (University of Oslo) and registered in the Replica Catalog.

  12. Job xRSL script
  &
  (executable="ds2000.sh")
  (arguments="1244")
  (stdout="dc1.002000.simul.01244.hlt.pythia_jet_17.log")
  (join="yes")
  (inputfiles=
    ("ds2000.sh" "http://www.nordugrid.org/applications/dc1/2000/dc1.002000.simul.NG.sh"))
  (outputfiles=
    ("atlas.01244.zebra" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.zebra")
    ("atlas.01244.his" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.his")
    ("dc1.002000.simul.01244.hlt.pythia_jet_17.log" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.log")
    ("dc1.002000.simul.01244.hlt.pythia_jet_17.AMI" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.AMI")
    ("dc1.002000.simul.01244.hlt.pythia_jet_17.MAG" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.MAG"))
  (jobname="dc1.002000.simul.01244.hlt.pythia_jet_17")
  (runtimeEnvironment="DC1-ATLAS")
  (replicacollection="ldap://grid.uio.no:389/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org")
  (maxCPUTime=2000)
  (maxDisk=1200)
  (notify="e waananen@nbi.dk")

  13. NorduGrid job submission • The user submits an xRSL file specifying the job options. • The xRSL file is processed by the User Interface. • The User Interface queries the NG Information System for resources and the NorduGrid Replica Catalog for the location of input files, then submits the job to the selected resource. • There the job is handled by the Grid Manager, which downloads or links the input files to the local session directory. • The Grid Manager submits the job to the local resource management system. • After the simulation finishes, the Grid Manager moves the requested output to Storage Elements and registers it in the NorduGrid Replica Catalog.
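
  Purely as an illustration of this flow (not from the slides), submitting the slide-12 job from the user side could look like the sketch below; the file name dc1_01244.xrsl, the <jobid> placeholder, and the -f option are assumptions.

    grid-proxy-init             # create a Grid proxy for authentication
    ngsub -f dc1_01244.xrsl     # the User Interface brokers against the Information System and submits
    ngstat <jobid>              # follow the job while the Grid Manager and the local batch system run it
    ngget <jobid>               # fetch the job's log after the registered outputs have been uploaded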

  14. NorduGrid job submission [Diagram: the User Interface sends RSL to a cluster's Gatekeeper; behind it, the Grid Manager stages input and output over GridFTP against Storage Elements (SE), using the MDS information system and the Replica Catalog (RC).]

  15. NorduGrid Production sites

  16. NorduGrid Pileup • DC1, pile-up: • Low luminosity pile-up for the phase 1 events • Number of jobs: 1300 • dataset 2000: 300 • dataset 2003: 1000 • Total output-size: 1083 GB • dataset 2000: 463 GB • dataset 2003: 620 GB

  17. Pileup procedure • Each job downloaded one zebra file from dc1.uio.no of approximately • 900 MB for dataset 2000 • 400 MB for dataset 2003 • Use locally present minimum-bias zebra files to "pile up" events on top of the original simulated ones present in the downloaded file. The output of each job was about 50% bigger than the original downloaded file, i.e.: • 1.5 GB for dataset 2000 • 600 MB for dataset 2003 • Upload output files to the dc1.uio.no and dc2.uio.no SEs • Register into the RC.
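
  As a purely hypothetical illustration (not taken from the slides), a pile-up job description could follow the slide-12 pattern, taking the simulated zebra file as a grid input and registering the enlarged output back into the RC; every logical name below is made up, and the maxDisk value (in MB) is an assumption sized to hold both the ~900 MB input and the ~1.5 GB output of a dataset-2000 job.

    (inputfiles=
      ("input.zebra" "rc://dc1.uio.no/2000/<logical-name-of-simulated-file>"))
    (outputfiles=
      ("piledup.zebra" "rc://dc1.uio.no/2000/<logical-name-of-pile-up-file>"))
    (maxDisk=3000)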

  18. Other details • At peak production, up to 200 jobs were managed by NorduGrid at the same time. • Most of the Scandinavian production clusters take part (2 of them are in the Top 500) • However, not all of them allow installation of the ATLAS software • The ATLAS job manager Atlas Commander supports the NorduGrid toolkit • Issues • Replica Catalog scalability problems • MDS / OpenLDAP hangs – solved • Software threading problems – partly solved • Problems partly in Globus libraries

  19. NorduGrid DC1 timeline • April 5th 2002 • First ATLAS job submitted (Athena Hello World) • May 10th 2002 • First pre-DC1-validation job submitted (ATLSIM test using Atlas release 3.0.1) • End of May 2002 • It was now clear that NorduGrid was mature enough to handle real production • Spring 2003 (now) • Keep running Data Challenges and improving the toolkit

  20. Quick client installation/job run • As a normal user (no system privileges required): • Retrieve nordugrid-standalone-0.3.17.rh72.i386.tgz, then:
    tar xfz nordugrid-standalone-0.3.17.rh72.i386.tgz
    cd nordugrid-standalone-0.3.17
    source ./setup.sh
  • Get a personal certificate: grid-cert-request • Install the certificate per instructions • Get authorized on a cluster • Run a job:
    grid-proxy-init
    ngsub '&(executable=/bin/echo)(arguments="Hello World")'
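
  The same hello-world job can also be kept in a small xRSL file whose attributes mirror the slide-12 example; the file name hello.xrsl, the output name hello.out, and submitting a file via ngsub -f are assumptions, not taken from the slides.

    &(executable="/bin/echo")
     (arguments="Hello World")
     (stdout="hello.out")
     (jobname="hello-world")

    ngsub -f hello.xrsl    # submit the job file
    ngget <jobid>          # retrieves hello.out once the job has finished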

  21. Resources • Documentation and source code are available for download • Main Web site: • http://www.nordugrid.org/ • ATLAS DC1 with NorduGrid • http://www.nordugrid.org/applications/dc1/ • Software repository • ftp://ftp.nordugrid.org/pub/nordugrid/

  22. The NorduGrid core group • Александр Константинов • Balázs Kónya • Mattias Ellert • Оксана Смирнова • Jakob Langgaard Nielsen • Trond Myklebust • Anders Wäänänen
