
Working with Grid Sites in ATLAS Alessandro De Salvo Alessandro.DeSalvo@roma1.infn.it 27-10-2017

This presentation provides an overview of grid concepts and how to work with grid certificates in the ATLAS project. It discusses accessing datasets, managing files, setting up software, and using FAX. It also includes links and contact information.






  1. Working with Grid Sites in ATLAS Alessandro De Salvo Alessandro.DeSalvo@roma1.infn.it 27-10-2017 Outline • Grid concepts • Working with Grid certificates • The Atlas VO • Getting info on the datasets • Managing files • Setting up the ATLAS software from CVMFS • Using FAX • Links and contacts A. De Salvo – 27 Oct 2017

  2. SECTION 1 Grid concepts

  3. What is a grid? • Relation to WWW? • Uniform easy access to shared information • Relation to distributed computing? • Local clusters • WAN (super)clusters • Condor • Relation to distributed file systems? • NFS, AFS, GPFS, Lustre, PanFS… • A grid gives selected user communities uniform access to distributed resources with independent administrations • Computing, data storage, devices, …

  4. Why is it called grid? • Analogy to the power grid • You do not need to know where your electricity comes from • Just plug in your devices • You should not need to know where your computing is done • Just plug into the grid for your computing needs • You should not need to know where your data is stored • Just plug into the grid for your storage needs

  5. Ian Foster’s checklist http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf (Globus) • A grid coordinates resources that are not subject to centralized control, • using standard, open, general-purpose protocols and interfaces, • to deliver non-trivial qualities of service • Response time • Throughput • Capacity • Availability • Security • Co-allocation of multiple resource types for complex work • Utility of the combined system significantly greater than the sum of its parts

  6. What is Cloud Computing? • Transparent use of generic computing resources off-site • Dynamically provisioned • Metered • Neutral to applications • “Rent-a-center” model • Examples: Amazon EC2, Amazon S3, Sun, private clouds, … [diagram: a site connecting through the Internet to a remote computer or data center]

  7. What is grid computing? [diagram: many sites interconnected through the Internet]

  8. What is grid computing about? • A grid facilitates collaboration between members of a supported distributed community • They can form a Virtual Organization within that grid • A grid allows distributed resources to be shared uniformly and securely for common goals • Computing • Data storage • A grid can support multiple Virtual Organizations in parallel • Sites, computer and data centers make selections according to the projects in which they participate • The quality of service may differ per VO

  9. How does a grid work? • Middleware makes multiple computer and data centers look like a single system to the user • Security • Information system • Data management • Job management • Monitoring • Accounting • Not easy!

  10. Where can we use grids? • Scientific collaborations • Can also serve in spreading know-how to developing countries • Industry? Commerce? • Research collaborations • Intra-company grids • Mostly cloud computing • Grid research may provide open standards, technologies • Homes? Schools? • E-learning • Internet Service Providers → cloud computing • Government? Hospitals? Other public services? • Beware of sensitive/private data

  11. There are many grids • EGEE – Enabling Grids for E-sciencE • Europe and beyond • OSG – Open Science Grid • USA and beyond • National • INFNGrid (It), GridPP/NGS (UK), D-Grid (De), NAREGI (Jp), … • Regional • NorduGrid (Nordic countries), BalticGrid (Baltic region), SEEGrid (S-E Europe), EUMedGrid (Mediterranean), … • Interregional • EELA (Europe + Latin America), EUIndiaGrid, EUChinaGrid • WLCG – Worldwide LHC Computing Grid • Federation of EGEE, OSG, Nordic Data Grid Facility, … • Grids of Clouds • Private/scientific Clouds • Commercial Clouds

  12. Projects collaborating with EGEE

  13. There are many communities • High-energy physics • Astrophysics • Fusion • Computational chemistry • Biomed – biological and medical research • Health-e-Child – linking pediatric centers • WISDOM – “in silico” drug and vaccine discovery • Earth sciences • UNOSAT – satellite image analysis for the UN • Digital libraries • E-learning • Industrial partners in EGEE • CGGVeritas – geophysical services • Philips

  14. WLCG • > 140 computing centers • 35 countries • Hierarchical and regional organization • 12 large centers for primary data management • CERN = Tier-0 • 11 Tier-1 centers • 10 countries • Fast network links • 38 federations of smaller Tier-2 centers [map: Tier-0 at CERN; Tier-1 centers: TRIUMF (Canada), ASGC (Taiwan), SARA-NIKHEF (NL), BNL (USA), RAL (UK), CCIN2P3 (France), PIC (Spain), CNAF (Italy), NDGF (Nordic countries), FNAL (USA), FZK (Germany); plus Tier-2 sites]

  15. WLCG Tier-1 centers

  16. WLCG sites

  17. The ATLAS computing model [diagram] • Event Builder → Event Filter at ~PB/s • Tier-0 (CERN, with MSS): calibration and first processing; some data for calibration and monitoring goes to the institutes, and calibrations flow back • Tier-1 centers (with MSS), e.g. the UK (RAL), US, Italian (CNAF) and Spanish (PIC) Regional Centres, linked at 622 Mb/s: reprocessing and group analysis • Tier-2 centers: analysis and simulation; an average Tier-2 has ~25 physicists working on one or more channels • Institutes 1-4: physics data cache and desktop workstations

  18. SECTION 2 Grid, Certificates and the ATLAS VO

  19. The VO Mechanism • The Virtual Organization mechanism provides a way to give authorization to the user during task instantiation • VOs are used to organize the credentials of sets of users • When a user submits a request to the grid, his/her credentials are compared with the information coming from the VOMS (Virtual Organization Membership Service) server • The VOMS server is populated using the information obtained from users and is managed by a VO administrator • A user included in a VO will be authorized to use all of the resources assigned to that particular VO • Different user privileges, depending on the role and group affiliation • VO: a collection of people, resources, policies and agreements

  20. The VO implementation [diagram] • VOMS (Admin+Server) instances: voms2.cern.ch and lcg-voms2.cern.ch at CERN, vo.racf.bnl.gov at BNL, plus the VOMS-Admin registration service on lcg-voms2.cern.ch • They serve OSG, LCG and NorduGrid • CERN is currently providing 2 VOMS servers (one with the old lists and one with the new); BNL is combining the info in the production server via bnl-atlas-sync • In the diagram, arrows signify dependencies (not dataflow)

  21. Certificates • Help about the ATLAS VO, certificates and CAs • https://www.racf.bnl.gov/docs/howto/grid/voatlas • The certificates usually come in an encrypted packaged form (PKCS#12) • The .p12 files may be imported directly into the browsers • To access the grid services the certificate must be split into a user certificate and a private key • Both files have to be stored in a directory called $HOME/.globus • To split a PKCS#12 certificate called my_cert.p12 into the cert & key:
openssl pkcs12 -nokeys -clcerts -in my_cert.p12 -out usercert.pem
openssl pkcs12 -nocerts -in my_cert.p12 -out userkey.pem
chmod 644 usercert.pem
chmod 600 userkey.pem
• When generating usercert.pem and userkey.pem you’ll be asked for a password to protect them • This password will be used to submit jobs to the grid • To package the userkey.pem and usercert.pem into a pkcs12 file, with name “My certificate” (optional, only used to select your certificate with a reasonable name in the browsers):
openssl pkcs12 -export -inkey userkey.pem -in usercert.pem -out my_cert.p12 -name "My certificate"
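A quick sanity check after the split, as a minimal sketch (assuming the files were placed in $HOME/.globus as described above):
# show the certificate subject and its validity window
openssl x509 -in $HOME/.globus/usercert.pem -noout -subject -dates
# verify the private key is intact (you will be prompted for its password)
openssl rsa -in $HOME/.globus/userkey.pem -check -noout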

  22. Proxies • A proxy is a temporary delegation of the user’s credentials to the services • You will be able to submit Grid/Panda jobs only when you have a valid proxy • By default the proxies have a validity of 12 hours • Maximum time allowed by the server is 96h • To open a proxy, from a grid-enabled machine:
voms-proxy-init -voms atlas
• Open a proxy with a specific group or role:
voms-proxy-init -voms atlas:/atlas/it
voms-proxy-init -voms atlas:/atlas/phys-higgs/Role=production
• To check your proxy information: voms-proxy-info • To destroy a proxy: voms-proxy-destroy
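For instance, to request the maximum 96-hour lifetime and then inspect the proxy (a sketch; the -valid option takes hours:minutes):
voms-proxy-init -voms atlas -valid 96:00
voms-proxy-info -timeleft   # remaining lifetime, in seconds
voms-proxy-info -fqan       # groups and roles attached to the proxy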

  23. How to choose the correct groups/roles in the VO • Groups • /atlas/usatlas • OSG (Open Science Grid) users only • Only US people may apply to this • /atlas/<country code> • Funding agencies • /atlas/phys,perf,trig,… • Physics and performance groups, only needed for group analysis and privileged access to data • Roles • production • managers of the official ATLAS productions • pilot • Analysis pilots (PanDA)

  24. The ATLAS VO registration process [diagram] • The user registers via VOMS-Admin and selects a VO • The ATLAS VOMS checks the request against the CERN HR database and notifies the VO administrator • The VO Manager approves the registration • The resources and authorization tools of LCG, OSG and NorduGrid then pick up the new membership

  25. Atlas VO FAQs (incomplete list) • What should I do if I need to renew or change my certificate in the VO? • You need to add the new certificate DN via VOMS-Admin • https://voms2.cern.ch:8443/voms/atlas/admin/home.action • https://lcg-voms2.cern.ch:8443/voms/atlas/admin/home.action • What if I need to change VO or leave ATLAS? • You should contact the VO managers, in order to be unregistered from the old VO, and register again to the new VO • The VOMRS registration server rejects my registration attempt saying that the email I’m using does not correspond to any ATLAS user at CERN • Check if you are correctly registered as an ATLAS user at CERN • Use the email address you have registered at CERN • Other VO-related problems • Please contact project-lcg-vo-atlas-admin@cern.ch

  26. Use of the Grid in Atlas • In order to use the grid as an Atlas member you need • A personal certificate, correctly installed in your machine (see the ATLAS computing workbook https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/WorkBook) • To be correctly registered into the ATLAS VO • To have access to a grid-enabled front-end machine or just access to CVMFS • https://twiki.atlas-canada.ca/bin/view/AtlasCanada/ATLASLocalRootBase • Most common applications of the grid in Atlas • PanDA (Production ANd Distributed Analysis) • http://bigpanda.cern.ch • Official MC productions and data reconstruction • User analysis • Distributed Storage • Monitoring and historical views (dashboards) • http://dashboard.cern.ch/atlas/ • All ADC monitoring pages • http://adc-monitoring.cern.ch/

  27. SECTION 3 Accessing data stored in the Grid

  28. Where are the files stored? • The production files are stored in several Storage Elements, scattered around the world • Direct access to the files is allowed for the ATLAS VO users • Different tools to access the files, depending on the file location • File access (sites with POSIX filesystems) • Xrootd • https • Tools for • Searching for a specific file • Getting the file locally (local storage element or local machine)

  29. How are the files organized? • The files stored in the sites are organized in reserved disk spaces, called Space Tokens • DATADISK/DATATAPE → Real data • GROUPDISK → Group Analysis data (now included in DATADISK) • LOCALGROUPDISK → Local Analysis Group data • PRODDISK → Production data buffer • SCRATCHDISK → Temporary analysis data • Users can ask for replication to the LOCALGROUPDISK Space Tokens of remote sites • Everybody can read from there, but only the central data management system is able to write • The other space tokens are not manageable by users; only central production can ask for replications there • ALL the space tokens are ATLAS-wide readable • The results of the analysis jobs must be stored in the ATLAS SCRATCHDISK Space Token • Cleaned from time to time (normally 30 days) • Users are responsible for replicating the data stored in SCRATCHDISK to the LOCALGROUPDISK of a given site (see the sketch below)
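A minimal sketch of that replication step with the Rucio client; the dataset name, the account myname and the RSE INFN-ROMA1_LOCALGROUPDISK are placeholders:
lsetup rucio                        # set up the Rucio client (see Section 4)
voms-proxy-init -voms atlas
# ask for 1 replica of your output dataset on a LOCALGROUPDISK endpoint
rucio add-rule user.myname:user.myname.myoutput 1 INFN-ROMA1_LOCALGROUPDISK
rucio list-rules --account myname   # confirm the rule was registered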

  30. Data Placement • PD2P: dynamic placement of the data at the T2s, based on the data popularity • Up to now the jobs run in the sites where the data are stored • This locality will break when we fully use the remote access protocols, like XRootD and HTTPS [map: Tier-0 at CERN and the Tier-1 centers, with the Italian cloud highlighted]

  31. The Atlas Distributed Data Management system (Rucio) • File-level granularity • Rucio accounts • A Rucio user is identified by his/her credentials, like X509 certificates, username/password, or token • Data Identifiers • Files, datasets and containers follow an identical naming scheme which is composed of two strings: the scope and a name. The combination of both is called a data identifier (DID) • Replica management • Replica management is based on replication rules defined on logical files • Accounting and quota • Quota is a policy limit which the system applies to an account • Rucio accounts are only accounted for the files they set replication rules on
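These concepts map directly onto the command-line client, for example (a sketch, with the account name myname as a placeholder):
rucio whoami                        # the Rucio account mapped to your credentials
rucio list-rules --account myname   # replication rules owned by the account
rucio list-account-usage myname     # usage against the account quota, per RSE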

  32. Getting files using DDM • The ATLAS Distributed Data Management system, codenamed Rucio • http://rucio.cern.ch/ • Using the Rucio clients, assuming you already have a grid environment and a valid VOMS proxy (with a nickname) • Setup the Rucio clients (in a clean shell):
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
localSetupRucioClients
• Get the list of the datasets matching a pattern (wildcards like ‘*’ are allowed): rucio list-dids <dataset name> • Get the list of the files in a dataset: rucio list-files <dataset name> • Get the path of the files in a dataset: rucio list-file-replicas [--rse <RSE>] <dataset name> • Get a dataset (if the destination directory is omitted the files are copied to the local directory): rucio download <dataset name> • Users cannot access files on tape or on Tier-0 privileged pools • The files must be replicated to some other location before they can be accessed • Replication is possible for all the files and data to be accessed locally on remote sites • Destination must be the LOCALGROUPDISK space token • A worked example follows below
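Put together, a typical look-up-and-fetch session might read as follows (a sketch; the mc15_13TeV pattern is only an illustration):
setupATLAS
localSetupRucioClients
voms-proxy-init -voms atlas
rucio list-dids 'mc15_13TeV:mc15_13TeV.*DAOD_TOPQ1*'   # find matching datasets
rucio list-files mc15_13TeV:<dataset name>             # inspect the content
rucio download mc15_13TeV:<dataset name>               # copy to the current directory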

  33. The Rucio UI • Rucio UI: the new user access point to Data Transfer • http://rucio.cern.ch/ • Monitoring of subscriptions/rules, space token usage, etc. • R2D2: the new Data Transfer interface • Reports

  34. SECTION 4 Setting up the ATLAS software from CVMFS

  35. CVMFS • Dynamic software distribution model via CVMFS • Virtual software installation by means of an HTTP File System • Data Store • Compressed Chunks (Files) • Eliminates Duplicates • File Catalog • Directory Structure • Symlinks • SHA1 of Regular Files • Digitally Signed • Time to Live • Nested Catalogs • Distribution of the condition files via CVMFS • Export the experiment software as read-only • Mounted in the remote nodes via the fuse module • Local cache for faster access • Benefits of a squid hierarchy to guarantee performance, scalability and reliability • Same squid type as the one used for Frontier
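On a node with CVMFS mounted, the ATLAS repository can be checked directly; a minimal sketch using the standard cvmfs_config utility:
ls /cvmfs/atlas.cern.ch/repo/        # triggers the mount on autofs-managed systems
cvmfs_config probe atlas.cern.ch     # verify the repository is reachable
cvmfs_config stat atlas.cern.ch      # cache, catalog and network statistics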

  36. Setting up the ATLAS software via CVMFS [1] • Simple setup via ATLASLocalRootBase:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
• Many software components available via lsetup <tool1> [ <tool2> ...] (see lsetup -h):
lsetup agis - ATLAS Grid Information System
lsetup asetup - (or asetup) to set up an Athena release
lsetup atlantis - Atlantis: event display
lsetup eiclient - Event Index
lsetup emi - EMI: grid middleware user interface
lsetup fax - Federated XRootD data storage access (FAX)
lsetup ganga - Ganga: job definition and management client
lsetup lcgenv - lcgenv: set up tools from the CVMFS SFT repository
lsetup panda - Panda: Production ANd Distributed Analysis
lsetup pod - Proof-on-Demand (obsolete)
lsetup pyami - pyAMI: ATLAS Metadata Interface python client
lsetup rcsetup - (or rcSetup) to set up an ASG release
lsetup root - ROOT data processing framework
lsetup rucio - distributed data management system client
lsetup sft - set up tools from the SFT repo (use lcgenv instead)
lsetup xrootd - XRootD data access
• Helper commands: advancedTools (advanced tools menu), diagnostics (diagnostic tools menu), helpMe (more help), printMenu (show this menu), showVersions (show versions of installed software) • A typical session is sketched below
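A typical interactive session on top of ATLASLocalRootBase might then look like this (a sketch; the Athena release number 21.0.20 is only an illustrative placeholder):
setupATLAS
lsetup rucio root                # DDM client plus ROOT in one go
voms-proxy-init -voms atlas
asetup 21.0.20,AtlasProduction   # set up a specific (hypothetical) Athena release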

  37. Setting up the ATLAS software via CVMFS [2] • Pros • CVMFS is available in all the sites • ATLASLocalRootBase is sharing the releases with the standard production/analysis jobs • Nightlies available and updated every night in CVMFS • Simple setup, access to all the tools needed for the analysis • Even the grid middleware can be set up • Constantly updated • Software pre-configured statically or configured dynamically • Automatic local site settings • Faster access and less disk space used than traditional shared filesystems • Cons • CVMFS is not available offline, unless the files are already in cache

  38. SECTION 5 Working with FAX

  39. FAX: the Federated ATLAS Storage System using XRootD • FAX (Federated ATLAS storage systems using XRootD) brings Tier 1, Tier 2 and Tier 3 storage resources together into a common namespace, accessible from anywhere • Based on the XRootD protocol and data distribution infrastructure • Client software tools like ROOT or xrdcp can use FAX to reach storage services regardless of location • Increases in network bandwidth and data-structure-aware caching mechanisms (such as TTreeCache) make this possible • FAX can be used as failover in jobs (not enabled by default) [plot: goal reached, >96% of data covered; ATLAS Jamboree, Dec 2014]

  40. Using FAX: basic introduction [1] • Pre-requisites • CVMFS • VOMS proxy • Set up FAX using CVMFS and ROOT:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
localSetupFAX --rootVersion=current-SL6
• Check dataset availability in FAX:
fax-is-dataset-covered <scope>:<dataset name>
Example: fax-is-dataset-covered mc15_13TeV:mc15_13TeV.410000.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413
• Copy a dataset to the local storage • Supports multiple streams, retries, partial dataset copy, skipping non-root files, timeouts and more:
fax-get <scope>:<dataset name>
Example: fax-get mc15_13TeV:mc15_13TeV.410000.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413
• Find global logical file names (gLFNs):
fax-get-gLFNs <scope>:<dataset name>
Example: fax-get-gLFNs mc15_13TeV:mc15_13TeV.410000.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413
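Chained together, a coverage check followed by saving the gLFN list for later use (a sketch reusing the TOPQ1 example above; the file name my_list_of_gLFNS.txt matches the prun usage on the next slide):
DS=mc15_13TeV:mc15_13TeV.410000.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413
fax-is-dataset-covered $DS                 # report how much of the dataset FAX covers
fax-get-gLFNs $DS > my_list_of_gLFNS.txt   # save the gLFNs for prun --pfnList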

  41. Using FAX: basic introduction [2] • Copy a file from FAX to local disk • The $STORAGEPREFIX depends on your local storage element:
xrdcp $STORAGEPREFIX/atlas/rucio/<scope>:<file name> /tmp/myLocalCopy.root
• Open and inspect a file with ROOT • Using a FAX-enabled storage element:
TFile *f = TFile::Open("root://grid-cert-03.roma1.infn.it//atlas/rucio/<scope>:<file name>")
• Using a redirector (here using the IT redirector):
TFile *f = TFile::Open("root://atlas-xrd-it.cern.ch//atlas/rucio/<scope>:<file name>")
Example: TFile *f = TFile::Open("root://atlas-xrd-it.cern.ch//atlas/rucio/mc15_13TeV:DAOD_TOPQ1.06405917._000001.pool.root.1")
• Using FAX from a prun job • Instead of giving the --inDS myDataset option, provide it with --pfnList my_list_of_gLFNS.txt, where my_list_of_gLFNS.txt is the output of fax-get-gLFNs (see the sketch below)
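A minimal prun invocation along those lines (a sketch; the macro, output file and output dataset names are placeholders, and the options should be checked against prun --help for your client version):
lsetup panda
prun --exec "root -b -q myMacro.C" \
     --pfnList my_list_of_gLFNS.txt \
     --outputs myHist.root \
     --outDS user.mynickname.faxtest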

  42. SECTION 6 Links and contacts

  43. Documentation • Wiki pages • ATLAS Computing Workbook • https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/WorkBook • ATLAS Distributed Data Management • https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DistributedDataManagement • Distributed Analysis Support (DAST) • https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/AtlasDAST • ATLASLocalRootBase • https://twiki.atlas-canada.ca/bin/view/AtlasCanada/ATLASLocalRootBase • LCG/EGEE and the ATLAS VO • https://www.racf.bnl.gov/docs/howto/grid/voatlas

  44. Contacts • Software distribution • atlas-grid-install@cern.ch • Atlas VO • project-lcg-vo-atlas-admin@cern.ch • DDM • https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMOperationsGroup • LCG • http://www.ggus.org • Production System • atlas-project-adc-operations@cern.ch • Distributed Analysis Support • hn-atlas-dist-analysis-help@cern.ch • ATLAS Italy computing contacts • atlas-it-t2-op@lists.infn.it (T2 support) • atl-usercalc@lists.infn.it (ATLAS Italy Computing list)
