
Managing distributed computing resources with DIRAC


Presentation Transcript


  1. Managing distributed computing resources with DIRAC A.Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille 12-17 September 2011, NEC’11, Varna

  2. Outline • DIRAC Overview • Main subsystems • Workload Management • Request Management • Transformation Management • Data Management • Use in LHCb and other experiments • DIRAC as a service • Conclusion

  3. Introduction • DIRAC is first of all a framework to build distributed computing systems • Supporting Service Oriented Architectures • GSI compliant secure client/service protocol • Fine grained service access rules • Hierarchical Configuration service for bootstrapping distributed services and agents • This framework is used to build all the DIRAC systems: • Workload Management • Based on Pilot Job paradigm • Production Management • Data Management • etc
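To make the client/service framework concrete, here is a minimal sketch of a DIRAC-style secure RPC call over the DISET protocol, assuming a configured DIRAC client installation and a valid grid proxy; the service path and method name (WorkloadManagement/JobMonitoring, getJobsStatus) are indicative examples and may differ between DIRAC versions.

```python
# Minimal sketch of a DIRAC-style secure RPC call (assumes a DIRAC client
# installation, a configured Configuration Service, and a valid proxy).
from DIRAC.Core.Base import Script
Script.parseCommandLine()           # bootstrap: contacts the Configuration Service

from DIRAC.Core.DISET.RPCClient import RPCClient

monitoring = RPCClient("WorkloadManagement/JobMonitoring")
result = monitoring.getJobsStatus([12345])   # hypothetical job ID

if result["OK"]:                    # DIRAC services return S_OK / S_ERROR dicts
    print(result["Value"])
else:
    print("Call failed: %s" % result["Message"])
```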

  4. DIRAC WMS architecture (diagram): Physicist users and the Production Manager submit jobs to the central Matcher Service; EGEE, CREAM, NDG and EELA Pilot Directors deploy pilots to the EGI/WLCG grid (via CREAM CEs), the GISELA grid and the NDG grid.

  5. User credentials management • The WMS with Pilot Jobs requires a strict user proxy management system • Jobs are submitted to the DIRAC Central Task Queue with the credentials of their owner (VOMS proxy) • Pilot Jobs are submitted to a Grid WMS with the credentials of a user with a special Pilot role • The Pilot Job fetches the user job and the job owner’s proxy • The User Job is executed with its owner’s proxy, used to access SEs, catalogs, etc • The DIRAC Proxy Manager service provides the necessary functionality • Proxy storage and renewal • Possibility to outsource proxy renewal to a MyProxy server
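The pilot-side credential flow can be sketched as follows; this is an illustration of the mechanism described above, not the actual DIRAC pilot code, and download_proxy is a hypothetical stand-in for the Proxy Manager client call.

```python
# Illustrative sketch of the pilot-side credential flow: the pilot runs with a
# "pilot role" proxy, fetches the matched job owner's proxy from the Proxy
# Manager, and runs the payload under that proxy.
import os
import subprocess
import tempfile

def run_payload(job, proxy_manager):
    # Hypothetical call: returns the owner's VOMS proxy as PEM bytes.
    owner_proxy_pem = proxy_manager.download_proxy(job["OwnerDN"], job["OwnerGroup"])

    with tempfile.NamedTemporaryFile(delete=False, suffix=".pem") as proxy_file:
        proxy_file.write(owner_proxy_pem)

    env = dict(os.environ, X509_USER_PROXY=proxy_file.name)
    # The payload now accesses SEs and catalogs with the owner's credentials.
    return subprocess.call(job["Executable"], shell=True, env=env)
```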

  6. Direct submission to CEs • The gLite WMS is now used just as a pilot deployment mechanism • Limited use of its brokering features • For jobs with input data the destination site is already chosen • Multiple Resource Brokers have to be used because of scalability problems • DIRAC supports direct submission to CEs • CREAM CEs • Individual site policies can be applied • The site chooses how much load it can take (pull vs push paradigm) • Direct measurement of the site state by watching pilot status info • This is a general trend • All the LHC experiments have declared they will eventually abandon the gLite WMS
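A rough sketch of the pull-model logic behind direct pilot submission: the director caps the number of pilots per CE by both the site's declared policy and the actual task-queue demand. All names and numbers are illustrative.

```python
# Pull model: pilots are sent only where there is demand and within the
# limits the site itself has chosen.
def pilots_to_submit(waiting_tasks, running_pilots, site_cap):
    """How many new pilots a director should send to one CE."""
    demand = max(waiting_tasks - running_pilots, 0)   # don't over-submit
    room = max(site_cap - running_pilots, 0)          # respect site policy
    return min(demand, room)

for ce in [{"name": "ce01.example.org", "waiting": 120, "running": 40, "cap": 100}]:
    n = pilots_to_submit(ce["waiting"], ce["running"], ce["cap"])
    print("submit %d pilots to %s" % (n, ce["name"]))
```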

  7. DIRAC sites (On-site Director vs Off-site Director) • Dedicated Pilot Director per site or group of sites • On-site Director • Site managers have full control of the LHCb payloads • Off-site Director • The site delegates control to the central service • The site only has to define a dedicated local user account • Payloads are submitted through an SSH tunnel • In both cases the payload is executed with the owner’s credentials
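A minimal sketch of what off-site submission through an SSH tunnel amounts to, assuming a dedicated local account and a PBS-style batch system on the site; host, account, and commands are placeholders rather than the actual DIRAC SSH submission code.

```python
# Off-site pilot submission sketch: copy a pilot wrapper to the dedicated
# local account and submit it to the site's batch system over SSH.
import subprocess

SITE = "diracpilot@cluster.example.org"     # dedicated local user account
WRAPPER = "pilot-wrapper.sh"

subprocess.check_call(["scp", WRAPPER, "%s:" % SITE])
subprocess.check_call(["ssh", SITE, "qsub %s" % WRAPPER])   # e.g. a Torque/PBS site
```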

  8. DIRAC Sites • Several DIRAC sites in production in LHCb • E.g. Yandex • 1800 cores • Second largest MC production site • Interesting possibility for small user communities or infrastructures, e.g. • contributing local clusters • building regional or university grids

  9. WMS performance • Up to 35K concurrent jobs in ~120 distinct sites • Limited by the resources available to LHCb • 10 mid-range servers hosting DIRAC central services • Further optimizations to increase the capacity are possible • Hardware, database optimizations, service load balancing, etc

  10. Belle (KEK) use of Amazon EC2 • VM scheduler developed for the Belle MC production system • Dynamic VM spawning taking spot prices and Task Queue state into account (Thomas Kuhr, Belle)
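The scheduling decision described above can be sketched roughly as follows; the bid, VM size, and helper values are illustrative, not the actual Belle configuration.

```python
# Sketch of spot-price-aware VM spawning: start instances only while there
# are waiting tasks and the current spot price is below the maximum bid.
MAX_BID = 0.10          # USD per instance-hour, illustrative value
JOBS_PER_VM = 8         # slots per VM, illustrative

def vms_to_spawn(waiting_jobs, running_vms, spot_price):
    if spot_price >= MAX_BID:
        return 0                                  # too expensive right now
    needed = -(-waiting_jobs // JOBS_PER_VM)      # ceiling division
    return max(needed - running_vms, 0)

print(vms_to_spawn(waiting_jobs=200, running_vms=10, spot_price=0.04))   # -> 15
```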

  11. Belle use of Amazon EC2 • Various computing resources combined in a single production system • KEK cluster • LCG grid sites • Amazon EC2 • Common monitoring, accounting, etc (Thomas Kuhr, Belle II)

  12. Belle II Raw Data Storage and Processing • Starting in 2015 after the KEK accelerator upgrade • 50 ab⁻¹ by 2020 • Computing model • Data rate of 1.8 GB/s (high-rate scenario) • Using the KEK computing center, grid and cloud resources • The Belle II distributed computing system is based on DIRAC • Workflow (diagram): MC production and ntuple production, ntuple analysis (Thomas Kuhr, Belle II)

  13. Support for MPI Jobs • MPI Service developed for applications in the GISELA Grid • Astrophysics, BioMed, Seismology applications • No special MPI support on sites is required • MPI software is installed by Pilot Jobs • MPI ring usage optimization • Ring reuse for multiple jobs • Lower load on the gLite WMS • Variable ring sizes for different jobs • Possible usage for HEP applications: • PROOF on Demand dynamic sessions
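A sketch of the ring-reuse idea: once pilots have assembled an MPI ring of a given size, the ring keeps serving matching MPI tasks instead of being torn down after a single job, which is what lowers the load on the gLite WMS. The task-queue interface shown is hypothetical.

```python
# Ring reuse sketch: an assembled MPI ring pulls several matching tasks
# before it is released.
def serve_ring(ring_size, task_queue, max_jobs=10):
    """Run up to max_jobs MPI tasks whose requested size fits this ring."""
    executed = 0
    while executed < max_jobs:
        task = task_queue.pop_matching(ring_size)   # hypothetical TQ call
        if task is None:
            break                                   # nothing fits: release the ring
        task.run(nodes=ring_size)                   # e.g. mpirun across the ring
        executed += 1
    return executed
```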

  14. Coping with failures • Problem: distributed resources and services are unreliable • Software bugs, misconfiguration • Hardware failures • Human errors • Solution: redundancy and asynchronous operations • DIRAC services are redundant • Geographically: Configuration, Request Management • Several instances for any service

  15. Request Management System • A Request Management System (RMS) to accept and execute asynchronously any kind of operation that can fail • Data upload and registration • Job status and parameter reports • Requests are collected by RMS instances on VO-boxes at the 7 Tier-1 sites • Extra redundancy in VO-box availability • Requests are forwarded to the central Request Database • For keeping track of pending requests • For efficient bulk request execution
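The RMS principle can be illustrated with a small sketch: an operation that may fail is recorded as a persistent request and retried later by an agent instead of failing the whole job. The classes below are illustrative, not the actual DIRAC RequestClient API.

```python
# Sketch of "record and retry later": a failed replica registration becomes
# a stored request that an RMS agent will execute asynchronously.
import json
import time

class RequestDB:
    """Toy persistent request store (stand-in for the Request Database)."""
    def __init__(self, path="requests.json"):
        self.path = path
    def store(self, request):
        with open(self.path, "a") as f:
            f.write(json.dumps(request) + "\n")

def register_replica_or_defer(catalog, lfn, se, db):
    try:
        catalog.register(lfn, se)                 # may fail: catalog down, timeout...
    except Exception as exc:
        db.store({"Operation": "RegisterReplica", # executed later by an RMS agent
                  "LFN": lfn, "SE": se,
                  "Error": str(exc), "Time": time.time()})
```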

  16. DIRAC Transformation Management • Data driven payload generation based on templates • Generating data processing and replication tasks • LHCb specific templates and catalogs

  17. Data Management • Based on the Request Management System • Asynchronous data operations • transfers, registration, removal • Two complementary replication mechanisms • Transfer Agent • user data • public network • FTS service • production data • transfers over the private OPN network • Smart pluggable replication strategies
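A minimal sketch of the channel choice behind the two replication mechanisms; the dispatch rule and the example LFNs are illustrative.

```python
# Production data goes through FTS over the private OPN; user data is handled
# by the Transfer Agent over the public network.
def pick_replication_channel(lfn, is_production):
    if is_production:
        return "FTS"             # bulk, scheduled transfers on the OPN
    return "TransferAgent"       # direct third-party copy on the public network

for lfn, prod in [("/lhcb/data/2011/RAW/run0001.raw", True),     # illustrative LFNs
                  ("/lhcb/user/a/auser/ntuple.root", False)]:
    print(lfn, "->", pick_replication_channel(lfn, prod))
```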

  18. Transfer accounting (LHCb)

  19. ILC using DIRAC • ILC CERN group • Using DIRAC Workload Management and Transformation systems • 2M jobs run in the first year • Instead of 20K planned initially • DIRAC FileCatalog was developed for ILC • More efficient than LFC for common queries • Includes user metadata natively
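As an illustration of metadata-driven lookups in the DIRAC File Catalog, a query of the kind the ILC group runs might look like the sketch below; the method name and metadata fields are indicative and may vary between DIRAC versions.

```python
# Hedged sketch of a metadata query against the DIRAC File Catalog
# (assumes a DIRAC client installation and a valid proxy).
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

fc = FileCatalogClient()
# Metadata names and values below are illustrative.
result = fc.findFilesByMetadata({"Machine": "ilc", "Energy": 500}, path="/ilc")
if result["OK"]:
    print("%d files found" % len(result["Value"]))
else:
    print("Query failed: %s" % result["Message"])
```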

  20. DIRAC as a service • DIRAC installation shared by a number of user communities and centrally operated • EELA/GISELA grid • gLite based • DIRAC is part of the grid production infrastructure • Single VO • French NGI installation • https://dirac.in2p3.fr • Started as a service for grid tutorials support • Serving users from various domains now • Biomed, earth observation, seismology, … • Multiple VOs

  21. DIRAC as a service • Necessity to manage multiple VOs with a single DIRAC installation • Per VO pilot credentials • Per VO accounting • Per VO resources description • Pilot directors are VO aware • Job matching takes pilot VO assignment into account
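VO-aware matching can be sketched as follows: the pilot carries the VO it was submitted for, and the matcher only hands it payloads from that VO's task queues. The data structures are simplified stand-ins for the real Task Queue records.

```python
# Sketch of VO-aware job matching in a multi-VO DIRAC installation.
def match_job(pilot_vo, task_queues):
    """Return the first waiting job whose owner VO matches the pilot VO."""
    for tq in task_queues:
        if tq["VO"] == pilot_vo and tq["Jobs"]:
            return tq["Jobs"].pop(0)
    return None

queues = [{"VO": "biomed", "Jobs": ["job-101"]},      # illustrative VOs and jobs
          {"VO": "seismo", "Jobs": []}]
print(match_job("biomed", queues))   # -> job-101
```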

  22. DIRAC Consortium • Other projects are starting to use or evaluate DIRAC • CTA, SuperB, BES, VIP (medical imaging), … • Contributing to DIRAC development • Increasing the number of experts • Need for a user support infrastructure • Turning DIRAC into an Open Source project • DIRAC Consortium agreement in preparation • IN2P3, Barcelona University, CERN, … • http://diracgrid.org • News, docs, forum

  23. Conclusions • DIRAC has been successfully used in LHCb for all distributed computing tasks in the first years of LHC operations • Other experiments and user communities have started to use DIRAC, contributing their developments to the project • The DIRAC open source project is being built now to bring the experience of HEP computing to other experiments and application domains

  24. Backup slides

  25. LHCb in brief • Experiment dedicated to studying CP violation, responsible for the dominance of matter over antimatter • Matter-antimatter difference studied using the b-quark (beauty) • High precision physics (tiny difference…) • Single-arm spectrometer • Looks like a fixed-target experiment • Smallest of the 4 big LHC experiments • ~500 physicists • Nevertheless, computing is also a challenge…

  26. LHCb Computing Model

  27. Tier0 Center • Raw data shipped in real time to Tier-0 • Resilience enforced by a second copy at the Tier-1’s • Rate: ~3000 evts/s (35 kB/event) at ~100 MB/s • Part of the first-pass reconstruction and re-reconstruction • Acting as one of the Tier1 centers • Calibration and alignment performed on a selected part of the data stream (at CERN) • Alignment and tracking calibration using dimuons (~5/s) • Also used for validation of new calibrations • PID calibration using Ks, D* • CAF – CERN Analysis Facility • Grid resources for analysis • Direct batch system usage (LXBATCH) for SW tuning • Interactive usage (LXPLUS)
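A quick arithmetic check of the quoted throughput: ~3000 evts/s × 35 kB/event ≈ 105 MB/s, consistent with the ~100 MB/s figure.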

  28. Tier1 Center • Real data persistency • First pass reconstruction and re-reconstruction • Data Stripping • Event preselection in several streams (if needed) • The resulting DST data shipped to all the other Tier1 centers • Group analysis • Further reduction of the datasets, μDST format • Centrally managed using the LHCb Production System • User analysis • Selections on stripped data • Preparing N-tuples and reduced datasets for local analysis

  29. Tier2-Tier3 centers • No assumption of local LHCb-specific support • MC production facilities • Small local storage requirements to buffer MC data before shipping to the respective Tier1 center • User analysis • No assumption of user analysis in the baseline Computing Model • However, several distinguished centers are willing to contribute • Analysis (stripped) data replication to T2-T3 centers by site managers • Full or partial samples • Increases the amount of resources capable of running user analysis jobs • Analysis data at T2 centers available to the whole Collaboration • No special preferences for local users
