
ATLAS Computing TDR


Presentation Transcript


1. ATLAS Computing TDR, Lamberto Luminari, CSN1 – Napoli, 22 September 2005

2. Computing TDR: changes to the Computing Model
• Days of operation in 2007: 100 -> 25-50
• Access to resources at the various centres:
  • Access to the Tier-0 facility is granted only to people in the central production group and to those providing the first-pass calibration.
  • Access to the Tier-1 facilities is essentially restricted to the production managers of the working groups and to the central production group for reprocessing.
  • In principle, all members of the ATLAS virtual organisation have access to a given Tier-2. In practice (and for operational optimization), heightened access to CPU and resources may be given to specific working groups at a particular site, according to a local policy agreed with the ATLAS central administration, in such a way that the ATLAS global policy is enforced over the aggregate of all sites. For example, the DPDs for the Higgs working group may be replicated to a subset of Tier-2 facilities, with the working-group members given heightened access to those facilities.

3. Computing TDR: changes to the Computing Model (2)
• Tier-3 resources
• There will be a continuing need for local resources within an institution to store user ntuple-equivalents and to allow work to proceed off the Grid. User expectations for these facilities will clearly grow, and a site will typically already provide terabytes of storage for local use. Such ‘Tier-3’ facilities (which may be collections of desktop machines or local institute clusters) should be Grid-enabled, both to allow job submission to and retrieval from the Grid, and to permit their resources to be used temporarily, and by agreement, as part of the Tier-2 activities. Such resources may be useful for simulation, or for the collective analysis of datasets shared within a working group, for some of the time. The size of Tier-3 resources will depend on the size of the local user community and on other factors, such as any specific software-development or analysis activity foreseen at a given institute; these resources are therefore neither centrally planned nor controlled. It is nevertheless assumed that every active user will need O(1 TB) of local disk storage and a few kSI2k of CPU capacity to efficiently analyse ATLAS data.

4. Computing TDR: AOD production
• … As AOD events will be read many times more often than ESD and RAW data, AOD events are physically clustered on output by trigger, physics channel or other criteria that reflect analysis access patterns. This means that an AOD production job, unlike an ESD production job, produces many output files. The baseline streaming model is that each AOD event is written to exactly one stream: the AOD output streams form a disjoint partition of the run. All streams produced in first-pass reconstruction share the same definition of AOD. On the order of 10 streams are anticipated in first-pass reconstruction…
• … Alternative models have been considered, and could also be viable. It is clear from the experience of the Tevatron experiments that a unique solution is not immediately evident. The above scenario reflects the best current understanding of a viable scheme, taking into account the extra constraints of the considerably larger ATLAS dataset. It relies heavily on the use of event collections and the TAG system. These methods are only undergoing their first serious tests at the time of writing. However, the system being devised is flexible, and can (within limits) sustain somewhat earlier event streaming and modestly overlapping streams without drastic technical or resource implications.
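
The exclusive streaming described above can be sketched in a few lines of Python. This is not ATLAS code: the stream names, trigger bits and first-match priority rule are assumptions chosen only to show how a disjoint partition of the run is obtained, with each event written to exactly one of a handful of streams.

```python
# Illustrative sketch of exclusive AOD streaming (hypothetical stream names and trigger bits).
# Each event is assigned to exactly one stream, so the streams form a disjoint partition of the run.

# Priority-ordered stream definitions: (stream name, predicate on the event's fired trigger bits).
STREAM_RULES = [
    ("egamma", lambda bits: "e25i" in bits or "2e15i" in bits),
    ("muon",   lambda bits: "mu20i" in bits),
    ("jetmet", lambda bits: "j160" in bits or "xe80" in bits),
]
DEFAULT_STREAM = "minbias"  # catch-all so the partition covers the whole run

def assign_stream(trigger_bits):
    """Return the single stream an event is written to (first matching rule wins)."""
    for name, passes in STREAM_RULES:
        if passes(trigger_bits):
            return name
    return DEFAULT_STREAM

# Toy run: each event is represented only by its set of fired trigger bits.
events = [{"e25i"}, {"mu20i", "j160"}, {"j160"}, set()]
streams = {}
for bits in events:
    streams.setdefault(assign_stream(bits), []).append(bits)

# Every event appears in exactly one output list, e.g.
# {'egamma': [{'e25i'}], 'muon': [{'mu20i', 'j160'}], 'jetmet': [{'j160'}], 'minbias': [set()]}
print(streams)
```

Relaxing the first-match rule for selected signatures would give the modestly overlapping streams mentioned on the slide.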

5. Computing TDR: Offline software
Several orthogonal domain decompositions have been identified:
• The first spans the ATLAS detector subsystems:
  • Inner detector (pixel detector + silicon strip detector + transition radiation tracker).
  • Liquid argon calorimeter.
  • Tile calorimeter.
  • Muon spectrometer.
• The primary data-processing activities that must be supported for all of these detector subsystems are:
  • Event generation, simulation, digitization, pile-up, detector reconstruction, combined reconstruction, physics analysis, high-level triggering, online monitoring, calibration and alignment processing.
• Further domain decompositions cover the infrastructure needed to support the software development activity, and components that derive from the overall architectural vision. The overall structure is the following:
  • Framework and Core Services (event-processing framework based on plug-compatible components and abstract interfaces).
  • Event generators, simulation, digitization and pile-up.
  • Event selection, reconstruction and physics analysis tools.
  • Calibration and alignment.
  • Infrastructure (services that support the software development process).

6. Offline software: Athena Component Model

7. Offline software: Athena major components
• Application Manager: the overall driving intelligence that manages and coordinates the activity of all other components within the application.
• Algorithms and Sequencers: Algorithms provide the basic per-event processing capability of the framework. A Sequencer is a sequence of Algorithms, each of which might itself be another Sequencer.
• Tools: a Tool is similar to an Algorithm, but differs in that it can be executed multiple times per event.
• Transient Data Stores: all data objects are organized in various transient data stores depending on their characteristics and lifetimes (e.g. event data, detector conditions data, etc.).
• Services: provide the services needed by the Algorithms. In general these are high-level, designed to support the needs of the physicist. Examples are the message-reporting system, the different persistency services, random-number generators, etc.
• Selectors: components that perform selection (e.g. the Event Selector provides the functionality for selecting the input events that the application will process).
• Converters: responsible for converting data from one representation to another. One example is the transformation of an object from its transient form to its persistent form and vice versa.
• Utilities: C++ classes that provide general support for other components.
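
A schematic, heavily simplified mock-up of these roles may help. Athena/Gaudi is a C++ framework and the sketch below does not reproduce its real interfaces; under that assumption it only illustrates how Algorithms, Sequencers, the transient event store and the Application Manager fit together in the event loop.

```python
# Schematic mock of the Athena component roles listed above (not the real Gaudi/Athena API):
# an Algorithm processes one event at a time, a Sequencer chains Algorithms, and the
# Application Manager drives the event loop over a per-event transient data store.

class Algorithm:
    def __init__(self, name):
        self.name = name
    def initialize(self): pass           # called once, before the event loop
    def execute(self, event_store): ...  # called once per event
    def finalize(self): pass             # called once, after the event loop

class Sequencer(Algorithm):
    """An Algorithm that simply runs a list of child Algorithms (possibly other Sequencers)."""
    def __init__(self, name, children):
        super().__init__(name)
        self.children = children
    def initialize(self):
        for c in self.children: c.initialize()
    def execute(self, event_store):
        for c in self.children: c.execute(event_store)
    def finalize(self):
        for c in self.children: c.finalize()

class TrackFinder(Algorithm):
    def execute(self, event_store):
        # read raw hits from the transient store, record "tracks" back into it
        hits = event_store.get("RawHits", [])
        event_store["Tracks"] = [h for h in hits if h > 0.5]   # toy reconstruction
        print(self.name, "found", len(event_store["Tracks"]), "tracks")

class ApplicationManager:
    """Drives the event loop: one fresh transient event store per event."""
    def __init__(self, top_sequence, event_selector):
        self.top = top_sequence
        self.selector = event_selector    # yields the input events to process
    def run(self):
        self.top.initialize()
        for raw_hits in self.selector:
            event_store = {"RawHits": raw_hits}   # transient event data store
            self.top.execute(event_store)
        self.top.finalize()

# Toy job: two "events" of raw hits, one reconstruction Algorithm in the top Sequencer.
ApplicationManager(Sequencer("TopSeq", [TrackFinder("TrackFinder")]), [[0.2, 0.9], [0.7]]).run()
```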

8. Offline software: Simulation data flow

9. Offline software: Reconstruction chains

10. Offline Software for HLT and Monitoring

11. Computing TDR: Databases and Data Management (Project)
• There are two broad categories of data storage in ATLAS: file-based data and database-resident data (more specifically, relational database-resident data). The two storage approaches are complementary and are used in the appropriate contexts in ATLAS:
  • File storage is used for bulky data such as event data and large conditions-data volumes; for contexts in which the remote connectivity (usually) implied by database storage is not reliably available; and generally for cases where simple, lightweight storage is adequate.
  • Database storage is used where concurrent writes and transactional consistency are required; where data handling is inherently distributed, typically with centralized writers and distributed readers; where indexing and rapid querying across moderate data volumes are required; and where structured archival storage and query-based retrieval are required.
• Vendor neutrality in the database interface (with implemented support for Oracle, MySQL and SQLite) has been addressed through the development of the Relational Access Layer (RAL) within the POOL project.
• COOL (developed in a collaboration between the LCG Application Area and ATLAS) is another database-based storage service layered over RAL and is the basis for ATLAS conditions-data storage. It provides interval-of-validity based storage and retrieval of conditions data.
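
The interval-of-validity idea behind COOL can be illustrated with a minimal sketch. This is a conceptual toy, not the COOL API: a conditions folder is modelled as a set of (since, until, payload) records, and a lookup returns the payload valid at a given time.

```python
# Conceptual sketch of interval-of-validity (IOV) conditions storage, as used by COOL.
# Not the COOL API: a "folder" is just a sorted list of (since, until, payload) records here.
import bisect

class IOVFolder:
    def __init__(self):
        self._since = []     # sorted validity start times
        self._records = []   # parallel list of (since, until, payload)

    def store(self, since, until, payload):
        """Store a payload valid for times t with since <= t < until (non-overlapping IOVs assumed)."""
        i = bisect.bisect_left(self._since, since)
        self._since.insert(i, since)
        self._records.insert(i, (since, until, payload))

    def retrieve(self, t):
        """Return the payload whose interval of validity contains time t, or None."""
        i = bisect.bisect_right(self._since, t) - 1
        if i >= 0:
            since, until, payload = self._records[i]
            if since <= t < until:
                return payload
        return None

# Toy calibration folder: two calibration versions with different validity ranges.
folder = IOVFolder()
folder.store(0,    1000, {"pedestal": 40.1})
folder.store(1000, 5000, {"pedestal": 40.7})
print(folder.retrieve(250))    # {'pedestal': 40.1}
print(folder.retrieve(2500))   # {'pedestal': 40.7}
print(folder.retrieve(9999))   # None (no conditions stored for that time)
```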

12. Computing TDR: Databases and Data Management
• Use of the conditions database online, for subdetector and HLT configuration, presents considerable performance challenges. The required parallel read performance is beyond the capacity of one database server, and replication will have to be used to share the load amongst many slave servers:
  • One interesting possibility comes from the Frontier project, developed to distribute data using a web-caching technology: database queries are translated into HTTP requests for web-page content, which can be cached using conventional web proxy-server technology. This is particularly suitable for distributed read-only access, where updates can be forced by flushing the proxy caches.
• Conditions data will also have to be distributed worldwide, for subsequent reconstruction passes, user analysis and subdetector calibration tasks:
  • The LCG 3D (Distributed Deployment of Databases) project is prototyping the necessary techniques, based on conventional database replication, with an architecture of Oracle servers at Tier-0 (CERN) and the Tier-1 centres, and MySQL-based replicas of subsets of the data at Tier-2 sites and beyond. The use by COOL and other database applications of the backend-independent RAL access library will be particularly important here, to enable such cross-platform replication.
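
A minimal sketch of the Frontier idea, assuming a hypothetical server URL, query encoding and proxy address (the real Frontier protocol details are not reproduced here): a read-only query is turned into an HTTP GET, so identical requests from many clients can be served from an ordinary web-proxy cache instead of hitting the database.

```python
# Schematic of Frontier-style access: encode a read-only SQL query as an HTTP GET so that
# a standard web proxy can cache the response. URLs, encoding and proxy are hypothetical.
import base64
import urllib.parse
import urllib.request

FRONTIER_URL = "http://frontier.example.org/Frontier"   # hypothetical Frontier servlet
SQUID_PROXY  = "http://squid.example.org:3128"          # hypothetical local caching proxy

def frontier_get(sql):
    """Translate a read-only query into a cacheable GET request and fetch it via the proxy."""
    encoded = base64.urlsafe_b64encode(sql.encode()).decode()
    url = FRONTIER_URL + "?" + urllib.parse.urlencode({"p1": encoded})
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": SQUID_PROXY}))
    # Identical queries produce identical URLs, so repeated requests from many clients
    # are served from the proxy cache rather than from the database server.
    with opener.open(url, timeout=30) as response:
        return response.read()

# Example (would only work against a real server):
# payload = frontier_get("SELECT * FROM conditions WHERE tag = 'HLT-v1'")
```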

13. Computing TDR: Grid-based production system
(Architecture diagram: the production database (prodDB), AMI and the local/global catalogues feed Windmill supervisors; each supervisor drives a Grid-specific executor (Lexor for LCG, Dulcinea for NorduGrid, Capone for Grid3, plus an LSF executor for local batch), which submits to the corresponding infrastructure (LCG, NG, Grid3, LSF) and its RLS replica catalogue; data movement is handled by the Don Quijote data management system (dms).)
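
The supervisor/executor split shown in the diagram can be sketched as a simple dispatch loop: a supervisor pulls pending job definitions from the production database and hands them to an interchangeable, Grid-specific executor. The class and method names below are illustrative only, not the Windmill or executor interfaces.

```python
# Illustrative sketch of the supervisor/executor pattern of the ATLAS production system:
# a supervisor pulls job definitions from the production DB and dispatches them to
# interchangeable per-Grid executors. Names and interfaces are illustrative only.
from abc import ABC, abstractmethod

class Executor(ABC):
    """One executor per Grid flavour (the role played by Lexor, Capone, Dulcinea)."""
    @abstractmethod
    def submit(self, job):
        """Submit a job definition to the underlying Grid; return a handle/identifier."""

class LCGExecutor(Executor):
    def submit(self, job):
        return f"lcg-{job['id']}"      # stand-in for a real LCG submission

class Grid3Executor(Executor):
    def submit(self, job):
        return f"grid3-{job['id']}"    # stand-in for a real Grid3 submission

class Supervisor:
    """Pulls pending jobs from the production DB and hands them to its executor."""
    def __init__(self, prod_db, executor):
        self.prod_db = prod_db          # here just a list of job-definition dicts
        self.executor = executor

    def run_once(self):
        handles = []
        for job in self.prod_db:
            if job["state"] == "pending":
                handles.append(self.executor.submit(job))
                job["state"] = "submitted"
        return handles

# Toy production DB shared by two supervisors, each bound to a different Grid.
prod_db = [{"id": 1, "state": "pending"}, {"id": 2, "state": "pending"}]
print(Supervisor(prod_db, LCGExecutor()).run_once())    # ['lcg-1', 'lcg-2']
print(Supervisor(prod_db, Grid3Executor()).run_once())  # [] (nothing left pending)
```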

14. Production system performance
(Plot: jobs per day on the LCG-2 infrastructure, for DC2 and the Rome production.)

15. Computing TDR: Tier-0 Operations

16. Data replication
• RAW:
  • A complete replica of the raw data resides at the Tier-1s (~1/10 per Tier-1)
  • Samples of events are also stored at the Tier-2s and, to a lesser extent, at the Tier-3s
• ESD:
  • All ESD versions are replicated and reside at at least two of the Tier-1s
  • The primary ESD and the associated RAW data are assigned to the ~10 Tier-1s with a round-robin mechanism
  • Samples of events are also stored at the Tier-2s and, to a lesser extent, at the Tier-3s
• AOD:
  • Fully replicated at every Tier-1 and partially replicated at the Tier-2s (~1/3 – 1/4).
  • Some streams may also be stored at the Tier-3s
• TAG:
  • The TAG databases are replicated at all Tier-1s and Tier-2s
• DPD:
  • At the Tier-1s, Tier-2s and Tier-3s
(Data-flow diagram: ~PB/s into the Event Builder; 10 GB/s to the Event Filter (~7.5 MSI2k); 320 MB/s to Tier-0 (5 MSI2k, 5 PB/y); ~75 MB/s to each of the ~10 Tier-1s (8 MSI2k, 2 PB/y, 622 Mb/s links); on to the Tier-2s (~1.5 MSI2k, ~4 per Tier-1, 622 Mb/s links) and the Tier-3s.)
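
A back-of-the-envelope sketch of what this replication policy implies per Tier-1. The AOD size (100 kB) is quoted on a later slide; the RAW and ESD event sizes and the yearly event count are assumed, nominal Computing-Model-style values, not numbers taken from this presentation.

```python
# Rough per-Tier-1 data volume implied by the replication policy above, for one year of data.
# Event sizes and the event count are assumptions for illustration (AOD = 100 kB appears on a
# later slide; RAW/ESD sizes and the yearly event count are assumed nominal values).
N_EVENTS = 2.0e9                                          # assumed events per year
SIZE     = {"RAW": 1.6e6, "ESD": 0.5e6, "AOD": 1.0e5}     # bytes/event (assumed)
N_TIER1  = 10

# Fraction of each format held by ONE Tier-1, following the policy on the slide:
FRACTION = {
    "RAW": 1.0 / N_TIER1,     # full copy spread over the Tier-1s (~1/10 each)
    "ESD": 2.0 / N_TIER1,     # every ESD version at >= 2 Tier-1s
    "AOD": 1.0,               # complete AOD replica at every Tier-1
}

per_tier1_tb = {fmt: N_EVENTS * SIZE[fmt] * FRACTION[fmt] / 1e12 for fmt in FRACTION}
print(per_tier1_tb)                       # e.g. {'RAW': 320.0, 'ESD': 200.0, 'AOD': 200.0}
print(sum(per_tier1_tb.values()), "TB")   # rough per-Tier-1 volume under these assumptions
```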

17. Computing TDR: Resource Requirement Evolution (Tier-0, CAF)

18. Comp. TDR: Resource Requirement Evolution (2) (Tier-1, Tier-2)

19. Computing System Commissioning
• The physics groups have requested 10^8 simulated events, produced with the final detector layout and with the knowledge of the detector response gained from the cosmic-ray runs, to be studied in depth before the start of the run in July 2007
• 6 months of sustained production starting in late summer 2006
• Computing resources needed (estimated from the participation in the activities for the Physics Workshop, i.e. simulation, reconstruction and analysis of 7×10^6 events: 15 times more events in a time ~4 times longer):
  • 4 × the computing power available for the Physics Workshop
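
A quick check of the scaling argument above (event counts from the slide; "computing power" is taken to mean event throughput):

```python
# Scaling check for the Computing System Commissioning request (numbers from the slide).
events_csc, events_workshop = 1e8, 7e6      # requested sample vs. Physics Workshop sample
time_ratio = 4.0                            # CSC production window ~4x longer
event_ratio = events_csc / events_workshop  # ~14.3, i.e. ~15x more events
power_ratio = event_ratio / time_ratio      # ~3.6, i.e. ~4x the computing power
print(round(event_ratio, 1), round(power_ratio, 1))   # 14.3 3.6
```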

20. 2006 milestones
• 1. January 2006:
  • production release for the commissioning of the computing system and for the initial cosmic-ray studies
  • completion of the implementation of the Event Data Model for reconstruction
• 2. February 2006:
  • start of Data Challenge 3, also called the Computing System Commissioning
• 3. April 2006:
  • integration of the ATLAS components with the LCG Service Challenge 4
• 4. July 2006:
  • production release for the cosmic-ray runs (autumn 2006)
• 5. December 2006:
  • production release for the first real data with protons.

21. Planned activities at the Italian sites
• Reconstruction:
  • Muon Detector (LE, NA, PV), Calorimeters (MI, PI), Pixel Detector (MI)
• Calibration/alignment/detector data:
  • MDT (LNF, RM1-3), RPC (LE, NA, RM2), Calorimeters (MI, PI), Pixel Detector (MI)
  • Cond. DB (CS), Det. Descr. DB (LE, PI), Det. Mon. (CS, NA, UD)
• Performance studies:
  • Muons (CS, LE, LNF, NA, PI, PV, RM1-2-3)
  • Tau/jet/EtMiss/egamma (GE, MI, PI)
• Analysis:
  • Higgs, both SM and MSSM (CS, LNF, MI, PI, PV, RM1)
  • SUSY (LE, MI, NA)
  • Top (PI, UD)
  • B physics (CS, GE, PI)
• Simulations connected with the above activities
• Studies of the analysis model

22. Resources needed at the Italian Tier-2/3 sites
• At the Tier-2s:
  • Simulations for the computing system commissioning
  • A copy of the AOD (10^8 events × 100 kB = 10 TB), with different streaming schemes (exclusive and inclusive), for analysis(-model) studies
  • Samples of events in RAW and ESD format for calibrations and for the development of reconstruction algorithms
  • Calibration centres
  • Organized analysis activities
  • 450 kSI2k (250 already available at the end of 2005)
  • 80 TB (30 already available at the end of 2005)
• At the Tier-3s:
  • Individual and chaotic analysis activities
  • 40 kSI2k
  • 10 TB

23. Overall resources of the ATLAS Tier-2s (Comp. TDR)

24. Tier-2 cost evaluation (purchases in the current year)

25. Tier-2 implementation projects
• 7/9: identification of the local ATLAS contact persons for the projects
• 12/9: formation of the technical support committee
• 12-20/9: input from the activity coordinators and from the technical support committee
• 23/9: first draft of the local projects
• 26-29/9: preliminary review of the projects and feedback
• 30/9: "complete"(?) version of the projects
• 3-4/10: Computing Committee Workshop -> "technical" verification of the projects
• 5/10: (virtual) meeting to discuss the projects
• 10/10: CSN1

26. 2006 requests (non Tier-2)

27. Cost-evolution estimate used

28. ATLAS jobs run at each LCG site
