Distributed Data Access and Resource Management in the D0 SAM System


Presentation Transcript


1. Distributed Data Access and Resource Management in the D0 SAM System
Terekhov, Fermi National Accelerator Laboratory, for the SAM project: L.Carpenter, L.Lueking, C.Moore, J.Trumbo, S.Veseli, M.Vranicar, S.White, V.White

2. Plan of Attack
• The domain
  • D0 overview and applications
• SAM as a Data Grid
  • Metadata
  • File replication
  • Initial resource management
• SAM and generic Grid technologies
• Comprehensive resource management

3. D0: A Virtual Organization
• High Energy Physics (HEP) collider experiment, multi-institutional
• Collaboration of 500+ scientists, 72+ institutions, 18+ countries
• Physicists generate and analyze data
• Coordinated resource sharing (networks, MSS, etc.) for solving a common problem: physics analysis

4. Applications and Data Intensity
• Real data taking from the detector
• Monte Carlo data simulation
• Reconstruction
• Analysis
• The gist of experimental HEP: extremely I/O intensive
• Recurrent processing of datasets: caching is highly beneficial

5. Data Handling as the Core of D0 Meta-Computing
• HEP applications are data-intensive (see below)
• The computational economy is extremely data-centric because costs are driven by data handling (DH) resources
• SAM: primarily and historically a DH system, and a working Data Grid prototype
• Job control is included in the Grid context (the D0-PPDG project)

6. SAM as a Data Grid
[Diagram: layered Data Grid architecture. High-level services (replication cost estimation, replica selection, data replication, comprehensive resource management) sit above generic Grid services and the core services (resource management, metadata), with Mass Storage Systems external to SAM.]
Based on: A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", to appear in Journal of Network and Computer Applications.

7. Standard Grid Metadata
• Application metadata
  • creation info and processing history
  • data types (tiers, streams, etc.; D0-specific)
  • files are self-describing
• Replica metadata
  • each file has zero or more locations
  • volume IDs and location details for the RM, part of the interface with the Mass Storage System
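As a rough illustration, a catalog entry combining application and replica metadata might look like the following Python sketch (the field names are hypothetical, not SAM's actual schema):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class FileReplica:
        """One known location of a file (hypothetical fields)."""
        station: str           # hosting Station or MSS
        path: str              # location details used by the RM
        volume_id: str = ""    # tape volume ID for MSS-resident copies

    @dataclass
    class FileMetadata:
        """A self-describing file entry: application plus replica metadata."""
        name: str
        data_tier: str                                      # D0-specific data type
        stream: str                                         # D0-specific stream
        history: List[str] = field(default_factory=list)    # creation info, processing history
        replicas: List[FileReplica] = field(default_factory=list)  # zero or more locations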

8. Standard Grid Metadata, cont'd
• System configuration metadata
  • HW configuration: locations and capacities of disks and tapes (network and disk bandwidths)
  • resource ownership and allocation: partition of disk, MSS bandwidths, etc. by group
  • fair-share parameters for resource allocation and job scheduling (FSAS)
  • cost criteria (weight factors) for FSAS
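One way such configuration metadata could be laid out, as a minimal sketch; the group names and all numbers are invented for illustration:

    # Hypothetical system-configuration metadata: resource ownership and
    # allocation by research group, plus the FSAS parameters.
    SYSTEM_CONFIG = {
        "disk_gb":      {"higgs": 500, "top": 300, "qcd": 200},   # disk partition by group
        "mss_bw_mb_s":  {"higgs": 40,  "top": 30,  "qcd": 30},    # MSS bandwidth by group
        "fair_shares":  {"higgs": 0.5, "top": 0.3, "qcd": 0.2},   # FSAS share parameters
        "cost_weights": {"tape_mount": 10.0, "gb_transferred": 1.0,
                         "cpu_hours": 0.5},                       # FSAS cost criteria
    }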

9. Advanced Metadata
• Dataset management (to the great advantage of the user)
• Job history (crash-recovery mechanisms)
• File replica access history (used by the RM)
• Resource utilization history (persistency in RM and accountability)
• See our complete data model for more details

10. Data Replica Management
• A Processing Station is a (locally distributed, semi-autonomous) collection of HW resources (disk, CPU, etc.) managed by a SW component
• Local data replication: parallel processing in a single batch system, within a Station
• Global data replication: worldwide data exchange among Stations and MSS's

11. Local Data Replication
• Consider a cluster with a physically distributed disk cache
• Logical partitioning by research groups
• Each group executes an independent cache-replacement algorithm (FIFO, LRU, many flavors); one flavor is sketched below
• The replica catalog is updated in the course of cache replacement
• The access history of each local replica is maintained persistently in the metadata
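A minimal sketch of one such per-group replacement flavor (LRU), assuming a hypothetical replica-catalog client with add_location/remove_location/record_access methods:

    from collections import OrderedDict

    class GroupCache:
        """LRU cache replacement for one research group's disk partition."""

        def __init__(self, capacity_gb, catalog, group):
            self.capacity_gb = capacity_gb
            self.used_gb = 0.0
            self.files = OrderedDict()    # file name -> size in GB, oldest first
            self.catalog = catalog        # hypothetical replica-catalog client
            self.group = group

        def add(self, name, size_gb):
            # Evict least-recently-used replicas until the new file fits,
            # keeping the replica catalog in sync with every eviction.
            while self.used_gb + size_gb > self.capacity_gb and self.files:
                victim, victim_size = self.files.popitem(last=False)
                self.used_gb -= victim_size
                self.catalog.remove_location(victim, self.group)
            self.files[name] = size_gb
            self.used_gb += size_gb
            self.catalog.add_location(name, self.group)

        def access(self, name):
            self.files.move_to_end(name)                   # mark as recently used
            self.catalog.record_access(name, self.group)   # persistent access history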

12. Local Data Replication, cont'd
• While the Resource Managers strive to keep jobs and their data in proximity (see below), the Batch System does not always dispatch jobs to where the data lies
• The Station performs intra-cluster data replication on demand, fully transparently to the user
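The on-demand behavior amounts to something like the following sketch; all helper names are invented, and SAM's actual protocol differs in detail:

    def ensure_local(node, file_name, station):
        """Hypothetical hook run before a dispatched job opens a file: if the
        job landed on a node without the replica, the Station copies it over,
        and the job itself never notices."""
        if node.has_replica(file_name):
            return node.local_path(file_name)
        source = station.locate_replica(file_name)          # a peer node's cache or the MSS
        return station.replicate(file_name, source, node)   # intra-cluster copy on demand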

13. Routing + Caching = Global Replication
[Diagram: global replication. Data flows over the WAN between the user (producer), Stations, site replicas, and Mass Storage Systems.]

14. Principles of Resource Management
• Implement experiment policies on prioritization and fair sharing in resource usage, by user categories (access modes, research group, etc.)
• Maximize throughput in terms of real work done (i.e., user jobs, not system-internal jobs such as data transfers)

15. Fair Sharing
• Allocation of resources and scheduling of jobs
• The goal is to ensure that, in a busy environment, each abstract user gets a fixed share of "resources" or gets a fixed share of "work" done

16. FS and Computational Economy
• Jobs, when executed, incur costs (through resource utilization) and realize benefits (through getting work done)
• Maintain a tuple (vector) of cumulative costs/benefits for each abstract user and compare it to his allocated fair share to raise or lower priority (a toy version is sketched below)
• Incorporates all known resource types and benefit metrics; fully flexible
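A toy version of the cost side of this bookkeeping, reusing the invented cost weights and shares from the configuration sketch above:

    def weighted_cost(usage, weights):
        """Collapse a per-resource cost vector into one number (cost criteria)."""
        return sum(weights[resource] * amount for resource, amount in usage.items())

    def fair_share_priority(user, usage_by_user, shares, weights):
        """Toy FSAS priority: a user whose consumed fraction of the total
        weighted cost is below his allocated share is boosted, and vice versa."""
        total = sum(weighted_cost(u, weights) for u in usage_by_user.values()) or 1.0
        consumed = weighted_cost(usage_by_user[user], weights) / total
        return shares[user] - consumed   # > 0: below fair share, raise priority

For example, a group holding a 0.3 share that has so far incurred only 10% of the total weighted cost gets priority 0.3 - 0.1 = 0.2 and is scheduled ahead of over-consumers.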

17. The Hierarchy of Resource Managers
[Diagram: three-level hierarchy. A Global RM spans sites connected by WAN and applies experiment policies, fair-share allocations, and cost metrics; a Site RM spans the Stations and MSS's connected by LANs; the Local RM at each Station manages batch queues and disks.]

18. Job Control: Station Integration with the Abstract Batch System
[Diagram: the client issues "sam submit"; the Local RM (Station Master) invokes the Job Manager (Project Master) and steers it via "setJobCount"/"stop"; the Project Master submits to the Batch System, which dispatches the Process Manager (a SAM wrapper script) to invoke the User Task; on "jobEnd" the Project Master resubmits until the SAM condition is satisfied.]
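The control loop in the diagram can be paraphrased in a short sketch; the classes are invented, and only the message names follow the slide:

    class ProjectMaster:
        """Toy Job Manager driving one SAM project through the batch system."""

        def __init__(self, station_master, batch_system):
            self.rm = station_master     # Local RM (Station Master)
            self.batch = batch_system    # abstract batch system

        def start(self, task, n_jobs):
            self.rm.set_job_count(n_jobs)    # "setJobCount"
            for _ in range(n_jobs):
                self.batch.submit(task)      # runs under the Process Manager wrapper

        def on_job_end(self, task):
            # The Process Manager (SAM wrapper script) reports "jobEnd";
            # resubmit until the Station signals "sam condition satisfied".
            if self.rm.condition_satisfied():
                self.rm.stop()
            else:
                self.batch.resubmit(task)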

19. SAM as a Data Grid
[Diagram: the architecture of slide 6, annotated with SAM specifics:
• Replication cost estimation: cached data, file transfer queues, Site RM "weather conditions"
• Replica selection: preferred locations
• Data replication: caching, forwarding, pinning
• Comprehensive resource management: DH-Batch system integration, fair-share allocation, MSS access control, network access control
• Metadata: replica catalog, system configuration, cost/benefit metrics
• Resource management: Batch System internal RM and MSS internal RM (external to SAM)
• Mass Storage Systems (external to SAM)]

20. SAM Grid Work (D0-PPDG)
• Enhance the system by adding Grid services (Grid authentication, replica selection, etc.)
• Adapt the system to generic Grid services
• Replace proprietary tools and internal protocols with those standard to the Grid
• Collaborate with computer scientists to develop new Grid technologies, using SAM as a testbed for testing/validating them

21. Initial PPDG Work: Condor/SAM Job Scheduling, Preliminary Architecture
[Diagram: on the job-management side, Condor MMS, Condor, and Condor-G sit behind a Condor/SAM-Grid adapter; on the data-management side, a SAM/Condor-Grid adapter fronts SAM, "sam submit", the SAM abstract batch system, and the data/DH resources. The two sides talk over standard Grid protocols, exchanging job-placement costs ("Costs of job placements?") and job schedules ("Schedule jobs").]

22. Conclusions
• D0 SAM is not only a production meta-computing system but also a functioning Data Grid prototype, with data replication and resource management at an advanced/mature stage
• Work continues to fully Grid-enable the system
• Some of our components/services will, we hope, be of interest to the Grid community
