POOL Project Overview

POOL Project Overview Dirk Düllmann CERN Openlab storage workshop 17th March 2003

What is POOL? • POOL is the LCG Persistency Framework • Pool of persistent objects for LHC • Started by LCG-SC2 in April ’02 • Common effort in which the experiments take over a major share of the responsibility • for defining the system architecture • for development of POOL components • ramping up over the last year from 1.5 to ~10FTE

POOL and the LCG Architecture Blueprint • POOL is a component based system • A technology neutral API • Abstract C++ interfaces • Implemented reusing existing technology • ROOT I/O for object streaming • complex data, simple consistency model (write once) • RDBMS for consistent meta data handling • simple data, transactional consistency • POOL does not replace any of its component technologies • It integrates them to provide higher level services • Insulates physics applications from implementation details of components and technologies used today

POOL as an LCG component • Persistency is just one of several projects in the LCG Applications Area • Sharing a common architecture and s/w process • as described in the Blueprint and Persistency RTAG documents • Persistency is important… • …but not important enough to allow for uncontrolled direct dependencies, eg of experiment code on its implementation • Common effort in which the experiments take over a major share of the responsibility • for defining the overall and detailed architecture • for development of POOL components

LCG Blueprint Software Decomposition

POOL Work Package breakdown • Based on outcome of SC2 persistency RTAG • File Catalog • keep track of files (and their physical and logical names) and their descriptions • resolve a logical file reference (FileID) into a physical file • pool::IFileCatalog • Collections • keep track of (large) object collections and their descriptions • pool::Collection<T> • Storage Service • stream transient C++ objects into/from storage • resolve a logical object reference into a physical object • Object Cache (DataService) • keep track of already read objects to speed up repeated access to the same data • pool::IDataSvc and pool::Ref<T>
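The File Catalog's role (resolving a FileID into a physical file behind an abstract interface) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `IFileCatalogSketch`, `InMemoryCatalog` and `resolveOrEmpty` are hypothetical and do not reproduce the actual `pool::IFileCatalog` signatures.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Hypothetical abstract interface; NOT the real pool::IFileCatalog API.
class IFileCatalogSketch {
public:
    virtual ~IFileCatalogSketch() = default;
    // Register a physical file name (PFN) under a file identifier.
    virtual void registerFile(const std::string& fileId,
                              const std::string& pfn) = 0;
    // Resolve a file identifier; returns false if the ID is unknown.
    virtual bool lookup(const std::string& fileId, std::string& pfn) const = 0;
};

// One possible backend: a plain in-memory map. An XML- or RDBMS-backed
// catalog could implement the same interface without changing clients.
class InMemoryCatalog : public IFileCatalogSketch {
public:
    void registerFile(const std::string& fileId,
                      const std::string& pfn) override {
        entries_[fileId] = pfn;
    }
    bool lookup(const std::string& fileId, std::string& pfn) const override {
        auto it = entries_.find(fileId);
        if (it == entries_.end()) return false;
        pfn = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> entries_;
};

// Client code depends only on the abstract interface.
inline std::string resolveOrEmpty(const IFileCatalogSketch& catalog,
                                  const std::string& fileId) {
    std::string pfn;
    return catalog.lookup(fileId, pfn) ? pfn : std::string();
}

} // namespace sketch
```

Because `resolveOrEmpty` only sees the abstract base class, swapping the backend leaves client code untouched, which is the kind of insulation the work package breakdown aims for.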

POOL Internal Organisation

POOL and the GRID • GRID mostly deals with data of file level granularity • File Catalog connects POOL to Grid Resources • eg via our EDG-RLS backend • POOL Storage Service deals with intra file structure • need connection via standard Grid File access • Both File and Object based Collections are seen as important End User concepts • POOL offers a consistent interface to both types • Need to understand to what extent these can be provided in a Grid environment

How does POOL fit into the environment? • POOL will be mainly used from experiment frameworks • mostly as client library loaded from user application • Production Manager • Creates and maintains shared file catalogs and (event) collections • eg add the catalog fragment for the new simulation data to the published analysis catalog • End User • Uses shared collections • eg iterate over collection X [Diagram labels: POOL client on a CPU Node; User Application; Experiment Framework; POOL; Exp. DB Services; Book Keeping; Production Workflow; Grid (File) Services; File Description; Replica Location; Remote File I/O?; RDBMS Services; Collection Description?; Collection Location?; Collection Access; remote access via ROOT I/O]

POOL File Catalog • POOL uses a GUID implementation for the FileID • unique and immutable identifier for a file (generated at creation time) • allows sets of files with internal references to be produced without requiring a central ID allocation service • catalog fragments created independently can later be merged without modification to data files • Object lookup is based only on the right-hand box of the diagram! • Logical filenames are supported but not required [Diagram boxes: Logical Naming; Object Lookup]
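Why globally unique FileIDs make catalog fragments mergeable can be shown with a small sketch. It assumes a fragment is simply a map from FileID to a set of physical file names; the `CatalogFragment` type and `mergeFragment` function are hypothetical and not part of POOL, and the IDs used with it are placeholders rather than real GUIDs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

namespace sketch {

// FileID -> set of physical file names (replicas). Hypothetical layout.
using CatalogFragment = std::map<std::string, std::set<std::string>>;

// Merge `fragment` into `target`. Since FileIDs are assumed to be globally
// unique, entries from independently created fragments never need to be
// renumbered, and the data files holding references stay untouched.
// A FileID present in both fragments simply accumulates its replicas.
inline void mergeFragment(CatalogFragment& target,
                          const CatalogFragment& fragment) {
    for (const auto& entry : fragment) {
        target[entry.first].insert(entry.second.begin(), entry.second.end());
    }
}

} // namespace sketch
```

With centrally allocated sequential IDs, the same merge would require rewriting either catalog keys or the references stored inside data files; the GUID approach avoids both.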

Use Case: Working in Isolation • The user extracts a set of interesting files and a catalog fragment describing them from a (central) grid based catalog into a local (eg XML based) catalog. • Selection is performed based on file or collection descriptions • After disconnecting from the grid the user executes some standard jobs navigating through the extracted data. • New output files are registered into the local catalog • Once the new data is ready for publishing and the user is connected the new catalog fragment is submitted to the grid based catalog. [Diagram labels: File Catalog & Descr; Extraction; Grid File Storage; Local File Catalog; Local Files; Local Processing; New Files; New Catalog & Descr; Result Publishing]

Use Case: Farm Production • Production manager may pre-register output files with the catalog (eg a “local” MySQL or XML catalog) • File ID, physical filename, job ID and optionally also logical filenames • A production job runs and creates files and their catalog entries locally. • During the production the catalog can be used to clean up files (and their registration) from unsuccessful jobs based on their associated job ID. • Once the data quality checks have been passed the production manager decides to publish the production catalog fragment to the grid based catalog. [Diagram labels: Production Node 1 … Production Node n, each with a Local File Catalog and Local Files; Post Processing; New Files; New Catalog & Descr; Result Publishing; Grid Catalog; File Catalog & Descr; Grid File Storage]
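The clean-up step of this use case (dropping files and registrations of unsuccessful jobs by job ID) might look roughly like the following sketch. The `CatalogEntry` record and `cleanupJob` function are hypothetical names chosen for illustration; the sketch only edits an in-memory list and leaves actual file deletion to the caller.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// One pre-registered output file. Field names are hypothetical.
struct CatalogEntry {
    std::string fileId;  // unique file identifier
    std::string pfn;     // physical file name
    std::string jobId;   // job that produced (or will produce) the file
};

using ProductionCatalog = std::vector<CatalogEntry>;

// Remove every registration belonging to `failedJobId` and return the
// physical file names whose files should then be deleted from disk.
inline std::vector<std::string> cleanupJob(ProductionCatalog& catalog,
                                           const std::string& failedJobId) {
    std::vector<std::string> toDelete;
    ProductionCatalog kept;
    for (const auto& entry : catalog) {
        if (entry.jobId == failedJobId) {
            toDelete.push_back(entry.pfn);
        } else {
            kept.push_back(entry);
        }
    }
    catalog.swap(kept);
    return toDelete;
}

} // namespace sketch
```

Only the entries that survive clean-up and quality checks would then be published as a fragment to the grid based catalog.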

POOL Storage Hierarchy • An application may access databases (eg ROOT files) from a set of catalogs • Each database has containers of one specific technology (eg ROOT trees) • Smart Pointers are used • to transparently load objects into a client side cache • to define object associations across file or technology boundaries
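The idea of a smart pointer that transparently loads its target into a client side cache can be sketched as follows. This is an assumption-laden illustration, not the `pool::Ref<T>` implementation: `DataCache`, `LazyRef`, the string token and the loader callback are all hypothetical, and the sketch assumes the loader never returns a null pointer.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Client side cache keyed by an opaque object token (hypothetical format).
template <typename T>
class DataCache {
public:
    using Loader = std::function<std::shared_ptr<T>(const std::string&)>;
    explicit DataCache(Loader loader) : loader_(std::move(loader)) {}

    // Return the cached object, reading it via the loader on first access.
    std::shared_ptr<T> get(const std::string& token) {
        auto it = objects_.find(token);
        if (it != objects_.end()) return it->second;
        ++loads_;
        std::shared_ptr<T> object = loader_(token);
        objects_[token] = object;
        return object;
    }
    int loads() const { return loads_; }  // number of storage reads so far

private:
    Loader loader_;
    std::map<std::string, std::shared_ptr<T>> objects_;
    int loads_ = 0;
};

// Smart pointer that resolves its token only when first dereferenced.
template <typename T>
class LazyRef {
public:
    LazyRef(DataCache<T>& cache, std::string token)
        : cache_(&cache), token_(std::move(token)) {}
    T& operator*() { return *resolve(); }
    T* operator->() { return resolve().get(); }

private:
    std::shared_ptr<T> resolve() {
        if (!object_) object_ = cache_->get(token_);
        return object_;
    }
    DataCache<T>* cache_;
    std::string token_;
    std::shared_ptr<T> object_;
};

} // namespace sketch
```

Since a token could in principle name any file or storage technology, a reference of this kind can point across file or technology boundaries while the client just dereferences it.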

Client Data Access [Diagram labels: Client; Ref<T>; Data Service; Data Cache]

Dictionary: Population/Conversion [Diagram labels: .h; .xml; ROOTCINT; GCC-XML; Code Generator; Dictionary Generation; CINT dictionary code; LCG dictionary code; CINT dictionary; LCG dictionary; Gateway; I/O; Data I/O; Reflection; Other Clients]

Project Status & Plans • First four POOL releases delivered planned functionality on time • Aggressive schedule so far focusing on adding functionality • no consistent attempt at performance optimisation yet • Functionally complete (LCG-1 feature set) POOL V1.0 release scheduled for April • several functional extensions compared to V0.4 • automated system tests are being • Bug fix and performance release POOL V1.1 in June • Aim to be ready for first deployment together with LCG-1 environment • Will release • Work on proof of concept storage service re-implementation based on an RDBMS back end starting

Summary • The LCG POOL project provides a hybrid store integrating object streaming (eg ROOT I/O) with RDBMS technology (eg MySQL) for consistent meta data handling • Strong emphasis on component decoupling and well defined communication/dependencies • Transparent cross-file and cross-technology object navigation via C++ smart pointers • Integration with Grid technology (via EDG-RLS) • but preserving networked and grid-decoupled working modes • Next two releases (V1.0-functionality and V1.1-reliability & performance) will be crucial for POOL acceptance • Need tight coupling with experiment development and production teams to validate the feature set • Assume tight integration with LCG deployment activities

How to find out more about POOL? • POOL Home Page http://lcgapp.cern.ch/project/persist/ • POOL savannah portal http://lcgappdev.cern.ch/savannah/projects/pool
