CLEO III Datastorage Martin Lohner Cornell University CHEP 2000
Overview
• CLEO III Experiment Trivia
• Use of Commercial Software in CLEO
• Datastorage as part of the CLEO III Data Access System
• Datastorage Design Decisions to limit Complexity
• Summary
Martin Lohner, Cornell U A254 CHEP 2000
CLEO III Trivia
• On the Cornell campus, Ithaca, NY, USA; fed by an e+e- accelerator, CESR, taking data at the Upsilon(4S) (~10.6 GeV)
• Lean, mean collaboration with 150 physicists from 25 institutions
• Engineering data taking since Dec ’99
• Physics data taking scheduled for mid-April ’00
• 20 TB in the first year, 200 TB of data over 5 years
• Event size 40 kB at 100 Hz = 4 MB/s
• How to store such a large dataset with efficient access?
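The rate quoted on this slide is simple arithmetic, sketched below. Decimal units (1 MB = 1000 kB, 1 TB = 10^6 MB) are an assumption, and the "live days" figure is only an illustration of scale: the slides do not state the duty cycle or what else counts toward the 20 TB.

```python
# Worked arithmetic for the numbers on the slide (decimal units assumed).
EVENT_SIZE_KB = 40      # from the slide
EVENT_RATE_HZ = 100     # from the slide
FIRST_YEAR_TB = 20      # from the slide


def data_rate_mb_per_s(event_size_kb: float, rate_hz: float) -> float:
    """Raw data rate in MB/s for a given event size and trigger rate."""
    return event_size_kb * rate_hz / 1000.0


def days_to_write(volume_tb: float, rate_mb_per_s: float) -> float:
    """Days of continuous writing at full rate needed to reach volume_tb."""
    seconds = volume_tb * 1e6 / rate_mb_per_s
    return seconds / 86400.0


rate = data_rate_mb_per_s(EVENT_SIZE_KB, EVENT_RATE_HZ)
live_days = days_to_write(FIRST_YEAR_TB, rate)
print(f"{rate:.1f} MB/s, about {live_days:.0f} days at full rate for {FIRST_YEAR_TB} TB")
```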
Setting the Stage
• Datastorage is mission-critical for many years
  • Probably longer than most database companies will last
• Resources are limited at a (relatively) small experiment
  • Shortage of code development personnel
• Uncertainty about the future of commercial databases
Why a Database? Why Objectivity?
• Ease of Management. Scalability.
  • Who wants to know where those files are?
  • And which file contains what run?
• Efficient access to sub-components
  • e.g. only access tracks rather than entire event
• Does an OODBMS fit the bill? Why not an RDBMS?
  • Performance?
• A number of ongoing and proposed HEP experiments (most notably BaBar) have adopted Objectivity to store Terabytes and Petabytes of data.
Use of Commercial Software in CLEO
• Before (CLEO-II): never relied on commercial software
• Now:
  • Objectivity/DB for Datastorage
  • Visigenic (CORBA) for middleware
• Dangers:
  • binaries instead of source code
  • tightly coupled to OS versions and compilers
  • would like Objectivity for Alpha/Linux
  • lifetime of company vs. lifetime of experiment
  • rely on manuals and customer support
  • find a bug: trial & error; report it, can’t fix it yourself
• CLEO III online stores data in our own binary format instead of directly to the Objectivity database.
CLEO III Data Access System
Datastorage is part of the CLEO III Data Access System:
• described further in A216 (poster)
• is designed to be input/output data format independent
• Data-bus consisting of Records (e.g. Event Record)
  • synchronized with respect to each other to provide a consistent view of the CLEO detector at one instant in time
• Records can be served by Sources, written to Sinks
• Any storage format plugs in via a concrete Source and/or Sink, à la device driver
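The "device driver" idea on this slide can be sketched as two small abstract interfaces. This is a minimal illustration, not the CLEO III code: the class names, the dict-based Record, and `run_job` are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

# Hypothetical stand-in: a Record is modelled as a plain dictionary.
Record = Dict[str, object]


class Source(ABC):
    """Anything that can serve Records, whatever the storage format."""

    @abstractmethod
    def records(self) -> Iterator[Record]:
        ...


class Sink(ABC):
    """Anything that can accept Records, whatever the storage format."""

    @abstractmethod
    def write(self, record: Record) -> None:
        ...


class ListSource(Source):
    """Toy in-memory 'format' used only for this illustration."""

    def __init__(self, items: List[Record]) -> None:
        self._items = list(items)

    def records(self) -> Iterator[Record]:
        return iter(self._items)


class ListSink(Sink):
    """Toy in-memory sink that remembers what was written."""

    def __init__(self) -> None:
        self.written: List[Record] = []

    def write(self, record: Record) -> None:
        self.written.append(record)


def run_job(source: Source, sinks: List[Sink]) -> int:
    """Pump every Record from one source into all sinks; return the count."""
    count = 0
    for record in source.records():
        for sink in sinks:
            sink.write(record)
        count += 1
    return count
```

The job loop only sees `Source` and `Sink`, so adding a storage format means adding one concrete class and leaving the loop untouched.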
CLEO III Data Access System (cont.)
• Separation between transient and persistent objects:
  • user analysis written in terms of transient objects
  • independent of storage formats!
• No drawback except for potential performance penalty -- NOT
  • we disallow links between objects (except via index-list objects)
  • data is served on demand (via proxies)
• Main data access application “Suez”:
  • skeleton program, run job setup and control
  • dynamic loading and/or static linking of modules
  • Database code loaded as “Objectivity Source/Sink” module
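Serving data "on demand (via proxies)" can be illustrated with a lazy-loading wrapper. This is a sketch under assumptions: `Proxy`, `DemandRecord`, `register` and `extract` are hypothetical names, not the actual CLEO III interfaces.

```python
from typing import Callable, Dict


class Proxy:
    """Holds a loader and runs it only the first time the data is requested."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._value = None
        self._loaded = False

    def get(self) -> object:
        if not self._loaded:
            # First request: fetch and convert from storage, then cache.
            self._value = self._loader()
            self._loaded = True
        return self._value


class DemandRecord:
    """A Record whose contents are fetched lazily through proxies."""

    def __init__(self) -> None:
        self._proxies: Dict[str, Proxy] = {}

    def register(self, key: str, loader: Callable[[], object]) -> None:
        self._proxies[key] = Proxy(loader)

    def extract(self, key: str) -> object:
        return self._proxies[key].get()
```

A job that only asks for tracks never triggers the loader for showers, which is why the transient/persistent separation need not cost much in practice.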
Database Layout: CLEO concepts
• Natural unit of CLEO III data: the Record
• Record contains different types of data
  • e.g. Event Record contains Tracks and Showers
• Sets of Records make up “Streams”
  • e.g. Event Stream, Geometry Alignment Stream
• Sets of Records are grouped in data-taking “Runs”
  • accelerator fill, same run conditions, same run number
Database Layout (cont.)
• Translate to Database:
  • Records become Record-“Headers” with links to different data types
  • Different Streams of Records saved independently
  • Everything grouped by Run
Database Layout (cont.)
• Clustering by Event classification:
  • hadronic, bhabha, tau etc.
• Tags with fast-selection criteria
  • e.g. number of tracks
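The layout described on the last three slides (headers grouped by Run and Stream, clustered by classification, with tags for fast selection) might be modelled roughly as follows. All field and function names are hypothetical; the real persistent classes are not shown in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RecordHeader:
    """Hypothetical header: small, tag-carrying, pointing at the bulk data."""
    run: int
    stream: str               # e.g. "event"
    classification: str       # e.g. "hadronic", "bhabha", "tau"
    n_tracks: int             # example of a fast-selection tag
    data_ids: Dict[str, int] = field(default_factory=dict)  # data type -> stored object id


def group_by_run_and_stream(
    headers: List[RecordHeader],
) -> Dict[Tuple[int, str], List[RecordHeader]]:
    """Group headers the way the slides describe: by Run, Streams kept apart."""
    groups: Dict[Tuple[int, str], List[RecordHeader]] = {}
    for header in headers:
        groups.setdefault((header.run, header.stream), []).append(header)
    return groups


def select(
    headers: List[RecordHeader], classification: str, min_tracks: int
) -> List[RecordHeader]:
    """Select on tags alone, without touching the bulk data behind data_ids."""
    return [
        h for h in headers
        if h.classification == classification and h.n_tracks >= min_tracks
    ]
```

The point of the tags is that `select` reads only the small headers; tracks and showers are fetched afterwards, and only for the records that pass.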
Schema Management, StorageHelpers, Compression
• Schema is the type information of the data stored in the database
• Schema changes are non-trivial
  • one Schema for the entire federation of databases
  • changing (= evolving) types requires updating the stored objects
  • avoid corrupting the Schema at all costs
• User data types in official Schema?
• Storing data as real types prevents compression at object level
  • then can only do compression at database (= file) level.
Schema Management, StorageHelpers, Compression (cont.)
Different Approach:
• All data types stored as Binary Blobs
  • Only data access layer knows how to interpret blobs
  • we do store storage information (compression info, etc.)
• No direct links between objects
  • want to support other storage formats (e.g. sequential access files)
  • instead use index-list objects (“Lattice”)
• Allows compression at object level
• Conversion of blob to transient object via StorageHelpers
  • see C215 in poster session
  • basic serialization approach
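A minimal sketch of the blob approach, assuming a hypothetical `Track` type and a made-up byte layout: a small header carries the storage information (here a compression flag and the uncompressed length), and the payload is compressed per object. zlib is used only for illustration; the slides do not say which compression scheme CLEO III uses, and the real StorageHelpers are described in C215.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical transient object, used only for this illustration."""
    track_id: int
    px: float
    py: float
    pz: float


_TRACK_FORMAT = "<i3d"   # assumed layout: int32 id + three doubles
_HEADER_FORMAT = "<BI"   # assumed storage info: compression flag + raw length


def store(track: Track, compress: bool = True) -> bytes:
    """Serialize a transient Track into an opaque blob with storage info."""
    payload = struct.pack(_TRACK_FORMAT, track.track_id, track.px, track.py, track.pz)
    body = zlib.compress(payload) if compress else payload
    header = struct.pack(_HEADER_FORMAT, 1 if compress else 0, len(payload))
    return header + body


def deliver(blob: bytes) -> Track:
    """Rebuild the transient Track; only this layer understands the blob."""
    header_size = struct.calcsize(_HEADER_FORMAT)
    compressed, raw_length = struct.unpack(_HEADER_FORMAT, blob[:header_size])
    body = blob[header_size:]
    payload = zlib.decompress(body) if compressed else body
    if len(payload) != raw_length:
        raise ValueError("blob length does not match its storage header")
    return Track(*struct.unpack(_TRACK_FORMAT, payload))
```

The database only ever sees `bytes`, so adding or changing a data type touches the helper pair and not the database schema.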
Data Organization
• Objy has fixed limits on the number of objects, containers, and databases in a federation:
  • No intention to store all data from day one in one federation
  • Divide data into “data sets” (run ranges) in separate federations
  • natural divisions are data-taking periods between shutdowns (run 1-1105 = fdb1, run 1106-2452 = fdb2, etc.)
  • Have to require the schema to be the same for all federations
  • Necessary to allow access to several federations in one job
  • Our schema is simple (data are binary blobs)
• Storage of “Constants” in separate federation
  • different uses, different sizes
  • access to second federation via CORBA
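Mapping a run number to its federation is a range lookup. The sketch below uses only the two ranges given on the slide; the function name and the error handling for runs outside those ranges are assumptions.

```python
from bisect import bisect_right

# Run ranges from the slide: run 1-1105 = fdb1, run 1106-2452 = fdb2.
_FIRST_RUNS = [1, 1106]          # first run of each data set, ascending
_FEDERATIONS = ["fdb1", "fdb2"]
_LAST_KNOWN_RUN = 2452           # later ranges ("etc.") are not given in the talk


def federation_for_run(run: int) -> str:
    """Return the federation holding the given run number."""
    if run < _FIRST_RUNS[0] or run > _LAST_KNOWN_RUN:
        raise KeyError(f"no federation known for run {run}")
    # bisect_right finds how many ranges start at or before this run.
    return _FEDERATIONS[bisect_right(_FIRST_RUNS, run) - 1]
```

A job spanning runs 1100 to 1110 would need both federations, which is why the same schema is required everywhere.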
User Data
• We have not fully addressed how to handle User Data
  • have ideas, but no definite plan yet
• Objectivity allows access to only one federation per process
  • We don’t want to store User Data in the official database
  • Forced to use CORBA to access Constants in another federation
• Why are we not worried?
  • Binary Blobs: User Data don’t impact Schema
  • Our ultra-flexible and storage-independent Data Access System allows handling of multiple sources and sinks (different storage formats) in the same job!
• We will most likely need another format
  • based on historical CLEO formats
  • will CLEO collaborators install Objy at their home institutions?
Concurrency Issues
• Objectivity locking is done at container level
  • Objy standard mode allows many readers XOR one writer
  • Objy MROW mode allows many readers AND ONE writer
  • can lead to logical data corruption if used improperly
• In Reconstruction want to parallelize task:
  • have many processes update the database
  • update separate containers -- no problem
  • update central objects/containers -- problem (e.g. compression information stored centrally)
  • potential lock collisions
  • could preallocate in standalone job -- maintenance issue!
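The two locking modes on this slide can be modelled with a toy in-memory lock table. This is not the Objectivity API; `ContainerLocks` and its methods are hypothetical and only mimic the rules as stated (many readers XOR one writer in standard mode, many readers AND one writer in MROW mode).

```python
from typing import Dict, Set


class ContainerLocks:
    """Toy model of per-container locks; returns False on a lock collision."""

    def __init__(self, mrow: bool = False) -> None:
        self._mrow = mrow
        self._readers: Dict[str, Set[str]] = {}   # container -> reader process ids
        self._writer: Dict[str, str] = {}         # container -> writer process id

    def acquire_read(self, container: str, pid: str) -> bool:
        # Standard mode: a writer excludes readers. MROW: readers always get in.
        if not self._mrow and container in self._writer:
            return False
        self._readers.setdefault(container, set()).add(pid)
        return True

    def acquire_write(self, container: str, pid: str) -> bool:
        # Both modes allow at most one writer per container.
        if container in self._writer:
            return False
        # Standard mode: existing readers also exclude a writer.
        if not self._mrow and self._readers.get(container):
            return False
        self._writer[container] = pid
        return True
```

With one container per reconstruction process every `acquire_write` succeeds; the collision appears only when two processes both need the central container, such as the one holding compression information.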
Objectivity and Mass Storage
• Objy AMS server
  • with Veritas Storage Migrator (= HSM)
  • on top of tape robot holding AIT (AIT-II) tapes
• Objy 5.2:
  • OOFS layer allows hooks into underlying file system
  • prior to 5.2 had to deal with timeouts (>25 s) due to HSM latency
  • plan to use Defer-Request Protocol to deal with timeouts
  • plan to use Redirect-Request Protocol for load-balancing
Current Status
• Support Solaris 2.x and OSF1 4.x; Linux/Intel soon
  • found name clashes between persistent Objy STL and normal STL (abandoned persistent STL for our own implementation in terms of “ooVArray”)
  • new Objectivity 5.2 Java-style collection classes look promising
• Tested in full-blown Mock-data reconstruction challenge in summer 1999 and various other tests.
• Deployed database with our engineering runs
  • ~200 GB worth of data
• Another Mock-data challenge planned shortly after CHEP
Summary
• Described various design decisions to limit complexity of our data storage system
• CLEO III Data Access System is format-independent!
  • User code independent of storage formats (no recompiling/relinking)
  • Any number of storage formats can be used in the same job
  • Objectivity is “just another” storage format
  • Major advantage of our system!!
• Data storage in Objy as Binary Blobs
  • more like a data location manager than true object store
  • no schema evolution problems
  • storage of user data?
• Stress-tested and now deployed for data
• Found good performance with Objy 5.2!