Persistency at LHC Vincenzo Innocente CERN History is as old as Persistency
Sources and Contributions • Presentations at last RD45 workshop • Presentations at the “Architecture Working Group” • Experiments’ Web pages • Contributions to this Workshop • Focus on • LHC experiments’ prototypes • New generation experiments (BaBar, STAR, RunII) experience and plans Vincenzo Innocente LCB workshop
Persistency in General
Persistency: what for? • A process saves its state to be later re-used by • the same process • a different process running the same executable • a different process running a different executable • Ideal persistency: Core Dump! [Diagram labels: Process 1, Process 2, Process 3, Volatile Memory, Permanent Storage]
Use Cases • Extended (in space and time) virtual memory: • proprietary format optimized for computational and storage performance of a single application • Import/Export in a heterogeneous environment • “standard” application-independent format • conversion to/from internal application format • Management of different versions (identification, query mechanism) and of concurrency (locking) • proprietary internal mechanism • rely on the file system • DBMS
Caveat • Conversion is always required • What makes the difference is the level at which it is done • Operating System (or below) • Persistency Service Provider • Application Framework • Application Code • Doing it at a given level does not imply that it has not also been done at a lower level • Doing it at a higher level introduces flexibility but reduces performance • Doing it at a lower level improves performance but requires a high degree of integration (binds to a given solution) • Concurrency is not only for banks... “Myprog.cc changed on disk; really edit the buffer?” (emacs, not Oracle)
Object Persistency • Objects • are atomic entities • have a state (data members including relationships) • provide services (methods) • Persistent objects • survive process boundaries • when “retrieved” have the same state and provide the same services as when they were “stored” [Diagram labels: Event objects in Volatile Memory and in Permanent Storage]
Object Persistency • Persistency • Objects retain their state between two program contexts • Storage entity is a complete object • State of all data members • Object class • OO Language Support • Abstraction • Inheritance • Polymorphism • Parameterised Types (Templates)
OO Language Binding • User had to deal with copying between program and I/O representations of the same data • User had to traverse the in-memory structure • User had to write and maintain specialised code for I/O of each new class/structure type • Tight Language Binding • ODBMSs allow persistent objects to be used directly as variables of the OO language • C++, Java and Smalltalk (heterogeneity) • I/O on demand: no explicit store & retrieve calls
Problems with Naïve OP • Storing services (methods ready to run) is non trivial • persistency services are just object-data store • configuration management takes care of code • frameworks can use dynamic loading to match data & code • Clean and performant object design is difficult: • Different (partial) representations of the state of an object may be required to cope with computational, storage and I/O efficiencies (and code development efficiency) • Object design and implementation evolve, persistent objects stay the same • “Old” persistent objects need to be converted
More Problems with Naïve OP • Object granularity does not match raw I/O granularity (which in turn is device dependent) • small objects should be physically clustered according to users’ access patterns • Object logical relationships do not necessarily reflect access patterns (the old rows vs columns dilemma) • How objects become persistent • At construction time (user can control clustering) • By reachability: an object becomes persistent when “attached” to an already persistent object (clustering is difficult to control)
Physical Model and Logical Model • Physical model may be changed to optimise performance • Existing applications continue to work
Realistic Object Persistency [Diagram labels: Application Algorithm, Application Cache, I/O Buffer, disk, Tape; storage units: object, page, file; annotations: conversion from/to computational optimal format? compression? conversion from/to machine dependent format; new shape]
Components of a POM • Storage manager • manages the physical structure on “disk” • Transaction/concurrency manager • client transaction, journaling, locking mechanisms • (or rely on OS and file system protections) • RTTI system • identifies the concrete type of object to retrieve/store • Converters • from storage format to “user” format and vice versa • machine dependencies, schema evolution, user hooks
Components of a POM • Application Cache manager • dynamic memory management with garbage collection • Tools and (G)UI • naming, indexing, query mechanisms • interactive browsing and query • development tools • administration tools
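The converter and RTTI components listed on the two slides above can be illustrated with a minimal C++ sketch. Everything here is hypothetical and chosen only for illustration (`StoredRecord`, `Track`, `ConverterRegistry`, the version numbers and the field layout); it is not the interface of any of the products discussed in this talk. It assumes the stored record carries a type name and a schema version, and that one converter is registered per (type, version) pair, so that "old" persistent objects are converted to the current user format when read.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical storage format: type tag, schema version, raw fields.
struct StoredRecord {
    std::string typeName;        // what an RTTI system would record
    int schemaVersion;           // which layout the fields follow
    std::vector<double> fields;  // storage representation
};

// Hypothetical "user" format; chi2 was added in schema version 2.
struct Track {
    double px, py, pz;
    double chi2;
};

using TrackConverter = std::function<Track(const StoredRecord&)>;

// Maps (type name, schema version) to the converter able to read it.
class ConverterRegistry {
public:
    void add(const std::string& type, int version, TrackConverter conv) {
        table_[std::make_pair(type, version)] = std::move(conv);
    }
    Track toUser(const StoredRecord& rec) const {
        auto it = table_.find(std::make_pair(rec.typeName, rec.schemaVersion));
        assert(it != table_.end() && "no converter registered");
        return it->second(rec);
    }
private:
    std::map<std::pair<std::string, int>, TrackConverter> table_;
};

// Registers one converter per known schema version of Track.
ConverterRegistry makeTrackRegistry() {
    ConverterRegistry reg;
    // Version 1 has no chi2: fill in a sentinel value on read.
    reg.add("Track", 1, [](const StoredRecord& r) {
        return Track{r.fields[0], r.fields[1], r.fields[2], -1.0};
    });
    // Version 2 stores chi2 as a fourth field.
    reg.add("Track", 2, [](const StoredRecord& r) {
        return Track{r.fields[0], r.fields[1], r.fields[2], r.fields[3]};
    });
    return reg;
}
```

Calling `makeTrackRegistry().toUser(record)` on a version 1 record yields a `Track` whose `chi2` is the sentinel, which is one possible way for a user hook to handle schema evolution on read.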
Objectivity/DB • ODBMS close to the ODMG standard (library, not framework) • Storage Manager based on a fixed physical hierarchy: slot-page-container-database(file)-federation • Lock-server and journals to manage transactions • Proprietary parsing of an extension of C++ (ooddlx) • Objects are converted when “opened” • schema-evolution effects: automatic or user defined • Basic naming, indexing and query mechanisms • Crude browsing and administration tools • but Objy is integrated with some third-party frameworks
ROOT • Application Framework with embedded I/O • Storage Manager based on • logical hierarchy TBasket-branch-tree • physical “logical-records” in files • No transactions, no concurrency management • Parsing of a C++ subset via CINT • Objects are converted when retrieved (Streamer) • Automatically or by user (schema-evolution only by user) • Basic naming, indexing or query mechanisms • and CINT scripting • “Paw”erful interactive environment
(Wrapped O)RDBMS • Powerful, reliable and efficient storage managers with full concurrency and transaction management • SQL query mechanisms with transparent (hidden) indexing and naming • User friendly, fully integrated browsers and tools (for relational tables) • Poor object integration (developers should be both OO and ER experts at the same time)
Persistency in HEP
HEP Data • Environmental data • Detector and Accelerator status • Calibrations, Alignments • Event-Collection Meta-Data (luminosity, selection criteria, …) • … • Event Data, User Data [Diagram labels: Event Collection, Collection Meta-Data, Event, Tracks, Electrons, Tracker Alignment, Ecal calibration, User Tag (N-tuple)]
Environmental Data [Diagram: versions of Geometry (A, B, C), Alignment (A, B, C) and Calibration (A, B) laid out along a time axis] • Parameters Snapshot: the Environmental data items valid for the currently processed event.
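The "snapshot" idea on the slide above, picking for each environmental item the version valid at the time of the current event, can be sketched as a lookup in a time-ordered map. This is only an illustration under stated assumptions: the names `VersionedItem`, `Snapshot` and `snapshotAt`, the integer time stamps, and the rule that a version stays valid until the next one starts are all hypothetical and not taken from any experiment's conditions database.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// One environmental data item (e.g. Geometry) with versions ordered in time.
class VersionedItem {
public:
    // Assumption: `version` becomes valid at `since` and remains valid
    // until the next registered version starts.
    void addVersion(std::int64_t since, const std::string& version) {
        assert(!version.empty());
        bySince_[since] = version;
    }
    // Returns the version valid at eventTime, or "" if none is valid yet.
    std::string validAt(std::int64_t eventTime) const {
        // First entry whose start time is strictly after eventTime.
        auto it = bySince_.upper_bound(eventTime);
        if (it == bySince_.begin()) return std::string();
        --it;  // step back to the last entry starting at or before eventTime
        return it->second;
    }
private:
    std::map<std::int64_t, std::string> bySince_;
};

// The set of versions valid for the currently processed event.
struct Snapshot {
    std::string geometry;
    std::string alignment;
    std::string calibration;
};

Snapshot snapshotAt(const VersionedItem& geometry,
                    const VersionedItem& alignment,
                    const VersionedItem& calibration,
                    std::int64_t eventTime) {
    return Snapshot{geometry.validAt(eventTime),
                    alignment.validAt(eventTime),
                    calibration.validAt(eventTime)};
}
```

A framework would typically rebuild such a snapshot only when the event time crosses a validity boundary, rather than for every event.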
Event Structure & Placement (BaBar) [Diagram labels: Event Header; Tag, Sim, Raw, Emc, Trk, Pid and Beta headers; Sim, Raw, Emc, Trk, Pid and Beta data; databases Evs, Hdr, Sim, Raw, Rec, Esd, Aod]
BaBar Event Structure • Decoupling of placement & navigation • Hierarchical Placement Regions • Sim (Simulated Data) ~100 kBytes/event • Tru (Simulated Truth Data) ~40 kBytes/event • Raw (Raw Data) ~30 kBytes/event • Rec (Reconstructed Data) ~100 kBytes/event • Esd (Event Summary Data) ~20 kBytes/event • Aod (Analysis Object Data) ~2 kBytes/event • Tag (Event Selection Tag) ~200 Bytes/event • Navigation Trees • Minimize size of navigation headers • Allow for expansion of data without schema evolution
Root Physical Clustering
ODBMS-MSS Integration SLAC-Objy Plan • Extensible AMS • Allows use of any type of filesystem via oofs layer • Generic Authentication Protocol • Allows proper client identification • Opaque Information Protocol • Allows passing of hints to improve filesystem performance • Defer Request Protocol • Accommodates hierarchical filesystems • Redirection Protocol • Accommodates terabyte+ filesystems • Provides for dynamic load balancing
Dynamic Load Balancing Hierarchical Secure AMS [Diagram labels: client, Dynamic Selection, ams, vfs, Redwood, hpss]
One Technology for All ? • Event catalogues • Update (add and remove) items of a catalogue • Searchable: SQL or equivalent • Event data • Write once-read many (WORM) • Often on tertiary (sequential) storage • Bulk data used by the entire collaboration (Raw, Rec,…) • User extracted data (N-tuples)
One Technology for All ? • Detector data • Updates of data items • Versioning of data items • Version configuration • Statistical data • Understandable by interactive tools • A single coherent solution (not optimal for every purpose) or an ad-hoc optimal product for each type of data?
LHCb Event Persistency [Diagram labels: AppManager, Algorithms, Transient Event Store, Event Data Service, Persistency Service, OutputStreams; SicbCnvSvc with Converters, Sicb/Zebra, Sicb data files; RootCnvSvc with Converters, Root I/O, Root data files]
LHCb Generic Persistent Model [Diagram labels: 12ByteOID, Technology, <number>, Converter, Lookup table with columns Link ID and Link Info (DB/Cont.name), steps (1) to (4)]
LHCb Link Tables • One Link table per Storage technology per DB • Link to Objy object • no link table • 8 Bytes are enough to hold ooRef directly • Link to ROOT object • Link table entry must contain all navigation info • File name • Tree/Branch name • Link to ZEBRA (SICB) object • Link Table contains file name + ZEBRA bank name
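The link-table scheme described above can be sketched as follows. This is a simplified illustration, not LHCb code: the names `Technology`, `LinkInfo`, `LinkTable`, `GenericRef` and `resolve`, the field widths, and the textual form of the resolved address are all assumptions made for the example. The only points taken from the slide are that an Objy reference needs no table entry, while ROOT and ZEBRA references go through a table holding the file name plus a tree/branch or bank name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class Technology { Objy, Root, Zebra };

// Navigation info kept once per distinct target, not once per reference.
struct LinkInfo {
    std::string fileName;
    std::string container;  // tree/branch name (ROOT) or bank name (ZEBRA)
};

// One such table per storage technology per database (hypothetical layout).
class LinkTable {
public:
    // Returns the link ID for info, reusing an existing entry if present.
    std::uint32_t add(const LinkInfo& info) {
        for (std::uint32_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].fileName == info.fileName &&
                entries_[i].container == info.container) {
                return i;
            }
        }
        entries_.push_back(info);
        return static_cast<std::uint32_t>(entries_.size() - 1);
    }
    const LinkInfo& at(std::uint32_t id) const {
        assert(id < entries_.size());
        return entries_[id];
    }
private:
    std::vector<LinkInfo> entries_;
};

// Technology-neutral reference stored inside persistent objects.
struct GenericRef {
    Technology tech;
    std::uint32_t linkId;   // index into the link table; unused for Objy
    std::uint64_t payload;  // Objy: packed reference; ROOT: entry number
};

// Turns a generic reference into a printable, technology-specific address.
std::string resolve(const GenericRef& ref, const LinkTable& table) {
    switch (ref.tech) {
    case Technology::Objy:
        // No table lookup: the 8-byte payload is the reference itself.
        return "objy:" + std::to_string(ref.payload);
    case Technology::Root: {
        const LinkInfo& link = table.at(ref.linkId);
        return "root:" + link.fileName + ":" + link.container + "#" +
               std::to_string(ref.payload);
    }
    case Technology::Zebra: {
        const LinkInfo& link = table.at(ref.linkId);
        return "zebra:" + link.fileName + ":" + link.container;
    }
    }
    return std::string();
}
```

Keeping file and container names in a shared table means each stored reference stays small and fixed in size, whichever technology it points to.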
Hybrid Event Store in STAR • Adoption of ROOT I/O for the event store leaves Objectivity with one role left to cover: the true ‘database’ functions of the event store • Navigation among event collections, runs/events, event components • Data locality (now translates basically to file lookup) • Management of dynamic, asynchronous updating of the event store from one end of the processing chain to the other • From initiation of an event collection (run) in online through addition of components in reconstruction, analysis and their iterations • But with the concerns and weight of Objectivity it is overkill for this role. • So we went shopping… • looking to leverage the world around us, as always • and eyeing particularly the rising wave of Internet-driven tools and open software • and came up with MySQL in May.
Experiments’ Status and Plans
CMS • Uses Objectivity in production • Test Beam DAQ • Monte Carlo (GEANT3) reconstruction • Objectivity fully integrated in Application Framework (CARF) • CARF manages transactions, physical clustering and the whole persistent object structure and its relations with the transient structure • users access persistent objects through C++ pointers • CARF takes care of pinning • leaf inheritance from ooObj often used
CMS • Limited use of Objectivity “extensions” • associations, indexes, maps, query predicates, etc. • object copy, move, versions • Schema evolution routinely used • No complex object conversion attempted so far • Multi-federation environment to decouple • production • analysis • development
ATLAS • Used Objectivity in several test-bed applications • HCAL test-beam • ATLFAST++ • 1 TB Milestone (HPSS used as MSS) • Plan to use Objectivity in future test-beams and Monte Carlo reconstruction • The application framework will provide a “database” independent interface
ALICE • Simulation and reconstruction framework fully integrated in ROOT • Used in Monte Carlo simulation and reconstruction • Will be used in test beams • Mock-up Data Challenge done: 7 TB in seven days • Use HPSS and/or CASTOR for file management
ALICE DC II [Diagram labels: NA57 data source (9 PowerPC AIX); ALICE DAQ data source; DATE = GDC + LDC; LDCs; Switches; GB eth; GDC Event Builder; pipe; ROOT Objectifier; Intel/Linux PC Cluster, 10/15 nodes; Intel/PC Linux + PowerPC/AIX + Sun; Computer Centre; 5 MB/s; 10 MB/s; HPSS; CASTOR ??]
LHCb • Do not want to limit to one persistency technology • Speed, when you need speed • Functionality, when you need functionality • Ease migration to upcoming (superior) technologies • Independence • Well defined interface to persistency technologies • Interface: abstract technology independent API • Example: ODBC for relational DBs
LHCb • LHCb application framework (GAUDI) is independent of the persistency technology • Manages its own application caches (data services) specialized in • event data • detector data • statistical data • Abstract interface for user-provided converters
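The combination of an application cache (data service) with an abstract converter interface can be sketched as below. This is a minimal illustration assuming a cache keyed by a path string; the names `DataObject`, `IConverter`, `CountingConverter` and `DataService` and their methods are hypothetical and are not presented as the actual GAUDI interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical transient object as seen by algorithms.
struct DataObject {
    std::string path;    // location in the transient store
    std::string origin;  // which technology produced it (for illustration)
};

// Abstract, technology-independent interface implemented per back-end.
class IConverter {
public:
    virtual ~IConverter() = default;
    virtual std::unique_ptr<DataObject> createObj(const std::string& path) = 0;
};

// Stand-in back-end that fabricates objects and counts how often it is used.
class CountingConverter : public IConverter {
public:
    explicit CountingConverter(std::string tech) : tech_(std::move(tech)) {}
    std::unique_ptr<DataObject> createObj(const std::string& path) override {
        ++calls_;
        return std::unique_ptr<DataObject>(new DataObject{path, tech_});
    }
    int calls() const { return calls_; }
private:
    std::string tech_;
    int calls_ = 0;
};

// Application cache: converts on first access, then serves from memory.
class DataService {
public:
    explicit DataService(IConverter& converter) : converter_(converter) {}
    const DataObject& retrieve(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            std::unique_ptr<DataObject> obj = converter_.createObj(path);
            assert(obj && "converter returned no object");
            it = cache_.emplace(path, std::move(obj)).first;
        }
        return *it->second;
    }
    // Drops all cached objects, e.g. when moving to the next event.
    void clear() { cache_.clear(); }
private:
    IConverter& converter_;
    std::map<std::string, std::unique_ptr<DataObject>> cache_;
};
```

Because algorithms only see `DataService`, swapping the persistency technology amounts to handing the service a different `IConverter` implementation.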
BaBar • Taking data since May • Uses Objectivity for all kinds of data • many home-made tools to manage the database • Complete decoupling between transient objects (seen by end user) and their persistent representations • No schema evolution (explicit renaming of classes) • Starting to use multiple federations to decouple running environments
STAR • Moved away from Objectivity mainly because of configuration management issues • Hybrid solution: • ROOT for event file • MySQL for event catalog and environmental data • MySQL under test for event tags as well • HPSS (through Grand Challenge) for tertiary storage management
Objectivity Burdens in STAR • The list of burdens imposed by Objectivity grew as our experience and lessons from BaBar mounted • Management, development burden imposed by ensuring consistent schema in a single experiment-wide federation • Schema evolution unusable if forward compatibility is desired (ability to run old executables on new data) • Do-it-yourself access control, particularly with AMS • Risk of major impact from platform lock-in due to porting delays; both Linux and Sun • Scalability concerns (fall ‘98) -- lock manager performance issues in parallel usage?
Requirements: STAR 8/99 View
Fermi RUNII (CDF & DØ) • Sequential access model based on RUNI experience • focus on efficient data access from hierarchical storage • clustering optimized for the access pattern with the largest data volume • Use • ROOT (CDF) and EVpack (modified DSPACK) (DØ) for event files (MSQL and Oracle8 evaluated by DØ): just I/O back-ends to EDM and DØOM • DØ uses SAM for event catalog and file management • Oracle8 as supporting database
Data Organization [Diagram labels: User and physics group (derived) data, Metadata, Event Information Tiers, Warm Cache, Physical Clustering] (From Oct 1997 Review, Lee Lueking)
Data Access [Diagram labels: Mass Storage, Pipeline, Consumers, Metadata, Thumbnail, Freight Train, Pick Event, User File; legend: Group of Users, Single User, Data flow, File, Disk Storage, Tape Storage, Pipeline Name, Event Metadata] (Lee Lueking, October 1997)
Season IV - aggregate bandwidths, summed from spreadsheet
(non-technical) Risk Analysis