SAMGrid Database Server
This document outlines the redesign of the SAMGrid Database Server to address performance and maintainability issues encountered in the previous system. Key improvements include a new DB server design featuring a code generator for core classes, CORBA interface enhancements, and multithreading capabilities aimed at reducing performance bottlenecks. The migration plan ensures minimal impact on users while transitioning systematically to the updated system. This restructured server architecture aligns better with the current schema, thereby facilitating more flexible metadata handling and supporting distributed database access via SBIR II integration.
Presentation Transcript
A. Lyon, Fermilab (for the SAMGrid Team) SAMGrid Database Server
Outline • Introduction • Issues Addressed with Redesign • Redesign Goals • New DB Server Design/Features • Outstanding Issues • Deployment Path • Integration with SBIR II • Concluding Remarks
Introduction: The SAMGrid System • SAMGrid: general data-handling system designed for experiments with petabyte-sized datasets and widely distributed production/analysis facilities • Offers a wide variety of services, including those for: • data transfer, storage and management • process bookkeeping on distributed systems • Used by D0 and CDF, being tested for use by MINOS and CMS
Introduction: DB Server Role/Usage • SAMGrid uses a central Oracle RDBMS • Most of the communication with the DB is handled by the CORBA-based DB Server • Services provided: • Cataloguing services (file metadata, event catalog, replica catalog) • Dataset services • Process accounting • Runtime support for the SAMGrid station services • Usage: About 250 million DB queries over a recent three-month period
Issues Addressed with Redesign • Large code base: more than 27000 lines of Python code and 350 implemented CORBA IDL methods, of which more than 60% are obsolete • Single-threaded code => performance issues • Removing/modifying old code is very difficult => maintenance problems, hard to adapt to DB schema changes (the latest change, which resulted from the CDF adoption of the SAMGrid system, was very complex)
Redesign Goals • Update treatment of file metadata, align it with the latest DB schema • Improve code maintainability • Easier new development • Improve server performance
New DB Server Design/Features • DB Server Generator • taken from the old infrastructure • handles automatic generation of the core (DB-derived) classes, each of which corresponds to one table in the DB • CORBA wrapper classes: layer of code on top of the ORB-generated structs and exceptions, with the purpose of shielding developers from having to manipulate those structs/exceptions directly • promote code maintainability/re-use (e.g., the SAMGrid Python API uses the same code as the DB Server) • easier development
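The idea of generating one core class per DB table can be illustrated with a minimal sketch. This is not the actual SAMGrid generator: the table names, column names, and the helper `make_table_class` are all hypothetical, and the real generator presumably reads the schema from the database rather than from a hard-coded dictionary.

```python
# Hypothetical table descriptions; real SAMGrid table and column names may differ.
TABLES = {
    "data_files": ["file_id", "file_name", "file_size"],
    "data_tiers": ["tier_id", "tier_name"],
}


def make_table_class(table_name, columns):
    """Build a class for one table, with one attribute per column."""

    def __init__(self, **values):
        unknown = set(values) - set(columns)
        if unknown:
            raise TypeError(
                f"unknown columns for {table_name}: {sorted(unknown)}"
            )
        # Columns that are not supplied default to None.
        for col in columns:
            setattr(self, col, values.get(col))

    def as_row(self):
        """Return the object's state as a column -> value dictionary."""
        return {col: getattr(self, col) for col in columns}

    # "data_files" becomes the class name "DataFiles".
    class_name = "".join(part.capitalize() for part in table_name.split("_"))
    return type(
        class_name,
        (object,),
        {
            "__init__": __init__,
            "as_row": as_row,
            "table_name": table_name,
            "columns": tuple(columns),
        },
    )


# One generated class per table, keyed by table name.
GENERATED = {name: make_table_class(name, cols) for name, cols in TABLES.items()}
```

Because the classes are derived mechanically from the table descriptions, a schema change in this sketch only requires updating `TABLES` and regenerating, which is the kind of maintainability benefit the slide describes.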
New DB Server Design/Features • CORBA interfaces • redesigned and reorganized so that they closely match services which the DB server provides • File metadata • described as dictionaries • each file type has a certain set of required parameters • flexible/configurable system • Multithreading • should minimize performance problems
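A rough sketch of what "metadata described as dictionaries, with required parameters per file type" could look like follows. The file types, parameter names, and the function `missing_params` are assumptions made for illustration; the real required sets are defined by the SAMGrid configuration and are not given in these slides.

```python
# Hypothetical required-parameter sets, keyed by file type.
# In a configurable system these would be loaded from configuration.
REQUIRED_PARAMS = {
    "raw": {"fileName", "fileSize", "runNumber"},
    "derived": {"fileName", "fileSize", "parents"},
}


def missing_params(metadata):
    """Return the sorted list of required parameters absent from metadata.

    Raises ValueError if the file type is missing or unknown.
    """
    file_type = metadata.get("fileType")
    if file_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown file type: {file_type!r}")
    return sorted(REQUIRED_PARAMS[file_type] - set(metadata))


# A metadata dictionary may carry extra, optional parameters as well.
example = {
    "fileType": "raw",
    "fileName": "run1234.raw",
    "fileSize": 1024,
    "comment": "optional parameters are allowed",
}
print(missing_params(example))
```

Keeping the required sets in data rather than in code is one way to obtain the flexibility the slide mentions: supporting a new file type means adding an entry, not writing a new method.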
Outstanding Issues • Impact of the new CORBA infrastructure on server performance (an issue for large lists) • Not all functionality of the existing code has been transferred to the new server yet
Deployment Path • Major changes in the core software component => deployment into production is not easy • Upgrade will be incremental, so that its impact on both users and the DH system itself should be minimal • Plan for deployment in three phases • Upgrade both experiments' DBs to the latest schema (completed in June ’04, required patching of the old code) • Deploy new DB server in parallel to the old one, install new clients, start testing (ongoing now) • Start gradually upgrading main production stations
Integration with SBIR II • SBIR II strives to provide access to distributed databases with a single query • This would remove the SAMGrid dependence on the centralized DB • We are working on interfaces which will allow us to plug different query mechanisms into our code
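One way to picture "plugging different query mechanisms into our code" is a small backend interface that the server code depends on. The sketch below is purely illustrative: the slides do not describe the SBIR II interface, so `QueryBackend`, `InMemoryBackend`, `FederatedBackend`, and `find_files` are hypothetical names, and an in-memory dictionary stands in for a real database.

```python
from abc import ABC, abstractmethod


class QueryBackend(ABC):
    """Hypothetical interface that callers use instead of a concrete DB."""

    @abstractmethod
    def find_files(self, dataset):
        """Return the list of file names belonging to a dataset."""


class InMemoryBackend(QueryBackend):
    """Stand-in for a single database (central or remote)."""

    def __init__(self, catalog):
        self._catalog = catalog

    def find_files(self, dataset):
        return list(self._catalog.get(dataset, []))


class FederatedBackend(QueryBackend):
    """Answers one query by asking several backends and merging results."""

    def __init__(self, backends):
        self._backends = list(backends)

    def find_files(self, dataset):
        seen, merged = set(), []
        for backend in self._backends:
            for name in backend.find_files(dataset):
                if name not in seen:  # drop duplicates, keep first-seen order
                    seen.add(name)
                    merged.append(name)
        return merged


site_a = InMemoryBackend({"ds1": ["f1", "f2"]})
site_b = InMemoryBackend({"ds1": ["f2", "f3"], "ds2": ["f9"]})
federated = FederatedBackend([site_a, site_b])
print(federated.find_files("ds1"))
```

Code written against `QueryBackend` does not need to know whether the answer came from one central database or from several distributed ones, which is the kind of decoupling the slide is aiming for.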
Concluding Remarks • The SAMGrid DB Server, one of the most critical components of the system, was completely redesigned • New architecture promotes code maintainability, easier development, and better performance • New treatment of the file metadata: flexible and configurable • Deployment into production and the necessary system upgrades will be done incrementally to minimize impact on users and the DH system