Status of the Accelerator Online Operational Databases
Accelerators and Beams Department – Controls Group
Ronny Billen, Chris Roderick
LTC – 7 March 2008
Outline
• The Accelerator Online Operational Databases
• Current Database Server Situation
• Evolution of the Provided Services
• Performance: Hitting the Limits
• 2008: Planned Upgrade and Migration
• Implications, Policy and Constraints for Applications
• Logging Data: Expected vs. Acceptable
• The Future
• Conclusions
The Accelerator Online Operational Databases
• Data needed instantaneously to interact with the accelerator
• The database sits between the accelerator equipment and the client (operator, equipment specialist, software developer)
• Many database services, including APIs and applications:
  • LSA – Accelerator Settings database
  • MDB – Measurement database
  • LDB – Logging database
  • CCDB – Controls Configuration database
  • E-Logbook – Electronic Logbooks
  • CESAR – SPS-EA Controls
  • LASER – Alarms database
  • TIM – Technical Infrastructure Monitoring database
• 3-tier deployment of services for resource optimization: Client → Application Server → Database Server (sketched below)
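The 3-tier deployment means that client code never opens a database session itself; it calls an application-server API, and only the middle tier talks SQL to the database. A minimal Java sketch of that idea, with hypothetical names (LoggingReader, the host and the variable name are illustrative, not the actual CERN APIs):

  import java.util.List;

  /** Minimal sketch of 3-tier access: the client talks to an
   *  application-server facade, never to the database directly. */
  public class ThreeTierSketch {

      // Hypothetical facade exposed by the application server;
      // the real database APIs are much richer than this.
      interface LoggingReader {
          List<Double> getValues(String variable, long fromMillis, long toMillis);
      }

      public static void main(String[] args) {
          LoggingReader reader = connect("app-server-host");  // hypothetical host
          List<Double> values = reader.getValues("DEMO.VAR", 0L, 3_600_000L);
          System.out.println("Points received: " + values.size());
      }

      // Stand-in for the middle tier: in reality a remote proxy (e.g. RMI
      // or HTTP); only the application server issues SQL to the database.
      static LoggingReader connect(String host) {
          return (variable, from, to) -> List.of();  // stub result for the sketch
      }
  }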
Current Database Server Situation: SUNLHCLOG
Often referred to as the "LHC Logging Database"
• Technical
  • 2-node cluster: SUN Fire V240, 2 x {single-core 1GHz CPU, 4GB RAM, 2 x 36GB disks, 2 power supplies}
  • External storage: 9TB, RAID 1+0 / RAID 5, mirrored & striped (~60% usable)
• History
  • Original setup purchased: March 2004
  • Extra disks purchased: October 2006
• Main accounts – data
  • Logging: LHC HWC, Injectors, Technical Services
  • Measurements: LHC HWC, Injectors
  • Settings: LSA for LHC, SPS, LEIR, PS, PSB, AD
• Today's specifics
  • 150 simultaneous user sessions
  • Oracle data-files: 4.7TB
Current Database Server Situation: SUNSLPS
Often referred to as the "Controls Configuration Database"
• Technical
  • Server: SUN E420R {450MHz CPU, 4GB RAM, 2 x 36GB disks}
  • External storage: 218GB
• History
  • Installed in January 2001
• Main accounts – data
  • AB-Controls, FESA, CMW, RBAC, OASIS
  • CESAR, PO-Controls, INTERLOCK
  • e-Logbooks, ABS-cache
  • Historical SPS and TZ data
  • LSA Test
• Today's specifics
  • 200-300 simultaneous user sessions
  • Oracle data-files: 32GB
Evolution of the Provided Services
• LSA Settings: operationally used since 2006
  • Deployed on SUNLHCLOG to get the best performance
  • Used for LEIR, SPS, SPS & LHC transfer lines, LHC HWC
  • Continuously evolving due to requirements from LHC and PS
• Measurement Service: operationally used since mid-2005
  • Satisfies central short-term persistence for Java clients
  • Provides data filtering and transfer to the long-term logging service
  • Generates accelerator statistics
  • Increasingly used for the complete accelerator complex
• Logging Service: operationally used since mid-2003
  • Scope extended to all accelerators and technical data of the experiments
  • Equipment expert data for LHC HWC accounts for >90% of the volume
  • Largest consumer of database and application server resources
Evolution of the Logging – Data Volume
[Chart: growth of the logged data volume over time]
Evolution of the Logging – Data Rates
[Chart: logging data rates over time; series: CIET, CRYO, QPS]
Performance: Hitting the Limits
• I/O Limits
  • The I/O subsystem is used for reading and writing data
  • Recent samples: 4 to 37 clients waiting for the I/O subsystem
[Chart: number of active sessions waiting for the I/O subsystem]
Performance: Hitting the Limits
• CPU Limits
  • CPU is always needed to do anything:
    • Data writing and extraction
    • Data filtering (CPU intensive) and migration from MDB → LDB (see the filtering sketch below)
    • Exporting archive log files to tape, incremental back-ups
    • Migrating historic data to dedicated read-only storage
  • Hitting the I/O limits burns CPU
[Chart: percentage of CPU used on I/O wait events]
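The MDB → LDB filtering mentioned above is what makes this step CPU-intensive: every incoming value must be compared against the last value kept. A minimal sketch of one common technique, a deadband filter, assuming an invented class name and threshold (this is not the production filtering logic):

  /** Deadband filter sketch: forward a value to long-term logging only
   *  if it moved by more than a threshold since the last value kept. */
  public class DeadbandFilter {
      private final double threshold;
      private Double lastKept;  // null until the first value is kept

      public DeadbandFilter(double threshold) {
          this.threshold = threshold;
      }

      /** Returns true if the value should be logged long-term. */
      public boolean accept(double value) {
          if (lastKept == null || Math.abs(value - lastKept) > threshold) {
              lastKept = value;
              return true;
          }
          return false;  // inside the deadband: drop the value
      }

      public static void main(String[] args) {
          DeadbandFilter filter = new DeadbandFilter(0.5);
          for (double v : new double[] {1.0, 1.2, 1.7, 1.6, 3.0}) {
              System.out.println(v + " -> " + (filter.accept(v) ? "log" : "drop"));
          }
      }
  }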
Performance: Hitting the Limits
• Storage Limits
  • Pre-defined allocated data-files are difficult to manage (due to their size)
  • Monthly storage allocations are necessary, yet always insufficient
  • Archive log file space is insufficient when the backup service is down
[Chart: storage utilisation over time]
2008: Planned Upgrade and Migration
Separate into 3 high-availability database services
• Deploy each service on a dedicated Oracle Real Application Cluster (RAC)
  • Settings & Controls Configuration (including logbooks)
    • Highest availability, fast response
    • Low CPU usage, low disk I/O
    • ~20GB data
  • Measurement Service
    • Highest availability
    • CPU intensive (data filtering MDB → LDB), very high disk I/O
    • ~100GB (1 week latency), or much more for HWC / LHC operation
  • Logging Service
    • High availability
    • CPU intensive (data extraction), high disk I/O
    • ~10TB per year
2008: Planned Upgrade and Migration
[Architecture diagram: three dedicated Oracle RACs (servers: 2 x quad-core 2.8GHz CPU, 8GB RAM; storage: clustered NAS shelves of 14 x 146GB FC disks and 14 x 300GB SATA disks; 11.4TB usable). RAC 1 hosts LSA Settings, Controls Configuration, E-Logbook and CESAR; RAC 2 hosts Measurements and HWC Measurements; RAC 3 hosts Logging. An additional server is foreseen for DataGuard testing: a standby database for LSA.]
2008: Planned Upgrade and Migration
• Dell PowerEdge 1950 server specifications:
  • 2 x Intel Xeon quad-core 2.33GHz CPU
  • 2 x 4MB L2 cache
  • 8GB RAM
  • 2 x power supplies, network cards (10Gb Ethernet), 2 x 72GB system disks
• NetApp Clustered NAS FAS3040 storage specifications:
  • 2 x disk controllers (support for up to 336 disks / 24 shelves)
  • 2 x disk shelves (14 x 146GB Fibre Channel, 10,000rpm)
  • 8GB RAM (cache)
  • RAID-DP
  • Redundant hot-swappable controllers, cooling fans, power supplies, optics, and network cards
  • Certified for >3,000 I/O operations per second
2008: Planned Upgrade and Migration launched Sep-2007 launched Oct-2007 arrived at CERN Nov-2007 arrived at CERN Jan-2008 ordered Jan-2008 stress-tested Jan-2008 liberated Feb-2008 fully installed 7-Mar-2008 installed, configured 14-Mar-2008 deployed (AB/CO/DM) ready for switch-over (1-day stop) 21-Mar-2008? (later) (Sep-2008) • Purchase order for storage (2/11) • Purchase order for servers (7/122) • NetApps NAS storage shelves • Dell servers • Additional mounting rails for servers • Servers • Rack space • Server and storage • Oracle system software • Database structures • Database services • Switch to services of new platform • Migration of existing 5TB logging data to new platform • Purchase additional logging storage for beyond 2008 LTC - Controls session - Databases
Implications, Policy and Constraints for Applications
Foreseen for all services, already implemented for a few:
• Implications
  • All applications should be cluster-aware (see the connection sketch below)
  • Database load-balancing / fail-over (connection modifications)
  • Application fail-over (application modifications)
• Policy
  • Follow naming conventions for data objects
• Constraints
  • Use APIs for data transfer (no direct table access)
  • Enforce controlled data access
  • Register authorized applications (purpose, responsible)
  • Implement application instrumentation
  • Provide details of all database operations (who, what, where)
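For illustration, a cluster-aware JDBC connection can use an Oracle connect descriptor that lists both RAC nodes, with load balancing and failover enabled. A minimal sketch, assuming invented host, service and class names (oracle.jdbc.pool.OracleDataSource is the standard Oracle JDBC data source; applications would normally obtain this URL from central configuration rather than hard-coding it):

  import java.sql.Connection;
  import java.sql.SQLException;
  import oracle.jdbc.pool.OracleDataSource;

  public class ClusterAwareConnection {
      // Connect descriptor listing both RAC nodes: LOAD_BALANCE spreads
      // sessions across the nodes, FAILOVER_MODE lets running sessions
      // survive the loss of one node. Host/service names are invented.
      private static final String URL =
          "jdbc:oracle:thin:@(DESCRIPTION="
        + "(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)"
        + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1)(PORT=1521))"
        + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2)(PORT=1521)))"
        + "(CONNECT_DATA=(SERVICE_NAME=logging_svc)"
        + "(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))";

      public static Connection open(String user, String password) throws SQLException {
          OracleDataSource ds = new OracleDataSource();
          ds.setURL(URL);
          ds.setUser(user);
          ds.setPassword(password);
          return ds.getConnection();
      }
  }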
Logging Data: Expected vs. Acceptable
• Beam-related equipment is starting to produce data
  • BLM: 6,400 monitors × 12 × 2 (losses & thresholds) + crate status ≈ 154,000 values per second (filtered by the concentrator & MDB)
  • XPOC
  • More to come…
• Limits
  • Maximum 1Hz data frequency in the Logging database
  • The Logging database is not a data dump
  • Consider the final data usage before logging – only log what is needed (a throttling sketch follows below)
  • Logging noise will have a negative impact on data extraction performance and analysis
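A client can respect the 1Hz ceiling by throttling at the source, before data ever reaches the Measurement or Logging service. A minimal sketch of such a rate limiter, with invented names (not an actual component of the logging chain):

  /** Sketch of a per-variable 1Hz throttle: a sample may be logged only
   *  if at least one second has passed since the previous accepted one. */
  public class OneHertzThrottle {
      private long lastSentMillis = -1_000;  // so the very first sample passes

      /** Returns true if this sample may be sent to logging. */
      public synchronized boolean tryAcquire(long nowMillis) {
          if (nowMillis - lastSentMillis >= 1_000) {
              lastSentMillis = nowMillis;
              return true;
          }
          return false;  // above 1Hz: drop, or aggregate locally instead
      }

      public static void main(String[] args) throws InterruptedException {
          OneHertzThrottle throttle = new OneHertzThrottle();
          for (int i = 0; i < 5; i++) {
              System.out.println(throttle.tryAcquire(System.currentTimeMillis()) ? "log" : "drop");
              Thread.sleep(400);  // samples arriving at 2.5Hz
          }
      }
  }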
The Future
• Logging Data
  • Original idea: keep data available online indefinitely
  • Data rates estimated at ~10TB/year
  • Closely monitor the evolution of storage usage
  • Order new disks for 2009 data (in Sep-2008)
  • Migrate existing data (~4TB) to the new disks
• Service Availability
  • The new infrastructure has high redundancy for high availability
  • Scheduled interventions will still need to be planned
  • The use of a standby database will be investigated, with the objective of reaching 100% uptime for small databases
Conclusions
• Databases play a vital role in the commissioning and operation of the accelerators
• Database performance and availability have a direct impact on operations
• Today, the main server SUNLHCLOG is heavily overloaded
• Based on experience and the evolution of the existing services, the new database infrastructure has been carefully planned to:
  • Address the performance issues
  • Provide maximum availability
  • Provide independence between the key services
  • Scale as a function of data volumes and future requirements
• The new database infrastructure should be operational ahead of injector-chain start-up and LHC parallel-sector HWC