
Distributed Database Services

Maria Girone, CERN IT-DM, WLCG Workshop, 13th Nov. 2008



Presentation Transcript


  1. Distributed Database Services Maria Girone, CERN, IT-DM WLCG Workshop, 13th Nov. 2008

  2. Outline • Distributed Database Workshop • Experience from 2008 • DB issues for Castor @ Tier1 • Physics DB Services at CERN and Tier1s • Streams • Data protection • Application Performance Optimization • DB resource requests for 2009 Maria Girone Physics Database Services 2

  3. Distributed DB Operations Workshop • Well attended (40 people registered) • DBA teams from CERN and Tier1 sites, experiment representatives, ESA-ESAC

  4. Tuesday 11th November

  5. Wednesday 12th November

  6. Experience in 2008

  7. DB Issues for CASTOR @ Tier1

  8. CASTOR Deployment Architecture • CERN: 6 instances (4x LHC, Public, cernT3) • Each instance: stager + DLF + SRM DB schemas, each on a different cluster (2 nodes) • Name server on yet another cluster • 38 machines! • RAL: 4 different stagers + DLF + SRM, one Name server, two clusters • CNAF and ASGC: one CASTOR instance (stager + DLF + SRM + Name server), one cluster

  9. Issues Seen • SQL executing in the wrong schema • 14k files lost on LHCb, synchronization suspended • SR opened, but bug not seen by Oracle on 10.2.0.3 • Redo logs inconclusive • Difficult to reproduce • Occasional ORA-600 during deletes on the id2type table • DML partition lock (ORA-14403) • Violation of primary key constraint (ORA-00001) • Not seen by any other CASTOR service!

  10. Proposal for CASTOR DB • Need to come up with a common CASTOR DB architecture and share deployment/diagnostic procedures • Dedicated tutorial for CASTOR DBAs? • Agree on an Oracle version for CASTOR and establish a clear plan for evolution with a realistic validation period • Synchronized with the version for the distributed DB clusters (3D) to avoid divergence • Need a dedicated test setup similar to T1 setups • Reproduce and fix Oracle problems • Establish performance metrics via a common stress test

  11. Physics Database Services

  12. DB Services for Physics at CERN • ~25 RAC databases (up to 6 nodes) in the computer centre • 125 servers, 150 disk arrays (2000 disks) • 450 CPU cores, 900 GB of RAM, 540 TB raw disk space, ~140 TB of effective disk space • More than 1000 deployed schemas • Took over responsibility for the LHCb and CMS online databases • Hope to converge soon on funding resources for the ALICE and ATLAS online database services • Team of 5 DBAs (+2 fellows who have just joined) + service coordinator and link to experiments • Coverage outside working hours on “best effort” for production databases – consistency with WLCG Critical Services? • Maintenance with no user-visible downtime • 0.04% service unavailability (all 2008) = 3.5 hours/year • 0.22% server unavailability (all 2008) = 19 hours/year • Patch deployment, broken hardware
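The availability figures on this slide follow directly from the percentages: a yearly unavailability fraction times the hours in a year gives the downtime budget. A minimal sketch of that arithmetic (the function name is illustrative, not from the slides):

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def unavailability_hours(pct_per_year: float) -> float:
    """Convert a yearly unavailability percentage into hours of downtime."""
    return HOURS_PER_YEAR * pct_per_year / 100.0

# 0.04% service unavailability -> ~3.5 hours/year, as quoted on the slide
print(round(unavailability_hours(0.04), 1))
# 0.22% server unavailability -> ~19 hours/year
print(round(unavailability_hours(0.22), 1))
```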

  13. Preparation for the start-up • Major migration in April in view of LHC start-up (RAC5 and RAC6) • Dual-CPU quad-core DELL 2950 servers, 16 GB memory, Intel 5400-series “Harpertown”, 2.33 GHz clock • Dual power supplies, mirrored local disks, 4 NICs (2 private / 2 public), dual HBAs • 60 disk arrays, each with 16 SATA disks of 400 GB • dual-ported FC controllers • RAC5 and RAC6 have a “mixed power configuration” • Storage, FC infrastructure, Ethernet switches and half of the servers are connected to the critical power • Migrated to Oracle 10.2.0.4, 64-bit, in June 2008 • Standby DBs implemented for Maximum Availability Architecture on hardware going out of warranty prior to LHC start-up

  14. H/W Configuration • 1 disk array ≈ 6.4 TB raw space, ~25% effective space (mirroring and on-disk backups) • Note(!): online DBs outside the CC are not included in the list
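The raw-to-effective ratio above can be reproduced from the array description: 16 SATA disks of 400 GB give 6.4 TB raw, mirroring halves that, and reserving the remaining half for on-disk backups leaves roughly 25%. A sketch of that accounting, assuming an even backup split (the function and parameter names are illustrative):

```python
def effective_tb(raw_tb: float, mirrored: bool = True, backup_share: float = 0.5) -> float:
    """Rough effective capacity: mirroring halves raw space, and a share of
    what remains is assumed reserved for on-disk backups."""
    usable = raw_tb / 2 if mirrored else raw_tb
    return usable * (1 - backup_share)

array_raw = 16 * 0.4                 # 16 SATA disks of 400 GB each = 6.4 TB raw
print(round(effective_tb(array_raw), 2))  # ~1.6 TB effective, i.e. ~25% of raw
```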

  15. ATLAS – CPU AND IO READS

  16. Nodes in Production Cluster

  17. 3D DB Machines

  18. 3D Storage

  19. Hardware Plans • GridKa: new server in spring 2009 • PIC: none • TRIUMF: 3D, FTS – add a new 2-node cluster and storage by April 2009 • NDGF: planning a 3-node cluster with SAN storage, 4 Gbit/s FC and SAS disks; nodes will be something adequate, maybe dual quad-core Xeons with 8 or 16 GB of RAM • CNAF: just installed 10 TB of SATA storage (for the flash recovery area) and 10 TB of FibreChannel storage (4 Gbit) • SARA: in the process of purchasing new hardware; this should be completed within a few months • ASGC: add new instances into the same RAC • RAL: SAN redundancy and failover on CASTOR

  20. Streams Status

  21. Overview ATLAS

  22. Overview LHCB CMS

  23. Recent problems • Site unresponsive during a weekend • propagation could not send LCRs to the destination • processes were healthy – no errors reported • large number of spilled LCRs triggered flow control (≈ 6,000,000 LCRs) • capture process “temporarily” paused • Additional capture latency monitored • alert sent when the 90-minute threshold is exceeded • Tests on Streams pool memory usage • new node allocated for the downstream cluster (chart: Streams pool size, up to 2.6 GB)
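The latency alert mentioned above is simple to state: raise an alert once the capture/apply latency exceeds 90 minutes. A minimal sketch of that check, assuming a polling model where the last-applied timestamp is read from monitoring (the function name is illustrative, not part of the actual monitoring tool):

```python
from datetime import datetime, timedelta

LATENCY_THRESHOLD = timedelta(minutes=90)  # threshold quoted on the slide

def latency_alert(last_applied: datetime, now: datetime) -> bool:
    """True when the replication latency exceeds the 90-minute threshold."""
    return (now - last_applied) > LATENCY_THRESHOLD

now = datetime(2008, 11, 13, 12, 0)
assert not latency_alert(now - timedelta(minutes=30), now)   # 30 min: healthy
assert latency_alert(now - timedelta(minutes=120), now)      # 2 h: alert
```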

  24. Planned Interventions • LFC migration out of SRM v1 endpoint • Streams replication stopped • Data updated at source and all destinations • problems with RAL, where data was finally imported from CERN • CNAF, PIC and IN2P3 hardware migration • re-synchronization using transportable tablespaces • Tier1 sites should consider the use of Data Guard in order to minimize the impact

  25. Bugs related to Streams • Fixed: • ORA-600 when dropping propagation • ORA-26687 “no instantiation SCN provided” when dropping a table (two Streams setups between the same source and destination databases) • To be fixed: • <BUG:6402302> create view on a schema not in Streams is replicated • drop view is not replicated!

  26. Application Performance Optimization

  27. Applications Performance Optimisations • Extensive tests on integration before production • Concurrency and stress tests are crucial • Most performance and locking-contention issues come up at this stage • Allows measuring the resource-usage pattern per application (i.e. is it IO-bound, CPU-bound, etc.) • Check of connection management • Oracle likes a pool of stable connections • A high connection rate or a high number of concurrent connections is not good for Oracle • Execution plan stability • Tests need ‘real scale’ data • Execution plan stability requires some extra effort in particular cases (see also the COOL experience)
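The connection-management point above is the classic pooling pattern: open a small, stable set of connections once and reuse them, rather than connecting per request. A language-agnostic sketch of the idea in pure Python (the `ConnectionPool` class and the counting `connect` stub are illustrative stand-ins for a real Oracle driver and its session pool):

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: connections are created once and reused,
    so the connection rate stays low no matter how many requests arrive."""
    def __init__(self, connect, size: int):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        return self._pool.get()   # blocks when the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)

# Hypothetical connect function standing in for a real driver call.
connections_opened = 0
def connect():
    global connections_opened
    connections_opened += 1
    return object()

pool = ConnectionPool(connect, size=3)
for _ in range(100):              # 100 requests still open only 3 connections
    c = pool.acquire()
    pool.release(c)
assert connections_opened == 3
```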

  28. Data Protection

  29. Data Protection at CERN • Implementation of on-disk backups to speed up recovery from physical and logical data corruptions • Performance tuning of on-tape backups • better usage of available resources (tape drives, disk pools, network bandwidth) and improved scheduling • Deployment of Data Guard physical standby • to decrease time-to-recover, further improve handling of human errors and facilitate testing • Implementing new backup & recovery procedures • with improved handling of read-only data • supporting re-startable full backups • fully integrating automatic recovery scripts • Testing LAN-free backups to tape • Examining the possibility of replacing tapes with disk pools as backup destination (for LHCb online)

  30. Experiments’ Resource Review Requests

  31. CMS • Standby DB requested for both offline and online clusters • Support outside working hours for DBs and Streams from online to offline • CMS DB payload for 2009

  32. CMS DB payload for 2009 (100 days) * Updated 03/10/08

  33. ATLAS • Waiting for requests from the global ATLAS computing resource allocations • From https://twiki.cern.ch/twiki/bin/view/Atlas/DatabaseVolumes

      ATLAS Offline RAC Volumes (TB)
      Year   Total   TAGS   COOL   PVSS   ADC et al.
      2007     4.3    1.0    0.3      2            1
      2008     8.7    2.0    0.7      4            2
      2009    22.6   12.1    1.5      7            2
      2010    39.3   23.4    2.5     10          3.4

  34. LHCb • CPU, volume and IO resources at Tier0 basically OK • Tier1 DB resources also adequate

  35. Conclusions • Smooth running in 2008 at CERN, profiting from the larger headroom from the h/w upgrade • Comfortable for operating the services • Adding more resources can only be done with planning • Use of Data Guard also requires additional resources • New hardware for the RAC2 replacement expected early in 2009 • 20 dual-CPU quad-core (possibly blade) servers, 32 disk arrays • Policies concerning integration and production database services remain unchanged in 2009 • s/w and security patch upgrades and backups • Hope that funding discussions for online services with ATLAS and ALICE will converge soon

  36. More Details • Physics Database Services at CERN wiki • https://twiki.cern.ch/twiki/bin/view/PSSGroup/PhysicsDatabasesSection • Support: phydb.support@cern.ch • LCG 3D wiki • interventions, performance summaries • http://lcg3d.cern.ch

  37. Backup Slides

  38. Resources Usage and Capacity Studies • Capacity studies from production usage • Identify the critical resources for our services • In particular • CPU • random IO (reads) • available storage space • Plot the usage trend • Correlate with users’ requirements • This is the input for new HW acquisition
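The trend-plotting step above boils down to fitting a line through periodic usage samples and projecting when a resource hits its ceiling. A self-contained sketch with a plain least-squares fit (the sample numbers are illustrative, not real service data):

```python
def linear_trend(samples):
    """Least-squares slope and intercept over (month, usage) samples."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Illustrative monthly CPU-usage averages (percent) for one service.
usage = [(1, 40.0), (2, 44.0), (3, 47.0), (4, 52.0)]
slope, intercept = linear_trend(usage)
# Project when an assumed 80% capacity ceiling would be reached.
months_to_limit = (80.0 - intercept) / slope
```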

  39. Monitoring • New tools created for simplified storage (SAN) configuration & administration • Many improvements in monitoring • RACMon (storage, availability and standby DB monitoring, rolling intervention mode, performance plots, ...) • Streams monitoring – improved capture latency monitoring and alerts • Future plans • load & latency monitoring • improved monitoring for Tier1s with additional dashboards

  40. Standby databases • Set up on hardware going out of warranty prior to LHC start-up • To increase the data protection level • To facilitate recovery from large-scale data corruption and data loss • To facilitate handling human errors • All transactions are propagated immediately to the standby system • Changes are applied on the standby with a 24-hour delay • Very positive experience so far, but the hardware will be dismantled soon • Now collecting the experiments’ requests for 2009 • Will test the stability of Active Data Guard on 11g within openlab • Enables opening a physical standby database for read-only access (diagram: clients write to the primary; data changes flow to the standby)
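The delayed-apply behaviour described above (redo shipped immediately, applied only after 24 hours, so a human error can be caught before it reaches the standby) can be sketched as a time-ordered queue; the class and method names are illustrative, not Oracle terminology:

```python
import heapq

class DelayedApply:
    """Sketch of a delayed standby: changes are received at once but
    applied only when they are older than the configured delay."""
    def __init__(self, delay_hours: float = 24.0):
        self.delay = delay_hours
        self._queue = []  # heap of (receive_time_hours, change)

    def receive(self, t: float, change: str):
        heapq.heappush(self._queue, (t, change))

    def apply_ready(self, now: float):
        """Apply (and return) every change received more than `delay` ago."""
        applied = []
        while self._queue and self._queue[0][0] <= now - self.delay:
            applied.append(heapq.heappop(self._queue)[1])
        return applied

s = DelayedApply()
s.receive(0.0, "tx1")
s.receive(5.0, "tx2")
assert s.apply_ready(now=10.0) == []        # nothing is 24 h old yet
assert s.apply_ready(now=25.0) == ["tx1"]   # tx1 was received 25 h ago
```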

  41. Applications Performance Monitoring • Monitoring of production • Online monitoring to find ‘queries that go bad’ • A single query with a ‘bad plan’ can generate enough CPU or IO load to ‘kill a cluster’ (see examples in the metrics data) • Weekly reports on bind-variable usage, number of connections, service load • Feedback to users from production • Application owners should be able to react fast, i.e. apply performance-related changes to their applications to make better use of the service
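One simple way to operationalise the ‘queries that go bad’ idea is to compare per-query average elapsed times against a weekly baseline and flag large regressions, which is often the signature of a plan flip. A crude sketch under that assumption (the function, the 5x factor, and the sample SQL ids are all illustrative):

```python
def queries_gone_bad(baseline, current, factor: float = 5.0):
    """Flag SQL ids whose average elapsed time grew by more than `factor`
    versus the baseline week; a stand-in for plan-change detection."""
    flagged = []
    for sql_id, avg_ms in current.items():
        base = baseline.get(sql_id)
        if base and avg_ms > factor * base:
            flagged.append(sql_id)
    return flagged

baseline = {"sql_a": 12.0, "sql_b": 3.0}    # illustrative per-query averages (ms)
current = {"sql_a": 14.0, "sql_b": 250.0}   # sql_b regressed badly
assert queries_gone_bad(baseline, current) == ["sql_b"]
```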

  42. CMS – CPU and I/O reads

  43. LCG CPU and IO reads

  44. LHCBR CPU and IO reads
