Database Summary 2009 (focus on Conditions DB)
Elizabeth Gallas – Oxford
2009 Software & Computing Post Mortem Workshop (Part I)
Jan 18-19, 2010
Outline
• General Database Summary 2009
• Distributed Database Summary 2009
• Conditions DB in 2009
  • Coming of age
  • Release 15 and other major improvements to Conditions access
  • Frontier – Grid-wide access to Conditions via Frontier
    • Sasha will cover in next presentation
  • Conditions DB Usage 2009 and Challenges for 2010
• Briefly: Other DB Projects 2009 → 2010
  • Real Data Conditions in Simulation (Sim)
  • Luminosity in Conditions DB (LWG, LMTF)
  • Data Quality in Conditions DB entry/usage (DQ)
  • Conditions and Catalogue Metadata for TAGs (TAG)
  • AGIS – ATLAS Grid Information System (ADC)
• Thanks to many …
• Summary
Elizabeth Gallas - Databases
General Database Summary 2009
• General Database 2009 summary in the Software Week Extended CMB (so I will just summarize here)
  http://indico.cern.ch/conferenceDisplay.py?confId=74279
  • Online stability and isolation from GPN – many aspects
  • Schema movement to final locations – many applications
  • Oracle Streams performance (ONL → OFL → Tier1) – Good
  • DB Issues logging and follow up
    • continued improvements in identifying and fixing problems
  • Plot: Service Usage by application
    • Highlights largest DB resource consumers
• Creation of Archive instance – increases robustness
  • Makes offline “Standby Database” possible in 2010
• Grid-wide Frontier Deployment
  • It not only works, people are using it …
Distributed Database Summary 2009
• ATLAS represented at “3D Workshop” in November 2009
  http://indico.cern.ch/conferenceTimeTable.py?confId=70892
  (This workshop brought WLCG DBAs, system managers, and experiment representatives together – thanks to Maria Girone)
• ATLAS Distributed Databases include
  • AMI … Trigger … Conditions … TAGs
• TAG DB distribution and operation at volunteer sites
  • Discussion of the evolution of the TAG DB/services model
• Conditions DB access – biggest challenge, since the majority of ATLAS processing/analysis happens on “The Grid”
  • DB Release – continued usage for reprocessing
    • But can use other access methods
  • Direct Oracle access – can deploy if needed
    • “Message Board” style queue throttles COOL jobs based on load
  • Frontier – see talk in workshop by Douglas Smith
    • More details in next talk: Sasha Vaniachine
• Very useful to enhance our communication with Tier-1s
  • Making them aware of our usage and optimizing feedback
Conditions DB Improvements in 2009 (1)
• See TWiki https://twiki.cern.ch/twiki/bin/view/Atlas/DeliverablesForRelease15
  • Itemizes some (but not all) of the improvements on the next few slides
• Updates for underlying infrastructure changes …
  • COOL 2.6 … removed SEAL dependencies … LCG 56 …
• IOVDbSvc:
  • Reduce number of simultaneous COOL connections (15.0)
  • Request reloading conditions which have changed (detect open-ended IOVs)
  • Add internal monitoring to check which folders are used and how much data is read … summary written to log files
  • Alignment of IOV queries and greater use of read-cache
    • improves performance with Frontier
• “COOL schema split” was complete in 15.4 (online conditions moved to offline folders if they are not needed online)
  • Associated global tags reviewed to ensure non-needed relations are eliminated
• Support for locked UPDx tags was added (with internal lock/unlock cycle)
  • Essential for implementation of UPD1-mode tags
Conditions DB Improvements in 2009 (2)
• Detector Status (DQ) enhancement:
  • black flag, additional detector IDs, combined performance flags
• Magnetic field migrated to Conditions DB (15.4)
  • https://twiki.cern.ch/twiki/bin/view/Atlas/CoolMagneticField
• Frontier support (15.4)
  • https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaDBAccess
• Better ways to configure access to resources (Oracle server, POOL file catalogues) at external sites
  • based on job information system or site-specific customisation
• By default, ignore SQLite files in the DB release when running with real data
  • uses Oracle or Frontier in Release 15.4+
• Fix Conditions access to non-POOL files
  • these are COOL-managed histogram files -- currently the reference MC histograms, but they will also arise with real data
Additional Conditions DB refinements
• Completion of mechanisms for COOL offline → online
  • https://twiki.cern.ch/twiki/bin/view/Atlas/CoolPublishing
• Improved Conditions DB POOL file registration system optimized for better grid distribution
  • https://twiki.cern.ch/twiki/bin/view/Atlas/ConditionsDDM (introduction of Conditions dataset families)
• Implementation of UPD1-mode tags
  • https://twiki.cern.ch/twiki/bin/view/Atlas/CoolTagging (ensure Conditions data used for any given already-taken run cannot be overridden)
• Savannah #51429: Single-row data retrieval for CLOB
  • Fixed inefficiency … performance comparisons pending
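The UPD1 rule above (Conditions data used for an already-taken run cannot be overridden) can be illustrated with a minimal Python sketch. The function name, the run-number arguments, and the exact comparison are illustrative assumptions; this is not the COOL or AtlCool interface.

```python
def upd1_update_allowed(iov_since_run: int, last_taken_run: int) -> bool:
    """Return True if a new IOV may be written under a UPD1-style rule.

    Assumption: an update is acceptable only when its interval of
    validity (IOV) starts strictly after the most recent run that has
    already been taken, so no existing run sees different conditions.
    """
    return iov_since_run > last_taken_run


def filter_allowed_updates(requested_since_runs, last_taken_run):
    """Split requested updates into (allowed, rejected) lists of run numbers."""
    allowed = [r for r in requested_since_runs
               if upd1_update_allowed(r, last_taken_run)]
    rejected = [r for r in requested_since_runs
                if not upd1_update_allowed(r, last_taken_run)]
    return allowed, rejected
```

In this sketch a request starting at or before the last taken run is rejected outright; a real system might instead offer a separate, explicitly unlocked path for such corrections.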
Conditions DB Operation
• In 2009: Generally worked well, no major failures
• Issues with high load on CERN Oracle servers due to
  • Jobs analysing LBs out of order, which produced bad behaviour in IOVDbSvc caching -- now fixed
  • Jobs with lots of small files with 'simple' events which don't take much time to process, but each require full job initialisation and data reading from COOL
• Experience so far in ramping up of Frontier usage with locally cached POOL files on Grid
• Many user questions … high support load on Rod Walker, Sasha Vaniachine, and Richard Hawkings
  • people still learning how real data analysis differs from MC
• No formal 'on call' rotation for Conditions DB issues
  • still many things only Richard Hawkings can fix
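Why out-of-order luminosity blocks (LBs) can hurt an IOV cache is easy to see in a toy model. The Python sketch below is a simplified illustration under stated assumptions (a single-entry cache keyed on the current interval of validity, with made-up IOV boundaries); it is not the actual IOVDbSvc caching logic, whose details the talk does not give.

```python
from bisect import bisect_right


class ToyIovCache:
    """Single-entry cache for interval-of-validity (IOV) lookups.

    Assumption: payload i is valid for LBs in [iov_starts[i], iov_starts[i+1]),
    and the last payload is valid indefinitely.
    """

    def __init__(self, iov_starts, payloads):
        self.iov_starts = iov_starts   # sorted list of IOV start LBs
        self.payloads = payloads
        self.cached = None             # (since, until, payload)
        self.db_reads = 0              # counts simulated database round trips

    def _read_from_db(self, lb):
        self.db_reads += 1
        i = bisect_right(self.iov_starts, lb) - 1
        if i < 0:
            raise KeyError(f"no conditions valid for LB {lb}")
        since = self.iov_starts[i]
        until = (self.iov_starts[i + 1]
                 if i + 1 < len(self.iov_starts) else float("inf"))
        return since, until, self.payloads[i]

    def get(self, lb):
        # Reuse the cached payload only while the LB stays inside its IOV.
        if self.cached is None or not (self.cached[0] <= lb < self.cached[1]):
            self.cached = self._read_from_db(lb)
        return self.cached[2]


def count_db_reads(lbs):
    """Process LBs in the given order and return the number of DB reads."""
    cache = ToyIovCache([0, 10, 20], ["a", "b", "c"])
    for lb in lbs:
        cache.get(lb)
    return cache.db_reads
```

With these hypothetical IOVs, processing LBs in order (1, 2, 11, 12, 21, 22) needs one read per IOV, while interleaving them (1, 11, 2, 21, 12, 22) forces a read on every access.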
Conditions DB Tag and Data Management
• In 2009: Worked well, but under strain
• For reproducibility of Conditions, all groups must understand
  • importance of using the global tags properly, eliminating hard-coded tags, adding UPD1 protection to SV (single version) folders
• Paul Laycock, COOL Tag Coordinator
  • Spending significant time training / corresponding with the subsystem experts and various production coordination groups
  • Hope this will lessen in 2010,
  • but scope/number of global tags increases every month!
• Communities are diverse (Tier0, HLT, online monitoring, MC, Reprocessing)
  • Subsystem experts must be aware of which global tags exist and ensure that the correct conditions go into each one … there's no way that Paul can do that.
• The high multiplicity of tags to be maintained comes from:
  • different B field configurations,
    • particularly in MC
  • lots of global tags which differ only by small things
    • (e.g. beamspot size for different energies). Is this the best approach for handling small 'delta' changes?
• Need:
  • Better awareness from subsystems and groups
  • Better communication between the Data and Monte Carlo coordination groups
  • Time / Help to develop COOL Tag management / browsing tools
    • sorely needed even at the current level of global tags
Global Conditions Tag Summary (from Paul)
• Good
  • 4 global tags for Tier-0 processing, 1 for HLT and 1 for monitoring.
  • *All* conditions linked to the global tag were locked before we started taking collisions data.
  • Subsystem experts were educated in the process and are now mostly capable of dealing with conditions-related issues using the high-level AtlCool tools.
• Bad
  • *Not all* conditions are linked to the global tag
    • need to make this a priority this month,
    • there was no possibility to change this situation once data-taking had started.
  • Single-version folders are not linked to the global tag, but any that are used should have UPD1 protection - again this didn't happen because of the risk of breaking things; this is the second top priority item.
• Actions:
  • Paul: contacting each sub-system individually and cc Andreas and Beate.
  • Vakho: identified some manpower for work on web browser for inspecting global tags
  • In the meantime, Paul will generate a list of global tags automatically in the nightly checks, so at least the information is guaranteed to be up to date
Update Issues Summary (from Paul)
• Good
  • The online/offline split was completed in time and the P1 gateway transfer mechanism worked well
• Bad
  • TRT had problems updating the alignment, understood and corrected.
    • They were updating only some of the channels, and thus old un-updated channels' conditions "peeked through".
    • The tags have been truncated and a warning now reports if your update is missing some channels in the folder.
    • Rt and t0 constants are in different folders but are completely correlated.
    • A problem came when updating both folders but a run start happens in between
      • Rt is valid from run X, t0 from run X+1. No solution to this, but the expected low frequency of run starts during data taking should mean this isn't an issue.
  • Two SCT online folders had problems in the very beginning
    • Code couldn't deal with locked folders
    • This was quickly resolved.
  • Not everybody who needs to know about updates, does.
• Actions:
  • Richard is updating the AtlCool tools to send the information to more email-lists and make them more informative.
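The TRT "peek through" effect and the missing-channel warning described above can be sketched in a few lines of Python. This is an illustration only: the function names and the dictionary-per-folder model are assumptions, and the real check lives in the ATLAS tools, whose implementation is not shown in the talk.

```python
def missing_channels(folder_channels, update_channels):
    """Return the sorted channels present in the folder but absent from the update."""
    return sorted(set(folder_channels) - set(update_channels))


def apply_update(current, update):
    """Apply a per-channel update on top of the current conditions.

    Channels not mentioned in the update keep their old payload, which is
    how stale values can "peek through" after a partial update.
    """
    merged = dict(current)
    merged.update(update)
    return merged


def check_update(current, update):
    """Return (merged conditions, warning string or None)."""
    missing = missing_channels(current.keys(), update.keys())
    warning = None
    if missing:
        warning = f"update is missing {len(missing)} channel(s): {missing}"
    return apply_update(current, update), warning
```

For example, updating only channels 1 and 3 of a three-channel folder leaves channel 2 on its old alignment and yields a warning naming channel 2.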
Known Additional Challenges for 2010
• Have not done any 24hr calibration loop processing yet
  • Will bring a new set of challenges (and work required)
  • Any indication when this will start?
• Need to complete migration of MC simulation using BField from COOL
  • Personpower needed!
• Elimination of hard-coded COOL tags in JobOptions
  • Highest priority: locking tags used by Tier-0
  • Plan: eliminate from real data first, then MC
  • Complicated by need to reverse-engineer what was used previously and "not breaking anything" (higher priority)
  • …uncertainty whether or when complete elimination is possible …
• We must continue to review the most resource-consuming subdetector usage of Conditions DB in Athena … suggest optimisations
  • This is an ongoing effort, increased when real data analysis started,
  • some involve particular grid sites where particular types of processing occur
• Must enhance level of our communication of expected activity to Tier-1s and improve monitoring / feedback with Tier-1s
  • Oracle … Frontier … Squid servers
  • … and understanding limitations at each level …
ATLAS DB Projects 2009 → 2010
I wanted to mention a few ATLAS projects under considerable development / usage using DB-based information, and the groups under which they are evolving:
• Luminosity in Conditions DB (LWG, LMTF)
  • Model for Luminosity in COOL laid out … related data is being added as available (online and offline)
• Real Data Conditions in Simulation (Simulation)
  • Special Simulation meeting Wednesday this week:
  • http://indico.cern.ch/conferenceDisplay.py?confId=80850
• Data Quality in Conditions DB entry/usage (DQ)
  • Data Quality and Good Run Lists now in computing tutorials
• Conditions and Catalogue Metadata for TAGs (TAG)
  • Evolution of TAG DB and services will increasingly use metadata collected from many systems to enhance user access
• AGIS – ATLAS Grid Information System (ADC)
  • Support the optimization of the ATLAS computing grid
Thanks to many …
Oracle Databases are a shared resource essential to smooth operation
• THANKS: many many application developers!
  • Applications are living, breathing, consuming, and evolving systems … requiring careful intervention for collective stability
• Thanks: DBAs, System Administrators, Other Application support
  • ATLAS – Gancho Dimitrov, Florbela Viegas
  • CERN – Maria Girone and Physics Database Services; Andrea Valassi, others for Conditions DB, CORAL support
  • Tier-1s – many DBAs and system admins
• Thanks: ADC Operations
• Thanks: Fellow DB Coordinators
  • Online – Giovanna Lehmann
  • PVSS – Stefan Schlenker, Slava Khomuntnikov
  • TAG / Metadata – David Malon, Eric Torrence
  • Operations – Sasha Vaniachine
  • Frontier – John DeStefano, Rod Walker
  • Squid – Douglas Smith
  • Conditions / COOL Tagging – Richard Hawkings, Paul Laycock
  • Others! RDSchaffer, Shaun Roe, Hans von der Schmitt, Uli Felzmann …
Summary (from 3D Workshop in Nov 2009)
• Schemas are stable
• Access is controlled
• Monitoring in place
• Depth of experience
• Some redundancy and tools in back pockets (COOL Pilot, Frontier)
Will we be challenged? Certainly!
Are we ready? YES!
It’s November… and the data is coming… intensive and diverse analysis will follow …
Are we ready?
Sorry for lack of pictures in this talk … images from the web …
Washup: Rename this meeting for 2010? … but it's no Jamboree either …
Post mortem: Backup