
Database Summary 2009 (focus on Conditions DB)

Elizabeth Gallas – Oxford. 2009 Software & Computing Post Mortem Workshop (Part I), Jan 18-19, 2010.


Presentation Transcript


  1. Database Summary 2009 (focus on Conditions DB) Elizabeth Gallas – Oxford 2009 Software & Computing Post Mortem Workshop (Part I) Jan 18-19, 2010

  2. Outline • General Database Summary 2009 • Distributed Database Summary 2009 • Conditions DB in 2009 • Coming of age • Release 15 and other major improvements to Conditions access • Frontier – Grid-wide access to Conditions via Frontier • Sasha will cover in next presentation • Conditions DB Usage 2009 and Challenges for 2010 • Briefly: Other DB Projects 2009 → 2010 • Real Data Conditions in Simulation (Sim) • Luminosity in Conditions DB (LWG, LMTF) • Data Quality in Conditions DB entry/usage (DQ) • Conditions and Catalogue Metadata for TAGs (TAG) • AGIS – ATLAS Grid Information System (ADC) • Thanks to many … • Summary Elizabeth Gallas - Databases

  3. General Database Summary 2009 • General Database 2009 summary in the Software Week Extended CMB (so I will just summarize here) http://indico.cern.ch/conferenceDisplay.py?confId=74279 • Online stability and isolation from GPN – many aspects • Schema movement to final locations – many applications • Oracle Streams performance (ONL → OFL → Tier1) – Good • DB Issues logging and follow up • continued improvements in identifying and fixing problems • Plot: Service Usage by application • Highlights largest DB resource consumers • Creation of Archive instance – increases robustness • Makes offline “Standby Database” possible in 2010 • Grid-wide Frontier Deployment • It not only works, people are using it …

  4. Distributed Database Summary 2009 • ATLAS represented at “3D Workshop” in November 2009 http://indico.cern.ch/conferenceTimeTable.py?confId=70892 (This workshop brought WLCG DBAs, system managers, and experiment representatives together – thanks to Maria Girone) • ATLAS Distributed Databases include • AMI … Trigger … Conditions … TAGs • TAG DB distribution and operation at volunteer sites • Discussion of the evolution of the TAG DB/services model • Conditions DB access – biggest challenge since the majority of ATLAS processing/analysis happens on “The Grid” • DB Release – continued usage for reprocessing • But can use other access methods • Direct Oracle access – can deploy if needed; a “Message Board” style queue throttles COOL jobs based on load • Frontier – see talk in workshop by Douglas Smith; more details in next talk: Sasha Vaniachine • Very useful to enhance our communication with Tier-1s • Making them aware of our usage and optimizing feedback

  5. Conditions DB Improvements in 2009 (1) • See TWiki https://twiki.cern.ch/twiki/bin/view/Atlas/DeliverablesForRelease15 • Itemizes some (but not all) of the improvements on next few slides • Updates for underlying infrastructure changes … • COOL 2.6 … removed SEAL dependencies … LCG 56 … • IOVDbSvc: • Reduce number of simultaneous COOL connections (15.0) • Request reloading conditions which have changed (detect open-ended IOVs) • Add internal monitoring to check which folders are used and how much data is read … summary written to log files • Alignment of IOV queries and greater use of read-cache • improves performance with Frontier • “COOL schema split” was complete in 15.4 (online conditions moved to offline folders if they are not needed online) • Associated global tags reviewed to ensure non-needed relations are eliminated • Support for locked UPDx tags was added (with internal lock/unlock cycle) • Essential for implementation of UPD1-mode tags
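
To make the IOVDbSvc items on this slide more concrete (reloading open-ended IOVs, use of a read-cache, monitoring how much is read), here is a minimal Python sketch. It is not ATLAS or COOL code: the names `IOV`, `IOVCache`, `OPEN_END` and `fake_db` are hypothetical, and the validity key is assumed to be a single integer.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sentinel meaning "valid until further notice" (open-ended IOV).
OPEN_END = 2**63 - 1


@dataclass(frozen=True)
class IOV:
    since: int    # first validity key covered (e.g. a packed run/lumi-block number)
    until: int    # first validity key NOT covered; OPEN_END marks an open-ended IOV
    payload: str

    def covers(self, key: int) -> bool:
        return self.since <= key < self.until

    @property
    def open_ended(self) -> bool:
        return self.until == OPEN_END


class IOVCache:
    """Single-folder read cache: only go back to the database when needed."""

    def __init__(self, fetch: Callable[[int], IOV]) -> None:
        self._fetch = fetch          # stand-in for a real database query
        self._cached: Optional[IOV] = None
        self.db_reads = 0            # simple internal monitoring counter

    def get(self, key: int, refresh_open_ended: bool = False) -> str:
        iov = self._cached
        if (
            iov is None
            or not iov.covers(key)
            or (refresh_open_ended and iov.open_ended)
        ):
            # Cache miss, or an open-ended IOV that may have been closed since.
            iov = self._fetch(key)
            self.db_reads += 1
            self._cached = iov
        return iov.payload


def fake_db(key: int) -> IOV:
    # Toy folder contents: "A" for keys 0..99, "B" from 100 onwards (open-ended).
    return IOV(0, 100, "A") if key < 100 else IOV(100, OPEN_END, "B")
```

With `cache = IOVCache(fake_db)`, successive lookups inside one interval are served from the cache, so `cache.db_reads` grows only when a key leaves the cached interval or when a refresh of an open-ended interval is requested.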

  6. Conditions DB Improvements in 2009 (2) • Detector Status (DQ) enhancement: • black flag, additional detector IDs, combined performance flags • Magnetic field migrated to Conditions DB (15.4) • https://twiki.cern.ch/twiki/bin/view/Atlas/CoolMagneticField • Frontier support (15.4) • https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaDBAccess • Better ways to configure access to resources (Oracle server, POOL file catalogues) at external sites • based on job information system or site-specific customisation • By default, ignore SQLite files in the DB release when running with real data • uses Oracle or Frontier in Release 15.4+ • Fix Conditions access to non-POOL files • these are COOL-managed histogram files -- currently these are the reference MC histograms, but they will also arise with real data

  7. Additional Conditions DB refinements • Completion of mechanisms for COOL offline → online • https://twiki.cern.ch/twiki/bin/view/Atlas/CoolPublishing • Improved Conditions DB POOL file registration system optimized for better grid distribution • https://twiki.cern.ch/twiki/bin/view/Atlas/ConditionsDDM (introduction of Conditions dataset families) • Implementation of UPD1-mode tags • https://twiki.cern.ch/twiki/bin/view/Atlas/CoolTagging (ensure Conditions data used for any given already-taken run cannot be overridden.) • Savannah #51429: Single-row data retrieval for CLOB • Fixed inefficiency … performance comparisons pending
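
The rule stated for UPD1-mode tags (conditions used for an already-taken run cannot be overridden) can be illustrated with a small Python sketch. This is only an illustration of that rule under simplifying assumptions, not the COOL or AtlCool implementation described on the CoolTagging TWiki: the class `Upd1StyleTag`, the exception `ProtectedUpdateError` and the `last_taken_run` argument are all hypothetical, and validity is keyed by run number only.

```python
from typing import Dict


class ProtectedUpdateError(Exception):
    """Raised when an update would change conditions for an already-taken run."""


class Upd1StyleTag:
    """Toy tag: payloads keyed by the first run from which they are valid."""

    def __init__(self) -> None:
        self._entries: Dict[int, str] = {}

    def store(self, since_run: int, payload: str, last_taken_run: int) -> None:
        # Only validity ranges starting after the last taken run may be written.
        if since_run <= last_taken_run:
            raise ProtectedUpdateError(
                f"run {since_run} is not after last taken run {last_taken_run}"
            )
        self._entries[since_run] = payload

    def lookup(self, run: int) -> str:
        # The payload valid for a run is the one with the latest start <= run.
        starts = [s for s in self._entries if s <= run]
        if not starts:
            raise KeyError(run)
        return self._entries[max(starts)]
```

In this toy model an attempt such as `tag.store(100, "v2", last_taken_run=150)` is rejected, while `tag.store(151, "v2", last_taken_run=150)` succeeds and leaves the result of `tag.lookup(120)` unchanged.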

  8. Conditions DB Operation • In 2009: Generally worked well, no major failures • Issues with high load on CERN Oracle servers due to • Jobs analysing LBs out of order produced bad behaviour in IOVDbSvc caching -- now fixed. • Jobs with lots of small files with 'simple' events which don't take much time to process, but each require full job initialisation and data reading from COOL • Experience so far in ramping up of Frontier usage with locally cached POOL files on Grid • Many user questions … high support load on Rod Walker, Sasha Vaniachine, and Richard Hawkings • people still learning how real data analysis differs from MC • No formal 'on call' rotation for Conditions DB issues • still many things only Richard Hawkings can fix.

  9. Conditions DB Tag and Data Management • In 2009: Worked well, but under strain • For reproducibility of Conditions, all groups must understand • importance of using the global tags properly, eliminating hard-coded tags, adding UPD1 protection to SV (single version) folders • Paul Laycock, COOL Tag Coordinator • Spending significant time training / corresponding with the subsystem experts and various production coordination groups • Hope this will lessen in 2010, • but scope/number of global tags increases every month ! • Communities are diverse (Tier0, HLT, online monitoring, MC, Reprocessing) • Subsystem experts must be aware of which global tags exist … ensure that the correct conditions go into each one … there's no way that Paul can do that. • The high multiplicity of tags to be maintained comes from: • different B field configurations, • particularly in MC • lots of global tags which differ only by small things • (e.g. beamspot size for different energies). Is this the best approach for handling small 'delta' changes ? • Need: • Better awareness from subsystems and groups • Better communication between the Data and Monte Carlo coordination groups • Time / Help to develop COOL Tag management / browsing tools • sorely needed even at the current level of global tags

  10. Global Conditions Tag Summary (from Paul) • Good • 4 global tags for tier0 processing, 1 for HLT and 1 for monitoring. • *All* conditions linked to the global tag were locked before we started taking collisions data. • Subsystem experts were educated in the process and are now mostly capable of dealing with conditions-related issues using the high-level AtlCool tools. • Bad • *Not all* conditions are linked to the global tag • need to make this a priority this month, • there was no possibility to change this situation once data-taking had started. • Single-version folders are not linked to the global tag, but any that are used should have UPD1 protection - again this didn't happen because of the risk of breaking things, this is the second top priority item. • Actions: • Paul: contacting each sub-system individually and cc Andreas and Beate. • Vakho: identified some manpower for work on web browser for inspecting global tags • In the meantime, Paul will generate a list of global tags automatically in the nightly checks, so at least the information is guaranteed to be up to date
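
The "not all conditions are linked to the global tag" problem on this slide is the kind of thing an automated check could report. The following Python sketch shows one possible shape for such a check. It is an assumption-laden illustration, not the nightly check or an AtlCool tool: the function `unlinked_folders` and the folder and tag names in the example are invented, and a global tag is modelled as a plain mapping from folder name to folder-level tag.

```python
from typing import Dict, Iterable, List


def unlinked_folders(
    global_tag_links: Dict[str, str], folders_in_use: Iterable[str]
) -> List[str]:
    """Return the folders a job reads that the global tag does not link to."""
    return sorted(f for f in set(folders_in_use) if f not in global_tag_links)


# Hypothetical example data: folder -> folder-level tag linked by a global tag.
example_global_tag = {
    "/EXAMPLE/Align": "ExampleAlign-01",
    "/EXAMPLE/Calib": "ExampleCalib-03",
}
example_folders_in_use = ["/EXAMPLE/Align", "/EXAMPLE/Calib", "/EXAMPLE/Noise"]
```

Calling `unlinked_folders(example_global_tag, example_folders_in_use)` lists `/EXAMPLE/Noise`, the one folder in the example that would still rely on a hard-coded or default tag.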

  11. Update Issues Summary (from Paul) • Good • The online/offline split was completed in time and the P1 gateway transfer mechanism worked well • Bad • TRT had problems updating the alignment, understood and corrected. • They were updating only some of the channels, and thus old un-updated channels conditions "peeked through". • The tags have been truncated and a warning now reports if your update is missing some channels in the folder. • Rt and t0 constants are in different folders but are completely correlated. • A problem came when updating both folders but a run start happens in between • Rt is valid from run X, t0 from run X+1. No solution to this, but the expected low frequency of run starts during data taking should mean this isn't an issue. • Two SCT online folders had problems in the very beginning • Code couldn't deal with locked folders • This was quickly resolved. • Not everybody who needs to know about updates, does. • Actions: • Richard is updating the AtlCool tools to send the information to more email-lists and make them more informative.
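
The TRT alignment issue above (a partial update lets old channel values "peek through") and the warning added in response can be sketched as follows. This is a toy Python model, not the actual tool: `missing_channels` and `apply_update` are hypothetical names, and a folder is modelled as a dictionary from channel number to payload.

```python
import warnings
from typing import Dict, List


def missing_channels(current: Dict[int, str], update: Dict[int, str]) -> List[int]:
    """Channels present in the folder but absent from the update."""
    return sorted(set(current) - set(update))


def apply_update(current: Dict[int, str], update: Dict[int, str]) -> Dict[int, str]:
    """Merge an update into the folder, warning if some channels are left out."""
    missing = missing_channels(current, update)
    if missing:
        # These channels keep their old payload, i.e. old values "peek through".
        warnings.warn(f"update leaves {len(missing)} channel(s) unchanged: {missing}")
    merged = dict(current)   # copy, so the caller's folder is not modified
    merged.update(update)
    return merged
```

In this model, updating channels 1 and 3 of a three-channel folder returns a folder in which channel 2 still holds its old payload, and a warning names channel 2 so the omission is visible at update time.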

  12. Known Additional Challenges for 2010 • Have not done any 24hr calibration loop processing yet • Will bring a new set of challenges (and work required) • Any indication when this will start ? • Need to complete migration of MC simulation using BField from COOL • Personpower needed ! • Elimination of hard-coded COOL tags in JobOptions • Highest priority: locking tags used by Tier-0 • Plan: eliminate from real data first, then MC • Complicated by need to reverse-engineer what was used previously and "not breaking anything" (higher priority) • … uncertainty whether or when complete elimination is possible … • We must continue to review most resource-consuming subdetector usage of Conditions DB in Athena … suggest optimisations • This is an ongoing effort, increased when real data analysis started, • some involve particular grid sites where particular types of processing occur • Must enhance level of our communication of expected activity to Tier-1s and improve monitoring / feedback with Tier-1s • Oracle … Frontier … Squid servers • … and understanding limitations at each level …

  13. ATLAS DB Projects 2009 → 2010 I wanted to mention a few ATLAS projects under considerable development / usage using DB based information and the groups under which they are evolving: • Luminosity in Conditions DB (LWG, LMTF) • Model for Luminosity in COOL laid out … related data is being added as available (online and offline) • Real Data Conditions in Simulation (Simulation) • Special Simulation meeting Wednesday this week: • http://indico.cern.ch/conferenceDisplay.py?confId=80850 • Data Quality in Conditions DB entry/usage (DQ) • Data Quality and Good Run Lists now in computing tutorials • Conditions and Catalogue Metadata for TAGs (TAG) • Evolution of TAG DB and services will increasingly use metadata collected from many systems to enhance user access • AGIS – ATLAS Grid Information System (ADC) • Support the optimization of the ATLAS computing grid

  14. Thanks to many … Oracle Databases are a shared resource essential to smooth operation • THANKS: many many application developers ! • Applications are living, breathing, consuming, and evolving systems … requiring careful intervention for collective stability • Thanks: DBAs, System Administrators, Other Application support • ATLAS – Gancho Dimitrov, Florbela Viegas • CERN – Maria Girone and Physics Database Services; Andrea Valassi, others for Conditions DB, CORAL support • Tier-1s – many DBAs and system admins • Thanks: ADC Operations • Thanks: Fellow DB Coordinators • Online – Giovanna Lehmann • PVSS – Stefan Schlenker, Slava Khomuntnikov • TAG / Metadata – David Malon, Eric Torrence • Operations – Sasha Vaniachine • Frontier – John DeStefano, Rod Walker • Squid – Douglas Smith • Conditions / COOL Tagging – Richard Hawkings, Paul Laycock • Others ! RDSchaffer, Shaun Roe, Hans von der Schmitt, Uli Felzmann …

  15. Summary (from 3D Workshop in Nov 2009) • Schemas are stable • Access is controlled • Monitoring in place • Depth of experience • Some redundancy and tools in back pockets (COOL Pilot, Frontier) Will we be challenged? Certainly ! Are we Ready ? YES ! It’s November… and the data is coming… intensive and diverse analysis will follow … Are we ready ?

  16. Sorry for lack of pictures in this talk … images from the web … Washup: Rename this meeting for 2010 ? … but it's no Jamboree either … Post mortem: Backup
