
Persistency Framework News for ATLAS

Andrea Valassi (IT-ES), for the Persistency Framework team. ATLAS Database Meeting, 18th July 2011. Outline and summary: recent developments and releases of POOL, CORAL and COOL (releases since the April 2011 talk at the ATLAS software week).


Presentation Transcript


  1. Persistency Framework News for ATLAS
     Andrea Valassi (IT-ES), for the Persistency Framework team
     ATLAS Database Meeting, 18th July 2011

  2. Outline and summary
     • Recent developments and releases
       • POOL, CORAL, COOL (releases since Apr 2011 talk at ATLAS sw week)
       • Service issues from POOL and ROOT/xrootd
       • Largest development activity was on CORAL
       • New Oracle client too – review open issues too
     • Other issues
       • Frontier client upgrade
       • Oracle 11g server upgrade
       • Long-term support for POOL
       • Some other work in progress

  3. ATLAS 16.6.7 (June 2011)
     • Main motivation: urgent bug fix in ROOT (xrootd client)
       • Upgrade to ROOT 5.26.00f
       • Avoid Kerberos request flood from the xrootd client (bug #82793)
       • The main patch, already present in ROOT 5.28 (LCG 60 series), was backported to 5.26.00f
       • POOL 2.9.11, CORAL 2.3.14, COOL 2.8.8
       • Same as in LCG 59b (no rebuild)
       • Original plan was LCG 59c, finally handled internally in ATLAS 16.6.7
     • Kerberos service (KDC) was flooded with kinit-like requests
       • Initially reported for a POOL utility (pool_extractFileIdentifier) called by users of ATLAS 16.6.5 – soon tracked down to the xrootd client
       • Soon observed also from direct xrdcp (only ROOT, no POOL)
       • Still unclear what changed to trigger the issue (new usage pattern?)
       • ROOT client upgrade eventually removed the KDC server flood
       • But client jobs with the new ROOT occasionally failed until a server-side workaround was also applied in the xrootd redirector
       • Combination of an xrootd bug with a bug in the SLC5 Kerberos library
     To all ATLAS 16.x users: please use (or at least asetup) 16.6.7

  4. LCG 60c for ATLAS (June 2011)
     • Motivation: changes in ROOT, POOL, CORAL, COOL, Oracle
     • ROOT 5.28.00e: more robust fix for xrootd/Kerberos (bug #82793)
       • Main part of the fix was already in 5.28.00 (backported to 5.26.00f)
     • Oracle 11.2.0.1.0p3 client: another Kerberos fix (see next slide)
     • POOL 2.9.14
       • Many fixes and enhancements for ATLAS in collection packages
       • [Root]StorageSvc fixes (auto flush, file merge, longlong support…)
     • CORAL 2.3.16
       • Fix crashes when using a deleted Oracle session (e.g. bug #73834)
       • Both in network glitches and user code, both single- and multi-threaded
       • Major redesign of session-related lifetime management in all plugins
       • Internal consolidation of C++ tests (CppUnit) and qmtest setup
       • Many other fixes and enhancements – most changes are in CORAL
     • COOL 2.8.10
       • Performance fix: avoid excessive exception throwing (bug #79937)
       • Performance fix: faster BLOB reading in PyCool (task #20032)
     • For full details see the release notes (still in progress)
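The slide does not say how COOL avoided the excessive exception throwing. As a generic illustration only, the sketch below contrasts a lookup that signals every miss with an exception against one that returns a sentinel. All names (`ObjectNotFound`, `find_or_raise`, `find_or_none`) are hypothetical and are not part of the COOL API.

```python
class ObjectNotFound(Exception):
    """Hypothetical error type for a missing key (not a COOL class)."""


def find_or_raise(table, key):
    # Exception-based lookup: every miss pays for raising and catching.
    try:
        return table[key]
    except KeyError:
        raise ObjectNotFound(key) from None


def find_or_none(table, key):
    # Sentinel-based lookup: a miss is an ordinary return value.
    return table.get(key)


def count_hits_with_exceptions(table, keys):
    hits = 0
    for key in keys:
        try:
            find_or_raise(table, key)
            hits += 1
        except ObjectNotFound:
            pass  # expected and frequent, so the exception cost adds up
    return hits


def count_hits_with_sentinel(table, keys):
    return sum(1 for key in keys if find_or_none(table, key) is not None)


# Assumed workload: 100 stored entries, 1000 lookups, so most lookups miss.
table = {i: f"payload-{i}" for i in range(0, 1000, 10)}
keys = list(range(1000))
hits_exc = count_hits_with_exceptions(table, keys)
hits_sentinel = count_hits_with_sentinel(table, keys)
```

Both functions return the same count; the difference is only in how a miss is reported. When misses are the common case, keeping exceptions for genuinely unexpected conditions tends to be cheaper, which can be checked on a given workload with `timeit`.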

  5. Oracle client – three open issues
     • Oracle client redefines gssapi symbols (bug #71416)
       • SR 3-1977807081 – gssapi in libclntsh.so conflicts with libgssapi_krb5.so
       • Suggested that Oracle use versioned symbols (Oracle bug 10184681)
       • No workaround needed so far (problem not yet seen in production…)
       • Similar to a bug in Globus, which also redefines gssapi symbols (bug #70641)
       • Workaround: disabled gssapi from Xerces (used by CORAL)
       • Now fixed in 2011 EMI release – upgrade Grid middleware clients eventually?
     • Oracle client redefines Kerberos symbols (bug #76988)
       • SR 3-3620145421 – krb5 symbols in libclntsh.so conflict with libkrb5.so
       • Suggested that Oracle use versioned symbols (Oracle bug 12557209)
       • Workaround in Oracle 11.2.0.1.0p3 client: fix Kerberos in custom sqlnet.ora
       • New in LCG60c (see previous slide)
     • Crash in OCIStmtRelease outside session (bug #83601)
       • SR 3-3905928071 – crash in OCIStmtRelease after closing the session
       • Suggested that Oracle fail with an error rather than crash (Oracle bug 12733000)
       • Workaround in CORAL (do not call OCIStmtRelease outside session), but this may potentially lead to client memory leaks and server resource leaks?
       • Same crash (same stack dump) outside transaction
       • SR 3-1360883521 – crash in OCIStmtRelease after closing the transaction
       • Workaround in CORAL (add missing OCI call) with no negative side effects
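The workaround for the third issue (do not release a statement once its session is closed) can be sketched as a simple guard. This is a minimal illustration under stated assumptions: `Session` and `Statement` are hypothetical stand-ins, not CORAL or OCI classes, and the real plugin code is C++.

```python
class Session:
    """Hypothetical stand-in for a database session (not a CORAL class)."""

    def __init__(self):
        self.is_open = True
        self.released = []  # ids of statements released while the session was open

    def close(self):
        self.is_open = False


class Statement:
    """Hypothetical statement handle that belongs to one session."""

    def __init__(self, session, stmt_id):
        self._session = session
        self._id = stmt_id
        self._released = False

    def release(self):
        """Release the handle only while the session is open.

        Returns True if the underlying release call was made.
        """
        if self._released:
            return False  # never release the same handle twice
        self._released = True
        if not self._session.is_open:
            # Skip the call that would crash after the session is closed.
            # As the slide notes, the handle may then leak.
            return False
        self._session.released.append(self._id)
        return True
```

A statement released before `close()` goes through the normal path; one released afterwards is silently skipped. That trade-off (no crash, possible leak) is the concern raised on the slide.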

  6. Frontier client upgrade
     • New 2.8.2 client available while releasing LCG60c
       • Main motivation: performance improvements (bug #84067)
       • Interleave decompression with socket reading and XML processing
       • It was too late to include it in the LCG60c release (there was no time for proper testing and validation)
       • LCG60c uses the 2.8.0 client (the same as in LCG60b)
       • Functional bugs in 2.8.2 were identified in CMS soon after the release: it was a good choice not to include it in a rush!
     • New 2.8.3 client now available with bug fixes
       • Still being built/installed in the AFS area
       • Will then be tested in the nightlies
       • Can then include it in a new LCG release or ATLAS patch
       • Not yet clear if a CORAL rebuild is necessary
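The idea of interleaving decompression with socket reading can be illustrated with a streaming decompressor that consumes each chunk as it arrives instead of buffering the whole response first. This is a generic sketch using Python's `zlib`; it is not the Frontier client code, and the chunk size, the payload and the `iter_chunks` helper (which simulates socket reads) are assumptions for illustration.

```python
import zlib


def iter_chunks(data, size):
    """Simulate socket reads by yielding fixed-size chunks of bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def decompress_stream(chunks):
    """Decompress each chunk as it arrives, yielding output incrementally."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out  # downstream parsing can start before the read finishes
    tail = decomp.flush()
    if tail:
        yield tail


# Assumed example payload; a real response would come from the network.
payload = b"<row>conditions data</row>" * 200
compressed = zlib.compress(payload)
pieces = list(decompress_stream(iter_chunks(compressed, 64)))
restored = b"".join(pieces)
```

Because `decompress_stream` is a generator, a consumer such as an incremental XML parser could process each piece while later chunks are still being read.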

  7. Oracle server move to 11g
     • IT-DB plans to upgrade all servers to 11g in January 2012
       • Recommend both functional and performance tests beforehand
       • Do not expect bug surprises from either of them, but you never know
       • In contact with Gancho/Roman and IT-DB to run some tests
     • CORAL and COOL functional tests
       • Will run the CORAL and COOL test suites against 11g
       • Initially standalone, eventually in the nightlies
     • COOL performance tests
       • Will run the COOL small-scale scalability tests standalone
       • Test execution plan stability (missing statistics, bind variable peeking)
       • Does the optimizer treat COOL hints differently in 11g than in 10g?
       • Recommend larger scale ATLAS performance tests too

  8. POOL long-term support
     • POOL usage in ATLAS
       • Interface to ROOT for events, collections, conditions
       • Developed and used by ATLAS and LHCb
       • Collection packages (including TAG database)
       • Developed and used by ATLAS alone
     • LHCb is considering dropping POOL
       • To move to direct usage of ROOT
       • POOL implies an overhead but few benefits for LHCb
       • ATLAS would then remain the only experiment using POOL
     • Considering the move of POOL responsibility to ATLAS
       • Several meetings this week during the ATLAS software week
       • IT team already more involved with CORAL and COOL than POOL
       • Move of components to GAUDI also being discussed with LHCb
     • Long-term support for CORAL and COOL also discussed
       • Reviewing resources in IT and the experiments

  9. Some other work in progress
     • API extensions to CORAL and COOL in the pipeline
       • e.g. many payloads per IOV, replace ATLAS CoraCool (task #10335)
       • Postponed to later releases (breaks binary compatibility)
     • Complete improvements for ‘network glitch’ handling
       • Fixed Oracle crashes – now fix also the other plugins
       • Now address actual network glitch issues, beyond crashes
     • CORAL monitoring
       • Requested by HLT team – interest also in ATLAS
       • Related work on performance studies and optimizations
     • Prepare future changes in the development infrastructure
       • Port from CMT to CMake (with SPI and LHCb)
       • Move out of CVS eventually (to start with, of “LCG CVS”)
