
WLCG Middleware Validation



Presentation Transcript


  1. WLCG Middleware Validation
     Markus Schulz, IT/SDC

  2. Landscape after EMI
     • Summary of the GDB presentation:
       • https://indico.cern.ch/conferenceDisplay.py?confId=197806
     • EGI produces UMD releases
       • see Tiziana’s presentation at the GDB
     • INFN (Cristina) populates the emi repository periodically
       • “blind” copy of binary RPMs (dependencies can break)
       • this will end March 2014
     • Simplified view: UMD == EMIrepo + Staged Rollout
       • With EMIrepo == PTs + Cristina

  3. Other Services
     • ETICS ends in August (no impact)
     • WLCG Repository
       • Managed by WLCG CERN (Maarten)
       • HEP_OS libs, xrootd monitoring, info-xx, yaim, vobox....
       • Mostly things that don’t fit into EPEL
       • UMD does NOT integrate these packages

  4. What do sites do?
     • (UMD or emi) + WLCG + PT packages
     • “WLCG Baseline” defines minimal versions
     • EGI + WLCG Operations Coordination drive transitions
     • developments are driven by the WLCG community

  5. Production Readiness Now
     • EGI Staged Rollout ensures that material that is in UMD can be installed and doesn’t fall over
       • finds certain issues +++
       • mainly deployment related
       • smoke testing
     • doesn’t cover all major WLCG deployment scenarios
     • doesn’t cover all experiment use cases

  6. Problems
     • PTs release directly through the EPEL path
       • no emi QA and testing
       • no established inter-product tests
       • focus is on self-consistency within EPEL
       • RPMs might work or not
     • EPEL is based on continuous independent releases
       • UMD is based on snapshots
     • Not all material is in EPEL
       • WLCG repository
       • emi repository
       • no consistency test
     • Transition from EPEL-test to EPEL-stable is time driven
       • without active intervention the transition happens within 2 weeks
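
The time-driven promotion from EPEL-test to EPEL-stable sets the window in which any validation has to report back. A minimal Python sketch of that deadline arithmetic is given here; the function names and the fixed 14-day window are illustrative assumptions based on the "within 2 weeks" statement, not part of any EPEL tooling.

```python
from datetime import date, timedelta

# Assumption taken from the slide: a package left alone in EPEL-test
# moves to EPEL-stable within about two weeks.
TESTING_WINDOW = timedelta(days=14)


def latest_intervention_date(entered_testing: date) -> date:
    """Last day on which a problem report can still hold the package back."""
    return entered_testing + TESTING_WINDOW


def days_left(entered_testing: date, today: date) -> int:
    """Days remaining for validation; 0 once the window has closed."""
    remaining = (latest_intervention_date(entered_testing) - today).days
    return max(remaining, 0)
```

A validation site could call `days_left` for each package it is exercising and raise issues first for those closest to automatic promotion.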

  7. What can WLCG do?
     • Fill the gap....
     • Model: emi-1/2 WN verification
       • https://twiki.cern.ch/twiki/bin/view/LCG/WorkerNodeTesting
       • 6 contributing sites covering
         • all SE flavours
         • all experiments
         • all standard workflows
       • using a fraction of their resources

  8. How?
     • Turn the ad hoc solution into continuous operation
     • Adapt to the future release process
       • driven by EPEL and WLCG Repositories
       • EPEL-Test + WLCG-Test
     • Frequently update a small fraction of the resources
       • 10-50 cores/site
       • One instance of every service (globally)
     • Exercise these resources with experiment workloads
       • Best: inclusion into the production systems
         • small fraction of a small fraction of tasks will fail
       • Alternative: Invest in HammerCloud-like testing
         • may be more work and may diverge after a while
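
The "small fraction of a small fraction" argument can be made concrete with a short calculation. The Python sketch below is only an illustration: the function name, the even-spread assumption, and the example numbers (site size, failure rate) are hypothetical and do not come from the presentation.

```python
def failed_task_fraction(validation_cores: int, total_cores: int,
                         failure_rate_on_validation: float) -> float:
    """Share of all tasks expected to fail because they ran on validation nodes.

    Assumes tasks are spread evenly over cores, which is a simplification.
    """
    if total_cores <= 0 or not 0 <= validation_cores <= total_cores:
        raise ValueError("validation_cores must lie between 0 and total_cores")
    validation_share = validation_cores / total_cores
    return validation_share * failure_rate_on_validation


# Illustrative numbers only: 50 validation cores at a 5000-core site,
# and a bad update that breaks 10% of the tasks that land on them.
example = failed_task_fraction(50, 5000, 0.10)
print(f"{example:.2%} of the site's tasks would fail")
```

With these made-up inputs, 1% of the cores combined with a 10% failure rate affects one task in a thousand, which is the sense in which the production impact stays small.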

  9. Current flow of middleware
     [Diagram. Elements shown: PTs, EPEL Koji (mock), EPEL Test, EPEL Stable, EMI (Cristina), UMD, WLCG, site. Artefacts are labelled Source or Binary, and a “STOP!” marker appears in the flow. Annotations: “Updates are driven by WLCG Baseline, EGI, Users”; “Site needs”.]

  10. Proposed flow of middleware
     [Diagram. Elements shown: PTs, EPEL Koji (mock), WLCG Koji (mock), “CERN Agile Build?”, EPEL Test, EPEL Stable, WLCG Test, WLCG Stable, EGI/WLCG Verification, UMD, site. Artefacts are labelled Source or Binary. Markers: “STOP!”, “Fast Track”, “GO!”. Annotation: “discouraged as the standard path (OK for dCache etc.)”.]

  11. EGI/WLCG Verification
     [Diagram. Elements shown: site A, site B, site C, FTS etc., EPEL Test, WLCG Test.]
     • Part of standard workflows. Problem reports to WLCG Ops Coordination
     • Additional validation instances of central services (mostly covered by common practice)
     • Frequent updates, notification on re-config need, only on a fraction of the prod resources

  12. What is needed
     • Coordination
       • top level: WLCG Ops Coordination and EGI Staged Rollout
       • launch: Taskforce (WLCG+XXXXXX*)
     • Resources
       • hardware negligible (10-30 cores/site)
       • human effort
         • 0.1 FTE per participating site (not too many updates per month)
         • follow releases, re-config as needed, report issues.....
     • Sites
       • Candidates: T0/T1s and experienced T2s (about 6 sites needed)
       • need to participate in coordination too (rota on watching for re-config, first deployment etc.)
     • Experiments
       • targeting the validation resources
       • monitor the behaviour (might need small changes)
       • report issues
       • in general already happening, minor adjustments needed
       • 0.1 FTE per experiment
     * a good name
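
The per-site and per-experiment figures can be added up to get a rough total. The Python sketch below does this; the function name is hypothetical, and the choice of four experiments in the usage line is an assumption (the slide gives only "about 6 sites" and per-unit costs).

```python
FTE_PER_SITE = 0.1          # from the slide
FTE_PER_EXPERIMENT = 0.1    # from the slide
CORES_PER_SITE = (10, 30)   # from the slide: 10-30 cores/site


def validation_budget(n_sites: int, n_experiments: int) -> dict:
    """Total human effort (FTE) and core range for a validation setup."""
    site_fte = n_sites * FTE_PER_SITE
    experiment_fte = n_experiments * FTE_PER_EXPERIMENT
    return {
        "site_fte": round(site_fte, 2),
        "experiment_fte": round(experiment_fte, 2),
        "total_fte": round(site_fte + experiment_fte, 2),
        "cores_min": n_sites * CORES_PER_SITE[0],
        "cores_max": n_sites * CORES_PER_SITE[1],
    }


# Assumption for illustration: 6 sites (as on the slide) and 4 experiments.
budget = validation_budget(n_sites=6, n_experiments=4)
```

Under these assumptions the effort comes to about one FTE in total, spread over ten groups, with between 60 and 180 cores set aside across all sites.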

  13. Is this additional effort?
     • Probably not..
     • We have done this in an ad hoc fashion
       • harder to coordinate
       • sometimes missing changes
       • complex communications

  14. Timeline
     • It has to work by spring 2014
     • Taskforce should start in September
       • first activity: identify suitable sites
       • liaise with experiments
     • Resource commitments from sites by October at the latest
     • Taskforce will then coordinate the setup and development of procedures
       • and follow up on operations
