LCG Middleware Certification and Support

Presentation Transcript

  1. LCG Middleware Certification and Support
     Maarten Litmaath, CERN IT/GD
     GridPP Workshop, 2-4 June 2004

  2. Where is the code?
     • CERN central CVS system → autobuild (see next page)
       • EDG/LCG code
         • http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/?cvsroot=lcgware
       • LCG configuration:
         • http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi
     • Everything else is “external”
       • Simplifies build
       • Complicates debugging
       • Need at least the sources
     • All RPMs under /afs/cern.ch/project/gd/RpmDir
     • LCG code guidelines adapted from EDG
       • http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi
       • Documentation menu
     Maarten Litmaath, GridPP meeting, 2004/06/03

  3. The Builds
     • EDG autobuild system has been ported to LCG
       • http://lxshare0297.cern.ch/LCG/autobuild/
       • Allows nightly build of latest compliant CVS tag per package
       • Build-on-demand tag triggers immediate build
     • Currently only RH 7.3 supported
     • Porting to RH Enterprise Linux underway in GD group
       • WN being tested
     • Collaboration with CERN OpenLab to port code + build recipes to IA-64
       • CE + WN already included in EIS testbed
     • Other platforms being considered:
       • Fedora
       • RH 9
       • RH 6.2
       • Solaris
       • IRIX
       • …
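A minimal sketch of the "latest compliant CVS tag per package" selection mentioned on the Builds slide. The slides do not say what makes a tag compliant, so the tag pattern `<package>_R_<major>_<minor>_<patch>` and the function name `latest_compliant_tags` are assumptions made purely for illustration, not the autobuild system's actual rules.

```python
import re

# Hypothetical tag convention (an assumption, not taken from the slides):
#   <package>_R_<major>_<minor>_<patch>, e.g. "wms_R_1_10_0"
TAG_PATTERN = re.compile(
    r"^(?P<name>[a-z][a-z0-9-]*)_R_(?P<major>\d+)_(?P<minor>\d+)_(?P<patch>\d+)$"
)


def latest_compliant_tags(tags):
    """Return {package: tag}, keeping the highest-versioned compliant tag per package."""
    best = {}
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match is None:
            continue  # tags that do not follow the convention are skipped
        version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
        name = match["name"]
        # Compare versions numerically so that 1.10.0 ranks above 1.2.3.
        if name not in best or version > best[name][0]:
            best[name] = (version, tag)
    return {name: tag for name, (_, tag) in best.items()}


example_tags = ["wms_R_1_2_3", "wms_R_1_10_0", "wms-test", "rm_R_0_9_1", "HEAD"]
selected = latest_compliant_tags(example_tags)
```

Versions are compared as integer tuples rather than strings, which avoids ranking `1_2_3` above `1_10_0`.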

  4. The Certification
     • Resulting middleware must be integrated and then certified on all supported platforms
       • Also verify interoperability of all platforms
       • Complicates certification exponentially
     • Goal is production quality:
       • Stability, robustness, performance, scalability
       • Easy configuration, operation, maintenance
     • A lot of effort has been going into debugging
       • Get feedback from production system (e.g. rollout mailing list)
       • Send feedback to developers, but apply in-house patches in the meantime
       • See next talk by David Smith
     • Current “big” certification testbed shown on next page
       • Only RH 7.3 for now
       • Remote sites to be added (again)
         • Madison (VDT), Taipei, Budapest, …
       • Simulates multiple realistic configurations
       • Can test multiple platforms at the same time
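One way to read the "complicates certification exponentially" remark is to count the mixed-platform deployments that would need interoperability checks. The sketch below is an illustration under that assumption only: it treats every non-empty subset of platforms as a configuration to certify, which the slides do not state explicitly, and the function name is hypothetical.

```python
from itertools import combinations


def certification_configurations(platforms):
    """Enumerate every non-empty set of platforms that could be deployed together."""
    configs = []
    for size in range(1, len(platforms) + 1):
        configs.extend(combinations(platforms, size))
    return configs


platforms = ["RH 7.3", "RH Enterprise Linux", "IA-64"]
configs = certification_configurations(platforms)
pairs = list(combinations(platforms, 2))

# With n platforms there are 2**n - 1 non-empty subsets and n*(n-1)/2 pairs.
print(len(configs), len(pairs))  # 7 3
```

Each added platform doubles the number of subsets (plus one), which is why a testbed that "can test multiple platforms at the same time" matters.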

  5. Certification & Testing Testbed
     [Diagram: five clusters (Cluster_1 to Cluster_5) of CERN hosts. Node roles shown: UI, RB, BDII, MyProxy, CE (including a Condor CE and an LSF CE), SE (including dCache and Castor SEs and a dCache pool node), WN (some with shared local /home, others marked "No home sharing"), and an RLS Oracle server (rlscert02).]

  6. The Tests
     • Feature testing
       • Workload Management, Data Management, Information System, …
       • Job distribution with and w/o data constraints, resource saturation, proxy renewal
       • Data access, replica services
     • Different architectures/configurations
       • Try to simulate the production system to some extent
     • Stress tests
       • Performance should degrade gracefully, no crashes
     • Explicit error injection
       • Study system reaction
     • Security
       • One should not be able to bypass it
     • Experiments integration testing done by GD/EIS on their testbed
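The "explicit error injection" and "no crashes" points can be illustrated with a small generic harness. This is a sketch under stated assumptions, not the LCG test suite: `InjectedFault`, `with_fault_injection`, `run_stress_test` and the stand-in `submit_job` operation are all hypothetical names introduced here.

```python
import random


class InjectedFault(Exception):
    """Raised by the injector in place of a real service failure."""


def with_fault_injection(operation, failure_rate, rng):
    """Wrap `operation` so that roughly `failure_rate` of the calls raise InjectedFault."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return operation(*args, **kwargs)
    return wrapped


def run_stress_test(operation, n_calls):
    """Call `operation` n_calls times and tally outcomes instead of aborting on errors."""
    outcome = {"ok": 0, "injected": 0, "unexpected": 0}
    for i in range(n_calls):
        try:
            operation(i)
            outcome["ok"] += 1
        except InjectedFault:
            outcome["injected"] += 1
        except Exception:
            # Anything other than the injected fault is a reaction worth studying.
            outcome["unexpected"] += 1
    return outcome


def submit_job(job_id):
    """Stand-in for a real operation such as a job submission (hypothetical)."""
    return f"job-{job_id}"


rng = random.Random(42)  # fixed seed so a run can be repeated
flaky_submit = with_fault_injection(submit_job, failure_rate=0.2, rng=rng)
result = run_stress_test(flaky_submit, n_calls=1000)
```

Separating injected faults from unexpected ones keeps the tally useful: only the "unexpected" bucket points at reactions the system was not meant to have.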

  7. Certification, Testing & Release Cycle
     [Diagram: release cycle with three stages. Inputs: EGEE and VDT (fix problems, new releases) and the LCG C&T section (add features, fix problems). CERTIFICATION TESTING: Integrate, Basic Functionality Tests, Run Certification Matrix, Run Special Tests, Release candidate tagged. EXPERIMENTS INTEGRATION: Experiments software installation, Testing experiments specific features, transmit problems, Certified release tag. DEPLOYMENT: RELEASE, PRE-DEPLOYMENT, GENERAL RELEASE.]

  8. Typical Certification Matrix
     • Errors reflect ongoing development
     • Details available through links
     • An LCG release candidate must not have any serious errors reported by the test suites
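The release rule above ("must not have any serious errors") can be sketched as a gate over a test-by-platform matrix. The matrix layout, the status labels and the choice of which labels count as serious are assumptions for illustration; the slides do not show the real matrix format.

```python
# Hypothetical status labels; which ones count as "serious" is an assumption.
SERIOUS_STATUSES = {"error", "crash"}


def blocking_results(matrix):
    """Return the sorted (test, platform) pairs whose status blocks a release candidate."""
    return sorted(
        (test, platform)
        for test, row in matrix.items()
        for platform, status in row.items()
        if status in SERIOUS_STATUSES
    )


def is_release_candidate(matrix):
    """A build qualifies only if no test reports a serious status on any platform."""
    return not blocking_results(matrix)


# Illustrative data only, not results from the talk.
example_matrix = {
    "job-submission": {"RH 7.3": "ok"},
    "proxy-renewal": {"RH 7.3": "warning"},
    "replica-services": {"RH 7.3": "error"},
}
blockers = blocking_results(example_matrix)
```

Returning the blocking pairs, rather than only a yes/no answer, mirrors the slide's point that details should be reachable from the matrix.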

  9. The Tasks
     • Web page to open bugs and tasks:
       • https://savannah.cern.ch/projects/lcgoperation/
     • Main task: stabilize LCG-2
       • Allow serious work to get done efficiently
       • Minor remaining inconveniences should be tolerable
         • To be addressed by EGEE/ARDA
     • Main ingredients
       • dCache
       • Porting to RH 7.3 successors
       • Redo Replica Manager core
       • Flexible info providers → corresponding changes in WP1/WP2 code
       • Shield CE against overload risk
       • …

  10. More Tasks
      • Try and follow Globus releases (via VDT)
      • Use the VDT more:
        • Helps EU-US interoperability
        • Try more functionality already provided by VDT
          • Condor as default batch system?
          • PacMan?
        • Try and put more into the VDT
      • Try R-GMA for monitoring
        • Combine with GridICE
        • Get rid of MDS completely
      • LCFGng → Quattor
      • …