Report on recent system issues, including a corrupted pool file catalog in CVMFS, a corrupted Nagios pipe file, and vobox crashes caused by corrupted MySQL tables, along with upgrades and interventions. Most problems are resolved; some are still under investigation.
Tier0/Central Services
• a corrupted pool file catalog in CVMFS caused massive failures (weekend of 3-4 Mar, savannah:126847). Problem fixed, but the fix took a while to propagate
• SAM/Nagios stuck (GGUS:79883, 5 Mar): pipe file /var/nagios/rw/nagios.cmd corrupted, re-created by hand
• acron problem on ATLAS nodes (GGUS:79853, 5 Mar): fixed by excluding one problematic node
• castoratlas upgrade (6 Mar): no problem
• Site Services upgrade (6 Mar): no problem
• Site Services: some voboxes crashing (7 Mar) because of corrupted MySQL tables. Fixed, cause still to be understood
• GGUS problem in raising priority (GGUS:80121, 11 Mar): fixed
• Lemon metrics to monitor tape migration on castoratlas not working properly (GGUS:80170, 14 Mar): still under discussion
• new project tag for MCTAPE tokens (17 Mar): mc12_14TeV
• AGIS API error “Unable to acquire Oracle environment handle” (17 Mar): under investigation
Tier1s/Tier2s
• FTS upgrade in NDGF and UK (6 Mar): no problem
• US network intervention (6 Mar): no problem
• a few problems with SRMs and FTSs at Tier1s (PIC, FZK, NDGF, RAL, IN2P3), usually fixed promptly
• FTS upgrade in TRIUMF (19 Mar)
• LFC migration in the French cloud (20 Mar)
• minor problems in Tier2s, usually fixed promptly