This report summarizes the support incidents related to WLCG services over the last 7 weeks, highlighting ALARM tickets from ATLAS and CMS. Since the last Management Board meeting on 2011/06/07, a total of 11 real ALARM tickets were submitted, all of which have since been solved. Notably, the first ALARM tickets for CERN failed to trigger the email notification to CERN operators because the sender's email address had changed. The report lists each ALARM (mostly CERN SRM, Castor and storage-pool problems) and carries a reminder to check for further alarms in July and adjust the totals accordingly.
GGUS summary (7 weeks)
To calculate the totals for this slide and copy/paste the usual graph, please:
1. Take the summary from the table on pages:
• https://gus.fzk.de/download/wlcg_metrics/html/20110718_escalationreport_wlcg.html
• https://gus.fzk.de/download/wlcg_metrics/html/20110725_escalationreport_wlcg.html
2. Copy the file
• https://twiki.cern.ch/twiki/pub/LCG/WLCGOperationsMeetings/ggus-tickets.xls
locally and add the 2 lines for 18-Jul and 25-Jul. Re-upload the .xls on
• https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsMeetings
3. Add up the last 7 weeks, starting 13-Jun (included), and put them in this table.
4. Copy/paste the graph from the .xls file of point 2 above.
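The summing in step 3 can be sketched in a few lines of Python. This is only an illustration: the column names and the counts are placeholders, not the real contents of ggus-tickets.xls, and the helper names `week_starts` and `sum_weeks` are hypothetical.

```python
from datetime import date, timedelta


def week_starts(first, n_weeks):
    """Return n_weeks consecutive weekly dates, beginning at `first`."""
    return [first + timedelta(weeks=i) for i in range(n_weeks)]


def sum_weeks(rows, first, n_weeks=7):
    """Sum every column over the weekly rows inside the window.

    `rows` maps a week date to a dict of {column: count}. A missing week
    raises KeyError, so a forgotten line (e.g. 18-Jul) is noticed.
    """
    totals = {}
    for week in week_starts(first, n_weeks):
        for column, count in rows[week].items():
            totals[column] = totals.get(column, 0) + count
    return totals


# Placeholder counts only, NOT the real GGUS figures.
first_week = date(2011, 6, 13)
rows = {week: {"ATLAS": 1, "CMS": 2} for week in week_starts(first_week, 7)}

totals = sum_weeks(rows, first_week)
print(week_starts(first_week, 7)[-1])  # last week in the window
print(totals)
```

Starting at 13-Jun and stepping one week at a time, the seventh row is 25-Jul, which is why the two new lines for 18-Jul and 25-Jul have to be in the spreadsheet before the totals are taken.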
Support-related events since last MB
NB: check whether there are more ALARMs between 13 and 24 July and adjust the totals below.
There were 11 real ALARM tickets since the 2011/06/07 MB (7 weeks): 9 submitted by ATLAS and 2 by CMS. All are ‘solved’, some even ‘verified’; 10 of them were for CERN and 1 for CNAF.
The first 5 ALARM tickets for CERN did not generate the required email notification to the CERN operators and experts on call. This was due to a switch of the sender’s email address from helpdesk@ggus.org to apache@ggus.org, which came with the 2011/05/25 GGUS Release because of the new exim mailer at KIT. It was solved in the week of 2011/06/27 by including the new email address in the admins of the CERN [VO]-operator-alarm@cern.ch e-groups. All test ALARMs following the 2011/07/06 release were successful.
Details follow…
WLCG MB Report WLCG Service Report
ATLAS ALARM->CERN SRM connections fail GGUS:71471
ATLAS ALARM->CERN SRM many errors GGUS:71715
ATLAS ALARM->CERN Castor timeouts GGUS:71904
CMS ALARM->CERN job stageout errors GGUS:71934
CMS ALARM->CERN Castor pool full GGUS:71969
ATLAS ALARM->CERN export fails with FTS errors GGUS:71985
ATLAS ALARM->CERN t0merge writing errors GGUS:72132
ATLAS ALARM->CERN No space left on device in pools GGUS:72218
ATLAS ALARM->CERN Castor pools’ writing hangs GGUS:72262
ATLAS ALARM->CNAF Monitoring shows zero space on datatape GGUS:72473
ATLAS ALARM->CERN Castor no access to file GGUS:72528
VOname ALARM->Site Service GGUS:xxxxx
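As a cross-check of the totals quoted on the summary slide, the ALARM tickets listed above can be tallied per VO and per site. The sketch below simply transcribes the VO, site and GGUS number from each slide title; the tuple layout and variable names are illustrative assumptions, and the template entry with GGUS:xxxxx is left out.

```python
from collections import Counter

# (VO, site, GGUS ticket number), transcribed from the slide titles above.
tickets = [
    ("ATLAS", "CERN", 71471),
    ("ATLAS", "CERN", 71715),
    ("ATLAS", "CERN", 71904),
    ("CMS", "CERN", 71934),
    ("CMS", "CERN", 71969),
    ("ATLAS", "CERN", 71985),
    ("ATLAS", "CERN", 72132),
    ("ATLAS", "CERN", 72218),
    ("ATLAS", "CERN", 72262),
    ("ATLAS", "CNAF", 72473),
    ("ATLAS", "CERN", 72528),
]

# Count tickets by submitting VO and by notified site.
by_vo = Counter(vo for vo, _site, _ggus in tickets)
by_site = Counter(site for _vo, site, _ggus in tickets)

print(len(tickets))   # total ALARM tickets listed
print(dict(by_vo))    # tickets per VO
print(dict(by_site))  # tickets per site
```

The tally agrees with the summary: 11 tickets, 9 from ATLAS and 2 from CMS, 10 for CERN and 1 for CNAF. If further ALARMs from 13-24 July are added to the list, rerunning the count gives the adjusted totals.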