This presentation summarizes achievements and ongoing tasks for Milestone 1 (M1) of the Operational Tools work, presented at an SA1 management meeting at CERN. Highlights include regionalized dashboards, deployment of the GOCDB 4 schema, and progress on Nagios-based monitoring. Features, issues encountered, and revised timelines are outlined for components such as the SAM portal and the metric databases, along with the status of resource testing and monitoring. Links to the milestone tracking and architecture pages are listed at the end.
Operational Tools M1 Update
James Casey
SA1 Management Meeting, CERN
Summary of milestone timeline
[Figure: milestone timeline with a ‘We are here…’ marker]
M1 Features - April 2009
Status labels: DONE, DONE, DONE, IN PROGRESS, DONE, DONE
Regional Dashboard
• ‘Regionalized’ dashboards at IN2P3 using current SAM tests for alarms
GOCDB
• Programmatic Interface (XML over HTTP) available
• GOCDB 4 schema deployed with current data inserted for validation
Configuration repositories
• Aggregate Topology Provider (ATP): What resources should I test?
• Metric Description Database: What tests should I use?
Gstat
• First prototype of new monitoring (based on Nagios) done
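As an illustration of what consuming an ‘XML over HTTP’ programmatic interface can look like, here is a minimal sketch in Python. The slides do not give the GOCDB endpoint or its response schema, so the XML document, the element names (`site`, `endpoint`) and the attribute names below are assumptions made up for this example, not the real GOCDB format.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body. The element and attribute names are
# illustrative assumptions, not the real GOCDB schema.
SAMPLE_XML = """\
<results>
  <site name="EXAMPLE-SITE-A" roc="ExampleROC">
    <endpoint hostname="ce01.example.org" type="CE"/>
    <endpoint hostname="se01.example.org" type="SE"/>
  </site>
  <site name="EXAMPLE-SITE-B" roc="ExampleROC">
    <endpoint hostname="ce02.example.org" type="CE"/>
  </site>
</results>
"""


def parse_topology(xml_text):
    """Return a list of (site, hostname, service_type) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for site in root.findall("site"):
        for endpoint in site.findall("endpoint"):
            rows.append(
                (site.get("name"), endpoint.get("hostname"), endpoint.get("type"))
            )
    return rows


if __name__ == "__main__":
    for row in parse_topology(SAMPLE_XML):
        print(row)
```

In a real client the XML text would come from an HTTP GET (for example with `urllib.request.urlopen`) against the interface's URL, which is not stated in the slides.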
M1 Features - April 2009
Status labels: DONE, PENDING, IN PROGRESS, DONE, IN PROGRESS, DONE, DONE, DONE, DONE
ROC-level Nagios-based monitoring available
• Configured from Metric Description Database and ATP
• ‘SAM Portal’ level of visualization complete
Full Nagios testing of all resources in grid running
• At CERN: central system, simulating 11 ROCs
• Used to validate equivalence to SAM
• Availability calculation using current algorithm but with new metrics
QR Reporting Portal (MSA1.3)
• Initial version with metrics for job usage implemented
Accounting
• Central infrastructure for ActiveMQ-based accounting deployed
• Consumer and Producer developed
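The idea of configuring Nagios from the two repositories (ATP answers ‘what resources should I test?’, the Metric Description Database answers ‘what tests should I use?’) can be sketched as a simple join. This is not the project's actual generator: the data layout, the metric names and the check command names below are hypothetical, and only the `define service { ... }` object syntax is standard Nagios.

```python
# Hypothetical ATP output: which resources should be tested.
TOPOLOGY = [
    {"hostname": "ce01.example.org", "service_type": "CE"},
    {"hostname": "se01.example.org", "service_type": "SE"},
]

# Hypothetical Metric Description Database output: which tests apply
# to each service type. Metric and command names are made up.
METRICS = {
    "CE": [
        {"name": "example.CE-JobSubmit", "command": "check_example_job_submit"},
    ],
    "SE": [
        {"name": "example.SE-Put", "command": "check_example_se_put"},
        {"name": "example.SE-Get", "command": "check_example_se_get"},
    ],
}


def render_services(topology, metrics):
    """Emit one Nagios service definition per (resource, metric) pair.

    A real configuration would also need the other required service
    directives, typically inherited from a template via `use`.
    """
    blocks = []
    for resource in topology:
        for metric in metrics.get(resource["service_type"], []):
            blocks.append(
                "define service {\n"
                f"    host_name            {resource['hostname']}\n"
                f"    service_description  {metric['name']}\n"
                f"    check_command        {metric['command']}\n"
                "}\n"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_services(TOPOLOGY, METRICS))
```

Keeping topology and metric descriptions as separate inputs means a new resource or a new test only has to be registered in one place before the configuration is regenerated.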
M1 Objectives summary
Mostly completed
• Issues understood and, where appropriate, new timelines in place
Nagios probe equivalence took longer than expected and delayed some other components
• ‘SAM Portal’, SLA Calculation
Reduced effort at CERN due to re-hiring interviews and hardware provisioning delays slowed delivery of some components
• ATP, Metric Store
All details in ‘Operations Automation Team Milestone 1 Summary’
• https://espace.cern.ch/sa1-share/oat/Shared%20Documents/Milestones%20and%20Deliverables/EGEE-III-SA1-TEC-OAT-M1-Summary-v1_1.pdf
Support for Messaging
FUSE Message Broker is the same as Apache ActiveMQ. All ActiveMQ developers are employed by Progress, Inc.
• FUSE is an open-source rebundling of ActiveMQ. It’s the same code inside. There is no cost to ‘just use FUSE’.
• Currently, bugs are being fixed sooner in the FUSE release than in the Apache release.
Issue we are trying to solve: little expertise in SA1 and no resources for support
FUSE (http://fusesource.com/) offers support for the ActiveMQ distribution we are using
CERN is interested in getting a support contract for a set of core brokers (4-5?), paying for all of them until the end of EGEE III. This will solve the present support problem and will increase expertise in the team
• If somebody else wants to set up a broker later on with the Apache version of the software, there’s no problem: the ActiveMQ and FUSE versions interoperate
The involved teams consulted through the OAT are happy with this
Resources
• https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III
• Architecture and components: https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview
• Milestone tracking: https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringMilestones