Join us for an insightful workshop led by James Casey on why better site monitoring improves reliability within WLCG. The session covers traditional Nagios usage and how Nagios is applied to monitoring grid services, with a particular focus on EGEE/gLite. It also shows how Regional Operations Centres (ROCs) would use Nagios to monitor sites and report the problems they see, and how historical monitoring of results can be improved. The aim is to share the knowledge and practical experience gained from 1.5 years of working with Nagios in WLCG.
Monitoring tutorial WLCG workshop 24th April 2008 James Casey, IT-GS-MND
Welcome
• Why the tutorials?
  • Better monitoring at the site improves site reliability
  • And therefore the reliability of the grid
• We’ve been working with Nagios in WLCG for 1.5 years now
  • Trying to transfer this knowledge
Outline
• Nagios introduction
  • And ‘traditional’ Nagios usage
• Nagios for monitoring grid services
  • Slight focus on EGEE/gLite
• Other topics
  • How the ROC would use Nagios to monitor you
    • And tell you when they see problems
  • What (one) site thinks of all this
  • Better historical monitoring of results
  • The OSG view
    • Using Nagios in a slightly different way
Questions about how to do anything you see today:
• https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo
• wlcg-monitoring-discuss@cern.ch
Thank you!