
Service Level Status Overview project, Sebastian Lopienski, CERN, IT/FIO

Service Level Status Overview project. Sebastian Lopienski, CERN, IT/FIO. HEPiX meeting, Jefferson Lab, October 10th, 2006. Agenda: overview of the project; concepts (service, subservice, metaservice; availability vs. status; Key Performance Indicators); demonstration; your own SLS instance?





Presentation Transcript


  1. Service Level Status Overview project
     Sebastian Lopienski, CERN, IT/FIO
     HEPiX meeting, Jefferson Lab, October 10th, 2006

  2. Agenda
     • Overview of the project
     • Concepts
       • service, subservice, metaservice
       • availability vs. status
       • Key Performance Indicators
     • Demonstration
     • Your own SLS instance?

  3. The need
     • What is the current availability of the CVS service?
     • Which services are still affected by last night's power cut?
     • If my service is in maintenance, what other services will be affected?
     • What is the overall status of all services used by the ATLAS experiment?

  4. Service Level Status Overview (SLS)
     • Aim:
       • To provide a web-based tool that dynamically shows availability, basic information and statistics about IT services, as well as dependencies between them.
     • For whom?
       • service users
       • department and CERN management
       • other service providers
       • the manager of the given service

  5. First insight

  6. Features
     • collecting and displaying service information, status and availability
     • dependencies and reverse dependencies
     • service incidents, scheduled interventions
     • hierarchical structure of services
     • configurable views of services
     • charts of availability trends over time
     • statistics of availability (and other values)
     • Key Performance Indicators (KPIs)
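Availability statistics of the kind listed above can be computed in several ways. The sketch below is one possibility, not the SLS implementation: it assumes samples arrive as (timestamp, availability) pairs and that each value holds until the next sample arrives, then computes a time-weighted mean over the period. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def time_weighted_availability(samples, end):
    """Time-weighted mean availability (0..100) of a step function.

    samples: list of (datetime, float) pairs sorted by time. Assumption:
    each value holds until the next sample, and the last one until `end`.
    """
    if not samples:
        raise ValueError("no samples")
    total_seconds = 0.0
    weighted_sum = 0.0
    for i, (start, value) in enumerate(samples):
        stop = samples[i + 1][0] if i + 1 < len(samples) else end
        duration = (stop - start).total_seconds()
        if duration < 0:
            raise ValueError("samples must be sorted and end must follow the last sample")
        total_seconds += duration
        weighted_sum += value * duration
    if total_seconds == 0:
        # Zero-length window: fall back to the most recent value.
        return samples[-1][1]
    return weighted_sum / total_seconds


if __name__ == "__main__":
    t0 = datetime(2006, 10, 10)
    samples = [(t0, 100.0), (t0 + timedelta(hours=18), 50.0)]
    print(time_weighted_availability(samples, t0 + timedelta(hours=24)))  # 87.5
```

A service at 100% for 18 hours and then at 50% for 6 hours averages 87.5% over the day, whereas a plain mean of the two samples would give 75%; weighting by duration avoids over-counting short blips.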

  7. Architecture
     • We collect and display information, but we don't generate it!

  8. Architecture
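Because SLS only collects what services publish, the collector also has to notice when a service stops publishing (the "Service info expired" state on slide 12). A minimal sketch using Python's standard xml.etree.ElementTree: it reads the fields of the minimal update XML shown on slide 24 and flags the update as expired when its timestamp is older than a limit. The 30-minute MAX_AGE and the names read_update and _field are assumptions; the slides do not state the real expiry rule.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

UPDATE_NS = "{http://sls.cern.ch/SLS/XML/update}"

# Hypothetical freshness limit: the slides do not give SLS's real expiry rule.
MAX_AGE = timedelta(minutes=30)


def _field(root, name):
    """Return the stripped text of a required child element."""
    text = root.findtext(UPDATE_NS + name)
    if text is None:
        raise ValueError(f"update XML lacks <{name}>")
    return text.strip()


def read_update(xml_text, now=None):
    """Parse a service update XML and flag it as expired if it is too old.

    The timestamp must carry a UTC offset (as in the slide 24 example);
    a naive timestamp cannot be compared with the aware `now` below.
    """
    root = ET.fromstring(xml_text)
    timestamp = datetime.fromisoformat(_field(root, "timestamp"))
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "id": _field(root, "id"),
        "availability": float(_field(root, "availability")),
        "timestamp": timestamp,
        "expired": now - timestamp > MAX_AGE,
    }
```

Keeping the expiry decision on the collector side fits the architecture: services only report, and SLS decides how stale a report may be before it stops trusting it.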

  9. Agenda
     • Overview of the project
     • Concepts
       • service, subservice, metaservice
       • availability vs. status
       • Key Performance Indicators
     • Demonstration
     • Your own SLS instance?

  10. Services, metaservices etc.

  11. What is service availability?
     • Service availability indicates to what extent a given service is accessible and useful for its users
     • Services should be monitored from the users’ point of view
       • a user doesn’t care about alarms on the machines running the service
     • In SLS, service availability is a number N: 0 ≤ N ≤ 100

  12. Service availability and status
     • Service fully (100%) available
     • Service available at 95%, still marked as fully available (above the highest threshold)
     • Service available at 87%, marked as affected (below the highest threshold)
     • Service available at 50%, marked as degraded (below the medium threshold)
     • Service available at 13%, marked as not available (below the lowest threshold)
     • Service info expired, update not available
     • Scheduled outage or maintenance
     Different status thresholds mean a different status for services with the same availability (more at http://cern.ch/SLS/help.php)
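The mapping from availability to status can be written as a small function. This sketch takes the three thresholds from the XML example on slide 23 (80, 70 and 30) as defaults and treats each threshold as the lowest availability that still earns that status; the function name and the handling of values exactly on a threshold are assumptions.

```python
def status_for(availability, available=80.0, affected=70.0, degraded=30.0):
    """Map an availability value (0..100) to a status label.

    Defaults follow the threshold XML example on slide 23. Assumption: a value
    equal to a threshold earns that threshold's status. The "expired" and
    "scheduled maintenance" states do not depend on the availability number
    and are therefore not handled here.
    """
    if not 0 <= availability <= 100:
        raise ValueError("availability must be between 0 and 100")
    if availability >= available:
        return "available"
    if availability >= affected:
        return "affected"
    if availability >= degraded:
        return "degraded"
    return "not available"
```

With the default thresholds, 87% counts as available. With hypothetical thresholds of 90, 60 and 20, the same 87% is marked as affected, and 95%, 50% and 13% come out as available, degraded and not available, reproducing the examples above: the same availability yields a different status under different thresholds.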

  13. Key Performance Indicators
     • KPIs are metrics that indicate whether a service meets its requirements (performance or other)
     • Examples of Key Performance Indicators:
       • % availability of CPU servers (how many machines are in production out of the total)
       • % of AFS volumes and servers available, also broken down by VO
       • CPU delivered to each VO compared to its quota; % of usage coming from the Grid
     • A KPI is a pair of values: measured and target
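Since a KPI is described as a pair of measured and target values, it maps naturally onto a small record type. In this sketch the class name, the higher_is_better flag and the example numbers are all hypothetical; the slides do not say how SLS decides whether a higher or a lower measured value is better.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    """A Key Performance Indicator as a (measured, target) pair."""

    name: str
    measured: float
    target: float
    # Assumption: the slides do not say how the "better" direction is encoded.
    higher_is_better: bool = True

    def meets_target(self) -> bool:
        """True if the measured value reaches the target in the better direction."""
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target
```

With made-up numbers, KPI("% of CPU servers in production", measured=96.0, target=95.0).meets_target() returns True.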

  14. Agenda
     • Overview of the project
     • Concepts
       • service, subservice, metaservice
       • availability vs. status
       • Key Performance Indicators
     • Demonstration
     • Your own SLS instance?

  15. SLS instance at CERN
     • http://cern.ch/SLS (NICE password required)
     • all availabilities shown there are real and up to date
     • inline SLS view for a given service (e.g. at http://cern.ch/CVS)

  16. SLS instance at CERN
     Most IT services are covered by SLS:
     • Administrative applications
     • Windows, Mail, Web services
     • AFS, lxbatch, lxplus, Backup, Tapes, Remedy, Lemon
     • CVS services, J2EE Public Service, EDMS
     • databases
     • LCG Tier-0 and Tier-1 sites
     • Indico, CDS, CRBS, VRVS etc.
     Metaservices and views:
     • logical structure, group structure, VO-oriented structure

  17. Agenda
     • Overview of the project
     • Concepts
       • service, subservice, metaservice
       • availability vs. status
       • Key Performance Indicators
     • Demonstration
     • Your own SLS instance?

  18. Setting up an SLS instance
     • Simple installation from an RPM
       • for SLC3 and SLC4
       • see: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSAdminDocumentation
     • No CERN-specific dependencies
     • Requirements
       • Apache, Python, PHP (with DOM and OCI8 extensions)
       • Xerces-C >= 2.3
       • JpGraph and GD library
       • cx_Oracle (for the database functionality)
     • Comes with one service predefined: SLS itself
     • Released under the EU DataGrid software license
       • a BSD-style license

  19. Adding a new service
     • The service manager has to:
       • have an idea of how to measure service availability
       • have a piece of code that calculates the availability percentage value (0..100)
     • Then, follow two simple steps:
       • prepare a static service description XML file and send it to us (once)
       • make service update XMLs available via HTTP
     • The SLS Manual for Service Managers provides detailed instructions and many example XMLs: https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM
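The availability calculation itself is left entirely to the service manager. One simple approach, shown only as an illustration, is to run a handful of user-level probes and report the percentage that succeed, which matches the idea of monitoring a service from its users' point of view. The probes and the name availability_from_checks are hypothetical.

```python
from typing import Callable, Iterable


def availability_from_checks(checks: Iterable[Callable[[], bool]]) -> int:
    """Return the percentage (0..100) of probe functions that succeed.

    Each check is a hypothetical user-level probe, e.g. "can a test module
    be checked out?". A check that raises an exception counts as a failure.
    """
    checks = list(checks)
    if not checks:
        raise ValueError("at least one check is required")
    passed = 0
    for check in checks:
        try:
            if check():
                passed += 1
        except Exception:
            pass  # a crashing probe means the user-visible feature is broken
    return round(100 * passed / len(checks))
```

Equal weighting of probes is the simplest choice; a service manager could just as well weight probes by how many users depend on each feature.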

  20. Minimal static XML example
     <?xml version="1.0" encoding="UTF-8"?>
     <service xmlns="http://sls.cern.ch/SLS/XML/static">
       <id>DFS</id>
       <fullname>DFS (Distributed File System)</fullname>
       <datasource>
         <url>https://websvc02.cern.ch/winservices-soap/...</url>
       </datasource>
     </service>
     Example of a static service description XML with more information:
     https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#Static_XML_with_more_information
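Before sending the static description, a service manager may want to check that it parses and carries the expected fields. A minimal sketch with Python's standard xml.etree.ElementTree, using the namespace from the example above; read_static_description is a hypothetical helper, not part of SLS.

```python
import xml.etree.ElementTree as ET

STATIC_NS = "{http://sls.cern.ch/SLS/XML/static}"


def read_static_description(xml_text):
    """Extract the fields of a minimal static service description."""
    root = ET.fromstring(xml_text)
    if root.tag != STATIC_NS + "service":
        raise ValueError(f"unexpected root element {root.tag!r}")
    # findtext returns the default "" for missing elements instead of None.
    return {
        "id": root.findtext(STATIC_NS + "id", "").strip(),
        "fullname": root.findtext(STATIC_NS + "fullname", "").strip(),
        "datasource_url": root.findtext(
            f"{STATIC_NS}datasource/{STATIC_NS}url", ""
        ).strip(),
    }
```

An empty "id" or "datasource_url" in the result is a sign the file is missing a required element or uses the wrong namespace.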

  21. Service managers
     …
     <servicemanagers>
       <servicemanager main="true" login="ungil">Carlos Ungil</servicemanager>
       <servicemanager>Maciej Stepniewski</servicemanager>
       <servicemanager login="wtomlin">William Tomlin</servicemanager>
     </servicemanagers>
     …
     • Contact data from LDAP
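On the consuming side, the servicemanagers block is easy to read back. This sketch, with a hypothetical function name, returns each manager's name, optional login and whether it is flagged as main. It compares local tag names so it works both on the bare fragment and inside the namespaced service element; that the login is the key for the LDAP contact lookup is an assumption the slides do not confirm.

```python
import xml.etree.ElementTree as ET


def list_service_managers(xml_text):
    """Return (name, login, is_main) tuples from a <servicemanagers> block."""
    root = ET.fromstring(xml_text)
    managers = []
    for elem in root.iter():
        # Strip any namespace: '{ns}servicemanager' -> 'servicemanager'.
        if elem.tag.rsplit("}", 1)[-1] == "servicemanager":
            name = " ".join((elem.text or "").split())
            # Assumption: login would key an LDAP lookup; it may be absent.
            login = elem.get("login")
            is_main = elem.get("main") == "true"
            managers.append((name, login, is_main))
    return managers
```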

  22. Service dependencies
     …
     <dependencies>
       <dependency level="dependson">AFS</dependency>
       <dependency level="uses">Castor</dependency>
     </dependencies>
     …
     • Two different levels of dependency:
       • dependson: the service will not work if AFS is down
       • uses: the service uses Castor (for example for backup), but will work fine (or almost fine) even if Castor is not available
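The reverse-dependency question from slide 3 ("If my service is in maintenance, what other services will be affected?") amounts to a breadth-first search over inverted dependency links. In this sketch only dependson links propagate an outage by default, since a service that merely uses another keeps working; the data layout, the service names in the example and the function name are hypothetical.

```python
from collections import defaultdict, deque


def affected_services(dependencies, down_service, levels=("dependson",)):
    """Return services transitively affected when `down_service` is unavailable.

    dependencies: {service: [(level, dependency), ...]}, as declared in the
    static XMLs. Only links whose level is in `levels` propagate the outage.
    """
    # Invert the graph: dependency -> services that rely on it.
    reverse = defaultdict(set)
    for service, deps in dependencies.items():
        for level, dependency in deps:
            if level in levels:
                reverse[dependency].add(service)

    affected = set()
    queue = deque([down_service])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            # Skip the failed service itself in case of dependency cycles.
            if dependent != down_service and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

With deps = {"MyService": [("dependson", "AFS"), ("uses", "Castor")], "MyPortal": [("dependson", "MyService")]}, an AFS outage affects both MyService and MyPortal, while a Castor outage affects neither unless "uses" is included in levels.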

  23. Status thresholds
     (Figure: availability scale from 0 to 100 with thresholds marked at 30, 70 and 80)
     …
     <availabilitythresholds>
       <threshold level="available">80</threshold>
       <threshold level="affected">70</threshold>
       <threshold level="degraded">30</threshold>
     </availabilitythresholds>
     …
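A consumer of the static XML needs these thresholds as numbers. This sketch collects them into a dictionary and rejects values that are not in decreasing order; it compares local tag names so it works whether or not the fragment sits inside the namespaced service element. The ordering check and the function name are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_thresholds(xml_text):
    """Collect <threshold level="...">N</threshold> values into a dict."""
    root = ET.fromstring(xml_text)
    thresholds = {}
    for elem in root.iter():
        # Strip any namespace: '{ns}threshold' -> 'threshold'.
        if elem.tag.rsplit("}", 1)[-1] == "threshold":
            thresholds[elem.get("level")] = float(elem.text)
    # Sanity check (assumption): available > affected > degraded.
    order = [thresholds[k] for k in ("available", "affected", "degraded") if k in thresholds]
    if order != sorted(order, reverse=True):
        raise ValueError(f"thresholds out of order: {thresholds}")
    return thresholds
```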

  24. Minimal update XML example
     <?xml version="1.0" encoding="utf-8"?>
     <serviceupdate xmlns="http://sls.cern.ch/SLS/XML/update">
       <id>CVS</id>
       <availability>100</availability>
       <timestamp>2006-03-14T14:20:27+01:00</timestamp>
     </serviceupdate>
     Example of an availability update XML with more information:
     https://twiki.cern.ch/twiki/bin/view/FIOgroup/SLSManualForSM#Update_XML_with_more_information
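Any language can produce the update XML; in Python the standard library is enough. A minimal sketch that builds a document of the shape above; build_update_xml is a hypothetical name, and rounding to a whole percentage simply mirrors the example rather than a documented requirement.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

UPDATE_NS = "http://sls.cern.ch/SLS/XML/update"


def build_update_xml(service_id, availability, when=None):
    """Return a minimal service update XML document as a string."""
    if not 0 <= availability <= 100:
        raise ValueError("availability must be between 0 and 100")
    if when is None:
        when = datetime.now(timezone.utc)
    # Declaring xmlns as a plain attribute keeps the child tags unprefixed;
    # a parser then places them in the update namespace, as in the example.
    root = ET.Element("serviceupdate", xmlns=UPDATE_NS)
    ET.SubElement(root, "id").text = service_id
    # Whole percent, mirroring the example (an assumption, not a documented rule).
    ET.SubElement(root, "availability").text = str(round(availability))
    # ISO 8601 with an explicit UTC offset, e.g. 2006-03-14T13:20:27+00:00.
    ET.SubElement(root, "timestamp").text = when.isoformat(timespec="seconds")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(root, encoding="unicode")
```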

  25. Making update XML accessible via HTTP
     • Generate update XMLs with any server-side language / technology / platform:
       • PHP, Perl, Python, CGI, ASP
       • .Net: C#; J2EE: Servlets, JSP
     • or: periodically refresh a file (e.g. from cron) and make it available via HTTP
     • or: write a Lemon sensor providing the service availability
     • Advice and examples in the SLS Manual for Service Managers
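For the cron-based option, the main pitfall is the web server handing out a half-written file. A common remedy, sketched here with a hypothetical function name and no SLS-specific assumptions beyond the file being served over HTTP, is to write to a temporary file in the same directory and atomically rename it over the published one.

```python
import os
import tempfile


def publish_update(xml_text, path):
    """Atomically replace the file served over HTTP with a new update XML.

    Writing to a temporary file in the same directory and then renaming it
    means readers see either the old document or the new one, never a mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        # mkstemp creates the file as 0600; the web server must be able to read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A script run from cron would call publish_update(xml_text, path) with a path inside the web server's document root; the path and the refresh interval are site-specific choices.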

  26. Observations
     • Trusting service managers
       • there is no way to cross-check the availability figures provided by services
     • User expectations
       • Is it really real-time?
       • "My mailbox/CVS repository/J2EE container doesn't work, but the service is green!"
     • Surprisingly, convincing service managers to join in was not that difficult

  27. Summary
     • SLS shows the availability and status of services as seen by users
     • SLS is a flexible and informative display covering the full range of computing services
     • SLS collects and displays information provided by the services
     • SLS is available for use outside CERN

  28. Thank you!
     • SLS instance at CERN (password protected): http://cern.ch/SLS
     • Sebastian.Lopienski@cern.ch
     • Questions?
