WLCG Monitoring – An overview

WLCG Monitoring – An overview. LHCOPN Meeting, Madrid, 11th March 2008, James Casey. The WLCG Monitoring Vision: show stakeholders the state of the global WLCG infrastructure, and its historical evolution, in order to improve the availability and reliability of this infrastructure.

Presentation Transcript


  1. WLCG Monitoring – An overview. LHCOPN Meeting, Madrid, 11th March 2008, James Casey

  2. The WLCG Monitoring Vision: show stakeholders the state of the global WLCG infrastructure, and its historical evolution, in order to improve the availability and reliability of this infrastructure

  3. What is monitoring for us? • Service Availability/Reliability • Service status provided • Availability + Reliability calculated • Usage records • GridFTP, SRM, Grid job execution • One record per task, or one record per state change • Accounting information • Daily rollups of usage • Right now, distributed debugging and service management are not in scope
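The "Availability + Reliability calculated" step can be illustrated with a minimal sketch. The slides do not give the algorithm, so the formulas below are an assumption based on one common convention: availability is uptime over the period with known status, and reliability additionally excludes scheduled downtime from the denominator. The `StatusInterval` type and the status labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StatusInterval:
    """One stretch of time in a single state (hypothetical record type)."""
    hours: float
    status: str  # assumed labels: "up", "down", "scheduled_down", "unknown"


def availability_and_reliability(intervals):
    """Return (availability, reliability) as fractions, or None if undefined.

    Assumed convention, not taken from the slides:
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled_down)
    """
    total = sum(i.hours for i in intervals)
    up = sum(i.hours for i in intervals if i.status == "up")
    scheduled = sum(i.hours for i in intervals if i.status == "scheduled_down")
    unknown = sum(i.hours for i in intervals if i.status == "unknown")

    known = total - unknown
    availability = up / known if known > 0 else None

    unscheduled_base = known - scheduled
    reliability = up / unscheduled_base if unscheduled_base > 0 else None
    return availability, reliability


if __name__ == "__main__":
    day = [
        StatusInterval(20, "up"),
        StatusInterval(2, "scheduled_down"),
        StatusInterval(2, "down"),
    ]
    print(availability_and_reliability(day))
```

Under this convention a scheduled intervention lowers availability but not reliability, which is why downtime information is needed alongside the status stream.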

  4. Double vision • Sites and the experiments • Both have a view of this data • Very complicated stack to trace through • Try to connect the two perspectives • Important for site managers and project management • Especially for sites that support more than one experiment

  5. Complexity!

  6. Strategy to simplify • Operations • Delegate to regional entities • Provide some tools to have a “global” view • Tools • Delegate to experts • Standardize on information interchange schemas + protocols • Reporting • Lightweight metric collection • Of metrics that are useful for site managers or project management • Reporting on top of this • For project management

  7. Infrastructure and tools - Message Bus • WLCG Monitoring Working Group • Aim to consolidate the current monitoring effort • Single message bus for data interchange • With reliable message delivery • Message persistence • Isolate producers and consumers from each other • Define the message schemas and protocols • Provide bridges/adaptors as needed • NMWG?
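To make the message bus properties concrete, here is a minimal in-process sketch. It is not the broker used by WLCG (the slides do not name one); it only illustrates persistence and the isolation of producers from consumers. The `MessageBus` class, the topic name and the consumer names are all hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Toy bus: messages are kept per topic, and each consumer tracks its own offset."""

    def __init__(self):
        self._log = defaultdict(list)     # topic -> persisted messages, in order
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        # Producers only append; they never need to know who will read.
        self._log[topic].append(message)

    def consume(self, topic, consumer):
        # Each consumer receives every message it has not seen yet, exactly once.
        key = (topic, consumer)
        start = self._offsets[key]
        pending = self._log[topic][start:]
        self._offsets[key] = start + len(pending)
        return pending


if __name__ == "__main__":
    bus = MessageBus()
    bus.publish("service.status", {"service": "srm", "status": "up"})
    print(bus.consume("service.status", "site_view"))
    bus.publish("service.status", {"service": "srm", "status": "down"})
    print(bus.consume("service.status", "site_view"))     # only the new message
    print(bus.consume("service.status", "project_view"))  # late consumer sees both
```

Because messages persist in the log, a consumer that connects late still receives the full history, and adding a new consumer requires no change on the producer side.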

  8. Broker at the centre ...

  9. Leverage the underlying infrastructures • WLCG is a virtual infrastructure built on top of other physical infrastructures • Added value? • From interoperation and exchange of information between the systems • Provide information that is not available from any one system alone • Don’t add too many layers • Enough exist already! • E.g. our MoUs should be defined in relation to the SLAs/MoUs of the infrastructures

  10. LHCOPN and monitoring • Availability/Reliability • Provide E2E link status • Create a bridge from LHCOPN monitoring to WLCG monitoring • Usage records • The individual flow level is too detailed • Summary statistics should be OK • Aggregate rates as seen E2E • No need to expose internal complexity • Always ask “How could a site admin use this?” • Reporting • Operational statistics • MoU reporting
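The step from per-flow usage records to end-to-end summary statistics can be sketched as below. The record layout `(source, destination, bytes)`, the endpoint names and the fixed reporting window are assumptions made for illustration; the slides only say that aggregate rates per E2E path are sufficient.

```python
from collections import defaultdict


def summarize_flows(flows, window_seconds):
    """Collapse individual flow records into one summary per end-to-end link.

    flows: iterable of (source, destination, bytes_transferred) tuples
           (hypothetical record layout).
    window_seconds: length of the reporting window the flows fall into.
    """
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        totals[(src, dst)] += nbytes

    return {
        link: {
            "bytes": nbytes,
            # Average rate over the window, in megabits per second.
            "avg_mbps": nbytes * 8 / window_seconds / 1e6,
        }
        for link, nbytes in totals.items()
    }


if __name__ == "__main__":
    flows = [
        ("T0", "T1-A", 1_000_000_000),
        ("T0", "T1-A", 500_000_000),
        ("T0", "T1-B", 250_000_000),
    ]
    for link, stats in summarize_flows(flows, window_seconds=60).items():
        print(link, stats)
```

Only the per-link totals leave the network monitoring system, so the internal routing and per-flow detail stay hidden from the site admin who reads the summary.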

  11. LHCOPN and Operations • What are the requirements of LHCOPN? • Notification of grid ‘users’ of • Service interruptions • Status of problem investigations • Mechanism for grid users to raise problems against LHCOPN • GGUS is too complicated for the problem • ‘300 supporters’, TPM in the loop • Perhaps a simpler solution works for notifications • “Dashboard” (from Dan’s presentation) • Good experiences in CCRC’08

  12. MoU compliance reporting • We agreed to try to measure MoU metrics during CCRC’08 • To evaluate if we can actually do it! https://prod-grid-logger.cern.ch/elog/CCRC'08+Logbook/

  13. Mapping to MoU Services • Current availability is per-service • Map grid service status (from SAM) to MoU categories • These are “custom” service availability calculations • LHCOPN • Can provide “Networking services to/from T1”
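The mapping from per-service status to MoU categories could look roughly like the sketch below. Only the category “Networking services to/from T1” comes from the slide; the second category, the service names and the "worst status wins" rule are hypothetical choices for illustration, not the actual SAM or MoU definitions.

```python
# Category -> member services. Only the networking category is named in the
# slides; the other entry and all service names are hypothetical.
MOU_CATEGORIES = {
    "Networking services to/from T1": ["lhcopn-link"],
    "Data transfer services (hypothetical)": ["srm", "gridftp"],
}


def category_status(service_status, categories=MOU_CATEGORIES):
    """Derive one status per MoU category from individual service states.

    Assumed rule: a category is "down" if any member service is down,
    "unknown" if any member has no reported status, otherwise "up".
    """
    result = {}
    for category, services in categories.items():
        states = [service_status.get(name, "unknown") for name in services]
        if "down" in states:
            result[category] = "down"
        elif "unknown" in states:
            result[category] = "unknown"
        else:
            result[category] = "up"
    return result


if __name__ == "__main__":
    reported = {"lhcopn-link": "up", "srm": "up", "gridftp": "down"}
    for category, status in category_status(reported).items():
        print(f"{category}: {status}")
```

Feeding the resulting per-category status stream into an availability calculation would then give one figure per MoU category rather than per service.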

  14. ServiceMap • What’s a ServiceMap? • It’s a gridmap with many different maps, showing different aspects of the WLCG infrastructure • What’s the CCRC’08 ServiceMap? • Service ‘readiness’ • Service availability • For VO critical services • VO Functional blocks • A single place to see both the VO and the infrastructure view of the grid • For all stakeholders

  15. CCRC’08 ServiceMap …Demo… http://gridmap.cern.ch/ccrc08/servicemap.html

  16. What are the VO functional blocks? • Functional blocks for the LHC experiments are similar to a large extent • Allows a site to compare the service it provides for different experiments • E.g. functional blocks for ATLAS and CMS for CCRC’08 (Julia Andreeva, CERN, 04.03.2008, F2F meeting):
      Tier | ATLAS | CMS
      T0 | Data archiving at T0, Data transfer from T0, Data processing at T0, CAF | Data archiving at T0, Data processing at T0, Data transfer from T0
      T1 | Data archiving at Tier1, Processing at Tier1, Data transfer T1-T1, Data transfer T1-T2 | Data transfer T1-T1, Data transfer T1-T2, Processing at Tier1
      T2 | MC production at T2, Analysis at T2, Data transfer T2-T1 | MC production at T2, Analysis at T2, Data transfer T2-T1

  17. Summary • WLCG monitoring needs for LHCOPN are modest • Providing service status information would satisfy MoU availability requirements • We calculate availability/reliability according to our algorithms • Need downtime information too for this • How to satisfy MoU response time? • This is still a wider problem for us • Test some simpler notification systems • elogger + RSS feed?
