
WLCG infrastructure monitoring proposal

Pablo Saiz, IT/SDC/MI, 16th August 2013



  1. WLCG infrastructure monitoring proposal
     Pablo Saiz, IT/SDC/MI, 16th August 2013

  2. Table of contents
     • Summary of the progress
     • Desired structure of applications
     • Proposal for infrastructure monitoring
     Infrastructure monitoring P. Saiz

  3. Summary

  4. Motivation
     • Reduction in the number of people
     • Redefining the scope of applications
     • Combining expertise
     • Step out and evaluate other alternatives
     • Goal:
     • Offer (at least) the same QoS with fewer resources

  5. Status so far
     • WLCG monitoring consolidation group created
     • Applications supported by the section
     • Applications used
     … so now we know what to provide

  6. How to provide it
     • Input from our experience
     • Input from other groups
     • What is available out there
     • Split in different areas of work
     • Source of Information
     • Transport
     • Storage
     • Aggregation
     • Review of the areas
     • Visualization
     • Documentation
     • Deployment
     • Recurrent tasks

  7. Structure of applications

  8. Different layers of applications
     • Collect information
     • Transport
     • Storage
     • Aggregate
     • Visualize
     • Recurrent Tasks
     • Documentation
     • Deployment

  9. Deployment
     • Using OpenStack, Puppet, Hiera, Foreman
     • Quota of 100 nodes, 240 cores
     • Multiple templates already created
     • Development machine (7 nodes)
     • Web servers (SSB, SUM, WLCG transfers, Job: 16 nodes)
     • Elastic Search (6 nodes), Hadoop (4 nodes)
     • Currently working on the Nagios installation
     • Migrating machines from Quattor to AI
     • Koji and Bamboo for build system and continuous integration

  10. Source of information
     • Sources: Nagios, GOCDB, REBUS, OIM, Savannah, other apps
     • Gather info from external and internal sources
     • Publish it in the transport layer

  11. Transport
     • Message Broker
     • Local files
     • HTTP PUT/GET
     • UDP
     • (table in DB)?

  12. Storage
     Accepts any data
     • #jobs, status of a service, downtime, pledges, channel status
     • Metric, Instance, Time Range, Value
     Archival
     • Long term data
     • (Same format as Metric Storage)?
     Current Metrics
     • Most common views
     Metadata
     • Profiles
     • Topology
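The (Metric, Instance, Time Range, Value) record from this slide can be sketched as a small data structure. This is only an illustration: the class names `MetricRecord` and `MetricStore`, the field names, and the half-open time range are assumptions, not the schema used by the project.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List


@dataclass(frozen=True)
class MetricRecord:
    """One stored value: (instance, metric, time range, value)."""
    instance: str     # hypothetical: a site or service name
    metric: str       # hypothetical: e.g. "downtime" or "pledge"
    start: datetime   # start of the time range (inclusive, an assumption)
    end: datetime     # end of the time range (exclusive, an assumption)
    value: Any        # "accepts any data": number, status string, ...


class MetricStore:
    """Toy in-memory store standing in for the real storage backend."""

    def __init__(self) -> None:
        self._records: List[MetricRecord] = []

    def add(self, record: MetricRecord) -> None:
        if record.end <= record.start:
            raise ValueError("time range must have positive length")
        self._records.append(record)

    def query(self, metric: str, at: datetime) -> List[MetricRecord]:
        """Return every record of `metric` whose time range covers `at`."""
        return [
            r for r in self._records
            if r.metric == metric and r.start <= at < r.end
        ]
```

Because every kind of data shares one record shape, the same `query` serves job counts, downtimes, and pledges alike, which is presumably the appeal of a single format.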

  13. Aggregation
     • Examples: Summary, Site readiness, Availability
     • Treated as another metric
     • Might collect input from previous metrics
     • Current schema of ‘CMS Site readiness’

  14. Visualization
     Server:
     • HTML skeleton
     • REST API with JSON data
     • Cache: memcache, varnish
     Client
     • Common library + plugin
     • jQuery
     • Common MVC
     • No obvious choice…
     • Plots (Interactive, Exportable, Embeddable)
     • Highcharts
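The server side of this slide (JSON data behind a cache) could look roughly like the sketch below. The function `metric_json`, the payload layout, and the plain dictionary standing in for memcache or varnish are all assumptions made for illustration; no real REST framework or cache client is used here.

```python
import json
from typing import Callable, Dict, List

# A plain dict standing in for memcache/varnish (assumption; never expires).
_cache: Dict[str, str] = {}


def metric_json(metric: str, load_rows: Callable[[str], List[dict]]) -> str:
    """Return the JSON body for one metric, loading from storage only on a cache miss."""
    if metric not in _cache:
        rows = load_rows(metric)  # hypothetical storage lookup
        _cache[metric] = json.dumps({"metric": metric, "data": rows}, sort_keys=True)
    return _cache[metric]
```

A client-side plotting library would then fetch this JSON and render it. A real deployment would also need cache expiry, which this sketch omits.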

  15. Infrastructure monitoring

  16. Current situation
     • Big system, difficult to maintain/evolve
     • Many internal dependencies
     • Multiple schemas, aggregations:
     • SSB, MRS, ACE
     • Scope much bigger than what we need
     • Limit to WLCG
     • Usage of probes
     • Does not test what the experiments are doing!
     • Non-trivial deployment of new tests
     • Based on technologies available at the time of the design
     • New requests from experiments:
     • Test whatever they want
     • Availability vs Usability
     • Combine Dashboard/SAM apps

  17. Infrastructure monitoring
     [Diagram: current components placed on the application layers. Labels: Nagios, Pledge, Archival, MyWLCG, SUM, SSB, Down, Pilot, Metrics, Report, Trend, HC, VO feed, POEM, ACE]

  18. And for the prototype…
     [Diagram: prototype components placed on the application layers. Labels: Nagios, Pledge, Archival, MyWLCG, SUM, SSB, Down, Direct, Processed Data, SSB format, New Data, Metrics, Report, Trend, HC, VO feed, POEM, ACE, consume2db]
     Simplified MRS
     • Accepts any data
     • No foreign keys!
     • No status calculation
     • 300K messages per day
     SSB Storage
     • Records status changes
     • Same procedure as any other metric
     All the data in storage have the same format:
     • Instance, Metric, Time range, Value
     • Source could be Nagios, pilot framework, VO-defined metrics, availabilities
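One possible reading of "records status changes" is that repeated identical results are collapsed, so a new (time range, value) row is written only when the value changes. The sketch below shows that idea. The function name `to_status_changes`, the integer timestamps, and the `end_time` parameter are assumptions, not the SSB implementation.

```python
from typing import Any, Iterable, List, Tuple

Sample = Tuple[int, Any]          # (timestamp, value)
Interval = Tuple[int, int, Any]   # (start, end, value)


def to_status_changes(samples: Iterable[Sample], end_time: int) -> List[Interval]:
    """Collapse consecutive samples with the same value into time ranges."""
    intervals: List[Interval] = []
    current_start = None
    current_value = None
    # Sort by timestamp only, so values never need to be comparable.
    for ts, value in sorted(samples, key=lambda s: s[0]):
        if current_start is None:
            current_start, current_value = ts, value
        elif value != current_value:
            # The value changed: close the open range and start a new one.
            intervals.append((current_start, ts, current_value))
            current_start, current_value = ts, value
    if current_start is not None:
        # Close the last open range at the caller-supplied end time.
        intervals.append((current_start, end_time, current_value))
    return intervals
```

For example, four samples OK, OK, CRITICAL, OK taken at times 0, 10, 20, 30 would be stored as three rows rather than four.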

  19. And now we can see metrics…
     14 August 2013

  20. Aggregation
     • Combination of ACE + SSB Virtual Columns
     • Two types:
     • Horizontal: Ins1(M1…Mn) → Ins1(Mp)
     • Vertical: M1(Ins1…Insn) → Insp(M2)
     • Initial options for “and”, “or” of current status
     • Later on, might be extended to ‘sliding window’
     • Full description
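The two aggregation types can be sketched as follows, using only the "and"/"or" combination of current status mentioned on the slide. The function names, the boolean status map, and the example instance and metric names are hypothetical. Storing the result as a new metric is left out.

```python
from typing import Dict, Iterable, Tuple

# Current status per (instance, metric); True means OK. Layout is an assumption.
Status = Dict[Tuple[str, str], bool]


def combine(values: Iterable[bool], op: str) -> bool:
    """Combine current statuses with "and" or "or"."""
    values = list(values)
    if not values:
        raise ValueError("nothing to aggregate")
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    raise ValueError(f"unknown operator: {op}")


def horizontal(status: Status, instance: str, metrics: Iterable[str], op: str = "and") -> bool:
    """Several metrics of one instance -> one derived metric on that instance."""
    return combine((status[(instance, m)] for m in metrics), op)


def vertical(status: Status, metric: str, instances: Iterable[str], op: str = "and") -> bool:
    """One metric across several instances -> a metric on a parent instance."""
    return combine((status[(i, metric)] for i in instances), op)
```

A horizontal result would describe one service from all its tests, while a vertical result would describe, say, a site from the same test run on each of its services.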

  21. Examples of aggregation
     [Screenshot labels: ATLAS_CRITICAL, WN, Site (expand this column)]

  22. Summary
     • Lots of progress towards unified schema
     • Data can be published from different sources
     • Nagios, VO-defined metrics, ACE, (HC, Job Pilots)
     • Single schema for storage
     • Components talk to each other through API
     • Getting close to a “proof of concept”
     • Aggregation needs some work
     • Visualization might need adjusting
     • Other tasks can go in parallel
     • NoSQL evaluation
     • Nagios configuration
     • Only active metrics
