Grid Monitoring Tools

Putchong Uthayopas, Director, High Performance Computing and Networking Center, Faculty of Engineering, Kasetsart University, Thailand. Grid Monitoring Tools. Introduction. The ability to monitor and manage large-scale distributed systems is very important.


Presentation Transcript


  1. Putchong Uthayopas, Director, High Performance Computing and Networking Center, Faculty of Engineering, Kasetsart University, Thailand. Grid Monitoring Tools

  2. Introduction • The ability to monitor and manage large-scale distributed systems is very important • Determine the source of performance problems • System tuning for both application and system software • Fault detection and recovery • Support for advanced services such as prediction systems (NWS), Grid schedulers, and accounting services

  3. Some of the Monitoring Entities • Software (application and system software) • Behavior (CPU, memory, disk usage, messages, events generated) • Platform • Resource usage • Processor, I/O, memory • Network status • Bandwidth, latency • Availability

  4. Challenges for Grid Monitoring • Scalability across wide area networks • Large number of entities to monitor • Large number of properties to monitor • Large data storage requirement for monitoring data • Timely delivery of data across wide area networks • Heterogeneity • Platform, protocol • Creates interoperability problems • Integration with Grid middleware in terms of security and naming

  5. Grid Monitoring Architecture • Defined by the Grid Performance Working Group of the Grid Forum in document GWD-Perf-16-1 • Consists of 3 types of components • Directory for resource discovery • Producer: makes performance data available • Consumer: uses the performance data • These components communicate using events • [Diagram: Consumer and Producer are each linked to Directory Services by Publish/Query; Events link Producer and Consumer]
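The three roles on slide 5 can be illustrated with a small in-memory sketch. This is not the GWD-Perf-16-1 specification or any real Grid middleware API: the class names, the `Event` fields, and the `publish`/`lookup`/`collect` methods are all hypothetical, chosen only to show that the directory is used for discovery while the data itself moves between producer and consumer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """A single performance measurement (field names are illustrative)."""
    source: str
    metric: str
    value: float


class Producer:
    """Makes performance data available to consumers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._latest: Dict[str, Event] = {}

    def record(self, metric: str, value: float) -> None:
        self._latest[metric] = Event(self.name, metric, value)

    def query(self, metric: str) -> Event:
        return self._latest[metric]


class Directory:
    """Lets producers publish what they offer and consumers look it up."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Producer]] = {}

    def publish(self, metric: str, producer: Producer) -> None:
        self._entries.setdefault(metric, []).append(producer)

    def lookup(self, metric: str) -> List[Producer]:
        return list(self._entries.get(metric, []))


class Consumer:
    """Discovers producers via the directory, then queries them directly."""

    def __init__(self, directory: Directory) -> None:
        self.directory = directory

    def collect(self, metric: str) -> List[Event]:
        return [p.query(metric) for p in self.directory.lookup(metric)]


directory = Directory()
node_a = Producer("node-a")
node_a.record("cpu_load", 0.42)
directory.publish("cpu_load", node_a)

consumer = Consumer(directory)
events = consumer.collect("cpu_load")
```

Note that the directory only stores who offers which metric; it never carries measurement values, which keeps it small even when the monitoring data is large.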

  6. Grid Monitoring Architecture • Directory Service • Used by producers and consumers to “discover” each other and share some characteristics • Service Models • Consumer initiated • Query / subscribe (stream) • Producer initiated • Push event / push stream
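The service models on slide 6 differ mainly in who starts the exchange. The sketch below is a hedged illustration in plain Python, with hypothetical names (`StreamingProducer`, `query`, `subscribe`, `emit`, `push`); it shows a consumer-initiated one-shot query, a consumer-initiated subscription to a stream, and a producer-initiated push.

```python
from typing import Callable, Dict, List

# A consumer-side callback that receives (metric, value); hypothetical shape.
Handler = Callable[[str, float], None]


class StreamingProducer:
    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}
        self._subscribers: Dict[str, List[Handler]] = {}

    def query(self, metric: str) -> float:
        """Consumer initiated: return the most recent value once."""
        return self._latest[metric]

    def subscribe(self, metric: str, handler: Handler) -> None:
        """Consumer initiated: ask to receive every future value (stream)."""
        self._subscribers.setdefault(metric, []).append(handler)

    def emit(self, metric: str, value: float) -> None:
        """Record a new value and deliver it to all subscribers."""
        self._latest[metric] = value
        for handler in self._subscribers.get(metric, []):
            handler(metric, value)

    def push(self, target: Handler, metric: str, value: float) -> None:
        """Producer initiated: the producer picks the target and sends an event."""
        self._latest[metric] = value
        target(metric, value)


received: List[tuple] = []
alerts: List[tuple] = []

producer = StreamingProducer()
producer.subscribe("latency_ms", lambda m, v: received.append((m, v)))
producer.emit("latency_ms", 12.5)
producer.emit("latency_ms", 13.0)

one_shot = producer.query("latency_ms")
producer.push(lambda m, v: alerts.append((m, v)), "disk_full", 1.0)
```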

  7. Some tools • NWS (Network Weather Service) • Monitors systems and networks and predicts traffic • NetLogger Toolkit • Toolkit for integrated application and system monitoring • Host/network monitoring tools, client library, and simple visualization tools • Ganglia • System monitoring of clusters, with some grid extensions • Iperf – measures internet bandwidth from point to point • MRTG – popular network data graphing tool • FlowScan – network flow measurement based on the NetFlow architecture • Many more tools

  8. Grid Observer Project (HPCNC/KU) • Objective • Build technology and tools for grid and cluster monitoring and performance analysis • Set up a “Grid Observatory” that • Monitors grid status and sends reports/alerts • Collects and distributes monitoring data for further analysis • Explore the deployment of existing tools • The software being developed evolves from our monitoring service in OpenSCE

  9. Features • Cluster and grid monitoring package • Supports system monitoring such as processor utilization, I/O, network, memory, temperature • Graph-based presenter with Web interface and RRD (Round-Robin Database) tools • Simple grid support • Simple notification service (a simple analyser)
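The round-robin database idea mentioned on slide 9 is that storage has a fixed number of slots, so the newest sample overwrites the oldest and the data file never grows. The following is a minimal sketch of that idea only; it is not the RRD tools' file format or API, and `RoundRobinSeries` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Tuple


class RoundRobinSeries:
    """Fixed-size time series: once full, each update evicts the oldest sample."""

    def __init__(self, slots: int) -> None:
        self._samples: Deque[Tuple[int, float]] = deque(maxlen=slots)

    def update(self, timestamp: int, value: float) -> None:
        self._samples.append((timestamp, value))

    def fetch(self) -> List[Tuple[int, float]]:
        """Return the retained samples, oldest first."""
        return list(self._samples)

    def average(self) -> float:
        if not self._samples:
            raise ValueError("no samples recorded yet")
        return sum(value for _, value in self._samples) / len(self._samples)


cpu = RoundRobinSeries(slots=3)
for t, load in enumerate([10.0, 20.0, 30.0, 40.0]):
    cpu.update(t, load)

retained = cpu.fetch()      # the sample at t=0 has been overwritten
mean_load = cpu.average()
```

A presenter that graphs the last hour of data only needs `fetch()`, which is why bounded storage of this kind suits a Web-based graph view.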

  10. Observer Recursive Monitoring Structure • [Diagram labels: Analyser, Collector, Presenter, Data (this group appears twice, once per level); Other Monitoring System (SNMP, NWS, Ganglia, etc.); Sensors]

  11. Components • Sensor: a producer that generates the performance information • Supports multicast and query modes • Separate hardware access layer • Collector: hybrid producer and consumer • Consumer of sensor data that also acts as a producer for the next level • Presenter: consumer that visualizes the information • Analyser: analyses the information collected
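Because a collector is both a consumer and a producer, collectors can be stacked, which is what makes the structure recursive. The sketch below illustrates that under stated assumptions: the `read()` interface, the path-style key naming, and the threshold-based `analyse` function are all hypothetical and are not taken from the Observer code.

```python
from typing import Callable, Dict, List, Protocol


class Source(Protocol):
    """Anything that can be read like a producer."""

    def read(self) -> Dict[str, float]: ...


class Sensor:
    """Producer: generates one named measurement from a probe function."""

    def __init__(self, name: str, probe: Callable[[], float]) -> None:
        self.name = name
        self._probe = probe

    def read(self) -> Dict[str, float]:
        return {self.name: self._probe()}


class Collector:
    """Hybrid: consumes its sources and is itself readable by the next level."""

    def __init__(self, name: str, sources: List[Source]) -> None:
        self.name = name
        self._sources = sources

    def read(self) -> Dict[str, float]:
        merged: Dict[str, float] = {}
        for source in self._sources:
            for key, value in source.read().items():
                merged[f"{self.name}/{key}"] = value
        return merged


def analyse(readings: Dict[str, float], limit: float) -> List[str]:
    """A very simple analyser: report every reading above the limit."""
    return sorted(key for key, value in readings.items() if value > limit)


cluster = Collector("cluster1", [
    Sensor("node1.cpu", lambda: 0.95),
    Sensor("node2.cpu", lambda: 0.20),
])
grid = Collector("grid", [cluster])  # a collector consuming another collector

readings = grid.read()
alerts = analyse(readings, limit=0.9)
```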

  12. Sensors • [Diagram labels: API, App, Sensors, Multicast Channel, Sensor Core, Plugins, HAL] • Dynamically loadable plugins • Multithreaded
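One way to read the sensor diagram is that a sensor core hosts plugins, and every plugin reaches the hardware only through a hardware access layer (HAL). The sketch below assumes that reading; all class names, the `read_raw` keys, and the use of a thread pool are illustrative choices rather than the Observer implementation. Plugins are registered in-process here instead of being loaded from shared libraries.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class HardwareAccessLayer(ABC):
    """Hides platform-specific details from the plugins."""

    @abstractmethod
    def read_raw(self, key: str) -> float: ...


class FakeHAL(HardwareAccessLayer):
    """Stand-in HAL backed by a dictionary, used only for this example."""

    def __init__(self, values: Dict[str, float]) -> None:
        self._values = values

    def read_raw(self, key: str) -> float:
        return self._values[key]


class Plugin(ABC):
    name: str

    @abstractmethod
    def sample(self, hal: HardwareAccessLayer) -> float: ...


class TemperaturePlugin(Plugin):
    name = "temperature_c"

    def sample(self, hal: HardwareAccessLayer) -> float:
        return hal.read_raw("temp_millideg") / 1000.0


class MemoryPlugin(Plugin):
    name = "memory_used_pct"

    def sample(self, hal: HardwareAccessLayer) -> float:
        return 100.0 * hal.read_raw("mem_used") / hal.read_raw("mem_total")


class SensorCore:
    """Holds the loaded plugins and samples them on worker threads."""

    def __init__(self, hal: HardwareAccessLayer) -> None:
        self._hal = hal
        self._plugins: Dict[str, Plugin] = {}

    def load(self, plugin: Plugin) -> None:
        self._plugins[plugin.name] = plugin

    def sample_all(self) -> Dict[str, float]:
        with ThreadPoolExecutor() as pool:
            futures = {
                name: pool.submit(plugin.sample, self._hal)
                for name, plugin in self._plugins.items()
            }
            return {name: future.result() for name, future in futures.items()}


hal = FakeHAL({"temp_millideg": 45500.0, "mem_used": 2.0, "mem_total": 8.0})
core = SensorCore(hal)
core.load(TemperaturePlugin())
core.load(MemoryPlugin())
samples = core.sample_all()
```

Porting to a new platform would then mean writing one new HAL class, with the plugins left unchanged.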

  13. Demo

  14. Work in Progress • Reliability analyser • Interface to other systems (SNMP, Ganglia, NWS) • Event schema (supposed to be done by GGF) • Better presenter • Redesign many ad hoc parts • More sensors. A C-language client sensor library is available in another project but is not yet integrated into this one • Bandwidth measurement • Better Observatory site (currently at observer.cpe.ku.ac.th) • Add grid security support

  15. Todo list • Interoperability • More detailed framework / architecture • Event schema • Data storage format • Common monitoring protocol • Monitoring capabilities definition • Collaborate with various GGF working groups

  16. Thank you. Questions and answers. End of Presentation

  17. Issues • Monitoring data is currently stored in files • Avoids the overhead of data movement • Copes with much larger data sets than could be stored in the directory • Remote access is still difficult, which locks the Presenter to the same node as the data • Uniform way to handle • System and application events (dynamic) • Structural data such as system configuration (static)
