Online Monitoring with MonALISA

Online Monitoring with MonALISA. Dan Protopopescu, Glasgow, UK. MonALISA is a distributed service able to collect any type of information from different systems, analyze this information in real time, and take automated decisions and perform actions based on it.

Presentation Transcript


  1. Online Monitoring with MonALISA. Dan Protopopescu, Glasgow, UK

  2. MonALISA is a distributed service able to:
     • collect any type of information from different systems
     • analyze this information in real time
     • take automated decisions and perform actions based on it
     • optimize work flows in complex environments
     Read more at http://monalisa.caltech.edu

  3. Uses
     • Monitoring distributed computing, i.e. Grids
     • Optimizing flow in complex systems (VRVS, optical cable networks)
     • ALICE also uses MonALISA (ML) for monitoring online reconstruction
     • Some benchmark figures for the service:
       • ~800k monitored parameters at 50k updates/second
       • > 10k running (AliEn) jobs monitored simultaneously
       • > 100 WAN links
     We are proposing ML as a high-level monitoring and possible control system alongside (or on top of) existing slow controls systems such as EPICS, PVSS, etc.

  4. Advantages
     • MonALISA is simple to install, configure and use
     • ApMon APIs are available in C, C++, Java, Python and Perl
     • A ROOT plugin allows macros to send data directly to MonALISA
     • Can easily interface with (or sit on top of) any existing or future slow controls subsystem (EPICS, PVSS)
     • Data is stored in a standard PostgreSQL (or MySQL) database that can be accessed by other applications, independently of ML
     • Automatic data summarizing
     • Several data repositories (and hence DBs) can exist (local and remote)
     • Easy access via WebService (WS) from service and/or repository
     • Fully supported by the development team; work is ongoing in this direction

  5. Capabilities
     Based on monitored information, actions can be taken in:
     • ML Service
     • ML Repository
     Actions can be triggered by:
     • Values above/below given thresholds
     • Absence/presence of values
     • Correlations between several values
     Possible action types:
     • External command
     • Plain event logging
     • Annotation of repository charts; RSS feeds
     • Email
     • Instant messaging
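The three trigger kinds listed on this slide (thresholds, absence/presence, correlations) can be sketched in a few lines of Python. This is only an illustration of the idea, not MonALISA's API: every name here (`Sample`, `above`, `absent`, `both`, `evaluate`), the parameter names and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: none of these names come from MonALISA itself.


@dataclass
class Sample:
    value: float
    timestamp: float  # seconds, on the same clock as `now` below


Latest = Dict[str, Sample]
Trigger = Callable[[Latest, float], bool]


def above(name: str, threshold: float) -> Trigger:
    """Fire when the latest value of `name` exceeds `threshold`."""
    def check(latest: Latest, now: float) -> bool:
        sample = latest.get(name)
        return sample is not None and sample.value > threshold
    return check


def absent(name: str, max_age: float) -> Trigger:
    """Fire when `name` was never seen or is older than `max_age` seconds."""
    def check(latest: Latest, now: float) -> bool:
        sample = latest.get(name)
        return sample is None or now - sample.timestamp > max_age
    return check


def both(first: Trigger, second: Trigger) -> Trigger:
    """Fire when two triggers hold together (a very simple correlation)."""
    return lambda latest, now: first(latest, now) and second(latest, now)


def evaluate(rules: List[Tuple[Trigger, str]], latest: Latest, now: float) -> List[str]:
    """Return the action label of every rule whose trigger fires."""
    return [action for trigger, action in rules if trigger(latest, now)]


# Made-up monitoring state: one fresh high value, one stale value.
now = 1000.0
latest = {
    "mysql_vsz_mb": Sample(2100.0, now - 5),
    "tagger_rate": Sample(0.0, now - 300),
}
rules = [
    (above("mysql_vsz_mb", 2000.0), "run external command"),
    (absent("tagger_rate", 60.0), "send email"),
    (both(above("mysql_vsz_mb", 2000.0), absent("tagger_rate", 60.0)), "log event"),
]
fired = evaluate(rules, latest, now)
print(fired)
```

Keeping triggers as plain functions of the latest samples means the same rule list could, in principle, be evaluated against local data in a service or against aggregated data in a repository.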

  6. Components
     [Architecture diagram. Labels: GUI, LUS/Proxies, Web Server, Service (×2), Repository, ApMon (×4), "Quick actions", "Actions based on local information", "Actions based on aggregated information"]

  7. Service setup
     ML Service setup:
       # download and unpack the service bundle
       wget http://nuclear.gla.ac.uk/~protopop/ML/MonaLisa.tar.gz
       tar -zxvf MonaLisa.tar.gz
       # run the installer
       cd MonaLisa/
       ./install.sh
       # start the service daemon
       cd ../MonaLisa/Service/CMD/
       ./MLD start

  8. Repository setup
     ML Repository setup:
       wget http://nuclear.gla.ac.uk/~protopop/ML/MLrepository.tgz
       tar -zxvf MLrepository.tgz
       [configure it]
       cd MLrepository
       ./start.sh

  9. ApMon setup
     ApMon setup:
       wget http://nuclear.gla.ac.uk/~protopop/ML/ApMon_perl.tar.gz
       tar -xzvf ApMon_perl.tar.gz
       cd ApMon_perl
       [create your script, say mysend.pl]
       perl mysend.pl

  10. Simple monitoring script
      [monalisa@glasgow]$ cat mysend.pl
      use ApMon;
      # Destination ML service; background system monitoring and general host info are switched off
      my $apm = ApMon->new({"glasgow.jlab.org:8884" => {"sys_monitoring" => 0, "general_info" => 0}});
      my @pair;
      while (1) { # loop forever
          # get values from somewhere; getmypar() is user-supplied and returns a (name, value) pair
          @pair = getmypar("pspec_logic_ai_0");
          # send the pair under cluster "Detector", node "MOR"
          $apm->sendParameters("Detector", "MOR", @pair);
          sleep(20);
      }
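The structure of this script (read a value, send it under a cluster and node name, sleep, repeat) can be sketched in Python as below. The sketch deliberately does not call ApMon: `run_sender`, `read` and `send` are hypothetical stand-ins, the value 42.0 is a placeholder, and the loop is bounded so it can be run and checked, whereas the Perl script loops forever.

```python
import time
from typing import Callable, List, Tuple

Pair = Tuple[str, float]  # (parameter name, value)


def run_sender(read: Callable[[], Pair],
               send: Callable[[str, str, Pair], None],
               cluster: str,
               node: str,
               period_s: float,
               iterations: int,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Read one (name, value) pair per cycle and hand it to `send`.

    `send` is a stand-in for whatever actually delivers the data
    (in the slides, ApMon's sendParameters); it is NOT an ApMon call.
    """
    for i in range(iterations):
        send(cluster, node, read())
        if i + 1 < iterations:  # no need to wait after the last cycle
            sleep(period_s)


# Usage with fake I/O: a constant reading and a list that records what was sent.
sent: List[Tuple[str, str, Pair]] = []
run_sender(read=lambda: ("pspec_logic_ai_0", 42.0),   # placeholder value
           send=lambda c, n, p: sent.append((c, n, p)),
           cluster="Detector",
           node="MOR",
           period_s=20,
           iterations=3,
           sleep=lambda s: None)                      # skip real waiting
print(sent)
```

Passing the reader, sender and sleep function in as arguments keeps the loop independent of any particular detector readout or transport, which makes it easy to try out without a running service.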

  11. Time history
      Time history example:
      [monalisa@glasgow]$ cat mor.properties
      page=hist
      Farms=JlabML
      Clusters=Detector
      Nodes=MOR
      Functions=pspec_logic_ai_0
      ylabel=Tagger rate
      title=MOR
      annotation.groups=2
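The Clusters, Nodes and Functions entries in this file appear to match the cluster name, node name and parameter name used in the monitoring script on the previous slide. A minimal Python sketch that reads such a file and pulls out those fields is shown below; it assumes plain key=value lines with optional # comments and is not the repository's own parser. `parse_properties` is a hypothetical helper.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:  # ignore lines that have no '='
            continue
        props[key.strip()] = value.strip()
    return props


# Contents of mor.properties as shown on the slide.
MOR_PROPERTIES = """\
page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
"""

props = parse_properties(MOR_PROPERTIES)
# The four fields that identify which data series the chart plots.
series = (props["Farms"], props["Clusters"], props["Nodes"], props["Functions"])
print(series)
```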

  12. Web interface

  13. Java GUI

  14. Application control
      • ML Clients
        • TCP-based subscribe mechanism: serialized, compressed objects with optional encryption
      • ML Proxies
        • Application commands are encrypted
      • ML Services
        • Standard and/or user's sensors and/or application modules
      [Diagram. Labels: Your custom Java client, GUI client, ML Repository, Your custom view, Key, LUS, Keystore, ML Service, Your mon module, Your app module, App MonC, ApMon, Your application, bash, Your Application]

  15. Alert-based Actions
      • The MySQL daemon is automatically restarted when it runs out of memory. Trigger: threshold on VSZ memory usage
      • The ALICE production jobs queue is automatically kept full by automatic resubmission. Trigger: threshold on the number of aliprod waiting jobs
      • Administrators are kept up to date on the services' status via instant messaging, RSS feeds, toolbar alerts etc. Trigger: presence/absence of monitored information
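The "keep the queue full" action can be read as a simple threshold rule: when the number of waiting jobs drops below a minimum, submit enough jobs to reach a target level. The Python sketch below illustrates that reading only; the function name, the two-level (minimum/target) scheme and all numbers are assumptions, not the actual ALICE configuration.

```python
def jobs_to_resubmit(waiting: int, min_waiting: int, target_waiting: int) -> int:
    """Return how many jobs to submit so the waiting queue reaches the target.

    Nothing is submitted while `waiting` is at or above `min_waiting`;
    below it, the queue is topped up to `target_waiting`.
    """
    if waiting < 0:
        raise ValueError("waiting must be non-negative")
    if min_waiting > target_waiting:
        raise ValueError("min_waiting must not exceed target_waiting")
    if waiting >= min_waiting:
        return 0
    return target_waiting - waiting


# Made-up numbers: trigger below 200 waiting jobs, refill to 500.
print(jobs_to_resubmit(waiting=120, min_waiting=200, target_waiting=500))
print(jobs_to_resubmit(waiting=250, min_waiting=200, target_waiting=500))
```

Using separate trigger and target levels avoids resubmitting a handful of jobs every time the count dips just under a single threshold.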

  16. Summary
      • MonALISA is a very promising tool for online experiment monitoring and for interfacing with a variety of slow control subsystems; GlueX is seriously considering ML for this task
      • Easy to configure, understand and use
      • Experience from Grid monitoring and more
      • Support from the developers' group for implementation of new modules/features
      • Online experiment monitoring tests at CLAS@JLab were recently carried out; a demo repository is at http://mlr1.gla.ac.uk:7002

  17. More examples / Extras

  18. Integrated Pie Charts

  19. History Plots, Annotations

  20. AliEn Services Monitoring
      • AliEn services
      • Periodically checked
      • PID check + SOAP call
      • Simple functional tests
      • SE space usage
      • Efficiency

  21. Job Network Traffic Monitoring
      • Based on the xrootd transfer from every job
      • Aggregated statistics for:
        • Sites (incoming, outgoing, site to site, internal)
        • Storage Elements (incoming, outgoing)
      • Of:
        • Read and written files
        • Transferred MB/s

  22. Individual Job Tracking
      • Based on AliEn shell commands
        • top, ps, spy, jobinfo, masterjob
      • Using the GUI ML Client
        • Status, resource usage, per job

  23. Head Node Monitoring • Machine parameters, real-time & history, load, memory & swap usage, processes, sockets

  24. MonALISA in AliEn
      • The MonALISA framework has been used as a primary monitoring tool for the ALICE Grid since 2004
      • Presently the system is used for monitoring all (identified) services, jobs and network parameters necessary for Grid operation and debugging
      • The number of concurrently monitored and stored parameters today is ~300,000 in 75 ML Services
      • The add-on tools for automatic event notification allow for a more efficient reaction to problems
      • The framework's design and flexibility answer all requirements for a monitoring system
      • The accumulated information makes it possible to construct and implement automated decision-making algorithms, thus further increasing the efficiency of Grid operations