
HP-UX Monitoring Standard


dvargo


Presentation Transcript


  1. HP-UX Monitoring Standard B&DO IES

  2. General
     • HP OpenView Operations 6.x (OVO 6)
     • Standard all over Europe
     • Trade and HP Account
     • Mainly based on WW standard tools (known as GII)
     • Divided into default and additional tools
     • Also available or in preparation for other UX platforms (Solaris, AIX, Linux, Tru64, …) → not yet ready and not part of this training
     B&DO IES Event Management

  3. Contents (Default Tools)
     Area                   Tool/Template
     • Processes            ps_mon
     • File Systems         df_mon
     • LVM                  vol_mon
     • Printing (lp only)   lp_mon
     • Kernel messages      dmsg_mon
     • syslog messages      syslog

  4. Contents (Additional Tools, Operating System)
     Area                   Tool/Template
     • serviceguard         sg_mon
     • NFS                  nfs_mon
     • swapspace            swap_mon
     • cron                 cronlog, cron_mon
     • mail                 mailqueue, mail.log
     • su                   sulog
     • login attempts       btmp
     • system startup       rc.log
     • disk arrays          disk_array
     • housekeeping         housekeep

  5. General mechanisms
     • In most cases: scripts/binaries triggered by OVO
     • Local configuration possible
     • A local configuration takes precedence over the default configuration
     • Other cases: default logfiles read by the OVO logfile encapsulator

  6. General mechanisms (cont’d)
     File locations:
     • executables             /var/opt/OV/bin/OpC/cmds or possibly /var/opt/OV/bin/OpC/monitor
     • logfiles                /var/opt/OV/log/OpC
     • default configuration   /var/opt/OV/bin/OpC/cmds
     • local configuration     /var/opt/OV/conf/OpC
     • temporary files         /var/opt/OV/tmp/OpC
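
The precedence of local over default configuration can be pictured with a minimal Python sketch. It assumes that a local file, when present, replaces the default file as a whole rather than being merged with it; the slides do not say which of the two the monitors actually do. The function name `resolve_config` is hypothetical, and only the two directory paths come from the slide above.

```python
import os

def resolve_config(tool,
                   local_dir="/var/opt/OV/conf/OpC",
                   default_dir="/var/opt/OV/bin/OpC/cmds"):
    """Return the config file path a monitor would read under the assumed rule."""
    local = os.path.join(local_dir, tool + ".cfg")
    default = os.path.join(default_dir, tool + ".cfg")
    # Assumption: the local file wins outright if it exists.
    return local if os.path.isfile(local) else default
```

For example, `resolve_config("df_mon")` yields the path under /var/opt/OV/conf/OpC only if a df_mon.cfg exists there.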

  7. General mechanisms (cont’d)
     In some cases (ps_mon, df_mon, swap_mon) the processing of a message can be configured. The message can be:
     • routed to the trouble-ticket interface (EWM)
     • sent to the notification interface (JET)
     • sent to both
     • kept in the browser only

  8. Process monitoring with ps_mon

  9. Process monitoring: ps_mon
     • Binary, triggered every 15 minutes by OVO
     • Checks the following:
       • process not running
       • too few instances of process running
       • too many instances of process running
       • CPU utilisation of process
       • size of process
     • Configuration required
       • default cfg-file: /var/opt/OV/bin/OpC/cmds/ps_mon.cfg
       • local cfg-file: /var/opt/OV/conf/OpC/ps_mon.cfg

  10. ps_mon: configuration
      #############################################################################
      # File: ps_mon.cfg
      # Description: The ps_mon Configuration file
      # Package : Concorde - CONC_UNIX
      # Version: A.01.02
      # Syntax:
      # <Process name> <Severity> <Instances> [<Group/Appl. Name> [<Schedule>]]
      # [; <Mode> [<Arg_string>; <Cmd_string>]]
      # [*PINFO <Max %CPU> <Max Size>]
      # [*ACTION <Command String>]
      #
      # <Schedule> = <start time>-<end time> <day of week>[,<day of week>]
      # [<Schedule>]
      # <Mode> = c|v|o|n (command | verbatim | option | none)
      #
      # Note 1: Some processes change names after they are invoked so be sure
      # to use the name as listed by "ps -ef" (on HPUX systems).
      #
      # Note 2: The "*PINFO" and "Mode" parameters are only available on HP-UX.
      #
      #############################################################################

  11. ps_mon: configuration (cont’d)
      #
      # Examples:
      #
      # Check if exactly one example process is running with cmd option.
      #
      # example warning 1; o cmd; cmd
      #
      # Check if exactly one example process is running with cmd1 cmd2 options
      # where cmd1 takes two args.
      #
      # example warning 1; o cmd1 arg arg cmd2; cmd1 cmd2
      #
      # or equivalently.
      #
      # example warning 1; o cmd1 arg arg cmd2; cmd2 cmd1
      # example warning 1; o cmd2 cmd1 arg arg; cmd1 cmd2
      # example warning 1; o cmd2 cmd1 arg arg; cmd2 cmd1
      #

  12. ps_mon: configuration (cont’d)
      # For backwards compatibility, if cmd options are prefixed by '-', i.e. "-D", or "-E"
      # then <Cmd_string> does not have to be specified. For example, the following two
      # statements are equivalent.
      #
      # (1) example warning 1; o -D arg -E
      # (2) example warning 1; o -D arg -E; -D -E
      #
      # Statement (1) and (2) are both legitimate; however, the syntactic form of statement
      # (1) is defunct. To ensure compatibility with future versions of ps_mon, please
      # follow the syntactic format of statement (2). Again, note that statements of the
      # form in (1) apply *only* when cmd options are prefixed by '-', whereas all
      # statements of the form in (2) will work.
      #
      # Check if netscape process size is more than 1000K. Do not care about the
      # actual number of processes.
      #
      # netscape warning -
      # *PINFO - 1000
      #
      # Check if at least one sendmail process is running. If not start it.
      #
      # sendmail major 1-
      # *ACTION /sbin/init.d/sendmail start
      #
      #############################################################################

  13. ps_mon: configuration (cont’d)
      Check that the process midaemon is running in exactly one instance, Monday to Friday from 06:00 to 22:00; if not, send a warning message with the object MWA:
          midaemon warning 1 MWA 0600-2200 1,2,3,4,5
      Check that at least one instance of the process sendmail is running; if not, restart it:
          sendmail major 1-
          *ACTION /sbin/init.d/sendmail start
      Caution: a message is raised saying that the process is not running, but the message does not show that the process was restarted. To verify the restart, check the ps_mon logfile manually (/var/opt/OV/log/OpC/ps_mon.log).
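
The `<Instances>` field in the examples above appears in three forms: `1` (exactly one), `1-` (at least one) and `-` (number of instances does not matter). The following Python sketch shows how such a field could be compared with a process count. It handles only these three forms; how ps_mon expresses an upper limit is not shown in the slides, so none is guessed here. The function name `instances_ok` is hypothetical.

```python
def instances_ok(spec, running):
    """Compare a running-process count with an <Instances> field.

    Handles only the forms shown in the slides:
      "N"   exactly N instances
      "N-"  at least N instances
      "-"   any number of instances
    """
    if spec == "-":
        return True
    if spec.endswith("-"):
        return running >= int(spec[:-1])
    return running == int(spec)
```

With the sendmail entry (`1-`), a count of 0 fails the check, which is the condition under which the `*ACTION` line would apply.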

  14. ps_mon: configuration (cont’d)
      Special feature: processing of the message
      • Adding a special prefix to the <group> parameter causes special treatment of the message:
        • TT_<group> → message goes to the trouble-ticket interface (EWM)
        • N_<group> → message is sent to the notification interface (JET)
        • TN_<group> or NT_<group> → message is sent to both (EWM & JET)
      Parameters of messages:
      • Application: HPUX_ps_mon
      • Message Group: Job
      • Object: <group> (the parameter specified in the cfg-file is used)
      Make sure that appropriate mappings are set up in EWM or JET (or both)!
      IMPORTANT: The default configuration file contains NO entries!
      Current version of ps_mon: 1.5.1.9
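
The prefix convention can be summarised in a short Python sketch that maps a `<group>` value to the interfaces it addresses in addition to the browser. This only illustrates the naming rule from the slide; the real routing is done by the OVO templates and the EWM/JET mappings, and the function name `extra_interfaces` is hypothetical.

```python
def extra_interfaces(group):
    """Return the interfaces selected by the prefix of a <group> value."""
    routes = (
        ("TT_", {"EWM"}),          # trouble ticket
        ("TN_", {"EWM", "JET"}),   # both
        ("NT_", {"EWM", "JET"}),   # both
        ("N_", {"JET"}),           # notification
    )
    for prefix, targets in routes:
        if group.startswith(prefix):
            return set(targets)
    # No recognised prefix: the message stays in the browser only.
    return set()
```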

  15. ps_mon: configuration testing
      ps_mon [-s <sleep interval>] [-f <configurationfile>] [-l <logfile>] [-d [<debugfile>]] [-w <waittime>] [-t] [-g] [-q <quiet period>]
      -s <sleep interval>      run program in daemon mode with time in minutes between checks. To specify time in seconds append 's' (e.g. -s 10s)
      -f <configurationfile>   optional path to the configuration file. If omitted, the default configuration file is used (/var/opt/OV/bin/OpC/cmds/ps_mon.cfg)
      -l <logfile>             optional path to the logfile. If omitted, the default logfile is used (/var/opt/OV/log/OpC/ps_mon.log)
      -d [<debugfile>]         run in debug mode and optionally write debug messages to a file. If <debugfile> does not exist, output goes to stderr.
      -w <waittime>            minutes to wait at startup before beginning to monitor
      -t                       prefix event messages with a timestamp
      -g                       group event messages. Messages with the same group and event condition are combined into one message
      -q <quiet period>        if a process is not running and an autoaction is specified, the autoaction is given a chance to run. <quiet period> is the number of seconds (5-120) to wait for the autoaction to finish successfully before an error is reported
      IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.

  16. Diskspace monitoring with df_mon

  17. Diskspace monitoring: df_mon
      • Shell script, triggered every 15 minutes by OVO
      • Checks the following:
        • percent of disk space utilization
        • percent of inode utilization
      • Configuration required
        • default cfg-file: /var/opt/OV/bin/OpC/cmds/df_mon.cfg
        • local cfg-file: /var/opt/OV/conf/OpC/df_mon.cfg

  18. df_mon: configuration
      #############################################################################
      # File: df_mon.cfg
      # Description: The Diskspace Monitor Configuration file
      # Package : Concorde - UXSM
      # @(#) $Header: /Concorde/uxmon/df_mon.cfg 1.4 2002/02/09 00:22:06 skwok Exp $
      # Version: A.01.00
      #
      # <Filesystem> exclude
      # <Filesystem> <space> [<inode> [<Severity> [<Schedule>]]] [ ; <group>]
      # [*ACTION <action>]
      #
      # <space> = space utilization threshold in percents (decimal value)
      # <inode> = inode utilization threshold in percents (decimal value)
      # Note: Monitoring of INODES is not supported on SunOS
      # <Severity> = warning|minor|major|critical
      # <Schedule> = hhmm-hhmm [<Daylist> [<Schedule>]]
      # <Daylist> = n[,<Daylist>] | *
      # where n represents day of a week starting with
      # Sunday=0 and Saturday=6;
      # * means all days
      # <group> = group name to associate with event - note the space
      # before the ";" that must be there to separate it from
      # the preceding fields Ex: "* 95 95 warning ; support_grp"
      #
      # <action> = action(program) to be called af event occurs

  19. df_mon: configuration (cont’d)
      #
      # Use '-' in place of <space> or <inode> to skip threshold parameter
      # If parameter is not specified the checking of that value is skipped.
      #
      # Use '*' in place of <Filesystem> to specify ALL filesystems
      #
      #############################################################################
      /var 80 99 major
      /var 97 99 critical
      /usr 80 99 warning
      /tmp 85 99 warning
      /tmp 95 99 critical
      * 95 95 warning
      * 99 99 critical
      #############################################################################
      # end of df_mon.cfg
      #############################################################################
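
To see how the sample entries could translate into a severity for a given filesystem, here is a rough Python sketch that evaluates the space thresholds only. It rests on three assumptions that the slides do not confirm: entries naming a filesystem take precedence over the `*` entries, a threshold counts as reached when usage is greater than or equal to it, and the entry with the highest reached threshold determines the severity. It also skips any line that does not carry all four leading fields. The names `parse_entries` and `space_severity` are hypothetical.

```python
SAMPLE_CFG = """\
/var 80 99 major
/var 97 99 critical
/usr 80 99 warning
/tmp 85 99 warning
/tmp 95 99 critical
* 95 95 warning
* 99 99 critical
"""

def parse_entries(text):
    """Return (filesystem, space_threshold, severity) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split(";")[0].strip()  # drop an optional " ; <group>" part
        if not line or line.startswith("#") or line.startswith("*ACTION"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # sketch handles only fully specified lines
        filesystem, space, _inode, severity = fields[:4]
        if space == "-":
            continue  # '-' means: no space threshold
        entries.append((filesystem, float(space), severity))
    return entries

def space_severity(entries, filesystem, used_percent):
    """Return the severity for a usage value, or None if no threshold is reached."""
    specific = [e for e in entries if e[0] == filesystem]
    # Assumption: specific entries override the '*' entries.
    candidates = specific or [e for e in entries if e[0] == "*"]
    reached = [e for e in candidates if used_percent >= e[1]]
    if not reached:
        return None
    return max(reached, key=lambda e: e[1])[2]
```

Under these assumptions, /var at 85% maps to major, while a filesystem without its own entry falls back to the `*` lines.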

  20. df_mon: configuration (cont’d)
      Special feature: processing of the message
      • Adding a special prefix to the <group> parameter causes special treatment of the message:
        • TT_<group> → message goes to the trouble-ticket interface (EWM)
        • N_<group> → message is sent to the notification interface (JET)
        • TN_<group> or NT_<group> → message is sent to both (EWM & JET)
      Parameters of messages:
      • Application: HPUX_df_mon
      • Message Group: OS
      • Object: <group> (the parameter specified in the cfg-file is used)
      Make sure that appropriate mappings are set up in EWM or JET (or both)!
      Current script version: 1.22

  21. df_mon: configuration testing (usage)
      df_mon.sh [-t] [-a] [-f <config file>] [-l <logfile>]
      -t                  add a timestamp at the beginning of each message
      -a                  append the actual value to the event message
      -f <config file>    use <config file> as configuration file instead of the default (/var/opt/OV/bin/OpC/cmds/df_mon.cfg)
      -l <logfile>        write event messages to <logfile> instead of standard output
      IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.

  22. Volume monitoring with vol_mon

  23. HP-UX volume monitor: vol_mon.sh
      • Triggered every 15 minutes by OVO
      • Checks:
        • logical volumes for stale extents
        • volume group status
        • mount information (by default, whether all volumes specified in /etc/fstab are mounted)
      • Local configuration possible

  24. vol_mon: configuration
      #############################################################################
      # File: vol_mon.cfg
      # Description: The Volume Monitor Configuration file
      # Package : Concorde - UXSM
      # @(#) $Header: /Concorde/uxmon/vol_mon.cfg 1.2 2002/09/16 22:13:47 skwok Exp $
      # Version: A.01.00
      #
      # <Filesystem>
      #
      # This configuration file for volume monitor is optional and only useful for
      # specifying file systems which should be mounted but don't appear in the
      # file /etc/fstab (e.g. ServiceGuard file systems). The configuration file
      # consists of one-line entries each specifying a separate filesystem.
      #
      # Blank lines and comment lines beginning with a "#" are ignored. Also,
      # extra fields after the filesystem entry on the same line are ignored.
      # This is useful for specifying the disk space monitoring configuration file
      # (df_mon.cfg) as the configuration file for volume monitor.
      #
      #############################################################################
      #############################################################################
      # end of vol_mon.cfg
      #############################################################################

  25. vol_mon: configuration testing
      vol_mon.sh [-v] [-c <configfile>] [-l <logfile>]
      -c <configfile>   use <configfile> as configuration file
      -l <logfile>      write event messages to <logfile> instead of stdout
      -v                verbose mode
      IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.

  26. lp print monitoring with lp_mon

  27. HP-UX lp print monitor: lp_mon.sh
      • Triggered every 15 minutes by OVO
      • Needs the binary lpinfo
      • Checks:
        • printer queue length
        • time of print request in queue
        • active time of print request
        • status of spooler and printer
      • Local configuration possible and required
        • default cfg-file: /var/opt/OV/bin/OpC/cmds/lp_mon.cfg
        • local cfg-file: /var/opt/OV/conf/OpC/lp_mon.cfg

  28. lp_mon: configuration
      ####################################################################################
      # File: lp_mon.cfg
      # Description: The Lp Monitor Configuration File
      # Package: Concorde - UXSM
      # Version: A.01.00
      # Syntax:
      # lpsched_check=YES|NO|AUTO [lpsched_options=<lpsched_options>][;<time_schedule>]
      # exclude <printer>[,<printer>...]
      # <printer> [queue_length=<#requests>] [request_age=<#days>]\
      # [active_time=<#min>] [disable_check=YES|NO|AUTO]\
      # [reject_check=YES|NO|AUTO] [phantom_check=YES|NO|AUTO]\
      # [;<time_schedule>]
      #
      # - queue_length - max number of pending requests
      # - request_age - max age of print request
      # - active_time - max active time of print request
      # - disable_check - check if printer is disabled
      # - reject_check - check if printer is rejecting print request
      # - phantom_check - look for phantom print request
      ####################################################################################
      lpsched_check=YES
      #* queue_length=50 request_age=7 active_time=10\
      #disable_check=YES reject_check=YES phantom_check=AUTO
      ####################################################################################
      # end of lp_mon.cfg
      ####################################################################################

  29. lp_mon: configuration testing
      lp_mon.sh [-f <configfile>] [-l <logfile>]
      -f <configfile>   use <configfile> instead of the default (/var/opt/OV/bin/OpC/cmds/lp_mon.cfg)
      -l <logfile>      write event messages to <logfile> instead of stdout
      IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.

  30. Monitoring of kernel messages with dmsg_mon

  31. HP-UX kernel messages monitor: dmsg_mon.sh
      • Triggered every 5 minutes by OVO
      • Checks the output of "dmesg -"
      • Local configuration for unwanted messages
      • Requires an ongoing template review process with each OS patch applied
      • Local configuration possible
      • Global configuration via template

  32. dmsg_mon.sh: configuration
      ###############################################################################
      #
      # File: dmesg_mon.cfg
      # Description: strings listed here don't generate an ITO message for dmesg
      # Syntax: just list the strings, one line for each
      # !!! all dmesg lines matching one of the listed strings
      # are taken out of monitoring !!!
      #
      # Example:
      #
      # hardware path
      #
      # If the string "hardware path" is listed, all dmesg lines matching (containing)
      # the string "hardware path" are ignored for monitoring purposes.
      # Still, the dmesg history contains these lines, but no message is generated.
      #
      ###############################################################################
      ###############################################################################
      # End of dmesg_mon.cfg
      ###############################################################################
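
The suppression rule described in the file header (a dmesg line is dropped from monitoring if it contains any listed string) can be sketched in Python as follows. This illustrates the matching rule only and is not the logic of dmsg_mon.sh itself. It assumes that lines starting with `#` in the cfg-file are comments, and the function names and the dmesg lines in the usage example are invented.

```python
def load_suppress_strings(cfg_text):
    """Collect the non-empty, non-comment lines of a dmesg_mon.cfg-style file."""
    strings = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            strings.append(stripped)
    return strings

def filter_dmesg(lines, suppress):
    """Keep only the lines that contain none of the suppress strings."""
    return [ln for ln in lines if not any(s in ln for s in suppress)]
```

```python
# Invented sample input, for illustration only.
suppress = load_suppress_strings("# comment\nhardware path\n")
kept = filter_dmesg(["disk at hardware path 0/0/1", "vxfs: file system full"], suppress)
```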

  33. dmsg_mon.sh: configuration testing
      Basically no "real" testing is possible: every run of "dmesg -" sets a new pointer into the dmesg output.
      Check the files:
      /var/opt/OV/log/OpC/dmsg_mon.hist   history of ALL output
      /var/opt/OV/log/OpC/dmsg_mon.log    logfile read by ITO
      /var/opt/OV/log/OpC/dmsg_mon.tmp    normally empty

  34. dmsg_mon.sh: ongoing maintenance
      • Every new OS release, HW patch etc. causes new output
      • New messages come with the prefix "DMESG-UNCLASSIFIED:"
      • Regular reports are cross-checked with UX-PE to classify these messages (match or suppress)

  35. Monitoring of kernel tables with kts_mon

  36. HP-UX kernel tables monitor: kts_mon.sh
      • Triggered every 15 minutes by OVO
      • Checks:
        • nproc over threshold
        • ninode over threshold
        • nfile over threshold
      • Local configuration possible (thresholds in percent)

  37. kts_mon: configuration
      #############################################################################
      # File: kts_mon.cfg
      # Description: nfile, nproc, ninode Monitor Configuration file
      # Package : Concorde - UXSM
      # Version: A.01.00
      #
      # <PARAMETER> <space>
      #
      # <PARAMETER> THRESH_NP nproc
      # THRESH_NI ninode
      # THRESH_NF nfile
      # <space> space utilization threshold in percents (decimal value)
      #
      #############################################################################
      THRESH_NP=70
      THRESH_NI=101
      THRESH_NF=70
      #############################################################################
      # end of kts_mon.cfg
      #############################################################################
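
As a rough illustration of percentage thresholds on kernel tables, the Python sketch below reads `KEY=VALUE` lines like those in kts_mon.cfg and compares a table's usage with its threshold. It assumes that "over threshold" means strictly greater than; under that reading a value of 101, as set for THRESH_NI, can never be exceeded, which would effectively switch the ninode check off. How kts_mon.sh obtains the used and maximum values is not covered by the slides, so they are plain parameters here, and both function names are hypothetical.

```python
def parse_thresholds(text):
    """Read KEY=VALUE lines, ignoring blank lines and '#' comments."""
    thresholds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        thresholds[key.strip()] = int(value)
    return thresholds

def over_threshold(used, limit, threshold_percent):
    """True if table usage in percent is strictly above the threshold."""
    return 100.0 * used / limit > threshold_percent
```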

  38. kts_mon: configuration testing
      kts_mon.sh [ -f <config-file> ]
      • The configuration file must be executable for root (at least)
      • Running kts_mon.sh ALWAYS writes to the default logfile!

  39. Monitoring of syslog

  40. HP-UX syslog monitoring
      • Encapsulates /var/adm/syslog/syslog.log
      • Polling interval 30s
      • No local configuration
      • Enhancement requests (unwanted messages) go to TEG monitoring

  41. Optional monitors

  42. Optional monitors
      This chapter includes all standard monitoring solutions that are not applicable to all systems and are therefore not part of the default monitoring:
      • serviceguard
      • swapspace
      • security
      • cron
      • mail
      • system startup
      • NFS
      • disk array
      • housekeeping

  43. Monitoring of service guard

  44. HP-UX service guard monitor: sg_mon.ksh
      • Triggered every 15 minutes by OVO
      • Uses the output of cmviewcl
      • Checks:
        • package switching enabled?
        • package running (where / at all)?
        • nodes active?
        • network available?
      • Local configuration possible and mandatory (default configuration file is empty)
      • cc_mon may use the same logfile if configured

  45. sg_mon: configuration
      #############################################################################
      # File: sg_mon.cfg
      # Description: Check service guard Package monitoring script
      # Package : Concorde - UXSM
      # Version: A.01.00
      #
      # Description of parameters
      # -------------------------------------------------------------------
      # PKG[0]=xxx Package name 1
      # PKG_NODE[0]=yyy Primary node on which the pkg must run
      # PKG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
      # Set to 0 if Package_switching must not be ENABLED
      #
      #############################################################################
      #PKG[0]=xxx; PKG_NODE[0]=yyy; PKG_SWTCH[0]=0
      #PKG[1]=zzz; PKG_NODE[1]=xyz; PKG_SWTCH[1]=1
      #############################################################################
      # end of sg_mon.cfg
      #############################################################################
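
The indexed `PKG[n]`, `PKG_NODE[n]` and `PKG_SWTCH[n]` assignments form one record per package. The Python sketch below groups them by index, which can be handy for reviewing a local sg_mon.cfg before deploying it. It is only an illustrative reader: sg_mon.ksh presumably evaluates the file as shell array assignments, and the function name `parse_packages` is hypothetical.

```python
import re

# Matches e.g. "PKG_NODE[0]=yyy"; values end at ';' or whitespace.
ASSIGNMENT = re.compile(r"(PKG|PKG_NODE|PKG_SWTCH)\[(\d+)\]=([^;\s]+)")

def parse_packages(text):
    """Group sg_mon.cfg assignments into {index: {name: value}}."""
    packages = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        for name, index, value in ASSIGNMENT.findall(line):
            packages.setdefault(int(index), {})[name] = value
    return packages
```

Applied to the two sample lines with the leading `#` removed, this yields one record for package xxx on node yyy and one for package zzz on node xyz.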

  46. Monitoring of swap space

  47. HP-UX swapspace monitor: swap_mon.sh
      • Triggered every 15 minutes by OVO
      • Checks the total swapspace used
      • Different severities for different usage levels possible
      • Local configuration possible and mandatory (default configuration 90%)
      • Configuration of message processing possible
      • /var/opt/OV/bin/OpC/cmds/swap_mon.sh

  48. swap_mon: configuration
      ########################################################################
      # Config file for swap_mon.sh
      ########################################################################
      # All lines which start with a hash-sign (#) will be ignored
      # the whole config file is case-insensitive
      #
      # total <percent_used> <severity> <alert> [<from-to> [<days>]]
      #
      # Every config line must start with "total" because every check is
      # performed on the totally free space
      # The percent used must be between 0 and 100.
      # Possible severities are: warning, major, critical
      # possible alert types (processing of the message):
      # B -> Browser
      # N -> Browser+Notification
      # T -> Browser+Trouble Ticket
      # NT -> Browser+Notification+Trouble Ticket
      # <from-to>: 24h-format, 0000-2400 as default (if nothing else configured)
      # <days>: 0=Sunday, 6=Saturday. values to be separated by "," or "-"
      # (e.g. "1,3,4" -> Monday, Wednesday, Thursday) or ("2-4" -> from Tuesday until Thursday).
      ################################################################################
      # Example configuration
      ###############################################################################
      #total percent_used severity Alert FROM-TO Days
      total 90 major T 0000-2400 *
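
A swap_mon config line can be taken apart as in the Python sketch below, including the expansion of the `<days>` field. Two points are assumptions: `*` is read as "all days" (it is used in the sample line but not defined in the file header), and an omitted `<days>` field is also treated as all days. The function names `expand_days` and `parse_swap_line` are hypothetical.

```python
def expand_days(spec):
    """Expand a <days> field into day numbers (0=Sunday .. 6=Saturday)."""
    if spec == "*":
        return set(range(7))  # assumption: '*' means every day
    days = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            days.update(range(int(first), int(last) + 1))
        else:
            days.add(int(part))
    return days

def parse_swap_line(line):
    """Parse 'total <percent_used> <severity> <alert> [<from-to> [<days>]]'."""
    fields = line.lower().split()  # the config file is case-insensitive
    if not fields or fields[0].startswith("#") or fields[0] != "total":
        return None
    return {
        "percent_used": int(fields[1]),
        "severity": fields[2],
        "alert": fields[3].upper(),
        "window": fields[4] if len(fields) > 4 else "0000-2400",
        # Assumption: a missing <days> field means all days.
        "days": expand_days(fields[5] if len(fields) > 5 else "*"),
    }
```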

  49. swap_mon: configuration (cont’d)
      Special feature: processing of messages can be configured
      Alert type:
      • B    message appears only in the browser
      • N    notification using JET
      • T    trouble ticket is created in EWM
      • TN   JET & EWM
      Appropriate mapping in EWM and/or JET required.
      Parameters:
      • Application   HPUX_swap
      • Msg-Group     OS
      • Object        swap

  50. Security (bad login attempts, sulog)
