Advanced Server Monitoring and Alert Notifications


Presentation Transcript


  1. Advanced Server Monitoring and Alert Notifications Andy Pedisich Technotics

  2. Your Presenter • One half of a pair of hard-working IBM® Notes® Administrators/Developers who have worked with IBM® Notes® and IBM Domino® since version 2.1 • From Technotics, Inc. in Philadelphia, Pennsylvania, USA • Andy Pedisich: 28 years in IT, 19 years with Lotus Notes • Rob Axelrod: 23 years in IT, 19 years with Lotus Notes

  3. What We’ll Cover … • Setting up the foundation for guarding your domain • Working with event generators and event handlers • Selecting a notification method • Customizing recommended actions in Domino Domain Monitoring • Tracking problem servers • Finding and tracking events that show on the console, but not in the log • Using LotusScript to access server statistics • Wrap-up

  5. Requirements for Efficient and Accurate Statistics Collection • Two things are required for statistics collection: • The Collect task must be running on every server that is designated to collect statistics • Not all servers should run the Collect task, only those designated as collecting servers • The EVENTS4 Monitoring Configuration database must have at least one Statistics Collection document • The collection interval should be at least one hour

  6. There Is a Special Replica ID for Your EVENTS4.NSF • The replica ID of system databases, such as EVENTS4, is derived from the replica ID of the Domino directory

     Database       Replica ID
     NAMES.NSF      852564AC:004EBCCF
     CATALOG.NSF    852564AC:014EBCCF
     EVENTS4.NSF    852564AC:024EBCCF
     ADMIN4.NSF     852564AC:034EBCCF

     • Notice that the first two digits after the colon for the EVENTS4.NSF replica are 02 • Make sure EVENTS4.NSF has the same replica ID on every server • One way to check is to open the copy from every server and put it on your desktop • There’s some code on the next slide to help you do that

  7. Add a Button to Your Toolbar

     REM {Use the Domino directory on the current mail server};
     _names := @Subset(@MailDbName; 1) : "names.nsf";
     REM {Let the admin pick one or more servers from the Servers view};
     _servers := @PickList([Custom]; _names; "Servers"; "Select servers"; "Select servers to add database from"; 3);
     _db := @Prompt([OkCancelEdit]; "Enter database"; "Enter the file name and path of the database to add."; "log.nsf");
     REM {Add the database icon from each selected server to the desktop};
     @For(n := 1; n <= @Elements(_servers); n := n + 1;
       @Command([AddDatabase]; _servers[n] : _db)
     )

     • Add this code to a button on your toolbar • This is courtesy of Thomas Bahn • He’s a smart guy, nice guy, and sometimes brings chocolates to his friends from Europe • www.assono.de/blog

  8. Add a Database Icon from All Servers to the Desktop • This code will prompt you to pick the servers that have the database you want on your desktop • Then it will prompt for the name of the database • And open it on all the servers you’ve selected • Use it to make sure all the EVENTS4.NSF are the same replica in your domain
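
The replica ID rule from the table can also be checked mechanically once you have collected the replica IDs by hand. The following is a minimal Python sketch, not a Domino API: the prefix table is copied from the slide, while the function names and the server names in the usage note are hypothetical.

```python
# Two-digit prefixes taken from the replica ID table on the slide.
SYSTEM_DB_PREFIX = {
    "NAMES.NSF": "00",
    "CATALOG.NSF": "01",
    "EVENTS4.NSF": "02",
    "ADMIN4.NSF": "03",
}


def expected_replica_id(directory_replica_id: str, db_name: str) -> str:
    """Derive a system database's replica ID from the Domino directory's ID."""
    head, tail = directory_replica_id.upper().split(":")
    prefix = SYSTEM_DB_PREFIX[db_name.upper()]
    # Replace the first two digits after the colon, keep the rest.
    return f"{head}:{prefix}{tail[2:]}"


def mismatched_servers(directory_replica_id, db_name, replica_ids_by_server):
    """Return the servers whose copy of db_name has an unexpected replica ID."""
    expected = expected_replica_id(directory_replica_id, db_name)
    return sorted(
        server
        for server, replica_id in replica_ids_by_server.items()
        if replica_id.upper() != expected
    )
```

For example, calling `mismatched_servers("852564AC:004EBCCF", "events4.nsf", {...})` with a made-up mapping of server names to the replica IDs you read off each copy returns the servers that need attention.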

  10. Event Monitoring Details • Enough setting up already! • Event monitors of all types are set in the EVENTS4 database • Two broad categories of events: • Event handlers • Specify the action that Domino takes when a specific event occurs • Event generators • Each type of event generator has a view that provides a list of all event generators, plus additional configuration information

  11. Event Generators • Event generators deal with specific Notes/Domino issues • There are six types of event generators: • Database Event Generator • Domino Server Response Event Generator • Mail-Routing Event Generator • Statistic Event Generator • Task Status Event Generator • TCP Server Event Generator • Some are used more than others • We’ll stick to the more popular ones that every administrator should use, for starters

  12. Here’s One That Everyone Should Use • The ACL of Names.nsf should rarely change, so monitor it! • Alarms should go off if it changes • Select Names.nsf • Choose either a single server or all servers in the domain • I like to pick all servers in the domain • Admins won’t get away with anything! • But I do get a storm of messages when an ACL change occurs • Every server tells me about the change

  13. Unused Space Event Generator • This is an example of the Events system actually doing something automatically when a certain condition exists • It’s questionable – it is going to execute the Compact task immediately upon detection of free space threshold being exceeded • I could see this event being used on archive servers • And I wish there was a way to run it during specific hours

  14. Domino Server Response Generator • One server checks others by sending a probe • It’s a good idea to try opening Names.nsf • If you can’t open Names.nsf, then something is wrong! • Default is every three minutes • Default response time tolerance is 1,000 Msecs (one second) • Your settings will depend on your own environment

  15. More About Probes • The response time is a bit on the harsh side • If you leave it at 1,000 Msecs (one second), you will receive a lot of notifications • You should make it ten seconds, or whatever the metrics in your Service Level Agreement (SLA) require • Also, be careful what servers you choose to probe other servers • Try to pick probing servers that are in the same LAN as the probed servers • Otherwise, your probing will actually be testing network latency, rather than the servers themselves • I have used these probes as a method of testing exactly that • Network latency
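
To see why the one-second default is harsh, it can help to count how many probe results would trip each tolerance. This is a small Python sketch with made-up response times; the function name and the sample values are assumptions for illustration, not output from a real probe.

```python
def slow_probes(response_times_ms, tolerance_ms):
    """Return the probe response times that exceed the tolerance."""
    return [t for t in response_times_ms if t > tolerance_ms]


# Hypothetical response times in milliseconds from one probing server.
samples = [420, 1350, 980, 2100, 11500, 760]

alerts_at_default = slow_probes(samples, 1_000)    # one-second default
alerts_at_relaxed = slow_probes(samples, 10_000)   # ten-second tolerance
```

With these sample values, the default tolerance flags three of the six probes, while the ten-second tolerance flags only the one that is genuinely slow.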

  16. Statistic Event Generators • Statistic Event Generators monitor a specific Domino or platform statistic • They can let you know when a stat goes over a particular threshold • These stat event generators are extremely valuable • Smart administrators use them every day!

  17. Complete Listing of All Statistics Is in EVENTS4.NSF • The Monitoring Configuration database (EVENTS4.NSF) supplies documents detailing thresholds for each statistic • 1,193 statistic documents available • The complete listing is in the view Statistics by Name • But only 166 of them are considered useful for setting thresholds and are found in the default statistics view • The default statistics thresholds view only shows documents where the field “useful” is equal to the word “Yes”

  18. Finding the “Not Useful” Stats • You might find that a statistic you need has been marked as not useful • To see which are marked as not useful, full text index the EVENTS4.NSF • Create an advanced query checking the field useful = “No” • You might discover a statistic whose threshold would be just right to use

  19. Why Are Most Stats Considered “Not Useful” for Thresholds? • One setting (the “useful” field checked by the advanced query) controls whether a statistic will appear in the drop-down list when you’re setting up an event generator • Note that there are no Agent statistics in this list

  20. Why No Agent Stats • It’s not that the Agent stats aren’t useful • They might not be valuable for threshold tracking • In some releases, Agent.Hourly.UsedRunTime has a data type of text • We can’t set a threshold with text values
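
The point about text data types can be illustrated outside Domino. The Python sketch below is not how the Events system works internally; it only shows, under the assumption that stat values arrive as strings, why a numeric threshold is meaningless for a text value. The function names and the sample text value are hypothetical.

```python
from typing import Optional


def parse_stat(value: str) -> Optional[float]:
    """Return the stat as a number, or None when it is really text."""
    try:
        # Domino prints large numbers with thousands separators.
        return float(value.replace(",", ""))
    except ValueError:
        return None


def exceeds_threshold(value: str, threshold: float) -> Optional[bool]:
    """Compare a stat to a threshold; None means no comparison is possible."""
    number = parse_stat(value)
    if number is None:
        return None
    return number > threshold
```

A numeric string such as `"71,847,784,448"` compares cleanly, while a text value (for example a made-up `"12 min"`) returns `None`, which is the situation the slide describes.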

  21. We Do Have a Nice Way of Seeing That Stat, Though • Technotics has created a super-customized version of the Monitoring Results database, STATREP.NSF • Technotics R8.5.3 statrep • It’s the stock statrep with added views • One of these valuable views is the Agent Stats view • You can download this from: • www.andypedisich.com • Look for the Admin2013 link

  22. Show Me the Stats • When you issue a SHOW STAT command at the console, Domino dumps every statistic it is tracking • Every one of these statistics is in every single one of the documents in the STATREP.NSF database • All you need is a view to see them

  23. Static Statistics Are Not Useful for Thresholds • Statistics that don’t change usually represent the operating environment of the server • Server.Version.Notes = Release 8.5.3 • Server.Version.OS = Windows NT 5.0 • Server.CPU.Type = Intel Pentium • Disk.D.Size = 71,847,784,448 • Mem.PhysicalRAM = 527,433,728 • Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter • Think these stats aren’t helpful? They are! • You can take a pretty detailed worldwide server inventory • Just by looking at the fields in STATREP.NSF
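
As a rough illustration of the inventory idea, suppose the statistics documents have been exported from STATREP.NSF into plain dictionaries. The Python sketch below assumes that export step has already happened; the `Server` key and the function name are hypothetical, and only the stat names come from the slide.

```python
# Static stats worth keeping in an inventory (names from the slide).
INVENTORY_STATS = (
    "Server.Version.Notes",
    "Server.Version.OS",
    "Server.CPU.Type",
    "Mem.PhysicalRAM",
)


def build_inventory(stat_docs):
    """Map each server name to its static stats.

    stat_docs is a list of dicts, one per statistics report, assumed to be
    in chronological order so that the latest report for a server wins.
    """
    inventory = {}
    for doc in stat_docs:
        inventory[doc["Server"]] = {
            name: doc.get(name, "") for name in INVENTORY_STATS
        }
    return inventory
```

Stats missing from a report come back as empty strings, so gaps in the inventory are easy to spot.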

  24. Wizard Lets You Choose the Method of Handling the Event • There are lots of methods of event handling • Which one you choose depends a lot on your infrastructure • We’re going to talk more about the notification methods in the next section of the presentation • For now, just remember that an event generator is fairly worthless by itself • Unless you have an effective event handler that tells you, in its own way, what is going on with your servers

  25. Event Handlers Are an Exquisite Gift • They can give you a heads-up about issues provided by event generators • They also give you a free-form way of being alerted of anything that happens in the Domino server log and most of what happens on the Domino server console • You can use event handlers to respond to generators and certain add-in tasks • They are most valuable for picking out text on the console that will mean trouble if ignored • We’re going to focus on this type of event handling, since it is less intuitive than responding to generators or add-ins

  26. Basics of the Event Handler Configuration • 3 screens to deal with • Decide whether you want to track an event on just a few servers or all servers • You might want to track a particular event on mail servers only • Decide what triggers a notification • We’re going for free-form, so we will select “any event that matches a criteria”

  27. Second Set of Choices for Event Handling • When working with console events, select: • “Events can be of any type” • “Events can be of any severity” • Then look for a particular string of text in the event message • This can be absolutely any text that appears on the console • We will explain why we are picking the text “full administrator access” in a moment

  28. Final Set-Up Tab for Event Handling • Define action to occur when the text appears • We’ve selected email notification • But there are over a dozen others that we will discuss in a few moments • Note: You can control the time of day the event handler is on the job • I wish they did that for event generators

  29. Why Did We Monitor the Text “Full Administrator Access”? • Full Access Administrator (FAA) is the highest level of administrative access to the server • Manager access with all access privileges enabled to all databases on the server, regardless of the ACL settings or reader names settings • Access to any unencrypted data on the server • Your security model should make FAA almost unnecessary • When FAA is turned on, you want to know about it to prevent some hooligan from doing shenanigans

  30. Other Words You Should Track with Event Handlers

      01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
      01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab

      • “Deleted by” • This generally means someone has deleted a database • Usually their mail file if they have manager access • You’ll be getting out the back-up tapes in a minute
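
The free-form matching an event handler does can be pictured as a simple substring search over console lines. The Python sketch below is only an analogy, not Domino's implementation: the case-insensitive comparison is an assumption, the function name is hypothetical, and the sample lines are the two console lines shown on the slide.

```python
# Phrases discussed in this section of the deck.
WATCH_PHRASES = ("full administrator access", "deleted by")

SAMPLE_CONSOLE = [
    "01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab",
    "01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab",
]


def match_console_lines(lines, phrases=WATCH_PHRASES):
    """Return (phrase, line) pairs for every watched phrase found in a line."""
    hits = []
    for line in lines:
        lowered = line.lower()  # assumption: matching ignores case
        for phrase in phrases:
            if phrase in lowered:
                hits.append((phrase, line))
    return hits
```

Run against the two sample lines, only the second one matches, on the phrase "deleted by"; each hit would correspond to one notification from the handler.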

  31. Other Bad Words to Watch for • Here are some other words and expressions to watch for: • “Extremely Inefficient”

  33. We’re Circling Back to Notification Methods • Here is the panoply of notification methods • The most widely used notification method is to send an email to an admin group when a problem occurs • And yet, that is also very risky, since the email system itself might be the problem

  34. Paging Dr. Howard, Dr. Fine, Dr. Howard … • 14 ways to be notified – these 2 are the most widely used • But not necessarily the best to use • Paging notification is a good choice, but not if you are paging through a third-party phone system, like Verizon or AT&T • They generally require an email to be sent • They have no Service Level Agreement – NONE! • Sadly, due to budget and resource constraints, we generally see these two mail or paging methods used the most in production environments

  35. The Most Important Notification Options These two are the best, and there’s one more that’s not listed

  36. Customized Tivoli Package • In one case, I developed a custom monitoring solution that fed trouble tickets into a version of the Tivoli Event Console that was not supported by the Domino Tivoli event handler system • When you have to deal with extreme monitoring capability with high reliability, you sometimes need to get in deep • This is very effective because it uses that postemsg.exe executable on the OS level to send the message to the TEC • Note that the message is carefully crafted to form a large command string which sends the ticket to Tivoli • Check with your Tivoli team to see if you can take advantage of this method

  37. Customized Tivoli Package (cont.)

      ' Build the postemsg command line in pieces. Braces delimit LotusScript
      ' string literals, so the embedded double quotes need no escaping.
      vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
      vMess2 = {hostname="} + vReportServerName + {" }
      vMess3 = {sub_source="MESSAGINGLOTUS" Mynotify_supportfilter="1" MyNotify_severity="2" }
      vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
      vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
      vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
      vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
      ' Run the command at the OS level; window style 6 starts it minimized without focus
      result = Shell(vMess, 6)

      • As someone who creates a lot of Domino monitoring solutions, I often have to bend the rules and do some development (Ugh!) • An executable called postemsg.exe was placed on the C: drive of a Windows server that was the central Domino monitoring hub • This is very effective because it uses the postemsg.exe executable on the OS level to send the message to the TEC • With some knowledge of LotusScript, I crafted a system to monitor servers and send results back to the Tivoli event console
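
For readers who want to prototype the same command outside Domino, here is a minimal Python sketch that assembles it as an argument list instead of one long string. It assumes the postemsg argument order shown in the LotusScript on the slide; the paths, class, and source defaults are copied from that slide, the function name is hypothetical, and nothing here is verified against a real Tivoli installation.

```python
def build_postemsg_args(
    message,
    hostname,
    slots,
    event_class="MESSAGING_LOTUS",
    source="MESSAGING",
    exe=r"C:\Windows\System32\postemsg.exe",
    config=r"F:\TECAlerts\tecserver.cfg",
    severity="CRITICAL",
):
    """Build the postemsg command as a list of arguments.

    slots maps attribute names to values. Passing a list to the OS means the
    values do not need the embedded double quotes used in the string version.
    """
    args = [exe, "-f", config, "-r", severity, "-m", message]
    args.append(f"hostname={hostname}")
    args.extend(f"{name}={value}" for name, value in slots.items())
    # The event class and source close the command, as on the slide.
    args.extend([event_class, source])
    return args
```

The resulting list could be handed to `subprocess.run` on the monitoring hub. Building a list rather than a single string is a design choice that avoids quoting mistakes like a stray typographic quote in one of the values.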

  39. DDM Is an Advanced Topic and Is Best Used by New Admins • Domino Domain Monitoring (DDM) is a powerful yet complex tool that is often overlooked by administrators • If you are using Domino 6, 7, or 8, you are already a proud owner of the Domino Domain Monitoring database, and could already be using its powerful functionality • If you’re not using DDM, you see this with each server start:

      01/22/2013 11:49:08 AM Warning: All Domino Domain Monitoring probes are disabled resulting in the loss of valuable diagnostic information. Please configure DDM probes in events4.nsf. Assess DDM reports in ddm.nsf.

  40. DDM Backs Up Its Discoveries with Explanations • DDM explains the probable cause, possible solution, and sometimes corrective actions • That’s right; actions that will actually correct the problem you’re experiencing • These are stored in the EVENTS4.NSF and are configurable by you • Let’s look for the error “ATTEMPT TO ACCESS DATABASE BY”

  41. Looking in the View, “Event Messages by Text” • We can find that error message in the EVENTS4.NSF • And discover how we might change the report DDM produces

  42. The Cause, Solution, and Corrective Action Are Listed • This document has all the probable cause, possible solution, and corrective action • These are supplied by Lotus and include the code in the corrective action

  43. Click the Link to the Modular Corrective Action • Clicking the link will take you to the code • This could be in formula language or LotusScript

  44. The Modular Corrective Action Is Re-Usable • At the bottom of the modular action, there is a list of other error text messages that also use this action • That same action that was written only a single time can be used as a corrective action multiple times

  45. Modular Documents – Cause, Solution, and Corrective Actions • Domino 8 comes with over 1,000 modular documents • Chances are your solutions are already there for most issues • You can use any of the same solutions provided by IBM for your custom solution • Or you can add brand new ones

  46. Modular Documents Let You Describe Issues • Modular documents let you add your own probable cause and possible solution text • And build corrective actions with formula code and LotusScript agents

  47. You Can Add to the Solutions That Will Display with the Error Select the custom entries tab and add the description A custom solution of composing an email to the target user can be inserted

  48. Changes the DDM Report The modular document now has the “compose an email” choice

  49. It Starts the Email for You • The code plugs in the user’s name and the database that was being accessed • And it’s all done with modular documents in EVENTS4.NSF

  50. Role in DDM ACL That Will Restrict Who Can Use Actions • Many events have corrective actions associated with them • Only users with the Execute CA role in the DDM ACL are able to access the command actions and the corrective action text and links • This ensures that only qualified team members will be able to make the changes
