Unicenter Automatic Duplicate Suppression (EvtAds) r11 Field Developed Utility, January 17th 2006
Disclaimer: This is a field developed utility targeted for inclusion in a future release of NSM. It is not supported by the CA Support organisation.
Acknowledgments • Kevin Higgins - Cap Gemini UK plc
What is EvtAds? • A simple but function-rich utility that intelligently manages duplicate events to aid auto-ticketing and avoid opening multiple problem tickets for the same problem. • Automatic Duplicate Suppression • Tool for Windows Event Management
Problem Summary • Unicenter message processing repeats messages on the operator console unless they are explicitly discarded. • Such duplication may prevent the use of auto-ticketing or occasionally cause acute message storms (where too many messages per second are generated). • A message storm can result in denial of service if extensive instrumentation is carried out to address duplicates.
Automatic Duplicate Suppression • EvtAds is Unicenter Event Management Automatic Duplicate Suppression, driven by Event Message Actions. • It is specifically designed for scalability, to handle large message storms without performance degradation. • In a lab environment, it has successfully processed a message storm of over a million events.
Automatic Duplicate Suppression • ADS is called using an EXTERNAL message action and determines whether the event has been seen previously within the reset interval. It returns an &RC indicating the number of times the event/alert has been seen previously. • Analysing the &RC value assists auto-ticketing and prevents multiple tickets being raised for the same problem. • It provides a means of escalation based on the number of alerts received for the same problem. For example, if 15 failure events are received for the same problem within the reset interval, it would make sense to escalate the problem to emphasise its impact. Without ADS, this may result in 15 problem tickets being raised.
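The use of the returned count can be sketched as follows. This is a minimal illustration in Python, not part of EvtAds: the function name `action_for`, the action labels, and the policy of escalating exactly once when the 15th event arrives are all assumptions made for the example.

```python
def action_for(previously_seen, escalate_at=15):
    """Map the count returned by the duplicate check to a help desk action.

    previously_seen: number of earlier occurrences within the reset interval
    (the role played by &RC in the slides). escalate_at is a hypothetical
    threshold, taken from the "15 failure events" example.
    """
    total = previously_seen + 1  # include the event being processed now
    if previously_seen == 0:
        return "open_ticket"     # first occurrence: raise one ticket
    if total == escalate_at:
        return "escalate"        # threshold reached: escalate the open ticket
    return "suppress"            # duplicate: do not raise another ticket


for seen in (0, 1, 13, 14, 15):
    print(seen, action_for(seen))
```

With this policy, 15 failure events for the same problem produce one ticket and one escalation instead of 15 tickets.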
Why EvtAds? • Unicenter Event Management provides frequency based message rules, which are based on a Counter and an Interval. • Frequency based event rules are driven by message records and may require complex rules to integrate with the Help Desk. • Interval computation for a frequency event rule is based on the time the first event occurred. EvtAds uses an interval based on the last update time; this is key for effective duplicate suppression.
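The difference between the two interval bases can be illustrated with a small sketch. Both functions below are simplified models written for this comparison (the names and the sample times are assumptions, and Unicenter's actual frequency rule processing may differ in detail). Each returns the event times that would be treated as a new occurrence rather than a duplicate.

```python
def new_occurrences_first_event(times, interval):
    """Window is anchored at the first event and is not extended by duplicates."""
    new = []
    window_start = None
    for t in times:
        if window_start is None or t > window_start + interval:
            window_start = t  # window expired: this event starts a new one
            new.append(t)
    return new


def new_occurrences_last_update(times, interval):
    """Window is anchored at the most recent event (last update time)."""
    new = []
    last_update = None
    for t in times:
        if last_update is None or t > last_update + interval:
            new.append(t)
        last_update = t  # every event, duplicate or not, refreshes the window
    return new


# The same alert arriving every 50 time units, with an interval of 120.
times = [0, 50, 100, 150, 200]
print(new_occurrences_first_event(times, 120))
print(new_occurrences_last_update(times, 120))
```

In the first model the steady stream is split at time 150 and would be reported as a second problem. In the second model the stream stays a single problem for as long as events keep arriving within the interval.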
Help Desk • To preserve the integrity of the Help Desk, it is important that duplicate problem tickets are not raised. Multiple problem tickets for the same problem can easily distract from resolving the problem. • For this reason, duplicate alerts should be identified and discarded (or in some cases escalated). • If duplicate alerts exceed an acceptable limit, the problem should be escalated to minimise the impact. It is important to identify the number of times an alert has occurred within a clear down interval.
EvtAds Functions • CAds: checks for duplicate events • CVersion: displays version details • CStats: displays how many events are cached • CList: generates a list of cached entries for review
ClearDownInterval • This is used to determine whether a cached event entry is still current or has expired. It is based on the Last Update Time of the event. • If (Last Update Time + ClearDownInterval) < Current Time, the cached event entry has expired and the duplication count of the cached event is reset to zero. • This enables identification of new occurrences of a problem outside the ClearDownInterval.
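The expiry rule above can be sketched as a small in-memory cache. This is an illustrative model only: the class and method names are hypothetical, times are plain numbers in whatever unit the interval uses, and the assumption that every occurrence refreshes the Last Update Time is inferred from the slides rather than taken from the EvtAds implementation.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    count: int          # occurrences seen in the current clear down window
    last_update: float  # time of the most recent occurrence


class DuplicateCache:
    def __init__(self, default_interval):
        self.default_interval = default_interval  # stands in for the ini default
        self.entries = {}

    def check(self, text, now, interval=None):
        """Return how many times `text` was seen previously, then record it."""
        if interval is None:  # no per-rule override supplied
            interval = self.default_interval
        entry = self.entries.get(text)
        if entry is None or entry.last_update + interval < now:
            previously_seen = 0  # new or expired entry: count is reset
        else:
            previously_seen = entry.count
        self.entries[text] = CacheEntry(previously_seen + 1, now)
        return previously_seen


cache = DuplicateCache(default_interval=120)
print(cache.check("disk full", now=0))    # first occurrence
print(cache.check("disk full", now=100))  # duplicate inside the interval
print(cache.check("disk full", now=341))  # 100 + 120 < 341, so the entry expired
```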
EvtAds Message Action: CAds Function
define msgact name=(*,10) action="EXTERNAL" attrib="DEFAULT" color="DEFAULT" condop=" " evaluate='Y' quiet='N' status="ACTIVE" sim='N' text="EvtAds.DLL CAds /T:120 /M:""&text"""
Callouts: Function Name (CAds) • ClearDownInterval (/T:120) • Message Text (/M:""&text"")
EvtAds Message Action: CAds Function with Query Option
define msgact name=(*,10) action="EXTERNAL" attrib="DEFAULT" color="DEFAULT" condop=" " evaluate='Y' quiet='N' status="ACTIVE" sim='N' text="EvtAds.DLL CAds /Q /T:120 /M:""&text"""
Callouts: Function Name (CAds) • Query Option (/Q) • ClearDownInterval (/T:120) • Message Text (/M:""&text"")
Query Option • The Query option works in the same way as CAds except that the cache count is not updated. The clear down rules still apply. • This function is extremely useful when conditional logic is required, such as “Stop forward action if more than 10 forwards have failed due to destination down”.
Sample Application of Query Option There is another MRA for %CAOP_E_504 with an EXTERNAL action. This issues a CAds function call with /M:"&1,&6", which updates the cache counter to record the number of times the forward has failed within the clear down interval. Prior to the forward action, the Query option is requested to determine the number of times the %CAOP_E_504 message has been generated. The Query option does not update the counter. In this example, if more than 5 FORWARD actions to node BREER01PACO have failed, the forward action is suspended with appropriate messages. This would be a good candidate for the EVALUATE function.
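The interplay between the counting call and the Query option in this example can be sketched as below. The class `AdsCache`, its methods, the helper `should_forward`, and the cache key string are hypothetical stand-ins for illustration; only the behaviour (the query reads the count without updating it, and the clear down rule still applies) comes from the slides.

```python
class AdsCache:
    def __init__(self, interval):
        self.interval = interval
        self.entries = {}  # key -> (count, last_update)

    def _current_count(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return 0
        count, last_update = entry
        # The clear down rule applies to queries as well as updates.
        return 0 if last_update + self.interval < now else count

    def cads(self, key, now):
        """Counting call: return the previous count and record this event."""
        seen = self._current_count(key, now)
        self.entries[key] = (seen + 1, now)
        return seen

    def query(self, key, now):
        """Query call: return the current count without touching the cache."""
        return self._current_count(key, now)


def should_forward(cache, key, now, limit=5):
    """Suspend forwarding once more than `limit` failures are on record."""
    return cache.query(key, now) <= limit


cache = AdsCache(interval=120)
key = "CAOP_E_504,BREER01PACO"  # illustrative key built from two message tokens
for t in range(6):
    cache.cads(key, now=t)      # six forward failures recorded
print(should_forward(cache, key, now=10))   # six failures exceed the limit of 5
print(should_forward(cache, key, now=500))  # entry has expired by this time
```

Keeping the query separate from the counting call means the check before each FORWARD does not itself inflate the failure count.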
EvtAds Message Action: Search & Replace Override
define msgact name=(*,10) action="EXTERNAL" attrib="DEFAULT" color="DEFAULT" condop=" " evaluate='Y' quiet='N' status="ACTIVE" sim='N' text="EvtAds.DLL CAds /T:600 /F:2 /M:""&(2:9)"""
Callouts: Function Name (CAds) • ClearDownInterval (/T:600) • Search & Replace Item (/F:2) • Message Text (/M:""&(2:9)"")
CVersion • Displays EvtAds version details
define msgrec msgid="EVTADS CVersion" type="MSG" msgnode="*" desc="What version of EvtAds are you running" cont='N' msgact='Y' wcsingle='?' wcmany='*' case="y" regexp="n"
define msgact name=(*,10) action="EXTERNAL" attrib="DEFAULT" color="DEFAULT" condop=" " evaluate='Y' quiet='N' status="ACTIVE" sim='N' text="EvtAds.DLL CVersion"
CStats • Displays the number of events cached
define msgrec msgid="EVTADS CStats" type="MSG" msgnode="*" desc="Tell me how many entries have been cached so far" cont='N' msgact='Y' wcsingle='?' wcmany='*' case="y" regexp="n"
define msgact name=(*,10) action="EXTERNAL" attrib="DEFAULT" color="DEFAULT" condop=" " evaluate='Y' quiet='N' status="ACTIVE" sim='N' text="EvtAds.DLL CStats"
CStats • Sample output: CStats - 2000 entries cached
CList • Dumps cached entries for review • The list file is placed in the EvtAds directory. • File name: yyyymmdd.ads
define msgrec msgid="EVTADS CList" type="MSG" msgnode="*" desc="Dump Current Cached entries to a ascii file" cont='N' msgact='Y' wcsingle='?' wcmany='*' case="y" regexp="n"
define msgact name=(*,10) action="EXTERNAL" attrib="DEFAULT" color="DEFAULT" condop=" " evaluate='Y' quiet='N' status="ACTIVE" sim='N' text="EvtAds.DLL CList"
Clear Down Interval • The Clear Down Interval is the reset interval. It can be different for different message rules. • A cached entry has expired if the last update of the event is older than the clear down interval. • If no Clear Down Interval is specified on the message rule, it defaults to the value set in the initialisation options.
Hierarchy • /T: override on message action • Initialisation Default from Ini File
What is Purge? • Removes matured cached event entries from the queue. • A cache entry is eligible for Purge if Current Time > (LastUpdateTime + (ClearDownInterval * PurgeLag)). • PurgeLag is a customisable option set via the Configuration GUI.
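The eligibility test can be sketched as follows. The dictionary layout, the field names, and the sample values are assumptions made for the illustration; the comparison itself is the one given above.

```python
def purge(entries, now, purge_lag):
    """Return (kept_entries, removed_count) after applying the purge rule."""
    kept = {
        key: entry
        for key, entry in entries.items()
        # An entry is purged when now > last_update + interval * purge_lag,
        # so it is kept while now is at or before that point.
        if now <= entry["last_update"] + entry["interval"] * purge_lag
    }
    return kept, len(entries) - len(kept)


cache = {
    "disk full on srv1": {"count": 3, "last_update": 1000, "interval": 120},
    "job failed": {"count": 1, "last_update": 100, "interval": 120},
}
kept, removed = purge(cache, now=1200, purge_lag=2)
print(sorted(kept), removed)
```

In this sample the first entry has already expired for duplicate counting (1000 + 120 < 1200) but is not yet old enough to purge (1000 + 240 is not less than 1200). That gap is what PurgeLag controls.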
Why Purge? • This is a simple housekeeping function to keep the queue to a manageable size. • In the lab environment, a queue size of 300,000+ has been tested successfully. • It allows removal of one-off event entries that are not expected to recur.
How is Purge Initiated? • Purge can be initiated on demand or performed automatically at the specified Check Interval. • CheckInterval = nn, where nn is the interval value in minutes. • This is an INI file option, configured via the Configuration GUI. • The Check Interval value should be based on the event activity. In most cases a Check Interval of 2 hours should suffice. However, in the lab environment, it has worked fine with a value of 1 minute.
Purge On Demand • To run purge on demand, simply call the COptions –Purge function. • The utility has built-in threshold monitoring of the queue size. If the queue size exceeds the Warning or Critical threshold, it generates Warning or Critical messages. • These messages should be reviewed with a view to executing a Purge On Demand request on receipt of threshold violation messages.
Automatic Purge • Example: with the CheckInterval value set to 15, Purge is executed every 15 minutes.
How to disable automatic Purge • Stopping the automatic purge process is not recommended, but if you wish to do so, it can be accomplished by manually updating the EvtAds.dat file. • Set CheckInterval=0 • This will disable the automatic purge process. • The Configuration Wizard will not permit this value to be set to 0, as it is not recommended.
Purge • 12,000+ entries purged in no time
Why Search & Replace? • Some events may include fields such as Time, which make each event unique and thus potentially unsuitable for ADS. • If fields such as Time are not relevant to determining the duplication count, the search and replace function should be considered.
How? • Using the Configuration GUI, specify a POSIX regular expression to search for in the message text passed to EvtAds, and the text to replace it with. • The GUI provides a TEST function to verify your search and replace item. • Each Search & Replace item is allocated a sequential number. This number should be specified on the CAds function as the /F:n option, where n is the item number.
Replace Time Field with hh:mm:ss • Define a Search and Replace item. • The item number for this field is 2. • Screenshot callouts: ReformatEventSearch<ItemNum> • Search RegExp • Replace Item
Search & Replace Override Option • Screenshot callout: Item Number
Time Field Replaced with hh:mm:ss • Generate time based events:
cawto EVTADS Jobxxxx failed at 10:11:38
cawto EVTADS Jobxxxx failed at 10:11:39
cawto EVTADS Jobxxxx failed at 10:11:40
cawto EVTADS Jobxxxx failed at 10:11:41
cawto EVTADS Jobxxxx failed at 10:11:42
[Screenshot: Cached Entry]
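The effect of a time-field Search & Replace item on these five events can be sketched with Python's `re` module. The pattern shown is an assumption (the actual expression configured in the GUI is not given in the slides), and Python's regular expression dialect stands in for POSIX here; for this simple pattern the two agree.

```python
import re

# Hypothetical search expression for a time field of the form 10:11:38.
TIME_PATTERN = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")


def normalise(text):
    """Replace every time field so that otherwise identical events match."""
    return TIME_PATTERN.sub("hh:mm:ss", text)


events = [f"EVTADS Jobxxxx failed at 10:11:{sec}" for sec in range(38, 43)]
keys = {normalise(event) for event in events}
print(keys)
```

All five messages reduce to a single key, so they would be counted as duplicates of one event rather than cached as five separate entries.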