Tivoli NetView/TEC Event Flow Customization Milwaukee TUC 2006 Rob Clark clarkrob@us.ibm.com. Agenda. Event flow from NetView to TEC Map SNMP to TEC EIF events Filter traps to TEC ZCE enriches events TEC correlation rules ITM monitored components.
Tivoli NetView/TEC Event Flow Customization
Milwaukee TUC 2006
Rob Clark
clarkrob@us.ibm.com
Agenda • Event flow from NetView to TEC • Map SNMP to TEC EIF events • Filter traps to TEC • ZCE enriches events • TEC correlation rules • ITM monitored components
Points of integration for NetView/TEC event flow
• Map SNMP/TEC events
• Filter traps to TEC
• ZCE enriches events
• TEC correlation rules
• ITM monitored components
[Diagram: the Tivoli NetView server (trapd, nvcorrd, nvserverd, ZCE, netmon, servmon), TEC, and ITM 5.1, with a "launch" label and the integration points numbered 1 to 5]
1. Map SNMP trap to TEC EIF event using xnmtrap
• Lists contents of SNMP varbinds
• Define slot mappings (Windows: edit the file \usr\ov\conf\tecad_nv6k.cds)
• Define TEC event class
2. Filter traps to TEC
Adjust the list of traps to forward to TEC
• Windows: edit the file \usr\ov\conf\tecad_nv6k.conf or use the configuration GUI \usr\ov\bin\tecconfig.bat
• UNIX: use the NetView Ruleset GUI to edit the TEC_ITS.rs ruleset
3. State Correlation Engine (also known as ZCE)
• Used in NetView to enrich events or generate new events to TEC
• Actions are written in Java; they can query remote data sources and generate or augment TEC events
• For details, see "Writing Rules for the State Correlation Engine" in the IBM Tivoli Enterprise Console Rule Developer's Guide
3. State Correlation Engine, continued
• Write the ZCE rules in the file /usr/OV/conf/nvsbcrule.xml
• The engine also requires /usr/OV/conf/rule.dtd to be present
• The custom NetView actions are shipped in /usr/OV/jars/nv_zce.jar
• /usr/OV/jars/zce.jar is the correlation engine launched by the EIF library. It depends on evd.jar, xerces-3.2.1.jar and log.jar (jlog)
• The actions run whenever TEC integration is running and the property UseStateCorrelation=YES is set in /usr/OV/conf/tecint.conf (Windows: \usr\ov\conf\tecad_nv6k.conf)
State Correlation Engine actions
There are two NetView out-of-the-box ZCE actions:
• fqhostname: tests the hostname slot to see if it is a fully qualified hostname. If it is, the value is copied to a new slot called fqhostname. Otherwise, nothing is done.
• serviceImpact: for node/router down, node/router marginal, and subnet unreachable events, queries the object database for service impact
  • For node/router down/marginal: what are the services (applications) running on that node?
    • Generate a TEC_ITS_NODE_SERVICE_IMPACT event per service.
  • For subnet unreachable: what nodes in the affected subnet are running applications?
    • Generate a TEC_ITS_SUBNET_SERVICE_IMPACT event per application type, listing the affected nodes in the event.
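The fqhostname behavior described above can be sketched in a few lines. This is an illustration only: the shipped action is Java code in nv_zce.jar, the event is modeled here as a plain dictionary of slots, and the test for "fully qualified" (at least two dot-separated labels, and not an IP address) is an assumption, not the shipped logic.

```python
import ipaddress


def is_fully_qualified(hostname: str) -> bool:
    """Assumed test: two or more non-empty labels, and not an IP address."""
    try:
        ipaddress.ip_address(hostname)
        return False  # a bare IP address is not a hostname
    except ValueError:
        pass
    labels = hostname.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def fq_action(event: dict) -> dict:
    """Copy the hostname slot into a new fqhostname slot when it is fully qualified."""
    hostname = event.get("hostname", "")
    if is_fully_qualified(hostname):
        return {**event, "fqhostname": hostname}
    return event  # otherwise, nothing is done
```

For example, `fq_action({"hostname": "nvvm01.example.com"})` gains an fqhostname slot, while an event whose hostname is `nvvm01` or `9.42.11.111` is returned unchanged (example.com is a placeholder domain).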
State Correlation Engine debugging
• To see debugging statements from the correlation engine: in /usr/OV/conf/tecint.conf (\usr\ov\conf\tecad_nv6k.conf), uncomment these four lines:
    # LogLevel=ALL
    # TraceLevel=ALL
    # LogFileName=/usr/OV/log/adptlog.out
    # TraceFileName=/usr/OV/log/adpttrc.out
  Output will then be sent to adptlog.out and adpttrc.out
• To see debugging statements from NetView actions: in /usr/OV/conf/sbc-log4j.properties, change INFO to DEBUG on this line:
    log4j.category.com.tivoli.zce.actions=INFO
  Output will then be sent to /usr/OV/log/nvsbc.log
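The four adapter settings can also be uncommented with a small script instead of by hand. The sketch below works on the file contents as a string and assumes the commented lines look like `# Key=value`, as shown on the slide. The function name is made up for this example.

```python
import re

# The four settings named on the slide.
DEBUG_KEYS = ("LogLevel", "TraceLevel", "LogFileName", "TraceFileName")


def enable_adapter_debug(conf_text: str) -> str:
    """Return conf_text with the four debug settings uncommented."""
    out = []
    for line in conf_text.splitlines():
        m = re.match(r"^\s*#\s*(\w+)\s*=", line)
        if m and m.group(1) in DEBUG_KEYS:
            # Strip only the leading comment marker; keep the setting as is.
            line = re.sub(r"^\s*#\s*", "", line, count=1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Other comments and already active settings pass through untouched. As the next slide notes, nvserverd (UNIX) or tecad_nv6k (Windows) must still be restarted afterwards.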
State Correlation Engine debugging cont’d
• To force NetView rules to always fire: in /usr/OV/conf/nvsbcrule.xml, replace the <predicate> value with <predicate>true</predicate>, e.g.:
    <rule id="netview.fqHostname">
      <match>
        <predicate>true</predicate>
      </match>
      <action function="fqAction"/>
    </rule>
• Always restart nvserverd (UNIX) or tecad_nv6k (Windows) after making these changes
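The predicate swap can be sketched programmatically with Python's standard ElementTree parser. Only the <rule>, <match>, <predicate> and <action> elements are taken from the slide; the rest of the nvsbcrule.xml layout is assumed, and the function name is made up for this example. ElementTree does not preserve a DOCTYPE declaration or comments when it writes the document back, so for the real file a hand edit may be safer.

```python
import xml.etree.ElementTree as ET


def force_rule_to_fire(rules_xml: str, rule_id: str) -> str:
    """Set every <predicate> of the rule with the given id to 'true'."""
    root = ET.fromstring(rules_xml)
    for rule in root.iter("rule"):
        if rule.get("id") == rule_id:
            for predicate in rule.iter("predicate"):
                predicate.text = "true"
    # Note: a DOCTYPE declaration or comments in the input are not preserved.
    return ET.tostring(root, encoding="unicode")
```

Rules with other ids are left as they were, so a single rule can be forced while debugging.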
4. TEC correlation rules for network management
TEC includes the netview.rls rule set, which correlates the network availability events:
1. Housekeeping
   • Events that update status clear the previous status events from the same entity
   • Adjustment of severity: ensure the default WARNING is changed accordingly
   • Synchronizing ACK/UNACK from TEC back to NetView
2. TEC Heartbeat events: determine effect and cause events
   • Missed Heartbeat events (effect) originating from ITM are correlated against node down and subnet unreachable events (cause)
3. Service impact events: determine effect and cause events
   • Service impact events (effect) from NetView are correlated against node down and subnet unreachable events (cause)
4. Switch Analyzer layer 2 status events: determine effect and cause events
   • Switch Analyzer events (effect) are correlated against node down events (cause) from the same host
5. Interface status events: determine effect and cause events
   Interface status events are correlated against node or router status events:
   • Interface status Down/AdminDown/Unreachable (EFFECT) -- Node/Router Down/AdminDown/Unreachable (CAUSE) **
   • Interface status Down/AdminDown/Unreachable (CAUSE) -- Node/Router Marginal (EFFECT)
   • Interface status Up (EFFECT) -- Node/Router Up (CAUSE)
   (** see slide notes for correction to rule)
3. Use State Correlation Engine to enrich events
This out-of-the-box example shows how NetView generates service impact events via ZCE.
[Diagram: event flow with callouts numbered 1 to 5]
• Network Unreachable event received
• What services are affected in that subnet?
• TEC event contains list of impacted nodes with services
• Queries ITM for WAS, DB2, MQ services
• Adds field to node, e.g., isITM_IBM_DB2
ITM Background • The IBM Tivoli Monitoring product monitors products running on endpoint nodes in the network. • In a customer environment: • Each ITM Server monitors a configured set of endpoint nodes. • Each endpoint runs with a configured set of ITM Resource Models for the different products being monitored. • To gather product metrics, the ITM Server queries these endpoints to have them report back the ITM Resource Model information.
ITM 5.1 Integration Found in NetView 7.1.4 • A command line utility, itmquery, is available for performing ITM queries. • This utility enables: • Registering the ITM Servers to be monitored. • Displaying the IP addresses of the endpoints being monitored by each configured ITM Server. • Displaying the recognized products on each endpoint. • Using the itmquery utility you can do such things as: • Generate a netmon seed file containing all ITM-monitored endpoint nodes. • Generate a netmon seed file containing just those endpoints that are running with the ITM WebSphere Resource Model.
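The seed-file use case can be illustrated with a short sketch. It does not call itmquery. It assumes the endpoint-to-product mapping has already been obtained, and writes one IP address per line, which is the basic netmon seed file entry form. The function name and the mapping shape are made up for this example, and the addresses are the ones used in the itmquery example later in this deck.

```python
def seed_file_lines(endpoints, product=None):
    """Build netmon seed file lines, one IP address per line.

    endpoints maps an IP address to the set of product names recognized on it.
    With product=None, every ITM-monitored endpoint is included.
    """
    selected = [
        ip for ip, products in endpoints.items()
        if product is None or product in products
    ]
    return [f"{ip}\n" for ip in sorted(selected)]


endpoints = {
    "9.42.11.111": {"WebSphere", "DB2"},
    "9.42.11.155": {"MQ"},
}
all_itm_seed = seed_file_lines(endpoints)            # every ITM endpoint
websphere_seed = seed_file_lines(endpoints, "WebSphere")  # WebSphere only
```

The resulting lines could be written out with `open(path, "w").writelines(...)` to produce the seed file.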
ITM Integration Found in NetView 7.1.4 (cont.) • New ITM monitoring feature embedded in the servmon daemon: • NetView automatically adds the following OVwDB fields to node objects: • isITMEndpoint • isITM_IBM_WebSphere_Application_Server • isITM_IBM_WebSphere_MQ • isITM_IBM_DB2 • These fields are added to node objects via servmon’s “service attribute” capability. In addition, some or all of these fields are automatically removed by servmon when it’s noticed that a particular node hasn’t reported a particular capability for the configured down time interval (the default is 7 days).
ITM Configuration Files
• Register ITM Servers: /usr/OV/conf/itm_servers.conf
  • This file contains account information for the ITM Servers to access when performing either “dump” type operations in itmquery or node service attribute tests in servmon.
  • This file is not meant to be hand-edited. Instead use one of the following:
    • To edit via the command line:
      • itmquery --add-server <server_name>
      • itmquery --remove-server <server_name>
    • To edit via a GUI:
      • Windows: \usr\ov\bin\nvconfigureitm.bat
      • UNIX: Server Setup (/usr/OV/bin/serversetup)
  • To verify the entries in this file, use: itmquery --verify-server-info
ITM Configuration Files (cont.)
• /usr/OV/conf/itm_attributes.conf
  • This file contains ITM product matching information used to recognize which products are running on the ITM endpoints, based on the names found in the ITM Resource Models.
  • The default set of supported, recognized products includes:
    • WebSphere
    • DB2
    • MQ
  • In addition, there are commented-out entries for products that are not officially supported (but these entries would most likely work without problems):
    • Oracle • Informix • MQSI • mySAP • Siebel • Domino • Apache • IIS • iPlanet • WebLogic
ITM Configuration Files (cont.)
• [Figure: the default supported itm_attributes.conf entries]
• [Figure: example of the XML document that the itmquery code gets back from a query to an ITM Server]
• Note that in this case the Resource Model name matches the regular expression used to recognize DB2, which means that, for example, the “itmquery --dump-endpoints” command will list this endpoint as one that is running DB2.
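The regular-expression matching described above can be sketched as follows. The patterns and Resource Model names are hypothetical placeholders: the actual itm_attributes.conf entries and the XML returned by an ITM Server are not reproduced in this transcript. Only the product names are taken from the slides.

```python
import re

# Hypothetical patterns; the real itm_attributes.conf entries are not shown here.
PRODUCT_PATTERNS = {
    "IBM_WebSphere_Application_Server": re.compile(r"WebSphereAS", re.IGNORECASE),
    "IBM_WebSphere_MQ": re.compile(r"WebSphere_?MQ", re.IGNORECASE),
    "IBM_DB2": re.compile(r"DB2", re.IGNORECASE),
}


def recognize_products(resource_models):
    """Return the sorted product names whose pattern matches any Resource Model name."""
    found = set()
    for name in resource_models:
        for product, pattern in PRODUCT_PATTERNS.items():
            if pattern.search(name):
                found.add(product)
    return sorted(found)
```

With these placeholder patterns, an endpoint reporting a Resource Model named `DB2_InstanceStatus` (a made-up name) would be recognized as running IBM_DB2.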
Itmquery Example • In this example, the itm_servers.conf file contained a single ITM Server entry for server nvvm01. • Behind the scenes this itmquery command asked this ITM Server for: • 1) The endpoints being monitored • 2) The ITM Resource Models running on these endpoints • Each of these queries returns XML documents that are parsed and compared against the itm_attributes.conf file to inform you that (as can be seen in the screen capture): • 9.42.11.111 … is an ITM monitored endpoint running with WebSphere and DB2 • 9.42.11.155 … is an ITM monitored endpoint running with MQ
Itmquery Example (cont.)
• In this example [from previous slide], if these nodes were under active NetView management, servmon’s discovery pass would automatically add three capabilities to the node represented by 9.42.11.111: isITMEndpoint, isITM_IBM_WebSphere_Application_Server, and isITM_IBM_DB2.
• Similarly, if we were to look at the node object represented by IP address 9.42.11.155, we would see that it contained the isITMEndpoint and isITM_IBM_WebSphere_MQ capabilities.
Servmon ITM Integration • By default servmon automatically adds/removes the “isITM*” capabilities from nodes in ovwdb. • These capabilities are added during the discovery pass that occurs, by default, every 12 hours. • A particular "isITM*" capability field will be automatically removed when servmon notices that the discovery pass for that service has failed over the course of a configurable service-down-deletion-interval (the default is 7 days). • For example, the isITM_IBM_DB2 capability will be removed from the node if 7 days have elapsed since servmon’s ITM query code last reported that this endpoint contained the DB2 ITM Resource Model.
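The add/remove behavior can be modeled with a small tracker. This is a sketch of the described policy, not servmon's implementation: the class and method names are made up for this example, and treating "exactly 7 days" as expired is an assumption.

```python
from datetime import datetime, timedelta

DOWN_DELETION_INTERVAL = timedelta(days=7)  # default stated on the slide


class CapabilityTracker:
    """Tracks when each (node, capability field) pair was last reported."""

    def __init__(self, deletion_interval=DOWN_DELETION_INTERVAL):
        self.deletion_interval = deletion_interval
        self.last_seen = {}  # (node, field) -> datetime of last successful report

    def record_discovery(self, node, services, now):
        """Called per discovery pass with the ITM services reported for node."""
        for field in ["isITMEndpoint"] + [f"isITM_{s}" for s in services]:
            self.last_seen[(node, field)] = now

    def expire(self, now):
        """Remove and return capabilities not reported within the interval."""
        stale = [key for key, seen in self.last_seen.items()
                 if now - seen >= self.deletion_interval]
        for key in stale:
            del self.last_seen[key]
        return sorted(stale)

    def capabilities(self, node):
        return sorted(field for n, field in self.last_seen if n == node)
```

If a node last reported the DB2 Resource Model on day 0 but keeps reporting WebSphere, calling `expire` on day 7 drops only isITM_IBM_DB2 and leaves the other capability fields in place.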
ITM Integration Wrap Up • The NetView advantages to this ITM integration include: • Users can use itmquery to define very useful netmon seed files. • The servmon ITM integration enables users to define their own useful SmartSets (such as “isITM_IBM_DB2=TRUE” or “isITMEndpoint=TRUE”) to keep track of nodes that are known to be running with certain products or just managed by ITM.
ITM Integration Wrap Up (cont.)
• The TEC advantages to this new ITM integration include:
  • The NetView TEC Adapter code now forwards two new events to TEC to enable correlation and promotion of the severities involved in node down and subnet unreachable events. NetView uses the EIF's Zurich Correlation Engine to do the following:
    • Upon receipt of a TEC_ITS_NODE_STATUS or TEC_ITS_ROUTER_STATUS event where the status is DOWN or MARGINAL:
      • If the event pertains to a node that contains any "isITM_*" attributes, a TEC_ITS_NODE_SERVICE_IMPACT event will be generated for each ITM service attribute found on the node. Each of these generated events will mention the node's hostname as well as the affected service name (for example IBM_WebSphere_MQ, IBM_WebSphere_Application_Server, or IBM_DB2).
    • Upon receipt of a TEC_ITS_SUBNET_CONNECTIVITY event where the status is UNREACHABLE:
      • If any of the nodes found within this subnet have "isITM_*" attributes, a TEC_ITS_SUBNET_SERVICE_IMPACT event will be generated for each affected ITM-related service. Each of these generated events will mention the affected service name (for example IBM_WebSphere_MQ, IBM_WebSphere_Application_Server, or IBM_DB2) as well as the affected nodes found beneath this subnet that contain the affected service.
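The event-generation logic above can be sketched as two small functions. The event class names and the "isITM_" attribute prefix come from the slides. Everything else is an assumption made for illustration: events are plain dictionaries, the slot names (hostname, service, nodes) are made up, and the shipped action is Java code that queries the NetView object database.

```python
ITM_PREFIX = "isITM_"  # isITMEndpoint does not match this prefix


def itm_services(node_attrs):
    """Service names from the node's true-valued isITM_* attributes."""
    return sorted(
        name[len(ITM_PREFIX):]
        for name, value in node_attrs.items()
        if name.startswith(ITM_PREFIX) and value
    )


def node_service_impact(hostname, node_attrs):
    """One TEC_ITS_NODE_SERVICE_IMPACT event per ITM service on the node."""
    return [
        {"class": "TEC_ITS_NODE_SERVICE_IMPACT", "hostname": hostname, "service": s}
        for s in itm_services(node_attrs)
    ]


def subnet_service_impact(subnet_nodes):
    """One TEC_ITS_SUBNET_SERVICE_IMPACT event per service, listing affected nodes."""
    by_service = {}
    for hostname, attrs in subnet_nodes.items():
        for s in itm_services(attrs):
            by_service.setdefault(s, []).append(hostname)
    return [
        {"class": "TEC_ITS_SUBNET_SERVICE_IMPACT", "service": s, "nodes": sorted(nodes)}
        for s, nodes in sorted(by_service.items())
    ]
```

A node with only isITMEndpoint set produces no impact events, since that field does not name a service.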