How to Increase the ROI of Your Management System Through Simulation Dennis Morton Practice Director, Network Operations and Infrastructure Management Greenwich Technology Partners
Session Overview • Motivation for using simulation • The MIMIC SNMP Simulator • Technology and Limitations • Simulation Examples • OpenView NNM • MtTrapd Probe • Visionary • OpenNMS • What’s my tool doing? • Wrap-up
The Problem… • Management Systems are complex • Many moving parts • Many vendors • Many interfaces • How can you test all the interfaces in the system? • Little tolerance for “failure” • … especially for critical problems • How do you make sure you have every condition covered? • How do you test conditions like excessive CRC errors or excessive spanning tree recalculations? • There are far more complex problems to test
Solutions – “Wait and See” • Wait until something happens and then modify the system to catch it next time • Terrible approach! • The problem must happen twice to verify that the system can catch it! • … if you made the modification properly the first time! • 100% Reactive • What happened to proactive network management? • Not good to have proactive tools but a reactive engineer…
Solution – “Raw Capture” • Raw Probe capture and log file duplication • Not much better • Still must wait for a problem to occur • However, can ensure problem is solved • Can be risky for production systems • Raw capture files can/will get huge • Only works at the Omnibus interface • What about everything before that? • Still 100% Reactive!
Solutions – “Build a Lab” • Build a lab composed of real devices and a mirror of the production system • Ideal solution • But… • Difficult to use • You must configure the devices and create conditions • If not you, then someone else who you must schedule time with • May run into conflicts • Expensive! • Real boxes cost real $$ • Could share, but this rarely works in practice
Solution – “Simulate Your Network” • Use a simulation tool to create a near-perfect copy of your production network • Many advantages • Simulate your actual network topology and devices • Simple to cause faults, even complex ones • Extremely rapid edit-test cycle • Allows you to test many alternatives quickly • Rapid feedback is good • Much, much easier to test/verify complex workflow • Think Impact policy testing, Reporting, Gateways, etc… • This is how the vendors do it! • Pump them for their simulations • Some caveats • Covered in detail later…
MIMIC SNMP Simulator • Market leader in SNMP simulation • http://www.gambitcomm.com/ • Used by the majority of hardware companies for testing SNMP agents and developing/testing management software • Used by software companies like Micromuse as well • Products • Core MIMIC SNMP Simulator • MIMIC Recorder • Discovery Wizard, Simulation Wizard, Topology Wizard, etc. • Much more… • Cisco IOS Simulator • Full IOS Simulation via Telnet • Cable Modem Simulator
MIMIC SNMP Simulator – Technology • Seamlessly supported on Linux, Windows, and Solaris • 2,000 agents/box on Windows, 10,000 (!) per Unix/Linux box • Core Simulation Engine • Supports any combination of SNMP V1, V2, V2c, and V3 • Extremely fast, native-code core engine • A 1GHz PIII w/512MB can easily simulate 250 devices with boatloads of headroom • TCL/Tk GUI Components • MIMIC Shell for CLI access to the engine and MIMICView for GUI access to features • Orthogonal feature sets • Wizards for ease of performing complex functions • E.g., Discovery Wizard for “recording” an entire network of devices • Topology Wizard for manipulating a topology • TCL, Perl, Java, and C++ APIs • TCL supported at the engine level • IOW, you have to learn TCL
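The per-box agent limits quoted on this slide translate into a simple sizing calculation. The sketch below is only an illustration: the function name and the platform keys are hypothetical, and it considers the quoted agent limits alone, not CPU or memory headroom.

```python
import math

# Per-box agent limits as quoted on the slide (not hardware sizing guidance).
AGENTS_PER_BOX = {"windows": 2_000, "unix": 10_000}


def boxes_needed(agent_count: int, platform: str) -> int:
    """Return how many simulator hosts are needed for agent_count agents."""
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    return math.ceil(agent_count / AGENTS_PER_BOX[platform])


# Example: a 2,500-node network
print(boxes_needed(2_500, "windows"), boxes_needed(2_500, "unix"))
```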
MIMIC SNMP Simulator – Cost • Priced by agent plus yearly maintenance fee • 25-agent license is $5K + $1,250 (support) • 250-agent license is $10K + $2,500 (support) • Or, including first-year support, about $250/agent for a 25-agent license and $50/agent for a 250-agent license • Food for thought… • Used 2600 routers cost $750 and up • Much, much more for anything but basic interfaces • Feature-rich switches even more • Trunking/VoIP == $$$! • How much would electricity cost for 25 boxes?
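The per-agent figures follow from dividing license plus one year of support by the agent count. A minimal sketch of that arithmetic, using only the prices quoted on the slide (the function name is illustrative):

```python
def cost_per_agent(license_usd: float, support_usd: float, agents: int) -> float:
    """First-year cost per simulated agent: license plus one year of support."""
    return (license_usd + support_usd) / agents


small = cost_per_agent(5_000, 1_250, 25)     # 25-agent license
large = cost_per_agent(10_000, 2_500, 250)   # 250-agent license

# Lower bound for a comparable lab of 25 used 2600 routers at $750 each,
# ignoring interfaces, power, and rack space.
lab_hardware_floor = 25 * 750

print(small, large, lab_hardware_floor)
```

On these numbers the 25-agent license costs $6,250 in its first year, against at least $18,750 for 25 used routers.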
Limitations of Simulation • Most limitations stem from one fact: a single physical node is acting like many virtual nodes • Can’t just change ifOperStatus to force a link “down” • Node is still ping-able, after all • For root-cause tools, YMMV • Causing good faults takes more thought • MIMIC-specific notes • May not have enough node licenses available for your network • Some faults are just not possible • EIGRP, for example • Some require fairly complex TCL scripting • But, when have we ever shied away from that! • SNMP, IOS, and Telnet only. • No syslog simulator
MIMIC Caveats • MIMIC has some caveats as an NMS simulation tool • Recording • Only records a single IP address for an agent • Gambit has a script to add the additional IP aliases • Still works quite well with only one, though • All counters set to one of three functions • Simulating • Tricky to have many problems happen simultaneously • Mainly affects root-cause simulation • Simulations consist of thousands of small files • Very difficult to create one by hand! • Simulation Wizard greatly simplifies this • Can become difficult to remember where you put modified files
Example Simulation • Recording of the GTP internal network • Two real devices • Can you spot them? • Mix of Cisco routers, switches, and Sun workstations • Note: • Community Strings in the simulation do not have to be the same as those on the real devices • Same goes for any aspect of the simulated device. IOW, you could test the effect of moving to SNMPv3 on your systems!
Example – OpenView NNM • NNM works quite well with MIMIC • When recording, MIMIC stores the ARP and routing tables • NNM will use these to draw a nice topology!
Example – OpenView NNM (cont.) • Most built-in NNM applications work fine • Traceroute obviously won’t, though • Any tool that uses SNMP to locate a route, however, will work
Example – Visionary • MIMIC works especially well with tools like Visionary that have no notion of topology • Just enter the IP address of the agent and start causing faults! Change local.busyPer here
Example – Visionary (cont.) • See the event immediately!
Example – MtTrapd Probe • First, generate traps • Once/periodically via GUI or arbitrarily via scripts • Specify trap type and rate • MIMIC can quite easily cause a trap storm!
Example – MtTrapd Probe (cont.) • Specify Trap Variable bindings • Fun test – try nonsensical values!
Example – MtTrapd Probe (cont.) • Choose security options (optional) • MIMIC supports the full range of traps/informs
Example – MtTrapd Probe (cont.) • Voilà! Note the counts!
Example – OpenNMS • Great example of the limitations of simulation • Discovered the same services on all simulated nodes! • Will still work, but should disable service monitoring for the MIMIC server
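Presumably OpenNMS finds the same services everywhere because every simulated address answers from the one MIMIC host. A rough heuristic for spotting that pattern in discovery results is sketched below; the function name, the threshold, and the sample addresses and services are all hypothetical, and a real network of identical devices would trigger it too.

```python
def looks_like_shared_host(services_by_node: dict, min_nodes: int = 3) -> bool:
    """True when at least min_nodes nodes all report the same non-empty service set."""
    if len(services_by_node) < min_nodes:
        return False
    distinct = {frozenset(services) for services in services_by_node.values()}
    if len(distinct) != 1:
        return False
    return bool(next(iter(distinct)))


# Hypothetical discovery results for three simulated agents
discovered = {
    "10.0.0.1": {"SNMP", "SSH", "HTTP"},
    "10.0.0.2": {"SNMP", "SSH", "HTTP"},
    "10.0.0.3": {"SNMP", "SSH", "HTTP"},
}
print(looks_like_shared_host(discovered))
```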
What’s My NMS Tool Doing? • Another great use for simulation: tracing the SNMP polling behavior of NMS tools • Lets you explore in detail precisely what your NMS tools are doing and how they react to changes • What happens when you turn off SNMPv1 and turn on v2/v3? • What happens when you radically change a node? • What MIB objects are being polled and how often? • How efficient is my polling engine? • Many, many more examples… • Tremendously useful for customization • Trace output easier to use than Sniffer output, for example • Let’s see what a trace looks like…
Trace Example – Visionary • PDU packing in action! • This single GET had 71 varbinds • Very useful for testing/tracing custom rules
INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3
  1.3.6.1.4.1.9.9.43.1.1.1.0 = ccmHistoryRunningLastChanged.0
  1.3.6.1.4.1.9.9.43.1.1.2.0 = ccmHistoryRunningLastSaved.0
  1.3.6.1.4.1.9.9.43.1.1.3.0 = ccmHistoryStartupLastChanged.0
  1.3.6.1.4.1.9.2.1.46.0 = bufferFail.0
  1.3.6.1.4.1.9.2.1.47.0 = bufferNoMem.0
  1.3.6.1.4.1.9.2.1.19.0 = bufferSmMiss.0
  1.3.6.1.4.1.9.2.1.27.0 = bufferMdMiss.0
  1.3.6.1.4.1.9.2.1.35.0 = bufferBgMiss.0
  1.3.6.1.4.1.9.2.1.43.0 = bufferLgMiss.0
  1.3.6.1.4.1.9.2.1.67.0 = bufferHgMiss.0
  1.3.6.1.4.1.9.9.48.1.1.1.5.1 = ciscoMemoryPoolUsed.1
  1.3.6.1.4.1.9.9.48.1.1.1.6.1 = ciscoMemoryPoolFree.1
  1.3.6.1.4.1.9.9.48.1.1.1.7.1 = ciscoMemoryPoolLargestFree.1
  1.3.6.1.4.1.9.9.48.1.1.1.5.2 = ciscoMemoryPoolUsed.2
  1.3.6.1.4.1.9.9.48.1.1.1.6.2 = ciscoMemoryPoolFree.2
  1.3.6.1.4.1.9.2.1.56.0 = busyPer.0
  1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.48 = cpmProcExtUtil1Min.1.48
  1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.90 = cpmProcExtUtil1Min.1.90
  1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.96 = cpmProcExtUtil1Min.1.96
  1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.97 = cpmProcExtUtil1Min.1.97
  1.3.6.1.4.1.9.9.109.1.1.1.1.4.1 = cpmCPUTotal1min.1
  1.3.6.1.2.1.10.7.2.1.2.2 = dot3StatsAlignmentErrors.2
  … and so on!
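A trace like this is easy to post-process. The sketch below assumes only what the slide shows: a record header of the form `INFO <timestamp> - agent <n>, PDU type <type>, req ID <hex>` followed by `OID = name` pairs. The real trace file layout may differ, and the class and function names are illustrative.

```python
import re
from dataclasses import dataclass, field

# Patterns inferred from the trace shown on the slide (an assumption).
HEADER = re.compile(r"INFO (\S+) - agent (\d+), PDU type (\w+), req ID ([0-9a-fA-F]+)")
VARBIND = re.compile(r"\b(\d+(?:\.\d+)+) = (\S+)")


@dataclass
class PduRecord:
    timestamp: str
    agent: int
    pdu_type: str
    request_id: str
    varbinds: list = field(default_factory=list)  # (oid, name) pairs


def parse_trace(text: str) -> list:
    """Split trace text into one PduRecord per header, with its varbinds."""
    headers = list(HEADER.finditer(text))
    records = []
    for i, match in enumerate(headers):
        # A record's body runs until the next header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[match.end():end]
        records.append(PduRecord(match.group(1), int(match.group(2)),
                                 match.group(3), match.group(4),
                                 VARBIND.findall(body)))
    return records


sample = """INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3
1.3.6.1.4.1.9.2.1.56.0 = busyPer.0
1.3.6.1.4.1.9.9.48.1.1.1.5.1 = ciscoMemoryPoolUsed.1
"""
records = parse_trace(sample)
for rec in records:
    print(rec.agent, rec.pdu_type, len(rec.varbinds))
```

Counting `len(rec.varbinds)` per record shows how aggressively a tool packs its PDUs.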
Trace Example – OpenNMS • Trace from OpenNMS 1.0.1 • Notice: Use of Get BULK • Correctly noticed that node supported SNMPv2
INFO 08/25.20:01:34 - agent 25, PDU type BULK, req ID 3191fd84
  1.3.6.1.4.1.9.2.1.58 = avgBusy5.
  1.3.6.1.4.1.9.2.1.8 = freeMem.
  1.3.6.1.4.1.9.2.1.46 = bufferFail.
  1.3.6.1.4.1.9.2.1.47 = bufferNoMem.
  1.3.6.1.4.1.9.2.1.15 = bufferSmTotal.
  1.3.6.1.4.1.9.2.1.16 = bufferSmFree.
  1.3.6.1.4.1.9.2.1.18 = bufferSmHit.
  1.3.6.1.4.1.9.2.1.19 = bufferSmMiss.
  1.3.6.1.4.1.9.2.1.23 = bufferMdTotal.
  1.3.6.1.4.1.9.2.1.24 = bufferMdFree.
  1.3.6.1.4.1.9.2.1.26 = bufferMdHit.
  1.3.6.1.4.1.9.2.1.27 = bufferMdMiss.
  1.3.6.1.4.1.9.2.1.31 = bufferBgTotal.
  1.3.6.1.4.1.9.2.1.32 = bufferBgFree.
  1.3.6.1.4.1.9.2.1.34 = bufferBgHit.
  1.3.6.1.4.1.9.2.1.35 = bufferBgMiss.
  1.3.6.1.4.1.9.2.1.39 = bufferLgTotal.
  1.3.6.1.4.1.9.2.1.40 = bufferLgFree.
  1.3.6.1.4.1.9.2.1.42 = bufferLgHit.
  1.3.6.1.4.1.9.2.1.43 = bufferLgMiss.
  1.3.6.1.4.1.9.2.1.63 = bufferHgTotal.
  1.3.6.1.4.1.9.2.1.64 = bufferHgFree.
  1.3.6.1.4.1.9.2.1.66 = bufferHgHit.
  1.3.6.1.4.1.9.2.1.67 = bufferHgMiss.
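The question of which MIB objects are polled, and how often, can be answered from the timestamps in such traces. The sketch below assumes the timestamp format is `MM/DD.HH:MM:SS` (consistent with the `08/25` seen in the OpenNMS trace) and fills in a placeholder year because the trace carries none. The sample observations are invented for illustration and do not come from a real trace.

```python
from collections import defaultdict
from datetime import datetime

# Trace timestamps have no year; a leap year is assumed so 02/29 still parses.
ASSUMED_YEAR = 2000


def parse_ts(ts: str) -> datetime:
    """Parse a trace timestamp such as '09/08.06:49:42' (assumed MM/DD.HH:MM:SS)."""
    return datetime.strptime(f"{ASSUMED_YEAR}/{ts}", "%Y/%m/%d.%H:%M:%S")


def mean_poll_interval(observations) -> dict:
    """Map each OID seen at least twice to its mean seconds between polls."""
    seen = defaultdict(list)
    for ts, oid in observations:
        seen[oid].append(parse_ts(ts))
    intervals = {}
    for oid, times in seen.items():
        if len(times) < 2:
            continue  # a single sighting says nothing about frequency
        times.sort()
        span = (times[-1] - times[0]).total_seconds()
        intervals[oid] = span / (len(times) - 1)
    return intervals


# Illustrative observations: (timestamp, OID) pairs as they might be extracted
observations = [
    ("09/08.06:49:42", "1.3.6.1.4.1.9.2.1.56.0"),
    ("09/08.06:54:42", "1.3.6.1.4.1.9.2.1.56.0"),
    ("09/08.06:59:42", "1.3.6.1.4.1.9.2.1.56.0"),
    ("09/08.06:49:42", "1.3.6.1.4.1.9.2.1.46.0"),
]
intervals = mean_poll_interval(observations)
print(intervals)
```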
Conclusion • Simulation can be a very cost-effective method to: • Verify enhancements to your NMS before placing them into production • Proactively ensure that key conditions are handled properly • Test and debug complex workflows • Our tools are proactive – why shouldn’t we be as well?
Questions? Contact information: Dennis Morton Snmp@greenwichtech.com Cell: (214) 289-3675