Learn how simulation can boost your management system's effectiveness, overcome limitations, and test complex scenarios quickly and efficiently. Explore simulation options and solutions to maximize your system's ROI.
How to Increase the ROI of Your Management System Through Simulation Dennis Morton Practice Director, Network Operations and Infrastructure Management Greenwich Technology Partners
Session Overview • Motivation for using simulation • The MIMIC SNMP Simulator • Technology and Limitations • Simulation Examples • OpenView NNM • MtTrapd Probe • Visionary • OpenNMS • What’s my tool doing? • Wrap-up
The Problem… • Management Systems are complex • Many moving parts • Many vendors • Many interfaces • How can you test all the interfaces in the system? • Little tolerance for “failure” • … especially for critical problems • How do you make sure you have every condition covered? • How do you test conditions like excessive CRC errors or excessive spanning tree recalculations? • There are many more complex problems to test
Solutions – “Wait and See” • Wait until something happens and then modify the system to catch it next time • Terrible approach! • The problem must happen twice to verify that the system can catch it! • … if you made the modification properly the first time! • 100% Reactive • What happened to proactive network management? • Not good to have proactive tools but a reactive engineer…
Solution – “Raw Capture” • Raw Probe capture and log file duplication • Not much better • Still must wait for a problem to occur • However, can ensure problem is solved • Can be risky for production systems • Raw capture files can/will get huge • Only works at the Omnibus interface • What about everything before that? • Still 100% Reactive!
Solutions – “Build a Lab” • Build a lab composed of real devices and a mirror of the production system • Ideal solution • But… • Difficult to use • You must configure the devices and create conditions • If not you, then someone else who you must schedule time with • May run into conflicts • Expensive! • Real boxes cost real $$ • Could share, but this rarely works in practice
Solution – “Simulate Your Network” • Use a simulation tool to create a near-perfect copy of your production network • Many advantages • Simulate your actual network topology and devices • Simple to cause faults, even complex ones • Extremely rapid edit-test cycle • Allows you to test many alternatives quickly • Rapid feedback is good • Much, much easier to test/verify complex workflow • Think Impact policy testing, Reporting, Gateways, etc… • This is how the vendors do it! • Pump them for their simulations • Some caveats • Covered in detail later…
MIMIC SNMP Simulator • Market leader in SNMP simulation • http://www.gambitcomm.com/ • Used by the majority of hardware companies for testing SNMP agents and developing/testing management software • Used by software companies like Micromuse as well • Products • Core MIMIC SNMP Simulator • MIMIC Recorder • Discovery Wizard, Simulation Wizard, Topology Wizard, etc. • Much more… • Cisco IOS Simulator • Full IOS Simulation via Telnet • Cable Modem Simulator
MIMIC SNMP Simulator – Technology • Seamlessly supported on Linux, Windows, and Solaris • 2000 agents/box for Windows, 10,000 (!)/[U|Li]nix box • Core Simulation Engine • Supports any combination of SNMP V1, V2, V2c, and V3 • Extremely fast, native code core engine • 1GHz PIII w/512MB can easily simulate 250 devices with boatloads of headroom • TCL/Tk GUI Components • MIMIC Shell for CLI access to the engine and MIMICView for GUI access to features • Orthogonal feature sets • Wizards for ease of performing complex functions • E.g. Discovery Wizard for “recording” an entire network of devices • Topology Wizard for manipulating a topology • TCL, Perl, Java, and C++ APIs • TCL supported at the engine level • IOW, you have to learn TCL
MIMIC SNMP Simulator – Cost • Priced by agent plus yearly maintenance fee • 25 Agent license is $5K + $1,250 (support) • 250 Agent license is $10K + $2500 (support) • Or, about $250/agent for a 25 agent license and $50/agent for a 250 agent license. • Food for thought… • Used 2600 routers cost $750 and up • Much, much more for anything but basic interfaces • Feature-rich switches even more • Trunking/VoIP == $$$! • How much would electricity cost for 25 boxes?
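The per-agent figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the prices quoted on the slide; the function name and the idea of comparing against 25 used routers at the $750 floor are illustrative assumptions, and real hardware would cost more once interfaces and electricity are counted.

```python
def cost_per_agent(agents, license_usd, support_usd):
    """First-year cost per simulated agent: license plus one year of support."""
    return (license_usd + support_usd) / agents


# Prices as quoted on the slide.
small = cost_per_agent(25, 5_000, 1_250)
large = cost_per_agent(250, 10_000, 2_500)

# Hypothetical comparison: 25 used 2600 routers at the quoted $750 minimum,
# ignoring interfaces, power, and rack space.
lab_hardware = 25 * 750
simulated_25 = 5_000 + 1_250

print(f"25-agent license:  ${small:.0f}/agent")
print(f"250-agent license: ${large:.0f}/agent")
print(f"25 real routers (minimum): ${lab_hardware}, 25 simulated: ${simulated_25}")
```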
Limitations of Simulation • Most limitations stem from one fact: a single physical node is acting like many virtual nodes • Can’t just change ifOperStatus to force a link “down” • Node is still ping-able, after all • For root-cause tools, YMMV • Causing good faults takes more thought • MIMIC-specific notes • May not have enough node licenses available for your network • Some faults are just not possible • EIGRP, for example • Some require fairly complex TCL scripting • But, when have we ever shied away from that! • SNMP, IOS, and Telnet only. • No syslog simulator
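The first limitation can be made concrete with a toy model. This is not MIMIC code: every class and function name below is hypothetical, and the "diagnosis" logic is a deliberately naive stand-in for a root-cause tool. It only shows why flipping ifOperStatus on a virtual agent does not look like a real outage while the single physical host keeps answering pings for that address.

```python
from dataclasses import dataclass

IF_OPER_UP, IF_OPER_DOWN = 1, 2  # IF-MIB ifOperStatus values: up(1), down(2)


@dataclass
class VirtualAgent:
    ip: str
    if_oper_status: int = IF_OPER_UP


class SimulatorHost:
    """One physical box answering for every virtual agent's IP address."""

    def __init__(self, agents):
        self.agents = {agent.ip: agent for agent in agents}

    def ping(self, ip):
        # The host's own IP stack replies for any address it owns,
        # regardless of what the simulated MIB says.
        return ip in self.agents

    def get_if_oper_status(self, ip):
        return self.agents[ip].if_oper_status


def naive_diagnosis(host, ip):
    """Hypothetical correlation of ICMP reachability with SNMP state."""
    reachable = host.ping(ip)
    status = host.get_if_oper_status(ip)
    if status == IF_OPER_DOWN and reachable:
        return "interface reported down, but node still answers ping"
    if status == IF_OPER_DOWN:
        return "link down"
    return "healthy"


host = SimulatorHost([VirtualAgent("10.0.0.1"), VirtualAgent("10.0.0.2")])
host.agents["10.0.0.1"].if_oper_status = IF_OPER_DOWN  # "cause" the fault
print(naive_diagnosis(host, "10.0.0.1"))
```

A convincing fault therefore needs more than one changed variable, which is what "causing good faults takes more thought" amounts to in practice.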
MIMIC Caveats • MIMIC has some caveats as a NMS simulation tool • Recording • Only records a single IP address for an agent • Gambit has a script to add the additional IP aliases • Still works quite well with only one, though • All counters set to one of three functions • Simulating • Tricky to have many problems happen simultaneously • Mainly affects root-cause simulation • Simulations consist of thousands of small files • Very very difficult to create one by hand! • Simulation Wizard greatly simplifies this • Can become difficult to remember where you put modified files
Example Simulation • Recording of the GTP internal network • Two real devices • Can you spot them? • Mix of Cisco routers, switches, and Sun workstations • Note: • Community Strings in the simulation do not have to be the same as those on the real devices • Same goes for any aspect of the simulated device. IOW, you could test the effect of moving to SNMPv3 on your systems!
Example – OpenView NNM • NNM works quite well with MIMIC • When recording, MIMIC stores the ARP and routing tables • NNM will use these to draw a nice topology!
Example – OpenView NNM (cont.) • Most built-in NNM applications work fine • Traceroute obviously won’t, though • Any tool that uses SNMP to locate a route, however, will work
Example – Visionary • MIMIC works especially well with tools like Visionary that have no notion of topology • Just enter the IP address of the agent and start causing faults! Change local.busyPer here
Example – Visionary (cont.) See the event Immediately!
Example – MtTrapd Probe • First, generate traps • Once/periodically via GUI or arbitrarily via scripts • Specify trap type and rate • MIMIC can quite easily cause a trap storm!
Example – MtTrapd Probe (cont.) • Specify Trap Variable bindings • Fun test – try nonsensical values!
Example – MtTrapd Probe (cont.) • Choose security options (optional) • MIMIC supports the full range of traps/informs
Example – MtTrapd Probe (cont.) • Voila! Note the Counts!
Example – OpenNMS • Great example of the limitations of simulation • Discovered the same services on all simulated nodes! • Will still work, but should disable service monitoring for the MIMIC server
What’s My NMS Tool Doing? • Another great use for simulation: tracing the SNMP polling behavior of NMS tools • Lets you explore in detail precisely what your NMS tools are doing and how they react to changes • What happens when you turn off SNMPv1 and turn on v2/v3? • What happens when you radically change a node? • What MIB objects are being polled and how often? • How efficient is my polling engine? • Many, many more examples… • Tremendously useful for customization • Trace output easier to use than Sniffer output, for example • Let’s see what a trace looks like…
Trace Example – Visionary
PDU packing in action! This single GET had 71 varbinds
Very useful for testing/tracing custom rules
INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3
1.3.6.1.4.1.9.9.43.1.1.1.0 = ccmHistoryRunningLastChanged.0
1.3.6.1.4.1.9.9.43.1.1.2.0 = ccmHistoryRunningLastSaved.0
1.3.6.1.4.1.9.9.43.1.1.3.0 = ccmHistoryStartupLastChanged.0
1.3.6.1.4.1.9.2.1.46.0 = bufferFail.0
1.3.6.1.4.1.9.2.1.47.0 = bufferNoMem.0
1.3.6.1.4.1.9.2.1.19.0 = bufferSmMiss.0
1.3.6.1.4.1.9.2.1.27.0 = bufferMdMiss.0
1.3.6.1.4.1.9.2.1.35.0 = bufferBgMiss.0
1.3.6.1.4.1.9.2.1.43.0 = bufferLgMiss.0
1.3.6.1.4.1.9.2.1.67.0 = bufferHgMiss.0
1.3.6.1.4.1.9.9.48.1.1.1.5.1 = ciscoMemoryPoolUsed.1
1.3.6.1.4.1.9.9.48.1.1.1.6.1 = ciscoMemoryPoolFree.1
1.3.6.1.4.1.9.9.48.1.1.1.7.1 = ciscoMemoryPoolLargestFree.1
1.3.6.1.4.1.9.9.48.1.1.1.5.2 = ciscoMemoryPoolUsed.2
1.3.6.1.4.1.9.9.48.1.1.1.6.2 = ciscoMemoryPoolFree.2
1.3.6.1.4.1.9.2.1.56.0 = busyPer.0
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.48 = cpmProcExtUtil1Min.1.48
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.90 = cpmProcExtUtil1Min.1.90
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.96 = cpmProcExtUtil1Min.1.96
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.97 = cpmProcExtUtil1Min.1.97
1.3.6.1.4.1.9.9.109.1.1.1.1.4.1 = cpmCPUTotal1min.1
1.3.6.1.2.1.10.7.2.1.2.2 = dot3StatsAlignmentErrors.2
… and so on!
Trace Example – OpenNMS
Trace from OpenNMS 1.0.1
Notice: Use of Get BULK
Correctly noticed that node supported SNMPv2
INFO 08/25.20:01:34 - agent 25, PDU type BULK, req ID 3191fd84
1.3.6.1.4.1.9.2.1.58 = avgBusy5.
1.3.6.1.4.1.9.2.1.8 = freeMem.
1.3.6.1.4.1.9.2.1.46 = bufferFail.
1.3.6.1.4.1.9.2.1.47 = bufferNoMem.
1.3.6.1.4.1.9.2.1.15 = bufferSmTotal.
1.3.6.1.4.1.9.2.1.16 = bufferSmFree.
1.3.6.1.4.1.9.2.1.18 = bufferSmHit.
1.3.6.1.4.1.9.2.1.19 = bufferSmMiss.
1.3.6.1.4.1.9.2.1.23 = bufferMdTotal.
1.3.6.1.4.1.9.2.1.24 = bufferMdFree.
1.3.6.1.4.1.9.2.1.26 = bufferMdHit.
1.3.6.1.4.1.9.2.1.27 = bufferMdMiss.
1.3.6.1.4.1.9.2.1.31 = bufferBgTotal.
1.3.6.1.4.1.9.2.1.32 = bufferBgFree.
1.3.6.1.4.1.9.2.1.34 = bufferBgHit.
1.3.6.1.4.1.9.2.1.35 = bufferBgMiss.
1.3.6.1.4.1.9.2.1.39 = bufferLgTotal.
1.3.6.1.4.1.9.2.1.40 = bufferLgFree.
1.3.6.1.4.1.9.2.1.42 = bufferLgHit.
1.3.6.1.4.1.9.2.1.43 = bufferLgMiss.
1.3.6.1.4.1.9.2.1.63 = bufferHgTotal.
1.3.6.1.4.1.9.2.1.64 = bufferHgFree.
1.3.6.1.4.1.9.2.1.66 = bufferHgHit.
1.3.6.1.4.1.9.2.1.67 = bufferHgMiss.
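Traces like the two above are easy to summarize with a short script, which helps answer "what MIB objects are being polled?" A minimal sketch, assuming the trace is plain text with one header line per request followed by one "OID = name" line per varbind. That layout is inferred from the slides, not from MIMIC documentation, and the regular expressions and function name are illustrative.

```python
import re

# Header layout as shown on the slides, e.g.
# "INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3"
HEADER = re.compile(
    r"INFO (?P<stamp>\S+) - agent (?P<agent>\d+), "
    r"PDU type (?P<pdu>\w+), req ID (?P<req>[0-9a-f]+)"
)
VARBIND = re.compile(r"(?P<oid>\d+(?:\.\d+)+) = (?P<name>\S+)")


def parse_trace(text):
    """Group varbind lines under the request header that precedes them."""
    requests = []
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            requests.append({**header.groupdict(), "varbinds": []})
            continue
        varbind = VARBIND.match(line)
        if varbind and requests:
            requests[-1]["varbinds"].append((varbind["oid"], varbind["name"]))
    return requests


# Abbreviated sample taken from the two traces above.
sample = """\
INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3
1.3.6.1.4.1.9.2.1.56.0 = busyPer.0
1.3.6.1.4.1.9.2.1.46.0 = bufferFail.0
INFO 08/25.20:01:34 - agent 25, PDU type BULK, req ID 3191fd84
1.3.6.1.4.1.9.2.1.58 = avgBusy5.
1.3.6.1.4.1.9.2.1.8 = freeMem.
1.3.6.1.4.1.9.2.1.46 = bufferFail.
"""

requests = parse_trace(sample)
for request in requests:
    print(request["agent"], request["pdu"], len(request["varbinds"]), "varbinds")
```

Run against a full trace, the varbind count per request shows how aggressively a tool packs PDUs, and the timestamps per agent show how often it polls.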
Conclusion • Simulation can be a very cost-effective method to: • Verify enhancements to your NMS before placing them into production • Proactively ensure that key conditions are handled properly • Test and debug complex workflow • Our tools are proactive – why shouldn’t we be as well?
Questions? Contact information: Dennis Morton Snmp@greenwichtech.com Cell: (214) 289-3675