DSM Scalability Considerations for Unicenter NSM r11
DSM Scalability Considerations for Unicenter NSM r11 Last Updated June 5 2006
Best Practice Summary – see notes • 50k local objects polled in one DSM is fine for r11 • Manage polling so that it does not exceed 600 polls per second • The –m parameter must be configured to allow this load • Manage the poll cycle so that average use stays above 20% and below 50% of the poll time window • More than 100 DSMs can report to one MDB
Objectives • Understand issues affecting DSM performance • Understand issues affecting scalability • Consider architectural options • Recommendations
Understand issues affecting DSM performance • Hardware • Local vs remote DSM(s) • Cold start vs. warm start • Electronic proximity to hosts • Network configuration and congestion • Number of hosts • Number of managed objects • Polling configuration
Hardware • See Hardware Requirements in NSM r11 Implementation Guide for latest guidance
Hardware • Does hardware matter? • 30,000 objects ~= 2 subnets with 50 objects per host
Local vs remote DSM(s) • For smaller implementations a local DSM on the MDB machine is OK • For larger implementations, remote DSM(s) should be strongly considered • DSM should be electronically close to what it polls and may connect to a remote MDB
Multiple Remote DSMs • Multiple remote DSMs have a synergistic effect
Local vs remote DSM(s) • Combining a local DSM with a remote DSM does not give as strong an effect
Cold start vs. warm start • Set “WarmStart=yes” option in %AGENTWORKS_DIR%\services\config\atmanager.ini • Warm start uses previously discovered objects • Reduces MDB access time • Reduces discovery process time • Must still confirm status
Cold start vs. warm start • Startup is measured as the time until the DSM settles (DSM start complete)
Cold start vs. warm start • Startup elapsed times
Electronic proximity to hosts • Standard best practice: no more than 3 hops • High performance LAN access to hosts and MDB • Avoid WAN links • Given a choice, put a DSM close to what it polls, instead of close to its MDB • Missed traps are an indication of excessive load or a busy network – reduce the distance that polls and traps travel
Network configuration and congestion • DSM should usually handle whole subnets • Fast/stable path to MDB • Network utilization • Errors, timeouts, and retries • Missed traps must be addressed • Poll cycle must have free time for load peaking • Size counts
Number of hosts • Affects startup and first stage discovery • Affects total DSM object population • Affects DSM host configuration
Number of objects • Each managed host may spawn dozens of objects • Agents • Watchers • Split DSMs to keep number of objects constrained • Split DSMs to keep electronically close • Obrowser and query with no argument display objects – the number of objects actually polled is usually lower
Polling configuration – see notes • Polling interval • Polling rate for r11 DSM sustained at up to 1,000 polls/second (laboratory only – do not exceed 600) • Speeds discovery (?) • Not needed for status polling • 10 to 20 minute polling intervals are still best practice • 50,000 pollable objects at a 10 minute polling interval is about 80 polls/second • Timeouts are critical • Assume timeout 10 seconds, retry 2 = 30 second delay • DSM thread waits for reply or timeout on SNMPGET • IP policy makes extensive use of SNMPGET
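The arithmetic behind the two figures on this slide can be sketched in a few lines of Python. The function names are illustrative only and are not part of Unicenter NSM. The sketch assumes one poll per object per interval, and that every retry waits out the full timeout, which is the reading that reproduces the 30 second figure above.

```python
def polls_per_second(pollable_objects: int, interval_minutes: float) -> float:
    """Average poll rate if every object is polled once per interval."""
    return pollable_objects / (interval_minutes * 60)


def worst_case_wait_seconds(timeout_seconds: float, retries: int) -> float:
    """Time a DSM thread can block on one unanswered SNMPGET.

    Assumption: the initial attempt and each retry all wait the full timeout.
    """
    return timeout_seconds * (1 + retries)


if __name__ == "__main__":
    # 50,000 pollable objects at a 10 minute interval
    print(f"{polls_per_second(50_000, 10):.1f} polls/second")
    # timeout 10 seconds, retry 2
    print(f"{worst_case_wait_seconds(10, 2):.0f} second delay")
```

The first figure works out to roughly 83 polls/second, which the slide rounds to "about 80".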
Polling configuration • Calculating polling rates • Target no more than 50% and no less than 20% MaxPollRate utilization • At 200 polls/second: a five minute interval is 300 seconds, so do not attempt more than 30k polls per five minute interval (300 seconds × 0.50 × 200 polls per second = 30k objects polled every 5 minutes) • Configure [aws_snmp] MaxPollRate in atservices.ini
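As a rough planning aid, the calculation on this slide can be written as a small Python helper. This is a sketch under stated assumptions, not a CA-supplied tool: the function names are hypothetical, the 200 polls/second figure is treated as the MaxPollRate value, and each object is assumed to be polled once per interval.

```python
def max_polls_per_interval(max_poll_rate: float,
                           interval_seconds: float,
                           target_utilization: float = 0.50) -> int:
    """Largest number of polls to schedule in one interval at the target utilization."""
    return int(max_poll_rate * interval_seconds * target_utilization)


def utilization(polled_objects: int,
                interval_seconds: float,
                max_poll_rate: float) -> float:
    """Fraction of the available poll capacity that the load consumes."""
    return polled_objects / (interval_seconds * max_poll_rate)


def within_target(value: float, low: float = 0.20, high: float = 0.50) -> bool:
    """True if utilization falls inside the 20% to 50% band from the slide."""
    return low <= value <= high


if __name__ == "__main__":
    # 200 polls/second over a five minute (300 second) interval
    print(max_polls_per_interval(200, 300))
    u = utilization(30_000, 300, 200)
    print(f"{u:.0%} utilization, within target: {within_target(u)}")
```

With the slide's numbers the helper returns 30,000 polls per interval, and a 30,000 object load sits exactly at the 50% ceiling.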
Issues affecting scalability • Hardware • What hardware is available? • Can it support MDB + DSM? • Network • How electronically close are managed objects? • Is there capacity to handle polling and trap traffic? • How reliable is the network? • Geographic proximity • Do managed objects exist on other side of WAN? • Polling • What are the polling requirements?
Issues affecting scalability • Type of host activity • Web server • Application server • Database server • Batch server
Architectural Options • Local DSM • Fine for smaller shops • Add remote DSMs as necessary • Add remote DSMs to improve performance • Use several smaller DSMs • Closer to managed objects (most important tuning choice!) • Faster startup • More robust (not single point of failure) • Reduces effect of an outage • Bridged MDBs • Distribute MDBs for better DSM access – not critical unless bandwidth to the MDB is limited and update activity is high