Performance Baselining, Benchmarking, and Monitoring

Performance Baselining, Benchmarking, and Monitoring Kevin Kline SQL Server Specialist Quest Software, Inc.

Agenda • Bio • Intro Question • Presentation • Questions as needed • Q & A • Book Drawing • Get your business cards ready!

Bio • Started in IT in 1986. BS in MIS in 1989 from University of Alabama. • Microsoft SQL Server MVP • Author of 5 database books • 1 on Oracle, 2 on SQL Server, 2 on SQL. • President of PASS (www.sqlpass.org) • Worked for NASA, US Army, and Deloitte & Touche before Quest Software • Chief architect and technology evangelist for Quest’s SQL Server tool suite

SQL Server 2005 in your future? How long after the introduction of SQL Server 2005 before you plan to deploy a major installation / application? • Within 6 months • 6 to 12 months • After 12 months • Only after extensive testing and analysis • No plans to deploy

Introduction • What will we cover today? • What is a performance baseline? • What is a performance benchmark? • How to perform long-term, 24x7 monitoring?

The importance of service monitoring • Service monitoring = observing the health of a service in real-time. • Enables DBA’s to observe service behavior proactively and quantitatively. • A key objective is to know, quantitatively, what the performance of a given server is and then manage to that standard. • Allows you to avoid depending on “user experience” as the key indicator for performance • Helps find problems even when users aren’t on the system

Two approaches to monitoring • Proactivemonitoring – take baseline measurements, benchmark metrics, and maintain an active monitoring regime. Ensures the most comprehensive service levels. • Exceptionsmonitoring – only exceptions to “normal” service are monitored. Provides the most coverage with the least amount of time.

Proactive Monitoring • Benefits – • Best chance of catching errors before they occur. • Gets you out of fire fighting mode. • Best information about environment and apps. Better long-term decisions. • Drawbacks – • Requires more time and deeper understanding of apps. • Requires review and analysis of charts, graphs and other information on an on-going basis.

Building a regimen – start with the Baseline • Build a baseline performance profile. • Makes you familiar with the operational behavior of each app/server. • Clearly documents what is “normal” for a server and/or application. • Identifies types of problems that arise even when the server is behaving normally: • Some problems require a response. • Some problems have no response.

Goal of a baseline • Tells you all about the performance of a server under normal conditions. • Document and understand as many as possible (if not all) background processes • Build in filters to catch “do not respond” situations before DBAs see them; Otherwise, apathy can set in.

Building a baseline • Need a single graphic representation, along with enough information to interpret the results. • Use System Monitor: real-time or saved to a log. • Choose a sampling interval that balances the need for data vs. the disk I/O to record the collections. Every 15 seconds is default. • Local vs. Remote monitoring? Pros and cons to each. • Next, you must assign and assess the SysMon counters. (Need a few more counters for building the baseline than you need for daily monitoring.)

SysMon Counters, OS • Memory – Pages/sec • The number of pages of code or data are read from or written to disk. • Primary indicator of the kinds of faults that cause system-wide delays. • Network Interface – Bytes total/sec • The number of bytes traveling over the network interface per second. • When dropping or trending lower, investigate whether or not network problems are interfering with your application. • PhysicalDisk - Disk Transfers/sec • The rate of read and write operations on the disk. • Define a counter for each physical disk on the server. • Enable Diskperf on some OS’es. • Processor - % Processor Time • The average percentage of time that the processor is executing a non-Idle thread observed during the sample interval.

SysMon Counters, Database I/O • SQLServer:Access Methods - Full Scans/sec. • The number of unrestricted full table or index scans. • SQLServer:Buffer Manager – Buffer Cache Hit Ratio • Percentage of pages found in the buffer pool that did not require a disk read. • SQLServer:Databases - Log Growths • The total number of log growths for a given database. • SQLServer:Databases Application Database - Percent Log Used • The percentage of space in the log that is in use. • SQLServer:Databases Application Database - Transactions/sec • The number of transactions started for the database.

SysMon Counters, Locking • SQLServer:Latches – Average Latch Wait Time • The average latch wait time (in milliseconds) for latch requests that had to wait. • SQLServer:Locks – Average Wait Time • The average amount of wait time (milliseconds) for each lock request that resulted in a wait. • SQLServer:Locks – Lock Waits/sec • The number of lock requests that could not be satisfied immediately and forced the caller to wait before the lock was granted. • SQLServer:Locks - Number of Deadlocks/sec • The number of lock requests that resulted in a deadlock. • Should remain relatively predictable. • High numbers often indicate a poorly designed application or set of transactions.

SysMon Counters, General Health • SQLServer:General Statistics - User Connections • The number of users connected to the database server. • Dramatic shifts in this value should be researched. • SQLServer:Memory Manager - Memory Grants Pending • The current number of processes waiting for a workspace memory grant. • A high or rising number may indicate inadequate memory. • SQLServer:User Settable – Query (a tracer query) • A tracer query is a user-written query that gives you an indication of the overall speed or efficiency of the system. • You define the tracer query using a procedure called sp_user_counter1. • Up to 10 are allowed.

Correlations to DB Activity • In SQL Server 2000, correlate SysMon counters to master..sysperfinfo and to activity shown by SQL Profiler. • In SQL Server 2005, correlate SysMon counters to SQL Profiler and various DMVs. • sys.dm_os_memory_caches, xxx_clerks, xxx_objects for all things memory • sys.dm_exec_query_stats to skip SQL Profiler in some situations. • sys.dm_os_threads to figure out thread memory consumption. • Many, many more…

Using SysMon • Demo

Benchmarking • The next step is to understand server performance under several usage scenarios that could possibly occur. This is known as benchmarking. • Use the PerfMon counters as you would when baselining. • Use a longer polling frequency. • Ideally, you should benchmark based on actual usage. • Use one of several popular benchmark scenarios available in the industry such as TPC-C or SAP, if real load is not available. • Alternately, use a load generation tool. • Best solution is to build benchmarking scenarios specific to your application using T-SQL scripts, SQL Profiler, or other third-party tools. • Capture and review the results of the benchmarked scenarios.

Ongoing Monitoring • Is an important, if not the most important, component of proactive monitoring. • If nothing else, use SysMon set to 15 minute polling frequency checking on these counters: • Memory – Pages/sec • Network Interface – Bytes total/sec • Physical Disk – Disk Transfers/sec • Processor - % Processor Time • SQLServer:Access Methods - Full Scans/sec • SQLServer:Buffer Manager – Buffer Cache Hit Ratio • SQLServer:Databases Application Database - Transactions/sec • SQLServer:General Statistics - User Connections • SQLServer:Latches – Average Latch Wait Time • SQLServer:Locks - Average Wait Time • SQLServer:Locks - Lock Timeouts/sec • SQLServer:Locks - Number of Deadlocks/sec • SQLServer:Memory Manager - Memory Grants Pending

Alerting • Alerts are defined events that raise a notification of some kind. • Use a well-defined tool like SQL Server Alerts & Notifications, SysMon, or Quest products like Foglight (for 24x7) or Spotlight (for real-time and short-term monitoring) to raise alerts. But which alerts to raise? • At a minimum, use this reference list of alerts: • Errors affecting service – specifically errors with a severity of 19 to 25! • Deadlocks • CPU utilization • Disk utilization • Scans (SQLServer:Access Methods)

Sample of Alerting • Demo

Other monitoring activities • If you have a lot of servers, build alerts on the following. If you have one or only a few, perform daily check on the following: • SQL Server log • SQL Agent log • Windows Application, Security, and System log • SQL Server job history and trends • Make sure the app upholds good error logging by ensuring that RAISERROR…WITH LOG is used extensively and that meaningful error messages, as well as all clear & summary messages, are written into the app.

On-going Exceptions Monitoring • The alternative to proactive monitoring is exceptions monitoring. • Only the exceptions to “normal” service are monitored, though the definition of an exception is very flexible. • Requires much less set up time and on-going overhead. • Requires more intimate knowledge of the application since you must anticipate all of the exceptions in advance and only capture information the reveals exceptions. • In other word, an exception management regimen uses limited performance monitoring, more alerts, and heavy use of error logs.

Exceptions Monitoring Regime • SysMon is used only to catch problem situations: • Memory – Pages/sec • Network Interface – Bytes total/sec • Physical Disk – Disk Transfers/sec • Processor - % Processor Time • SQL Buffer Manager – Cache Hit Ratio • Set up real-time alerts for all foreseen problem situations (e.g. user count exceeded, failed logins, disk space low, etc.). • Monitor or build alerts on the SQL Server error log. Foresee recovery & troubleshooting solutions for all common exceptions the application may experience.

Build a performance baseline of server & app. Use SysMon counters judiciously. Use 3rd party tools for added speed, ease of use, and information. Build benchmarks for better understanding of server & app performance. Review and analyze! Perform on-going monitoring, either proactive or exception monitoring. Incorporate alerts into your on-going monitoring regimen. Summary

Additional Resources • Freeware tools at: http://www.bb4.org/ and http://www.deadcat.net/. • Microsoft SQL Server Operations Guide - http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/maintain/operate/opsguide/sqlops5.asp

Questions & Answers • Email me at kevin.kline@quest.com with any additional questions. • Don’t forget to fill out an evaluation for this session! • Pass those business cards in for the drawing! 

Thank you! Thank you for attending this session and the 2005 PASS Community Summit in Grapevine! Please help us improve the quality of our conference by completing your session evaluation form. Completed evaluation forms may be given to the room monitor as you exit or to staff at the registration desk.

Performance Baselining, Benchmarking, and Monitoring

Performance Baselining, Benchmarking, and Monitoring

Presentation Transcript

Benchmarking and Performance Indicators

Performance Monitoring

EasyCBM: Benchmarking and Progress Monitoring System

Performance And Power Benchmarking

Baselining MECO

easyCBM: Benchmarking and Progress Monitoring System

Performance Benchmarking and Strategic Change Management

SIP Performance Benchmarking

Benchmarking and Performance Evaluations

HPCMP Benchmarking and Performance Analysis

Performance Monitoring and Comparative Evaluation (Benchmarking)-Experience from Kosovo

Performance Benchmarking Part 2

Performance Improvement Through Benchmarking and Performance Measurement

Safety Performance Benchmarking

Troubleshooting and performance monitoring

Performance Monitoring

Troubleshooting and performance monitoring

Benchmarking, Performance Evaluation, Modeling and Prediction

The High Performance Cluster for QCD Calculations: System Monitoring and Benchmarking

Performance Monitoring

Raspberry Pi Performance Benchmarking

BENCHMARKING PRIVATE EQUITY PERFORMANCE