1 / 28

Building a Performance Monitoring System using XEvents and DMVs

Building a Performance Monitoring System using XEvents and DMVs. Ola Hallengren, Saxo Bank. About me. 2 |. Ola Hallengren https://ola.hallengren.com E-mail: ola@hallengren.com DBA in Saxo Bank, a Danish investment bank Microsoft MVP – Data Platform. Agenda. 3 |.

tasanee
Télécharger la présentation

Building a Performance Monitoring System using XEvents and DMVs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Building a Performance Monitoring System using XEvents and DMVs Ola Hallengren, Saxo Bank SQLBits 2017

  2. About me 2 | • Ola Hallengren • https://ola.hallengren.com • E-mail: ola@hallengren.com • DBA in Saxo Bank, a Danish investment bank • Microsoft MVP – Data Platform SQLBits 2017

  3. Agenda 3 | • About Extended Events • How we built a monitoring solution using Extended Events and DMVs • Techniques that we used • Demos SQLBits 2017

  4. About Extended Events 4 | • Light weight replacement for SQL Trace / Profilerhttp://msdn.microsoft.com/en-us/library/bb630282.aspx • Introduced in SQL Server 2008 • Greatly improved in SQL Server 2012 (more events) • Event data in xml format • CREATE EVENT SESSION (or SSMS GUI) • sys.fn_xe_file_target_read_file to read event (or SSMS GUI) SQLBits 2017

  5. XEvents – First steps 5 | • Creating an Extended Events session using the SSMS GUI or a script, with Event File as target • Query the events using the SSMS GUI or through XQuery on the production server (or copy the files to another server and query them there) SQLBits 2017

  6. XEvents – Challenges 6 | • XQuery is slow • XQuery is not easy to write if you are not familiar with it • Querying the events on the production server puts a load on the server (and even more if many DBAs are doing it at the same time) • You need to have access to the production server to query the events (makes it difficult to give access to developers) • If you copy the files to another server, you are not getting in new events • Difficult to correlate events with data from DMVs (like SQL Texts and Query Plans) SQLBits 2017

  7. XEvents Monitoring - Requirements 7 | • It should be running all the time on all servers • Events should be stored in a central database • Events should be available for querying very close to real-time (so that it can be used in live incidents) • If the monitoring solution is down, no events should be lost (it should just catch up when it starts again) • Data should be easily available for DBAs and developers, and without using XQuery • No XQuery on the production servers (for performance reasons) • Collection of SQL Texts and Query Plans (triggered by events) • SQL Server 2012 and later SQLBits 2017

  8. XEvents Monitoring – Design 8 | • A company default Extended Events session (database_health) running on all SQL Servers with Event File as target • PowerShell scripts (running on a central server) collecting events every 30 seconds • Using sys.fn_xe_file_target_read_file to read new events • Storing data into a central database • Views to access data • XQuery is performed either at load time in an instead-of-trigger or in the views when data is accessed • PostActions to collect SQL Texts and Query Plans SQLBits 2017

  9. XEvents Monitoring – Overview Events Database Job Server running PowerShell scripts DBAs Database Servers 9 | SQLBits 2017

  10. Scenario I: Timeout 10 | • An application is getting a command timeout in AdventureWorks. What is going on? • Use ExtendedEvents.AbortedExecutions to see the aborted query • See how the columns statement_last, statement, and query_plan are available, even though they are not in the events (this information is coming from sys.dm_exec_sql_text) • If the wait_type = LCK_* it is waiting for locks (it is being blocked) and we can use ExtendedEvents.BlockedProcessesto see who the blocker is • We can also see the root blocker SQLBits 2017

  11. Scenario II: Deadlocks 11 | • An application is getting a deadlock in AdventureWorks • Use ExtendedEvents.Deadlocks to see the deadlock graph • The deadlock graph is parsed in ExtendedEvents.DeadlockProcesses and ExtendedEvents.DeadlockResources • We can also see the “Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.” errors in ExtendedEvents.Errors SQLBits 2017

  12. Scenario III: Errors 12 | • An application is inserting data in a batch and are getting “String or binary data would be truncated.” - errors • Use ExtendedEvents.Errors to see the errors • We can see the statement, but we want to see the actual values that the application tried to insert • We can add the action sql_text to get the input buffer • Can generate very large amount of event data in short time, if there is a batch with many errors • There will be one event for each error, and the sql_text of each event will have the complete batch SQLBits 2017

  13. The session database_health 13 | • SQL Server comes with a system_health default session that contains a lot of useful information • We have created a company default extended event session that is running on all servers (database_health) • Different thresholds on different servers (higher duration thresholds on OLTP - servers, than data warehouse - servers) • Running with target Event File SQLBits 2017

  14. How an event is traveling - I 14 | • An event passes the predicate evaluation (filters) • Additional information (Actions) is collected (e.g. session_id) • The event is buffered to the memory buffers • The event is written to an event file (default 30 seconds latency) SQLBits 2017

  15. How an event is traveling - II 15 | • A job runs a PowerShell script on the job server (every 30 seconds) • The script is querying sys.fn_xe_file_target_read_file • The first time it is getting all events from the files • After that it is passing the last file name and file offset, that it has in its events database (so getting only new events) • The events are inserted into the events database • An instead-of-trigger is fired • The trigger is extracting out the most important elements and attributes using XQuery, and also does some data type conversions SQLBits 2017

  16. How an event is traveling - III 16 | • The PowerShell script is now collecting SQL Texts and Query Plans (PostActions) • Joins and additional logic (and sometimes more XQuery) in views SQLBits 2017

  17. Latency 17 | SQLBits 2017

  18. Where to do the XQuery? 18 | • Extracting out elements and attributes into its own columns at load time in an instead-of-trigger is optimal for query performance, but has a cost in load performance and storage • Doing the XQuery in the views is optimal for load performance and storage, but has a cost in query performance • The attribute timestamp has to be extracted out at load time (as you want to be able to look at the latest events fast) • In general try to avoid queries that need to do XQuery on large number of events • When the performance for a query is not acceptable, then it is time to move some of the elements or attributes to its own columns SQLBits 2017

  19. Blocking 19 | • The blocked_process_report event is very useful when investigating blocking problems • The event is only triggered if ‘blocked process threshold’ has been enabled on the server • It should not be set lower than 5 seconds • Handled by the same thread in SQL Server that is searching for deadlocks • A blocked_process_report event has always one blocked and one blocking process • Every time the thread wakes up and is looking for blocking it has a new monitor_loop_id(filter on monitor_loop_id to get a snapshot of the blocking)<blocked-process-report monitorLoop="1369"> SQLBits 2017

  20. Blocking – Using the Execution Stack 20 | • The executionStack in the blocked_process_reportcan be used to see which stored procedures and statements that are involved • The first frame is always the inner statement SQLBits 2017

  21. Getting SQL Texts 21 | • To get an SQL Text you need an sql_handle • The handle can be used in sys.dm_exec_sql_textto get the text • If you also have a start_offset and an end_offset you can extract the statement from the text • The sql_handle is a hash of the text • The sql_handle and offsets are available in the action tsql_frame(various events), the executionStack(blocked_process_report and xml_deadlock_report), and also in DMVs like sys.dm_exec_requests • By storing the text with the handle, the next time an event comes with the same handle you don’t need to go and get it (as you already have it) SQLBits 2017

  22. Getting Query Plans – The plan_handle 22 | • The action plan_handleis a “A token that refers to the compiled plan that the query is part of.” • The plan_handle can be used in sys.dm_exec_text_query_plan to get the query plan • The problem is that a query plan can change while keeping the same plan_handle SQLBits 2017

  23. Getting Query Plans – Statement - level 23 | • In events like wait_info and sp_statement_completed, and DMVs like sys.dm_exec_requests you have this information available:plan_handlestart_offsetend_offsetquery_hashquery_plan_hash • You can store this information with the plan and the next time you come across the same combination, you don’t need to get the plan (as you already have it) SQLBits 2017

  24. Getting Query Plans – Module - level 24 | • When you only have a plan_handle (like in module_end) then you need to go out and get the plan fast • You should also verify that there hasn’t been a recompile after the event (as it is then not the right plan). You can do that like this:WHERE NOT EXISTS(SELECT * FROM sys.dm_exec_query_stats WHERE plan_handle = @plan_handle AND creation_time > @timestamp) SQLBits 2017

  25. High frequency polling of events 25 | Things to consider when you are polling for new events frequently using sys.fn_xe_file_target_read_file: • Use small files! It is faster to query a small file than a large file (even if you specify a file name and a file offset) • Only specify a wildcard in the [path] when there has been a file rollover (check the current file in sys.dm_xe_session_targets)!When you specify a wildcard, then SQL Server will access all files (even if you specify a file name and a file offset) SQLBits 2017

  26. The offset is invalid for log file … 26 | • When you are querying sys.fn_xe_file_target_read_file with a file name and a file offset it can happen that you get an error like this:“The offset 2394624 is invalid for log file "...\database_health_0_130903806628660001.xel". Specify an offset that exists in the log file and retry your query.” • You get this error if all the files have been rolled over since you read events the last time (so the file name you come with no longer exists) • This is if there has been very large number of events generated in short time or if the monitoring solution has been down • Increasing the number of files reduces the risk of this happening SQLBits 2017

  27. Questions? 27 | The code is available at https://ola.hallengren.com/scripts/PerformanceStore.zip You can contact me at ola@hallengren.com SQLBits 2017

  28. Please fill in the evaluation forms SQLBits 2017

More Related