dynaTrace 6.0 Incidents
Agenda • Incidents, Alerts & SLAs • Reacting to Alerts & Incidents
Incident, Alerts and Thresholds • Definition: an Incident rule (System Profile) contains Conditions and triggers Actions; each Condition is based on a subscribed Measure • Operation: the Incident rule engine checks the conditions of the incident rules; if a condition is fulfilled, it raises an incident and executes the actions
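The check-then-act loop on this slide can be sketched in a few lines of Python. This is only an illustration, not dynaTrace code: the class names, the `evaluate` function, and the measure name `search_response_time_s` are hypothetical, and combining several conditions with a logical AND is an assumption the slide does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Condition:
    measure: str       # name of a subscribed measure (hypothetical)
    threshold: float   # upper boundary for that measure

    def fulfilled(self, values: Dict[str, float]) -> bool:
        # A measure with no value never fulfils the condition.
        return values.get(self.measure, float("-inf")) >= self.threshold


@dataclass
class IncidentRule:
    name: str
    conditions: List[Condition]
    actions: List[Callable[[str], None]] = field(default_factory=list)


def evaluate(rule: IncidentRule, values: Dict[str, float]) -> bool:
    """Check the rule's conditions; if fulfilled, raise the incident and run its actions."""
    # Assumption: all conditions must hold at once (logical AND).
    if all(c.fulfilled(values) for c in rule.conditions):
        for action in rule.actions:
            action(rule.name)
        return True
    return False


notifications: List[str] = []
rule = IncidentRule(
    name="Slow Search",
    conditions=[Condition("search_response_time_s", 1.0)],
    actions=[notifications.append],  # stand-in for an email notification
)
raised = evaluate(rule, {"search_response_time_s": 1.4})
```

Here the single action just records the incident name; in the product, the actions listed on the Incident Actions slide would take its place.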
Define Incidents • Incidents are based on Measures • System Profile -> Edit System Profile -> Incidents -> Create Incident Rule
Incident Conditions • Specify multiple Conditional Measures and Violation Threshold • Specify Aggregation and Timeframe • Specify Cool Down Period
Timeframe and Aggregation • Timeframe: 1 Minute • Aggregation: Average • Threshold: 2 seconds • Example of consecutive 1-minute averages over time: 1.6 sec, 1.9 sec, 2.5 sec (Incident condition fulfilled), 2.0 sec (condition still fulfilled), 1.8 sec (Incident closed)
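The evaluation on this slide can be reproduced with a small Python sketch: each timeframe's samples are averaged and the average is compared against the 2-second threshold. The function name `incident_states` and the sample values are made up for illustration, and the Cool Down Period is deliberately left out.

```python
from statistics import mean
from typing import List


def incident_states(windows: List[List[float]], threshold: float) -> List[str]:
    """Return one state per timeframe: 'ok', 'opened', 'still open' or 'closed'."""
    states = []
    incident_open = False
    for samples in windows:
        # Aggregation: Average over the timeframe; the boundary itself counts as violated.
        violated = mean(samples) >= threshold
        if violated:
            states.append("still open" if incident_open else "opened")
        else:
            states.append("closed" if incident_open else "ok")
        incident_open = violated
    return states


# Response times in seconds, two samples per 1-minute timeframe (invented values
# whose averages are roughly 1.6, 1.9, 2.5, 2.0 and 1.8).
windows = [[1.4, 1.8], [1.8, 2.0], [2.0, 3.0], [1.5, 2.5], [1.6, 2.0]]
states = incident_states(windows, threshold=2.0)
print(states)
```

A single sample of 3.0 seconds in the third timeframe does not open the incident on its own; only the average for the whole timeframe is compared with the threshold.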
Measure Thresholds • Define Thresholds for subscribed Measures e.g.: • Business Transaction Results • PurePath Measures (Web Request Time, …) • UEM Metrics (User Actions, Visits, ...) • Monitoring Metrics (CPU Load, Memory Usage, ...) • Upper Boundaries trigger when • Value is GREATER or EQUAL to threshold • Lower Boundaries trigger when • Value is SMALLER or EQUAL to threshold
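Because both boundaries are inclusive, a value that sits exactly on the threshold counts as a violation. A minimal Python sketch of the two comparisons (the function names are hypothetical, not product API):

```python
def violates_upper(value: float, threshold: float) -> bool:
    """Upper boundary: triggers when the value is greater than or equal to the threshold."""
    return value >= threshold


def violates_lower(value: float, threshold: float) -> bool:
    """Lower boundary: triggers when the value is smaller than or equal to the threshold."""
    return value <= threshold


# A response time exactly on a 2-second upper threshold already triggers.
on_upper_boundary = violates_upper(2.0, 2.0)
# A request count of 0 against a lower threshold of 0 triggers as well.
on_lower_boundary = violates_lower(0, 0)
```

The inclusive lower boundary is what makes a lower threshold of 0 usable for detecting that no data is arriving at all.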
Incident Actions • Actions to take upon Incident • Email Notification • Smart Alerting • Advanced Configuration • Customizable Action Plugins • Change Configuration • Thread/Memory Dumps • Custom Plugins
Incident Charts • Split: one indicator per incident, showing the current state and the historic state in the timeframe (has been violated before) • Aggregated: details show which incidents are violated
Incident Details, Confirmation and Drilldown • Graphical history of Incidents • Time in Continuous Session • Incident Actions • Confirm, Set in Progress • Show Measures in Chart • Drill down to Recorded Session
Charting with Thresholds • Show Thresholds on charts: Dashlet Properties -> Series -> Show Thresholds • Traffic Light Chart: adheres to timeframe • Show Threshold Limits in Charts
Counting Threshold Violations • Count how often a Measure Threshold was violated • How many Transactions were slower than a defined threshold?
Counting Threshold Violations • Configure a Threshold Violations -> Count Measure • Set the Source Measure and Threshold • Measure Type: ThresholdViolations • Based on another Measure with defined Thresholds
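Conceptually, such a measure counts how many values of the source measure violate the threshold. A rough Python sketch, assuming an upper threshold with the inclusive comparison described on the Measure Thresholds slide (the function name and the sample values are invented):

```python
from typing import Iterable


def count_threshold_violations(values: Iterable[float], threshold: float) -> int:
    """Count how many source-measure values reach or exceed the upper threshold."""
    return sum(1 for value in values if value >= threshold)


# Response times in seconds for five transactions (made-up sample data).
response_times = [0.4, 1.2, 0.9, 1.0, 2.3]
violations = count_threshold_violations(response_times, threshold=1.0)
```

With a 1-second threshold, three of the five sample transactions count as violations, including the one that sits exactly on the threshold.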
Hands On: Incidents for easyTravel • Scenario • Core Training -> Standard • Goals • We want to raise an Incident whenever the Search Transaction exceeds 1 second in Response Time. • Steps • Add a Threshold to your easyTravel Search Business Transaction - PurePath Response Time Result Measure • Create an Incident based on that Condition • Hint: Use a 10-second evaluation timeframe to see the Incident ‘fire’ quickly
Hands On: Incidents for easyTravel • Steps • Create a Dashboard with the following: • A line chart for the Avg. Response Time of the easyTravel Search Business Transaction and show the Thresholds • Copy/Paste this chart and change it to a Traffic Light Chart • Add an Incident Chart for the Incident you created • Save this Dashboard as easyTravel Search • Change to the Core Training -> Incidents Scenario • Monitor your Dashboard to see what happens • Change back to the Core Training -> Standard Scenario • Monitor your Dashboard to see what happens • Do you feel comfortable explaining why some “red x’s” still appear?
Hands On: Incidents for easyTravel • Scenario • Core Training -> Standard • Goals • We need to know if no data is being captured as this may indicate problems with easyTravel. • Steps • Create a Web Request Count Measure and set the Lower Severe Threshold to 0 • Create an Incident named “No Web Requests” using this Measure as the Condition • Hint: Set the aggregation to count • Turn off the easyTravel load and verify your Incident opens.
React to Incidents • Receive Notifications • Take a look at the Incident Dashboards • View Incident Details • Drill down to PurePath • Open corresponding Dashboard • Open Configuration Overview
Receive Notification • Via Email • Popup in dynaTrace client • Integrated Incident Dashlet
Incident Dashboard Ordered by
Hands On: Diagnose and Triage Incidents • Goals • We want to quickly troubleshoot the slow Search transactions that caused the Incident in the previous hands-on exercise to fire. • We need to pass the data on to the Development Team for triage • Steps • Use the Incident Chart Dashlet’s table on your easyTravel Search Dashboard to drill into Incidents or open the Incidents Dashlet • Drill into the PurePaths – can you identify the problem? • Export the Incident Session so you can send it to the Development Team. • See what “Show Measures in Chart” and “Show in Dashboard” do