A Framework for Discovering Anomalous Regimes in Multivariate Time-Series Data with Local Models

This framework aims to discover anomalous regimes in time-series data by estimating local models and comparing them to a reference dataset. It provides a hypothesis testing framework for detecting anomalies and has shown promising results on real and synthetic problems.

Presentation Transcript


  1. A Framework for Discovering Anomalous Regimes in Multivariate Time-Series Data with Local Models • Stephen Bay, Stanford University and the Institute for the Study of Learning and Expertise, sbay@apres.stanford.edu • Joint work with Kazumi Saito, Naonori Ueda, and Pat Langley

  2. Discovering Anomalous Regimes Problem: Discover when a section of an observed time series has been generated by an anomalous regime. • Anomalous: extremely rare or unusual • Regime: the hypothetical true model generating the observed data

  3. Motivation • Variables are causally related • Several different modes (figure: traces of charge, voltage, temperature, and current; image sources: nasa.gov, www.ndi.org)

  4. Other Categories of Irregularities • Outliers • Unusual patterns

  5. Discovering Anomalous Regimes in Time Series: the DARTS Framework • (1) Reference data R and test data T • (2) Local models: estimate on windows, map into parameter space • (3) Parameter space • (4) Anomaly score: estimate the density of T according to R, compute a threshold

  6. Local Models • Vector autoregressive models • Regression format • Ridge regression
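The slide's equations did not survive in the transcript, so the following is only a minimal sketch of what fitting local models could look like, not the authors' implementation. It assumes one target variable is regressed on the lagged values of all variables inside each window, with an L2 (ridge) penalty that is also applied to the intercept for simplicity. The names `solve`, `ridge_fit`, `local_models`, and the parameters `order`, `window`, `step`, and `lam` are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge regression: solve (X^T X + lam * I) w = X^T y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)

def local_models(series, target, order, window, step, lam=1e-3):
    """Fit one autoregressive model per window.

    series: list of time steps, each a list of variable values.
    Returns one coefficient vector (a point in parameter space) per window.
    """
    models = []
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        X, y = [], []
        for t in range(order, len(chunk)):
            lags = [v for k in range(1, order + 1) for v in chunk[t - k]]
            X.append(lags + [1.0])  # trailing 1.0 is the intercept term
            y.append(chunk[t][target])
        models.append(ridge_fit(X, y, lam))
    return models

# Synthetic two-variable system: y[t] = 0.5 * y[t-1] + 0.3 * u[t-1]
u = [math.sin(0.3 * t) for t in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[t - 1] + 0.3 * u[t - 1])
series = [[y[t], u[t]] for t in range(200)]
models = local_models(series, target=0, order=1, window=50, step=25, lam=1e-8)
```

Each entry of `models` is a coefficient vector, so every window becomes a single point in parameter space; on this noise-free example each point should sit close to (0.5, 0.3, 0).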

  7. Scoring and Density Estimation • Estimate the density of local models from T relative to R in the parameter space • Kernels • NN style
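The transcript names kernel and nearest-neighbor ("NN style") density estimates but does not give the scoring function. As an illustration only, the sketch below assumes the anomaly score of a test model is its distance to the k-th nearest reference model in parameter space, so a larger score means lower estimated density. The function names and the choice `k=3` are hypothetical.

```python
import math

def knn_score(point, reference, k=3):
    """Distance from `point` to its k-th nearest neighbor in `reference`."""
    dists = sorted(math.dist(point, r) for r in reference)
    return dists[k - 1]

def score_models(test_models, ref_models, k=3):
    """Score every test model against the reference models."""
    return [knn_score(m, ref_models, k) for m in test_models]

# Reference models form a tight cluster in a 2-D parameter space.
ref_models = [[0.5 + 0.01 * i, 0.3 - 0.01 * i] for i in range(10)]
# One test model inside the cluster, one far away from it.
test_models = [[0.52, 0.28], [0.9, -0.2]]
scores = score_models(test_models, ref_models)
```

This brute-force version compares each test model with every reference model; a KD-tree would replace the sort when the reference set is large.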

  8. Determining a Null Distribution • The score function provides a continuous estimate, but some tasks require a hard cutoff • Null distribution: the distribution of anomaly scores we would expect to see if the data were completely normal • Resample R and generate an empirical distribution from block cross-validation • Provides a hypothesis testing framework for sounding alarms (figure: empirical distribution of the anomaly score)
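One way to read the block cross-validation step is sketched below, under assumptions not stated on the slide: the reference models are split into contiguous blocks, each block is scored against the remaining blocks, and the pooled scores form the empirical null distribution whose upper quantile serves as the alarm threshold. The k-th nearest-neighbor score, the number of blocks, and `alpha` are hypothetical choices, and the sketch does not address overlap between adjacent windows.

```python
import math
import random

def kth_nn_distance(point, reference, k):
    """Distance from `point` to its k-th nearest neighbor in `reference`."""
    return sorted(math.dist(point, r) for r in reference)[k - 1]

def null_scores(ref_models, n_blocks=5, k=3):
    """Score each contiguous block of reference models against the rest."""
    n = len(ref_models)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    scores = []
    for lo, hi in zip(bounds, bounds[1:]):
        held_out = ref_models[lo:hi]
        rest = ref_models[:lo] + ref_models[hi:]
        scores.extend(kth_nn_distance(m, rest, k) for m in held_out)
    return scores

def threshold(scores, alpha=0.05):
    """Empirical (1 - alpha) quantile of the null scores."""
    ordered = sorted(scores)
    idx = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[max(0, min(idx, len(ordered) - 1))]

# Hypothetical reference models scattered around (0.5, 0.3).
rng = random.Random(0)
ref_models = [[0.5 + rng.gauss(0, 0.05), 0.3 + rng.gauss(0, 0.05)]
              for _ in range(40)]
null = null_scores(ref_models)
cutoff = threshold(null, alpha=0.05)

# Raise an alarm for a test model whose score exceeds the cutoff.
alarm = kth_nn_distance([1.5, -0.7], ref_models, 3) > cutoff
```

Holding out whole blocks rather than single models keeps neighboring windows, which tend to resemble each other, from making the null scores look artificially small.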

  9. Computation Time • Local models: linear in N (reference and test); cubic in the number of variables (for AR); linear in window size (for AR) • Density estimation: implemented with KD-trees; potentially O(N_T log N_R); can be worse in higher dimensions

  10. Experiments • Why evaluation is difficult • Data sets: CD Player, Random Walk, ECG Arrhythmia, Financial Time-Series • Comparison algorithm: Hotelling’s T² statistic

  11. Hotelling’s T² Statistic • Commonly used in statistical process control for monitoring multivariate processes • Basically the same as Mahalanobis distance • Applied with time lags for patient monitoring in multivariate data (Gather et al., 2001)
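For a single observation x, the statistic can be written as T² = (x − μ)ᵀ S⁻¹ (x − μ), where μ and S are the mean and covariance estimated from reference data; this is the squared Mahalanobis distance. The sketch below is a minimal pure-Python illustration of that formula, not the configuration used in the experiments. It omits time lags, and the helper names are hypothetical.

```python
def mean_and_cov(data):
    """Sample mean vector and (n - 1)-normalized covariance matrix."""
    n, p = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def hotelling_t2(x, mu, cov):
    """T^2 = d^T S^{-1} d with d = x - mu, computed without forming S^{-1}."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)
    return sum(di * zi for di, zi in zip(d, z))

# Reference data with positively correlated variables.
reference = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
mu, cov = mean_and_cov(reference)

t2_along = hotelling_t2([4.5, 4.5], mu, cov)    # follows the correlation
t2_against = hotelling_t2([4.5, 0.5], mu, cov)  # breaks the correlation
```

Both test points lie at the same Euclidean distance from the mean, but the one that breaks the correlation between the variables receives the larger T² (about 12 versus about 3 here).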

  12. CD Player • Data from the mechanical arm of a CD player • Two inputs relating to actuators (u1, u2) • Two outputs relating to position accuracy (y1, y2)

  13. Output variable y1: artificial anomaly

  14. Output variable y2: unchanged

  15. Hotelling’s T²

  16. Random Walk • No anomalies in random walk data

  17. DARTS

  18. Hotelling’s T²

  19. Cardiac Arrhythmia Data • Electrocardiogram traces from MIT-BIH • Collected to study cardiac dynamics and arrhythmias • Every beat annotated by two cardiologists • 30-minute recording at 360 Hz • Roughly 650,000 points, 2000 beats • Points 100-3000 form the reference set • The remainder is test data

  20. Cardiac Reference Data

  21. DARTS (figure; beat annotations: V, a, a)

  22. Hotelling’s T² (figure; beat annotations: V, a, a)

  23. DARTS (figure; beat annotations: j, j, j)

  24. DARTS (figure; beat annotation: a)

  25. TP/FP Statistics • Sensitivity = TP / (TP + FN) • Selectivity = TP / (TP + FP)
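The two ratios above can be computed from per-beat alarm flags and ground-truth annotations as in the short sketch below. The function name and the example flags are made up for illustration and are not results from the presentation.

```python
def sensitivity_selectivity(predicted, actual):
    """Return (sensitivity, selectivity) from parallel lists of 0/1 flags.

    sensitivity = TP / (TP + FN): fraction of true anomalies that were flagged.
    selectivity = TP / (TP + FP): fraction of alarms that were true anomalies.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    selectivity = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, selectivity

# Made-up flags: 1 = anomalous beat, 0 = normal beat.
predicted = [1, 1, 0, 0, 1]
actual = [1, 0, 1, 0, 1]
sens, sel = sensitivity_selectivity(predicted, actual)
```

Selectivity as defined on the slide is the quantity more commonly called precision or positive predictive value.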

  26. Japanese Financial Data • Monthly data from 1983 to 2003 • Variables: monetary base, national bond interest rate, wholesale price index, index of industrial production, machinery orders, yen/dollar exchange rate • True anomalies unknown • Subjective evaluation by an expert

  27. DARTS: Bond Rate

  28. DARTS: Monetary Base

  29. DARTS: Wholesale Price Index

  30. DARTS: Index of Industrial Production

  31. DARTS: Machinery Orders

  32. Hotelling’s T²

  33. Hotelling’s T² vs. DARTS • T² can detect multivariate changes, but it: • Has little selectivity • Does not distinguish between variables • Does not handle drifts • Relies on an F-statistic test that often grossly underestimates the proper threshold

  34. Limitations of DARTS • Suitability of local models • Window-size and sensitivity • Number of parameters • Overlapping data • Efficiency of KD-tree • Explanation

  35. Related Work • Limit checking • Discrepancy checking • Autoregressive models • Unusual patterns • HMMs

  36. Conclusions • DARTS framework • Data -> local models -> parameter space -> density estimate • Provides hypothesis testing framework for flagging anomalies • Promising results on a variety of real and synthetic problems

  37. DARTS Framework • (1) Preprocess R and T • (2) Select a target variable and create local models from R • (3) Create local models from T • (4) Compare models of T to R in parameter space P • (5) Compute the null distribution • (6) Repeat steps 2-5 for each variable
