1 / 22

Identifying Patterns in Time Series Data

Identifying Patterns in Time Series Data. Daniel Lewis 04/06/06. Time Series Data. Definition: “An ordered set of m real-valued variables” How can patterns that occur in time series data be located?. Comparing Time Series. Euclidean Distance:. Adaptive Piecewise Constant Approximation.

Télécharger la présentation

Identifying Patterns in Time Series Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Identifying Patterns in Time Series Data Daniel Lewis 04/06/06

  2. Time Series Data • Definition: • “An ordered set of m real-valued variables” • How can patterns that occur in time series data be located?

  3. Comparing Time Series • Euclidean Distance:

  4. Adaptive Piecewise Constant Approximation • Segments of time series data are represented by 2 values, the mean value of all points in the segment and the right endpoint of the segment • Allows large queries to be quickly compared, but APCA representations must be created first.

  5. Adaptive Piecewise Constant Approximation (cont')

  6. APCA (cont') • Data Compression allows for faster search • Can be used for indexing large series • Can handle large queries • But what if we want to identify patterns in streaming time series data?

  7. Pattern Recognition • For a given query q and a time series s, if the Euclidean distance between q and s < r, the series match • r: user defined, application specific threshold

  8. Detecting Patterns in Streaming Data • Brute Force Method: • P1 ,.., Pn : Set of patterns of length k • S: Input Stream • For every possible substring of length k in s, calculate the distance between the substring and all n patterns • This method is obviously extremely costly in the case of a large pattern set and a large input stream • O(n(|S| - k))

  9. Speeding Up Pattern Identification • Early Abandoning: • If at any point error > r 2, we can stop computation

  10. Wedge Creation • Combine multiple patterns into a wedge: • Define Upper Limit: • Ui = max( C1i , .. , Cki) • Define Lower Limit: • Li = min( C1i , .. , Cki) • This produces a wedge such that:

  11. Wedge Creation (cont')

  12. Wedge Comparison • Distance Between a Query and a Wedge: • If distance > r, then the distance between the query and all component patterns > r, allowing you to eliminate multiple possible matches with a single comparison

  13. Hierarchical Wedges • The usefulness of any wedge is determined by the similarity of the patterns used in its construction. • More similar patterns create smaller, more useful wedges • Patterns can be combined in a tree-like pattern to produce a hierarchy of wedges

  14. Hierarchical Wedges (cont')

  15. Atomic Wedgie • Preparation: • All patterns are clustered by similarity • The most similar patterns are combined into wedges • The resulting wedges are combined to form larger, less specific wedges

  16. Atomic Wedgie (cont') • Usage: • When streaming data arrives, each substring of length k is first compared to the largest wedge, if dist > r, comparison stops, else, the distance is compared against the two component wedges, eliminating any branches where the distance exceeds r. • Eventually, all branches are eliminated or a single (atomic) pattern is matched

  17. Atomic Wedgie Optimization • Optimization: • Summation is order independent • Large sections are less likely to increase error than small sections • Thus, if error is summed starting with the smallest sections first, the requirements for early abandon are more likely to be met earlier

  18. Atomic Wedgie (cont') • Advantages: • If wedges are well formed, large speed increases can occur • A large number of similar possible patterns can be analyzed quickly • Disadvantages: • If wedges are poorly formed, the time required will exceed the Brute Force Method • Dissimilar patterns are not handled well

  19. Special Considerations • The choice of r (similarity threshold) is of great importance: • if r is too large, a substring can match too many patterns to be useful • if r is too small, too little matching may occur • Good Choice: • r = average distance from any pattern to its nearest neighbor

  20. Atomic Wedgie Results

  21. Chakrabarti, K., Keogh, E., Mehrotra, S., and Pazzani, M., Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases, ACM Transactions on Database Systems, Vol 27, 2002. Wei, L., Keogh, E., Van Herle, H., Mafra-Neto, A., Atomic Wedgie: Efficient Query Filtering for Streaming Times Series, Data Mining, Fifth IEEE International Conference on 27-30 Nov. 2005 Page(s):490 - 497 References

  22. Questions?

More Related