1 / 18

Abdullah Mueen Eamonn Keogh University of California, Riverside

Online Discovery and Maintenance of Time Series Motifs. Abdullah Mueen Eamonn Keogh University of California, Riverside. Time Series Motifs. Repeated pattern in a time series Utility: Activity/Event discovery Summarization Classification Application

michi
Télécharger la présentation

Abdullah Mueen Eamonn Keogh University of California, Riverside

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Online Discovery and Maintenance of Time Series Motifs Abdullah Mueen Eamonn Keogh University of California, Riverside

  2. Time Series Motifs • Repeated pattern in a time series • Utility: • Activity/Event discovery • Summarization • Classification • Application • Data Center Chiller Management (Patnaik, et al. KDD 09) • Designing Smart Cane for Elders (Vahdatpour, et al. IJCAI 09) 30 20 10 0 2000 4000 6000 8000 0

  3. Online Time Series Motifs • Streaming time series • Sliding window of the recent history • What minute long trace repeated in the last hour?

  4. Problem Formulation Discovery 1 2 3 4 5 6 7 8 . . . 873 100 200 300 400 500 600 700 800 900 1000 The most similar pair of non-overlapping subsequences -7000 -7500 -8000 . . . 1 2 3 4 5 6 7 8 . . . 873 874 Maintenance time:1005 time:1003 time:1004 time:1002 time:1001 1 2 3 4 5 6 7 8 . . . 873 874 875 1 2 3 4 5 6 7 8 . . . 873 874 875 876 1 2 3 4 5 6 7 8 . . . 873 874 875 876 877 1 2 3 4 5 6 7 8 . . . 873 874 875 876 877 878 Update motif pair after every time tick time

  5. Challenge • A subsequence is a high dimensional point • The dynamic closest pair of points problem • Closest pair may change upon every update • Naïve approach: Do quadratic comparisons. 3 3 3 4 4 4 1 1 7 7 7 Deletion Insertion 6 6 6 5 5 5 8 8 8 2 2 2 9

  6. Our Approach • Goal: Algorithm with Linear update time • Previous method for dynamic closest pair (Eppstein,00) • A matrix of all-pair distances is maintained • O(w2) space required • Quad-tree is used to update the matrix • Maintain a set of neighbors and reverse neighbors for all points • We do it in O( ) space

  7. Outline • Methods • Evaluation • Extensions • Case Studies • Conclusion

  8. Maintaining Motif • Smallest nearest neighbor → Closest pair • Upon insertion • Find the nearest neighbor; Needs O(w) comparisons. • Upon deletion • Find the next NN of all the reverse NN 3 3 3 4 4 4 1 1 1 7 7 7 6 6 6 5 5 5 8 8 8 2 2 2 9 9 Insertion Deletion

  9. Data Structure 7 8 6 4 5 1 3 2 Total number of nodes in R-lists is w R-lists 4 Points that should be updated 7 4 8 7 1 2 8 1 2 3 4 7 5 1 7 1 3 3 6 FIFO Sliding window (ptr, distance) 1 2 3 4 5 6 7 8 1 7 5 7 6 5 7 2 4 5 6 5 2 1 4 3 8 7 5 7 N-lists Still O(w2) space Neighbors in order of distances 3 4 8 2 4 5 2 1 8 2 8 6 1 8 3 1 8 3 6 3 5 6 6 4 6 4

  10. Observations • While inserting • Updating NN of old points is not necessary • A point can be removed from the neighbor list if it violates the temporal order Average length O( )

  11. Example 4 6 8 7 7 5 9 4 3 4 5 6 7 8 9 10 3 8 3 3 3 10 4 4 4 3 4 3 3 6 6 5 2 5 6 4 5 4 7 8 7 1 2 3 4 5 6 7 8 7 7 7 1 1 6 6 6 5 8 1 1 1 2 3 1 2 5 5 5 6 9 2 3 4 5 3 6 8 8 8 2 2 2 8 6 4 7 5 4 5 10 3 7 9 6 9 9 2 3 4 5 6 7 8 9 1 2 1 2 3 2 3 3 2 6 4 5 4 6 8 5 7 6

  12. Evaluation 0.7 • Up to 8x speedup from general dynamic closest pair • Stable space cost per point with increasing window size RW 0.18 350 0.6 RW 8x 0.16 110 300 0.14 0.5 100 Fast Pair 250 EOG Insect 0.12 90 Insect Fast Pair 0.4 Average Update Time (Sec) RW Average Update Time (Sec) 0.1 200 80 EOG 0.3 EOG 0.08 70 RW 150 Average Length of N-list EEG EOG EEG 60 0.2 0.06 Average Length of N-list 100 Insect Insect 50 0.04 0.1 40 EEG 50 0.02 0 30 EEG 0 0 0 200 400 600 800 1000 20 0 0.5 1 1.5 2 2.5 3 3.5 4 Motif Length (m) 0 200 400 600 800 1000 10 x 104 Window Size (w) Motif Length (m) 0 0.5 1 1.5 2 2.5 3 3.5 4 x 104 Window Size (w)

  13. Outline • Methods • Evaluation • Extensions • Case Studies • Conclusion

  14. Extension • Multidimensional Motif • Maintain motif in Multiple Synchronous time series • Points in one series can have neighbors in others • Arbitrary Data Rate • Load shedding: Skip points if can’t handle 1.0 0.9 0.8 Fraction of Motifs Discovered 0.7 0.6 EOG Random Walk EEG 0.5 Insect 0.4 FIFO for each streams 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Fraction of Data Used

  15. Raw PLA A MR B A B B B B B A 0 500 1000 1500 2000 Case Study 1: Online Compression • Replace all the occurrences of a motif by a pointer to that motif. • For signals with regular repetitions • Higher compression rate with less error. Dictionary

  16. Case Study 2: Closing The Loop • Check if a robot closed a loop • Convert the stream of video frames to stream of color histograms. • Most similar color histograms are good candidates for loop detection. 16 12 8 4 16 Start Loop Closure Detected 0 12 0 400 800 1200 1600 8 4 0 0 400 800 1200 1600

  17. Conclusion • First attempt to maintain time series motif online • Maintains minutes long repetition in hours long sliding window • Linear update time with less space cost than quad-tree based method • O( ) Vs O(w2) • Faster than general dynamic closest pair solution

  18. Thank You Code and Data: http://www.cs.ucr.edu/~mueen/OnlineMotif

More Related