
Time Series Shapelets: A New Primitive for Data Mining

Time Series Shapelets: A New Primitive for Data Mining. Lexiang Ye and Eamonn Keogh, University of California, Riverside.


Presentation Transcript


  1. Time Series Shapelets: A New Primitive for Data Mining • Lexiang Ye and Eamonn Keogh • University of California, Riverside

  2. Classification • Huge interest in time series • Extensive applications • Nearest Neighbor • Most accurate (in extensive empirical tests) • Robust • Simple

  3. Drawbacks of Nearest Neighbor • High time and space complexity • Results are not interpretable

  4. Solution • Shapelets • Shapelets are time series subsequences that are maximally representative of a class • Distinguishing substring selection • Probe design (computational biology)

  5. Motivating example

  6. [Figure: false nettles vs. stinging nettles. Shapelet Dictionary with shapelet I (value 5.1); Decision Tree with node I, yes/no branches, and leaves 0 and 1]

  7. Brute-Force Algorithm

  8. Extract subsequences of all possible lengths [Figure: candidates pool]
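The extraction step on this slide can be sketched as a sliding window over every allowed length. This is a minimal illustration, not the authors' code; the function name `extract_candidates` and the `min_len`/`max_len` parameters are assumptions introduced here.

```python
def extract_candidates(series, min_len, max_len):
    """Return every contiguous subsequence whose length is in [min_len, max_len].

    Hypothetical helper for illustration; the slides do not name this function.
    """
    candidates = []
    for length in range(min_len, max_len + 1):
        # A window of this length fits at len(series) - length + 1 start positions.
        for start in range(len(series) - length + 1):
            candidates.append(series[start:start + length])
    return candidates


pool = extract_candidates([1.0, 2.0, 3.0, 4.0], min_len=2, max_len=4)
```

For a series of length 4 this yields three candidates of length 2, two of length 3, and one of length 4.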

  9. Testing the utility of a candidate shapelet • Information gain • Arrange the time series objects by their distance to the candidate • Find the optimal split point • Pick the candidate achieving the best utility as the shapelet [Figure: objects arranged on a line from 0, with the split point marked]
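The utility test on this slide can be sketched as follows: sort the objects by distance to the candidate, try each gap between neighbors as a split point, and keep the split with the highest information gain. The names `entropy` and `best_split`, and the choice of the midpoint as threshold, are assumptions made for this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(distances, labels):
    """Return (information gain, threshold) of the best split point.

    Objects are ordered by distance to the candidate; every gap between
    two distinct neighboring distances is tried as a split point.
    """
    order = sorted(zip(distances, labels))
    ordered_labels = [lab for _, lab in order]
    n = len(order)
    base = entropy(ordered_labels)
    best_gain, best_threshold = 0.0, None
    for i in range(1, n):
        if order[i - 1][0] == order[i][0]:
            continue  # no threshold can separate equal distances
        left, right = ordered_labels[:i], ordered_labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain = gain
            best_threshold = (order[i - 1][0] + order[i][0]) / 2
    return best_gain, best_threshold


gain, threshold = best_split([0.5, 1.0, 4.0, 5.0], ["a", "a", "b", "b"])
```

In this toy input the two classes separate cleanly, so the best split falls midway between 1.0 and 4.0 and recovers the full one bit of entropy.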

  10. Candidates Pool Problem • Total number of candidates • Trace dataset • 200 instances, each of length 275 • 7,480,200 shapelet candidates • Approximately three days of running time
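The candidate count on this slide can be checked with a short calculation. The slide does not state the length range; the sketch below assumes lengths 3 through 275, an assumption that happens to reproduce the quoted figure. The function name `count_candidates` is hypothetical.

```python
def count_candidates(num_series, series_len, min_len, max_len):
    """Count all subsequences with length in [min_len, max_len] across a dataset."""
    # A series of length n has n - l + 1 subsequences of length l.
    per_series = sum(series_len - l + 1 for l in range(min_len, max_len + 1))
    return num_series * per_series


# Assumed length range 3..275 (not stated on the slide).
total = count_candidates(num_series=200, series_len=275, min_len=3, max_len=275)
print(total)  # 7480200
```

Each series contributes 273 + 272 + ... + 1 = 37,401 candidates, and 200 × 37,401 = 7,480,200.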

  11. Speedup • Distance calculations from time series objects to shapelet candidates are the most expensive part • Reduce the time in two ways • Distance Early Abandon (known idea) • Admissible Entropy Pruning (novel idea)
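Distance early abandoning can be sketched as below: while accumulating the squared differences for one alignment, stop as soon as the partial sum can no longer beat the best alignment found so far. This is an illustrative sketch using squared Euclidean distance without normalization; the function name `subsequence_distance` is an assumption, and the original implementation may differ in details.

```python
def subsequence_distance(series, candidate):
    """Smallest squared Euclidean distance between `candidate` and any
    same-length window of `series`, with early abandoning."""
    best = float("inf")
    m = len(candidate)
    for start in range(len(series) - m + 1):
        total = 0.0
        for j in range(m):
            diff = series[start + j] - candidate[j]
            total += diff * diff
            if total >= best:
                break  # early abandon: this window cannot improve on `best`
        else:
            # Inner loop finished without abandoning, so this window is the new best.
            best = total
    return best


d_exact = subsequence_distance([0.0, 1.0, 2.0, 3.0, 4.0], [2.0, 3.0])
d_near = subsequence_distance([0.0, 1.0, 2.0, 3.0, 4.0], [2.5, 3.5])
```

The result is identical to the full computation; only the amount of work per window changes.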

  12. Admissible Entropy Pruning

  13. Admissible Entropy Pruning • Information Gain • Traditional evaluation in decision tree • Easily generalized to the multi-class problem • Reduce the number of distance calculations
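One way to read the pruning idea: after computing distances for only some objects, bound the best achievable information gain by placing each class's remaining objects at whichever end of the distance line helps most. If that optimistic bound cannot beat the best candidate seen so far, the remaining distance calculations can be skipped. The sketch below follows that reading under stated assumptions; the names `optimistic_gain`, `max_gain`, and `remaining_counts` are hypothetical, and trying every left/right assignment per class is an illustrative choice, not a confirmed detail of the authors' method.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def max_gain(distances, labels):
    """Best information gain over all split points on the distance line."""
    order = sorted(zip(distances, labels))
    ordered = [lab for _, lab in order]
    n, base, best = len(order), entropy(ordered), 0.0
    for i in range(1, n):
        if order[i - 1][0] == order[i][0]:
            continue  # equal distances cannot be separated by a threshold
        gain = (base - (i / n) * entropy(ordered[:i])
                - ((n - i) / n) * entropy(ordered[i:]))
        best = max(best, gain)
    return best


def optimistic_gain(known_distances, known_labels, remaining_counts):
    """Upper bound on gain: remaining objects of each class go to one end."""
    near = min(known_distances) - 1.0  # virtual position left of all known objects
    far = max(known_distances) + 1.0   # virtual position right of all known objects
    classes = list(remaining_counts)
    best = 0.0
    for ends in product((near, far), repeat=len(classes)):
        d, lab = list(known_distances), list(known_labels)
        for cls, pos in zip(classes, ends):
            d.extend([pos] * remaining_counts[cls])
            lab.extend([cls] * remaining_counts[cls])
        best = max(best, max_gain(d, lab))
    return best


bound = optimistic_gain([1.0, 2.0, 3.0, 4.0], ["a", "b", "a", "b"], {"a": 1, "b": 1})
best_so_far = 0.6  # hypothetical gain of the best candidate found earlier
can_prune = bound <= best_so_far
```

Here the known objects are already interleaved, so even the most favorable placement of the two remaining objects stays below the hypothetical best-so-far gain, and the candidate would be abandoned.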

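  14. [Figure: stinging nettles and false nettles arranged on a distance line starting at 0]

  15. [Figure: two arrangements on the distance line, with information gain I = 0.42 and I = 0.29]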

  16. Classification [Figure: false nettles vs. stinging nettles. Shapelet Dictionary with shapelet I (value 5.1); Decision Tree with node I, yes/no branches, and leaves 0 and 1]

  17. Experimental Evaluation

  18. Performance Comparison [Figure: two plots against |D|, the number of objects in the database (10 to 320). One shows running time in seconds (0 to 5 × 10^5) for Brute Force and Pruning; the other shows accuracy (0.80 to 1.00), annotated "Currently best published accuracy 91.1%"]

  19. Projectile Points

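  20. Arrowhead Decision Tree [Figure: Avonlea and Clovis projectile point outlines; Shapelet Dictionary with shapelet I (Clovis, value 11.24) and shapelet II (Avonlea, value 85.47); decision tree with nodes I and II and leaves 0, 1, 2]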

  21. Wheat Spectrography [Figure: one sample from each class; x-axis 0 to 1200, y-axis 0 to 1]

  22. Wheat Decision Tree [Figure: Shapelet Dictionary with shapelets I to VI; decision tree with nodes I to VI and leaves 0 to 6]

  23. The Gun/NoGun Problem [Figure: Gun and No Gun example series; Shapelet Dictionary with shapelet I (No Gun, value 38.94); Gun Decision Tree with node I and leaves 0 and 1]

  24. Gait Analysis

  25. Reduces the sensitivity of alignment [Figure: left toe and right toe series; shapelet I (Normal Walk, value 144.075); Walk Decision Tree with node I and leaves 0 and 1; plotted values 0.909, 0.902, 0.860, 0.535]

  26. Conclusions • Interpretable results • More accurate and robust • Significantly faster at classification

  27. Thank You • Questions? • All of the datasets are free to download at http://www.cs.ucr.edu/~lexiangy/shapelet.html • Code available upon request
