Fast Time Series Classification Using Numerosity Reduction DME Paper Presentation Jonathan Millin & Jonathan Sedar Fri 12th Feb 2010
Fast Time Series Classification Using Numerosity Reduction • Appearing in Proceedings of 23rd International Conference on Machine Learning 2006. • Authors: • Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, • Computer Science & Engineering Dept, UC Riverside, CA • Chorirat ‘Ann’ Ratanamahatana. • Dept of Computer Engineering, ChulalongkornUni, Bangkok • Cited by 34 papers (Google Scholar)
Fast Time Series Classification Using Numerosity Reduction Overview • High classification accuracy on time-series data is achieved using Dynamic Time Warping and a novel application of numerosity reduction to efficiently reduce computational complexity.
Fast Time Series Classification Using Numerosity Reduction Agenda • Introduction • Methods • Dynamic Time Warping • Numerosity Reduction • Adaptive Warping Window (AWARD) • Fast AWARD • Results • Discussion
Introduction Time Series Classification Time-Series Data Classification • Classifying through pattern matching
Methods Dynamic Time Warping What is Dynamic Time Warping? • Compare similar time series allowing for temporal skew:
Methods Dynamic Time Warping How does DTW Work? • Align series • Construct distance matrix • Find optimal warping path • Introduce warping window to reduce complexity
Methods Dynamic Time Warping DTW Performance Fig. 3 Figs. 4,5,7 Reported comparisons Test sets (shown later)
Methods Dynamic Time Warping DTW Vs Literature ControlChart • Xi et al. (2006) use 1NN-DTW: error rate 0.33% • Rodriguez & Alonso et al (2000) use 1st order logic rules with boosting: error rate 3.6% • Nanopolus & Alcock et al. (2001) use multi-layer perceptron NN: error rate 1.9% • Wu & Chang (2004) use ‘super kernel fusion’: error rate 0.79% • Chen & Kamel (2005) use ‘Static Minimization-Maximization approach’: best error rate 7.2% ECG • Xi et al. (2006) use 1NN-DTW and Euclidian Distance: ‘perfect accuracy’ • Kim & Smyth et al. (2004) use HMM: 98% accuracy Lighting (FORTE-2) • Xi et al. (2006) use 1NN-DTW: error rate 9.09% • Eads & Glocer et al. (2005) use grammar guided feature extraction: error rate 13.22%
Methods Dynamic Time Warping Dynamic Time Warping • DTW is ‘at least as accurate’ as Euclidean distance
Methods Dynamic Time Warping DTW gives great results, but • Naive implementation is computationally expensive • LB_Keogh reduces amortised cost to O(n) • At the limits of DTW algorithm optimisation • Look elsewhere for classification speed gains... ...Numerosity reduction
Methods Numerosity Reduction Techniques Numerosity Reduction Techniques • Naive Rank Reduction • Adaptive Warping Window (AWARD) • Fast Numerosity Reduction (FastAWARD)
Methods Numerosity Reduction: Naive Rank Reduction Naive Rank Reduction • Principle: remove instances in an order which minimises misclassifications. • Ranking (iterative O(n)) • Remove duplicates • Apply 1NN classification • Rank each x according to class of 1st NN • Break ties by proximity of nearest class • Thresholding • User defined, (keep n highest, best n%) x1 d1 x2 d2 x3 d3 x4 d4 x5 d3 >d4 > d2 > d1
Methods Numerosity Reduction: Naive Rank Reduction Naive Rank Reduction • Classification accuracy declines when the size of the dataset decreases • Larger r gives better accuracy on smaller datasets • Motivates adaptive window
Methods Numerosity Reduction: AWARD Adaptive Warping Window (AWARD) • What • Dynamically adjusting the window size during numerosity reduction • Why • Larger windows give better accuracy on smaller datasets • How • Initialise r to best warping size (exhaustive search r=1:100) • Begin Naïve Rank Reduction (shown earlier) • Tests accuracy of the reduced set with r and r+1 • If accuracy(r+1)>accuracy(r) then r++ • Problems • Provides a better accuracy during numerosity reduction, but the additional checks increase complexity from O(n) to O(n3)
Methods Numerosity Reduction: FastAWARD FastAWARD • What • Essentially AWARD, but uses the calculations from previous iterations to reduce complexity • Why • Reduce complexity to reduce execution time • How • performs incremental updates after each step to reduce complexity of future steps
Methods Numerosity Reduction: FastAWARD How - Storing information • Done by storing (for each i=r:100): • Nearest neighbour matrix (A) • Distance matrix (B) • Accuracy array (ACC) C ACC r r Q
Methods Numerosity Reduction: FastAWARD How – Incremental Updates • After each item is discarded: • Update A (Neighbors) • Update B (Distances) • Update ACC (Accuracy) • Check if ACC[r+1]>ACC[r] x1 x1 d1 d1 x2 x2 d2 bob dnew d3 x3 x3 d4 d4 x4 x4 d3 >d4 > d1 > d2 dnew> d1 > d3
Methods Recap Interim Recap • Dynamic Time Warping accounts for skew • Using AWARD numerosity reduction • FastAWARDvs AWARD ...Does it work?
Results Experimental Work Experiments (Accuracy)
Results Experimental Work Experiments
Results Experimental Work Experiments (Accuracy) • etc
Results Experimental Work Experiments (Efficiency) • Massive improvements in efficiency of numerosity reduction process
Results Experimental Work Experiments (Anytime Classification) • Etc
Discussion Summary Summary • 1NN-DTW is an excellent time series classifier • DTW is computationally expensive because of the number of pattern matches • DTW algorithm is at limits of optimisation • Improve speeds by reducing number of required matches • (Fast)AWARD adjusts the warping window with numerosity – increases accuracy • FastAWARD is several orders of magnitude faster than AWARD
Discussion Our Critique Our Critique • Two Patterns dataset seems cherry-picked • DTW model may necessitate bespoke pre-processing • RandomFixvsRankFix – very similar results • AWARD efficiency comparisons ignore initialisation effort and speed wasn’t compared to other methods (RT1, 2, 3) • Comparisons of r incomplete • Anytime classification experiments seem rigged in favour of AWARD
Discussion Our Critique Two Patterns dataset seems cherry-picked Fig. 3 Figs. 4,5,7 Reported comparisons Test sets (shown later)
Discussion Our Critique DTW model may necessitate bespoke pre-processing
Discussion Our Critique RandomFixvsRankFix - similar results
Discussion Our Critique AWARD efficiency comparisons ignore initialisation effort and speed wasn’t compared to other methods (RT1, 2, 3)
Discussion Our Critique Comparisons of r incomplete
Discussion Our Critique Anytime classification is rigged?
Q&A Thank You.