Exact Indexing of Dynamic Time Warping

Exact Indexing of Dynamic Time Warping. Dr Eamonn Keogh, University of California – Riverside, Computer Science & Engineering Department. What are time series? A time series is a collection of observations made sequentially in time.

Presentation Transcript


  1. Exact Indexing of Dynamic Time Warping. Dr Eamonn Keogh, University of California – Riverside, Computer Science & Engineering Department

  2. What are Time Series?
     • A time series is a collection of observations made sequentially in time.
     • Much useful information can be obtained by measuring data over time.
     • Time series occur in virtually every medical, scientific and business domain, and their patterns are commonly classified.
     • Measuring the similarity between two time series is at the heart of many time series data mining applications.

  3. What are the challenges of working with time series data?
     • Large amounts of data: how do we search quickly?
     • Different data formats
     • A subjective notion of similarity: how do we define similarity?
     • Are any solutions available?
       - We need a method that allows elastic shifting of the time axis, to accommodate sequences that are similar but out of phase.
     • Euclidean distance
       - the most popular approach for defining similarity and indexing time series data
       - a very brittle distance measure that cannot accurately compare two time series that are out of phase
     • Dynamic Time Warping (DTW)
       - based on dynamic programming, and proven to be a very reliable method
       - does not obey the triangle inequality, which has resisted attempts at exact indexing
       - "… performance on large databases may be a limitation."
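     To make the brittleness of Euclidean distance concrete, here is a minimal Python sketch comparing it with unconstrained DTW on a bump and a copy of the same bump shifted by two time steps. It assumes squared point-to-point costs with the total reported as a square root; the names `euclidean` and `dtw` are illustrative, not taken from the slides.

```python
import math


def euclidean(q, c):
    """Euclidean distance: point i of q is always compared with point i of c."""
    if len(q) != len(c):
        raise ValueError("Euclidean distance needs equal-length sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def dtw(q, c):
    """Unconstrained DTW with squared point costs, returned as a square root."""
    n, m = len(q), len(c)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]  # cumulative cost table
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return math.sqrt(g[n][m])


bump = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]  # the same shape, two steps later
print(euclidean(bump, shifted))  # about 3.162, i.e. sqrt(10)
print(dtw(bump, shifted))        # 0.0: the warping absorbs the shift
```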

  4. What are the challenges of working with time series data? (cont.)
     [Figure: Time Series A and Time Series B]
     • Classification experiment on the Cylinder-Bell-Funnel dataset
     • Training data consists of 10 exemplars from each class.
     • One-nearest-neighbor (1-NN) algorithm
     • Leave-one-out evaluation, averaged over 100 runs
     [Table: comparison of the two approaches (Euclidean distance vs. DTW)]
     • The results demonstrate the reliability of DTW and motivate the need for a technique to index DTW.
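     The leave-one-out 1-NN protocol above can be sketched as follows. This is not the Cylinder-Bell-Funnel experiment: the four-series dataset is a hypothetical toy example in which class A is a spike at two different positions and class B is a flat line at two different levels. `loocv_error` and the distance functions are illustrative names, and squared point costs are assumed inside DTW.

```python
import math


def euclidean(q, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def dtw(q, c):
    """Unconstrained DTW with squared point costs, returned as a square root."""
    n, m = len(q), len(c)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return math.sqrt(g[n][m])


def loocv_error(series, labels, dist):
    """Leave-one-out error rate of a one-nearest-neighbor classifier."""
    mistakes = 0
    for i, q in enumerate(series):
        # Classify series i by its nearest neighbor among all the other series.
        others = [j for j in range(len(series)) if j != i]
        nearest = min(others, key=lambda j: dist(q, series[j]))
        if labels[nearest] != labels[i]:
            mistakes += 1
    return mistakes / len(series)


# Hypothetical toy data: class "A" is a spike at two different positions,
# class "B" is a flat line at two different levels.
series = [[0, 3, 0, 0, 0, 0], [0, 0, 0, 0, 3, 0],
          [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2]]
labels = ["A", "A", "B", "B"]
print(loocv_error(series, labels, euclidean))  # 0.5
print(loocv_error(series, labels, dtw))        # 0.0
```

     On this toy data the Euclidean classifier misclassifies both spikes (each is closer to the flat line at level 1 than to the other, shifted spike), while DTW aligns the two spikes at zero cost.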

  5. What is Dynamic Time Warping?
     [Figure: two series in different time phases; shifting of the time axis; the resulting alignment]
     • DTW is used in many areas, such as chemical engineering, pattern matching and bioinformatics.
     What is time warping?
     • Given: two sequences x_1, x_2, ..., x_n and y_1, y_2, ..., y_m
     • Wanted: align the two sequences on a common time axis
     Aligning time series with a dynamic programming matrix:
     • Given two time series Q and C, of length n and m respectively, an n-by-m matrix is constructed to store the distance between each pair of items in Q and C.
     [Figure: dynamic programming matrix with the optimal warping path]
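     The n-by-m matrix described above can be built directly. This sketch assumes the squared difference (q_i - c_j)^2 as the point distance d, which is one common choice; `distance_matrix` is an illustrative name, and Q and C may have different lengths.

```python
def distance_matrix(q, c):
    """n-by-m matrix whose cell (i, j) holds d(q_i, c_j) = (q_i - c_j) ** 2."""
    return [[(qi - cj) ** 2 for cj in c] for qi in q]


Q = [1.0, 2.0, 3.0]  # n = 3
C = [1.0, 3.0]       # m = 2
for row in distance_matrix(Q, C):
    print(row)
# [0.0, 4.0]
# [1.0, 1.0]
# [4.0, 0.0]
```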

  6. What is Dynamic Time Warping? (cont.)
     There are three basic constraints on time warping:
     • Boundary conditions
       - the path must not skip any part at the beginning or end of the utterance
     • Continuity
       - no jumps
     • Monotonicity
       - we can't go back in time
     • In the matrix, many warping paths satisfy the three basic constraints.
     Goal: how can we find the path that gives the minimal overall distance?
     Dynamic programming recurrence, where $\gamma(i,j)$ is the cumulative distance up to cell (i, j):
     $\gamma(i,j) = d(q_i, c_j) + \min\{\gamma(i-1,j-1),\ \gamma(i-1,j),\ \gamma(i,j-1)\}$
     Demonstration of computing the minimal edit distance: http://isl.ira.uka.de/speechCourse/slides/dtw/editdist/applet/applet.html
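     The recurrence can be filled in row by row, and one optimal warping path recovered by walking back from the last cell. A minimal sketch, assuming squared point costs; `dtw_with_path` and `satisfies_constraints` are hypothetical helper names, and the second one simply checks the three constraints listed on the slide.

```python
def dtw_with_path(q, c):
    """Fill the cumulative table gamma and recover one minimal-cost warping path."""
    n, m = len(q), len(c)
    inf = float("inf")
    # gamma[i][j] covers prefixes q[:i] and c[:j]; row 0 and column 0 are
    # padding, so the recurrence needs no special cases at the edges.
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    # Walk back from (n, m) to (1, 1), always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]  # stored as 0-based indices into q and c
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda cell: gamma[cell[0]][cell[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return gamma[n][m], path


def satisfies_constraints(path, n, m):
    """Check boundary conditions, continuity and monotonicity of a path."""
    if path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False  # boundary conditions: start and end at the corners
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        # Continuity forbids jumps; monotonicity forbids steps back in time.
        if (i1 - i0, j1 - j0) not in {(1, 0), (0, 1), (1, 1)}:
            return False
    return True


cost, path = dtw_with_path([0, 1, 2], [0, 2])
print(cost)                               # 1.0
print(path)                               # [(0, 0), (1, 0), (2, 1)]
print(satisfies_constraints(path, 3, 2))  # True
```

     Ties between predecessors are broken by list order, so when several paths share the minimal cost this returns one of them.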

  7. How to speed up the calculation of DTW?
     [Figure: the warping matrix of Q (index i, length n) and C (index j), with warping path elements w_1 … w_k inside a warping window around the diagonal]
     Basic idea: approximate the time series with a compressed or down-sampled representation, and perform DTW on the new representation.
     Solution: a lower bounding measure with a global path constraint.
     • What are global path constraints?
       - the path should stay close to the diagonal
       - in theory, a constraint limits how far the warping path may stray from the diagonal
       - in practice, it constrains the range of indices in the warping path
     • Why use global constraints?
       - to speed up the DTW distance calculation: with a window of fixed width r, the search effort drops from O(n²) to O(nr), i.e. linear in n
       - to prevent a relatively small section of one sequence from mapping onto a relatively large section of another sequence
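     A Sakoe-Chiba band of width r can be added to the same dynamic program by filling only the cells with |i - j| ≤ r. This sketch assumes equal-length sequences and squared point costs; `dtw_band` is an illustrative name. With r = 0 it reduces to the Euclidean distance, and widening the band can only lower (never raise) the result, since every path allowed by a narrow band is also allowed by a wider one.

```python
import math


def dtw_band(q, c, r):
    """DTW restricted to a Sakoe-Chiba band: cell (i, j) is used only if |i - j| <= r."""
    n = len(q)
    if len(c) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    inf = float("inf")
    # Cells outside the band stay inf; the full table is allocated only for simplicity.
    gamma = [[inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # At most 2r + 1 cells per row are computed, so the DP costs O(n * r).
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return math.sqrt(gamma[n][n])


bump = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]
for r in (0, 1, 2):
    print(r, dtw_band(bump, shifted, r))
# r = 0 gives the Euclidean distance sqrt(10); r = 2 is wide enough for a
# zero-cost alignment of the two-step shift.
```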

  8. How to speed up the calculation of DTW? (cont.)
     • What is a lower bounding measure?
       - a cheap approximation that never exceeds the true distance, typically used together with dimensionality reduction
     • Why use a lower bounding measure?
       - searching with either the Euclidean metric or DTW is either heavily I/O bound or very demanding in terms of CPU time
       - a fast lower bounding function addresses this problem by discarding sequences that cannot possibly be the best match
     • What makes a good lower bounding measure? It should satisfy two criteria:
       - it must be fast to compute
       - it must produce a relatively tight lower bound, i.e. it must closely approximate the true DTW distance
     • Two existing types of lower bounding measure:
       - [LB_Kim] [Figure: the squared differences between the two series' first (A), last (D), minimum (B) and maximum (C) points]
       - [LB_Yi] [Figure: the sum of the squared lengths of the gray lines represents the minimum contribution of the corresponding points to the overall DTW distance]
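     Both existing bounds can be written in a few lines. This sketch follows the slide's descriptions, with one assumption: LB_Kim is taken as the largest squared difference among the four features (first, last, minimum, maximum) rather than their sum, because the first point may also be the minimum, and summing would then count one alignment cost twice. LB_Yi sums the squared amount by which each point of C lies above max(Q) or below min(Q). Distances stay squared throughout (no square root), and `dtw_sq` is a hypothetical unconstrained DTW used only to check that both values stay below the true distance.

```python
def lb_kim(q, c):
    """Largest squared difference among the (first, last, min, max) features."""
    fq = (q[0], q[-1], min(q), max(q))
    fc = (c[0], c[-1], min(c), max(c))
    return max((a - b) ** 2 for a, b in zip(fq, fc))


def lb_yi(q, c):
    """Sum of squared excesses of the points of c above max(q) or below min(q)."""
    hi, lo = max(q), min(q)
    total = 0.0
    for x in c:
        if x > hi:
            total += (x - hi) ** 2  # x must be aligned with some value <= hi
        elif x < lo:
            total += (lo - x) ** 2  # x must be aligned with some value >= lo
    return total


def dtw_sq(q, c):
    """Unconstrained DTW with squared point costs (no final square root)."""
    n, m = len(q), len(c)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = (q[i - 1] - c[j - 1]) ** 2 + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g[n][m]


q = [0, 1, 2, 1, 0]
c = [2, 3, 0, -1, 2]
print(lb_kim(q, c))  # 4
print(lb_yi(q, c))   # 2.0
print(dtw_sq(q, c))  # 12.0
```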

  9. How to speed up the calculation of DTW? (cont.)
     Proposed lower bounding measure: LB_Keogh with a global constraint
     [Figure: bounding envelopes for LB_Keogh. A: envelope from the Sakoe-Chiba Band (global constraint); B: envelope from the Itakura Parallelogram (global constraint); Q: original sequence; U: upper envelope; L: lower envelope]
     • The global constraint limits the range of the warping path.
     • LB_Keogh then approximates the tightest lower bound.
     • The Itakura Parallelogram combined with LB_Keogh produces the tightest bounds.
     • LB_Keogh(Q, C) ≤ DTW(Q, C)
     • The lower bound is computed from every point of the candidate sequence C that falls outside the bounding envelope: the squared distance from that point to the nearest orthogonal edge of the envelope (U or L) is summed over all such points.
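     The envelope and the bound can be sketched as follows, assuming a Sakoe-Chiba band of width r (so U_i and L_i are the maximum and minimum of Q over positions i - r to i + r), equal-length sequences, and squared distances without a final square root. `envelope`, `lb_keogh` and `dtw_band_sq` are illustrative names. The bound holds because each c_j can only be aligned with points q_i inside its window, and every such q_i lies between L_j and U_j.

```python
def envelope(q, r):
    """Upper and lower envelopes of q for a Sakoe-Chiba band of width r."""
    n = len(q)
    windows = [q[max(0, i - r):min(n, i + r + 1)] for i in range(n)]
    return [max(w) for w in windows], [min(w) for w in windows]


def lb_keogh(q, c, r):
    """Squared distance from each point of c outside [L_i, U_i] to the nearest edge."""
    upper, lower = envelope(q, r)
    total = 0.0
    for x, u, l in zip(c, upper, lower):
        if x > u:
            total += (x - u) ** 2
        elif x < l:
            total += (x - l) ** 2
    return total


def dtw_band_sq(q, c, r):
    """Squared-cost DTW restricted to cells with |i - j| <= r (equal lengths)."""
    n = len(q)
    inf = float("inf")
    g = [[inf] * (n + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            g[i][j] = (q[i - 1] - c[j - 1]) ** 2 + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g[n][n]


q = [0, 1, 2, 1, 0]
c = [2, 3, 0, -1, 2]
print(envelope(q, 1))        # ([1, 2, 2, 2, 1], [0, 0, 1, 0, 0])
print(lb_keogh(q, c, 1))     # 5.0
print(dtw_band_sq(q, c, 1))  # 12.0
```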

  10. How to index Dynamic Time Warping?
     Piecewise Aggregate Approximation (PAA)
     Basic idea: represent the time series as a sequence of box basis functions, each box having the same length.
     • To reduce a time series from n dimensions to N dimensions, the data is divided into N equal-sized "frames", and each frame is represented by the mean of its values.
     • Why use PAA?
       - time series may contain hundreds to thousands of points, which rapidly degrades indexing performance
       - 16-dimensional data can be handled reasonably well by multi-dimensional index structures
       - a way is needed to further reduce the dimensionality of the LB_Keogh lower bound
       - PAA is the most efficient of the candidate techniques (wavelets, Fourier transforms, Adaptive Piecewise Constant Approximation)
     [Figure: a sequence of length 256 reduced to 16 dimensions]
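     PAA itself is one mean per frame. This sketch assumes N divides n exactly; `paa` is an illustrative name.

```python
def paa(series, n_frames):
    """Piecewise Aggregate Approximation: the mean of each of N equal-sized frames."""
    n = len(series)
    if n % n_frames != 0:
        raise ValueError("this sketch assumes N divides the series length")
    w = n // n_frames  # frame length
    return [sum(series[k * w:(k + 1) * w]) / w for k in range(n_frames)]


x = list(range(256))
reduced = paa(x, 16)  # 256 dimensions down to 16, as in the slide's figure
print(len(reduced))   # 16
print(reduced[:3])    # [7.5, 23.5, 39.5]
```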

  11. How to index Dynamic Time Warping? (cont.)
     Modified PAA to index time-warped queries [LB_PAA]
     • Two time series, Q and C, each of length n, are both reduced to N dimensions. C is a candidate sequence; Q is a query sequence.
     • For the candidate sequence C, approximate the minimum bounding rectangle (R) in each dimension.
     • For the query sequence Q, approximate the maximum (U^) and minimum (L^) of its envelope in each dimension.
     • These are then compared using LB_PAA.
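     LB_PAA can be sketched by reducing the envelope the same way as the data: U^ takes the maximum of U within each frame and L^ the minimum of L, and the PAA value of C in each frame is compared with them. The n/N factor compensates for each reduced dimension standing for n/N original points. This assumes a Sakoe-Chiba band of width r, N dividing n, and squared distances; all function names are illustrative.

```python
def envelope(q, r):
    n = len(q)
    windows = [q[max(0, i - r):min(n, i + r + 1)] for i in range(n)]
    return [max(w) for w in windows], [min(w) for w in windows]


def paa(series, n_frames):
    w = len(series) // n_frames  # frame length; assumes N divides n
    return [sum(series[k * w:(k + 1) * w]) / w for k in range(n_frames)]


def reduced_envelope(q, r, n_frames):
    """U^ (max of U) and L^ (min of L) within each of the N frames."""
    upper, lower = envelope(q, r)
    w = len(q) // n_frames
    u_hat = [max(upper[k * w:(k + 1) * w]) for k in range(n_frames)]
    l_hat = [min(lower[k * w:(k + 1) * w]) for k in range(n_frames)]
    return u_hat, l_hat


def lb_paa(q, c, r, n_frames):
    """Squared LB_PAA: compare PAA(c) with U^ and L^, scaled by n / N."""
    u_hat, l_hat = reduced_envelope(q, r, n_frames)
    total = 0.0
    for cbar, u, l in zip(paa(c, n_frames), u_hat, l_hat):
        if cbar > u:
            total += (cbar - u) ** 2
        elif cbar < l:
            total += (cbar - l) ** 2
    return (len(q) / n_frames) * total


def lb_keogh(q, c, r):
    """Squared LB_Keogh in the original n dimensions, for comparison."""
    upper, lower = envelope(q, r)
    return sum((x - u) ** 2 if x > u else (x - l) ** 2 if x < l else 0.0
               for x, u, l in zip(c, upper, lower))


q = [0, 1, 0, 1, 0, 1, 0, 1]
c = [1, 3, 1, 3, 0, 0, 0, 0]
print(lb_paa(q, c, 1, 2))  # 4.0
print(lb_keogh(q, c, 1))   # 8.0
```

     Here LB_PAA is looser than LB_Keogh (4.0 against 8.0), which is the price of working in 2 dimensions instead of 8.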

  12. How to index Dynamic Time Warping? (cont.)
     Modified PAA to index time-warped queries
     [Figure: the query envelope in the original dimension and in the reduced dimension, labelled U^4, U^5, L^2 and MBR bounds l4, l5, h2]
     • Define a function MINDIST(Q, R) that returns a lower bounding measure of the distance between a query Q and R, where R is a minimum bounding rectangle (MBR) of C.
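     MINDIST can be sketched by replacing the PAA value of a single candidate with the interval [l_i, h_i] of an MBR in each reduced dimension: a dimension contributes only when the whole interval lies above U^_i or below L^_i. The inputs below (a query's reduced envelope and two PAA vectors grouped into one MBR) are hypothetical numbers chosen for illustration; distances are squared, and `mindist` and `bounding_box` are illustrative names.

```python
def mindist(u_hat, l_hat, mbr_low, mbr_high, n):
    """Squared MINDIST(Q, R) between a query's reduced envelope and an MBR R.

    u_hat, l_hat: the query's U^ and L^ (one value per reduced dimension)
    mbr_low, mbr_high: the MBR's lower and upper corners, l_i and h_i
    n: length of the original sequences
    """
    n_frames = len(u_hat)
    total = 0.0
    for u, l, lo, hi in zip(u_hat, l_hat, mbr_low, mbr_high):
        if lo > u:
            total += (lo - u) ** 2  # the whole box lies above the envelope
        elif hi < l:
            total += (hi - l) ** 2  # the whole box lies below the envelope
    return (n / n_frames) * total


def bounding_box(points):
    """Per-dimension minimum and maximum of a group of PAA vectors."""
    return [min(p) for p in zip(*points)], [max(p) for p in zip(*points)]


# Hypothetical values: a query envelope reduced to N = 2 dimensions from n = 8,
# and two candidates' PAA vectors that share one MBR in the index.
u_hat, l_hat = [1, 1], [0, 0]
points = [[2.0, 0.0], [3.0, 0.5]]
low, high = bounding_box(points)
print(mindist(u_hat, l_hat, low, high, 8))               # 4.0 for the whole box
print([mindist(u_hat, l_hat, p, p, 8) for p in points])  # [4.0, 16.0] per point
```

     For a degenerate MBR holding a single point, MINDIST reduces to LB_PAA of that point; for a larger MBR it never exceeds LB_PAA of any point inside, which is what lets an index discard a whole node at once.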

  13. How to search time series with DTW?
     K-Nearest Neighbor Search Algorithm
     • What is K-NN search, KNNSearch(Q, K)?
       - given a query sequence Q, find the K nearest time series neighbors in a set C
       - a priority queue stores index entries in increasing order of distance
     RangeSearch Algorithm
     • What is the RangeSearch algorithm, RangeSearch(Q, E, T)?
       - answers range queries
       - a classic R-tree-style recursive search algorithm
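     The search logic can be sketched without an R-tree by treating the database as a flat list. K-NN search pops candidates from a priority queue in increasing order of LB_Keogh and stops once the next lower bound cannot beat the current k-th best exact distance; range search skips any candidate whose lower bound already exceeds the threshold. This flat version is a simplification of the index-based algorithms on the slide. It assumes a Sakoe-Chiba band of width r and squared distances (so the threshold `eps` is in squared units), and all names are illustrative.

```python
import heapq


def envelope(q, r):
    n = len(q)
    windows = [q[max(0, i - r):min(n, i + r + 1)] for i in range(n)]
    return [max(w) for w in windows], [min(w) for w in windows]


def lb_keogh(q, c, r):
    upper, lower = envelope(q, r)
    return sum((x - u) ** 2 if x > u else (x - l) ** 2 if x < l else 0.0
               for x, u, l in zip(c, upper, lower))


def dtw_band_sq(q, c, r):
    n = len(q)
    inf = float("inf")
    g = [[inf] * (n + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            g[i][j] = (q[i - 1] - c[j - 1]) ** 2 + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g[n][n]


def knn_search(q, database, k, r):
    """Exact K nearest neighbors under banded DTW, as sorted (distance, index) pairs."""
    # Priority queue of candidates in increasing order of their lower bound.
    queue = [(lb_keogh(q, c, r), idx) for idx, c in enumerate(database)]
    heapq.heapify(queue)
    best = []  # max-heap via negated distances: the k best exact results so far
    while queue:
        bound, idx = heapq.heappop(queue)
        if len(best) == k and bound >= -best[0][0]:
            break  # every remaining bound is at least this one: nothing can improve
        d = dtw_band_sq(q, database[idx], r)
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in best)


def range_search(q, database, eps, r):
    """All (distance, index) pairs with banded DTW <= eps (squared units)."""
    hits = []
    for idx, c in enumerate(database):
        if lb_keogh(q, c, r) > eps:
            continue  # pruned: the true distance is at least the bound
        d = dtw_band_sq(q, c, r)
        if d <= eps:
            hits.append((d, idx))
    return sorted(hits)


# Hypothetical integer-valued data, used only to exercise the two searches.
database = [[float((i * 7 + j * 3) % 5) for j in range(6)] for i in range(10)]
query = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(knn_search(query, database, 3, 1))
print(range_search(query, database, 10.0, 1))
```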

  14. Experimental Evaluation
     • Evaluation of three lower bounding measures (LB_Kim, LB_Yi, LB_Keogh)
       - tightness of the lower bound compared against query length
       - pruning power compared against database size
     • Evaluation of linear scan versus LB_Keogh
       - normalized CPU cost compared against data size
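     The two evaluation measures can be sketched as below. The definitions are assumptions, since the slide does not spell them out: tightness T is taken as the lower bound divided by the true DTW distance (both as square roots), and pruning power P as the fraction of objects that a sequential 1-NN scan can discard using the bound alone, so P computed this way depends on scan order. Function names are illustrative, and a Sakoe-Chiba band of width r is assumed.

```python
import math


def envelope(q, r):
    n = len(q)
    windows = [q[max(0, i - r):min(n, i + r + 1)] for i in range(n)]
    return [max(w) for w in windows], [min(w) for w in windows]


def lb_keogh(q, c, r):
    upper, lower = envelope(q, r)
    return sum((x - u) ** 2 if x > u else (x - l) ** 2 if x < l else 0.0
               for x, u, l in zip(c, upper, lower))


def dtw_band_sq(q, c, r):
    n = len(q)
    inf = float("inf")
    g = [[inf] * (n + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            g[i][j] = (q[i - 1] - c[j - 1]) ** 2 + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g[n][n]


def tightness(q, c, r):
    """T = lower bound / true DTW distance, both as square roots (so T <= 1)."""
    true_d = math.sqrt(dtw_band_sq(q, c, r))
    if true_d == 0.0:
        return 1.0  # convention assumed here: a zero distance counts as perfectly tight
    return math.sqrt(lb_keogh(q, c, r)) / true_d


def pruning_power(q, database, r):
    """P = objects needing no full DTW / all objects, in one sequential 1-NN scan."""
    best = float("inf")
    skipped = 0
    for c in database:
        if lb_keogh(q, c, r) >= best:
            skipped += 1  # the bound alone shows c cannot beat the best so far
            continue
        best = min(best, dtw_band_sq(q, c, r))
    return skipped / len(database)


q = [0, 1, 2, 1, 0]
c = [2, 3, 0, -1, 2]
print(tightness(q, c, 1))  # sqrt(5 / 12), about 0.65
print(pruning_power(q, [q, [10] * 5, [-10] * 5], 1))  # 2 of the 3 objects skipped
```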

  15. Conclusion
     • This paper overturns the traditional belief that "dynamic time warping ... cannot be speeded up by indexing".
     • However, it relies on two assumptions:
       - both time series are the same length (being out of phase is allowed)
       - sequences are indexed only when the warping path is constrained (boundary conditions, continuity, monotonicity, global constraint)
     • The proposed approach is state of the art in terms of efficiency and flexibility, and may benefit the matching of 2- and 3-dimensional shapes.

  16. Acknowledgements
     • Dr Eamonn Keogh (Computer Science & Engineering Department, University of California – Riverside, Riverside, CA 92521)
       - Exact Indexing of Dynamic Time Warping
       - A Tutorial on Indexing and Mining Time Series Data: http://www.cs.ucr.edu/~eamonn/tutorial_on_time_series.ppt
     • Carnegie Mellon University
       - Automatic Speech Recognition: http://werner.ira.uka.de/speechCourse
