1 / 1

Time Series Representations

Time Series Representations. Normal Time Series. Details 2. Zoom in. Data Dictated. Model Based. Data Adaptive. Non Data Adaptive. Surprising Time Series. Winding. Dataset. A. B. C. (. The angular speed of reel 2. ). Singular Value Approximation. Sorted Coefficients.

kaiser
Télécharger la présentation

Time Series Representations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Time Series Representations Normal Time Series Details 2 Zoom in Data Dictated Model Based Data Adaptive Non Data Adaptive Surprising Time Series Winding Dataset A B C ( The angular speed of reel 2 ) Singular Value Approximation Sorted Coefficients Random Mappings Piecewise Aggregate Approximation Piecewise Polynomial Symbolic Trees Wavelets Spectral 0 50 0 1000 150 0 2000 2500 Overview Details 1 Piecewise Linear Approximation Adaptive Piecewise Constant Approximation Discrete Fourier Transform Discrete Cosine Transform Natural Language Strings Orthonormal Bi-Orthonormal Chebyshev Polynomials Symbolic Aggregate Approximation Non Lower Bounding Daubechies dbn n > 1 Interpolation Regression Haar Coiflets Symlets Hidden Markov Models Clipped Data Slope Based Statistical Models Value Based Grid Normal sequence Normal sequence Laughing and flailing hand Actor misses holster Briefly swings gun at target, but does not aim 0 100 200 300 400 500 600 700 NSF Career Award 0237918. IIS-0237918-001 University of California RiversideEamonn KeoghEfficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases To date, the vast majority of research on time series data mining has focused on similarity search, and to a lesser extent on clustering. We believe that these problems should now be regarded as essentially solved. In particular, there are now fast exact techniques for searching and clustering patterns under both the Euclidean distance and Dynamic Time Warping, the two most useful distance measures. However, from a knowledge discovery viewpoint, there are several important unsolved problems in time series data mining that are more interesting, important, and challenging. In this project, we are to addressing these problems. Our long-term goal is the creation of efficient algorithms to allow the extraction of knowledge in the form of patterns, anomalies, regularities and rules, from massive time series dataset • Our work has had a large impact on the state of the art of time series indexing and time series data mining. Some concrete examples include: • Our time series motif finding algorithm is being used to find video textures by Celly and Zordan, , and to find repeated motions in motion capture data by  Tanaka and Uehara . • Our time series anomaly detection algorithm is being used by the Aerospace Corp to monitor spacecraft telemetry, and by a joint Berkeley/Stanford group to monitor computer systems. A NASA white paper noted that "it has great promise for the future". • Our time series indexing technique (LB_Keoghindexing) has been expanded by us, and by many others, including groups that use it for Euclidean indexing, for subsequence matching, for indexing handwriting, for indexing multidimensional sequences, for indexing music and for indexing motion capture data. • Our papers in the area have been referenced more than a 1,000 times, see www.cs.ucr.edu/~eamonn/selected_publications.htm • We have being maintaining the UCR time series data mining archive, we have given test datasets and code to more than 400 research groups and individuals. Many high level representations of time series have been proposed for data mining. See the figure to the right for a hierarchy of all the various time series representations in the literature. One representation that the data mining community has not considered in detail is the discretization of the original data into symbolic strings. At first glance this seems a surprising oversight. There is an enormous wealth of existing algorithms and data structures that allow the efficient manipulations of strings. Such algorithms have received decades of attention in the text retrieval community, and more recent attention from the bioinformatics community. Some simple examples of “tools” that are not defined for real-valued sequences but are defined for symbolic approaches include hashing, Markov models, suffix trees, decision trees etc. The core of our contributions are based on a new symbolic representation of time series, called SAX (Symbolic Aggregate ApproXimation ). Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic representation that lower bound corresponding popular distance measures defined on the original data. As we have demonstrated, the latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation. …Indexing, Classification and Clustering (DMKD 2003) SAX has made contributions to… …Motif Discovery (SIGKDD 2003) • It is expected that this work will have a broad impact, because: • Time series are ubiquitous, occurring in virtually every human endeavor, including medicine, finance, entertainment etc. • The proposed work is very general, and has already made contributions to virtually every time series problem, including Visualization (VLDB 2004 and SIGKDD 2004a), Motif Discovery (SIGKDD 2003), Anomaly Detection (SIGKDD 2004a) and Indexing (DMKD 2003). …Visualization ( VLDB 2004 and SIGKDD 2004a) …Anomaly Detection (SIGKDD 2004b)

More Related