Efficient Retrieval of Time-Series Patterns from Databases

General problem Retrieval of time-series similar to a given pattern.

Example: Stock charts Database of time-series Pattern Retrieval results .92 .87 .86 .84

Example: Electrocardiogram Database of time-series Pattern Retrieval results .91 .87 .98 1.0

Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions } Contributions

Criteria for retrieval methods Gunopulos [2000]: • Work for erratic time-series • Accept any pattern • Find inexact matches • Work when some points are missing • Work on streaming data

Previous work • Feature choice • Similarity metrics • Indexing and retrieval

Previous work: Feature choice • Discrete Fourier transforms • Alphabets • Statistical features • Subsets of points

Previous work: Similarity metrics • Euclidean distance • Bounding rectangles • Envelope count • Aggregate similarity

Previous work: Indexing and retrieval • Advanced techniques: B-trees, R-trees, KD-trees, VP-trees, grids • Applied techniques: linear search with compression

Important points Choose “important” maxima and minima, and discard the other points. Example: Compressed series Original series

Definition of important points Important minimum • a_m is the minimum among a_i, …, a_j • a_i/a_m ≥ R and a_j/a_m ≥ R • R is a knob that determines compression rate

Definition of important points Important maximum • a_m is the maximum among a_i, …, a_j • a_m/a_i ≥ R and a_m/a_j ≥ R • R is a knob that determines compression rate
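The two definitions can be checked directly, point by point. The sketch below is a brute-force reading of them, with the conditions read as "ratio ≥ R"; the function names are hypothetical, and it assumes positive values and R > 1. It is meant to make the definition concrete, not to be fast.

```python
def is_important_minimum(a, m, R):
    """True if a[m] is an important minimum: some a[i] to the left and a[j]
    to the right are at least R times a[m], with a[m] minimal in between."""
    def side_ok(indices):
        for k in indices:
            if a[k] < a[m]:
                return False      # a[m] stops being the minimum of the segment
            if a[k] >= R * a[m]:
                return True       # found an end point with ratio >= R
        return False
    return side_ok(range(m - 1, -1, -1)) and side_ok(range(m + 1, len(a)))


def is_important_maximum(a, m, R):
    """Mirror image of is_important_minimum for maxima."""
    def side_ok(indices):
        for k in indices:
            if a[k] > a[m]:
                return False      # a[m] stops being the maximum of the segment
            if a[m] >= R * a[k]:
                return True
        return False
    return side_ok(range(m - 1, -1, -1)) and side_ok(range(m + 1, len(a)))


series = [10, 12, 8, 9, 15, 14, 20, 5, 6, 10]
minima = [m for m in range(len(series)) if is_important_minimum(series, m, 1.5)]
maxima = [m for m in range(len(series)) if is_important_maximum(series, m, 1.5)]
```

With R = 1.5, the local maximum 15 at index 4 is discarded because the later value 20 exceeds it before any point falls far enough below it. A larger R discards more points.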

Compression example Original series Compressed series

Compression algorithm • Linear time • Constant memory • Accepts streaming data • For a series with n values, compression time is 0.0133 n milliseconds (300 MHz PC, Visual Basic 6.0).
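One way to get the linear-time, constant-memory, streaming behavior listed above is a single pass that remembers only the current candidate extremum. The sketch below is an illustration, not the authors' Visual Basic implementation. The name `important_points`, the output tuple layout, and the decision to skip extrema at the boundaries of the series (which lack a qualifying point on one side) are assumptions. It assumes positive values and R > 1.

```python
def important_points(values, R):
    """Yield (index, value, kind) for important extrema in one pass.

    `values` may be any iterable, including a stream. Only two candidate
    points and a trend flag are stored.
    """
    lo = hi = None   # candidate minimum / maximum as (index, value)
    trend = 0        # 0: unknown, +1: seeking a maximum, -1: seeking a minimum
    for i, v in enumerate(values):
        if lo is None:
            lo = hi = (i, v)
            continue
        if trend <= 0 and v < lo[1]:
            lo = (i, v)
        if trend >= 0 and v > hi[1]:
            hi = (i, v)
        if trend <= 0 and v >= R * lo[1]:
            # The series has risen by a factor of R above the candidate minimum.
            if trend < 0:
                yield (lo[0], lo[1], "min")
            trend, hi = 1, (i, v)
        elif trend >= 0 and hi[1] >= R * v:
            # The series has fallen by a factor of R below the candidate maximum.
            if trend > 0:
                yield (hi[0], hi[1], "max")
            trend, lo = -1, (i, v)


series = [10, 12, 8, 9, 15, 14, 20, 5, 6, 10]
compressed = list(important_points(iter(series), 1.5))
```

Because the function is a generator over an iterator, each point is reported as soon as a later value confirms it, which is what allows it to run on streaming data.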

Retrieval • Retrieval of time-series similar to a given pattern. • Intuition: find a prominent feature in the pattern; find candidate segments with a similar feature; compare the similarity of the candidates to the pattern

Algorithm • Identify the prominent leg in the pattern • Retrieve similar legs from the database • Identify corresponding candidate segments • For each candidate segment, compute its similarity to the pattern • Output the candidates whose similarity is above the threshold

Important details • Use compressed pattern and compressed sequences in the retrieval process • The prominent feature is the leg having the greatest ratio of right end to left end • All legs in the database are indexed by their prominence, using a binary search tree
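The retrieval steps and details above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a sorted list searched with `bisect` stands in for the binary search tree, and the `tolerance` window, the `similarity` metric, and all function names are hypothetical. The slides do not specify the similarity metric used. Series and pattern are assumed to be already compressed to their important points, with positive values.

```python
from bisect import bisect_left, bisect_right


def legs(points):
    """End-point ratios (right end / left end) of consecutive legs."""
    return [right / left for left, right in zip(points, points[1:])]


def build_index(database):
    """Index every leg by prominence; sorted list stands in for the search tree."""
    entries = sorted((ratio, sid, pos)
                     for sid, points in database.items()
                     for pos, ratio in enumerate(legs(points)))
    keys = [ratio for ratio, _, _ in entries]
    return keys, entries


def similarity(ratios_a, ratios_b):
    """Placeholder metric (an assumption): mean agreement of leg ratios, in (0, 1]."""
    pairs = zip(ratios_a, ratios_b)
    return sum(min(x, y) / max(x, y) for x, y in pairs) / len(ratios_a)


def retrieve(pattern, database, index, tolerance=1.2, threshold=0.8):
    keys, entries = index
    p_legs = legs(pattern)
    # Step 1: the prominent leg has the greatest right-end / left-end ratio.
    k = max(range(len(p_legs)), key=p_legs.__getitem__)
    # Step 2: legs in the database with a similar prominence.
    first = bisect_left(keys, p_legs[k] / tolerance)
    last = bisect_right(keys, p_legs[k] * tolerance)
    results = []
    for _, sid, pos in entries[first:last]:
        # Step 3: align the matched leg with the pattern's prominent leg.
        start = pos - k
        if start < 0:
            continue
        candidate = legs(database[sid])[start:start + len(p_legs)]
        if len(candidate) < len(p_legs):
            continue
        # Steps 4 and 5: score the candidate and keep it if above the threshold.
        score = similarity(p_legs, candidate)
        if score >= threshold:
            results.append((score, sid, start))
    return sorted(results, reverse=True)


database = {
    "A": [10, 20, 15, 30, 12],
    "B": [5, 6, 5.5, 6.5, 6],
    "C": [8, 4, 8.2, 6, 12.5],
}
index = build_index(database)
results = retrieve([1, 2, 1.5, 3], database, index)
```

Each result is a `(similarity, series id, starting leg)` triple, sorted with the best match first. Only legs whose prominence falls inside the window are ever scored, which is the point of indexing legs by prominence.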

Alternative versions • Different prominence definitions • Different similarity metrics • The end-point ratio prominence usually gives the best empirical results.

Extended legs Similar sequence

Indexing on extended legs • Advantage: More accurate retrieval • Disadvantage: Larger index, more memory • If a compressed sequence has n legs: worst case n²/2 extended legs; average case (n · lg n) extended legs

Data sets • Stock charts: 60,000 points • Air and sea temperatures: 445,000 points • Wind speeds: 79,000 points • Electroencephalograms: 17,000 points • Electrocardiograms: 2,000 points

Patterns Compressed patterns with 4 to 27 legs Examples:

Retrieval time Retrieval time: 0.07 · m · k milliseconds, where m is the number of legs in the pattern and k is the number of candidates
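As a quick worked example of the retrieval-time estimate, reading the formula as the product 0.07 · m · k: a pattern with 10 legs and 100 candidates would take roughly 70 milliseconds. The helper name below is hypothetical, and the constant is the one measured on the hardware reported in the talk, so it will differ on other machines.

```python
def estimated_retrieval_ms(m_legs, k_candidates, ms_per_unit=0.07):
    """Estimated retrieval time in milliseconds: ms_per_unit * m * k."""
    return ms_per_unit * m_legs * k_candidates


estimate = estimated_retrieval_ms(10, 100)
```

The estimate grows linearly in both the pattern size and the number of candidates, so narrowing the candidate set through the index matters as much as keeping patterns short.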
