Mining Time-Series Databases



Presentation Transcript


  1. Mining Time-Series Databases Mohamed G. Elfeky

  2. Introduction • A Time-Series Database is a database that contains data for each point in time. • Examples: • Weather Data • Stock Prices

  3. What to Mine? • Full Periodic Patterns • Every point in time contributes to the cyclic behavior of the time-series for each period. • e.g., describing the weekly stock price pattern over all the days of the week. • Partial Periodic Patterns • Describe the behavior of the time-series at some, but not all, points in time. • e.g., discovering that stock prices are high every Saturday and low every Tuesday.

  4. Mining Partial Periodic Patterns • Problem Definition • Methods • Apriori • Max-Subpattern Hit Set • Reference: Jiawei Han, Guozhu Dong, and Yiwen Yin – ICDE98

  5. Problem Definition • The time-series is: S = D1 D2 … Dn, where each Di is a set of features. • A pattern is: s = s1 … sp over the set of features L and the letter *. • |s| = p is the period of the pattern s. • The L-length of s is the number of positions si that are not *. • If s has L-length j, it is called a j-pattern. • A subpattern of s is: s’ = s’1 … s’p such that for each position i, s’i is either * or a subset of si.

  6. Problem Definition (Cont.) • Each segment of the form Di|s|+1 … Di|s|+|s| is called a period segment. • A period segment matches s if for each position j, either sj is * or sj is a subset of Di|s|+j. • The frequency count of s in a time-series S is the number of period segments of S that match s. • The confidence of s is its frequency count divided by the maximum number of periods of length |s| in S, that is, ⌊n/|s|⌋. • A pattern is called frequent if its confidence is not less than a minimum threshold.

  7. Problem Definition (Example) • The pattern a*{a,c}de is of length 5 and of L-length 4, so it is called a 4-pattern. • The patterns a*{a,c}** and **cde are subpatterns of the above pattern. • In the series a{b,c}baebaced, the pattern a*b, whose period is 3, has frequency count 2: the period segments are a{b,c}b, aeb, and ace, and the first two match. Its confidence is 2/3, where 3 is the maximum number of periods of length 3.
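
The definitions on slides 5 to 7 can be checked with a short Python sketch. The representation is an assumption made for illustration, not something the slides prescribe: each Di and each pattern position is a `frozenset` of letters, the empty set stands for `*`, and the names `matches` and `frequency_and_confidence` are hypothetical.

```python
from fractions import Fraction

STAR = frozenset()  # assumed encoding of the don't-care letter *

def matches(pattern, segment):
    # A segment matches when every pattern position is * (the empty set,
    # which is a subset of anything) or a subset of the segment's features.
    return all(p <= d for p, d in zip(pattern, segment))

def frequency_and_confidence(series, pattern):
    period = len(pattern)
    n_periods = len(series) // period  # maximum number of whole periods
    count = sum(
        matches(pattern, series[i * period:(i + 1) * period])
        for i in range(n_periods)
    )
    return count, Fraction(count, n_periods)

# The series a{b,c}baebaced from slide 7, one feature set per time point.
series = [frozenset(x) for x in ("a", "bc", "b", "a", "e", "b", "a", "c", "e", "d")]
pattern = (frozenset("a"), STAR, frozenset("b"))  # a*b

count, conf = frequency_and_confidence(series, pattern)
print(count, conf)  # 2 2/3
```

The trailing `d` of the series does not fill a whole period, so it is ignored, which is why the denominator is 3.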

  8. Apriori Method • Apriori Property: Each subpattern of a frequent pattern of period p is itself a frequent pattern of period p. • Method: • Find F1, the set of frequent 1-patterns of period p. • For i from 2 to p, generate the candidate i-patterns from the frequent (i−1)-patterns and keep those that are frequent; terminate when the candidate i-pattern set is empty.
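
A level-wise search in the spirit of this slide might look like the sketch below. It is a simplified illustration under stated assumptions rather than the authors' implementation: a pattern is encoded as a `frozenset` of (position, letter) pairs (positions not listed are `*`), levels are indexed by the number of such pairs, and the function names are hypothetical.

```python
from itertools import combinations

def period_segments(series, period):
    n = len(series) // period
    return [series[i * period:(i + 1) * period] for i in range(n)]

def count(pattern, segs):
    # Number of period segments containing every (position, letter) pair.
    return sum(all(letter in seg[pos] for pos, letter in pattern) for seg in segs)

def apriori_partial_periodic(series, period, min_conf):
    segs = period_segments(series, period)
    min_count = min_conf * len(segs)
    # Level 1: frequent single (position, letter) pairs, i.e. F1.
    pairs = {(pos, letter) for seg in segs
             for pos, feats in enumerate(seg) for letter in feats}
    frequent = {frozenset([p]) for p in pairs
                if count(frozenset([p]), segs) >= min_count}
    result = set(frequent)
    size = 1
    while frequent:
        size += 1
        # Join step: unions of two frequent patterns that grow by one pair.
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == size}
        # Prune step (Apriori property): every subpattern must be frequent.
        candidates = {c for c in candidates
                      if all(c - {x} in frequent for x in c)}
        frequent = {c for c in candidates if count(c, segs) >= min_count}
        result |= frequent
    return result

series = [frozenset(x) for x in ("a", "bc", "b", "a", "e", "b", "a", "c", "e", "d")]
found = apriori_partial_periodic(series, period=3, min_conf=0.6)
print(len(found))  # 5
```

On the series of slide 7 with period 3 and a threshold of 0.6, this finds the three frequent 1-patterns a**, *c*, **b and the two 2-patterns ac* and a*b. The candidate acb is pruned because its subpattern *cb is not frequent.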

  9. Max-Subpattern Hit Set Method • Definitions • Algorithm • Implementation Data Structure

  10. Definitions • A candidate max-pattern Cmax is the maximal pattern which can be generated from F1 (the set of frequent 1-patterns). • Example: • If F1 = {a***, *b**, *c**, **d*}, • Then Cmax = a{b,c}d*

  11. Definitions (Cont.) • A subpattern of Cmax is hit in a period segment Si if it is the maximal subpattern of Cmax in Si. • Example: • For Cmax = a{b,c}d* and Si = a{b,c}ce, • The hit subpattern is: a{b,c}** • The hit set H is the set of all hit subpatterns of Cmax in S.
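
Both definitions reduce to simple set operations, as the following sketch suggests. The encoding is an assumption for illustration: F1 is given as (position, letter) pairs, a pattern is a tuple of `frozenset`s with the empty set standing for `*`, and `candidate_max_pattern`, `hit_subpattern`, and `show` are hypothetical helper names.

```python
def candidate_max_pattern(f1, period):
    # Union, position by position, of the letters of all frequent 1-patterns.
    cmax = [set() for _ in range(period)]
    for pos, letter in f1:
        cmax[pos].add(letter)
    return tuple(frozenset(s) for s in cmax)

def hit_subpattern(cmax, segment):
    # The maximal subpattern of Cmax in a segment keeps, at each position,
    # exactly the letters of Cmax that the segment also contains.
    return tuple(c & d for c, d in zip(cmax, segment))

def show(pattern):
    # Render a pattern in the slides' notation, e.g. a{b,c}d*.
    parts = []
    for letters in pattern:
        if not letters:
            parts.append("*")
        elif len(letters) == 1:
            parts.append(next(iter(letters)))
        else:
            parts.append("{" + ",".join(sorted(letters)) + "}")
    return "".join(parts)

# F1 = {a***, *b**, *c**, **d*} from slide 10.
f1 = [(0, "a"), (1, "b"), (1, "c"), (2, "d")]
cmax = candidate_max_pattern(f1, period=4)
segment = [frozenset("a"), frozenset("bc"), frozenset("c"), frozenset("e")]  # a{b,c}ce

print(show(cmax))                           # a{b,c}d*
print(show(hit_subpattern(cmax, segment)))  # a{b,c}**
```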

  12. Algorithm • Scan S once to find F1 and form the candidate max-pattern Cmax. • Scan S again, and for each period segment, add its max-subpattern to the hit set with count 1 if it is not already there; otherwise increase its count by 1. • Derive the frequent patterns from the hit set.
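
The first two steps can be sketched with a plain counter standing in for the hit set. This is a minimal illustration under assumptions, not the tree-based structure the following slides describe: patterns are tuples of `frozenset`s with the empty set for `*`, the function name `build_hit_set` is hypothetical, and segments that hit nothing in Cmax are skipped.

```python
from collections import Counter

def build_hit_set(series, period, min_conf):
    n = len(series) // period
    segs = [series[i * period:(i + 1) * period] for i in range(n)]

    # Scan 1: count every (position, letter) pair to find F1, then form Cmax.
    letter_counts = Counter(
        (pos, letter)
        for seg in segs
        for pos, feats in enumerate(seg)
        for letter in feats
    )
    cmax = [set() for _ in range(period)]
    for (pos, letter), c in letter_counts.items():
        if c >= min_conf * n:
            cmax[pos].add(letter)
    cmax = tuple(frozenset(s) for s in cmax)

    # Scan 2: record the max-subpattern of Cmax hit by each period segment.
    hits = Counter()
    for seg in segs:
        hit = tuple(c & d for c, d in zip(cmax, seg))
        if any(hit):  # ignore segments that share no letter with Cmax
            hits[hit] += 1
    return cmax, hits

series = [frozenset(x) for x in ("a", "bc", "b", "a", "e", "b", "a", "c", "e", "d")]
cmax, hits = build_hit_set(series, period=3, min_conf=0.6)
print(len(hits))  # 3
```

For the series of slide 7 this gives Cmax = acb and a hit set of three subpatterns, acb, a*b, and ac*, each with count 1.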

  13. Implementation Data Structure: Max-Subpattern Tree • The root node is Cmax. • A child node is a subpattern of the parent node with one non-* letter missing. The link is labeled by this letter. • A node containing only 2 non-* letters has no children, since its children would be 1-patterns, which are already in F1. • Each node has a count field which registers its number of hits.

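  14. Max-Subpattern Tree (Example) • [Figure: max-subpattern tree rooted at Cmax = a{b,c}d*, with count 10.] • Second level, reached by links labeled a, b, c, d for the missing letter: *{b,c}d* (count 0), acd* (count 50), abd* (count 40), a{b,c}** (count 32). • Third level: *bd*, *{b,c}**, a*d*, ac**, ab**, *cd*; the counts shown at this level are 2, 18, 8, 0, 5, and 19 (not necessarily in this node order).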

  15. Max-Subpattern Tree (Construction) • Find w, the max-subpattern in the current segment. • Search for w in the tree, starting from the root and following the path that corresponds to the missing non-* letters in order. • If the node w is found, increase its count by 1. Otherwise, create a new node w (with count 1) and its missing ancestors on the followed path (with count 0).

  16. Max-Subpattern Tree (Construction) • Example: inserting w = *cd*. • Path: a{b,c}d* (count 0) → link a → *{b,c}d* (count 0) → link b → *cd* (count 1).
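
The insertion procedure of slides 15 and 16 can be sketched as follows. The `Node` class and `insert` function are hypothetical names, and the encoding (tuples of `frozenset`s, empty set for `*`, links keyed by a (position, letter) pair) is an assumption made for this illustration.

```python
class Node:
    def __init__(self, pattern):
        self.pattern = pattern  # tuple of frozensets; empty set means *
        self.count = 0          # number of hits registered at this node
        self.children = {}      # (position, missing letter) -> Node

def insert(root, w):
    # Letters of Cmax that are missing from w, in position/letter order.
    missing = sorted(
        (pos, letter)
        for pos, letters in enumerate(root.pattern)
        for letter in letters - w[pos]
    )
    node = root
    for pos, letter in missing:
        key = (pos, letter)
        if key not in node.children:
            # Create the missing node on the path; its count starts at 0.
            child = list(node.pattern)
            child[pos] = child[pos] - {letter}
            node.children[key] = Node(tuple(child))
        node = node.children[key]
    node.count += 1  # the node for w itself receives the hit
    return node

# Cmax = a{b,c}d* and w = *cd*, as on slide 16.
cmax = (frozenset("a"), frozenset("bc"), frozenset("d"), frozenset())
w = (frozenset(), frozenset("c"), frozenset("d"), frozenset())

root = Node(cmax)
leaf = insert(root, w)
middle = root.children[(0, "a")]
print(root.count, middle.count, leaf.count)  # 0 0 1
```

Following the missing letters in a fixed order means each subpattern has exactly one path from the root, so repeated hits of the same w always land on the same node.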

  17. Max-Subpattern Tree (Traversal) • After the second scan, the tree contains all the max-subpatterns hit in the time-series, each with its hit count. • The tree must then be traversed to compute the frequency count, and hence the confidence, of each subpattern.

  18. Max-Subpattern Tree (Traversal) • The frequency count of each node is the sum of its own count and the counts of all its reachable ancestors, that is, the nodes whose patterns are superpatterns of its own. • For example, in the tree of slide 14: • The frequency count of *cd* is 78. • The frequency count of a*d* is 105.
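
The two totals can be reproduced by summing hit counts over superpatterns. In the sketch below the hit set is a flat dictionary rather than a tree, `parse` and `frequency` are hypothetical helpers, and the node counts are an assumed reading of the flattened figure on slide 14, chosen because they are consistent with the two totals quoted above. Leaf nodes whose counts could not be assigned are omitted; none of them is a superpattern of the two patterns queried.

```python
def parse(text):
    # Turn the slides' notation (e.g. "a{b,c}d*") into a tuple of frozensets,
    # with the empty set standing for *.
    out, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i)
            out.append(frozenset(text[i + 1:j].split(",")))
            i = j + 1
        else:
            out.append(frozenset() if text[i] == "*" else frozenset(text[i]))
            i += 1
    return tuple(out)

def frequency(pattern, hits):
    # Sum the counts of every hit pattern that is a superpattern of `pattern`
    # (the pattern itself included).
    return sum(
        c for hit, c in hits.items()
        if all(p <= h for p, h in zip(pattern, hit))
    )

# Assumed node counts (see the lead-in); not all leaves are listed.
hits = {parse(k): v for k, v in {
    "a{b,c}d*": 10,
    "*{b,c}d*": 0,
    "acd*": 50,
    "abd*": 40,
    "a{b,c}**": 32,
    "a*d*": 5,
    "*cd*": 18,
}.items()}

print(frequency(parse("*cd*"), hits))  # 78
print(frequency(parse("a*d*"), hits))  # 105
```

Here *cd* collects 18 + 0 + 50 + 10 from itself, *{b,c}d*, acd*, and the root, while a*d* collects 5 + 50 + 40 + 10 from itself, acd*, abd*, and the root.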

  19. Max-Subpattern Tree (Example) • [Figure: the max-subpattern tree for Cmax = a{b,c}d* from slide 14, shown again.]
