
Temporal Data Mining

Temporal Data Mining. Claudio Bettini, X.Sean Wang and Sushil Jajodia Presented by Zhuang Liu. Outline. What is Data Mining? Formal Problem Definition TAG (Timed Automaton with Granularity) A Naive Solution Techniques for Improving Performance


Presentation Transcript


  1. Temporal Data Mining. Claudio Bettini, X. Sean Wang, and Sushil Jajodia. Presented by Zhuang Liu.

  2. Outline • What is Data Mining? • Formal Problem Definition • TAG (Timed Automaton with Granularity) • A Naive Solution • Techniques for Improving Performance • Experimental Results

  3. What is Data Mining? • Data mining: the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. • Common data mining techniques: association-rule mining; sequential (temporal) mining; clustering; classification; outlier detection.

  4. Temporal Data Mining • Finding time-related frequent patterns (frequent sub-sequences), e.g. which pairs of events frequently occur one week after another. • A simple example: a user may be interested in finding all events that frequently follow within 2 business days of a rise in the IBM stock price.

  5. Definition • Event type (E): e.g. a deposit to an account, or a price increase of a specific stock. • Event: an event e is a pair e = (E, t), where E is an event type and t is a positive integer, called the timestamp of e. • Event sequence: an event sequence is a finite set of events. Each event (E, t) appearing in an event sequence represents the occurrence of event type E at time t.

  6. Granularity • A granularity is a mapping μ from the set of positive integers to subsets of the time domain such that for all positive integers i and j with i < j: (1) μ(i) ≠ ∅ and μ(j) ≠ ∅ imply that each number in μ(i) is less than all the numbers in μ(j), and (2) μ(i) = ∅ implies μ(j) = ∅. • Examples: year, month, week, day, business-day, business-week, etc.

  7. TCG • A temporal constraint with granularity (TCG) [m,n]μ is a binary relation on positive integers. Writing ⌈t⌉^μ for the index of the granule of μ that contains t, a pair (t1, t2) of positive integers satisfies [m,n]μ iff (1) t1 ≤ t2, (2) ⌈t1⌉^μ and ⌈t2⌉^μ are both defined, and (3) m ≤ ⌈t2⌉^μ − ⌈t1⌉^μ ≤ n. • Examples of TCGs: [0,0]day, [0,2]hour, [1,1]month.
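The TCG test can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes timestamps count hours, with hour 1 being the first hour of a Monday, and the helper names (`hour`, `day`, `b_day`, `satisfies`) are hypothetical. Each granularity is represented by a function returning the index of the granule that contains a timestamp, or `None` when no granule does.

```python
from typing import Callable, Optional

# Assumption: timestamps are positive integers counting hours,
# and hour 1 is the first hour of a Monday.
Granularity = Callable[[int], Optional[int]]

def hour(t: int) -> Optional[int]:
    """Every timestamp is its own hour granule."""
    return t

def day(t: int) -> Optional[int]:
    """Index of the day containing hour t."""
    return (t - 1) // 24 + 1

def b_day(t: int) -> Optional[int]:
    """Index of the business day containing hour t, or None on weekends."""
    week_index, weekday = divmod((t - 1) // 24, 7)
    if weekday >= 5:  # Saturday or Sunday: no business-day granule
        return None
    return 5 * week_index + weekday + 1

def satisfies(t1: int, t2: int, m: int, n: int, mu: Granularity) -> bool:
    """Check whether (t1, t2) satisfies the TCG [m, n] in granularity mu."""
    if t1 > t2:
        return False
    g1, g2 = mu(t1), mu(t2)
    if g1 is None or g2 is None:
        return False
    return m <= g2 - g1 <= n

friday_noon = 4 * 24 + 12   # hour 108, first Friday
monday_9am = 7 * 24 + 9     # hour 177, second Monday
print(satisfies(friday_noon, monday_9am, 1, 1, b_day))  # next business day
print(satisfies(friday_noon, monday_9am, 1, 1, day))    # three calendar days apart
```

The same pair of timestamps satisfies [1,1]b-day but not [1,1]day, which is the reason constraints carry their granularity with them.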

  8. Event Structure • An event structure (with granularities) is a rooted directed acyclic graph S = (W, A, Γ), where W is a finite set of event variables, A ⊆ W × W, and Γ is a mapping from A to the finite sets of TCGs. • A complex event type derived from S associates each variable with a specific event type. • A complex event matching S associates each variable with a distinct event such that the event timestamps satisfy the time constraints.

  9. Example of Event Structure • Figure 1: An event structure with edges x0 → x1 labeled [1,1]b-day, x1 → x3 labeled [0,1]week, x0 → x2 labeled [0,5]b-day, and x2 → x3 labeled [0,8]hours. • Assigning the event types IBM-rise, IBM-earnings-report, HP-rise, and IBM-fall to x0, x1, x2, and x3, respectively, gives a complex event type. It describes that the IBM earnings were reported one business day after the IBM stock rose, and the IBM stock fell in the same or the next week; meanwhile the HP stock rose within 5 business days after that rise of the IBM stock and within 8 hours before that fall of the IBM stock.
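As a rough sketch, the event structure of this example can be stored as a dictionary of labeled edges and a tuple of timestamps checked against it. The edge labels follow the prose description on this slide; the hour-based timestamps (hour 1 is the first hour of a Monday), the single TCG per edge, and the names `STRUCTURE` and `matches` are assumptions made for illustration. Only the time constraints are checked here, not event types or distinctness of events.

```python
# Assumption: timestamps count hours; hour 1 is the first hour of a Monday.
def hour(t):
    return t

def week(t):
    return (t - 1) // 168 + 1

def b_day(t):
    week_index, weekday = divmod((t - 1) // 24, 7)
    return None if weekday >= 5 else 5 * week_index + weekday + 1

# Edges of the example structure: (from, to) -> (m, n, granularity).
STRUCTURE = {
    ("x0", "x1"): (1, 1, b_day),  # earnings report one business day after the rise
    ("x1", "x3"): (0, 1, week),   # fall in the same or the next week
    ("x0", "x2"): (0, 5, b_day),  # HP rise within 5 business days of the IBM rise
    ("x2", "x3"): (0, 8, hour),   # HP rise at most 8 hours before the IBM fall
}

def matches(structure, timestamps):
    """True if the timestamps assigned to the variables satisfy every edge."""
    for (a, b), (m, n, mu) in structure.items():
        t1, t2 = timestamps[a], timestamps[b]
        g1, g2 = mu(t1), mu(t2)
        if t1 > t2 or g1 is None or g2 is None or not (m <= g2 - g1 <= n):
            return False
    return True

# Monday rise, Tuesday report, Wednesday HP rise, Wednesday IBM fall.
good = {"x0": 10, "x1": 34, "x2": 57, "x3": 63}
# Same, but the HP rise happens on Tuesday, 29 hours before the fall.
bad = {"x0": 10, "x1": 34, "x2": 34, "x3": 63}
print(matches(STRUCTURE, good), matches(STRUCTURE, bad))
```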

  10. Formal Problem Definition • An event-mining problem is a quadruple (S, γ, E0, ρ), where S is an event structure, γ is the minimum confidence value, E0 is an event type (the reference type), and ρ is a partial mapping that assigns a set of event types to some of the variables (except the root). • Solving an event-mining problem means finding all complex event types that occur frequently in the input sequence and are derived from S by assigning E0 to the root and a specific event type to each of the other variables. • Example: (S, 0.8, IBM-rise, ρ)

  11. TAG • Timed Automaton with Granularities (TAG). • The basic component for testing whether a candidate complex event type occurs frequently in an event sequence. • A timed automaton with granularities is a 6-tuple (Σ, S, S0, C, T, F), where (1) Σ is a finite set of input letters, (2) S is a finite set of states, (3) S0 ⊆ S is a set of start states, (4) C is a finite set of clocks, (5) T ⊆ S × S × Σ × 2^C × Φ(C) is a set of transitions, and (6) F ⊆ S is a set of accepting states.

  12. TAG • Φ(C) is the set of all clock constraints, i.e. formulas over the clocks in C. • A transition (s, s', e, λ, δ) goes from state s to state s' on input symbol e; the set λ ⊆ C gives the clocks to be reset with this transition, and δ is a clock constraint over C. • A TAG is essentially a standard finite automaton with some modifications: each TAG maintains a set of clocks, and both the input symbol and the clock values determine the next state. • A run is accepting if its last state is in F. An event sequence is accepted by a TAG if there exists an accepting run.
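A much simplified simulation of such an automaton might look as follows. It is a sketch under several assumptions that are not taken from the slides: timestamps count days (day 1 is a Monday), each clock measures elapsed time as a difference of granule indices in its own granularity, a clock is undefined until its first reset, and the automaton may skip any input event (modeling implicit self-loops). The names `accepts`, `TRANSITIONS`, and `b_day` are hypothetical.

```python
# Assumption: timestamps count days; day 1 is a Monday.
def b_day(t):
    week_index, weekday = divmod(t - 1, 7)
    return None if weekday >= 5 else 5 * week_index + weekday + 1

def accepts(transitions, start_states, accepting, granularity_of, events):
    """Nondeterministic run: each event is either consumed by a transition or skipped.

    transitions: list of (source, target, symbol, clocks_to_reset, guard)
    granularity_of: clock name -> granule-index function
    events: iterable of (event_type, timestamp)
    """
    # A configuration is (state, {clock: timestamp of its last reset}).
    configs = [(s, {}) for s in start_states]
    for symbol, t in sorted(events, key=lambda e: e[1]):
        new_configs = list(configs)  # skipping the event keeps every configuration
        for state, resets in configs:
            # Current clock values, in granules; None when undefined.
            values = {}
            for clock, t0 in resets.items():
                g_now = granularity_of[clock](t)
                g_reset = granularity_of[clock](t0)
                undefined = g_now is None or g_reset is None
                values[clock] = None if undefined else g_now - g_reset
            for src, dst, sym, to_reset, guard in transitions:
                if src == state and sym == symbol and guard(values):
                    updated = dict(resets)
                    updated.update({c: t for c in to_reset})
                    new_configs.append((dst, updated))
        configs = new_configs
    return any(state in accepting for state, _ in configs)

# IBM-rise followed by IBM-earnings-report exactly one business day later.
TRANSITIONS = [
    ("s0", "s1", "IBM-rise", frozenset({"c"}), lambda v: True),
    ("s1", "s2", "IBM-earnings-report", frozenset(), lambda v: v.get("c") == 1),
]
CLOCKS = {"c": b_day}

sequence = [("IBM-rise", 1), ("HP-rise", 1), ("IBM-earnings-report", 2)]
print(accepts(TRANSITIONS, ["s0"], {"s2"}, CLOCKS, sequence))
```

Keeping a list of configurations instead of a single state is what makes the run nondeterministic: the sequence is accepted as soon as one choice of consumed events reaches an accepting state.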

  13. A Naïve Solution • Consider all the event types that occur in the given event sequence, and all the complex types derived from the given event structure, one for each assignment of these event types to the variables. Each of these complex types is called a candidate complex type for the event-mining problem. • For each candidate complex type, start the corresponding TAG at every occurrence of E0. That is, for each occurrence of E0 in the event sequence, use the rest of the event sequence as the input to one copy of the TAG. Comparing the number of TAGs that reach a final state with the number of occurrences of E0 yields all the solutions of the event-mining problem. • The number of candidate types is exponential in the number of variables of the event structure (with n event types and k non-root variables there are up to n^k candidates), which makes this approach too costly.
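The size of the candidate space is easy to see by enumerating the assignments. The sketch below is illustrative only; the function name `candidate_types` and the convention that the root variable is called `x0` are assumptions.

```python
from itertools import product

def candidate_types(root_type, variables, event_types):
    """Yield every assignment of event types to the non-root variables.

    The root variable (called "x0" here by assumption) always gets root_type.
    """
    for assignment in product(sorted(event_types), repeat=len(variables)):
        yield {"x0": root_type, **dict(zip(variables, assignment))}

types = {"IBM-fall", "HP-rise", "HP-fall", "IBM-earnings-report"}
candidates = list(candidate_types("IBM-rise", ["x1", "x2", "x3"], types))
print(len(candidates))  # 4 event types, 3 non-root variables
```

With n event types and k non-root variables the generator produces n^k candidates, and the naive algorithm would run one automaton per candidate at every occurrence of the reference type.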

  14. Techniques to improve performance The naïve algorithm can be made faster by the following steps: (1) identifying possible inconsistencies in the given event structure before starting the process, (2) reducing the length of the sequence, (3) reducing the number of times an automaton has to be started, (4) reducing the number of different automata to be started, and (5) finally applying the naïve algorithm to the reduced problem.

  15. Recognition of Inconsistent Event Structures • An event structure is consistent if there exists a complex event that matches it. • An inconsistent event structure should be discarded before the mining process starts. • Determining consistency exactly is difficult, so approximate polynomial-time algorithms are used to check the consistency of event structures.

  16. Recognition of Inconsistent Event Structures • If one of the constraints implied by the given ones is the “empty” one, i.e. unsatisfiable, the whole event structure is inconsistent. • A TCG [m', n']ν is logically implied by a TCG [m, n]μ if each pair (x, y) satisfying [m, n]μ also satisfies [m', n']ν. • For example, a TCG [1,2]b-week can be converted into [3,18]day or [0,1]month, while it cannot be converted into [2,3]week-end or [1,3]week, since the resulting constraints are not implied by [1,2]b-week.
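The [3,18]day bound in the example can be sanity-checked by brute force over a finite horizon. This is not the conversion algorithm from the paper, only a check under stated assumptions: timestamps count days, day 1 is a Monday, business weeks run Monday to Friday, and the names `b_week`, `week_end`, `satisfies`, and `tightest_bounds` are hypothetical.

```python
# Assumption: timestamps count days; day 1 is a Monday.
def day(t):
    return t

def b_week(t):
    """Index of the business week (Mon-Fri) containing day t, or None on weekends."""
    week_index, weekday = divmod(t - 1, 7)
    return None if weekday >= 5 else week_index + 1

def week_end(t):
    """Index of the weekend containing day t, or None on business days."""
    week_index, weekday = divmod(t - 1, 7)
    return week_index + 1 if weekday >= 5 else None

def satisfies(t1, t2, m, n, mu):
    g1, g2 = mu(t1), mu(t2)
    return t1 <= t2 and g1 is not None and g2 is not None and m <= g2 - g1 <= n

def tightest_bounds(m, n, source, target, horizon):
    """Tightest [m', n'] in `target` implied by [m, n] in `source`.

    Returns None when some satisfying pair has no granule in `target`,
    i.e. when no constraint in `target` is implied at all.
    """
    diffs = []
    for t1 in range(1, horizon + 1):
        for t2 in range(t1, horizon + 1):
            if satisfies(t1, t2, m, n, source):
                g1, g2 = target(t1), target(t2)
                if g1 is None or g2 is None:
                    return None
                diffs.append(g2 - g1)
    return (min(diffs), max(diffs)) if diffs else None

print(tightest_bounds(1, 2, b_week, day, horizon=60))
print(tightest_bounds(1, 2, b_week, week_end, horizon=60))
```

The minimum of 3 days comes from a Friday followed by the next Monday, and the maximum of 18 days from a Monday followed by the Friday two business weeks later.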

  17. Reduction of the Event Sequence • The event sequence can be reduced by exploiting the granularities. • For example, if a discovery problem is defined on the sub-structure of Figure 1 that excludes variable x3, every remaining constraint is in business days, so any event that does not occur on a business day can be discarded from the input sequence.
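For the business-day case, this reduction amounts to a simple filter. The sketch assumes day-based timestamps with day 1 being a Monday; `b_day` and `drop_non_business_days` are hypothetical names.

```python
# Assumption: timestamps count days; day 1 is a Monday.
def b_day(t):
    week_index, weekday = divmod(t - 1, 7)
    return None if weekday >= 5 else 5 * week_index + weekday + 1

def drop_non_business_days(events):
    """Keep only events whose timestamp falls inside some business-day granule."""
    return [(e, t) for (e, t) in events if b_day(t) is not None]

sequence = [("IBM-rise", 5), ("news", 6), ("news", 7), ("HP-rise", 8)]
print(drop_non_business_days(sequence))
```

Events on days 6 and 7 (a Saturday and a Sunday) can never satisfy a constraint expressed in business days, so dropping them cannot change the result.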

  18. Reduction of the occurrences of the root • The basic idea is to remove those occurrences of the reference type that cannot be the root of a complex event matching the given structure. • For some occurrences of the reference type in the sequence, a constraint may be unsatisfiable. • Consider all the non-empty sets of explicit and implicit constraints between the root and each non-root node, and check whether one of them cannot be satisfied. • For example, if no event occurs in the sequence on the business day following an IBM-rise event, that particular reference event can be discarded (no automaton is started for it).

  19. Reduction of the occurrences of the root • Let N be the number of occurrences of the reference event type in the sequence. • Let N' be the number of those occurrences for which one of the constraints is unsatisfiable; these reference events are certainly not the root of any complex event matching the given event structure. • At most N − N' occurrences can match, so if N'/N > 1 − γ, where γ is the minimum confidence value, there cannot be any frequent complex event type and the empty set is returned to the user. • Otherwise, remove these occurrences of the reference type and replace γ with γ' = (γ · N) / (N − N').
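The adjustment of the confidence threshold is plain arithmetic, sketched below. The function name `adjusted_confidence` is hypothetical, the threshold is called `gamma` here, and the early exit uses the condition N'/N > 1 − gamma, under which fewer than gamma · N reference occurrences remain.

```python
import math

def adjusted_confidence(gamma, n, n_bad):
    """Return the threshold to use after dropping n_bad hopeless roots.

    gamma: minimum confidence, assumed to satisfy 0 < gamma <= 1
    n:     number of occurrences of the reference type (N)
    n_bad: occurrences that cannot be the root of any match (N')
    Returns None when no complex event type can be frequent.
    """
    if n_bad / n > 1 - gamma:
        return None
    return gamma * n / (n - n_bad)

# 100 reference events, 20 of which can never be matched:
# a type must now match 70 of the remaining 80, i.e. 87.5% of them.
print(adjusted_confidence(0.7, 100, 20))
# With 40 hopeless roots at most 60% could match, below the 70% required.
print(adjusted_confidence(0.7, 100, 40))
```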

  20. Reduction of the Candidate Type • Based on the property that if a complex event type occurs frequently, then each of its sub-types also occurs frequently. • In other words, if an assignment to two variables is not frequent, no candidate complex event type that includes this assignment can be frequent, so all such types can be removed from the candidates. • For each subset W' of W, the induced approximated sub-structure of W' is (W', A', Γ'), where A' consists of all pairs (X, Y) ∈ W' × W' such that there is a path from X to Y in S and there is at least one constraint on (X, Y).

  21. Reduction of the Candidate Type • Solving the induced discovery problems is straightforward and computationally cheap. For a single non-root variable X, the induced sub-structure gives the distance from the root to X (in effect two distances, the minimum and the maximum). • For each occurrence of E0, this distance translates into a window, i.e. a period of time during which the event for X must appear. • The sub-structure can then be extended to more than one non-root variable, where these variables form a chain in S.
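For a single non-root variable, the window idea can be sketched as follows. The assumptions are loud ones: timestamps and the window bounds `lo` and `hi` are in the same unit (days), "frequent" is taken to mean a confidence strictly greater than `gamma`, and `frequent_types_in_window` is a hypothetical name rather than anything from the paper.

```python
from collections import defaultdict

def frequent_types_in_window(events, root_type, lo, hi, gamma):
    """Event types that appear lo..hi time units after more than a fraction
    gamma of the occurrences of root_type."""
    roots = [t for (e, t) in events if e == root_type]
    if not roots:
        return set()
    hits = defaultdict(int)
    for r in roots:
        # Types seen inside the window of this root (the root event itself excluded).
        seen = {e for (e, t) in events
                if lo <= t - r <= hi and not (e == root_type and t == r)}
        for e in seen:
            hits[e] += 1
    return {e for e, count in hits.items() if count / len(roots) > gamma}

sequence = [("A", 1), ("B", 2), ("A", 5), ("B", 6), ("A", 9), ("C", 10)]
print(frequent_types_in_window(sequence, "A", lo=1, hi=1, gamma=0.6))
```

Only event types that survive this check for a variable need to be tried for that variable in the full candidate complex types; every other assignment is pruned.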

  22. Experimental Results • [Figure: the event structure used in the experiment, over variables X0, X1, X2, X3 with edge constraints [0,2]b-day, [1,2]b-day, and [0,0]b-week] • Closing prices of 439 stocks for 517 trading days. • Price changes are partitioned into 7 categories: (−∞, −5%), (−5%, −3%), (−3%, 0), [0, 0] (no change), (0, 3%), (3%, 5%), (5%, +∞). • The total number of event types is 2978; the number of events is 181089. • The reference event type (assigned to X0) is a drop of the IBM stock of less than 3%. The minimum confidence value is 0.7. No event types are pre-assigned to the other variables.
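Turning daily price changes into event types could be done with a small bucketing function like the one below. The slide does not say which category a change of exactly ±3% or ±5% belongs to, so the boundary handling here is an assumption, as are the function name `price_change_category` and the numbering of categories from 0 to 6.

```python
def price_change_category(change):
    """Map a relative price change (e.g. -0.04 for -4%) to one of 7 categories.

    Assumption: a change exactly on a boundary goes to the category closer to zero.
    """
    if change == 0:
        return 3            # no change
    if change < 0:
        if change < -0.05:
            return 0        # drop of more than 5%
        if change < -0.03:
            return 1        # drop between 3% and 5%
        return 2            # drop of less than 3%
    if change <= 0.03:
        return 4            # rise of less than 3%
    if change <= 0.05:
        return 5            # rise between 3% and 5%
    return 6                # rise of more than 5%

# An event type could then be a (ticker, category) pair, e.g. ("IBM", 2)
# for the reference type "drop of the IBM stock of less than 3%".
print([price_change_category(c) for c in (-0.06, -0.04, -0.01, 0.0, 0.02, 0.04, 0.07)])
```

With the 2978 event types reported above and the three non-root variables of the experimental structure, the naive candidate space would contain 2978^3, roughly 2.6 × 10^10, complex event types.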

  23. Experimental Results cont.

  24. Experimental Results cont. • This experiment focuses on Step 4, namely the reduction of the candidate complex event types by using sub-structures. • The results show that applying the heuristics significantly reduces the number of candidate complex event types.

  25. Experimental Results cont. The two frequent event combinations discovered in the experiment

  26. References • C. Bettini, X. S. Wang, S. Jajodia, and J.-L. Lin. "Discovering Temporal Relationships with Multiple Granularities in Time Sequences". IEEE Transactions on Knowledge and Data Engineering, Vol. 10 (2), 1998. • C. Bettini, X. S. Wang, and S. Jajodia. "A General Framework for Time Granularity and its Application to Temporal Reasoning". Annals of Mathematics and Artificial Intelligence, Vol. 22 (1-2), pages 29-58, Baltzer Science Publishers, 1998. • C. Bettini, X. S. Wang, and S. Jajodia. "Testing complex temporal relationships involving multiple granularities and its application to data mining". In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'96), pages 68-78, Montreal, Canada, June 1996. • C. Bettini, X. S. Wang, and S. Jajodia. "Mining temporal relationships with multiple granularities in time sequences". Data Engineering Bulletin, 21:32-38, 1998.

  27. Thank you. Questions?
