CEMiner – An Efficient Algorithm for Mining Closed Patterns from Time Interval-based Data

CEMiner – An Efficient Algorithm for Mining Closed Patterns from Time Interval-based Data. Yi-Cheng Chen, Wen-Chih Peng and Suh-Yin Lee. ICDM 2011. Outline: Motivation, Preliminaries, Endpoint representation, CEMiner algorithm, Experimental results, Conclusion.

Presentation Transcript


  1. CEMiner – An Efficient Algorithm for Mining Closed Patterns from Time Interval-based Data • Yi-Cheng Chen, Wen-Chih Peng and Suh-Yin Lee • ICDM 2011

  2. Outline • Motivation • Preliminaries • Endpoint representation • CEMiner algorithm • Experimental results • Conclusion

  3. Motivation • Existing studies only focus on mining closed sequential patterns from time point-based data.

  4. Cont. • In this paper, we discuss and design an efficient method to discover closed temporal patterns from interval-based data. • Three contributions: • We simplify the processing of complex relations, i.e., only “before”, “after” and “equal” need to be handled. • Endpoint representation • A novel algorithm, CEMiner (Closed Endpoint Temporal Miner).

  5. Preliminaries • Definition 1. Event interval and event sequence • Let E = {e1, e2, …, ek} be the set of event symbols, e.g., {A, B, C, D, E}. • The triplet (ei, si, fi) is an event interval with event symbol ei, starting time si and finishing time fi, e.g., (A, 2, 7). • An event sequence is a series of event interval triplets, e.g., <(A, 2, 7), (B, 5, 10), …, (E, 18, 20)>.

  6. Cont. • Definition 2. Temporal database • A database DB = {r1, r2, …, rm}, in which each record ri consists of a sequence-id (SID) and an event, is called a temporal database.

  7. Endpoint representation • When describing relationships among more than three events, Allen’s temporal logic may suffer from several problems. • A suitable representation is very important for describing a temporal pattern. • A new expression, the endpoint representation, is proposed to address the ambiguity and scalability problems.

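  8. Cont. • Definition 3. Endpoint sequence • event sequence q = <(A, 2, 7), (B, 5, 10), (C, 5, 12), (D, 16, 22), (E, 18, 20)> • Tq = {2, 7, 5, 10, 5, 12, 16, 22, 18, 20} • endpoint sequence: qe = <2, 5, 5, 7, 10, 12, 16, 18, 20, 22> • endpoint representation: 〈A+ (B+ C+) A- B- C- D+ E+ E- D-〉, where X+ is the starting endpoint of event X, X- is its finishing endpoint, and endpoints occurring at the same time are grouped in parentheses.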
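The conversion in Definition 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `endpoint_representation`, the `"+"`/`"-"` string labels, and the alphabetical ordering of endpoints that share a time point are all assumptions made for the example.

```python
from itertools import groupby


def endpoint_representation(event_sequence):
    """Turn (symbol, start, finish) triplets into a time-ordered list of
    endpoint groups. Endpoints with the same time land in one tuple."""
    endpoints = []
    for symbol, start, finish in event_sequence:
        endpoints.append((start, symbol + "+"))   # starting endpoint
        endpoints.append((finish, symbol + "-"))  # finishing endpoint
    # Sort by time; ties are broken by label (an assumption of this sketch).
    endpoints.sort()
    return [
        tuple(label for _, label in group)
        for _, group in groupby(endpoints, key=lambda e: e[0])
    ]


# Event sequence q from Definition 3.
q = [("A", 2, 7), ("B", 5, 10), ("C", 5, 12), ("D", 16, 22), ("E", 18, 20)]
for group in endpoint_representation(q):
    print(group)
```

Only the relative order of endpoints is kept, which is why the three relations "before", "after" and "equal" are enough to describe the arrangement.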

  9. Cont. • The endpoint representation has several benefits : • Scalability • Nonambiguity • Simplicity

  10. CEMiner algorithm • CEMiner (standing for Closed Endpoint temporal Miner) utilizes the arrangement of endpoints to accomplish closed temporal pattern mining. • Closure checking • Subsequence and supersequence • Ex. Given two sequences 𝛼 = <A, B, C> and 𝛽 = <A, D, B, C, E>, we say 𝛼 is a subsequence of 𝛽, and 𝛽 is a supersequence of 𝛼.
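The subsequence relation in the example above can be checked with a short sketch. It assumes sequences of plain symbols; endpoint sequences with simultaneous endpoints would need a group-wise containment test, which is not shown here. The name `is_subsequence` is hypothetical.

```python
def is_subsequence(alpha, beta):
    """Return True if the items of alpha occur in beta in the same order
    (not necessarily contiguously)."""
    remaining = iter(beta)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items of alpha can only match later positions of beta.
    return all(item in remaining for item in alpha)


alpha = ["A", "B", "C"]
beta = ["A", "D", "B", "C", "E"]
print(is_subsequence(alpha, beta))  # alpha is a subsequence of beta
print(is_subsequence(beta, alpha))  # beta is not a subsequence of alpha
```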

  11. Cont. • Definition 4. Closed temporal pattern • CTP = {𝛼 | 𝛼 ∈ TP and ∄𝛽 ∈ TP such that (𝛼 ⊂ 𝛽) ∧ (support(𝛼) = support(𝛽))} • That is, given two sequences 𝛼 and 𝛽, 𝛼 is a closed temporal pattern if • 𝛼 is a temporal pattern and • there does not exist a proper supersequence 𝛽 of 𝛼 with support(𝛼) = support(𝛽).

  12. Cont. • Ex. • min_sup = 2 • The endpoint sequence <> is a temporal pattern but not a closed temporal pattern, • because <> ⊂ <> and both have support 2.
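Definition 4 can be illustrated with a naive filter over an already mined pattern set. This is a sketch of the definition only, not of CEMiner, which is designed to avoid this kind of pairwise comparison. The toy database, the pattern list and all function names are made up for the example, and sequences are simplified to plain symbols.

```python
def is_subsequence(alpha, beta):
    """True if alpha's items occur in beta in order."""
    remaining = iter(beta)
    return all(item in remaining for item in alpha)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def closed_patterns(patterns, database):
    """Keep patterns with no proper supersequence of equal support."""
    closed = []
    for alpha in patterns:
        sup = support(alpha, database)
        absorbed = any(
            len(beta) > len(alpha)
            and is_subsequence(alpha, beta)
            and support(beta, database) == sup
            for beta in patterns
        )
        if not absorbed:
            closed.append(alpha)
    return closed


# Hypothetical toy data, min_sup = 2.
database = [("A", "B", "C"), ("A", "B", "C", "D"), ("A", "C")]
patterns = [("A",), ("A", "C"), ("A", "B"), ("A", "B", "C")]
print(closed_patterns(patterns, database))
```

Here `("A", "B")` is frequent but not closed, because `("A", "B", "C")` contains it and has the same support of 2.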

  13. Cont. • Closure checking • To verify a new closed temporal pattern p, we must check whether p is a sub-sequence or super-sequence of an existing temporal pattern p’ and whether the projected databases of p and p’ are equal. • This paper borrows BI-Directional Extension [WH04] to check the closure of patterns. • Forward-extension • Backward-extension

  14. Cont. • Definition 5. Forward-extension and backward-extension • If 𝛼 = <> is non-closed, there must exist at least one endpoint x that can be used to extend 𝛼 to a new endpoint sequence 𝛼’ with support(𝛼) = support(𝛼’). • 𝛼 can be extended in five ways: (1) 𝛼’ = 〈〉 (2) 𝛼’ = 〈〉 • 𝛼’ is a forward-extension sequence (3) 𝛼’ = 〈〉 (4) 𝛼’ = 〈〉 (5) 𝛼’ = 〈〉 • 𝛼’ is a backward-extension sequence

  15. Cont. • If there exists neither a forward-extension endpoint nor a backward-extension endpoint, 𝛼 must be a closed endpoint sequence. • CEMiner checks closure in two directions: • Forward directional checking • Backward directional checking
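The forward direction of this check can be sketched in the style of BIDE on plain symbol sequences. This is a simplified illustration under stated assumptions: it ignores simultaneous endpoints and the backward direction, and the names `first_instance_end` and `forward_extension_items` as well as the toy database are hypothetical.

```python
def first_instance_end(prefix, sequence):
    """Index just past the first instance of prefix in sequence,
    or None if sequence does not contain prefix."""
    pos = 0
    if not prefix:
        return 0
    for i, item in enumerate(sequence):
        if item == prefix[pos]:
            pos += 1
            if pos == len(prefix):
                return i + 1
    return None


def forward_extension_items(prefix, database):
    """Items that occur after the first instance of prefix in every
    sequence supporting prefix. A non-empty result means prefix can be
    extended forward without losing support, so it is not closed."""
    common = None
    for seq in database:
        end = first_instance_end(prefix, seq)
        if end is None:
            continue  # this sequence does not support the prefix
        items = set(seq[end:])
        common = items if common is None else common & items
    return common or set()


database = ["CAABC", "ABCB", "CABC"]  # hypothetical toy sequences
print(forward_extension_items("AB", database))   # C follows AB everywhere
print(forward_extension_items("ABC", database))  # nothing follows ABC everywhere
```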

  16. Cont. • Definition. First instance of a prefix sequence • Ex. • The first instance of the prefix sequence AB in sequence CAABC is CAAB.
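The example above can be reproduced with a small sketch that scans the sequence left to right and stops at the earliest position where the whole prefix has been matched. The function name `first_instance` is an assumption, and sequences are treated as strings of single-character symbols.

```python
def first_instance(prefix, sequence):
    """Return the part of sequence from its beginning up to the end of the
    first appearance of prefix as a subsequence, or None if absent."""
    if not prefix:
        return sequence[:0]
    pos = 0  # index of the next prefix item still to be matched
    for i, item in enumerate(sequence):
        if item == prefix[pos]:
            pos += 1
            if pos == len(prefix):
                return sequence[: i + 1]
    return None


print(first_instance("AB", "CAABC"))  # CAAB
```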

  17. Cont. • Definition 6. The i-th last-in-first appearance • Ex. • 〈ABAB(AB)(AB) 〉 • p =〈〉 1. The last-in-first appearance w.r.t. prefix p in? (1) 1≤ i < n,n=4,i=2 first instance :〈ABAB(AB)(AB) 〉 2. The last-in-first appearance w.r.t. prefix p in? (2) i = n, i = n = 4 first instance :〈ABAB(AB)(AB) 〉

  18. Cont. • Definition 7. The i-th semi-maximum period • Ex. • 〈ABAB(AB)(AB) 〉 • p =〈〉 1. semi-maximum period of prefix p in (1) i = 1, before the last-in-first appearance: 〈ABAB(AB)(AB) 〉 2. semi-maximum period of prefix p in (2) 1 < i ≤ n, n = 4, i = 2 a. end of the first instance of 〈〉: 〈AB〉 b. the 2nd last-in-first appearance w.r.t. p: B 〈ABAB(AB)(AB) 〉

  19. Cont. • EbackScan search • Given an endpoint sequence, if there exists an i, 1 ≤ i ≤ n, and an endpoint x that appears in each of the i-th semi-maximum periods of the prefix in the database, • then we can derive a new endpoint sequence and stop growing the current endpoint sequence. • Ex. • Prefix sequence p = <A, C> • B appears in the 2nd semi-maximum period of the prefix p in the database • We can derive a new prefix sequence p’ = <A, B, C>

  20. CEMiner Algorithm • We use three pruning strategies to reduce the search space efficiently and effectively. • (1) pre-pruning • (2) post-pruning • (3) pair-pruning

  21. CEMiner Algo.

  22. CEMiner Algo.

  23. CEMiner Algo.

  24. CEMiner Algo.

  25. CEMiner Algo. • Pair-pruning: • If the endpoint is a starting endpoint, we can omit the closure checking, • because the starting endpoint and finishing endpoint always occur in pairs in an endpoint sequence.

  26. CEMiner Algo. • Ex. • Prefix p = <> • Endpoint B+ is a backward-extension endpoint of p, • so we can stop growing p.

  27. CEMiner Algo.

  28. CEMiner Algo.

  29. CEMiner Algo. • Pre-pruning: • If y is a finishing endpoint and it has a corresponding starting endpoint in.

  30. CEMiner Algo. • Post-pruning: • A finishing endpoint is called significant if it has a corresponding starting endpoint in the projected postfix or in.

  31. Cont.

  32. Experimental result

  33. Conclusion • We develop an efficient algorithm, CEMiner, to discover closed temporal patterns without candidate generation, based on the proposed endpoint representation. • The algorithm further employs three pruning methods to reduce the search space effectively.
