Summarizing Sequential Data with Closed Partial Orders. Gemma Casas-Garriga. Proceedings of the SIAM International Conference on Data Mining (SDM'05). Advisor: Jia-Ling Koh. Speaker: Chun-Wei Hsieh. 03/10/2006.
Introduction • Closed patterns form a compact yet informative set • The number of closed patterns may still be quite large • Goal: summarize the closed patterns in a post-processing step
Motivation • <(A)(C)(C)(C)(A)>, <(C)(A)(C)(C)(A)> • Which one is better than the other?
Main steps • Grouping Closed Sequential Patterns • Obtaining Closed Partial Orders
Grouping Closed Sequential Patterns • A valid pair (S, T) • S ⊆ CS is a nonredundant set of closed sequences whose tid lists each include at least T (T ⊆ tid(s)) • T ⊆ D is the maximal set of transactions in which all s ∈ S are contained.
Grouping Closed Sequential Patterns • A naive approach is to group closed sequences with the same tid list • The naive approach may miss some elements • Ex: <(C)(A)(C)>
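The naive grouping can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes every itemset holds a single item (so a sequence can be written as a string), and the toy database, the list of closed sequences, and all function names are hypothetical.

```python
from collections import defaultdict

def is_subsequence(s, t):
    """True if s occurs in t as a (not necessarily contiguous) subsequence."""
    it = iter(t)
    return all(item in it for item in s)

def tid_list(s, database):
    """Identifiers (indices) of the transactions that contain s."""
    return frozenset(tid for tid, t in enumerate(database) if is_subsequence(s, t))

def group_by_tid_list(closed_sequences, database):
    """Naive grouping: sequences with exactly the same tid list share a group."""
    groups = defaultdict(list)
    for s in closed_sequences:
        groups[tid_list(s, database)].append(s)
    return dict(groups)

# Hypothetical toy data, chosen only to illustrate the problem.
database = ["ABCD", "ABDC", "ABC"]
closed = ["ABC", "ABD", "ABCD", "ABDC"]
groups = group_by_tid_list(closed, database)
```

In this toy example the group for T = {0, 1} contains only ABD. ABC is also contained in both transactions of T, but its tid list is {0, 1, 2}, so the naive grouping places it elsewhere. This is the kind of missed element the slide refers to.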
Grouping Closed Sequential Patterns • Let (S, T) be a valid pair; then S = ⋂_{t ∈ T} t, the intersection of the transactions in T • For all s ∈ S, tid(s) is at least T (T ⊆ tid(s)) • Computing S therefore requires the transactions of the database
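One way to derive S from the transactions in T is to intersect them. The sketch below assumes that the intersection of a set of sequences means the set of maximal subsequences common to all of them, and that every itemset holds a single item. It enumerates candidates by brute force, so it is only suitable for tiny examples; the function names are hypothetical.

```python
from itertools import combinations

def is_subsequence(s, t):
    """True if s occurs in t as a (not necessarily contiguous) subsequence."""
    it = iter(t)
    return all(item in it for item in s)

def intersect_transactions(transactions):
    """Maximal subsequences contained in every transaction (brute force)."""
    shortest = min(transactions, key=len)
    common = set()
    # Every common subsequence is a subsequence of the shortest transaction.
    for k in range(1, len(shortest) + 1):
        for idx in combinations(range(len(shortest)), k):
            cand = "".join(shortest[i] for i in idx)
            if all(is_subsequence(cand, t) for t in transactions):
                common.add(cand)
    # Keep only the candidates not contained in a different common subsequence.
    return {s for s in common
            if not any(s != o and is_subsequence(s, o) for o in common)}

result = intersect_transactions(["ACCCA", "CACCA"])
```

For the two sequences used on the slides, this brute-force intersection yields ACCA and CCCA.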
Grouping Closed Sequential Patterns • Given two valid pairs (S′, T′) and (S, T), if T ⊆ T′ then for all s′ ∈ S′ there exists s ∈ S such that s′ ⊆ s.
Obtaining Closed Partial Orders • Obtain a compact representation from each valid pair (S, T) • A partial order can be modelled as a triple p = (V, E, l), with vertices V, edges E, and a labelling function l
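A triple p = (V, E, l) can be held in a small data structure. The sketch below is an illustration only: it assumes l maps each vertex to a single item, that an edge (u, v) means u precedes v, and that the graph is acyclic. The class name, method names, and the example order are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PartialOrder:
    vertices: set   # V
    edges: set      # E, as (u, v) pairs meaning "u precedes v"
    labels: dict    # l, mapping vertex -> item

    def successors(self, u):
        """Direct successors of u, in sorted order."""
        return sorted(v for (x, v) in self.edges if x == u)

    def maximal_paths(self):
        """Label strings read along every source-to-sink path (acyclic graphs only)."""
        targets = {v for (_, v) in self.edges}
        sources = sorted(v for v in self.vertices if v not in targets)
        out = []

        def walk(u, prefix):
            prefix = prefix + self.labels[u]
            nxt = self.successors(u)
            if not nxt:
                out.append(prefix)
            for v in nxt:
                walk(v, prefix)

        for s in sources:
            walk(s, "")
        return out

# Hypothetical order: A and B are unordered, both precede C, and C precedes D.
p = PartialOrder(
    vertices={0, 1, 2, 3},
    edges={(0, 2), (1, 2), (2, 3)},
    labels={0: "A", 1: "B", 2: "C", 3: "D"},
)
paths = p.maximal_paths()
```

Reading labels along the maximal paths gives back a set of sequences, which is how a single partial order can stand in for several sequential patterns.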
Obtaining Closed Partial Orders • Given a set of sequences S, let s, s′ ∈ S be two sequences s = <I1 … In> and s′ = <I′1 … I′m> • If (1) Ii = I′j; and (2) head(s, i) ⋄ tail(s′, j + 1) ⊆ s″ for some s″ ∈ S; and (3) head(s′, j) ⋄ tail(s, i + 1) ⊆ s‴ for some s‴ ∈ S, then position i of s matches position j of s′, denoted s[i] ∼ s′[j].
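The matching condition can be sketched in Python under a few assumptions that the slide does not spell out: every itemset holds a single item, positions are 1-indexed, head(s, i) is the prefix of s up to position i, tail(s, i) is the suffix of s starting at position i, ⋄ is concatenation, and ⊆ is subsequence containment. The function names are hypothetical.

```python
def is_subsequence(s, t):
    """True if s occurs in t as a (not necessarily contiguous) subsequence."""
    it = iter(t)
    return all(item in it for item in s)

def head(s, i):
    """Prefix of s covering positions 1..i (1-indexed)."""
    return s[:i]

def tail(s, i):
    """Suffix of s starting at position i (1-indexed); empty if i > len(s)."""
    return s[i - 1:]

def matches(s, i, s2, j, S):
    """True if position i of s matches position j of s2 with respect to S."""
    if s[i - 1] != s2[j - 1]:          # the two positions must carry the same item
        return False
    left = head(s, i) + tail(s2, j + 1)
    right = head(s2, j) + tail(s, i + 1)
    return (any(is_subsequence(left, x) for x in S)
            and any(is_subsequence(right, x) for x in S))

S = ["ACCCA", "CACCA"]
s, s2 = S
third_positions_match = matches(s, 3, s2, 3, S)
first_a_matches_second_a = matches(s, 1, s2, 2, S)
```

With the two sequences from the slides, position 3 of each sequence matches: the concatenations ACC ⋄ CA and CAC ⋄ CA give back ACCCA and CACCA. The first A of ACCCA does not match the A at position 2 of CACCA, because CA ⋄ CCCA is not contained in any sequence of S.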
Obtaining Closed Partial Orders • S = {<(A)(C)(C)(C)(A)>, <(C)(A)(C)(C)(A)>} • [Figure: heads, tails, and head ⋄ tail concatenations of the two sequences]
Obtaining Closed Partial Orders • Transitivity can be used to improve the algorithm • Transitivity: given a valid pair (S, T), let s, s′, s″ ∈ S; if s[i] ∼ s′[j] and s′[j] ∼ s″[k], then s[i] ∼ s″[k].
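The slide does not say how transitivity is exploited. One common way to avoid testing every pair of positions is a union-find (disjoint-set) structure, which is a technique swapped in here for illustration and not necessarily the one used in the paper. Each position is a hypothetical (sequence name, index) tuple, and every set of mutually matching positions ends up in one class.

```python
class DisjointSet:
    """Union-find over hashable elements, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Representative of the class containing x (x is added if unseen)."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the classes of a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

ds = DisjointSet()
ds.union(("s", 3), ("s1", 3))    # s[3] ~ s1[3], found by an explicit test
ds.union(("s1", 3), ("s2", 1))   # s1[3] ~ s2[1], found by an explicit test
same_class = ds.find(("s", 3)) == ds.find(("s2", 1))  # implied by transitivity
```

After the two recorded matches, s[3] and s2[1] fall in the same class without a third test. Each class could then serve as one vertex of the resulting partial order.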
Experiments • Three different sequential databases • Synthetic data (1000 transactions) • The command history of a Unix computer user (607 transactions) • The first chapter of the book “1984” by George Orwell (340 transactions)