This document presents a study on pattern-directed mining of sequence data, authored by Valery Guralnik, Duminda Wijesekera, and Jaideep Srivastava, and presented by Jyothsna R. Nayak. It covers sequential patterns, the data structure and algorithms involved, and their experimental evaluation. The study introduces a language for specifying episodes of interest and describes the optimization of Sequential Pattern Trees (SP Trees). The authors conclude that the approach is robust, flexible, and efficient, and that it handles complex patterns in sequence data.
Pattern Directed Mining Of Sequence Data By Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava Presenter : Jyothsna R Nayak
Contents • Introduction • Sequential Patterns • Data Structure and Algorithm • Experimental Evaluation • SP Tree Optimization • Conclusions • References
Introduction • Sequence data • Each event has an associated time of occurrence • An episode is a collection of events • Frequent episodes: episodes occurring with a frequency above a certain threshold
Steps involved in mining of frequent episodes • Present a language for specifying episodes of interest • Describe a data structure: Sequential Pattern Tree • Mining algorithm to generate frequent episodes • Optimize SP Tree
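Sequential Patterns • Pattern language • A = {A1, A2, …, Am} is a set of attributes • D1, D2, …, Dm are the domains of those attributes • An event e over A is an (m + 2)-tuple (a1, a2, …, am, tbeg, tend), where each ai is a value from Di and tbeg, tend are the begin and end times of the event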
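The (m + 2)-tuple definition can be sketched in Python as follows. This is only an illustration: the attribute names in `ATTRIBUTES` and the integer time stamps are assumptions, not part of the slides.

```python
from typing import NamedTuple, Tuple

# Hypothetical attribute set A = {A1, ..., Am}; the names are illustrative only.
ATTRIBUTES = ("comp_type", "comp_name", "movement")


class Event(NamedTuple):
    values: Tuple[str, ...]  # (a1, ..., am), one value per attribute in ATTRIBUTES
    tbeg: int                # begin time of the event
    tend: int                # end time of the event

    def get(self, attribute: str) -> str:
        """Look up the value of a named attribute."""
        return self.values[ATTRIBUTES.index(attribute)]

    def as_tuple(self) -> tuple:
        """Flatten to the (a1, ..., am, tbeg, tend) form used on the slide."""
        return self.values + (self.tbeg, self.tend)


e = Event(("Computer", "Microsoft", "Down"), tbeg=1, tend=1)
flat = e.as_tuple()
```

With m = 3 attributes, `flat` has m + 2 = 5 components.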
Example of Events in the Stock Market Domain
Event ID | Date | Comp Type | Comp Name | Movement | Volatility | Activeness
e1 | 01/02/91 | Computer | Microsoft | Down | High | Low
e2 | 01/03/91 | Computer | Microsoft | Up | High | Medium
e3 | 01/02/91 | Computer | Microsoft | NoMovmt | Low | High
e4 | 01/03/91 | Computer | Microsoft | Down | High | High
Definitions • Ordering constraint • Serial occurrence (e -> f): e.tend < f.tbeg • Parallel occurrence (e || f) • Attribute constraint • Selection constraint: e.type = ‘computer’ • Join constraint: e.name = f.name
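The constraints defined here can be read as simple predicates over events. The sketch below represents events as dictionaries, which is an assumption made for illustration. It also assumes that a parallel occurrence places no restriction on the relative order of the two events, since the slide gives no condition for it.

```python
def serial(e: dict, f: dict) -> bool:
    """Ordering constraint e -> f: e must end before f begins."""
    return e["tend"] < f["tbeg"]


def parallel(e: dict, f: dict) -> bool:
    """Ordering constraint e || f (assumed here: no ordering requirement)."""
    return True


def selection(e: dict) -> bool:
    """Selection constraint on one event: e.type = 'computer'."""
    return e["type"] == "computer"


def join(e: dict, f: dict) -> bool:
    """Join constraint across two events: e.name = f.name."""
    return e["name"] == f["name"]


# Hypothetical events; times are plain integers for simplicity.
e = {"type": "computer", "name": "Microsoft", "tbeg": 1, "tend": 1}
f = {"type": "computer", "name": "Microsoft", "tbeg": 2, "tend": 2}
```

A selection constraint looks at one event at a time, while a join constraint compares attributes of two events.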
Event specification • Partial specifications: e[(e.type = ‘computer’ ∨ e.type = ‘electronic’) ∧ e.movement_direction = ‘down’] • Comparing some characteristics of two events: e[e.movement_direction = ‘up’] -> [e.name = f.name] f[f.movement_direction = ‘down’]
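As a rough illustration of a partial specification, the first example on this slide can be written as a Boolean predicate and used to filter a list of events. The dictionary representation, the `id` field, and the sample events are assumptions made for this sketch.

```python
def spec_e(e: dict) -> bool:
    """e[(e.type = 'computer' or e.type = 'electronic') and e.movement_direction = 'down']"""
    return e["type"] in ("computer", "electronic") and e["movement_direction"] == "down"


# Hypothetical sample events.
events = [
    {"id": "e1", "type": "computer", "movement_direction": "down"},
    {"id": "e2", "type": "computer", "movement_direction": "up"},
    {"id": "e3", "type": "electronic", "movement_direction": "down"},
    {"id": "e4", "type": "retail", "movement_direction": "down"},
]

# Only the attributes named in the specification are checked;
# all other attributes of an event are left unconstrained.
matching_ids = [e["id"] for e in events if spec_e(e)]
```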
Data Structure • A leaf node represents an event • An interior node represents an ordering constraint • If an ordering constraint (-> or ||) labels some interior node, and e and f are the left and right children of that node, then e and f combined under that ordering constraint form a sequential pattern • Associated with each node is a table of matching events • Attached to each node is a Boolean expression tree representing attribute constraints
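One possible in-memory shape for such a tree is sketched below. The class name `SPNode` and its fields are hypothetical. A plain Python callable stands in for the Boolean expression tree of attribute constraints, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SPNode:
    # "->" (serial) or "||" (parallel) for interior nodes, None for leaves.
    ordering: Optional[str] = None
    # Stand-in for the Boolean expression tree of attribute constraints.
    constraint: Optional[Callable[..., bool]] = None
    left: Optional["SPNode"] = None
    right: Optional["SPNode"] = None
    # Table of matching events (leaf) or matching episodes (interior node).
    table: List[tuple] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


# Tree for e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down'].
leaf_e = SPNode(constraint=lambda e: e["mvmt"] == "up")
leaf_f = SPNode(constraint=lambda f: f["mvmt"] == "down")
root = SPNode(
    ordering="->",
    constraint=lambda e, f: e["name"] == f["name"],
    left=leaf_e,
    right=leaf_f,
)
```

Leaf constraints take one event (selection), while the interior node's constraint takes two (join).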
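SP Tree (figure) • User-specified pattern: e[e.mvmt = ‘up’] -> [e.name = f.name] f[f.mvmt = ‘down’] • Root: ordering node e -> f with join constraint e.name = f.name and a table of matching episodes • Left leaf: event e with constraint e.mvmt = ‘up’ and a table of matching events • Right leaf: event f with constraint f.mvmt = ‘down’ and a table of matching events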
Bottom-up algorithm
Initialize queue Q to empty
for (each leaf l in T) do begin
  generate events from S that match the constraints of l
  if (the parent p of l is not already in Q) then put p in Q
end
while (Q is not empty) do begin
  remove node n from Q
  Generate_Events(n)
  if (for n’s parent p another child was processed) then put p in Q
end
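The bottom-up traversal can be sketched in Python as below. The `Node` class, `leaves`, and the two callbacks are hypothetical names. One deliberate deviation, stated as an assumption: this sketch enqueues a parent only once both of its children have been processed, which is one reading of the "another child was processed" check and guarantees that no node is visited before its children.

```python
from collections import deque


class Node:
    """Binary tree node; interior nodes are assumed to have exactly two children."""

    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right
        self.parent = None
        self.done = False
        for child in (left, right):
            if child is not None:
                child.parent = self


def leaves(node):
    """Return the leaves of the tree in left-to-right order."""
    if node.left is None and node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def bottom_up(root, match_leaf, generate_events):
    """Process leaves first, then each interior node after both children."""
    order, queue = [], deque()

    def mark_done(node):
        node.done = True
        order.append(node.name)
        p = node.parent
        # Enqueue the parent only when its second child finishes.
        if p is not None and p.left.done and p.right.done:
            queue.append(p)

    for leaf in leaves(root):
        match_leaf(leaf)          # fill the leaf's table from the sequence S
        mark_done(leaf)
    while queue:
        n = queue.popleft()
        generate_events(n)        # combine the tables of n's children
        mark_done(n)
    return order


# Tree for c -> (a -> b); the callbacks are no-ops in this demonstration.
tree = Node("root", Node("c"), Node("p1", Node("a"), Node("b")))
visit_order = bottom_up(tree, lambda n: None, lambda n: None)
```

In this example the root is processed last even though one of its children is a leaf.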
Generate-events Algorithm
for (each episode e from left child l of n) do begin
  for (each episode f from right child r of n) do begin
    if (node n is serial) then
      if (e.tend >= f.tbeg) then continue
    if (events in e and f match the join constraint) then
      form new episode g from the events of e and f
  end
end
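A minimal Python sketch of this nested-loop join follows. Episodes are represented as tuples of event dictionaries, and an episode's end and begin times are taken as the latest `tend` and earliest `tbeg` of its events. These representation choices and the sample data are assumptions made for illustration.

```python
def generate_events(left_table, right_table, is_serial, join):
    """Combine episodes from the two children of a node into new episodes."""
    new_episodes = []
    for e in left_table:
        for f in right_table:
            if is_serial:
                e_end = max(ev["tend"] for ev in e)
                f_beg = min(ev["tbeg"] for ev in f)
                if e_end >= f_beg:
                    continue  # e does not finish before f starts
            if join(e, f):
                new_episodes.append(e + f)  # episode g built from e and f
    return new_episodes


def ev(name, mvmt, t):
    """Helper building a hypothetical instantaneous event."""
    return {"name": name, "mvmt": mvmt, "tbeg": t, "tend": t}


# Tables of the two leaves for e[mvmt = 'up'] -> [e.name = f.name] f[mvmt = 'down'].
ups = [(ev("Microsoft", "up", 2),)]
downs = [
    (ev("Microsoft", "down", 1),),  # too early: fails the serial check
    (ev("Microsoft", "down", 3),),  # matches
    (ev("Acme", "down", 4),),       # different name: fails the join constraint
]
same_name = lambda e, f: e[0]["name"] == f[0]["name"]
episodes = generate_events(ups, downs, is_serial=True, join=same_name)
```

Checking the ordering constraint before the join constraint mirrors the slide and skips the join test for pairs that are out of order.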
Experimental evaluation • Results • window size variation • data set size • number of event specifications • attribute constraints
Figure: Time in secs vs. window size in days (minimum frequency = 0.8)
Figure: Time in secs vs. number of event specifications (minimum frequency = 0.8, window size = 11)
Figure: Time in secs vs. number of constraints (minimum frequency = 0.8, window size = 5)
Figure: Time in secs vs. number of events in data sets (minimum frequency = 0.7, window size = 5)
SP Tree Optimization • If two event nodes represent the same event, only one of them needs to be kept. • If two ordering nodes have the same join constraints, and their left and right children represent the same events, then one such node is sufficient.
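One way to realize this kind of node sharing is to intern nodes by a structural key, so that identical subtrees collapse into a single object. The sketch below is an illustration under assumptions: the `Node` class, the `label` encoding (an event specification string for leaves, an (ordering, join constraint) pair for interior nodes), and the `share` function are all hypothetical, and the slides do not say how the optimization is implemented.

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right


def share(node, cache):
    """Return the canonical copy of node, reusing equal nodes seen before."""
    if node is None:
        return None
    # Canonicalize the children first so equal subtrees become the same object.
    node.left = share(node.left, cache)
    node.right = share(node.right, cache)
    key = (node.label, id(node.left), id(node.right))
    return cache.setdefault(key, node)


cache = {}
# Pattern 1: e[up] -> [e.name = f.name] f[down]
p1 = share(Node(("->", "e.name = f.name"), Node("e[up]"), Node("f[down]")), cache)
# Pattern 2 contains pattern 1 as its left subtree, followed by g[up].
p2 = share(
    Node(
        ("->", None),
        Node(("->", "e.name = f.name"), Node("e[up]"), Node("f[down]")),
        Node("g[up]"),
    ),
    cache,
)
```

After sharing, the left subtree of the second pattern is the very same object as the first pattern's root, so its table of matching episodes would only need to be computed once.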
Conclusions • The approach is robust, flexible, and efficient • It handles complex patterns • It shows good performance
References • Mannila, H., Toivonen, H., and Verkamo, A. I., “Discovering frequent episodes in sequences” • Agrawal, R., and Srikant, R., “Mining sequential patterns” • Mannila, H., and Toivonen, H., “Discovering generalized episodes using minimal occurrences” • Srikant, R., and Agrawal, R., “Mining generalized association rules”