
ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis). Sequential Pattern Mining. Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business. Sequential Patterns. Given: A Transaction Database { cid, tid, date, item }


Presentation Transcript


  1. ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
     Sequential Pattern Mining
     Olivia R. Liu Sheng, Ph.D., Emma Eccles Jones Presidential Chair of Business

  2. Sequential Patterns
     Given: a transaction database { cid, tid, date, item }
     Find: inter-transaction patterns among customers
     Example: customers typically rent “Star Wars”, then “Empire Strikes Back”, and then “Return of the Jedi”

  3. Sequential Patterns
     cid  tid  date        item
     1    1    01/01/2000  30
     1    2    01/02/2000  90
     2    3    01/01/2000  40,70
     2    4    01/02/2000  30
     2    5    01/03/2000  40,60,70
     3    6    01/01/2000  30,50,70
     4    7    01/01/2000  30
     4    8    01/02/2000  40,70
     4    9    01/03/2000  90
     5    10   01/01/2000  90

  4. Sequential Patterns
     Itemset: a non-empty set of items, e.g., {30}, {40,70}.
     Sequence: an ordered list of itemsets, e.g., <{30} {40,70}>, <{40,70} {30}>.
     The size of a sequence is the number of itemsets in that sequence.
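To make the two definitions concrete, here is a minimal Python sketch. Modelling an itemset as a `frozenset` and a sequence as a tuple of itemsets is an assumption of this sketch (the slides do not prescribe a data structure), and `size` is a hypothetical helper name.

```python
# Assumed representation: itemset = frozenset, sequence = tuple of itemsets.
s1 = (frozenset({30}), frozenset({40, 70}))   # <{30} {40,70}>
s2 = (frozenset({40, 70}), frozenset({30}))   # <{40,70} {30}>

def size(sequence):
    """Size of a sequence: the number of itemsets it holds."""
    return len(sequence)

print(size(s1))                                     # 2
print(s1 == s2)                                     # False: itemset order matters
print(frozenset({40, 70}) == frozenset({70, 40}))   # True: item order does not
```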

  5. Sequential Patterns
     cid  tid  date        item
     1    1    01/01/2000  30
     1    2    01/02/2000  90
     2    3    01/01/2000  40,70
     2    4    01/02/2000  30
     2    5    01/03/2000  40,60,70
     3    6    01/01/2000  30,50,70
     4    7    01/01/2000  30
     4    8    01/02/2000  40,70
     4    9    01/03/2000  90
     5    10   01/01/2000  90
     Each transaction of a customer can be viewed as an itemset.
     A customer's sequence contains that customer's itemsets in transaction order.

  6. Sequential Patterns
     cid  customer sequence
     1    <{30} {90}>
     2    <{40,70} {30} {40,60,70}>
     3    <{30,50,70}>
     4    <{30} {40,70} {90}>
     5    <{90}>
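The step from the transaction table to the customer sequences can be sketched in Python as follows. The function name `build_customer_sequences` and the tuple layout of the rows are assumptions made for illustration; the dates are rewritten as ISO strings (reading the slide dates as month/day/year, which is also an assumption) so that sorting them as text sorts them chronologically.

```python
from collections import defaultdict

# Rows of the example database: (cid, tid, date, items).
transactions = [
    (1, 1, "2000-01-01", {30}),
    (1, 2, "2000-01-02", {90}),
    (2, 3, "2000-01-01", {40, 70}),
    (2, 4, "2000-01-02", {30}),
    (2, 5, "2000-01-03", {40, 60, 70}),
    (3, 6, "2000-01-01", {30, 50, 70}),
    (4, 7, "2000-01-01", {30}),
    (4, 8, "2000-01-02", {40, 70}),
    (4, 9, "2000-01-03", {90}),
    (5, 10, "2000-01-01", {90}),
]

def build_customer_sequences(rows):
    """Group transactions by customer and order each group by (date, tid)."""
    by_customer = defaultdict(list)
    for cid, tid, date, items in rows:
        by_customer[cid].append((date, tid, frozenset(items)))
    return {
        cid: [items for _, _, items in sorted(entries, key=lambda e: (e[0], e[1]))]
        for cid, entries in by_customer.items()
    }

customer_sequences = build_customer_sequences(transactions)
for cid in sorted(customer_sequences):
    print(cid, [sorted(itemset) for itemset in customer_sequences[cid]])
```

Sorting on `(date, tid)` rather than on the date alone is a design choice of the sketch: it keeps the order deterministic if a customer ever has two transactions on the same day.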

  7. Sequential Patterns
     Sequence $\langle a_1\ a_2\ \dots\ a_n \rangle$ is contained in sequence $\langle b_1\ b_2\ \dots\ b_m \rangle$ if there exist indexes $i_1 < i_2 < \dots < i_n$ such that $a_1 \subseteq b_{i_1}$, $a_2 \subseteq b_{i_2}$, …, and $a_n \subseteq b_{i_n}$.
     E.g., <{3} {4,5} {8}> is contained in <{3,8} {4,5,6} {8}>.
     Is <{3} {4,5} {8}> contained in <{7} {3,8} {9} {4,5,6} {8}>?
     Is <{3} {4,5} {8}> contained in <{7} {9} {4,5,6} {3,8} {8}>?
     Is <{3} {4,5} {8}> contained in <{7} {9} {3,8} {4,5,6}>?
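The containment test on this slide can be checked mechanically. The sketch below is one possible implementation, not taken from the slides: `is_contained` is a hypothetical name, and it uses a greedy left-to-right scan that matches each itemset of `a` against the earliest still-unused itemset of `b` that is a superset of it.

```python
def is_contained(a, b):
    """Return True if sequence a is contained in sequence b.

    a and b are lists of sets. Each itemset of a must be a subset of a
    distinct itemset of b, and the matches must appear in the same order.
    """
    pos = 0
    for itemset in a:
        # Skip itemsets of b that cannot host the current itemset of a.
        while pos < len(b) and not set(itemset) <= set(b[pos]):
            pos += 1
        if pos == len(b):
            return False          # ran out of b before matching all of a
        pos += 1                  # the next match must come strictly later
    return True

pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{3, 8}, {4, 5, 6}, {8}]))            # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))  # True
print(is_contained(pattern, [{7}, {9}, {4, 5, 6}, {3, 8}, {8}]))  # False
print(is_contained(pattern, [{7}, {9}, {3, 8}, {4, 5, 6}]))       # False
```

In the third case {4,5} only occurs before {3}, and in the fourth case nothing after {4,5,6} contains 8, so neither sequence contains the pattern.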

  8. Sequential Patterns
     cid  customer sequence
     1    <{30} {90}>
     2    <{40,70} {30} {40,60,70}>
     3    <{30,50,70}>
     4    <{30} {40,70} {90}>
     5    <{90}>
     A customer supports sequence s if s is contained in the sequence for this customer.
     E.g., customers 1 and 4 support sequence <{30} {90}>.

  9. Sequential Patterns
     cid  customer sequence
     1    <{30} {90}>
     2    <{40,70} {30} {40,60,70}>
     3    <{30,50,70}>
     4    <{30} {40,70} {90}>
     5    <{90}>
     The support for a sequence s is defined as the fraction of total customers who support s.
     E.g., customers 1 and 4 support sequence <{30} {90}>, so
     Supp(<{30} {90}>) = 2/5 = 40%
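As an illustration of this definition, the following sketch counts the customers whose sequence contains a given pattern. The names `supporters` and `support`, and the dictionary layout of the data, are assumptions of the sketch; `is_contained` is a hypothetical greedy containment check.

```python
# Customer sequences from the slide, keyed by cid.
customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def is_contained(a, b):
    """Greedy check that sequence a is contained in sequence b."""
    pos = 0
    for itemset in a:
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True

def supporters(pattern):
    """Ids of the customers whose sequence contains the pattern."""
    return [cid for cid, seq in customer_sequences.items()
            if is_contained(pattern, seq)]

def support(pattern):
    """Fraction of all customers who support the pattern."""
    return len(supporters(pattern)) / len(customer_sequences)

print(supporters([{30}, {90}]))  # [1, 4]
print(support([{30}, {90}]))     # 0.4
```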

  10. Sequential Patterns
      cid  customer sequence
      1    <{30} {90}>
      2    <{40,70} {30} {40,60,70}>
      3    <{30,50,70}>
      4    <{30} {40,70} {90}>
      5    <{90}>
      Supp(<{40,70}>) = 2/5 = 40%   (sequence support: 2 of the 5 customers)
      Supp({40,70}) = 3/10 = 30%    (itemset support: 3 of the 10 transactions)
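The difference between the two numbers comes from what is being counted. The sketch below contrasts the two denominators; the function names `sequence_support` and `itemset_support` are hypothetical, and `fractions.Fraction` is used only so that the printed values match the slide's fractions exactly.

```python
from fractions import Fraction

# Customer id -> list of that customer's transactions (itemsets).
customers = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def sequence_support(itemset):
    """Support of the 1-sequence <itemset>: counted per customer."""
    hits = sum(any(itemset <= t for t in txns) for txns in customers.values())
    return Fraction(hits, len(customers))

def itemset_support(itemset):
    """Support of the itemset: counted per transaction."""
    all_txns = [t for txns in customers.values() for t in txns]
    hits = sum(itemset <= t for t in all_txns)
    return Fraction(hits, len(all_txns))

print(sequence_support({40, 70}))  # 2/5
print(itemset_support({40, 70}))   # 3/10
```

Customer 2 has two transactions containing {40,70}; both count toward the itemset support, but the customer counts only once toward the sequence support.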

  11. Sequential Patterns Mining
      Given: a transaction database { cid, tid, date, item }
      Find: all sequences whose support is at least a user-specified minimum support
      Apriori property: if a sequence is large (that is, it meets the minimum support), then every sequence contained in it is also large.

  12. Sequential Patterns Mining
      1. Identify all large 1-sequences.
      2. Repeat until there are no more candidate k-sequences:
         2-1. Identify all candidate k-sequences using the large (k-1)-sequences.
              Join: two large (k-1)-sequences, L1 and L2, are joinable if they satisfy both of the following conditions:
                L1(1) = L2(1) and L1(2) = L2(2) and … and L1(k-2) = L2(k-2)
                L1(k-1) ≠ L2(k-1)
              Prune: remove the candidate k-sequences produced by the join that have a sub-sequence that is not large.
         2-2. Determine the large k-sequences from the candidate k-sequences.
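The join and prune steps can be sketched as two small Python functions. This is an illustrative sketch under stated assumptions, not the course's reference implementation: sequences are tuples of frozensets, `join` and `prune` are hypothetical names, and "sub-sequence" is read as a sequence obtained by dropping exactly one itemset from the candidate.

```python
def join(large_prev):
    """Join large (k-1)-sequences that agree on their first k-2 itemsets
    and differ in the last one; the result appends L2's last itemset to L1."""
    return [l1 + (l2[-1],)
            for l1 in large_prev
            for l2 in large_prev
            if l1[:-1] == l2[:-1] and l1[-1] != l2[-1]]

def prune(candidates, large_prev):
    """Keep only candidates whose every (k-1)-sub-sequence is large."""
    large = set(large_prev)
    return [c for c in candidates
            if all(c[:i] + c[i + 1:] in large for i in range(len(c)))]

fs = frozenset
large_2 = [
    (fs({30}), fs({40})),
    (fs({30}), fs({70})),
    (fs({30}), fs({90})),
    (fs({30}), fs({40, 70})),
]
candidates_3 = join(large_2)
print(len(candidates_3))                  # 12
print(len(prune(candidates_3, large_2)))  # 0
```

With the four large 2-sequences of the running example, the join yields 12 candidates, and the prune removes all of them because dropping the leading {30} always leaves a 2-sequence that is not large.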

  13. Sequential Patterns Mining
      cid  customer sequence
      1    <{30} {90}>
      2    <{40,70} {30} {40,60,70}>
      3    <{30,50,70}>
      4    <{30} {40,70} {90}>
      5    <{90}>
      Minimum support: 40%

  14. Sequential Patterns Mining
      cid  customer sequence
      1    <{30} {90}>
      2    <{40,70} {30} {40,60,70}>
      3    <{30,50,70}>
      4    <{30} {40,70} {90}>
      5    <{90}>
      Minimum support: 40%
      Large 1-sequences:
        <{30}>     support = 4/5 = 80%
        <{40}>     support = 2/5 = 40%
        <{70}>     support = 3/5 = 60%
        <{90}>     support = 3/5 = 60%
        <{40,70}>  support = 2/5 = 40%

  15. Sequential Patterns Mining
      Large 1-sequences:
        <{30}>     support = 4/5 = 80%
        <{40}>     support = 2/5 = 40%
        <{70}>     support = 3/5 = 60%
        <{90}>     support = 3/5 = 60%
        <{40,70}>  support = 2/5 = 40%
      Candidate 2-sequences:
        <{30} {40}>     <{30} {70}>     <{30} {90}>     <{30} {40,70}>
        <{40} {30}>     <{40} {70}>     <{40} {90}>     <{40} {40,70}>
        <{70} {30}>     <{70} {40}>     <{70} {90}>     <{70} {40,70}>
        <{90} {30}>     <{90} {40}>     <{90} {70}>     <{90} {40,70}>
        <{40,70} {30}>  <{40,70} {40}>  <{40,70} {70}>  <{40,70} {90}>

  16. Sequential Patterns Mining
      Candidate 2-sequences:
        <{30} {40}>     <{30} {70}>     <{30} {90}>     <{30} {40,70}>
        <{40} {30}>     <{40} {70}>     <{40} {90}>     <{40} {40,70}>
        <{70} {30}>     <{70} {40}>     <{70} {90}>     <{70} {40,70}>
        <{90} {30}>     <{90} {40}>     <{90} {70}>     <{90} {40,70}>
        <{40,70} {30}>  <{40,70} {40}>  <{40,70} {70}>  <{40,70} {90}>
      Large 2-sequences:
        <{30} {40}>     support = 2/5 = 40%
        <{30} {70}>     support = 2/5 = 40%
        <{30} {90}>     support = 2/5 = 40%
        <{30} {40,70}>  support = 2/5 = 40%

  17. Sequential Patterns Mining
      Large 2-sequences:
        <{30} {40}>     support = 2/5 = 40%
        <{30} {70}>     support = 2/5 = 40%
        <{30} {90}>     support = 2/5 = 40%
        <{30} {40,70}>  support = 2/5 = 40%
      Candidate 3-sequences (after the join):
        <{30} {40} {70}>     <{30} {40} {40,70}>  <{30} {70} {40}>     <{30} {70} {40,70}>
        <{30} {40,70} {40}>  <{30} {40,70} {70}>  <{30} {40} {90}>     <{30} {90} {40}>
        <{30} {70} {90}>     <{30} {90} {70}>     <{30} {90} {40,70}>  <{30} {40,70} {90}>
      Prune: all sub-sequences of a candidate k-sequence should be large. Every candidate listed here has a 2-element sub-sequence that is not large (for example, <{40} {70}> in <{30} {40} {70}>), so all of them are pruned.
      Candidate 3-sequences (after pruning): none. Stop.
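The whole walk-through on slides 13 to 17 can be reproduced with a short Python sketch. It is a simplified illustration under stated assumptions rather than the course's implementation: large 1-sequences are found by brute-force enumeration of every subset of every transaction (reasonable only for a toy database), sequences are tuples of frozensets, and all function names are hypothetical.

```python
from itertools import combinations

customer_sequences = [
    [{30}, {90}],
    [{40, 70}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]
MIN_SUPPORT = 0.4

def is_contained(a, b):
    """Greedy check that sequence a is contained in sequence b."""
    pos = 0
    for itemset in a:
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True

def support(seq):
    """Fraction of customers whose sequence contains seq."""
    hits = sum(is_contained(seq, cs) for cs in customer_sequences)
    return hits / len(customer_sequences)

def large_one_sequences():
    """Brute force: test every non-empty subset of every transaction."""
    itemsets = set()
    for cs in customer_sequences:
        for txn in cs:
            for r in range(1, len(txn) + 1):
                itemsets.update(frozenset(c) for c in combinations(sorted(txn), r))
    return [(s,) for s in itemsets if support((s,)) >= MIN_SUPPORT]

def mine():
    """Return {k: list of large k-sequences} for every k that has any."""
    large = large_one_sequences()
    result = {}
    k = 1
    while large:
        result[k] = large
        k += 1
        prev = set(large)
        # Join: same first k-2 itemsets, different last itemset.
        joined = [l1 + (l2[-1],) for l1 in large for l2 in large
                  if l1[:-1] == l2[:-1] and l1[-1] != l2[-1]]
        # Prune: every sub-sequence with one itemset dropped must be large.
        candidates = [c for c in joined
                      if all(c[:i] + c[i + 1:] in prev for i in range(k))]
        large = [c for c in candidates if support(c) >= MIN_SUPPORT]
    return result

patterns = mine()
for k, seqs in patterns.items():
    print(k, sorted([sorted(s) for s in seq] for seq in seqs))
```

On the example data this yields five large 1-sequences and four large 2-sequences, and it stops at k = 3 because no candidate survives pruning, which matches the slides.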

  18. Summary
      • What is a sequential pattern?
      • What is support for a sequential pattern?
      • How to mine sequential patterns?
      • What are the similarities and dissimilarities between association rule mining and sequential pattern mining?
