
Reducing Number of Candidates


Presentation Transcript


  1. Reducing Number of Candidates
  • Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent.
  • The Apriori principle holds due to the following property of the support measure: the support of an itemset never exceeds the support of its subsets.
  • This is known as the anti-monotone property of support. Pruning uses the contrapositive: if an itemset is infrequent, every superset of it must also be infrequent and can be skipped.
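To see the anti-monotone property concretely, here is a minimal sketch (the transaction data and the `support` helper are illustrative, not from the slides): every subset of an itemset is contained in at least the transactions that contain the itemset itself, so its support can only be equal or higher.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical five-transaction database over items A..E.
transactions = [{"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"},
                {"A", "D", "E"}, {"A", "B", "C"}]

superset = {"A", "D", "E"}
for k in range(1, len(superset)):
    for subset in combinations(sorted(superset), k):
        # Anti-monotonicity: a subset is at least as frequent as its superset.
        assert support(set(subset), transactions) >= support(superset, transactions)
```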

  2. Apriori Algorithm
  • Method (a sketch in code follows this list):
  • Let k = 1.
  • Generate frequent itemsets of length 1.
  • Repeat until no new frequent itemsets are identified:
  • Generate length-(k+1) candidate itemsets from length-k frequent itemsets.
  • Prune candidate itemsets containing subsets of length k that are infrequent.
  • Count the support of each candidate by scanning the DB.
  • Eliminate candidates that are infrequent, leaving only those that are frequent.
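A minimal Python sketch of the loop above, assuming transactions are represented as sets and `min_sup` is an absolute count (the function name and representation are illustrative):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Sketch of the Apriori method: generate, prune, count, eliminate."""
    # k = 1: frequent itemsets of length 1.
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items
                if sum(1 for t in transactions if i in t) >= min_sup}
    all_frequent, k = set(frequent), 1
    while frequent:
        # Join step: build length-(k+1) candidates from length-k frequent itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # Prune step: drop candidates containing an infrequent length-k subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Counting step: one DB scan counts each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Eliminate infrequent candidates.
        frequent = {c for c, n in counts.items() if n >= min_sup}
        all_frequent |= frequent
        k += 1
    return all_frequent
```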

  3. Apriori
  [Figure: the candidate-generation loop: L1 → Candidate Generation → C2 → "Large" Itemset Generation → L2 → Candidate Generation → C3 → "Large" Itemset Generation → L3 → … Each round consists of a join step, a prune step, and a counting step.]
  • Disadvantage 1: it is costly to handle a large number of candidate sets.
  • Disadvantage 2: it is tedious to repeatedly scan the database and check the candidate patterns.

  4. Max-Patterns
  • Max-pattern: a frequent pattern that has no proper frequent super-pattern.
  • In the slide's example (min_sup = 2), BCDE and ACD are max-patterns, while BCD is not: it is a subset of the frequent pattern BCDE.

  5. Maximal Frequent Itemset
  • An itemset is maximal frequent if none of its immediate supersets is frequent.
  [Figure: the itemset lattice, showing the border separating frequent from infrequent itemsets; the maximal itemsets sit just inside the border.]
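The definition translates directly into code; a sketch (names are illustrative), assuming `frequent` holds all frequent itemsets as frozensets and `items` is the item universe:

```python
def maximal_frequent(frequent, items):
    """An itemset is maximal frequent if no immediate superset
    (the itemset plus one more item) is frequent."""
    return {s for s in frequent
            if not any(s | {x} in frequent for x in items if x not in s)}
```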

  6. A Simple Example
  Database TDB: a database of 5 transactions over the items A, B, C, D, E. Let min support = 2. Find all maximal frequent itemsets using the Apriori algorithm.
  1st scan: count support to generate the candidate 1-itemsets C1, then eliminate the infrequent candidates to obtain the frequent 1-itemsets L1.
  [Figure: tables for C1 and L1.]

  7. A Simple Example
  Generate the length-2 candidate itemsets C2 from the length-1 frequent itemsets L1.
  2nd scan: count support for C2, then generate the frequent itemsets of length 2, L2.
  [Figure: tables for C2 and L2.]

  8. A Simple Example
  Generate the length-3 candidate itemsets C3 from the length-2 frequent itemsets L2.
  3rd scan: count support for C3, then generate the frequent itemsets of length 3, L3. The algorithm stops here.
  [Figure: tables for C3 and L3.]

  9. A Simple Example (min_sup = 2)
  [Figure: the full run on database TDB: 1st scan C1 → L1, 2nd scan C2 → L2, 3rd scan C3 → L3.]
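The slide's tables were lost in extraction, but the transaction database can be inferred from the FP-tree construction slides later in the deck. Under that assumption, the `apriori` sketch from earlier reproduces the three scans:

```python
# Transaction database inferred from the FP-tree slides (an assumption;
# the original tables did not survive extraction).
tdb = [{"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"},
       {"A", "D", "E"}, {"A", "B", "C"}]

freq = apriori(tdb, min_sup=2)
# 1st scan: A:4, B:3, C:3, D:3, E:2 are all frequent.
# 2nd scan: AB, AC, AD, AE, BC, CD, DE survive.
# 3rd scan: only ADE survives, and the algorithm stops.
assert frozenset("ADE") in freq and frozenset("BD") not in freq
```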

  10. Draw a graph to illustrate the result
  [Figure: the itemset lattice from the empty set (null) through the single items A..E, the pairs AB..DE, the triples ABC..CDE, the 4-itemsets, and ABCDE. Green and yellow nodes mark the frequent itemsets, a border separates them from the infrequent itemsets, and the maximal frequent itemsets are highlighted.]

  11. FP-growth Algorithm
  • Scan the database once to store all essential information in a data structure called an FP-tree (Frequent Pattern Tree).
  • The FP-tree is concise and is used to generate the large itemsets directly.
  • Once the FP-tree has been constructed, a recursive divide-and-conquer approach mines the frequent itemsets from it.
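A minimal sketch of the data structure (class and attribute names are illustrative): each node stores an item, a count, and a parent link for walking prefix paths, and a header table links together all nodes carrying the same item.

```python
class FPNode:
    """One FP-tree node: item label, count, parent link, children by item."""
    def __init__(self, item=None, parent=None):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}  # item -> FPNode

class FPTree:
    def __init__(self):
        self.root = FPNode()
        self.header = {}    # item -> list of all nodes labelled with that item

    def insert(self, ordered_items):
        """Insert one transaction whose items are already in frequency order."""
        node = self.root
        for item in ordered_items:
            if item not in node.children:
                child = FPNode(item, parent=node)
                node.children[item] = child
                self.header.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1
```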

  12. FP-growth Algorithm
  • Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order.
  • Step 2: Construct the FP-tree from the above data.
  • Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
  • Step 4: Determine the frequent patterns.

  13. A Simple Example of FP-tree
  Problem: find all frequent itemsets with support >= 2.
  [Figure: the transaction database for the FP-growth example.]

  14. FP-growth Algorithm
  • Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order.
  • Step 2: Construct the FP-tree from the above data.
  • Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
  • Step 4: Determine the frequent patterns.

  15. A Simple Example: Step 1 (threshold = 2)
  [Figure: item support counts from the first scan.]

  16. A Simple Example: Step 1
  [Figure: each transaction rewritten with its frequent items in descending frequency order.]
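A sketch of step 1 (function name illustrative): count item supports, drop items below the threshold, and rewrite each transaction with its surviving items sorted by descending frequency, breaking ties alphabetically as the slides specify.

```python
from collections import Counter

def ordered_frequent_items(transactions, threshold):
    """Step 1: frequency-order the frequent items of each transaction."""
    counts = Counter(i for t in transactions for i in t)
    keep = {i for i, c in counts.items() if c >= threshold}
    # Sort by descending frequency, then alphabetically for ties.
    return [sorted((i for i in t if i in keep), key=lambda i: (-counts[i], i))
            for t in transactions]
```

On the inferred database `tdb` this yields [A, B], [B, C, D], [A, C, D, E], [A, D, E], [A, B, C].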

  17. FP-growth Algorithm
  • Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order.
  • Step 2: Construct the FP-tree from the above data.
  • Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
  • Step 4: Determine the frequent patterns.

  18. Step 2: Construct the FP-tree from the above data
  [Figure: FP-tree after the first transaction: Root → A:1 → B:1.]

  19. Step 2: Construct the FP-tree from the above data
  [Figure: after the second transaction, a new branch from the root: Root → B:1 → C:1 → D:1.]

  20. Step 2: Construct the FP-tree from the above data
  [Figure: after the third transaction, A becomes A:2 and gains the path C:1 → D:1 → E:1.]

  21. Step 2: Construct the FP-tree from the above data
  [Figure: after the fourth transaction, A becomes A:3 and gains the path D:1 → E:1.]

  22. Step 2: Construct the FP-tree from the above data
  [Figure: after the fifth transaction, A becomes A:4 and the A → B branch becomes B:2 with a new C:1 child. The tree is complete.]
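Putting steps 1 and 2 together on the inferred database (same assumption as before, building on the `FPTree` and `ordered_frequent_items` sketches above), inserting the ordered transactions one by one grows the counts exactly as this sequence of slides shows:

```python
tree = FPTree()
for t in ordered_frequent_items(tdb, threshold=2):
    tree.insert(t)

# Shared prefixes accumulate: the A child of the root ends at count 4,
# matching the A:4 node on the final construction slide.
assert tree.root.children["A"].count == 4
```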

  23. FP-growth Algorithm
  • Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order.
  • Step 2: Construct the FP-tree from the above data.
  • Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
  • Step 4: Determine the frequent patterns.

  24. Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
  Cond. FP-tree on "E": first path found: (A:1, C:1, D:1, E:1).
  [Figure: the completed FP-tree with the first E path highlighted.]

  25. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "E": paths collected so far: (A:1, C:1, D:1, E:1), (A:1, D:1, E:1).
  [Figure: the FP-tree with the second E path highlighted.]

  26. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "E" (support of E = 2): both paths collected: (A:1, C:1, D:1, E:1), (A:1, D:1, E:1).

  27. Step 3 (continued). Threshold = 2.
  Conditional pattern base of "E": (A:1, C:1, D:1), (A:1, D:1). C has count 1 < 2 and is pruned; the conditional FP-tree on "E" contains A:2 and D:2.

  28. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "D": first path found: (A:1, C:1, D:1).
  [Figure: the FP-tree with the first D path highlighted.]

  29. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "D": paths collected so far: (A:1, C:1, D:1), (A:1, D:1).

  30. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "D": paths collected so far: (A:1, C:1, D:1), (A:1, D:1), (B:1, C:1, D:1).

  31. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "D" (support of D = 3): all three paths collected.

  32. Step 3 (continued). Threshold = 2.
  Conditional pattern base of "D": (A:1, C:1), (A:1), (B:1, C:1). B has count 1 < 2 and is pruned; the conditional FP-tree on "D" contains A:2 and C:2.
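The pruning step applied on this slide can be sketched as follows (function name illustrative): sum each item's count across the weighted prefix paths of the conditional pattern base, and keep only the items that meet the threshold.

```python
from collections import Counter

def conditional_tree_items(pattern_base, threshold):
    """Items (with counts) that survive into the conditional FP-tree."""
    counts = Counter()
    for path, n in pattern_base:
        for item in path:
            counts[item] += n
    return {i: c for i, c in counts.items() if c >= threshold}

# Conditional pattern base of "D": prefixes (A,C), (A), (B,C), each with count 1.
base_d = [(["A", "C"], 1), (["A"], 1), (["B", "C"], 1)]
print(conditional_tree_items(base_d, 2))  # {'A': 2, 'C': 2}; B:1 is pruned
```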

  33. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "C" (support of C = 3): paths ending in C: (A:1, B:1, C:1), (A:1, C:1), (B:1, C:1).

  34. Step 3 (continued). Threshold = 2.
  Conditional pattern base of "C": (A:1, B:1), (A:1), (B:1); the conditional FP-tree on "C" contains A:2 and B:2.

  35. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "B" (support of B = 3): paths ending in B: (A:2, B:2), (B:1).

  36. Step 3 (continued). Threshold = 2.
  Conditional pattern base of "B": (A:2); the standalone B:1 contributes an empty prefix. The conditional FP-tree on "B" contains A:2.

  37. Step 3 (continued). Threshold = 2.
  Cond. FP-tree on "A" (support of A = 4): the only path is (A:4), whose prefix is empty, so the conditional FP-tree on "A" is just the root.
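Reading a conditional pattern base off the tree is a walk up the parent links from every node of the item. This builds on the `FPTree` sketch above; the traversal shown is a common implementation, not necessarily the slides' own.

```python
def conditional_pattern_base(tree, item):
    """For each node labelled `item`, collect the root-to-node prefix path
    (excluding the item itself), weighted by that node's count."""
    base = []
    for node in tree.header.get(item, []):
        path, p = [], node.parent
        while p is not None and p.item is not None:  # stop at the root
            path.append(p.item)
            p = p.parent
        base.append((path[::-1], node.count))
    return base

print(conditional_pattern_base(tree, "E"))
# -> [(['A', 'C', 'D'], 1), (['A', 'D'], 1)], matching slide 27
```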

  38. FP-growth Algorithm
  • Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order.
  • Step 2: Construct the FP-tree from the above data.
  • Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
  • Step 4: Determine the frequent patterns.

  39. Summary of the conditional FP-trees:
  • Cond. FP-tree on "E" (support 2): A:2, D:2
  • Cond. FP-tree on "D" (support 3): A:2, C:2
  • Cond. FP-tree on "C" (support 3): A:2, B:2
  • Cond. FP-tree on "B" (support 3): A:2
  • Cond. FP-tree on "A" (support 4): empty (Root only)

  40. For each conditional FP-tree:
  • Cond. FP-tree on "E" (2): before generating this cond. tree, we generate {E} (support = 2); after generating it, we generate {A, D, E}, {A, E}, {D, E} (support = 2).
  • Cond. FP-tree on "D" (3): before, we generate {D} (support = 3); after, we generate {A, D}, {C, D} (support = 2).
  • Cond. FP-tree on "C" (3): before, we generate {C} (support = 3); after, we generate {A, C}, {B, C} (support = 2).
  • Cond. FP-tree on "B" (3): before, we generate {B} (support = 3); after, we generate {A, B} (support = 2).
  • Cond. FP-tree on "A" (4): before, we generate {A} (support = 4); after, we do not generate any further itemset.

  41. Step 4: Determine the frequent patterns
  Answer: collecting the itemsets generated on the previous slide, the frequent itemsets with support >= 2 are:
  {E}, {A,D,E}, {A,E}, {D,E}, {D}, {A,D}, {C,D}, {C}, {A,C}, {B,C}, {B}, {A,B}, {A}
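As a sanity check under the same inferred-database assumption, this answer coincides with the output of the `apriori` sketch from earlier:

```python
answer = ["E", "ADE", "AE", "DE", "D", "AD", "CD", "C", "AC", "BC", "B", "AB", "A"]
assert {frozenset(s) for s in answer} == apriori(tdb, min_sup=2)
```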
