CBW: An Efficient Algorithm for Frequent Itemset Mining

Ja-Hwung Su, Wen-Yang Lin. Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

Outline
• Introduction
• CBW algorithm
• Experiment result
• Conclusion

Introduction
• Mining association rules from large databases of business data has been a hot topic.
• As the minimum support threshold decreases, the number of candidate itemsets increases exponentially.
• The paper proposes a new algorithm that maintains its performance even at relatively low supports.

Algorithm
• Cut-Both-Ways (CBW) employs a bi-directional search strategy and combines several techniques for frequent itemset generation.
• Step 1. Choose an appropriate cutting level α that divides the search space into two parts.
• Step 2. After identifying all frequent itemsets at this level, perform a downward search to enumerate all frequent itemsets below the cutting level α and determine their support values.
• Step 3. Perform an upward search to enumerate all frequent itemsets with cardinalities larger than α.

Concept illustration of CBW
Input: the transaction database D and the minimum support minsup
Output: the set of frequent itemsets F
1. Scan D to generate all frequent 1-itemsets F1;
2. Trans(D, T, F1, F2, α);
3. Dwnsearch(D, DF, Fα, α, minsup);
4. Upsearch(T, UF, Fα, α, minsup);
5. return F = DF ∪ UF;
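The five steps above can be sketched in Python as follows. This is a minimal illustration of the cut-both-ways idea under stated assumptions, not the authors' implementation: the function name `mine_both_ways` is hypothetical, α is passed in rather than computed, the part below α is a plain subset count standing in for Dwnsearch, and the part from α upward is a simple tidlist join standing in for Upsearch.

```python
from collections import Counter
from itertools import combinations

def mine_both_ways(transactions, minsup, alpha):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    n = len(transactions)

    def is_frequent(count):
        return count / n >= minsup

    # Step 1: frequent 1-itemsets; infrequent items are dropped everywhere.
    item_counts = Counter(i for t in transactions for i in set(t))
    f1 = {i for i, c in item_counts.items() if is_frequent(c)}
    filtered = [frozenset(t) & f1 for t in transactions]

    # Below alpha (stand-in for Dwnsearch): count every k-subset directly.
    down = {}
    for k in range(1, alpha):
        counts = Counter(frozenset(c) for t in filtered
                         for c in combinations(sorted(t), k))
        down.update({s: c for s, c in counts.items() if is_frequent(c)})

    # Level alpha: tidlists over transactions with at least alpha frequent items.
    tidlists = {i: set() for i in f1}
    for tid, t in enumerate(filtered, start=1):
        if len(t) >= alpha:
            for i in t:
                tidlists[i].add(tid)
    level = {}
    for c in combinations(sorted(f1), alpha):
        tids = set.intersection(*(tidlists[i] for i in c))
        if is_frequent(len(tids)):
            level[frozenset(c)] = tids

    # Above alpha (stand-in for Upsearch): join itemsets, intersect tidlists.
    up = {}
    while level:
        up.update({s: len(tids) for s, tids in level.items()})
        nxt = {}
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            u = a | b
            if len(u) == len(a) + 1 and u not in nxt:
                tids = ta & tb
                if is_frequent(len(tids)):
                    nxt[u] = tids
        level = nxt

    return {**down, **up}  # F = DF ∪ UF

# Hypothetical five-transaction database (not the one used in the slides).
db = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C", "D"}, {"B", "C", "E"}]
result = mine_both_ways(db, minsup=0.4, alpha=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because the tidlists omit transactions with fewer than α frequent items, they are only valid for itemsets of size α or larger, which is why the sketch counts smaller itemsets separately.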

Cutting level α
• Problem:
• If α is too low, many unnecessary intersections are performed during the upward search.
• If α is too high, the downward search spends much more time enumerating itemsets and counting their supports.

Cutting level α (cont.)
• Solution:
• INT[r]: the nearest integer to r, for r ≥ 1.
• ti⊥minsup: the set of items in ti with support at least minsup.
• ti⊥minsup = {x | x ∈ ti and sup(x) ≥ minsup}
• α = INT[(Σi |ti⊥minsup|) / |D|], the rounded average number of frequent items per transaction.

Example
• Assume that minsup = 40%.
• The frequent 1-itemsets are {A}, {B}, {C}, and {D}.
• The cutting level is α = INT[(3+2+1+4+3+2+3+3+3+3)/10] = INT[2.7] = 3.
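The cutting-level computation in this example can be reproduced with a short Python sketch. The function names are hypothetical, ties in INT[r] are assumed to round up, and the small database at the end is made up for illustration (the slide's transaction table is not reproduced here).

```python
from collections import Counter
from math import floor

def nearest_int(r):
    """INT[r]: nearest integer to r (assumption: halves round up)."""
    return floor(r + 0.5)

def frequent_items(transactions, minsup):
    """Items whose relative support is at least minsup."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c / n >= minsup}

def cutting_level(transactions, minsup):
    """Rounded average number of frequent items per transaction."""
    frequent = frequent_items(transactions, minsup)
    sizes = [len(set(t) & frequent) for t in transactions]
    return nearest_int(sum(sizes) / len(transactions))

# Per-transaction counts taken from the example: 27 / 10 = 2.7, rounded to 3.
example_sizes = [3, 2, 1, 4, 3, 2, 3, 3, 3, 3]
print(nearest_int(sum(example_sizes) / len(example_sizes)))  # 3

# Hypothetical five-transaction database (not the one used in the slides).
db = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C", "D"}, {"B", "C", "E"}]
print(cutting_level(db, 0.4))  # 2
```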

Example (cont.)
• C2 = {{A,B}, {A,C}, {A,D}, {B,C}, {C,D}}
• Since item E is not frequent, there is no need to create a tidlist for E.
• The tids of t2, t3, and t6 are not included because these transactions contain fewer than 3 frequent items.
• The resulting set of 3-itemsets is {{B, C, D}}.
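The tidlist pruning described in this example can be sketched in Python as below. The helper names and the five-transaction database are hypothetical; the sketch only shows that infrequent items get no tidlist and that transactions with fewer than α frequent items are left out, which is safe when counting itemsets of size α or more.

```python
def build_tidlists(transactions, frequent, alpha):
    """Map each frequent item to the tids of the transactions that contain it,
    skipping transactions with fewer than alpha frequent items."""
    tidlists = {item: set() for item in frequent}
    for tid, t in enumerate(transactions, start=1):
        kept = set(t) & set(frequent)
        if len(kept) < alpha:
            continue  # too short to contain any alpha-itemset
        for item in kept:
            tidlists[item].add(tid)
    return tidlists

def support_count(itemset, tidlists):
    """Support of an itemset (size >= alpha) via tidlist intersection."""
    return len(set.intersection(*(tidlists[i] for i in itemset)))

# Hypothetical database t1..t5 (not the one used in the slides).
db = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C", "D"}, {"B", "C", "E"}]
tidlists = build_tidlists(db, frequent={"A", "B", "C"}, alpha=2)
print(sorted(tidlists["C"]))                 # t3 is skipped: [1, 4, 5]
print(support_count({"B", "C"}, tidlists))   # 3
```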

Experiment result(1)

Experiment result(2)

Conclusion
• The paper makes an informed guess at the most promising itemset level (the cutting level), generates all frequent itemsets at that level, and then searches downward and upward from it.
