CBW: An Efficient Algorithm for Frequent Itemset Mining

Ja-Hwung Su, Wen-Yang Lin. Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

Presentation Transcript


  1. CBW: An Efficient Algorithm for Frequent Itemset Mining. Ja-Hwung Su, Wen-Yang Lin. Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

  2. Outline • Introduction • CBW algorithm • Experimental results • Conclusion

  3. Introduction • Mining association rules from large databases of business data has been an active research topic. • As the minimum support threshold decreases, the number of candidate itemsets increases exponentially. • The paper proposes a new algorithm that maintains its performance even at relatively low minimum supports.

  4. Algorithm • Cut-Both-Ways (CBW) employs a bidirectional search strategy and combines several techniques for frequent itemset generation. • Step 1. Find an appropriate cutting level α that divides the search space into two parts. • Step 2. After identifying all frequent itemsets at this level, perform a downward search to enumerate all frequent itemsets with cardinalities smaller than α and determine their support values. • Step 3. Perform an upward search to enumerate all frequent itemsets with cardinalities larger than α.

  5. Input: the transaction database D and the minimum support minsup
     Output: the set of frequent itemsets F
       1. scan D to generate all frequent 1-itemsets F1;
       2. Trans(D, T, F1, F2, α);
       3. Dwnsearch(D, DF, Fα, α, minsup);
       4. Upsearch(T, UF, Fα, α, minsup);
       5. return F = DF ∪ UF;
     Figure: Concept illustration of CBW
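The five-line procedure above can be fleshed out as the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the slides do not define Trans, Dwnsearch, or Upsearch, so the hypothetical function `cbw_sketch` uses stand-ins. The cutting level is taken as the rounded average number of frequent items per transaction (an assumed reading of slides 7 and 8), the upward search is an Eclat-style tidlist join, and the downward search simply counts every k-itemset for 2 ≤ k < α on D instead of whatever pruning the paper applies. Restricting tidlists to transactions with at least α frequent items is safe, because any transaction that supports an itemset of k frequent items must contain at least k frequent items.

```python
from itertools import combinations
from math import floor


def cbw_sketch(db, minsup):
    """Hypothetical, simplified sketch of the CBW flow (not the authors' code).

    db: list of transactions, each an iterable of sortable, hashable items.
    minsup: minimum support as a fraction of |D|.
    Returns {frozenset(itemset): support count} for every frequent itemset.
    """
    n = len(db)
    min_count = minsup * n

    # Pseudocode line 1: one scan of D yields the frequent 1-itemsets F1.
    item_counts = {}
    for t in db:
        for x in set(t):
            item_counts[x] = item_counts.get(x, 0) + 1
    f1 = {x for x, c in item_counts.items() if c >= min_count}
    result = {frozenset([x]): item_counts[x] for x in f1}

    # t_i ⊥ minsup: each transaction restricted to its frequent items.
    filtered = [frozenset(t) & f1 for t in db]

    # Assumed cutting level: nearest integer to the mean |t_i ⊥ minsup|, at least 1.
    alpha = max(1, floor(sum(len(t) for t in filtered) / n + 0.5))

    # Frequent alpha-itemsets with tidlists; only "long" transactions can support them.
    level = {}
    for tid, t in enumerate(filtered):
        if len(t) >= alpha:
            for combo in combinations(sorted(t), alpha):
                level.setdefault(combo, set()).add(tid)
    level = {k: v for k, v in level.items() if len(v) >= min_count}

    # Downward search (simplified stand-in): count k-itemsets for k = alpha-1 .. 2 on D.
    for k in range(alpha - 1, 1, -1):
        counts = {}
        for t in filtered:
            for combo in combinations(sorted(t), k):
                counts[combo] = counts.get(combo, 0) + 1
        for combo, c in counts.items():
            if c >= min_count:
                result[frozenset(combo)] = c

    # Upward search: join k-itemsets sharing a (k-1)-prefix and intersect tidlists.
    while level:
        for items, tids in level.items():
            result[frozenset(items)] = len(tids)
        keys = sorted(level)
        nxt = {}
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                if a[:-1] != b[:-1]:
                    break  # keys are sorted, so no later key shares a's prefix
                tids = level[a] & level[b]
                if len(tids) >= min_count:
                    nxt[a + (b[-1],)] = tids
        level = nxt

    return result
```

Calling `cbw_sketch(db, 0.4)` on a list of Python sets returns a dictionary keyed by `frozenset`. Comparing it against a brute-force count over all itemsets is an easy way to check any modification of the downward or upward stand-ins.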

  6. Cutting level α • Problem: • If α is too low, the upward search will frequently perform unnecessary intersections. • If α is too high, the downward search will spend much more time enumerating itemsets and counting their supports.

  7. Cutting level α (cont.) • Solution: • INT[r]: the integer nearest to r, for r ≥ 1. • ti⊥minsup: the set of items in ti whose support is at least minsup, i.e., ti⊥minsup = {x | x ∈ ti and sup(x) ≥ minsup}.
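The slide's formula that combines these two definitions did not survive extraction. The reconstruction below is an assumption, chosen because it reproduces the worked example on slide 8 (the sum of |ti⊥minsup| over the ten transactions, divided by 10): the cutting level is the average number of frequent items per transaction, rounded to the nearest integer.

```latex
\alpha = \mathrm{INT}\!\left[ \frac{1}{|D|} \sum_{t_i \in D} \lvert t_i \perp \mathit{minsup} \rvert \right]
```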

  8. Example • Assume that minsup = 40%. The frequent 1-itemsets are {A}, {B}, {C}, and {D}. • The cutting level is α = INT[(3+2+1+4+3+2+3+3+3+3)/10] = INT[2.7] = 3.
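The arithmetic on this slide can be reproduced with the small sketch below. The function name `cutting_level` is hypothetical, and treating INT as round-half-up is an assumption (the slides only say "nearest integer"). The values below 3 sit at positions t2, t3, and t6, which matches the transactions dropped from the tidlists on slide 9.

```python
from math import floor


def cutting_level(filtered_lengths):
    """Nearest integer to the mean of |t_i ⊥ minsup| (round half up assumed), at least 1."""
    mean = sum(filtered_lengths) / len(filtered_lengths)
    return max(1, floor(mean + 0.5))


# |t_i ⊥ minsup| for t1..t10, as summed on the slide.
example_lengths = [3, 2, 1, 4, 3, 2, 3, 3, 3, 3]
alpha = cutting_level(example_lengths)  # 27 / 10 = 2.7, so alpha = 3
```

Python's built-in `round()` rounds halves to the nearest even integer, which is why the sketch uses `floor(mean + 0.5)` instead.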

  9. Example (cont.) • C2 = {{A,B}, {A,C}, {A,D}, {B,C}, {C,D}} • Since item E is not frequent, there is no need to create a tidlist for E. • The tids of t2, t3, and t6 are not included in the tidlists because these transactions contain fewer than α = 3 frequent items. • The resulting set of frequent 3-itemsets is {{B, C, D}}.
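The tidlist pruning on this slide can be sketched as follows. The helper names and the five filtered transactions are hypothetical toy data, not the slide's database. Skipping short transactions loses nothing for itemsets of size at least α, since a transaction with fewer than α frequent items cannot contain such an itemset, and the support of such an itemset is the size of the intersection of its items' tidlists.

```python
def filtered_tidlists(filtered, alpha):
    """Tidlists that skip transactions with fewer than alpha frequent items."""
    tidlists = {}
    for tid, t in enumerate(filtered, start=1):
        if len(t) >= alpha:
            for x in t:
                tidlists.setdefault(x, set()).add(tid)
    return tidlists


def support(itemset, tidlists):
    """Support count of an itemset (size >= alpha) by intersecting its items' tidlists."""
    return len(set.intersection(*(tidlists.get(x, set()) for x in itemset)))


# Hypothetical filtered transactions t1..t5 (not the slide's database), alpha = 3.
filtered = [frozenset("BCD"), frozenset("AB"), frozenset("C"),
            frozenset("ABCD"), frozenset("ACD")]
tl = filtered_tidlists(filtered, alpha=3)
# tl["B"] == {1, 4}; t2 and t3 are too short to appear in any tidlist
# support(frozenset("BCD"), tl) == 2
```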

  10. Experimental results (1)

  11. Experimental results (2)

  12. Conclusion • The paper makes a well-informed guess at the most promising itemset level (the cutting level α) and first generates all frequent itemsets located there, before searching downward and upward from it.

More Related