
MapReduce-based Closed Frequent Itemset Mining with Efficient Redundancy Filtering

Su-Qi Wang∗, Yu-Bin Yang∗, Guang-Peng Chen∗, Yang Gao∗ and Yao Zhang†. ∗State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China


Presentation Transcript


  1. MapReduce-based Closed Frequent Itemset Mining with Efficient Redundancy Filtering
    • Su-Qi Wang∗, Yu-Bin Yang∗, Guang-Peng Chen∗, Yang Gao∗ and Yao Zhang†
    • ∗State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
    • †JinLing College, Nanjing University, Nanjing, China
    • ICDMW 2012
    • 11 July 2014, SNU IDB, Hyesung Oh

  2. Introduction
    • Closed frequent itemset (CFI)
      • Proposed in 1999 by Pasquier et al.*
      • An alternative to frequent itemset mining (FIM)
      • Loses no information relative to FIM (all frequent itemsets and their supports can be recovered) while reducing redundancy
    • Existing CFI mining algorithms
      • Candidate generate-and-test approach
      • Pattern growth approach
    • Limited by data size
      • Memory use and communication costs
    • Some algorithms use PC clusters
      • Workload balancing, …
  * N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, “Discovering frequent closed itemsets for association rules,” in Database Theory–ICDT’99, pp. 398–416, 1999.

  3. Closed frequent itemset
    • Frequent itemset: support greater than or equal to minsup
    • Closed frequent itemset: frequent, and no proper superset has the same support
    • Slide example: minsup = 2
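To make the two definitions concrete, here is a brute-force Python sketch that enumerates every itemset, keeps the frequent ones, and then keeps only those with no proper superset of equal support. The database is made up for illustration and is not taken from the slide; the names `support` and `closed_frequent_itemsets` are hypothetical. The enumeration is exponential in the number of items and is not how AFOPT-close or any practical miner works.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def closed_frequent_itemsets(transactions, minsup):
    """Brute-force illustration only: exponential in the number of items."""
    items = sorted(set().union(*transactions))
    # Enumerate every non-empty itemset and keep the frequent ones.
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup >= minsup:
                frequent[s] = sup
    # Closed: no proper superset with the same support. Such a superset would
    # itself be frequent, so checking frequent supersets is enough.
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == osup for other, osup in frequent.items())
    }

# Hypothetical toy database (not the slide's).
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"d"}]
closed = closed_frequent_itemsets(db, minsup=2)
for s, sup in sorted(closed.items(), key=lambda kv: len(kv[0])):
    print(sorted(s), sup)
```

With minsup = 2 this toy database has seven frequent itemsets but only two closed ones, {a, b} (support 3) and {a, b, c} (support 2); every other frequent itemset has a superset with the same support.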

  4. Parallelized AFOPT-close algorithm (4 steps)
    • Step 1: Parallel counting (MapReduce pass)
      • Count the support of each item
    • Step 2: Constructing the global F-list
      • Sort the items by support in descending order
      • Exclude items whose support is lower than minsup
    • Step 3: Parallel mining of closed frequent itemsets (MapReduce pass)
      • Mine locally closed frequent itemsets
    • Step 4: Parallel filtering of redundant itemsets (MapReduce pass)
      • Filter out itemsets that are locally closed but not globally closed
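Steps 1 and 2 amount to a word-count job followed by a filter and a sort. The Python sketch below simulates the MapReduce pass in memory with plain functions; the names `count_mapper`, `count_reducer`, `run_counting_pass` and `build_flist` are hypothetical and are not Hadoop APIs. On a real cluster the mapper and reducer would run as a Hadoop job and only the small item-count table would be collected for Step 2.

```python
from collections import defaultdict

def count_mapper(transaction):
    """Step 1 map: emit (item, 1) once per distinct item in a transaction."""
    for item in set(transaction):
        yield item, 1

def count_reducer(item, counts):
    """Step 1 reduce: sum the partial counts for one item."""
    return item, sum(counts)

def run_counting_pass(transactions):
    """Simulate one MapReduce pass in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for t in transactions:
        for key, value in count_mapper(t):
            groups[key].append(value)
    return dict(count_reducer(k, vs) for k, vs in groups.items())

def build_flist(supports, minsup):
    """Step 2: drop items below minsup, then sort by support, descending.
    Ties are broken by item name so the order is deterministic (an assumption)."""
    frequent = [(item, sup) for item, sup in supports.items() if sup >= minsup]
    frequent.sort(key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in frequent]

# Hypothetical toy database.
db = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["a", "b", "c"]]
supports = run_counting_pass(db)
print(build_flist(supports, minsup=2))  # ['a', 'b', 'c']
```

Item d (support 1) is dropped, and b and c tie at support 3; breaking ties by name is a choice of this sketch, since the slide only specifies descending frequency.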

  5. Example
    • minsup = 3
    • Count the items (word count) and sort them by support in descending order to obtain the F-list
    • {f, m}: 4, {f, p, c}: 3 and {f, p}: 3 are closed locally but not globally
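Whether a locally closed itemset is also globally closed can be checked with the closure operator: the closure of X is the intersection of all transactions that contain X, and X is closed exactly when it equals its closure. The Python sketch below uses a made-up database, not the one on the slide; `closure` and `is_globally_closed` are hypothetical names.

```python
def closure(itemset, transactions):
    """Closure operator: the items common to every transaction that contains
    `itemset`. Assumes the itemset occurs in at least one transaction."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        raise ValueError("itemset does not occur in any transaction")
    return frozenset(covering[0]).intersection(*covering[1:])

def is_globally_closed(itemset, transactions):
    """An itemset is closed exactly when it equals its own closure."""
    return closure(itemset, transactions) == itemset

# Hypothetical toy database.
db = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "d"}]
print(sorted(closure(frozenset({"a", "b"}), db)))          # ['a', 'b', 'c']
print(is_globally_closed(frozenset({"a", "b", "c"}), db))  # True
```

Here {a, b} has support 3, but all three transactions containing it also contain c, so its closure is {a, b, c} and {a, b} is not closed. Any view of the data that hides item c would wrongly report {a, b} as closed.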

  6. Detail of Step 3
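The details of Step 3 are not spelled out in the text, so the following Python sketch rests on an assumption: a PFP-style, item-based partitioning with one shard per frequent item (a simplification; real jobs usually group items). The mapper sorts each transaction by F-list rank and, for every item i, emits the prefix ending at i under key i; the reducer finds the itemsets containing i that are closed within that shard, by brute force rather than AFOPT-close. All function names are hypothetical. Under this partitioning, an itemset has the same support in the shard of its last F-list item as it does globally, and every globally closed itemset is also locally closed; but items ranked after i are invisible in i's shard, so some locally closed itemsets are not globally closed.

```python
from collections import defaultdict
from itertools import combinations

def shard_mapper(transaction, flist):
    """Map (assumed scheme): order the frequent items by F-list rank and, for
    each item i, emit the prefix that ends at i under key i."""
    rank = {item: r for r, item in enumerate(flist)}
    ordered = sorted((x for x in set(transaction) if x in rank), key=rank.get)
    for pos, item in enumerate(ordered):
        yield item, frozenset(ordered[: pos + 1])

def local_closed_reducer(item, prefixes, minsup):
    """Reduce: brute-force the itemsets that contain `item` and are closed
    within this shard (a stand-in for AFOPT-close)."""
    others = sorted(set().union(*prefixes) - {item})
    found = {}
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            candidate = frozenset(combo) | {item}
            sup = sum(1 for p in prefixes if candidate <= p)
            if sup >= minsup:
                found[candidate] = sup
    # Locally closed: no proper superset in this shard with the same support.
    return {s: sup for s, sup in found.items()
            if not any(s < o and sup == osup for o, osup in found.items())}

def run_local_mining(transactions, flist, minsup):
    """Simulate the MapReduce pass in memory: map, group by key, reduce."""
    shards = defaultdict(list)
    for t in transactions:
        for key, prefix in shard_mapper(t, flist):
            shards[key].append(prefix)
    local = {}
    for item, prefixes in shards.items():
        # Each itemset is produced only in the shard of its last F-list item.
        local.update(local_closed_reducer(item, prefixes, minsup))
    return local

# Hypothetical toy database; item supports are a: 3, b: 3, c: 2.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"b"}, {"a"}]
flist = ["a", "b", "c"]  # descending support, ties broken by name
local = run_local_mining(db, flist, minsup=2)
for s, sup in sorted(local.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), sup)
```

Here {a, b} (support 2) comes out locally closed in b's shard, yet both transactions that contain it also contain c, which ranks after b and is therefore cut from b's prefixes. It is the kind of locally-but-not-globally closed itemset that Step 4 has to remove.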

  7. Efficient Redundant Itemset Filtering
    [Figure panels: Mapper Output, Reducer, Reducer Output]
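The slide does not state which key the filtering mappers emit. One simple scheme that is provably sufficient, assumed here rather than taken from the paper, keys every locally closed itemset by its support. If X is locally closed but not globally closed, its global closure is a proper superset with the same support; provided every globally closed itemset is also locally closed (as holds for item-based prefix partitioning), that closure is among the candidates and meets X in the same reducer. The reducer then drops every itemset that is a proper subset of another in its group. The Python below simulates the pass in memory; the function names and the input dictionary are hypothetical.

```python
from collections import defaultdict

def filter_mapper(itemset, support):
    """Map (assumed scheme): key each locally closed itemset by its support."""
    yield support, itemset

def filter_reducer(support, itemsets):
    """Reduce: within one support value, drop every itemset that is a proper
    subset of another; the survivors have no superset of equal support."""
    unique = set(itemsets)
    for s in unique:
        if not any(s < other for other in unique):
            yield s, support

def run_filtering_pass(local_closed):
    """Simulate the MapReduce pass in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for itemset, sup in local_closed.items():
        for key, value in filter_mapper(itemset, sup):
            groups[key].append(value)
    result = {}
    for sup, itemsets in groups.items():
        result.update(filter_reducer(sup, itemsets))
    return result

# Hypothetical Step 3 output: {a, b} is locally closed, but its global
# closure {a, b, c} has the same support (2).
local_closed = {
    frozenset({"a"}): 3,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 2,
    frozenset({"a", "b", "c"}): 2,
}
globally_closed = run_filtering_pass(local_closed)
print(sorted(sorted(s) for s in globally_closed))  # [['a'], ['a', 'b', 'c'], ['b']]
```

Keying by support keeps each subset check inside one reducer, but a support value shared by many itemsets concentrates work on a single reducer, and the pairwise subset test is quadratic in the group size.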

  8. Experimental Results - 1
    • Two real datasets
      • “connect”: game state information, 8.8 MB
      • “webdocs”: 1,692,082 transactions with 5,267,656 distinct items; maximum transaction length 71,472; 1.4 GB
    • 6 nodes running Hadoop 0.21.0
      • Each node: 4 Intel Core processors, 4 GB RAM, 500 GB HDD, Ubuntu 10.10, Java openjdk-6-jdk

  9. Experimental Results - 2
    • [12] G. Chen, Y. Yang, Y. Gao, and L. Shang, “Mining closed frequent itemset based on MapReduce,” in Proceedings of the 4th China Conference on Data Mining (CCDM), 2011.

  10. Conclusion
    • Good scalability on large-scale datasets
    • When the number of locally closed frequent itemsets is large, communication cost becomes an important factor
