1. Scalable Algorithms for Association Mining
Mohammed J. Zaki
IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, pp. 372-390, May/June 2000
2. Problem statement
Find all frequent itemsets
Frequent itemsets: the itemsets whose support meets the minimum support threshold (a percentage of all transactions)
Maximal frequent itemsets
Main task of association mining
All other frequent itemsets are subsets of maximal frequent itemsets
Use minimum support and minimum confidence to determine association rules
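To make the two thresholds concrete, here is a minimal sketch of how support and confidence could be computed. The toy database and the function names `support`, `confidence`, and `is_frequent` are assumptions made for illustration; they are not taken from the slides.

```python
# Toy database (hypothetical): each transaction is a set of items.
TRANSACTIONS = [
    {"A", "C", "T", "W"},
    {"C", "D", "W"},
    {"A", "C", "T", "W"},
    {"A", "C", "D", "W"},
    {"A", "C", "D", "T", "W"},
    {"C", "D", "T"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def is_frequent(itemset, transactions, min_support):
    """True if the itemset meets the minimum support threshold."""
    return support(itemset, transactions) >= min_support
```

With a minimum support of 50%, `is_frequent({"A", "C"}, TRANSACTIONS, 0.5)` holds while `is_frequent({"A", "D"}, TRANSACTIONS, 0.5)` does not; a rule such as A -> C would then be kept only if its confidence also meets the minimum confidence.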
3. Itemset enumeration
All subsets of a frequent itemset are frequent
The maximal frequent itemsets uniquely determine all frequent itemsets
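The following sketch illustrates why the maximal frequent itemsets are enough: every frequent itemset is a non-empty subset of some maximal one. The function name `all_frequent_from_maximal` and the two maximal itemsets are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def all_frequent_from_maximal(maximal_itemsets):
    """Expand maximal frequent itemsets into the full set of frequent itemsets."""
    frequent = set()
    for m in maximal_itemsets:
        items = sorted(m)
        # Every non-empty subset of a maximal frequent itemset is frequent.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                frequent.add(frozenset(combo))
    return frequent


# Hypothetical maximal frequent itemsets of a toy database.
maximal = [{"A", "C", "T", "W"}, {"C", "D", "W"}]
frequent = all_frequent_from_maximal(maximal)
```

Note that this recovers which itemsets are frequent, not their support counts; obtaining the counts still requires the database or the tid-lists.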
4. Key features of six new algorithms
Use vertical tid-list
a data format where we associate with each item a list of transactions in which it occurs
All frequent itemsets can be enumerated via simple tid-list intersection
Use a lattice-theoretic approach to decompose the original search space into smaller pieces when main memory is not enough
Prefix-based approach
Maximal-clique-based approach
Search strategies: enumerating the frequent itemsets within each class
Bottom-up search
Top-down search
Hybrid search
Only a few database scans, minimizing the I/O costs
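The vertical tid-list format described above can be sketched as follows. The toy database `HORIZONTAL` and the helpers `to_vertical` and `tidlist_of` are assumptions made for this example, not code from the paper.

```python
from collections import defaultdict

# Toy horizontal database (hypothetical): transaction id -> items.
HORIZONTAL = {
    1: ["A", "C", "T", "W"],
    2: ["C", "D", "W"],
    3: ["A", "C", "T", "W"],
    4: ["A", "C", "D", "W"],
    5: ["A", "C", "D", "T", "W"],
    6: ["C", "D", "T"],
}


def to_vertical(horizontal):
    """Build the vertical layout: item -> set of transaction ids (tid-list)."""
    tidlists = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def tidlist_of(itemset, tidlists):
    """Tid-list of an itemset = intersection of the tid-lists of its items."""
    items = list(itemset)
    result = set(tidlists[items[0]])
    for item in items[1:]:
        result &= tidlists[item]
    return result


TIDLISTS = to_vertical(HORIZONTAL)
```

The support count of an itemset is simply the length of its tid-list, so `len(tidlist_of("AT", TIDLISTS))` counts the transactions containing both A and T without rescanning the database.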
5. Austen: ?????
Christie: ???????
Conan Doyle: ????
9. If we do not have enough memory to enumerate all the frequent itemsets in the lattice, we need to decompose the whole lattice into smaller pieces
Prefix-based classes
Recursive class decomposition
Maximal-clique-based approach
Smaller sub-lattices have fewer items and can save unnecessary intersections
Graph theory
Complete graph (clique)
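As a rough sketch of the two decompositions, the code below groups frequent 2-itemsets into prefix-based classes and finds the maximal cliques of the graph whose edges are those 2-itemsets. The pair list is hypothetical, and the clique routine is a basic Bron-Kerbosch recursion swapped in for illustration; it simplifies the idea by working on the whole graph at once and is not the clique-generation procedure from the paper.

```python
from collections import defaultdict

# Frequent 2-itemsets of a toy database (hypothetical data).
FREQUENT_PAIRS = [
    ("A", "C"), ("A", "T"), ("A", "W"), ("C", "D"),
    ("C", "T"), ("C", "W"), ("D", "W"), ("T", "W"),
]


def prefix_classes(pairs):
    """Group 2-itemsets by their first item (a common prefix of length 1)."""
    classes = defaultdict(list)
    for first, second in sorted(pairs):
        classes[first].append(second)
    return dict(classes)


def maximal_cliques(pairs):
    """Maximal cliques of the graph whose edges are the frequent pairs.

    Basic Bron-Kerbosch recursion without pivoting.
    """
    adj = defaultdict(set)
    for u, v in pairs:
        adj[u].add(v)
        adj[v].add(u)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


CLASSES = prefix_classes(FREQUENT_PAIRS)
CLIQUES = maximal_cliques(FREQUENT_PAIRS)
```

On this data the class with prefix A contains C, T and W, while the cliques are {A, C, T, W} and {C, D, W}. Each clique bounds a sub-lattice, so an intersection such as A with D is never attempted.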
11. Prefix-based classes: bottom-up search
Bottom-up: search parents of frequent itemsets
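A minimal sketch of a bottom-up search over prefix-based classes, in the spirit of the approach named on this slide. The tid-lists, the minimum count of 3, and the function name `bottom_up` are assumptions for illustration only.

```python
# Tid-lists of a toy database (hypothetical data).
TIDLISTS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def bottom_up(atoms, min_count, out):
    """Enumerate frequent itemsets of one class by pairwise intersection.

    atoms: list of (itemset_tuple, tidset) pairs that share a common prefix
    and are already known to be frequent.
    out:   dict collecting itemset_tuple -> support count.
    """
    for i, (items_i, tids_i) in enumerate(atoms):
        out[items_i] = len(tids_i)
        new_class = []
        # Join atom i with every later atom to form the next-level class.
        for items_j, tids_j in atoms[i + 1:]:
            tids = tids_i & tids_j
            if len(tids) >= min_count:
                new_class.append((items_i + items_j[-1:], tids))
        if new_class:
            bottom_up(new_class, min_count, out)


MIN_COUNT = 3
atoms = [((item,), tids) for item, tids in sorted(TIDLISTS.items())
         if len(tids) >= MIN_COUNT]
frequent = {}
bottom_up(atoms, MIN_COUNT, frequent)
```

Each recursive call handles one class independently, so only the tid-lists of that class need to be in memory at a time.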
12. Top-down: search children of infrequent itemsets
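The top-down idea can be sketched as follows: start from the largest candidate and descend to its subsets only while the candidate is infrequent. For simplicity this sketch treats all items as a single class; the tid-lists, the minimum count, and the function name `top_down` are assumptions made for illustration.

```python
# Tid-lists of a toy database (hypothetical data).
TIDLISTS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def top_down(items, tidlists, min_count):
    """Return the maximal frequent itemsets among the subsets of `items`."""
    maximal = []
    seen = set()
    queue = [frozenset(items)]  # level order: larger candidates come first
    while queue:
        cand = queue.pop(0)
        if not cand or cand in seen:
            continue
        seen.add(cand)
        # Subsets of a known frequent itemset need no counting.
        if any(cand <= m for m in maximal):
            continue
        tids = set.intersection(*[tidlists[i] for i in cand])
        if len(tids) >= min_count:
            maximal.append(cand)
        else:
            # Infrequent: examine its children (subsets one item smaller).
            for item in cand:
                queue.append(cand - {item})
    return maximal


MAXIMAL = top_down("ACDTW", TIDLISTS, 3)
```

Once a candidate is found frequent, none of its subsets is counted, which is what makes the top-down direction attractive when the maximal itemsets are long.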
15. Algorithm design and implementation
New algorithms
Eclat (Equivalence CLAss Transformation)
Prefix-based with bottom-up search
MaxEclat
Prefix-based with hybrid search
Clique
Maximal-clique-based with bottom-up search
MaxClique
Maximal-clique-based with hybrid search
TopDown
Maximal-clique-based with top-down search
AprClique
Maximal-clique-based
Horizontal layout: generate all possible subsets of the maximum elements in each sub-lattice
16. Experimental Results
20. Conclusion
Partition the search space into small, independent subspaces
Decomposition can solve main-memory problem
Prefix-based method
Maximal-clique-based method
Search strategies
Bottom-up search
Top-down search
Hybrid search
The entire process takes only a few database scans
Best performance among the new algorithms
MaxClique
combines hybrid search with maximal-clique-based decomposition