Efficient Mining of High Utility Itemsets from Large Datasets

Alva Erwin, Raj P. Gopalan, and N.R. Achuthan • Department of Computing; Department of Mathematics and Statistics • Curtin University of Technology, Kent St., Bentley, Western Australia • PAKDD08

Outline • Introduction • Definition • Method: Compressed Transaction Utility-Pro (CTU-Pro) • Experiments • Conclusions

Introduction • Frequent itemset mining finds items that co-occur in a transaction database above a user-given frequency threshold, without considering the quantity or weight (such as profit) of the items. • TwoPhase, based on Apriori, is suitable for sparse data sets with short patterns; CTU-Mine, based on pattern growth, is suitable for dense data.

Definition • u(3 4, t1) = $60 • u(3 4, t3) = $60 • u(3 4) = u(3 4, t1) + u(3 4, t3) = $120
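
The utility figures on this slide can be reproduced with a minimal sketch. The slide's transaction table did not survive in the transcript, so the profit table, the quantities, and the function names below are all assumptions, chosen only so that the results match the slide's $60, $60 and $120.

```python
# Hypothetical data: the slide's transaction table is not in the transcript,
# so these values are assumptions chosen to match its figures.
PROFIT = {1: 20, 2: 25, 3: 10, 4: 5}  # assumed profit per unit of each item

DB = {  # transaction id -> {item: purchased quantity}
    "t1": {1: 1, 3: 4, 4: 4},
    "t2": {1: 2, 3: 1},
    "t3": {2: 2, 3: 5, 4: 2},
}


def utility_in_transaction(itemset, transaction):
    """u(X, t): profit times quantity summed over X, or 0 if t lacks part of X."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(PROFIT[item] * transaction[item] for item in itemset)


def utility(itemset, db):
    """u(X): sum of u(X, t) over every transaction in the database."""
    return sum(utility_in_transaction(itemset, t) for t in db.values())


print(utility_in_transaction({3, 4}, DB["t1"]))  # 60
print(utility_in_transaction({3, 4}, DB["t3"]))  # 60
print(utility({3, 4}, DB))                       # 120
```

Transaction t2 contributes nothing here because it does not contain item 4.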

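Definition • Transaction utility, tu(t): the sum of the utilities of all items in transaction t. • Transaction-weighted utility, twu(X): the sum of the transaction utilities of all transactions that contain itemset X. • Example: tu(1) = $80, twu(3 4) = $190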
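
As a rough illustration of these two measures, the sketch below computes them on a toy database. The data and function names are assumptions (the slide's table is not in the transcript), picked so that the first transaction has utility 80 and twu(3 4) comes out at 190. Reading tu(1) as the utility of the first transaction is also an assumption.

```python
# Hypothetical data, chosen only to reproduce the slide's 80 and 190.
PROFIT = {1: 20, 2: 25, 3: 10, 4: 5}  # assumed profit per unit of each item

DB = {  # transaction id -> {item: purchased quantity}
    "t1": {1: 1, 3: 4, 4: 4},
    "t2": {1: 2, 3: 1},
    "t3": {2: 2, 3: 5, 4: 2},
}


def transaction_utility(transaction):
    """tu(t): total utility of every item bought in one transaction."""
    return sum(PROFIT[item] * qty for item, qty in transaction.items())


def transaction_weighted_utility(itemset, db):
    """twu(X): sum of tu(t) over the transactions that contain all of X."""
    return sum(
        transaction_utility(t)
        for t in db.values()
        if all(item in t for item in itemset)
    )


print(transaction_utility(DB["t1"]))              # 80
print(transaction_weighted_utility({3, 4}, DB))   # 190 = tu(t1) + tu(t3)
```

Because twu(X) counts whole transactions, it is never smaller than the utility of X itself, which is what makes it usable as a pruning bound.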

Compressed Transaction Utility-Pro • 99 < min_Utility (129.9)

Compressed Utility Pattern-Tree • Parallel projection of transaction database

CUP-tree • Traverse index 1 (110) from 5, index 2 (310) from (2,3,4), index 3 (195) from 2, and index 4 (190) from (3,5)

ProCUP-tree • Index 1 (110) from 5 falls below the threshold, because 110 < min_Utility (129.9) • Index 2 (310) from (2,3,4), index 3 (195) from 2, and index 4 (190) from (3,5)
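
The threshold comparison on this slide can be sketched as a simple filter. The four values and the threshold come from the slide. Treating the comparison as a keep-or-drop decision per index, and the function name itself, are assumptions, since the transcript does not say what happens to index 1.

```python
MIN_UTILITY = 129.9  # threshold shown on the slide

# index -> value shown in parentheses on the slide
index_values = {1: 110, 2: 310, 3: 195, 4: 190}


def split_by_threshold(values, min_utility):
    """Separate entries that reach the threshold from those that fall below it."""
    kept = {k: v for k, v in values.items() if v >= min_utility}
    below = {k: v for k, v in values.items() if v < min_utility}
    return kept, below


kept, below = split_by_threshold(index_values, MIN_UTILITY)
print(kept)   # {2: 310, 3: 195, 4: 190}
print(below)  # {1: 110}
```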

ProCUP-tree • oriUtility * itemQuantity + proUtility * proQuantity = Utility • 35*2 + 25*2 = 120, 150*1 + 25*1 = 175, 10*5 + 25*3 = 125 • High_Utility_Itemset = (3,2), (3,2,1)
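
The arithmetic on this slide can be checked with a short sketch. The parameter names follow the slide's formula, but the transcript does not explain what they mean, so the function is only a direct evaluation of that formula, and its name is hypothetical.

```python
def combined_utility(ori_utility, item_quantity, pro_utility, pro_quantity):
    """Evaluate oriUtility*itemQuantity + proUtility*proQuantity from the slide."""
    return ori_utility * item_quantity + pro_utility * pro_quantity


# The three worked examples from the slide, as
# (oriUtility, itemQuantity, proUtility, proQuantity).
examples = [(35, 2, 25, 2), (150, 1, 25, 1), (10, 5, 25, 3)]

for args in examples:
    print(args, "->", combined_utility(*args))
# (35, 2, 25, 2) -> 120
# (150, 1, 25, 1) -> 175
# (10, 5, 25, 3) -> 125
```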

Experiments

Conclusion • The CTU-Pro algorithm mines the complete set of high utility itemsets from both sparse and relatively dense datasets, with short or longer high utility patterns. • The algorithm adapts to large data by constructing parallel subdivisions on disk that can be mined independently.
