
Mining High Utility Itemsets without Candidate Generation

Mining High Utility Itemsets without Candidate Generation. Date: 2013/05/13. Authors: Mengchi Liu, Junfeng Qu. Source: CIKM '12. Advisor: Jia-ling Koh. Speaker: I-Chih Chiu. Outline: Introduction, Problem Definition, Utility-List Structure, High Utility Itemset Miner.


Presentation Transcript


  1. Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Authors: Mengchi Liu, Junfeng Qu Source: CIKM '12 Advisor: Jia-ling Koh Speaker: I-Chih Chiu

  2. Outline • Introduction • Problem Definition • Utility-List Structure • High Utility Itemset Miner • Experiment • Conclusion

  3. Introduction • The rapid development of database techniques facilitates the storage and usage of massive data from business corporations, governments, and scientific organizations. • The high utility itemset mining problem is among the most important data mining problems, and it derives from the famous frequent itemset mining problem.

  4. Introduction • Traditional frequent itemset mining algorithms cannot evaluate the utility information about itemsets. • In a supermarket database: • Each item has a distinct price/profit. • Each item in a transaction is associated with a distinct quantity. • An itemset with high support may have low utility.
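
The claim that high support does not imply high utility can be checked on a small example. The sketch below uses a hypothetical toy database (the utility table `EU` and transaction table `DB` are assumptions, since the slide's own example did not survive in the transcript) and compares the support and the utility of two itemsets.

```python
# Hypothetical toy data (an assumption; not taken from the slides).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}  # profit per unit
DB = {  # tid -> {item: purchased quantity}
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in DB.values() if set(itemset).issubset(t))


def utility(itemset):
    """Total profit the itemset yields over the transactions containing it."""
    return sum(
        sum(t[i] * EU[i] for i in itemset)
        for t in DB.values()
        if set(itemset).issubset(t)
    )


for x in ({"c"}, {"a", "b", "d", "e"}):
    print(sorted(x), "support =", support(x), "utility =", utility(x))
```

In this toy data {c} appears in 5 of 7 transactions but has utility 10, while {a, b, d, e} appears in only 2 transactions and has utility 37.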

  5. Motivation • Recently, a number of high utility itemset mining algorithms have been proposed. They work in two steps: • Generate candidate high utility itemsets. • Compute the exact utilities of the candidates by scanning the database to identify high utility itemsets. • However, the algorithms often generate a very large number of candidate itemsets, which leads to: • Excessive memory requirements for storing candidate itemsets. • A large amount of running time for generating candidates and computing their exact utilities.

  6. Goal • A novel structure, called utility-list, is proposed. It stores: • the utility information about an itemset • the heuristic information about whether the itemset should be pruned or not. • An efficient algorithm, called HUI-Miner (High Utility Itemset Miner), is developed. • It does not generate candidate high utility itemsets. • It can mine high utility itemsets after constructing the initial utility-lists.

  7. Diagram: transactions → construct utility-lists → HUI-Miner → high utility itemsets

  8. Outline • Introduction • Problem Definition • Utility-List Structure • High Utility Itemset Miner • Experiment • Conclusion

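  9. Problem Definition • $I = \{i_1, i_2, \ldots, i_n\}$: a set of items. • Each transaction ($T$) has a unique identifier (tid). Def. 1. $iu(i, T)$: the internal utility of item $i$ in transaction $T$ is the count value associated with $i$ in $T$ in the transaction table. Def. 2. $eu(i)$: the external utility of item $i$ is the utility value of $i$ in the utility table. Def. 3. $u(i, T)$: the utility of item $i$ in transaction $T$ is the product of $iu(i, T)$ and $eu(i)$.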
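
Definitions 1 to 3 can be sketched in a few lines. The utility table `EU`, the transaction table `DB`, and the function names are hypothetical and chosen for illustration only; the slide's own example is not in the transcript.

```python
# Hypothetical toy data (an assumption; not taken from the slides).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}  # utility table: eu(i)
DB = {  # transaction table: tid -> {item: count}, i.e. iu(i, T)
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def internal_utility(item, tid):
    """iu(i, T): the count of item i in transaction T."""
    return DB[tid][item]


def external_utility(item):
    """eu(i): the per-unit utility of item i."""
    return EU[item]


def item_utility(item, tid):
    """u(i, T) = iu(i, T) * eu(i)."""
    return internal_utility(item, tid) * external_utility(item)


print(item_utility("e", 5))  # 2 units of e at utility 4 each
```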

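  10. Def. 4. $u(X, T)$: the utility of itemset $X$ in transaction $T$ is the sum of the utilities of all the items in $X$ in $T$, where $u(X, T) = \sum_{i \in X \wedge X \subseteq T} u(i, T)$. Def. 5. $u(X)$: the utility of itemset $X$ is the sum of the utilities of $X$ in all the transactions in $DB$ that contain $X$, where $u(X) = \sum_{T \in DB \wedge X \subseteq T} u(X, T)$. Def. 6. $tu(T)$: the utility of transaction $T$ is the sum of the utilities of all the items in $T$, where $tu(T) = \sum_{i \in T} u(i, T)$.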
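
Definitions 4 to 6 translate directly into sums over the transaction table. This is a minimal sketch on a hypothetical toy database; `EU`, `DB`, and the function names are assumptions made for illustration.

```python
# Hypothetical toy data (an assumption; not taken from the slides).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def utility_in_transaction(itemset, tid):
    """u(X, T): sum of u(i, T) over i in X; taken as 0 when X is not contained in T."""
    t = DB[tid]
    if not set(itemset).issubset(t):
        return 0
    return sum(t[i] * EU[i] for i in itemset)


def utility(itemset):
    """u(X): sum of u(X, T) over all transactions containing X."""
    return sum(utility_in_transaction(itemset, tid) for tid in DB)


def transaction_utility(tid):
    """tu(T): sum of u(i, T) over all items in T."""
    return sum(count * EU[i] for i, count in DB[tid].items())


print(utility({"a", "d", "e"}), transaction_utility(2))
```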

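  11. Def. 7. $twu(X)$: the transaction-weighted utility of itemset $X$ in $DB$ is the sum of the utilities of all the transactions containing $X$ in $DB$, where $twu(X) = \sum_{T \in DB \wedge X \subseteq T} tu(T)$. Property 1. If $twu(X)$ is less than a given minutil, all supersets of $X$ are not high utility. Rationale. For any superset $X' \supseteq X$, $u(X') \le twu(X') \le twu(X) < minutil$. Ex: Assume minutil = 30. According to Property 1, no superset of an itemset whose transaction-weighted utility is below 30 is high utility.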
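
Property 1 can be illustrated by computing the transaction-weighted utility of every single item and discarding those below minutil. The sketch runs on a hypothetical toy database; `EU`, `DB`, and `MINUTIL = 30` are assumptions, with the threshold chosen to match the value quoted in the example.

```python
# Hypothetical toy data (an assumption; not taken from the slides).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}
MINUTIL = 30


def transaction_utility(tid):
    """tu(T): total utility of transaction T."""
    return sum(count * EU[i] for i, count in DB[tid].items())


def twu(itemset):
    """twu(X): sum of tu(T) over all transactions T that contain X."""
    return sum(
        transaction_utility(tid)
        for tid, t in DB.items()
        if set(itemset).issubset(t)
    )


# Items whose twu is below MINUTIL cannot occur in any high utility itemset.
pruned = sorted(i for i in EU if twu({i}) < MINUTIL)
print({i: twu({i}) for i in sorted(EU)}, pruned)
```

In this toy data the items f and g fall below the threshold, so every itemset containing either of them can be ignored.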

  12. Outline • Introduction • Problem Definition • Utility-List Structure • Initial Utility-Lists • Utility-Lists of 2-Itemsets • Utility-Lists of k-Itemsets (k ≥ 3) • High Utility Itemset Miner • Experiment • Conclusion

  13. Initial Utility-Lists Def. 8. A transaction is considered "revised" after (1) all the items whose transaction-weighted utilities are less than a given minutil are deleted from the transaction, and (2) the remaining items are sorted in transaction-weighted-utility-ascending order. • Ex: Suppose minutil = 30. The remaining items are sorted as e < c < b < a < d.
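
The two steps of Def. 8 can be sketched as follows. The toy database `EU`/`DB` is hypothetical (an assumption, picked so that it yields the order e < c < b < a < d quoted above), and storing each item's utility rather than its count in the revised transaction is a convenience of this sketch.

```python
# Hypothetical toy data (an assumption; not taken from the slides).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}
MINUTIL = 30


def tu(t):
    """Utility of one transaction given as {item: count}."""
    return sum(count * EU[i] for i, count in t.items())


# Transaction-weighted utility of every single item.
twu = {i: sum(tu(t) for t in DB.values() if i in t) for i in EU}

# Step 1: drop items with twu < MINUTIL. Step 2: sort by ascending twu
# (ties broken alphabetically, which is a choice of this sketch).
order = sorted((i for i in EU if twu[i] >= MINUTIL), key=lambda i: (twu[i], i))

# Revised transactions as lists of (item, utility of the item in the transaction).
revised = {
    tid: [(i, t[i] * EU[i]) for i in order if i in t] for tid, t in DB.items()
}

print(order)
print(revised[3])
```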

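  14. Def. 9. $T/X$: the set of all the items after $X$ in $T$, where $X$ is an itemset and $T$ is a transaction (or itemset) with $X \subseteq T$. Def. 10. $ru(X, T)$: the remaining utility of itemset $X$ in transaction $T$ is the sum of the utilities of all the items in $T/X$ in $T$, where $ru(X, T) = \sum_{i \in (T/X)} u(i, T)$. Each element in the utility-list of $X$ has three fields: • tid: the identifier of a transaction $T$ containing $X$ • iutil: the utility of $X$ in $T$, i.e., $u(X, T)$ • rutil: the remaining utility of $X$ in $T$, i.e., $ru(X, T)$. Ex: <3,2,9> is in the utility-list of {c}.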
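
A minimal sketch of building the initial utility-lists from revised transactions is shown below. The `REVISED` table is hypothetical (an assumption: each revised transaction is written as a list of (item, utility) pairs already sorted in the order e < c < b < a < d), and it was picked so that the element <3,2,9> appears in the utility-list of {c}.

```python
# Hypothetical revised transactions (an assumption; not taken from the slides):
# tid -> [(item, utility of the item in the transaction)], sorted e < c < b < a < d.
REVISED = {
    1: [("c", 2), ("b", 2), ("d", 5)],
    2: [("e", 4), ("c", 3), ("b", 2), ("a", 4), ("d", 5)],
    3: [("c", 2), ("a", 4), ("d", 5)],
    4: [("e", 4), ("c", 2)],
    5: [("e", 8), ("b", 4), ("a", 5), ("d", 5)],
    6: [("c", 1), ("b", 8), ("a", 3)],
    7: [("d", 5)],
}


def initial_utility_lists(revised):
    """Map each item to its utility-list of (tid, iutil, rutil) elements."""
    lists = {}
    for tid, items in revised.items():
        remaining = sum(u for _, u in items)
        for item, u in items:
            remaining -= u  # utility of the items after `item` in this transaction
            lists.setdefault(item, []).append((tid, u, remaining))
    return lists


utility_lists = initial_utility_lists(REVISED)
print(utility_lists["c"])
```

One pass over the revised transactions is enough, because the remaining utility is just a running suffix sum inside each transaction.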

  15. Utility-Lists of 2-Itemsets • No database scan is needed. • The utility-list of a 2-itemset is constructed by identifying the transactions common to the utility-lists of its two items.
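
The intersection for 2-itemsets can be sketched as a join on tid. The two input utility-lists below are hypothetical (an assumption, written for items e and c with e ordered before c); the rule used here is that the joined element sums the two iutils and keeps the rutil of the later item.

```python
# Hypothetical utility-lists (an assumption; not taken from the slides).
# Each element is (tid, iutil, rutil); item e precedes item c in the processing order.
UL_E = [(2, 4, 14), (4, 4, 2), (5, 8, 14)]
UL_C = [(1, 2, 7), (2, 3, 11), (3, 2, 9), (4, 2, 0), (6, 1, 11)]


def construct_2_itemset(ul_x, ul_y):
    """Utility-list of {x, y}, where x precedes y, without scanning the database."""
    by_tid = {tid: (iu, ru) for tid, iu, ru in ul_y}
    joined = []
    for tid, iu_x, _ in ul_x:
        if tid in by_tid:  # transaction contains both x and y
            iu_y, ru_y = by_tid[tid]
            # The utilities add up; the items after {x, y} are the items after y.
            joined.append((tid, iu_x + iu_y, ru_y))
    return joined


print(construct_2_itemset(UL_E, UL_C))
```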

  16. Utility-Lists of k-Itemsets • To construct the utility-list of a k-itemset $\{i_1 \cdots i_{k-1} i_k\}$ ($k \ge 3$): • Intersect the utility-list of $\{i_1 \cdots i_{k-2} i_{k-1}\}$ and that of $\{i_1 \cdots i_{k-2} i_k\}$.
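
For k ≥ 3 the two joined lists share a common prefix, whose utility would be counted twice if the iutils were simply added. The sketch below subtracts the prefix's iutil once; this correction and the hypothetical utility-lists for {e}, {e, c}, and {e, b} are assumptions of the sketch, not details recoverable from the transcript.

```python
# Hypothetical utility-lists (an assumption; not taken from the slides).
# Elements are (tid, iutil, rutil); the processing order is e < c < b < a < d.
UL_E = [(2, 4, 14), (4, 4, 2), (5, 8, 14)]  # prefix P = {e}
UL_EC = [(2, 7, 11), (4, 6, 0)]             # Px = {e, c}
UL_EB = [(2, 6, 9), (5, 12, 10)]            # Py = {e, b}


def construct(ul_p, ul_px, ul_py):
    """Utility-list of Pxy from those of P, Px and Py (ul_p is None for 2-itemsets)."""
    py = {tid: (iu, ru) for tid, iu, ru in ul_py}
    p = {tid: iu for tid, iu, _ in ul_p} if ul_p is not None else {}
    joined = []
    for tid, iu_x, _ in ul_px:
        if tid in py:
            iu_y, ru_y = py[tid]
            # u(Pxy, T) = u(Px, T) + u(Py, T) - u(P, T): the prefix is counted once.
            joined.append((tid, iu_x + iu_y - p.get(tid, 0), ru_y))
    return joined


print(construct(UL_E, UL_EC, UL_EB))
```

Only transaction 2 contains e, c, and b together, so the resulting list has a single element.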

  17. Outline • Introduction • Problem Definition • Utility-List Structure • High Utility Itemset Miner • Search space • Pruning Strategy • HUI-Miner Algorithm • Experiment • Conclusion

  18. Search space • Set-Enumeration Tree Def. 11. Given a set-enumeration tree, an itemset represented by a node is called an extension of an itemset represented by an ancestor node of the node. For an itemset containing $k$ items, its extension containing $(k+i)$ items is called an $i$-extension of the itemset. Property 2. If $X'$ is an extension of $X$, then $(X' - X) = (X'/X)$, with the $/$ notation of Def. 9. Rationale. Any extension of $X$ is a combination of $X$ with the item(s) after $X$.
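
The extensions of Def. 11 can be enumerated by combining an itemset with the items that follow its last item. This is a small sketch; the `ORDER` list reuses the order e < c < b < a < d quoted on an earlier slide, and the function name `extensions` is hypothetical.

```python
from itertools import combinations

ORDER = ["e", "c", "b", "a", "d"]  # processing order of the items


def extensions(itemset, i=None):
    """All extensions of `itemset` (a tuple following ORDER), or only its i-extensions."""
    tail = ORDER[ORDER.index(itemset[-1]) + 1:] if itemset else ORDER
    sizes = range(1, len(tail) + 1) if i is None else [i]
    return [itemset + extra for k in sizes for extra in combinations(tail, k)]


print(extensions(("e",), 1))   # 1-extensions of {e}
print(extensions(("c", "b")))  # every extension of {c, b}
```

Every generated itemset keeps the original items as a prefix, which is the content of Property 2: the added items are exactly the items after the original itemset.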

  19. Pruning Strategy • Exhaustive search → time consuming. Lemma 1. Given the utility-list of $X$, if the sum of all the iutils and rutils in the utility-list is less than a given minutil, any extension $X'$ of $X$ is not high utility.

  20. • $id(T)$: the tid of transaction $T$ • $X.tids$: the tid set in the utility-list of $X$ • $X'.tids$: the tid set in the utility-list of $X'$. Ex: Suppose minutil = 30. The sum of all the iutils and rutils is 7 + 6 + 11 = 24 < 30.
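
The check in Lemma 1 needs only the utility-list itself. In the sketch below, the utility-list `UL_EC` is hypothetical (an assumption; it was chosen so that its iutils and rutils reproduce the arithmetic 7 + 6 + 11 = 24 from the example), and the two function names are illustrative.

```python
MINUTIL = 30

# Hypothetical utility-list of some itemset X (an assumption; not taken from the slides).
# Elements are (tid, iutil, rutil).
UL_EC = [(2, 7, 11), (4, 6, 0)]


def is_high_utility(utility_list, minutil):
    """X is high utility when the sum of its iutils, i.e. u(X), reaches minutil."""
    return sum(iu for _, iu, _ in utility_list) >= minutil


def can_prune_extensions(utility_list, minutil):
    """Lemma 1: if sum(iutil + rutil) < minutil, no extension of X is high utility."""
    return sum(iu + ru for _, iu, ru in utility_list) < minutil


print(is_high_utility(UL_EC, MINUTIL), can_prune_extensions(UL_EC, MINUTIL))
```

The bound works because an extension can add, in each transaction, at most the utility of the items that remain after the itemset.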

  21. HUI-Miner Algorithm
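
The algorithm figure for this slide did not survive in the transcript. What follows is a rough sketch of a utility-list based depth-first miner in the spirit of the preceding slides, not the authors' pseudocode. The toy database `EU`/`DB`, the function names, and the alphabetical tie-break in the item order are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy data (an assumption; not taken from the slides).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def initial_lists(db, eu, minutil):
    """[(itemset, utility-list)] for items with twu >= minutil, in twu-ascending order."""
    tu = {tid: sum(q * eu[i] for i, q in t.items()) for tid, t in db.items()}
    twu = defaultdict(int)
    for tid, t in db.items():
        for i in t:
            twu[i] += tu[tid]
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    lists = {i: [] for i in order}
    for tid, t in db.items():
        items = [(i, t[i] * eu[i]) for i in order if i in t]  # revised transaction
        remaining = sum(u for _, u in items)
        for i, u in items:
            remaining -= u
            lists[i].append((tid, u, remaining))
    return [((i,), lists[i]) for i in order]


def construct(ul_p, ul_px, ul_py):
    """Join the utility-lists of Px and Py; ul_p is the list of their prefix or None."""
    py = {tid: (iu, ru) for tid, iu, ru in ul_py}
    p = {tid: iu for tid, iu, _ in ul_p} if ul_p is not None else {}
    return [
        (tid, iu + py[tid][0] - p.get(tid, 0), py[tid][1])
        for tid, iu, _ in ul_px
        if tid in py
    ]


def search(ul_p, lists, minutil, found):
    """Depth-first search over the set-enumeration tree with the Lemma 1 pruning."""
    for k, (x, ul_x) in enumerate(lists):
        sum_iu = sum(iu for _, iu, _ in ul_x)
        sum_ru = sum(ru for _, _, ru in ul_x)
        if sum_iu >= minutil:
            found[frozenset(x)] = sum_iu
        if sum_iu + sum_ru >= minutil:  # otherwise no extension can qualify
            ext = [(x + (y[-1],), construct(ul_p, ul_x, ul_y)) for y, ul_y in lists[k + 1:]]
            search(ul_x, ext, minutil, found)
    return found


def mine(db, eu, minutil):
    return search(None, initial_lists(db, eu, minutil), minutil, {})


for itemset, value in mine(DB, EU, 30).items():
    print(sorted(itemset), value)
```

The database is read only while the initial lists are built; every later utility-list comes from joining two existing lists, so no candidate itemsets need to be stored and re-counted.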

  22. Outline • Introduction • Problem Definition • Utility-List Structure • High Utility Itemset Miner • Experiment • Conclusion

  23. Experimental Setup • Besides HUI-Miner, the experiments include three algorithms: • IHUPTWU • UP-Growth • UP-Growth+ • Eight databases, both real and synthetic, are used.

  24. Running Time • A mining task was terminated once its running time exceeded 10000 seconds. • For most sparse databases, the performance superiority of HUI-Miner becomes very significant when the minutil decreases.

  25. Memory Consumption • Except for the accidents database in (a), HUI-Miner always consumes less memory than the other algorithms. • Another observation is that UP-Growth+ consumes more memory than UP-Growth in (b) and (d). • UP-Growth+ holds more information than UP-Growth in sparse and large databases.

  26. Experiment • Processing Order of Items • The processing order of items significantly influences the performance of a high utility itemset mining algorithm.

  27. Outline • Introduction • Problem Definition • Utility-List Structure • High Utility Itemset Miner • Experiment • Conclusion

  28. Conclusion • The paper proposes a novel data structure, utility-list, and develops an efficient algorithm, HUI-Miner, for high utility itemset mining. • Utility-lists provide not only utility information about itemsets but also important pruning information for HUI-Miner. • HUI-Miner can mine high utility itemsets without candidate generation, which avoids the costly generation and utility computation of candidates.
