UP-Growth: An efficient algorithm for high utility itemset mining

UP-Growth: An efficient algorithm for high utility itemset mining • Authors: Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie and Philip S. Yu • Presenters: 林世祥、張蕙玲、莊善淯

Outline • Introduction • Problem definition • Related work • Proposed method • The proposed data structure: UP-Tree • The proposed mining method: UP-Growth • Discussion

Introduction • The target is to discover itemsets with high utility. • The basic meaning of utility is the profitability of items in the transaction.

Introduction • If we apply frequency-based pruning: • We may lose valuable items that are infrequent. • We may get too many items that are frequent but not valuable.

Introduction • Mining high utility itemsets is not an easy task since the downward closure property (used in frequent itemset mining) cannot be applied here. • The superset of a low utility set may be high utility!

Introduction • This paper proposes an efficient algorithm, UP-Growth (Utility Pattern Growth), and a special data structure, UP-Tree (Utility Pattern Tree). • Four strategies are proposed for efficient construction of the UP-Tree.

Problem definition • Transaction utility : the sum of the utilities of all items in the transaction • database • Profit table • Example: the utility of T1 is TU(T1) = u(A,T1) + u(C,T1) + u(D,T1) = 1×5 + 1×1 + 1×2 = 8

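Problem definition • Itemset utility : • database • Profit table • The utility of an itemset X in the database D is the sum of its utilities over all transactions that contain X: $u(X) = \sum_{T_d \in D,\ X \subseteq T_d} u(X, T_d)$, where $u(X, T_d) = \sum_{i \in X} u(i, T_d)$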

Problem definition • Itemset utility : • database • Profit table • ex: u({AD}) = u({AD},T1) + u({AD},T3) • = (1x5 + 1x2) + (1x5 + 6x2) • = 7+17 = 24
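
The itemset utility definition can be sketched in a few lines of Python. The database and profit table below are assumptions: the slide tables did not survive in this text, so the values were chosen to agree with the numbers quoted on the slides. The names `DB`, `PROFIT`, `utility_in_transaction` and `utility` are illustrative only.

```python
# Assumed data: the slide tables are missing, so these values were chosen
# to agree with the numbers quoted on the slides.
# DB maps a transaction id to {item: purchased quantity}.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
# PROFIT maps an item to its unit profit.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}


def utility_in_transaction(itemset, transaction):
    """u(X, T): quantity * unit profit summed over X, or 0 if T does not contain X."""
    if not set(itemset).issubset(transaction):
        return 0
    return sum(transaction[i] * PROFIT[i] for i in itemset)


def utility(itemset):
    """u(X): utility of X summed over every transaction in DB."""
    return sum(utility_in_transaction(itemset, t) for t in DB.values())


print(utility("AD"))  # 7 (from T1) + 17 (from T3) = 24
```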

Problem definition • High utility itemset : u(X) ≥ min_utility • database • Profit table • If we define min_utility = 30

Problem definition • High utility itemset : u(X) ≥ min_utility • database • Profit table • If we define min_utility = 30 • u({B}) = 16: low utility • u({BD}) = 30: high utility, although its subset {B} is not
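
A short sketch of the high utility test, showing why downward closure fails: {B} is below the threshold while its superset {BD} reaches it. The database and profit table are assumptions chosen to agree with the numbers on the slides, and `is_high_utility` is a hypothetical helper name.

```python
# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTILITY = 30  # threshold used on the slide


def utility(itemset):
    """u(X): summed over the transactions that contain every item of X."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in DB.values()
        if set(itemset).issubset(t)
    )


def is_high_utility(itemset, min_utility=MIN_UTILITY):
    """An itemset is high utility when u(X) is no less than the threshold."""
    return utility(itemset) >= min_utility


for x in ("B", "BD"):
    print(x, utility(x), is_high_utility(x))
```

Because a superset can gain utility from its extra items, pruning an itemset as soon as it falls below the threshold would wrongly discard {BD} here.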

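Problem definition • Transaction-weighted utilization (TWU) : • database • Profit table • TWU(X) is the sum of the transaction utilities of all transactions that contain X: $TWU(X) = \sum_{T_d \in D,\ X \subseteq T_d} TU(T_d)$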

Problem definition • Transaction-weighted utilization (TWU) : • database • Profit table • TWU(A) = TU(T1) + TU(T2) + TU(T3) • = 8 + 27 + 30 = 65

Problem definition • Transaction-weighted utilization(TWU) : • database • Profit table • TWU of each item
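
The TU and TWU values can be recomputed with a small sketch. The data is assumed (the slide tables are missing) and was chosen so that TU(T1) = 8, TU(T2) = 27 and TU(T3) = 30, as quoted on the slides. The function names are illustrative.

```python
# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}


def transaction_utility(transaction):
    """TU(T): utility of every item in the transaction, summed."""
    return sum(qty * PROFIT[item] for item, qty in transaction.items())


def twu(itemset):
    """TWU(X): sum of TU(T) over the transactions that contain X."""
    return sum(
        transaction_utility(t) for t in DB.values() if set(itemset).issubset(t)
    )


TU = {tid: transaction_utility(t) for tid, t in DB.items()}
TWU = {item: twu(item) for item in sorted(PROFIT)}
print(TU)
print(TWU)
```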

Problem definition • Problem statement: given a transaction database D and a user-specified minimum utility, mining high utility itemsets means discovering all itemsets in D whose utilities are no less than the minimum utility.
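
To make the problem statement concrete, here is a brute-force baseline that simply enumerates every itemset. It is not the method of the paper, only a reference point that is feasible for a toy database. The data is assumed (chosen to agree with the numbers on the slides) and `mine_brute_force` is a hypothetical name.

```python
from itertools import combinations

# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}


def utility(itemset):
    """u(X): summed over the transactions that contain every item of X."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in DB.values()
        if set(itemset).issubset(t)
    )


def mine_brute_force(min_utility):
    """Return {itemset: utility} for every itemset with u(X) >= min_utility."""
    items = sorted(PROFIT)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            u = utility(combo)
            if u >= min_utility:
                result["".join(combo)] = u
    return result


high_utility_itemsets = mine_brute_force(30)
print(high_utility_itemsets)
```

The search space doubles with every additional item, which is why TWU-based pruning and tree structures are needed in practice.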

Related work : IHUP Algorithm • Step 1: compute TWU, remove unpromising items, and rearrange the items.

Related work : IHUP Algorithm • Step 1: compute TWU, remove unpromising items, and rearrange the items. • With min_utility = 40, the unpromising items are crossed out (X) in the tables.
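
Step 1 can be sketched as follows. The data is assumed (chosen to agree with the numbers on the slides), and sorting the remaining items by descending TWU is the ordering implied by the tree paths shown on the later slides. Ties are broken alphabetically here, which is an arbitrary choice.

```python
# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTILITY = 40  # threshold used on the slide

# Transaction utility of every transaction.
tu = {tid: sum(q * PROFIT[i] for i, q in t.items()) for tid, t in DB.items()}

# TWU of every single item: add TU(T) for each transaction containing it.
twu = {}
for tid, t in DB.items():
    for item in t:
        twu[item] = twu.get(item, 0) + tu[tid]

# Keep items whose TWU reaches the threshold, ordered by descending TWU.
promising = [i for i in twu if twu[i] >= MIN_UTILITY]
order = sorted(promising, key=lambda i: (-twu[i], i))

# Drop unpromising items from each transaction and rearrange the rest.
reorganized = {tid: [i for i in order if i in t] for tid, t in DB.items()}
print(order)
print(reorganized)
```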

Related work : IHUP Algorithm • Step 2: construct an IHUP tree (similar to an FP-tree) by inserting each transaction.

Related work : IHUP Algorithm • Step 2: construct an IHUP tree (similar to an FP-tree) by inserting each transaction. • Tree after the first transaction (node = item : count, TWU): Root → C:1,8 → A:1,8 → D:1,8

Related work : IHUP Algorithm • Step 2: construct an IHUP tree (similar to an FP-tree) by inserting each transaction. • Tree after the second transaction: Root → C:2,35 • under C: A:1,8 → D:1,8 and E:1,27 → A:1,27

Related work : IHUP Algorithm • Step 2: construct an IHUP tree (similar to an FP-tree) by inserting each transaction. • Tree after the third transaction: Root → C:3,65 • under C: A:1,8 → D:1,8 and E:2,57 → A:2,57 → B:1,30 → D:1,30

Related work : IHUP Algorithm • Step 2: construct an IHUP tree (similar to an FP-tree) by inserting each transaction. • Tree after the fourth transaction: Root → C:4,85 • under C: A:1,8 → D:1,8 and E:3,77 • under E: A:2,57 → B:1,30 → D:1,30 and B:1,20 → D:1,20

Related work : IHUP Algorithm • Step 2: construct an IHUP tree (similar to an FP-tree) by inserting each transaction. • Tree after the fifth transaction: Root → C:5,96 • under C: A:1,8 → D:1,8 and E:4,88 • under E: A:2,57 → B:1,30 → D:1,30 and B:2,31 → D:1,20
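
The insertion procedure can be sketched with a minimal prefix tree in which every node keeps a support count and an accumulated TWU. The five reorganized transactions and their transaction utilities are read off the tree slides above. The class and function names are illustrative, not taken from the IHUP paper.

```python
class Node:
    """A tree node holding an item name, a support count and a TWU total."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.twu = 0
        self.children = {}


def build_ihup_tree(transactions):
    """Insert (items, transaction utility) pairs the way an FP-tree is built."""
    root = Node(None)
    for items, tu in transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            # Every node on the path receives the full transaction utility.
            child.count += 1
            child.twu += tu
            node = child
    return root


# Reorganized transactions with their (unchanged) transaction utilities.
TRANSACTIONS = [("CAD", 8), ("CEA", 27), ("CEABD", 30), ("CEBD", 20), ("CEB", 11)]
root = build_ihup_tree(TRANSACTIONS)
c = root.children["C"]
print(c.count, c.twu)  # node C after all five insertions
```

Note that the full transaction utility is added to every node on the path, even though unpromising items were removed from the transaction; this is the overestimation the UP-Tree strategies aim to reduce.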

Related work : IHUP Algorithm • Step 3: execute FP-growth on this tree • Root → C:5,96 • under C: A:1,8 → D:1,8 and E:4,88 • under E: A:2,57 → B:1,30 → D:1,30 and B:2,31 → D:1,20

Related work : IHUP Algorithm • Step 3: execute FP-growth on this tree • First, by tracing the node {D} to root, we can get 3 paths : DAC, DBAEC, DBEC, and generate {D}’s conditional pattern base.

Related work : IHUP Algorithm • Step 3: execute FP-growth on this tree • Repeating the TWU computation on these paths gives the local utilities of the items in the paths.

Related work : IHUP Algorithm • Step 3: execute FP-growth on this tree • Again, remove the items whose local utility is less than the threshold (min_util = 40); they are crossed out (X) in the figure.

Related work : IHUP Algorithm • Step 3: execute FP-growth on this tree • Finally, rearrange the paths and generate a new tree: Root → C:3,58 → B:2,50 → E:2,50

Outline • Introduction • Problem definition • Related work • Proposed method • The proposed data structure: UP-Tree • The proposed mining method: UP-Growth • Discussion

UP-Tree (Utility Pattern Tree) • Purpose: 1. improve mining performance 2. avoid scanning the original database repeatedly • Effect: 1. maintains the information of transactions 2. maintains the high utility itemsets

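Elements of a UP-Tree • N.name: the item name of node N • N.count: the support count of the node • N.nu: the node utility • N.parent: the parent node of N • N.hlink: a link to another node with the same item name as N • A set of child nodes • Header table: one entry per item, with a link into the tree so that all nodes of that item can be reached through the hlink chain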
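
The node layout can be sketched as a small Python data class. The field names follow the slide; everything else is an assumption for illustration, in particular the helper `get_or_add_child` and the choice to let each header entry point at the most recently created node of an item.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # compare nodes by identity; the tree contains cycles
class UPNode:
    name: Optional[str]                 # item name (None for the root)
    count: int = 0                      # support count
    nu: int = 0                         # node utility
    parent: Optional["UPNode"] = None   # parent node
    hlink: Optional["UPNode"] = None    # next node with the same item name
    children: Dict[str, "UPNode"] = field(default_factory=dict)


def get_or_add_child(parent, name, header):
    """Return parent's child called `name`, creating and linking it if needed."""
    child = parent.children.get(name)
    if child is None:
        child = UPNode(name=name, parent=parent)
        parent.children[name] = child
        # Prepend the new node to the chain of same-name nodes.
        child.hlink = header.get(name)
        header[name] = child
    return child


root = UPNode(name=None)
header = {}  # item name -> head of that item's hlink chain
c = get_or_add_child(root, "C", header)
a1 = get_or_add_child(c, "A", header)
e = get_or_add_child(c, "E", header)
a2 = get_or_add_child(e, "A", header)

# Walk the hlink chain to visit every node of item A.
node, paths = header["A"], []
while node is not None:
    paths.append(node.parent.name)
    node = node.hlink
print(paths)
```

Following `hlink` from a header entry visits every node of that item, which is how the conditional pattern base of an item is collected without walking the whole tree.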

Promising and unpromising items • An item is called a promising item if its TWU ≥ min_util. • Otherwise, the item is called an unpromising item.

The first scan • Compute the TU of each transaction and the TWU of each item • Eliminate unpromising items • Build the header table from the remaining items

The second scan • Transactions are inserted into the UP-Tree.

Strategy 1. Discarding global unpromising items (DGU) Discard the information of unpromising items from the database since an unpromising item plays no role in high utility itemsets.
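
A minimal sketch of DGU: unpromising items are dropped from each transaction and their utilities are subtracted from the transaction utility, giving a reorganized transaction utility (called `rtu` below, an illustrative name). The data is assumed (chosen to agree with the numbers on the slides) and min_util = 40 is carried over from the IHUP example.

```python
# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTIL = 40  # assumed, same threshold as in the IHUP example


def reorganize(db, profit, min_util):
    """Return {tid: (kept items in descending TWU order, reorganized TU)}."""
    tu = {tid: sum(q * profit[i] for i, q in t.items()) for tid, t in db.items()}
    twu = {}
    for tid, t in db.items():
        for item in t:
            twu[item] = twu.get(item, 0) + tu[tid]
    order = sorted((i for i in twu if twu[i] >= min_util), key=lambda i: (-twu[i], i))
    result = {}
    for tid, t in db.items():
        kept = [i for i in order if i in t]
        # Only the utilities of promising items remain in the total.
        rtu = sum(t[i] * profit[i] for i in kept)
        result[tid] = (kept, rtu)
    return result


for tid, (items, rtu) in reorganize(DB, PROFIT, MIN_UTIL).items():
    print(tid, items, rtu)
```

With this data the third transaction drops from a utility of 30 to 25, which matches the path utility 25 shown for D->B->A->E->C on the next slide.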

Generating PHUIs (potential high utility itemsets) from the global UP-Tree • Start from the bottom of the header table. • Tracing the {D} nodes to the root gives these tree paths: (D->A->C: 1, 8), (D->B->A->E->C: 1, 25), (D->B->E->C: 1, 20) • In each path, the first number is the support count and the second is the path utility.

Path utility of a path in the CPB (conditional pattern base) of node X = {X}.nu, the node utility of the X node the path was traced from • Path utility of an item in a path in the CPB of node X = the path utility of that path, {X}.nu • Path utility of an item in the CPB of node X = the sum of the item's path utilities over all paths in the CPB that contain it • Local promising item in a CPB: an item Y is called a local promising item in {X}-CPB if pu(Y, {X}-CPB) ≥ min_util; otherwise, Y is called a local unpromising item.
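
These definitions can be checked against the three {D} paths listed earlier. The sketch below sums path utilities per item and keeps the local promising ones. The threshold of 40 is an assumption carried over from the IHUP example, and the function name is illustrative.

```python
from collections import defaultdict

MIN_UTIL = 40  # assumed, same threshold as in the IHUP example

# {D}-CPB from the slide: (items above D up to the root, support count, path utility)
D_CPB = [("AC", 1, 8), ("BAEC", 1, 25), ("BEC", 1, 20)]


def item_path_utilities(cpb):
    """pu(Y, {X}-CPB): add the path utility of every path that contains Y."""
    pu = defaultdict(int)
    for items, _count, path_utility in cpb:
        for item in items:
            pu[item] += path_utility
    return dict(pu)


pu = item_path_utilities(D_CPB)
local_promising = sorted(i for i, v in pu.items() if v >= MIN_UTIL)
local_unpromising = sorted(i for i, v in pu.items() if v < MIN_UTIL)
print(pu)
print(local_promising, local_unpromising)
```

The resulting values for C, B and E (53, 45 and 45) agree with the utilities listed for {DC}, {DB} and {DE} among the PHUIs of the {D}-Tree.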

{D}’s conditional UP-Tree {D}-Tree

Generating PHUIs from {D}-Tree • The set of PHUIs that involve item D is obtained: {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53}

Consider the next item. • After all PHUIs have been found, the high utility itemsets and their utilities are identified from the set of PHUIs by scanning the original database once.
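
The final identification step can be sketched as one pass over the database that accumulates the exact utility of every candidate. The candidates are the {D}-related PHUIs listed on the slides; the database, the profit table and the threshold of 40 are assumptions (the slide tables are missing), so the result below holds only for this assumed data.

```python
# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTIL = 40  # assumed threshold

# PHUIs involving item D, as listed on the slide.
PHUIS = ["D", "DE", "DEB", "DEC", "DEBC", "DB", "DBC", "DC"]


def exact_utilities(candidates):
    """Scan the database once and accumulate u(X) for every candidate X."""
    totals = {c: 0 for c in candidates}
    for transaction in DB.values():
        for c in candidates:
            if set(c).issubset(transaction):
                totals[c] += sum(transaction[i] * PROFIT[i] for i in c)
    return totals


exact = exact_utilities(PHUIS)
high_utility = {c: u for c, u in exact.items() if u >= MIN_UTIL}
print(exact)
print(high_utility)
```

The gap between the overestimated values of the PHUIs and their exact utilities shows why the second phase is needed, and why fewer PHUIs mean less verification work.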

Decreasing node utilities in construction of a global UP-Tree • An item does not appear in its ancestors' conditional pattern bases, yet its utility is still included in the path utilities of the paths in those CPBs. • For example, any item that is a descendant of node {B} will not appear in {B}-CPB, so the utilities of {B}'s descendants can be removed from the path utility of each path in {B}-CPB. • This can be done while the global UP-Tree is being constructed, since the paths in the conditional pattern bases are derived directly from the global UP-Tree.

Strategy 2. Discarding global node utilities(DGN) For any node in a global UP-Tree, the utilities of its descendants are discarded from the utility of the node during the construction of a global UP-Tree.

Applying strategy DGN effectively reduces the utilities of the nodes that are closer to the root of the global UP-Tree. • DGN is especially suitable for databases that contain many long transactions: the more items a transaction contains, the more utility DGN can discard.
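
A sketch of DGU and DGN applied together while building the global tree: each node receives only the utility of its own item plus that of its ancestors in the transaction, so the utilities of descendants are never added. The database, the profit table and the item order C, E, A, B, D (promising items in descending TWU for min_util = 40) are assumptions consistent with the numbers on the slides; the names are illustrative.

```python
# Assumed data (the slide tables are missing); quantities per transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
ORDER = "CEABD"  # assumed: promising items in descending TWU order


class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.count, self.nu = 0, 0
        self.children = {}


def build_up_tree(db):
    root = Node(None)
    for transaction in db.values():
        node, prefix_utility = root, 0
        for item in ORDER:
            if item not in transaction:  # DGU: unpromising items are skipped
                continue
            # DGN: count this item and its ancestors, never its descendants.
            prefix_utility += transaction[item] * PROFIT[item]
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            child.nu += prefix_utility
            node = child
    return root


root = build_up_tree(DB)
c = root.children["C"]
print(c.count, c.nu)                                 # root-side node, small utility
print(c.children["A"].count, c.children["A"].nu)     # A under C
```

Node C ends up with a node utility of 13 here, compared with 96 in the IHUP tree, which illustrates how strongly DGN reduces the values near the root.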
