This paper presents a novel approach to mining approximate closed itemsets from transaction databases in the presence of random noise and measurement errors. By defining core patterns as initial seeds, the method extends to approximate frequent itemsets while maintaining efficiency through pruning techniques like top-down and closeness-based pruning. Using a lattice structure, the technique generates candidate itemsets by leveraging supporting transactions, leading to enhanced identification of interesting patterns in complex datasets. Experiments validate the method's effectiveness on synthetic datasets.
AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery • H. Cheng, P. S. Yu, and J. Han, ICDM'06 • Presenter: 林靜怡, 2007/01/17
Introduction • In real applications, a database contains random noise or measurement error • Under exact matching, some interesting patterns become fragmented • Goal: discover approximate frequent itemsets in the presence of random noise
Definition • D: a transaction database in the form of an n x m binary matrix • I: the set of all items (the m columns of D) • T: the set of transactions in D (the n rows of D)
Definition • Exact support of an itemset x • Exact supporting transactions of x • Support of an approximate itemset x • Supporting transactions of an approximate itemset x
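The exact quantities listed above can be illustrated with a minimal sketch. It assumes D is stored as a list of 0/1 rows, items are column indices, and support is an absolute count; the function name `exact_support` and the toy matrix are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple


def exact_support(D: List[List[int]], x: Set[int]) -> Tuple[int, Set[int]]:
    """Return (support count, supporting transaction ids) for itemset x.

    A transaction supports x exactly when every item of x is 1 in its row.
    """
    rows = {t for t, row in enumerate(D) if all(row[i] == 1 for i in x)}
    return len(rows), rows


# Toy 4 x 3 matrix (hypothetical data): rows are transactions, columns are items.
D = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]

count, rows = exact_support(D, {0, 1})
print(count, sorted(rows))  # 2 [0, 1]
```

Only transactions 0 and 1 contain both item 0 and item 1, so the exact support of {0, 1} is 2.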
Approximate closed frequent itemset mining • The problem of approximate closed frequent itemset mining from core patterns is to mine all itemsets that are (1) core patterns w.r.t. α, (2) approximate frequent itemsets w.r.t. the error thresholds and min_sup, and (3) closed.
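Condition (2) can be made concrete with a small check. The sketch below assumes the common row/column error-tolerance formulation: each supporting transaction may miss at most a fraction `eps_r` of the items of x, and each item of x may be missing from at most a fraction `eps_c` of the supporting transactions. The names `approximately_supports`, `eps_r`, and `eps_c` are illustrative assumptions, since the slide's own formula did not survive.

```python
from typing import List, Set


def approximately_supports(
    D: List[List[int]], x: Set[int], rows: Set[int], eps_r: float, eps_c: float
) -> bool:
    """Check whether `rows` is a valid approximate supporting set for x.

    Assumed constraints (not confirmed by the slides):
      - row:    each transaction in `rows` has at most eps_r * |x| zeros on x
      - column: each item of x has at most eps_c * |rows| zeros within `rows`
    """
    if not x or not rows:
        return False
    for t in rows:
        zeros = sum(1 for i in x if D[t][i] == 0)
        if zeros > eps_r * len(x):
            return False
    for i in x:
        zeros = sum(1 for t in rows if D[t][i] == 0)
        if zeros > eps_c * len(rows):
            return False
    return True


# Hypothetical 4 x 4 matrix: every row and every column has exactly one 0.
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
x = {0, 1, 2, 3}
print(approximately_supports(D, x, {0, 1, 2, 3}, eps_r=0.25, eps_c=0.25))  # True
print(approximately_supports(D, x, {0, 1, 2, 3}, eps_r=0.25, eps_c=0.2))   # False
```

No transaction contains all four items, so the exact support of x is 0, yet all four transactions support it approximately once one missing item per row and per column is tolerated.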
Approximate Closed Itemset Mining • Mine the set of core patterns with min_sup = αs • Treat core patterns as the initial seeds for possible further extension to approximate frequent itemsets • C: the set of core patterns • L: a lattice which is built over C
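The first step, mining core patterns at the lowered threshold αs, can be sketched with a brute-force enumeration. This is an illustration only, not the miner used in the paper: it assumes s is a relative support (a fraction of the n transactions), so the absolute threshold is α·s·n, and the name `mine_core_patterns` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def mine_core_patterns(
    D: List[List[int]], s: float, alpha: float
) -> Dict[FrozenSet[int], FrozenSet[int]]:
    """Map each core pattern to its exact supporting transactions.

    Assumption: an itemset is a core pattern when its exact support count
    is at least alpha * s * n, where n is the number of transactions.
    """
    n = len(D)
    m = len(D[0]) if D else 0
    threshold = alpha * s * n
    core: Dict[FrozenSet[int], FrozenSet[int]] = {}
    for size in range(1, m + 1):
        found = False
        for items in combinations(range(m), size):
            rows = frozenset(
                t for t, row in enumerate(D) if all(row[i] == 1 for i in items)
            )
            if len(rows) >= threshold:
                core[frozenset(items)] = rows
                found = True
        if not found:
            # Exact support is anti-monotone: no larger itemset can qualify.
            break
    return core


# Hypothetical 4 x 4 matrix: every row and every column has exactly one 0.
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
core = mine_core_patterns(D, s=0.5, alpha=0.5)  # absolute threshold = 1
print(len(core))  # 14
```

With s = 0.5 and α = 0.5 the threshold is one transaction, so all 4 single items, 6 pairs, and 4 triples qualify, while the full 4-item set (exact support 0) does not. The resulting dictionary plays the role of C, and the subset relation among its keys gives the lattice L.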
Example • For the core pattern in the example, a supporting transaction is allowed a bounded number of 0s • Its extension space: traverse upward in the lattice for 2 levels (i.e., levels 2 and 3)
For a core itemset y and each sub-pattern x in its extension space, any transaction supporting x also approximately supports y • The candidate transaction set of y is the union of the transaction sets of these sub-patterns • Ex: it is the union of the transaction sets of all itemsets at levels 2 and 3
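The union described above can be sketched as follows. The sketch assumes the number of 0s allowed in a supporting transaction is floor(|y| · eps_r), and that the extension space of y consists of the core sub-patterns of y that drop at most that many items; both the formula and the names `extension_space`, `candidate_transactions`, and `eps_r` are assumptions, since the slide's formula is missing. The `core` dictionary is hand-written toy data.

```python
import math
from typing import Dict, FrozenSet, List, Set


def extension_space(
    y: FrozenSet[str], core: Dict[FrozenSet[str], Set[int]], eps_r: float
) -> List[FrozenSet[str]]:
    """Core sub-patterns of y that miss at most floor(|y| * eps_r) items."""
    k = math.floor(len(y) * eps_r)  # assumed number of 0s allowed per transaction
    return [x for x in core if x < y and len(x) >= len(y) - k]


def candidate_transactions(
    y: FrozenSet[str], core: Dict[FrozenSet[str], Set[int]], eps_r: float
) -> Set[int]:
    """Union of the exact transaction sets of y and of its extension space."""
    rows = set(core[y])
    for x in extension_space(y, core, eps_r):
        rows |= core[x]
    return rows


# Hypothetical core patterns with their exact supporting transactions.
core = {
    frozenset("abc"): {1},
    frozenset("ab"): {1, 2},
    frozenset("ac"): {1, 3},
    frozenset("bc"): {1, 4},
    frozenset("a"): {1, 2, 3, 5},
}
y = frozenset("abc")
print(sorted(candidate_transactions(y, core, eps_r=0.5)))  # [1, 2, 3, 4]
```

Here floor(3 · 0.5) = 1, so only the 2-item sub-patterns are in the extension space; transaction 5, which supports only {a}, is left out because it would miss two items of y.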
Identifying candidate approximate itemsets • The steps to identify candidate approximate itemsets include
Top-down Mining and Pruning by Closeness • Pruning is driven by the closeness definition and the min_sup threshold • Mining starts with the largest pattern in L and proceeds level by level, in decreasing order of core pattern size.
Example lattice (set labels from the figure): {0,2,4} {0,2,5} {2,5} {0,2,6} {2,6} {2,3,5,6} {0,2} {2} {2,5} {2,6}
Example • = {0,2,3,4,5,6} • The number of 0s allowed in a transaction is => the extension space includes its sub-patterns at level 2.
=> • Prune {a,b,c} without actual computation: (1) if {a,b,c,d} satisfies the min_sup threshold and the constraint, then {a,b,c} can be pruned, no matter whether {a,b,c,d} is closed or non-closed; (2) if {a,b,c,d} does not satisfy min_sup, then {a,b,c} can be pruned
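A top-down pass with a closeness filter might look like the sketch below. It is a simplified illustration, not the paper's pruning procedure: it assumes approximate supports are already computed, and it assumes an itemset is closed when no proper superset has an approximate support at least as large. The name `top_down_closed` and the support values are hypothetical.

```python
from typing import Dict, FrozenSet, List


def top_down_closed(
    approx_sup: Dict[FrozenSet[str], int], min_count: int
) -> List[FrozenSet[str]]:
    """Visit patterns from largest to smallest and keep the closed frequent ones.

    Assumed closeness rule: x is closed if no proper superset y has
    approx_sup[y] >= approx_sup[x].
    """
    closed: List[FrozenSet[str]] = []
    visited: List[FrozenSet[str]] = []  # frequent patterns seen so far
    for x in sorted(approx_sup, key=len, reverse=True):
        sup = approx_sup[x]
        if sup < min_count:
            continue  # fails min_sup
        # Any proper superset is strictly larger, so it was visited earlier.
        if not any(x < y and approx_sup[y] >= sup for y in visited):
            closed.append(x)
        visited.append(x)
    return closed


# Hypothetical approximate support counts.
approx_sup = {
    frozenset("abcd"): 5,
    frozenset("abc"): 5,
    frozenset("abd"): 6,
    frozenset("ab"): 6,
    frozenset("a"): 9,
}
result = top_down_closed(approx_sup, min_count=5)
print(["".join(sorted(p)) for p in result])  # ['abcd', 'abd', 'a']
```

In this toy run {a,b,c} is dropped because its superset {a,b,c,d} has the same support, and {a,b} is dropped because of {a,b,d}. Processing in decreasing size order means every superset has been seen before its subsets are tested.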
Experiment • The IBM synthetic data generator • A dataset T10.I100.D20K is generated with 20K transactions, 100 distinct items and an average of 10 items per transaction