Mining Non-Derivable Association Rules. Bart Goethals, Juho Muhonen, Hannu Toivonen. Proceedings of SIAM 2005. Speaker: Pei-Min Chou. Date: 05/12/30.
Introduction
• Association rule: X=>Y, where X and Y are itemsets
• Support: fraction of transactions that contain X∪Y
  • Ex: A=>C; 2/4 = 50%
• Confidence: supp(X∪Y)/supp(X)
  • Ex: A=>C; supp(AC)/supp(A) = 2/3 = 67%
• Frequent: X∪Y is a frequent itemset
• Confident: supp(X∪Y)/supp(X) ≥ confidence threshold
• The set of mined association rules is typically:
  • large
  • redundant
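The support and confidence definitions above can be checked on a small transaction database. The sketch below is illustrative only: the four transactions are a hypothetical dataset chosen so that A=>C reproduces the 50% and 67% figures from the slide, and the function names are not taken from the paper.

```python
# Hypothetical 4-transaction database (an assumption, not from the slides),
# chosen so that A=>C has support 2/4 and confidence 2/3.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db if items <= t)

def support(x, y, db):
    """Relative support of the rule X=>Y: supp(X∪Y) / |db|."""
    return support_count(set(x) | set(y), db) / len(db)

def confidence(x, y, db):
    """Confidence of the rule X=>Y: supp(X∪Y) / supp(X)."""
    return support_count(set(x) | set(y), db) / support_count(x, db)

print(support({"A"}, {"C"}, transactions))     # 0.5
print(confidence({"A"}, {"C"}, transactions))  # 2/3, about 0.67
```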
Introduction (cont.)
• Related work
  • Prune rules that have the same confidence
  • Use a specific inference system to prune rules
  • These approaches do not give an error bound
• Mining non-derivable association rules
  • Find tight bounds on the confidence of a rule from its subrules
  • If the lower bound equals the upper bound, the rule is derivable
Non-Derivable set property
• Downward closed
  • All supersets of a derivable set are derivable
  • All subsets of a non-derivable set are non-derivable
• Given all subrules of X=>Y:
  • X=>Y is derivable if and only if X∪Y is a derivable set
Method
• Goal: remove all derivable association rules
• Different cases
  • Rules with exactly the same condition and consequent
  • Fixed consequent
    • Single item, e.g. abc=>d
    • Multiple items, e.g. abc=>de
  • Fixed condition or consequent
    • Consequent: use the method above
    • Condition: use the inclusion-exclusion principle
  • Some subrules
Example
• Consider the rule abc=>d
• All subrules:
• Missing information: supp(abc) and supp(abcd)
Bounds on supp(abc)
• Use the inclusion-exclusion principle
• supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({})
• supp(abc) ≥ supp(ab)+supp(ac)-supp(a)
• supp(abc) ≥ supp(ab)+supp(bc)-supp(b)
• supp(abc) ≥ supp(bc)+supp(ac)-supp(c)
• supp(abc) ≤ supp(ab)
• supp(abc) ≤ supp(bc)
• supp(abc) ≤ supp(ac)
• supp(abc) ≥ 0
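The eight inequalities for supp(abc) are all instances of one general inclusion-exclusion rule, one per subset J of the itemset I. As a reference sketch (the symbol \sigma_J is notation introduced here for convenience and does not appear on the slides; the `cases` environment assumes `amsmath`):

```latex
\[
\sigma_J(I) \;=\; \sum_{J \subseteq X \subsetneq I} (-1)^{|I \setminus X| + 1}\,\mathrm{supp}(X),
\qquad
\begin{cases}
\mathrm{supp}(I) \le \sigma_J(I) & \text{if } |I \setminus J| \text{ is odd},\\[2pt]
\mathrm{supp}(I) \ge \sigma_J(I) & \text{if } |I \setminus J| \text{ is even}.
\end{cases}
\]
% Instances for I = abc:
%   J = {}  (|I \ J| = 3, odd):  supp(abc) <= supp(ab)+supp(ac)+supp(bc)-supp(a)-supp(b)-supp(c)+supp({})
%   J = a   (|I \ J| = 2, even): supp(abc) >= supp(ab)+supp(ac)-supp(a)
%   J = ab  (|I \ J| = 1, odd):  supp(abc) <= supp(ab)
%   J = abc (|I \ J| = 0, even): supp(abc) >= 0   (empty sum)
```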
Example (cont.)
• Rule: ab=>c
• Given: supp(ac)=3, supp(bc)=3, supp(a)=7, supp(b)=7, supp(c)=5, supp({})=10
• Bounds on supp(ab)
  • supp(ab) ≥ supp(a)+supp(b)-supp({}) = 7+7-10 = 4 (lower)
  • supp(ab) ≤ supp(a) = supp(b) = 7 (upper)
• For supp(ab)=4
  • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) = 4+3+3-7-7-5+10 = 1
  • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) = 4+3-7 = 0
  • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) = 4+3-7 = 0
  • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) = 3+3-5 = 1
  • supp(abc) ≤ supp(ab) = 4
  • supp(abc) ≤ supp(bc) = 3
  • supp(abc) ≤ supp(ac) = 3
  • supp(abc) ≥ 0
  • Result: supp(abc) in [1,1], confidence 1/4
• For supp(ab)=5
  • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) = 5+3+3-7-7-5+10 = 2
  • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) = 5+3-7 = 1
  • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) = 5+3-7 = 1
  • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) = 3+3-5 = 1
  • supp(abc) ≤ supp(ab) = 5
  • supp(abc) ≤ supp(bc) = 3
  • supp(abc) ≤ supp(ac) = 3
  • supp(abc) ≥ 0
  • Result: supp(abc) in [1,2], confidence in [1/5, 2/5]
• For supp(ab)=6
  • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) = 6+3+3-7-7-5+10 = 3
  • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) = 6+3-7 = 2
  • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) = 6+3-7 = 2
  • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) = 3+3-5 = 1
  • supp(abc) ≤ supp(ab) = 6
  • supp(abc) ≤ supp(bc) = 3
  • supp(abc) ≤ supp(ac) = 3
  • supp(abc) ≥ 0
  • Result: supp(abc) in [2,3], confidence in [1/3, 1/2]
• For supp(ab)=7
  • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) = 7+3+3-7-7-5+10 = 4
  • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) = 7+3-7 = 3
  • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) = 7+3-7 = 3
  • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) = 3+3-5 = 1
  • supp(abc) ≤ supp(ab) = 7
  • supp(abc) ≤ supp(bc) = 3
  • supp(abc) ≤ supp(ac) = 3
  • supp(abc) ≥ 0
  • Result: supp(abc) in [3,3], confidence 3/7
• Confidence interval of ab=>c: [1/5, 1/2]
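The case-by-case computation above can be reproduced mechanically. The following is a brute-force sketch that mirrors the slide's enumeration over every admissible integer value of supp(ab); it is not claimed to be the procedure used in the paper, and the function names and the string-keyed dictionary are illustrative assumptions.

```python
from fractions import Fraction

# Known supports from the example; "" stands for the empty itemset {}.
S = {"": 10, "a": 7, "b": 7, "c": 5, "ac": 3, "bc": 3}

def ab_range(s):
    """Inclusion-exclusion bounds on the unknown supp(ab)."""
    low = max(0, s["a"] + s["b"] - s[""])
    high = min(s["a"], s["b"])
    return low, high

def abc_range(s, ab):
    """Bounds on supp(abc) for one assumed value of supp(ab)."""
    upper = min(
        ab, s["ac"], s["bc"],
        ab + s["ac"] + s["bc"] - s["a"] - s["b"] - s["c"] + s[""],
    )
    lower = max(
        0,
        ab + s["ac"] - s["a"],
        ab + s["bc"] - s["b"],
        s["ac"] + s["bc"] - s["c"],
    )
    return lower, upper

def confidence_interval(s):
    """Smallest and largest confidence of ab=>c over all admissible cases."""
    low_ab, high_ab = ab_range(s)
    candidates = []
    for ab in range(low_ab, high_ab + 1):
        if ab == 0:
            continue  # confidence is undefined when supp(ab) = 0
        lower, upper = abc_range(s, ab)
        if lower > upper:
            continue  # this value of supp(ab) is inconsistent with the data
        candidates += [Fraction(lower, ab), Fraction(upper, ab)]
    return min(candidates), max(candidates)

print(confidence_interval(S))  # (Fraction(1, 5), Fraction(1, 2))
```

When the two ends of the returned interval coincide, the rule is derivable in the sense used on these slides.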
Example (cont.)
• Rule: ab=>c
• Given: supp(ac)=7, supp(bc)=7, supp(a)=7, supp(b)=7, supp(c)=10, supp({})=10
• Bounds on supp(ab)
  • supp(ab) ≥ supp(a)+supp(b)-supp({}) = 7+7-10 = 4
  • supp(ab) ≤ supp(a) = supp(b) = 7
  • supp(ab) in [4,7]: ab is non-derivable
• For supp(ab)=4
  • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) = 4+7+7-7-7-10+10 = 4
  • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) = 4+7-7 = 4
  • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) = 4+7-7 = 4
  • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) = 7+7-10 = 4
  • supp(abc) ≤ supp(ab) = 4
  • supp(abc) ≤ supp(bc) = 7
  • supp(abc) ≤ supp(ac) = 7
  • supp(abc) ≥ 0
• The same holds for supp(ab)=5, 6, 7: the lower bound supp(ab)+supp(ac)-supp(a) and the upper bound supp(ab) coincide, so supp(abc)=supp(ab)
• Confidence interval of ab=>c: [1,1], so the rule is derivable
Use subrules
• Use only the bounds for subsets J ⊆ I such that |I\J| ≤ k-1
• k>0: user-given parameter (depth)
• Ex. depth=4
  • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({})
  • supp(abc) ≥ supp(ab)+supp(ac)-supp(a)
  • supp(abc) ≥ supp(ab)+supp(bc)-supp(b)
  • supp(abc) ≥ supp(bc)+supp(ac)-supp(c)
  • supp(abc) ≤ supp(ab)
  • supp(abc) ≤ supp(bc)
  • supp(abc) ≤ supp(ac)
  • supp(abc) ≥ 0
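A depth-limited bound computation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the depth parameter k keeps only the inclusion-exclusion bounds whose subset J satisfies |I\J| ≤ k-1 (the reading under which depth=4 yields all eight bounds for abc), and `subsets_between`, `support_bounds`, and the example supports are names and values chosen here for illustration.

```python
from itertools import combinations

def subsets_between(low, high):
    """Yield every frozenset X with low ⊆ X ⊆ high."""
    extra = sorted(high - low)
    for size in range(len(extra) + 1):
        for combo in combinations(extra, size):
            yield low | frozenset(combo)

def support_bounds(itemset, supp, depth=None):
    """Lower and upper bound on supp(itemset) from the supports of its subsets.

    `supp` maps frozensets to known supports. With `depth` set, only subsets J
    with |I \\ J| <= depth - 1 contribute a bound (assumed convention).
    """
    I = frozenset(itemset)
    lower, upper = float("-inf"), float("inf")
    for J in subsets_between(frozenset(), I):
        gap = len(I - J)
        if depth is not None and gap > depth - 1:
            continue
        # Inclusion-exclusion sum over all X with J ⊆ X ⊂ I (X = I excluded).
        total = 0
        for X in subsets_between(J, I):
            if X == I:
                continue
            sign = 1 if len(I - X) % 2 == 1 else -1
            total += sign * supp[X]
        if gap % 2 == 1:
            upper = min(upper, total)  # odd |I \ J| gives an upper bound
        else:
            lower = max(lower, total)  # even |I \ J| gives a lower bound
    return lower, upper

fs = frozenset
# Example supports (illustrative values with supp(ab) fixed at 5).
supp = {fs(): 10, fs("a"): 7, fs("b"): 7, fs("c"): 5,
        fs("ab"): 5, fs("ac"): 3, fs("bc"): 3}

print(support_bounds("abc", supp, depth=4))  # (1, 2): all eight bounds
print(support_bounds("abc", supp, depth=2))  # (0, 3): only |I \ J| <= 1
```

A smaller depth evaluates fewer and cheaper sums, at the price of a wider interval, which is the trade-off the depth parameter controls.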
Experiments
• Dataset characteristics
• Number of rules after different pruning methods
Exp(1)
• Non-derivable
• Minimal closed association rules
Exp(2)
• Non-derivable
• Basic association rules
• Maximum entropy method