Mining Non-Derivable Association Rules

Bart Goethals, Juho Muhonen, Hannu Toivonen. Proceedings of SIAM 2005. Speaker: Pei-Min Chou. Date: 05/12/30.


Presentation Transcript


  1. Mining Non-Derivable Association Rules. Bart Goethals, Juho Muhonen, Hannu Toivonen. Proceedings of SIAM 2005. Speaker: Pei-Min Chou. Date: 05/12/30

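  2. Introduction • Association rule X=>Y, where X and Y are itemsets • Support: fraction of transactions containing X∪Y • Ex: A=>C: 2 of 4 transactions contain both A and C, so support = 2/4 = 50% • Confidence: supp(X∪Y)/supp(X) • Ex: A=>C: supp(AC)/supp(A) = 2/3 ≈ 67% • A rule is reported if it is • frequent: X∪Y is frequent • confident: supp(X∪Y)/supp(X) ≥ confidence threshold • Typically, the set of association rules is • large • redundant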
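A minimal sketch of the two measures. The slides' example database is not in the transcript, so the four transactions below are hypothetical, chosen only so that the numbers match 2/4 and 2/3; the helper names are illustrative, not from the paper.

```python
from fractions import Fraction

# Hypothetical toy database: A and C occur together in 2 of 4 transactions,
# and A occurs in 3 of them.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E"},
]

def supp(itemset, db):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def support(x, y, db):
    """Relative support of the rule X => Y: supp(X ∪ Y) / |db|."""
    return Fraction(supp(x | y, db), len(db))

def confidence(x, y, db):
    """Confidence of the rule X => Y: supp(X ∪ Y) / supp(X)."""
    return Fraction(supp(x | y, db), supp(x, db))

def is_valid_rule(x, y, db, minsupp, minconf):
    """X => Y is reported only if X ∪ Y is frequent and the rule is confident."""
    return support(x, y, db) >= minsupp and confidence(x, y, db) >= minconf

print(support({"A"}, {"C"}, transactions))     # 1/2
print(confidence({"A"}, {"C"}, transactions))  # 2/3
```

Exact fractions are used so that a value such as 2/3 is not rounded before it is compared with a threshold.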

  3. Introduction (cont.) • Related work • Applies only to rules with exactly the same confidence • Uses a specific inference system to prune rules • Does not give error bounds • Mining non-derivable association rules • Find tight bounds on the confidence of a rule from its subrules • If lower bound = upper bound, the rule is derivable

  4. Non-Derivable Set Property • Non-derivability is downward closed: • all supersets of a derivable set are derivable • all subsets of a non-derivable set are non-derivable • Given all subrules of X=>Y: • X=>Y is derivable if and only if X∪Y is a derivable set

  5. Method • Goal: remove all derivable association rules • Different cases: • Rules with exactly the same condition and consequent • Fixed consequent: • single item, ex. abc=>d • multiple items, ex. abc=>de • Fixed condition or consequent: • for the consequent, use the method above • for the condition, use the inclusion-exclusion principle • Using only some of the subrules

  6. Example • Consider the rule abc=>d • All subrules: • The supports of abc and abcd are not known from the subrules

  7. Bounds on supp(abc) • Using the inclusion-exclusion principle (supp({}) is the total number of transactions): • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) • supp(abc) ≤ supp(ab) • supp(abc) ≤ supp(bc) • supp(abc) ≤ supp(ac) • supp(abc) ≥ 0
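The eight inequalities above follow a single pattern. The general statement below is given for reference from the non-derivable itemset framework this work builds on, not taken from these slides: for an itemset I and each subset J ⊆ I, an alternating sum of the supports of the sets X with J ⊆ X ⊊ I bounds supp(I) from above when |I\J| is odd and from below when it is even.

```latex
\[
\delta_J(I) = \sum_{J \subseteq X \subsetneq I} (-1)^{|I \setminus X| + 1}\, \mathrm{supp}(X),
\qquad
\begin{cases}
\mathrm{supp}(I) \le \delta_J(I) & \text{if } |I \setminus J| \text{ is odd},\\
\mathrm{supp}(I) \ge \delta_J(I) & \text{if } |I \setminus J| \text{ is even}.
\end{cases}
\]
% Instances for I = abc:
\[
\begin{aligned}
J = \emptyset\ (|I \setminus J| = 3):&\quad \mathrm{supp}(abc) \le \mathrm{supp}(ab)+\mathrm{supp}(bc)+\mathrm{supp}(ac)-\mathrm{supp}(a)-\mathrm{supp}(b)-\mathrm{supp}(c)+\mathrm{supp}(\{\})\\
J = a\ (|I \setminus J| = 2):&\quad \mathrm{supp}(abc) \ge \mathrm{supp}(ab)+\mathrm{supp}(ac)-\mathrm{supp}(a)\\
J = ab\ (|I \setminus J| = 1):&\quad \mathrm{supp}(abc) \le \mathrm{supp}(ab)\\
J = abc\ (|I \setminus J| = 0):&\quad \mathrm{supp}(abc) \ge 0 \quad (\text{empty sum})
\end{aligned}
\]
```

The remaining bounds (J = b, c and J = bc, ac) follow by symmetry. The tightest interval for supp(abc) is the largest lower bound together with the smallest upper bound.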

  8. Example (cont.) • Known supports: supp({})=10, supp(a)=7, supp(b)=7, supp(c)=5, supp(ac)=3, supp(bc)=3 • supp(ab) is unknown: supp(ab) ≥ supp(a)+supp(b)-supp({}) = 7+7-10 = 4 (lower bound) and supp(ab) ≤ min(supp(a), supp(b)) = 7 (upper bound) • Rule ab=>c: bound supp(abc) for each possible value of supp(ab)
     • supp(ab)=4: supp(abc) ≤ min(4+3+3-7-7-5+10, 4, 3, 3) = min(1, 4, 3, 3) = 1 and supp(abc) ≥ max(4+3-7, 4+3-7, 3+3-5, 0) = max(0, 0, 1, 0) = 1, so supp(abc) = 1 and confidence = 1/4
     • supp(ab)=5: supp(abc) ≤ min(5+3+3-7-7-5+10, 5, 3, 3) = min(2, 5, 3, 3) = 2 and supp(abc) ≥ max(5+3-7, 5+3-7, 3+3-5, 0) = max(1, 1, 1, 0) = 1, so supp(abc) ∈ [1,2] and confidence ∈ [1/5, 2/5]
     • supp(ab)=6: supp(abc) ≤ min(6+3+3-7-7-5+10, 6, 3, 3) = min(3, 6, 3, 3) = 3 and supp(abc) ≥ max(6+3-7, 6+3-7, 3+3-5, 0) = max(2, 2, 1, 0) = 2, so supp(abc) ∈ [2,3] and confidence ∈ [1/3, 1/2]
     • supp(ab)=7: supp(abc) ≤ min(7+3+3-7-7-5+10, 7, 3, 3) = min(4, 7, 3, 3) = 3 and supp(abc) ≥ max(7+3-7, 7+3-7, 3+3-5, 0) = max(3, 3, 1, 0) = 3, so supp(abc) = 3 and confidence = 3/7
     • Confidence interval of ab=>c: [1/5, 1/2] (lowest at supp(ab)=5, highest at supp(ab)=6)

  9. Example (cont.) • Known supports: supp({})=10, supp(a)=7, supp(b)=7, supp(c)=10, supp(ac)=7, supp(bc)=7 • supp(ab) ≥ supp(a)+supp(b)-supp({}) = 7+7-10 = 4 and supp(ab) ≤ min(supp(a), supp(b)) = 7, so supp(ab) ∈ [4,7] is non-derivable • Rule ab=>c, for supp(ab)=4: supp(abc) ≤ min(4+7+7-7-7-10+10, 4, 7, 7) = 4 and supp(abc) ≥ max(4+7-7, 4+7-7, 7+7-10, 0) = 4, so supp(abc) = 4 • For every value x of supp(ab) in [4,7], both bounds equal x, because c occurs in every transaction (supp(c) = supp({}) = 10) • The confidence interval of ab=>c is [1,1], so the rule is derivable
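The computation on the two example slides can be sketched as follows. The function names and the string-keyed support dictionary are hypothetical; the bounds are the eight inequalities listed for supp(abc), and supp(ab) is enumerated over [supp(a)+supp(b)-supp({}), min(supp(a), supp(b))] as on the slides.

```python
from fractions import Fraction

def abc_bounds(s):
    """Return (lower, upper) bounds on supp(abc) from the eight inequalities.

    `s` maps "" (the empty set), "a", "b", "c", "ab", "ac", "bc" to absolute supports.
    """
    upper = min(
        s["ab"] + s["bc"] + s["ac"] - s["a"] - s["b"] - s["c"] + s[""],
        s["ab"], s["bc"], s["ac"],
    )
    lower = max(
        s["ab"] + s["ac"] - s["a"],
        s["ab"] + s["bc"] - s["b"],
        s["bc"] + s["ac"] - s["c"],
        0,
    )
    return lower, upper

def confidence_interval_ab_c(s):
    """Interval for conf(ab=>c) when supp(ab) itself is unknown.

    For each possible supp(ab) we bound supp(abc) and keep the extreme
    ratios supp(abc)/supp(ab).
    """
    ab_low = max(s["a"] + s["b"] - s[""], 1)  # start at 1: confidence is undefined for supp(ab)=0
    ab_high = min(s["a"], s["b"])
    ratios = []
    for ab in range(ab_low, ab_high + 1):
        lo, hi = abc_bounds({**s, "ab": ab})
        if lo <= hi:  # skip values of supp(ab) that leave no consistent supp(abc)
            ratios += [Fraction(lo, ab), Fraction(hi, ab)]
    return min(ratios), max(ratios)

# Supports from the two example slides (supp(ab) is deliberately absent).
slide8 = {"": 10, "a": 7, "b": 7, "c": 5, "ac": 3, "bc": 3}
slide9 = {"": 10, "a": 7, "b": 7, "c": 10, "ac": 7, "bc": 7}

for name, s in [("slide 8", slide8), ("slide 9", slide9)]:
    lo, hi = confidence_interval_ab_c(s)
    print(name, lo, hi, "derivable" if lo == hi else "non-derivable")
```

For the first data set this reports the interval [1/5, 1/2]; for the second the interval collapses to [1, 1], so ab=>c is derivable even though supp(ab) is not.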

  10. Use subrules • Use only the rules for subsets J ⊆ I such that |I\J| ≤ k-1 • k > 0: user-given parameter (depth) • Ex. depth=4: all eight bounds on supp(abc) are used • supp(abc) ≤ supp(ab)+supp(bc)+supp(ac)-supp(a)-supp(b)-supp(c)+supp({}) • supp(abc) ≥ supp(ab)+supp(ac)-supp(a) • supp(abc) ≥ supp(ab)+supp(bc)-supp(b) • supp(abc) ≥ supp(bc)+supp(ac)-supp(c) • supp(abc) ≤ supp(ab) • supp(abc) ≤ supp(bc) • supp(abc) ≤ supp(ac) • supp(abc) ≥ 0
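A sketch of depth-limited bounding for an arbitrary itemset. It applies the inclusion-exclusion rule for every J ⊆ I and, as an assumption, reads the depth parameter k as keeping only the rules with |I\J| ≤ k-1, the reading consistent with the depth=4 example that lists all eight bounds for abc. The names `subsets` and `support_bounds` are hypothetical, and supports are passed as a dictionary keyed by frozensets.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def support_bounds(itemset, supp, depth=None):
    """Return (lower, upper) bounds on supp(itemset) from its proper subsets.

    For each J ⊆ I, the sum over J ⊆ X ⊊ I of (-1)**(|I - X| + 1) * supp(X)
    bounds supp(I) from above if |I - J| is odd and from below if it is even.
    ASSUMPTION: with depth=k only rules with |I - J| <= k - 1 are applied.
    """
    I = frozenset(itemset)
    lower, upper = 0, float("inf")  # supports are never negative; no upper bound yet
    for J in subsets(I):
        gap = len(I - J)
        if depth is not None and gap > depth - 1:
            continue  # rule skipped under the (assumed) depth limit
        total = sum(
            (-1) ** (len(I - X) + 1) * supp[X]
            for X in subsets(I)
            if J <= X and X != I
        )
        if gap % 2 == 1:
            upper = min(upper, total)
        else:
            lower = max(lower, total)
    return lower, upper

# Slide 8 supports with supp(ab)=5; keys are frozensets of items ("" is the empty set).
example = {frozenset(k): v for k, v in
           {"": 10, "a": 7, "b": 7, "c": 5, "ab": 5, "ac": 3, "bc": 3}.items()}

for depth in (4, 3, 2):
    print(depth, support_bounds("abc", example, depth))
```

With these supports, depth 4 gives [1, 2], depth 3 gives [1, 3] and depth 2 gives [0, 3]: a smaller depth skips the rules that use the smaller subsets and yields a looser interval. Applied to I = ab with only supp({}), supp(a) and supp(b), the same function returns the interval [4, 7] used on the example slides.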

  11. Experiments • Dataset characteristics • Number of rules after different pruning methods

  12. Exp(1) • non-derivable • Minimal closed association rules

  13. Exp(2) • non-derivable • basic association rules • maximum entropy method

  14. Exp(3): non-derivable rules with a single-item consequent

  15. Exp(4): non-derivable rules at different minimum support thresholds
