
Department of Information & Computer Education, NTNU

SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets. Qinghua Zou, Wesley W. Chu, and Baojing Lu.


Presentation Transcript


  1. Department of Information & Computer Education, NTNU
     SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets
     Qinghua Zou, Wesley W. Chu, and Baojing Lu. Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), 9-12 Dec. 2002, pp. 570-577.
     Advisor: Jia-Ling Koh
     Speaker: Chen-Yi Lin

  2. Outline
     • Introduction
     • The strategy of SmartMiner
     • Experimental Results
     • Conclusions

  3. Introduction (1/5): The problem of mining frequent patterns
     Dataset (MinSup = 2):
       id: item set
       1: a b c d e
       2: a b c d
       3: b c d
       4: b e
       5: c d e
     Which itemsets are frequent itemsets (FI)? Those contained in at least MinSup transactions:
       a, b, c, d, e, ab, ac, ad, bc, bd, be, cd, ce, de, abc, abd, acd, bcd, cde, abcd
     Maximal frequent itemset (MFI): a frequent itemset with no frequent superset. Here:
       abcd, be, cde
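The FI and MFI lists on this slide can be checked by brute force. The following is a minimal sketch, not the algorithm discussed in the talk: it simply tests every non-empty combination of items, which is only practical for a toy dataset like this one. The names `DATASET`, `support`, `frequent_itemsets`, and `maximal_frequent_itemsets` are illustrative and do not come from the slides.

```python
from itertools import combinations

# Example dataset from the slide (transaction id -> items).
DATASET = {
    1: set("abcde"),
    2: set("abcd"),
    3: set("bcd"),
    4: set("be"),
    5: set("cde"),
}
MIN_SUP = 2


def support(itemset, dataset=DATASET):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for items in dataset.values() if set(itemset) <= items)


def frequent_itemsets(dataset=DATASET, min_sup=MIN_SUP):
    """Enumerate all non-empty itemsets and keep those meeting min_sup."""
    universe = sorted(set().union(*dataset.values()))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            if support(combo, dataset) >= min_sup:
                found.append("".join(combo))
    return found


def maximal_frequent_itemsets(fis):
    """Keep the frequent itemsets that have no frequent proper superset."""
    sets = [set(fi) for fi in fis]
    return [fi for fi in fis if not any(set(fi) < other for other in sets)]


fis = frequent_itemsets()
mfis = maximal_frequent_itemsets(fis)
print(len(fis), fis)
print(sorted(mfis))
```

Running it yields 20 frequent itemsets and the three maximal ones listed on the slide (abcd, be, cde).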

  4. Introduction (2/5): Current status and techniques - why MFI, not FI
     • Mining all FI is infeasible when long frequent itemsets exist.
     • E.g., suppose a 20-item set a1 a2 … a20 is frequent. All of its subsets are frequent as well, i.e., 2^20 = 1,048,576 itemsets.
     • Given an unknown large dataset, mining MFI is fast and gives an overview of the characteristics of the dataset.
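The subset count on this slide is simple arithmetic and can be reproduced as below. Note that 2^20 counts the empty set too, so the number of non-empty frequent itemsets implied by a single 20-item frequent set is one less. The variable names are illustrative only.

```python
from math import comb

n = 20  # length of the long frequent itemset a1 a2 ... a20

# Every subset of a frequent itemset is frequent, so count the subsets.
total_subsets = 2 ** n

# Same count without the empty set, summed by subset size.
nonempty_subsets = sum(comb(n, k) for k in range(1, n + 1))

print(total_subsets)     # 1048576
print(nonempty_subsets)  # 1048575
```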

  5. Introduction (3/5)
     • Enumeration tree:
       • Each node has a head and a tail (written head:tail) representing a state.
       • The head is a candidate, while the tail contains the items used to form new heads.
     An enumeration tree for abcde, for the given order a, b, c, d, e (nodes listed level by level):
       Level 0: :abcde
       Level 1: a:bcde b:cde c:de d:e e:
       Level 2: ab:cde ac:de ad:e ae: bc:de bd:e be: cd:e ce: de:
       Level 3: abc:de abd:e abe: acd:e ace: ade: bcd:e bce: bde: cde:
       Level 4: abcd:e abce: abde: acde: bcde:
       Level 5: abcde:
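The head:tail nodes of the enumeration tree can be generated with a short recursive sketch. The function name `enumeration_tree` is an assumption made for illustration; the rule it applies is the one stated on the slide: each tail item forms a new head, and the new tail is whatever follows that item in the fixed order.

```python
from collections import defaultdict


def enumeration_tree(head, tail):
    """Yield (head, tail) nodes depth-first for a fixed item order."""
    yield head, tail
    for i, item in enumerate(tail):
        # The child head adds one tail item; its tail is the items after it.
        yield from enumeration_tree(head + item, tail[i + 1:])


nodes = list(enumeration_tree("", "abcde"))

# Group the nodes by depth (= length of the head) to print them per level.
levels = defaultdict(list)
for head, tail in nodes:
    levels[len(head)].append(f"{head}:{tail}")

for depth in sorted(levels):
    print(f"Level {depth}:", " ".join(sorted(levels[depth])))
```

For five items the tree has 2^5 = 32 nodes, one per subset of abcde, which is why pruning the tree matters for long itemsets.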

  6. Introduction (4/5): Current status and techniques - Mafia: an example
     [Figure: Mafia's depth-first search on the example dataset (|D| = 5, MinSup = 2), with a superset check against the MFI found so far.]
     Recoverable node annotations:
       Root :a b c d e with supports a=2, b=4, c=4, d=4, e=3; tail reordered to :a e b c d
       |Da| = 2, supports e=1, b=2, c=2, d=2; node a: e b c d
       |De| = 3, supports b=2, c=2, d=2; node e: b c d
       |Deb| = 2, supports c=1, d=1; node eb: c d
       |Dec| = 2, support d=2; node ec: d
     Answer: abcd from the a branch; eb, ecd from the e branch.
     MFI: abcd, be, cde
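The search pictured on this slide can be approximated with a simplified depth-first miner that relies on superset checking. This is a hedged sketch in the spirit of the example, not Mafia's actual implementation: it uses a fixed item order (a e b c d, as on the slide) rather than reordering at every node, and the names `mine_mfi`, `support`, and `is_covered` are illustrative.

```python
DATASET = [set("abcde"), set("abcd"), set("bcd"), set("be"), set("cde")]
MIN_SUP = 2


def support(items):
    """Count the transactions containing all of `items`."""
    return sum(1 for t in DATASET if set(items) <= t)


def is_covered(items, mfi):
    """Superset check: is `items` contained in an MFI found so far?"""
    return any(set(items) <= set(m) for m in mfi)


def mine_mfi(head, tail, mfi):
    """Depth-first search over head:tail nodes, collecting MFI in `mfi`."""
    # Drop tail items that are infrequent together with the head.
    tail = [x for x in tail if support(head + x) >= MIN_SUP]
    if not tail:
        # Leaf: the head is maximal unless a known MFI already contains it.
        if head and not is_covered(head, mfi):
            mfi.append(head)
        return
    # Prune the whole subtree if head plus tail is already covered.
    if is_covered(head + "".join(tail), mfi):
        return
    for i, x in enumerate(tail):
        mine_mfi(head + x, tail[i + 1:], mfi)


found = []
mine_mfi("", list("aebcd"), found)
print(found)  # ['abcd', 'eb', 'ecd']
```

Every leaf and every pruning test calls `is_covered`, which scans the MFI found so far. That repeated scan is the superset-checking cost that the following slide lists as a limitation.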

  7. Introduction (5/5): Current status and techniques - the limitations
     • Constant superset checking.
       • A study shows that the CPU spends 40% of its time on superset checking.
     • The search tree is too large.
       • It can be reduced.
     • The number of support counts is large.
       • Counting support is expensive.

  8. The strategy of SmartMiner (1/2)
     [Figure: two search trees rooted at A1.]
     (a) Previous approach: A1 has children B1, B2, …, Bn; B2 is created before B1 is explored.
     (b) SmartMiner strategy: B' is created after B1 is explored, using information from B1 to prune the space at B'.
     SmartMiner takes advantage of the information from previous steps.

  9. The strategy of SmartMiner (2/2)
     [Figure: SmartMiner's search on the example dataset (|D| = 5, MinSup = 2, item order a e b c d), with each step annotated in columns labelled S0, Inf0, S1, Inf1, and Mfi. The tree layout is not recoverable from the transcript.]
     Answer: abcd; eb, ecd
     MFI: abcd, be, cde

  10. Experimental Results (1/4): Running time on Mushroom

  11. Experimental Results (2/4): Search tree size on Mushroom

  12. Experimental Results (3/4): The number of support counts on Mushroom

  13. Experimental Results (4/4): Running time on Connect

  14. Conclusions
     • The SmartMiner algorithm takes advantage of the information gathered in previous steps when searching for MFI.
     • Compared with Mafia and GenMax, SmartMiner generates a smaller search tree, requires fewer support counts, and does not require superset checking.
