
Mining Association Rules from Stars

Department of Information & Computer Education, NTNU. Mining Association Rules from Stars. Eric Ka Ka Ng, Ada Wai-Chee Fu, and Ke Wang, 2002 IEEE International Conference on Data Mining (ICDM'02) , December 09 - 12 2002, Maebashi City, Japan. Advisor : Jia-Ling Koh Speaker : Chen-Yi Lin.


Presentation Transcript


  1. Department of Information & Computer Education, NTNU. Mining Association Rules from Stars. Eric Ka Ka Ng, Ada Wai-Chee Fu, and Ke Wang, 2002 IEEE International Conference on Data Mining (ICDM'02), December 09 - 12 2002, Maebashi City, Japan. Advisor: Jia-Ling Koh. Speaker: Chen-Yi Lin.

  2. Outline • Introductions • Problem Definition • The Proposed Method • Experimental Results • Conclusions

  3. Introductions • In real life, a database is typically made up of multiple tables, and one important case is where some of the tables form a star schema. [Figure: star schema with dimension tables and a fact table (FT)]

  4. Problem Definition (1/2) • Each dimension table contains a primary key (tid), some other attributes, and no foreign keys. • The attributes in the dimension tables are unique. • The attributes take categorical values. • The fact table (FT) stores the tids from the dimension tables as foreign keys.
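The schema described above can be sketched with plain Python containers. This is only an illustration: the table contents, the attribute names, and the helper `check_foreign_keys` are hypothetical and do not come from the slides.

```python
# Hypothetical star schema: two dimension tables and one fact table.
# Each dimension table maps a primary key (tid) to categorical attribute values.
dim_a = {
    "a1": {"color": "red", "size": "small"},
    "a2": {"color": "blue", "size": "small"},
}
dim_b = {
    "b1": {"region": "north"},
    "b2": {"region": "south"},
}

# Fact table (FT): each row stores one tid per dimension table as foreign keys.
fact_table = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a1", "b1")]


def check_foreign_keys(ft, dims):
    """Return True if every tid in FT refers to a row of its dimension table."""
    return all(tid in dim for row in ft for tid, dim in zip(row, dims))


schema_ok = check_foreign_keys(fact_table, [dim_a, dim_b])
```

Note that the same combination of tids may appear in FT more than once, which is what gives each tid a count later on.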

  5. Problem Definition (2/2) [Figure: a dimension table (tid, categorical values) and its binary representation]
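Since the figure itself did not survive, here is a minimal sketch of what a binary representation of a dimension table could look like, assuming it is a one-hot encoding in which each distinct attribute=value pair becomes one item. The function `to_binary` and the sample table are hypothetical.

```python
def to_binary(dim):
    """One-hot encode a dimension table of categorical values.

    Returns (items, rows): items is the sorted list of "attribute=value"
    items, and rows maps each tid to a 0/1 list aligned with items.
    """
    items = sorted({f"{a}={v}" for row in dim.values() for a, v in row.items()})
    rows = {}
    for tid, row in dim.items():
        present = {f"{a}={v}" for a, v in row.items()}
        rows[tid] = [1 if item in present else 0 for item in items]
    return items, rows


# Hypothetical dimension table with two categorical attributes.
dim_a = {
    "a1": {"color": "red", "size": "small"},
    "a2": {"color": "blue", "size": "small"},
}
items, rows = to_binary(dim_a)
```

With this encoding, every dimension-table row becomes a set of items, so ordinary itemset mining vocabulary applies.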

  6. The Proposed Method (1/8) • tid_list is an ordered list of elements of the form tid(count). [Three tid_list examples omitted: slide images]
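A tid_list of this form might be built as follows. The sketch assumes that the count attached to a tid is the number of FT rows that reference that tid; the slide does not spell this out, and the function name and data are hypothetical.

```python
from collections import Counter


def build_tid_lists(dim, ft_column):
    """Build a tid_list for every item of one dimension table.

    dim: {tid: set of items}.
    ft_column: the tids of this table as they appear in FT, one per FT row.
    Assumption: count = number of FT rows that reference the tid.
    """
    occurrences = Counter(ft_column)
    tid_lists = {}
    for tid in sorted(dim):  # keeps every tid_list ordered by tid
        if occurrences[tid] == 0:
            continue  # this tid never appears in FT
        for item in dim[tid]:
            tid_lists.setdefault(item, []).append((tid, occurrences[tid]))
    return tid_lists


# Hypothetical data: three rows in table A, six rows in FT.
dim_a = {1: {"x", "y"}, 2: {"x"}, 3: {"y"}}
ft_a = [1, 1, 2, 3, 3, 3]
tid_lists = build_tid_lists(dim_a, ft_a)
# tid_lists["x"] holds 1(2), 2(1); tid_lists["y"] holds 1(2), 3(3)
```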

  7. The Proposed Method (2/8) • Minsup = 5 [Figure: worked example annotated with count=6 and count=5] • Hence the itemset is frequent.
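The frequency test can be sketched as below, assuming that the support of an itemset is the sum of the counts in its tid_list and that the tid_list of a combined itemset from the same table is the intersection of its parts. The data is invented so that the supports come out as 6 and 5; it is not the example from the slide.

```python
def intersect(list_x, list_y):
    """Merge-intersect two tid_lists sorted by tid; counts carry over."""
    out, i, j = [], 0, 0
    while i < len(list_x) and j < len(list_y):
        tid_x, tid_y = list_x[i][0], list_y[j][0]
        if tid_x == tid_y:
            out.append(list_x[i])
            i += 1
            j += 1
        elif tid_x < tid_y:
            i += 1
        else:
            j += 1
    return out


def support(tid_list):
    """Support = total number of FT rows covered by the tid_list."""
    return sum(count for _, count in tid_list)


MINSUP = 5
x = [(1, 2), (2, 1), (4, 3)]  # hypothetical tid_list, support 6
y = [(1, 2), (3, 3), (4, 3)]  # hypothetical tid_list, support 8
xy = intersect(x, y)          # tids 1 and 4 remain
is_frequent = support(xy) >= MINSUP
```

No scan of the fact table is needed at this point, because the counts already summarize how often each tid occurs in FT.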

  8. The Proposed Method (3/8) • Binding multiple dimension tables • (1) Assign a new tid to each combination of a tid from A and a tid from B that occurs in FT. • (2) Set the tids in the tid_lists for items in AB to the corresponding new tids.
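The two binding steps might look like this in code. It is a sketch under several assumptions: FT has exactly two foreign-key columns, item names in A and B are distinct, new tids are numbered in order of first appearance, and the count of a new tid is the number of FT rows with that combination. The function `bind` is hypothetical.

```python
def bind(ft, tid_lists_a, tid_lists_b):
    """Bind tables A and B; ft is a list of (tid_a, tid_b) pairs."""
    # Step (1): give each distinct (tid_a, tid_b) combination in FT a new tid.
    new_tid, counts = {}, {}
    for pair in ft:
        if pair not in new_tid:
            new_tid[pair] = len(new_tid) + 1
        counts[new_tid[pair]] = counts.get(new_tid[pair], 0) + 1

    # Step (2): rewrite every item's tid_list in terms of the new tids.
    bound = {}
    for side, tid_lists in ((0, tid_lists_a), (1, tid_lists_b)):
        for item, tid_list in tid_lists.items():
            old = {tid for tid, _ in tid_list}
            news = sorted(n for pair, n in new_tid.items() if pair[side] in old)
            bound[item] = [(n, counts[n]) for n in news]
    return new_tid, bound


# Hypothetical fact table and per-table tid_lists.
ft = [(1, 10), (1, 20), (2, 10), (1, 10)]
tid_lists_a = {"x": [(1, 3)], "y": [(2, 1)]}
tid_lists_b = {"p": [(10, 3)], "q": [(20, 1)]}
new_tid, bound = bind(ft, tid_lists_a, tid_lists_b)
```

After binding, items from A and B share one tid space, so an itemset such as {x, p} can be counted by intersecting `bound["x"]` and `bound["p"]` exactly as for items from a single table.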

  9. The Proposed Method (4/8) An example of "binding" order [Figure labels: the set of frequent itemsets with items from table A; the set of frequent itemsets with items from tables A and/or B]

  10. The Proposed Method (5/8) [Figure: panels (1) and (2)]

  11. The Proposed Method (6/8) • The fact table FT is scanned once and its contents are stored in a data structure: a prefix tree. • Each node has a label (a tid) and a counter.
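A prefix tree of this kind can be sketched as follows, assuming each FT row is inserted as a path of tids in a fixed table order and that a node's counter is the number of FT rows passing through it. The class and function names are hypothetical.

```python
class Node:
    """Prefix-tree node: a label (a tid) and a counter."""

    def __init__(self, tid=None):
        self.tid = tid
        self.count = 0
        self.children = {}


def build_prefix_tree(ft):
    """Store the fact table in a prefix tree with a single scan."""
    root = Node()
    for row in ft:
        node = root
        node.count += 1
        for tid in row:  # one tree level per dimension table
            node = node.children.setdefault(tid, Node(tid))
            node.count += 1
    return root


# Hypothetical fact table over dimension tables A, B, and C.
ft = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
]
root = build_prefix_tree(ft)
```

Rows that share a prefix of tids share nodes, so the counters at the first level give the FT occurrence count of each tid of the first table.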

  12. The Proposed Method (7/8) [Figure: prefix tree structure; each node shows a tid and a counter]

  13. The Proposed Method (8/8) Collapsing the prefix tree

  14. Experimental Results (1/5) • All experiments were conducted on a SUN Ultra-Enterprise (Generic_106541-18) running SunOS 5.7 with 8192 MB of main memory. • Programs were written in C++.

  15. Experimental Results (2/5) • In the first dataset, items in A and B are strongly related, so that frequent itemsets contain items from both A and B, while items in C are not involved. • In the second dataset, items in A, B and C are all strongly related, so that maximal frequent itemsets always contain items from all of A, B and C.

  16. Experimental Results (3/5) • masl: implements tid_list as a linked list structure • masb: implements tid_list as a fixed-size bitmap and an array of counts • fpt: the join-before-mine approach with the FP-tree algorithm [HPY00] [Figure: running time for (A, B) related and (A, B, C) related datasets]
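To illustrate the idea behind the bitmap variant, here is a rough sketch. The slides give no layout details and the original programs are in C++, so everything here is an assumption: bit t of the bitmap is set when tid t is in the list, counts live in one shared array indexed by tid, and a Python integer stands in for the fixed-size bitmap.

```python
def to_bitmap(tid_list):
    """Encode the tids of a tid_list as set bits of an integer."""
    bits = 0
    for tid, _ in tid_list:
        bits |= 1 << tid
    return bits


def support(bits, counts):
    """Sum counts[tid] over every tid whose bit is set."""
    total, tid = 0, 0
    while bits:
        if bits & 1:
            total += counts[tid]
        bits >>= 1
        tid += 1
    return total


# Hypothetical shared count array: counts[tid]; index 0 is unused.
counts = [0, 2, 1, 3, 3]
x = to_bitmap([(1, 2), (2, 1), (4, 3)])
y = to_bitmap([(1, 2), (3, 3), (4, 3)])
xy = x & y  # intersection of two tid_lists is a bitwise AND
```

The trade-off this suggests is that intersection becomes a cheap bitwise operation, while a linked list only stores the tids that are actually present.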

  17. Experimental Results (4/5) • Mixture datasets • 10% of transactions contain frequent itemsets from only A, 10% from only B, and 10% from only C. • 15% contain frequent itemsets from AB, 15% from BC, and 15% from AC. • 10% contain frequent itemsets from ABC. • 15% are random noise.

  18. Experimental Results (5/5) [Figure: running time for mixture datasets]

  19. Conclusions • The paper proposes a new algorithm for mining association rules on a star schema without performing the natural join. • The proposed method can be generalized to a snowflake structure.
