Incremental Maintenance of Ontology-Exploiting Association Rules

Ming-Cheng Tseng (1), Wen-Yang Lin (2) and Rong Jeng (3). (1, 3) Institute of Information Engineering, I-Shou University, Taiwan; (2) Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan. August 20, 2007.


Presentation Transcript


  1. Incremental Maintenance of Ontology-Exploiting Association Rules • Ming-Cheng Tseng (1), Wen-Yang Lin (2) and Rong Jeng (3) • (1, 3) Institute of Information Engineering, I-Shou University, Taiwan • (2) Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan • August 20, 2007

  2. Outline • Introduction • Problem description • The proposed algorithm • Performance evaluation • Conclusions

  3. Introduction • Motivation • In general, many semantic relationships (domain knowledge) exist among items • It is natural to incorporate a domain ontology into the data mining process to explore more innovative rules • The source databases change over time • E.g., insertion, deletion, modification • The discovered knowledge (rules) has to be updated to reflect the new situation

  4. Introduction (cont.) • Association rules • Given: • A database of customer transactions • Each transaction is a set of items • Find all rules X => Y that correlate the presence of one set of items X with another set of items Y • Example: Sony VAIO => HP LaserJet 1300 (Sup. = 30%, Conf. = 60%)

  5. Introduction (cont.) • Strong association rules • Given: • User-specified constraints • Minimum support (min_sup) • Minimum confidence (min_conf) • Find rules X => Y whose support and confidence are larger than the user-specified minimum values • Example: • min_sup = 25%, min_conf = 50% • Sony VAIO => HP LaserJet 1300 (Sup. = 30%, Conf. = 60%)

  6. Introduction (cont.) • Frequent itemsets (patterns) mining • The association mining problem can be reduced to the problem of mining frequent itemsets, i.e., itemsets with support larger than min_sup • Example • min_sup = 25%, min_conf = 50% • sup({Sony VAIO, HP LaserJet 1300}) = 30% • sup({Sony VAIO}) = 50% • Sony VAIO => HP LaserJet 1300 (Sup. = 30%, Conf. = 30% / 50% = 60%)
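The arithmetic on this slide can be reproduced with a minimal Python sketch. The function names and the ten toy transactions are hypothetical, chosen only so that the supports match the slide's 30% and 50%; confidence is computed as sup(X ∪ Y) / sup(X).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = sup(X ∪ Y) / sup(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical toy database (not from the slides): 10 transactions,
# 3 with both items, 2 with the notebook only, 5 with neither.
transactions = [
    frozenset(t)
    for t in (
        [{"Sony VAIO", "HP LaserJet 1300"}] * 3
        + [{"Sony VAIO"}] * 2
        + [{"Other item"}] * 5
    )
]

rule_sup = support({"Sony VAIO", "HP LaserJet 1300"}, transactions)
rule_conf = confidence({"Sony VAIO"}, {"HP LaserJet 1300"}, transactions)

# The rule is strong if both values exceed the user-specified thresholds.
min_sup, min_conf = 0.25, 0.50
is_strong = rule_sup > min_sup and rule_conf > min_conf
```

With these toy transactions the rule has support 3/10 and confidence 3/5, so it passes both thresholds from the slide.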

  7. Introduction (cont.) • Ontology • W3C Web Ontology Working Group “An ontology formally defines a common set of terms that are used to describe and represent a domain knowledge.” • e.g., taxonomy: a kind of ontology presenting classification relationship among objects

  8. Introduction (cont.) • Ontology-exploiting association rules IBM 60GB HD => HP DeskJet

  9. Problem Description • Incremental maintenance of ontology-exploiting association rules • Given: • A database of customer transactions, DB • An incremental database, db • An item ontology, T • The frequent itemsets already discovered in DB, L • Minimum support, ms, and minimum confidence, mc • Find all frequent itemsets in UD = DB + db w.r.t. ms • Construct all strong rules from the frequent itemsets w.r.t. mc

  10. Problem Description (cont.) • Example [Figure: customer transactions DB and item ontology G; with minsup = 70%, algorithms AROC, AROS give the discovered frequent itemsets L]

  11. Problem Description (cont.) • Example [Figure: item ontology G, customer transactions DB and incremental transactions db; with minsup = 70%, what are the updated frequent itemsets L’?]

  12. The Proposed Algorithm – IMARO • Basic scheme • An Apriori-based maintenance algorithm • Employing a bottom-up, level-wise searching strategy • Starting from frequent 1-itemsets, L1, then L2, …, Lk, etc. [Figure: itemset lattice over items A, B, C, D, from the 1-itemsets A, B, C, D through AB, AC, AD, BC, BD, CD and ABC, ABD, ACD, BCD up to ABCD]
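The bottom-up, level-wise strategy can be sketched as plain Apriori over a lattice like the one above. This is a generic illustration, not the IMARO algorithm itself: the function names and toy transactions are hypothetical, and an itemset is treated as frequent when its support is at least min_sup, which is an assumption about how ties are handled.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets whose (k-1)-subsets are all frequent."""
    prev = set(prev_frequent)
    items = sorted({i for s in prev for i in s})
    candidates = set()
    for s in prev:
        for i in items:
            if i in s:
                continue
            c = s | {i}  # frozenset | set stays a frozenset
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1)):
                candidates.add(c)
    return candidates


def apriori(transactions, min_sup):
    """Level-wise search: L1, then L2, ..., until no candidates remain."""
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        counted = {c: count(c, transactions) for c in candidates}
        level = {c: m for c, m in counted.items() if m / n >= min_sup}
        frequent.update(level)
        k += 1
        candidates = apriori_gen(level.keys(), k)
    return frequent


# Hypothetical toy transactions over the items A, B, C, D.
toy = [frozenset(t) for t in ("ABC", "ABC", "ABC", "ABD", "CD")]
result = apriori(toy, min_sup=0.5)
```

On this toy data D is infrequent (2 of 5 transactions), so no candidate containing D is ever generated, which is the pruning the level-wise scheme relies on.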

  13. The Proposed Algorithm – IMARO (cont.) • Terminology

  14. The Proposed Algorithm – IMARO (cont.) • Example

  15. The Proposed Algorithm – IMARO (cont.) • Note on database extension • A component item may also exist as a primitive item itself • To clarify the meaning of associations involving such an item, we have to differentiate the role this item plays • e.g., IBM TP => Ink Cartridge can be read in two ways: • buy an IBM TP notebook, also buy an Ink Cartridge • buy an IBM TP notebook, also buy a product composed of Ink Cartridge

  16. The Proposed Algorithm – IMARO (cont.) • Process flow for updating frequent k-itemsets e.g., AROC or AROS

  17. The Proposed Algorithm – IMARO (cont.) • Frequent/infrequent itemsets inference
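The transcript keeps only the title of this slide. As a hedged illustration, the sketch below shows the standard reasoning used by incremental Apriori-style methods, which may differ in detail from the table on the original slide. The function name, argument names, and numbers are hypothetical, and "frequent" again means support of at least ms.

```python
def infer_status(itemset, old_counts, db_count, n_DB, n_db, ms):
    """Decide whether `itemset` is frequent in UD = DB + db.

    old_counts: support counts in DB for the itemsets in L (frequent in DB).
    db_count:   support count of `itemset` in the increment db.
    Returns (status, count_in_UD_or_None); status is "frequent",
    "infrequent", or "rescan" (the original DB must be scanned).
    """
    if itemset in old_counts:
        # Count in DB is already known, so the UD count is a simple sum.
        total = old_counts[itemset] + db_count
        status = "frequent" if total >= ms * (n_DB + n_db) else "infrequent"
        return status, total
    if db_count < ms * n_db:
        # Infrequent in both DB and db, hence infrequent in UD.
        return "infrequent", None
    # Infrequent in DB but frequent in db: undetermined without a DB scan.
    return "rescan", None


# Hypothetical numbers: |DB| = 10, |db| = 5, ms = 70%.
old = {"A": 8, "B": 7}
status_a = infer_status("A", old, 3, 10, 5, 0.7)
status_b = infer_status("B", old, 2, 10, 5, 0.7)
status_c = infer_status("C", old, 2, 10, 5, 0.7)
status_d = infer_status("D", old, 5, 10, 5, 0.7)
```

Only the last case forces a pass over the original database, which is where an incremental method can save work compared with re-mining DB + db from scratch.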

  18. The Proposed Algorithm – IMARO (cont.) • Optimization 1: Candidate pruning • Any candidate itemset that contains both an item and any one of its extensions (a generalized item or a component) is pruned • e.g., {Epson EPL, Printer}, {Epson EPL, Toner Cartridge*}
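A minimal sketch of this pruning rule follows. The two dictionaries are a hypothetical fragment of the item ontology: only the Epson EPL entries are confirmed by the slide's examples, the rest is assumed for illustration. Component items carry a trailing "*", as on the slide.

```python
# Hypothetical ontology fragment (assumption, except the Epson EPL entries).
GENERALIZATION = {
    "Epson EPL": "Printer",
    "HP DeskJet": "Printer",
    "Sony VAIO": "PC",
}
COMPONENTS = {
    "Epson EPL": ["Toner Cartridge*"],
}


def extensions(item):
    """Generalized items (all ancestors) plus direct components of `item`."""
    ext = set(COMPONENTS.get(item, ()))
    parent = GENERALIZATION.get(item)
    while parent is not None:
        ext.add(parent)
        parent = GENERALIZATION.get(parent)
    return ext


def is_pruned(candidate):
    """True if the candidate holds an item together with one of its extensions."""
    return any(extensions(item) & candidate for item in candidate)


def prune(candidates):
    """Keep only the candidates that survive the pruning rule."""
    return [c for c in candidates if not is_pruned(c)]


candidates = [
    frozenset({"Epson EPL", "Printer"}),           # item + generalized item
    frozenset({"Epson EPL", "Toner Cartridge*"}),  # item + component
    frozenset({"Epson EPL", "PC"}),                # unrelated pair, kept
]
kept = prune(candidates)
```

Such candidates can be dropped because every transaction containing the item also contains its extensions once the database is extended, so their support adds no information.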

  19. The Proposed Algorithm – IMARO (cont.) • Optimization 2: Extension filtering • The extension of an item can be added only if that item does appear in at least one candidate itemset being counted currently [Figure: example item ontology; node labels: Printer, PC; HP DeskJet, Epson EPL, Sony VAIO, IBM TP; Ink Cartridge, Photo Conductor, Toner Cartridge, S 60GB, RAM 256MB, IBM 60GB]
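Extension filtering can be sketched as follows. The slide's wording is ambiguous, so this sketch assumes one reading: an extended item (generalized item or component) is added to a transaction only if that extended item occurs in some candidate currently being counted. The ontology fragment and function names are hypothetical.

```python
# Hypothetical ontology fragment (assumption for illustration only).
GENERALIZATION = {
    "Epson EPL": "Printer",
    "HP DeskJet": "Printer",
    "Sony VAIO": "PC",
}
COMPONENTS = {
    "Epson EPL": ["Toner Cartridge*"],
}


def extensions(item):
    """Generalized items (all ancestors) plus direct components of `item`."""
    ext = set(COMPONENTS.get(item, ()))
    parent = GENERALIZATION.get(item)
    while parent is not None:
        ext.add(parent)
        parent = GENERALIZATION.get(parent)
    return ext


def extend_transaction(transaction, candidates):
    """Add only those extensions that occur in a candidate being counted."""
    needed = set().union(*candidates)  # every item used by some candidate
    extended = set(transaction)
    for item in transaction:
        extended |= extensions(item) & needed
    return frozenset(extended)


current_candidates = [frozenset({"Printer", "Sony VAIO"})]
t = frozenset({"Epson EPL", "Sony VAIO"})
extended_t = extend_transaction(t, current_candidates)
```

Here "Printer" is added because a current candidate needs it, while "PC" and "Toner Cartridge*" are skipped, which keeps extended transactions short during counting.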

  20. Performance Evaluation • IMARO is compared with applying our proposed algorithms, AROC and AROS, to the whole updated database DB + db with T • Test data • A synthetic dataset generated by the IBM data generator, with an artificially built ontology

  21. Performance Evaluation (cont.) • Varying minimum supports |db| = 40,000

  22. Performance Evaluation (cont.) • Varying incremental transaction size ms = 1.5%

  23. Conclusions • We have investigated the problem of updating ontology-exploiting association rules when new transactions are inserted into the database • An Apriori-based algorithm is proposed • Other issues • More complicated semantic relationships and knowledge • Non-uniform minimum support • Generalized items or composite items occur more frequently • Towards a total solution for evolving environments • Ontology evolution, database update • Interactive refinement of support constraints • …

  24. Thanks for your attention!

  25. Conclusions (cont.) • Taxonomy of semantic relationships (source: Veda C. Storey, VLDB Journal, 1993)

  26. Related Work • Comparison with previous work
