
Discriminative Frequent Pattern Analysis for Effective Classification

Discriminative Frequent Pattern Analysis for Effective Classification. Hong Cheng, Xifeng Yan, Jiawei Han and Chih-Wei Hsu. ICDE 2007.


Presentation Transcript


  1. Discriminative Frequent Pattern Analysis for Effective Classification • Hong Cheng, Xifeng Yan, Jiawei Han and Chih-Wei Hsu • ICDE 2007

  2. Outline • Introduction • The framework of Frequent Pattern-based Classification • Experimental Results • Conclusion

  3. Introduction • Using frequent patterns as features without feature selection produces a huge feature space. • This can slow down model learning. • It can also degrade classification accuracy. • The paper proposes an effective and efficient feature selection algorithm that selects a set of frequent and discriminative patterns for classification.

  4. Frequent Pattern vs. Single Feature • The discriminative power of some frequent patterns is higher than that of single features. • Fig. 1. Information Gain vs. Pattern Length: (a) Austral, (b) Cleve, (c) Sonar
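A minimal sketch of how information gain can compare a pattern with single features. The toy data and the function names `entropy` and `information_gain` are assumptions for illustration and are not taken from the slides; the label is constructed so that it fires only when items a and b occur together.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(C|X) = H(C) - H(C|X) for a discrete feature column."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Toy data (hypothetical): two single features and the pattern "a and b".
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
labels = [0, 0, 0, 1]
pattern_ab = [x & y for x, y in zip(a, b)]

ig_a = information_gain(a, labels)
ig_b = information_gain(b, labels)
ig_ab = information_gain(pattern_ab, labels)
print(ig_a, ig_b, ig_ab)
```

On this toy data the length-2 pattern recovers the full class entropy, while each single feature recovers only part of it.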

  5. The Framework of Frequent Pattern-based Classification • The framework has three steps: • Feature generation • Feature selection • Model learning
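The feature generation step can be sketched as follows. This is a brute-force enumeration under stated assumptions (tiny data, a hypothetical `max_len` cap), not the mining algorithm used in the paper; the names `frequent_patterns` and `to_feature_vectors` are illustrative. Feature selection and model learning would then operate on the resulting binary vectors.

```python
from itertools import combinations

def frequent_patterns(transactions, min_sup, max_len=3):
    """Enumerate itemsets whose relative support is at least min_sup.

    Brute force for illustration only; real miners avoid scanning
    every candidate against every transaction.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    patterns = {}
    for k in range(1, max_len + 1):
        for cand in combinations(items, k):
            cand_set = frozenset(cand)
            support = sum(1 for t in transactions if cand_set <= t) / n
            if support >= min_sup:
                patterns[cand_set] = support
    return patterns

def to_feature_vectors(transactions, patterns):
    """Map each transaction to a 0/1 vector, one slot per pattern."""
    ordered = sorted(patterns, key=lambda p: (len(p), sorted(p)))
    vectors = [[1 if p <= t else 0 for p in ordered] for t in transactions]
    return ordered, vectors

# Hypothetical toy transactions.
transactions = [frozenset(t) for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"])]
patterns = frequent_patterns(transactions, min_sup=0.5)
ordered, vectors = to_feature_vectors(transactions, patterns)
```

Any classifier that accepts binary feature vectors can be trained on `vectors` once the pattern set has been filtered.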

  6. Discriminative Power vs. Pattern Frequency • The paper shows that the discriminative power of low-support features is limited. • Low-support features can harm classification accuracy due to overfitting.

  7. Cont. • The discriminative power of a pattern is closely related to its support. • For a pattern represented by a random variable $X$, $IG(C|X) = H(C) - H(C|X)$. • Given a database with a fixed class distribution, $H(C)$ is a constant, so the upper bound $IG_{ub}(C|X)$ is closely related to the lower bound of $H(C|X)$. • $H(C|X)$ reaches its lower bound when $q = 0$ or $q = 1$, where $q = P(c = 1 \mid x = 1)$. • Therefore, the discriminative power of low-frequency patterns is bounded by a small value.
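The bound can be illustrated numerically. The sketch below assumes a binary class and a binary pattern indicator, and uses the symbols theta = P(x = 1) for the support, p = P(c = 1) for the class prior, and q = P(c = 1 | x = 1); these names and the function `ig_upper_bound` are assumptions for illustration. For fixed theta and p, H(C|X) is concave in q, so its minimum lies at an end of the feasible range of q, which is what the code evaluates.

```python
from math import log2

def h2(x):
    """Binary entropy in bits, with 0 * log2(0) taken as 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * log2(x) + (1 - x) * log2(1 - x))

def ig_upper_bound(theta, p):
    """Largest information gain a pattern with support theta can reach.

    theta: P(x = 1), p: P(c = 1). A pattern that occurs in no rows or
    in all rows carries no information, so the bound is 0 there.
    """
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    # Feasible extremes of q; they equal 0 and 1 when theta <= min(p, 1 - p).
    q_low = max(0.0, (p + theta - 1) / theta)
    q_high = min(1.0, p / theta)
    best = 0.0
    for q in (q_low, q_high):
        # r = P(c = 1 | x = 0), forced by theta, p and q; clamp rounding error.
        r = min(1.0, max(0.0, (p - theta * q) / (1 - theta)))
        cond_entropy = theta * h2(q) + (1 - theta) * h2(r)
        best = max(best, h2(p) - cond_entropy)
    return best

for theta in (0.01, 0.1, 0.3, 0.5):
    print(theta, round(ig_upper_bound(theta, 0.5), 4))
```

With a balanced class prior, the printed bound grows with the support, which matches the slide's point that low-frequency patterns cannot be highly discriminative.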

  8. Empirical Results • Fig. 2. Information Gain vs. Pattern Frequency: (a) Austral, (b) Breast, (c) Sonar

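  9. Set min_sup • A subset of high-quality features is selected for classification, with $IG(C|X) \ge IG_0$. • Because $IG(C|X) \le IG_{ub}(\theta)$, features with support $\theta \le \theta^*$ can be skipped, where $\theta^* = \arg\max_\theta \{ IG_{ub}(\theta) \le IG_0 \}$. • The major steps: • Compute the information gain upper bound $IG_{ub}(\theta)$ as a function of the support $\theta$ • Choose an information gain threshold $IG_0$ • Find $\theta^*$ • Mine frequent patterns with min_sup $= \theta^*$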
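The steps above can be sketched as a scan over candidate supports. This is an illustration under assumptions, not the paper's code: it assumes a binary class with prior `p`, a dataset of `n` rows (so supports are multiples of 1/n), and the hypothetical helper names `ig_upper_bound` and `find_min_sup`. The scan stops at the first support whose bound exceeds the threshold, so every support up to the returned value has a bound no larger than `ig0`.

```python
from math import log2

def h2(x):
    """Binary entropy in bits, with 0 * log2(0) taken as 0."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -(x * log2(x) + (1 - x) * log2(1 - x))

def ig_upper_bound(theta, p):
    """Upper bound on information gain for support theta and class prior p."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    q_ends = (max(0.0, (p + theta - 1) / theta), min(1.0, p / theta))
    best = 0.0
    for q in q_ends:
        r = min(1.0, max(0.0, (p - theta * q) / (1 - theta)))
        best = max(best, h2(p) - theta * h2(q) - (1 - theta) * h2(r))
    return best

def find_min_sup(p, ig0, n):
    """Return the largest support k/n, scanning upward from 1/n,
    such that the bound stays at or below ig0 for all smaller supports.
    Returns 0.0 if even the smallest support exceeds ig0."""
    theta_star = 0.0
    for k in range(1, n):
        theta = k / n
        if ig_upper_bound(theta, p) > ig0:
            break
        theta_star = theta
    return theta_star

# Hypothetical setting: 100 rows, balanced classes, threshold IG_0 = 0.1.
theta_star = find_min_sup(p=0.5, ig0=0.1, n=100)
print(theta_star)
```

The returned value would then be passed to the frequent pattern miner as min_sup.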

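  10. Feature Selection • A given set of frequent patterns contains both non-discriminative and redundant patterns. • The goal is to single out the discriminative patterns and remove the redundant ones. • The notion of Maximal Marginal Relevance (MMR) is borrowed for this purpose: a pattern is selected if it is relevant to the class and adds little redundancy to the patterns already selected.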
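A minimal sketch of MMR-style greedy selection. The slide does not give the scoring details, so the choices here are assumptions for illustration: relevance is a precomputed score per pattern (for example information gain), redundancy is the Jaccard overlap of the rows two patterns cover, scaled by the smaller of their relevance scores, and selection stops after `k` patterns. The names `jaccard` and `mmr_select` are hypothetical.

```python
def jaccard(rows_a, rows_b):
    """Overlap of two sets of covered row ids (0 when both are empty)."""
    union = rows_a | rows_b
    return len(rows_a & rows_b) / len(union) if union else 0.0

def mmr_select(relevance, cover, k):
    """Greedily pick up to k patterns by relevance minus redundancy.

    relevance: dict pattern -> score (e.g. information gain)
    cover:     dict pattern -> set of row ids containing the pattern
    """
    selected = []
    candidates = set(relevance)

    def gain(p):
        if not selected:
            return relevance[p]
        redundancy = max(
            jaccard(cover[p], cover[s]) * min(relevance[p], relevance[s])
            for s in selected
        )
        return relevance[p] - redundancy

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical scores: "abc" is relevant but nearly duplicates "ab".
relevance = {"ab": 0.9, "abc": 0.85, "d": 0.5}
cover = {"ab": {0, 1, 2, 3}, "abc": {0, 1, 2}, "d": {4, 5}}
chosen = mmr_select(relevance, cover, k=2)
print(chosen)
```

In this toy run the less relevant but non-overlapping pattern "d" is preferred over "abc", because most of what "abc" covers is already covered by "ab".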

  11. Experimental Results

  12. Scalability Tests

  13. Conclusion • An effective and efficient feature selection algorithm is proposed to select a set of frequent and discriminative patterns for classification. • Scalability issue: it is computationally infeasible to generate all feature combinations and filter them with an information gain threshold. • A more efficient method (DDPMine, based on FP-tree pruning): H. Cheng, X. Yan, J. Han, and P. S. Yu, "Direct Discriminative Pattern Mining for Effective Classification", ICDE'08.
