Classification in Complex Systems


Presentation Transcript


  1. Classification in Complex Systems: Why we should look at the paper “CAEP: Classification by Aggregating Emerging Patterns” by G. Dong, X. Zhang, L. Wong, and J. Li

  2. What are Common Problems in Classification? • Many variables • Graphs that relate tuples • Protein-protein interactions (KDD Cup 2002) • Citations (KDD Cup 2003) • Anything that violates standard table format

  3. Many Variables Solution: • Naïve Bayes way of multiplying probabilities • Other additive models Problems: • Many factors • May be correlated • Noise … but it gets worse
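To make the "multiplying probabilities" bullet concrete, here is a minimal sketch of a Naïve Bayes score and its additive (log-space) equivalent. The function names, the dictionary layout, and the toy probabilities are all hypothetical and chosen only for illustration; nothing here comes from the CAEP paper.

```python
import math

def naive_bayes_scores(priors, cond_probs, instance):
    """Unnormalised class scores: P(C) times the product of P(x_i | C).

    priors:     {class_label: P(C)}
    cond_probs: {class_label: {(attribute, value): P(attribute=value | C)}}
    instance:   {attribute: value}
    """
    scores = {}
    for label, prior in priors.items():
        score = prior
        for attribute, value in instance.items():
            score *= cond_probs[label][(attribute, value)]
        scores[label] = score
    return scores

def additive_log_scores(priors, cond_probs, instance):
    """Same model written as a sum of logs (an additive model)."""
    scores = {}
    for label, prior in priors.items():
        scores[label] = math.log(prior) + sum(
            math.log(cond_probs[label][(attribute, value)])
            for attribute, value in instance.items()
        )
    return scores

# Hypothetical toy numbers, not taken from any data set.
priors = {"P": 0.5, "N": 0.5}
cond_probs = {
    "P": {("a", 1): 0.8, ("b", 1): 0.5},
    "N": {("a", 1): 0.2, ("b", 1): 0.5},
}
instance = {"a": 1, "b": 1}

nb = naive_bayes_scores(priors, cond_probs, instance)
log_nb = additive_log_scores(priors, cond_probs, instance)
predicted = max(nb, key=nb.get)
```

Both functions rank the classes identically. With many correlated or noisy attributes the product is dominated by repeated evidence, which is the problem the slide points at.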

  4. Graphs • 2 kinds of attributes • Attributes within nodes • Attributes of neighbor and more distant nodes • How do neighbor attributes count? • Take disjunction? • “At least one neighbor that has a particular property” • Probably preferable: • Use links or, more generally, paths as basis • Integration into classification???
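The disjunction idea ("at least one neighbor that has a particular property") can be sketched as a derived Boolean feature. The adjacency-list layout, the attribute name `loc`, and the node names below are hypothetical; this is one possible way to flatten neighbor information into a table row, not a method taken from the paper.

```python
def has_neighbor_with(graph, node_attrs, node, attribute, value):
    """True if at least one direct neighbor of `node` has attribute == value."""
    return any(
        node_attrs[neighbor].get(attribute) == value
        for neighbor in graph.get(node, ())
    )

def flatten_node(graph, node_attrs, node, neighbor_tests):
    """Build one table row: the node's own attributes plus disjunction features.

    neighbor_tests: list of (attribute, value) pairs to test on the neighbors.
    """
    row = dict(node_attrs[node])
    for attribute, value in neighbor_tests:
        row[f"any_neighbor_{attribute}={value}"] = has_neighbor_with(
            graph, node_attrs, node, attribute, value
        )
    return row

# Hypothetical toy interaction graph (undirected, stored as adjacency lists).
graph = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}
node_attrs = {
    "p1": {"loc": "nucleus"},
    "p2": {"loc": "membrane"},
    "p3": {"loc": "nucleus"},
}

row_p1 = flatten_node(graph, node_attrs, "p1", [("loc", "membrane")])
row_p2 = flatten_node(graph, node_attrs, "p2", [("loc", "membrane")])
```

The disjunction discards how many neighbors match and which link led to them, which is one reason the slide suggests using links or paths as the basis instead.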

  5. Idea • Get away from strict set of n attributes • If an attribute or combination of attributes is “interesting” use them • Combining rules? • I would have guessed as in Naïve Bayes • CAEP adds probabilities!?

  6. What is “interesting”? • CAEP paper claims “growth rate” • Support of a rule increases significantly from one class label to another • Note: Only increase, not decrease! • What does that mean? • For pattern e and classes P and N • $\mathrm{growth\_rate}_{PN}(e) = \mathrm{supp}_N(e) / \mathrm{supp}_P(e)$
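The growth-rate definition above can be sketched as follows, treating each class as a list of item sets. The function names and the toy transactions are hypothetical. The handling of zero support (0 when the pattern occurs in neither class, infinity when it occurs only in the target class) is a convention assumed here, since the slide does not say what happens when the denominator is zero.

```python
def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    if not transactions:
        return 0.0
    pattern = frozenset(pattern)
    return sum(pattern <= set(t) for t in transactions) / len(transactions)

def growth_rate(pattern, from_class, to_class):
    """supp_to(pattern) / supp_from(pattern), with assumed zero-support rules."""
    supp_from = support(pattern, from_class)
    supp_to = support(pattern, to_class)
    if supp_from == 0.0:
        # Assumed convention: 0 if absent everywhere, infinite if only in to_class.
        return 0.0 if supp_to == 0.0 else float("inf")
    return supp_to / supp_from

# Hypothetical toy data: four transactions per class.
class_P = [{"a", "b"}, {"a"}, {"b"}, {"c"}]
class_N = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]

gr_ab = growth_rate({"a", "b"}, class_P, class_N)   # 0.5 / 0.25
gr_ac = growth_rate({"a", "c"}, class_P, class_N)   # pattern never occurs in P
```

Only patterns whose growth rate exceeds a chosen threshold count as emerging patterns for class N. A pattern whose support drops from P to N is instead a candidate for the opposite direction, which is the "only increase, not decrease" note on the slide.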

  7. 2 Things Worth Investigating • Is “interestingness” measure related to information gain? • Under certain assumptions: Yes • Can the “score” be justified? • Sum of P(C)!?
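To see what "sum of P(C)" refers to, here is a minimal sketch of an aggregate score. It assumes the score of an instance for class C is the sum, over the emerging patterns of C contained in the instance, of growth_rate / (growth_rate + 1) times the pattern's support in C. That form is an assumption here, because the slides only allude to it. The function name and the toy patterns are hypothetical.

```python
import math

def caep_score(instance, emerging_patterns):
    """Aggregate score of `instance` for one class (assumed form, see above).

    emerging_patterns: list of (pattern, growth_rate, support_in_class)
                       for the emerging patterns of that class.
    """
    items = set(instance)
    total = 0.0
    for pattern, rate, supp in emerging_patterns:
        if set(pattern) <= items:
            # rate / (rate + 1) equals supp_C / (supp_C + supp_other);
            # it tends to 1 as the growth rate goes to infinity.
            weight = 1.0 if math.isinf(rate) else rate / (rate + 1.0)
            total += weight * supp
    return total

# Hypothetical emerging patterns of class N: (pattern, growth rate, support in N).
eps_N = [
    ({"a", "b"}, 2.0, 0.5),
    ({"a", "c"}, float("inf"), 0.25),
    ({"d"}, 3.0, 0.4),
]

score_N = caep_score({"a", "b", "c"}, eps_N)
```

The weight rate / (rate + 1) can be read as an estimate of P(C | e) when both classes are equally large, so the score adds probability-like terms rather than multiplying them as Naïve Bayes would. That is the step the slide questions.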

  8. Other Issues • Normalization • Emerging patterns only consider increase in support => different number of relevant patterns • How to mine for EPs

  9. Conclusions • Idea very valuable • Classification split into ARM-step and rule combination • Justification of details? • Not great • Should be possible to do it right – with poorer accuracy ;-)
