
Multi-label Classification without Multi-label Cost - Multi-label Random Decision Tree Classifier

Presentation Transcript


  1. Multi-label Classification without Multi-label Cost - Multi-label Random Decision Tree Classifier IBM Research – China IBM T.J.Watson Research Center Presenter: Xiatian Zhang xiatianz@cn.ibm.com Authors: Xiatian Zhang, Quan Yuan, Shiwan Zhao, Wei Fan, Wentao Zheng, Zhong Wang

  2. Multi-label Classification • Classical classification (single-label classification) • The classes are exclusive: if an example belongs to one class, it cannot belong to the others • Multi-label classification • A picture, video, or article may belong to several compatible categories • A piece of a gene can control several biological functions • (Slide image labels: Tree, Winter, Park, Ice, Lake)

  3. Existing Multi-label Classification Methods • Grigorios Tsoumakas et al. [2007] summarize the existing methods for ML-classification • Two strategies • Problem Transformation • Transform the multi-label classification problem into single-label classification problems • Algorithm Adaptation • Adapt single-label classifiers to solve the multi-label classification problem • With high complexity

  4. Problem Transformation Approaches • Label Powerset (LP) • Label Powerset considers each unique subset of labels that exists in the multi-label dataset as a single label • Binary Relevance (BR) • Binary Relevance learns one binary classifier for each label; a minimal sketch of this transformation follows below • (Slide diagram: LP maps each observed label subset to one class, while BR trains Classifier1, Classifier2, Classifier3 on the derived L1+/L1-, L2+/L2-, L3+/L3- problems)
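The following is a minimal, illustrative sketch of the Binary Relevance transformation, not the authors' implementation; the use of NumPy arrays and scikit-learn's DecisionTreeClassifier as the base learner is an assumption made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def binary_relevance_fit(X, Y):
    """Train one independent binary classifier per label column of Y."""
    return [DecisionTreeClassifier().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    """Stack each per-label prediction back into an instance-by-label matrix."""
    return np.column_stack([m.predict(X) for m in models])

# Toy usage: 6 instances, 4 features, 3 labels.
X = np.random.rand(6, 4)
Y = np.random.randint(0, 2, size=(6, 3))
models = binary_relevance_fit(X, Y)
print(binary_relevance_predict(models, X))
```

Because one classifier is trained per label, the cost of this transformation grows linearly with |L|, which is exactly the limitation the following slides address.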

  5. Large Number of Labels Problem • Hundreds of labels or even more • Text categorization • Protein function classification • Semantic annotation of multimedia • The impact on multi-label classification methods • Label Powerset: the number of training examples for each particular label set becomes much smaller • Binary Relevance: the computational complexity grows linearly with the number of labels • Algorithm Adaptation: even worse than Binary Relevance

  6. HOMER for the Large Number of Labels Problem • HOMER (Hierarchy Of Multilabel classifERs) was developed by Grigorios Tsoumakas et al., 2008 • The HOMER algorithm constructs a hierarchy of multi-label classifiers, each one dealing with a much smaller set of labels

  7. Our Method – Without Label Cost • Without label cost • Training time is almost independent of the number of labels |L| • But with reliable quality • The classification quality is comparable to mainstream methods across different data sets • How is this achieved?

  8. Our Method – Without Label Cost cont. • A Binary Relevance method based on Random Decision Trees • Random Decision Tree [Fan et al., 2003] • The training process is independent of label information • Random construction with very low cost • Stable quality in many applications

  9. Random Decision Tree – Tree Construction • At each node, an unused feature is chosen randomly • A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node • A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen • Construction stops when one of the following happens: • A node becomes too small (<= 4 examples) • Or the total height of the tree exceeds some limit, such as the total number of features • The construction process is independent of label information (see the sketch below)
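A minimal sketch of the construction rule described on this slide, assuming all features are continuous (so a fresh random threshold is drawn at every split); the Node class and function names are illustrative, not the authors' code. Note that the labels never appear in this routine, which is what makes construction cost independent of |L|.

```python
import random

class Node:
    def __init__(self):
        self.feature = None       # index of the randomly chosen feature
        self.threshold = None     # random split threshold for that feature
        self.left = self.right = None
        self.counts = {}          # per-class example counts, filled in afterwards

def build_random_tree(X, depth=0, max_depth=10, min_size=4):
    """Grow one random tree from the feature vectors X; no labels are consulted."""
    node = Node()
    # Stop when the node becomes too small or the height limit is reached.
    if len(X) <= min_size or depth >= max_depth:
        return node
    node.feature = random.randrange(len(X[0]))             # pick a feature at random
    lo = min(x[node.feature] for x in X)
    hi = max(x[node.feature] for x in X)
    node.threshold = random.uniform(lo, hi)                 # new random threshold each time
    left  = [x for x in X if x[node.feature] <  node.threshold]
    right = [x for x in X if x[node.feature] >= node.threshold]
    if left and right:                                       # only keep splits that separate data
        node.left  = build_random_tree(left,  depth + 1, max_depth, min_size)
        node.right = build_random_tree(right, depth + 1, max_depth, min_size)
    else:
        node.feature = node.threshold = None                 # degenerate split: stay a leaf
    return node
```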

  10. Random Decision Tree – Node Statistics • Classification and probability estimation: • Each node of the tree keeps the number of examples belonging to each class • The node-statistics pass costs little computational resource (a sketch follows below) • (Slide diagram: a tree splitting on F1<0.5, F2>0.7, F3>0.3, with leaf counts such as +:200 / -:10 and +:30 / -:70)
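A minimal sketch of this node-statistics pass, building on the Node objects above; it assumes the training examples are given as (x, y) pairs with y in {'+', '-'} and simply counts classes along each example's decision path.

```python
def fill_statistics(root, examples):
    """Route each (x, y) training pair down the tree and count classes at every node."""
    for x, y in examples:
        node = root
        while True:
            node.counts[y] = node.counts.get(y, 0) + 1      # every node keeps per-class counts
            if node.left is None or node.right is None:     # reached a leaf
                break
            node = node.left if x[node.feature] < node.threshold else node.right
```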

  11. Random Decision Tree – Classification • During classification, each tree outputs a posterior probability from the leaf that x reaches, e.g. P(+|x) = 30/100 = 0.3 at the leaf with counts +:30 / -:70 • (Slide diagram: the same example tree as above, with x's decision path ending at that leaf)

  12. Random Decision Tree – Ensemble • For an instance x, average the estimated probabilities over all trees and take that average as the predicted probability for x • Example: one tree gives P(+|x) = 30/100 = 0.3 and another gives P'(+|x) = 30/50 = 0.6, so (P(+|x) + P'(+|x)) / 2 = 0.45 • (Slide diagram: two random trees with different splits and leaf counts; a sketch of the averaging follows below)
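A minimal sketch of per-tree posterior estimation and ensemble averaging, reusing the Node fields from the construction sketch; the helper names are hypothetical.

```python
def traverse(node, x):
    """Follow x's decision path until a leaf (a node without children)."""
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node

def leaf_posterior(counts, cls):
    """P(cls | x) at one leaf, e.g. {'+': 30, '-': 70} gives 0.3 for '+'."""
    total = sum(counts.values())
    return counts.get(cls, 0) / total if total else 0.0

def ensemble_posterior(trees, x, cls='+'):
    """Average the per-tree posteriors, as in (0.3 + 0.6) / 2 = 0.45 on the slide."""
    return sum(leaf_posterior(traverse(t, x).counts, cls) for t in trees) / len(trees)
```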

  13. Multi-label Random Decision Tree • The node statistics are extended to every label: each node keeps counts such as L1+:30 / L1-:70 and L2+:50 / L2-:50, so a single traversal yields posteriors for all labels • Example: P(L1+|x) = 30/100 = 0.3 and P'(L1+|x) = 30/50 = 0.6 average to 0.45, while P(L2+|x) = 50/100 = 0.5 and P'(L2+|x) = 20/100 = 0.2 average to 0.35 • (Slide diagram: two random trees whose nodes store counts for both L1 and L2; a sketch follows below)
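A minimal sketch of the multi-label variant, assuming each leaf carries a hypothetical label_counts mapping from label name to (positive, negative) counts in place of the single-label counts above. Because the per-label statistics are all stored at the same leaf, one traversal per tree produces posteriors for every label at once.

```python
def multilabel_posteriors(trees, x, labels):
    """Average, over all trees, the positive fraction stored at the leaf for each label."""
    probs = {k: 0.0 for k in labels}
    for tree in trees:
        leaf = traverse(tree, x)                              # reuse the traversal sketched above
        for k in labels:
            pos, neg = leaf.label_counts.get(k, (0, 0))       # hypothetical per-label (pos, neg) counts
            total = pos + neg
            probs[k] += pos / total if total else 0.0
    return {k: v / len(trees) for k, v in probs.items()}

# With the slide's numbers (two trees): P(L1+) averages (0.3 + 0.6) / 2 = 0.45
# and P(L2+) averages (0.5 + 0.2) / 2 = 0.35.
```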

  14. Why Does RDT Work? • Ensemble learning view • Our analysis • Other explanations • Non-parametric estimation

  15. Complexity of Multi-label Random Decision Trees • Training complexity: • m is the number of trees and n is the number of instances • t is the average number of labels on each leaf node, with t << n and t << |L| • The training cost is independent of the number of labels |L| • Complexity of C4.5: V_i is the number of values of the i-th attribute • Complexity of HOMER: • Test complexity: • q is the average depth of the tree branches • The test cost is also independent of the number of labels |L|

  16. Experiment – Metrics and Datasets • Quality Metrics: • Datasets:

  17. Experiment - Quality

  18. Experiment – Computational Cost

  19. Experiment – Computational Cost cont.

  20. Experiment – Computational Cost cont.

  21. Future Work • Leverage the relationships among labels • Apply ML-RDT to recommendation • Parallelization and streaming implementation
