
Multi-Label Collective Classification


Presentation Transcript


  1. Multi-Label Collective Classification. Xiangnan Kong, Xiaoxiao Shi, Philip S. Yu. University of Illinois at Chicago.

  2. Collective Classification • Conventional classification approaches assume that instances are independent and identically distributed (i.i.d.). • In relational data and information networks, instances are correlated with each other. [Diagram: independent instances (x1, y1) and (x2, y2) vs. related instances x1, x2, x3 with correlated labels y1, y2, y3]

  3. Example: Collective Classification • Given a set of web pages linked with each other, we need to classify their categories. • Task: Predict the labels of webpages collectively, while considering the dependencies among linked webpages. [Diagram: a network of linked webpages; training pages with known labels y1–y6, test pages marked "?"]

  4. Examples Coauthor Networks Business Network

  5. Collective Classification • Collective classification: given a set of instances which are related to each other, how to predict their labels simultaneously? • Existing methods: exploit the dependencies among related instances; focused on single-label settings; assume one instance can only have one label.

  6. Multi-label Collective Classification • How to effectively predict the label sets of a group of related instances? [Diagram: a collaboration network in which each researcher has a set of research-area labels such as DM, DB, IR, AI]

  7. The Problem • Multiple labels: the number of possible label sets is very large (the power set of all labels); 20 labels give 2^20 ≈ 1 million possible label sets. The key is to exploit the correlations among the multiple labels. • Relational data: the label sets of related instances are correlated with each other.

  8. Intra-instance Cross-label Dependency • e.g. "X1 is more likely to be DM, if labeled with DB or ML" • e.g. "X1 is unlikely to be Bio, if labeled with OS" [Diagram: the labels Y1, …, Yk, …, Ym of a single instance x1 depend on one another]

  9. Inter-instance Single-label Dependency • e.g. "X1 is more likely to be DM, if collaborators (X2, X3) are labeled with DM" [Diagram: the same label Yk is correlated across linked instances x1, x2, x3]

  10. Inter-instance Cross-label Dependency • e.g. "X1 is more likely to be DM, if collaborators (X2, X3) are labeled with DB or ML" [Diagram: label Yk of x1 depends on the other labels of linked instances x2, x3]

  11. All dependencies: our approach [Diagram: the proposed approach models all of the above dependency types among the label sets of linked instances x1, x2, x3]

  12. Relational Feature Aggregation [Figure: each instance's content features are extended with three blocks of relational features aggregated from the current label sets: (1) intra-instance cross-label, (2) inter-instance single-label, (3) inter-instance cross-label]
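  To make the aggregation concrete, here is a minimal Python sketch (not the authors' implementation) that builds the three relational feature blocks for one instance from the current binary label assignments; the names relational_features, labels, and neighbors are illustrative assumptions.

    import numpy as np

    def relational_features(i, labels, neighbors):
        """Build the per-label relational feature blocks for instance i.

        labels    : dict, instance id -> current binary label vector (length m)
        neighbors : dict, instance id -> list of linked instance ids
        """
        m = len(labels[i])
        own = labels[i]

        # (1) Intra-instance cross-label: for label k, the instance's own
        #     current assignments on the other labels.
        intra_cross = [np.delete(own, k) for k in range(m)]

        # (2) Inter-instance single-label: aggregate (here: average) the linked
        #     instances' current assignments, one value per label.
        nbrs = neighbors.get(i, [])
        nbr_labels = np.array([labels[j] for j in nbrs]) if nbrs else np.zeros((1, m))
        inter_single = nbr_labels.mean(axis=0)

        # (3) Inter-instance cross-label: for label k, the linked instances'
        #     aggregated assignments on the other labels.
        inter_cross = [np.delete(inter_single, k) for k in range(m)]

        # One relational block per label, to be concatenated with content features.
        return [np.concatenate([intra_cross[k], [inter_single[k]], inter_cross[k]])
                for k in range(m)]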

  13. Inference • Bootstrap: initialize each instance's label set using content features only. • Iterative classification of multiple labels: using the current predicted label sets, update the relational features; then update the label sets using content features + relational features; repeat.
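  A rough sketch of this inference loop, assuming one local binary classifier per label trained on content features only (content_clf[k]) and one trained on content + relational features (full_clf[k]); relational_features is the illustrative helper from the aggregation sketch above, and all names are assumptions rather than the authors' code.

    import numpy as np

    def icml_inference(X, neighbors, content_clf, full_clf, n_labels, max_iter=10):
        n = len(X)
        # Bootstrap: initialize each instance's label set using content features only.
        labels = {i: np.array([content_clf[k].predict(X[i].reshape(1, -1))[0]
                               for k in range(n_labels)])
                  for i in range(n)}

        # Iterative classification of multiple labels.
        for _ in range(max_iter):
            for i in range(n):
                # Update relational features from the current predicted label sets ...
                rel = relational_features(i, labels, neighbors)
                # ... then update this instance's label set using content + relational features.
                labels[i] = np.array([full_clf[k].predict(
                    np.concatenate([X[i], rel[k]]).reshape(1, -1))[0]
                    for k in range(n_labels)])
        return labels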

  14. The ICML Approach • Properties: • Simple & efficient: trains multiple local models to perform collective classification on multiple labels. • Effective: by considering the dependencies among related instances and multiple labels, classification performance can be greatly improved over independent models.
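  For completeness, a possible training sketch consistent with the "multiple local models" description: one binary base learner per label, trained first on content features only (for bootstrapping) and then on content + relational features built from the ground-truth label sets of linked training instances. LinearSVC is just one choice of base learner; all names are illustrative.

    import numpy as np
    from sklearn.svm import LinearSVC

    def icml_train(X, Y, neighbors):
        """X: (n, d) content features; Y: (n, m) binary label matrix."""
        n, m = Y.shape
        # Local models on content features only (used to bootstrap inference).
        content_clf = [LinearSVC().fit(X, Y[:, k]) for k in range(m)]

        # Relational features are built from the true label sets of the training network.
        labels = {i: Y[i] for i in range(n)}
        full_clf = []
        for k in range(m):
            Xk = np.array([np.concatenate([X[i], relational_features(i, labels, neighbors)[k]])
                           for i in range(n)])
            full_clf.append(LinearSVC().fit(Xk, Y[:, k]))
        return content_clf, full_clf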

  15. Experiments: Compared Methods (dependencies exploited: 1 = Intra-Instance Cross-Label, 2 = Inter-Instance Single-Label, 3 = Inter-Instance Cross-Label)
  • Binary classification: Binary SVM (binary decomposition + SVM) [Boutell et al., PR'04]: none
  • Multi-label classification: ECC & CC (ensemble + classifier chains) [Read et al., ECML'09]: 1
  • Collective classification: ICA (iterative classification algorithm) [Lu & Getoor, ICML'03]: 2
  • Multi-label collective classification: ML-ICA (a proposed baseline) [this paper]: 1, 2
  • Multi-label collective classification: ICML (the proposed approach) [this paper]: 1, 2, 3

  16. Experiments: Data Sets • Research Collaboration Networks (DBLP) • Node: Researcher • Features: bag-of-words for paper titles • Link: Collaboration • Label: Research Area (DB, AI, IR, OS, etc) • Movie Database (IMDB) • Node: movie • Features: bag-of-words for movie plot • Link: share director • Label: movie type (comedy, horror, etc)

  17. Evaluation • Multi-Label Metrics (↓ the smaller the better, ↑ the larger the better) • Hamming Loss ↓ [Elisseeff & Weston, NIPS'02]: average fraction of labels misclassified • Subset 0/1 Loss ↓ [Ghamrawi & McCallum, CIKM'05]: fraction of instances whose entire label set is misclassified • Micro-F1 ↑ [Ghamrawi & McCallum, CIKM'05]: micro average of F1 score • Macro-F1 ↑ [Ghamrawi & McCallum, CIKM'05]: macro average of F1 score • 5-fold cross-validation
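  On binary label matrices (rows = instances, columns = labels), the four metrics can be computed as in the short sketch below; scikit-learn is assumed for the F1 scores and the function name is illustrative.

    import numpy as np
    from sklearn.metrics import f1_score

    def evaluate(Y_true, Y_pred):
        hamming = np.mean(Y_true != Y_pred)                   # Hamming loss (smaller is better)
        subset01 = np.mean(np.any(Y_true != Y_pred, axis=1))  # Subset 0/1 loss (smaller is better)
        micro = f1_score(Y_true, Y_pred, average='micro')     # Micro-F1 (larger is better)
        macro = f1_score(Y_true, Y_pred, average='macro')     # Macro-F1 (larger is better)
        return hamming, subset01, micro, macro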

  18. Experiment Results: DBLP-A Dataset [Results figure]

  19. Experiment Results: DBLP-A Dataset [Results figure; dependency exploited: intra-instance cross-label (1)]

  20. Experiment Results: DBLP-A Dataset [Results figure; dependency exploited: inter-instance single-label (2)]

  21. Experiment Results: DBLP-A Dataset [Results figure; dependencies exploited: intra-instance cross-label (1), inter-instance single-label (2)]

  22. Experiment Results: DBLP-A Dataset [Results figure; dependencies exploited: intra-instance cross-label (1), inter-instance single-label (2), inter-instance cross-label (3)]

  23. Experiment Results: DBLP-B Dataset [Results figure]

  24. Experiment Results: IMDB Dataset [Results figure] • Our approach performed best on the DBLP and IMDB datasets.

  25. Experiment Results: ICML approach, performance vs. #iterations [Results figure: DBLP-A dataset]

  26. Conclusions • Multi-label Collective Classification • We propose an algorithm to exploit the dependencies among the label sets of related instances: • Intra-instance Cross-label Dependency • Inter-instance Single-label Dependency • Inter-instance Cross-label Dependency • Classification performance can be improved by considering the dependencies among instances and different labels. Thank you!
