
Tighter and Convex Maximum Margin Clustering



  1. Tighter and Convex Maximum Margin Clustering
  Yu-Feng Li (LAMDA, Nanjing University, China) (liyf@lamda.nju.edu.cn)
  Ivor W. Tsang (NTU, Singapore) (IvorTsang@ntu.edu.sg)
  James T. Kwok (HKUST, Hong Kong) (jamesk@cse.ust.hk)
  Zhi-Hua Zhou (LAMDA, Nanjing University, China) (zhouzh@lamda.nju.edu.cn)

  2. Summary
  • Maximum Margin Clustering (MMC) [Xu et al., NIPS05]
    • inspired by the success of the large-margin criterion in SVMs
    • achieves state-of-the-art performance on many clustering problems
  • The problem with existing methods
    • SDP relaxation: global but not scalable
    • local search: efficient but non-convex
  • We propose LG-MMC, a convex method that also scales to large datasets via a label-generation strategy.

  3. Outline
  • Introduction
  • The Proposed LG-MMC Method
  • Experimental Results
  • Conclusion

  4. Outline
  • Introduction
  • The Proposed LG-MMC Method
  • Experimental Results
  • Conclusion

  5. Maximum Margin Clustering [Xu et al., NIPS05]
  • Perform clustering (i.e., determine the unknown label vector y) by simultaneously finding a maximum-margin hyperplane in the data
  • Setting: given a set of unlabeled patterns $\{x_1, \dots, x_n\}$
  • Goal: learn a decision function $f(x) = w^\top\varphi(x) + b$ and a label vector $y$:
    $\min_{y\in\{\pm1\}^n}\ \min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^n \xi_i$ (error)
    s.t. $y_i(w^\top\varphi(x_i) + b) \ge 1 - \xi_i,\ \xi_i \ge 0$ (margin)
    $-\ell \le \sum_{i=1}^n y_i \le \ell$ (balance constraint)
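To make the objective concrete, here is a minimal brute-force sketch (not the paper's method): it enumerates balanced labelings of a toy dataset, trains a standard SVM for each, and keeps the labeling with the smallest large-margin objective. The toy data, C, and the balance bound ell are illustrative assumptions; the exponential loop over labelings is exactly what makes MMC intractable at scale.

```python
import itertools
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(4, 2) - 2, rng.randn(4, 2) + 2])  # toy data
n, C, ell = len(X), 1.0, 2

best_obj, best_y = np.inf, None
for labels in itertools.product([-1, 1], repeat=n):  # 2^n candidate labelings
    y = np.array(labels)
    if abs(y.sum()) > ell:                 # enforce the balance constraint
        continue
    svm = SVC(kernel="linear", C=C).fit(X, y)
    f = svm.decision_function(X)
    xi = np.maximum(0, 1 - y * f)          # hinge slacks
    w = svm.coef_.ravel()
    obj = 0.5 * w @ w + C * xi.sum()       # (1/2)||w||^2 + C * sum(xi)
    if obj < best_obj:
        best_obj, best_y = obj, y
print(best_obj, best_y)
```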

  6. Maximum Margin Clustering [Xu et al., NIPS05]
  • The dual problem: $\min_{y\in\mathcal{B}}\ \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top(K\odot yy^\top)\alpha$, where $\mathcal{B}=\{y\in\{\pm1\}^n : -\ell \le \mathbf{1}^\top y \le \ell\}$ and $\mathcal{A}$ denotes the feasible set of SVM dual variables
  • This is a mixed integer program, intractable for large-scale datasets
  • Key: some kind of relaxation may be helpful
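A small numpy sketch of the dual objective for one fixed labeling, written with the label-kernel $K\odot yy^\top$ (toy values; the linear kernel and the particular alpha are assumptions). The mixed-integer difficulty is that this expression must be maximized over alpha and then minimized over all $2^n$ balanced sign vectors y:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 2)
K = X @ X.T                                # linear kernel matrix
y = np.array([1, 1, 1, -1, -1, -1])        # one candidate labeling
alpha = rng.uniform(0, 1.0, size=6)        # a dual point with 0 <= alpha <= C

label_kernel = K * np.outer(y, y)          # Hadamard product K o (y y^T)
obj = alpha.sum() - 0.5 * alpha @ label_kernel @ alpha
print(obj)
```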

  7. Related work
  • MMC with SDP relaxation [Xu et al., NIPS05]
    • convex, state-of-the-art performance
    • expensive: worst-case O(n^6.5) time
  • Generalized MMC (GMMC) [Valizadegan & Jin, NIPS07]
    • a smaller SDP problem that speeds up MMC by 100 times
    • still expensive: cannot handle medium-sized datasets
  • Efficient local algorithms [Zhang et al., ICML07][Zhao et al., SDM08]
    • much more scalable than the global methods
    • non-convex: may get stuck in local minima
  • Goal: a convex method that is also scalable to large datasets

  8. Outline
  • Introduction
  • The Proposed LG-MMC Method
  • Experimental Results
  • Conclusion

  9. Intuition
  [Figure: solving an SVM for every candidate labeling y is hard, since the labelings are unknown; each labeling induces a "label-kernel" $yy^\top$, and an efficient convex combination of such label-kernels leads to multiple label-kernel learning.]
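A short sketch of the combination idea (the candidate labelings and weights here are illustrative toy values): instead of committing to a single labeling, mix the label-kernels of several candidates with simplex weights, which is what multiple label-kernel learning will optimize:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 2)
K = X @ X.T                                 # toy linear kernel
candidates = [np.array([1, 1, 1, -1, -1, -1]),
              np.array([1, -1, 1, -1, 1, -1])]
mu = np.array([0.7, 0.3])                   # convex weights on the simplex
K_mix = sum(m * (K * np.outer(y, y)) for m, y in zip(mu, candidates))
```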

  10. Flow Chart of LG-MMC
  • LG-MMC transforms the MMC problem into multiple label-kernel learning via a minimax relaxation
  • Cutting-plane algorithm
    • multiple label-kernel learning
    • finding the most violated y
  • LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS05]

  11. LG-MMC: minimax relaxation of the MMC problem
  • Consider interchanging the order of $\min_{y\in\mathcal{B}}$ and $\max_{\alpha\in\mathcal{A}}$, leading to: $\max_{\alpha\in\mathcal{A}}\ \min_{y\in\mathcal{B}}\ \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top(K\odot yy^\top)\alpha$
  • By the minimax inequality, the optimal objective of LG-MMC lower-bounds that of the MMC problem (equivalently, the MMC optimum is an upper bound on that of LG-MMC).
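Spelled out as a compact restatement, using the notation reconstructed above:

$$
\max_{\alpha\in\mathcal{A}}\ \min_{y\in\mathcal{B}}\ f(y,\alpha)\ \le\ \min_{y\in\mathcal{B}}\ \max_{\alpha\in\mathcal{A}}\ f(y,\alpha),
\qquad
f(y,\alpha)=\mathbf{1}^\top\alpha-\tfrac12\,\alpha^\top\big(K\odot yy^\top\big)\alpha .
$$

Because $f$ depends on $y$ only through $M = yy^\top$, and linearly so, the inner minimum is unchanged if the discrete set $\{yy^\top : y\in\mathcal{B}\}$ is replaced by its convex hull; the problem then becomes convex-concave, and the minimax theorem allows the two optimizations to be swapped back without introducing any further gap.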

  12. LG-MMC: multiple label-kernel learning
  • First, LG-MMC can be rewritten as: $\max_{\alpha\in\mathcal{A},\,\theta}\ \theta$ subject to $\theta \le \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top(K\odot y_t y_t^\top)\alpha$ for every feasible labeling $y_t \in \mathcal{B}$
  • For the inner optimization subproblem, let $\mu_t \ge 0$ be the dual variable for each constraint. Its Lagrangian can be obtained as $\mathcal{L}(\theta,\mu) = \theta + \sum_t \mu_t\big(\mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top(K\odot y_t y_t^\top)\alpha - \theta\big)$

  13. LG-MMC: multiple label-kernel learning (cont.)
  • Setting its derivative w.r.t. $\theta$ to zero, we have $\sum_t \mu_t = 1$
  • Let $\mathcal{M}$ be the simplex $\{\mu : \mu_t \ge 0,\ \sum_t \mu_t = 1\}$
  • Replace the inner subproblem with its dual, and one has: $\max_{\alpha\in\mathcal{A}}\ \min_{\mu\in\mathcal{M}}\ \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top\big(\sum_t \mu_t\, K\odot y_t y_t^\top\big)\alpha$
  • Just as a standard SVM learns with a single label-kernel, the above formulation can be regarded as multiple label-kernel learning, with one base kernel $K\odot y_t y_t^\top$ per candidate labeling.

  14. Cutting-Plane Algorithm
  • Problem: there is an exponential number of possible labeling assignments
    • the set of base kernels is also exponential in size
    • direct multiple kernel learning (MKL) is computationally intractable
  • Observation: only a subset of these constraints is active at optimality
    • cutting-plane method

  15. Cutting-Plane Algorithm
  1. Initialize: find the most violated y and set 𝒞 = {y, −y} (𝒞 is the working set of constraints).
  2. Run MKL for the subset of kernel matrices selected in 𝒞 (how: see step 2 on the next slide).
  3. Find the most violated y and set 𝒞 = 𝒞 ∪ {y} (how: see step 3 below).
  4. Repeat steps 2-3 until convergence.
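A high-level sketch of this loop, with hypothetical helper names: solve_mkl stands for step 2 and most_violated_labeling for step 3, both detailed on the following slides. The convergence test follows the usual cutting-plane pattern: stop once no constraint is violated by more than the tolerance.

```python
import numpy as np

def lg_mmc_cutting_plane(K, C, ell, tol=1e-3, max_iter=50):
    n = K.shape[0]
    alpha = np.full(n, C / 2.0)                  # a feasible dual starting point
    y = most_violated_labeling(K, alpha, ell)    # step 1
    working_set = [y, -y]
    for _ in range(max_iter):
        # step 2: MKL over the label-kernels in the working set
        kernels = [K * np.outer(yt, yt) for yt in working_set]
        alpha, mu, obj = solve_mkl(kernels, C)   # obj = working-set objective
        # step 3: search for a constraint the current solution violates
        y_new = most_violated_labeling(K, alpha, ell)
        val = alpha.sum() - 0.5 * alpha @ (K * np.outer(y_new, y_new)) @ alpha
        if val >= obj - tol:                     # step 4: no violation, stop
            return working_set, mu, alpha
        working_set.append(y_new)
    return working_set, mu, alpha
```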

  16. Cutting-Plane Algorithm Step 2: Multiple Label-Kernel Learning
  • Suppose the current working set is 𝒞 = {y_1, ..., y_T}
  • The feature map for the base kernel matrix $K\odot y_t y_t^\top$ is $\psi_t(x_i) = y_{t,i}\,\varphi(x_i)$
  • SimpleMKL:
    1. Fix μ and solve the SVM dual with the combined kernel
    2. Fix α and use a gradient method to update μ
    3. Iterate until convergence
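A sketch of this alternating scheme, under the assumption that solve_svm_dual is any off-the-shelf SVM dual solver for a precomputed kernel (a hypothetical stand-in); a simplex projection keeps the weight update feasible:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {u : u >= 0, sum(u) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0)

def simple_mkl(base_kernels, C, n_iter=20, lr=0.1):
    T = len(base_kernels)
    mu = np.full(T, 1.0 / T)                     # start at the simplex center
    for _ in range(n_iter):
        K_mu = sum(m * Kt for m, Kt in zip(mu, base_kernels))
        alpha = solve_svm_dual(K_mu, C)          # 1. fix mu, solve the SVM dual
        # gradient of the dual objective w.r.t. each mu_t
        grad = np.array([-0.5 * alpha @ Kt @ alpha for Kt in base_kernels])
        mu = project_to_simplex(mu - lr * grad)  # 2. fix alpha, update mu
    return alpha, mu                             # 3. iterate until convergence
```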

  17. Cutting-Plane Algorithm Step 3: Finding the Most Violated y
  • Find the most violated y: $\max_{y\in\mathcal{B}}\ \alpha^\top(K\odot yy^\top)\alpha$
  • Problem: this is a concave QP (maximizing a convex quadratic over the discrete label set), hard to solve directly
  • Observation: the cutting-plane algorithm only requires the addition of some violated constraint at each iteration, not necessarily the most violated one
  • Replace the L2-norm above (note that $\alpha^\top(K\odot yy^\top)\alpha = \|\sum_i \alpha_i y_i \varphi(x_i)\|^2$) with the infinity-norm

  18. Cutting-Plane Algorithm Step 3: Finding the Most Violated y (cont.)
  • With the infinity-norm, each resulting subproblem (one per feature coordinate) is of the form $\max_{y\in\mathcal{B}}\ |\sum_i c_i y_i|$ for fixed coefficients $c_i$
  • Sort the $c_i$'s
  • Assign labels subject to the balance constraint
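An exact sort-based solver for one such subproblem, as a self-contained sketch (the coefficients c below are arbitrary toy values; in the method they come from alpha and the feature map). For a fixed label sum b, the sum is maximized by assigning +1 to the (n+b)/2 largest coefficients; scanning the feasible b and both signs handles the absolute value:

```python
import numpy as np

def max_abs_balanced(c, ell):
    """Maximize |sum_i c_i y_i| over y in {-1,+1}^n with |sum_i y_i| <= ell."""
    n = len(c)
    best_val, best_y = -np.inf, None
    for sign in (1.0, -1.0):                 # |.| = max of the +/- problems
        cs = sign * np.asarray(c, dtype=float)
        idx = np.argsort(-cs)                # indices sorted by cs, descending
        prefix = np.concatenate(([0.0], np.cumsum(cs[idx])))
        total = cs.sum()
        for b in range(-ell, ell + 1):
            if (n + b) % 2:                  # sum(y) = b requires n + b even
                continue
            k = (n + b) // 2                 # number of +1 labels
            val = 2 * prefix[k] - total      # top-k sum minus the rest
            if val > best_val:
                y = -np.ones(n)
                y[idx[:k]] = 1
                best_val, best_y = val, y
    return best_val, best_y

print(max_abs_balanced(np.array([0.5, -1.2, 0.3, 2.0]), ell=2))
```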

  19. LG-MMC achieves a tighter relaxation
  • Consider the set of all feasible label matrices $\{yy^\top : y\in\mathcal{B}\}$ and two relaxations of it: its convex hull (used by LG-MMC) and the SDP feasible set (used by SDP-based MMC)
  [Figure: the two relaxations, with the convex hull nested inside the SDP feasible set.]

  20. LG-MMC achieves a tighter relaxation (cont.)
  • Define $\mathcal{M}_0 = \{yy^\top : y\in\mathcal{B}\}$, $\mathcal{M}_1 = \mathrm{conv}(\mathcal{M}_0)$, and $\mathcal{M}_2$ as the feasible set of the SDP relaxation
  • One can find that $\mathcal{M}_0 \subset \mathcal{M}_1 \subset \mathcal{M}_2$
  • Maximum margin clustering is the same as minimizing over $\mathcal{M}_0$
  • The LG-MMC problem is the same as minimizing over $\mathcal{M}_1$
  • The SDP-based MMC problem is the same as minimizing over $\mathcal{M}_2$

  21. LG-MMC achieves a tighter relaxation (cont.)
  • $\mathcal{M}_1$ is the convex hull of $\mathcal{M}_0$, i.e., the smallest convex set containing $\mathcal{M}_0$
  • Hence LG-MMC gives the tightest convex relaxation
  • It can be shown that $\mathcal{M}_2$ is more relaxed than $\mathcal{M}_1$
  • SDP-based MMC is therefore a looser relaxation than the proposed formulation
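Putting the pieces together as a compact restatement (the exact constraints defining $\mathcal{M}_2$ are those of the SDP relaxation in [Xu et al., NIPS05]):

$$
\mathcal{M}_0 \subset \mathcal{M}_1 = \mathrm{conv}(\mathcal{M}_0) \subset \mathcal{M}_2
\;\Longrightarrow\;
\min_{M\in\mathcal{M}_2} g(M) \;\le\; \min_{M\in\mathcal{M}_1} g(M) \;\le\; \min_{M\in\mathcal{M}_0} g(M),
$$

where $g(M) = \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^\top\alpha - \tfrac12\,\alpha^\top(K\odot M)\alpha$. The middle quantity is the LG-MMC optimum, so it is at least as close to the true MMC optimum as the SDP relaxation is.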

  22. Outline
  • Introduction
  • The Proposed LG-MMC Method
  • Experimental Results
  • Conclusion

  23. Experiments
  • Data sets: 17 UCI datasets and the MNIST dataset
  • Implementation: Matlab 7.6
  • Evaluation: misclassification error

  24. Compared Methods
  • k-means
    • one of the most mature baseline methods
  • Normalized Cut [Shi & Malik, PAMI00]
    • the first spectral-based clustering method
  • GMMC [Valizadegan & Jin, NIPS07]
    • one of the most efficient global methods for MMC
  • IterSVR [Zhang et al., ICML07]
    • an efficient algorithm for MMC
  • CPMMC [Zhao et al., SDM08]
    • another state-of-the-art efficient method for MMC

  25. Clustering Error

  26. Win-tie-loss
  • Global methods vs. local methods
    • the global methods are better than the local methods
  • LG-MMC vs. GMMC
    • LG-MMC is competitive with the GMMC method

  27. Speed
  • LG-MMC is about 10 times faster than GMMC
  • However, local methods are in general still faster than global methods

  28. Outline
  • Introduction
  • The Proposed LG-MMC Method
  • Experimental Results
  • Conclusion

  29. Conclusion
  • Main contributions
    • We propose a scalable and global optimization method for maximum margin clustering
    • To the best of our knowledge, this is the first use of a label-generation strategy for clustering; it might also be useful in other domains
  • Future work
    • We will extend the proposed approach to semi-supervised learning
  Thank you
