
Support Cluster Machine Paper from ICML2007 Read by Haiqin Yang 2007-10-18



Presentation Transcript


  1. Support Cluster Machine Paper from ICML 2007 Read by Haiqin Yang 2007-10-18 The paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue, and was published at ICML 2007.

  2. Outline • Background and Motivation • Support Cluster Machine - SCM • Kernel in SCM • Experiments • An Interesting Application: Privacy-preserving Data Mining • Discussions

  3. Background and Motivation • Large-scale classification problem • Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001 • Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006 • Parallel techniques: Collobert et al., 2001; Graf et al., 2004 • Approximate formulations: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001 • Choosing representatives: Active learning (Schohn & Cohn, 2003); Cluster-Based SVM (Yu et al., 2003); Core Vector Machine, CVM (Tsang et al., 2005); Clustering SVM (Boley & Cao, 2004)

  4. Support Cluster Machine - SCM • Given training samples, group each class into clusters and summarize every cluster by a generative (Gaussian) model • Procedure: train the classifier on these cluster models rather than on the individual samples (a sketch follows below)
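
A minimal sketch of the clustering step described above, assuming each class is summarized by the components of a fitted Gaussian mixture. The paper itself uses TOD or EM clustering; this sketch substitutes scikit-learn's GaussianMixture (an EM implementation), and the helper name class_to_clusters is mine.

from sklearn.mixture import GaussianMixture

def class_to_clusters(X, label, n_clusters):
    # Fit a Gaussian mixture to one class and return its components as
    # (weight, mean, covariance, label) tuples; these cluster models, not
    # the raw vectors, become the training units of the SCM.
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="full",
                          random_state=0).fit(X)
    return [(w, m, c, label)
            for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_)]

# Example usage with a positive class X_pos and a negative class X_neg:
# clusters = class_to_clusters(X_pos, +1, 25) + class_to_clusters(X_neg, -1, 25)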

  5. SCM Solution • Dual representation • Decision function
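
As a reference point for the dual representation and decision function on this slide, the standard soft-margin SVM dual written over cluster models p_1, ..., p_N with labels y_i is sketched below; SCM builds on this form with the probability product kernel of the next slide, and the paper's exact per-cluster weighting may differ from this plain sketch.

\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(p_i, p_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0,

f(p) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i y_i K(p_i, p) + b \Big).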

  6. Kernel • Probability product kernel • Under the Gaussian assumption the kernel between two cluster models has a closed form (sketched below)
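
Under the Gaussian assumption p_i = N(mu_i, Sigma_i), the probability product kernel with rho = 1 is the integral of the product of the two densities, which has the closed form K(p_i, p_j) = N(mu_i - mu_j; 0, Sigma_i + Sigma_j). A minimal numpy sketch of this formula (the function name is mine):

import numpy as np

def prob_product_kernel(mu_i, cov_i, mu_j, cov_j):
    # Closed-form probability product kernel (rho = 1) between two Gaussians:
    # the integral of N(x; mu_i, cov_i) * N(x; mu_j, cov_j) over x equals
    # the Gaussian density N(mu_i - mu_j; 0, cov_i + cov_j).
    d = mu_i.shape[0]
    S = cov_i + cov_j
    diff = mu_i - mu_j
    quad = diff @ np.linalg.solve(S, diff)               # (mu_i-mu_j)^T S^{-1} (mu_i-mu_j)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(S) ** (-0.5)
    return norm * np.exp(-0.5 * quad)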

  7. Kernel • Property I • Decision function • Property II
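
The equations behind the two properties are not reproduced on this slide. One reading, offered as my interpretation rather than the paper's exact statement, is that a test vector x can be treated as a degenerate Gaussian with zero covariance, so the kernel between a cluster model and a test point reduces (up to a constant) to that model's density at the point, giving a decision function of the form

f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i y_i \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) + b \Big).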

  8. Experiments • Datasets: Toydata; MNIST (handwritten digits ‘0’-’9’); Adult (privacy-preserving dataset) • Clustering algorithms: Threshold Order Dependent (TOD); EM algorithm • Classification methods: libSVM; SVMTorch; SVMlight; CVM (Core Vector Machine); SCM • Model selection • CPU: 3.0 GHz

  9. Toydata • Samples: 2,500 samples per class, generated from a mixture of Gaussians • Clustering algorithm: TOD • Clustering results: 25 positive clusters, 25 negative clusters

  10. MNIST • Data description • 10 classes: handwritten digits ‘0’-’9’ • Training samples: 60,000, about 6,000 per class • Testing samples: 10,000 • Construct 45 binary classifiers (one per pair of digits) • Results • 25 clusters for the EM algorithm

  11. MNIST • Test results for TOD algorithm

  12. Privacy-preserving Data Mining • Inter-Enterprise data mining • Problem: Two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information. • Horizontally partitioned • Records (users) split across companies • Example: Credit card fraud detection model • Vertically partitioned • Attributes split across companies • Example: Associations across websites

  13. Privacy-preserving Data Mining • Randomization approach (shown as a data flow on the slide): original records such as 30 | 70K | ... and 50 | 40K | ... pass through a Randomizer, which outputs perturbed records such as 65 | 20K | ... and 25 | 60K | ...; the distributions of Age, Salary, etc. are then reconstructed from the perturbed data and handed to the data mining algorithms to build the model (a minimal sketch of the randomization step follows below)
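
A minimal sketch of the randomization step only, using additive uniform noise on the numeric attributes; the noise ranges and the use of numpy here are my choices, and the later distribution-reconstruction step shown in the flow is not implemented.

import numpy as np

rng = np.random.default_rng(0)

def randomize(records, noise_scale):
    # Perturb each numeric attribute with additive uniform noise so that
    # only the randomized values ever leave the data owner.
    noise = rng.uniform(-noise_scale, noise_scale, size=records.shape)
    return records + noise

# Toy records: columns are (Age, Salary), matching the rows shown on the slide.
originals = np.array([[30.0, 70_000.0],
                      [50.0, 40_000.0]])
perturbed = randomize(originals, noise_scale=np.array([20.0, 30_000.0]))
print(perturbed)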

  14. Classification Example

  15. Privacy-preserving Dataset: Adult • Data description • Training samples: 30,162 • Testing samples: 15,060 • Percentage of positive samples: 24.78% • Procedure • Horizontally partition the data into three subsets (parties) • Cluster each party's data with the TOD algorithm • Obtain three positive and three negative GMMs • Combine them into one positive and one negative GMM with modified priors (sketched below) • Classify with SCM
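
A minimal sketch of the GMM-combination step in the procedure above, assuming that "modified priors" means rescaling each party's component weights by that party's share of the training data; this reading and the function name combine_party_gmms are mine, not the paper's exact recipe.

def combine_party_gmms(party_gmms, party_sizes):
    # Merge per-party GMMs (lists of (weight, mean, covariance) tuples) into
    # one GMM. Each component weight is rescaled by its party's share of the
    # data, so the merged priors still sum to one across all components.
    total = float(sum(party_sizes))
    merged = []
    for gmm, n in zip(party_gmms, party_sizes):
        for w, mean, cov in gmm:
            merged.append((w * n / total, mean, cov))
    return merged

# Example: merged positive model from three parties' positive GMMs.
# pos_gmm = combine_party_gmms([pos_gmm_a, pos_gmm_b, pos_gmm_c],
#                              [n_a, n_b, n_c])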

  16. Privacy-preserving Dataset: Adult • Partition results • Experimental results

  17. Discussions • Solved problems • Large-scale problems: downsample by clustering, then classify • Privacy-preserving problems: hide individual information • Differences from other methods • Training units are generative models, testing units are vectors • Training units carry complete statistical information • Only one parameter for model selection • Easy implementation • Generalization ability is not clear, whereas for the RBF kernel in SVM a larger width is known to lead to a lower VC dimension

  18. Discussions • Advantages of using priors and covariances

  19. Thank you!
