
Support Cluster Machine Paper from ICML2007 Read by Haiqin Yang 2007-10-18



Presentation Transcript


  1. Support Cluster Machine Paper from ICML 2007 Read by Haiqin Yang 2007-10-18 The paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue, and was published at ICML 2007.

  2. Outline • Background and Motivation • Support Cluster Machine - SCM • Kernel in SCM • Experiments • An Interesting Application: Privacy-preserving Data Mining • Discussions

  3. Background and Motivation • Large-scale classification problem • Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001 • Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006 • Parallel techniques: Collobert et al., 2001; Graf et al., 2004 • Approximate formulations: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001 • Choosing representatives: Active learning (Schohn & Cohn, 2003); Cluster-Based SVM (Yu et al., 2003); Core Vector Machine, CVM (Tsang et al., 2005); Clustering SVM (Boley & Cao, 2004)

  4. Support Cluster Machine - SCM • Given training samples, group each class into clusters and summarize every cluster by a generative (Gaussian) model • Procedure: train the classifier on these cluster models rather than on the individual samples (a sketch follows below)
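
A minimal sketch of the clustering step described above, assuming each class is summarized by the components of a fitted Gaussian mixture. The paper itself uses TOD or EM clustering; this sketch substitutes scikit-learn's GaussianMixture (an EM implementation), and the helper name class_to_clusters is mine.

from sklearn.mixture import GaussianMixture

def class_to_clusters(X, label, n_clusters):
    # Fit a Gaussian mixture to one class and return its components as
    # (weight, mean, covariance, label) tuples; these cluster models, not
    # the raw vectors, become the training units of the SCM.
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="full",
                          random_state=0).fit(X)
    return [(w, m, c, label)
            for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_)]

# Example usage with a positive class X_pos and a negative class X_neg:
# clusters = class_to_clusters(X_pos, +1, 25) + class_to_clusters(X_neg, -1, 25)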

  5. SCM Solution • Dual representation • Decision function
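
As a reference point for the dual representation and decision function on this slide, the standard soft-margin SVM dual written over cluster models p_1, ..., p_N with labels y_i is sketched below; SCM builds on this form with the probability product kernel of the next slide, and the paper's exact per-cluster weighting may differ from this plain sketch.

\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(p_i, p_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0,

f(p) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i y_i K(p_i, p) + b \Big).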

  6. Kernel • Probability product kernel • Under the Gaussian assumption the kernel between two cluster models has a closed form (sketched below)
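
Under the Gaussian assumption p_i = N(mu_i, Sigma_i), the probability product kernel with rho = 1 is the integral of the product of the two densities, which has the closed form K(p_i, p_j) = N(mu_i - mu_j; 0, Sigma_i + Sigma_j). A minimal numpy sketch of this formula (the function name is mine):

import numpy as np

def prob_product_kernel(mu_i, cov_i, mu_j, cov_j):
    # Closed-form probability product kernel (rho = 1) between two Gaussians:
    # the integral of N(x; mu_i, cov_i) * N(x; mu_j, cov_j) over x equals
    # the Gaussian density N(mu_i - mu_j; 0, cov_i + cov_j).
    d = mu_i.shape[0]
    S = cov_i + cov_j
    diff = mu_i - mu_j
    quad = diff @ np.linalg.solve(S, diff)               # (mu_i-mu_j)^T S^{-1} (mu_i-mu_j)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(S) ** (-0.5)
    return norm * np.exp(-0.5 * quad)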

  7. Kernel • Property I • Decision function • Property II
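
The equations behind the two properties are not reproduced on this slide. One reading, offered as my interpretation rather than the paper's exact statement, is that a test vector x can be treated as a degenerate Gaussian with zero covariance, so the kernel between a cluster model and a test point reduces (up to a constant) to that model's density at the point, giving a decision function of the form

f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i y_i \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) + b \Big).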

  8. Experiments • Datasets: Toydata; MNIST (handwritten digits ‘0’-’9’); Adult (privacy-preserving dataset) • Clustering algorithms: Threshold Order Dependent (TOD); EM algorithm • Classification methods: libSVM; SVMTorch; SVMlight; CVM (Core Vector Machine); SCM • Model selection • CPU: 3.0 GHz

  9. Toydata • Samples: 2,500 samples per class, generated from a mixture of Gaussians • Clustering algorithm: TOD • Clustering results: 25 positive clusters, 25 negative clusters

  10. MNIST • Data description • 10 classes: handwritten digits ‘0’-’9’ • Training samples: 60,000, about 6,000 per class • Testing samples: 10,000 • Construct 45 binary classifiers (one per pair of digits) • Results • 25 clusters for the EM algorithm

  11. MNIST • Test results for TOD algorithm

  12. Privacy-preserving Data Mining • Inter-Enterprise data mining • Problem: Two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information. • Horizontally partitioned • Records (users) split across companies • Example: Credit card fraud detection model • Vertically partitioned • Attributes split across companies • Example: Associations across websites

  13. Privacy-preserving Data Mining • Randomization approach (shown as a data flow on the slide): original records such as 30 | 70K | ... and 50 | 40K | ... pass through a Randomizer, which outputs perturbed records such as 65 | 20K | ... and 25 | 60K | ...; the distributions of Age, Salary, etc. are then reconstructed from the perturbed data and handed to the data mining algorithms to build the model (a minimal sketch of the randomization step follows below)
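
A minimal sketch of the randomization step only, using additive uniform noise on the numeric attributes; the noise ranges and the use of numpy here are my choices, and the later distribution-reconstruction step shown in the flow is not implemented.

import numpy as np

rng = np.random.default_rng(0)

def randomize(records, noise_scale):
    # Perturb each numeric attribute with additive uniform noise so that
    # only the randomized values ever leave the data owner.
    noise = rng.uniform(-noise_scale, noise_scale, size=records.shape)
    return records + noise

# Toy records: columns are (Age, Salary), matching the rows shown on the slide.
originals = np.array([[30.0, 70_000.0],
                      [50.0, 40_000.0]])
perturbed = randomize(originals, noise_scale=np.array([20.0, 30_000.0]))
print(perturbed)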

  14. Classification Example

  15. Privacy-preserving Dataset: Adult • Data description • Training samples: 30,162 • Testing samples: 15,060 • Percentage of positive samples: 24.78% • Procedure • Horizontally partition the data into three subsets (parties) • Cluster each party's data with the TOD algorithm • Obtain three positive and three negative GMMs • Combine them into one positive and one negative GMM with modified priors (sketched below) • Classify with SCM
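
A minimal sketch of the GMM-combination step in the procedure above, assuming that "modified priors" means rescaling each party's component weights by that party's share of the training data; this reading and the function name combine_party_gmms are mine, not the paper's exact recipe.

def combine_party_gmms(party_gmms, party_sizes):
    # Merge per-party GMMs (lists of (weight, mean, covariance) tuples) into
    # one GMM. Each component weight is rescaled by its party's share of the
    # data, so the merged priors still sum to one across all components.
    total = float(sum(party_sizes))
    merged = []
    for gmm, n in zip(party_gmms, party_sizes):
        for w, mean, cov in gmm:
            merged.append((w * n / total, mean, cov))
    return merged

# Example: merged positive model from three parties' positive GMMs.
# pos_gmm = combine_party_gmms([pos_gmm_a, pos_gmm_b, pos_gmm_c],
#                              [n_a, n_b, n_c])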

  16. Privacy-preserving Dataset: Adult • Partition results • Experimental results

  17. Discussions • Solved problems • Large-scale problems: downsample by clustering, then classify • Privacy-preserving problems: hide individual information • Differences from other methods • Training units are generative models, testing units are vectors • Training units carry complete statistical information • Only one parameter for model selection • Easy implementation • Generalization ability is not clear, whereas for the RBF kernel in SVM a larger width is known to lead to a lower VC dimension

  18. Discussions • Advantages of using priors and covariances

  19. Thank you!
