Support Vector Machines in Data Mining. AFOSR Software & Systems Annual Meeting, Syracuse, NY, June 3-7, 2002. Olvi L. Mangasarian, Data Mining Institute, University of Wisconsin - Madison
What is a Support Vector Machine? • An optimally defined surface • Linear or nonlinear in the input space • Linear in a higher dimensional feature space • Implicitly defined by a kernel function
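To illustrate what "implicitly defined by a kernel function" can mean, here is a minimal sketch using a quadratic kernel. The kernel choice, the feature map, and the sample points are assumptions made for illustration and do not come from the talk. The sketch checks that evaluating the kernel in the 2-dimensional input space gives the same number as an inner product in a 3-dimensional feature space.

```python
import math

def quadratic_kernel(x, y):
    """K(x, y) = (x . y)^2, computed directly in the 2-D input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit 3-D feature map whose inner product reproduces the kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Illustrative points (made up for this sketch).
x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = quadratic_kernel(x, y)               # never builds the features
k_explicit = dot(feature_map(x), feature_map(y))  # builds them explicitly
```

A surface that is linear in the feature coordinates is therefore quadratic in the original inputs, while only kernel evaluations are ever needed.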
What are Support Vector Machines Used For? • Classification • Regression & Data Fitting • Supervised & Unsupervised Learning
Principal Contributions
• Lagrangian support vector machine classification
  • Fast, simple, unconstrained iterative method
• Reduced support vector machine classification
  • Accurate nonlinear classifier using random sampling
• Proximal support vector machine classification
  • Classify by proximity to planes instead of halfspaces
• Massive incremental classification
  • Classify by retiring old data & adding new data
• Knowledge-based classification
  • Incorporate expert knowledge into classifier
• Fast Newton method classifier
  • Finitely terminating fast algorithm for classification
• Breast cancer prognosis & chemotherapy
  • Classify patients on basis of distinct survival curves
Principal Contributions • Proximal support vector machine classification
Support Vector Machines: Maximize the Margin between Bounding Planes [Figure: point sets A+ and A- with the two bounding planes]
Proximal Support Vector Machines: Maximize the Margin between Proximal Planes [Figure: point sets A+ and A- with the two proximal planes]
Standard Support Vector Machine: Algebra of 2-Category Linearly Separable Case • Given m points in n-dimensional space • Represented by an m-by-n matrix A • Membership of each point $A_i$ in class +1 or -1 specified by: • An m-by-m diagonal matrix D with +1 & -1 entries • Separate by two bounding planes, $x'w = \gamma \pm 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$; $A_i w \le \gamma - 1$ for $D_{ii} = -1$ • More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
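Standard Support Vector Machine Formulation • Solve the quadratic program for some $\nu > 0$: (QP) $\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2}\|w\|_2^2$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where $D_{ii} = \pm 1$ denotes A+ or A- membership. • Margin $2/\|w\|_2$ is maximized by minimizing $\tfrac{1}{2}\|w\|_2^2$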
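The link between the margin and the norm of the normal vector can be checked with a short calculation. The sketch below assumes the bounding planes are $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with $w$ the normal vector and $\gamma$ the offset (the same symbols the deck's MATLAB code returns), and measures the distance between them along the unit normal.

```latex
% Start from a point x_0 on the lower plane and move a distance t along w/||w||_2.
\begin{align*}
x_0'w &= \gamma - 1, \\
\Bigl(x_0 + t\,\tfrac{w}{\|w\|_2}\Bigr)'w &= \gamma - 1 + t\,\|w\|_2 = \gamma + 1
  \quad\Longrightarrow\quad t = \frac{2}{\|w\|_2}.
\end{align*}
% The margin 2/||w||_2 grows as ||w||_2 shrinks, so minimizing (1/2)||w||_2^2 maximizes it.
```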
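PSVM Formulation • Standard SVM formulation, with the inequality constraint replaced by an equality: (QP) $\min_{w,\gamma,y} \; \tfrac{\nu}{2}\|y\|_2^2 + \tfrac{1}{2}(w'w + \gamma^2)$ s.t. $D(Aw - e\gamma) + y = e$ • Solving for $y$ in terms of $w$ and $\gamma$ gives: $\min_{w,\gamma} \; \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \tfrac{1}{2}(w'w + \gamma^2)$ • This simple but critical modification changes the nature of the optimization problem tremendously!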
Advantages of New Formulation • Objective function remains strongly convex. • An explicit exact solution can be written in terms of the problem data. • PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space. • Exact leave-one-out-correctness can be obtained in terms of problem data.
Linear PSVM • We want to solve: $\min_{w,\gamma} \; \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \tfrac{1}{2}(w'w + \gamma^2)$ • Setting the gradient equal to zero gives a nonsingular system of linear equations. • Solution of the system gives the desired PSVM classifier.
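The gradient step can be written out as follows. This is a sketch under stated assumptions: the stacked variable $r$ is notation introduced here, $H = [A \;\; -e]$ is taken from the deck's MATLAB code, and the objective is the unconstrained PSVM problem in the form implied by that code.

```latex
% r stacks the unknowns: r = [w; gamma], so that H r = A w - e*gamma.
\begin{align*}
f(r) &= \tfrac{\nu}{2}\,\|e - DHr\|_2^2 + \tfrac{1}{2}\,\|r\|_2^2, \\
\nabla f(r) &= -\nu\,H'D\,(e - DHr) + r = 0, \\
\Bigl(\tfrac{I}{\nu} + H'H\Bigr)\,r &= H'De
  \qquad\text{(using } D'D = I\text{ and dividing by } \nu\text{)}.
\end{align*}
% I/nu + H'H is positive definite for nu > 0, hence the system is nonsingular.
```

The right-hand side $H'De$ and the matrix $I/\nu + H'H$ match the quantities `v` and `speye(n+1)/nu+H'*H` in the MATLAB code.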
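Linear PSVM Solution • $\begin{bmatrix} w \\ \gamma \end{bmatrix} = \bigl(\tfrac{I}{\nu} + H'H\bigr)^{-1} H'De$ • Here, $H = [A \;\; -e]$ • The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$ • $n$ is usually much smaller than $m$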
Linear & Nonlinear PSVM MATLAB Code
    function [w, gamma] = psvm(A,d,nu)
    % PSVM: linear and nonlinear classification
    % INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
    % [w, gamma] = psvm(A,d,nu);
    [m,n]=size(A);e=ones(m,1);H=[A -e];
    v=(d'*H)';                  % v=H'*D*e;
    r=(speye(n+1)/nu+H'*H)\v;   % solve (I/nu+H'*H)r=v
    w=r(1:n);gamma=r(n+1);      % getting w,gamma from r
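For readers without MATLAB, the same computation can be sketched in plain Python. This is an illustrative port, not the authors' code: the small Gaussian-elimination helper, the `classify` function (which assumes a point is labeled by the sign of x'w - gamma, i.e. by which side of the plane midway between the two proximal planes it falls on), and the toy data are all assumptions added here.

```python
def solve_linear_system(M, v):
    """Solve M r = v by Gaussian elimination with partial pivoting."""
    k = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[row][j] -= factor * aug[col][j]
    r = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * r[j] for j in range(i + 1, k))
        r[i] = (aug[i][k] - tail) / aug[i][i]
    return r

def psvm(A, d, nu):
    """Linear PSVM: solve (I/nu + H'H) r = H'De with H = [A, -e]."""
    n = len(A[0])
    H = [list(row) + [-1.0] for row in A]
    M = [[(1.0 / nu if i == j else 0.0) + sum(h[i] * h[j] for h in H)
          for j in range(n + 1)] for i in range(n + 1)]
    v = [sum(di * h[i] for di, h in zip(d, H)) for i in range(n + 1)]
    r = solve_linear_system(M, v)
    return r[:n], r[n]

def classify(x, w, gamma):
    """Assumed decision rule: sign of x'w - gamma."""
    score = sum(xi * wi for xi, wi in zip(x, w)) - gamma
    return 1 if score >= 0 else -1

# Toy two-class data (made up for illustration); d holds the diagonal of D.
A = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
d = [1, 1, -1, -1]
w, gamma = psvm(A, d, nu=1.0)
predictions = [classify(x, w, gamma) for x in A]
```

Only an (n+1)-by-(n+1) system is solved regardless of the number of points m, which is why a simple dense solver suffices when n is small.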
Numerical Experiments: One-Billion Two-Class Dataset • Synthetic dataset consisting of 1 billion points in 10-dimensional input space • Generated by NDC (Normally Distributed Clustered) dataset generator • Dataset divided into 500 blocks of 2 million points each • Solution obtained in less than 2 hours and 26 minutes • About 30% of the time was spent reading data from disk • Testing set correctness: 90.79%
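One way block-wise processing of this kind can work is sketched below. It rests on an observation about the linear PSVM system in the MATLAB code: both H'H and H'De are sums over data rows, so each block can contribute its share and then be discarded, and an old block can be retired by subtracting its share. The function names and toy data are hypothetical, and this is not the code used for the experiment reported above.

```python
import copy

def block_statistics(A_block, d_block):
    """Return (H'H, H'De) for one block of rows, with H = [A, -e]."""
    n1 = len(A_block[0]) + 1
    HtH = [[0.0] * n1 for _ in range(n1)]
    Htd = [0.0] * n1
    for row, label in zip(A_block, d_block):
        h = list(row) + [-1.0]
        for i in range(n1):
            Htd[i] += label * h[i]
            for j in range(n1):
                HtH[i][j] += h[i] * h[j]
    return HtH, Htd

def accumulate(total, block, sign=1.0):
    """Add (sign=+1) or retire (sign=-1) a block's statistics in place."""
    total_M, total_v = total
    block_M, block_v = block
    for i in range(len(total_v)):
        total_v[i] += sign * block_v[i]
        for j in range(len(total_v)):
            total_M[i][j] += sign * block_M[i][j]

# Toy data split into two blocks (values made up for illustration).
block_1 = ([[2.0, 2.0], [3.0, 3.0]], [1, 1])
block_2 = ([[-2.0, -2.0], [-3.0, -3.0]], [-1, -1])

total = ([[0.0] * 3 for _ in range(3)], [0.0] * 3)
for A_b, d_b in (block_1, block_2):
    accumulate(total, block_statistics(A_b, d_b))

# Same statistics computed from all rows at once, for comparison.
full = block_statistics(block_1[0] + block_2[0], block_1[1] + block_2[1])

# Retire the oldest block by subtracting its statistics.
after_retire = copy.deepcopy(total)
accumulate(after_retire, block_statistics(*block_1), sign=-1.0)
```

Memory use depends only on the input dimension n, not on the number of points, since just an (n+1)-by-(n+1) matrix and an (n+1)-vector are kept between blocks.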
Principal Contributions • Knowledge-based classification
Incorporating Knowledge Sets Into an SVM Classifier • Suppose that the knowledge set $\{x \mid Bx \le b\}$ belongs to the class A+. Hence it must lie in the halfspace $\{x \mid x'w \ge \gamma + 1\}$ • We therefore have the implication: $Bx \le b \;\Longrightarrow\; x'w \ge \gamma + 1$ • This implication is equivalent to a set of constraints that can be imposed on the classification problem.
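A sketch of how such an implication can become constraints, assuming the knowledge set is a nonempty polyhedron $\{x \mid Bx \le b\}$ (the symbols $B$, $b$ and the multiplier $u$ are assumptions of this sketch, following the usual knowledge-based SVM setup). By linear programming duality, the implication holds exactly when a nonnegative multiplier vector $u$ satisfies two linear conditions:

```latex
\begin{align*}
\bigl(Bx \le b \;\Longrightarrow\; x'w \ge \gamma + 1\bigr)
\quad\Longleftrightarrow\quad
\exists\, u \ge 0:\;\; B'u + w = 0,\;\; b'u + \gamma + 1 \le 0.
\end{align*}
% Sufficiency check: for any x with Bx <= b and u >= 0,
%   -x'w = x'B'u = u'(Bx) <= u'b <= -(gamma + 1),  hence  x'w >= gamma + 1.
```

The conditions on the right are linear in $(w, \gamma, u)$, so they can be added to the classification problem without changing its type.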
Numerical Testing: The Promoter Recognition Dataset • Promoter: Short DNA sequence that precedes a gene sequence. • A promoter consists of 57 consecutive DNA nucleotides belonging to {A,G,C,T}. • Important to distinguish between promoters and nonpromoters • This distinction identifies starting locations of genes in long uncharacterized DNA sequences.
Wisconsin Breast Cancer Prognosis Dataset Description of the data • 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred • 32 numerical features • The domain theory: two simple rules used by doctors:
Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results • Doctors' rules applicable to only 32 out of 110 patients. • Only 22 of those 32 patients are classified correctly by the rules (22 of 110, i.e. 20% correctness over all patients). • KSVM linear classifier applicable to all patients, with correctness of 66.4%. • Correctness comparable to best available results using conventional SVMs. • KSVM can get classifiers based on knowledge without using any data.
Principal Contributions • Fast Newton method classifier
Fast Newton Algorithm for Classification Standard quadratic programming (QP) formulation of SVM:
Newton Algorithm • Newton algorithm terminates in a finite number of steps • Termination at global minimum • Error rate decreases linearly • Can generate complex nonlinear classifiers • By using nonlinear kernels: K(x,y)
Principal Contributions • Breast cancer prognosis & chemotherapy
Kaplan-Meier Curves for Overall Patients: With & Without Chemotherapy
Breast Cancer Prognosis & Chemotherapy: Good, Intermediate & Poor Patient Clustering
Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients
Kaplan-Meier Survival Curves for Intermediate Group: With & Without Chemotherapy
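For readers unfamiliar with the curves named on these slides, here is a minimal sketch of the standard Kaplan-Meier estimator: at each observed event time, the running survival probability is multiplied by the fraction of at-risk patients who survive that time. The follow-up data below are made up for illustration and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times and event indicators.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients (event indicator 0) reduce the at-risk count without lowering the curve, which is what lets groups with different follow-up lengths be compared.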
Conclusion • New methods for classification proposed • All based on rigorous mathematical foundation • Fast computational algorithms capable of classifying massive datasets • Classifiers based on both abstract prior knowledge as well as conventional datasets • Identification of breast cancer patients that can benefit from chemotherapy
Future Work • Extend proposed methods to standard optimization problems • Linear & quadratic programming • Preliminary results beat state-of-the-art software • Incorporate abstract concepts into optimization problems as constraints • Develop fast online algorithms for intrusion and fraud detection • Classify the effectiveness of new drug cocktails in combating various forms of cancer • Encouraging preliminary results