
Support Vector Machine - SVM




Presentation Transcript


  1. Support Vector Machine - SVM

  2. Outline • Background: Classification Problem • SVM • Linear Separable SVM • Lagrange Multiplier Method • Karush-Kuhn-Tucker (KKT) Conditions • Non-linear SVM: Kernel • Non-Linear Separable SVM • Lagrange Multiplier Method • Karush-Kuhn-Tucker (KKT) Conditions

  3. Background – Classification Problem • The goal of classification is to organize and categorize data into distinct classes • A model is first built from previously seen data (training samples) • This model is then used to classify new data (unseen samples) • A sample is characterized by a set of features • Classification is essentially finding the best boundary between classes

  4. Classification Formulation • Given • an input space X • a set of classes Ω = {ω1, ω2, …, ωc} • the Classification Problem is • to define a mapping f: X → Ω where each x in X is assigned to one class • This mapping function is called a Decision Function

  5. Decision Function • The basic problem in classification is to find c decision functions d1(x), d2(x), …, dc(x) with the property that, if a pattern x belongs to class i, then di(x) > dj(x) for all j ≠ i, where di(x) is some similarity measure between x and class i, such as a distance or a probability

  6. Decision Function • Example: the boundaries di = dj partition the feature space into regions; x is assigned to Class 1 where d2, d3 < d1, to Class 2 where d1, d3 < d2, and to Class 3 where d1, d2 < d3

  7. Single Classifier • Most popular single classifiers: • Minimum Distance Classifier • Bayes Classifier • K-Nearest Neighbor • Decision Tree • Neural Network • Support Vector Machine

  8. Minimum Distance Classifier • Simplest approach to selection of decision boundaries • Each class ωj is represented by a prototype (or mean) vector mj = (1/Nj) Σ x over the training samples x in ωj, where Nj = the number of pattern vectors from ωj • A new unlabelled sample is assigned to the class whose prototype is closest to the sample
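
A minimal NumPy sketch of the minimum distance classifier described above; the names (fit_prototypes, X_train, y_train) are illustrative, not from the slides.

```python
import numpy as np

def fit_prototypes(X_train, y_train):
    """Compute one prototype (mean vector) per class from the training samples."""
    return {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def predict_min_distance(prototypes, x):
    """Assign x to the class whose prototype is closest (Euclidean distance)."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))
```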

  9. Bayes Classifier • Bayes rule: P(ωj | x) = p(x | ωj) P(ωj) / p(x) • p(x) is the same for each class, therefore • Assign x to class j if • p(x | ωj) P(ωj) > p(x | ωi) P(ωi) for all i ≠ j

  10. Bayes Classifier • The following information must be known: • The probability density functions of the patterns in each class • The probability of occurrence of each class • Training samples may be used to obtain estimates of these probability functions • Samples are assumed to follow a known distribution
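
A minimal sketch of a Bayes classifier under the common assumption that each class-conditional density is Gaussian; the densities and priors are estimated from training samples as the slide describes, and all names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_bayes(X_train, y_train):
    """Estimate a Gaussian density p(x|class) and a prior P(class) for each class."""
    model = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        model[c] = {"prior": len(Xc) / len(X_train),
                    "dist": multivariate_normal(Xc.mean(axis=0), np.cov(Xc, rowvar=False))}
    return model

def predict_bayes(model, x):
    """Assign x to the class j that maximizes p(x|j) * P(j)."""
    return max(model, key=lambda c: model[c]["dist"].pdf(x) * model[c]["prior"])
```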

  11. K-Nearest Neighbor • K-Nearest Neighbor Rule (k-NNR) • Examine the labels of the k nearest samples and classify by using a majority voting scheme • [Figure: a query point at (7, 3) is classified by majority vote among its 1, 3, 5, 7 and 9 nearest neighbours; the intuition is that similar samples cluster together]
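
A minimal NumPy sketch of the k-NN rule; the default k = 3 is illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    distances = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(distances)[:k]               # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]
```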

  12. Decision Tree • The decision boundaries are hyper-planes parallel to the feature axes • A sequential classification procedure may be developed by considering successive partitions of the feature space R

  13. Decision Trees • Example
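
A toy sketch of the axis-parallel partitioning described on slide 12; the thresholds and class names are made up for illustration.

```python
def classify(x):
    """A toy decision tree over two features x = (x1, x2); each test splits
    the feature space with a hyper-plane parallel to one axis."""
    if x[0] <= 2.5:        # first partition: threshold on feature 1
        return "Class 1"
    elif x[1] <= 1.0:      # second partition: threshold on feature 2
        return "Class 2"
    else:
        return "Class 3"
```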

  14. Neural Network • A Neural Network generally maps a set of inputs to a set of outputs • The number of inputs/outputs may vary • The network itself is composed of an arbitrary number of nodes connected in an arbitrary topology • It is a universal approximator

  15. Neural Network • A popular NN is the feed-forward neural network • E.g. • Multi-Layer Perceptron (MLP) • Radial Basis Function (RBF) network • Learning algorithm: Back Propagation • Weights of nodes are adjusted based on how well the current weights match an objective

  16. What is SVM? • SVM is a kind of classification algorithm • Based on Statistical Learning Theory • Similar to NN • Widely used in binary classification • Users do not need to supply any classification rules • When new data arrives, SVM can predict which class it should belong to

  17. SVM for classification • Given a sequence of training vectors (x1, y1), …, (xn, yn) with labels yi ∈ {+1, −1}

  18. Hyperplane (the optimal hyperplane) • SVM solves the following problem: find a hyperplane that separates two different sets • A hyperplane is a plane in a high-dimensional space • We want to find an equation that separates Class 1 from Class 2 • The larger the distance (margin) between this hyperplane and the two sets, the better

  19. Hyperplane (cont.) • [Figure: a separating hyperplane shown relative to the coordinate origin]

  20. Hyperplane (cont.) • Given a set S = {(x1, y1), …, (xn, yn)} • Find a hyperplane w·x + b = 0, i.e. a decision function f(x) = w·x + b (w·x denotes the dot product / inner product)

  21. Hyperplane (cont.) • We can classify the data set according to the sign of the function f(x) • A hyperplane that separates the two classes: Separating Hyperplane • The separating hyperplane whose margin to both sides is largest: Optimal Separating Hyperplane (OSH)
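
A minimal sketch of classifying with a given separating hyperplane; the values of w, b and the sample point are made up for illustration.

```python
import numpy as np

def classify(w, b, x):
    """Assign x to class +1 or -1 according to the sign of f(x) = w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w, b = np.array([1.0, -2.0]), 0.5               # illustrative hyperplane
print(classify(w, b, np.array([3.0, 1.0])))     # -> 1, since f(x) = 1.5 > 0
```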

  22. Example

  23. Support Hyperplane • The hyperplanes parallel to the optimal separating hyperplane and closest to the data on each side • The support hyperplanes are defined as H1: w·x + b = +1 and H2: w·x + b = −1 • The right-hand side is scaled to ±1 by multiplying the equation by a constant (this narrows the solution range: the same hyperplane otherwise has infinitely many equivalent equations) • w is normal to the hyperplane; the perpendicular distance from the hyperplane to the origin is |b|/||w||

  24. Support Hyperplane (cont.) • D: distance between the separating hyperplane and each support hyperplane, D = 1/||w|| • Margin = distance between H1 and H2 = 2D = 2/||w|| • The smaller ||w|| is, the larger D (and hence the margin) is
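
A short derivation of the margin width quoted above, using the standard point-to-hyperplane distance formula.

```latex
\[
  d(x_0) = \frac{|\,w\cdot x_0 + b\,|}{\|w\|}
  \qquad \text{(distance from a point } x_0 \text{ to the hyperplane } w\cdot x + b = 0)
\]
\[
  \text{For } x_0 \text{ on } H_1:\ w\cdot x_0 + b = +1
  \;\Rightarrow\; D = \frac{1}{\|w\|},
  \qquad \text{Margin} = 2D = \frac{2}{\|w\|}.
\]
```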

  25. Support Hyperplane (cont.) • After scaling, the constraint function can be defined as yi(w·xi + b) ≥ 1 for all i (i.e. w·xi + b ≥ +1 when yi = +1 and w·xi + b ≤ −1 when yi = −1)

  26. SVM Problem • Goal: Find a separating hyperplane with the largest margin. An SVM finds the w and b that minimize ½||w||² (i.e. maximize the margin 2/||w||) subject to yi(w·xi + b) ≥ 1 for all i
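
The same goal written out as a constrained optimization problem (the standard hard-margin primal form).

```latex
\[
  \min_{w,\,b}\ \tfrac{1}{2}\,\|w\|^{2}
  \qquad \text{subject to} \qquad
  y_i\,(w\cdot x_i + b) \;\ge\; 1, \quad i = 1,\dots,n.
\]
```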

  27. SVM Problem (cont.) • Switch the above problem to a Lagrangian formulation for two reasons • Easier to handle: the problem becomes a quadratic programming problem in the multipliers • Training data only appear in the form of dot products (inner products) between vectors => can be generalized to the nonlinear case

  28. Lagrange Multiplier Method • A method to find the extremum of a multivariate function f(x1, x2, …, xn) subject to the constraint g(x1, x2, …, xn) = 0 • For an extremum of f to exist on g, the gradient of f must line up with the gradient of g: ∂f/∂xk = λ ∂g/∂xk for all k = 1, …, n, where the constant λ is called the Lagrange multiplier • The Lagrangian transformation of the problem is L(w, b, α) = ½||w||² − Σi αi [yi(w·xi + b) − 1]

  29. Lagrange Multiplier Method • To solve it, we need the gradient (partial derivatives) of L with respect to w and b: • ∂L/∂w = 0 ⇒ w = Σi αi yi xi (1) • ∂L/∂b = 0 ⇒ Σi αi yi = 0 (2) • Substituting them into the Lagrangian form, we obtain a dual problem in inner-product form => can be generalized to the nonlinear case by applying a kernel
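
Substituting (1) and (2) back into the Lagrangian gives the dual problem; the standard steps are written out here.

```latex
\[
  L(w, b, \alpha) = \tfrac{1}{2}\|w\|^{2}
    - \sum_{i} \alpha_i \bigl[\, y_i (w\cdot x_i + b) - 1 \,\bigr]
\]
\[
  \text{Dual:}\quad
  \max_{\alpha}\ \sum_{i} \alpha_i
    - \tfrac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
  \qquad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0.
\]
```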

  30. KKT Conditions • Since the SVM problem is convex, the KKT conditions are necessary and sufficient for w, b and α to be a solution • w is determined by the training procedure: w = Σi αi yi xi • b is easily found from the KKT complementary slackness condition, by choosing any i for which αi ≠ 0: b = yi − w·xi

  31. Lagrange Multiplier Method • Karush-Kuhn-Tucker (KKT) conditions
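
For reference, the KKT conditions for the hard-margin problem in their standard form.

```latex
\[
\begin{aligned}
  &\partial L/\partial w = 0: && w - \textstyle\sum_i \alpha_i y_i x_i = 0 \\
  &\partial L/\partial b = 0: && \textstyle\sum_i \alpha_i y_i = 0 \\
  &\text{Primal feasibility:} && y_i (w\cdot x_i + b) - 1 \ge 0 \\
  &\text{Dual feasibility:} && \alpha_i \ge 0 \\
  &\text{Complementary slackness:} && \alpha_i \bigl[\, y_i (w\cdot x_i + b) - 1 \,\bigr] = 0
\end{aligned}
\]
```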

  32. Support Vector • A training sample xi is a support vector if it • satisfies the KKT conditions with αi > 0 • lies on a support hyperplane, i.e. yi(w·xi + b) = 1

  33. Non-Linear Separable SVM: Kernel • To extend to the non-linear case, we need to map the data to some other Euclidean space

  34. Kernel (cont.) • Project the non-linear data into a higher-dimensional space, the feature space

  35. Kernel Function • Since the training algorithm depends on the data only through dot products, we can use a "kernel function" K such that K(xi, xj) = Φ(xi)·Φ(xj) • One commonly used example is the radial basis function (RBF) kernel • An RBF is a real-valued function whose value depends only on the distance from the origin, so that Φ(x) = Φ(||x||); or alternatively on the distance from some other point c, called a center, so that Φ(x, c) = Φ(||x − c||)
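
A minimal NumPy sketch of computing an RBF (Gaussian) kernel matrix; the bandwidth parameter sigma is an assumption, since the slides do not fix one. In the dual, every dot product xi·xj is simply replaced by K(xi, xj).

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * sigma^2)) -- the Gaussian RBF kernel."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))
```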

  36. Non-Separable Cases

  37. Non-separable SVM • Real-world applications usually have no OSH, so we add an error (slack) term ξi ≥ 0 => yi(w·xi + b) ≥ 1 − ξi • To penalize the error terms, define the objective ½||w||² + C Σi ξi, where C controls the penalty • The new Lagrangian form is analogous to the linear SVM case (compare with the linear SVM)

  38. Non-separable SVM • New KKT Conditions
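
For reference, the soft-margin formulation and its KKT conditions in their standard form.

```latex
\[
  \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^{2} + C \sum_i \xi_i
  \qquad \text{s.t.} \qquad
  y_i (w\cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
\]
\[
  \text{Dual:}\quad
  \max_{\alpha}\ \sum_i \alpha_i
    - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
  \qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0
\]
\[
  \alpha_i \bigl[\, y_i (w\cdot x_i + b) - 1 + \xi_i \,\bigr] = 0,
  \qquad (C - \alpha_i)\, \xi_i = 0
  \qquad \text{(complementary slackness)}
\]
```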
