Support Vector Machines

Presentation Transcript


  1. Support Vector Machines CSE 4309 – Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

  2. A Linearly Separable Problem • Consider the binary classification problem on the figure. • The blue points belong to one class, with label +1. • The orange points belong to the other class, with label -1. • These two classes are linearly separable. • Infinitely many lines separate them. • Are any of those infinitely many lines preferable?

  3. A Linearly Separable Problem • Do we prefer the blue line or the red line, as decision boundary? • What criterion can we use? • Both decision boundaries classify the training data with 100% accuracy.

  4. Margin of a Decision Boundary • The margin of a decision boundary is defined as the smallest distance between the boundary and any of the samples.

  5. Margin of a Decision Boundary • One way to visualize the margin is this: • For each class, draw a line that: • is parallel to the decision boundary. • touches the class point that is the closest to the decision boundary. • The margin is the smallest distance between the decision boundary and one of those two parallel lines. • In this example, the decision boundary is equally far from both lines.

  6. Support Vector Machines • One way to visualize the margin is this: • For each class, draw a line that: • is parallel to the decision boundary. • touches the class point that is the closest to the decision boundary. • The margin is the smallest distance between the decision boundary and one of those two parallel lines.

  7. Support Vector Machines • Support Vector Machines (SVMs) are a classification method, whose goal is to find the decision boundary with the maximum margin. • The idea is that, even if multiple decision boundaries give 100% accuracy on the training data, larger margins lead to less overfitting. • Larger margins can tolerate more perturbations of the data.
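For concreteness, the margin of a candidate boundary can be computed directly from this definition. The following minimal numpy sketch uses made-up values for `w`, `b`, and the samples `X` (they are not taken from the figure):

```python
import numpy as np

# Hypothetical boundary w^T x + b = 0 and a few made-up 2D samples.
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 4.0],
              [4.0, 3.0]])

# The margin: the smallest distance between the boundary and any sample.
margin = np.min(np.abs(X @ w + b)) / np.linalg.norm(w)
print(margin)  # about 0.707 for these values
```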

  8. Support Vector Machines • Note: so far, we are only discussing cases where the training data is linearly separable. • First, we will see how to maximize the margin for such data. • Second, we will deal with data that are not linearly separable. • We will define SVMs that classify such training data imperfectly. • Third, we will see how to define nonlinear SVMs, which can define non-linear decision boundaries.

  9. Support Vector Machines • Note: so far, we are only discussing cases where the training data is linearly separable. • First, we will see how to maximize the margin for such data. • Second, we will deal with data that are not linearly separable. • We will define SVMs that classify such training data imperfectly. • Third, we will see how to define nonlinear SVMs, which can define non-linear decision boundaries. An example of a nonlinear decision boundary produced by a nonlinear SVM.

  10. Support Vectors • In the figure, the red line is the maximum margin decision boundary. • One of the parallel lines touches a single orange point. • If that orange point moves closer to or farther from the red line, the optimal boundary changes. • If other orange points move, the optimal boundary does not change, unless those points move to the right of the blue line.

  11. Support Vectors • In the figure, the red line is the maximum margin decision boundary. • One of the parallel lines touches two blue points. • If either of those points moves closer to or farther from the red line, the optimal boundary changes. • If other blue points move, the optimal boundary does not change, unless those points move to the left of the blue line.

  12. Support Vectors • In summary, in this example, the maximum margin is defined by only three points: • One orange point. • Two blue points. • These points are called support vectors. • They are indicated by a black circle around them.

  13. Distances to the Boundary • The decision boundary consists of all points $x$ that are solutions to the equation $w^T x + b = 0$. • $w$ is a column vector of parameters (weights). • $x$ is an input vector. • $b$ is a scalar value (a real number). • If $x_n$ is a training point, its distance to the boundary is computed using this equation: $\frac{|w^T x_n + b|}{\|w\|}$

  14. Distances to the Boundary • If $x_n$ is a training point, its distance to the boundary is computed using this equation: $\frac{|w^T x_n + b|}{\|w\|}$ • Since the training data are linearly separable, the data from each class should fall on opposite sides of the boundary. • Suppose that $t_n = 1$ for points of one class, and $t_n = -1$ for points of the other class. • Then, we can rewrite the distance as: $\frac{t_n(w^T x_n + b)}{\|w\|}$

  15. Distances to the Boundary • So, given a decision boundary defined by $w$ and $b$, and given a training input $x_n$, the distance of $x_n$ to the boundary is: $\frac{t_n(w^T x_n + b)}{\|w\|}$ • If $t_n = 1$, then $w^T x_n + b > 0$, so $t_n(w^T x_n + b) > 0$. • If $t_n = -1$, then $w^T x_n + b < 0$, so $t_n(w^T x_n + b) > 0$. • So, in all cases, $t_n(w^T x_n + b)$ is positive.
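To make the distance formula concrete, here is a minimal numpy sketch; the boundary `w`, `b` and the labeled points `X`, `t` below are made-up values, not from the slides:

```python
import numpy as np

# Hypothetical boundary and labeled training points.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 1.0],
              [-1.0, 2.0]])
t = np.array([1, -1])  # class labels in {+1, -1}

# Distance of each training point to the boundary:
# t_n (w^T x_n + b) / ||w||, positive when the point is correctly classified.
distances = t * (X @ w + b) / np.linalg.norm(w)
print(distances)  # both values are positive here
```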

  16. Optimization Criterion • If $x_n$ is a training point, its distance to the boundary is computed using this equation: $\frac{t_n(w^T x_n + b)}{\|w\|}$ • Therefore, the optimal boundary $(w_*, b_*)$ is defined as: $(w_*, b_*) = \arg\max_{w,b}\left\{\frac{1}{\|w\|}\min_n\left[t_n(w^T x_n + b)\right]\right\}$ • In words: find the $w$ and $b$ that maximize the minimum distance of any training input from the boundary.

  17. Optimization Criterion • The optimal boundary is defined as: $(w_*, b_*) = \arg\max_{w,b}\left\{\frac{1}{\|w\|}\min_n\left[t_n(w^T x_n + b)\right]\right\}$ • Suppose that, for some values $w$ and $b$, the decision boundary defined by $w^T x + b = 0$ misclassifies some objects. • Can those values of $w$ and $b$ be selected as $(w_*, b_*)$?

  18. Optimization Criterion • The optimal boundary is defined as: $(w_*, b_*) = \arg\max_{w,b}\left\{\frac{1}{\|w\|}\min_n\left[t_n(w^T x_n + b)\right]\right\}$ • Suppose that, for some values $w$ and $b$, the decision boundary defined by $w^T x + b = 0$ misclassifies some objects. • Can those values of $w$ and $b$ be selected as $(w_*, b_*)$? • No. • If some objects get misclassified, then, for some $x_n$ it holds that $t_n(w^T x_n + b) < 0$. • Thus, for such $w$ and $b$, the expression $\min_n\left[t_n(w^T x_n + b)\right]$ will be negative. • Since the data is linearly separable, we can find better values for $w$ and $b$, for which that expression will be greater than 0.

  19. Scale of $w$ • The optimal boundary is defined as: $(w_*, b_*) = \arg\max_{w,b}\left\{\frac{1}{\|w\|}\min_n\left[t_n(w^T x_n + b)\right]\right\}$ • Suppose that $c$ is a real number, and $c > 0$. • If $w$ and $b$ define an optimal boundary, then $cw$ and $cb$ also define an optimal boundary. • We constrain the scale of $w$ to a single value, by requiring that: $\min_n\left[t_n(w^T x_n + b)\right] = 1$

  20. Optimization Criterion • We introduced the requirement that: $\min_n\left[t_n(w^T x_n + b)\right] = 1$ • Therefore, for any $x_n$, it holds that: $t_n(w^T x_n + b) \geq 1$ • The original optimization criterion becomes: $(w_*, b_*) = \arg\max_{w,b}\frac{1}{\|w\|} = \arg\min_{w,b}\|w\| = \arg\min_{w,b}\frac{1}{2}\|w\|^2$ • These are equivalent formulations. The textbook uses the last one because it simplifies subsequent calculations.

  21. Constrained Optimization • Summarizing the previous slides, we want to find: $(w_*, b_*) = \arg\min_{w,b}\frac{1}{2}\|w\|^2$ subject to the following constraints: $t_n(w^T x_n + b) \geq 1$, for $n = 1, \dots, N$ • This is a different optimization problem than what we have seen before. • We need to minimize a quantity while satisfying a set of inequalities. • This type of problem is a constrained optimization problem.

  22. Quadratic Programming • Our constrained optimization problem can be solved using a method called quadratic programming. • Describing quadratic programming in depth is outside the scope of this course. • Our goal is simply to understand how to use quadratic programming as a black box, to solve our optimization problem. • This way, you can use any quadratic programming toolkit (Matlab includes one).

  23. Quadratic Programming • The quadratic programming problem is defined as follows: • Inputs: • $d$: an $L$-dimensional column vector. • $Q$: an $L \times L$ symmetric matrix. • $A$: an $M \times L$ matrix (one row per constraint). • $c$: an $M$-dimensional column vector. • Output: • $u$: an $L$-dimensional column vector, such that: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ subject to constraint: $Au \geq c$
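As an illustration of treating the quadratic programming solver as a black box, here is a minimal sketch using the `cvxopt` package (an assumption; the course mentions Matlab's toolkit, and any solver accepting this form works). Note that `cvxopt.solvers.qp` expects inequality constraints written as $Gu \leq h$, so a constraint $Au \geq c$ is passed as $G = -A$, $h = -c$:

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy problem: minimize (1/2) u^T Q u + d^T u  subject to  A u >= c,
# here: minimize u1^2 + u2^2 subject to u1 + u2 >= 1.
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])
d = np.array([0.0, 0.0])
A = np.array([[1.0, 1.0]])
c = np.array([1.0])

# cvxopt solves constraints of the form G u <= h, so A u >= c becomes -A u <= -c.
sol = solvers.qp(matrix(Q), matrix(d), matrix(-A), matrix(-c))
u = np.array(sol['x']).flatten()
print(u)  # approximately [0.5, 0.5]
```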

  24. Quadratic Programming for SVMs • Quadratic Programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ subject to constraint: $Au \geq c$ • SVM goal: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ subject to constraints: $t_n(w^T x_n + b) \geq 1$, for $n = 1, \dots, N$ • We need to define appropriate values of $u$, $Q$, $d$, $A$, $c$, so that quadratic programming computes $w$ and $b$.

  25. Quadratic Programming for SVMs • Quadratic Programming constraint: $Au \geq c$ • SVM constraints: $t_n(w^T x_n + b) \geq 1$, for $n = 1, \dots, N$

  26. Quadratic Programming for SVMs • SVM constraints: $t_n(w^T x_n + b) \geq 1$, for $n = 1, \dots, N$ • Define: $u = (b, w_1, \dots, w_D)^T$, $c = (1, 1, \dots, 1)^T$, and $A$ as the matrix whose $n$-th row is $t_n(1, x_n^T)$. • Matrix $A$ is $N \times (D+1)$, vector $u$ has $D+1$ rows, vector $c$ has $N$ rows. • The $n$-th row of $Au$ is $t_n(w^T x_n + b)$, which should be $\geq 1$. • SVM constraint ⟺ quadratic programming constraint $Au \geq c$.

  27. Quadratic Programming for SVMs • Quadratic programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ • SVM: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ • Define: $u = (b, w_1, \dots, w_D)^T$ (already defined in the previous slides), and $Q$, $d$ as follows: • $Q$ is like the $(D+1) \times (D+1)$ identity matrix, except that $Q_{11} = 0$. • Then, $\frac{1}{2}u^T Q u = \frac{1}{2}\|w\|^2$. • $d$: $(D+1)$-dimensional column vector of zeros.

  28. Quadratic Programming for SVMs • Quadratic programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ • SVM: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ • Alternative definitions that would NOT work: • Define: $u = w$, $Q$ is the $D \times D$ identity matrix, $d$ is the $D$-dimensional zero vector. • It still holds that $\frac{1}{2}u^T Q u = \frac{1}{2}\|w\|^2$. • Why would these definitions not work?

  29. Quadratic Programming for SVMs • Quadratic programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ • SVM: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ • Alternative definitions that would NOT work: • Define: $u = w$, $Q$ is the $D \times D$ identity matrix, $d$ is the $D$-dimensional zero vector. • It still holds that $\frac{1}{2}u^T Q u = \frac{1}{2}\|w\|^2$. • Why would these definitions not work? • Vector $u$ must also make $Au$ match the SVM constraints. • With this definition of $u$, no appropriate $A$ and $c$ can be found.

  30. Quadratic Programming for SVMs • Quadratic programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ subject to constraint: $Au \geq c$ • SVM goal: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ subject to constraints: $t_n(w^T x_n + b) \geq 1$ • Task: define $u$, $Q$, $d$, $A$, $c$, so that quadratic programming computes $w$ and $b$: • $u = (b, w_1, \dots, w_D)^T$: $D+1$ rows. • $Q$: like the $(D+1) \times (D+1)$ identity matrix, except that $Q_{11} = 0$. • $d$: the $(D+1)$-dimensional zero vector. • $A$: $N$ rows, $D+1$ columns; the $n$-th row is $t_n(1, x_n^T)$. • $c$: the $N$-dimensional vector of ones.

  31. Quadratic Programming for SVMs • Quadratic programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ subject to constraint: $Au \geq c$ • SVM goal: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ subject to constraints: $t_n(w^T x_n + b) \geq 1$ • Task: define $u$, $Q$, $d$, $A$, $c$, so that quadratic programming computes $w$ and $b$ (as defined in the previous slide). • Quadratic programming takes as inputs $Q$, $d$, $A$, $c$, and outputs $u$, from which we get the $w$, $b$ values for our SVM.

  32. Quadratic Programming for SVMs • Quadratic programming: $u = \arg\min_u\left(\frac{1}{2}u^T Q u + d^T u\right)$ subject to constraint: $Au \geq c$ • SVM goal: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ subject to constraints: $t_n(w^T x_n + b) \geq 1$ • Task: define $u$, $Q$, $d$, $A$, $c$, so that quadratic programming computes $w$ and $b$ (as defined in the previous slides). • $w$ is the vector of values at dimensions $2, \dots, D+1$ of $u$. • $b$ is the value at dimension $1$ of $u$.
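Putting the pieces together, here is a hedged sketch of training the maximum-margin linear SVM by handing these matrices to a QP solver (again assuming `cvxopt`; the function name `train_linear_svm` and the variable names are made up for illustration):

```python
import numpy as np
from cvxopt import matrix, solvers

def train_linear_svm(X, t):
    """X: N x D array of inputs; t: length-N array of labels in {+1, -1}."""
    N, D = X.shape
    Q = np.eye(D + 1)
    Q[0, 0] = 0.0                                      # so (1/2) u^T Q u = (1/2) ||w||^2
    d = np.zeros(D + 1)
    A = t[:, None] * np.hstack([np.ones((N, 1)), X])   # n-th row: t_n (1, x_n^T)
    c = np.ones(N)
    # cvxopt solves: min (1/2) u^T Q u + d^T u  s.t.  G u <= h,
    # so the constraint A u >= c becomes -A u <= -c.
    sol = solvers.qp(matrix(Q), matrix(d), matrix(-A), matrix(-c))
    u = np.array(sol['x']).flatten()
    b, w = u[0], u[1:]                                 # u = (b, w_1, ..., w_D)
    return w, b
```

For linearly separable data, the returned `w`, `b` define the maximum-margin boundary described in the earlier slides.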

  33. Solving the Same Problem, Again • So far, we have solved the problem of defining an SVM (i.e., defining $w$ and $b$), so as to maximize the margin between linearly separable data. • If this were all that SVMs could do, SVMs would not be that important. • Linearly separable data are a rare case, and a very easy case to deal with.

  34. Solving the Same Problem, Again • So far, we have solved the problem of defining an SVM (i.e., defining $w$ and $b$), so as to maximize the margin between linearly separable data. • We will see two extensions that make SVMs much more powerful. • The extensions will allow SVMs to define highly non-linear decision boundaries, as in this figure. • However, first we need to solve the same problem again. • Maximize the margin between linearly separable data. • We will get a more complicated solution, but that solution will be easier to improve upon.

  35. Lagrange Multipliers • Our new solutions are derived using Lagrange multipliers. • Here is a quick review from multivariate calculus. • Let $x$ be a $D$-dimensional vector. • Let $f(x)$ and $g(x)$ be functions from $\mathbb{R}^D$ to $\mathbb{R}$. • Functions $f$ and $g$ map $D$-dimensional vectors to real numbers. • Suppose that we want to minimize $f(x)$, subject to the constraint that $g(x) \geq 0$. • Then, we can solve this problem using a Lagrange multiplier $\lambda$ to define a Lagrangian function.

  36. Lagrange Multipliers • To minimize $f(x)$, subject to the constraint $g(x) \geq 0$: • We define the Lagrangian function: $L(x, \lambda) = f(x) - \lambda\, g(x)$ • $\lambda$ is called a Lagrange multiplier, $\lambda \geq 0$. • We find $x_*$, and a corresponding value for $\lambda$, subject to the following constraints: $g(x_*) \geq 0$, $\lambda \geq 0$, $\lambda\, g(x_*) = 0$. • If $g(x_*) > 0$, the third constraint implies that $\lambda = 0$. Then, the constraint is called inactive. • If $\lambda > 0$, then $g(x_*) = 0$. Then, constraint $g(x) \geq 0$ is called active.
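A small worked example, made up here and not from the slides, may help: minimize $f(x) = x^2$ subject to $g(x) = x - 1 \geq 0$.

```latex
% Worked example: minimize f(x) = x^2 subject to g(x) = x - 1 >= 0.
\begin{align*}
L(x, \lambda) &= x^2 - \lambda (x - 1), \qquad \lambda \ge 0 \\
\frac{\partial L}{\partial x} &= 2x - \lambda = 0
  \;\Rightarrow\; x = \tfrac{\lambda}{2} \\
\lambda = 0 &\;\Rightarrow\; x = 0,\ \text{but then } g(0) = -1 < 0
  \text{ (infeasible), so the constraint must be active} \\
\lambda > 0 &\;\Rightarrow\; g(x_*) = 0 \;\Rightarrow\; x_* = 1,\ \lambda = 2.
\end{align*}
```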

  37. Multiple Constraints • Suppose that we have $N$ constraints: $g_n(x) \geq 0$, for $n = 1, \dots, N$. • We want to minimize $f(x)$, subject to those constraints. • Define vector $\lambda = (\lambda_1, \dots, \lambda_N)$. • Define the Lagrangian function as: $L(x, \lambda) = f(x) - \sum_{n=1}^{N} \lambda_n g_n(x)$ • We find $x_*$, and a value for $\lambda$, subject to: $g_n(x_*) \geq 0$, $\lambda_n \geq 0$, $\lambda_n g_n(x_*) = 0$, for $n = 1, \dots, N$.

  38. Lagrange Dual Problems • We have $N$ constraints: $g_n(x) \geq 0$, for $n = 1, \dots, N$. • We want to minimize $f(x)$, subject to those constraints. • Under some conditions (which are satisfied in our SVM problem), we can solve an alternative dual problem: • Define the Lagrangian function as before: $L(x, \lambda) = f(x) - \sum_{n=1}^{N} \lambda_n g_n(x)$ • We find $x_*$, and the best value for $\lambda$, denoted as $\lambda_*$, by solving: $\lambda_* = \arg\max_{\lambda} \min_{x} L(x, \lambda)$ subject to constraints: $\lambda_n \geq 0$, for $n = 1, \dots, N$.

  39. Lagrange Dual Problems • Lagrangian dual problem: Solve: $\lambda_* = \arg\max_{\lambda} \min_{x} L(x, \lambda)$ subject to constraints: $\lambda_n \geq 0$, for $n = 1, \dots, N$ • This dual problem formulation will be used in training SVMs. • The key thing to remember is: • We minimize the Lagrangian $L$ with respect to $x$. • We maximize $L$ with respect to the Lagrange multipliers $\lambda_n$.

  40. Lagrange Multipliers and SVMs • SVM goal: $\arg\min_{w,b}\frac{1}{2}\|w\|^2$ subject to constraints: $t_n(w^T x_n + b) \geq 1$, for $n = 1, \dots, N$ • To make the constraints more amenable to Lagrange multipliers, we rewrite them as: $t_n(w^T x_n + b) - 1 \geq 0$ • Define $a = (a_1, \dots, a_N)$ to be a vector of Lagrange multipliers. • Define the Lagrangian function: $L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} a_n\left[t_n(w^T x_n + b) - 1\right]$ • Remember from the previous slides: we minimize $L$ with respect to $w$ and $b$, and maximize with respect to $a$.

  41. Lagrange Multipliers and SVMs • The $w$ and $b$ that minimize $L(w, b, a)$ must satisfy: $\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{n=1}^{N} a_n t_n x_n$.

  42. Lagrange Multipliers and SVMs • The $w$ and $b$ that minimize $L(w, b, a)$ must satisfy: $\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0$.

  43. Lagrange Multipliers and SVMs • Our Lagrangian function is: $L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} a_n\left[t_n(w^T x_n + b) - 1\right]$ • We showed that $w = \sum_{n=1}^{N} a_n t_n x_n$. Using that, we get: $L = -\frac{1}{2}\sum_{m=1}^{N}\sum_{n=1}^{N} a_m a_n t_m t_n (x_m^T x_n) - b\sum_{n=1}^{N} a_n t_n + \sum_{n=1}^{N} a_n$

  44. Lagrange Multipliers and SVMs • We showed that: $L = -\frac{1}{2}\sum_{m=1}^{N}\sum_{n=1}^{N} a_m a_n t_m t_n (x_m^T x_n) - b\sum_{n=1}^{N} a_n t_n + \sum_{n=1}^{N} a_n$ • Define an $N \times N$ matrix $Q$ such that $Q_{mn} = t_m t_n (x_m^T x_n)$. • Remember that we have defined $a = (a_1, \dots, a_N)$. • Then, it follows that: $L = -\frac{1}{2}a^T Q a - b\sum_{n=1}^{N} a_n t_n + \sum_{n=1}^{N} a_n$
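As a quick illustration, the matrix $Q$ can be built in a couple of numpy lines; the inputs `X`, labels `t`, and multipliers `a` below are made-up values:

```python
import numpy as np

# Made-up inputs X (N = 3, D = 2) and labels t in {+1, -1}.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [0.0, 1.0]])
t = np.array([1.0, -1.0, 1.0])

# Q[m, n] = t_m * t_n * (x_m^T x_n)
Q = (t[:, None] * t[None, :]) * (X @ X.T)

# With a vector a of Lagrange multipliers, the quadratic term is a^T Q a:
a = np.array([0.1, 0.2, 0.3])
quadratic_term = a @ Q @ a
```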

  45. Lagrange Multipliers and SVMs • We showed that $\sum_{n=1}^{N} a_n t_n = 0$. Using that, we get: $L = -\frac{1}{2}a^T Q a + \sum_{n=1}^{N} a_n$ • We have shown before that the term $b\sum_{n=1}^{N} a_n t_n$ equals 0, so it drops out.

  46. Lagrange Multipliers and SVMs • Function $L$ now does not depend on $b$. • We simplify further, using again the definition $Q_{mn} = t_m t_n (x_m^T x_n)$: $L = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{m=1}^{N}\sum_{n=1}^{N} a_m a_n t_m t_n (x_m^T x_n)$

  47. Lagrange Multipliers and SVMs • Function $L$ does not depend on $w$ or $b$ anymore. • We can rewrite $L$ as a function whose only input is $a$: $L(a) = \sum_{n=1}^{N} a_n - \frac{1}{2}a^T Q a$ • Remember, we want to maximize $L$ with respect to $a$.

  48. Lagrange Multipliers and SVMs • By combining the results from the last few slides, our optimization problem becomes: Maximize $L(a) = \sum_{n=1}^{N} a_n - \frac{1}{2}a^T Q a$ subject to these constraints: $a_n \geq 0$, for $n = 1, \dots, N$, and $\sum_{n=1}^{N} a_n t_n = 0$

  49. Lagrange Multipliers and SVMs • We want to maximize $L(a)$ subject to some constraints. • Therefore, we want to find an $a_*$ such that: $a_* = \arg\max_{a}\left(\sum_{n=1}^{N} a_n - \frac{1}{2}a^T Q a\right)$ subject to those constraints.

  50. SVM Optimization Problem • Our SVM optimization problem now is to find an $a_*$ such that: $a_* = \arg\max_{a}\left(\sum_{n=1}^{N} a_n - \frac{1}{2}a^T Q a\right)$ subject to these constraints: $a_n \geq 0$, for $n = 1, \dots, N$, and $\sum_{n=1}^{N} a_n t_n = 0$ • This problem can be solved again using quadratic programming.
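As a sketch of this last step (not the course's reference implementation), the dual problem can be handed to a QP solver by flipping the sign of the objective: maximizing $L(a)$ is the same as minimizing $\frac{1}{2}a^T Q a - \sum_n a_n$. The bias recovery at the end uses the support-vector condition $t_n(w^T x_n + b) = 1$, which the slides above have not yet derived; `cvxopt` and all variable names are assumptions:

```python
import numpy as np
from cvxopt import matrix, solvers

def train_dual_svm(X, t):
    """X: N x D inputs; t: length-N labels in {+1, -1}. Returns w, b, a."""
    N = X.shape[0]
    Q = (t[:, None] * t[None, :]) * (X @ X.T)   # Q[m, n] = t_m t_n x_m^T x_n
    q = -np.ones(N)                             # from maximizing sum(a)
    G = -np.eye(N)                              # a_n >= 0  <=>  -a_n <= 0
    h = np.zeros(N)
    A_eq = t.reshape(1, N).astype(float)        # equality constraint: sum_n a_n t_n = 0
    b_eq = np.zeros(1)
    sol = solvers.qp(matrix(Q), matrix(q), matrix(G), matrix(h),
                     matrix(A_eq), matrix(b_eq))
    a = np.array(sol['x']).flatten()
    w = (a * t) @ X                             # w = sum_n a_n t_n x_n
    sv = a > 1e-6                               # support vectors have a_n > 0
    b = np.mean(t[sv] - X[sv] @ w)              # from t_n (w^T x_n + b) = 1
    return w, b, a
```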
