Classification IV


  1. Classification IV Lecturer: Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn

  2. Overview • Support Vector Machines

  3. Linear Classifier [Figure: a separating hyperplane w·x + b = 0 with normal vector w; points on one side satisfy w·x + b > 0, points on the other satisfy w·x + b < 0]
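
To make the decision rule concrete, here is a minimal sketch (not from the slides; w and b are illustrative values) of a linear classifier in Python:

```python
import numpy as np

# Hypothetical weight vector and bias, chosen only for illustration.
w = np.array([2.0, -1.0])
b = 0.5

def predict(x):
    """Linear classifier: predict +1 on the side where w.x + b > 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(predict(np.array([1.0, 0.0])))   # w.x + b = 2.5 > 0  -> +1
print(predict(np.array([-1.0, 1.0])))  # w.x + b = -2.5 < 0 -> -1
```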

  4. Distance to Hyperplane [Figure: a point x and its projection x' onto the hyperplane w·x + b = 0] The distance from x to the hyperplane is |w·x + b| / ||w||.
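
The slide's equations did not survive extraction; the following is a reconstruction of the standard derivation, writing x as its projection x' onto the hyperplane plus a multiple r of the unit normal:

```latex
% x = x' + r * w/||w||, with w.x' + b = 0 since x' lies on the hyperplane.
\begin{aligned}
  \mathbf{w}\cdot\mathbf{x} + b
    &= \mathbf{w}\cdot\left(\mathbf{x}' + r\,\frac{\mathbf{w}}{\lVert\mathbf{w}\rVert}\right) + b
     = \underbrace{\mathbf{w}\cdot\mathbf{x}' + b}_{=\,0}
       + r\,\frac{\mathbf{w}\cdot\mathbf{w}}{\lVert\mathbf{w}\rVert}
     = r\,\lVert\mathbf{w}\rVert \\
  r &= \frac{\mathbf{w}\cdot\mathbf{x} + b}{\lVert\mathbf{w}\rVert}
\end{aligned}
```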

  5. Selection of Classifiers [Figure: several candidate linear boundaries through the same data] Which classifier is the best? All have the same training error. How about generalization?

  6. Unknown Samples [Figure: classifiers A and B applied to unseen samples] Classifier B divides the space more consistently (unbiased).

  7. Margins [Figure: the maximum-margin hyperplane with its margin; the data points lying on the margin boundaries are the Support Vectors]

  8. Margins • The margin of a linear classifier is the width by which the boundary could be grown before hitting a data point. • Intuitively, it is safer to choose a classifier with a larger margin. • Wider buffer zone for mistakes • The hyperplane is decided by only a few data points. • Support Vectors! • Others can be discarded! • Select the classifier with the maximum margin. • Linear Support Vector Machines (LSVM) • Works very well in practice. • How to specify the margin formally?

  9. Margins [Figure: the “Predict Class = +1” zone lies above wx+b = 1, the “Predict Class = -1” zone below wx+b = -1, with the separator wx+b = 0 between them; x+ and x− are points on the two margin planes; M = Margin Width] The margin width is M = 2/||w||, derived below.
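
A reconstruction of the usual margin-width argument, assuming x+ and x− are the margin-plane points from the figure, related by x+ = x− + M·w/||w||:

```latex
\begin{aligned}
  \mathbf{w}\cdot\mathbf{x}^{+} + b = 1, \qquad
  \mathbf{w}\cdot\mathbf{x}^{-} + b = -1
  &\;\Longrightarrow\;
  \mathbf{w}\cdot(\mathbf{x}^{+} - \mathbf{x}^{-}) = 2 \\
  \mathbf{w}\cdot\left(M\,\frac{\mathbf{w}}{\lVert\mathbf{w}\rVert}\right) = M\,\lVert\mathbf{w}\rVert = 2
  &\;\Longrightarrow\;
  M = \frac{2}{\lVert\mathbf{w}\rVert}
\end{aligned}
```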

  10. Objective Function • Correctly classify all data points: yi(w·xi + b) ≥ 1 for all i • Maximize the margin M = 2/||w||, i.e. minimize ||w|| • Quadratic Optimization Problem • Minimize (1/2)||w||² • Subject to yi(w·xi + b) ≥ 1 for all i

  11. Lagrange Multipliers Introducing a multiplier αi ≥ 0 for each constraint turns the primal into its Dual Problem: maximize Σi αi − (1/2) Σi Σj αi αj yi yj (xi·xj), subject to αi ≥ 0 for all i and Σi αi yi = 0. Quadratic problem again!
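
As a sketch of how this dual can be solved in practice (toy data and all parameter choices invented for illustration), here it is posed to the cvxopt QP solver, which minimizes (1/2)aᵀPa + qᵀa subject to Ga ≤ h and Aa = b:

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy training set: one negative point and three positive points.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([-1.0, 1.0, 1.0, 1.0])
n = len(y)

# Dual: maximize sum(a) - 1/2 a' P a  with  P_ij = y_i y_j (x_i . x_j),
# i.e. minimize 1/2 a' P a - sum(a), subject to a_i >= 0 and y . a = 0.
K = X @ X.T
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))          # -a_i <= 0  <=>  a_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))    # equality constraint: sum_i a_i y_i = 0
b = matrix(0.0)

solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h, A, b)
alpha = np.ravel(sol['x'])
print(alpha)  # nonzero entries mark the support vectors
```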

  12. Solutions of w & b From the dual solution: w = Σi αi yi xi, and b = yk − w·xk for any support vector xk (any point with αk > 0). Classifying a new sample x then needs only inner products: f(x) = sign(Σi αi yi (xi·x) + b).

  13. An Example [Figure: two training points in the (x1, x2) plane: (1, 1) with label +1 and (0, 0) with label -1] The maximum-margin separator for these two points is w = (1, 1), b = -1, so the boundary x1 + x2 = 1 passes halfway between them.
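
A quick check of this example with scikit-learn (a sketch; the very large C merely approximates the hard margin):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)
print(clf.coef_)       # approximately [[1. 1.]]  -> w = (1, 1)
print(clf.intercept_)  # approximately [-1.]      -> b = -1
```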

  14. Soft Margin [Figure: the margin planes wx+b = 1, wx+b = 0 and wx+b = -1, with slack variables (e.g. ε2, ε7, ε11) marking points that fall inside the margin or on the wrong side]

  15. Soft Margin • Allow some constraints to be violated by introducing slack variables εi ≥ 0: yi(w·xi + b) ≥ 1 − εi • Minimize (1/2)||w||² + C Σi εi • The parameter C trades margin width against training errors: a large C penalizes violations heavily; a small C prefers a wider margin.
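
A sketch of the effect of C, on invented toy data, using scikit-learn's soft-margin SVC:

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian clouds, so a perfect separation is impossible.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 1, rng.randn(20, 2) + 1])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Typically: fewer support vectors and higher training accuracy as C grows.
    print(C, len(clf.support_), clf.score(X, y))
```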

  16. Non-linear SVMs [Figure: 1-D points on the x axis that no single threshold separates; mapping each point to (x, x²) makes the two classes linearly separable]

  17. Feature Space [Figure: the mapping Φ: x → φ(x) takes inputs (x1, x2) to (x1², x2²), where a non-linear boundary becomes linear]

  18. Feature Space [Figure: another example of the mapping Φ: x → φ(x) turning a curved decision boundary in input space into a linear one in feature space]

  19. Quadratic Basis Functions φ(x) stacks: Constant Term (1), Linear Terms (√2·x1, …, √2·xd), Pure Quadratic Terms (x1², …, xd²) and Quadratic Cross-Terms (√2·x1x2, …, √2·xd−1xd). Number of terms: (d+1)(d+2)/2, roughly d²/2. The √2 factors are chosen so that the dot product of two mapped vectors simplifies (next slides).
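
A sketch of this feature map in Python (the function name quadratic_phi is mine, not from the slides):

```python
import numpy as np
from itertools import combinations

def quadratic_phi(x):
    """Explicit quadratic feature map with the sqrt(2) scaling from slide 19."""
    d = len(x)
    feats = [1.0]                                   # constant term
    feats += [np.sqrt(2) * xi for xi in x]          # linear terms
    feats += [xi * xi for xi in x]                  # pure quadratic terms
    feats += [np.sqrt(2) * x[i] * x[j]              # quadratic cross-terms
              for i, j in combinations(range(d), 2)]
    return np.array(feats)

x = np.array([1.0, 2.0, 3.0])
print(len(quadratic_phi(x)))  # (d+1)(d+2)/2 = 10 terms for d = 3
```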

  20. Calculation of Φ(xi)·Φ(xj) Computing the mapped dot product term by term costs O(d²) operations per pair of points, since φ(x) has about d²/2 components.

  21. It turns out … that the whole O(d²) expansion collapses to Φ(xi)·Φ(xj) = (1 + xi·xj)², which takes only O(d) time. The explicit mapping never has to be computed.

  22. Kernel Trick • The linear classifier relies on dot products between vectors: xi·xj • If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xi)·φ(xj) • A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xi, xj) = φ(xi)·φ(xj) • Example: x = [x1, x2]; K(xi, xj) = (1 + xi·xj)²
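
A quick numeric check of the example kernel (illustrative values only): the explicit quadratic features of slide 19 and the kernel shortcut give the same number:

```python
import numpy as np

def phi(x):
    # Quadratic feature map for 2-D input, matching slide 19.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.3, -1.2])
z = np.array([2.0, 1.0])
print(np.dot(phi(x), phi(z)))   # expanded feature space: 0.16
print((1 + np.dot(x, z)) ** 2)  # kernel shortcut: the same 0.16
```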

  23. Kernels Commonly used kernels: • Linear: K(xi, xj) = xi·xj • Polynomial of degree p: K(xi, xj) = (1 + xi·xj)^p • Gaussian (RBF): K(xi, xj) = exp(−||xi − xj||²/(2σ²)) • Sigmoid: K(xi, xj) = tanh(κ(xi·xj) + c)
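
A sketch (invented toy data) comparing some of these kernels via scikit-learn's SVC, on a data set that is not linearly separable:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no line in input space separates the two classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ('linear', 'poly', 'rbf'):
    # degree/coef0 only affect the polynomial kernel; harmless otherwise.
    clf = SVC(kernel=kernel, degree=2, coef0=1, gamma='scale').fit(X, y)
    print(kernel, clf.score(X, y))  # poly/rbf fit the circles; linear cannot
```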

  24. String Kernel Kernels can also be defined on structured, non-vector data, e.g. measuring the similarity between text strings: Car vs. Custard.

  25. Solutions of w & b With a kernel, w = Σi αi yi φ(xi) lives in feature space and never needs to be formed explicitly. Classification needs only kernel evaluations: f(x) = sign(Σi αi yi K(xi, x) + b), with b = yk − Σi αi yi K(xi, xk) for any support vector xk.
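
A sketch reproducing this kernelized decision function by hand from a fitted scikit-learn model (toy data invented; SVC's dual_coef_ stores the products αi·yi for the support vectors):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 1, rng.randn(20, 2) + 1])
y = np.array([-1] * 20 + [1] * 20)

gamma = 0.5
clf = SVC(kernel='rbf', gamma=gamma).fit(X, y)

# f(x) = sum_i (a_i y_i) K(x_i, x) + b over the support vectors only.
x_new = np.array([[0.5, 0.5]])
K = rbf_kernel(x_new, clf.support_vectors_, gamma=gamma)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_
print(manual, clf.decision_function(x_new))  # the two values match
```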

  26. Decision Boundaries [Figure: decision boundaries obtained with different kernels on the same data set]

  27. More Maths … • Quadratic Optimization • Lagrange Duality • Karush–Kuhn–Tucker Conditions

  28. SVM Roadmap [Figure: roadmap summarizing the SVM topics covered in this lecture]

  29. Reading Materials • Text Book • Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000. • Online Resources • http://www.kernel-machines.org/ • http://www.support-vector-machines.org/ • http://www.tristanfletcher.co.uk/SVM%20Explained.pdf • http://www.csie.ntu.edu.tw/~cjlin/libsvm/ • A list of papers uploaded to the web learning portal • Wikipedia & Google

  30. Review • What is the definition of margin in a linear classifier? • Why do we want to maximize the margin? • What is the mathematical expression of the margin? • How do we solve the objective function of an SVM? • What are support vectors? • What is the soft margin? • How does an SVM solve non-linear problems? • What is the so-called “kernel trick”? • What are the commonly used kernels?

  31. Next Week’s Class Talk • Volunteers are required for next week’s class talk. • Topic: SVM in Practice • Hints: • Applications • Demos • Multi-Class Problems • Software • A very popular toolbox: LIBSVM • Any other interesting topics beyond this lecture • Length: 20 minutes plus question time
