Understanding Support Vector Machines: An Overview of Linear and Nonlinear Classifiers

Machine learning continued Image source: https://www.coursera.org/course/ml

More about linear classifiers • When the data is linearly separable, there may be more than one separator (hyperplane). Which separator is best?

Support vector machines • Find the hyperplane that maximizes the margin between the positive and negative examples Source: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines • Find the hyperplane that maximizes the margin between the positive and negative examples • For support vectors, w·xi + b = ±1 • Distance between point and hyperplane: |w·xi + b| / ||w|| • Each support vector therefore lies at distance 1 / ||w|| from the hyperplane, so the margin is 2 / ||w|| (figure labels: support vectors, margin) Source: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
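
The margin geometry can be checked numerically. This is a minimal sketch, assuming the usual scaling in which support vectors satisfy |w·x + b| = 1; the helper names and the numbers are illustrative and do not come from the slides.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))

def margin(w):
    """Margin width 2 / ||w|| under the scaling |w.x + b| = 1 for support vectors."""
    return 2.0 / math.sqrt(dot(w, w))

# Illustrative hyperplane with ||w|| = 5.
w, b = (3.0, 4.0), -5.0
x_sv = (2.0, 0.0)  # w.x + b = 1, so this point sits exactly on the margin

print(distance_to_hyperplane(w, b, x_sv))  # 1 / ||w||
print(margin(w))                           # 2 / ||w||
```

Shrinking ||w|| while keeping every training point on or outside the margin is what widens the gap between the two classes.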

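Finding the maximum margin hyperplane • Maximize the margin 2 / ||w|| • Correctly classify all training data: yi(w·xi + b) ≥ 1 for every training point (xi, yi), with yi = +1 or –1 • Quadratic optimization problem: minimize (1/2) ||w||² subject to yi(w·xi + b) ≥ 1 Source: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998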
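
The constraint and objective of the optimization problem can be spelled out on a toy problem. This sketch assumes labels in {+1, -1} and the constraint yi(w·xi + b) ≥ 1; the two training points and the candidate (w, b) pairs are hypothetical and were picked by hand, not produced by a solver.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def is_feasible(w, b, X, y, tol=1e-12):
    """True if every training point satisfies y_i (w.x_i + b) >= 1."""
    return all(yi * (dot(w, xi) + b) >= 1 - tol for xi, yi in zip(X, y))

def objective(w):
    """Quadratic objective (1/2) ||w||^2; smaller means a wider margin."""
    return 0.5 * dot(w, w)

# Hypothetical two-point training set.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [+1, -1]

candidates = {
    "max-margin": ((0.5, 0.5), 0.0),
    "feasible, larger norm": ((1.0, 1.0), 0.0),
    "infeasible": ((0.25, 0.25), 0.0),
}
for name, (w, b) in candidates.items():
    print(name, is_feasible(w, b, X, y), objective(w))
```

Among the feasible candidates, the one with the smallest objective has the widest margin; a QP solver searches over all (w, b) rather than a hand-picked list.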

Finding the maximum margin hyperplane • Solution: w = Σi αi yi xi, where αi is the learned weight of training point xi; αi is nonzero only for the support vectors Source: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin hyperplane • Solution: b = yi – w·xi for any support vector • Classification function (decision boundary): w·x + b = Σi αi yi xi·x + b • Notice that it relies on an inner product between the test point x and the support vectors xi • Solving the optimization problem also involves computing the inner products xi·xj between all pairs of training points Source: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
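
The solution can be traced end to end on a toy problem. This sketch assumes the standard dual form w = Σi αi yi xi; the two training points and the α values are hypothetical (α = 0.25 for both points is the hand-derived solution for this particular toy set).

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy problem: both points are support vectors.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [+1, -1]
alpha = [0.25, 0.25]  # assumed learned weights for this toy set

# w = sum_i alpha_i y_i x_i
w = tuple(sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2))
# b = y_i - w.x_i for any support vector (here the first point)
b = y[0] - dot(w, X[0])

def decision_primal(x):
    """Decision value computed from the explicit weight vector."""
    return dot(w, x) + b

def decision_dual(x):
    """Same value, using only inner products with the support vectors."""
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

test_point = (2.0, 0.0)
print(w, b)
print(decision_primal(test_point), decision_dual(test_point))
```

The dual form never touches w directly, which is what later allows the inner product to be swapped for a kernel.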

Nonlinear SVMs • Datasets that are linearly separable work out great • But what if the dataset is just too hard? • We can map it to a higher-dimensional space (figures: one-dimensional data along the x axis; the same data plotted against x and x²) Slide credit: Andrew Moore

Nonlinear SVMs • General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable Φ: x→φ(x) Slide credit: Andrew Moore

Nonlinear SVMs • The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(x,y) = φ(x)·φ(y) • (to be valid, the kernel function must satisfy Mercer’s condition) • This gives a nonlinear decision boundary in the original feature space: Σi αi yi φ(xi)·φ(x) + b = Σi αi yi K(xi, x) + b Source: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Nonlinear kernel: Example • Consider the mapping φ(x) = (x, x²) • Then φ(x)·φ(y) = xy + x²y², so the kernel is K(x, y) = xy + x²y²
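
A worked instance of the kernel trick in one dimension. This sketch assumes the mapping φ(x) = (x, x²), which the x and x2 axis labels in the transcript suggest; the three training points, the α values, and b are hypothetical and were derived by hand for this toy set.

```python
def phi(x):
    """Assumed lifting map from the line to the plane: x -> (x, x^2)."""
    return (x, x * x)

def kernel(x, y):
    """K(x, y) = x*y + x^2*y^2, equal to phi(x).phi(y) without building phi."""
    return x * y + (x * x) * (y * y)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy 1-D data that no single threshold on x can separate.
xs = [0.0, 2.0, -2.0]
ys = [-1, +1, +1]
alpha = [1 / 8, 1 / 16, 1 / 16]  # assumed dual solution for this toy set
b = -1.0

def decision(x):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b; here it reduces to 0.5*x^2 - 1."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, ys, xs)) + b

print(kernel(2.0, 3.0), dot(phi(2.0), phi(3.0)))  # both routes give the same number
print([decision(x) for x in (0.0, 1.0, 2.0, 3.0)])
```

The boundary f(x) = 0 sits at x² = 2, a nonlinear rule in x that is a straight line in the lifted (x, x²) plane.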

Polynomial kernel: K(x, y) = (c + x·y)^d
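
For a concrete case, a polynomial kernel of the common form K(x, y) = (c + x·y)^d can be compared with its explicit feature map. This sketch assumes c = 1, d = 2, and two-dimensional inputs; the function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel (c + x.y)^d."""
    return (c + dot(x, y)) ** d

def phi(x):
    """Explicit feature map for c = 1, d = 2 in two dimensions (6 features)."""
    x1, x2 = x
    r = math.sqrt(2.0)
    return (1.0, r * x1, r * x2, x1 * x1, r * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, y))     # one inner product in 2-D, then a square
print(dot(phi(x), phi(y)))   # the same value via the 6-D feature space
```

The kernel costs one two-dimensional inner product, while the explicit map grows quickly with the input dimension and the degree d.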

Gaussian kernel • Also known as the radial basis function (RBF) kernel; one common form is K(x, y) = exp(–||x – y||² / (2σ²)) • The corresponding mapping φ(x) is infinite-dimensional! • What is the role of parameter σ? • What if σ is close to zero? • What if σ is very large?
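
The questions about σ can be explored numerically. This sketch assumes the common parameterization K(x, y) = exp(-||x - y||² / (2σ²)); other texts scale the exponent differently, and the sample points are arbitrary.

```python
import math

def gaussian_kernel(x, y, sigma):
    """RBF kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = (0.0, 0.0), (1.0, 0.0)  # two points at distance 1

for sigma in (0.1, 1.0, 100.0):
    print(sigma, gaussian_kernel(x, y, sigma))
```

With σ close to zero, distinct points have kernel value near 0, so each training point is similar only to itself and the classifier can memorize the training set. With very large σ, every pair has kernel value near 1, so all points look alike and the decision boundary becomes very smooth.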

Gaussian kernel (figure label: SVs, i.e., support vectors)

What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM formulation
• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs
• One vs. others
  • Training: learn an SVM for each class vs. the others
  • Testing: apply each SVM to the test example and assign to it the class of the SVM that returns the highest decision value
• One vs. one
  • Training: learn an SVM for each pair of classes
  • Testing: each learned SVM “votes” for a class to assign to the test example
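
The two test-time rules can be sketched as follows. The decision functions here are hand-written stand-ins for trained two-class SVMs (a hypothetical 1-D problem with one center per class), and the sign convention for the pairwise classifiers is an assumption of this sketch.

```python
from collections import Counter

def predict_one_vs_rest(x, scorers):
    """scorers maps class -> decision function of 'class vs. the others'.
    The class whose classifier returns the highest value wins."""
    return max(scorers, key=lambda c: scorers[c](x))

def predict_one_vs_one(x, pairwise):
    """pairwise maps (class_a, class_b) -> decision function;
    a positive value is a vote for class_a, otherwise for class_b."""
    votes = Counter()
    for (a, b), f in pairwise.items():
        votes[a if f(x) > 0 else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained SVMs on a 1-D problem.
centers = {"low": -2.0, "mid": 0.0, "high": 2.0}
scorers = {c: (lambda x, m=m: 1.0 - abs(x - m)) for c, m in centers.items()}
pairwise = {
    (a, b): (lambda x, ma=centers[a], mb=centers[b]: abs(x - mb) - abs(x - ma))
    for a, b in [("low", "mid"), ("low", "high"), ("mid", "high")]
}

print(predict_one_vs_rest(1.6, scorers))
print(predict_one_vs_one(1.6, pairwise))
```

One vs. others trains one classifier per class, while one vs. one trains one per pair of classes; ties in the vote are broken arbitrarily in this sketch.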

SVMs: Pros and cons
• Pros
  • Many publicly available SVM packages: http://www.kernel-machines.org/software
  • Kernel-based framework is very powerful, flexible
  • SVMs work very well in practice, even with very small training sample sizes
• Cons
  • No “direct” multi-class SVM, must combine two-class SVMs
  • Computation, memory (esp. for nonlinear SVMs)
    • During training time, must compute matrix of kernel values for every pair of examples
    • Learning can take a very long time for large-scale problems
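
The memory cost of the kernel matrix is easy to estimate. This sketch assumes a dense n × n matrix of 8-byte floats; practical solvers often cache only part of the matrix, so treat the figure as an upper bound for the naive approach.

```python
def gram_matrix(X, kernel):
    """Dense matrix of kernel values K(x_i, x_j) for every pair of examples."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def gram_memory_bytes(n, bytes_per_value=8):
    """Memory needed to store a dense n x n kernel matrix."""
    return n * n * bytes_per_value

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

X = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
G = gram_matrix(X, linear_kernel)
print(G)

for n in (1_000, 100_000):
    print(n, gram_memory_bytes(n) / 1e9, "GB")
```

Going from one thousand to one hundred thousand examples multiplies the storage by ten thousand, which is why large-scale training is listed as a con.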

Beyond simple classification: Structured prediction • Input: image • Output: word Source: B. Taskar

Structured Prediction • Input: sentence • Output: parse tree Source: B. Taskar

Structured Prediction • Input: sentence in two languages • Output: word alignment Source: B. Taskar

Structured Prediction • Input: amino-acid sequence • Output: bond structure Source: B. Taskar

Structured Prediction • Many image-based inference tasks can loosely be thought of as “structured prediction” Source: D. Ramanan

Unsupervised Learning • Idea: Given only unlabeled data as input, learn some sort of structure • The objective is often more vague or subjective than in supervised learning • This is more of an exploratory/descriptive data analysis

Unsupervised Learning • Clustering • Discover groups of “similar” data points
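
As one concrete way to discover groups of similar points, here is a minimal sketch of k-means (Lloyd's algorithm) in one dimension. The slides do not name a clustering algorithm, so k-means is a choice made for this example; the data and the starting centers are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers)
print(clusters)
```

No labels are used anywhere: the grouping comes only from distances between the points.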

Unsupervised Learning • Quantization • Map a continuous input to a discrete (more compact) output
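
Quantization can be sketched as a nearest-codeword lookup. The codebook below is hypothetical; in practice it might be learned from data, for example as a set of cluster centers.

```python
def quantize(x, codebook):
    """Return the index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda i: abs(x - codebook[i]))

codebook = [0.0, 0.5, 1.0]  # hypothetical codewords

for x in (0.1, 0.37, 0.9):
    index = quantize(x, codebook)
    print(x, "->", index, "(codeword", codebook[index], ")")
```

Each continuous value is replaced by a small integer index, which is the more compact discrete output the slide refers to.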

Unsupervised Learning • Dimensionality reduction, manifold learning • Discover a lower-dimensional surface on which the data lives

Unsupervised Learning • Density estimation • Find a function that approximates the probability density of the data (i.e., value of the function is high for “typical” points and low for “atypical” points) • Can be used for anomaly detection
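
Density-based anomaly detection can be illustrated with the simplest possible model. This sketch assumes a single one-dimensional Gaussian fitted to the data; the measurements and the threshold are made up for illustration.

```python
import math
import statistics

def fit_gaussian(data):
    """Estimate mean and standard deviation from unlabeled data."""
    return statistics.mean(data), statistics.pstdev(data)

def density(x, mu, sigma):
    """Gaussian probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def is_anomaly(x, mu, sigma, threshold):
    """Flag points whose estimated density falls below the threshold."""
    return density(x, mu, sigma) < threshold

data = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical measurements
mu, sigma = fit_gaussian(data)

for x in (10.05, 12.0):
    print(x, density(x, mu, sigma), is_anomaly(x, mu, sigma, threshold=0.01))
```

Typical points near the mean get a high density value, while a point far from the data gets a value near zero and is flagged.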

Semi-supervised learning • Lots of data is available, but only a small portion is labeled (e.g., since labeling is expensive) • Why is learning from labeled and unlabeled data better than learning from labeled data alone?

Active learning • The learning algorithm can choose its own training examples, or ask a “teacher” for an answer on selected inputs S. Vijayanarasimhan and K. Grauman, “Cost-Sensitive Active Visual Category Learning,” 2009
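
One simple way for a learner to choose its own training examples is uncertainty sampling: ask for the label of the unlabeled point closest to the current decision boundary. The slides do not prescribe a selection strategy, so this is an assumed choice; the decision function and the pool are hypothetical.

```python
def most_uncertain(pool, decision_function):
    """Pick the unlabeled example with the smallest |decision value|."""
    return min(pool, key=lambda x: abs(decision_function(x)))

def current_model(x):
    """Hypothetical current classifier: boundary at x = 3."""
    return x - 3.0

pool = [0.0, 2.5, 3.2, 7.0]  # unlabeled examples
query = most_uncertain(pool, current_model)
print(query)
```

The selected point is then sent to the “teacher” for a label, added to the training set, and the model is retrained before the next query.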

Lifelong learning http://rtw.ml.cmu.edu/rtw/

Xinlei Chen, Abhinav Shrivastava and Abhinav Gupta. NEIL: Extracting Visual Knowledge from Web Data. In ICCV 2013
