
Kernel methods

Navneet Goyal, BITS-Pilani, Rajasthan, INDIA


Presentation Transcript


  1. Navneet Goyal, BITS-Pilani, Rajasthan INDIA Kernel methods • Figure source: • http://wwwold.ini.ruhr-uni-bochum.de/thbio/group/neuralnet/index_p.html

  2. Kernel Methods • In computer science, kernel methods (KMs) are a class of algorithms for pattern analysis, whose best known element is the support vector machine (SVM) (Wikipedia) • Transformations • Feature Spaces • Kernel Functions • Kernel Tricks • Inner Products

  3. Kernel Methods Algorithms capable of operating with kernels include: • Support vector machine (SVM) • Gaussian processes • Fisher's linear discriminant analysis (LDA) • Principal components analysis (PCA) (Kernel PCA) • Canonical correlation analysis • Ridge regression • Spectral clustering • Linear adaptive filters • …

  4. Kernel Methods • Kernels are non-linear generalizations of inner products
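A minimal numeric sketch of this claim, using the standard textbook example (not from the slides): the homogeneous quadratic kernel K(x, x′) = ⟨x, x′⟩² is an ordinary inner product after a non-linear map ϕ.

```python
import numpy as np

x, z = np.array([1.0, 2.0]), np.array([3.0, 1.0])

# Explicit feature map for K(x, z) = <x, z>**2 in two input dimensions:
# phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
def phi(v):
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

lhs = np.dot(x, z) ** 2        # kernel evaluated in input space
rhs = np.dot(phi(x), phi(z))   # inner product in feature space
assert np.isclose(lhs, rhs)    # both equal 25 for this x, z
```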

  5. Kernel Methods • Any kernel-based method comprises two modules: • A mapping into an embedding or feature space • A learning algorithm designed to discover linear patterns in that space

  6. Kernel Methods • Why does this approach work? • Detecting linear patterns has been the focus of much research in statistics and machine learning, so the resulting algorithms are well understood and efficient • A computational shortcut makes it possible to represent linear patterns efficiently in a high-dimensional space, ensuring adequate representational power • That shortcut is the KERNEL FUNCTION

  7. Kernel Methods • Kernel methods allow us to extend algorithms such as SVMs to define non-linear decision boundaries • Other algorithms that depend only on inner products between data points can be extended similarly • Kernel functions that are symmetric and positive definite allow us to implicitly define inner products in high-dimensional spaces • Replacing inner products in input space with positive definite kernels immediately extends algorithms like SVM to • Linear separation in a high-dimensional space • Or, equivalently, non-linear separation in input space
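As a sketch of that "replace the inner product" recipe, here is a kernel perceptron on a toy problem (the data, kernel degree, and epoch count are illustrative choices, not from the slides): a circular class boundary is non-linear in input space but linear in the feature space of a degree-2 polynomial kernel.

```python
import numpy as np

# Toy data that is NOT linearly separable in input space:
# label +1 inside the unit circle, -1 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(100, 2))
y = np.where((X ** 2).sum(axis=1) < 1.0, 1.0, -1.0)

# Degree-2 inhomogeneous polynomial kernel; its feature space contains
# x1^2 and x2^2, so the circle becomes a linear boundary there.
K = (X @ X.T + 1.0) ** 2

# Kernel perceptron: the classic perceptron with every inner product
# replaced by a kernel evaluation; data is accessed only through K.
alpha = np.zeros(len(X))
for _ in range(50):                              # epochs
    for i in range(len(X)):
        if y[i] * ((alpha * y) @ K[:, i]) <= 0:  # mistake on point i
            alpha[i] += 1.0

pred = np.sign((alpha * y) @ K)
acc = (pred == y).mean()
```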

  8. Types of Kernels • Positive definite symmetric (PDS) kernels • Negative definite symmetric (NDS) kernels • NDS kernels play a role in the construction of PDS kernels!

  9. Kernel Methods • Input space, χ • High-dimensional space, ℍ • ℍ can be really large!! • Example: document classification with trigrams over a vocabulary of 100,000 words gives a feature space of dimension 10¹⁵ • The generalization ability of large-margin classifiers like SVM does not depend on the dimension of the feature space, but on the margin and the number of training examples
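The trigram figure is just a count of ordered word triples, one feature coordinate per possible trigram:

```python
vocab = 100_000           # vocabulary size from the slide
dim_H = vocab ** 3        # one coordinate per ordered trigram
assert dim_H == 10 ** 15  # the 10^15 quoted above
```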

  10. Kernel Functions • A function K: χ × χ → ℝ is called a kernel over χ • For any two points x, x′ ∈ χ, K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩ for some mapping ϕ: χ → ℍ to a Hilbert space ℍ, called a feature space • K is efficient! • Computing K(x, x′) is O(N), whereas computing ⟨ϕ(x), ϕ(x′)⟩ directly is O(dim ℍ), with dim ℍ ≫ N • K is flexible! • No need to explicitly define or compute ϕ • The kernel K can be chosen arbitrarily so long as the existence of ϕ is guaranteed, i.e. K satisfies Mercer's condition
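The O(N) vs O(dim ℍ) gap is easy to quantify for a polynomial kernel (the N and d below are illustrative): evaluating (⟨x, x′⟩ + 1)^d costs one N-dimensional dot product, while the corresponding feature space has one coordinate per monomial of degree at most d.

```python
from math import comb

N, d = 1000, 4
kernel_cost = N          # one dot product evaluates (<x, x'> + 1)**d
dim_H = comb(N + d, d)   # number of monomials of degree <= d in N variables
assert dim_H > 10 ** 10  # tens of billions of coordinates vs ~1000 multiplications
```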

  11. Kernel Functions • Mercer's Condition • A kernel function K can be expressed as K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩ iff, for any function g(x) such that ∫g(x)² dx is finite, ∫∫ K(x, x′) g(x) g(x′) dx dx′ ≥ 0 • Kernels satisfying Mercer's condition are called positive definite kernel functions! • The transformed space of SVM kernels is called a Reproducing Kernel Hilbert Space (RKHS)
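A finite-sample analogue of Mercer's condition, replacing the integral by a sum over sample points, is that every Gram matrix [K(xᵢ, xⱼ)] must be positive semi-definite. A quick numerical check for the Gaussian kernel, with arbitrary sample data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Gaussian (RBF) Gram matrix: K_ij = exp(-||x_i - x_j||^2 / 2)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)

# Discrete version of ∫∫ K(x,x') g(x) g(x') dx dx' >= 0: g.T @ K @ g >= 0
# for every vector g, i.e. all eigenvalues of K are non-negative.
eigs = np.linalg.eigvalsh(K)
assert eigs.min() > -1e-10  # PSD up to round-off
```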

  12. Kernel Functions • Example • Show that the polynomial kernel function satisfies Mercer's condition
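The proof is left as an exercise above; as a numerical sanity check (not a proof), the Gram matrix of a polynomial kernel on random data comes out positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))

# Inhomogeneous cubic polynomial kernel, applied elementwise to the Gram matrix
K = (X @ X.T + 1.0) ** 3
assert np.linalg.eigvalsh(K).min() > -1e-8  # PSD up to round-off
```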

  13. Feature Spaces • Example (figure)

  14. Modularity • Kernel methods consist of two modules: 1) the choice of kernel (this is non-trivial) 2) the algorithm which takes kernels as input • Modularity: any kernel can be used with any kernel algorithm • Some kernel algorithms: support vector machine, Fisher discriminant analysis, kernel regression, kernel PCA
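Modularity can be sketched in a few lines of NumPy, with kernel ridge regression standing in for "the algorithm" (the function names and the choice of Gaussian kernel are illustrative): the solver only ever sees a Gram matrix, so any PDS kernel plugs in without changing the algorithm.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian-kernel Gram matrix between the rows of A and of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(K, y, lam=1e-3):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * X[:, 0])

K = rbf_kernel(X, X)      # swap in any other PDS kernel here, unchanged below
alpha = kernel_ridge_fit(K, y)
fitted = K @ alpha        # predictions at the training points
assert np.abs(fitted - y).max() < 0.1
```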

  15. Goodies and Baddies • Goodies: • Kernel algorithms are typically constrained convex optimization problems, solved with either spectral methods or convex optimization tools • Efficient algorithms exist in most cases • The similarity to linear methods facilitates analysis; there are strong generalization bounds on test error • Baddies: • You need to choose the appropriate kernel • Kernel learning is prone to over-fitting • All information must pass through the kernel bottleneck
