Input Space versus Feature Space in Kernel-Based Methods
Schölkopf, Mika, Burges, Knirsch, Müller, Rätsch, Smola
Presented by: Joe Drish
Department of Computer Science and Engineering, University of California, San Diego
Goals
Objectives of the paper
• Introduce and illustrate the kernel trick
• Discuss the kernel mapping from input space to feature space F
• Review kernel algorithms: SVMs and kernel PCA
• Discuss how to interpret the return from F to input space after the dot product computation
• Discuss how to construct sparse approximations of feature space expansions
• Evaluate and discuss the performance of SVMs and kernel PCA
Applications of kernel methods
• Handwritten digit recognition
• Face recognition
• De-noising: this paper
Definition
A reproducing kernel k is a function k: 𝒳 × 𝒳 → R.
• The domain of k consists of the data patterns {x1, …, xl}
• 𝒳 is a compact set in which the data lives
• 𝒳 is typically a subset of R^N
Computing k is equivalent to mapping the data patterns into a higher dimensional space F and then taking the dot product there.
A feature map Φ: R^N → F is a function that maps the input data patterns into the higher dimensional space F.
Illustration
Using a feature map Φ to map the data from input space into a higher dimensional feature space F.
[Figure: crosses X and circles O in input space are mapped to their images Φ(X) and Φ(O) in the feature space F]
Kernel Trick
We would like to compute the dot product in the higher dimensional space, Φ(x) · Φ(y). To do this we only need to compute k(x, y), since k(x, y) = Φ(x) · Φ(y). Note that the feature map Φ is never explicitly computed; we avoid it, and thereby avoid a burdensome computation.
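As a concrete illustration of the trick (a minimal sketch, not code from the paper), take the homogeneous polynomial kernel of degree 2 on R²: the explicit feature map Φ(x) = (x1², √2·x1·x2, x2²) gives the same dot product as simply squaring x · y.

```python
# Minimal sketch: degree-2 polynomial kernel on R^2.
# phi is the explicit feature map; k computes the same value without ever
# forming phi(x) or phi(y).
import numpy as np

def phi(x):
    """Explicit feature map into R^3 for 2-D inputs."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    """Degree-2 polynomial kernel, computed entirely in input space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(np.dot(phi(x), phi(y)))  # 121.0 -- dot product in feature space
print(k(x, y))                 # 121.0 -- same value via the kernel trick
```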
Example Kernels
• Gaussian: k(x, y) = exp(−‖x − y‖² / (2σ²))
• Polynomial: k(x, y) = (x · y)^d
• Sigmoid: k(x, y) = tanh(κ(x · y) + Θ)
Nonlinear separation can be achieved.
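A hedged sketch of these three kernels as functions (the parameter values σ, d, κ, Θ below are arbitrary defaults, not values taken from the paper):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, d=3):
    # k(x, y) = (x . y)^d
    return np.dot(x, y) ** d

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    # k(x, y) = tanh(kappa * (x . y) + theta)
    return np.tanh(kappa * np.dot(x, y) + theta)
```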
Mercer Theory: Input Space to Feature Space
Necessary condition for the kernel trick: k must satisfy Mercer's condition, so that it can be expanded as k(x, y) = Σᵢ λᵢ ψᵢ(x) ψᵢ(y) with λᵢ ≥ 0.
• N_F, the dimension of the feature space, equals the number of nonzero λᵢ – in the linear algebra analogy, the rank of the matrix Σᵢ λᵢ uᵢ uᵢᵀ built from the outer products uᵢ uᵢᵀ
• ψᵢ is the normalized eigenfunction – analogous to a normalized eigenvector
Mercer :: Linear Algebra
Linear algebra analogy:
• Eigenvector problem: A u = λ u
• Eigenfunction problem: ∫ k(x, y) ψ(y) dy = λ ψ(x)
A is a matrix; x and y are vectors
u is the normalized eigenvector; λ is the eigenvalue
ψ is the normalized eigenfunction
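The analogy can be checked numerically on a finite sample (an illustrative sketch under the assumption of a Gaussian kernel and arbitrary toy data, not code from the paper): the Gram matrix Kᵢⱼ = k(xᵢ, xⱼ) plays the role of A, and its eigendecomposition K = Σᵢ λᵢ uᵢ uᵢᵀ mirrors Mercer's expansion of k.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # 50 toy patterns in R^2 (arbitrary data)

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

# Gram matrix: the finite-sample analogue of the kernel k(x, y).
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

lam, U = np.linalg.eigh(K)              # eigenvalues lambda_i, eigenvectors u_i
K_reconstructed = (U * lam) @ U.T       # sum_i lambda_i u_i u_i^T

print(np.allclose(K, K_reconstructed))  # True: K is recovered from its spectrum
print(np.sum(lam > 1e-12))              # numerically nonzero eigenvalues
```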
RKHS, Capacity, Metric
Reproducing kernel Hilbert space (RKHS)
• Hilbert space of functions f on some set X such that all evaluation functionals are continuous, and the functions can be reproduced by the kernel
Capacity of the kernel map
• Bound on how many training examples are required for learning, measured by the VC-dimension h
Metric of the kernel map
• Intrinsic shape of the manifold to which the data is mapped
Support Vector Machines
The decision boundary takes the form f(x) = sgn(Σᵢ αᵢ yᵢ k(x, xᵢ) + b)
• Similar to a single layer perceptron
• Training examples xᵢ with non-zero coefficients αᵢ are the support vectors
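A minimal sketch of evaluating that decision function, assuming the support vectors, coefficients αᵢ, labels yᵢ and bias b come from an already-trained SVM (training itself is not shown here):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, alphas, labels, b, kernel=gaussian_kernel):
    """Sign of the kernel expansion over the support vectors:
    f(x) = sgn( sum_i alpha_i * y_i * k(x, x_i) + b )."""
    s = sum(a * y_i * kernel(x, sv)
            for a, y_i, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)
```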
Kernel Principal Component Analysis
Kernel PCA carries out a linear PCA in the feature space F.
The extracted features take the nonlinear form f_k(x) = Σᵢ αᵢᵏ k(xᵢ, x).
The αᵢᵏ are the components of the k-th eigenvector of the kernel matrix Kᵢⱼ = k(xᵢ, xⱼ).
KPCA and Dot Products
We wish to find the eigenvectors V and eigenvalues λ of the covariance matrix in feature space, C = (1/l) Σⱼ Φ(xⱼ) Φ(xⱼ)ᵀ.
Again, replace Φ(x) · Φ(y) with k(x, y).
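Putting the last two slides together, a sketch of kernel PCA on a training set (an assumed implementation with a Gaussian kernel; the toy data and parameter values are placeholders, not the paper's): build the Gram matrix, center it in feature space, eigendecompose, normalize the eigenvectors so that λ_k (αᵏ · αᵏ) = 1, and read off the nonlinear features.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def kernel_pca(X, n_components=2, sigma=1.0):
    l = len(X)
    K = np.array([[gaussian_kernel(a, b, sigma) for b in X] for a in X])

    # Center the Gram matrix (equivalent to centering Phi(x_i) in feature space).
    one = np.ones((l, l)) / l
    Kc = K - one @ K - K @ one + one @ K @ one

    # Eigenvectors of Kc give the expansion coefficients alpha^k.
    lam, alpha = np.linalg.eigh(Kc)
    idx = np.argsort(lam)[::-1][:n_components]
    lam, alpha = lam[idx], alpha[:, idx]

    # Normalize so that lambda_k * (alpha^k . alpha^k) = 1.
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))

    # Nonlinear features of the training points: f_k(x_j) = sum_i alpha_i^k Kc_ij.
    features = Kc @ alpha
    return features, alpha, lam

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # arbitrary toy data
features, alpha, lam = kernel_pca(X)
```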
From Feature Space to Input Space
Pre-image problem: given a vector Ψ in F, find a point z in input space such that Φ(z) = Ψ.
Here, Ψ is generally not in the image of Φ, so an exact pre-image need not exist.
Projection Distance Illustration
Approximate the vector Ψ ∈ F by the image Φ(z) of an input point z.
[Figure: Ψ and its approximation Φ(z) in the feature space F]
Minimizing Projection Distance
z is an approximate pre-image for Ψ if it minimizes the projection distance ρ(z) = ‖Φ(z) − Ψ‖².
Equivalently, maximize: Φ(z) · Ψ − ½ k(z, z).
For kernels where k(z, z) = 1 (e.g. the Gaussian), this reduces to maximizing Φ(z) · Ψ.
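Since Ψ is given as an expansion Ψ = Σᵢ γᵢ Φ(xᵢ), the projection distance can be evaluated purely in terms of kernel values. A sketch of that computation (assumed helper code, not from the paper), using a Gaussian kernel:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def projection_distance(z, gamma, X, sigma=1.0):
    """rho(z) = ||Phi(z) - Psi||^2 for Psi = sum_i gamma_i Phi(x_i):
    k(z, z) - 2 sum_i gamma_i k(z, x_i) + sum_{i,j} gamma_i gamma_j k(x_i, x_j)."""
    k_zz = gaussian_kernel(z, z, sigma)   # equals 1 for the Gaussian kernel
    k_zx = np.array([gaussian_kernel(z, x_i, sigma) for x_i in X])
    K = np.array([[gaussian_kernel(a, b, sigma) for b in X] for a in X])
    return k_zz - 2.0 * gamma @ k_zx + gamma @ K @ gamma
```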
Fixed-Point Iteration
So, assuming a Gaussian kernel and Ψ = Σᵢ γᵢ Φ(xᵢ):
• the γᵢ are expansion coefficients obtained from the eigenvectors of the centered Gram matrix
• the xᵢ are the input space patterns
• σ is the kernel width
Requiring no step size, we can iterate:
z_{t+1} = Σᵢ γᵢ exp(−‖z_t − xᵢ‖² / (2σ²)) xᵢ / Σᵢ γᵢ exp(−‖z_t − xᵢ‖² / (2σ²))
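A sketch of that iteration (the coefficients γᵢ, the patterns X, the width σ and the starting point z0 are assumed to be supplied by the caller, e.g. by a kernel PCA reconstruction):

```python
import numpy as np

def preimage_fixed_point(gamma, X, sigma, z0, n_iter=100):
    """Iterate z <- (sum_i w_i x_i) / (sum_i w_i), with Gaussian weights
    w_i = gamma_i * exp(-||z - x_i||^2 / (2 sigma^2))."""
    z = np.asarray(z0, dtype=float)
    for _ in range(n_iter):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / (2.0 * sigma ** 2))
        denom = np.sum(w)
        if abs(denom) < 1e-12:   # numerically unstable point: stop iterating
            break
        z = (w @ X) / denom      # weighted mean of the input patterns
    return z
```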
Kernel PCA Toy Example
Generated an artificial data set from three point sources, 100 points each.
De-noising by Reconstruction, Part One
• Reconstruction from projections onto the eigenvectors from the previous example
• Generated 20 new points from each Gaussian
• Represented them by their first n = 1, 2, …, 8 nonlinear principal components
De-noising by Reconstruction, Part Two
• The original points move in the direction of de-noising
De-noising in Two Dimensions
• A half circle and a square in the plane
• De-noised versions are the solid lines
De-noising USPS Data Patterns
• USPS handwritten digit patterns: 7291 training, 2007 test
• Pattern size: 16 × 16 pixels
• De-noising results compared for linear PCA and kernel PCA