
Input Space versus Feature Space in Kernel-Based Methods


Presentation Transcript


  1. Input Space versus Feature Space in Kernel-Based Methods. Schölkopf, Mika, Burges, Knirsch, Müller, Rätsch, Smola. Presented by: Joe Drish, Department of Computer Science and Engineering, University of California, San Diego

  2. Goals
  Objectives of the paper:
  • Introduce and illustrate the kernel trick
  • Discuss the kernel mapping from input space to feature space F
  • Review kernel algorithms: SVMs and kernel PCA
  • Discuss how to interpret the return from F to input space after the dot product computation
  • Discuss how to construct sparse approximations of feature space expansions
  • Evaluate and discuss the performance of SVMs and kernel PCA
  Applications of kernel methods:
  • Handwritten digit recognition
  • Face recognition
  • De-noising (this paper)

  3. Definition
  A reproducing kernel k is a function k: X × X → R.
  • The domain X contains the data patterns {x1, …, xl}
  • X is a compact set in which the data lives
  • X is typically a subset of R^N
  Computing k(x, y) is equivalent to mapping the data patterns into a higher dimensional space F and then taking the dot product there. A feature map Φ: R^N → F is a function that maps the input data patterns into the higher dimensional space F.

  4. Illustration
  Using a feature map Φ to map the data from input space into a higher dimensional feature space F.
  [Figure: patterns labeled X and O in input space are mapped to their images Φ(X) and Φ(O) in the feature space F.]

  5. Kernel Trick
  We would like to compute the dot product in the higher dimensional space, Φ(x) · Φ(y). To do this we only need to compute k(x, y), since k(x, y) = Φ(x) · Φ(y). The feature map Φ is never explicitly computed, which avoids a potentially burdensome computation.
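To make the trick concrete, here is a minimal NumPy sketch (not from the paper) for the homogeneous degree-2 polynomial kernel on R², where the explicit feature map Φ is small enough to write down and compare against the kernel evaluation.

```python
# Minimal sketch of the kernel trick for k(x, y) = (x . y)^2 on R^2.
# The explicit feature map phi and the helper names are illustrative
# assumptions, not taken from the paper.
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

def k_poly2(x, y):
    """Kernel evaluation in input space; no feature map needed."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both routes give the same number: the dot product in F.
print(np.dot(phi(x), phi(y)), k_poly2(x, y))  # both are 1.0 (up to rounding)
```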

  6. Example Kernels
  • Gaussian: k(x, y) = exp(−||x − y||² / (2σ²))
  • Polynomial: k(x, y) = (x · y)^d
  • Sigmoid: k(x, y) = tanh(κ(x · y) + Θ)
  With such kernels, nonlinear separation can be achieved.
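As a hedged sketch, the three kernels named above can be written as plain NumPy functions; the parameter names and default values (sigma, degree, kappa, theta) are conventional assumptions rather than values from the slides.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, degree=3):
    return np.dot(x, y) ** degree

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    # Note: the sigmoid kernel satisfies Mercer's condition only for some
    # parameter settings.
    return np.tanh(kappa * np.dot(x, y) + theta)
```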

  7. Nonlinear Separation

  8. Mercer Theory: Input Space to Feature Space
  Necessary condition for the kernel trick: k must admit a Mercer expansion
  k(x, y) = Σ_{i=1..N_F} λ_i ψ_i(x) ψ_i(y), with λ_i ≥ 0.
  • N_F, the dimensionality of F, is the number of nonzero eigenvalues λ_i (analogous to the rank of a matrix written as a sum of outer products λ_i u_i u_iᵀ)
  • ψ_i is the normalized eigenfunction of the integral operator associated with k (analogous to a normalized eigenvector)

  9. Mercer :: Linear Algebra
  Linear algebra analogy:
  • Eigenvector problem A u = λ u ↔ eigenfunction problem ∫ k(x, y) ψ(y) dy = λ ψ(x)
  • The matrix A corresponds to the kernel k(x, y); x and y are vectors
  • u is the normalized eigenvector; ψ is the normalized eigenfunction
  • λ is the eigenvalue in both problems
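The analogy can be checked numerically on a finite sample, where the integral operator becomes the Gram matrix and the Mercer expansion becomes an ordinary eigendecomposition; the data and kernel width below are arbitrary choices for illustration.

```python
# Finite-sample illustration: on l data points the eigenfunction problem
# becomes the eigenvector problem K u = lambda u for the Gram matrix K.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                  # 5 patterns in R^2 (assumed data)
sigma = 1.0

# Gram matrix K_ij = k(x_i, x_j) for a Gaussian kernel
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

lam, U = np.linalg.eigh(K)                   # eigenvalues and eigenvectors

# K is recovered as a sum of rank-one outer products lambda_i u_i u_i^T,
# mirroring Mercer's expansion k(x, y) = sum_i lambda_i psi_i(x) psi_i(y).
K_rebuilt = sum(lam_i * np.outer(u_i, u_i) for lam_i, u_i in zip(lam, U.T))
print(np.allclose(K, K_rebuilt))             # True
```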

  10. RKHS, Capacity, Metric
  Reproducing kernel Hilbert space (RKHS)
  • A Hilbert space of functions f on some set X such that all evaluation functionals are continuous, and the function values can be reproduced by the kernel
  Capacity of the kernel map
  • A bound on how many training examples are required for learning, measured by the VC dimension h
  Metric of the kernel map
  • The intrinsic shape of the manifold to which the data is mapped

  11. Support Vector Machines
  The decision boundary takes the form
  f(x) = sgn( Σ_i α_i y_i k(x, x_i) + b )
  • Similar to a single layer perceptron
  • Training examples x_i with non-zero coefficients α_i are the support vectors
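Assuming the coefficients α_i and the bias b have already been obtained by training (training itself is not shown here), evaluating the decision function is a short sketch; all names below are illustrative.

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate f(x) = sgn(sum_i alpha_i y_i k(x_i, x) + b)."""
    s = sum(a * y_i * kernel(x_i, x)
            for a, y_i, x_i in zip(alphas, labels, support_vectors))
    return np.sign(s + b)
```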

  12. Kernel Principal Component Analysis
  Kernel PCA carries out a linear PCA in the feature space F.
  The extracted features take the nonlinear form
  (V^k · Φ(x)) = Σ_i α_i^k k(x_i, x)
  The α_i^k are the components of the k-th eigenvector of the Gram matrix K_ij = k(x_i, x_j).

  13. KPCA and Dot Products
  We wish to find the eigenvectors V and eigenvalues λ of the covariance matrix in feature space,
  C = (1/l) Σ_j Φ(x_j) Φ(x_j)ᵀ
  Again, every dot product Φ(x) · Φ(y) is replaced with k(x, y).
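A compact sketch of kernel PCA computed through the Gram matrix, following the standard recipe: center K in feature space, solve the eigenproblem, normalize the eigenvectors, and read off the nonlinear components. The Gaussian kernel and its width are assumptions for the example.

```python
import numpy as np

def kernel_pca(X, sigma=1.0, n_components=2):
    l = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma ** 2))       # Gaussian Gram matrix (assumed kernel)

    # Centering in feature space: K <- K - 1K - K1 + 1K1, with 1_ij = 1/l
    one = np.full((l, l), 1.0 / l)
    Kc = K - one @ K - K @ one + one @ K @ one

    lam, A = np.linalg.eigh(Kc)              # ascending order
    lam, A = lam[::-1], A[:, ::-1]           # largest eigenvalues first

    # Normalize so the feature-space eigenvectors V^k = sum_i alpha_i^k Phi(x_i)
    # have unit norm: lambda_k (alpha^k . alpha^k) = 1
    A = A[:, :n_components] / np.sqrt(np.maximum(lam[:n_components], 1e-12))

    projections = Kc @ A                     # (V^k . Phi(x_j)) for the training set
    return projections, A
```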

  14. From Feature Space to Input Space
  The pre-image problem: given a vector Ψ ∈ F, find a point z in input space with Φ(z) = Ψ. In general Ψ is not in the image of Φ, so an exact pre-image need not exist.

  15. Projection Distance Illustration
  Approximate the vector Ψ ∈ F by the image Φ(z) of some input point z.

  16. Minimizing Projection Distance
  z is an approximate pre-image for Ψ if it minimizes the projection distance
  ||Φ(z) − Ψ||²
  Equivalently, maximize
  ρ(z) = (Ψ · Φ(z)) − (1/2) k(z, z)
  For kernels where k(z, z) = 1 (e.g., the Gaussian), this reduces to maximizing
  (Ψ · Φ(z)) = Σ_i γ_i k(x_i, z)

  17. Fixed-Point Iteration
  Assume a Gaussian kernel and write Ψ = Σ_i γ_i Φ(x_i), where:
  • the γ_i are built from the eigenvectors α_i of the centered Gram matrix
  • the x_i are the input patterns
  • σ is the kernel width
  Requiring no step size, we can iterate
  z_{t+1} = Σ_i γ_i exp(−||z_t − x_i||² / (2σ²)) x_i / Σ_i γ_i exp(−||z_t − x_i||² / (2σ²))
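A hedged implementation sketch of that fixed-point update for the Gaussian kernel, where gamma holds the expansion coefficients of Ψ = Σ_i γ_i Φ(x_i); the starting point, tolerance, and iteration cap are arbitrary choices.

```python
import numpy as np

def gaussian_preimage(X, gamma, sigma, z0, n_iter=100, tol=1e-8):
    """Fixed-point iteration for an approximate pre-image under a Gaussian kernel.

    X:     (l, d) input patterns x_i
    gamma: (l,)   expansion coefficients of Psi = sum_i gamma_i Phi(x_i)
    z0:    (d,)   starting point (e.g., the noisy pattern itself)
    """
    z = z0.copy()
    for _ in range(n_iter):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * sigma ** 2))
        z_new = w @ X / w.sum()              # assumes the weights do not all vanish
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z
```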

  18. Kernel PCA Toy Example
  Generated an artificial data set from three point sources, 100 points each.

  19. De-noising by Reconstruction, Part One
  • Reconstruction from projections onto the eigenvectors from the previous example
  • Generated 20 new points from each Gaussian
  • Each point is represented by its first n = 1, 2, …, 8 nonlinear principal components
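Putting the pieces together, a de-noising step might look like the sketch below: project a test point onto its first n nonlinear components, turn the reconstruction Ψ = Σ_k β_k V^k back into an expansion Σ_i γ_i Φ(x_i), and hand the coefficients to the pre-image iteration above. Centering terms are omitted for brevity, and kernel_pca / gaussian_preimage refer to the earlier sketches.

```python
import numpy as np

def denoise(x_test, X_train, A, sigma):
    """Approximate de-noising of x_test via kernel PCA reconstruction.

    A: (l, n) normalized eigenvector matrix returned by kernel_pca above.
    """
    # k(x_i, x_test) for all training points (centering omitted in this sketch)
    k_vec = np.exp(-np.sum((X_train - x_test) ** 2, axis=1) / (2 * sigma ** 2))
    beta = A.T @ k_vec     # beta_k = (V^k . Phi(x_test)), first n components
    gamma = A @ beta       # Psi = sum_i gamma_i Phi(x_i)
    return gaussian_preimage(X_train, gamma, sigma, z0=x_test)
```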

  20. De-noising by Reconstruction, Part Two
  • The original points move in the direction of de-noising

  21. De-noising in Two Dimensions
  • A half circle and a square in the plane
  • The de-noised versions are the solid lines

  22. De-noising USPS Data Patterns
  USPS data set: 7291 training patterns, 2007 test patterns, each of size 16 × 16 pixels.
  [Figure: de-noising results compared for linear PCA and kernel PCA.]

  23. Questions
