
Kernel Methods


Presentation Transcript


  1. Kernel Methods Arie Nakhmani

  2. Outline • Kernel Smoothers • Kernel Density Estimators • Kernel Density Classifiers

  3. Kernel Smoothers – The Goal • Estimating a function from noisy observations when the parametric model for this function is unknown • The resulting function should be smooth • The level of “smoothness” should be set by a single parameter

  4. Example • N=100 sample points • What does “smooth enough” mean?

  5. Example N=100 sample points

  6. Exponential Smoother • Smaller a → smoother, but more delayed line

  7. Exponential Smoother • Simple • Sequential • Single parameter a • Single value memory • Too rough • Delayed
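
A minimal sketch of the exponential smoother described above, using the standard recursion s[t] = a·y[t] + (1−a)·s[t−1]; the noisy sine test signal is an assumption for illustration, not data from the slides:

```python
import numpy as np

def exponential_smoother(y, a=0.1):
    """Sequential exponential smoothing: s[t] = a*y[t] + (1 - a)*s[t-1]."""
    y = np.asarray(y, dtype=float)
    s = np.empty_like(y)
    s[0] = y[0]                          # initialize with the first observation
    for t in range(1, len(y)):
        s[t] = a * y[t] + (1 - a) * s[t - 1]
    return s

# Assumed test signal: noisy samples of a sine (N = 100, as in the example slides)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
smoothed = exponential_smoother(y, a=0.1)    # smaller a -> smoother, but more delayed
```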

  8. Moving Average Smoother

  9. Moving Average Smoother • m=11 • Larger m → smoother, but more straightened line

  10. Moving Average Smoother • Sequential • Single parameter: the window size m • Memory for m values • Irregularly smooth • What if we have a p-dimensional problem with p > 1 ???
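
A sketch of the moving-average smoother with a centered window of size m; the slides do not specify the end-point handling, so shrinking the window at the edges is an assumption:

```python
import numpy as np

def moving_average(y, m=11):
    """Centered moving average with window size m (odd)."""
    y = np.asarray(y, dtype=float)
    half = m // 2
    out = np.empty_like(y)
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)  # window shrinks at the edges
        out[i] = y[lo:hi].mean()
    return out
```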

  11. Nearest Neighbors Smoother • m=16 • Larger m → smoother, but more biased line

  12. Nearest Neighbors Smoother • Not sequential • Single parameter: the number of neighbors m • Trivially extended to any number of dimensions • Memory for m values • Depends on the metric definition • Not smooth enough • Biased end-points
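
A sketch of the nearest-neighbors smoother in one dimension: each estimate averages the y-values of the m points closest to the query point (the Euclidean metric is assumed):

```python
import numpy as np

def knn_smoother(x, y, m=16):
    """Average the y-values of the m nearest x-values at every query point."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for i, x0 in enumerate(x):
        neighbors = np.argsort(np.abs(x - x0))[:m]   # indices of the m nearest points
        out[i] = y[neighbors].mean()
    return out
```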

  13. Low Pass Filter • 2nd-order Butterworth • Why do we need kernel smoothers ???

  14. Low Pass Filter • The same filter… for a log function

  15. Low Pass Filter • Smooth • Easily extended to any number of dimensions • Effectively, 3 parameters: type, order, and bandwidth • Biased end-points • Inappropriate for some functions (depends on bandwidth)
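
A sketch of low-pass-filter smoothing with a 2nd-order Butterworth filter via SciPy; the normalized cutoff 0.1 is an assumed bandwidth, and zero-phase filtering (filtfilt) is used here to avoid the delay of a causal filter:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Assumed test signal (N = 100 noisy samples)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)

# The three effective parameters: type (low-pass), order (2), and bandwidth (Wn = 0.1 of Nyquist)
b, a = butter(N=2, Wn=0.1, btype="low")
y_smooth = filtfilt(b, a, y)   # forward-backward filtering: smooth, but end-points remain biased
```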

  16. Kernel Average Smoother

  17. Kernel Average Smoother • Nadaraya-Watson kernel-weighted average: $\hat{f}(x_0)=\frac{\sum_{i=1}^{N}K_\lambda(x_0,x_i)\,y_i}{\sum_{i=1}^{N}K_\lambda(x_0,x_i)}$ with the kernel $K_\lambda(x_0,x)=D\!\left(\frac{|x-x_0|}{h_\lambda(x_0)}\right)$ • $h_m(x_0)=|x_0-x_{[m]}|$ (distance to the m-th closest point) for the Nearest Neighbor Smoother • $h_\lambda(x_0)=\lambda$ (constant) for the Locally Weighted Average

  18. Popular Kernels • Epanechnikov kernel: $D(t)=\frac{3}{4}(1-t^2)$ for $|t|\le 1$, and $0$ otherwise • Tri-cube kernel: $D(t)=(1-|t|^3)^3$ for $|t|\le 1$, and $0$ otherwise • Gaussian kernel: $D(t)=\frac{1}{\sqrt{2\pi}}e^{-t^2/2}$
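
The three kernel profiles above, written as plain NumPy functions of the scaled distance t (a small sketch for reference):

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def gaussian(t):
    """D(t) = standard normal density; note the unbounded support."""
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
```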

  19. Non-Symmetric Kernel • Kernel example: Which kernel is that ???

  20. Kernel Average Smoother • Single parameter: the window width λ • Smooth • Trivially extended to any number of dimensions • Memory-based method – little or no training is required • Depends on the metric definition • Biased end-points
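
A minimal sketch of the Nadaraya-Watson kernel average smoother with a constant window width λ and an Epanechnikov kernel (the test signal and the λ value are assumptions):

```python
import numpy as np

def nadaraya_watson(x0, x, y, lam=0.5):
    """Kernel-weighted average at x0 with an Epanechnikov kernel of width lam."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)   # K_lambda(x0, x_i)
    if w.sum() == 0:                               # no sample points inside the window
        return np.nan
    return np.sum(w * y) / np.sum(w)

# Assumed test signal, smoothed over a grid of query points
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
fhat = np.array([nadaraya_watson(x0, x, y, lam=0.5) for x0 in x])
```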

  21. Local Linear Regression • The kernel-weighted average minimizes: $\min_{\alpha(x_0)}\sum_{i=1}^{N}K_\lambda(x_0,x_i)\left[y_i-\alpha(x_0)\right]^2$ • Local linear regression minimizes: $\min_{\alpha(x_0),\beta(x_0)}\sum_{i=1}^{N}K_\lambda(x_0,x_i)\left[y_i-\alpha(x_0)-\beta(x_0)x_i\right]^2$

  22. Local Linear Regression • Solution: $\hat{f}(x_0)=b(x_0)^T\left(\mathbf{B}^T\mathbf{W}(x_0)\mathbf{B}\right)^{-1}\mathbf{B}^T\mathbf{W}(x_0)\mathbf{y}$ where: $b(x)^T=(1,x)$, $\mathbf{B}$ is the $N\times 2$ matrix with $i$-th row $b(x_i)^T$, and $\mathbf{W}(x_0)=\mathrm{diag}\left(K_\lambda(x_0,x_i)\right)$ • Other representation: $\hat{f}(x_0)=\sum_{i=1}^{N}l_i(x_0)\,y_i$, where $l_i(x_0)$ is the equivalent kernel
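
A sketch of local linear regression that solves the weighted least-squares problem above at each query point, using the b(x), B, W(x0) notation from the slide (the Epanechnikov kernel and the test signal are assumptions):

```python
import numpy as np

def local_linear(x0, x, y, lam=0.5):
    """Weighted least-squares line around x0, evaluated at x0: b(x0)^T (B^T W B)^(-1) B^T W y."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)        # Epanechnikov weights K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])           # rows b(x_i)^T = (1, x_i)
    W = np.diag(w)                                      # W(x0)
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)   # (alpha(x0), beta(x0))
    return np.array([1.0, x0]) @ theta                  # b(x0)^T theta

# Assumed test signal; note the reduced bias at the boundaries compared with the kernel average
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
fhat = np.array([local_linear(x0, x, y, lam=0.5) for x0 in x])
```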

  23. Local Linear Regression

  24. Equivalent Kernels

  25. Local Polynomial Regression • Why stop at local linear fits? • Let’s minimize: $\min_{\alpha(x_0),\beta_j(x_0)}\sum_{i=1}^{N}K_\lambda(x_0,x_i)\left[y_i-\alpha(x_0)-\sum_{j=1}^{d}\beta_j(x_0)x_i^j\right]^2$

  26. Local Polynomial Regression

  27. Variance Compromise

  28. Conclusions • Local linear fits help reduce bias dramatically at the boundaries, at a modest cost in variance; they are also more reliable for extrapolation. • Local quadratic fits do little for bias at the boundaries but increase the variance a lot. • Local quadratic fits tend to be most helpful in reducing bias due to curvature in the interior of the domain. • λ controls the tradeoff between bias and variance: larger λ gives lower variance but higher bias.

  29. Local Regression in $\mathbb{R}^p$ • Radial kernel: $K_\lambda(x_0,x)=D\!\left(\frac{\|x-x_0\|}{\lambda}\right)$

  30. Popular Kernels • Epanechnikov kernel • Tri-cube kernel • Gaussian kernel

  31. Example

  32. Higher Dimensions • The boundary estimation is problematic • Many sample points are needed to reduce the bias • Local regression is less useful for p>3 • It’s impossible to maintain localness (low bias) and sizeable samples (low variance) at the same time

  33. Structured Kernels • Non-radial kernel: $K_{\lambda,A}(x_0,x)=D\!\left(\frac{(x-x_0)^T A\,(x-x_0)}{\lambda}\right)$ • Coordinates or directions can be downgraded or omitted by imposing restrictions on A • The covariance can be used to adapt the metric A (related to the Mahalanobis distance) • Projection-pursuit model
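
A small sketch of a structured (non-radial) kernel in which the metric A is taken as the inverse sample covariance, giving a Mahalanobis-type distance; the data and the tri-cube profile are assumptions for illustration:

```python
import numpy as np

def tricube(t):
    """Tri-cube profile: (1 - |t|^3)^3 for |t| <= 1, else 0."""
    return (1 - abs(t)**3)**3 if abs(t) <= 1 else 0.0

def structured_kernel(x0, x, A, lam=1.0):
    """Non-radial kernel K(x0, x) = D((x - x0)^T A (x - x0) / lam)."""
    d = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
    return tricube(d @ A @ d / lam)

# Assumed data: 3 predictors with very different scales; A = inverse covariance adapts the metric
X = np.random.randn(200, 3) * np.array([1.0, 3.0, 0.2])
A = np.linalg.inv(np.cov(X, rowvar=False))
w = structured_kernel(X[0], X[1], A, lam=2.0)
```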

  34. Structured Regression • Divide the predictors into a set (X1, X2, …, Xq) with q &lt; p, and collect the remaining variables in a vector Z • Conditionally linear model: $f(X)=\alpha(Z)+\beta_1(Z)X_1+\dots+\beta_q(Z)X_q$ • For a given $Z=z_0$, fit the model by locally weighted least squares: $\min_{\alpha(z_0),\beta(z_0)}\sum_{i=1}^{N}K_\lambda(z_0,z_i)\left(y_i-\alpha(z_0)-x_{1i}\beta_1(z_0)-\dots-x_{qi}\beta_q(z_0)\right)^2$

  35. Density Estimation • Example: mixture of two normal distributions (figure: original distribution, sample set, constant-window estimate)

  36. Kernel Density Estimation • Smooth Parzen estimate: $\hat{f}_X(x_0)=\frac{1}{N\lambda}\sum_{i=1}^{N}K_\lambda(x_0,x_i)$

  37. Comparison • Mixture of two normal distributions • Usually, bandwidth (λ) selection is more important than kernel function selection

  38. Kernel Density Estimation • Gaussian kernel density estimation: $\hat{f}_X(x_0)=\frac{1}{N}\sum_{i=1}^{N}\phi_\lambda(x_0-x_i)$, where $\phi_\lambda$ denotes the Gaussian density with mean zero and standard deviation λ • Generalization to $\mathbb{R}^p$: $\hat{f}_X(x_0)=\frac{1}{N(2\lambda^2\pi)^{p/2}}\sum_{i=1}^{N}e^{-\frac{1}{2}\left(\|x_i-x_0\|/\lambda\right)^2}$ • The estimate acts as a low-pass filter (LPF) on the empirical distribution
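
A sketch of the one-dimensional Gaussian (Parzen) density estimate above; the two-component normal mixture matches the example slides, but the mixture parameters themselves are assumed:

```python
import numpy as np

def gaussian_kde_1d(x0, samples, lam=0.3):
    """Average of Gaussian densities with mean x_i and standard deviation lam, evaluated at x0."""
    z = (x0 - np.asarray(samples, dtype=float)) / lam
    return np.mean(np.exp(-0.5 * z**2) / (lam * np.sqrt(2 * np.pi)))

# Assumed sample set drawn from a mixture of two normal distributions
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 0.5, 200)])
grid = np.linspace(-5, 5, 200)
density = np.array([gaussian_kde_1d(g, samples, lam=0.3) for g in grid])
```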

  39. Kernel Density Classification • For a J-class problem, fit a density estimate $\hat{f}_j(x)$ and a prior $\hat{\pi}_j$ in each class, then $\hat{P}(G=j\mid X=x_0)=\frac{\hat{\pi}_j\hat{f}_j(x_0)}{\sum_{k=1}^{J}\hat{\pi}_k\hat{f}_k(x_0)}$
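
A sketch of kernel density classification: fit one density estimate per class, weight by the class priors π̂_j = N_j / N, and normalize (the two-class sample data are assumed):

```python
import numpy as np

def kde(x0, samples, lam=0.3):
    """One-dimensional Gaussian kernel density estimate at x0."""
    z = (x0 - np.asarray(samples, dtype=float)) / lam
    return np.mean(np.exp(-0.5 * z**2) / (lam * np.sqrt(2 * np.pi)))

def kde_posterior(x0, class_samples, lam=0.3):
    """Posterior P(G = j | X = x0) from per-class KDEs and priors pi_j = N_j / N."""
    counts = np.array([len(s) for s in class_samples], dtype=float)
    priors = counts / counts.sum()
    likelihoods = np.array([kde(x0, s, lam) for s in class_samples])
    unnorm = priors * likelihoods
    return unnorm / unnorm.sum()

# Assumed two-class example
rng = np.random.default_rng(1)
class_samples = [rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 100)]
posterior = kde_posterior(1.5, class_samples)        # probabilities for the two classes
```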

  40. Radial Basis Functions • The function f(x) is represented as an expansion in basis functions: $f(x)=\sum_{j=1}^{M}\beta_j h_j(x)$ • Radial basis function (RBF) expansion: $f(x)=\sum_{j=1}^{M}K_{\lambda_j}(\xi_j,x)\,\beta_j=\sum_{j=1}^{M}D\!\left(\frac{\|x-\xi_j\|}{\lambda_j}\right)\beta_j$ • The sum-of-squares is minimized with respect to all the parameters (for a Gaussian kernel): $\min_{\{\lambda_j,\xi_j,\beta_j\}}\sum_{i=1}^{N}\left(y_i-\sum_{j=1}^{M}\beta_j\exp\!\left(-\frac{(x_i-\xi_j)^T(x_i-\xi_j)}{\lambda_j^2}\right)\right)^2$

  41. Radial Basis Functions • When assuming a constant width $\lambda_j=\lambda$: the problem of “holes” (regions where no basis function has appreciable support) • The solution – renormalized RBF: $h_j(x)=\frac{D\left(\|x-\xi_j\|/\lambda\right)}{\sum_{k=1}^{M}D\left(\|x-\xi_k\|/\lambda\right)}$
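
A sketch of a renormalized Gaussian RBF fit in which the centers ξ_j are fixed on a uniform grid and the width λ is constant, so only the coefficients β_j are found by least squares; the centers, λ, and the test signal are all assumptions:

```python
import numpy as np

def renormalized_rbf_design(x, centers, lam=0.5):
    """Matrix of h_j(x_i) = D(|x_i - xi_j| / lam) / sum_k D(|x_i - xi_k| / lam), Gaussian D."""
    t = np.abs(x[:, None] - centers[None, :]) / lam   # N x M scaled distances
    D = np.exp(-0.5 * t**2)                           # Gaussian radial basis
    return D / D.sum(axis=1, keepdims=True)           # renormalization fills the "holes"

# Assumed test signal, fixed centers, and common width
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.3 * np.random.randn(100)
centers = np.linspace(0, 2 * np.pi, 10)
H = renormalized_rbf_design(x, centers, lam=0.5)
beta = np.linalg.lstsq(H, y, rcond=None)[0]           # least-squares fit of the beta_j
fhat = H @ beta
```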

  42. Additional Applications • Local likelihood • Mixture models for density estimation and classification • Mean-shift

  43. Conclusions • Memory-based methods: the model is the entire training data set • Infeasible for many real-time applications • Provide good smoothing results for arbitrarily sampled functions • Appropriate for interpolation and extrapolation • When the model is known, other fitting methods are preferable
