
Kernel methods - overview


Presentation Transcript


1. Kernel methods - overview
• Kernel smoothers
• Local regression
• Kernel density estimation
• Radial basis functions

2. Introduction
Kernel methods are regression techniques used to estimate a response function from noisy data. Properties:
• Different models are fitted at each query point, and only the observations close to that point are used to fit the model
• The resulting function is smooth
• The models require only a minimum of training

3. A simple one-dimensional kernel smoother
\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}
where K_\lambda(x_0, x) = D\left( \frac{|x - x_0|}{\lambda} \right)
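A minimal NumPy sketch of this estimator (the names `kernel_smoother` and `D` are illustrative, not from the slides; `D` can be any kernel profile, such as the Epanechnikov or tri-cube profiles sketched after slide 6):

```python
import numpy as np

def kernel_smoother(x0, x, y, lam, D):
    """Kernel-weighted average at query point x0: a weighted mean of the yi,
    with weights K_lambda(x0, xi) = D(|xi - x0| / lam)."""
    w = D(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)  # assumes at least one nonzero weight
```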

4. Kernel methods, splines and ordinary least squares regression (OLS)
• OLS: a single model is fitted to all data
• Splines: different models are fitted to different subintervals (cuboids) of the input domain
• Kernel methods: different models are fitted at each query point

5. Kernel-weighted averages and moving averages
The Nadaraya-Watson kernel-weighted average
\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}
where λ indicates the window size and the function D determines how the weights change with distance within this window. The estimated function is smooth!
K-nearest neighbours: \hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x)). The estimated function is piecewise constant! (A sketch of this moving average follows below.)
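For contrast with the smooth kernel average, here is a k-nearest-neighbour moving average (a hypothetical helper, assuming NumPy). Because the neighbour set changes discretely as x0 moves, the resulting curve is piecewise constant:

```python
import numpy as np

def knn_average(x0, x, y, k):
    """Moving average of y over the k observations closest to x0."""
    idx = np.argsort(np.abs(x - x0))[:k]  # indices of the k nearest xi
    return y[idx].mean()
```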

6. Examples of one-dimensional kernel smoothers
Epanechnikov kernel: D(t) = \tfrac{3}{4}(1 - t^2) for |t| \le 1, and 0 otherwise
Tri-cube kernel: D(t) = (1 - |t|^3)^3 for |t| \le 1, and 0 otherwise
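Both profiles are easy to express directly and can be passed as the `D` argument of the `kernel_smoother` sketch above (NumPy; function names are illustrative):

```python
import numpy as np

def epanechnikov(t):
    # D(t) = 3/4 (1 - t^2) for |t| <= 1, 0 otherwise
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def tricube(t):
    # D(t) = (1 - |t|^3)^3 for |t| <= 1, 0 otherwise
    return np.where(np.abs(t) <= 1.0, (1.0 - np.abs(t)**3)**3, 0.0)
```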

7. Issues in kernel smoothing
• The smoothing parameter λ has to be defined
• When there are ties at xi: compute an average y value and introduce weights representing the number of points
• Boundary issues
• Varying density of observations: the bias is constant, but the variance is inversely proportional to the local density

8. Boundary effects of one-dimensional kernel smoothers
Locally-weighted averages can be badly biased at the boundaries if the response function has a significant slope there. Remedy: apply local linear regression.

9. Local linear regression
Find the intercept and slope parameters by solving
\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2
The solution is a linear combination of the yi:
\hat{f}(x_0) = \sum_{i=1}^{N} l_i(x_0)\, y_i
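A sketch of this weighted least-squares fit at a single query point (NumPy; names are illustrative). Solving the normal equations B^T W B θ = B^T W y and evaluating the line at x0 reproduces the linear-in-y form above:

```python
import numpy as np

def local_linear(x0, x, y, lam, D):
    """Weighted least-squares fit of intercept + slope around x0,
    with weights K_lambda(x0, xi) = D(|xi - x0| / lam)."""
    w = D(np.abs(x - x0) / lam)
    B = np.column_stack([np.ones_like(x), x])   # design matrix (1, xi)
    BtW = B.T * w                               # B^T W without forming W
    alpha, beta = np.linalg.solve(BtW @ B, BtW @ y)
    return alpha + beta * x0                    # fitted value at x0
```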

10. Kernel smoothing vs local linear regression
Kernel smoothing: solve the minimization problem
\min_{\theta} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left( y_i - \theta \right)^2
Local linear regression: solve the minimization problem
\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2

11. Properties of local linear regression
• Automatically modifies the kernel weights to correct for bias
• The bias depends only on the terms of order higher than one in the expansion of f

12. Local polynomial regression
Fitting polynomials of degree d instead of straight lines:
\min_{\beta_j(x_0),\ j=0,\dots,d} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \sum_{j=0}^{d} \beta_j(x_0) x_i^j \right]^2
Behavior of the estimated response function (figure). A sketch of the fit follows below.
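The local linear sketch generalizes to degree d by expanding the design matrix; centring the basis at x0 makes the fitted value at x0 equal to the intercept (NumPy; names are illustrative):

```python
import numpy as np

def local_poly(x0, x, y, lam, D, d=2):
    """Degree-d local polynomial fit, evaluated at the query point x0."""
    w = D(np.abs(x - x0) / lam)
    B = np.vander(x - x0, d + 1, increasing=True)  # columns 1, (x-x0), ..., (x-x0)^d
    BtW = B.T * w
    beta = np.linalg.solve(BtW @ B, BtW @ y)
    return beta[0]                                 # centred fit evaluated at x0
```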

13. Local polynomial vs local linear regression
Advantages:
• Reduces the ”trimming of hills and filling of valleys”
Disadvantages:
• Higher variance (the tails are more wiggly)

14. Selecting the width of the kernel
Bias-variance tradeoff: selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance.

15. Selecting the width of the kernel
• Automatic selection (e.g., by cross-validation; see the sketch below)
• Fixing the degrees of freedom
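Leave-one-out cross-validation is one way to automate the choice of λ. A sketch, assuming one of the smoother functions above is wrapped as `fit` (a kernel with unbounded support, e.g. Gaussian, avoids all-zero weight sets at small λ; all names are illustrative):

```python
import numpy as np

def loocv_lambda(x, y, lambdas, fit):
    """Pick the lambda minimizing the leave-one-out squared prediction error;
    fit(x0, x_train, y_train, lam) must return a prediction at x0."""
    scores = []
    for lam in lambdas:
        errs = [(y[i] - fit(x[i], np.delete(x, i), np.delete(y, i), lam)) ** 2
                for i in range(len(x))]
        scores.append(np.mean(errs))
    return lambdas[int(np.argmin(scores))]

# Example wrapper: fit = lambda x0, xt, yt, lam: local_linear(x0, xt, yt, lam, tricube)
```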

16. Local regression in R^p
The one-dimensional approach is easily extended to p dimensions by
• Using the Euclidean norm as a measure of distance in the kernel: K_\lambda(x_0, x) = D(\lVert x - x_0 \rVert / \lambda)
• Modifying the polynomial (a local linear sketch in R^p follows below)
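A p-dimensional version of the local linear sketch, using the Euclidean norm in the kernel and a centred design matrix with one column per coordinate (NumPy; names are illustrative):

```python
import numpy as np

def local_linear_p(x0, X, y, lam, D):
    """Local linear fit in R^p at query x0; X has shape (N, p)."""
    w = D(np.linalg.norm(X - x0, axis=1) / lam)        # Euclidean distance to x0
    B = np.hstack([np.ones((X.shape[0], 1)), X - x0])  # centred design matrix
    BtW = B.T * w
    beta = np.linalg.solve(BtW @ B, BtW @ y)
    return beta[0]                                     # fitted value at x0
```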

17. Local regression in R^p
”The curse of dimensionality”
• The fraction of points close to the boundary of the input domain increases with its dimension
• Observed data do not cover the whole input domain

18. Structured local regression models
Structured kernels (standardize each variable):
K_{\lambda, A}(x_0, x) = D\left( \frac{\sqrt{(x - x_0)^T A (x - x_0)}}{\lambda} \right)
Note: A is positive semidefinite.
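A sketch of such a structured kernel (NumPy; names are illustrative). Taking A as the inverse sample covariance standardizes the coordinates, while shrinking rows and columns of A downweights the corresponding variables:

```python
import numpy as np

def structured_weights(x0, X, A, lam, D):
    """Weights D(sqrt((x - x0)^T A (x - x0)) / lam) for all rows of X;
    A must be positive semidefinite."""
    diff = X - x0
    dist = np.sqrt(np.einsum('ij,jk,ik->i', diff, A, diff))  # Mahalanobis-type distance
    return D(dist / lam)
```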

19. Structured local regression models
Structured regression functions
• ANOVA decompositions (e.g., additive models); backfitting algorithms can be used
• Varying coefficient models (partition X):
f(X) = \alpha(Z) + \beta_1(Z) X_1 + \dots + \beta_q(Z) X_q

20. Structured local regression models
Varying coefficient models (example)

21. Local methods
• Assumption: the model is locally linear -> maximize the log-likelihood locally at x0:
l(\beta(x_0)) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l(y_i, x_i^T \beta(x_0))
• Autoregressive time series: y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + e_t. Writing z_t = (1, y_{t-1}, \dots, y_{t-k})^T gives y_t = z_t^T \beta + e_t, fitted by local least squares with kernel K(z_0, z_t) (see the sketch below).
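A sketch of the lagged-design construction and the local least-squares fit for the autoregressive case (NumPy; names are illustrative, and z0 is a query lag vector of length k+1 including the leading 1):

```python
import numpy as np

def local_ar(z0, y, k, lam, D):
    """Local least-squares fit of y_t = z_t' beta + e_t,
    z_t = (1, y_{t-1}, ..., y_{t-k}), weights D(||z_t - z0|| / lam)."""
    n = len(y)
    Z = np.column_stack([np.ones(n - k)] +
                        [y[k - j:n - j] for j in range(1, k + 1)])  # lag matrix
    w = D(np.linalg.norm(Z - z0, axis=1) / lam)
    ZtW = Z.T * w
    return np.linalg.solve(ZtW @ Z, ZtW @ y[k:])  # local beta at z0
```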

22. Kernel density estimation
• Straightforward estimates of the density are bumpy
• Instead, Parzen’s smooth estimate is preferred:
\hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)
Normally, Gaussian kernels K_\lambda(x_0, x) = \phi(|x - x_0| / \lambda) are used (see the sketch below).
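Parzen’s estimate with a Gaussian kernel takes only a few lines (NumPy; names are illustrative):

```python
import numpy as np

def parzen_kde(x0, x, lam):
    """f_hat(x0) = 1/(N*lam) * sum_i phi((x0 - xi) / lam), Gaussian phi."""
    t = (x0 - x) / lam
    phi = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)  # standard normal density
    return phi.sum() / (len(x) * lam)
```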

23. Radial basis functions and kernels
Using the idea of basis expansions, we treat kernel functions as basis functions:
f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j
where ξj is a prototype (location) parameter and λj a scale parameter.

24. Radial basis functions and kernels
Choosing the parameters:
• Estimate {λj, ξj} separately from βj (often by using the distribution of X alone), then solve a least-squares problem for the βj (see the sketch below).
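A sketch of this two-stage recipe in one dimension with Gaussian bases: the prototypes ξj and scales λj are fixed beforehand (here simply passed in; in practice one might place the ξj by k-means on the xi), then the βj are found by least squares (NumPy; names are illustrative):

```python
import numpy as np

def rbf_fit(x, y, xi, lam):
    """Least squares for beta in f(x) = sum_j exp(-(x - xi_j)^2 / (2 lam_j^2)) beta_j,
    with prototypes xi and scales lam chosen in advance."""
    H = np.exp(-0.5 * ((x[:, None] - xi[None, :]) / lam) ** 2)  # Gaussian basis matrix
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return beta

def rbf_predict(x_new, xi, lam, beta):
    H = np.exp(-0.5 * ((x_new[:, None] - xi[None, :]) / lam) ** 2)
    return H @ beta
```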
