
Basis Expansion and Regularization


Presentation Transcript


  1. Basis Expansion and Regularization Presenters: Hongliang Fei, Brian Quanz Date: July 03, 2008

  2. Contents • Introduction • Piecewise Polynomials and Splines • Filtering and Feature Extraction • Smoothing Splines • Automatic Smoothing Parameter Selection

  3. 1. Introduction • Basis: In Linear Algebra, a basis is a set of vectors satisfying: • Every vector in the given vector space can be represented as a linear combination of the basis vectors; • No element of the set can be represented as a linear combination of the others.

  4. In a function space, the basis becomes a set of basis functions; • Each function in the function space can be represented as a linear combination of the basis functions. • Example: the quadratic polynomial basis {1, t, t^2}.

  5. What is Basis Expansion? • Given input data X ∈ R^p and transformations h_m(X): R^p → R, m = 1, …, M. Then we model f(X) = Σ_{m=1}^{M} β_m h_m(X), a linear basis expansion in X, where h_m(X) is the m-th basis function.
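A minimal sketch of this model in Python (NumPy assumed; the cubic-polynomial basis and the toy data are illustrative assumptions, not from the slides):

import numpy as np

# toy one-dimensional data (illustrative only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(3 * X) + rng.normal(scale=0.2, size=X.size)

# basis expansion: h_1(X) = 1, h_2(X) = X, h_3(X) = X^2, h_4(X) = X^3
H = np.column_stack([np.ones_like(X), X, X**2, X**3])

# fit the linear model f(X) = sum_m beta_m h_m(X) by ordinary least squares
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
f_hat = H @ beta

The model stays linear in the coefficients β even though it is nonlinear in X, which is the point of the next slide.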

  6. Why Basis Expansion? • In regression problems, f(X) is typically nonlinear in X; • A linear model in the basis functions is convenient and easy to interpret; • When the sample size is very small but the number of attributes is very large, a linear model may be all we can fit while avoiding overfitting.

  7. 2. Piecewise Polynomials and Splines • Spline: • In Mathematics, a spline is a special function defined piecewise by polynomials; • In Computer Science, the term spline more frequently refers to a piecewise polynomial (parametric) curve. • Advantages: simple construction, ease and accuracy of evaluation, and the capacity to approximate complex shapes through curve fitting and interactive curve design.

  8. Example of a Spline http://en.wikipedia.org/wiki/Image:BezierInterpolation.gif

  9. Assume a spline with four knots (two boundary knots and two interior knots ξ_1, ξ_2), and X one-dimensional. • Piecewise constant basis: h_1(X) = I(X < ξ_1), h_2(X) = I(ξ_1 ≤ X < ξ_2), h_3(X) = I(ξ_2 ≤ X). • Piecewise linear basis: add the three functions h_{m+3}(X) = h_m(X)·X, m = 1, 2, 3.
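A small sketch of these two bases in Python (the knot locations are assumptions for illustration):

import numpy as np

xi1, xi2 = 0.33, 0.66                 # assumed interior knots
X = np.linspace(0, 1, 100)

# piecewise constant basis: indicators of the three regions
h1 = (X < xi1).astype(float)
h2 = ((X >= xi1) & (X < xi2)).astype(float)
h3 = (X >= xi2).astype(float)
H_const = np.column_stack([h1, h2, h3])

# piecewise linear basis: additionally include h_m(X) * X
H_linear = np.column_stack([h1, h2, h3, h1 * X, h2 * X, h3 * X])

A least-squares fit on H_const gives the region-wise means; a fit on H_linear gives a separate line in each region (not yet constrained to be continuous at the knots).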

  10. Piecewise Cubic Polynomial

  11. Basis functions: h_1(X) = 1, h_2(X) = X, h_3(X) = X^2, h_4(X) = X^3, h_5(X) = (X − ξ_1)_+^3, h_6(X) = (X − ξ_2)_+^3. • Six functions corresponding to a six-dimensional linear space: a cubic spline with two knots.

  12. An order-M spline with knots ξ_l, l = 1, …, K, is a piecewise polynomial of order M with continuous derivatives up to order M − 2. The general form for the truncated-power basis set would be: h_j(X) = X^{j−1}, j = 1, …, M, and h_{M+l}(X) = (X − ξ_l)_+^{M−1}, l = 1, …, K. (A cubic spline is an order-4 spline.)
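A direct translation of this basis into Python (a sketch; the knot values and M are illustrative assumptions):

import numpy as np

def truncated_power_basis(X, knots, M=4):
    # h_j(X) = X^(j-1) for j = 1..M, then h_{M+l}(X) = (X - xi_l)_+^(M-1) for l = 1..K
    X = np.asarray(X, dtype=float)
    cols = [X**j for j in range(M)]                            # 1, X, ..., X^(M-1)
    cols += [np.maximum(X - xi, 0.0)**(M - 1) for xi in knots]
    return np.column_stack(cols)

# M = 4 gives the cubic spline; with two knots this is the six-column basis of the previous slide
B = truncated_power_basis(np.linspace(0, 1, 50), knots=[0.33, 0.66], M=4)
print(B.shape)    # (50, 6)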

  13. Natural cubic Spline • A natural cubic spline adds additional constraints: the function is required to be linear beyond the boundary knots. • A natural cubic spline with K knots is represented by K basis functions (see the dimension count below). • One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.
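A quick way to see why exactly K basis functions remain (a standard dimension count, not on the original slide): a cubic spline with K knots has K + 1 polynomial pieces with 4 coefficients each, minus 3 continuity constraints (on f, f', f'') at each of the K knots,

4(K + 1) − 3K = K + 4 free parameters.

The natural boundary conditions force the cubic and quadratic terms to vanish in the two outer regions (2 constraints at each boundary), removing 4 more and leaving K.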

  14. Example of Natural cubic spline • Starting from the truncated power series basis, we arrive at: N_1(X) = 1, N_2(X) = X, N_{k+2}(X) = d_k(X) − d_{K−1}(X), k = 1, …, K − 2, where d_k(X) = [(X − ξ_k)_+^3 − (X − ξ_K)_+^3] / (ξ_K − ξ_k).

  15. An example application (Phoneme Recognition)

  16. Data: 1000 training samples drawn from 695 “aa”s and 1022 “ao”s, each with a feature vector of length 256 (a log-periodogram measured over frequency). • Goal: use these data to classify the spoken phoneme. • The fitted logistic-regression coefficients can be plotted as a function of frequency.

  17. Fitting via maximum likelihood only, the coefficient curve is very rough; • Fitting through natural cubic splines: • Rewrite the coefficient function as an expansion in splines, β(f) = Σ_{m=1}^{M} h_m(f) θ_m, that is β = Hθ, where H is a p-by-M basis matrix of natural cubic splines. • Since x^T β = x^T Hθ = (H^T x)^T θ, we replace the input features x by the filtered version x* = H^T x. • Fit θ via linear logistic regression on x*. • Final result: the smooth coefficient curve β̂(f) = h(f)^T θ̂.
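A minimal sketch of this filtering idea in Python (synthetic stand-in data, scikit-learn assumed available; the knot count M = 12 and the fake labels are illustrative assumptions, not the actual phoneme data):

import numpy as np
from sklearn.linear_model import LogisticRegression

def natural_cubic_basis(t, knots):
    # natural cubic spline basis N_1..N_K at points t (truncated-power form of the previous slides)
    t = np.asarray(t, float)
    K = len(knots)
    d = lambda k: ((np.maximum(t - knots[k], 0)**3
                    - np.maximum(t - knots[K-1], 0)**3) / (knots[K-1] - knots[k]))
    return np.column_stack([np.ones_like(t), t] + [d(k) - d(K - 2) for k in range(K - 2)])

# synthetic stand-in for the phoneme data: N samples, p = 256 frequencies
rng = np.random.default_rng(0)
N, p = 1000, 256
X = rng.normal(size=(N, p))
y = rng.integers(0, 2, size=N)                 # "aa" vs "ao" labels (fake here)

# H: p x M basis matrix of natural cubic splines over frequency
freq = np.arange(p)
H = natural_cubic_basis(freq, knots=np.linspace(0, p - 1, 12))   # M = 12 (assumed)

# filtered features x* = H^T x, then ordinary linear logistic regression on x*
X_star = X @ H
clf = LogisticRegression(max_iter=1000).fit(X_star, y)

# smooth coefficient curve beta(f) = H theta_hat
beta_hat = H @ clf.coef_.ravel()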

  18. 3. Filtering and Feature Extraction • Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm. • Previous example: x* = H^T x, a filtering approach to transform the features; • The transformations need not be linear, but can take the general form x* = g(x). • Another example: the wavelet transform (see Section 5.9).

  19. 4. Smoothing Splines • Purpose: avoid the complexity of the knot selection problem by using a maximal set of knots. • Complexity is controlled via regularization. • Consider this problem: among all functions f(x) with two continuous derivatives, minimize RSS(f, λ) = Σ_{i=1}^{N} {y_i − f(x_i)}^2 + λ ∫ {f''(t)}^2 dt, where λ ≥ 0 is a fixed smoothing parameter.

  20. Though RSS(f, λ) is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the x_i. • The penalty term translates into a penalty on the spline coefficients.

  21. Rewrite the solution as f(x) = Σ_{j=1}^{N} N_j(x) θ_j, where the N_j(x) are an N-dimensional set of basis functions representing the family of natural splines. • Matrix-form criterion: RSS(θ, λ) = (y − Nθ)^T (y − Nθ) + λ θ^T Ω_N θ, where {N}_{ij} = N_j(x_i) and {Ω_N}_{jk} = ∫ N_j''(t) N_k''(t) dt. • With the ridge regression result, the solution is θ̂ = (N^T N + λ Ω_N)^{−1} N^T y. • The fitted smoothing spline is given by f̂(x) = Σ_{j=1}^{N} N_j(x) θ̂_j.
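A sketch of this penalized fit in Python (the toy data are assumptions; Ω_N is approximated numerically rather than computed in closed form, and the truncated-power basis is used for clarity even though it can be ill-conditioned — B-spline bases are preferred in practice):

import numpy as np

def natural_cubic_basis(x, knots):
    # natural cubic spline basis N_1..N_K at points x (truncated-power form)
    x = np.asarray(x, float)
    K = len(knots)
    d = lambda k: ((np.maximum(x - knots[k], 0)**3
                    - np.maximum(x - knots[K-1], 0)**3) / (knots[K-1] - knots[k]))
    return np.column_stack([np.ones_like(x), x] + [d(k) - d(K - 2) for k in range(K - 2)])

# toy data (illustrative assumption)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

knots = np.unique(x)                       # maximal set: knots at the unique x_i
N = natural_cubic_basis(x, knots)          # N_ij = N_j(x_i)

# Omega_jk = int N_j''(t) N_k''(t) dt, approximated here by numerical differentiation
grid = np.linspace(x.min(), x.max(), 2001)
h = grid[1] - grid[0]
B2 = np.gradient(np.gradient(natural_cubic_basis(grid, knots), h, axis=0), h, axis=0)
Omega = (B2.T @ B2) * h

lam = 1e-4
theta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)    # ridge-type solution
f_hat = N @ theta                                          # fitted smoothing spline at the x_i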

  22. Example of a smoothing spline

  23. Degrees of freedom and smoother matrix • A smoothing spline with prechosen λ is a linear operator. • Let f̂ be the N-vector of fitted values f̂(x_i) at the training predictors x_i: f̂ = N(N^T N + λΩ_N)^{−1} N^T y = S_λ y. Here S_λ is called the smoother matrix. It depends only on the x_i and λ.

  24. Suppose B_ξ is an N-by-M matrix of M cubic spline basis functions evaluated at the N training points x_i, with knot sequence ξ and M ≪ N. The fitted spline values are given by: f̂ = B_ξ (B_ξ^T B_ξ)^{−1} B_ξ^T y = H_ξ y. Here the linear operator H_ξ is a projection operator, known as the hat matrix in statistics.

  25. Similarities and differences between H_ξ and S_λ • Both are symmetric, positive semi-definite. • H_ξ is idempotent (H_ξ H_ξ = H_ξ), while S_λ S_λ ⪯ S_λ, so S_λ shrinks rather than projects. • Rank(S_λ) = N, Rank(H_ξ) = M. • The trace of H_ξ gives the dimension of the projection space (the number of basis functions).

  26. Define the effective degrees of freedom as df_λ = trace(S_λ). By specifying df_λ, we can derive λ (by solving trace(S_λ) = df_λ numerically). • Since S_λ is symmetric and positive semi-definite, we can rewrite it in the Reinsch form S_λ = (I + λK)^{−1}, where K does not depend on λ; f̂ = S_λ y is the solution of min_f (y − f)^T (y − f) + λ f^T K f. K is known as the penalty matrix.
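A small sketch of df_λ = trace(S_λ) and of solving for λ given a target df (Python; here K is a discrete second-difference penalty used as a stand-in for the exact spline penalty matrix, an assumption that keeps the example self-contained):

import numpy as np

n = 50
D = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second-difference operator
K = D.T @ D                                  # stand-in penalty matrix

def df(lam):
    S = np.linalg.inv(np.eye(n) + lam * K)   # Reinsch form S_lambda = (I + lam*K)^-1
    return np.trace(S)

# solve trace(S_lambda) = target df by bisection; df is monotone decreasing in lambda
target, lo, hi = 12.0, 1e-8, 1e8
for _ in range(200):
    mid = np.sqrt(lo * hi)                   # bisect on a log scale
    lo, hi = (mid, hi) if df(mid) > target else (lo, mid)
print(f"lambda = {lo:.4g}, df = {df(lo):.2f}")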

  27. The eigen-decomposition of S_λ is given by: S_λ = Σ_{k=1}^{N} ρ_k(λ) u_k u_k^T, with ρ_k(λ) = 1 / (1 + λ d_k), where the d_k and u_k are the eigenvalues and eigenvectors of K.

  28. Highlights of the eigen-decomposition • The eigenvectors u_k are not affected by changes in λ. • Shrinking nature: S_λ y = Σ_k u_k ρ_k(λ) ⟨u_k, y⟩, so each component of y is shrunk by a factor ρ_k(λ) ∈ [0, 1] (whereas the hat matrix keeps or kills components). • The eigenvector sequence, ordered by decreasing ρ_k(λ), appears to increase in complexity. • The first two eigenvalues are always 1, since d_1 = d_2 = 0, showing that linear functions are not penalized.
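A quick numerical check of these points (Python; the same discrete second-difference stand-in for K as in the earlier sketch, whose null space also contains constant and linear sequences, mirroring d_1 = d_2 = 0):

import numpy as np

n = 50
D = np.diff(np.eye(n), n=2, axis=0)
K = D.T @ D

d, U = np.linalg.eigh(K)                 # eigenvalues d_k and eigenvectors u_k of K
for lam in (0.1, 10.0):
    rho = 1.0 / (1.0 + lam * d)          # eigenvalues of S_lambda = (I + lam*K)^-1
    print(lam, np.round(np.sort(rho)[::-1][:4], 3))
    # the two largest rho are always 1 (unpenalized constant/linear components);
    # increasing lambda shrinks all the other components further toward 0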

  29. Figure: cubic smoothing spline fits to some data

  30. 5. Automatic selection of the smoothing parameters • Selecting the placement and number of knots for regression splines can be a combinatorially complex task; • For smoothing splines, only the penalty λ needs to be selected. • Method: fix the effective degrees of freedom df_λ = trace(S_λ) and solve for λ. • Criterion: the bias-variance tradeoff.

  31. The Bias-Variance Tradeoff • Integrated squared prediction error (EPE): EPE(f̂_λ) = E(Y − f̂_λ(X))^2 = Var(Y) + E[Bias^2(f̂_λ(X)) + Var(f̂_λ(X))] = σ^2 + MSE(f̂_λ). • Cross-validation: CV(f̂_λ) = (1/N) Σ_{i=1}^{N} (y_i − f̂_λ^{(−i)}(x_i))^2 = (1/N) Σ_{i=1}^{N} [(y_i − f̂_λ(x_i)) / (1 − S_λ(i, i))]^2.
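A sketch of the leave-one-out shortcut in the last identity (Python; ridge regression on a cubic-polynomial basis is used as a stand-in linear smoother, an assumption for which the same shortcut is also exact, so the two computations below agree):

import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, 30))
y = np.sin(3 * x) + rng.normal(scale=0.3, size=x.size)
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
lam = 0.1

S = X @ np.linalg.solve(X.T @ X + lam * np.eye(4), X.T)   # smoother matrix S_lambda
resid = y - S @ y

# CV via the shortcut: average of [(y_i - f_hat(x_i)) / (1 - S_ii)]^2
cv_shortcut = np.mean((resid / (1 - np.diag(S)))**2)

# brute-force leave-one-out for comparison
errs = []
for i in range(len(y)):
    m = np.arange(len(y)) != i
    beta = np.linalg.solve(X[m].T @ X[m] + lam * np.eye(4), X[m].T @ y[m])
    errs.append(y[i] - X[i] @ beta)
cv_brute = np.mean(np.square(errs))

print(cv_shortcut, cv_brute)    # the two values agree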

  32. An example:

  33. Figure: EPE and CV curves, and their effect on the fit, for different degrees of freedom

  34. Any questions?
