This comprehensive overview focuses on the concepts of overfitting and regularization in machine learning, elaborating on their implications through practical examples. It discusses the bias-variance dilemma, explaining how overfitting can be identified and avoided using evaluation techniques on polynomial models. The role of weight decay in optimizing weights for complex models with limited data is illustrated, emphasizing the importance of validation sets. Additionally, a dataset generation exercise is proposed to fit a fourth-degree polynomial with varying regularization parameters.
Overfitting and Regularization (Chapters 11 and 12 on amlbook.com)
Over-fitting is easy to recognize in 1D: fitting a 4th-order hypothesis to 5 data points drawn from a parabolic target function gives Ein = 0.
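A minimal sketch of this slide's example (the noise level and random seed are my own choices, not from the lecture): with 5 points and 5 polynomial coefficients, the fit interpolates the data exactly, so the in-sample error vanishes even though the fit also captures the noise.

```python
# Sketch: a 4th-order polynomial fit to 5 points from a parabolic target gives Ein = 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5)                 # 5 data points
y = x**2 + rng.normal(0, 0.1, 5)          # noisy parabolic target (assumed noise level)

coeffs = np.polyfit(x, y, deg=4)          # 5 coefficients interpolate 5 points exactly
E_in = np.mean((np.polyval(coeffs, x) - y) ** 2)
print(E_in)                               # ~0 up to round-off
```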
The origin of over-fitting can be analyzed in 1D: the bias/variance dilemma.
Over-fitting is easy to avoid in 1D: results from HW2. [Plot: sum of squared deviations (Ein and Eval) vs. degree of polynomial.]
Using Eval to avoid over-fitting works in all dimensions, but the computation grows rapidly for large d. [Plot: Ein, Ecv, and Eval as terms in F5(x) are added successively, d = 2.] The validation set needs to be large; does this compromise training? (See the sketch below.)
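An illustrative sketch of model selection with a validation set (this is not the HW2 code; the target, noise level, and split sizes are assumptions): train each candidate degree on the training subset, then compare Ein and Eval.

```python
# Sketch: pick the polynomial degree by validation error rather than training error.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = x**2 + rng.normal(0, 0.2, x.size)      # assumed parabolic target with noise

x_tr, y_tr = x[:40], y[:40]                # training set
x_va, y_va = x[40:], y[40:]                # validation set (needs to be large enough)

for degree in range(1, 9):
    w = np.polyfit(x_tr, y_tr, degree)
    e_in  = np.mean((np.polyval(w, x_tr) - y_tr) ** 2)
    e_val = np.mean((np.polyval(w, x_va) - y_va) ** 2)
    print(degree, e_in, e_val)             # Ein keeps falling; Eval turns up when over-fitting starts
```

The trade-off the slide raises is visible here: points held out for validation are not available for training, which matters when data are scarce.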
What if we want to add higher-order terms to a linear model but don't have enough data for a validation set? Solution: augment the error function used to optimize the weights, for example by adding a penalty term λ wᵀw that penalizes choices with large |w|. This is called "weight decay".
The normal equations with weight decay are essentially unchanged: (ZᵀZ + λI) w_reg = Zᵀy.
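A sketch of the regularized normal equations, assuming Z is the matrix of polynomial features (a column of ones, then x, x², ...) built from the data; the helper name is my own:

```python
# Sketch: solve (Z^T Z + lambda*I) w_reg = Z^T y for polynomial features.
import numpy as np

def fit_weight_decay(x, y, degree, lam):
    """Regularized least squares (weight decay) for a 1D polynomial model."""
    Z = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ..., x^degree
    A = Z.T @ Z + lam * np.eye(degree + 1)
    return np.linalg.solve(A, Z.T @ y)

# usage: w_reg = fit_weight_decay(x, y, degree=4, lam=1e-4)
```

Setting lam = 0 recovers the ordinary least-squares solution; increasing it shrinks the weights and suppresses the wild swings of an over-fitted polynomial.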
The best value of λ is subjective. In this case λ = 0.0001 is large enough to suppress the wild swings while the data remain the dominant factor in determining the optimum weights.
Assignment 8 (due 11-13-14): Generate an in silico dataset y(x) = 1 + 9x² + N(0,1) with 5 randomly selected values of x between -1 and +1. Fit a 4th-degree polynomial to the data with and without regularization by choosing λ = 0, 0.0001, 0.001, 0.01, 1.0, and 10. Display the results as in slide 8 of the lecture on regularization.
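One possible sketch of the assignment (the random seed and plotting details are my own, and the figure will not exactly reproduce slide 8): generate the in silico data, then fit the 4th-degree polynomial once per λ.

```python
# Sketch: Assignment 8 - fit a 4th-degree polynomial with weight decay for several lambdas.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 5)                        # 5 random x values in [-1, +1]
y = 1 + 9 * x**2 + rng.normal(0, 1, x.size)      # y(x) = 1 + 9x^2 + N(0,1)

def fit_weight_decay(x, y, degree, lam):
    Z = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(degree + 1), Z.T @ y)

x_plot = np.linspace(-1, 1, 200)
for lam in [0, 0.0001, 0.001, 0.01, 1.0, 10]:
    w = fit_weight_decay(x, y, degree=4, lam=lam)
    Z_plot = np.vander(x_plot, 5, increasing=True)
    plt.plot(x_plot, Z_plot @ w, label=f"lambda = {lam}")

plt.scatter(x, y, color="black", zorder=3)       # the 5 noisy data points
plt.plot(x_plot, 1 + 9 * x_plot**2, "k--", label="target")
plt.legend()
plt.show()
```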