
Linear Methods for Regression

Presentation Transcript


  1. Linear Methods for Regression Lecture Notes for CMPUT 466/551 Nilanjan Ray

  2. Assumption: Linear Regression Function Model assumption: the output Y is linear in the inputs X = (X1, X2, X3, …, Xp). Predict the output by f(X) = β0 + Σ_{j=1..p} Xj βj. In vector notation, with the constant 1 included in X, f(X) = X^T β, where β = (β0, β1, …, βp)^T is the coefficient vector. Also known as multiple regression when p > 1.

  3. Least Squares Solution Residual sum of squares: RSS(β) = Σ_{i=1..N} (yi − xi^T β)^2. In matrix-vector notation: RSS(β) = (y − Xβ)^T (y − Xβ), where y − Xβ is the residual vector. Vector differentiation gives X^T (y − Xβ) = 0, with solution β̂ = (X^T X)^(-1) X^T y, known as the least squares solution. For a new input x0, the regression output is ŷ0 = x0^T β̂.
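A minimal NumPy sketch of the least squares fit and prediction described above (the function, data, and names are illustrative, not from the lecture notes):

```python
import numpy as np

def least_squares(X, y):
    """Ordinary least squares: beta_hat = (X^T X)^(-1) X^T y when X has full column rank.
    X is assumed to already contain a leading column of ones."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # numerically safer than an explicit inverse
    return beta_hat

# Toy usage: N = 100 samples, p = 3 inputs plus the intercept column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

beta_hat = least_squares(X, y)
x0 = np.array([1.0, 0.2, -0.3, 0.7])  # a new input, with the leading 1 included
y0_hat = x0 @ beta_hat                # predicted regression output
```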

  4. Bias-Variance Decomposition Model: y = Xβ + ε, where the noise terms εi have zero expectation, the same variance σ^2, and are uncorrelated. Linear estimator: β̂ = (X^T X)^(-1) X^T y. Variance: Var(β̂) = (X^T X)^(-1) σ^2. Bias: E(β̂) = β, so least squares is an unbiased estimator! Ex. Show the last step. Decomposition of EPE (averaged over the training inputs): variance = σ^2 (p/N), squared bias = 0, irreducible error = σ^2.
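The variance term σ^2 (p/N) can be checked numerically. The following Monte Carlo sketch, under an assumed fixed design and Gaussian noise (not part of the slides), compares the average in-sample estimation error of least squares with σ^2 p/N:

```python
import numpy as np

# Illustrative simulation: fixed design X, true beta, i.i.d. Gaussian noise.
rng = np.random.default_rng(1)
N, p, sigma = 200, 5, 1.0
X = rng.normal(size=(N, p))
beta = rng.normal(size=p)
f = X @ beta                                   # true regression function at the training inputs

avg_sq_err = []
for _ in range(2000):
    y = f + sigma * rng.normal(size=N)         # a fresh noise realization
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    avg_sq_err.append(np.mean((X @ beta_hat - f) ** 2))  # in-sample estimation error

print(np.mean(avg_sq_err), sigma**2 * p / N)   # the two numbers should be close
```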

  5. Gauss-Markov Theorem Gauss-Markov theorem: the least squares estimate has the minimum variance among all linear unbiased estimators. Interpretation: the estimator found by least squares, x0^T β̂ = x0^T (X^T X)^(-1) X^T y, is linear in y. We have noticed that this estimator is unbiased, i.e., E(x0^T β̂) = x0^T β = f(x0). If we find any other estimator g(x0) of f(x0) that is also linear in y, i.e., g(x0) = c^T y, and also unbiased, i.e., E(c^T y) = f(x0), then Var(x0^T β̂) ≤ Var(c^T y). Question: Is the LS estimator the best estimator for the given linear additive model?

  6. Subset Selection • The LS solution often has large variance (remember that the variance term is proportional to the number of inputs p, i.e., to model complexity) • If we decrease the number of input variables p, we can decrease the variance; however, we then sacrifice the zero bias • If this trade-off decreases the test error, the solution can be accepted • This reasoning leads to subset selection, i.e., selecting a subset of the p inputs for the regression computation (a brute-force sketch follows below) • Subset selection has another advantage: easy and focused interpretation of the influence of the input variables on the output
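One concrete way to carry out subset selection, shown here only as an illustration (the notes move on to significance testing instead), is exhaustive best-subset search for a fixed subset size k:

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    """Exhaustive best-subset selection: among all size-k subsets of the p inputs,
    return the subset (column indices) with the smallest residual sum of squares.
    X holds the p inputs only; an intercept column is added internally."""
    N, p = X.shape
    best_rss, best_cols = np.inf, None
    for cols in combinations(range(p), k):
        Xs = np.column_stack([np.ones(N), X[:, list(cols)]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        if rss < best_rss:
            best_rss, best_cols = rss, cols
    return best_cols, best_rss
```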

  7. Subset Selection… Can we determine which coefficients βj are insignificant? Yes, we can, by statistical hypothesis testing! However, we need a model assumption: the noise ε is zero-mean Gaussian with standard deviation σ.

  8. Subset Selection: Statistical Significance Test The linear model with additive Gaussian noise has the following properties: β̂ ~ N(β, (X^T X)^(-1) σ^2) and (N − p − 1) σ̂^2 ~ σ^2 χ^2_{N−p−1}. Ex. Show this. So we can form a standardized coefficient, or Z-score, for each coefficient: zj = β̂j / (σ̂ √vj), where σ̂^2 = (1/(N − p − 1)) Σ_{i} (yi − ŷi)^2 and vj is the jth diagonal element of (X^T X)^(-1). The hypothesis testing principle says that a large Z-score should retain the coefficient and a small value should discard it. How large or small depends on the significance level.
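A small sketch of the Z-score computation, assuming X already carries an intercept column (function and variable names are illustrative):

```python
import numpy as np

def z_scores(X, y):
    """Z-scores for the least squares coefficients.
    X is assumed to contain a leading column of ones, so it has q = p + 1 columns."""
    N, q = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (N - q))   # estimated noise standard deviation
    v = np.diag(np.linalg.inv(X.T @ X))            # v_j = jth diagonal of (X^T X)^(-1)
    return beta_hat / (sigma_hat * np.sqrt(v))

# Coefficients with |z_j| > 2 are roughly significant at the 5% level.
```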

  9. Case Study: Prostate Cancer Output = log prostate-specific antigen. Input = (log cancer volume, log prostate weight, age, log of benign prostatic hyperplasia, seminal vesicle invasion, log of capsular penetration, Gleason score, % of Gleason scores 4 or 5). Goal: (1) predict the output given a novel input; (2) interpret the influence of the inputs on the output.

  10. Case Study… Scatter plots of the variables: from these alone it is hard to interpret which inputs are the most influential, and we also want to find out how the inputs jointly influence the output.

  11. Subset Selection on Prostate Cancer Data Z-scores with magnitude greater than 2 indicate variables that are significant at the 5% significance level.

  12. Coefficient Shrinkage: Ridge Regression Method: minimize the penalized residual sum of squares Σ_{i} (yi − xi^T β)^2 + λ Σ_{j} βj^2 with a non-negative penalty λ ≥ 0; the solution is β̂_ridge = (X^T X + λI)^(-1) X^T y. One computational advantage is that the matrix X^T X + λI is always invertible (for λ > 0). If the L2 norm penalty is replaced by an L1 norm, the corresponding regression is called the LASSO (see [HTF]).
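A minimal sketch of the ridge solution, assuming the inputs are centered (and standardized) and y is centered so that no intercept needs to be penalized:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: beta_hat = (X^T X + lam * I)^(-1) X^T y.
    X holds centered/standardized inputs (no intercept column); y is centered."""
    p = X.shape[1]
    # X^T X + lam * I is symmetric positive definite for lam > 0, so it is always invertible
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```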

  13. Ridge Regression… Plot of the ridge coefficient profiles against decreasing λ: as λ decreases, the coefficients move toward their least squares values. One way to determine λ is cross-validation; we'll learn about it later.
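Cross-validation is covered later in the course; as a forward-looking sketch, a generic K-fold procedure for picking λ might look like this (the function name, fold scheme, and error measure are illustrative assumptions):

```python
import numpy as np

def cv_ridge(X, y, lambdas, K=5, seed=0):
    """Choose the ridge penalty lambda by K-fold cross-validation on squared error."""
    N, p = X.shape
    folds = np.random.default_rng(seed).permutation(N) % K   # random fold assignment
    cv_err = []
    for lam in lambdas:
        fold_errs = []
        for k in range(K):
            tr, te = folds != k, folds == k
            beta = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(p), X[tr].T @ y[tr])
            fold_errs.append(np.mean((y[te] - X[te] @ beta) ** 2))
        cv_err.append(np.mean(fold_errs))
    return lambdas[int(np.argmin(cv_err))]   # lambda with the smallest CV error
```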
