
Simple Linear Regression


Presentation Transcript


  1. Simple Linear Regression Lecture XXVIII

  2. Overview • Most of the material for this lecture is from George Casella and Roger L. Berger Statistical Inference (Belmont, California: Duxbury Press, 1990) Chapter 12, pp. 554-577.

  3. The purpose of regression analysis is to explore the relationship between two variables. • In this course, the relationship we will be interested in can be expressed as y_i = a + b x_i + e_i, where y_i is a random variable and x_i is a variable hypothesized to affect or drive y_i. The coefficients a and b are the intercept and slope parameters, respectively.

  4. These parameters are assumed to be fixed but unknown. • The residual e_i is assumed to be an unobserved random error; under typical assumptions E[e_i] = 0. • Thus, the expected value of y_i given x_i becomes E[y_i | x_i] = a + b x_i.

  5. The goal of regression analysis is to estimate a and b and to say something about the significance of the relationship. • From a terminology standpoint, y is typically referred to as the dependent variable and x as the independent variable. Casella and Berger prefer the terminology of y as the response variable and x as the predictor variable.

  6. This relationship is a linear regression in that it is linear in the parameters a and b. Abstracting for a moment, the traditional Cobb-Douglas production function can be written as y = A x_1^{b_1} x_2^{b_2}; taking the natural log of both sides yields ln(y) = ln(A) + b_1 ln(x_1) + b_2 ln(x_2), which is again linear in the parameters.
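
As an illustration of the log-linearization (a hedged sketch with made-up data and parameter values, not the lecture's example), the logged Cobb-Douglas equation can be fit by linear least squares:

```python
import numpy as np

# Hypothetical example: simulate a two-input Cobb-Douglas technology
#   y = A * x1**b1 * x2**b2 * exp(e)
# and recover the parameters from the log-linear form
#   ln y = ln A + b1 ln x1 + b2 ln x2 + e,
# which is linear in the parameters ln A, b1, and b2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1.0, 10.0, n)
x2 = rng.uniform(1.0, 10.0, n)
A, b1, b2 = 2.0, 0.4, 0.5                       # assumed "true" values
y = A * x1**b1 * x2**b2 * np.exp(rng.normal(0.0, 0.1, n))

X = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])   # [1, ln x1, ln x2]
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(coef)   # roughly (ln 2, 0.4, 0.5) = (0.69, 0.4, 0.5)
```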

  7. Simple Linear Regression • The setup for simple linear regression is that we have a sample of n pairs of variables (x_1, y_1), …, (x_n, y_n). Further, we want to summarize this relationship by fitting a line through the data. • Based on the sample data, we first describe the data as follows: • The sample means x̄ = (1/n) Σ_{i=1}^n x_i and ȳ = (1/n) Σ_{i=1}^n y_i.

  8. The sums of squares: S_xx = Σ_{i=1}^n (x_i − x̄)², S_yy = Σ_{i=1}^n (y_i − ȳ)², and the sum of cross-products S_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ).

  9. The most common estimators given this formulation are then given by b̂ = S_xy / S_xx and â = ȳ − b̂ x̄.
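
A minimal sketch (hypothetical data, not the lecture's example) of computing these estimators directly from the sample means and sums of squares:

```python
import numpy as np

# Hypothetical sample of n pairs (x_i, y_i)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

x_bar, y_bar = x.mean(), y.mean()            # sample means
S_xx = np.sum((x - x_bar) ** 2)              # sum of squares of x
S_xy = np.sum((x - x_bar) * (y - y_bar))     # sum of cross products

b_hat = S_xy / S_xx                          # slope estimate
a_hat = y_bar - b_hat * x_bar                # intercept estimate
print(a_hat, b_hat)

# Cross-check against NumPy's built-in least squares line fit
b_check, a_check = np.polyfit(x, y, deg=1)   # returns (slope, intercept)
assert np.allclose([a_hat, b_hat], [a_check, b_check])
```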

  10. Least Squares: A Mathematical Solution • Following on our theme in the discussion of linear projections: “Our first derivation of estimates of a and b makes no statistical assumptions about the observations (x_i, y_i)…. Think of drawing through this cloud of points a straight line that comes ‘as close as possible’ to all the points.”

  11. This definition involves minimizing the sum of squared errors in the choice of a and b: min_{a,b} Σ_{i=1}^n (y_i − a − b x_i)².

  12. Focusing on a first, the first-order condition with respect to a is −2 Σ_{i=1}^n (y_i − a − b x_i) = 0, which implies â = ȳ − b x̄.

  13. Taking the first-order condition with respect to b yields −2 Σ_{i=1}^n x_i (y_i − a − b x_i) = 0, or Σ_{i=1}^n x_i y_i − a Σ_{i=1}^n x_i − b Σ_{i=1}^n x_i² = 0.

  14. Going from this result to the traditional estimator requires the identities Σ_{i=1}^n x_i y_i − n x̄ ȳ = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = S_xy and Σ_{i=1}^n x_i² − n x̄² = Σ_{i=1}^n (x_i − x̄)² = S_xx.

  15. The least squares estimator of b then becomes b̂ = S_xy / S_xx.

  16. Computing the simple least squares representation:

  17. First, we derive the projection matrix P_X = X (X'X)^{-1} X', where X is the matrix whose columns are a vector of ones and the x_i; with 12 observations it is a 12 x 12 matrix. The projection of y onto the column space of X can then be calculated as ŷ = P_X y.

  18. Comparing these results with the estimated values of y from the model, ŷ_i = â + b̂ x_i, shows that the two approaches yield the same fitted values.
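
The transcript does not reproduce the lecture's 12-observation data, but the projection calculation itself is standard; a sketch with made-up data (assuming X stacks a column of ones and the xs) is:

```python
import numpy as np

# Hypothetical data with n = 12 observations, so P is 12 x 12 as on the slide
rng = np.random.default_rng(1)
x = np.linspace(1.0, 12.0, 12)
y = 1.5 + 0.8 * x + rng.normal(0.0, 0.5, 12)

X = np.column_stack([np.ones_like(x), x])    # design matrix [1, x]
P = X @ np.linalg.inv(X.T @ X) @ X.T         # projection (hat) matrix
y_hat = P @ y                                # projection of y onto col(X)

# The projection reproduces the fitted values a_hat + b_hat * x
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(y_hat, a_hat + b_hat * x)
print(P.shape)                               # (12, 12)
```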

  19. Best Linear Unbiased Estimators: A Statistical Solution • We now assume a linear relationship between the xs and ys, E[y_i] = a + b x_i, and that Var(y_i) = σ² for every observation.

  20. The implications of this variance assumption are significant. Note that we assume that each observation has the same variance regardless of the value of the independent variable. In traditional regression terms, this implies that the errors are homoscedastic.

  21. One way to state these assumptions is y_i = a + b x_i + e_i with E[e_i] = 0 and Var(e_i) = σ² for all i. This specification is consistent with our assumptions, since the model is homoscedastic and linear in the parameters.

  22. Based on this formulation, we can define the linear estimators of a and b as weighted sums of the observations, Σ_{i=1}^n d_i y_i for fixed weights d_i. An unbiased estimator of b can further be defined as a linear estimator whose expected value is the true value of the parameter: E[Σ_{i=1}^n d_i y_i] = b for all values of a and b.

  23. The linear estimator that satisfies these unbiasedness conditions and yields the smallest variance of the estimate is referred to as the best linear unbiased estimator (or BLUE). In this example, we need to show that the least squares estimator b̂ = S_xy / S_xx attains the smallest variance among all linear unbiased estimators of b.
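
The slide's own equations are not in the transcript; spelling out the unbiasedness conditions under the stated assumptions gives:

```latex
% Expectation of a linear estimator under E[y_i] = a + b x_i:
\begin{align*}
E\!\left[\sum_{i=1}^{n} d_i y_i\right]
  = \sum_{i=1}^{n} d_i\,E[y_i]
  = \sum_{i=1}^{n} d_i\,(a + b x_i)
  = a \sum_{i=1}^{n} d_i + b \sum_{i=1}^{n} d_i x_i .
\end{align*}
% For this to equal b for every a and b, the weights must satisfy
%   \sum_i d_i = 0   and   \sum_i d_i x_i = 1 .
```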

  24. Given that the y_i are uncorrelated, the variance of the linear estimator can be written as Var(Σ_{i=1}^n d_i y_i) = Σ_{i=1}^n d_i² Var(y_i) = σ² Σ_{i=1}^n d_i².

  25. The problem of minimizing the variance then becomes choosing the d_i to minimize this sum subject to the unbiasedness constraints Σ_{i=1}^n d_i = 0 and Σ_{i=1}^n d_i x_i = 1.

  26. Using the results from the first n first-order conditions and the second constraint first, we have

  27. Substituting this result into the first n first-order conditions yields:

  28. Substituting these conditions into the first constraint, we get
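
The equations for slides 26-28 are also missing from the transcript; one standard way to complete the constrained minimization (a reconstruction using the constraints above, not the slides' own algebra) is:

```latex
% Minimize Var(sum d_i y_i) = sigma^2 sum d_i^2 subject to the
% unbiasedness constraints sum d_i = 0 and sum d_i x_i = 1.
\begin{align*}
\mathcal{L} &= \sigma^2 \sum_{i=1}^{n} d_i^2
              + \lambda_1 \sum_{i=1}^{n} d_i
              + \lambda_2 \Bigl(\sum_{i=1}^{n} d_i x_i - 1\Bigr), \\
\frac{\partial \mathcal{L}}{\partial d_i}
            &= 2\sigma^2 d_i + \lambda_1 + \lambda_2 x_i = 0
             \;\Longrightarrow\; d_i = c_1 + c_2 x_i \text{ for constants } c_1, c_2, \\
\sum_{i=1}^{n} d_i = 0
            &\;\Longrightarrow\; c_1 = -c_2\bar{x}
             \;\Longrightarrow\; d_i = c_2\,(x_i - \bar{x}), \\
\sum_{i=1}^{n} d_i x_i = 1
            &\;\Longrightarrow\; c_2 \sum_{i=1}^{n} (x_i - \bar{x})\,x_i = c_2\,S_{xx} = 1
             \;\Longrightarrow\; d_i = \frac{x_i - \bar{x}}{S_{xx}}, \\
\sum_{i=1}^{n} d_i y_i
            &= \frac{S_{xy}}{S_{xx}} = \hat{b},
             \qquad
             \operatorname{Var}(\hat{b}) = \sigma^2 \sum_{i=1}^{n} d_i^2 = \frac{\sigma^2}{S_{xx}} .
\end{align*}
```

The minimizing weights are exactly the least squares weights, which is the BLUE result stated on the next slide.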

  29. This proves that simple least squares is BLUE under fairly general conditions. Note that we did not assume normality in this proof. The only assumptions were that the errors have zero expectation and that they are uncorrelated with a common, identical variance.
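
As an informal empirical check of this claim (not part of the lecture; sample size, parameter values, and error distribution below are made up), a simulation with non-normal errors reproduces both the unbiasedness and the σ²/S_xx variance from the derivation above:

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true, n, reps = 1.0, 0.5, 30, 20_000
x = np.linspace(0.0, 10.0, n)
S_xx = np.sum((x - x.mean()) ** 2)

b_hats = np.empty(reps)
for r in range(reps):
    e = rng.uniform(-1.0, 1.0, n)            # mean-zero, non-normal errors
    y = a_true + b_true * x + e
    b_hats[r] = np.sum((x - x.mean()) * (y - y.mean())) / S_xx

print(b_hats.mean())                         # close to b_true = 0.5
print(b_hats.var(), (1.0 / 3.0) / S_xx)      # Var(e) = 1/3 for U(-1, 1)
```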
