Statistical Techniques I

Statistical Techniques I. EXST7005. Simple Linear Regression. Measuring & describing a relationship between two variables. Simple Linear Regression allows a measure of the rate of change of one variable relative to another variable.

dean-knight

Presentation Transcript


  1. Statistical Techniques I EXST7005 Simple Linear Regression

  2. Simple Linear Regression: Measuring & describing a relationship between two variables • Simple Linear Regression provides a measure of the rate of change of one variable relative to another variable. • Variables are always paired: one is termed the independent variable (often referred to as the X variable) and the other the dependent variable (the Y variable). • The value of variable Y changes as the value of variable X changes.

  3. Simple Linear Regression (continued) • [Figure: plot of Y against X] • For each value of X there is a population of values for the variable Y (normally distributed).

  4. Simple Linear Regression (continued) • The linear model which describes this relationship is given as $Y_i = b_0 + b_1 X_i$ • this is the equation for a straight line • where $b_0$ is the value of the intercept (the value of Y when X = 0) • $b_1$ is the amount of change in Y for each unit change in X (i.e., if X changes by 1 unit, Y changes by $b_1$ units). $b_1$ is also called the slope or REGRESSION COEFFICIENT.

  5. Simple Linear Regression (continued) • Population Parameters • $\mu_{Y.X}$ = the true population mean of Y at each value of X • $\beta_0$ = the true value of the Y intercept • $\beta_1$ = the true value of the slope, the change in Y per unit of X • $\mu_{Y.X} = \beta_0 + \beta_1 X_i$ • this is the population equation for a straight line

  6. Simple Linear Regression (continued) • The equation for the line describes a perfect line with no variation. In practice there is always variation about the line, so we include an additional term to represent this variation. • $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for a population • $Y_i = b_0 + b_1 X_i + e_i$ for a sample • when we put this term in the model, we are describing individual points as their position on the line, plus or minus some deviation

  7. Simple Linear Regression (continued) • [Figure: plot of Y against X]

  8. Simple Linear Regression (continued) • the SS of deviations from the line will form the basis of a variance for the regression line • when we leave the $e_i$ off the sample model, we are describing a point on the regression line predicted from the sample. To indicate this we put a HAT on the $Y_i$ value: $\hat{Y}_i = b_0 + b_1 X_i$

  9. Characteristics of a Regression Line • The line will pass through the point $(\bar{X}, \bar{Y})$ (and also through the point $(0, b_0)$). • The sum of squared deviations (measured vertically) of the points from the regression line will be a minimum. • Values on the line can be described by the equation $\hat{Y}_i = b_0 + b_1 X_i$
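The first two characteristics above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values of `x` and `y` are hypothetical, not taken from the course handout) and using the least-squares slope and intercept formulas that the deck derives in the later slides.

```python
# Illustrative data only (hypothetical values, not from the course handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept from corrected sums of squares and crossproducts.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar


def sse(a, b):
    """Sum of squared vertical deviations of the points from the line Y = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Characteristic 1: the fitted line passes through (x_bar, y_bar).
passes_through_means = abs((b0 + b1 * x_bar) - y_bar) < 1e-12

# Characteristic 2: nudging either coefficient away from the fit never lowers the SSE.
best = sse(b0, b1)
nudged = [sse(b0 + da, b1 + db) for da in (-0.1, 0.0, 0.1) for db in (-0.1, 0.0, 0.1)]
is_minimum = all(best <= s + 1e-12 for s in nudged)

print(passes_through_means, is_minimum)
```

The grid of nudges is only a spot check around the fitted values, not a proof; the calculus argument in the derivation slides is what establishes the minimum.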

  10. Fitting the line • [Figure: plot of Y against X] • Fitting the line starts with a corrected SSDeviation; this is the SSDeviation of the observations from a horizontal line through the mean.

  11. Fitting the line (continued) • [Figure: plot of Y against X] • The fitted line is pivoted on the point $(\bar{X}, \bar{Y})$ until it has a minimum SSDeviations.

  12. Fitting the line (continued) • How do we know the SSDeviations are a minimum? Actually, we solve the equation for $e_i$, and use calculus to determine the solution that has a minimum of $\sum e_i^2$.

  13. Fitting the line (continued) • The line has some desirable properties • $E(b_0) = \beta_0$ • $E(b_1) = \beta_1$ • $E(\hat{Y}_X) = \mu_{Y.X}$ • Therefore, the parameter estimates and predicted values are unbiased estimates.
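Unbiasedness can be illustrated (not proven) by simulation. The following is a rough sketch, assuming hypothetical true parameters (`beta0`, `beta1`, `sigma`) and fixed X values chosen only for illustration: many samples are drawn from the same population line, the line is refitted each time, and the estimates are averaged.

```python
import random

random.seed(7005)  # fixed seed so the run is repeatable

# Hypothetical "true" population parameters, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

# X is treated as fixed and measured without error.
x = [float(v) for v in range(1, 11)]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)


def fit(y):
    """Return the least-squares estimates (b0, b1) for the fixed x values."""
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


reps = 2000
sum_b0 = 0.0
sum_b1 = 0.0
for _ in range(reps):
    # Each Y is its position on the true line plus a normal deviation.
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    b0, b1 = fit(y)
    sum_b0 += b0
    sum_b1 += b1

mean_b0 = sum_b0 / reps
mean_b1 = sum_b1 / reps
print(round(mean_b0, 2), round(mean_b1, 2))
```

Any single sample gives estimates that miss the true values; it is the average over repeated samples that settles near them, which is what the expectation statements describe.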

  14. The regression of Y on X • Y = the "dependent" variable, the variable to be predicted • X = the "independent" variable, also called the regressor or predictor variable. • Assumptions - general assumptions • The Y variable is normally distributed at each value of X. • The variance is homogeneous (across X). • Observations are independent of each other, and $e_i$ is independent of the rest of the model.

  15. The regression of Y on X (continued) • Special assumption for regression. • Assume that all of the variation is attributable to the dependent variable (Y), and that the variable X is measured WITHOUT ERROR. • Note that the deviations are measured vertically, not horizontally or perpendicular to the line.

  16. Derivation of the formulas • Any observation can be written as $Y_i = b_0 + b_1 X_i + e_i$ for a sample • where $e_i$ = the deviation of the observed point from the regression line • note: the idea of regression is to minimize the deviations of the observations from the regression line; this is called a Least Squares Fit

  17. Derivation of the formulas (continued) • $\sum e_i = 0$ • the sum of the squared deviations is $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$ • $\sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2$ • The objective is to select $b_0$ and $b_1$ such that $\sum e_i^2$ is a minimum; this is done with calculus • You do not need to know this derivation!

  18. A note on calculations • We have previously defined the uncorrected sum of squares and corrected sum of squares of a variable $Y_i$ • The uncorrected SS is $\sum Y_i^2$ • The correction factor is $(\sum Y_i)^2 / n$ • The corrected SS is $\sum Y_i^2 - (\sum Y_i)^2 / n$ • Your book calls this $S_{YY}$; the correction factor is $C_{YY}$ • We could define the exact same series of calculations for $X_i$, and call it $S_{XX}$

  19. A note on calculations (continued) • We will also need a crossproduct for regression, and a corrected crossproduct • The crossproduct is $X_i Y_i$ • The sum of crossproducts is $\sum X_i Y_i$, which is uncorrected • The correction factor is $(\sum X_i)(\sum Y_i) / n = C_{XY}$ • The corrected crossproduct is $\sum X_i Y_i - (\sum X_i)(\sum Y_i) / n$ • which your book calls $S_{XY}$
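The uncorrected sums, correction factors, and corrected sums described in these two slides can be computed directly. This is a minimal sketch, assuming a small made-up data set (hypothetical values, not the course handout); the variable names simply mirror the notation in the slides.

```python
# Illustrative data only (hypothetical values, not from the course handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)

# Uncorrected sums of squares and crossproducts.
uss_x = sum(xi ** 2 for xi in x)
uss_y = sum(yi ** 2 for yi in y)
ucp_xy = sum(xi * yi for xi, yi in zip(x, y))

# Correction factors.
c_xx = sum_x ** 2 / n
c_yy = sum_y ** 2 / n
c_xy = sum_x * sum_y / n

# Corrected sums of squares and corrected crossproduct.
s_xx = uss_x - c_xx
s_yy = uss_y - c_yy
s_xy = ucp_xy - c_xy

# Cross-check: the corrected SS equals the sum of squared deviations from the mean.
y_bar = sum_y / n
s_yy_check = sum((yi - y_bar) ** 2 for yi in y)

print(s_xx, s_yy, s_xy)
```

The cross-check line shows why the "corrected" name fits: subtracting the correction factor gives the same result as squaring deviations from the mean.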

  20. Derivation of the formulas (continued) • the partial derivative is taken with respect to each of the parameters • for $b_0$: $\partial \left( \sum e_i^2 \right) / \partial b_0 = 2 \sum (Y_i - b_0 - b_1 X_i)(-1)$

  21. Derivation of the formulas (continued) • set the partial derivative to 0 and solve for $b_0$ • $2 \sum (Y_i - b_0 - b_1 X_i)(-1) = 0$ • $-\sum Y_i + n b_0 + b_1 \sum X_i = 0$ • $n b_0 = \sum Y_i - b_1 \sum X_i$ • $b_0 = \bar{Y} - b_1 \bar{X}$ • So $b_0$ is estimated using $b_1$ and the means of X and Y

  22. Derivation of the formulas (continued) • Likewise for $b_1$ we obtain the partial derivative: $\partial \left( \sum e_i^2 \right) / \partial b_1 = 2 \sum (Y_i - b_0 - b_1 X_i)(-X_i)$

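  23. Derivation of the formulas (continued) • set the partial derivative to 0 and solve for $b_1$ • $2 \sum (Y_i - b_0 - b_1 X_i)(-X_i) = 0$ • $-\sum (Y_i X_i - b_0 X_i - b_1 X_i^2) = 0$ • $-\sum Y_i X_i + b_0 \sum X_i + b_1 \sum X_i^2 = 0$ • and since $b_0 = \bar{Y} - b_1 \bar{X}$, then • $\sum Y_i X_i = \left( \sum Y_i / n - b_1 \sum X_i / n \right) \sum X_i + b_1 \sum X_i^2$ • $\sum Y_i X_i = \sum X_i \sum Y_i / n - b_1 (\sum X_i)^2 / n + b_1 \sum X_i^2$ • $\sum Y_i X_i - \sum X_i \sum Y_i / n = b_1 \left[ \sum X_i^2 - (\sum X_i)^2 / n \right]$ • $b_1 = \left[ \sum Y_i X_i - \sum X_i \sum Y_i / n \right] / \left[ \sum X_i^2 - (\sum X_i)^2 / n \right]$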

  24. Derivation of the formulas (continued) • $b_1 = \left[ \sum Y_i X_i - \sum X_i \sum Y_i / n \right] / \left[ \sum X_i^2 - (\sum X_i)^2 / n \right]$ • $b_1 = S_{XY} / S_{XX}$ • so $b_1$ is the corrected crossproduct over the corrected SS of X • The intermediate statistics needed to solve all elements of a SLR are $\sum X_i$, $\sum Y_i$, $n$, $\sum X_i^2$, $\sum Y_i X_i$ and $\sum Y_i^2$ (this last term we haven't seen in the calculations above, but we will need it later)
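The slope and intercept formulas can be applied using only the intermediate statistics listed above. The sketch below assumes a hypothetical helper name, `fit_from_sums`, and made-up data; neither comes from the course material.

```python
def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1) for a simple linear regression from intermediate statistics.

    Hypothetical helper for illustration; the argument names mirror the
    sums listed in the slide (n, sum of X, sum of Y, sum of X^2, sum of XY).
    """
    s_xy = sum_xy - sum_x * sum_y / n  # corrected crossproduct
    s_xx = sum_x2 - sum_x ** 2 / n     # corrected SS of X
    b1 = s_xy / s_xx                   # slope
    b0 = sum_y / n - b1 * sum_x / n    # intercept: Y-bar minus b1 times X-bar
    return b0, b1


# Illustrative data only (hypothetical values, not from the course handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_from_sums(
    n=len(x),
    sum_x=sum(x),
    sum_y=sum(y),
    sum_x2=sum(xi * xi for xi in x),
    sum_xy=sum(xi * yi for xi, yi in zip(x, y)),
)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```

Working from the sums rather than the raw observations matches the hand-calculation approach of the slides; the sixth statistic, the sum of squared Y values, is not needed for the line itself.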

  25. Derivation of the formulas (continued) • Review • We want to fit the best possible line; we define this as the line that minimizes the sum of the squared, vertically measured distances from the observed values to the fitted line. • The line that achieves this is defined by the equations • $b_0 = \bar{Y} - b_1 \bar{X}$ • $b_1 = \left[ \sum Y_i X_i - \sum X_i \sum Y_i / n \right] / \left[ \sum X_i^2 - (\sum X_i)^2 / n \right]$

  26. Derivation of the formulas (continued) • These calculations provide us with two parameter estimates that we can then use to get the equation for the fitted line.

  27. Numerical example • See Regression handout

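  28. About Crossproducts • Crossproducts are used in a number of related calculations. • a crossproduct = $Y_i X_i$ • Sum of crossproducts = $\sum Y_i X_i$ (uncorrected); corrected, $\sum Y_i X_i - (\sum X_i)(\sum Y_i) / n = S_{XY}$ • Covariance = $S_{XY} / (n - 1)$ • Slope = $S_{XY} / S_{XX}$ • SSRegression = $S_{XY}^2 / S_{XX}$ • Correlation = $r = S_{XY} / \sqrt{S_{XX} S_{YY}}$ • $R^2 = r^2 = S_{XY}^2 / (S_{XX} S_{YY})$ = SSRegression / SSTotal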
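The related quantities in this last slide all follow from the same three corrected sums. As a rough sketch, again with made-up data (hypothetical values) and taking SSTotal to be the corrected SS of Y:

```python
import math

# Illustrative data only (hypothetical values, not from the course handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Corrected sums of squares and corrected crossproduct.
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

covariance = s_xy / (n - 1)
slope = s_xy / s_xx
ss_regression = s_xy ** 2 / s_xx
ss_total = s_yy  # assumption: SSTotal is the corrected SS of Y
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(covariance, slope, ss_regression, round(r, 4), round(r_squared, 4))
```

Computing `r_squared` both as the squared correlation and as `ss_regression / ss_total` gives the same value, which is the identity stated in the slide.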
