
Regression Models



Presentation Transcript


  1. Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics

  2. Regression and Forecasting Models Part 2 – Inference About the Regression

  3. The Linear Regression Model 1. The linear regression model 2. Sample statistics and population quantities 3. Testing the hypothesis of no relationship

  4. A Linear Regression Predictor: Box Office = -14.36 + 72.72 Buzz

  5. Data and Relationship • We suggested the relationship between box office and internet buzz is Box Office = -14.36 + 72.72 Buzz • Note the obvious inconsistency in the figure. This is not the relationship. The observed points do not lie on a line. • How do we reconcile the equation with the data?

  6. Modeling the Underlying Process • A model that explains the process that produces the data that we observe: • Observed outcome = the sum of two parts • (1) Explained: The regression line • (2) Unexplained (noise): The remainder • Regression model • The “model” is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real world observation.

  7. The Population Regression • THE model: A specific statement about the parts of the model • (1) Explained: Explained Box Office = β0 + β1 Buzz • (2) Unexplained: The rest is “noise, ε.” Random ε has certain characteristics • Model statement • Box Office = β0 + β1 Buzz + ε
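A minimal simulation sketch of this model statement, assuming hypothetical values: the fitted numbers from later in the deck stand in for the unknown population parameters, and the Buzz values are made up.

import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-ins for the unknown population parameters
# (beta0, beta1 borrowed from the fitted line; sigma from the reported S)
beta0, beta1, sigma = -14.36, 72.72, 13.4

buzz = rng.uniform(0.2, 1.5, size=62)     # made-up Buzz values
eps = rng.normal(0.0, sigma, size=62)     # unexplained noise: mean 0, std. dev. sigma
box_office = beta0 + beta1 * buzz + eps   # observed outcome = explained part + noise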

  8. The Data Include the Noise

  9. The Data Include the Noise. Example from the figure: Box = 41, β0 + β1Buzz = 10, ε = 31.

  10. Model Assumptions • yi = β0 + β1xi + εi • β0 + β1xi is the ‘regression function’ • Contains the ‘information’ about yi in xi • Unobserved because β0 and β1 are not known for certain • εi is the ‘disturbance.’ It is the unobserved random component • Observed yi is the sum of the two unobserved parts.

  11. Regression Model Assumptions About εi • Random Variable • (1) The regression is the mean of yi for a particular xi; εi is the deviation of yi from the regression line. • (2) εi has mean zero. • (3) εi has variance σ². • ‘Random’ Noise • (4) εi is unrelated to any value of xi (no covariance) – it’s “random noise” • (5) εi is unrelated to any other disturbance εj (not “autocorrelated”) • (6) Normal distribution – εi is the sum of many small influences
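A small sketch checking assumptions (2)–(4) on simulated disturbances; all names and numbers here are illustrative, not the movie data.

import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.uniform(0.0, 2.0, size=10_000)     # made-up x values
eps = rng.normal(0.0, 1.0, size=10_000)    # disturbances drawn with sigma = 1

print(eps.mean())            # (2) sample mean close to 0
print(eps.var())             # (3) sample variance close to sigma^2 = 1
print(np.cov(x, eps)[0, 1])  # (4) covariance with x close to 0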

  12. Regression Model

  13. Conditional Normal Distribution of ε

  14. A Violation of Point (4): c = β0 + β1q + ε? Electricity Cost Data

  15. A Violation of Point (5) - Autocorrelation Time Trend of U.S. Gasoline Consumption

  16. No Obvious Violations of Assumptions Auction Prices for Monet Paintings vs. Area

  17. Samples and Populations • Population (Theory) • yi = β0 + β1xi + εi • Parameters β0, β1 • Regression: β0 + β1xi • Mean of yi | xi • Disturbance εi • Expected value = 0, standard deviation σ • No correlation with xi • Sample (Observed) • yi = b0 + b1xi + ei • Estimates b0, b1 • Fitted regression: b0 + b1xi • Predicted yi | xi • Residuals ei • Sample mean 0, sample std. dev. se • Sample Cov[x,e] = 0
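A sketch of how these sample quantities are computed; the x and y arrays below are placeholders standing in for the Buzz / Box Office data.

import numpy as np

# Placeholder data; substitute the actual Buzz (x) and Box Office (y) sample here
x = np.array([0.5, 0.8, 1.1, 1.3, 0.9, 0.7])
y = np.array([20.0, 45.0, 60.0, 80.0, 50.0, 35.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()            # least squares estimates of beta0, beta1
e = y - b0 - b1 * x                      # residuals

print(b0, b1)
print(e.mean())                          # sample mean of the residuals is 0
print(np.cov(x, e, bias=True)[0, 1])     # sample covariance of x and e is 0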

  18. Disturbances vs. Residuals: ε = y − β0 − β1Buzz; e = y − b0 − b1Buzz

  19. Standard Deviation of Residuals • Standard deviation of εi = yi − β0 − β1xi is σ • σ = √E[εi²] (mean of εi is zero) • Sample b0 and b1 estimate β0 and β1 • Residual ei = yi − b0 − b1xi estimates εi • Use √((1/N)Σei²) to estimate σ? Close, not quite. Why N−2? Relates to the fact that two parameters (β0, β1) were estimated. Same reason N−1 was used to compute a sample variance.
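A sketch of that calculation; the residuals here are placeholders, not the movie data.

import numpy as np

e = np.array([1.2, -0.8, 0.4, -1.5, 0.9, -0.2])   # placeholder residuals from a fitted line
N = len(e)

s_naive = np.sqrt(np.sum(e ** 2) / N)       # divide by N: close, but not quite
s_e = np.sqrt(np.sum(e ** 2) / (N - 2))     # divide by N - 2: two parameters were estimated
print(s_naive, s_e)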

  20. Linear Regression Sample Regression Line

  21. Residuals

  22. Regression Computations

  23. Results to Report

  24. The Reported Results

  25. Estimated equation

  26. Estimated coefficients b0and b1

  27. Sum of squared residuals, Σi ei²

  28. S = se = estimated std. deviation of ε

  29. Interpreting σ (Estimated by se) Remember the empirical rule, that 95% of observations will lie within the mean ± 2 standard deviations? We show (b0 + b1x) ± 2se below. This point is 2.2 standard deviations from the regression. Only 3.2% of the 62 observations lie outside the bounds. (We will refine this later.)
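A sketch of that ±2·se check with placeholder numbers; the real calculation would use the 62 observations, their fitted values, and the se reported later in the deck.

import numpy as np

# Placeholder observations and fitted values (b0 + b1 * x)
y = np.array([10.0, 25.0, 40.0, 55.0, 30.0, 70.0])
y_hat = np.array([12.0, 22.0, 43.0, 50.0, 33.0, 66.0])
s_e = 5.0                             # placeholder estimated std. deviation of the residuals

outside = np.abs(y - y_hat) > 2 * s_e
print(outside.mean())                 # fraction of points outside the +/- 2 s_e bounds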

  30. yi = β0 + β1xi + εi. No Relationship: β1 = 0. Relationship: β1 ≠ 0. How to Distinguish These Cases Statistically?

  31. Assumptions • (Regression) The equation linking “Box Office” and “Buzz” is stable E[Box Office | Buzz] = α + β Buzz • Another sample of movies, say 2012, would obey the same fundamental relationship.

  32. Sampling Variability Samples 0 and 1 are a random split of the 62 observations. Sample 0: Box Office = -16.09 + 79.11 Buzz Sample 1: Box Office = -13.25 + 68.51 Buzz
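A sketch of this kind of random split and refit. The data are simulated stand-ins for the 62 movies, so the two fitted lines will not match the ones reported above, but they illustrate the same sampling variability.

import numpy as np

def ols(x, y):
    # least squares intercept and slope
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(seed=2)
x = rng.uniform(0.2, 1.5, size=62)                  # simulated Buzz
y = -14.36 + 72.72 * x + rng.normal(0, 13.4, 62)    # simulated Box Office

idx = rng.permutation(62)
sample0, sample1 = idx[:31], idx[31:]               # random split into two halves
print(ols(x[sample0], y[sample0]))                  # the two fitted lines differ
print(ols(x[sample1], y[sample1]))                  # through sampling variability alone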

  33. Sampling Distributions

  34. Degrees of freedom for the t statistic: n = N − 2. Small sample: critical value from the t table. Large sample: approximately the normal value, 1.96.
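A sketch of how the critical value depends on N − 2 degrees of freedom; scipy is assumed to be available.

from scipy import stats

# 95% two-sided critical values of the t distribution with N - 2 degrees of freedom
for N in (10, 30, 62, 1000):
    print(N, stats.t.ppf(0.975, df=N - 2))   # approaches the normal value 1.96 as N grows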

  35. Standard Error of Regression Slope Estimator 

  36. Internet Buzz Regression. Range of uncertainty for b1 is 72.72 ± 1.96(10.94) = [51.27 to 94.17]. If you use 2.00 from the t table, the limits would be [50.8 to 94.6].

Regression Analysis: BoxOffice versus Buzz
The regression equation is BoxOffice = -14.4 + 72.7 Buzz

Predictor   Coef      SE Coef   T       P
Constant    -14.360   5.546     -2.59   0.012
Buzz        72.72     10.94     6.65    0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       1    7913.6    7913.6   44.16   0.000
Residual Error   60   10751.5   179.2
Total            61   18665.1
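A sketch reproducing the interval from the reported coefficient and standard error; scipy is assumed, and only the numbers from the output above are used.

from scipy import stats

b1, se_b1, N = 72.72, 10.94, 62                  # slope, SE Coef, and sample size from the output

print(b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)      # large-sample limits, about [51.3, 94.2]

t_crit = stats.t.ppf(0.975, df=N - 2)            # df = 60 gives roughly 2.00
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # about [50.8, 94.6]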

  37. Some computer programs report confidence intervals automatically; Minitab does not.

  38. Uncertainty About the Regression Slope. Hypothetical Regression: Fuel Bill vs. Number of Rooms.

The regression equation is Fuel Bill = -252 + 136 Number of Rooms

Predictor   Coef     SE Coef   T       P
Constant    -251.9   44.88     -5.20   0.000
Rooms       136.2    7.09      19.9    0.000

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

Here 136.2 is b1, the estimate of β1. Its “Standard Error” (SE Coef = 7.09) is the measure of uncertainty about the true value. The “range of uncertainty” is b1 ± 2 SE(b1). (Actually 1.96, but people use 2.)

  39. Sampling Distributions and Test Statistics

  40. t Statistic for Hypothesis Test

  41. Alternative Approach: The P value • Hypothesis: β1 = 0 • The ‘P value’ is the probability of observing evidence this strong against the hypothesis if the null hypothesis were true. • P = Prob(|t| would be this large | β1 = 0) • If the P value is less than the Type I error probability (usually 0.05) you have chosen, reject the hypothesis. • Interpret: If the hypothesis were true, it is ‘unlikely’ that I would have observed this evidence.
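A sketch of the P value calculation from the reported slope and standard error; scipy is assumed.

from scipy import stats

b1, se_b1, N = 72.72, 10.94, 62           # values from the Buzz regression output

t = b1 / se_b1                            # t statistic for H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), df=N - 2)      # two-sided P value
print(t, p)                               # t is about 6.65; P is essentially 0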

  42. P value for hypothesis test

  43. Intuitive approach: Does the confidence interval contain zero? • Hypothesis: β1 = 0 • The confidence interval contains the set of plausible values of β1 based on the data and the test. • If the confidence interval does not contain 0, reject H0: β1 = 0.

  44. More General Test

  45. Summary: Regression Analysis • Investigate: Is the coefficient in a regression model really nonzero? • Testing procedure: • Model: y = β0 + β1x + ε • Hypothesis: H0: β1 = B • Rejection region: the least squares coefficient b1 is far from B • Test: • α level for the test = 0.05 as usual • Compute t = (b1 − B)/SE(b1) • Reject H0 if |t| is above the critical value: 1.96 if large sample, value from the t table if small sample • Reject H0 if the reported P value is less than the α level • Degrees of freedom for the t statistic: N − 2
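A sketch of the full procedure as a small helper; the function name and defaults are made up for illustration, and the printed example reuses the slope and standard error from the Buzz regression.

from scipy import stats

def test_slope(b1, se_b1, N, B=0.0, alpha=0.05):
    # Two-sided t test of H0: beta1 = B using the estimate b1 and its standard error
    t = (b1 - B) / se_b1
    crit = stats.t.ppf(1 - alpha / 2, df=N - 2)   # close to 1.96 when N is large
    p = 2 * stats.t.sf(abs(t), df=N - 2)
    return t, crit, p, abs(t) > crit              # reject H0 when |t| exceeds the critical value

print(test_slope(72.72, 10.94, 62))               # slope and SE from the Buzz regression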
