Interpreting Bi-variate OLS Regression
• Stata Regression Output
• Regression plots and RSS
• R² (Coefficient of Determination)
• Adjusted R²
• Sample Covariance/Correlation
• Hypothesis Testing
• Standard Errors
• T-tests and P-values
Data
• Use the “caschool.dat” file
• Data description: CaliforniaTestScores.pdf
• Build a Stata do-file as you go
• Model: Test score = f(student/teacher ratio)
Stata Regression Model: Regressing Test Score on Student-Teacher Ratio
Start by examining the distribution of each variable:
    histogram str, percent normal
    histogram testscr, percent normal
Regression Output
    regress testscr str, beta

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =   22.58
       Model |  7794.11004     1  7794.11004           Prob > F      =  0.0000
    Residual |  144315.484   418  345.252353           R-squared     =  0.0512
-------------+------------------------------           Adj R-squared =  0.0490
       Total |  152109.594   419  363.030056           Root MSE      =  18.581

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         str |  -2.279808   .4798256    -4.75   0.000                -.2263628
       _cons |    698.933   9.467491    73.82   0.000                        .
------------------------------------------------------------------------------
Regression Descriptive Statistics
    cor testscr str, means

    Variable |       Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------
     testscr |   654.1565    19.05335     605.55     706.75
         str |   19.64043    1.891812         14       25.8

             |  testscr      str
-------------+------------------
     testscr |   1.0000
         str |  -0.2264   1.0000
Regression Plot
    twoway (scatter testscr str) (lfitci testscr str)
Measuring “Goodness of Fit”
• Root of Mean Squared Error (“Root MSE”)
  • Measures the spread of the observations around the regression line
• Coefficient of Determination (R²)
  • The “model” (explained) sum of squares divided by the “total” sum of squares:
    $$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
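The fit statistics in the header of the Stata output can be reproduced by hand from the sums of squares in its ANOVA table. The following is a minimal Python sketch of that arithmetic, not part of the Stata workflow itself; the function name `fit_statistics` is hypothetical, and the inputs are the Model and Residual sums of squares and the number of observations reported in the regression output.

```python
import math

def fit_statistics(ess, rss, n, k=1):
    """Goodness-of-fit measures from the explained (ess) and residual (rss)
    sums of squares, for n observations and k regressors."""
    tss = ess + rss                  # total sum of squares
    df_resid = n - k - 1             # residual degrees of freedom
    df_total = n - 1                 # total degrees of freedom
    r2 = ess / tss
    adj_r2 = 1 - (rss / df_resid) / (tss / df_total)
    root_mse = math.sqrt(rss / df_resid)
    f_stat = (ess / k) / (rss / df_resid)
    return {"r2": r2, "adj_r2": adj_r2, "root_mse": root_mse, "f": f_stat}

# Values copied from the "Model" and "Residual" rows of the Stata output.
stats = fit_statistics(ess=7794.11004, rss=144315.484, n=420)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The results should agree with the R-squared, Adj R-squared, Root MSE, and F values in the Stata header up to rounding.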
Explaining R²
For each observation $Y_i$, variation around the mean can be decomposed into that which is “explained” by the regression and that which is not:
$$\underbrace{Y_i - \bar{Y}}_{\text{all}} = \underbrace{\hat{Y}_i - \bar{Y}}_{\text{explained deviation}} + \underbrace{Y_i - \hat{Y}_i}_{\text{unexplained deviation}}$$
Book terminology:
• TSS = Σ(all)²
• RSS = Σ(unexplained)²
• ESS = Σ(explained)²
Stata terminology:
• Residual = Σ(unexplained)²
• Model = Σ(explained)²
• Total = Σ(all)²
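To see the decomposition hold numerically, here is a small Python sketch that fits a bivariate OLS line by hand and checks that the total sum of squares equals the explained plus the unexplained sum of squares. The five data points are made up purely for illustration and the function name `decompose` is hypothetical; nothing here comes from the caschool data.

```python
def decompose(x, y):
    """Fit y = b0 + b1*x by OLS and return (tss, ess, rss)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # OLS slope and intercept for the bivariate model
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # sum of (all)^2
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)           # sum of (explained)^2
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of (unexplained)^2
    return tss, ess, rss

# Hypothetical toy data, not the California school data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
tss, ess, rss = decompose(x, y)
print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}, R2 = {ess / tss:.2f}")
```

In Stata's vocabulary, `tss`, `ess`, and `rss` correspond to the Total, Model, and Residual rows of the SS column.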
Sample Covariance & Correlation
• Sample covariance for a bivariate model is defined as:
  $$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
• Sample correlations (r) “standardize” covariance by dividing by the product of the X and Y standard deviations:
  $$r_{XY} = \frac{s_{XY}}{s_X \, s_Y}$$
• Sample correlations range from -1 (perfect negative relationship) to +1 (perfect positive relationship)
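Because the correlation is just the covariance divided by the two standard deviations, the relationship can be inverted to recover the covariance from the descriptive statistics, and from there the OLS slope (covariance divided by the variance of X). The sketch below does this in Python using the rounded values printed by `cor testscr str, means`; the function name `covariance_from_correlation` is hypothetical, and because the correlation is rounded to four decimals the slope matches Stata's coefficient only approximately.

```python
def covariance_from_correlation(r, s_x, s_y):
    """Invert r = s_xy / (s_x * s_y) to recover the sample covariance."""
    return r * s_x * s_y

# Summary statistics copied from the Stata output: testscr is Y, str is X.
r = -0.2264        # correlation between testscr and str (rounded by Stata)
s_x = 1.891812     # Std. Dev. of str
s_y = 19.05335     # Std. Dev. of testscr

s_xy = covariance_from_correlation(r, s_x, s_y)
slope = s_xy / s_x ** 2   # bivariate OLS slope: b1 = s_xy / s_x^2

print(f"covariance of str and testscr: {s_xy:.3f}")
print(f"implied slope b1: {slope:.3f}")
```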
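Standardized Regression Coefficients (aka “Beta Weights” or “Betas”)
• Formula:
  $$\text{Beta} = b_1 \cdot \frac{s_X}{s_Y}$$
• In our example:
  $$\text{Beta} = -2.279808 \times \frac{1.891812}{19.05335} \approx -0.2264$$
• Interpretation: the number of standard deviations by which Y is expected to change for a one-standard-deviation change in X.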
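A standardized coefficient rescales the slope by the ratio of the standard deviation of X to the standard deviation of Y. As a quick check on the Beta column in the regression output, the Python sketch below applies that rescaling to the reported coefficient and standard deviations; the function name `standardized_coefficient` is hypothetical.

```python
def standardized_coefficient(b1, s_x, s_y):
    """Beta weight: the slope expressed in standard-deviation units."""
    return b1 * s_x / s_y

# Coefficient on str and the two standard deviations, from the Stata output.
beta = standardized_coefficient(b1=-2.279808, s_x=1.891812, s_y=19.05335)
print(f"Beta = {beta:.4f}")
```

With a single regressor the Beta equals the sample correlation between X and Y, which is why it matches the correlation that Stata reports for testscr and str.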
Hypothesis Tests for Regression Coefficients
• For our model: $Y_i = 698.933 - 2.279808\,X_i + e_i$
• Another sample of 420 observations would lead to different estimates for b0 and b1. If we drew many such samples, we would get the sampling distribution of the estimates.
• We usually cannot observe the sampling distribution, so we estimate it from our one sample, based on its size and variance.
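The idea of a sampling distribution can be made concrete with a small simulation: draw many samples of 420 observations from a known population, re-estimate the slope each time, and look at how the estimates spread out. The Python sketch below does this under loudly stated assumptions: the population line, the normal distributions, and the error spread are hypothetical values chosen to resemble the slide's estimates, and the function names are made up for illustration.

```python
import random
import statistics

def ols_slope(x, y):
    """Bivariate OLS slope: cross-deviations over squared x-deviations."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n_samples=300, n_obs=420, seed=0):
    """Re-estimate b1 on repeated samples from a HYPOTHETICAL population."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        # Assumed distribution of X (mean and spread borrowed from str).
        x = [rng.gauss(19.64, 1.89) for _ in range(n_obs)]
        # Assumed population line and error spread (not estimated here).
        y = [698.9 - 2.28 * xi + rng.gauss(0, 18.58) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = simulate_slopes()
print(f"mean of the slope estimates: {statistics.fmean(slopes):.3f}")
print(f"std. dev. of the slope estimates: {statistics.stdev(slopes):.3f}")
```

The standard deviation of the simulated slopes plays the role of the standard error: it is the quantity that the regression output approximates from a single sample.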
Interpreting Standard Errors
For our model:
• b0 = 698.933, and SEb0 = 9.467
• b1 = -2.28, and SEb1 = .4798
Assuming that we estimated the standard error correctly, we can identify how many standard errors our estimate lies away from zero. The t-statistic reports exactly this number: t = b1 / SEb1. Thus, the “T” for b1 in our model is -4.75, that is, 4.75 standard errors below zero (after rounding).
[Figure: Estimated Sampling Distribution for b1, centered at b1 = -2.28, with b1 - SEb1 = -2.76 and b1 + SEb1 = -1.80 marked; zero lies 4.75 SEb1 “units” away from b1]
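The t column in the regression output is simply each coefficient divided by its standard error. The short Python sketch below repeats that division with the unrounded values from the Stata output and also computes the one-standard-error band around b1; the function name `t_statistic` is hypothetical.

```python
def t_statistic(coef, se, null_value=0.0):
    """Number of standard errors between the estimate and the null value."""
    return (coef - null_value) / se

# Coefficients and standard errors copied from the Stata output.
b1, se_b1 = -2.279808, 0.4798256
b0, se_b0 = 698.933, 9.467491

t_b1 = t_statistic(b1, se_b1)
t_b0 = t_statistic(b0, se_b0)
print(f"t for b1: {t_b1:.2f}")
print(f"t for b0: {t_b0:.2f}")
print(f"b1 - SE = {b1 - se_b1:.2f}, b1 + SE = {b1 + se_b1:.2f}")
```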
Classical Hypothesis Testing
• Assume that b1 is zero. What is the probability that your sample would have resulted in an estimate for b1 that is 4.75 SEb1’s away from zero?
• To find out, determine the area (cumulative probability) of the estimated sampling distribution that falls more than 4.75 SEb1’s away from zero.
• See Table 2, page 757, in Stock & Watson. It reports discrete “p-values”, given the sample size and t-values. Note the distinction between 1- and 2-sided tests.
• In general, if the absolute value of the t-stat is above 2, the two-sided p-value will be < 0.05, which is the conventional upper limit in a classical hypothesis test.
• Note: in Stata-speak, a p-value is a “P>|t|”.
[Figure: estimated sampling distribution for b1, contrasting the null hypothesis (b1 = 0.0) with the estimated b1 = -2.28 (working hypothesis)]
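The tail area described above can be approximated without a printed table. The Python sketch below uses a standard normal approximation, which is an assumption on my part: Stata's P>|t| comes from the t distribution with 418 degrees of freedom, which is very close to normal at this sample size but not identical. The function name `two_sided_p_normal` is hypothetical.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value for a test statistic, using the standard normal
    as a large-sample approximation to the t distribution."""
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for z >= 0
    return math.erfc(abs(t) / math.sqrt(2))

p_b1 = two_sided_p_normal(-4.75)      # t for str from the regression output
p_rule = two_sided_p_normal(2.0)      # the "t above 2" rule of thumb
p_exact_cutoff = two_sided_p_normal(1.96)

print(f"approx. p-value for b1: {p_b1:.2e}")
print(f"approx. p-value at |t| = 2: {p_rule:.4f}")
print(f"approx. p-value at |t| = 1.96: {p_exact_cutoff:.4f}")
```

The value for b1 is far below 0.05, consistent with the 0.000 that Stata prints, and the value at |t| = 2 shows why "above 2" works as a rule of thumb for the 5% level.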
Coming up...
• For Next Week:
  • Use the caschool.dta dataset
  • Run a model in Stata using Average Income (avginc) to predict Average Test Scores (testscr)
  • Examine the univariate distributions of both variables and the residuals
  • Walk through the entire interpretation
  • Build a Stata do-file as you go
  • Read Chapter 8 of Stock & Watson