
Multiple Regression Analysis



  1. Multiple Regression Analysis y = β0 + β1x1 + β2x2 + … + βkxk + u Heteroskedasticity Slides by K. Clark, adapted from P. Anderson

  2. What is Heteroskedasticity? • Var(u|x) = σ² [MLR.5] • Homoskedasticity assumption: the error variance is constant • Recall that the homoskedasticity assumption implies that, conditional on the explanatory variables, the variance of the unobserved error u is constant • If this is not true, that is, if the variance of u differs across values of the x's, then the errors are heteroskedastic

  3. Examples of Heteroskedasticity • Estimating returns to education: ability is unobservable, and the variance in ability differs by educational attainment • Estimating savings as a function of income: higher-income households have more discretion over what to do with their money • Estimating average crime rates in a country as a function of average characteristics of that country – the error is an average, so Var(ū) = σ²/n is a function of the number of units averaged, n, which varies with x

  4. Picture of Heteroskedasticity [Figure: conditional densities f(y|x) drawn around the regression line E(y|x) = β0 + β1x at x1, x2, x3; the spread of the densities increases with x]
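
To see this picture in actual numbers, here is a minimal simulation sketch in Stata (not from the original slides; all names are made up, and the error standard deviation is assumed to rise linearly with x):

clear
set obs 200
set seed 1
gen x = 10*runiform()
gen u = rnormal(0, 1 + x)      // sd of u grows with x: heteroskedastic errors
gen y = 1 + 2*x + u
regress y x
predict uhat, residuals
scatter uhat x                 // the residual cloud fans out as x grows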

  5. Why Worry About Heteroskedasticity? • OLS estimators are still unbiased and consistent even if we do not assume homoskedasticity [we need only MLR.1-MLR.4] • The standard errors of the estimators are biased if we have heteroskedasticity • If the standard errors are biased, we cannot use the usual t, F, or LM statistics for drawing inferences

  6. Variance with Heteroskedasticity With a single regressor and Var(ui|xi) = σi², the variance of the OLS slope estimator is
  Var(β̂1) = Σ(xi − x̄)²σi² / SSTx², where SSTx = Σ(xi − x̄)²
  This collapses to the familiar σ²/SSTx only under homoskedasticity (σi² = σ² for all i)

  7. Variance with Heteroskedasticity A valid estimator of Var(β̂1) under heteroskedasticity of any form (due to White) replaces the unknown σi² with the squared OLS residuals:
  Σ(xi − x̄)²ûi² / SSTx²
  In the multiple regression case the analogous estimator of Var(β̂j) is Σ r̂ij²ûi² / SSRj², where r̂ij is the ith residual from regressing xj on all the other regressors and SSRj is the sum of squared residuals from that regression

  8. Robust Standard Errors • This provides an estimator of the variance of β̂j which is consistent • The square root of this can be used as a standard error for inference • These are typically called robust or heteroskedasticity-consistent standard errors • [Or White standard errors, or Huber standard errors…]

  9. Robust Standard Errors (cont) • It is important to remember that these robust standard errors only have an asymptotic justification – with small sample sizes, t statistics formed with robust standard errors will not have a distribution close to the t, and inferences will not be correct • In Stata, robust standard errors are easily obtained using the robust option of regress

  10. Stata Example: usual s.e.'s

. use wage1, clear
. regress wage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   76.87
       Model |  2194.1116      3  731.370532           Prob > F      =  0.0000
    Residual |  4966.30269   522  9.51398984           R-squared     =  0.3064
-------------+------------------------------           Adj R-squared =  0.3024
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.0845

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5989651   .0512835    11.68   0.000     .4982176    .6997126
       exper |   .0223395   .0120568     1.85   0.064    -.0013464    .0460254
      tenure |   .1692687   .0216446     7.82   0.000     .1267474    .2117899
       _cons |  -2.872735   .7289643    -3.94   0.000    -4.304799   -1.440671
------------------------------------------------------------------------------

  11. Stata Example: robust s.e.'s

. regress wage educ exper tenure, robust

Regression with robust standard errors                 Number of obs =     526
                                                       F(  3,   522) =   41.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3064
                                                       Root MSE      =  3.0845

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5989651   .0610139     9.82   0.000     .4791021     .718828
       exper |   .0223395   .0105548     2.12   0.035     .0016044    .0430747
      tenure |   .1692687   .0292784     5.78   0.000     .1117508    .2267865
       _cons |  -2.872735   .8074154    -3.56   0.000    -4.458918   -1.286552
------------------------------------------------------------------------------

  12. Notes on example • Estimates of the parameters are not affected • We can now do robust t-tests in the standard way; robust F tests are possible but trickier • The large sample size in this example justifies the use of robust standard errors • Robust standard errors can be larger or smaller than the usual ones, so it can make a difference to the inferences we draw

  13. A Robust LM Statistic • For q exclusion restrictions: • Run OLS on the restricted model and save the residuals ũ • Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals r̃1, r̃2, …, r̃q • Regress a variable defined to be ≡ 1 on r̃1ũ, r̃2ũ, …, r̃qũ, with no intercept (use the nocons option in Stata; see help regress) • The LM statistic is n – SSR1, where SSR1 is the sum of squared residuals from this final regression (see the sketch below)
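
A minimal Stata sketch of these steps for a hypothetical model y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u, testing exclusion of x3 and x4 (q = 2); all variable names are illustrative:

regress y x1 x2                        // restricted model
predict utilde, residuals              // restricted residuals
regress x3 x1 x2
predict r1, residuals
regress x4 x1 x2
predict r2, residuals
gen one = 1
gen r1u = r1*utilde
gen r2u = r2*utilde
regress one r1u r2u, nocons            // final regression, no intercept
display "robust LM = " e(N) - e(rss)   // n - SSR1; compare to chi2(2)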

  14. Bridge into Testing • Why not always use robust inference? • We often do in cross-section applications (large n) • In small samples robust standard errors may be inaccurate • OLS is not BLUE in the presence of heteroskedasticity, so a better estimator is potentially available • Hence we sometimes need to test for the presence of heteroskedasticity

  15. Testing for Heteroskedasticity • Actually we test the null hypothesis of homoskedasticity: • H0: Var(u|x1, x2,…, xk) = σ² • From the definition of variance and since E(u|x) = 0, this is equivalent to • H0: E(u²|x1, x2,…, xk) = E(u²) = σ² • If we assume the relationship between u² and the xj is linear, we can test this as a linear restriction • So, for u² = δ0 + δ1x1 +…+ δkxk + v, this means testing H0: δ1 = δ2 = … = δk = 0

  16. The Breusch-Pagan Test • We don't observe the error, but we can estimate it with the residuals from the OLS regression • Then estimate an auxiliary regression of these residuals squared on a set of suspect regressors and an intercept – note the R² from this regression • The suspect regressors can be all, some or none of the regressors in the original model • After regressing the residuals squared on the suspect x's, we can use this R² to form an F or LM test

  17. The Breusch-Pagan Test (cont'd) Let R² be the R-squared from the auxiliary regression of û² on the s suspect regressors. Then
  F = (R²/s) / [(1 − R²)/(n − s − 1)], approximately F(s, n − s − 1) under H0
  LM = n·R², approximately χ²(s) under H0
  As usual we reject if the observed value of the test statistic exceeds an appropriate critical value. Rejection implies heteroskedasticity is present
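
A minimal Stata sketch of the LM version, with a hypothetical model whose suspect regressors are x1-x3:

regress y x1 x2 x3
predict uhat, residuals
gen uhatsq = uhat^2
regress uhatsq x1 x2 x3          // auxiliary regression
display "LM = " e(N)*e(r2)       // compare to the chi2(3) critical value
* built-in alternative, run straight after the original regression:
* estat hettest, rhs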

  18. The White Test • The Breusch-Pagan test will detect any linear form of heteroskedasticity • The White test allows for nonlinearities by putting the squares and cross-products of all the x's into the auxiliary regression • As before, use an F or LM statistic to test whether all the xj, xj², and xjxh are jointly significant • We test the same number of restrictions as there are explanatory variables in the auxiliary regression
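
For reference, newer versions of Stata also bundle the White test as a postestimation command; a one-line sketch (run after the original regression):

regress y x1 x2 x3
estat imtest, white              // White's general heteroskedasticity test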

  19. Alternate form of the White test • Consider that the fitted values from OLS, ŷ, are a function of all the x's • Thus ŷ² will be a function of the squares and cross-products, so ŷ and ŷ² can proxy for all of the xj, xj², and xjxh • Regress the residuals squared on ŷ and ŷ² and use the R² to form an F or LM statistic • Note we are only testing 2 restrictions now

  20. White test – auxiliary regressions Suppose the original model is: y = β0 + β1x1 + β2x2 + β3x3 + u
  Auxiliary Regression 1 (test 9 restrictions):
  û² = δ0 + δ1x1 + δ2x2 + δ3x3 + δ4x1² + δ5x2² + δ6x3² + δ7x1x2 + δ8x1x3 + δ9x2x3 + error
  Auxiliary Regression 2 (test 2 restrictions):
  û² = δ0 + δ1ŷ + δ2ŷ² + error

  21. Example (8.4) • House prices – data in hprice1.dta • Estimating house prices as a function of the characteristics of the house (hedonic model): price = β0 + β1lotsize + β2sqrft + β3bdrms + u • lotsize – size of the plot; sqrft – size of the house; bdrms – number of bedrooms • Stata output on the web page – the BP test is left as an exercise (the auxiliary variables used below are constructed as sketched next)
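
The variables appearing in the output on the next slides (uhatsq, the squares, and the cross-products) are not in hprice1.dta; a sketch of how they would be constructed, assuming these names:

use hprice1, clear
regress price lotsize sqrft bdrms
predict uhat, residuals
predict yhat                     // fitted values
gen uhatsq = uhat^2
gen yhatsq = yhat^2
gen lotsizesq = lotsize^2
gen sqrftsq = sqrft^2
gen bdrmssq = bdrms^2
gen lotsqrf = lotsize*sqrft
gen lotbdrms = lotsize*bdrms
gen sqrfbdrms = sqrft*bdrms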

  22. Example (8.4): White I

* do auxiliary regression for version 1 of White test
. regress uhatsq lotsize sqrft bdrms lotsizesq sqrftsq bdrmssq lotsqrf lotbdrms sqrfbdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  9,    78) =    5.39
       Model |  1.6784e+09     9   186492384           Prob > F      =  0.0000
    Residual |  2.7003e+09    78  34619264.4           R-squared     =  0.3833
-------------+------------------------------           Adj R-squared =  0.3122
       Total |  4.3787e+09    87  50330276.7           Root MSE      =  5883.8

------------------------------------------------------------------------------
      uhatsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |  -1.859507   .6370969    -2.92   0.005    -3.127869   -.5911443
       sqrft |  -2.673917   8.662183    -0.31   0.758      -19.919    14.57116
       bdrms |  -1982.841   5438.482    -0.36   0.716    -12810.03    8844.345
   lotsizesq |  -4.98e-07   4.63e-06    -0.11   0.915    -9.72e-06    8.72e-06
     sqrftsq |   .0003523   .0018396     0.19   0.849    -.0033101    .0040146
     bdrmssq |   289.7541   758.8302     0.38   0.704    -1220.961    1800.469
     lotsqrf |   .0004568   .0002769     1.65   0.103    -.0000945     .001008
    lotbdrms |   .3146468   .2520936     1.25   0.216     -.187233    .8165266
   sqrfbdrms |   -1.02086   1.667154    -0.61   0.542    -4.339909    2.298189
       _cons |   15626.24   11369.41     1.37   0.173    -7008.516       38261
------------------------------------------------------------------------------

  23. Example (8.4): White I • F version of the test • F = 5.39; 1.97 < c < 2.04 (F(9,78) at 5%) • p-value = 0.000 (Stata output) • Reject H0 of homoskedasticity • LM = 88 × 0.3833 = 33.73; c = 16.92 (χ²(9) at 5%) • p-value = 0.000 (computed by me) • Reject H0 of homoskedasticity
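
The χ² critical value and the "computed by me" p-value can be checked in Stata:

display invchi2tail(9, 0.05)     // 5% critical value for chi2(9): 16.92
display chi2tail(9, 33.73)       // p-value of the LM statistic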

  24. Example (8.4): White II

* do auxiliary regression for version 2 of White test
. regress uhatsq yhat yhatsq

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  2,    85) =    9.64
       Model |   809489364     2   404744682           Prob > F      =  0.0002
    Residual |  3.5692e+09    85  41991114.2           R-squared     =  0.1849
-------------+------------------------------           Adj R-squared =  0.1657
       Total |  4.3787e+09    87  50330276.7           Root MSE      =  6480.1

------------------------------------------------------------------------------
      uhatsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        yhat |  -119.6554   53.31721    -2.24   0.027    -225.6643   -13.64646
      yhatsq |   .2089466   .0745962     2.80   0.006     .0606294    .3572637
       _cons |   19071.59   8876.227     2.15   0.035      1423.27     36719.9
------------------------------------------------------------------------------

  25. Example (8.4): White II • F version of the test • F = 9.64, p-value = 0.000 (Stata output) • Reject H0 of homoskedasticity • LM = 88 × 0.1849 = 16.27; c = 5.99 (χ²(2) at 5%) • Reject H0 of homoskedasticity • Conclusion: heteroskedasticity is present

  26. Generalised Least Squares While it's always possible to estimate robust standard errors for OLS, if we know something about the specific form of the heteroskedasticity, we can obtain more efficient estimators than OLS. The basic idea is to transform the regression model into one that has homoskedastic errors – this is an example of generalised least squares (GLS)

  27. Case of form being known up to a multiplicative constant • Suppose the heteroskedasticity can be modelled as Var(u|x) = σ²h(x), where the trick is to figure out what h(x) ≡ hi looks like • Define ui* = ui/√hi • E(ui*|x) = 0, because hi is only a function of x, and Var(ui*|x) = σ², because we know Var(ui|x) = σ²hi • So, if we divided our whole model by √hi we would have a model where the error is homoskedastic

  28. Generalized Least Squares • Original model: • yi = β0 + β1xi1 + …+ βkxik + ui • Var(ui) = σ²hi • Transformed model: yi/√hi = β0(1/√hi) + β1(xi1/√hi) + … + βk(xik/√hi) + ui/√hi • Or: • yi* = β0xi0* + β1xi1* + …+ βkxik* + ui* • where xi0* = 1/√hi and Var(ui*) = σ²
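
As a concrete illustration (the standard textbook savings example; the variance form is an assumption, not derived from data): suppose sav = β0 + β1inc + u with Var(u|inc) = σ²·inc, so hi = inci. Dividing the model through by √inc gives

  sav/√inc = β0(1/√inc) + β1√inc + u/√inc

and the transformed error has constant variance, since Var(u/√inc | inc) = σ²·inc/inc = σ².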

  29. Generalized Least Squares • Estimating the transformed equation by OLS is an example of generalized least squares (GLS) • GLS will be BLUE since the transformed model satisfies MLR.1-MLR.5 • GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of Var(ui|xi)

  30. GLS and WLS • Var(ui|x) = σ²hi • Transformed model: yi/√hi = β0(1/√hi) + β1(xi1/√hi) +… +βk(xik/√hi) + ui/√hi • OLS on this involves choosing the bj to minimize Σi (yi − b0 − b1xi1 − … − bkxik)²/hi

  31. GLS and WLS • The last expression is just the usual OLS problem except each observation is weighted by 1/hi • This is called weighted least squares (WLS) and can be easily implemented in Stata using the command regress y x [aweight=1/h], where h is a variable containing the hi • This is a convenient way of estimating the transformed model without having to create lots of new variables

  32. Example: Electricity Demand • It is unusual to know the form of the heteroskedasticity • One nice example is Houthakker's 1951 study of UK electricity demand • Berndt, The Practice of Econometrics: Houthakker's work "appears to be the earliest published econometric article anywhere that reports least squares regression results obtained by using an electronic computer." • Houthakker's computer at Cambridge was down 95% of the time!

  33. Example: Electricity Demand • Houthakker wanted to study the demand of individual households but only had data on average values for 42 UK towns. His variables were: • yt - log average electricity consumption per customer • xt2 - log average income • xt3 - log price of electricity • xt4 - log price of gas • xt5 - log potential electricity demand (appliance ownership).

  34. Example: Electricity Demand • Ideally the equation to be estimated would be yi = β1 + β2xi2 + β3xi3 + β4xi4 + β5xi5 + ui, defined over households i, but data are available only for towns t. Suppose there are Nt households in town t

  35. Example: Electricity Demand • In larger towns the variance of the averaged error ut will be lower: Var(ut) = σ²/Nt, so ht = 1/Nt • An appropriate transformation is: • ut* = √Nt·ut • So to estimate the model we use • yt* = √Nt·yt, xtj* = √Nt·xtj; j = 1, …, 5 • and estimate the regression of y* on x* • The web page contains the data, and the exercise sheet allows you to reproduce Houthakker's results

  36. Weighted Least Squares For the Houthakker UK electricity data: . regress y x2 x3 x4 x5 [aweight=n] You should be able to verify that this gives exactly the same answer as estimating the transformed model explicitly (a sketch of the explicit version follows)
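
A sketch of the explicit transformation, assuming the dataset holds y, x2-x5 and the household count n (variable names as on the slide):

gen sqrtn = sqrt(n)
gen ystar = sqrtn*y
gen x1star = sqrtn               // transformed intercept term
foreach v of varlist x2 x3 x4 x5 {
    gen `v'star = sqrtn*`v'
}
regress ystar x1star x2star x3star x4star x5star, nocons

The coefficient estimates should match the [aweight=n] version.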

  37. Roundup of GLS • GLS is great if we know what Var(ui|xi) looks like • In most cases, we won't know the form of the heteroskedasticity • An example where we do is when the data are aggregated but the model is at the individual level • There we weight each aggregate observation by the number of individuals it averages over (the inverse of ht = 1/Nt)

  38. Feasible GLS • More typical is the case where you don't know the form of the heteroskedasticity • In this case, you need to estimate h(xi) • Typically, we start with the assumption of a fairly flexible model, such as • Var(u|x) = σ²exp(δ0 + δ1x1 + …+ δkxk) • Since we don't know the δj, we must estimate them

  39. Feasible GLS (continued) Our assumption implies that u² = σ²exp(δ0 + δ1x1 + …+ δkxk)·v, where E(v|x) = 1. Taking logs, log(u²) = α0 + δ1x1 + …+ δkxk + e, where E(e) = 0 and e is independent of x. Using û as an estimate of u, we can estimate this by OLS

  40. Feasible GLS (continued) • An estimate of hi is obtained as ĥ = exp(ĝ), where ĝ are the fitted values from the log regression, and the inverse of this is our weight • So, what did we do? • Run the original OLS model, save the residuals û, square them and take the log • Regress log(û²) on all of the independent variables and get the fitted values, ĝ • Do WLS using 1/exp(ĝ) as the weight • Alternatively, transform the model by multiplying all variables by 1/√exp(ĝ) and do OLS (a Stata sketch of the whole procedure follows)
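
A minimal Stata sketch of these FGLS steps, with hypothetical variables y, x1, x2:

regress y x1 x2                    // original model
predict uhat, residuals
gen loguhatsq = log(uhat^2)
regress loguhatsq x1 x2            // variance-function regression
predict ghat                       // fitted values ghat
gen hhat = exp(ghat)               // estimated h(x)
regress y x1 x2 [aweight=1/hhat]   // WLS with weight 1/hhat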

  41. GLS/WLS Wrapup • Feasible GLS is not BLUE (unlike GLS) but it is consistent and asymptotically more efficient than OLS • Remember we are using GLS/WLS just for inference/efficiency – OLS is still unbiased and consistent • OLS and GLS estimates will always differ somewhat due to sampling error, but if they are very different then it is likely that some other Gauss-Markov assumption is not true

  42. Next Time • Start to look at econometric analysis using time series data • Introduce econometric package Microfit • Reading: Wooldridge, Sections 10.1-10.3, 10.5, 12.1-12.3
