Multiple Regression Analysis y = b0 + b1x1 + b2x2 + . . . + bkxk + u Heteroskedasticity
What is Heteroskedasticity • Recall that the assumption of homoskedasticity implied that, conditional on the explanatory variables, the variance of the unobserved error, u, was constant • If this is not true, that is, if the variance of u differs for different values of the x’s, then the errors are heteroskedastic • Example: when estimating the returns to education, ability is unobservable, and the variance in ability may differ by educational attainment
Example of Heteroskedasticity • [Figure: conditional densities f(y|x) around the line E(y|x) = b0 + b1x at x1, x2, x3; the spread of y around the line widens as x increases]
Why Worry About Heteroskedasticity? • OLS is still unbiased and consistent, even if we do not assume homoskedasticity • The standard errors of the estimates are biased if we have heteroskedasticity • If the standard errors are biased, we cannot use the usual t statistics or F statistics for drawing inferences • Luckily, advances in econometrics over the last 20 years have produced ways to adjust standard errors so they are valid in the presence of heteroskedasticity (at least in large samples)
Variance with Heteroskedasticity: Simple Regression • Take the simple case: yi = b0 + b1xi + ui, with Var(ui|xi) = σi2 • We previously saw that the OLS slope estimator equals b1 + Σ(xi − x̄)ui/SSTx, where SSTx = Σ(xi − x̄)2, so its variance is Σ(xi − x̄)2σi2/SSTx2 • The main difference here is that the error variance σ2 is now subscripted by “i” • A valid estimator of this variance is Σ(xi − x̄)2ûi2/SSTx2, where the ûi are the OLS residuals • When σi2 = σ2 for all i, this reduces to the usual formula σ2/SSTx
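As a concrete illustration, here is a minimal Stata sketch that builds this estimator by hand for placeholder variables y and x and compares it to the built-in robust option (the two agree up to the degrees-of-freedom correction mentioned below):
* Sketch: White variance estimator for the slope in y = b0 + b1*x + u
reg y x
predict uhat, resid               // OLS residuals
quietly summarize x
gen dx2 = (x - r(mean))^2         // (xi - xbar)^2
gen num = dx2 * uhat^2            // (xi - xbar)^2 * uhat_i^2
quietly summarize num
scalar numsum = r(sum)
quietly summarize dx2
scalar sstx2 = r(sum)^2           // SSTx squared
display "robust SE for b1: " sqrt(numsum/sstx2)
reg y x, robust                   // built-in version for comparison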
Variance with Heteroskedasticity: Multiple Regression (White 1980) • In the general model, a valid estimator of the variance of the OLS estimator of bj is Σ r̂ij2ûi2/SSRj2, where r̂ij is the ith residual from regressing xj on all the other explanatory variables and SSRj is the sum of squared residuals from that regression
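A hedged sketch of the same computation in the multiple-regression case (x1, x2, x3 are placeholders); the partialling-out regression produces the r̂i1 in the formula:
* Sketch: White variance for b1 in y = b0 + b1*x1 + b2*x2 + b3*x3 + u
reg y x1 x2 x3
predict uhat, resid           // OLS residuals from the full model
reg x1 x2 x3
predict r1, resid             // rhat_i1: x1 partialled out of the other x's
gen num = (r1 * uhat)^2
quietly summarize num
scalar numsum = r(sum)
gen r1sq = r1^2
quietly summarize r1sq        // r(sum) is SSR1
display "robust SE for b1: " sqrt(numsum / r(sum)^2)
reg y x1 x2 x3, robust        // built-in version for comparison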
Robust Standard Errors • Now that we have a consistent estimate of the variance, the square root can be used as a standard error for inference • Typically call these robust standard errors • Sometimes the estimated variance is corrected for degrees of freedom by multiplying by n/(n – k – 1) • As n → ∞ it’s all the same, though
Robust Standard Errors (cont) • Important to remember that these robust standard errors only have asymptotic justification • with small sample sizes t statistics formed with robust standard errors will not have a distribution close to the t, and inferences will not be correct • In Stata, robust standard errors are easily obtained using the robust option of reg • reg y x1 x2 …xk, robust
Testing for Heteroskedasticity • Why bother? • If heteroskedasticity is present, OLS is no longer BLUE • If heteroskedasticity is present, we will soon see that there is a better estimator when the form of heteroskedasticity is known
Testing for Heteroskedasticity • Start with our linear model under Assumptions 1–4: y = b0 + b1x1 + b2x2 + . . . + bkxk + u • Assumption 5 becomes our null hypothesis: H0: Var(u|x1, x2,…, xk) = σ2, which is equivalent to H0: E(u2|x1, x2,…, xk) = E(u2) = σ2 • If we assume the relationship between u2 and the xj is linear, we can estimate the equation û2 = d0 + d1x1 +…+ dkxk + v and test H0: d1 = d2 = … = dk = 0
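A minimal Stata sketch of this test, assuming placeholder regressors x1, x2, x3; the overall F statistic of the auxiliary regression is the test, and estat hettest automates a closely related LM version:
* Sketch: Breusch-Pagan test for heteroskedasticity
reg y x1 x2 x3
estat hettest, rhs iid        // built-in LM version using the regressors
predict uhat, resid
gen uhat2 = uhat^2
reg uhat2 x1 x2 x3            // overall F stat tests H0: d1 = d2 = d3 = 0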
The White Test • What if the relationship between û2 and the x’s is not linear? • The White test allows for nonlinearities by using squares and cross products of all the x’s • Consider again our linear model: • y = b0 + b1x1 + b2x2 + b3x3 + u • Regress û2 = a0 + a1x1 + a2x2 + a3x3 + a4x12 + a5x22 + a6x32 + a7x1x2 + a8x1x3 + a9x2x3 + error • Test the joint significance of all the estimated coefficients with an F test • This can get messy pretty quickly
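In Stata, the full White test is available as a postestimation command, so the messy auxiliary regression need not be typed out by hand; a minimal sketch:
reg y x1 x2 x3
estat imtest, white           // White's test using squares and cross products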
Alternate form of the White test • Consider that the fitted values from OLS, ŷ, are a function of all the x’s: • ŷ = b0 + b1x1 + b2x2 + b3x3 • ŷ2 = (b0 + b1x1 + b2x2 + b3x3)2 • Thus, ŷ2 will be a function of the squares and cross products, and ŷ and ŷ2 can proxy for all of the xj, xj2, and xjxh • Regress the squared residuals on ŷ and ŷ2 and again test the joint significance of the estimated coefficients • Stata Example 6A-1
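A hedged sketch of this special form (again with placeholder regressors); the overall F statistic of the auxiliary regression is the test:
reg y x1 x2 x3
predict uhat, resid
predict yhat, xb              // OLS fitted values
gen uhat2 = uhat^2
gen yhat2 = yhat^2
reg uhat2 yhat yhat2          // F stat tests joint significance of yhat, yhat^2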
Weighted Least Squares • Now that we have discovered that Assumption 5 has failed, what can we do? • Use robust standard errors • If we know something about the specific form of the heteroskedasticity, we can obtain more efficient estimates than OLS • The basic idea is going to be to transform the model into one that has homoskedastic errors – called weighted least squares (WLS)
Case of form being known up to a multiplicative constant • Assume that Var(ui|xi) = σ2h(xi), where h(x) is some function of the explanatory variables that determines the heteroskedasticity • Since variances must be positive, h(x) > 0 for all x • Assume that the function h(x) is known • We can write σi2 = Var(ui|xi) = σ2h(xi) • Example: savi = b0 + b1inci + ui • Var(ui|inci) = σ2inci • Here h(inci) = inci: the variance of the error is proportional to the level of income • As income increases, the variability in saving also increases (income is nonnegative)
Case of form being known up to a multiplicative constant • We can use the fact that we know h(x) to transform our heteroskedastic equation into a homoskedastic one • Since hi is just a function of xi: • E(ui/√hi | x) = 0 • Var(ui/√hi | x) = E[(ui/√hi)2 | x] = E(ui2 | x)/hi = σ2hi/hi = σ2 • So, if we divide our whole equation by √hi, we have a model where the error is homoskedastic
Case of form being known up to a multiplicative constant • yi/√hi = β0(1/√hi) + β1(x1i/√hi) + β2(x2i/√hi) + β3(x3i/√hi) + (ui/√hi) • yi* = β0x0i* + β1x1i* + β2x2i* + β3x3i* + ui*, where x0i* = 1/√hi • Return to the original model to interpret the coefficients
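For the saving example above, a minimal Stata sketch of this transformation with h(inc) = inc (note the transformed intercept becomes a regressor, so the constant is suppressed):
* Sketch: transformed equation for sav = b0 + b1*inc + u, Var(u|inc) = sigma^2*inc
gen sqrth = sqrt(inc)
gen savstar = sav/sqrth       // yi* = yi/sqrt(hi)
gen constar = 1/sqrth         // x0i* = 1/sqrt(hi), carries b0
gen incstar = inc/sqrth       // x1i* = inci/sqrt(hi) = sqrt(inci)
reg savstar constar incstar, noconstant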
Generalized Least Squares • Estimating the transformed equation by OLS is an example of generalized least squares (GLS) • GLS will be BLUE in this case • GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of Var(ui|xi)
Weighted Least Squares • While it is intuitive to see why performing OLS on a transformed equation is appropriate, it can be tedious to do the transformation • Weighted least squares is a way of getting the same thing, without the transformation • Idea is to minimize the weighted sum of squares (weighted by 1/hi) • Stata Example 6A-2
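In Stata, the same fit is available through analytic weights, which weight each squared residual by 1/hi; the point estimates match the hand-transformed regression above. A minimal sketch for the saving example:
reg sav inc [aw = 1/inc]      // WLS with weights 1/h_i = 1/inc_i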
More on WLS • WLS is great if we know what Var(ui|xi) looks like • In most cases, won’t know form of heteroskedasticity • There is one case where the weights needed arise naturally from an underlying economic model: data is aggregated, but model is individual level • For example, when you use per capita data at the city, county, state or country level • If the individual level equation satisfies the GM Assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population. • Thus, WLS with weights equal to the population is appropriate
Example • Suppose we have city-level data on per capita beer consumption, the percentage of people in the population over 21 years old, average adult education levels, average income levels, and the city price of beer. The city-level model is beerpc = b0 + b1perc21 + b2avgeduc + b3incpc + b4price + ui, which can be estimated by WLS using city population as weights
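A hedged Stata sketch, assuming a variable pop holding each city’s population (pop is not named in the slide):
reg beerpc perc21 avgeduc incpc price [aw = pop]   // weights = city population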
WLS Wrapup • Important: When doing F tests with WLS, form the weights from the unrestricted model and use those weights to do WLS on the restricted model as well as the unrestricted model • Remember we are using WLS just for efficiency – Coefficients estimated by OLS are still unbiased & consistent • Estimates will still be different due to sampling error, but if they are very different then it’s likely that some other Gauss-Markov assumption is false
Linear Probability Model and Heteroskedasticity • Suppose our dependent variable takes on one of two values, 0 or 1 • For example: • y = 1 if a person has a high school education, y = 0 if the person does not • y = 1 if a student uses drugs, y = 0 if the student does not • We can write down our usual equation y = b0 + b1x1 + b2x2 + … + bkxk + u
Linear Probability Model and Heteroskedasticity • Because y can only take on two values, it doesn’t make sense to interpret βj as the change in y given a one-unit increase in xj, holding other factors fixed • Still, the βs have an interesting interpretation • When y is binary, taking on the values 0 and 1, it is true that • P(y = 1|x) = E(y|x) • the probability of “success” (i.e., y = 1) is the same as the expected value of y, so P(y = 1|x) = b0 + b1x1 + b2x2 + … + bkxk
Linear Probability Model and Heteroskedasticity • We call this model the linear probability model (LPM) because the probability that y = 1 is linear in the parameters, β • βj measures the change in the probability of success when xj changes, ceteris paribus: ΔP(y = 1|x) = βjΔxj • For example: βj = 0.038 means that as xj increases by one unit, the probability that y = 1 increases by 0.038, i.e., by 3.8 percentage points
Linear Probability Model and Heteroskedasticity • Benefit of the LPM: estimated coefficients are easy to interpret • Disadvantage: we can obtain predicted probabilities outside [0,1]. What does it mean to say that the probability that someone smokes is -0.26? • Disadvantage: we can obtain changes in the probability that y = 1 (for a one-unit change in x) greater than 1 or less than zero • Still, the LPM is often a good place to start with a binary dependent variable
Linear Probability Model and Heteroskedasticity • Due to the binary nature of y, Assumption 5 (homoskedasticity) fails: Var(y|x) = p(x)[1 − p(x)], where p(x) = E(y|x) = b0 + b1x1 + b2x2 + … + bkxk **Recall the mean and variance of a Bernoulli distribution** • Because the variance depends on x, the LPM necessarily suffers from heteroskedasticity
Linear Probability Model and Heteroskedasticity • Recall that we have unbiased estimators of the βs • For observation i, Var(yi|x) is estimated by ĥi = ŷi(1 − ŷi) • Where ŷi is the OLS fitted value for observation i • Thus, we can estimate the model using WLS with weights 1/ĥi
Linear Probability Model and Heteroskedasticity • There is one major problem with this: nothing guarantees that the fitted values ŷi lie strictly between 0 and 1, and if ŷi ≤ 0 or ŷi ≥ 1 the estimated variance ĥi = ŷi(1 − ŷi) is not positive, so the weight 1/ĥi cannot be used • Solutions: adjust the offending fitted values to values just inside (0,1) before forming the weights, or simply use OLS with heteroskedasticity-robust standard errors
LPM and WLS • 1. Estimate the model by OLS and obtain the fitted values, ŷi • 2. Determine whether all the fitted values are inside the interval [0,1]; if not, make an adjustment to those that are not • 3. Construct the estimated variances ĥi = ŷi(1 − ŷi) • 4. Estimate the equation by WLS using weights 1/ĥi
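A minimal Stata sketch of these four steps (the regressors and the 0.01/0.99 trimming values are illustrative assumptions):
reg y x1 x2 x3                        // 1. OLS
predict phat                          //    fitted values yhat_i
replace phat = 0.01 if phat < 0.01    // 2. adjust values outside (0,1)
replace phat = 0.99 if phat > 0.99
gen h = phat*(1 - phat)               // 3. estimated variances h_i
reg y x1 x2 x3 [aw = 1/h]             // 4. WLS with weights 1/h_i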