Lecture 10: Heteroskedasticity

Econ 488

Order of Testing
• Omitted variables and incorrect functional form (adjusted $R^2$)
  • Either A or B, but not both
• Serial correlation (Durbin-Watson)
• Heteroskedasticity (Park’s test, White’s test)
• Multicollinearity (correlation matrix, VIF)
• Irrelevant variables (t-test)

Homoskedasticity
• Ideal case: homoskedasticity
  • Error variance $\sigma^2$ is constant across the sample
  • $\sigma^2$ measures the dispersion of the dependent variable around the regression line
• Homoskedasticity means that the spread of the dependent variable around its average relationship with the independent variable is the same throughout the sample

[Figure: Homoskedasticity]

Heteroskedasticity
• Heteroskedasticity (or heteroscedasticity) occurs when $\sigma^2$ is not constant across the sample
• The dispersion of the dependent variable around the regression line is not constant

[Figure: Heteroskedasticity]

Why do we care?
• If we don’t fix heteroskedasticity:
  • Coefficient estimates are not efficient (not minimum variance)
  • Estimated standard errors are biased and inconsistent, so t-statistics are not right!

When can it occur?
• Whenever the dispersion around the regression line differs within the sample
  • This means the relationship between the dependent and independent variables is tighter in some parts of the sample than in others
• Example: MLB payroll and market size

2008 MLB Payrolls
            Large markets                   Small markets
            (population > 5,000,000)        (population < 5,000,000)
Mean        $104,000,000                    $78,800,000
Std dev     $44,600,000                     $28,300,000
Min         $21,800,000 (Florida Marlins)   $43,800,000 (Tampa Bay Rays)
Max         $209,000,000 (NY Yankees)       $139,000,000 (Detroit Tigers)
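A quick arithmetic check of how different the two dispersions are, using only the two standard deviations above. This is a descriptive comparison of payroll spread, not a formal test on regression residuals.

```python
# 2008 payroll standard deviations from the slide, in dollars.
large_sd = 44_600_000  # large markets (population > 5,000,000)
small_sd = 28_300_000  # small markets (population < 5,000,000)

# Ratio of the two variances (squared standard deviations).
variance_ratio = (large_sd / small_sd) ** 2
print(f"Large-market payroll variance is {variance_ratio:.2f} times the small-market variance")
```

The ratio comes out at roughly 2.5, so large-market payrolls are far more dispersed, which is the pattern heteroskedasticity describes.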

Heteroskedasticity
• Note: the same principle applies when observations are groups that differ in size, e.g.:
  • States (population)
  • Countries (population)
  • Colleges (enrollment)
  • Companies (sales)
  • Etc.

Another Example
• Household income and consumption
• Low-income households:
  • Little flexibility in spending
  • Most income is spent on necessities: food, shelter, clothing, transportation, utilities
  • Little dispersion of consumption around mean consumption
  • Small $\sigma^2$

Household Income vs. Consumption
• High-income households:
  • More flexibility in spending
  • Once necessities are purchased, much remains to be spent in different ways
    • Big spenders
    • Savers and investors
  • Large dispersion of consumption around the mean

Pure vs. Impure Heteroskedasticity
• Impure – occurs when the regression is not correctly specified
  • E.g., omitted variables can cause heteroskedasticity
• Pure – occurs due to the nature of the data

Consequences
• If we ignore heteroskedasticity, coefficient estimates are:
  • Unbiased – OK!
  • Consistent – OK!
  • Inefficient – Not OK.
• t-tests are inaccurate, because the estimated standard errors are biased.

Detection
• Tests detect heteroskedasticity
  • But they won’t distinguish between the pure and impure types
• If a test uncovers heteroskedasticity, STOP!
  • Try to decide whether you have an omitted variable
  • If you do, include it in your model and then retest for heteroskedasticity

Detection
• OR, if you don’t have an omitted variable:
  • Employ one of the remedies we’ll discuss
• After you “fix” the problem, test again
  • If you still have heteroskedasticity, it might be the impure type

Detection
• Plots
  • Estimate the model and save the residuals
  • Plot the residuals against each independent variable separately
• Example: data3-6.gdt
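A sketch of the plotting step in Python, assuming NumPy and, for the plot itself, matplotlib. It is not the gretl workflow used with data3-6.gdt: the data below are simulated, and `ols_residuals` and `plot_residuals` are hypothetical helper names.

```python
import numpy as np


def ols_residuals(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS of y on a constant plus the columns of X; returns the residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def plot_residuals(X: np.ndarray, resid: np.ndarray, names: list) -> None:
    """One scatter panel of residuals per independent variable."""
    import matplotlib.pyplot as plt  # only needed for the plotting step

    k = X.shape[1]
    fig, axes = plt.subplots(1, k, figsize=(4 * k, 3), squeeze=False)
    for j, ax in enumerate(axes[0]):
        ax.scatter(X[:, j], resid, s=10)
        ax.axhline(0.0, color="black", linewidth=0.8)  # zero reference line
        ax.set_xlabel(names[j])
        ax.set_ylabel("residual")
    fig.tight_layout()
    plt.show()


# Hypothetical simulated data (not data3-6.gdt): error spread grows with x1 only.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.5 * x1)
X = np.column_stack([x1, x2])
resid = ols_residuals(X, y)
```

With matplotlib installed, `plot_residuals(X, resid, ["x1", "x2"])` draws both panels; the x1 panel should widen as x1 grows, while the x2 panel should look like an even band.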

[Figure: Residual plots]

[Figure: Residual plots – V on its side]

[Figure: Residual plots – increasing or decreasing]

[Figure: Residual plots – rainbow or inverted rainbow]

Park Test
• If there is heteroskedasticity, then
  $$\operatorname{Var}(\varepsilon_i) = \sigma^2 Z_i^2$$
  where
  • $\varepsilon_i$ = error term
  • $\sigma^2$ = variance of the homoskedastic error term
  • $Z_i$ = proportionality factor
• If you know something about $Z$, you can use the Park test
  • Find a variable that is related to the heteroskedasticity (e.g., population)
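Taking logs of the proportionality assumption shows why the Park test uses a log-log auxiliary regression:

$$\ln \operatorname{Var}(\varepsilon_i) = \ln \sigma^2 + 2 \ln Z_i$$

Under homoskedasticity the variance does not depend on $Z_i$, so the slope on $\ln Z_i$ would be zero; under this particular model it is 2. In the test, the squared residual $e_i^2$ stands in for the unknown variance, so the slope is estimated with noise and we only ask whether it is significantly different from zero.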

Park Test
• Run the regression and obtain the residuals
• Run the following regression:
  $$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$$
  where
  • $e_i$ = residuals from the regression
  • $Z_i$ = best choice of proportionality factor in the data
  • $u_i$ = classical error term
• Test the significance of $\ln(Z_i)$
  • If it is significant, there is evidence of heteroskedasticity
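A minimal NumPy sketch of the two Park-test steps, assuming a single regressor `x1` that is also the chosen $Z$. The helper name `park_test` and the simulated data (built so that $\operatorname{Var}(\varepsilon_i) = \sigma^2 x_{1i}^2$) are hypothetical. Compare the returned t-statistic with the usual critical value, about 1.96 for a two-sided 5% test in large samples.

```python
import numpy as np


def park_test(X: np.ndarray, y: np.ndarray, z: np.ndarray) -> tuple:
    """Regress ln(e_i^2) on a constant and ln(Z_i); return (alpha_1, its t-stat)."""
    n = len(y)
    # Step 1: original regression by OLS, keep the residuals e_i.
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ beta

    # Step 2: auxiliary regression ln(e_i^2) = a0 + a1 ln(Z_i) + u_i.
    lhs = np.log(e ** 2)
    aux = np.column_stack([np.ones(n), np.log(z)])
    alpha, *_ = np.linalg.lstsq(aux, lhs, rcond=None)
    u = lhs - aux @ alpha

    # Conventional OLS standard error of a1 (u_i is assumed classical).
    s2 = u @ u / (n - 2)
    cov = s2 * np.linalg.inv(aux.T @ aux)
    t_stat = alpha[1] / np.sqrt(cov[1, 1])
    return float(alpha[1]), float(t_stat)


# Hypothetical data with Var(eps_i) proportional to x1^2, so Z = x1.
rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(1, 10, n)
y = 3.0 + 1.5 * x1 + x1 * rng.normal(0, 1.0, n)
slope, t_stat = park_test(x1, y, z=x1)
print(f"alpha_1 = {slope:.2f}, t = {t_stat:.2f}")
```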

Park Test
• Problem: we don’t always have a good $Z$
• So we can use White’s test

White’s Test
• $H_0$: no heteroskedasticity
• $H_A$: heteroskedasticity

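White’s Test
• Estimate the equation
  $$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
• Save the residual $e_i$ and square it
• Regress the squared residual on a constant, $X_1$, $X_2$, $X_1^2$, $X_2^2$, $X_1 X_2$ (all combinations of the X’s)
  $$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$$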

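White’s Test
• Compute $N R^2$
  • $N$ = sample size
  • $R^2$ = unadjusted $R^2$ of the auxiliary regression
• Reject the null if $N R^2$ exceeds the $\chi^2$ (chi-square) critical value with 5 degrees of freedom
  • 5 degrees of freedom because there are 5 independent variables in the auxiliary regression (step 3)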
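A minimal NumPy sketch of White’s test as described above, assuming the regressors are the columns of an array `X` with no constant column. The function name `white_test`, its `cross_terms` switch (which drops the cross products) and the simulated data are hypothetical. The 5% chi-square critical values are hard-coded to avoid a SciPy dependency.

```python
import itertools

import numpy as np

# Standard 5% chi-square critical values for a few degrees of freedom.
CHI2_CRIT_5PCT = {4: 9.488, 5: 11.070, 9: 16.919}


def white_test(X: np.ndarray, y: np.ndarray, cross_terms: bool = True) -> tuple:
    """Return (N * R^2, degrees of freedom) for White's test."""
    n, k = X.shape
    # Estimate the original equation and square the residuals.
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    e2 = (y - design @ beta) ** 2

    # Auxiliary regressors: levels, squares, and (optionally) cross products.
    terms = [X[:, j] for j in range(k)] + [X[:, j] ** 2 for j in range(k)]
    if cross_terms:
        terms += [X[:, a] * X[:, b] for a, b in itertools.combinations(range(k), 2)]
    aux = np.column_stack([np.ones(n)] + terms)
    gamma, *_ = np.linalg.lstsq(aux, e2, rcond=None)
    resid_aux = e2 - aux @ gamma

    # Unadjusted R^2 of the auxiliary regression, then N * R^2.
    r2 = 1.0 - (resid_aux @ resid_aux) / np.sum((e2 - e2.mean()) ** 2)
    return float(n * r2), len(terms)


# Hypothetical data: error spread grows with the first regressor.
rng = np.random.default_rng(2)
n = 400
X = rng.uniform(1, 10, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5 * X[:, 0])
stat, df = white_test(X, y)
print(f"N*R^2 = {stat:.1f} on {df} df; reject H0 at 5%: {stat > CHI2_CRIT_5PCT[df]}")
```

For real work, statsmodels provides `het_white` in `statsmodels.stats.diagnostic`; the sketch above only spells out the arithmetic.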

White’s Test
• If you have 3 independent variables, the auxiliary regression will have 9 independent variables:
  • $X_1, X_2, X_3, X_1^2, X_2^2, X_3^2, X_1X_2, X_2X_3, X_1X_3$
• If you have 6 independent variables, the auxiliary regression will have 27 independent variables!
  • This can get out of hand quickly.

White’s Test Version 2
• Same as before, except that the auxiliary regression uses only the $X$ and $X^2$ terms (no cross products)
• Use it when you have a lot of independent variables
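The regressor counts on these White’s test slides follow a simple pattern. With $k$ original regressors, the full auxiliary regression has $k$ levels, $k$ squares and $k(k-1)/2$ cross products, i.e. $k(k+3)/2$ terms (5, 9 and 27 for $k$ = 2, 3, 6), while version 2 has only $2k$. A small sketch that lists the terms; `white_aux_terms` is a hypothetical helper name.

```python
from itertools import combinations


def white_aux_terms(names: list, cross_terms: bool = True) -> list:
    """Auxiliary-regression regressors (excluding the constant) for White's test."""
    terms = list(names) + [f"{x}^2" for x in names]
    if cross_terms:
        terms += [f"{a}*{b}" for a, b in combinations(names, 2)]
    return terms


for k in (2, 3, 6):
    names = [f"X{j}" for j in range(1, k + 1)]
    full = len(white_aux_terms(names))
    v2 = len(white_aux_terms(names, cross_terms=False))
    print(f"k = {k}: full White test {full} regressors, version 2 {v2}")
```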

Remedies for Heteroskedasticity
• Heteroskedasticity-corrected standard errors
  • Makes the standard errors consistent, so when $N$ is large the standard errors (and t-statistics) are approximately correct
  • The coefficient estimates themselves do not change
  • In gretl, just check the “robust standard errors” box when running a regression
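A sketch of what a robust standard error computes, assuming the basic White (HC0) sandwich estimator. gretl lets you choose among several heteroskedasticity-consistent variants, and this sketch does not claim to match its default. The helper name and the simulated data are hypothetical.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS coefficients with conventional and HC0 (White) robust standard errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    e = y - design @ beta

    # Conventional formula s^2 (X'X)^{-1}: assumes a constant error variance.
    s2 = e @ e / (n - k)
    se_conv = np.sqrt(np.diag(s2 * xtx_inv))

    # HC0 sandwich (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}: each squared
    # residual stands in for that observation's own error variance.
    meat = design.T @ (design * (e ** 2)[:, None])
    se_robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_conv, se_robust


# Hypothetical heteroskedastic data: the error's standard deviation grows with x.
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3 * x ** 2)
beta, se_conv, se_robust = ols_with_robust_se(x, y)
print("coefficients:     ", beta)
print("conventional s.e.:", se_conv)
print("robust (HC0) s.e.:", se_robust)
```

Only the standard errors differ between the two columns; `beta` is the ordinary OLS estimate either way.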

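Remedies for Heteroskedasticity
• Weighted Least Squares (WLS)
  $$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \tag{1}$$
  $$\operatorname{Var}(\varepsilon_i) = \sigma^2 Z_i^2 \tag{2}$$
• Eqn. (1) is equivalent to
  $$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + Z_i u_i \tag{3}$$
  where $u_i$ is a homoskedastic error with variance $\sigma^2$
• So we can divide through by $Z_i$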
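Why dividing by $Z_i$ helps: treating $Z_i$ as known, the error of the divided equation has constant variance, so the classical assumption is restored.

$$\operatorname{Var}\!\left(\frac{\varepsilon_i}{Z_i}\right) = \frac{\operatorname{Var}(\varepsilon_i)}{Z_i^2} = \frac{\sigma^2 Z_i^2}{Z_i^2} = \sigma^2 = \operatorname{Var}(u_i)$$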

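Remedies for Heteroskedasticity
• Step one: divide equation (3) through by $Z_i$:
  $$\frac{Y_i}{Z_i} = \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$$
• Step two: estimate the transformed equation by OLS
• Caution about step two: there are two cases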

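Remedies for Heteroskedasticity
• Case 1: Z is not in the original equation
  • Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  • New: $\frac{Y_i}{Z_i} = \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$
  • What’s missing? The constant! ($\beta_0$ now multiplies $1/Z_i$, so the equation has no intercept.)
  • Solution: add a constant
  • Better: $\frac{Y_i}{Z_i} = \alpha_0 + \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$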
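A NumPy sketch of Case 1, assuming $Z$ is a separate variable (a hypothetical stand-in for something like population) and including the added constant $\alpha_0$ the slide recommends. The helper `wls_case1` and the simulated data are illustrative, not taken from the lecture’s datasets.

```python
import numpy as np


def wls_case1(y, X, z, add_constant=True):
    """Divide the equation through by z and estimate by OLS.

    Regressors of the transformed equation are 1/z and X_j/z; with
    add_constant=True a separate intercept alpha_0 comes first.
    Returns [alpha_0 (if added), beta_0, beta_1, ...].
    """
    n = len(y)
    cols = [1.0 / z] + [X[:, j] / z for j in range(X.shape[1])]
    if add_constant:
        cols = [np.ones(n)] + cols
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y / z, rcond=None)
    return coef


# Hypothetical data satisfying Var(eps_i) = sigma^2 * z_i^2.
rng = np.random.default_rng(4)
n = 1000
z = rng.uniform(1, 5, n)
X = rng.uniform(0, 10, size=(n, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + z * rng.normal(0, 1.0, n)
alpha0, b0, b1, b2 = wls_case1(y, X, z)
print(f"alpha_0={alpha0:.2f} beta_0={b0:.2f} beta_1={b1:.2f} beta_2={b2:.2f}")
```

Because this simulated model has no true $\alpha_0$ term, its estimate should land near zero; the slopes on $X_1/Z$ and $X_2/Z$ estimate the original $\beta_1$ and $\beta_2$.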

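Remedies for Heteroskedasticity
• Case 2: Z is in the original equation
  • Suppose $X_1$ is $Z$
  • Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  • New: $\frac{Y_i}{X_{1i}} = \frac{\beta_0}{X_{1i}} + \beta_1 + \beta_2 \frac{X_{2i}}{X_{1i}} + u_i$
  • What’s different about this equation?
    • One of the slope coefficients in the original equation ($\beta_1$) becomes an intercept!
    • This happens because $X_{1i}/X_{1i} = 1$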

Remedies for Heteroskedasticity
• That is:
  • The intercept in the new equation is the same as the slope $\beta_1$ in the original equation
  • What should you look at in the new equation to find the effect of $X_1$? The constant.
  • Conversely, the coefficient on $1/X_{1i}$ estimates the original intercept $\beta_0$
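A NumPy sketch of Case 2 on hypothetical simulated data where $Z = X_1$. After dividing by $X_1$, the column of ones carries $\beta_1$, the coefficient on $1/X_1$ is $\beta_0$, and the coefficient on $X_2/X_1$ is $\beta_2$, so no extra constant is needed.

```python
import numpy as np

# Hypothetical data where Var(eps_i) = sigma^2 * X1i^2, i.e. Z = X1.
rng = np.random.default_rng(5)
n = 1000
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(0, 10, n)
y = 4.0 + 2.5 * x1 - 1.0 * x2 + x1 * rng.normal(0, 1.0, n)

# Transformed equation: Y/X1 = beta_1 * 1 + beta_0 * (1/X1) + beta_2 * (X2/X1) + u.
design = np.column_stack([np.ones(n), 1.0 / x1, x2 / x1])
coef, *_ = np.linalg.lstsq(design, y / x1, rcond=None)
slope_x1, intercept, slope_x2 = coef

print(f"beta_1 (new constant)  = {slope_x1:.2f}")
print(f"beta_0 (coef on 1/X1)  = {intercept:.2f}")
print(f"beta_2 (coef on X2/X1) = {slope_x2:.2f}")
```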

Remedies for Heteroskedasticity
• Example: saving.gdt (weight by income)
