Chapter 10

Endogenous Regressors and Instrumental Variables Estimation Chapter 10 Adapted from Vera Tabakova, East Carolina University

Chapter 10: Random Regressors and Moment Based Estimation • 10.1 Linear Regression with Random x’s • 10.2 Cases in which x and e are Correlated • 10.3 Estimators Based on the Method of Moments • 10.4 Specification Tests • Show Kennedy;s graphs and explain “regression” Principles of Econometrics, 3rd Edition

Chapter 10: Random Regressors and Moment Based Estimation The assumptions of the simple linear regression are: • SR1. • SR2. • SR3. • SR4. • SR5. The variable xi is not random, and it must take at least two different values. • SR6. (optional) Principles of Econometrics, 3rd Edition

Chapter 10: Random Regressors and Moment Based Estimation The purpose of this chapter is to discuss regression models in which xiis random and correlated with the error term ei. We will: • Discuss the conditions under which having a random x is not a problem, and how to test whether our data satisfies these conditions. • Present cases in which the randomness of x causes the least squares estimator to fail. • Provide estimators that have good properties even when xi is random and correlated with the error ei. Principles of Econometrics, 3rd Edition

10.1 Linear Regression With Random X’s • A10.1 correctly describes the relationship between yi and xiin the population, where β1 and β2 are unknown (fixed) parameters and ei is an unobservable random error term. • A10.2 The data pairs , are obtained by random sampling. That is, the data pairs are collected from the same population, by a process in which each pair is independent of every other pair. Such data are said to be independent and identically distributed. Principles of Econometrics, 3rd Edition

10.1 Linear Regression With Random X’s • A10.3 The expected value of the error term ei, conditional on the value of xi, is zero. This assumption implies that we have (i) omitted no important variables, (ii) used the correct functional form, and (iii) there exist no factors that cause the error term ei to be correlated with xi. • If , then we can show that it is also true that xi and ei are uncorrelated, and that . • Conversely, if xi and ei are correlated, then and we can show that . Principles of Econometrics, 3rd Edition

10.1 Linear Regression With Random X’s • A10.4 In the sample, xi must take at least two different values. • A10.5 The variance of the error term, conditional on xi is a constant σ2. • A10.6 The distribution of the error term, conditional on xi, is normal. Principles of Econometrics, 3rd Edition

10.1.1 The Small Sample Properties of the OLS Estimator • Under assumptions A10.1-A10.4 the least squares estimator is unbiased. • Under assumptions A10.1-A10.5 the least squares estimator is the best linear unbiased estimator of the regression parameters, conditional on the x’s, and the usual estimator of σ2 is unbiased. Principles of Econometrics, 3rd Edition

10.1.1 The Small Sample Properties of the OLS Estimator • Under assumptions A10.1-A10.6 the distributions of the least squares estimators, conditional upon the x’s, are normal, and their variances are estimated in the usual way. Consequently the usual interval estimation and hypothesis testing procedures are valid. Principles of Econometrics, 3rd Edition

10.1.2 Asymptotic Properties of the OLS Estimator: X Not Random Figure 10.1 An illustration of consistency Principles of Econometrics, 3rd Edition

10.1.2 Asymptotic Properties of the OLS Estimator: X Not Random Principles of Econometrics, 3rd Edition

10.1.3 Asymptotic Properties of the OLS Estimator: X Random • A10.3* Principles of Econometrics, 3rd Edition

10.1.3 Asymptotic Properties of the OLS Estimator: X Random • Under assumption A10.3* the least squares estimators are consistent. That is, they converge to the true parameter values as N. • Under assumptions A10.1, A10.2, A10.3*, A10.4 and A10.5, the least squares estimators have approximate normal distributions in large samples, whether the errors are normally distributed or not. Furthermore our usual interval estimators and test statistics are valid, if the sample is large. Principles of Econometrics, 3rd Edition

10.1.3 Asymptotic Properties of the OLS Estimator: X Random Principles of Econometrics, 3rd Edition

10.1.4 Why OLS Fails Figure 10.2 Plot of correlated x and e Principles of Econometrics, 3rd Edition

10.1.4 Why OLS Fails True model, but it gets estimated as: …so there is s substantial bias … Principles of Econometrics, 3rd Edition

10.1.4 Why OLS Fails Example Wages = f(intelligence) Figure 10.3 Plot of data, true and fitted regressions In the case of a positive correlation between x and the error Principles of Econometrics, 3rd Edition

10.2 If X and e Are Correlated When an explanatory variable and the error term are correlated the explanatory variable is said to be endogenous and means “determined within the system.” Principles of Econometrics, 3rd Edition

e X Y u v 10.2.1 Measurement Error X Y Principles of Econometrics, 3rd Edition

10.2.1 Measurement Error Principles of Econometrics, 3rd Edition

10.2.2 Omitted Variables We will focus on this case Omitted factors: experience, ability and motivation. Therefore, we expect that Y X W Principles of Econometrics, 3rd Edition

10.2.2 Omitted Variables We will focus on this case Other examples: • Y = crime, X = marriage, W = “marriageability” • Y = divorce, X = “shacking up”, W = “good match” • Y = crime, X = watching a lot of TV, W = “parental involvement” • Y = sexual violence, X = watching porn, W = any unobserved anything that would affect both W and Y • The list is endless! . Y X W Principles of Econometrics, 3rd Edition

X Y 10.2.3 Simultaneous Equations Bias There is a feedback relationship between Pi and Qi. Because of this feedback, which results because price and quantity are jointly, or simultaneously, determined, we can show that The resulting bias (and inconsistency) is called the simultaneous equations bias. Principles of Econometrics, 3rd Edition

10.2.4 Lagged Dependent Variable Models with Serial Correlation In this case the least squares estimator applied to the lagged dependent variable model will be biased and inconsistent. Principles of Econometrics, 3rd Edition

10.3 Estimators Based on the Method of Moments When all the usual assumptions of the linear model hold, the method of moments leads us to the least squares estimator. If x is random and correlated with the error term, the method of moments leads us to an alternative, called instrumental variables estimation, or two-stage least squares estimation, that will work in large samples. Principles of Econometrics, 3rd Edition

10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model Suppose that there is another variable, z, such that • z does not have a direct effect on y, and thus it does not belong on the right-hand side of the model as an explanatory variable ONCE X IS IN THE MODEL! • zi should not be simultaneously affected by y either, of course • zi is not correlated with the regression error term ei. Variables with this property are said to be exogenous • z is strongly [or at least not weakly] correlated with x, the endogenous explanatory variable • A variable z with these properties is called an instrumental variable. Principles of Econometrics, 3rd Edition

10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model • An instrument is a variable z correlated with x but not with the error e • In addition, the instrument does not directlyaffect y and thus does not belong in the actual model as a separate regressor (of course it should affect it through the instrumented regressor x) • It is common to have more than one instrument for x (just not good ones!) • These instruments, z1; z2; : : : ; zs, must be correlated with x, but not with e • Consistent estimation is obtained through the instrumental variables or two-stage least squares(2SLS) estimator, rather than the usual OLS estimator Principles of Econometrics, 3rd Edition

Z e X Y W Instrumental Variables • Using Z, an “instrumental variable” for X is one solution to the problem of omitted variables bias • Z, to be a valid instrument for X must be: • Relevant = Correlated with X • Exogenous = Not correlated with Y except through its correlation with X

10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model Based on the method of moments… Principles of Econometrics, 3rd Edition

10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model Solving the previous system, we obtain a new estimator that cleanses the endogeneity of X and exploits only the component of the variation of X that is not correlated with e instead: We do that by ensuring that we only use the predicted value of X from its regression on Z in the main regression Principles of Econometrics, 3rd Edition

10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model These new estimators have the following properties: • They are consistent, if • In large samples the instrumental variable estimators have approximate normal distributions. In the simple regression model Principles of Econometrics, 3rd Edition

10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model • The error variance is estimated using the estimator For: The stronger the correlation between the instrument and X the better! Principles of Econometrics, 3rd Edition

10.3.3a The importance of using strong instruments Using the instrumental variables is less efficient than OLS (because you must throw away some information on the variation of X) so it leads to wider confidence intervals and less precise inference. The bottom line is that when instruments are weak, instrumental variables estimation is not reliable: you throw away too much information when trying to avoid the endogeneity bias Principles of Econometrics, 3rd Edition

X Y Z Instrumental Variables using Venn diagrams • Not all of the available variation in X is used • Only that component of X that is “explained” by Z is used to explain Y X = Endogenous variable Y = Response variable Z = Instrumental variable

X Y Z X Y Z Weak versus Strong IVs Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for Realistic scenario: Very little of X is explained by Z, and/or what is explained does not overlap much with Y

Instrumental Variables Models • The IV estimator is BIASED • E(bIV) ≠β (finite-sample bias) but consistent: E(b) → β as N → ∞ So IV studies must often have very large samples • But with endogeneity, E(bLS) ≠β and plim(bLS) ≠β anyway… • Asymptotic behavior of IV: plim(bIV) =β + Cov(Z,e) / Cov(Z,X) • If Z is truly exogenous, then Cov(Z,e) = 0

ω ε α1 β1 Z X Y ξ δ1 Z Y Some IV Terminology • Three different models to be familiar with • First stage: EDUC = α0 + α1Z + ω(‘reduced form equation’) • Structural model: WAGES = β0 + β1EDUC + ε • Also: WAGES = δ0 + δ1Z + ξ • An interesting equality: δ1 = α1 × β1 so… β1 = δ1 / α1

10.3.4a An Illustration Using Simulated Data Principles of Econometrics, 3rd Edition

10.3.3b An Illustration Using Simulated Data Principles of Econometrics, 3rd Edition

10.3.3c An Illustration Using a Wage Equation Principles of Econometrics, 3rd Edition

10.3.3c An Illustration Using a Wage Equation We hope for a high t ratio here Check it out: as expected much lower than from OLS!!! Principles of Econometrics, 3rd Edition

open "@gretldir\data\poe\mroz.gdt" logs wage square exper list x = consteducexpersq_exper list z = constexpersq_expermothereduc # least squares and IV estimation of wage eq olsl_wage x tslsl_wage x ; z # tsls--manually smpl wage>0 --restrict olseduc z series educ_hat = $yhat olsl_wageconsteduc_hatexpersq_exper But check that the latter will give you wrong s.e. so not reccommended Include in z all G exogenous variables and the instruments available Exogenous variables Instrument themselves! Principles of Econometrics, 3rd Edition

In econometrics, two-stage least squares (TSLS or 2SLS) and instrumental variables (IV) estimation are often used interchangeably The `two-stage' terminology comes from the time when the easiest way to estimate the model was to actually use two separate least squares regressions With better software, the computation is done in a single step to ensure the other model statistics are computed correctly Unfortunately GRETL outputs R2 for the IV regression; you should ignore it! It is based on the correlation of y and yhat, so it does not have the usual interpretation… Principles of Econometrics, 3rd Edition

10.3.4 Instrumental Variables Estimation With Surplus Instruments A 2-step process. • Regress x on a constant term, z and all other exogenous variables G, and obtain the predicted values • Use as an instrumental variable for x. Principles of Econometrics, 3rd Edition

10.3.4 Instrumental Variables Estimation With Surplus Instruments Two-stage least squares (2SLS) estimator: • Stage 1 is the regression of x on a constant term, z and all other exogenous variables G, to obtain the predicted values . This first stage is called the reduced formmodel estimation. • Stage 2 is ordinary least squares estimation of the simple linear regression Principles of Econometrics, 3rd Edition

10.3.4 Instrumental Variables Estimation With Surplus Instruments Principles of Econometrics, 3rd Edition

10.3.4b An Illustration Using a Wage Equation If we regress education on all the exogenous variables and the TWO instruments: Rule of thumb for strong instruments: F>10 These look promising  In fact: F test says Null hypothesis: the regression parameters are zero for the variables mothereduc, fathereduc Test statistic: F(2, 423) = 55.4003, p-value 4.26891e-022 Principles of Econometrics, 3rd Edition

10.3.4b An Illustration Using a Wage Equation With the additional instrument, we achieve a significant result in this case Principles of Econometrics, 3rd Edition

10.3.5 Instrumental Variables Estimation in a General Model Is a bit more complex, but the idea is that you need at least as many Instruments as you have endogenous variables You cannot use the F test we just saw to test for the weakness of the instruments, but see Appendix for a test based on the notion of partial correlations For example, run this experiment with the mroz.gdt data: # partial correlations--the FWL result olseducconstexpersq_exper series e1 = $uhat olsmothereducconstexpersq_exper series e2 = $uhat ols e1 e2 corr e1 e2 Principles of Econometrics, 3rd Edition

Chapter 10

Chapter 10

Presentation Transcript

Chapter 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

CHAPTER 10

CHAPTER 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10

10~Chapter 10

CHAPTER 10

Chapter 10

Chapter 10

Chapter 10

Chapter 10