Estimation taking account of sample selection with Stata
Cheti Nicoletti, ISER, University of Essex, 2009
Estimation commands: truncreg, tobit, heckman, heckprobit, treatreg, ivreg
Other useful commands: ivprobit, ivtobit
Useful option in the estimation commands: pweights
truncreg
• The truncreg command estimates regression models on a truncated sample, that is, a sample from which observations are excluded according to the value of the dependent variable.
• Ex: Health insurance claims that are observed only when the amount claimed is higher than a fixed threshold c.
    truncreg y x1 x2 … xk, ll(c)
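To make the role of the threshold concrete, here is a minimal sketch in Python (not Stata) of the log-likelihood contribution of a single observation in a sample truncated from below at c. It assumes normal errors with standard deviation sigma. The function names and the argument xb (the linear index for that observation) are illustrative assumptions, not Stata internals.

```python
import math

def normal_cdf(v):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def normal_logpdf(v):
    # Log of the standard normal density.
    return -0.5 * v * v - 0.5 * math.log(2.0 * math.pi)

def truncated_loglik(y, xb, sigma, c):
    """Log-likelihood of one observation when only y > c enters the sample."""
    if y <= c:
        raise ValueError("observation lies outside the truncated sample")
    log_density = normal_logpdf((y - xb) / sigma) - math.log(sigma)
    # Probability that an observation with index xb is sampled at all.
    prob_in_sample = 1.0 - normal_cdf((c - xb) / sigma)
    return log_density - math.log(prob_in_sample)
```

Dividing the density by the probability of being in the sample is what distinguishes this from the ordinary regression likelihood. With a threshold far below the data the two coincide.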
tobit
• The tobit command estimates regression models with a censored dependent variable (deterministic censoring).
• Three different types of models:
  • Tobit with fixed censoring value (tobit)
  • Censored regression with varying censoring value (cnreg)
  • Regression with interval data (intreg)
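tobit
• Tobit of the first type with a fixed censoring value (Ex: consumption of a good)
    tobit y x1 x2 … xk, ll(0)
    tobit y x1 x2 … xk, ul(c)
• ll() sets the lower censoring limit and ul() sets the upper censoring limit.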
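For reference, the log-likelihood behind the ll(0) case can be written in the standard textbook form below. This is a sketch under the usual assumption of normal, homoskedastic errors. The symbols β (coefficients), σ (error standard deviation), φ and Φ (standard normal density and distribution function) do not appear on the slides and are introduced here for illustration.

```latex
\ln L(\beta, \sigma)
  = \sum_{i:\, y_i > 0}
      \left[ \ln \phi\!\left( \frac{y_i - x_i \beta}{\sigma} \right) - \ln \sigma \right]
  + \sum_{i:\, y_i = 0}
      \ln \Phi\!\left( -\frac{x_i \beta}{\sigma} \right)
```

Uncensored observations contribute a density term and censored observations contribute the probability of falling at or below the limit.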
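cnreg
• Tobit of the first type with a censoring value that varies across observations (Ex: minimum wage with different levels in different years)
    cnreg y x1 x2 … xk, censored(d)
• d flags the censoring status of each observation: 0 if not censored, -1 if left-censored, 1 if right-censored.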
intreg
• Interval data regression (Ex: bracket information on income for people who refuse to give the exact value)
• When yi* is not declared, we observe only the range to which yi* belongs, say (ai, bi], for example (0, 5000], (5000, 15000], (15000, 30000], (30000, +∞)
Estimating the regression with interval data in Stata
• The intreg command needs two variables to define the dependent variable, say y1 and y2, holding the lower and the upper bound of the interval for each observation.
    intreg y1 y2 x1 x2 … xk
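The two bound variables have to be built before estimation. The sketch below, in Python (not Stata), shows one way to map survey answers to a (y1, y2) pair using the income brackets from the example. The bracket codes 1 to 4 and the function name are hypothetical. None stands for a Stata missing value, which intreg reads as an open-ended bound.

```python
# Hypothetical bracket codes mapped to (lower, upper) bounds.
# None marks an open-ended bound (a missing value in Stata).
BRACKETS = {
    1: (0, 5000),
    2: (5000, 15000),
    3: (15000, 30000),
    4: (30000, None),  # top bracket: no upper bound
}

def to_intreg_pair(exact=None, bracket=None):
    """Return the (y1, y2) pair for one respondent."""
    if exact is not None:
        # Exact value declared: both bounds equal the value.
        return (exact, exact)
    if bracket is not None:
        return BRACKETS[bracket]
    raise ValueError("need either an exact value or a bracket code")

print(to_intreg_pair(exact=12000))
print(to_intreg_pair(bracket=4))
```

Respondents who give the exact value get y1 equal to y2, so point data and interval data can be pooled in the same estimation.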
heckman
• The heckman command estimates the Generalized Tobit model (Tobit of the second type) by ML (the default) or by the two-step estimator (option twostep).
    heckman y x1 x2 … xk, select(z1 z2 … zs)
    heckman y x1 x2 … xk, select(d = z1 z2 … zs)
    heckman y x1 x2 … xk, select(z1 z2 … zs) twostep
• Without a dependent variable in select(), an observation counts as selected when y is not missing. With select(d = …), the dummy d indicates selection.
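The two-step estimator adds a correction term, the inverse Mills ratio, to the regression on the selected sample. The Python sketch below (not Stata) computes that term. It assumes the selection index has already been estimated by a probit and that the errors are jointly normal with the selection error variance normalised to 1. The function names, rho (error correlation) and sigma (outcome error standard deviation) are illustrative assumptions.

```python
import math

def normal_pdf(v):
    # Standard normal density.
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def normal_cdf(v):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def inverse_mills(index):
    """Inverse Mills ratio at the selection index.

    Not numerically safe for very negative indexes, where the
    distribution function underflows to zero.
    """
    return normal_pdf(index) / normal_cdf(index)

def selected_error_mean(index, rho, sigma):
    """Mean of the outcome error among selected observations."""
    return rho * sigma * inverse_mills(index)

print(inverse_mills(0.0))
```

When rho is zero the correction vanishes and least squares on the selected sample is unbiased. The lower the selection index, the larger the correction.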
heckprobit
• The heckprobit command estimates a probit model with sample selection. There is no twostep option because the two-step estimator would be inconsistent in this model.
    heckprobit p x1 x2 … xk, select(z1 z2 … zs)
Impact of an endogenous dummy
Homogeneous treatment effect
• $y_1$ = earnings for trained people
• $y_0$ = earnings for non-trained people
• $d$ = dummy indicating participation in the training program
    $y = y_1 d + y_0 (1 - d)$
    $y = x\beta + \alpha d + \varepsilon$
    $d^* = z\gamma + u$, where $d = \mathbf{1}(d^* > 0)$
• We have a selection problem because of the correlation between $u$ and $\varepsilon$. This implies that $d$ is not independent of $\varepsilon$.
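The size of the problem can be made explicit under one additional assumption that is not stated on the slide: that the outcome error ε and the participation error u are jointly normal, with correlation ρ, Var(ε) = σ_ε² and Var(u) = 1. Writing γ for the coefficients of the participation equation and α for the treatment effect, a sketch of the resulting comparison of trained and non-trained people is:

```latex
\begin{aligned}
E[\varepsilon \mid d = 1, z] &= \rho\,\sigma_\varepsilon\,\frac{\phi(z\gamma)}{\Phi(z\gamma)}, \\
E[\varepsilon \mid d = 0, z] &= -\rho\,\sigma_\varepsilon\,\frac{\phi(z\gamma)}{1 - \Phi(z\gamma)}, \\
E[y \mid d = 1, x, z] - E[y \mid d = 0, x, z]
  &= \alpha + \rho\,\sigma_\varepsilon\,
     \frac{\phi(z\gamma)}{\Phi(z\gamma)\,[1 - \Phi(z\gamma)]}.
\end{aligned}
```

The raw difference between the two groups therefore equals the treatment effect only when ρ = 0.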
treatreg
• The treatreg command evaluates the effect of an endogenous binary variable (treatment, program, …) on a continuous variable of interest (see previous slide).
    treatreg y x1 x2 … xk, treat(d = z1 z2 … zs)
• Ex: Sample of graduates with and without a master's degree
  • y = log earnings; d = 1 if master's degree, 0 otherwise
  • x = age, age squared, d, sex, type of first degree
  • z = mother's level of education, father's level of education, sex, type of first degree
How to use weights in Stata
• Most Stata commands can deal with weighted data. Stata allows four kinds of weights:
  • fweights, or frequency weights, are weights that indicate the number of duplicated observations.
  • pweights, or sampling weights, are weights that denote the inverse of the probability that the observation is included due to the sampling design and/or nonresponse.
  • aweights, or analytic weights, are weights that are inversely proportional to the variance of an observation; i.e., the variance of the j-th observation is assumed to be sigma^2/w_j, where w_j are the weights.
  • iweights, or importance weights, are weights that indicate the "importance" of the observation in some vague sense.
Option pweights
• Sample surveys usually provide weights to take account of the sampling design and nonresponse.
• Let p be the individual weight. Then we can run a regression with weighted observations:
    regress y x1 x2 … xk [pweight=p]
• Suppose we have a sample with a sample selection problem due to observables. Then we can use propensity score weighting.
• A possible "simplified" way to estimate your own weights is the following:
    probit d z1 z2 … zs
    predict prop
    gen invprop = 1/prop
    reg y x1 x2 … xk [pweight=invprop]
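The logic of weighting by the inverse of the propensity score can be checked on a toy example. The Python sketch below (not Stata) uses made-up numbers: a population of four people with y = 10 who respond with probability 0.5 and four people with y = 20 who always respond, so the population mean is 15. The function name and the data are illustrative assumptions, and the propensities are taken as known rather than estimated.

```python
def ipw_mean(values, propensities):
    """Mean of the observed values, weighted by 1 / propensity."""
    if any(p <= 0.0 or p > 1.0 for p in propensities):
        raise ValueError("propensities must lie in (0, 1]")
    weights = [1.0 / p for p in propensities]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)

# Observed sample: only two of the four low-y people responded.
observed_y = [10, 10, 20, 20, 20, 20]
response_prob = [0.5, 0.5, 1.0, 1.0, 1.0, 1.0]

naive = sum(observed_y) / len(observed_y)   # overstates the population mean
weighted = ipw_mean(observed_y, response_prob)
print(naive, weighted)
```

Each low-propensity respondent stands in for the similar people who did not respond, which restores the population mean of 15 here. In practice the propensities are estimated, so the correction is only as good as the probit model.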
For a complex survey design it is better to use svyset and the svy prefix
• Declare the survey design for the dataset, then estimate:
    svyset [pweight=p]
    svy: regress y x1 x2 … xk
• svyset has options for cluster sampling and other complex designs, for example stratification:
    svyset [pweight=p], strata(stratid)
ivreg
• The ivreg command estimates regression models using instrumental variables for potentially endogenous explanatory variables.
• Ex: Evaluation of the impact of years of schooling on earnings
    $y = x\beta + \alpha d^* + \varepsilon$
• Problem: $d^*$ and $\varepsilon$ are correlated.
• Solution 1: IV estimation (IV = z: parental interest in the child's education, bad financial shock to the family when the child is aged 11-16, presence of older siblings; Blundell et al. 2003)
    ivreg y x1 x2 … xk (d* = z1 z2 … zs)
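The idea behind IV estimation can be seen in the simplest case of one endogenous regressor, one instrument and no other covariates, where the IV slope is cov(z, y) / cov(z, d). The Python sketch below (not Stata) uses made-up data in which an unobserved factor raises both schooling and earnings. The true effect is set to 2, and all names and numbers are illustrative assumptions.

```python
def cross_dev(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def ols_slope(d, y):
    # Least-squares slope of y on d.
    return cross_dev(d, y) / cross_dev(d, d)

def iv_slope(z, d, y):
    # Simple IV slope with a single instrument z.
    return cross_dev(z, y) / cross_dev(z, d)

z = [0, 0, 1, 1]          # instrument
u = [0, 1, 0, 1]          # unobserved factor, uncorrelated with z by construction
d = [zi + ui for zi, ui in zip(z, u)]            # endogenous regressor
y = [2 * di + 3 * ui for di, ui in zip(d, u)]    # true effect of d is 2

print(ols_slope(d, y), iv_slope(z, d, y))
```

Least squares attributes part of the effect of the unobserved factor to d, while the IV slope uses only the variation in d that is driven by z. The result depends on z being uncorrelated with the unobserved factor, which cannot be tested with a single instrument.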
Stata programs for evaluation
• Abadie A., Drukker D., Herr J.L., Imbens G.W. (2001), Implementing Matching Estimators for Average Treatment Effects in Stata, The Stata Journal, 1, 1-18. http://ksghome.harvard.edu/~.aabadie.academic.ksg/software.html
• Becker S.O., Ichino A. (2002), Estimation of Average Treatment Effects Based on Propensity Scores, The Stata Journal, 2, 358-377. http://www.lrz-muenchen.de/~sobecker/pscore.html
• Sianesi B. (2001), Implementing Propensity Score Matching Estimators with STATA, UK Stata Users Group, VII Meeting, London. http://ideas.repec.org/c/boc/bocode/s432001.html
