GY460 Techniques of Spatial Analysis

Lecture 4: Techniques for dealing with spatial sorting and selection (fixed effects, diff-in-diff, matching, discontinuities etc.) • Steve Gibbons

Introduction • Sometimes we just want to eliminate problems induced by spatial ‘sorting’ and heterogeneity • i.e. differences between places which may lead to ‘confounding’ factors and biased estimates of relationships of interest • Selection (sorting) on observable and unobservable characteristics • Examples: • Eliminating spatial factors from models of firm behaviour • Eliminating geographical influences from models of school quality • Various methods are available for dealing with this; we have looked at some of these already…

Regression models with spatial effects

Data with discrete zones • N observations in the data • Grouped into M zones (regions, districts, neighbourhoods) • E.g. • Cross-section data with more than one cross-sectional observation in each neighbourhood • Or panel data with more than one time period for each neighbourhood

Spatial variation in the mean • Empirical model, with discrete ‘neighbourhoods’ m • y_im for observation i in place m, depends on: • x_im : characteristics of observation i in place m • ε_im : unobserved factors for observation i in place m • u_m : unobserved factors common to all observations in place m • Cross-sectional case: i = cross-sectional units, m = places • Panel data case: i = time units, m = places
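A linear form consistent with the definitions above can be written as follows. The coefficient vector β is an assumption here (later slides refer to "betas"); the other symbols are those defined on the slide.

```latex
y_{im} = x_{im}'\beta + u_m + \varepsilon_{im},
\qquad i = 1,\dots,N, \quad m = 1,\dots,M
```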

Random effects • Empirical model, with discrete ‘neighbourhoods’ m • If u_m is uncorrelated with x_im, then OLS is consistent, just like the spatial error model • The composite error terms (u_m + ε_im) are correlated within spatial groups m • But uncorrelated between spatial groups • Use GLS or ML (assuming normality) for efficient estimates and unbiased standard errors (multi-level modelling)

Fixed area effects – dummy variables • Empirical model, with discrete ‘neighbourhoods’ m • If u_m is correlated with x_im, then OLS is inconsistent • Options: • Estimate the area ‘fixed effects’ using OLS • Least Squares Dummy Variable (LSDV) model: include neighbourhood dummy variables

Fixed area effects – within groups • Or ‘within-groups’ transformation: difference the variables from the neighbourhood mean, y_im − ȳ_m = (x_im − x̄_m)β + (ε_im − ε̄_m) • Where ȳ_m is the mean of y in group m (and similarly for x and ε) • Eliminates u_m • Estimate by OLS • Only uses deviation of variables from neighbourhood means, so only within-neighbourhood variation counts • LSDV and within-groups (or ‘fixed effect’) models are equivalent
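The equivalence of LSDV and within-groups can be checked numerically. The sketch below uses simulated data; all names and parameter values (`M`, `n_per`, `beta_true`, the 0.8 loading of x on the area effect) are invented for illustration and are not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: M neighbourhoods with n_per observations each.
M, n_per, beta_true = 50, 20, 2.0
m = np.repeat(np.arange(M), n_per)            # neighbourhood of each observation
u = rng.normal(size=M)                        # area effects u_m
x = 0.8 * u[m] + rng.normal(size=M * n_per)   # x is correlated with u_m
y = beta_true * x + u[m] + rng.normal(size=M * n_per)

def group_mean(v, g, n_groups):
    """Group means of v, repeated for every observation in the group."""
    return (np.bincount(g, weights=v, minlength=n_groups)
            / np.bincount(g, minlength=n_groups))[g]

# Pooled OLS with an intercept, ignoring u_m.
X_pooled = np.column_stack([np.ones(x.size), x])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# LSDV: x plus one dummy per neighbourhood (the dummies absorb the intercept).
dummies = (m[:, None] == np.arange(M)[None, :]).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0][0]

# Within groups: deviations from neighbourhood means, then OLS.
x_w = x - group_mean(x, m, M)
y_w = y - group_mean(y, m, M)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print(beta_pooled, beta_lsdv, beta_within)
```

The LSDV and within-groups slopes agree up to floating-point error, while the pooled slope is pulled away from `beta_true` because x is correlated with the omitted area effect.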

Fixed area effects – panel data • Even better: information with repeated observations on panel units (individuals, firms, regions etc.) over time • Panel data • Now all relationships of interest can be estimated from variation within panel units over time • Use within-groups or first-differences over time, e.g. • Q: what does (v_t − v_{t−1}) represent? How could you control for it? Then, what variation in the data allows us to estimate β? Hence what do we assume, if β is to be estimated consistently?
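One way to write the first-differenced model is sketched below. It assumes a levels equation with a unit effect u_i, a common time effect v_t and a coefficient vector β; this specification is an assumption chosen to be consistent with the (v_t − v_{t−1}) term in the question above.

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta + u_i + v_t + \varepsilon_{it} \\
y_{it} - y_{it-1} &= (x_{it} - x_{it-1})'\beta + (v_t - v_{t-1}) + (\varepsilon_{it} - \varepsilon_{it-1})
\end{aligned}
```

Differencing removes u_i but leaves the differenced time effect in the equation.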

Dynamic panel data models • It would be useful to estimate this model – e.g. to estimate the dependence of y on past values (or control for mean reversion) • Q: Can this within group model be estimated consistently by OLS? • See Nickell (1981) Econometrica • What about the first differenced model? • Q: Is there a useful IV here?

Dynamic panel data models • In principle you could use lagged levels of y (y_it−2 and earlier) as instruments for the lagged first difference of y • This is the basis of the Arellano-Bond estimator (1991, Review of Economic Studies) • They develop a GMM estimator which weights the instruments taking into account the first-differenced error structure, e.g. implemented in “xtabond” in Stata • Problems: serial correlation in error terms? If the coefficient on lagged y is close to zero the instruments will be very weak (since lagged values don’t predict current values if it is zero) • Can also use lagged first differences as instruments for the equation in levels • System GMM (Blundell and Bond 1998): xtabond2
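The questions on the previous slide can be explored by simulation. The sketch below is not the Arellano-Bond GMM estimator: it compares within-groups OLS with a simpler just-identified IV on the first-differenced model (in the spirit of Anderson and Hsiao), using the level y_{t−2} as the single instrument for the lagged difference. The model y_it = rho·y_it−1 + u_i + e_it, the symbol `rho` and all parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dynamic panel: y_it = rho * y_it-1 + u_i + e_it
N, T, burn, rho = 20000, 6, 50, 0.5
u = rng.normal(size=N)                  # unit fixed effects
y_prev = u / (1 - rho)                  # start at each unit's long-run mean
periods = []
for t in range(burn + T):
    y_now = rho * y_prev + u + rng.normal(size=N)
    if t >= burn:                       # keep only the last T periods
        periods.append(y_now)
    y_prev = y_now
Y = np.column_stack(periods)            # shape (N, T)

# Within-groups OLS of y_t on y_{t-1} (both demeaned within unit).
y_cur, y_lag = Y[:, 1:], Y[:, :-1]
y_cur_w = y_cur - y_cur.mean(axis=1, keepdims=True)
y_lag_w = y_lag - y_lag.mean(axis=1, keepdims=True)
rho_within = (y_lag_w * y_cur_w).sum() / (y_lag_w ** 2).sum()

# First-differenced model, instrumenting (y_{t-1} - y_{t-2}) with y_{t-2}.
dy = Y[:, 2:] - Y[:, 1:-1]              # y_t - y_{t-1}
dy_lag = Y[:, 1:-1] - Y[:, :-2]         # y_{t-1} - y_{t-2}
z = Y[:, :-2]                           # instrument: y_{t-2}
rho_iv = (z * dy).sum() / (z * dy_lag).sum()

print(rho_within, rho_iv)
```

With a short panel the within-groups estimate falls well below the true `rho` (the Nickell bias), while the IV estimate on first differences is close to it in a sample this large.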

Spatial panel data models • These look attractive e.g. to eliminate sorting i.e. u_i • But this still suffers from the simultaneity problems of the spatial y model: it requires maximum likelihood or instruments for the spatial lag of y • It is also difficult to defend a model with spatial correlation but no time dynamics • So you have to estimate a model with both • Have to deal with time-dynamic y and spatial y!

Spatial panel data models • Probably more useful to consider the reduced form e.g.

Difference in difference

Difference-in-difference • Suppose we have places, firms or individuals i observed over time • Treatment group D=1 is exposed to some treatment x (x = 1 or 0) at time t=1, whereas a control group D=0 is not • There is selection into the treatment group (E[f|D] ≠ 0) and common time effects g

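Difference-in-difference • The effect of the treatment can be estimated by a “difference in difference” estimator: the change in mean y for the treatment group between t=0 and t=1, minus the corresponding change for the control group • Note that this is the same as you’d get from OLS of y on D, a dummy for t=1, and their interaction (the coefficient on the interaction is the DiD estimate)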
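The equivalence between the difference of mean changes and the OLS interaction coefficient can be verified on simulated data. This is a minimal sketch with repeated cross-sections; the variable names (`D`, `post`, `effect`) and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 4000
D = rng.integers(0, 2, size=n)       # 1 = treatment group, 0 = control group
post = rng.integers(0, 2, size=n)    # 1 = after treatment (t=1), 0 = before (t=0)
effect = 1.5                         # hypothetical true treatment effect

# Selection into treatment (group gap of 2.0) and a common time effect (0.7).
y = 1.0 + 2.0 * D + 0.7 * post + effect * D * post + rng.normal(size=n)

def cell_mean(d, t):
    """Mean outcome for group d in period t."""
    return y[(D == d) & (post == t)].mean()

# Difference in differences from the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS of y on a constant, D, post and the interaction D * post.
X = np.column_stack([np.ones(n), D, post, D * post])
did_ols = np.linalg.lstsq(X, y, rcond=None)[0][3]

print(did_means, did_ols)
```

Because the regression is saturated in the two dummies, the interaction coefficient reproduces the difference of cell means exactly, not just approximately.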

Difference-in-difference • The DiD estimator is commonly used for evaluation of policy interventions • DiD doesn’t work if the treatment and control groups have different time trends • Nor if the composition of the treatment or control groups changes between the pre- and post-treatment periods

Matching

Matching estimators • ‘Matching’ tries to do something similar, when treatment and control group are not both observed pre and post policy • Suppose we observe two groups • Suppose the goal is to estimate the “Average effect of the Treatment on the Treated” (ATT) • As we know, simple difference in means won’t work: • i.e. because the treated and non-treated would have different Y in the absence of treatment

Matching estimators • But suppose we have some observable characteristics Z for which E[Y0 | Z, D=1] = E[Y0 | Z, D=0] • i.e. mean pre-treatment Y for individuals with characteristics Z is the same, whether or not they are in the treatment group • Called the “Conditional Independence Assumption” (CIA) • Allows for selection into treated and non-treated groups by Z (selection on observables), but not by unobservables • So if you can find individuals in group 0 who have the same Z as those in group 1 you can estimate the missing Y0 for the treated from the individuals in group 0 • If Z is discrete this is straightforward…

Matching estimators • So we can estimate • The naïve estimate of the effect of the treatment is 190-125 = 65

Matching estimators • For the treated, Y0 is unobserved but can be estimated by re-weighting (under the CIA) • So the ATT is 190-180 = 10
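The re-weighting step with a discrete Z can be sketched as below. The cell counts and means are invented for illustration and are not the figures from the slide's table; Z is assumed to take two values, "low" and "high".

```python
# Hypothetical cell data, invented for illustration:
# (group, Z) -> (number of individuals, mean outcome Y)
cells = {
    ("treated", "low"): (20, 100.0),
    ("treated", "high"): (80, 200.0),
    ("untreated", "low"): (80, 90.0),
    ("untreated", "high"): (20, 190.0),
}

def mean_y(group, weights_from):
    """Mean Y of `group`, weighting each Z cell by the cell counts of `weights_from`."""
    z_values = [z for (g, z) in cells if g == group]
    total = sum(cells[(weights_from, z)][0] for z in z_values)
    weighted = sum(cells[(weights_from, z)][0] * cells[(group, z)][1] for z in z_values)
    return weighted / total

y1_treated = mean_y("treated", "treated")        # observed mean for the treated
y0_untreated = mean_y("untreated", "untreated")  # observed mean for the untreated
y0_treated = mean_y("untreated", "treated")      # untreated Y, re-weighted to the treated mix of Z

naive = y1_treated - y0_untreated   # simple difference in means
att = y1_treated - y0_treated       # matching estimate of the ATT under the CIA

print(naive, att)
```

With these invented numbers the naive difference is 70 and the re-weighted ATT is 10: most of the raw gap comes from the treated group containing more high-Z individuals.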

Matching estimators • But what if (as is usual) Z is not discrete? Propensity score matching does this reweighting using an estimate of the probability that an individual with characteristics z is in the treatment group • (Rosenbaum and Rubin (1983) Biometrika) • Requires a first stage estimate of Pr(D=1 | Z) e.g. from a probit or logit regression on Z • Then the treatment effect for an individual i in the treated group can be estimated as i’s outcome minus a weighted average of the outcomes of the untreated controls j • Where the weights depend on the difference between the propensity score for individual i and that of each untreated control j
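A minimal sketch of propensity score matching on simulated data follows. It uses the simplest possible weights (1 for the untreated control with the closest propensity score, 0 for all others, i.e. nearest-neighbour matching with replacement) rather than kernel weights, and fits the first-stage logit by Newton-Raphson in NumPy. The data-generating process and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: selection into treatment depends on an observable z only.
n, att_true = 2000, 2.0
z = rng.normal(size=n)
D = (rng.random(n) < 1 / (1 + np.exp(-z))).astype(float)
y = 1.0 + 3.0 * z + att_true * D + rng.normal(size=n)

# First stage: logit of D on z, fitted by Newton-Raphson.
X = np.column_stack([np.ones(n), z])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    grad = X.T @ (D - p)                         # score vector
    hess = (X * (p * (1 - p))[:, None]).T @ X    # information matrix
    b = b + np.linalg.solve(hess, grad)
pscore = 1 / (1 + np.exp(-X @ b))                # estimated Pr(D=1 | z)

treated = np.flatnonzero(D == 1)
controls = np.flatnonzero(D == 0)

# For each treated unit, find the control with the closest propensity score.
gap = np.abs(pscore[treated][:, None] - pscore[controls][None, :])
nearest = controls[gap.argmin(axis=1)]

att_match = (y[treated] - y[nearest]).mean()         # matching estimate of the ATT
naive = y[D == 1].mean() - y[D == 0].mean()          # simple difference in means

print(naive, att_match)
```

Here the CIA holds by construction, so matching recovers something close to `att_true` while the naive difference is inflated by selection on z. The same code would give no protection against selection on unobservables.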

Matching estimators • In practice matching estimators behave like ‘kitchen sink’ regressions: you are just controlling for as many observable characteristics as possible (Z) • However, you are controlling for these Z in a very non-linear way: like having lots of control variables and their interactions in an OLS regression • Matching estimators allow for heterogeneous treatment effects • You can re-weight in other ways, e.g. to estimate the effect of the treatment on the population, or on the untreated • No solution to selection on unobservables, which is surely the main issue! • Requires “common support”: if there is no overlap between Z in the treated and untreated groups, you can’t match.

Discontinuity designs

Discontinuity designs • Regression discontinuity method tries to identify causal effects from abrupt changes • Requires a discontinuity induced by institutional rules, policy etc. • e.g. majority voting • Class size rules, e.g. Maimonides’ rule • Geographical administrative boundaries • Assumption is that assignment to treatment is determined by some covariate X when it reaches a value c • The outcome is otherwise only related to X by a smooth function e.g. E[y|X] = m(X)


Discontinuity designs • So the idea is to estimate the average effect of the treatment at the discontinuity point • We could control for m(x) parametrically (polynomial series etc.) • Or restrict the sample to observations for which x is close to c
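The two options above can be combined: restrict the sample to a window around c and fit a separate linear trend on each side. The sketch below does this for a sharp discontinuity on simulated data. The smooth function m(x) = sin(2x), the bandwidth `h` and the size of the jump are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: treatment switches on when x reaches the cutoff c.
n, c, jump = 20000, 0.0, 1.0
x = rng.uniform(-1, 1, size=n)
treat = (x >= c).astype(float)
y = np.sin(2 * x) + jump * treat + rng.normal(scale=0.5, size=n)

def rd_estimate(x, y, c, h):
    """Jump in E[y|x] at c from linear fits on each side, using |x - c| <= h only."""
    keep = np.abs(x - c) <= h
    xc = x[keep] - c                          # running variable centred at the cutoff
    d = (xc >= 0).astype(float)               # treatment indicator
    X = np.column_stack([np.ones(xc.size), d, xc, d * xc])
    coef = np.linalg.lstsq(X, y[keep], rcond=None)[0]
    return coef[1]                            # coefficient on d = estimated jump at c

rd_local = rd_estimate(x, y, c, h=0.2)
naive = y[treat == 1].mean() - y[treat == 0].mean()

print(naive, rd_local)
```

The naive comparison of all treated and untreated observations mixes the jump with the slope of m(x); the local estimate is close to `jump`. Widening `h` adds observations but lets more of m(x) leak into the estimate.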

Boundary discontinuities • [Figure: two school districts, A and B, separated by an admissions boundary. Labels: school quality in district A; school quality in district B; price and homeowner characteristics on each side of the boundary; unobserved local amenity; positive quality-price relationship across the boundary.]

Discontinuity designs • In principle, X is identical for treatment and controls exactly at the discontinuity • But practical applications require non-zero differences between X and discontinuity • E.g. can rarely find a large enough sample of housing transactions exactly on the boundary • Trade off between adequate sample size and elimination of biases due to m(x) • We looked at practical spatial examples – e.g. Black (1999), Duranton et al (2006) • See also Gibbons, S., Machin, S and Silva, O. (2009), Valuing School Quality Using Boundary Discontinuity Regressions, SERC DP0018 http://www.spatialeconomics.ac.uk/textonly/SERC/publications/download/sercdp0018.pdf

Applications to spatial policy evaluation • Research designs can incorporate elements of all these methods e.g. match treatment and control groups using propensity score matching, then implement diff-in-diff • Machin, S., McNally, S., Meghir, C. (2007), Resources and Standards in Urban Schools, IZA DP2653 http://ftp.iza.org/dp2653.pdf • Busso, M. and P. Kline (2006) Do Local Economic Development Programs Work, Evidence from Federal Empowerment Zone Program, http://www.econ.berkeley.edu/~pkline/papers/Busso-Kline%20EZ%20(web).pdf • Romero, R. and M. Noble (2008) Evaluating England’s New Deal for Communities Programme Using the Difference in Difference method, Journal of Economic Geography 8(6): 1-20

The partial linear model

Continuous space • A general model with spatial heterogeneity: y_i = x_i β + m(S_i) + ε_i • S_i is an index of the location of observation i • Model continuous unobserved variation over space • m(.) is supposed to represent large-scale predictable variation over space, e.g. land values • ε_i: random shocks, e.g. sales price of specific houses • We discussed these issues in the lecture on smoothing • Could do it parametrically e.g. polynomial series or Cheshire and Sheppard (1995); see earlier lectures

Partial linear model • Suppose location is given by coordinates (s1, s2), so the spatial term is m(s1, s2) • If we know β, function m(.) is just the expected (mean) value of y − xβ given the location s1, s2 • Refer to the lecture on smoothing: this can be inferred from values of y in neighbouring locations once we know β • Spatial weighting again • Kernel weighting, nearest neighbours etc.

Semi-parametric spatial models • Must get estimates of β first? How? • e.g. see Robinson (1988), Root-N-Consistent Semiparametric Regression, Econometrica • Estimate averages of y and all x at each point in the data, non-parametrically • Estimate the betas by OLS on the deviations of y and x from these local averages • Note: analogy to the within-groups model • Can then estimate m(.) by smoothing y − xβ̂ over space
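The three steps above can be sketched with a Gaussian kernel smoother over two map coordinates. The simulated surface m(s), the bandwidth `h` and the use of a simple kernel-weighted average (rather than any particular smoother from the literature) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: y = x * beta + m(s1, s2) + noise, with x also varying over space.
n, beta_true, h = 1500, 2.0, 0.1
s = rng.uniform(0, 1, size=(n, 2))                    # locations (s1, s2)
m_s = np.sin(3 * s[:, 0]) + np.cos(3 * s[:, 1])       # smooth spatial surface
x = m_s + rng.normal(size=n)
y = beta_true * x + m_s + rng.normal(size=n)

# Gaussian kernel weights between all pairs of locations; rows sum to one.
d2 = ((s[:, None, :] - s[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-0.5 * d2 / h ** 2)
W /= W.sum(axis=1, keepdims=True)

# Step 1: local (kernel-weighted) averages of y and x at each data point.
y_bar, x_bar = W @ y, W @ x

# Step 2: OLS on deviations from the local averages.
x_dev, y_dev = x - x_bar, y - y_bar
beta_hat = (x_dev @ y_dev) / (x_dev @ x_dev)

# Step 3: recover the spatial surface by smoothing y - x * beta_hat.
m_hat = W @ (y - x * beta_hat)

# For comparison: pooled OLS that ignores m(s).
X_pooled = np.column_stack([np.ones(n), x])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

print(beta_pooled, beta_hat)
```

Step 2 is the continuous-space counterpart of the within-groups transformation: the neighbourhood mean is replaced by a kernel-weighted local mean.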

Applications to housing analysis • Clapp, J. M., H.-J. Kim, and A. E. Gelfand (2002): "Predicting Spatial Patterns of House Prices Using LPR and Bayesian Smoothing," Real Estate Economics, 30, (4), 505-532 • Use of non-parametric methods to construct house price indices • Gibbons, S., and S. Machin (2003): "Valuing English Primary Schools," Journal of Urban Economics, 53, (2). • Use of the semi-parametric model for eliminating larger-scale neighbourhood effects on school performance

Conclusions • The underlying issue we have considered is selection or sorting, e.g. people, firms etc. of different types sort into different locations, and this can lead to biased estimates of causal relationships • Selection can be on unobservables or observables • We considered various techniques for dealing with these problems • Other solutions (random assignment, IV) we have considered or will consider elsewhere.
