
Propensity Score Matching and the EMA pilot evaluation




  1. Propensity Score Matching and the EMA pilot evaluation Lorraine Dearden IoE and Institute for Fiscal Studies RMP Conference 22nd November 2007

  2. The Evaluation Problem • Question which we want to answer is • What is the effect of some treatment (Di=1) on some outcome of interest (Y1i) compared to the outcome (Y0i) if the treatment had not taken place (Di=0) • Problem is that it is impossible to observe both outcomes for the same individual, so the true causal effect can never be measured directly

  3. How can we solve this problem? • Randomised experiment • Randomly assign people to treatment group and control group • If groups large enough, the distribution of all pre-treatment characteristics in the two groups should be identical so any difference in outcome can be attributed to the treatment • Not generally available • Not always a solution

  4. Propensity Score Matching • Instead have to rely on non-experimental approaches • Propensity score matching is one such method that is gaining popularity because of simplicity • Crucial, however, to understand the assumptions underlying the approach (and all approaches) • Again NOT always appropriate • may need to rely on other method e.g. instrumental variables, control function

  5. Assumptions • Need to have a treatment group and some type of appropriate non-treated group from which you can select a control group • Finding an appropriate and convincing control group is often the most difficult evaluation task • Assume ALL relevant differences between the groups pre-treatment can be captured by observable characteristics in your data (X) • Having high quality and extensive pre-treatment observables is crucial! • Conditional Independence Assumption (CIA) • Common support – return to this

  6. What are we trying to measure? • Average treatment effect for the population (ATE) • Average treatment effect on the treated (ATT) • Average treatment effect on the non-treated (ATNT) • Usually interested in ATT: E(Y1 – Y0|D=1) = E(Y1|D=1) – E(Y0|D=1) • OLS – ATT=ATE=ATNT • IV – LATE • Matching and control function – ATE, ATT & ATNT • How can we find E(Y0|D=1)?
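
The distinction between the three estimands can be made concrete with a toy example in which, unlike in real data, both potential outcomes are known for every unit. All the numbers below are made up purely for illustration:

```python
# Toy illustration of ATE, ATT and ATNT on simulated data where both
# potential outcomes (y0, y1) are observed for everyone -- something
# that is impossible in practice (the evaluation problem).
units = [
    # (treated?, y0, y1)
    (1, 2.0, 5.0),
    (1, 1.0, 3.0),
    (0, 2.0, 4.0),
    (0, 3.0, 4.0),
]

def mean(xs):
    return sum(xs) / len(xs)

ate  = mean([y1 - y0 for d, y0, y1 in units])            # whole population
att  = mean([y1 - y0 for d, y0, y1 in units if d == 1])  # treated only
atnt = mean([y1 - y0 for d, y0, y1 in units if d == 0])  # non-treated only
```

Here the three estimands differ (ATT 2.5, ATE 2.0, ATNT 1.5), which is exactly why it matters which one the policy question calls for.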

  7. What is treatment? • Most robust design is Intention to Treat (ITT) analysis – treatment is all individuals who could have taken up program whether they did or not • Another approach is ‘receipt of treatment’ approach – but here sometimes much more difficult to find an appropriate control group

  8. Matching • Involves selecting from the non-treated pool a control group in which the distribution of observed variables is as similar as possible to the distribution in the treated group • There are a number of ways of doing this but they almost always involve calculating the propensity score p(x) = Pr{D=1|X=x}

  9. The propensity score • The propensity score is the probability of being in the treatment group given you have characteristics X=x • How do you do this? • Use parametric methods (e.g. logit or probit) and estimate the probability of a person being in the treatment group for all individuals in the treatment and non-treatment groups • Rather than matching on the basis of ALL X’s can match on basis of this propensity score (Rosenbaum and Rubin (1983))
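
As a minimal sketch of what "estimate the propensity score with a logit" means, the fragment below fits a one-covariate logistic regression by gradient ascent on toy data. In practice you would use a standard logit/probit routine from a statistics package; the hand-rolled fit is only to make the object p(x) = Pr{D=1|X=x} concrete:

```python
import math

# Minimal logistic-regression sketch: estimate p(x) = Pr(D=1 | X=x)
# by gradient ascent on the log-likelihood. Toy data, one covariate.

def fit_logit(xs, ds, steps=5000, lr=0.1):
    a, b = 0.0, 0.0                          # intercept and slope
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += (d - p)                    # score w.r.t. intercept
            gb += (d - p) * x                # score w.r.t. slope
        a += lr * ga / n
        b += lr * gb / n
    return a, b

def pscore(x, a, b):
    """Estimated probability of treatment given X = x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))
```

Every individual, treated or not, then gets a score in (0, 1), and matching proceeds on that single number rather than on all the X's.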

  10. How do we match? • Nearest neighbour matching • each person in the treatment group choose individual(s) with the closest propensity score to them • can do this with (most common) or without replacement • not very efficient as it discards a lot of information about the control group
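
A bare-bones sketch of nearest-neighbour matching with replacement, on made-up (propensity score, outcome) pairs: each treated unit is paired with the single control whose score is closest, and the ATT estimate is the average treated-minus-matched-control outcome difference.

```python
# Nearest-neighbour matching on the propensity score, with replacement.
# treated / controls are lists of (pscore, outcome) pairs; toy values.

def nn_match_att(treated, controls):
    diffs = []
    for p_t, y_t in treated:
        # closest control by propensity score (controls can be reused)
        p_c, y_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)
```

The slide's efficiency point is visible here: only one control per treated unit ever enters the estimate, so most of the control pool is thrown away.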

  11. Kernel based matching • each person in the treatment group is matched to a weighted sum of individuals who have similar propensity scores with greatest weight being given to people with closer scores • Some kernel based matching uses ALL people in non-treated group (e.g. Gaussian kernel) whereas others only use people within a user-specified bandwidth of the propensity score (e.g. Epanechnikov) • Choice of bandwidth involves a trade-off between bias and precision
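
A minimal sketch of the Gaussian-kernel variant described above, again on toy (propensity score, outcome) pairs. Each treated unit's counterfactual outcome is a weighted average of ALL control outcomes, with weights that decay with the distance in propensity scores; `h` is the user-chosen bandwidth:

```python
import math

# Gaussian-kernel matching sketch: the counterfactual outcome for each
# treated unit is a kernel-weighted average over ALL controls.

def kernel_att(treated, controls, h=0.05):
    diffs = []
    for p_t, y_t in treated:
        ws = [math.exp(-0.5 * ((p_t - p_c) / h) ** 2) for p_c, _ in controls]
        y0_hat = sum(w * y for w, (_, y) in zip(ws, controls)) / sum(ws)
        diffs.append(y_t - y0_hat)
    return sum(diffs) / len(diffs)
```

The bias/precision trade-off lives in `h`: a small bandwidth uses only very close controls (less bias, noisier), a large one averages over controls far away in the score (smoother, more biased).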

  12. Other methods • Radius matching • Caliper matching • Mahalanobis matching • Local linear regression matching • Spline matching…..

  13. Imposing Common Support • In order for matching to be valid we need to observe participants and non-participants with the same range of characteristics • i.e. for all characteristics X there are both treated and non-treated individuals • If this cannot be achieved • treated units whose p is larger than the largest p in the non-treated pool are left unmatched
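
A common way to operationalise this (the min/max rule, sketched here on toy scores) is to keep only treated units whose propensity score lies inside the range of scores observed among the non-treated:

```python
# Common-support trimming (min/max rule): drop treated units whose
# propensity score falls outside the range observed among controls,
# since they have no comparable non-treated individuals.

def on_support(p_treated, p_controls):
    lo, hi = min(p_controls), max(p_controls)
    return [p for p in p_treated if lo <= p <= hi]
```

Treated units dropped this way are left unmatched, so the estimate is an ATT for the on-support treated only.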

  14. How do we get standard errors? • Asymptotics of propensity score matching are hard/impossible to derive • Generally need to ‘Bootstrap’ standard errors • Take a random draw from your sample with replacement • Repeat this 500 to 1000 times • Standard deviation of these estimates gives you your standard error
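
The bootstrap loop on the slide can be sketched generically: resample the data with replacement, re-run the whole estimator (in the matching case, re-estimate the score and re-match) on each draw, and take the standard deviation of the replicated estimates. `estimator` here stands in for whatever matching estimator is being bootstrapped:

```python
import random
import statistics

# Bootstrap standard-error sketch: the SE of an estimator is the
# standard deviation of its estimates across resampled datasets.

def bootstrap_se(sample, estimator, reps=500, seed=0):
    rng = random.Random(seed)        # fixed seed for reproducibility
    n = len(sample)
    ests = [estimator([sample[rng.randrange(n)] for _ in range(n)])
            for _ in range(reps)]
    return statistics.stdev(ests)
```

Note that for matching the full procedure (propensity score estimation, trimming, matching) belongs inside `estimator`, so each replication reflects all sources of sampling variation.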

  15. What was the EMA pilot? • EMA pilots involved payment of up to £40 per week for 16-18 year olds who remained in full-time education • 4 different variants tested: • V1 – up to £30 per week, £50 retention and achievement bonuses • V2 – V1 but up to £40 per week • V3 – V1 but paid to mother • V4 – V1 but more generous bonuses

  16. Justifications for intervention • Low levels of participation in post-16 education among low income families • Presence of liquidity constraints? • need evidence on the returns to education • Card (2000), Cameron & Heckman (2001) suggest that these may not be that important • Meghir & Palme (1999) find evidence of liquidity constraints using Swedish data

  17. Design of the evaluation • Interviews with young people and parents in 10 EMA pilot areas and 11 control areas • Information collected both among those income-eligible and income-ineligible for the EMA • First survey involved young people who completed Year 11 in 1999 (cohort 1) • Parental questionnaire only in initial survey • Cohort 1 followed up 3 times

  18. The data • Questionnaires have detailed information on: • all components of family income • household composition • GCSE results • mother’s and father’s education, occupation and work history • early childhood circumstances • current activities of young people

  19. Matching approach • Involves taking all eligible individuals in the pilot areas and matching them with a weighted sum of individuals who look like them in control areas • Difference in full-time education outcomes in pilot and control areas in this matched sample is the estimate of the EMA effect (ATT) • Crucial assumption is that we observe everything that determines education participation

  20. How do we do this? • Don’t match on all X’s, but can instead match on the propensity score (Rosenbaum and Rubin, 1983) • Propensity score is just predicted probability of being in a pilot area given all the observables in our data • Use kernel-based matching (Heckman, Ichimura & Todd, 1998) • We do this matching for each sub-group of interest

  21. Variables we match on: • Family background • household composition, housing status, ethnicity, early childhood characteristics, older siblings’ education and parents’ age, education, work status and occupation • Family income • current family income, whether on means-tested benefits • Ability (GCSE results) • School variables • Indicators of ward level deprivation

  22. Results Y12: urban men Note: Income eligibles only

  23. Results Y12: urban women Note: Income eligibles only

  24. Results Y13: Note: Income eligibles only

  25. Results by Eligibility Groups • In Year 12 impact concentrated on those who are fully eligible (6.7-6.9 % pts) • No significant effect for boys or girls on taper • No effect on ineligibles • In Year 13 impact on both groups • EMA impacts significantly on retention for those on the taper

  26. Does it matter who the EMA is paid to? • No difference if we do not distinguish by eligibility • For the variant paid to the child, impact is concentrated on those fully eligible • For the variant paid to the mother, impact on those who are fully and partially eligible

  27. Credit Constraints? • Following the consumption literature (see Zeldes (1989)), split the sample by assets, the idea being that those with assets are not liquidity constrained • Compare results for home-owners and non home-owners • The key assumption here is that home ownership in itself does not lead to different responses to financial incentives, other than because it implies different access to funds

  28. Results • Significant impact for non home-owners of 9.1 percentage points • Insignificant impact for home-owners of 3.8 percentage points • But the difference of 5.3 percentage points is not significant at conventional levels (p-value 12%)

  29. Conclusions • EMA effect around 4.5 percentage points • Plays a role in reducing gender differences in stay-on rates particularly retention in Year 13 • Important to control for local area effects • matching on ward level data important

  30. Other conclusions • More effective paying to child rather than parent for those fully eligible • More effective paying to mother for those who are partially eligible • Increase drawn from both work and NEET groups • Some evidence it may be alleviating credit constraints

  31. What else can you do with Matching? • What is the policy question you are interested in? • Is ATT the appropriate measure? • In returns-to-schooling evaluations we are much more interested in ATNT • What is treatment – ITT versus ‘receipt of treatment’ • Take-up is usually an important policy issue, so it is often inappropriate (and difficult) to compare actual participants with an appropriate control group, but sometimes there is no choice!
