Impact Evaluation: why, what and how?

Presentation Transcript


  1. Impact Evaluation: why, what and how?

  2. Why is this important?
  • Impact evaluation can provide reliable estimates of the causal effects of programs
  • Impact evaluation can potentially help improve the efficacy of programs by influencing design and implementation
  • Impact evaluation can broaden political support for programs
  • Impact evaluation can help sustain successful programs and terminate failing ones
  • Impact evaluation can expand our understanding of how social programs produce the effects that they do

  3. What is Impact Evaluation?
  • Impact evaluation is a set of methods to identify and quantify the causal impact of programs
  • For example:
  – What is the effect of a scholarship program on student exam performance?
  – What is the effect of a training program on post-training employment and earnings?
  – What is the effect of a textbook provision program on child enrollment and learning?

  4. Why are these effects difficult to estimate?
  • The basic question: what would happen in the absence of the program? (i.e., what is the counterfactual?)
  • We would need the same individual with and without the program…
  • …but it is impossible to observe the same individual in both states!
  • Solution: “build” the correct counterfactual
  • Find individuals who do not have the benefits of the program but are very similar to the ones that have it

  5. What is the problem?
  • In short, the problem is to find the correct comparison (control) group
  • The control group and the treatment group should have the same characteristics, observable and unobservable, before the beginning of the program
  • External factors will then affect the control and treatment groups in the same way
  • And, usually, people self-select into programs, so the beneficiaries are (with high probability) different from those who did not enter the program
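A small simulation can make the self-selection problem concrete. In this sketch (all numbers are hypothetical), each individual has an unobserved "ability" that raises both program take-up and the outcome, so the naive treated-versus-untreated comparison overstates the true effect, while random assignment recovers it:

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 2.0  # hypothetical causal effect of the program

# Unobserved ability raises the outcome AND makes take-up more likely.
people = [{"ability": random.gauss(0, 1)} for _ in range(10_000)]
for p in people:
    p["treated"] = random.random() < (0.7 if p["ability"] > 0 else 0.3)
    p["outcome"] = p["ability"] + TRUE_EFFECT * p["treated"] + random.gauss(0, 1)

treated = [p["outcome"] for p in people if p["treated"]]
control = [p["outcome"] for p in people if not p["treated"]]
naive = statistics.mean(treated) - statistics.mean(control)

# Under random assignment, ability is balanced across the two groups.
for p in people:
    p["rand_treated"] = random.random() < 0.5
    p["rand_outcome"] = (p["ability"] + TRUE_EFFECT * p["rand_treated"]
                         + random.gauss(0, 1))

rt = [p["rand_outcome"] for p in people if p["rand_treated"]]
rc = [p["rand_outcome"] for p in people if not p["rand_treated"]]
randomized = statistics.mean(rt) - statistics.mean(rc)

print(f"naive estimate:      {naive:.2f}")       # biased upward by selection
print(f"randomized estimate: {randomized:.2f}")  # close to the true 2.0
```

The naive comparison picks up the ability gap between joiners and non-joiners on top of the true effect; randomization removes that gap by construction.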

  6. The basic intuition: only data before and after the program [Chart: outcome Y over time, with the intervention between t = 0 (before the program) and t = 1 (after the program). Is the before/after change in Y the impact of the program? NO! We need a counterfactual.]

  7. We need the right comparison group [Chart: outcome Y for treatment and control groups over time, intervention between t = 0 and t = 1. The gap between the two groups at t = 1 is the impact of the program.]

  8. The basic intuition 2: only data after the program… [Chart: outcome Y for treatment and control groups observed only after the intervention. Is the gap at t = 1 the impact of the program?]

  9. We need the right comparison group! [Chart: outcome Y for treatment and control groups over time. Is the gap at t = 1 the impact of the program? NO! At t = 0 the two groups were already very different.]

  10. Four possibilities to find or construct the right control group
  • Prospective evaluation
  – Randomization of benefits
  – Randomization of entry (phase-in approach)
  – Randomization of information (encouragement design)
  • Retrospective evaluation
  – Regression discontinuity analysis
  – Instrumental variables
  – Differences in differences
  – Propensity score and matching estimators

  11. Randomization
  • A lottery among individuals separates the sample into winners and losers
  • The “over-subscription” model
  • A lottery creates homogeneous control and treatment groups: they will be very similar in all characteristics
  • The unit of randomization is important:
  – Geographical units (states, districts, towns, etc.) or individual units (e.g., people or households)
  • The number of units matters: if it is too small, the sample may not be balanced at t = 0

  12. Example of randomization
  • “Peer Effects, Pupil-Teacher Ratios and Teacher Incentives: Evidence from a Randomized Evaluation in Kenya” (Duflo, Dupas and Kremer)

  13. The Extra-Teacher Program
  • With funds from the World Bank, ICS (an NGO) gave 2,500 Ksh monthly (US$40) to school committees to hire an extra teacher for grade 1
  • The teacher was hired locally after advertisement
  • The teacher had to be fully qualified
  • The program started in May 2005 (beginning of the second quarter of the 2005 school year)
  • It continued in the 2006 school year:
  – In grade 2
  – Schools were encouraged to keep the same children with the extra teachers
  • 140 program schools were randomly selected out of 210 schools (70 comparison schools)

  14. How do the authors measure the effects of the program?
  • Across-school randomization
  – 88 school-classes serve as the comparison group
  – Among treatment schools: school committee training in some schools (SBM schools); classes separated by prior achievement in some schools (tracking schools)
  • Within-school randomization
  – Children were randomly assigned to the contract teacher or the regular teacher in all treatment schools
  – In tracking schools, which teacher takes which group was randomly selected

  15. Impact on test scores
  • The overall program effect is positive and significant (0.22 SD)
  • But the effect in non-tracking, non-SBM schools is smaller and insignificant (0.13 SD)
  • And the pure class-size effect (comparing students with the regular teacher in non-SBM, non-tracking schools) is small and insignificant (0.09 SD)
  • Conclusion: reducing class size without any other changes does not produce a significant increase in test scores

  16. Randomized phase-in design
  • When do we use this method? When a government or institution is planning to roll out a program
  • Basic intuition: randomize the order of entry
  • This creates different intensities of treatment: early entrants are exposed to the treatment for longer than later ones
  • Usually the randomization is at a geographical level

  17. Randomized phase-in design [Chart: timeline showing groups entering the program in randomly ordered phases.]

  18. Encouragement design
  • If randomization of benefits is not possible, it may still be possible to randomize information
  • One group is given intensive information; both groups can apply to the program
  • Basic idea: we create a variable (the information campaign) that is correlated with receiving the benefits and uncorrelated with any other characteristic of the individuals / towns / etc.
  • This is a valid instrumental variable
  • Assumption: the group that receives the campaign actually has an observably higher probability of receiving benefits
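The resulting estimator is the classic Wald/IV ratio: the difference in average outcomes between encouraged and non-encouraged groups, divided by the difference in take-up rates. A minimal simulation, with hypothetical take-up rates and effect size:

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 3.0  # hypothetical effect of actually receiving the benefit

people = []
for _ in range(20_000):
    encouraged = random.random() < 0.5           # randomized information campaign
    # Encouragement raises take-up from 20% to 60% (hypothetical rates).
    takes_up = random.random() < (0.6 if encouraged else 0.2)
    outcome = TRUE_EFFECT * takes_up + random.gauss(0, 1)
    people.append((encouraged, takes_up, outcome))

enc = [p for p in people if p[0]]
non = [p for p in people if not p[0]]

# Intention-to-treat effect of the campaign on the outcome…
itt = statistics.mean(p[2] for p in enc) - statistics.mean(p[2] for p in non)
# …scaled by the campaign's effect on take-up (the "first stage").
first_stage = statistics.mean(p[1] for p in enc) - statistics.mean(p[1] for p in non)
wald = itt / first_stage
print(f"Wald / IV estimate: {wald:.2f}")  # close to the true 3.0
```

The campaign only works as an instrument because it was randomized (so it is unrelated to individual characteristics) and because it actually moves take-up, exactly the two conditions listed on the slide.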

  19. Regression discontinuity
  • When to use this method?
  – The beneficiaries/non-beneficiaries can be ordered along a quantifiable dimension
  – This dimension can be used to compute a well-defined index or parameter
  – The index/parameter has a cut-off point for eligibility
  – The index value is what drives the assignment of a potential beneficiary to treatment (or to non-treatment)

  20. Intuition
  • The potential beneficiaries (units) just above the cut-off point are very similar to the potential beneficiaries just below the cut-off point
  • We compare outcomes for units just above and below the cut-off point

  21. Indexes are common in the targeting of social programs
  • Anti-poverty programs → targeted to households below a given poverty index
  • Pension programs → targeted to the population above a certain age
  • Scholarships → targeted to students with high scores on a standardized test
  • CDD programs → awarded to NGOs that achieve the highest scores

  22. Intuition: a hypothetical example
  • Method:
  – Construct a poverty index from 1 to 100 using pre-intervention characteristics
  – Households with a score <= 50 are poor; households with a score > 50 are non-poor
  • Implementation: cash transfer to poor households
  • Evaluation: measure outcomes (e.g., consumption, school attendance rates) before and after the transfer, comparing households just above and below the cut-off point
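The hypothetical example above can be sketched as a simulation: outcomes rise smoothly with the poverty index, the transfer adds a jump below the cutoff, and comparing households in a narrow band around the cutoff recovers that jump (index scale, effect size, and bandwidth are all invented for illustration):

```python
import random
import statistics

random.seed(3)

TRUE_EFFECT = 5.0   # hypothetical effect of the cash transfer
CUTOFF = 50

households = []
for _ in range(50_000):
    score = random.uniform(1, 100)        # poverty index, 1-100
    poor = score <= CUTOFF                # eligibility rule: poor get the transfer
    # Outcome varies smoothly with the index, plus a jump for beneficiaries.
    outcome = 0.1 * score + TRUE_EFFECT * poor + random.gauss(0, 2)
    households.append((score, outcome))

# Compare households just below vs. just above the cutoff (bandwidth = 2).
# A narrow bandwidth keeps the bias from the smooth trend small.
below = [y for s, y in households if CUTOFF - 2 < s <= CUTOFF]
above = [y for s, y in households if CUTOFF < s < CUTOFF + 2]
rd_estimate = statistics.mean(below) - statistics.mean(above)
print(f"RD estimate: {rd_estimate:.2f}")  # close to the true 5.0
```

In practice one would fit a regression on each side of the cutoff rather than compare raw means, but the identifying idea is the same: only the treatment changes discontinuously at the threshold.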

  23. [Chart: outcome plotted against the poverty index; households at or below the cut-off are poor, those above it are non-poor.]

  24. [Chart: the jump in the outcome at the cut-off point is the treatment effect.]

  25. Regression discontinuity design: example
  • “The Effects of User Fee Reductions on School Enrollment” (Barrera, Linden and Urquiola, 2006)

  26. The program
  • Each year the government issues a resolution stipulating which items schools may charge for, as well as aspects like the maximum fee they can set for each
  • These expenses are equivalent to between 7 and 29 dollars per month, which in turn represent between 6 and 25 percent of the minimum wage
  • The Gratuidad program reduces some of these fees
  • The program is targeted using the Sisben index, which identifies the most vulnerable households in Colombia
  • Two points of discontinuity: between Sisben 1 and 2, and between Sisben 2 and 3

  27. Results

  28. Difference in differences
  • Estimating the impact of a “past” program
  • We can try to find a “natural experiment” that allows us to identify the impact of a policy
  • For example, an unexpected change in policy can be treated as a natural experiment
  • For example, a policy that affects 16-year-olds but not 15-year-olds
  • Even in natural experiments, we need to identify which group is affected by the policy change (“treatment”) and which group is not (“control”)
  • The quality of the control group determines the quality of the evaluation

  29. Intuition
  • Find a group that did not receive the program…
  • …with the same pattern of growth in the outcome variable before the intervention
  • The two groups, treatment and control, have the same profile before the intervention

  30. Intuition: when is it right to use DD? [Chart: treatment and control outcomes at t = -1, t = 0 and t = 1, intervention at t = 0. The slopes before the intervention (between t = -1 and t = 0) were equal, so the extra post-intervention growth of the treatment group is the impact of the program.]

  31. Intuition: when is it right to use DD? [Chart: same setup, but the slopes before the intervention (between t = -1 and t = 0) were different. Is the gap the impact of the program? NO: wrong comparison group.]
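The DD estimator itself is just two subtractions: the before/after change for the treatment group minus the before/after change for the control group. A minimal sketch with hypothetical numbers, where both groups share a common time trend (the parallel-trends assumption the charts above illustrate):

```python
import random
import statistics

random.seed(4)

TRUE_EFFECT = 1.5   # hypothetical program effect

def group_outcomes(n, base, trend, treated, period):
    # Outcome = group level + common time trend + effect if treated post-program.
    return [base + trend * period
            + TRUE_EFFECT * (treated and period == 1)
            + random.gauss(0, 1)
            for _ in range(n)]

n = 5_000
treat_before   = group_outcomes(n, base=10, trend=2, treated=True,  period=0)
treat_after    = group_outcomes(n, base=10, trend=2, treated=True,  period=1)
control_before = group_outcomes(n, base=7,  trend=2, treated=False, period=0)
control_after  = group_outcomes(n, base=7,  trend=2, treated=False, period=1)

# Difference in differences: the common trend cancels out.
dd = ((statistics.mean(treat_after) - statistics.mean(treat_before))
      - (statistics.mean(control_after) - statistics.mean(control_before)))
print(f"DD estimate: {dd:.2f}")  # close to the true 1.5
```

Note that the groups start at different levels (10 vs 7); DD tolerates level differences, but not different pre-intervention trends.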

  32. Example of difference in differences
  • “Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment” (E. Duflo), American Economic Review, September 2001

  33. Program description
  • 1973-1978: the Indonesian government built 61,000 schools (equivalent to one school per 500 children between 5 and 14 years old)
  • The enrollment rate increased from 69% to 85% between 1973 and 1978
  • The number of schools built in each region depended on the number of children out of school in that region in 1972, before the start of the program

  34. Identification of the treatment effect
  • There are two sources of variation in the intensity of the program for a given individual:
  – By region: there is variation in the number of schools received in each region
  – By age: children who were older than 12 in 1972 did not benefit from the program. The younger a child was in 1972, the more she benefited from the program, because she spent more time in the new schools

  35. Results

  36. Propensity score and matching estimation
  • Take an ideal comparison group from a large survey
  • Each treated individual is matched to a comparison individual whose observable characteristics are as similar as possible
  • This method assumes that there is no self-selection based on unobservable characteristics:
  – Selection is based on observable characteristics only

  37. How is this procedure done? Two steps
  • Estimation of the propensity score: the propensity score is the conditional probability of receiving the treatment given the pre-treatment variables
  – Estimate the probability of being treated based on the observable characteristics
  • Estimation of the average effect of treatment given the propensity score:
  – Match cases and controls with (approximately) the same estimated propensity score
  – Compute the effect of treatment for each value of the (estimated) propensity score
  – Obtain the average of these conditional effects
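The two steps above can be sketched in a toy simulation. For simplicity this sketch uses the true propensity score (a logistic function of one observable covariate) instead of estimating it, and matches each treated unit to its nearest-neighbor control on that score; all numbers are hypothetical:

```python
import math
import random

random.seed(5)

TRUE_EFFECT = 2.0   # hypothetical program effect

# Key assumption: selection depends only on the OBSERVABLE covariate x.
people = []
for _ in range(5_000):
    x = random.gauss(0, 1)
    score = 1 / (1 + math.exp(-x))        # propensity score: take-up rises with x
    treated = random.random() < score
    outcome = x + TRUE_EFFECT * treated + random.gauss(0, 1)
    people.append({"treated": treated, "outcome": outcome, "score": score})

cases = [p for p in people if p["treated"]]
controls = [p for p in people if not p["treated"]]

# Step 2: nearest-neighbor matching on the propensity score,
# then average the case-minus-match differences.
effects = []
for t in cases:
    match = min(controls, key=lambda c: abs(c["score"] - t["score"]))
    effects.append(t["outcome"] - match["outcome"])
avg_effect = sum(effects) / len(effects)
print(f"matched estimate: {avg_effect:.2f}")  # close to the true 2.0
```

A naive mean comparison here would be biased upward (treated units have higher x); matching on the score removes that bias precisely because all of the selection runs through the observable x. In real applications the score is estimated, typically with a logit or probit, and matching uses calipers or multiple neighbors.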

  38. Data requirements
  • Randomization: ideally, baseline and follow-up data constructed simultaneously with the implementation of the program
  • RD: data on the assignment variable; often already-available data
  • IV: the key requirement is a valid instrumental variable
  • DD: data over time, before and after the program; usually from secondary sources
  • Propensity score matching: detailed data at baseline and follow-up
  • Data requirements increase with the complexity of the design: quasi-experiments (IV, DD and propensity score matching) need large amounts of high-quality data

  39. When do we apply each method?
  • Ideally, randomization is the first best
  – Individual/geographic randomization: when the program is a pilot and not universal
  – Phase-in randomization: when the program is universal and implementation is done in steps
  • RD: when the program is targeted using an index
  • DD and matching: if everything else fails and there is a good amount of information; the program is not universal
