IDEV 624 – Monitoring and Evaluation


Presentation Transcript


  1. IDEV 624 – Monitoring and Evaluation Evaluating Program Impact Elke de Buhr, PhD Payson Center for International Development Tulane University

  2. Process vs. Outcome/Impact Monitoring [overview diagram: process monitoring vs. outcome/impact monitoring and evaluation; LFM; USAID Results Framework]

  3. A Public Health Questions Approach to HIV/AIDS M&E
  Problem Identification
  • What is the problem? → Situation Analysis & Surveillance
  • What are the contributing factors? → Determinants Research
  Understanding Potential Responses
  • What interventions can work (efficacy & effectiveness)? → Efficacy & Effectiveness Studies, Formative & Summative Evaluation, Research Synthesis
  Monitoring & Evaluating National Programs
  • INPUTS: What interventions and resources are needed? Are we doing the right things? → Needs, Resource, Response Analysis & Input Monitoring
  • ACTIVITIES: What are we doing? Are we doing it right? → Process Monitoring & Evaluation, Quality Assessments
  • OUTPUTS: Are we implementing the program as planned? → Outputs Monitoring
  Determining Collective Effectiveness
  • OUTCOMES: Are interventions working/making a difference? → Outcome Evaluation Studies
  • OUTCOMES & IMPACTS: Are collective efforts being implemented on a large enough scale to impact the epidemic (coverage; impact)? → Surveys & Surveillance
  (UNAIDS 2008)

  4. Strategic Planning for M&E: Setting Realistic Expectations
  Levels of monitoring & evaluation effort, by number of projects:
  • All projects: input/output monitoring
  • Most: process evaluation
  • Some: outcome monitoring / evaluation
  • Few*: impact monitoring / evaluation
  *Disease impact monitoring is synonymous with disease surveillance and should be part of all national-level efforts, but cannot be easily linked to specific projects

  5. Monitoring Strategy • Process → Activities • Outcome/Impact → Goals and Objectives

  6. Impact Evaluation

  7. Impact Evaluation • Impact evaluations are undertaken to find out whether a program has accomplished its intended effects • Directed at the net effects of an intervention, impact evaluations produce “an estimate of the impact of the intervention uncontaminated by the influence of other processes and events that also may affect the behavior or conditions at which the social program being evaluated is directed” (Rossi/Freeman 1989: 229) • Ideally, impact assessments establish causality by means of a randomized experiment

  8. Outcome vs. Impact • Outcome level: Status of an outcome at some point in time • Outcome change: Difference between outcome levels at different points in time • Impact/program effect: Proportion of an outcome change that can be attributed uniquely to a program, as opposed to the influence of some other factor (Rossi/Lipsey/Freeman 2004)

  9. Outcome vs. Impact (cont.) • Impact/program effect: the value added or net gain that would not have occurred without the program and the only part of the outcome for which the program can honestly take credit • Most demanding evaluation task • Time-consuming and costly
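
To make the distinction concrete, here is a minimal Python sketch with hypothetical figures (not from the slides): the observed outcome change overstates the program effect whenever other factors also move the outcome.

```python
# Hypothetical figures illustrating outcome level, outcome change,
# and impact/program effect.

baseline_outcome = 0.40   # outcome level at time 0 (e.g., a literacy rate)
endline_outcome  = 0.55   # outcome level at time 1

# Change observed over the same period in a comparable population
# without the program, i.e., the influence of other factors.
comparison_change = 0.05

outcome_change = endline_outcome - baseline_outcome    # 0.15
program_effect = outcome_change - comparison_change    # 0.10

print(f"Outcome change: {outcome_change:.2f}")
print(f"Program effect: {program_effect:.2f} (the net gain the program can take credit for)")
```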

  10. [Figure omitted] (Rossi/Lipsey/Freeman 2004: 207)

  11. Outline of an Impact Evaluation • Unit of analysis • Research question/hypothesis • Evaluation design • Sampling method • Impact indicators • Data analysis plan

  12. 1. Unit of Analysis

  13. Unit of Analysis • Unit of analysis: The units on which outcome measures are taken in an impact assessment and, correspondingly, the units on which data are available for analysis • The unit of analysis in impact assessments is determined by • the nature of the intervention and • the targets to which the intervention is directed • Can be individuals, households, neighborhoods, organizations, geographic areas, etc. (Rossi/Lipsey/Freeman 2004)

  14. What are your program’s units of analysis?

  15. 2. Research Question/Hypothesis

  16. Hypothesis • Hypothesis: Formal statement that predicts a relationship between one or more factors and the problem under study • The test leads to rejecting or failing to reject the null hypothesis • Null hypothesis = no relationship • Test: • Compare the same variable over time • Compare two or more groups
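
A minimal Python sketch of the second test strategy, comparing an outcome between two groups with an independent-samples t-test; the group names and scores are hypothetical, illustrative data only.

```python
# H0: the mean knowledge score is the same in both groups
# (no relationship between program participation and the outcome).
from scipy import stats

program_group    = [72, 68, 75, 80, 66, 71, 77, 74]  # hypothetical scores
comparison_group = [65, 70, 62, 68, 64, 66, 69, 63]

t_stat, p_value = stats.ttest_ind(program_group, comparison_group)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Reject the null hypothesis of no difference between groups.")
else:
    print("Fail to reject the null hypothesis.")
```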

  17. Can you formulate a null hypothesis for your program?

  18. 3. Evaluation Design

  19. Evaluation Designs • Evaluation strategies: • Comparisons over time • Comparison between groups • Research designs: • Pre-test/Post-test designs • Time series • Quasi-experiments • Randomized experiments

  20. Comparisons Over Time
  Pretest/post-test design: O1 X O2
  Longitudinal designs / time series: repeated observations before and after the intervention, e.g. O1 O2 O3 X O4 O5 O6 (variants may repeat the intervention: O1 X O2 X O3 X O4)

  21. Effect of Intervention? [Graph] (Fisher, A. A., and J. R. Foreit, Designing HIV/AIDS Intervention Studies: An Operations Handbook, Population Council, May 2002, p. 56)

  22. Effect of Intervention? [Graph] (Fisher and Foreit, p. 57)

  23. Effect of Intervention? [Graph] (Fisher and Foreit, p. 57)

  24. Effect of Intervention? [Graph] (Fisher and Foreit, p. 58)

  25. Comparisons Between Groups
  Quasi-experimental design:
    Experimental group: O1 X O2
    Comparison group:   O3   O4
  Experimental design (R = random assignment):
    R  Experimental group: O1 X O2
    R  Control group:      O3   O4

  26. Randomized Experiments • “Flagships of impact assessment” (Rossi/Lipsey/Freeman 2004: 262) • When conducted well, provide the most credible conclusions about program effects • Isolate the effects of the intervention being evaluated by ensuring that intervention and control groups are statistically equivalent except for the intervention received • In practice, it is sufficient if the groups, as aggregates, are comparable with regard to any characteristic relevant to the outcome

  27. Randomization • Randomization: Assignment of potential targets to intervention and control groups on the basis of chance, so that every unit in a target population has the same probability as any other of being selected for either group • Approximations of randomization: Acceptable if the groups being compared do not differ on any characteristic relevant to the intervention or the expected outcomes (→ quasi-experiments) (Rossi/Lipsey/Freeman 2004)
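
A minimal Python sketch of chance-based assignment, assuming a hypothetical list of 100 target households; shuffling and splitting gives every unit the same probability of ending up in either group.

```python
import random

# Hypothetical target population
units = [f"household_{i:03d}" for i in range(1, 101)]

random.seed(42)        # fixed seed so the assignment can be reproduced
random.shuffle(units)  # order is now determined by chance alone

mid = len(units) // 2
intervention_group = units[:mid]
control_group      = units[mid:]

print(len(intervention_group), len(control_group))  # 50 50
```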

  28. Feasible? • Randomized experiments are not feasible for all impact assessments • Results may be ambiguous if • the program is in the early stages of implementation • interventions change in ways experiments cannot easily capture • In addition, the method may • be perceived as unfair or unethical (requires withholding services from part of the target population) • be too resource-intensive (technical expertise, time, costs, etc.) • cause disruption in program procedures for delivering services and create an artificial situation

  29. Quasi-Experimental Designs • Often used when it is not feasible to randomly assign targets to intervention and control groups • Types of quasi-experimental designs: matched controls, statistical controls, reflexive controls, etc. • Threats to validity: Selection bias, secular trends, interfering events, maturation
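
As an illustration of the matched-controls strategy named above, a minimal Python sketch that pairs each treated unit with the closest remaining untreated unit on a single covariate; the income values are hypothetical, and real matching usually uses several covariates or propensity scores.

```python
import numpy as np

treated_income   = np.array([210, 340, 150, 480])       # program participants
untreated_income = np.array([190, 500, 320, 160, 260])  # candidate controls

matches = {}  # treated index -> matched control index
available = list(range(len(untreated_income)))
for i, x in enumerate(treated_income):
    # nearest remaining control on the covariate (matching without replacement)
    j = min(available, key=lambda k: abs(untreated_income[k] - x))
    matches[i] = j
    available.remove(j)

print(matches)  # {0: 0, 1: 2, 2: 3, 3: 1}
```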

  30. Threats to Validity

  31. Threats to Internal Validity
  • INTERNAL VALIDITY: Any changes observed in the dependent variable are due to the effect of the independent variable, not to some other independent variables (extraneous variables, alternative explanations, rival hypotheses). Extraneous variables need to be controlled for in order to be sure that any results are due to the treatment; only then is the study internally valid.
  • Threat of History: Study participants may have had outside learning experiences that enhanced their knowledge of a topic, and thus score better when assessed after an intervention, independent of the impact of the intervention. (No control group)
  • Threat of Maturation: Study participants may have matured in their ability to understand concepts and developed learning skills over time, and thus score better when assessed after an intervention, independent of the impact of the intervention. (No control group)
  • Threat of Mortality: Study participants may drop out and not participate in all measures. Those who drop out are likely to differ from those who continue to participate. (No pretest)
  • Threat of Testing: Study participants might do better on the posttest than on the pretest simply because they take the same test a second time.
  • Threat of Instrumentation: The posttest may have been revised or otherwise modified compared to the pretest, so that the two tests are no longer comparable.
  • John Henry Effect: The control group may try extra hard after not becoming part of the “chosen” group (compensatory rivalry).
  • Resentful Demoralization of Control Group: The opposite of the John Henry effect. The control group may be demoralized and perform below normal after not becoming part of the “chosen” group.
  • Compensatory Equalization: The control group may feel disadvantaged for not being part of the “chosen” group and receive extra resources to keep everybody happy. This can cloud the effect of the intervention.
  • Statistical Regression: A threat to validity in cases in which the researcher selects extreme groups of study participants based on test scores. Because of the role chance plays in test scores, the scores of those who score at the bottom of the normal curve are likely to go up, and the scores of those at the top to go down, when they are assessed a second time.
  • Differential Selection: Experimental and control groups differ in their characteristics. This may influence the results.
  • Selection-Maturation Interaction: Combines the threats to validity described as differential selection and maturation. If experimental and control groups differ in important respects, for example age, differences in achievement might be due to this maturational characteristic rather than the treatment.
  • Experimental Treatment Diffusion: Close proximity of the treatment and control groups might result in treatment diffusion. This clouds the effect of the intervention.

  32. Threats to Validity Matrix

  33. Research Designs - Variations A. Simple Designs B. Cross-Sectional Studies C. Longitudinal Studies D. Experimental Designs

  34. A. Simple Designs
  • One-Shot Case Study: X O
  • One-Group Pretest-Posttest Design: O X O
  • Time Series Design: O O O O X O O O O
  R = random assignment of subjects to conditions; X = experimental treatment; O = observation of the dependent variable (pretest, posttest, interim measure, etc.)

  35. B. Cross-Sectional Studies: comparison of groups (Group 1, Group 2, Group 3) at one point in time. Variations: case-control study

  36. Case-Control Study: comparison of Group 1 (with characteristic) and Group 2 (without characteristic) with respect to prior event(s); one point in time. Major limitation: cannot be sure that the population has not changed since the event(s)

  37. C. Longitudinal Studies: comparison of a population over time; repeated measurements. Variations: panel study, cohort study

  38. Panel Study: measures change over time; repeated data collection from the same individuals. Major limitation: high drop-out rates pose a threat to internal validity

  39. Cohort Study: measures change over time; repeated data collection from the same cohort but different individuals. Major limitation: measures total change, but fluctuations within the cohort are not assessed

  40. D. Experimental Designs: compare group(s) exposed to a treatment (Group 1) with a group not exposed (Group 2); measures at two points in time (pre-test and post-test). Variations: true experimental design, quasi-experimental design

  41. True Experimental Design: research subjects from the target population are assigned randomly to treatment and control groups; compares group(s) exposed to the treatment with a group not exposed; measures at two points in time (pre-test and post-test). Major limitations: not feasible for all research; ethical problems

  42. True Experimental Designs • True experimental designs use control groups and random assignment of participants. Variations: • Pretest-Posttest Control Group Design • Posttest-Only Control Group Design • Single-Factor Multiple Treatment Designs • Solomon Four-Group Design • Factorial Design

  43. Pretest-Posttest Control Group Design • The randomly assigned experimental group receives the treatment, and the control group receives no treatment or an alternative treatment:
  R O X O
  R O   O
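
A minimal Python sketch of how data from this layout can be analyzed, using hypothetical scores: the difference between the two groups' pre-post changes estimates the treatment effect.

```python
import numpy as np

# R O X O : experimental group, pretest and posttest (hypothetical scores)
exp_pre  = np.array([50, 55, 48, 52])
exp_post = np.array([62, 66, 58, 64])

# R O   O : control group, pretest and posttest, no treatment
ctl_pre  = np.array([51, 54, 49, 50])
ctl_post = np.array([53, 57, 50, 54])

# Difference-in-differences of the group means
effect = (exp_post.mean() - exp_pre.mean()) - (ctl_post.mean() - ctl_pre.mean())
print(f"Estimated treatment effect: {effect:.2f}")  # 8.75
```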

  44. Posttest-Only Control Group Design • Like the previous design, but without a pretest:
  R X O
  R   O

  45. Single-Factor Multiple Treatment Designs • Extension of the Pretest-Posttest Control Group Design • The sample is assigned randomly to one of several conditions:
  R O X1 O
  R O X2 O
  R O    O

  46. Solomon Four-Group Design • Developed by researchers concerned about the effect of pretesting on the validity of the results:
  R O X O
  R O   O
  R   X O
  R     O

  47. Factorial Design • Allows the inclusion of more than one independent variable • Tests for the effects of different kinds of variables that might be expected to influence outcomes (gender, age, etc.)
  Two independent variables: A, B, A×B
  Three independent variables: A, B, C, A×B, A×C, B×C, A×B×C
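
A minimal Python sketch of reading a two-variable factorial design from hypothetical cell means: the main effects of A and B, plus their interaction A×B.

```python
import numpy as np

# Hypothetical mean outcomes; rows: A absent/present, columns: B absent/present
cell_means = np.array([[10.0, 14.0],
                       [13.0, 21.0]])

main_a = cell_means[1].mean() - cell_means[0].mean()        # main effect of A
main_b = cell_means[:, 1].mean() - cell_means[:, 0].mean()  # main effect of B
# Interaction: does the effect of B differ depending on the level of A?
interaction = (cell_means[1, 1] - cell_means[1, 0]) - (cell_means[0, 1] - cell_means[0, 0])

print(f"A: {main_a}, B: {main_b}, A x B: {interaction}")  # A: 5.0, B: 6.0, A x B: 4.0
```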

  48. Quasi-Experimental Design: groups are not assigned randomly (random assignment not possible); compares group(s) exposed to the treatment with a group not exposed; measures at two points in time (pre-test and post-test). Major limitations: not a true experiment; threats to validity (→ selection bias)

  49. Quasi-Experimental Designs • Quasi-experimental designs lack the random assignment of experimental designs. Variations:
  • Static-Group Comparison Design:
  X O
  -------------
    O
  • Nonequivalent Control Group Design:
  O X O
  -------------
  O   O

  50. Choosing an Evaluation Design
