Program Evaluation: Current Status and Future Directions

Huey T. Chen, PhD
Professor, Department of Public Health
Director, Center for Evaluation and Applied Research
Mercer University, Atlanta, Georgia, USA
E-mail: chen_h@mercer.edu

Overview • Current status of program evaluation • Campbellian validity typology • Program theory and theory-driven evaluation • Integrated evaluation perspective • Chinese philosophy, politics, and evaluation

Evaluation Examples in Ancient China • God of Agriculture • Confucius

What is Program Evaluation? Systematically gathering empirical information about an intervention program, answering what, who, how, and why questions, to assess the program’s planning, implementation, and/or outcomes in order to serve stakeholders’ accountability and/or program improvement purposes. Questions? Intervention programs: Education, public administration, public health, criminal justice, welfare, job safety, air and water pollution, and so on. Stakeholders:

Example of an Intervention Program: An NGO-Based HIV Prevention Program in South China
Background: HIV prevalence was high in southern border provinces. Injection drug users lacked knowledge and skills in preventing HIV and did not know their HIV status.
Intervention: HIV counseling and testing
Target population: Injection drug users (IDUs)
Implementing organization: Wuzhou Women’s Foundation, 2006-2007
Goals: Reducing IDUs’ needle/syringe sharing and high-risk sexual behaviors

Major Issues in Outcome Evaluation Campbellian validity typology Internal validity: Does an intervention affect the outcomes? External validity: Are the effects generalizable? Trade-off between internal and external validity Prime priority of internal validity in research or evaluation Threats to internal validity: Contaminations, rival hypotheses, confounded factors

Threats to Internal Validity in Research Designs Pre-experimental designs
One-group pretest and posttest design: O1 X O2
Threats to internal validity • Maturation • Alternative historical events • Regression toward the mean (participants selected for extreme scores) • Testing • Instrumentation • Attrition
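Regression toward the mean can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions (standard normal "true levels" and measurement noise, and a hypothetical enrolment cutoff); none of these numbers come from the slides. No intervention is applied, yet the group selected for extreme pretest scores shows a lower posttest mean.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, cutoff=1.0, seed=0):
    """Return (pretest mean, posttest mean) for people selected on extreme pretest scores.

    Each observed score is a stable true level plus independent measurement
    noise. No intervention (X) is applied between O1 and O2.
    """
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    pre_scores, post_scores = [], []
    for _ in range(n):
        true_level = rng.gauss(0, 1)
        pre = true_level + rng.gauss(0, 1)   # O1
        post = true_level + rng.gauss(0, 1)  # O2, same true level, new noise
        if pre > cutoff:  # hypothetical rule: enrol only extreme pretest scorers
            pre_scores.append(pre)
            post_scores.append(post)
    return statistics.mean(pre_scores), statistics.mean(post_scores)


pre_mean, post_mean = simulate_regression_to_mean()
print(f"Pretest mean of selected group:  {pre_mean:.2f}")
print(f"Posttest mean of selected group: {post_mean:.2f}")
```

In a one-group O1 X O2 design, a drop like this could be mistaken for an intervention effect, which is why the threat matters when participants enter a program because of extreme scores.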

Threats to Internal Validity in Research Designs Quasi-Experimental Designs:
• Nonequivalent comparison group design
  Intervention G: O1 X O2
  - - - - - - - - - - - -
  Comparison G:   O1   O2
  *Threats to internal validity: self-selection bias
• Interrupted time-series design
  O O O O O O X O O O O O O
  *Threats to internal validity: alternative historical events
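One common way to analyze a nonequivalent comparison group design is a difference-in-differences estimate: the pre-to-post change in the intervention group minus the change in the comparison group. The slides name only the design, so this choice of estimator is an assumption, and the scores below are hypothetical.

```python
from statistics import mean


def difference_in_differences(interv_pre, interv_post, comp_pre, comp_post):
    """Change in the intervention group minus change in the comparison group."""
    interv_change = mean(interv_post) - mean(interv_pre)  # O2 - O1, intervention G
    comp_change = mean(comp_post) - mean(comp_pre)        # O2 - O1, comparison G
    return interv_change - comp_change


# Hypothetical outcome scores, for illustration only.
interv_pre = [10, 12, 11, 13]
interv_post = [15, 17, 16, 18]
comp_pre = [9, 11, 10, 12]
comp_post = [11, 13, 12, 14]

did = difference_in_differences(interv_pre, interv_post, comp_pre, comp_post)
print(f"Difference-in-differences estimate: {did}")
```

Subtracting the comparison group's change removes influences shared by both groups, such as maturation, but it does not remove self-selection bias if the groups would have changed differently without the intervention.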

Threats to Internal Validity in Research Designs Randomized controlled trials (RCTs)
  R Intervention G: O1 X O2
  R Control G:      O1   O2
R: Random assignment O: Observation X: Intervention
* Maximizing internal validity
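The "R" step can be sketched as a simple shuffle-and-split. This is a minimal illustration, assuming equal-sized groups and hypothetical participant IDs; real trials often use blocked or stratified randomization, which is not shown here.

```python
import random


def randomly_assign(participant_ids, seed=None):
    """Shuffle participants and split them into intervention and control groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible and auditable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs 1..20.
intervention_group, control_group = randomly_assign(range(1, 21), seed=42)
print("Intervention G:", sorted(intervention_group))
print("Control G:     ", sorted(control_group))
```

Because chance alone decides who receives X, self-selection cannot systematically differ between the two groups, which is the basis for the design's strong internal validity.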

Campbellian Validity Typology
Rigor in research designs • Best designs: RCTs or other experiments (pure effects) • Next to the best: Quasi-experiments (some contaminations) • Weak designs: Pre-experiments (many contaminations)
Evidence-based interventions: Interventions supported by evidence from RCTs.
The popularity of the typology • The typology has been introduced in every evaluation textbook.

Development of Program Theory and Theory-Driven Evaluation
Background: Black-box evaluation
Intervention → Outcome
Method-driven evaluation: No conceptual framework is needed in doing evaluation
Limitations • Provides no information on how and why • Provides no insightful information for understanding or improving a program

Program Theory and Theory-Driven Evaluation Program theory (Chen, 2005): A set of stakeholders’ implicit and explicit assumptions about what actions are required to solve a problem and why the problem will respond to those actions. Theory-driven evaluation: Using program theory as a conceptual framework to guide evaluation design and practice

Illustrating the Importance of Program Theory In order to encourage police officers to actively patrol streets and fight crime, the police chief of a major city announced a new policy using mileage shown on odometers to measure police officers’ performance.
His change model: New policy → Increased patrols (as measured by odometers) → Reduced crime

Program Theory (diagram)
Action model: Implementing organizations; Implementers; Associate organizations and community partners; Ecological context; Intervention and service delivery protocols; Target populations
Change model: Intervention → Determinants → Outcomes

Example of Assessing a Change Model Petrified Forest National Park’s Preservation Program
Placed signs to make tourists aware of efforts to keep the Petrified Forest intact: “Your heritage is being vandalized every day by theft losses of petrified wood of 14 tons a year, mostly a small piece at a time”
Signs → Increased awareness of the widespread problem → Enhanced tourists’ morality for protecting heritage → Reduced theft of petrified wood

Example of Assessing an Action Model: Evaluating a School-Based Anti-Drug Abuse Program in Taiwan • Drug abuse among middle school students had worsened. • The Ministry of Education launched a national anti-drug abuse program to deal with the problem. • Teachers were trained to identify students abusing drugs and to provide counseling. • Schools were required to file monthly reports to the ministry on the number of students actively abusing drugs.

Figure: Number of active cases reported over years 1 to 3. Data points shown, in order: 3850, 1625, 501, 440, 374, 260, 353, 55.

Theory-Driven Process Evaluation: Application Procedures • Conducted working group meetings with key officials at the Ministry of Education to develop an action model • Conducted working group meetings with representatives of teachers to develop their version of the action model • Combined both groups (key officials, teachers) to create a new version for feedback • Used mixed methods (site visits, survey, participant observation, focus group meetings, interviews, record checking) to collect implementation data

Action Model as Planned vs. as Implemented

Action Model (cont)

Developments of the Integrated Evaluation Perspective Background: Growing recognition of the Campbellian validity typology’s limitations in the context of evaluation • The majority of evaluators are not able to apply RCTs in their evaluations. (Implications?) • Stakeholders feel evidence-based interventions are not useful to them. (Implications?)

Are threats to internal validity a “curse” or a “blessing” for evaluation? A curse: If evaluators continue to follow the Campbellian validity typology, the threats are a curse. A blessing? Could evaluators use the threats to their advantage?

Lessons Learned from a Zumba Class Zumba classes for weight loss • 36% of American adults and 33% of children are obese. • Zumba classes as an intervention for weight loss: • An hour-long intensive exercise • Zumba is a Latin-inspired dance-fitness program that blends international music and dance styles such as salsa, Bollywood, hip-hop, belly dance, etc.

Lessons Learned from a Zumba Class Threats to internal validity are essential to the instructor and me Threats to Internal validity • Self-selection bias • Testing • Instrumentation • Maturation • Alternative historical events Intended and unintended outcomes

Paradox on Threats to Internal Validity Campbellian validity typology: Evaluation must rule all of them out in order to have credible evidence. Practitioners and clients: The more threats to internal validity are ruled out in an evaluation, the less relevant or useful the evaluation is to them. A new evaluation perspective is needed to reconcile the differences.

Myths in Program Evaluation Myth: If an intervention has controlled effects, it will have real-world effects. Myth: The priority of evaluation is to assess an intervention’s controlled effects. Myth: The more threats to internal validity are ruled out, the better the quality of the evaluation.

Integrated Evaluation Perspective Distinction between controlled effects and real-world effects
• Controlled effects: Effects found through manipulating and controlling research conditions
  - They are pure effects, useful for establishing a causal relationship between an intervention and an outcome.
  - But the effects may be artificial and irrelevant to real life.
• Real-world effects: Effects found in the real world through comparison with a previous situation or a comparison group
  - The effects are likely to have some contaminations.
  - But they are real-world effects.

Limitations of Controlled Effects National Cooperative Inner-City Asthma Study (Controlled effects) • Trained master’s level social workers to provide asthma counseling to families ……. • The efficacy study was evaluated by an RCT characterized by highly committed counselors, motivated participants, monetary and childcare incentives, regular counseling hours, food/refreshments, etc. • The RCT indicated the intervention was efficacious in reducing asthma morbidity among inner-city children (Evans et al. 1999)

Implementation Difficulties (continued) Inner-city Asthma Intervention: A Real-World Implementation of the Social Worker Efficacy Study (Spiegel et al. 2006): • Unable to provide monetary and childcare incentives to families • Unable to contact and meet with families frequently • Difficult to hold sessions in regular hours • Unable to provide food/refreshments to create an enjoyable atmosphere • Difficult to retain social workers • Only 25% of children completed the entire intervention

Integrated Evaluation Perspective • Indicates that “real-world effects” are as important as “controlled effects” • Proposes viability evaluation to assess real-world effects to meet stakeholders’ needs • Proposes the bottom-up approach to systematically evaluate an intervention

Top-Down Approach: Efficacy evaluation → Effectiveness evaluation → Dissemination
Bottom-Up Approach: Viability evaluation → Effectiveness evaluation → Efficacy evaluation → Dissemination
(In the original diagram, solid arrows indicate sequence and dashed arrows indicate feedback.)

Viability Evaluation: A New Type of Real-world Evaluation Assessing an intervention’s: • Practicality • Suitability • Affordability • Evaluability • Helpfulness: Real-world effects (quantitative and qualitative evidence)

Viability Evaluation: A New Type of Real-world Evaluation • Focus: Is the intervention viable in the real world? • Evaluation questions: Is the intervention practical, suitable, affordable, evaluable, and helpful? • Research design: Mixed methods
Qualitative: Using interviews and/or focus groups to understand stakeholders’ experiences with the intervention
Quantitative: One-group pretest and posttest design for assessing helpfulness
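The quantitative part of a viability evaluation can be summarized very simply. The sketch below assumes paired pretest and posttest counts of a risk behavior per participant; the function name and the data are hypothetical, not taken from any program discussed in these slides.

```python
from statistics import mean


def pretest_posttest_change(pre, post):
    """Return the mean change (post - pre) and the share of participants who decreased."""
    if len(pre) != len(post):
        raise ValueError("pre and post need one score per participant")
    changes = [after - before for before, after in zip(pre, post)]
    share_reduced = sum(1 for change in changes if change < 0) / len(changes)
    return mean(changes), share_reduced


# Hypothetical risk-behavior counts for five participants.
pre_counts = [4, 3, 5, 2, 4]
post_counts = [2, 3, 3, 1, 2]

mean_change, share_reduced = pretest_posttest_change(pre_counts, post_counts)
print(f"Mean change: {mean_change:.1f}")
print(f"Share of participants who reduced the behavior: {share_reduced:.0%}")
```

A summary like this speaks to helpfulness in the real world rather than to a pure causal effect, so it is best read together with the qualitative evidence from interviews or focus groups.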

Example of Viability Evaluation Wuzhou HIV prevention program
Suitability: Women’s Federation was able to manage the project well. They recruited 226 IDUs in four months.
Practicality: WF staff were capable of delivering the services well, but the protocol required slight modifications.
Affordability: HIV testing kits
Evaluability: Procedures and outcomes were well established and measurable.

Viability Evaluation (continued)
Helpfulness: Pretest-posttest data showed that IDUs substantially increased condom use and reduced needle/syringe sharing. Qualitative data showed that clients were satisfied with the services and reported reduced HIV risk.
Evaluation findings: • Viability: High • Transferability: Strong leadership requirement

Chinese Philosophy, Politics, and Evaluation in Chinese Society • Chinese philosophy, moral courage, and intervention • Moral courage and evaluation • What kinds of political contexts or moral courage favor or disfavor conducting evaluation?

Conclusions • The state of the art of program evaluation: It has been growing steadily, but is beginning a paradigm shift. • Theory-driven evaluation and the integrated evaluation perspective will play a key role in the new development. • Program evaluation is useful for China’s efforts in further improving people’s well-being.

Questions and Answers
