Lab Experiments for Measurement in Program Evaluation

Michael J. Gilligan, New York University

The Task • Government/NGO/CBO programs wish to change participants’ attitudes and beliefs in particular ways • Typically these programs coach participants in the ‘right’ set of attitudes and beliefs • Examples • Pro-social behaviors: contributions to public goods, trust, tolerance, non-violence and so on • Attitudes and behaviors toward marginalized groups: women, minorities, particular ethnic groups • These programs would like to be able to measure whether their efforts have been successful

The Problem • Randomized controlled trials (RCTs) are essential for making causal statements about the effects of a program • But RCTs are not a solution to the measurement problem; indeed, they are a hindrance to it • In an RCT, programmers operate only with ‘treated’ populations, so only treated populations receive coaching on the ‘right’ responses • The very thing that ensures unbiasedness with respect to subject pools (balance) introduces bias in measurement
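The argument on this slide can be illustrated with a small simulation. This is a minimal sketch under assumed numbers, not an estimate from any study: `true_effect`, `coaching_shift` and the noise level are hypothetical, and the survey response is assumed to equal the respondent's actual attitude plus a coaching shift that only treated respondents receive.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=0.0, coaching_shift=0.5, seed=0):
    """Compare a survey-based and an attitude-based treatment effect estimate.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    attitude = {0: [], 1: []}
    survey = {0: [], 1: []}
    for _ in range(n):
        treated = rng.randint(0, 1)  # random assignment gives balanced arms
        latent = rng.gauss(0.0, 1.0) + true_effect * treated
        # Assumption: only treated respondents were coached on the 'right' answer.
        reported = latent + coaching_shift * treated
        attitude[treated].append(latent)
        survey[treated].append(reported)
    return {
        "attitude_diff": mean(attitude[1]) - mean(attitude[0]),
        "survey_diff": mean(survey[1]) - mean(survey[0]),
    }


result = simulate()
```

Under these assumptions the survey difference exceeds the attitude difference by the coaching shift, no matter how well balanced the two arms are.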

Social Capital & Pro-social Attitudes

Definition • [S]ocial networks and the norms of reciprocity and trustworthiness that arise from them. … [S]ocial capital is closely related to ... “civic virtue.” The difference is … civic virtue is more powerful when embedded in a dense network of reciprocal social relations. A society of many virtuous but isolated individuals is not necessarily rich in social capital (Putnam 2000).

We are interested in measuring: • Altruism • Trust • Trustworthiness • Willingness to contribute to public goods • The social networks that (purportedly) support these behaviors

Implications for Development • Trust: crucial for cost-effective self-enforcement of contracts • Compliance with social norms: non-violence, compromise, fairness • Contributions to public goods: essential for economic efficiency • Respect for legitimate sources of authority

A Few Findings (among many) • Putnam (1993) shows that local governments in Italy are more efficient where there is greater civic engagement. • Knack and Keefer (1997) demonstrate that increases in country-level trust lead to large increases in the country’s economic growth. • La Porta et al. (1997) establish a strong positive link between trust and judicial efficiency and a strong negative link between trust and corruption.

Implications • The World Bank and other international actors have many programs to foster social capital and pro-sociality • Community-based DDR • Community-driven development programs • A focus on local capacity in development efforts • “Local ownership” of development programs to foster sustainability

Measuring Social Capital and Social Norms • These are very difficult concepts to measure • In many cases they are not observed directly • Indicators differ greatly across different cultures • People are often unwilling to reveal behavior that is not pro-social

Traditional survey measures • “Generally speaking, would you say that most people can be trusted or that you can’t be too careful in dealing with people?” (World Values Survey) • “Would you be willing to contribute a day of free time to … ?” • “How difficult do you think it would be for your community to reach agreement on …?” • “In the last three months have you contributed time or money to a community-based organization?” • “Did you vote in the last election?”

Bias concerns with surveys • Programmers coach respondents in the ‘right’ answers to these types of questions • They do not operate in control communities at all, so control respondents may not even know the ‘right’ answers

Observational Measures • Number of people who voted in the last election • Number of people who show up to clean up a public park • Contributions to a community fund

These measures have great external (real-world) validity but … • Are we measuring social attitudes or leadership strength? • … or intimidation? • … or corruption? • Example: Voter turnout in the Soviet Union was routinely above 98 percent • ‘Good’ outcomes may be caused by the exact opposite of good institutions and pro-social attitudes

Structured Observational Measures • ‘Structured Community Activities’ (Casey, Glennerster and Miguel) • Funds collected in matching-grant scheme • Decision making over allocating salt or batteries • Allocation of tarpaulin • Tuungane Project, Congo (Humphreys, Sanchez de la Sierra and van der Windt 2013) • Participation in matching funds for a public good • Allocation of a $100 ‘windfall’ • Participation in a community meeting

Structured Observational Measures • Structured and therefore more comparable to each other • Have great external validity … • but we still cannot disentangle individual factors (attitudes) from community-wide factors (leadership, institutions)

Lab-in-the-Field Activities • Observing behavior in a controlled laboratory setting • All social pressures, political-institutional effects, etc. are removed by the design of the experiment • We observe only people’s responses to the incentives that we (the experimenters) offer them • We are able to disentangle attitudes from community-wide factors

Loss in External Validity • Community-wide factors (leadership, institutional efficiency) are excluded from the lab so we cannot obtain measures of them • Thus lab activities are best combined with the other measurement methods

Behavioral games • Three important games are: • Altruism game • Trust game • Public goods game • Our main interest is in the altruism, trust and public goods games, but we also need to conduct the other games to control for risk attitudes, patience and altruism

Game Instruction

Altruism Activity • Subjects were given a sum of money • Nepal: 40 NPR in 5 NPR notes • Sudan: 3 pounds in half-pound coins • Cambodia: 16,000 KHR in 4,000 KHR notes • Subjects decide how much they want to contribute to a local needy family • The identity of the family is not revealed
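Because the endowments come in different currencies and denominations, the set of possible gifts differs by site. The sketch below lists the feasible gifts implied by the endowments above and expresses a gift as a share of the endowment. Normalizing to a share is an assumption made here for illustration, not a step stated in the slides, and the function names are hypothetical.

```python
from fractions import Fraction

# (endowment, smallest unit) for each site, taken from the slide above.
ENDOWMENTS = {
    "Nepal": (Fraction(40), Fraction(5)),           # NPR
    "Sudan": (Fraction(3), Fraction(1, 2)),         # pounds
    "Cambodia": (Fraction(16000), Fraction(4000)),  # KHR
}


def feasible_gifts(site):
    """Return every gift a subject could make, from nothing to the full endowment."""
    endowment, unit = ENDOWMENTS[site]
    steps = int(endowment / unit)
    return [unit * k for k in range(steps + 1)]


def share_given(site, gift):
    """Express a gift as a fraction of the endowment (hypothetical normalization)."""
    endowment, unit = ENDOWMENTS[site]
    gift = Fraction(gift)
    if gift < 0 or gift > endowment or (gift / unit).denominator != 1:
        raise ValueError("gift must be a whole number of notes or coins")
    return gift / endowment
```

For example, `share_given("Nepal", 20)` and `share_given("Sudan", Fraction(3, 2))` both correspond to giving half of the endowment.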

Trust/Trustworthiness Activity • Subjects are randomly assigned to one of two roles: sender or receiver (we use neutral names in the field) • Both types are given an initial endowment of money • Senders decide how much of their endowment to send to the receiver • We triple that amount and give it to the receiver • The receiver decides how much of this total to return to the sender • All players and types are anonymous • Nash: send zero, return zero • Social optimum: send full endowment, return whatever is necessary to support trusting behavior
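The payoff structure of the trust activity can be sketched as follows. This is a minimal illustration, assuming that sender and receiver get the same endowment and that the receiver can return at most the tripled transfer; the slides do not specify either detail, and the endowment of 10 used in the example is hypothetical.

```python
MULTIPLIER = 3  # the amount sent is tripled before it reaches the receiver


def trust_game_payoffs(endowment, sent, returned):
    """Return (sender_payoff, receiver_payoff) for one sender-receiver pair.

    Assumes equal endowments and that only the tripled transfer can be returned.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sent must be between 0 and the endowment")
    tripled = MULTIPLIER * sent
    if not 0 <= returned <= tripled:
        raise ValueError("returned must be between 0 and the tripled transfer")
    sender_payoff = endowment - sent + returned
    receiver_payoff = endowment + tripled - returned
    return sender_payoff, receiver_payoff


nash = trust_game_payoffs(10, sent=0, returned=0)
trusting = trust_game_payoffs(10, sent=10, returned=15)
```

With these numbers the pair earns 20 in total when nothing is sent and 40 when the full endowment is sent, which is why sending everything is the social optimum even though a self-interested receiver would return nothing.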

Public Goods Game • All subjects play simultaneously • Each player is given two cards, one with an “X” and one blank • For each “X” card turned in during the first round, all players receive an amount of money, say 4 NPR • Turning in an “X” card in the second round earns the player who turned it in a larger amount, say 20 NPR
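The incentives in this game can be sketched in a few lines. The sketch assumes that each player turns in the “X” card in exactly one of the two rounds (and the blank card in the other), and it uses the illustrative values of 4 NPR and 20 NPR from the slide; the group size of eight in the example is hypothetical.

```python
def public_goods_payoffs(contributed, public_value=4, private_value=20):
    """Return each player's payoff.

    contributed[i] is True if player i turned in the "X" card in the first
    round (the public contribution) and False if the player kept it for the
    second round (the private payoff).
    """
    n_public = sum(contributed)
    return [
        public_value * n_public + (0 if gave else private_value)
        for gave in contributed
    ]


all_give = public_goods_payoffs([True] * 8)
none_give = public_goods_payoffs([False] * 8)
one_free_rider = public_goods_payoffs([True] * 7 + [False])
```

Keeping the card is always individually better (20 NPR instead of the 4 NPR a player gets back from their own contribution), yet with these values everyone is better off under full contribution whenever the group has more than five players.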

Attitudes Toward Marginalized Groups

Examples • Many programs are interested in improving the status of marginalized groups, especially women • Governments/NGOs/CBOs are often interested in easing (often violent) ethnic rivalries, especially in post-conflict settings

Same Problem • In an RCT, programmers operate only with ‘treated’ populations, so only treated populations receive coaching on the ‘right’ responses • The very thing that ensures unbiasedness with respect to subject pools (balance) introduces bias in measurement

A Variety of Options • Standard games (altruism, trust, public goods, etc.) can be used to measure attitudes toward ‘out-groups’ • Bracic 2013: attitudes toward Roma in the former Yugoslavia • Observing deliberation, cooperation and teamwork among mixed groups • Karpowitz and Mendelberg 2014: deliberation in mixed groups of men and women

Observing group behavior • Bales Interaction Process Analysis • Participants are given a task that requires a group decision or cooperation • Record interactions according to a specific set of criteria to code whatever the researcher is interested in measuring (respect, hostility, etc.) • The trick • Not cuing participants that this is a study of in-group out-group interaction • Incentivizing participants to act according to beliefs about the out-group

Example: Attitudes toward Gender and Ethnicity in the Liberian National Police (LNP) • The government of Liberia adopted an explicit 30% quota for women in the LNP • We did NOT conduct an RCT, but we were interested in testing some of the assumptions of the gender program

Program proponents claimed that more women would produce a variety of benefits • More consensual decision making • Greater sensitivity to gendered crimes • But decades of social psychology findings suggest that women would not participate fully in group deliberations

The program had been underway for several years, so officers knew the attitudes toward female officers that they were supposed to have • Thus a survey would not have been a convincing measurement strategy • We had groups of six officers complete team tasks and randomized the number of female officers in each group • We observed team members to see • if men reacted differently in groups with more women • if groups with more women deliberated more consensually and were more likely to see crime as gendered
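Randomizing group composition could look like the sketch below. It is a hypothetical procedure, not the one used in the study: it assumes six officers per group, draws the number of women in each group uniformly from the compositions still feasible, and fills each group from shuffled pools. The function name, the seed and the officer identifiers are all illustrative.

```python
import random


def assign_groups(men, women, group_size=6, seed=0):
    """Form groups with a randomly drawn number of women in each group."""
    rng = random.Random(seed)
    men, women = list(men), list(women)
    rng.shuffle(men)
    rng.shuffle(women)
    groups = []
    while True:
        # Compositions that the remaining pools can still supply.
        feasible = [
            k for k in range(group_size + 1)
            if k <= len(women) and group_size - k <= len(men)
        ]
        if not feasible:
            break  # fewer than group_size officers remain
        n_women = rng.choice(feasible)
        members = [women.pop() for _ in range(n_women)]
        members += [men.pop() for _ in range(group_size - n_women)]
        groups.append({"n_women": n_women, "members": members})
    return groups
```

Because the number of women is drawn at random rather than chosen by the officers, differences in behavior across groups can be attributed to composition rather than to who selects into mixed teams.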

Findings • Female officers were not, in general, more likely to see a gendered crime, but more competent women were • Groups with more women members were not more likely to see a gendered crime • Groups with more women were not more consensual • Backlash effect: Men in majority-female groups were significantly more aggressive

Conclusion • Programming by its very nature coaches beneficiaries in giving the types of survey responses the program would like to hear • Randomization exacerbates this problem • Behavioral measures are appealing but: • Measures with high external validity can make it hard to disentangle mechanisms at the individual and community level • Fine-tuning individual incentives correctly can get at attitudes even when subjects are cued to the ‘right’ answer: monetary reward will induce people to act on actually held beliefs rather than the ‘socially correct’ ones • Lab-in-the-field activities address both of these issues and provide an important tool for measuring the social effects of programs, at some loss of external validity
