Qualitative and Mixed Methods in Poverty Research and Evaluation Michael Woolcock Development Research Group World Bank Poverty & Inequality Analysis: Module 3 Washington, 4-5 February 2008
Overview 1. Session One: The Value and ‘Value-Added’ of Qualitative Approaches • Ten reasons to use qualitative approaches 2. Session Two: Overview of Qualitative Methods and Data • Focus Groups, Key Informant Interviews, PRA/RRA, Texts • Comparative Methods, Case Studies, Process Tracing • Qualitative methods and ‘causality’ 3. Session Three: The Evaluation Challenge Revisited • Letting questions drive choice of methods (not vice versa) • Distinguishing between methods and data • Types of integration: Parallel, Sequential, Iterative 4. Session Four: Applications of Qualitative and Mixed Methods • Small, Quick, Dirty, but Expedient: St. Lucia, Colombia • Country Poverty Assessments: Guatemala • Project Evaluation: Local conflict and KDP in Indonesia • Operational Research: Justice for the Poor
1. Ten Reasons to Use Qualitative Approaches in Projects and Evaluation • Understanding Political, Social Change • ‘Process’ often as important as ‘product’ • Modernization of rules, relations, meaning • Examining Dynamics (not just ‘Demographics’) of Group Membership • How are boundaries defined, determined? How are leaders determined? • Accessing Sensitive Issues and Stigmatized/Marginalized Groups • E.g., conflict and corruption; sex workers
1. Ten Reasons to Use Qualitative Approaches in Projects and Evaluation • Explaining Context Idiosyncrasies • Beyond “context matters” to understanding how and why, at different units of analysis • ‘Contexts’ not merely “out there” but “in here”; the Bank produces legible contexts • Unpacking Understandings of Concepts and (‘Fixed’) Categories • Surveys assume everyone understands questions and categories the same way; do they? • Qualitative methods can be used to correct and/or complement orthodox surveys
1. Ten Reasons to Use Qualitative Approaches in Projects and Evaluation • Facilitating Researcher-Respondent Interaction • Enhance two-way flow of information • Cross-checking; providing feedback • Exploring Alternative Approaches to Understanding ‘Causality’ • Econometrics: robustness tests on large N datasets; controlling for various contending factors • History: single/rare event processes • Anthropology: deep knowledge of contexts • Exploring inductive approaches • Cf. ER doctors, courtroom lawyers, solving jigsaws
1. Ten Reasons to Use Qualitative Approaches in Projects and Evaluation • Observing ‘Unobservables’ • Project impact is not just a function of easily measured factors; unobserved factors (such as motivation and political ties) are also important • Exploring Characteristics of ‘Outliers’ • Not necessarily ‘noise’ or ‘exceptional’; can be highly instructive (cf. illness informs health) • Resolving Apparent Anomalies • Nice when inter- and intra-method results align, but sometimes they don’t; who/which is ‘right’?
Don’t let strengths become weaknesses! • True in life, true in research… • Qualitative methods have a particular comparative advantage, but so do quantitative approaches • The art of research is knowing how to work within time, budgetary and human resource constraints to answer interesting and important questions, drawing on an optimal ‘package’ of available data and methods
2. Types of Qualitative Methods Micro level: • Ethnography • Focus group discussions • ‘Invented’ by Robert Merton (1910-2003) at Columbia • Solicit opinions from very diverse (politics) or very similar (marketing) groups in real time • Quick and dirty, but used extensively to make major decisions • Easily abused; requires skilled facilitator to be done well • In development, especially useful with illiterates • Key informant interviews • Accessing marginalized groups • Sex workers, victims of police brutality, the homeless • Often use ‘snow-ball’ sampling… • Learning from leaders (some more equal than others…) • Political, military, elders, opinion-shapers (“The Influentials”)
2. Qualitative Methods (cont.) • Various forms of participant observation • Pure observer (journalism) • Participant as observer (anthropology) • Observer as participant (‘go native’) • Pure participant (‘spy’) • Textual Analysis • Analysis of legal documents, media, films, literature, diaries, official reports, etc • ‘The Anti-Politics Machine’, James Ferguson, Tim Mitchell • History and politics of knowledge (Cooper & Packard) • Participatory Approaches • RRA, PPA (“instrumental”) • PRA (“transformative”)
2. Qualitative Methods (cont.) ‘Meso’ and macro level: • Comparative Methods and Case Studies • What is a ‘case’? What is this a case of? • Commonly used in political science, history • Explaining rare, one-off events (e.g., revolutions) • Identifying necessary and sufficient conditions that lead to certain outcomes rather than others (e.g., why some institutions are more equitable than others) • Example: James Mahoney on Central America • Process-tracing • Working backwards from outcomes to discern ‘causes’ • Selection of cases is obviously crucial • Examples: Theda Skocpol on social revolutions, Patrick Heller on Kerala, Ashutosh Varshney on ethnic conflict • Analytic Narratives (Bates et al) • Game theory meets political science: institutional evolution
3. The Evaluation Challenge Revisited: Linking Theory and Methods in the Assessment of J4P • Three challenges: • Allocating development resources • Assessing project effectiveness (in general) • Assessing J4P effectiveness (in particular) • Discussion of options, strategies for assessing J4P pilots
Three challenges • How to allocate development resources? • How to assess project effectiveness in general? • How to assess social development projects (such as ‘Justice for the Poor’) in particular?
1. Allocating development resources • How to allocate finite resources to projects believed likely to have a positive development impact? • Allocations made for good and bad reasons, only a part of which is ‘evidence-based’, but most of which is ‘theory-based’, i.e., done because of an implicit (if not explicit) belief that Intervention A will ‘cause’ Impact B in Place C net of Factors D and E for Reasons F and G. • E.g., micro-credit will raise the income of villagers in Flores, independently of their education and wealth, because it enhances their capacity to respond to shocks (floods, illness) and enables larger-scale investment in productive assets (seeds, fertilizer)
1. Allocating development resources • Imperatives of the prevailing resource allocation mechanisms (e.g., those of the World Bank) strongly favor one-size-fits-all policy solutions (despite protestations to the contrary!) that deliver predictable, readily-measurable results in a short time frame • Roads, electrification, immunization • Want project impacts to be independent of context, scale, and time so that ‘successful’ examples (‘best practices’) can be scaled up and replicated • Projects that diverge from this structure—e.g., J4P—enter the resource allocation game at a distinct disadvantage. But the obligation to demonstrate impact (rightly) remains; just need to enter the fray well armed, empirically and strategically…
2. How to Assess Project Effectiveness? • Need to disentangle the effect of a given intervention over and above other factors occurring simultaneously • Distinguishing between the ‘signal’ and ‘noise’ • Is my job creation program reducing unemployment, or is it just the booming economy? • Furthermore, an intervention itself may have many components • TTLs are most immediately concerned about which aspect is the most important, or the binding constraint • (Important as this is, it is not the same thing as assessing impact) • Need to be able to make defensible causal claims about project efficacy even (especially) when the apparent ‘rigor’ of econometric methods isn’t suitable/available • Thus need to change both the terms and content of debate
Impact Evaluation 101 • Core evaluation challenge: • Disentangling effects of people, place, and project (or policy) from what would have happened otherwise • i.e., need a counterfactual (but this is rarely observed) • ‘Tin’ standard • Beneficiary assessments, administrative checks • ‘Silver’ • Double difference: before/after, program/control • ‘Gold’ • Randomized allocation, natural experiments • (‘Diamond’?) • Randomized, triple-blind, placebo-controlled, cross-over • Alchemy? • Making ‘gold’ with what you have, given prevailing constraints (people, money, time, logistics, politics)…
We observe an outcome indicator…
…and its value rises after the program
However, we need to identify the counterfactual (i.e., what would have happened otherwise)…
…since only then can we determine the impact of the intervention
[Figures: the outcome indicator, with the point of intervention marked]
Problems when evaluation is not built in ex-ante (i.e., from the outset) • Need a reliable comparison group • Before/After: Other things may happen • Units with/without the policy • May be different for other reasons than the policy • e.g., because program is placed in specific areas, for development (targeting the poor) or political (buying favors) reasons
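The gap between a simple before/after comparison and the true impact can be sketched with a few lines of arithmetic. This is a minimal illustration only: all numbers are hypothetical, and the counterfactual value is assumed to be known here, which in practice it rarely is.

```python
# Hypothetical values, for illustration only (not from any real program).
baseline = 50.0        # outcome indicator before the program
observed_after = 65.0  # outcome indicator after the program
counterfactual = 60.0  # assumed value had the program not existed (rarely observed)

# Naive before/after comparison: attributes all change to the program.
naive_estimate = observed_after - baseline

# Impact as defined on the slides: observed outcome minus the counterfactual.
impact = observed_after - counterfactual

# The difference is the change that "would have happened otherwise".
bias = naive_estimate - impact

print(f"before/after estimate: {naive_estimate}")
print(f"impact vs counterfactual: {impact}")
print(f"overstatement from other things happening: {bias}")
```

Under these assumed numbers, two thirds of the before/after change comes from "other things" rather than from the program, which is why a reliable comparison group matters.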
How can we fill in the missing data on the counterfactual? • Randomization • Quasi Experiment: • Matching • Propensity-score matching • Difference-in-difference • Matched double difference • Regression discontinuity design • Instrumental variables • Comparison group designs • Designs pairing jurisdictions • Lagged start designs • Naturally occurring comparison group
1. Randomization: “Randomized out” group reveals counterfactual • Only a random sample participates • As long as the assignment is genuinely random, impact is revealed in expectation • Randomization is the theoretical ideal, and the benchmark for non-experimental methods. Identification issues are more transparent compared with other evaluation techniques. • But there are problems in practice: • Internal validity: selective non-compliance • External validity: difficult to extrapolate results from a pilot experiment to the whole population
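A small simulation can illustrate what "impact is revealed in expectation" means. The sketch below is hypothetical throughout: the outcome model, the unobserved `motivation` variable, and the true effect of 5.0 are all assumptions chosen for illustration. It compares random assignment with self-selection, where more motivated units opt in.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, true_effect=5.0, seed=0):
    """Compare a randomized comparison with a self-selected one (hypothetical data)."""
    rng = random.Random(seed)
    rand_treated, rand_control = [], []
    self_treated, self_control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)                       # unobserved by the evaluator
        y_without = 50 + 3 * motivation + rng.gauss(0, 1)  # outcome without the program
        y_with = y_without + true_effect                   # outcome with the program

        # Random assignment: a coin flip, unrelated to motivation.
        if rng.random() < 0.5:
            rand_treated.append(y_with)
        else:
            rand_control.append(y_without)

        # Self-selection: only the more motivated participate.
        if motivation > 0:
            self_treated.append(y_with)
        else:
            self_control.append(y_without)

    randomized = mean(rand_treated) - mean(rand_control)
    self_selected = mean(self_treated) - mean(self_control)
    return randomized, self_selected


randomized_estimate, self_selected_estimate = simulate()
print(f"randomized estimate:    {randomized_estimate:.2f}")
print(f"self-selected estimate: {self_selected_estimate:.2f}")
```

In this setup the randomized comparison should land close to the assumed true effect, while the self-selected comparison is inflated because motivation raises outcomes independently of the program.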
An example from Mexico • Progresa: Grants to poor families (women), conditional on preventive health care and school attendance for children • Mexican government wanted an evaluation; order of community phase-in was random • Results: child illness down 23%; height increased 1-4cm; 3.4% increase in enrollment • After evaluation: PROGRESA expanded within Mexico, similar programs adopted throughout other Latin American countries
An example from Kenya • School-based de-worming: treat with a single pill every 6 months at a cost of 49 cents per student per year • 27% of treated students had moderate-to-heavy infection, 52% of comparison • Treatment reduced school absenteeism by 25%, or 7 percentage points • Costs only $3 per additional year of school participation
2. Matching: Matched comparators identify counterfactual • Propensity-score matching: • Match on the basis of the probability of participation • Match participants to non-participants from a larger survey • The matches are chosen on the basis of similarities in observed characteristics • This assumes no selection bias based on unobservable heterogeneity (i.e., things that are not readily ‘measurable’ by orthodox surveys, such as ‘motivation’, ‘connections’) • Validity of matching methods depends heavily on data quality
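The matching step can be sketched as follows. This is a simplified illustration, not a full implementation: the propensity scores are assumed to have been estimated already (for example from a participation model on observed characteristics), the data are hypothetical, and matching is one-to-one nearest neighbor with replacement.

```python
def mean(values):
    return sum(values) / len(values)


def matched_impact(participants, non_participants):
    """Average gap between each participant and the nearest non-participant.

    Each unit is a (propensity_score, outcome) pair. Scores are assumed to be
    pre-estimated; a non-participant may be reused as a match.
    """
    gaps = []
    for score, outcome in participants:
        nearest = min(non_participants, key=lambda unit: abs(unit[0] - score))
        gaps.append(outcome - nearest[1])
    return mean(gaps)


# Hypothetical (propensity score, outcome) pairs.
participants = [(0.80, 70.0), (0.60, 64.0), (0.30, 55.0)]
non_participants = [(0.75, 66.0), (0.55, 61.0), (0.35, 52.0), (0.10, 45.0)]

naive_gap = mean([y for _, y in participants]) - mean([y for _, y in non_participants])
matched_gap = matched_impact(participants, non_participants)

print(f"unmatched difference in means: {naive_gap:.2f}")
print(f"matched estimate:              {matched_gap:.2f}")
```

The matched estimate is smaller here because the low-propensity non-participant is never used as a comparator. Any remaining bias from unobserved characteristics is untouched, as the slide notes.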
3. Difference-in-difference (double difference): Observed changes over time for non-participants provide the counterfactual for participants • Collect baseline data on non-participants and (probable) participants before the program. • Compare with data after the program. • Subtract the two differences, or use a regression with a dummy variable for participant. • This allows for selection bias but it must be time-invariant and additive.
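The "subtract the two differences" step above can be sketched in a few lines. The data below are hypothetical; participants are deliberately given a lower starting level than non-participants to show that a time-invariant, additive gap drops out of the estimate.

```python
def mean(values):
    return sum(values) / len(values)


def double_difference(part_before, part_after, nonpart_before, nonpart_after):
    """Change among participants minus change among non-participants."""
    change_participants = mean(part_after) - mean(part_before)
    change_non_participants = mean(nonpart_after) - mean(nonpart_before)
    return change_participants - change_non_participants


# Hypothetical outcome data (e.g., an income index).
part_before = [40.0, 44.0, 42.0]     # participants at baseline
part_after = [52.0, 56.0, 54.0]      # participants after the program
nonpart_before = [50.0, 54.0, 52.0]  # non-participants at baseline
nonpart_after = [56.0, 60.0, 58.0]   # non-participants after the program

estimate = double_difference(part_before, part_after, nonpart_before, nonpart_after)
print(f"double-difference estimate: {estimate}")
```

Participants gained 12 points and non-participants 6, so the estimate is 6. The 10-point baseline gap between the groups plays no role, but only because it is assumed constant over time.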
The Challenge of Assessing SD Projects • You’re a star in development if you devise a “best practice” and a “tool kit”—i.e., a universal, easy-to-administer solution to a common problem • There are certain problems for which finding such a universal solution is both desirable and possible (e.g., TB, roads for high rainfall environments)… • But many key problems, such as those pertaining to local governance and law reform (e.g., J4P), inherently require context-specific solutions that are heavily dependent on negotiation and teamwork, not a technology (pills, bridges, seeds) • Not clear that if such a project works ‘here’ that it will also work ‘there’, or that ‘bigger’ will be ‘better’ • Assessing such complex projects is enormously difficult
Why are ‘complex’ interventions so hard to evaluate? A simple example • You are the inventor of ‘BrightSmile’, a new toothpaste that you are sure makes teeth whiter and reduces cavities without any harmful side effects. How would you ‘prove’ this to public health officials and (say) Colgate? • Hopefully (!), you would be able to: • Randomly assign participants to a ‘treatment’ and ‘control’ group (and then have them switch after a certain period); make sure both groups brushed the same way, with the same frequency, using the same amount of paste and the same type of brush; ensure nobody (except an administrator, who did not do the data analysis) knew who was in which group
Demonstrating ‘impact’ of BrightSmile vs. SD projects • Enormously difficult—methodologically, logistically and empirically—to formally identify ‘impact’; equally problematic to draw general ‘policy implications’, especially for other countries • Prototypical “complex” CDD/J4P project: • Open project menu: unconstrained content of intervention • Highly participatory: communities control resources and decision-making • Decentralized: local providers and communities given high degree of discretion in implementation • Emphasis on building capabilities and the capacity for collective action • Context-specific; project is (in principle) designed to respond to and reflect local cultural realities • Project’s impact may be ‘non-additive’ (e.g., stepwise, exponential, high initially then tapering off…)
How does J4P work over time? (or, what is its ‘functional form’?) [Figures: panels A to J, each plotting Impact against Time] • A: ‘Governance’? • B: CCTs? • C: Bridges? • D: ‘AIDS awareness’? • E: Unintended consequences? • F: Shocks? (‘Impulse response function’) • G: ‘Pest control’? (e.g., cane toads) • H: ‘Empowerment’? • I, J: Unknown… Unknowable?
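Why the ‘functional form’ matters for evaluation can be sketched numerically. The three trajectories below are hypothetical stand-ins for the non-additive shapes mentioned earlier (stepwise, exponential, high initially then tapering off); their parameters are arbitrary assumptions and do not reproduce the panels in the original figures.

```python
import math


def stepwise(t):
    """No visible impact until a threshold year, then a jump (assumed at t = 3)."""
    return 0.0 if t < 3 else 10.0


def exponential(t):
    """Slow start, accelerating later (assumed growth rate 0.5)."""
    return math.exp(0.5 * t) - 1.0


def tapering(t):
    """High initially, then fading (assumed decay rate 0.5)."""
    return 10.0 * math.exp(-0.5 * t)


trajectories = {"stepwise": stepwise, "exponential": exponential, "tapering": tapering}

# Evaluate each hypothetical project one year and five years after it starts.
for name, impact in trajectories.items():
    early, late = impact(1), impact(5)
    print(f"{name:12s} year 1: {early:6.2f}   year 5: {late:6.2f}")
```

An evaluation run at year 1 would rank the tapering project highest and the stepwise project as a failure; the same evaluation at year 5 would reverse that ranking. If the functional form is unknown, a single measurement point says little about eventual impact.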
Science, Complexity, and Evaluation [Figure; surviving labels: Lo, Many, Narrow, Wide]
So, what can we do when… • Inputs are variables (not constants)? • Facilitation/participation vs. tax cuts (seeds, pills, etc) • Teaching vs. text books • Therapy vs. medicine • Adapting to context is an explicit, desirable feature? • Each context/project nexus is thus idiosyncratic • Outcomes are inherently hard to define and measure? • E.g., empowerment, collective action, conflict mediation, social capital
3. Linking Questions, Methodologies, Methods, Data • Questions should drive choice of methods and measurement tools (not vice versa) • Social science data is always partial, an imperfect reflection of a more complex underlying reality • Data can be manipulated for political purposes • Some (very important) things cannot be measured—love, identity, meaning • “Not everything that can be counted, counts” • “It’s better to be vaguely right than precisely wrong” • “Triangulation”—integrating more abundant, more diverse, and higher-quality evidence
Begin with interesting and important questions • “The most important questions of method begin where the standard techniques do not apply” (C Wright Mills) • Finding answers may require single or multiple methods and data forms—need to be a good detective • But difficult to do when one has invested many years in mastering difficult techniques—“Everything looks like a nail when all you have is a hammer” • Methodologies as the particular combination and sequence of methods used to answer the question(s) • Methods can be qualitative and/or quantitative • Data can also be qualitative and/or quantitative
Qual/quan disputes often stem from… • Conflating methods and data • Mismatches between question, methods and data • Assumptions that different “standards” apply • Qualitative approaches seen as • inductive, valid, subjective, process (‘how’), generating ideas • Quantitative approaches seen as • deductive, reliable, objective, effects (‘whether’), testing ideas • Not necessarily… • Integrating qual and quan approaches to… • Complement strengths, compensate weaknesses • Address problems of missing/inadequate data • Observing the unobservable
Types of Mixed Methods • Pure Qualitative: ‘Think quan, act qual’ • Parallel: Quan and qual done separately • Sequential: Quan follows qual • Iterative: Quan and qual in constant dialogue • (Pure Quantitative)
Forms and sources of data • Quantitative (“numbers”) • Household and other surveys (e.g., census, LSMS) • Opinion polls (e.g., Gallup, marketing research) • Data from official files (e.g., membership lists, government reports) • Indexes created from multiple sources (e.g., “governance”) • Qualitative (“texts”) • Historical records, political reports, letters, legal documents • Media (print, radio, and television) • Open-ended responses to survey questions • Observation (ethnography) • Interviews—key informants, focus groups • Participatory approaches—PRA, etc • Comparative (“cases”) • ‘Rare’, ‘small-N’ historical events (e.g. wars, economic crises)
Types of methods • Quantitative • Statistical analysis • Hypothesis testing (deductive) • Qualitative • Emergent themes • Generates propositions (inductive) • Software available: e.g., N6 (reduces ‘small N’ problem) • Comparative (“cases”) • Differences among otherwise similar cases • Commonalities among otherwise different cases • Common strategy in history; used to try to explain ‘causality’ • “The goal is not to show which approach is best, but rather to generate dialogue between ideas and evidence” (Ragin)
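The comparative logic above, together with the earlier idea of identifying necessary and sufficient conditions, can be sketched as a simple check across cases. The cases, condition names, and outcomes below are entirely hypothetical, and a real comparative study rests on careful case selection and interpretation rather than a mechanical test.

```python
def is_necessary(cases, condition, outcome):
    """Necessary: every case showing the outcome also shows the condition."""
    return all(case[condition] for case in cases if case[outcome])


def is_sufficient(cases, condition, outcome):
    """Sufficient: every case showing the condition also shows the outcome."""
    return all(case[outcome] for case in cases if case[condition])


# Hypothetical cases (e.g., four districts), coded present/absent.
cases = [
    {"name": "A", "leadership": True, "funding": True, "success": True},
    {"name": "B", "leadership": True, "funding": False, "success": True},
    {"name": "C", "leadership": True, "funding": True, "success": False},
    {"name": "D", "leadership": False, "funding": True, "success": False},
]

for condition in ("leadership", "funding"):
    necessary = is_necessary(cases, condition, "success")
    sufficient = is_sufficient(cases, condition, "success")
    print(f"{condition}: necessary={necessary}, sufficient={sufficient}")
```

In this toy set, leadership is necessary but not sufficient for success, and funding is neither. Note that `all()` returns `True` on an empty selection, so a condition with no relevant cases would pass vacuously; with small-N data that caveat matters.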
Types of Data and Methods [Figure: grid with a Data axis (Qual to Quan) and a Methods axis (Qual to Quan), locating: Subjective Welfare; Standard Survey; Ethnography; PRA; Quantitative Anthropology; Small-N Matched Comparisons. Adapted from Hentschel, 1999]
Mixed Methods can be… • Difficult… • Technically • training occurs almost exclusively within disciplines • Administratively • finding, coordinating willing and able staff • different agendas, expectations • institutional imperatives for “straightforward” policy recommendations • Professionally • No “natural constituency” to provide financial or moral support, or detailed intellectual critique • …Time consuming… • …But enormously rewarding! • More, better data facilitates better theory, better policy • Bridging otherwise separate disciplines, sectors • Development as “bio-technology” (multiple agents and agencies of expertise) not “math” (lone genius) problem
4. Integrating Qualitative and Quantitative Approaches: Parallel Qual/Quan • Teams work separately • Best suited to large (e.g., country-level) assessments (GUAPA) • Quantitative: large household survey • Qualitative: in-depth work with selected groups • Data analyzed separately, integrated as part of write-up and conclusions
4. Integrating Qualitative and Quantitative Approaches: Sequential Qual/Quan (the ‘classical’ approach) • Qualitative: use PRA, focus groups, etc. to get a grounded understanding of key issues • Quantitative: use this material to design a survey instrument; use the survey to test hypotheses that emerged from the qualitative work • Examples: • Survival and mobility in Delhi slums (Jha, Rao and Woolcock, 2007) • Evaluating Jamaica Social Investment Fund (Rao and Ibanez, 2002)
4. Integrating Qualitative and Quantitative Approaches: Iterative Qual/Quan (‘Bayesian’ approach) • Ongoing dialogue between Qual and Quan • Qualitative: as above, used to generate initial hypotheses and establish validity of questions • Quantitative: hypotheses tested with household survey • Return to the field; cycle repeats • Example: Potters in India (Rao, 2000) • Initial study of marriage markets led to a study of domestic violence, and another on unit price differentials/inequality
Other uses for Mixed Methods • When existing time and resources preclude doing or using formal survey/census data • Examples: St Lucia and Colombia • When it’s unclear what “intervention” might be responsible for observed outcomes • That is, no clear ex ante hypotheses; working inductively from matched comparison cases • Examples: • Putnam (1993) on regional governance in Italy • Mahoney (2001) on governance in Central America • Collins (2001) on “good to great” US companies • Varshney (2002) on sources of ethnic violence in India