Respondent-Driven Sampling

Respondent-Driven Sampling Carl Kendall, Ph.D. Professor of International Health and Development Tulane University New Orleans, LA, USA FIOCRUZ, July 29, 2006

RDS was developed by: Douglas D. Heckathorn, Ph.D. Professor of Sociology, Cornell University Web Page: www.soc.cornell.edu/faculty/heckathorn.shtml Through support from the Centers for Disease Control and Prevention, the National Institute on Drug Abuse, and the National Endowment for the Arts.

Lecture in two parts • Introduction, theory behind RDS • Brief description of what the PN has been doing in Brazil with RDS

Why? • HIV epidemic is multiple epidemics, taking place in multiple sub-populations • Many of these sub-populations are “hidden” • To understand epidemic dynamics, need to know what is going on in these networks of individuals at enhanced risk because of their behavior and their environment • In order to develop effective interventions for specific populations

Several basic methods have dominated the study of hidden populations. • Institutional sampling: May provide easy access to numerous high-risk individuals, but institutions draw the sample. • Time/space sampling: Sampling frame combines places and times where the population gathers, and samples are weighted to compensate for variations in population density. Best suited for populations clustered in large public venues. • Targeted sampling: Simplified form of time/space sampling where the space is the street, no weights are used, and in most implementations snowball methods are also used. • Chain-referral sampling: Recruitment through networks reaches respondents who avoid public venues and institutions. Until recently, these have been considered convenience samples. • For a comparison and assessment see: “Street and Network Sampling in Evaluation Studies of HIV Risk-Reduction Interventions” by Salaam Semaan, Jennifer Lauby and Jon Liebman, AIDS Reviews, 2002

Respondent Driven Sampling • Chain referral sampling characterized by long referral chains and a statistical theory of the sampling process which controls for bias, including effects of choice of seeds and differences in network size (Heckathorn 1997, 2002). • RDS also serves as the recruitment mechanism for a form of HIV-prevention intervention, termed a Peer-Driven Intervention (PDI), also known as the “ECHO Model” (Broadhead and Heckathorn 1994; Heckathorn et al. 1999).

Classic Statement on Probability versus Convenience Samples “The major strength of probability sampling is that the probability selection mechanism permits the development of statistical theory to examine the properties of sample estimators. Thus, estimators with little or no bias can be used, and estimates of the precision of sample estimates can be made.” In contrast, the precision of estimators from nonprobability samples can be assessed only by “subjective evaluation.” Kalton (1983) Implication: Making chain-referral sampling a form of probability sampling requires a statistical theory of the sampling process. This is part of a new class of sampling methods termed adaptive/link-tracing designs (Thompson and Frank 2000)

Can chain-referral sampling be a reliable method even though seeds from a hidden population cannot be selected randomly? Referral patterns reflect a self-affiliation bias:

A Statistical Theory: Recruitment as a Markov Chain (W = white, B = black, H = Hispanic, O = other). [Figure: diagram of the four groups connected by arrows labeled with recruitment probabilities.] Recruitment can be seen as a stochastic process that moves from node to node, governed by the probabilities associated with the arrows.

Two theorems regarding regular Markov chains are relevant to an understanding of RDS: • THEOREM ONE: The "law of large numbers for regular Markov chains" (Kemeny and Snell 1960) states that the probability that a system will be in any given state over the course of a large number of steps is independent of its starting state. Implication: As the sample expands wave by wave the composition of the sample becomes stable, reaching what is termed an “equilibrium,” so bias from the seeds disappears if the number of waves is large enough. • THEOREM TWO: The equilibrium is attained at a geometric (i.e., rapid) rate. Implication: Only a moderate number of waves of recruitment are required for the subject composition to reach equilibrium (usually only 4 to 6).

Simulations of Recruitment in a Respondent-Driven Sample Based on the Recruitment Matrix: race and ethnicity of recruits in each wave, beginning with all Hispanic or all non-Hispanic white seeds. [Figure: two panels, "Hispanic Seeds" and "White Seeds"; both panels carry the labels 70% and 17%.]
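The convergence described by the two theorems can be checked with a few lines of Python. This is only a sketch: the recruitment matrix below is hypothetical (the actual probabilities from the slide are not reproduced here), and the function names are illustrative. It computes the expected composition of each wave starting from all-white seeds and from all-Hispanic seeds, then prints how far apart the two samples still are.

```python
# Hypothetical recruitment matrix: rows are the recruiter's group,
# columns are the recruit's group, in the order W, B, H, O.
# These numbers are illustrative only; each row sums to 1.
GROUPS = ["W", "B", "H", "O"]
RECRUITMENT = [
    [0.60, 0.15, 0.15, 0.10],
    [0.10, 0.70, 0.12, 0.08],
    [0.15, 0.20, 0.55, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]


def next_wave(composition, matrix):
    """Expected composition of the next wave given the current one."""
    n = len(matrix)
    return [sum(composition[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def simulate(seed_composition, matrix, waves):
    """Compositions for wave 0 (the seeds) through wave `waves`."""
    history = [list(seed_composition)]
    for _ in range(waves):
        history.append(next_wave(history[-1], matrix))
    return history


def max_gap(a, b):
    """Largest difference in any group's share between two compositions."""
    return max(abs(x - y) for x, y in zip(a, b))


all_white = simulate([1.0, 0.0, 0.0, 0.0], RECRUITMENT, 10)
all_hispanic = simulate([0.0, 0.0, 1.0, 0.0], RECRUITMENT, 10)

# The gap between the two starting points shrinks wave by wave.
for wave in (0, 2, 4, 6, 10):
    print(wave, round(max_gap(all_white[wave], all_hispanic[wave]), 4))
```

With any regular matrix the gap shrinks geometrically, which is why the choice of seeds stops mattering after a moderate number of waves.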

Therefore, referral chains should be long. The operational details are: • Recruiters are rewarded • Coupons with unique serial numbers document and ration recruitment • Coupons are dollar-bill sized, printed on medium card stock using PowerPoint • Respondents call for appointments, bring their coupons to the interview, and are later given three more for recruiting peers

Implementation Stages in the Operation of RDS • Recruitment of Seeds • Recruitment by a Peer in the Community • Interview/HIV Testing and Counseling in the Interview Site • Recruitment of Peers in the Community • Debriefing and Reward for Peer Recruitment • After the seeds begin recruiting, the recruitment cycle continues until the sampling goal is reached

Requirements • Four absolute requirements: • Record who recruited whom • Recruiters and recruits must know one another • Ration recruitment so a few cannot do all the recruiting • Ask about personal network sizes
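The four requirements map naturally onto a small coupon ledger. The sketch below is a hypothetical illustration, not the IRIS coupon manager: the class name, method names, and respondent IDs are invented, and the quota of three coupons follows the operational details given earlier.

```python
import itertools


class CouponLedger:
    """Toy coupon ledger; all names here are illustrative assumptions."""

    def __init__(self, coupons_per_respondent=3):
        self.quota = coupons_per_respondent   # rations recruitment
        self._serials = itertools.count(1)    # unique serial numbers
        self.issued_to = {}                   # coupon serial -> recruiter id
        self.recruited_by = {}                # respondent id -> recruiter id (None for seeds)
        self.network_size = {}                # respondent id -> reported network size
        self.redeemed = set()

    def enroll_seed(self, respondent_id, network_size):
        """Enroll a seed (no recruiter) and hand out the coupon quota."""
        self.recruited_by[respondent_id] = None
        self.network_size[respondent_id] = network_size
        return self._issue(respondent_id)

    def redeem(self, serial, respondent_id, network_size, knows_recruiter):
        """Enroll a recruit who presents a coupon, recording who recruited whom."""
        if serial not in self.issued_to:
            raise ValueError("unknown coupon")
        if serial in self.redeemed:
            raise ValueError("coupon already redeemed")
        if respondent_id in self.recruited_by:
            raise ValueError("respondent already enrolled")
        if not knows_recruiter:
            raise ValueError("recruit and recruiter must know one another")
        self.redeemed.add(serial)
        self.recruited_by[respondent_id] = self.issued_to[serial]
        self.network_size[respondent_id] = network_size
        return self._issue(respondent_id)

    def _issue(self, recruiter_id):
        """Issue exactly `quota` new coupons to one recruiter."""
        serials = [next(self._serials) for _ in range(self.quota)]
        for serial in serials:
            self.issued_to[serial] = recruiter_id
        return serials


ledger = CouponLedger()
seed_coupons = ledger.enroll_seed("seed-4", network_size=12)
new_coupons = ledger.redeem(seed_coupons[0], "r-001", network_size=5, knows_recruiter=True)
```

Because each respondent receives a fixed number of serial-numbered coupons, no one can do more than that number of recruitments, and every recruit can be traced back to a recruiter.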

Using Respondent-Driven Sampling to Study Spatial Networks (Using Zip Code Level Data) Network structure is revealed by successive waves of peer recruitment. The beginning point for one recruitment network, Seed #4, a black female bass player, is marked by the red pin near Times Square.

Wave 1, Seed 4 Douglas Heckathorn, Cornell University, 2003

All Waves, All Seeds Douglas Heckathorn, Cornell University, 2003

Panning Out…. Douglas Heckathorn, Cornell University, 2003

All Waves, All Seeds Douglas Heckathorn, Cornell University, 2003

All Waves, All Seeds San Francisco RDS Douglas Heckathorn, Cornell University, 2003

Can RDS be a valid method despite: • Differences in network sizes (more recruitment paths lead to those with large networks, so they are over-sampled); • Self-affiliation bias (as seen above, people tend to recruit those like themselves); • Differential recruitment (some groups recruit more than others, so their recruitment patterns are overrepresented)? Because of these factors, the sample composition may not mirror that of the population from which it was drawn, so a valid population estimator must take them into account.

Population Estimates from RDS: Population Estimates are Derived from Network Indicators. [Figure: flow from Data from Respondent-Driven Sample, to Network Indicators (proportion of cross-cutting ties and network size), to Population Estimates (proportional group sizes, affiliation indices).] Aim: To compensate for the effects of network structure on recruitment into the sample, including differences in network sizes and clustering.

Biased Network Theory (1) • Developed in the early 1950s by Anatol Rapoport, later elaborated by Fararo and Sunshine (1968) • Network ties are formed randomly, through a stochastic process • In an unstructured system, ties are formed through random mixing, e.g., if a group makes up 75% of the population, it will have 75% in-group ties. • More than a century ago, Galton recognized that friendships tend to form among those who are similar, a tendency called homophily. • Ties can also form based on complementarity, e.g., sexual relations among heterosexuals; this is negative homophily, also known as heterophily.

Biased Network Theory and Affiliation Patterns • In Biased Network Theory structure can be defined using an Index of Network Clustering termed Homophily (Fararo and Sunshine 1968; Heckathorn 2002) • Homophily = 1 if all ties are formed to the in-group; Homophily = 0 if all ties are formed randomly; Homophily = -1 if all ties are formed to the out-group • Intermediate values are defined similarly, e.g., homophily = .32 if ties are formed as though 32% of the time an in-group tie is formed, and the rest of the time ties are formed by random mixing. • This clustering index is used because of its fit with RDS sampling theory: if homophily sums correctly, the equilibrium sample composition mirrors the population from which it was drawn.
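The index described above can be written as a short function. This is a sketch under one assumption: that the positive and negative branches are scaled so the three anchor values on the slide (1, 0, and -1) come out exactly. The function and argument names are illustrative.

```python
def homophily(in_group_share, population_share):
    """Homophily index for one group.

    in_group_share   -- proportion of the group's ties that go to the in-group
    population_share -- the group's proportion of the population
    """
    if not 0.0 < population_share < 1.0:
        raise ValueError("population_share must be strictly between 0 and 1")
    if not 0.0 <= in_group_share <= 1.0:
        raise ValueError("in_group_share must be between 0 and 1")
    excess = in_group_share - population_share
    if excess >= 0:
        # More in-group ties than random mixing would give: scale toward +1.
        return excess / (1.0 - population_share)
    # Fewer in-group ties than random mixing would give: scale toward -1.
    return excess / population_share


# A group that is 75% of the population, under three tie patterns.
all_in_group = homophily(1.0, 0.75)
random_mixing = homophily(0.75, 0.75)
all_out_group = homophily(0.0, 0.75)

# A group that is 50% of the population and forms an in-group tie 32% of
# the time, mixing randomly otherwise, has 0.32 + 0.68 * 0.5 = 0.66 in-group ties.
mixed = homophily(0.66, 0.5)
```

The last example reproduces the "homophily = .32" reading of intermediate values: the in-group share that results from 32% forced in-group ties plus random mixing maps back to an index of about 0.32.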

The Reciprocity Model: How to estimate population size based on network indicators. When ties are reciprocal, the number of ties from any group A to B, Tab, is equal to the number of ties from B to A, Tba, i.e., Tab = Tba, e.g., 2 = 2. The number of ties from A to B is the product of four terms: (1) the number of nodes in the system, X, (2) the proportional size of A, Pa, (3) the average network size of A, Na, and (4) the proportion of ties from A to B, Sab. Tab = X * Pa * Na * Sab, e.g., 5 * .4 * 2 * .5 = 2

The Reciprocity Model (2): How to estimate population size based on network indicators. Given that Tab = Tba, by expansion X*Pa*Na*Sab = X*Pb*Nb*Sba. When (1-Pa) is substituted for Pb, this reduces to: Pa = (Sba*Nb) / (Sba*Nb + Sab*Na). Note that the term for total population drops out, so this model yields population proportions but not absolute sizes.
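The reciprocity model can be checked numerically. The sketch below reuses the worked example for group A (X = 5, Pa = .4, Na = 2, Sab = .5); the values for group B (Nb = 4/3, Sba = .5) are assumptions chosen so that Tba also equals 2. The function name is illustrative.

```python
def estimate_proportion_a(n_a, n_b, s_ab, s_ba):
    """Solve Pa*Na*Sab = (1 - Pa)*Nb*Sba for Pa.

    Closed form: Pa = (Sba*Nb) / (Sba*Nb + Sab*Na).
    The total population size X cancels, so it is not an argument.
    """
    denominator = s_ba * n_b + s_ab * n_a
    if denominator <= 0:
        raise ValueError("at least one group must have cross-group ties")
    return (s_ba * n_b) / denominator


# Worked example from the slide for group A.
X, P_A, N_A, S_AB = 5, 0.4, 2.0, 0.5
# Assumed values for group B, chosen so that ties are reciprocal.
N_B, S_BA = 4.0 / 3.0, 0.5

ties_ab = X * P_A * N_A * S_AB          # ties from A to B
ties_ba = X * (1 - P_A) * N_B * S_BA    # ties from B to A

estimated_p_a = estimate_proportion_a(N_A, N_B, S_AB, S_BA)
print(ties_ab, ties_ba, estimated_p_a)
```

The estimate recovers the proportional size of group A from network sizes and cross-group tie proportions alone, without knowing X.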

RDS and IRB issues • Unlike snowball sampling, in which respondents (Rs) provide peers’ contact information and investigators make contact, RDS does not ask Rs to violate the privacy of peers • Incentives should be kept modest enough to prevent coercion, but questions should be asked about it to ensure it does not happen • Very large incentives could be viewed as in themselves coercive • IRBs have sometimes resisted raising incentive amounts, so do not start with too small an amount; focus groups and pilot studies are useful • IRBs have also been concerned about giving money to IDUs; recruitment quotas limit the annual amount that can be earned, so incentives will not affect drug habits (e.g., in ECHO studies, 99.7% of Rs earned less than $100/year) • Because of tracking by serial numbers, coupons cannot become an alternative source of currency (unlike store coupons, food stamps, etc.)

Limitations of Respondent-Driven Sampling • Limitations inherent in all sampling methods apply to RDS, e.g., the interview site must be readily accessible, interviewers must be culturally sensitive, and no sampling method can completely eliminate non-response bias. • In addition, there are limitations specific to RDS: • Population members must know one another as members of the target population. This can occur, for example, through the contact patterns created by sexual contact or drug sharing. • Network ties must be dense enough to sustain the chain-referral process. • Means must exist to motivate population members to recruit their peers. • Means must exist for verifying membership in the target population, lest others seek entry into the study to gain respondent fees. • Statistical power decreases when homophily is high.

Advantages of RDS • Controls for the biases associated with chain-referral methods, providing both population estimates and estimates of variability for those estimates. • Requires little formative research, and therefore sampling can begin quickly. In contrast, time/space and targeted sampling require detailed prior mapping of the target population. • Accesses persons through their social networks, even reaching those who shun large public venues and avoid the street. • Recruitment is carried out by respondents at minimal cost; no field staff is required, so training requirements and costs are reduced. • Number of additional questions that must be added to the instrument is small. Therefore, the method’s overhead is minor. • Problem of non-response bias is reduced by dual incentive system (respondent fees and peer pressure)

RDS Software • IRIS coupon manager • RDSat • http://www.respondentdrivensampling.org/main.htm

Software: RDSat • Calculates population estimates based on • Linear Least Squares or Data Smoothing (normal or enhanced) • Arithmetic or Weighted Net Sizes • Net size outliers can be pulled in (large outliers make the arithmetic mean unstable; small ones make the weighted mean unstable) • Equilibrium Sample Composition • Weights • Reciprocity Index • Homophily (useful for calculating design effects) • Standard Errors • Accepts data files created by IRIS 3.0, so calculations can be made in the field • Creates a data file useful for studying recruitment networks using UCINET or Pajek • Limits: sample size ≤ 2,500, coupons ≤ 40 per respondent

Selected Bibliography on RDS • "Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations." By Douglas D. Heckathorn Social Problems. (1997) • "Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations." By Douglas D. Heckathorn Social Problems, 2002. • "Extensions of Respondent-Driven Sampling: A New Approach to the Study of Injection Drug Users Aged 18-25." By Douglas D. Heckathorn, Salaam Semaan, Robert S. Broadhead, and James J. Hughes. AIDS and Behavior, 2002. • "Group Solidarity as the Product of Collective Action: Creation of Solidarity in a Population of Injection Drug Users." By Douglas D. Heckathorn and Judith E. Rosenstein. Advances in Group Processes, 2002. • "Development of a Theory of Collective Action: From the Emergence of Norms to AIDS Prevention and the Analysis of Social Structure." By Douglas D. Heckathorn In New Directions in Sociological Theory: Growth of Contemporary Theories (Joseph Berger and Morris Zelditch, editors). Rowman and Littlefield, 2002. • “Finding the Beat: Using Respondent-Driven Sampling to Study Jazz Musicians.” By Douglas D. Heckathorn and Joan Jeffri. Poetics, 2001. • “Making Unbiased Estimates from Hidden Populations Using Respondent-Driven Sampling.” By Matthew J. Salganik and Douglas D. Heckathorn. Paper presented at the International Social Network Conference, February, 2003, Cancun, Mexico • “Jazz Networks: Using Respondent-Driven Sampling to Study Stratification in Two Jazz Musician Communities.” By Douglas D. Heckathorn and Joan Jeffri. Paper to be presented at the American Sociological Association meetings, August, 2003, Atlanta, GA.

RDS in Brazil • Planning meeting, introduction to RDS November 2004: • Carl Kendall, Tulane • Keith Sabin, CDC • Protocol development May 2005 • Lisa Johnston, Tulane • Data collection August-October 2005 • Analysis workshop May 2006 • Writing workshop July 2006 UCSF - 15 papers

An empirical comparison of RDS, targeted TLS and snowball sampling methodologies in a hidden population in Fortaleza, Brazil Ligia Kerr†, Carl Kendall‡, Rogério Gondim•, Guillerme Werneck◊, Lisa Johnston‡, Keith SabinΩ

Study design • Cross-sectional study in Fortaleza/Ce • 2002 (401) • 32% “Snow Ball” • 68% TLS • 2005 (406) • 100% RDS • Measures • Questionnaire based on BSS • Socio-economic status (education/social class)

Main results Table 1. Education and social class of two survey rounds using three different methods, Fortaleza/ Ce, 2006.

Secondary results Table 2. Education of Aids cases among MSM in Ceará. Fortaleza/ Ce, 2000-2005.

Public Health implications • RDS reaches lower social class respondents than TLS in this example in Fortaleza. • Social classes D and E have a higher proportion of AIDS cases. • RDS would appear to be the sampling method of choice in Fortaleza.

THE EFFECTIVENESS OF USING THE RESPONDENT-DRIVEN SAMPLING METHODOLOGY FOR HIV BEHAVIORAL SURVEILLANCE AMONG FEMALE SEX WORKERS IN SANTOS, JULY 2006. IMPLEMENTING ORGANIZATION: ASPPE SANTOS WWW.ASPPE.ORG

SOCIAL NETWORK: HIV STATUS

Thank you!
