A Stata program for Respondent Driven Sampling

Matthias Schonlau, DIW, RAND (USA)
Elisabeth Liebau, DIW
Stata User Conference Berlin, June 25, 2010

What is RDS?
• RDS = Respondent Driven Sampling
• Invented by a sociologist (Heckathorn, 1997)
• RDS is a chain-referral sampling procedure
• Sampling probabilities can be calculated
• It is the only alternative that yields a probability sample when traditional methods do not work

Typical RDS populations
RDS is employed where traditional probability sampling methods do not work well:
• A sampling frame cannot easily be constructed
  • e.g. no registry is available
• Low prevalence
  • screening is ineffective or expensive
  • e.g. jazz musicians
• Anonymity is an issue
  • e.g. questions about illegal drugs

RDS sampling procedure
• Approach several seed respondents
• Each respondent approaches 3 further respondents from their social network
• Respondents are paid for participating and for each referral who contacts the interviewers
• Stop when the desired sample size is reached

Differences from snowball sampling
• Respondents recruit directly and do not give contact information to the interviewer
• The length of the referral chains is crucial for reaching equilibrium
• The formal theory requires keeping track of who recruits whom
  • There is no such theory for snowball sampling
• The theory attaches different sampling weights to recruits depending on their network size and the transition matrix
  • Snowball sampling does not use sampling weights

Red/blue example
Figure: single seed, 3 recruits; maximum chain length = 3 (not counting the seed). Example data from Heckathorn et al. (2002).
The name “red/blue” is explained later.

Motivation for theory
• If the referral chains are sufficiently long, the characteristics of the eventual sample will be independent of the seeds
• The recruitment distribution reaches an equilibrium
• The probability of recruiting someone from a certain group (e.g. “white female”) can be derived
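To make the equilibrium idea concrete, here is a minimal Python sketch (not Stata, and not the rds code) that finds the equilibrium sample composition E satisfying E = E·S by applying a transition matrix S wave after wave. It assumes S[i][j] is the probability that a recruiter from group i recruits someone from group j (each row sums to 1) and that the chain is regular; the 2×2 matrix is made up for illustration.

```python
def next_wave(E, S):
    """Composition of the next wave: E_next[j] = sum_i E[i] * S[i][j]."""
    k = len(E)
    return [sum(E[i] * S[i][j] for i in range(k)) for j in range(k)]


def equilibrium(S, tol=1e-12, max_iter=100_000):
    """Iterate waves from a uniform start until the composition stops changing."""
    k = len(S)
    E = [1.0 / k] * k
    for _ in range(max_iter):
        E_next = next_wave(E, S)
        if max(abs(a - b) for a, b in zip(E_next, E)) < tol:
            return E_next
        E = E_next
    raise RuntimeError("equilibrium did not converge; is S regular?")


# Hypothetical transition matrix (rows: recruiter group, columns: recruit group)
S = [[0.6, 0.4],   # red recruiters: 60% red, 40% blue recruits
     [0.2, 0.8]]   # blue recruiters: 20% red, 80% blue recruits
E = equilibrium(S)
print(E)  # approximately [1/3, 2/3]
```

For two groups the result can be checked by hand: the red share at equilibrium is S_blue,red / (S_red,blue + S_blue,red) = 0.2 / 0.6 = 1/3, whatever composition the seeds start from.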

Example: 2 groups (red/blue)
Tables (not reproduced): transition counts and transition probabilities
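As an illustration of how transition counts become transition probabilities, the hypothetical Python sketch below tallies each recruiter→recruit pair by group and divides each row by its total. The pair counts are invented and are not the Heckathorn et al. (2002) data.

```python
from collections import Counter


def transition_matrix(pairs, groups):
    """Return (counts, probs) for (recruiter group, recruit group) pairs.

    counts[i][j]: recruits in groups[j] made by recruiters in groups[i]
    probs[i][j]:  counts[i][j] divided by the row total
    """
    tally = Counter(pairs)
    counts = [[tally[(a, b)] for b in groups] for a in groups]
    probs = []
    for row in counts:
        total = sum(row)
        # A group that recruited nobody gets a row of zeros (no estimate).
        probs.append([c / total if total else 0.0 for c in row])
    return counts, probs


# Hypothetical (recruiter group, recruit group) pairs
pairs = ([("red", "red")] * 6 + [("red", "blue")] * 4
         + [("blue", "red")] * 2 + [("blue", "blue")] * 8)
counts, probs = transition_matrix(pairs, ["red", "blue"])
print(counts)  # [[6, 4], [2, 8]]
print(probs)   # [[0.6, 0.4], [0.2, 0.8]]
```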

Data required
• id: respondent coupon
• ref1, ref2, ref3: referral coupons
• degree: network size
• key: analysis variable
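The coupon variables are enough to reconstruct who recruited whom. The Python sketch below only illustrates the idea; it is not the rds_network implementation. It assumes (hypothetically) that each respondent's id is the coupon they arrived with, that unused referral coupons are stored as None, and that seeds are respondents whose id matches no issued coupon. The output names p_id and p_key mirror the recruiter_id() and recruiter_var() options of the rds command; wave counts recruitment steps back to the seed.

```python
def link_recruiters(rows):
    """Derive recruiter id (p_id), recruiter's key (p_key) and wave per respondent.

    Each row is a dict with keys: id, ref1, ref2, ref3, key.
    Hypothetical conventions: id is the coupon the respondent arrived with,
    unused referral coupons are None, seeds have an id nobody handed out.
    """
    # Map each issued coupon to the respondent who handed it out.
    issued_by = {}
    for r in rows:
        for coupon in (r["ref1"], r["ref2"], r["ref3"]):
            if coupon is not None:
                issued_by[coupon] = r["id"]

    by_id = {r["id"]: r for r in rows}
    result = {}
    for r in rows:
        p_id = issued_by.get(r["id"])
        if p_id not in by_id:  # no interviewed recruiter: treat as a seed
            p_id = None
        p_key = by_id[p_id]["key"] if p_id is not None else None
        result[r["id"]] = {"p_id": p_id, "p_key": p_key}

    # Wave = number of recruitment steps back to a seed (seeds are wave 0).
    for rid in result:
        wave, current, seen = 0, rid, {rid}
        while result[current]["p_id"] is not None:
            current = result[current]["p_id"]
            if current in seen:
                raise ValueError("cyclic recruitment chain")
            seen.add(current)
            wave += 1
        result[rid]["wave"] = wave
    return result


rows = [
    {"id": 1, "ref1": 11, "ref2": 12, "ref3": 13, "key": "red"},       # seed
    {"id": 11, "ref1": 21, "ref2": None, "ref3": None, "key": "blue"},
    {"id": 12, "ref1": None, "ref2": None, "ref3": None, "key": "red"},
    {"id": 21, "ref1": None, "ref2": None, "ref3": None, "key": "blue"},
]
links = link_recruiters(rows)
print(links[21])  # {'p_id': 11, 'p_key': 'blue', 'wave': 2}
```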

rds syntax
Two steps:
• rds_network analyzes the network
• rds does the estimation

Example: Iguchi et al. study
• Large US study of men who have sex with men, drug users, and their sex partners
• Innovative design, multiple sites
• For illustration, we look at data from Los Angeles (Phase II)
• Iguchi, M., Ober, A., Berry, S., Fain, T., Heckathorn, D., Gorbach, P., et al. (2009). Simultaneous Recruitment of Drug Users and Men Who Have Sex with Men in the United States and Russia Using Respondent-Driven Sampling: Sampling Methods and Implications. Journal of Urban Health, 86, 5-31.

Large number of seed respondents. The longest referral chain has length 18.

The required referral length (5) is smaller than the longest chain (18, previous slide), so convergence has been reached. If there were only two categories (here there are 4), both transition matrices would be identical.
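One way to read “required referral length” is the number of waves after which the sample composition is within some tolerance of the equilibrium, whichever group the seeds come from. The Python sketch below computes that under an assumed tolerance of 0.02 and a made-up 2×2 transition matrix; the criterion rds actually uses may differ.

```python
def next_wave(E, S):
    """Composition of the next wave: E_next[j] = sum_i E[i] * S[i][j]."""
    k = len(E)
    return [sum(E[i] * S[i][j] for i in range(k)) for j in range(k)]


def waves_to_equilibrium(S, tol=0.02, max_waves=1000, burn_in=10_000):
    """Smallest number of waves after which a sample started entirely in any
    single group is within `tol` (max absolute difference) of equilibrium."""
    k = len(S)
    # Approximate the equilibrium by running many waves from a uniform start.
    E_star = [1.0 / k] * k
    for _ in range(burn_in):
        E_star = next_wave(E_star, S)

    needed = 0
    for g in range(k):
        E = [1.0 if j == g else 0.0 for j in range(k)]  # all seeds in group g
        waves = 0
        while max(abs(a - b) for a, b in zip(E, E_star)) >= tol:
            if waves == max_waves:
                raise RuntimeError("no convergence within max_waves")
            E = next_wave(E, S)
            waves += 1
        needed = max(needed, waves)
    return needed


S = [[0.6, 0.4],
     [0.2, 0.8]]  # hypothetical transition matrix
print(waves_to_equilibrium(S))  # 4
```

With two groups the distance to equilibrium shrinks by the factor 1 − S_red,blue − S_blue,red = 0.4 per wave, which is why a handful of waves suffices here. A longest observed chain well beyond this number is what the slide reads as evidence of convergence.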

Cumulative sample proportions for increasing number of waves (figure: Los Angeles)
Theoretically, sample proportions should stabilize after 5 waves (see program output). In practice, the cumulative sample proportions stabilize later, perhaps after 13 waves, because the assumptions are never perfectly met.
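The cumulative proportions in the figure can be computed as follows: for each wave w, take all respondents recruited in waves 0 through w and tabulate the analysis variable. A minimal Python sketch with hypothetical (wave, group) records:

```python
from collections import Counter


def cumulative_proportions(records, groups):
    """records: (wave, group) pairs, one per respondent.

    Returns {wave: {group: share}} using all respondents in waves 0..wave.
    """
    by_wave = {}
    for wave, group in records:
        by_wave.setdefault(wave, []).append(group)

    counts = Counter()
    out = {}
    for wave in range(max(by_wave) + 1):
        counts.update(by_wave.get(wave, []))
        n = sum(counts.values())
        if n:  # skip leading waves with nobody in them
            out[wave] = {g: counts[g] / n for g in groups}
    return out


# Hypothetical records: two seeds in wave 0, then recruits in waves 1 and 2
records = [(0, "red"), (0, "blue"), (1, "red"), (1, "red"), (2, "blue"), (2, "blue")]
shares = cumulative_proportions(records, ["red", "blue"])
for wave, share in shares.items():
    print(wave, share)
```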

Population and sample proportions
The estimated population proportions are the main result. Here the sample proportions are surprisingly similar to them, because the multiplicity degree does not vary much by group.

Equilibrium
If all assumptions are met, the sample proportions will eventually converge to the equilibrium. The equilibrium does not equal the population proportions, because groups that are better networked (larger degree) are sampled more often.
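The adjustment from equilibrium to population proportions can be sketched by dividing each group's equilibrium share by its average degree and renormalizing, P_i ∝ E_i / D_i. For two groups this reproduces Heckathorn's (2002) reciprocity estimator P_red = S_blue,red·D_blue / (S_red,blue·D_red + S_blue,red·D_blue); whether rds uses exactly this form for more than two groups is an assumption here. All numbers are hypothetical.

```python
def population_proportions(E, D):
    """P_i proportional to E_i / D_i, normalized to sum to 1.

    E: equilibrium sample composition per group
    D: average network size (degree) per group
    """
    raw = [e / d for e, d in zip(E, D)]
    total = sum(raw)
    return [r / total for r in raw]


E = [1 / 3, 2 / 3]  # hypothetical equilibrium (red, blue)
D = [10.0, 20.0]    # hypothetical degrees: blue has twice as many contacts
P = population_proportions(E, D)
print(P)            # approximately [0.5, 0.5]

# Two-group reciprocity form, with S_red,blue = 0.4 and S_blue,red = 0.2
S_rb, S_br = 0.4, 0.2
P_red = S_br * D[1] / (S_rb * D[0] + S_br * D[1])
print(P_red)        # 0.5
```

Blue makes up two thirds of the equilibrium only because blue respondents have twice as many contacts; after the degree adjustment both groups are estimated at 50% of the population.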

Degree
In the sample, Hispanics report on average 15 connections in the target population. By construction, the average degree is always at least as large as the multiplicity degree, and the two are equal only if all reported degrees in a group are the same.
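The inequality is the familiar comparison of an arithmetic with a harmonic mean, assuming, as in Heckathorn's multiplicity estimator, that the multiplicity degree of a group is n / Σ(1/d_k) over the degrees d_k reported by its n members. That definition is an assumption about what rds reports. A short Python check with made-up degrees (all must be positive):

```python
def average_degree(degrees):
    """Arithmetic mean of reported network sizes."""
    return sum(degrees) / len(degrees)


def multiplicity_degree(degrees):
    """Harmonic mean n / sum(1/d); assumed here to be the multiplicity degree."""
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    return len(degrees) / sum(1.0 / d for d in degrees)


degrees = [5, 10, 30]                # hypothetical reported degrees
print(average_degree(degrees))       # 15.0
print(multiplicity_degree(degrees))  # about 9
```

The harmonic mean gives large reported degrees little influence, which suits RDS because well-connected people are over-represented in the sample.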

Homophily
Respondents of race “other” recruit at random 96% of the time. Black respondents recruit other blacks 47% of the time and recruit at random 53% of the time.
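Heckathorn's (2002) homophily index H compares a group's in-group recruitment share S_ii with its population proportion P_i: H = (S_ii − P_i)/(1 − P_i) when S_ii ≥ P_i, and H = (S_ii − P_i)/P_i otherwise. Read this way, H = 0.47 means recruiting in-group 47% of the time and at random the remaining 53%, i.e. S_ii = H + (1 − H)·P_i. A minimal Python sketch; whether rds reports exactly this index is an assumption, and the inputs are hypothetical.

```python
def homophily(s_ii, p_i):
    """Heckathorn-style homophily of group i.

    s_ii: share of group i's recruits who are also in group i
    p_i:  estimated population proportion of group i (0 < p_i < 1)
    Positive values: in-group preference; negative values: out-group preference.
    """
    if s_ii >= p_i:
        return (s_ii - p_i) / (1.0 - p_i)
    return (s_ii - p_i) / p_i


# Hypothetical group making up 30% of the population. If it recruits in-group
# 47% of the time and at random (in-group with probability 0.3) 53% of the time:
p = 0.3
s = 0.47 + 0.53 * p     # = 0.629
print(homophily(s, p))  # approximately 0.47
```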

Weight
For example, each Hispanic respondent receives the weight 1.0954048. These weights can be exported using the wgt option.

Weights
Weights reproduce the estimated proportions.

    rds ethnic, id(id) degree(netsize) recruiter_id(p_id) recruiter_var(p_key) wgt(wgt)
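If the weight of group i is its estimated population proportion divided by its sample proportion, w_i = P_i / (n_i / n), then the weighted sample proportions equal P_i exactly and the weights average 1 over the sample. The Python sketch below checks this with hypothetical numbers; the normalization rds uses for wgt() may differ.

```python
def rds_weights(P, sample_counts):
    """Weight per group: estimated population share / sample share."""
    n = sum(sample_counts)
    return [p / (c / n) for p, c in zip(P, sample_counts)]


def weighted_proportions(weights, sample_counts):
    """Group shares after giving every respondent in group i weight w_i."""
    totals = [w * c for w, c in zip(weights, sample_counts)]
    grand = sum(totals)
    return [t / grand for t in totals]


P = [0.5, 0.5]     # hypothetical estimated population proportions
counts = [30, 70]  # hypothetical sample counts per group
w = rds_weights(P, counts)
print(w)                                # about [1.667, 0.714]
print(weighted_proportions(w, counts))  # about [0.5, 0.5]
```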

Bootstrap results
Bootstrapping is a method for obtaining confidence intervals; here percentile intervals are reported.

    * resample respondents 1,000 times and rerun rds on each resample
    bootstrap _b, reps(1000): ///
        rds ethnic, id(id) degree(netsize) recruiter_id(p_id) recruiter_var(p_key)
    estat bootstrap, percentile
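For readers outside Stata, here is a minimal Python sketch of the percentile method requested by estat bootstrap, percentile: resample respondents with replacement, recompute the statistic each time, and take the 2.5th and 97.5th percentiles of the replicates. The statistic (a simple proportion), the data and the seed are hypothetical, and like bootstrap without a cluster() option it resamples individual respondents.

```python
import random


def percentile_ci(data, statistic, reps=1000, alpha=0.05, seed=12345):
    """Percentile bootstrap interval for statistic(data).

    Resamples individual observations with replacement `reps` times and
    returns the alpha/2 and 1 - alpha/2 empirical quantiles of the replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    k = round(alpha / 2 * reps)  # e.g. 25 of 1000 replicates in each tail
    return replicates[k], replicates[reps - 1 - k]


def share(xs):
    """Proportion of ones in a 0/1 list."""
    return sum(xs) / len(xs)


# Hypothetical data: 1 = member of the group of interest, 0 = not
data = [1] * 40 + [0] * 60
low, high = percentile_ci(data, share)
print(low, high)
```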

Outlook
• Currently working on a paper
• The software will be downloadable from within Stata in about a month by typing net search rds and following the link. For now, please email me and I will send the code.

THE END
Contact:
Matt Schonlau: mschonlau@diw.de (until August), matt@rand.org
Elisabeth Liebau: eliebau@diw.de
Acknowledgement: We are grateful to Martin Iguchi, Sandy Berry, Allison Ober and Terry Fain for giving us access to the data for the example. The group is preparing a public release of the data once additional publications have been written.
