Michelle Robinson Department of Sociology

Selection and Non-Response Bias: From a Simulation Study to the ‘Real World’: A “Case Study” of Children, Families and Schools • Michelle Robinson • Department of Sociology • Interdisciplinary Training Program in Educational Sciences

Social Capital • test scores • school completion • time on homework • grades • absenteeism

Causal Ambiguity OR Social Capital Child Outcomes

The Intervention • Families and Schools Together (FAST) • research-based after-school program • universally recruited 1st grade families • 8 weeks of weekly meetings at schools • 2 years of monthly meetings • designed to strengthen bonds (social capital) between: • parents and school staff • parents and other parents • parents and children

Research Design Phoenix San Antonio

Research Design Phoenix San Antonio FAST Control

Table 1. Randomization

Longitudinal Design • Cohort 2 • 1st graders • pre-survey • intervention • post-surveys • Cohort 1 • 3rd graders • follow-up surveys • district records • Cohort 1 • 1st graders • pre-survey • intervention • post-surveys • Cohort 2 • 3rd graders • follow-up surveys • district records Wrap-up 2008-09 2009-10 2010-11 2011-12 2012-13 Academic Year

Selection Bias • Successful Randomization at the cluster level ≠ successful randomization within cluster. • No pretreatment differences in recruitment • Pretreatment differences on survey items • Comparison group more advantaged • Complicates our causal model • Unobserved heterogeneity • Biased estimates • Can weighting help correct this bias?

Table 2: Unhappy Randomization

Non-Response Bias • Non-response is also an issue of selection and can contribute to bias • Makes comparisons difficult • Are we comparing apples and apples or apples and bananas? • MCAR, MAR and NMAR (missing completely at random, missing at random, not missing at random) • Are the respondents different from nonrespondents? • In ways that matter for our study?
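The three missingness mechanisms can be illustrated with a small simulation. This is a minimal sketch, not the study's data: the covariate `income`, the outcome equation, and the response probabilities are all hypothetical values chosen for illustration.

```python
import random

rng = random.Random(42)
n = 20000

# Hypothetical observed covariate and an outcome that depends on it.
income = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.5 * z + rng.gauss(0, 1) for z in income]

def respondent_mean(prob_respond):
    """Mean outcome among units that respond with the given probabilities."""
    kept = [y for y, p in zip(outcome, prob_respond) if rng.random() < p]
    return sum(kept) / len(kept)

full_mean = sum(outcome) / n

# MCAR: response is unrelated to anything.
mcar_mean = respondent_mean([0.6] * n)
# MAR: response depends only on the observed covariate.
mar_mean = respondent_mean([0.8 if z > 0 else 0.4 for z in income])
# NMAR: response depends on the outcome itself.
nmar_mean = respondent_mean([0.8 if y > 0 else 0.4 for y in outcome])

print(f"full sample: {full_mean:.3f}")
print(f"MCAR respondents: {mcar_mean:.3f}")
print(f"MAR respondents:  {mar_mean:.3f}")
print(f"NMAR respondents: {nmar_mean:.3f}")
```

Under MCAR the respondent mean stays close to the full-sample mean. Under MAR the raw respondent mean drifts, but the drift is driven by an observed variable and so can in principle be adjusted for. Under NMAR the drift is driven by the outcome itself.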

Table 3: Year 1 Posttest Non-Response

Table 4: Year 1 Posttest Non-Response

Ordinary Least Squares • Simple random sampling (SRS) assumes everyone has the same probability of selection (Normal Distribution) • Differences in the distribution are assumed to either be imposed or a result of random chance • BLUE: Best Linear Unbiased Estimator! • Assumptions 1-5 • In a correctly specified model, different distributions will yield (on average) the same estimates • Implications for survey design and sampling

Weighting • The purpose of sampling weights is to: “…make the distribution of some set of variables in the data approximate the distribution of those variables in the population from which the sample was drawn.” (p. 240)
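One simple way to build weights with this property is to divide each group's population share by its sample share. The sketch below assumes a hypothetical two-group sample and hypothetical population shares; it is one common construction, not necessarily the weighting scheme used in the study.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """One weight per sample unit: population share / sample share of its group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

def weighted_shares(sample_groups, weights):
    """Share of total weight held by each group."""
    total = sum(weights)
    shares = {}
    for g, w in zip(sample_groups, weights):
        shares[g] = shares.get(g, 0.0) + w / total
    return shares

# Hypothetical sample that over-represents the advantaged group.
sample = ["advantaged"] * 70 + ["disadvantaged"] * 30
population = {"advantaged": 0.5, "disadvantaged": 0.5}

weights = poststratification_weights(sample, population)
shares = weighted_shares(sample, weights)
print(shares)
```

After weighting, each group's share of the total weight matches its assumed population share, even though the raw sample is split 70/30.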

Expected vs. OLS vs. WOLS • When a correctly specified model is estimated, the parameters will not differ across Expected, OLS and WOLS • Reasons why parameters may differ • Incorrectly Specified Models • Omitted Variable Bias • Nonlinear Relationships • Pooled samples • weighted average of the estimates for the two separate samples
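The claim can be checked on noise-free toy data. In this sketch the data, the weights (which depend on x), and the helper `weighted_line_fit` are all hypothetical. A straight line is fitted once to a truly linear relationship and once to a quadratic one, with and without weights.

```python
def weighted_line_fit(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if w is None:
        w = [1.0] * len(x)  # unit weights give ordinary least squares
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

x = [float(i) for i in range(11)]
w = [1.0 + xi for xi in x]             # hypothetical weights that grow with x

y_linear = [2.0 + 3.0 * xi for xi in x]   # linear model is correctly specified
y_quadratic = [xi ** 2 for xi in x]       # linear model omits the x^2 term

ols_lin = weighted_line_fit(x, y_linear)
wls_lin = weighted_line_fit(x, y_linear, w)
ols_quad = weighted_line_fit(x, y_quadratic)
wls_quad = weighted_line_fit(x, y_quadratic, w)

print("correct spec:  OLS slope", ols_lin[1], "WOLS slope", wls_lin[1])
print("misspecified:  OLS slope", ols_quad[1], "WOLS slope", wls_quad[1])
```

When the fitted line is the true model, weighting leaves the slope unchanged. When the true relationship is nonlinear, the weights shift emphasis toward high values of x and the fitted slope moves with them.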

Weighting and OLS • Incorrectly specified models’ parameters are sensitive to relative group size and weighting. • Parameter sensitivity can be used to diagnose model misspecification • Interaction between Weight and X should explain no additional variance • May be a proxy for interactions or omitted variables • Weights are a function of the data!
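The diagnostic idea, that the weight and its interaction with X should explain no additional variance, can be sketched as an F-type comparison of two unweighted regressions: y on (1, x) versus y on (1, x, w, w·x). The data, the step-function weights, and all helper names below are hypothetical, and the solver is a bare-bones illustration rather than production numerical code.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(rows, y):
    """Residual sum of squares from an unweighted least-squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def weight_interaction_f(x, y, w):
    """F statistic for adding w and w*x to the regression of y on x."""
    base = [[1.0, xi] for xi in x]
    augmented = [[1.0, xi, wi, wi * xi] for xi, wi in zip(x, w)]
    rss_base, rss_aug = rss(base, y), rss(augmented, y)
    added_terms, resid_df = 2, len(y) - 4
    return ((rss_base - rss_aug) / added_terms) / (rss_aug / resid_df)

rng = random.Random(1)
x = [i / 10 for i in range(100)]
w = [1.0 if xi < 5 else 3.0 for xi in x]      # hypothetical stratum weights
noise = [rng.gauss(0, 1) for _ in x]

y_correct = [2 + 3 * xi + e for xi, e in zip(x, noise)]     # linear truth
y_misspecified = [xi ** 2 + e for xi, e in zip(x, noise)]   # omitted x^2

f_correct = weight_interaction_f(x, y_correct, w)
f_misspecified = weight_interaction_f(x, y_misspecified, w)
print(f"F (correct spec): {f_correct:.2f}")
print(f"F (misspecified): {f_misspecified:.2f}")
```

A large F statistic signals that the weight terms are standing in for something the model omits, here the curvature in x, which is the cue to respecify rather than simply switch to WOLS.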

Estimation Issues in WOLS • Standard errors • Sample Size • Assumes there are Wi individuals in the sample • Homoscedasticity • Errors are multiplied by a constant Wi • Bias can’t be predicted • White (1980) heteroskedasticity-consistent estimator
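The sample-size problem and the White-type remedy can be sketched for a one-predictor regression. The data, weights, and function names are hypothetical. `frequency_se` treats each weight as a count of individuals, while `white_se` uses the sandwich form for the weighted slope, written out for the simple-regression case.

```python
import math
import random

def wls_fit(x, y, w):
    """Weighted fit of y = a + b*x; returns slope, residuals, weighted mean of x, Sxx."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sum(wi * (xi - xbar) * (yi - ybar)
                for wi, xi, yi in zip(w, x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, resid, xbar, sxx

def frequency_se(x, y, w):
    """Naive slope SE that treats sum(w) as the number of observations."""
    _, resid, _, sxx = wls_fit(x, y, w)
    s2 = sum(wi * e * e for wi, e in zip(w, resid)) / (sum(w) - 2)
    return math.sqrt(s2 / sxx)

def white_se(x, y, w):
    """Heteroskedasticity-consistent (HC0) slope SE for the weighted fit."""
    _, resid, xbar, sxx = wls_fit(x, y, w)
    meat = sum((wi * (xi - xbar) * e) ** 2 for wi, xi, e in zip(w, x, resid))
    return math.sqrt(meat) / sxx

rng = random.Random(0)
x = [i / 10 for i in range(50)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 + xi) for xi in x]   # error spread grows with x
w = [1.0 + (i % 3) for i in range(50)]                  # hypothetical weights
w_scaled = [10 * wi for wi in w]                        # same weights, rescaled

print("frequency SE:", frequency_se(x, y, w), "->", frequency_se(x, y, w_scaled))
print("White SE:    ", white_se(x, y, w), "->", white_se(x, y, w_scaled))
```

Multiplying every weight by 10 leaves the point estimate unchanged but shrinks the naive standard error, because the procedure believes it has ten times as many individuals. The sandwich standard error is unaffected by the rescaling.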

Problem 1: Weights are a function of observed independent variables included in the model • Estimate Weighted Model • DuMouchel and Duncan (1983) • Look for differences between the OLS and WOLS parameter estimates • Goal is to find possible sources of misspecification • No difference, use OLS! • Difference, respecify model

Problem 2: Weights are a function of observed independent variables included in the model and the dependent variable • Estimate Weighted Model • DuMouchel and Duncan (1983) • Look for differences between the OLS and WOLS parameter estimates • Likely to be different in this case • Difference, respecify model and use OLS • If parameters still differ, use WOLS • Sample selection bias

Model Specification as a Golden Ticket • Model specification • Social phenomena are by nature complex! • “causal model”, i.e. the correctly specified model, is often unknown • May not be known a priori • Imperfect data • “true causal model” is often complex in ways that violate OLS assumptions • Autocorrelated errors (Assumptions 4 and 5) • Clustered data • Longitudinal
