
Rebecca Vassallo, ESRC Research Methods Festival, July 2012

A simulation study of the effect of sample size and level of interpenetration on inference from cross-classified multilevel logistic regression models.


Presentation Transcript


  1. A simulation study of the effect of sample size and level of interpenetration on inference from cross-classified multilevel logistic regression models Rebecca Vassallo ESRC Research Methods Festival, July 2012

  2. Introduction • Influence of the interviewer and area on survey response behaviour • Reflects unmeasured factors including the interviewer’s and area’s characteristics • Violation of the assumption of independence of observations • Standard analytical techniques will underestimate standard errors and can result in incorrect inference (Snijders & Bosker, 1999) • Multilevel modelling has become a popular method in analysing area and interviewer effects on nonresponse

  3. Introduction • Estimation problem relating to the identifiability of area and interviewer variation • Interpenetrated sample design considered as the gold standard for separating interviewer effects from area effects • Restrictions in field administration capabilities and survey costs only allow for partial interpenetration • Multilevel cross-classified specification used in such cases (Von Sanden, 2004) • No studies available examining the properties of parameter estimates from such models under different conditions

  4. Study Aims • Examine the implications of interviewer dispersal patterns within different scenarios on the quality of parameter estimates • Percentage relative bias, confidence interval coverage, power of significance tests and correlation of random parameter estimates • Different scenarios vary in sample sizes, overall rates of response, and the area and interviewer variance • Identify the smallest interviewer pool and the most geographically-restrictive interviewer case allocation required for acceptable levels of bias and power

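  5. Methodology: Simulation Model • Model: $\operatorname{logit}(\pi_{i(jk)}) = \beta_0 + u_j + v_k$; $u_j \sim N(0, \sigma^2_u)$; $v_k \sim N(0, \sigma^2_v)$, where $\pi_{i(jk)}$ is the outcome probability for case $i$ handled by interviewer $j$ in area $k$, $u_j$ is the interviewer random effect and $v_k$ is the area random effect • Stata Version 12 calling MLwiN Version 2.25 through the ‘runmlwin’ command (Leckie & Charlton, 2011) • Markov Chain Monte Carlo (MCMC) estimation method • MCMC method produces less biased estimates compared to first-order marginal quasi-likelihood (MQL) and second-order penalised quasi-likelihood (PQL) (Browne, 1998; Browne & Draper, 2006) • IRIDIS High Performance Computing Facility cluster at the University of Southampton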

  6. Methodology: Data Generating Procedure • The overall probability of the outcome for an area and an interviewer with zero random effects determines the overall intercept (fixed for all cases) • Cluster-specific random effects for each interviewer and area are generated separately from $N(0, \sigma^2_u)$ and $N(0, \sigma^2_v)$ respectively • The interviewer effects $u_j$ and area effects $v_k$ are generated for every simulation, but held constant across scenarios where the only factor that changes is the interviewer case allocation • The allocation of workload from different areas to specific interviewers is limited to a finite number of possibilities

  7. Methodology: Data Generating Procedure • The linear predictor $\operatorname{logit}(\pi_{i(jk)})$ of each case is computed and converted to a probability $\pi_{i(jk)}$ • Values of the dependent variable (a dichotomous outcome for each case) are generated from a Bernoulli distribution with probability $\pi_{i(jk)}$ • For each scenario of the experimental design, 1000 simulated datasets are generated using R Version 2.11.1
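The data generating procedure on slides 6 and 7 can be sketched roughly as follows. This is a minimal Python illustration, not the author's R code: the function name `simulate_dataset`, its parameters, and in particular the rule that assigns each interviewer to consecutive areas are assumptions made for the example, since the slides do not spell out the allocation schemes.

```python
import math
import random

def simulate_dataset(n_areas, n_interviewers, cases_per_int,
                     pi_overall, var_int, var_area,
                     areas_per_int=2, seed=0):
    """Simulate one dataset from an intercept-only cross-classified logistic model.

    Returns a list of (interviewer, area, outcome) tuples.
    All names here are hypothetical, chosen for illustration only.
    """
    rng = random.Random(seed)
    # Intercept: logit of the outcome probability when both random effects are zero.
    beta0 = math.log(pi_overall / (1 - pi_overall))
    # Interviewer and area random effects are drawn separately.
    u = [rng.gauss(0, math.sqrt(var_int)) for _ in range(n_interviewers)]
    v = [rng.gauss(0, math.sqrt(var_area)) for _ in range(n_areas)]

    data = []
    for j in range(n_interviewers):
        # ASSUMED allocation rule: interviewer j works in `areas_per_int`
        # consecutive areas, wrapping around at the last area.
        first_area = (j * n_areas) // n_interviewers
        areas = [(first_area + a) % n_areas for a in range(areas_per_int)]
        for i in range(cases_per_int):
            k = areas[i % areas_per_int]       # spread the workload evenly
            eta = beta0 + u[j] + v[k]          # linear predictor (logit scale)
            p = 1 / (1 + math.exp(-eta))       # convert to a probability
            y = 1 if rng.random() < p else 0   # Bernoulli draw
            data.append((j, k, y))
    return data

# Medium scenario from the slides: 120 areas, 240 interviewers, 24 cases each.
medium = simulate_dataset(120, 240, 24, pi_overall=0.8,
                          var_int=0.3, var_area=0.3, areas_per_int=2)
```

Setting `areas_per_int=1` gives the fully nested case in which each interviewer works in a single area, the allocation the results slides identify as problematic.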

  8. Methodology: Simulation Factors • Simulated scenarios vary in the following factors: - the overall sample size ($N$) - the number of interviewers and areas ($N_{int}$; $N_{area}$) - the interviewer-area classifications [which vary in terms of the number of areas each interviewer works in (maximum 6 areas) and the overlap in the interviewers working in neighbouring areas] - the ICC (interviewer and area variances $\sigma^2_u$ & $\sigma^2_v$) - the overall probability of the outcome variable (π) • Medium scenario design (similar to values observed in a real dataset - a realistic starting point): 120 areas (48 cases/area) allocated to 240 interviewers (24 cases/int), totalling 5760 cases, $\sigma^2_u$=0.3, $\sigma^2_v$=0.3, π=0.8
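The arithmetic behind the medium scenario can be checked with a short sketch. The ICC formula used here is the common latent-variable definition for logistic models, with level-1 variance equal to that of the standard logistic distribution; the slides do not state which ICC definition was used, so this is an assumption, and the helper name `latent_icc` is hypothetical.

```python
import math

def latent_icc(var_target, var_other):
    """Share of latent-scale variance due to one classification (ASSUMED definition)."""
    level1_var = math.pi ** 2 / 3  # variance of the standard logistic distribution
    return var_target / (var_target + var_other + level1_var)

# Medium scenario figures taken from the slide.
n_areas, cases_per_area = 120, 48
n_interviewers, cases_per_int = 240, 24

total_cases = n_areas * cases_per_area
# The same total must result from the interviewer side of the allocation.
assert total_cases == n_interviewers * cases_per_int

intercept = math.log(0.8 / (1 - 0.8))  # logit of the overall probability 0.8
icc_int = latent_icc(0.3, 0.3)         # interviewer ICC with both variances 0.3

print(total_cases, round(intercept, 3), round(icc_int, 3))
# prints: 5760 1.386 0.077
```

Under this definition, variances of 0.3 correspond to an ICC of roughly 8% for each classification, so interviewers and areas each account for a modest share of the latent variation.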

  9. Methodology - Quality Assessment Measures • Correlation between area and interviewer parameter estimates. High negative values indicate identifiability problems • Percentage relative bias • Coverage of the 95% Wald confidence interval and of the 95% MCMC quantile interval, each compared with the nominal 95% • Power of Wald test - proportion of simulations in which the null hypothesis is correctly rejected
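The four quality measures can be computed from the per-simulation estimates along these lines. This is a sketch under stated assumptions: the function names are hypothetical, the Wald interval and test use a two-sided normal critical value of 1.96, and the slides do not say whether the author's tests were one-sided or two-sided.

```python
import math
from statistics import mean

def pct_relative_bias(estimates, true_value):
    """100 * (mean estimate - true value) / true value."""
    return 100 * (mean(estimates) - true_value) / true_value

def wald_coverage(estimates, std_errors, true_value, z=1.96):
    """Proportion of Wald intervals est +/- z*se that contain the true value."""
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if est - z * se <= true_value <= est + z * se)
    return hits / len(estimates)

def wald_power(estimates, std_errors, z=1.96):
    """Proportion of simulations in which H0: parameter = 0 is rejected (ASSUMED two-sided)."""
    rejections = sum(1 for est, se in zip(estimates, std_errors)
                     if abs(est / se) > z)
    return rejections / len(estimates)

def pearson(xs, ys):
    """Pearson correlation, e.g. between interviewer and area variance estimates."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy inputs for illustration only (not results from the study).
toy_estimates = [0.28, 0.33, 0.31, 0.30]
toy_std_errors = [0.05, 0.05, 0.05, 0.05]
toy_bias = pct_relative_bias(toy_estimates, true_value=0.3)
```

In the study each measure would be taken over the 1000 simulated datasets of a scenario; the toy lists above only show the expected input shape.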

  10. Results - Power of Tests • For the medium scenario, power ≈ 1 for all interviewer case allocations • For smaller N, more sparse allocations are required to get power > 0.85 • A lower variance (0.2) results in lower power • When $N_{int} = N_{area}$ (as many interviewers as areas), more interviewer dispersion is required for acceptable levels of power • A higher π (0.9) requires 2 areas/int for power > 0.9 • Reduced interviewer overlapping for a constant number of areas/int does not improve power

  11. Results - Correlation between Interviewer and Area Variance Estimates • For all scenarios, high negative ρ (stronger than -0.4) are observed when interviewers work in 1 area only • No substantial change in ρ with varying total sample sizes • Very high negative ρ (up to -0.9) for $N_{int} = N_{area}$ scenarios; ρ only falls below 0.1 in magnitude when each interviewer is working in 4+ areas (compared to 2+ areas/int for $N_{int} = 2 \times N_{area}$ scenarios) • Higher ρ with increasing π up to 2 areas/int allocations; thereafter no change in ρ by π • Lower ρ with increasing variance up to 3 areas/int allocations • Lower ρ with less interviewer overlapping for the 2 areas/int cases

  12. Results - Percentage Relative Bias • In most scenarios with N=5760, the percentage relative bias is around 1-2% once interviewers are allocated to 2+ areas • Further interviewer dispersion (3+ areas) & less interviewer overlapping do not yield systematic drops in bias • When interviewers are working in 2+ areas, the bias in one of the two variance estimates is generally greater than the bias in the other [when $N_{int} = 2 \times N_{area}$] • Greater bias observed for smaller sample sizes, with the scenario including 1440 cases with $N_{int} = N_{area}$ obtaining bias values between 5% and 13% for all allocations

  13. Results - Confidence Interval Coverage • Close to the 95% nominal rate in all scenarios • Some cases of under-coverage or over-coverage in scenarios where each interviewer works in just one area - 87% coverage for one CI (N=5760, $N_{int} = 2 \times N_{area}$, variances 0.2 and 0.3, π=0.8, one area/int) - 88% coverage for one CI (N=2880 or N=1440, $N_{int} = 2 \times N_{area}$, both variances 0.3, π=0.8, one area/int) - 100% coverage for more than one CI (N=5760, 2880 or 1440, $N_{int} = N_{area}$, both variances 0.3, π=0.8, one area/int) • No clear evidence that the MCMC quantiles perform better than the Wald asymptotic normal CIs

  14. Conclusion • Interpenetration not required to distinguish between area and interviewer variation • Good quality estimates obtained for large sample sizes (≈6000 cases) if interviewers work in at least two areas • Better estimates obtained when the number of interviewers is greater than the number of areas • Higher overall probabilities & smaller variances (smaller ICC) require more interviewer dispersion for some survey conditions • The extent of interviewer overlapping shown not to be important • Results and their implications can be extended to other applications

  15. Acknowledgements • University of Southampton, School of Social Sciences Teaching Studentship • UK Economic and Social Research Council (ESRC), PhD Studentship (ES/1026258/1) • Gabriele B. Durrant & Peter W. F. Smith, PhD Supervisors

  16. References • Browne, W. J. (1998). Applying MCMC Methods to Multi-level Models. PhD thesis, University of Bath. • Browne, W. J. & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1, 473-514. • Leckie, G. & Charlton, C. (2011). runmlwin: Stata module for fitting multilevel models in the MLwiN software package. Centre for Multilevel Modelling, University of Bristol. • Snijders, T. A. B. & Bosker, R. J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling. London: Sage. • Von Sanden, N. D. (2004). Interviewer effects in household surveys: estimation and design. Unpublished PhD thesis, School of Mathematics and Applied Statistics, University of Wollongong.
