Explore techniques such as Deviation Analysis, Correlation Analysis, and Duplicate Analysis for detecting falsified and duplicated responses in survey data. See how top duplicates are identified and how they cluster by supervisor.
Detection Techniques Applied Ali Mushtaq WSS – Dec 2
Survey Data Overview • PAPI (paper-and-pencil interviewing) • 5,000+ completed responses • ~150 interviewers, 10 to 50 interviews each • 20+ supervisors, each overseeing 75 to over 500 interviews • ~200 questions, multiple-response categorical
Deviation Analysis • Deviation Score: compare response-pattern distributions, stratified by group, and measure the deviation with a statistic such as chi-square • Correlation Analysis: the same comparison, applied to joint response patterns • Across all questions: how many are outlying, and what is the average deviation? • Assumption: falsified data is small in scale and likely to deviate from the overall distribution
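A minimal sketch of the deviation score for one question, assuming a Pearson chi-square statistic (the slide only says "something like Chi-square"). The names `chi_square_deviation`, `deviation_by_group`, and the row layout with a `"group"` key are hypothetical, chosen for illustration only.

```python
from collections import Counter, defaultdict


def chi_square_deviation(group_answers, all_answers):
    """Pearson chi-square statistic of one group's answers against the
    overall distribution. `all_answers` must include the group's own
    answers, so every observed category has a nonzero expected count."""
    overall = Counter(all_answers)
    observed = Counter(group_answers)
    n_all = len(all_answers)
    n_group = len(group_answers)
    stat = 0.0
    for category, count in overall.items():
        # Expected count if the group followed the overall distribution.
        expected = n_group * count / n_all
        stat += (observed[category] - expected) ** 2 / expected
    return stat


def deviation_by_group(rows, question):
    """rows: list of dicts, each with a 'group' key (e.g. interviewer or
    supervisor ID, an assumed layout) and one key per question."""
    answers_by_group = defaultdict(list)
    for row in rows:
        answers_by_group[row["group"]].append(row[question])
    all_answers = [row[question] for row in rows]
    return {
        group: chi_square_deviation(answers, all_answers)
        for group, answers in answers_by_group.items()
    }
```

Repeating this over all questions and averaging per group gives one way to get the "average deviation" on the slide. Passing tuples of answers to two questions instead of single answers to `chi_square_deviation` would be one way to apply the same comparison to joint response patterns.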
Duplicate Analysis • Compare each interview record against every other, one pair at a time, and measure the length of duplicate sequences • Flag pairs with long duplicate sequences • Examine unusually long sequences (especially complete duplicates) • Identify clusters by interviewer and supervisor
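The pairwise comparison can be sketched as follows, assuming a "duplicate sequence" means a run of consecutive questions on which two interviews give identical answers (the slides do not define it more precisely). The names `longest_duplicate_run` and `flag_pairs`, and the `min_run` threshold, are hypothetical.

```python
from itertools import combinations


def longest_duplicate_run(answers_a, answers_b):
    """Length of the longest run of consecutive questions on which two
    interviews give the same answer (compared position by position)."""
    longest = current = 0
    for a, b in zip(answers_a, answers_b):
        if a == b:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def flag_pairs(interviews, min_run):
    """interviews: dict mapping interview ID -> list of answers in
    questionnaire order. Returns (id_a, id_b, run) for every pair whose
    longest duplicate run is at least `min_run`, longest first."""
    flagged = []
    for (id_a, a), (id_b, b) in combinations(interviews.items(), 2):
        run = longest_duplicate_run(a, b)
        if run >= min_run:
            flagged.append((id_a, id_b, run))
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)
```

This compares every pair, so the work grows quadratically with the number of interviews; for roughly 5,000 records that is about 12.5 million comparisons. How missing answers should be treated (match or break the run) is left open here.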
Top Duplicates Survey A • “Top Duplicates”: pairs of surveys sharing half or more of their responses in sequence • Top Duplicates Clustered by Supervisor: • All other supervisors (15+) have no top duplicates
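One way to tabulate top duplicates by supervisor is sketched below. It assumes a list of flagged interview pairs and a lookup from interview ID to supervisor; the function name `top_duplicates_by_supervisor` and both input layouts are hypothetical, and the figures in the test data are made up for illustration, not taken from the survey.

```python
from collections import Counter


def top_duplicates_by_supervisor(flagged_pairs, supervisor_of):
    """Count, per supervisor, the flagged pairs that involve at least one
    of that supervisor's interviews.

    flagged_pairs: iterable of (id_a, id_b) interview ID pairs.
    supervisor_of: dict mapping interview ID -> supervisor ID.
    """
    counts = Counter()
    for id_a, id_b in flagged_pairs:
        # A set counts the supervisor once when both interviews share one.
        for supervisor in {supervisor_of[id_a], supervisor_of[id_b]}:
            counts[supervisor] += 1
    return counts
```

Supervisors with no flagged pairs simply do not appear in the result (a `Counter` returns 0 for them), which matches the "all other supervisors have no top duplicates" reading of the slide.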
Thank You! Questions?
Top Duplicates Survey B • Top Duplicates Clustered by Supervisor: • All other supervisors (15+) have none of the top duplicates