
Assessment of Misclassification Error in Stratification Due to Incomplete Frame Information

Donsig Jang, Xiaojing Lin, Amang Sukasih (Mathematica Policy Research, Inc.); Steve Cohen, Kelly Kang (National Science Foundation). ITSEW 2008, Research Triangle Park, NC, June 2, 2008.


Presentation Transcript


  1. Assessment of Misclassification Error in Stratification Due to Incomplete Frame Information Donsig Jang, Xiaojing Lin, Amang Sukasih Mathematica Policy Research, Inc. Steve Cohen, Kelly Kang National Science Foundation ITSEW 2008 Research Triangle Park, NC, June 2, 2008

  2. Disclaimer The opinions and assertions are those of the authors and do not reflect the views or policies of the National Science Foundation

  3. Survey Data Collection • Involves many complex processes, including • Sampling frame construction • Sample selection • Data collection • Data processing • Estimation • Each process is subject to error • Attempt to decompose the total survey error into components by stage of the process

  4. Total Survey Errors • Stages: Parameter, Sampling Frame, Sample, Respondent, Data, Estimator • Error sources: Misclassification error, Coverage error, Sampling error, Nonresponse error, Measurement error, Estimation error

  5. Misclassification Error in Stratification • Focus of this talk • A part of non-sampling error • Important but often overlooked component

  6. Stratification in Sampling • Enhance precision of survey estimates • Precision requirements for analytic domains • Often imperfect information on stratification variables • Misclassification in stratification • Trade-off: cost to gather stratification information at the frame construction vs. optimal sample allocation • Loss of effective sample sizes for some analytic domains

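  7. Misclassification Matrix • True classification: $A$ • Stratification classification: $A^*$ • $\theta_{jk} = \Pr(A^* = j \mid A = k)$, the proportion of units classified as category $j$ in true category $k$, and $\Theta = [\theta_{jk}]$ the misclassification matrix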

  8. Measures for Misclassification Effects • Bias • Effective sample size change

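  9. Bias Due to Misclassification • $\mathrm{Bias}(\hat{P}^*) = (\Theta - \mathbf{I})P$, where $\Theta$ = misclassification matrix, $P$ = true population proportions, $\mathbf{I}$ = identity matrix, and $\hat{P}^*$ = sample proportions with elements $\hat{P}^*_j = \sum_{i \in s} w_i I(A^*_i = j) / \sum_{i \in s} w_i$ • $s$ denotes the sample, $w_i$ the sampling weight for unit $i$, and $I(\cdot)$ the indicator function (Kuha and Skinner 1997)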
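The slide's formula did not survive extraction. The sketch below assumes the bias takes the form $(\Theta - \mathbf{I})P$, the form usually associated with Kuha and Skinner (1997), and simply evaluates it. The function name and the two-category numbers are hypothetical and chosen only for illustration.

```python
def misclassification_bias(theta, p_true):
    """Return (Theta - I) P, the bias of the misclassified proportions.

    theta[j][k] is the proportion of units in true category k that are
    classified as category j, so each column of theta sums to 1.
    (Assumed form of the bias; not confirmed by the slide itself.)
    """
    k = len(p_true)
    bias = []
    for j in range(k):
        # Expected proportion observed in category j under misclassification.
        expected_j = sum(theta[j][c] * p_true[c] for c in range(k))
        bias.append(expected_j - p_true[j])
    return bias


# Hypothetical two-category example (illustrative values only).
theta = [[0.98, 0.03],
         [0.02, 0.97]]
p_true = [0.40, 0.60]
bias = misclassification_bias(theta, p_true)
```

Because each column of the matrix sums to one, the bias components always sum to zero: what one category gains through misclassification, the others lose.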

  10. Bias Estimation If the true classification is available from the sample: where
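The estimator on this slide was lost in extraction. As a rough illustration only, the sketch below assumes a weighted plug-in approach: when each sampled unit carries both the frame (stratification) class and the true class, the misclassification proportions and the true proportions can be estimated from the weighted cross-tabulation, and the bias follows from them. The function name, the record layout, and the data are hypothetical.

```python
def estimate_theta_and_bias(records, categories):
    """Weighted plug-in estimates from (weight, frame_class, true_class) records.

    Assumes every true category appears in the sample; otherwise the
    column total is zero and the division below fails.
    """
    idx = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    # cross[j][c]: weighted count classified as j on the frame, truly c.
    cross = [[0.0] * k for _ in range(k)]
    for w, frame_class, true_class in records:
        cross[idx[frame_class]][idx[true_class]] += w
    total = sum(sum(row) for row in cross)
    col_totals = [sum(cross[j][c] for j in range(k)) for c in range(k)]
    p_hat = [col_totals[c] / total for c in range(k)]
    theta_hat = [[cross[j][c] / col_totals[c] for c in range(k)]
                 for j in range(k)]
    # Plug-in bias: (Theta_hat - I) P_hat.
    bias_hat = [sum(theta_hat[j][c] * p_hat[c] for c in range(k)) - p_hat[j]
                for j in range(k)]
    return theta_hat, p_hat, bias_hat


# Hypothetical sample: (sampling weight, frame class, true class).
records = [
    (10.0, "M", "M"), (10.0, "M", "M"),
    (20.0, "F", "F"), (20.0, "F", "F"),
    (10.0, "M", "F"), (20.0, "F", "M"),
]
theta_hat, p_hat, bias_hat = estimate_theta_and_bias(records, ["M", "F"])
```

Under this plug-in form the estimated bias for a category reduces to the weighted frame-based proportion minus the weighted true proportion.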

  11. Effective Sample Sizes and Variance Inflation Factors • Computed for domain d constructed based on the true value • Computed for domain d constructed based on the misclassified value • Measures the inflation of variance due to weight variation
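The slide's formulas are not recoverable from the transcript. A common way to quantify variance inflation from weight variation is Kish's approximation, and the sketch below assumes that definition; the authors may have used a different one. Function names and weights are hypothetical.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum w^2."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2


def variance_inflation_factor(weights):
    """Nominal sample size divided by the effective sample size."""
    return len(weights) / effective_sample_size(weights)


# Hypothetical domain of four units. Built from frame values, all units
# come from one stratum and share a weight; built from reported values,
# one unit arrives from a stratum sampled at a much lower rate.
frame_domain_weights = [10.0, 10.0, 10.0, 10.0]
reported_domain_weights = [10.0, 10.0, 10.0, 40.0]

vif_frame = variance_inflation_factor(frame_domain_weights)
vif_reported = variance_inflation_factor(reported_domain_weights)
n_eff_reported = effective_sample_size(reported_domain_weights)
```

With equal weights the factor is exactly 1; a single unit with a much larger weight is enough to pull the effective sample size well below the nominal count.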

  12. Example: National Survey of Recent College Graduates (NSRCG) • Sponsored by the National Science Foundation • Collects education, employment, and demographic information from recent graduates with bachelor’s or master’s degrees in science, engineering, or health fields • For details, see http://www.nsf.gov/statistics/srvyrecentgrads

  13. NSRCG (Continued) • Two-stage sample design: school sample at the first stage and graduate sample at the second stage • Crucial to collect key sampling variables (degree date, degree level, field of major, race/ethnicity, and gender) from schools for eligibility determination and stratification (frame variables) • Sample was designed to have moderate weight variation within domains while meeting certain sample size thresholds • Quality of sampling variables was compromised by schools’ reluctance to release student information, non-standard formats used by schools, and inaccurate or incomplete administrative data (Jang and Lin, 2007 JSM)

  14. NSRCG (Continued) • The same information (degree date, degree level, field of major, race/ethnicity, and gender) was also collected from sampled graduates • Able to measure the quality of school-provided information for stratification by assessing discrepancies between school-provided information and reported values • Looking at data from two survey cycles (2003 and 2006 NSRCG)

  15. Misclassification for Gender • NSRCG 2003: ReBias for P_Male = -0.01% • NSRCG 2006: ReBias for P_Male = 0.50%

  16. Misclassification for Race/Ethnicity • Tables: NSRCG 2003, NSRCG 2006

  17. Effective Sample Sizes and Variance Inflation Factors • What if reported values are used for discrepant cases? • Results in more weight variation within domains based on reported values, due to unequal selection probabilities across classes • Check domain-specific sample sizes and variance inflation factors

  18. Variance Inflation Factors • Domain: race/ethnicity by degree level by major field by gender • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  19. Ratio of Sample Size, n_R / n_F • Domain: race/ethnicity by degree level by major field by gender • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  20. Ratio of Effective Sample Size, n_R / n_F • Domain: race/ethnicity by degree level by major field by gender • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  21. Variance Inflation Factors • Domain: race/ethnicity by degree level by major field • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  22. Ratio of Sample Size, n_R / n_F • Domain: race/ethnicity by degree level by major field • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  23. Ratio of Effective Sample Size, n_R / n_F • Domain: race/ethnicity by degree level by major field • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  24. Variance Inflation Factors • Domain: race/ethnicity by gender • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  25. Ratio of Sample Size, n_R / n_F • Domain: race/ethnicity by gender • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  26. Ratio of Effective Sample Size, n_R / n_F • Domain: race/ethnicity by gender • Panels: NSRCG 2003, NSRCG 2006 • Legend: White, Asian, Minority

  27. Summary • Misclassification in stratification may reduce the effective sample sizes for domains that were sampled with high sampling rates • Crucial to have good classification in stratification, especially when selection probabilities are substantially unequal

  28. Next Steps • Population counts for key domains are available but are based on misclassified values • Estimation of population counts: • Weighted sums of correct classification from the sample • Use of misclassification parameter estimates, where $N^*$ is the vector with population counts of domains defined by $A^*$ • Raking adjustments of the weights using the estimated population counts • Comparison of key estimates
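The formula for using misclassification parameter estimates is missing from the transcript. One standard possibility, assumed here and not confirmed by the slides, is the matrix method: if the frame-based counts satisfy $N^* = \Theta N$, corrected counts follow from solving that system with an estimated $\Theta$. The solver, names, and numbers below are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] as floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("misclassification matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def corrected_counts(theta_hat, frame_counts):
    """Matrix-method correction (assumed): solve Theta_hat N = N_star."""
    return solve_linear_system(theta_hat, frame_counts)


# Hypothetical two-domain example (illustrative values only).
theta_hat = [[0.9, 0.2],
             [0.1, 0.8]]
frame_counts = [1500.0, 2500.0]   # counts by frame classification A*
n_hat = corrected_counts(theta_hat, frame_counts)
```

The corrected counts could then serve as control totals for a raking adjustment. With a noisy estimate of the matrix, this method can produce unstable or even negative counts, so results would need checking before use.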
