
Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse

Richard A. Moore, Company Statistics Division, US Census Bureau. Presented by Samson Adeshiyan.


Presentation Transcript


  1. Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse • Richard A. Moore, Company Statistics Division, US Census Bureau • Presented by Samson Adeshiyan

  2. 2002 Survey of Business Owners (SBO): Primary Goal • Provide Business Ownership Statistics • State • Industry • Demographic Group • Race --- Native American, Asian, Black, Hawaiian/Pacific Islander, White, Public • Ethnicity --- Hispanic, Non-Hispanic • Gender --- Female, Equal, Male

  3. SBO Primary Publication Level Statistics • Black-owned Grocery Stores in North Dakota (ND) • Number • Aggregate Sales • Aggregate Payroll • Aggregate Employment

  4. What Do We Have? (Econ Census and Tax Returns) • 5.5 million companies with paid employees • Receipts, Payroll, Employment • Geographic Codes • Industry Codes • 17.5 million companies without paid employees • Receipts • Industry and Geography Codes

  5. What Are We Missing for Each Business? • Race of Ownership • Ethnicity of Ownership • Gender of Ownership • Obtain this from a stratified sample of 2.5 million businesses

  6. Distribution at the US Level: 23 Million Companies • Women --- 28% • Hispanic --- 7% • Black --- 5% • Asian --- 5% • Native American --- 1% • Hawaiian/Pacific Islander --- 0.1%

  7. Problem 1: Need Sufficient Representation in the Sample: Black-Owned Groceries in ND • 2002 Estimates • 78 Black-owned businesses in ND • 15 of these in Retail • Only 4 are Grocery Stores • Can’t list groceries in ND in random order and sample systematically

  8. “Modeled Guess” Codes from Admin Info for Each Company • Response from a Previous SBO • Population Distribution by ZIP Code • State/Industry Distribution in 1997 SBO • Owner(s)’ Social Security Number when Available • Race/Hispanic/Gender Codes on SSN Application • Surnames (e.g. LOPEZ or WANG) • Country of Birth (e.g. Korea or CUBA) • Decennial Responses

  9. Example • Name …. Michelle Wie’s Pro Shop • Modeled Guess …. Asian Female • Likelihood-Race ……. 0.8912 • Likelihood-Hisp ……. 0.0012 • Likelihood-Female …. 0.9500

  10. Warning: Model Is Not 100% Accurate • Michelle Wie’s Pro Shop • Responds As White, Non-Hispanic, Male • Tabbed As White, Non-Hispanic, Male • If Business response is inconsistent with modeled likelihoods, tabulate by the responses • If a business does not respond, don’t directly infer responses from likelihoods

  11. Problem 2: Differential Response Rates Between Demographic Groups

      Owner             Likelihood-Hispanic   Response
      Jose Martinez     0.985                 Hispanic
      John Martinez     0.940                 ???
      Jose’s Sub Shop   0.123                 Non-Hispanic
      Juanita Martin    0.060                 Non-Hispanic
      John Martin       0.040                 Non-Hispanic

  12. Likelihoods Aid in Non-Response Adjustment

      #   Likelihood-Hispanic   Response       Weight
      1   0.985                 Hispanic       4.0
      2   0.940                 ???            4.0
      3   0.123                 Non-Hispanic   4.0
      4   0.060                 Non-Hispanic   4.0
      5   0.040                 Non-Hispanic   4.0

      • Response Rate Adjusted Hispanic-owned Est = 5.0 (4.0 * 5/4)
      • Hot Deck Imputed Hispanic-owned Est = 8.0 (4.0 + 4.0)
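The two estimates on this slide can be reproduced with a minimal Python sketch. The slide shows only the results, so the donor rule used here (take the respondent whose Hispanic likelihood is closest to the nonrespondent's) is an assumption, and the function names are hypothetical.

```python
# Toy data from the slide: (likelihood_hispanic, response, weight).
# response is None for the nonrespondent shown as "???".
records = [
    (0.985, "Hispanic", 4.0),
    (0.940, None, 4.0),
    (0.123, "Non-Hispanic", 4.0),
    (0.060, "Non-Hispanic", 4.0),
    (0.040, "Non-Hispanic", 4.0),
]


def weighted_count(recs, group):
    """Sum of weights over records whose response equals `group`."""
    return sum(w for _, resp, w in recs if resp == group)


def response_rate_adjusted(recs, group):
    """Inflate the respondent-only estimate by n / r (overall response rate)."""
    n = len(recs)
    r = sum(1 for _, resp, _ in recs if resp is not None)
    return weighted_count(recs, group) * n / r


def hot_deck_nearest_likelihood(recs):
    """Fill each missing response from the respondent with the closest likelihood.

    ASSUMPTION: nearest-likelihood donor selection; the slides do not
    spell out how the donor is chosen.
    """
    donors = [rec for rec in recs if rec[1] is not None]
    completed = []
    for like, resp, w in recs:
        if resp is None:
            donor = min(donors, key=lambda d: abs(d[0] - like))
            resp = donor[1]
        completed.append((like, resp, w))
    return completed


rate_adjusted_est = response_rate_adjusted(records, "Hispanic")   # 4.0 * 5/4
hot_deck_est = weighted_count(hot_deck_nearest_likelihood(records), "Hispanic")  # 4.0 + 4.0
```

The flat response-rate adjustment spreads the nonrespondent's weight evenly over all respondents, while the hot deck gives the whole weight to the group the likelihood points toward, which is why the two estimates differ (5.0 versus 8.0).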

  13. For Variance: Random Group Replication (RG) • Considerable number of cases where the modeled guess disagrees with the actual response • Cases tabbed from other stratum • Considerable variability in the weights of the tabulated cases

  14. Likelihoods Aid in Non-Response Adjustment

      #   Likelihood   Response       Weight   RG   Receipts
      1   0.98         Hispanic       4.0      1    10
      2   0.94         ???            4.0      2    1
      3   0.12         Non-Hispanic   4.0      3    5
      4   0.06         Non-Hispanic   4.0      4    6
      5   0.04         Non-Hispanic   4.0      5    8

      • Imputed Hispanic Firms Est = 8
      • Imputed Hispanic Receipts = 44

  15. For Variance Calculation: Weight Adjustment Method (Factors on Responding Firms)
      • Firms: Respondents Estimate = 4; Post Impute Estimate = 8; Weight Adjustment Factor = 2.0
      • Receipts: Respondents Estimate = 40; Post Impute Estimate = 44; Weight Adjustment Factor = 1.1
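The factors on this slide are the post-imputation estimate divided by the respondents-only estimate, computed separately for each statistic. A short Python sketch using the Hispanic rows of the toy example (respondent: weight 4.0, receipts 10; imputed case: weight 4.0, receipts 1); the variable and function names are hypothetical:

```python
# Hispanic-domain records from the toy example: (weight, receipts, is_imputed).
hispanic_records = [
    (4.0, 10.0, False),  # responding firm
    (4.0, 1.0, True),    # firm whose response was imputed by hot deck
]


def adjustment_factor(recs, value_of):
    """Post-impute estimate divided by the respondents-only estimate.

    `value_of` maps a record to the quantity being totalled
    (1 for firm counts, receipts for receipts).
    """
    respondents = sum(w * value_of(rec) for rec in recs
                      for w in [rec[0]] if not rec[2])
    post_impute = sum(rec[0] * value_of(rec) for rec in recs)
    return post_impute / respondents


firms_factor = adjustment_factor(hispanic_records, lambda rec: 1.0)       # 8 / 4
receipts_factor = adjustment_factor(hispanic_records, lambda rec: rec[1])  # 44 / 40
```

Because the factor depends on the statistic, each responding record would need one factor for firm counts, another for receipts, and so on for payroll and employment.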

  16. Oh-Scheuren Adjustment Factor (1983) • r = # respondents • i = # imputed cases • n = i + r = total number of cases • V1 = variance with impute treated as reported • V2 = V1 * (n/r + i/n)
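As a worked check of the factor exactly as written on this slide, the Python sketch below evaluates n/r + i/n for two ways of counting respondents in the toy example: the whole sample (4 respondents, 1 imputed case) and the Hispanic cell only (1 respondent, 1 imputed case). The function name is hypothetical, and the formula is taken from the slide rather than checked against the 1983 source.

```python
def oh_scheuren_factor(r, i):
    """Factor from the slide: V2 = V1 * (n/r + i/n), with n = r + i."""
    n = r + i
    return n / r + i / n


# Whole toy sample: 4 respondents, 1 imputed case -> 5/4 + 1/5.
overall_factor = oh_scheuren_factor(r=4, i=1)

# Hispanic cell only: 1 respondent, 1 imputed case -> 2/1 + 1/2.
hispanic_cell_factor = oh_scheuren_factor(r=1, i=1)


def adjusted_variance(v1, r, i):
    """Inflate V1 (variance with imputes treated as reported) to V2."""
    return v1 * oh_scheuren_factor(r, i)
```

The factor is 1.45 with the overall counts and 2.5 with the Hispanic-cell counts, so the choice of cell for the response rate changes the adjusted variance considerably.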

  17. Oh-Scheuren Method: Problems with Comparison • Research developed for Jackknife, not Random Group • Calculate response rates for cell • Best response for our example • Not missing at random • True response rate is 4 of 5 • Response rate for Hispanics is 1 of 2

  18. Donor Imputation Method (RG # Also Donated)

      Before donation:
      #   Likelihood   Response   Weight   RG   Receipts
      1   0.98         Hispanic   4.0      1    10
      2   0.94         ???        4.0      2    1

      After donation (record 2 takes record 1's response and RG number):
      #   Likelihood   Response   Weight   RG   Receipts
      1   0.98         Hispanic   4.0      1    10
      2   0.94         Hispanic   4.0      1    1

      • Imputed Hispanic Firms Est = 8
      • Imputed Hispanic Receipts = 44
      • Only RG #1 is non-zero. Same Estimates. Higher Variances.
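The "same estimates, higher variances" remark can be illustrated numerically. The slides do not state the variance formula, so the sketch below assumes the textbook random group estimator with k = 5 groups: each group total is scaled by k, and the variance is the sum of squared deviations of those k estimates from their mean, divided by k(k - 1). Function and variable names are hypothetical.

```python
def random_group_variance(recs, k):
    """Textbook random group variance of a weighted total (ASSUMED form).

    recs: iterable of (weight, value, rg) for the domain of interest,
    with rg numbered 1..k.
    """
    group_totals = [0.0] * k
    for w, y, rg in recs:
        group_totals[rg - 1] += w * y
    estimates = [k * t for t in group_totals]  # one full-sample estimate per group
    mean = sum(estimates) / k
    return sum((e - mean) ** 2 for e in estimates) / (k * (k - 1))


K = 5  # the toy example has one record in each of 5 random groups

# Hispanic receipts, imputed record keeps its own RG number (2).
keep_own_rg = [(4.0, 10.0, 1), (4.0, 1.0, 2)]

# Hispanic receipts, imputed record also receives the donor's RG number (1).
donated_rg = [(4.0, 10.0, 1), (4.0, 1.0, 1)]

estimate_keep = sum(w * y for w, y, _ in keep_own_rg)      # 44.0
estimate_donated = sum(w * y for w, y, _ in donated_rg)    # 44.0
var_keep = random_group_variance(keep_own_rg, K)           # 1536.0
var_donated = random_group_variance(donated_rg, K)         # 1936.0
```

Both layouts give a receipts estimate of 44, but concentrating the imputed record in the donor's group raises the variance from 1536 to 1936 in this toy case, since the imputed value no longer counts as an independent observation in another group.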

  19. Advantages of Donating RG # • No need to add multiple factors to record • No need to calculate factors • No problems for microdata users

  20. Compare the Ratios of the Variance of the three Methods • R1 = VAR(Oh-Scheuren) / VAR (Weighted Adjustment) • R2 = VAR(Donor) / VAR (Weighted Adjustment) • Mean for R1 and R2 across publication cells • Std Dev for each of the means of R1 and R2 • Null Hypothesis: Ri = 1 (90% confidence)
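One way to carry out this comparison is sketched below in Python. It assumes that the "Std Dev for each of the means" is the standard error of the mean ratio and that the 90% test uses a two-sided normal approximation (critical value 1.645); neither detail is stated on the slide. The ratio values are made-up placeholders for illustration only, not SBO results.

```python
import math
import statistics


def mean_ratio_test(ratios, z=1.645):
    """Mean ratio, its standard error, and whether 1.0 lies outside the interval.

    ASSUMPTION: two-sided 90% normal-approximation interval, mean +/- z * SE.
    """
    mean = statistics.mean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(len(ratios))
    lower, upper = mean - z * se, mean + z * se
    differs_from_one = not (lower <= 1.0 <= upper)
    return mean, se, differs_from_one


# HYPOTHETICAL per-cell variance ratios (placeholders, not survey results).
example_ratios = [1.05, 0.98, 1.12, 1.01, 0.95, 1.08]

mean_r, se_r, differs = mean_ratio_test(example_ratios)
```

A mean ratio whose interval covers 1.0 would be flagged as not significantly different from 1.00 at the 90% level.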

  21. Ratio of Variances --- Firm Counts • * Not Statistically Significant from 1.00 at 90%

  22. Ratio of Variances --- Receipts • * Not Statistically Significant from 1.00 at 90%

  23. Ratio of Variances --- Firm Counts • * Not Statistically Significant from 1.00 at 90%

  24. Ratio of Variances --- Receipts • * Not Statistically Significant from 1.00 at 90%

  25. Are the differences acceptable? • Firm Count Variance Ratios Differ by 10% • Receipts Variances Differ up to 70% • => • Firm Count Relative SEs Differ by about 5% • Receipts Relative SEs Differ by up to 30%
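The step from variance ratios to relative standard error ratios is a square root, since a standard error is the square root of a variance and both methods produce the same estimate. A quick Python check of the slide's figures (the function name is hypothetical):

```python
import math


def relative_se_ratio(variance_ratio):
    """Ratio of relative SEs implied by a variance ratio (same point estimate)."""
    return math.sqrt(variance_ratio)


firm_count_se_ratio = relative_se_ratio(1.10)  # variances differ by 10%
receipts_se_ratio = relative_se_ratio(1.70)    # variances differ by up to 70%

firm_count_pct = (firm_count_se_ratio - 1.0) * 100  # roughly 5%
receipts_pct = (receipts_se_ratio - 1.0) * 100      # roughly 30%
```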

  26. Asian-Owned Retail Operations in New Hampshire in 2002

  27. Lingering Question • Is the donation of the RG Number sufficient or do we need to augment the resulting variance with a factor (similar to the Oh-Scheuren factor)?

  28. Any Questions? • Richard Moore • Richard.A.Moore.Jr@census.gov
