
Dual System Estimation and Census Adjustment

Stephen E. Fienberg, Statistics 36-149, Department of Statistics, Carnegie Mellon University, November 27-29, 2001.



Presentation Transcript


  1. Dual System Estimation and Census Adjustment Stephen E. Fienberg Statistics 36-149 Department of Statistics Carnegie Mellon University November 27-29, 2001

  2. What Do the Following Populations Have in Common? • fish* • penguins • homeless • prostitutes in Glasgow • Italians with diabetes* • people in the U.S.** • people with HIV • adolescent injuries in Pittsburgh, PA • WWW

  3. Example 1: Diabetes Prevalence • Bruno et al. (1994) used 4 sources for ascertainment of diabetes in Casale Monferrato, Northern Italy • s1: diabetes clinic and/or family physicians • s2: patients discharged with diagnosis from hospitals • s3: insulin or oral hypoglycaemic prescriptions • s4: requests for reimbursement for insulin and reagent strips

  4. Example 1: Diabetes (cont.)

                  s1:   Yes    Yes    No     No
                  s2:   Yes    No     Yes    No
       s3    s4
       Yes   Yes        58     46     14     8
       Yes   No         157    650    20     182
       No    Yes        18     12     7      10
       No    No         104    709    74     -

       n = 2069

  5. Example 2: Fish in a Lake • 200 fish caught 1st time • 150 fish caught 2nd time • Of 150 fish in 2nd sample, 125 were among 200 counted in 1st sample • Total number of fish caught = 200 + (150 - 125) = 225 • But how many fish have gone undetected?

  6. Example 2: Fish in a Lake • Proportion of fish in 2nd sample also in 1st = 125/150 = 5/6 • Generalize from sample to population: $(5/6)\,\hat{N} = 200$, so $\hat{N} = (6/5) \times 200 = 240$ • This is the method of capture-recapture due to Petersen, Lincoln, Schnabel, etc.
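
The arithmetic on slides 5 and 6 can be sketched in a few lines of Python. The function name `lincoln_petersen` and its argument names are illustrative choices, not something taken from the slides.

```python
def lincoln_petersen(n1, n2, a):
    """Capture-recapture estimate of population size: N-hat = n1 * n2 / a.

    n1: number caught in the first sample
    n2: number caught in the second sample
    a:  number caught in both samples
    """
    if a <= 0:
        raise ValueError("need at least one recapture to form the estimate")
    return n1 * n2 / a


# Fish example from the slides: 200 caught first, 150 caught second, 125 in both.
n1, n2, a = 200, 150, 125
seen = n1 + (n2 - a)                  # distinct fish actually observed
n_hat = lincoln_petersen(n1, n2, a)   # estimated number of fish in the lake
print(seen, n_hat, n_hat - seen)      # observed, estimated total, estimated undetected
```

With the slide's numbers, 225 fish are observed and the estimate of 240 implies roughly 15 fish that neither sample caught.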

  7. Capture-Recapture Model

                          Sample 2
                          In      Out       Total
       Sample 1   In      a       b         n1
                  Out     c       d ??      N - n1
                  Total   n2      N - n2    N ??

       $\hat{N} = n_1 n_2 / a$

  8. Role of Independence

  9. Some Formal Details • Alternatively, we think in terms of the ratio of odds for row 1 vs. odds for row 2:

       $$\frac{P\{A \text{ and } B\} \,/\, P\{A \text{ and } B^c\}}{P\{A^c \text{ and } B\} \,/\, P\{A^c \text{ and } B^c\}} = \frac{P\{A \text{ and } B\}\; P\{A^c \text{ and } B^c\}}{P\{A^c \text{ and } B\}\; P\{A \text{ and } B^c\}}$$

     and under independence this equals 1.
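
A small numerical check of the claim on slide 9: when the joint probabilities are built as products of the marginals, the cross-product ratio is exactly 1. The two capture probabilities below are hypothetical values chosen for illustration, and `odds_ratio` is an illustrative name.

```python
from fractions import Fraction


def odds_ratio(p_ab, p_a_notb, p_nota_b, p_nota_notb):
    """Cross-product ratio: P{A and B} P{Ac and Bc} / (P{Ac and B} P{A and Bc})."""
    return (p_ab * p_nota_notb) / (p_nota_b * p_a_notb)


# Hypothetical capture probabilities, for illustration only.
p_a = Fraction(2, 5)   # P{A}: in sample 1
p_b = Fraction(3, 4)   # P{B}: in sample 2

# Under independence each joint probability is a product of marginals.
cells = (
    p_a * p_b,              # A and B
    p_a * (1 - p_b),        # A and Bc
    (1 - p_a) * p_b,        # Ac and B
    (1 - p_a) * (1 - p_b),  # Ac and Bc
)
print(odds_ratio(*cells))
```

Exact fractions are used so that the ratio comes out as exactly 1 rather than a floating-point approximation.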

  10. Some Formal Details • Back to data. • We think of independence in terms of equality of odds, and we set $ad/bc = 1$ • and estimate the unobserved $d$ by $\hat{d} = bc/a$ • so that $\hat{N} = a + b + c + bc/a = n_1 n_2 / a$
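
The last equality on slide 10 skips a step. A short derivation, using the margins $n_1 = a + b$ and $n_2 = a + c$ from the table on slide 7, followed by a check against the fish counts ($a = 125$, $b = 75$, $c = 25$):

```latex
\begin{align*}
\hat{N} &= a + b + c + \frac{bc}{a} \\
        &= \frac{a^2 + ab + ac + bc}{a} \\
        &= \frac{(a + b)(a + c)}{a} = \frac{n_1 n_2}{a}. \\[6pt]
\hat{d} &= \frac{75 \times 25}{125} = 15, \qquad
\hat{N} = 125 + 75 + 25 + 15 = 240 = \frac{200 \times 150}{125}.
\end{align*}
```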

  11. More Formal Version

                          Sample 2
                          In      Out     Total
       Sample 1   In      125     75      200
                  Out     25      ?
                  Total   150

       $\hat{N} = n_1 n_2 / a$
       $\hat{N} = 150 \times 200 / 125 = 240$

  12. Example 1: Diabetes, Looking at Pairs of Lists

       Pair      $\hat{N}$
       s1, s2    2,351
       s1, s3    2,185
       s1, s4    2,262
       s2, s3    2,057
       s2, s4    803
       s3, s4    1,555

     Estimated s.e.'s are on the order of 100. Only 3 of 6 estimates exceed n = 2069.
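
The pairwise figures can be approximately reproduced from the four-list table on slide 4 by collapsing over the other two lists and applying $n_1 n_2 / a$ to each pair. The sketch below does this. The slide does not say which variant of the estimator produced its figures, so the plain ratio used here is an assumption; it lands within about four units of each value on slide 12. The names `counts` and `pair_estimate` are illustrative.

```python
from itertools import combinations

# Cell counts from the 4-list table on slide 4, keyed by membership in
# (s1, s2, s3, s4); 1 = on the list, 0 = not on the list.
# The (0, 0, 0, 0) cell is unobserved and therefore absent.
counts = {
    (1, 1, 1, 1): 58,  (1, 0, 1, 1): 46,  (0, 1, 1, 1): 14, (0, 0, 1, 1): 8,
    (1, 1, 1, 0): 157, (1, 0, 1, 0): 650, (0, 1, 1, 0): 20, (0, 0, 1, 0): 182,
    (1, 1, 0, 1): 18,  (1, 0, 0, 1): 12,  (0, 1, 0, 1): 7,  (0, 0, 0, 1): 10,
    (1, 1, 0, 0): 104, (1, 0, 0, 0): 709, (0, 1, 0, 0): 74,
}


def pair_estimate(i, j):
    """Plain n_i * n_j / a for lists i and j (0-based positions in the key)."""
    n_i = sum(v for k, v in counts.items() if k[i])
    n_j = sum(v for k, v in counts.items() if k[j])
    both = sum(v for k, v in counts.items() if k[i] and k[j])
    return n_i * n_j / both


n_observed = sum(counts.values())   # people on at least one list
estimates = {
    (i + 1, j + 1): pair_estimate(i, j) for i, j in combinations(range(4), 2)
}
for (i, j), est in estimates.items():
    flag = "above" if est > n_observed else "below"
    print(f"s{i}, s{j}: {est:.0f} ({flag} n = {n_observed})")
```

An estimate below the number of people actually observed cannot be right, which is the warning sign that the next slide picks up.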

  13. Diabetes Example: What is Going Wrong? • Independence of lists in the pairs!

  14. Capture-Recapture Assumptions • Random samples • Independence • Closed population • Perfect matching (no tag loss) • Homogeneity • How do we check on assumptions? • The problem of the “wily trout.”

  15. Accuracy and Coverage Evaluation Survey • Survey of approximately 314,000 households (HH) in 11,000 blocks. • Used to correct raw census counts using “capture-recapture” or dual systems estimation methodology. • Correct for omissions AND erroneous enumerations.

  16. ACE Design • Two parts to ACE sample of blocks: • sample of population -- P-sample • used to estimate omissions • matched records against those for census • sample of census -- E-sample • used to estimate erroneous enumerations • subtract out EEs from census counts before using DSE

  17. Dual Systems Components

  18. DSE With Same Values As Fish

       125     75      200
       25      ?
       150

       $n_{CEN}$ = census count - EEs
       $\hat{N} = n_{CEN}\, n_{ACE} / a$
       $\hat{N} = 150 \times 200 / 125 = 240$
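
A minimal sketch of the calculation on slide 18, with the erroneous-enumeration correction made explicit. The function and argument names are illustrative. The raw census count and the number of EEs below are hypothetical, picked only so that the corrected count equals the 150 that appears first in the slide's product; the slide itself gives only the corrected figures.

```python
def dual_system_estimate(census_count, erroneous, n_ace, matched):
    """DSE as written on the slide: N-hat = n_CEN * n_ACE / a,
    where n_CEN = census count - erroneous enumerations (EEs)."""
    if matched <= 0:
        raise ValueError("need at least one matched record")
    n_cen = census_count - erroneous
    return n_cen * n_ace / matched


# Hypothetical raw count (160) and EEs (10), so that n_CEN = 150.
# n_ace = 200 and matched = 125 are the remaining values on the slide.
n_hat = dual_system_estimate(census_count=160, erroneous=10, n_ace=200, matched=125)
print(n_hat)
```

Because the estimator is a simple product over the match count, it gives the same 240 as the fish example regardless of which margin is treated as the census.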

  19. DSE Features in 2000 • Excluded homeless/shelters and group quarters from calculations in 2000 • Adjusted sample counts for movers • Searching in adjacent blocks

  20. Some Practical Issues • How big is d relative to c? • Within HH vs between HH omissions • Counts of zero • “Negative” adjustment factors -- <1 • some blocks go up in size after DSE and some go down

  21. Dual Systems Assumptions • Perfect matching • idea of probabilistic matching with variable probabilities for different individuals • Homogeneity • Dependence between sample and census • heterogeneity and dependence get combined in what is called correlation bias • Errorless assessment of erroneous enumerations
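
To see why homogeneity matters, the sketch below works with expected counts for a hypothetical population made of one easy-to-count and one hard-to-count group. Within each group the census and the survey are independent, yet pooling the groups pushes the estimate below the true total. All sizes and probabilities are invented for illustration, and `expected_dse` is an illustrative name.

```python
from fractions import Fraction


def expected_dse(groups):
    """Dual system estimate computed from expected counts.

    groups: iterable of (size, p_census, p_survey); within each group the
    two captures are assumed independent.
    """
    n1 = sum(n * p1 for n, p1, p2 in groups)       # expected census count
    n2 = sum(n * p2 for n, p1, p2 in groups)       # expected survey count
    a = sum(n * p1 * p2 for n, p1, p2 in groups)   # expected count in both
    return n1 * n2 / a


# Hypothetical population of 1,000 people in two groups.
groups = [
    (500, Fraction(9, 10), Fraction(9, 10)),   # easy to count
    (500, Fraction(1, 2), Fraction(1, 2)),     # hard to count
]

pooled = expected_dse(groups)                         # ignores heterogeneity
stratified = sum(expected_dse([g]) for g in groups)   # DSE within each group
print(float(pooled), float(stratified))
```

The pooled estimate falls short of 1,000 while the group-by-group estimates add back to it, which is the motivation for applying DSE within more homogeneous post-strata.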

  22. ACE Implementation • Aggregate counts from census blocks for various demographic and racial/ethnic groups. • Apply DSE for these aggregates (called post-strata). • Generalizing from adjustments for the ACE sample of blocks and strata to the nation. • synthetic error

  23. Post-strata • Instead of doing DSE at the block level, we reorganize the data by grouping parts of blocks according to • age • race/ethnicity • sex • occupancy status • mail return rate • Results in over 480 post-strata, and we apply DSE in each.

  24. What Do We Know About Dual Systems Assumptions at Post-strata Level?

  25. Synthetic Assumption • Carrying the adjustments back to the individual blocks not in the ACE sample: • Assumes the homogeneity of all of those parts of blocks in each post-stratum. • Result is that some blocks increase and some blocks decrease in estimated population size • decreases total 1 million • increases total 4.3 million
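
One way to picture the synthetic step: each post-stratum gets a single adjustment factor (its DSE estimate divided by its census count), and that factor is applied to the matching part of every block. The sketch below assumes this form of the calculation; every number, label, and function name in it is hypothetical.

```python
def adjustment_factors(census_by_stratum, dse_by_stratum):
    """Post-stratum factor = DSE estimate / census count.

    A factor below 1 corresponds to a net overcount in that post-stratum.
    """
    return {s: dse_by_stratum[s] / census_by_stratum[s] for s in census_by_stratum}


def synthetic_block_estimate(block_counts, factors):
    """Apply each post-stratum's factor to the block's count in that post-stratum."""
    return sum(count * factors[s] for s, count in block_counts.items())


# Hypothetical post-stratum totals: "A" is undercounted, "B" is overcounted.
census = {"A": 1000, "B": 2000}
dse = {"A": 1050, "B": 1960}
factors = adjustment_factors(census, dse)

# Two hypothetical blocks, each with a census count of 100.
block_1 = {"A": 80, "B": 20}   # mostly post-stratum A
block_2 = {"A": 10, "B": 90}   # mostly post-stratum B
est_1 = synthetic_block_estimate(block_1, factors)
est_2 = synthetic_block_estimate(block_2, factors)
print(est_1, est_2)
```

In this toy setup the first block goes up and the second goes down, mirroring the slide's point that adjustment moves blocks in both directions. The result is only as good as the assumption that every part of a post-stratum shares the same coverage rate.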

  26. March 2001 Adjustment Decision • Not ready to adjust using DSE. • Concerns: • DA (demographic analysis) • loss functions • counties under 100,000 • balancing error • synthetic error

  27. Oct. 2001 Adjustment Decision • Still not ready to adjust! • Old concerns: • DA • loss functions? • balancing error - no • synthetic error - no • New concern: • missed EEs in ACE
