
Post-stratification

Post-stratification. Sometimes there is an obvious stratification variable, but the stratum assignment of each SU is not known in advance, so the sample can’t be stratified. Instead, take, e.g., an SRS and use the known stratum totals, Nh, to improve estimation relative to the SRS estimators.


Presentation Transcript


  1. Post-stratification
     • Sometimes there is an obvious stratification variable
     • Don’t know stratum assignment for each SU → can’t stratify
     • Take, e.g., an SRS instead
     • Know stratum totals, Nh, which can be used to improve estimation relative to SRS estimators
     • Very common for household and population surveys
        • Census data provide number of persons, households per area, by age, …

  2. Food spending example
     • Objective: estimate the average amount spent on food per week in NC
     • Possible stratification variable: household composition
        • Family households might be expected to have higher food bills than non-family households
     • Sampling frame
        • List of all households in NC
        • No information on household composition
     • From U.S. census data, the distribution of household composition is known

  3. Food spending example – 2
     • 2000 Census data on household composition in NC

  4. Post-stratification – 2
     • Design phase
        • SRS of n OUs (could be another design)
        • Identify post-strata
     • Sample selection phase
        • SRS of n OUs
        • After sampling, the counts n1, n2, …, nH are fixed, BUT they can’t be determined at this point
     • Data collection phase
        • Include a question that gathers information on stratum assignment
        • OU i belongs to post-stratum h
        • Can now determine values for n1, n2, …, nH
     • Note that the values of nh are random: they differ from sample to sample
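The three phases above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the frame of 10,000 households, the 70/30 split, and the function name `srs_poststratum_counts` are all hypothetical. It shows that the post-stratum label is only read after the SRS is drawn, so the counts nh vary from one sample to the next.

```python
import random
from collections import Counter

def srs_poststratum_counts(frame, n, seed=None):
    """Draw an SRS of n units without replacement, then tabulate n_h.

    The post-stratum label is read only after selection, mimicking a
    survey question asked during data collection.
    """
    rng = random.Random(seed)
    sample = rng.sample(frame, n)
    return Counter(unit["poststratum"] for unit in sample)

# Hypothetical frame: the sampler cannot see "poststratum" at design time.
frame = [
    {"id": i, "poststratum": "family" if i % 10 < 7 else "non-family"}
    for i in range(10_000)
]

counts_a = srs_poststratum_counts(frame, 1000, seed=1)
counts_b = srs_poststratum_counts(frame, 1000, seed=2)
print(counts_a)  # n_h for one sample
print(counts_b)  # n_h for another sample; generally not the same
```

Each call returns counts that sum to n = 1000, but the split between post-strata is a random outcome of the draw.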

  5. Food spending example – 3
     • Select SRS of n = 1000 households
     • Collect data on household composition
        • List each household member and relationship to respondent
     • Tabulate number of households for different size categories (nh)
     • Use Census 2000 population information on number of households for composition categories (Nh)

  6. Food spending example – 4

  7. Post-stratification – 3
     • Note that sample composition across post-strata is different from population composition
        • Consider percentage distribution across post-strata for population (column 3) and for sample (column 5)
     • Could improve estimates by “calibrating” to the post-stratum population totals; this is the basis for the post-stratification estimator
     • Another way to look at the sample composition is to compare the expected sample size for post-strata with the observed sample size for post-strata obtained from the SRS
        • Expected sample size for post-stratum h
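Under an SRS of size n, the expected sample size in post-stratum h is n × Nh / N, where N is the sum of the Nh. The sketch below computes it and compares it with observed counts. The population totals and observed counts are hypothetical placeholders, not the Census figures from the slides, and the function name is an assumption made for illustration.

```python
def expected_sample_sizes(N_h, n):
    """Expected n_h under an SRS of size n: n * N_h / N."""
    N = sum(N_h.values())
    return {h: n * Nh / N for h, Nh in N_h.items()}

# Hypothetical population totals and observed SRS counts (illustration only).
N_h = {"family": 700_000, "non-family": 300_000}
observed = {"family": 730, "non-family": 270}

expected = expected_sample_sizes(N_h, n=1000)
for h in N_h:
    print(h, "expected:", expected[h], "observed:", observed[h])
```

A gap between expected and observed counts is what the post-stratification estimator corrects for when it reweights to the known Nh.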

  8. Food spending example – 5

  9. Post-stratification – 4
     • Estimating a population mean
        • Domain estimation for means, then pool stratum estimates
     • Variance approximation (nh > 30, n large)
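The slide's formulas did not survive in the transcript. A minimal sketch, assuming the standard textbook forms: the post-stratified mean is the sum over h of (Nh / N) × (sample mean in post-stratum h), and its variance is approximated by (1 − n/N) × the sum over h of (Nh / N) × (sample variance in h) / n. The function name and the toy data are hypothetical; the toy sample is far too small for the nh > 30 condition and is only there to make the arithmetic easy to follow.

```python
from statistics import mean, variance

def poststratified_mean(samples, N_h):
    """Return (estimate, standard error) of the post-stratified mean.

    samples: dict mapping post-stratum -> list of observed y values
    N_h:     dict mapping post-stratum -> known population count
    """
    N = sum(N_h.values())
    n = sum(len(y) for y in samples.values())
    # Pool the post-stratum sample means with population shares N_h / N.
    ybar_post = sum(N_h[h] / N * mean(y) for h, y in samples.items())
    # Large-sample variance approximation (assumed form, see lead-in).
    var_hat = (1 - n / N) * sum(
        N_h[h] / N * variance(y) / n for h, y in samples.items()
    )
    return ybar_post, var_hat ** 0.5

# Toy data for illustration only (hypothetical weekly food spending).
samples = {"family": [100, 120, 140], "non-family": [50, 70]}
N_h = {"family": 600, "non-family": 400}

ybar_post, se = poststratified_mean(samples, N_h)
print(ybar_post, se)
```

With these toy numbers the post-stratum means are 120 and 60, so the estimate is 0.6 × 120 + 0.4 × 60 = 96, whereas the unweighted SRS mean of the five values is also 96 only because the sample shares happen to match the population shares.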

  10. Food spending example – 6
     • Food expenditures last week

  11. Food spending example – 7
     • Estimate population mean
     • Estimate SE of estimated mean

  12. Post-stratification – 5
     • Formulas involve weighted averages of stratum sample means and variances
        • Mean estimator looks like stratification estimator
        • Variance estimator is not the stratification variance estimator
     • Estimating a population total?
     • Estimating a population proportion?

  13. Post-stratification – 6
     • Estimator for population total
     • Weight under post-stratified estimator
        • whj = Nh / nh
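The weight whj = Nh / nh means every sampled unit j in post-stratum h stands in for Nh / nh population units. A minimal sketch of the resulting total estimator, with hypothetical names and toy data, is:

```python
def poststratified_total(samples, N_h):
    """Return (estimated total, weights) using w_hj = N_h / n_h."""
    weights = {}
    total = 0.0
    for h, y in samples.items():
        w = N_h[h] / len(y)   # same weight for every unit in post-stratum h
        weights[h] = w
        total += w * sum(y)   # weighted sum of the observed values
    return total, weights

# Toy data for illustration only.
samples = {"family": [100, 120, 140], "non-family": [50, 70]}
N_h = {"family": 600, "non-family": 400}

t_post, weights = poststratified_total(samples, N_h)
print(t_post, weights)
```

By construction the weights in each post-stratum sum to Nh, so the estimated total equals N times the post-stratified mean.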

  14. Post-stratification and nonresponse
     • May get disproportionate allocation across post-strata because of differential stratum nonresponse rates
     • Same approach can be used to improve estimation: average the estimates across post-strata using the ratio of post-stratum population size to total population (Nh / N)

  15. Food spending example – 8

  16. Implicit assumption
     • Sample post-stratum mean from responding units is an unbiased estimate of the population post-stratum mean
     • Distribution of Y for the responding part of the post-stratum population is (approximately)
        • Same as distribution for whole post-stratum population
        • Same as for the nonresponding post-stratum population
     • Often a poor assumption
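To see why the assumption matters, consider one post-stratum whose population splits into respondents and nonrespondents. The sketch below uses hypothetical names and numbers. It computes the bias of the respondent mean as an estimate of the whole post-stratum mean: the bias is zero only when the two groups have the same mean.

```python
def respondent_bias(resp_mean, nonresp_mean, response_rate):
    """Bias of the respondent mean for the full post-stratum mean."""
    true_mean = response_rate * resp_mean + (1 - response_rate) * nonresp_mean
    return resp_mean - true_mean

# Hypothetical post-stratum: respondents spend more than nonrespondents.
bias_differs = respondent_bias(resp_mean=100, nonresp_mean=80, response_rate=0.6)
# Same means in both groups: the implicit assumption holds.
bias_same = respondent_bias(resp_mean=100, nonresp_mean=100, response_rate=0.6)
print(bias_differs, bias_same)
```

Algebraically the bias is (1 − response rate) × (respondent mean − nonrespondent mean), so post-stratification weights remove nonresponse bias only to the extent that respondents resemble nonrespondents within each post-stratum.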
