
Ch 4: Stratified Random Sampling (STS)


iren


Presentation Transcript


  1. Ch 4: Stratified Random Sampling (STS) • DEFN: A stratified random sample is obtained by separating the population units into non-overlapping groups, called strata, and then selecting a random sample from each stratum

  2. Procedure • Divide sampling frame into mutually exclusive and exhaustive strata • Assign each SU to one and only one stratum • Select a random sample from each stratum • Select random sample from stratum 1 • Select random sample from stratum 2 • … • Select random sample from stratum H • [Figure: sampling frame partitioned into strata h = 1, 2, …, H]
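
The procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not code from the presentation: the function name `stratified_sample`, the frame layout (a list of `(su_id, stratum)` pairs), and the stratum sample sizes are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, n_h, seed=None):
    """Draw an independent SRS (without replacement) from each stratum.

    frame: iterable of (su_id, stratum) pairs; each SU appears exactly once.
    n_h:   dict mapping stratum label -> sample size for that stratum.
    """
    rng = random.Random(seed)
    # Assign each SU to one and only one stratum.
    strata = defaultdict(list)
    for su_id, stratum in frame:
        strata[stratum].append(su_id)
    # Select a random sample from each stratum, one stratum at a time.
    sample = {}
    for stratum, units in strata.items():
        sample[stratum] = rng.sample(units, n_h[stratum])
    return sample

# Hypothetical frame: 10 SUs in stratum "A", 20 SUs in stratum "B".
frame = [(i, "A") for i in range(10)] + [(i, "B") for i in range(10, 30)]
sample = stratified_sample(frame, {"A": 2, "B": 4}, seed=1)
```

The only information the function needs from the frame is an SU identifier and a stratum label for every unit.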

  3. Ag example • Divide 3078 counties into 4 strata corresponding to regions of the country • Northeast (h = 1) • North central (h = 2) • South (h = 3) • West (h = 4) • Select a SRS from each stratum • In this example, stratum sample size is proportional to stratum population size • Total sample size n = 300 is 9.75% of 3078 • Each stratum sample size is 9.75% of stratum population

  4. Ag example – 2

  5. Procedure – 2 • Need to have a stratum value for each SU in the frame • Minimum set of variables in sampling frame: SU id, stratum assignment

  6. Ag example – 3

  7. Procedure – 3 • Each stratum sample is selected independently of others • New set of random numbers for each stratum • Basis for deriving properties of estimators • Design within a stratum • For Ch 4, we will assume a SRS is selected within each stratum • Can use any probability design within a stratum • Sample designs do not need to be the same across strata

  8. Uses for STS • To improve representativeness of sample • In SRS, can get ANY combination of n elements in the sample • In SYS, we severely restricted the set to k possible samples • Can get “bad” samples • Less likely to get unbalanced samples if frame is sorted using a variable correlated with Y

  9. Uses for STS – 2 • To improve representativeness of sample - 2 • In STS, we also exclude samples • Explicitly choose strata to restrict possible samples • Improve chance of getting representative samples if we use strata to encourage spread across the variation in the population

  10. Uses for STS – 3 • To improve precision of estimates for population parameters • Achieved by creating strata so that • variation WITHIN stratum is small • variation AMONG strata is large • Uses same principle as “blocking” in experimental design • Improve precision of estimate for population parameter by obtaining precise estimates within each stratum

  11. Uses for STS – 4 • To study specific subpopulations • Define strata to be subpopulations of interest • Examples • Male v. female • Racial/ethnic minorities • Geographic regions • Population density (rural v. urban) • College classification • Can establish sample size within each stratum to achieve desired precision level for estimates of subpopulations

  12. Uses for STS – 5 • To assist in implementing operational aspects of survey • May wish to apply different sampling and data collection procedures for different groups • Agricultural surveys (sample designs) • Large farms in one stratum are selected using a list frame • Smaller farms belong to a second stratum, and are selected using an area sample • Survey of employers (data collection methods) • Large firms: use mail survey because information is too voluminous to get over the phone • Small firms: telephone survey

  13. Estimation strategy • Objective: estimate population total • Obtain estimates for each stratum • Estimate stratum population total • Use SRS estimator for stratum total • Estimate variance of estimator in each stratum • Use SRS estimator for variance of estimated stratum total • Pool estimates across strata • Sum stratum total estimates and variance estimates across strata • Variance formula justified by independence of samples across strata
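
The estimation strategy can be sketched as follows, assuming an SRS without replacement within each stratum. The function name `sts_total` and the data are made up for illustration; the per-stratum formulas are the usual SRS estimators of a total and of its variance with the finite population correction.

```python
from statistics import mean, variance

def sts_total(samples, N_h):
    """Estimate the population total and its variance under STS.

    samples: dict stratum -> list of observed y values (SRS within stratum).
    N_h:     dict stratum -> number of units in the stratum population.
    """
    t_hat = 0.0
    v_hat = 0.0
    for h, y in samples.items():
        n = len(y)
        N = N_h[h]
        # SRS estimator of the stratum total: N_h * (stratum sample mean).
        t_hat += N * mean(y)
        # SRS variance estimator for the stratum total, with the
        # finite population correction (1 - n_h / N_h).
        v_hat += N**2 * (1 - n / N) * variance(y) / n
    # Summing the variances relies on independence of samples across strata.
    return t_hat, v_hat

# Made-up data for two strata.
samples = {"A": [2, 4, 6], "B": [10, 14]}
N_h = {"A": 30, "B": 10}
t_hat, v_hat = sts_total(samples, N_h)
```

Dividing `t_hat` by N and `v_hat` by N squared would give the corresponding estimates for the population mean.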

  14. Ag example – 4

  15. Ag example – 5 • Estimated total farm acres in US

  16. Ag example – 6

  17. Ag example – 7 • Estimated variance for estimated total farm acres in US

  18. Ag example – 8 • Compare with SRS estimates

  19. Estimation strategy - 2 • Objective: estimate population mean • Divide estimated total by population size • OR equivalently, • Obtain estimates for each stratum • Estimate stratum mean with stratum sample mean • Pool estimates across strata • Use weighted average of stratum sample means with weights proportional to stratum sizes Nh

  20. Ag example – 9 • Estimated mean farm acres / county

  21. Ag example – 10 • Estimate variance of estimated mean farm acres / county

  22. Notation • [Figure: population partitioned into Stratum 1, …, Stratum H, indexed h = 1, 2, …, H] • Index set for stratum h = 1, 2, …, H • Uh = {1, 2, …, Nh} • Nh = number of OUs in stratum h in the population • Partition sample of size n across strata • nh = number of sample units from stratum h (fixed) • Sh = index set for sample belonging to stratum h

  23. Notation – 2 • Population sizes • Nh= number of OUs in stratum h in the population • N = N1+ N2 + … + NH • Partition sample of size n across strata • nh = number of sample units from stratum h • n = n1+ n2 + … + nH • The stratum sample sizes are fixed • In domain estimation, they are random • For now, we will assume that the sampling unit (SU) is an observation unit (OU)

  24. Notation – 3 • Response variable Yhj = characteristic of interest for OU j in stratum h • Population and stratum totals

  25. Notation – 4 • Population and stratum means

  26. Notation – 5 • Population stratum variance
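
Using the symbols already defined (Yhj, Nh, N, H), the population and stratum totals, means, and variances are conventionally written as below. The names t, t_h, \bar{Y}, \bar{Y}_h and S_h^2 are assumptions that follow common textbook usage; only the standard deviation symbol Sh appears elsewhere in these slides.

```latex
\begin{align*}
t_h &= \sum_{j=1}^{N_h} Y_{hj}, & t &= \sum_{h=1}^{H} t_h, \\
\bar{Y}_h &= \frac{t_h}{N_h}, & \bar{Y} &= \frac{t}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{Y}_h, \\
S_h^2 &= \frac{1}{N_h - 1} \sum_{j=1}^{N_h} \left( Y_{hj} - \bar{Y}_h \right)^2 .
\end{align*}
```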

  27. Notation – 6 • SRS estimators for stratum parameters
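
Assuming an SRS of size nh is drawn without replacement in stratum h, the within-stratum estimators would take the usual SRS form shown below. Here \mathcal{S}_h is written for the sample index set (Sh on the notation slide) to keep it distinct from the standard deviation; the hat and bar symbols are assumed names.

```latex
\begin{align*}
\bar{y}_h &= \frac{1}{n_h} \sum_{j \in \mathcal{S}_h} Y_{hj}, &
\hat{t}_h &= N_h\,\bar{y}_h, \\
s_h^2 &= \frac{1}{n_h - 1} \sum_{j \in \mathcal{S}_h} \left( Y_{hj} - \bar{y}_h \right)^2, &
\hat{V}(\hat{t}_h) &= N_h^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h} .
\end{align*}
```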

  28. STS estimators • For population total

  29. STS estimators – 2 • For population mean

  30. STS estimators – 3 • For population proportion
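
Pooling the stratum estimates as described in the estimation strategy gives the following STS estimators. This is a sketch in assumed notation: \bar{y}_h, s_h^2 and \hat{p}_h denote the sample mean, sample variance, and sample proportion in stratum h, and an SRS without replacement is assumed within each stratum.

```latex
\begin{align*}
\hat{t}_{str} &= \sum_{h=1}^{H} N_h\,\bar{y}_h, &
\hat{V}(\hat{t}_{str}) &= \sum_{h=1}^{H} N_h^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h}, \\
\bar{y}_{str} &= \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h, &
\hat{V}(\bar{y}_{str}) &= \frac{1}{N^2}\,\hat{V}(\hat{t}_{str}), \\
\hat{p}_{str} &= \sum_{h=1}^{H} \frac{N_h}{N}\,\hat{p}_h, &
\hat{V}(\hat{p}_{str}) &= \sum_{h=1}^{H} \left( \frac{N_h}{N} \right)^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{\hat{p}_h (1 - \hat{p}_h)}{n_h - 1} .
\end{align*}
```

The proportion case is the mean case applied to a 0/1 response, for which s_h^2 = n_h \hat{p}_h (1 - \hat{p}_h) / (n_h - 1).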

  31. Properties • STS estimators are unbiased • Each estimate of stratum population mean or total is unbiased (from SRS)

  32. Properties – 2 • Inclusion probability for SU j in stratum h • Definition in words: probability that SU j in stratum h is included in the sample • Formula: πhj = nh / Nh

  33. Properties – 3 • In general, STS will provide a more precise estimate of the population parameters (mean, total, proportion) than an SRS of the same size; the gain is largest when stratum means differ, and a poorly chosen allocation can lose it • For example • Confidence intervals • Same form (using the critical value zα/2) • Different CLT

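  34. Sampling weights • Note that the estimated total can be written as a weighted sum of the sampled values: Σh Σj∈Sh (Nh / nh) Yhj • Sampling weight for SU j in stratum h: whj = Nh / nh • A sampling weight is a measure of the number of units in the population represented by SU j in stratum h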

  35. Example • Note: weights for each OU within a stratum are the same

  36. Example – 2 • Dataset from study

  37. Sampling weights – 2 • For STS estimators presented in Ch 4, sampling weight is the inverse inclusion probability
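
A small sketch of how the weights are used, assuming an SRS within each stratum so that the inclusion probability is nh / Nh and the weight is Nh / nh. The function names and data are hypothetical.

```python
def sampling_weights(N_h, n_h):
    """Sampling weight w_hj = N_h / n_h = 1 / pi_hj for every SU in stratum h."""
    return {h: N_h[h] / n_h[h] for h in N_h}

def weighted_total(samples, weights):
    """Estimate the population total as the weighted sum of sampled y values."""
    return sum(weights[h] * y for h, ys in samples.items() for y in ys)

# Made-up strata: each sampled unit in "A" represents 10 population units,
# each sampled unit in "B" represents 5.
N_h = {"A": 30, "B": 10}
samples = {"A": [2, 4, 6], "B": [10, 14]}
n_h = {h: len(ys) for h, ys in samples.items()}
w = sampling_weights(N_h, n_h)
t_hat = weighted_total(samples, w)
# The weights of all sampled units add up to the population size N.
sum_of_weights = sum(w[h] * n_h[h] for h in n_h)
```

The weighted sum gives the same estimate as summing Nh times the stratum sample mean over strata.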

  38. Defining strata • Depends on purpose of stratification • Improved representativeness • Improved precision • Subpopulation estimates • Implementing operational aspects • If possible, use factors related to variation in characteristic of interest, Y • Geography, political boundaries, population density • Gender, ethnicity/race, ISU classification • Size or type of business • Remember • Stratum variable must be available for all OUs

  39. Allocation strategies • Want to sample n units from the population • An allocation rule defines how n will be spread across the H strata and thus defines values for nh • Overview for estimating population parameters • Proportional and Neyman allocation are special cases of optimal allocation

  40. Allocation strategies – 2 • Focus is on estimating parameter for entire population • We’ll look at subpopulations later • Factors affecting allocation rule • Number of OUs in stratum • Data collection costs within strata • Within-stratum variance

  41. Proportional allocation • Stratum sample size allocated in proportion to population size within stratum • Allocation rule
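
Allocating in proportion to stratum size means nh = n × Nh / N. The sketch below applies that rule to four regional strata. The stratum sizes are ASSUMED for illustration (they are not given in this transcript, though they do sum to the 3078 counties mentioned earlier), and the function name is hypothetical.

```python
def proportional_allocation(N_h, n):
    """Allocate n across strata in proportion to stratum size: n_h = n * N_h / N."""
    N = sum(N_h.values())
    # Rounding to whole units can make the n_h sum differ slightly from n.
    return {h: round(n * N_h[h] / N) for h in N_h}

# ASSUMED stratum sizes (not in the transcript); they sum to 3078 counties.
N_h = {"Northeast": 220, "North central": 1054, "South": 1382, "West": 422}
n_h = proportional_allocation(N_h, 300)
# After rounding, the sampling fractions n_h / N_h are close but not identical.
fractions = {h: n_h[h] / N_h[h] for h in N_h}
```

With these assumed sizes the rounded sample sizes still add up to 300, but that is not guaranteed in general, so a real allocation routine would need a rule for distributing any leftover units.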

  42. Ag example – 11

  43. Proportional allocation – 2 • Proportional allocation rule implies • Sampling fraction for stratum h is constant across strata • Inclusion probability is constant for all SUs in population • Sampling weight for each unit is constant

  44. Proportional allocation – 3 • STS with proportional allocation leads to a self-weighting sample • What is a self-weighting sample? • If whj has the same value for every OU in the sample, a sample is said to be self-weighting • Since each weight is the same, each sample unit represents the same number of units in the population • For self-weighting samples, the estimator for the population mean reduces to the sample mean • The estimator for its variance does NOT necessarily reduce to the SRS estimator for the variance of the sample mean

  45. Proportional allocation – 4 • Check to see that a STS with proportional allocation generates a self-weighting sample • Is the sampling weight whj the same for each OU? • Is the estimator for the population mean equal to the sample mean? • What happens to the variance of the estimator?
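
The three checks can be worked through directly, assuming exact proportional allocation (nh / Nh = n / N, ignoring rounding) and an SRS within each stratum. The symbols \bar{y}_h, s_h^2, \bar{y}_{str} and \mathcal{S}_h (the sample index set for stratum h) are assumed notation.

```latex
\begin{align*}
w_{hj} &= \frac{N_h}{n_h} = \frac{N}{n} \quad \text{for every } h \text{ and } j, \\
\bar{y}_{str} &= \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h
  = \sum_{h=1}^{H} \frac{n_h}{n}\,\bar{y}_h
  = \frac{1}{n} \sum_{h=1}^{H} \sum_{j \in \mathcal{S}_h} Y_{hj} = \bar{y}, \\
\hat{V}(\bar{y}_{str}) &= \sum_{h=1}^{H} \left( \frac{N_h}{N} \right)^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h}
  = \left( 1 - \frac{n}{N} \right) \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N}\, s_h^2 .
\end{align*}
```

The variance estimator has the SRS shape, but the overall sample variance is replaced by a weighted average of the within-stratum variances, so it matches the SRS estimator only when the two happen to coincide.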

  46. Ag example – 12 • Even though we have used proportional allocation, rounding in setting sample sizes can lead to unequal (but approximately equal) weights

  47. Neyman allocation • Suppose within-stratum variances vary across strata • Stratum sample size allocated in proportion to • Population size within stratum Nh • Population standard deviation within stratum Sh • Allocation rule
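
Allocating in proportion to both Nh and Sh corresponds to the rule below, written for a fixed total sample size n. The summation index l is an assumed name.

```latex
\[
n_h = n \cdot \frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l}
\]
```

When every stratum has the same standard deviation, the Sh terms cancel and the rule reduces to proportional allocation, nh = n Nh / N.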

  48. Caribou survey example

  49. Optimal allocation • Suppose data collection costs ch vary across strata • Let C = total budget, c0 = fixed costs (office rental, field manager), ch = cost per SU in stratum h (interviewer time, travel cost) • Express budget constraint as C = c0 + Σh ch nh and determine nh

  50. Optimal allocation – 2 • Assume general case: stratum population sizes, stratum variances, and stratum data collection costs vary across strata • Sample size is allocated to strata in proportion to • Stratum population size Nh • Stratum standard deviation Sh • Inverse square root of stratum data collection costs • Allocation rule
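
A minimal sketch of this rule, assuming the total sample size n is already fixed, so that nh = n × (Nh Sh / √ch) / Σl (Nl Sl / √cl). The function name and all input values are made up; the result is left unrounded to make the proportions easy to see.

```python
from math import sqrt

def optimal_allocation(N_h, S_h, c_h, n):
    """Allocate n in proportion to N_h * S_h / sqrt(c_h) (real-valued, unrounded)."""
    score = {h: N_h[h] * S_h[h] / sqrt(c_h[h]) for h in N_h}
    total = sum(score.values())
    return {h: n * score[h] / total for h in N_h}

# Made-up inputs for two strata.
N_h = {"A": 100, "B": 300}
S_h = {"A": 6.0, "B": 2.0}
c_h = {"A": 4.0, "B": 1.0}

opt = optimal_allocation(N_h, S_h, c_h, n=40)
# Equal costs: the rule reduces to Neyman allocation (proportional to N_h * S_h).
neyman = optimal_allocation(N_h, S_h, {"A": 1.0, "B": 1.0}, n=40)
# Equal costs and equal standard deviations: proportional allocation.
prop = optimal_allocation(N_h, {"A": 1.0, "B": 1.0}, {"A": 1.0, "B": 1.0}, n=40)
```

Compared with Neyman allocation, the costlier stratum "A" receives fewer units once costs are taken into account, even though it is the more variable stratum.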
