Analysis of complex sampling designs: A brief primer  

This primer provides an overview of complex survey designs, including unequal probability of selection and clustering of observations. It discusses the major problems with analyzing complex surveys and provides definitions and strategies for sample selection. The primer also covers different types of sampling designs, such as simple random sampling, stratified sampling, and cluster sampling, as well as the use of survey weights.


Presentation Transcript


  1. Analysis of complex sampling designs: A brief primer • Clyde Dent, PhD • PSU quantitative interest group Brown Bag • December 2009

  2. What makes a survey design ‘complex’? • unequal probability of selection: some groups may be intentionally sampled at higher rates • clustering of observations: elements are intentionally sampled in intact groups

  3. The Major Problems with analysis of complex survey designs • Observations are not independent in complex surveys. • Standard statistical analysis generally overestimates the precision of estimates. • Unequal selection probabilities need weighted analysis.

  4. Some definitions in survey sampling • Complex survey: A survey in which sample elements are not drawn by simple random sampling. • Population: The entire set of individuals to which the findings of a survey refer. • Sample: A subset of the population selected for a study. • Sample design: The scheme by which items are chosen for the sample.

  5. Identifying the Sample Frame • Sampling units or elements: the “things” about which data are collected • A sampling frame is a list of units or elements that defines the target population. • A sampling frame should cover all of the target population.

  6. Strategies in sample selection • Census • Probability Sample • Non-Probability samples

  7. Probability Sampling • All members of a population have some chance (or non-zero probability) of being sampled • Employs random selection

  8. Survey Designs • Three basic designs: • Simple random sampling • Stratified sampling • Cluster sampling • Two methods of conducting surveys: • Single-stage sampling plans • Multi-stage sampling plans

  9. Simple Random Sampling • In addition to having a non-zero chance of being selected, each element of the population has an equal chance of being selected • Each element is selected independently

  10. Simple Random Sampling • Ease of analysis • Need an exhaustive sample frame • Typically need large samples to get a desired precision in estimates • Can be high cost • Can take more time
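As a minimal sketch of simple random sampling, the snippet below draws a fixed-size sample from an exhaustive frame with every element equally likely to be chosen. The frame of 1,000 unit IDs and the function name `simple_random_sample` are hypothetical, and sampling is done without replacement, which is an assumption about the intended variant.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each with equal chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical exhaustive sampling frame: unit IDs 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=1)

# Every unit has the same selection probability n / N.
selection_probability = len(sample) / len(frame)
```

The need for a complete `frame` list is exactly the "exhaustive sample frame" requirement listed on the slide.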

  11. Stratified Random Sampling • Divide population into various “strata” or subgroups • Randomly sample within these strata • Strata may be geographical areas, races or ethnicities, socioeconomic classes, etc…

  12. Stratified Random Sampling • Can increase precision of estimates • Allows for differential sampling for distinct sub-pops (over-sampling) • May decrease costs • Accomplishes two tasks: • Makes sample more representative of population • Controls for confounding effects of the stratification criteria
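The stratified approach can be sketched as follows: group the frame by stratum, then draw a simple random sample inside each stratum. The strata ("urban", "rural"), the frame sizes, and the per-stratum sample sizes are made-up illustration values; the rural stratum is deliberately over-sampled to show differential sampling.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw an independent simple random sample within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    return {
        stratum: rng.sample(members, n_per_stratum[stratum])
        for stratum, members in by_stratum.items()
    }


# Hypothetical frame: 600 urban and 400 rural units.
units = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]

# Over-sample the rural stratum: 60/400 = 15% versus 30/600 = 5% for urban.
picked = stratified_sample(
    units, lambda u: u[0], {"urban": 30, "rural": 60}, seed=2
)
```

Because the two strata are sampled at different rates, any estimate for the whole population from `picked` would need weights.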

  13. Cluster Sampling • Identify “clusters” within a population • Counties, nursing homes, factories, etc… • Randomly sample these clusters • Survey a census of individuals in each sampled cluster (single-stage) • The major difference in this technique is that the primary sampling element is the cluster, not the individual

  14. Cluster Sampling • Useful when a sampling frame of individuals is difficult to get or does not exist • Lowers cost • Travel • Supervision • Decreases time • Good when elements w/in cluster are heterogeneous • Loss of precision in estimates
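A single-stage cluster sample, as described above, randomly selects whole clusters and then takes everyone in them. The sketch below assumes a hypothetical population of 20 nursing homes of varying size; all names are illustrative.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters at random, then take a census of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 20 nursing homes with 10 to 29 residents each.
clusters = {
    f"home_{k}": [f"resident_{k}_{i}" for i in range(10 + k)]
    for k in range(20)
}
sample = one_stage_cluster_sample(clusters, 4, seed=3)
```

Note that only a list of clusters is needed up front, not a list of individuals, which is why this design helps when no frame of individuals exists.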

  15. Multi-Stage Cluster Sampling • Break population into clusters • Take random sample of clusters (stage 1) • From this sample take random sample of individuals (stage 2)

  16. Multi-Stage Cluster Sampling – Terms and Examples • Primary Sampling Unit (PSU) • The first set of clusters identified and sampled • Secondary Sampling Unit (SSU) • The second or sub-set of clusters identified and sampled • Example: Population is the entire adult U.S. population • PSU’s may be all U.S. counties • SSU’s may be ZIP codes within selected counties • A sample of individuals from within selected ZIP codes of selected counties from within the U.S. is taken as the final sampling unit (FSU)
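A two-stage version can be sketched like this: sample PSUs first, then sample individuals inside each selected PSU, and record each person's overall selection probability as the product of the stage probabilities. The county data and the function name are hypothetical, and equal-probability sampling at both stages is an assumption (real surveys often sample PSUs with probability proportional to size).

```python
import random


def two_stage_sample(clusters, n_psu, n_per_psu, seed=None):
    """Stage 1: sample PSUs. Stage 2: sample individuals within each PSU."""
    rng = random.Random(seed)
    psus = rng.sample(sorted(clusters), n_psu)
    p_stage1 = n_psu / len(clusters)
    rows = []
    for psu in psus:
        members = clusters[psu]
        take = min(n_per_psu, len(members))
        p_stage2 = take / len(members)
        for person in rng.sample(members, take):
            # Overall selection probability is the product of both stages.
            rows.append({"psu": psu, "unit": person, "prob": p_stage1 * p_stage2})
    return rows


# Hypothetical population: 10 counties with 100 adults each.
counties = {f"county_{k}": [f"adult_{k}_{i}" for i in range(100)] for k in range(10)}
rows = two_stage_sample(counties, n_psu=2, n_per_psu=5, seed=4)
```

Keeping `prob` and `psu` on every row is what later makes weighting and cluster-aware variance estimation possible.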

  17. Stratified Multi-Stage Cluster Sampling • Same as multi-stage cluster sampling, except… • Stratify PSU’s prior to initial sampling • In last Example: Stratifying counties into three strata – Urban, Suburban and Rural

  18. Weighting

  19. What is a Survey Weight? • A value assigned to each case in the data file. • Normally used to make estimates computed from the data more representative of the population. • The value indicates how much each case will count in a statistical procedure. • Examples: • A weight of 2 means that the case counts in the dataset as two identical cases. • A weight of 1 means that the case only counts as one case in the dataset. • Weights can be (and often are) fractions, but are always positive and non-zero.

  20. Types of Survey Weights • Two most common types: • Design Weight: • Normally used to compensate for over- or under-sampling of specific cases or for disproportionate stratification. • Post-Stratification Weight: • Used to compensate for fluctuations in sampling on important non-design characteristics (e.g., age).

  21. Calculating Design Weights • If we know the sampling fraction for each case, the weight is the inverse of the sampling fraction. • Design Weight = 1/(sampling fraction) • The sampling “fraction” could be over 1.0 • Example: If we over-sampled African Americans at a rate 4 times greater than the rate for Whites, then the design weight for an African American respondent would be ¼ of that for a White respondent.
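The inverse-fraction rule and the 4x over-sampling example can be checked with a short sketch. The sampling fractions (4% and 1%) and the response values are made-up numbers, and the helper names are hypothetical.

```python
def design_weight(sampling_fraction):
    """Design weight = 1 / (sampling fraction)."""
    if sampling_fraction <= 0:
        raise ValueError("sampling fraction must be positive")
    return 1.0 / sampling_fraction


def weighted_mean(values, weights):
    """Mean in which each case counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical rates: one group sampled at 4%, the other at 1% (4 times lower).
w_oversampled = design_weight(0.04)   # each case stands for about 25 people
w_reference = design_weight(0.01)     # each case stands for about 100 people
ratio = w_oversampled / w_reference   # about 1/4, as in the slide's example

# Two over-sampled cases with value 10, one reference case with value 20.
estimate = weighted_mean([10, 10, 20], [w_oversampled, w_oversampled, w_reference])
```

The unweighted mean of the three values is about 13.3, while the weighted estimate is pulled toward the under-sampled group's value.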

  22. Calculating Post-Stratification Weights • This is normally more difficult than calculating design weights. • It requires the use of auxiliary information about the population • May take a number of different variables into account. • Information usually needed: • Population estimates of the distribution of a set of demographic characteristics that have also been measured in the sample • For example, information found in the Census such as: • Gender, Age, Educational attainment, • Household size, Residence (e.g., rural, urban), Region
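One common way to build a post-stratification weight, shown here as a sketch for a single variable, is to divide each cell's population share by its share of the sample. The gender split (70/30 in the sample, 50/50 in the population) is invented for illustration, and real applications usually combine several variables.

```python
from collections import Counter


def post_stratification_weights(sample_cells, population_shares):
    """Weight each case by (population share of its cell) / (sample share of its cell)."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]


# Hypothetical sample: 70 women and 30 men; assumed population split is 50/50.
sample_cells = ["F"] * 70 + ["M"] * 30
weights = post_stratification_weights(sample_cells, {"F": 0.5, "M": 0.5})

# After weighting, each group contributes half of the total weight.
weight_f = sum(w for w, c in zip(weights, sample_cells) if c == "F")
weight_m = sum(w for w, c in zip(weights, sample_cells) if c == "M")
```

Women, who are over-represented in this sample, receive weights below 1, and men receive weights above 1.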

  23. Problems with Weights • Weights primarily adjust means and proportions. This is fine for descriptive statistics but may adversely affect inferential statistics and standard errors. • Weights almost always increase the standard errors of your estimates, i.e., they introduce imprecision. • Very large weights (or very small ones) can introduce additional imprecision.

  24. Variance estimation

  25. Variance Estimation • Sampling variation depends on the estimator, sample design and sample size • Many researchers believe it depends only on sample size • Standard variance formulae available for most analysis methods • Typically assume SRS • However these formulae do not work for the sample designs used in most complex surveys

  26. Classical Approaches • Variance formulae have also been developed for some estimators under a wide range of sample designs • See books by authors such as Cochran and Kish • 1950’s to 1970’s • Design effect • Ratio of actual variance to variance assuming SRS of same size • Typically varies from one item to the next • Usually under 2 for well designed surveys, but sometimes more • Can be much higher for other surveys, e.g. 25 for some items in AK BRFSS

  27. Design Effect (deff) • Vc = variance of an estimator from a complex design • Vsrs = variance of an estimator from a simple random sample of the same size • deff = Vc / Vsrs

  28. Effective Sample Size • n = actual sample size • n’ = ‘effective sample size’, i.e., the size of a simple random sample with the same variance • n’ = n/deff • Example: n = 10,000, deff = 2, so n’ = 10,000/2 = 5,000
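The two definitions above translate directly into code. The sketch below reproduces the slide's n = 10,000, deff = 2 example; the proportion of 0.3 and the doubled complex-design variance are assumed values chosen only to produce that deff.

```python
def srs_variance_of_proportion(p, n):
    """Variance of a sample proportion under simple random sampling."""
    return p * (1 - p) / n


def design_effect(var_complex, var_srs):
    """deff = Vc / Vsrs."""
    return var_complex / var_srs


def effective_sample_size(n, deff):
    """n' = n / deff."""
    return n / deff


n = 10_000
v_srs = srs_variance_of_proportion(0.3, n)  # assumed proportion of 0.3
v_complex = 2 * v_srs                       # assumed: complex design doubles the variance
deff = design_effect(v_complex, v_srs)
n_eff = effective_sample_size(n, deff)
```

In practice `v_complex` would come from a design-based variance estimator rather than being assumed.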

  29. What Affects Design Effects? • Stratification: can reduce deffs • Clustering: generally increases deffs • Unequal probabilities of selection and post-stratification leading to unequal weights: increase deffs • In a complex design the total deff is approximately the product of these
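To make the product rule concrete, the sketch below uses two standard approximations that are not given in the slides and are brought in here as assumptions: Kish's clustering deff, 1 + (m - 1)ρ for clusters of average size m with intraclass correlation ρ, and Kish's weighting deff, n Σw² / (Σw)². The cluster size, correlation, and weights are invented values.

```python
def cluster_deff(avg_cluster_size, intraclass_corr):
    """Kish approximation for clustering: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * intraclass_corr


def weight_deff(weights):
    """Kish approximation for unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Assumed values for illustration only.
deff_cluster = cluster_deff(avg_cluster_size=20, intraclass_corr=0.05)
deff_weight = weight_deff([1, 1, 3, 3])

# Approximate total design effect as the product of the components.
deff_total = deff_cluster * deff_weight
```

With equal weights `weight_deff` returns exactly 1, so weighting only inflates the variance when the weights actually vary.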

  30. General Methods for Variance Estimation • A variance formula may already be available • If not, there are several general methods for variance estimation for complex surveys • Linearization • Resampling methods • Balanced repeated replication (BRR) • Jackknife • Bootstrap
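As one illustration of the resampling methods listed above, here is a minimal delete-one-cluster jackknife for a weighted mean. It assumes an unstratified design in which each row is a (cluster, value, weight) tuple; the data are made up, and production survey software handles strata and replicate weights far more carefully.

```python
def weighted_mean(rows):
    """Weighted mean of the values in (cluster, value, weight) rows."""
    total_weight = sum(w for _, _, w in rows)
    return sum(y * w for _, y, w in rows) / total_weight


def jackknife_variance(rows):
    """Delete-one-cluster jackknife variance of the weighted mean (no strata)."""
    clusters = sorted({c for c, _, _ in rows})
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    full = weighted_mean(rows)
    # Re-estimate the mean with each whole cluster left out in turn.
    replicates = [
        weighted_mean([r for r in rows if r[0] != c]) for c in clusters
    ]
    return (k - 1) / k * sum((t - full) ** 2 for t in replicates)


# Hypothetical data: three clusters whose members share the same value.
rows = [("A", 1, 1), ("A", 1, 1), ("B", 3, 1), ("B", 3, 1), ("C", 5, 1), ("C", 5, 1)]
variance = jackknife_variance(rows)
```

Dropping whole clusters rather than single observations is what lets the estimate reflect the lack of independence within clusters.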

  31. Test adjustments • Pearson Chi-Square • Difference between observed and expected freq • Based on SRS, too liberal, esp if obs N is used • May be OK if using effective n • Rao-Scott Chi-Square • Adjusts Pearson for design effects • Uses observed cell proportions to adjust • Modified Rao-Scott / Wald • design correction uses the null hypothesis proportions to adjust • Adjusted F/ Wald log linear • Corrects for test instability w/small number of clusters
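The "effective n" idea for the Pearson test can be sketched for a one-way goodness-of-fit table: computing the statistic on counts scaled to the effective sample size is the same as dividing the ordinary statistic by deff. This assumes a single common design effect for all cells, which is a simplification and not the full Rao-Scott correction; the counts and deff are invented.

```python
def pearson_chi_square(observed, expected_props):
    """Pearson chi-square for observed counts against null proportions."""
    n = sum(observed)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props)
    )


def effective_n_chi_square(observed, expected_props, deff):
    """Pearson chi-square computed as if the sample size were n / deff."""
    return pearson_chi_square(observed, expected_props) / deff


# Hypothetical table: 60 vs 40 observed, 50/50 expected, assumed deff of 2.
observed = [60, 40]
naive = pearson_chi_square(observed, [0.5, 0.5])
adjusted = effective_n_chi_square(observed, [0.5, 0.5], deff=2)
```

The naive statistic is twice the adjusted one here, which illustrates why the unadjusted SRS-based test is too liberal under a clustered design.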
