Applied Sampling [ Notes based on Graham Kalton’s Sage Publication and Prof. Jim Lepkowski’s Lecture Notes ]

Partha Lahiri, Joint Program in Survey Methodology, University of Maryland, College Park


Presentation Transcript


  1. Applied Sampling [Notes based on Graham Kalton’s Sage Publication and Prof. Jim Lepkowski’s Lecture Notes] • Partha Lahiri, Joint Program in Survey Methodology, University of Maryland, College Park

  2. 1. Course Overview • Design perspective • Historical perspective • Population perspective

  3. A. Design Perspective • Experiments: control (C) or use of randomization (R) for disturbing variables D • Quasi-experimental observational studies • Survey samples

  4. B. Historical Perspective • Late 19th century: • Complete enumeration (census) • Monography (purposive selection) • Kiaer (1895) proposed the representative method • Eventually known as the sample survey method

  5. Mathematics and Sampling • Bowley (1906) proposed equal chance selection through randomization • Neyman (1934) eventually provided a complete theory for inference • Two general strategies evolved: • Purposive selection (the representative method) • Probability sampling (chance or randomization theory method)

  6. Elements of Sample Surveys • Randomization inference  • Representativeness • Finite populations • Large samples • Chance selection: equal/epsem • Stratification to improve precision and administrative control • Clustering

  7. C. Population Perspective • Target or ideal population • Survey population • Sampling frame [Figure: nested boxes labeled Target, Survey, and Frame]

  8. Survey Sampling Design Typology

  9. Population • N elements labeled i = 1, 2, ... , N • Characteristic denoted Y1, Y2, Y3, ... , YN

  10. Sample • Sample n elements from N, i = 1, 2, ... , n • Values denoted as y1, y2, ... , yn

  11. Sampling Distributions: Population Elements [Figure: histogram; horizontal axis: Income ($1,000's), ticks from 5 to 65]

  12. Sampling Distributions: Sample Means [Figure: histogram; horizontal axis: Income ($1,000's), ticks from 5 to 65]

  13. 2. Simple Random Sampling • Implementation • Inference • Sample size determination

  14. A. Implementation • For a population of N, select a sample of n • Random selection using mechanical device • Employ a table of random numbers, using labels i to identify selected units • Not haphazard • Base procedure • SRS – without replacement • SRS seldom used for selection, even from a simple frame • All samples of n distinct elements equally likely
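The selection step can be sketched in a few lines. This is a minimal illustration and not part of the original notes: Python's `random.sample` stands in for the table of random numbers, and the function name `select_srs`, the seed, and the population size are assumptions chosen for the example.

```python
import random

def select_srs(N, n, seed=None):
    """Draw an SRS without replacement of n labels from 1..N (hypothetical helper)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    # random.sample draws without replacement, so every set of n distinct
    # labels is equally likely, which is the defining property of SRS.
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical frame of N = 1000 labeled units, sample of n = 10
sample_labels = select_srs(N=1000, n=10, seed=2024)
print(sample_labels)
```

Recording the seed is one way to keep the draw auditable, in contrast to a haphazard selection that cannot be repeated.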

  15. Sample Estimates

  16. Sampling Variance
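For reference, the usual textbook forms of the sample mean and its estimated sampling variance under SRS without replacement are given below. The notation (f = n/N for the sampling fraction, s² for the sample element variance) is an assumption borrowed from standard sampling texts and may differ from the symbols used on the slides.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\widehat{\mathrm{var}}(\bar{y}) = (1 - f)\,\frac{s^2}{n}, \quad f = \frac{n}{N}.
```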

  17. Population Total • Inflate the sample total to the population:

  18. Proportions • For Yi = 1 or 0:
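The corresponding standard SRS results for a total and a proportion can be written as follows. Again this is textbook notation assumed for illustration: f = n/N is the sampling fraction, s² the sample element variance, and p the sample proportion of elements with Yi = 1.

```latex
\hat{Y} = N\bar{y}, \qquad
\widehat{\mathrm{var}}(\hat{Y}) = N^2 (1 - f)\,\frac{s^2}{n};
\qquad
p = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\widehat{\mathrm{var}}(p) = (1 - f)\,\frac{p(1 - p)}{n - 1}.
```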

  19. B. Inference • From a single sample of size n, estimate variability of sample mean across all possible samples of size n: • Standard Error Estimate: • (1 - α) × 100% confidence interval:
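A short sketch of this inference step, assuming the standard SRS standard error sqrt((1 - f) s²/n) and a normal-approximation interval. The function name `srs_mean_ci`, the income values, and N = 2000 are hypothetical; for a sample this small a t multiplier would usually replace the normal one.

```python
import math
from statistics import NormalDist, mean, variance

def srs_mean_ci(y, N, alpha=0.05):
    """Mean, standard error, and (1 - alpha) CI for an SRS drawn without replacement."""
    n = len(y)
    f = n / N                                  # sampling fraction
    ybar = mean(y)
    s2 = variance(y)                           # sample variance, divisor n - 1
    se = math.sqrt((1 - f) * s2 / n)           # includes the fpc (1 - f)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # normal multiplier (large-sample assumption)
    return ybar, se, (ybar - z * se, ybar + z * se)

# Hypothetical incomes in $1,000's from a population of N = 2000
incomes = [22, 35, 18, 41, 27, 30, 25, 38]
ybar, se, (lo, hi) = srs_mean_ci(incomes, N=2000)
print(f"mean = {ybar:.2f}, se = {se:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```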

  20. C. Sample Size • Sampling error depends very little on the sampling fraction f = n/N (unless f is greater than 0.05) • Same precision for n = 1,000 in College Park or the People's Republic of China • Sample size determination in SRS • Specify desired level of precision • Obtain value for element variance • Solve for n

  21. Computation • Desired precision: • Solving for n: • Ignoring the fpc: • Use a guess value for S² (the element variance)
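The computation can be sketched as below, assuming precision is specified as a target standard error for the mean and that var(mean) = (1 - n/N) S²/n. Solving gives n0 = S²/se² when the fpc is ignored and n = n0 / (1 + n0/N) otherwise. The function name and the input values are hypothetical.

```python
import math

def srs_sample_size(S2, se_target, N=None):
    """SRS sample size for a target standard error of the mean.

    S2 is a guessed element variance. With N=None the fpc is ignored.
    """
    n0 = S2 / se_target ** 2          # size needed when the fpc is ignored
    if N is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + n0 / N))   # fpc-adjusted size

# Hypothetical inputs: a proportion near 0.5 (S2 about 0.25), target se of 0.015
print(srs_sample_size(0.25, 0.015))           # fpc ignored
print(srs_sample_size(0.25, 0.015, N=5000))   # small population: fewer cases needed
```

Rounding up is a conservative choice, since rounding down would slightly miss the precision target.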

  22. Illustration: Relative Precision • The desired level of precision can be specified by relative precision: • For example, suppose for P = 0.10 • Obtain sample size as before.
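One way to work this through, assuming relative precision means the coefficient of variation of the estimated proportion and that the fpc is ignored. The target value CV = 0.10 below is a hypothetical choice for illustration, not a value taken from the slide.

```latex
\mathrm{CV}(p) = \frac{\mathrm{se}(p)}{P} \approx \sqrt{\frac{1 - P}{nP}}
\quad\Longrightarrow\quad
n \approx \frac{1 - P}{P\,\mathrm{CV}^2}
= \frac{0.90}{(0.10)(0.10)^2} = 900
\quad \text{for } P = 0.10,\ \mathrm{CV} = 0.10.
```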

  23. 3. Cluster Sampling • Clusters and survey costs • Variance of the mean • Design effect • Subsampling clusters • roh and sample size • Homogeneity and roh • Portability of roh • Subsample size

  24. A. Clusters and survey cost • Populations often distributed geographically • Cannot afford to create an element frame • Cannot afford to visit n units drawn randomly from the entire area • Cluster selections are used to reduce costs • Select clusters and list elements only for selected clusters • Clusters are naturally occurring units: • Seldom of equal size

  25. B. Variance of the mean

  26. Clustered population

  27. Notation

  28. Element sample

  29. Cluster sample

  30. Computing formulas

  31. C. Design effect
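In the usual formulation, which is assumed here, the design effect compares the variance of the cluster-sample mean with that of an SRS mean of the same size. For a clusters of equal size B taken whole (n = aB), it can be expressed through the intraclass correlation roh:

```latex
\mathrm{Deff} = \frac{\mathrm{var}_{\text{cluster}}(\bar{y})}{\mathrm{var}_{\text{SRS}}(\bar{y})}
= 1 + (B - 1)\,\mathrm{roh},
\qquad
\mathrm{roh} = \frac{\mathrm{Deff} - 1}{B - 1}.
```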

  32. Features of roh • -1/(B - 1) ≤ roh ≤ 1 (although generally roh > 0) • If roh = 0, Deff = 1: the cluster sample is the equivalent of an SRS of size aB • If roh = 1, Deff = B and the cluster sample is equivalent to an SRS of a elements

  33. Source and nature of roh • For human populations in naturally occurring clusters, factors include • Environment (exposure to infectious disease) • Self-selection (poor households in same block) • Interaction (shared attitudes among neighbors) • The size of roh depends on • Characteristic Y (disease status, age) • Nature of clusters (naturally formed) • Size of cluster (blocks, census tracts, counties)

  34. Estimation of roh • Proper estimation, especially for multistage stratified samples, is cumbersome • It is useful for design to estimate roh in a straightforward fashion • Synthetic estimate

  35. Sample size considerations • Implications of roh > 0 for sample size: • Compute an SRS sample size • For B and roh (“guesstimate?”), compute Deff = 1 + (B - 1) roh and inflate the SRS sample size by Deff • Design effect and confidence intervals: • With (a - 1) d.f. • or
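A minimal sketch of these sample size steps, assuming whole clusters of equal size B and Deff = 1 + (B - 1) roh. The function name and the inputs (an SRS size of 385, B = 12, a guessed roh of 0.04) are hypothetical.

```python
import math

def cluster_sample_size(n_srs, B, roh):
    """Inflate an SRS sample size by the design effect for whole clusters of size B."""
    deff = 1 + (B - 1) * roh          # design effect for complete clusters
    n = math.ceil(n_srs * deff)       # elements needed to match the SRS precision
    a = math.ceil(n / B)              # clusters needed to supply that many elements
    return deff, n, a

deff, n, a = cluster_sample_size(n_srs=385, B=12, roh=0.04)
print(f"deff = {deff:.2f}, elements = {n}, clusters = {a}")
```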

  36. D. Subsampling clusters • Select b < B elements from each of a selected clusters • First-stage sampling rate: a/A • Second-stage sampling rate: b/B • Overall sampling rate: f = (a/A)(b/B) • SRS of clusters and SRS of elements within each selected cluster • This two-stage cluster sampling design is epsem • An unbiased estimator of population mean:

  37. Variability of the sample mean

  38. Two-stage cluster sample

  39. Estimation of variance

  40. Approximation • If a/A is negligible, then the variance estimate reduces to its between-cluster term alone • This approximation avoids the need to compute the within-cluster variance component • There is an alternative conceptual explanation for using a reduced or simplified variance estimate ...

  41. E. roh and sample size • Effective sample size: size of SRS that would give the same precision as the cluster sample • neff = n/deff • If deff = 2.0 and n = 1500, then neff = 1500/2 = 750 • Varying subsample size for fixed n changes the number of clusters, deff, and the effective sample size

  42. Effective sample size • Cluster sample with n = 2000 and roh = 0.03 • For b = 10, deff = 1 + (10 - 1)(0.03) = 1.27 • neff = 2000/1.27 ≈ 1575 • For b = 20, deff = 1 + (20 - 1)(0.03) = 1.57 • neff = 2000/1.57 ≈ 1274 • For b = 50, deff = 1 + (50 - 1)(0.03) = 2.47 • neff = 2000/2.47 ≈ 810
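The three cases can be recomputed directly from deff = 1 + (b - 1) roh and neff = n/deff. The helper names below are hypothetical; the inputs are the ones on the slide.

```python
def design_effect(b, roh):
    """Design effect for a subsample of b elements per cluster."""
    return 1 + (b - 1) * roh

def effective_sample_size(n, b, roh):
    """Size of the SRS giving the same precision as the cluster sample."""
    return n / design_effect(b, roh)

n, roh = 2000, 0.03
for b in (10, 20, 50):
    clusters = n // b   # a fixed n spread over fewer clusters as b grows
    print(f"b = {b:2d}: clusters = {clusters:3d}, "
          f"deff = {design_effect(b, roh):.2f}, "
          f"neff = {effective_sample_size(n, b, roh):.0f}")
```

The loop makes the trade-off visible: larger subsamples per cluster mean fewer clusters to visit but a smaller effective sample size.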

  43. F. Homogeneity and roh • Sample 10 school classrooms from 1,000 • Each classroom has exactly 24 children • Alternative 1: sample characteristic is a dichotomy • Here the intra-class correlation roh = 0.088 • Moderate amount of homogeneity within clusters • Actual sample size n = 240 • Effective sample size: neff = 240 / [1 + (24 - 1)(0.088)] = 240/3.024 ≈ 79

  44. Nearly perfect homogeneity • Homogeneity within, heterogeneity among:

  45. Perfect heterogeneity • Heterogeneity within, homogeneity among: • = undefined (total population)

  46. G. Portability of roh • Estimation • Design

  47. Illustration • Crime victimization survey in a large public housing project • Sample apartments • Responsible adult interviewed about victimizations occurring to any member of the household • A = 400 floors with exactly B = 15 apartments • a = 10 floors selected by SRS • b = 5 apartments selected by SRS on each selected floor

  48. Sample results
      Sample floor    HH’s “touched” by crime
      1               4
      2               4
      3               5
      4               5
      5               3
      6               1
      7               0
      8               1
      9               2
      10              1

  49. Estimation
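One way the estimation could be carried out for the housing-project data is sketched below. It assumes the simplified between-cluster variance estimate (a/A = 10/400 treated as negligible), an SRS comparison variance of p(1 - p)/(n - 1), and the synthetic estimate roh = (deff - 1)/(b - 1). The variable names are illustrative, and the slide's own calculation may differ in detail.

```python
import math

A, B = 400, 15      # floors in the project, apartments per floor
a, b = 10, 5        # sampled floors, sampled apartments per floor
touched = [4, 4, 5, 5, 3, 1, 0, 1, 2, 1]   # households touched by crime, per sampled floor

n = a * b
p = sum(touched) / n                        # epsem design: plain sample proportion
p_floor = [t / b for t in touched]          # proportion on each sampled floor

# Between-floor variance of the floor proportions (divisor a - 1)
s2_a = sum((pf - p) ** 2 for pf in p_floor) / (a - 1)
var_p = s2_a / a                            # simplified estimate, fpc ignored
se_p = math.sqrt(var_p)

var_srs = p * (1 - p) / (n - 1)             # variance if the 50 apartments were an SRS
deff = var_p / var_srs
roh = (deff - 1) / (b - 1)                  # synthetic estimate of roh

print(f"p = {p:.2f}, se = {se_p:.3f}, deff = {deff:.2f}, roh = {roh:.2f}")
```

Under these assumptions the design effect comes out well above 1, which is what the floor-to-floor spread in the counts (from 0 to 5 out of 5) would suggest.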
