Variance Estimation in Complex Surveys

Drew Hardin and Kinfemichael Gedif

So far... • Variance for an estimated mean and total under SRS, stratified, and cluster (single-stage, multistage) sampling, etc. • Variance for estimating a ratio of two means under SRS (we used the linearization method)

What about other cases? • Variance for estimators that are not linear combinations of means and totals • Ratios • Variance for estimating other statistics from complex surveys • Median, quantiles, functions of EMF, etc. • Other approaches are necessary

Outline • Variance Estimation Methods • Linearization • Random Group Methods • Balanced Repeated Replication (BRR) • Resampling techniques • Jackknife, Bootstrap • Adapting to complex surveys • ‘Hot’ research areas • Reference

Linearization (Taylor Series Methods) • We have seen this before (ratio estimator and other courses). • Suppose our statistic is non-linear. It can often be approximated using Taylor’s Theorem. • We know how to calculate variances of linear functions of means and totals.

Linearization (Taylor Series Methods) • Linearize • Calculate Variance
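The two steps on this slide can be written out as follows. This is a sketch in standard textbook notation, not the slide's own formulas (which are not preserved in this text): the function h, the estimated totals \hat t_j, and the population totals t_j are symbols assumed here.

```latex
% Step 1 (linearize): first-order Taylor expansion of h around the population totals,
% with all partial derivatives evaluated at (t_1, ..., t_k).
h(\hat t_1,\dots,\hat t_k) \;\approx\; h(t_1,\dots,t_k)
  + \sum_{j=1}^{k} \frac{\partial h}{\partial t_j}\,(\hat t_j - t_j)

% Step 2 (calculate variance): variance of the linear approximation.
V\!\left[h(\hat t_1,\dots,\hat t_k)\right] \;\approx\;
  \sum_{j=1}^{k} \left(\frac{\partial h}{\partial t_j}\right)^{2} V(\hat t_j)
  + 2 \sum_{j<l} \frac{\partial h}{\partial t_j}\,\frac{\partial h}{\partial t_l}\,
    \operatorname{Cov}(\hat t_j, \hat t_l)
```

In practice the unknown totals in the derivatives are replaced by their estimates, and the variances and covariances by design-based estimates.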

Linearization (Taylor Series) Methods • Pro: • Can be applied in general sampling designs • Theory is well developed • Software is available • Con: • Finding partial derivatives may be difficult • A different method is needed for each statistic • The function of interest may not be expressible as a smooth function of population totals or means • Accuracy of the linearization approximation
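As a concrete instance of linearization, the ratio of two means under SRS can be handled by replacing the ratio with its linearized residuals. The sketch below assumes the usual SRS result v(B) = (1 - n/N) s_e^2 / (n xbar^2) with e_i = y_i - B x_i; the function name and arguments are hypothetical, chosen only for illustration.

```python
def ratio_linearization_variance(y, x, N=None):
    """Linearization variance estimate for the ratio B = ybar / xbar under SRS.

    y, x : equal-length lists of sample values (assumed inputs)
    N    : population size; if None, the finite population correction is dropped
    """
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b = ybar / xbar
    # Linearized residuals; they sum to zero by construction.
    e = [yi - b * xi for yi, xi in zip(y, x)]
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)
    fpc = 1.0 if N is None else 1.0 - n / N
    return b, fpc * s2_e / (n * xbar ** 2)


# Usage: when x is identically 1 the ratio is the mean of y and the
# variance estimate reduces to s^2 / n.
b, v = ratio_linearization_variance([1, 2, 3, 6], [1, 1, 1, 1])
```

Each new statistic needs its own residual formula, which is the "different method for each statistic" drawback listed above.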

Random Group Methods • Based on the concept of replicating the survey design • It is not usually possible simply to repeat the whole survey • However, the sample can often be divided into R groups so that each group forms a miniature version of the survey

Random Group Methods • [Figure: Strata 1 to 5, each showing sampled units 1 to 8, with a subset of units labelled "Treat as miniature sample"]

Unbiased Estimator (Average of Samples) • Slightly Biased Estimator (All Data)
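The formulas for these two estimators are not preserved in this text. A sketch of the standard random group forms follows, assuming R groups with group estimates \hat\theta_r, their average \tilde\theta, and the full-sample estimate \hat\theta; this notation is assumed here, not taken from the slide.

```latex
% Average of the R group estimates and its variance estimator
% (unbiased for V(\tilde\theta) when the groups are independent).
\tilde\theta = \frac{1}{R}\sum_{r=1}^{R}\hat\theta_r ,
\qquad
\hat V_1(\tilde\theta) = \frac{1}{R(R-1)}\sum_{r=1}^{R}\bigl(\hat\theta_r - \tilde\theta\bigr)^2

% Estimator centred on the full-sample estimate (uses all data; slightly biased).
\hat V_2(\hat\theta) = \frac{1}{R(R-1)}\sum_{r=1}^{R}\bigl(\hat\theta_r - \hat\theta\bigr)^2
  = \hat V_1(\tilde\theta) + \frac{(\tilde\theta - \hat\theta)^2}{R-1}
```

The last identity shows that the second form is never smaller than the first.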

Random Group Methods • Pro: • Easy to calculate • General method (can also be used for non-smooth functions) • Con: • Assumption of independent groups (a problem when N is small) • Small number of groups (particularly if one stratum is sampled only a few times) • Survey design must be replicated in each random group (the strata and clusters must be present in each group)
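The "easy to calculate" point can be illustrated with a few lines of code. This is a minimal sketch assuming the group estimates have already been computed from R independent random groups; the function and argument names are hypothetical.

```python
def random_group_variance(group_estimates, full_sample_estimate=None):
    """Random group variance estimate.

    group_estimates      : one estimate per random group (assumed independent)
    full_sample_estimate : if given, deviations are taken around it instead of
                           around the average of the group estimates
    Returns (centre, variance estimate).
    """
    R = len(group_estimates)
    if full_sample_estimate is None:
        centre = sum(group_estimates) / R
    else:
        centre = full_sample_estimate
    variance = sum((t - centre) ** 2 for t in group_estimates) / (R * (R - 1))
    return centre, variance


# Usage with three hypothetical group estimates.
centre, v1 = random_group_variance([10.0, 12.0, 14.0])
_, v2 = random_group_variance([10.0, 12.0, 14.0], full_sample_estimate=13.0)
```

With only a handful of groups the variance estimate itself is very unstable, which is the "small number of groups" drawback listed above.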

Resampling and Replication Methods • Balanced Repeated Replication (BRR) • Special case when $n_h = 2$ • Jackknife (Quenouille 1949; Tukey 1958) • Bootstrap (Efron 1979; Shao and Tu 1995) • These methods • Extend the idea of the random group method • Allow replicate groups to overlap • Are all-purpose methods • Asymptotic properties ??

Balanced Repeated Replication • Suppose we had sampled 2 units per stratum • There are $2^H$ ways to pick 1 from each stratum • Each combination could be treated as a sample • Pick R samples

Balanced Repeated Replication • Which samples should we include? • Assign each value either 1 or -1 within the stratum • Select samples that are orthogonal to one another to create balance • You can use the design matrix for a fractional factorial • Specify a vector $\alpha_r$ of 1, -1 values, one for each stratum • Estimator

Balanced Repeated Replication • Pro • Relatively few computations • Asymptotically equivalent to linearization methods for smooth functions of population totals and quantiles • Can be extended to use weights • Con • Requires 2 PSUs per stratum • Can be extended with more complex schemes
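The BRR steps above can be sketched as follows for a stratified mean with two sampled units per stratum. The balanced set of half-samples is taken from a Sylvester-type Hadamard matrix (one convenient source of orthogonal +1/-1 vectors, assumed here rather than taken from the slides), and the variance is the average squared deviation of the half-sample estimates from the full-sample estimate. All names are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), Sylvester construction."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def brr_variance(pairs, weights):
    """BRR variance estimate for the stratified mean sum_h W_h * ybar_h.

    pairs   : list of (y_h1, y_h2), the two sampled values in each stratum
    weights : list of stratum weights W_h
    """
    n_strata = len(pairs)
    R = 1
    while R < n_strata + 1:  # smallest power of 2 exceeding the number of strata
        R *= 2
    had = sylvester_hadamard(R)

    def stratified_mean(values):
        return sum(w * v for w, v in zip(weights, values))

    theta_full = stratified_mean([(a + b) / 2 for a, b in pairs])
    replicates = []
    for r in range(R):
        # Skip column 0 (all +1) so every stratum column is balanced.
        signs = had[r][1:n_strata + 1]
        half = [p[0] if s == 1 else p[1] for p, s in zip(pairs, signs)]
        replicates.append(stratified_mean(half))
    variance = sum((t - theta_full) ** 2 for t in replicates) / R
    return theta_full, variance


# Usage: three strata, so R = 4 half-samples instead of all 2**3 = 8.
theta, v = brr_variance([(1, 3), (2, 6), (5, 5)], [0.5, 0.3, 0.2])
```

For this linear estimator the balanced set reproduces the usual stratified variance estimate (without finite population correction), while using far fewer than all $2^H$ half-samples.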

The Jackknife: SRS with replacement • Quenouille (1949); Tukey (1958); Shao and Tu (1995) • Let $\hat\theta_{(i)}$ be the estimator of $\theta$ after omitting the ith observation • Jackknife estimate • Jackknife estimator of the variance • For stratified SRS without replacement: Jones (1974)
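The delete-one idea on this slide can be sketched in code. The variance formula used below, (n-1)/n times the sum of squared deviations of the leave-one-out estimates from their average, is the standard delete-one jackknife form and is assumed here because the slide's own formulas are not preserved. The function name is hypothetical.

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate for estimator(data).

    data      : list of observations (assumed i.i.d. or SRS with replacement)
    estimator : function mapping a list of observations to a number
    """
    n = len(data)
    # Recompute the estimator n times, each time omitting one observation.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)


def sample_mean(values):
    return sum(values) / len(values)


# Usage: for the sample mean the jackknife reproduces s^2 / n.
v = jackknife_variance([1, 2, 3, 6], sample_mean)
```

The same function works unchanged for a nonlinear estimator, for example `lambda d: sample_mean(d) ** 2`, which is what makes the method all-purpose.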

The Jackknife: stratified multistage design • In stratum h, delete one PSU at a time • Let $\hat\theta_{(hi)}$ be the estimator of the same form as $\hat\theta$ when PSU i of stratum h is omitted • Jackknife estimate: • Or using pseudovalues
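A sketch of the delete-one-PSU jackknife follows. It assumes each PSU is summarized by a tuple of weighted totals, that the remaining PSUs in the affected stratum are scaled up by n_h / (n_h - 1) when one PSU is dropped, and that deviations are taken around the full-sample estimate; these are common conventions, assumed here since the slide's formula is not preserved. All names are hypothetical.

```python
def delete_one_psu_jackknife(strata, estimator):
    """Delete-one-PSU jackknife for a stratified multistage design.

    strata    : list of strata; each stratum is a list of PSU records, and each
                record is a tuple of weighted PSU totals
    estimator : function mapping such a list of strata to a number
    Returns (full-sample estimate, variance estimate).
    """
    theta_full = estimator(strata)
    variance = 0.0
    for h, psus in enumerate(strata):
        n_h = len(psus)
        scale = n_h / (n_h - 1)  # reweight the PSUs that remain in stratum h
        sum_sq = 0.0
        for i in range(n_h):
            kept = [tuple(scale * v for v in psu)
                    for j, psu in enumerate(psus) if j != i]
            replicate = strata[:h] + [kept] + strata[h + 1:]
            sum_sq += (estimator(replicate) - theta_full) ** 2
        variance += (n_h - 1) / n_h * sum_sq
    return theta_full, variance


def estimated_total(strata):
    """Estimated population total of the first variable."""
    return sum(psu[0] for psus in strata for psu in psus)


# Usage: two strata with 2 and 3 PSUs, one weighted total per PSU.
design = [[(1.0,), (3.0,)], [(2.0,), (4.0,), (9.0,)]]
theta, v = delete_one_psu_jackknife(design, estimated_total)
```

For a linear total this reproduces the with-replacement estimator, the sum over strata of n_h times the sample variance of the PSU totals; a ratio of two totals can be passed as `estimator` without any other change.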

The Jackknife: stratified multistage design • Different formulae for • Where • Using the pseudovalues

The Jackknife: Asymptotics • Krewski and Rao (1981) • Based on the concept of a sequence of finite populations with L strata in • Under conditions C1-C6 given in the paper • where "method" is the estimator used (linearization, BRR, jackknife)

The Bootstrap: Naïve bootstrap • Efron (1979); Rao and Wu (1988); Shao and Tu (1995) • Resample with replacement in stratum h • Estimate: • Variance: • Or approximate by • The estimator is not a consistent estimator of the variance of a general nonlinear statistic

The Bootstrap: Naïve bootstrap • For • Comparing with • The ratio does not converge to 1 for a bounded $n_h$
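The comparison on this slide is not preserved in the text. One standard way to see the problem is the linear case, sketched below for the stratified mean with $n_h$ units resampled with replacement in each stratum; the notation ($W_h$, $s_h^2$, $\bar y_h$) is assumed here.

```latex
% Naive bootstrap variance of the stratified mean \bar y_{st} = \sum_h W_h \bar y_h:
\operatorname{Var}_*\!\left(\bar y^{*}_{st}\right)
  = \sum_{h} W_h^2 \,\frac{n_h - 1}{n_h}\,\frac{s_h^2}{n_h}

% Customary unbiased estimator (with-replacement sampling within strata):
v\!\left(\bar y_{st}\right) = \sum_{h} W_h^2 \,\frac{s_h^2}{n_h}

% Stratum by stratum, the two differ by the factor (n_h - 1)/n_h,
% which stays away from 1 when n_h is bounded (e.g. 1/2 when n_h = 2).
```

The factor matters in surveys with many strata and few PSUs per stratum, where the number of strata grows but $n_h$ does not.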

The Bootstrap: Modified bootstrap • Resample $m_h$ units with replacement in stratum h • Calculate: • Variance: • Can be approximated with Monte Carlo • For the linear case, it reduces to the customary unbiased variance estimator • $m_h < n_h$

More on bootstrap • The method can be extended to stratified SRS without replacement by simply changing • For $m_h = n_h - 1$, this method reduces to the naïve bootstrap • For $n_h = 2$, $m_h = 1$, the method reduces to the random half-sample replication method • For $n_h > 3$, choice of $m_h$ …see Rao and Wu (1988)

Simulation: Rao and Wu (1988) • Jackknife and linearization one-sided intervals were substantially biased for nonlinear statistics • The bootstrap performs best for one-sided intervals (especially when $m_h = n_h - 1$) • For two-sided intervals, the three methods have similar coverage probabilities • The jackknife and linearization methods are more stable than the bootstrap • B = 200 is sufficient

‘Hot’ topics • Jackknife with non-smooth functions (Rao and Sitter 1996) • Two-phase variance estimation (Graubard and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005) • Estimating Function (EF) bootstrap method (Rao and Tausi 2004)

Software • OSIRIS – BRR, Jackknife • SAS – Linearization • Stata – Linearization • SUDAAN – Linearization, Bootstrap, Jackknife • WesVar – BRR, Jackknife, Bootstrap

References: • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26. • Graubard, B. I., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73-96. • Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of linearization, jackknife, and balanced replication methods. Annals of Statistics, 9, 1010-1019. • Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355-375. • Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231-241. • Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087-2095. • Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao's paper. Statistics, 27, 246-247. • Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based inference. Annals of Statistics (to appear). • Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag. • Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614. Not referred to in the presentation: • Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag. • Shao, J. (1996). Resampling methods in sample surveys. Invited paper, Statistics, 27, 203-237, with discussion, 237-254.
