Variance Estimation When Donor Imputation is Used to Fill in Missing Values

Jean-François Beaumont and Cynthia Bocci, Statistics Canada
Third International Conference on Establishment Surveys, Montréal, June 18-21, 2007

Overview
• Context
• Donor imputation
• Variance estimation
• Simulation study
• Conclusion

Context
• Population parameter to be estimated:
  • Domain total:
• Estimator in the case of full response:
  • Calibration estimator
  • Horvitz-Thompson estimator

Donor Imputation
• Imputed estimator:
• With donor imputation, the imputed value is
• A variety of methods can be considered in order to find a donor l(k) for the recipient k with

Donor Imputation
• Two simple examples:
  • Random hot-deck imputation within classes
  • Nearest-neighbour imputation
• Practical considerations that add some complexity to the imputation process:
  • Post-imputation edit rules
  • Hierarchical imputation classes
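The two simple examples named above can be sketched in a few lines of Python. This is only an illustration, not the presenters' implementation: the function names, the use of a single matching variable `x` with absolute distance, and the convention that a respondent is its own donor are all assumptions made for the sketch.

```python
import numpy as np

def random_hot_deck_donors(responded, classes, rng):
    """Pick a donor l(k) at random among respondents of the same class.

    Respondents are recorded as their own donor (an assumed convention).
    """
    responded = np.asarray(responded, dtype=bool)
    classes = np.asarray(classes)
    donors = np.arange(len(responded))
    for c in np.unique(classes):
        in_class = classes == c
        pool = np.flatnonzero(in_class & responded)
        recipients = np.flatnonzero(in_class & ~responded)
        if len(recipients) == 0:
            continue
        if len(pool) == 0:
            raise ValueError(f"no respondent available in class {c!r}")
        # Donors are drawn with replacement, so one donor may serve several recipients.
        donors[recipients] = rng.choice(pool, size=len(recipients), replace=True)
    return donors

def nearest_neighbour_donors(x, responded):
    """Pick as donor the respondent whose x value is closest to the recipient's."""
    x = np.asarray(x, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    pool = np.flatnonzero(responded)
    if len(pool) == 0:
        raise ValueError("no respondent available")
    donors = np.arange(len(x))
    for k in np.flatnonzero(~responded):
        donors[k] = pool[np.argmin(np.abs(x[pool] - x[k]))]
    return donors

def impute(y, donors):
    """Imputed value for unit k is the observed y of its donor l(k)."""
    return np.asarray(y, dtype=float)[donors]
```

Both functions return the donor index for every sampled unit, so the imputed values are obtained with `impute(y, donors)` regardless of how the donors were found.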

Imputation Model
• Most imputation methods can be justified by an imputation model:
• The donor imputed estimator is assumed to be approximately unbiased under the model:

Current Variance Estimation Methods
• Assuming negligible sampling fractions
  • Chen and Shao (2000, JOS) for NN imputation
  • Resampling methods
• Our method is closely related to:
  • Rancourt, Särndal and Lee (1994, Proc. SRMS): assumes a ratio model holds
  • Brick, Kalton and Kim (2004, SM): conditions on the selected donors

Imputation Model Approach
• Variance decomposition of Särndal (1992, SM):
• For any donor imputation method, we have:
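The formulas on this slide were not preserved in the transcript. As a reminder of the general idea only, the total error of an imputed estimator can be split into a sampling error and a nonresponse error, which gives three components. The notation below (\hat{\theta} for the full-response estimator, \hat{\theta}_I for the imputed estimator, and the labels V_SAM, V_NR, V_MIX) is assumed for this sketch and may differ from the slide's.

```latex
\begin{align*}
\hat{\theta}_I - \theta
  &= \underbrace{(\hat{\theta} - \theta)}_{\text{sampling error}}
   + \underbrace{(\hat{\theta}_I - \hat{\theta})}_{\text{nonresponse error}} \\
E(\hat{\theta}_I - \theta)^2
  &= \underbrace{E(\hat{\theta} - \theta)^2}_{V_{SAM}}
   + \underbrace{E(\hat{\theta}_I - \hat{\theta})^2}_{V_{NR}}
   + 2\,\underbrace{E\{(\hat{\theta} - \theta)(\hat{\theta}_I - \hat{\theta})\}}_{V_{MIX}}
\end{align*}
```

The second line is an algebraic identity obtained by squaring the first; the expectation is taken jointly over the imputation model, the sampling design and the response mechanism.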

Estimation of the nonresponse variance
• The estimation of the nonresponse variance is achieved by estimating
• Noting that the nonresponse error is:
• Then, the nonresponse variance estimator is:
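Since the slide's expressions are missing, the following Python sketch shows one way the idea could be worked out, and it should not be read as the presenters' exact estimator. It assumes an imputation model with independent errors of variance sigma2_k, treats the selected donors as fixed, ignores any difference between the model means of a recipient and its donor, and takes the weights `w` to already include any domain indicator. Under those assumptions the nonresponse error is a linear combination of model errors, and its model variance is a weighted sum of the sigma2_k.

```python
import numpy as np

def nonresponse_variance(w, responded, donors, sigma2):
    """Model variance of sum over nonrespondents of w_k * (y_donor(k) - y_k).

    w         : weights of the sampled units (assumed to include any domain indicator)
    responded : True for respondents, False for recipients
    donors    : donor index l(k) for each unit (only used for recipients)
    sigma2    : estimated model variances, one per sampled unit (assumed given)
    """
    w = np.asarray(w, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    donors = np.asarray(donors, dtype=int)
    sigma2 = np.asarray(sigma2, dtype=float)

    # Total weight each donor carries on behalf of the recipients it serves.
    borrowed = np.zeros_like(w)
    recipients = np.flatnonzero(~responded)
    np.add.at(borrowed, donors[recipients], w[recipients])

    donor_part = np.sum(borrowed[responded] ** 2 * sigma2[responded])
    recipient_part = np.sum(w[~responded] ** 2 * sigma2[~responded])
    return float(donor_part + recipient_part)
```

A donor used by several recipients accumulates their weights before squaring, which is why heavy reuse of the same donor inflates this component.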

Estimation of the mixed component
• Similarly, the estimation of the mixed component is achieved by estimating
• The mixed component estimator is:
• This component can be either positive or negative and may not always be negligible

Estimation of the sampling variance
• Let be the full-response variance estimator
• The strategy consists of:
  • Estimating
  • Replacing by their estimates the unknown
• This leads to the sampling variance estimator:

Estimation of the sampling variance
• This strategy is essentially equivalent to:
  • Randomly imputing the missing values using the imputation model
  • Computing the full-response sampling variance estimator by treating these imputed values as true values
  • Repeating this process a large number of times and taking the average of the sampling variance estimates
• Similar to the multiple imputation sampling variance estimator
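The three steps above can be sketched as a small Monte Carlo loop. This is a minimal illustration under stated assumptions, not the presenters' code: the design is taken to be simple random sampling without replacement with the usual variance estimator for a total, the model errors are drawn from a normal distribution, and `mu_hat` and `sigma2_hat` stand for estimates of the model mean and variance that are assumed to be available.

```python
import numpy as np

def srs_total_variance(y, N):
    """Full-response variance estimator of a total under SRS without replacement."""
    n = len(y)
    return N ** 2 * (1 - n / N) * np.var(y, ddof=1) / n

def averaged_sampling_variance(y, responded, mu_hat, sigma2_hat, N,
                               n_rep=200, rng=None):
    """Average the full-response variance estimator over random model imputations."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    mu_hat = np.asarray(mu_hat, dtype=float)
    sigma2_hat = np.asarray(sigma2_hat, dtype=float)
    missing = ~responded

    estimates = np.empty(n_rep)
    for r in range(n_rep):
        completed = y.copy()
        # Step 1: randomly impute the missing values from the (assumed normal) model.
        noise = rng.normal(0.0, np.sqrt(sigma2_hat[missing]))
        completed[missing] = mu_hat[missing] + noise
        # Step 2: treat the completed data as true values.
        estimates[r] = srs_total_variance(completed, N)
    # Step 3: average over the replicates.
    return float(estimates.mean())
```

Any other full-response variance estimator could be substituted for `srs_total_variance`; only the impute, compute and average pattern matters here.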

Simulation study
• Generated a population of size 1000
• Two y-variables:
  • LIN: linear relationship between y and x
  • NLIN: nonlinear relationship between y and x
• Two different sample sizes:
  • Small sampling fraction: n=50
  • Large sampling fraction: n=500
• Response probability depends on x, with an average of 0.5
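A setup of this kind could be generated as follows. The slide gives only the population size, the two sample sizes and the average response probability; everything else in this sketch (the distribution of x, the coefficients of the LIN and NLIN models, the rank-based response probabilities between 0.2 and 0.8, and simple random sampling) is an assumption chosen for illustration.

```python
import numpy as np

def generate_population(N=1000, seed=0):
    """Hypothetical population with one linear and one nonlinear y-variable."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=N)
    y_lin = 10.0 + 2.0 * x + rng.normal(0.0, 2.0, size=N)
    y_nlin = 10.0 + 0.5 * (x - 5.0) ** 2 + rng.normal(0.0, 2.0, size=N)
    # Response probability increases with x; using ranks makes the mean exactly 0.5.
    ranks = np.argsort(np.argsort(x))
    p_resp = 0.2 + 0.6 * (ranks + 0.5) / N
    return x, y_lin, y_nlin, p_resp

def draw_sample_and_response(p_resp, n, rng):
    """Simple random sample without replacement, then independent response draws."""
    sample = rng.choice(len(p_resp), size=n, replace=False)
    responded = rng.random(n) < p_resp[sample]
    return sample, responded
```

Calling `draw_sample_and_response(p_resp, 50, rng)` or `draw_sample_and_response(p_resp, 500, rng)` reproduces the small and large sampling fractions of 5% and 50%.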

Simulation study
• Imputation: nearest-neighbour imputation using x as the matching variable
• Estimation of
  • LIN: linear model in perfect agreement with the LIN y-variable
  • NPAR: nonparametric estimation using the SAS procedure TPSPLINE

Simulation study
• Two objectives:
  • Compare the two ways of estimating
    • LIN and NPAR
  • Compare three nonparametric methods:
    • NPAR
    • NPAR_Naïve: NPAR with the sampling variance being estimated by the naïve sampling variance (Brick, Kalton and Kim, 2004)
    • CS: method of Chen and Shao (2000)

Results: Large sampling fraction

Results: Small sampling fraction

Results: Large sampling fraction

Conclusion
• Nonparametric estimation of seems beneficial (robust) with nearest-neighbour imputation
• Our proposed method is valid even for large sampling fractions
• It seems to be slightly better to use our sampling variance estimator instead of the naïve sampling variance estimator

Conclusion
• Work done in the context of developing a variance estimation system (SEVANI)
• Methodology implemented in the next version 2.0 of SEVANI
• Estimation of:
  • Linear model
  • Nonparametric estimation

Thanks - Merci
Jean-François Beaumont, Jean-Francois.Beaumont@statcan.ca
Cynthia Bocci, Cynthia.Bocci@statcan.ca
