NLSCY - Variance NLSCY - National Longitudinal Survey of Children and Youth
Objectives of the Presentation - Demonstration • Why is it necessary to compute the variance? • How can the variance be computed with NLSCY data?
Why Compute the Variance? • NLSCY data come from a probability survey: • Estimates produced from any probability survey are subject to sampling variability • To make valid inferences about the population of interest, that variability must be measured
Main Difficulty: Complexity of the NLSCY’s Sampling Plan • Two different sample frames used to select the sample: • Labour Force Survey (LFS), itself a survey with a complex sample design • Birth Register • Use of two frames for certain groups (five-year-olds, Cycle 3)
Complexity of the NLSCY’s Sampling Plan (continued) • Children’s selection probabilities are very uneven • Non-response adjustments cross stratum boundaries • Some LFS clusters are empty
Effects of the Complexity of the NLSCY’s Sampling Plan • No exact analytical formula for computing the variance because of the complex sample design. • No commercial application can fully take the NLSCY’s complexity into account in computing the variance.
How to Compute the Variance for the NLSCY • 3 solutions: • Approximate sampling variability tables provided in the user’s guide (in the form of coefficients of variation (CVs)). • Approximate CV tables for a number of specific subject areas (Excel spreadsheet). • Use bootstrap weights and SAS program supplied by the NLSCY.
How to Compute the Variance for the NLSCY • Of these 3 solutions: • The first two can be used for exploratory analysis. These 2 methods provide an approximation of the variance • Only the third solution computes the variance “more exactly”
Sampling Variability Tables • Very limited in scope • The user’s guide explains how to use them
Approximate CV Tables (Excel) • Brief excerpt:
Approximate CV Tables (Excel) • Let’s go directly to the CV table and take a closer look...
Approximate CV Tables (Excel) • Originally created to answer the question: • Is the Cycle 5 sample size large enough? • Approximation of the exact variance: • Takes the sample design into account by using bootstrap weights. • However, it is based on a random variable rather than on the real survey variables.
Approximate CV Tables (Excel) • CVs available for many subject areas, for a number of proportions. • Lots of additional information available: • sample size, projected size, confidence interval
Approximate CV Tables (Excel) • Functions: • Can choose areas of interest and obtain an approximate CV. • Possibility of making queries • Example: What subject areas have CVs of less than 25%?
Approximate CV Tables (Excel) • In a nutshell: • Much more detailed than the tables provided in the user’s guide, but: • they cannot replace an exact variance calculation • they cover only a limited number of subject areas
Bootstrap Weights and NLSCY_VES • Computing the variance using the replicate (bootstrap) method: • for longitudinal estimates • for cross-sectional estimates • for all cycles • for all desired subject areas
Bootstrap Weights and NLSCY_VES • SAS program called NLSCY_VES is provided for computing the variance: • set of macros • easy to use, well documented • examples provided • computes variance for totals, ratios, differences between ratios, linear regressions, logistic regressions
Bootstrap Basics • A. Select a subsample from the original sample with replacement • B. For this subsample, calculate the weights as if it were the full sample • Repeat A and B many times (1,000) to produce a set of bootstrap weights
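Steps A and B can be illustrated with a deliberately simplified sketch. It assumes a single stratum, resamples individual units rather than LFS clusters, and stands in for the "calculate the weights as if it were the full sample" step with a simple rescaling to the original weight total. The actual NLSCY bootstrap weights reflect the full design and its adjustments, so this is not the survey's procedure. The function name `make_bootstrap_weights` and its parameters are hypothetical.

```python
import random
from collections import Counter


def make_bootstrap_weights(design_weights, n_replicates=1000, seed=12345):
    """Return n_replicates lists of bootstrap weights (simplified illustration).

    Assumptions (not the NLSCY method): one stratum, unit-level resampling,
    and a rescaling to the original weight total as the only adjustment.
    """
    rng = random.Random(seed)
    n = len(design_weights)
    total = sum(design_weights)
    replicates = []
    for _ in range(n_replicates):
        # Step A: select a subsample of n units with replacement
        # and count how many times each unit was drawn.
        draws = Counter(rng.choices(range(n), k=n))
        # Step B: re-weight the subsample as if it were the full sample.
        # Units that were not drawn get a weight of zero.
        raw = [design_weights[i] * draws.get(i, 0) for i in range(n)]
        scale = total / sum(raw)
        replicates.append([r * scale for r in raw])
    return replicates


# Usage with four made-up design weights and five replicates.
example = make_bootstrap_weights([10.0, 20.0, 30.0, 40.0], n_replicates=5)
```

In practice users do not generate these weights themselves: the file of 1,000 bootstrap weights is supplied with the survey data.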
Bootstrap Basics (continued) • For a given estimate: • Calculate the estimate with each set of weights • Calculate the variance of the estimates obtained
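The two steps above can be sketched in a few lines of Python. This is an illustration only, not the NLSCY_VES code: the function names are hypothetical, the estimate is a weighted proportion chosen as an example, and the variance is taken around the mean of the replicate estimates with divisor B (the number of replicates), which is one common convention and an assumption here.

```python
import math


def weighted_proportion(y, weights):
    """Weighted proportion: sum(w * y) / sum(w), with y coded 0/1."""
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


def bootstrap_variance(y, full_weights, replicate_weights,
                       estimator=weighted_proportion):
    """Return (estimate, variance) using a set of bootstrap weights.

    The point estimate uses the full-sample weights. The variance is the
    mean squared deviation of the replicate estimates around their own
    mean (assumed convention; divisor B).
    """
    estimate = estimator(y, full_weights)
    # Calculate the estimate once with each set of bootstrap weights.
    replicates = [estimator(y, w_b) for w_b in replicate_weights]
    b = len(replicates)
    mean_rep = sum(replicates) / b
    # Calculate the variance of the estimates obtained.
    variance = sum((t - mean_rep) ** 2 for t in replicates) / b
    return estimate, variance


def coefficient_of_variation(estimate, variance):
    """CV = standard error divided by the absolute value of the estimate."""
    return math.sqrt(variance) / abs(estimate)


# Usage with made-up values: two units and two replicate weight sets.
est, var = bootstrap_variance([1, 0], [1.0, 1.0], [[2.0, 0.0], [0.0, 2.0]])
cv = coefficient_of_variation(est, var)
```

With the real data, `replicate_weights` would hold the 1,000 bootstrap weight columns for the appropriate cycle and type of analysis, and `estimator` could be replaced by any other statistic computed from the weights.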
Structure of NLSCY_VES • All the macros are in a single SAS file: NLSCY_VES.sas • No changes to that file are needed • A separate SAS program calls the macros and lets the user set the various parameters.
What You Need • The SAS file NLSCY_VES.sas • The SAS program to call the macros • Set of data for which the variance is required • The file of 1,000 bootstrap weights for the appropriate cycle and type of analysis (longitudinal or cross-sectional)
Conclusion • The variance must be computed if we are to make valid inferences • The sample design must be taken into account if we want the variance calculation to be valid. Otherwise, we may draw incorrect conclusions
Conclusion (continued) • The NLSCY provides 3 tools for computing the variance: • tables in the user’s guide (too limited…) • Excel file for many domains (to get an idea) • bootstrap weights (the best approach)