Training Workshop on the ICCS 2009 database: Weighting and Variance Estimation
Content of this presentation • Analyzing weighted data • Standard errors • What are they? • Why do we need them? • How do we estimate them?
What are sampling weights? • Values assigned to all sampling units • Weighted results from the sample can be generalized for the whole population • Weights allow unbiased estimates of population parameters • Based on the sample selection probabilities (applied at each sampling stage) • Adjusted to correct for non-response (applied at each sampling stage)
Weights in ICCS • The ICCS Data contain several weight variables • Total Student Weight: TOTWGTS • Total Teacher Weight: TOTWGTT • Total School Weight: TOTWGTC • The IDB Analyzer automatically selects the correct weight
Analyzing weighted data – a simple example (figure: sampling ratios of 1:10 and 1:1)
Example using ICCS data • Civic knowledge score in an ICCS country • Unweighted: average of 493.83
Example using ICCS data • Difference: 10.1 score points • Reason for the difference: over-sampling of students in private schools • 13.7% of the tested students • 5.9% of the sum of weights
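The effect of over-sampling on an unweighted average can be sketched with a toy example. The scores and weights below are hypothetical and are not ICCS data; `weighted_mean` is an illustrative helper, not an IDB Analyzer function.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical toy sample (not ICCS data):
# two private-school students sampled 1:1 (weight 1 each) and
# two public-school students sampled 1:10 (weight 10 each).
scores = [550.0, 560.0, 480.0, 490.0]
weights = [1.0, 1.0, 10.0, 10.0]

unweighted = sum(scores) / len(scores)    # treats every student alike
weighted = weighted_mean(scores, weights)  # restores population shares

print(f"unweighted: {unweighted:.2f}")
print(f"weighted:   {weighted:.2f}")
```

In this sketch the over-sampled group makes up half of the tested students but only 2 of 22 units of weight, so the unweighted average is pulled toward that group's scores.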
What are standard errors? • The standard error of an estimate is the standard deviation of the sampling distribution associated with it • The sampling distribution is the distribution of the statistic for all possible samples of the same size and method • Since we do not select all possible samples, we can only estimate the standard error
What are standard errors good for? • The ICCS results are based on samples • All ICCS results are therefore estimates of unknown population values • Standard errors can be used to measure how close these estimates are to the real values
Confidence Intervals • Let ε stand for any statistic of interest • A 95% confidence interval is defined as ε ± 1.96 × SE(ε) • This is the black bar in Table 3.4 • With a confidence of 95%, the true mean is between 554.3 and 563.7... • Take rounding into account!
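A 95% confidence interval is usually computed as the estimate plus or minus 1.96 standard errors. The sketch below applies that rule. The inputs 559.0 and 2.4 are assumptions, chosen only so that the result matches the interval quoted above; they are not read from Table 3.4.

```python
def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) bounds: estimate -/+ 1.96 * standard error."""
    z = 1.96  # normal quantile for a two-sided 95% interval
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Assumed values for illustration only (not taken from Table 3.4).
lower, upper = confidence_interval_95(559.0, 2.4)
print(round(lower, 1), round(upper, 1))
```

Because published means and standard errors are themselves rounded, bounds recomputed this way can differ from reported bounds in the last digit.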
Estimating standard errors • In a simple random sample, estimating the standard error of a mean x is easy • Just divide the standard deviation of the sample (s) by the square root of the sample size (n): SE(x) = s / √n • In a complex sample design like in ICCS, it is not as easy to estimate the standard error as in a simple random sample
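For a simple random sample, the rule above (sample standard deviation divided by the square root of the sample size) can be sketched as follows. The data values are made up for illustration, and `srs_standard_error` is a hypothetical helper name.

```python
import math
import statistics


def srs_standard_error(values):
    """Standard error of the mean under simple random sampling: s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)


# Made-up scores, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = srs_standard_error(sample)
print(f"SE of the mean: {se:.4f}")
```

This formula is only appropriate for simple random samples; applied to clustered, weighted data it will generally understate the standard error.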
Complex sample design • Clustered sample • students within a school are more similar to each other than students from different schools • Stratification • usually increases sampling precision • Weights • complicate the calculations
Why not just use SPSS? • Standard software packages like SPSS will not give correct estimates for standard errors • The software assumes that the data is from a simple random sample, and uses a formula that is incorrect for a complex design • Generally, the estimated standard error will be too small
Jackknife Repeated Replication • Solution: Jackknife Repeated Replication (JRR) • Used for estimating standard errors in complex designs • Basic idea: systematically re-compute a statistic on a set of replicated samples • setting the weights to zero for one school at a time • while doubling the weights of another school • Estimate the sampling variance of the statistic from the differences between the full-sample estimate and the replicate estimates
The JRR in ICCS • Jackknife variance estimation in ICCS • Participating schools are paired according to the order in which they were sampled • These school pairs are called jackknife zones – JKZONES (JKZONET, JKZONEC) • One school in each zone is randomly assigned an indicator of 1 (0 for the other school) – JKREPS (JKREPT, JKREPC) • This indicator decides whether a school gets its replicate weight doubled or zeroed
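The zone-by-zone procedure can be sketched for a weighted mean as below. This is a minimal illustration, not the IDB Analyzer's implementation. It assumes that an indicator of 1 means "weight doubled" and 0 means "weight zeroed" within the active zone, and that the variance is the plain sum of squared differences between each replicate estimate and the full-sample estimate. The function names and the toy data are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def jrr_standard_error(values, weights, zones, reps):
    """Return (full-sample weighted mean, JRR standard error).

    zones: jackknife zone of each record (JKZONE-style variable)
    reps:  0/1 replicate indicator of each record (JKREP-style variable)
    """
    full = weighted_mean(values, weights)
    variance = 0.0
    for zone in sorted(set(zones)):
        # Inside the active zone: double weights where the indicator is 1,
        # zero them where it is 0 (assumed convention). Other zones unchanged.
        replicate_weights = [
            w * (2.0 * r if z == zone else 1.0)
            for w, z, r in zip(weights, zones, reps)
        ]
        replicate = weighted_mean(values, replicate_weights)
        variance += (replicate - full) ** 2
    return full, math.sqrt(variance)


# Hypothetical toy data: four schools (one record each) in two zones.
values = [500.0, 520.0, 480.0, 460.0]
weights = [1.0, 1.0, 1.0, 1.0]
zones = [1, 1, 2, 2]
reps = [1, 0, 0, 1]

mean, se = jrr_standard_error(values, weights, zones, reps)
print(f"mean = {mean:.1f}, JRR SE = {se:.4f}")
```

Each pass through the loop produces one replicate estimate, so a data set with many zones requires as many re-computations of the statistic as there are zones.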
Example using ICCS data • Standard error of the teacher age • SPSS just can't do that
SE and plausible values • For ICCS achievement data, the standard error consists of two components • Sampling error • this is what we just discussed • Additionally: measurement error • resulting from the use of plausible values • This is the topic of the next presentation
Conclusion • Use the sampling weights! • Compute standard errors using the JRR!