Session 10. Sampling Weights: an appreciation
Session Objectives • To provide you with an overview of the role of sampling weights in estimating population parameters • To demonstrate computation of sampling weights for a simple scenario • To highlight difficulties in calculating sampling weights for complex survey designs and the need to seek professional expertise for this purpose • To learn about file merging and continue with the on-going project work
What are sampling weights? • Real surveys are generally multi-stage • At each stage, the probabilities of selecting units are generally not equal • When a population parameter such as a mean or proportion is to be estimated, results from lower levels need to be scaled up from the sample to the population • This scaling-up factor, applied to each unit in the sample, is called its sampling weight.
A simple example • Suppose, for example, that a simple random sample of 500 households (HHs) in a rural district (having 7349 HHs in total) showed that 140 were living below the poverty line • Hence the estimated total number of HHs in the population living below the poverty line = (140/500)*7349 = 2058 • Data for each HH was a 0,1 variable, 1 being allocated if the HH was below the poverty line. • Multiplying this variable by 7349/500 = 14.7 and summing would lead to the same answer. • i.e. sampling weight for each HH = 14.7
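The arithmetic on this slide can be checked with a short Python sketch. The figures are those quoted above; the variable names and the reconstructed 0/1 list are illustrative assumptions, since the slide does not give the household-level data.

```python
# Figures quoted on the slide: a simple random sample of 500 households
# from 7349, of which 140 were below the poverty line.
N_HOUSEHOLDS = 7349
SAMPLE_SIZE = 500
N_POOR_IN_SAMPLE = 140

# Rebuild the 0/1 indicator variable (the order of households does not
# affect the total, so the 1s are simply listed first).
below_poverty = [1] * N_POOR_IN_SAMPLE + [0] * (SAMPLE_SIZE - N_POOR_IN_SAMPLE)

# Every household has the same sampling weight under simple random sampling.
weight = N_HOUSEHOLDS / SAMPLE_SIZE

# Weighted sum of the indicator versus the direct scaling-up calculation.
weighted_total = sum(weight * y for y in below_poverty)
direct_total = (N_POOR_IN_SAMPLE / SAMPLE_SIZE) * N_HOUSEHOLDS

print(round(weight, 1))        # 14.7
print(round(weighted_total))   # 2058
print(round(direct_total))     # 2058
```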
Why are weights needed? • The above was a trivial example with equal probabilities of selection • In general, units in the sample have very different probabilities of selection • To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection • Thus sampling weight = 1/(probability of selection)
An example • Consider a conveniently rectangular forest with a river running down the middle, thus dividing the forest into Region 1 and Region 2. • Region 1 is divided into 96 strips, each 50m x 50m, while Region 2 is divided into 72 strips. • Data are the number of small trees and the number of large trees in each strip. • Aim: To find the total number of large trees, the total number of small trees, and hence the total number of trees in the forest.
Weights in stratified sampling • Each region can be regarded as a stratum: 8 strips were chosen from region 1 and 6 from region 2. • The mean numbers of large trees per strip were: • 97.875 in region 1, based on n1 = 8 • 83.500 in region 2, based on n2 = 6 • Hence the total number of large trees in the forest can be estimated as (96*97.875) + (72*83.5) = 15408 • So what are the sampling weights used for each unit (strip)?
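The stratified estimate of the total can be reproduced with a few lines of Python. This is only a sketch: the numbers are those quoted on the slide, while the dictionary layout and names are illustrative.

```python
# Stratum figures quoted on the slide:
#   N    = number of strips in the region
#   n    = number of strips sampled
#   mean = mean number of large trees per sampled strip
strata = {
    "region 1": {"N": 96, "n": 8, "mean": 97.875},
    "region 2": {"N": 72, "n": 6, "mean": 83.5},
}

# Each stratum total is estimated as (strips in stratum) x (sample mean per strip).
stratum_totals = {name: s["N"] * s["mean"] for name, s in strata.items()}

# The forest total is the sum of the stratum totals.
estimated_total = sum(stratum_totals.values())

print(stratum_totals)   # {'region 1': 9396.0, 'region 2': 6012.0}
print(estimated_total)  # 15408.0
```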
Self-weighting • The sampling weights are the same for all strips, whether in region 1 or region 2. Why is this? • What are the probabilities of selection here? • In region 1, each unit is selected with prob = 8/96 = 1/12 • In region 2, each unit is selected with prob = 6/72 = 1/12 • Hence the sampling weight for every strip is 12 • A design where probabilities of selection are equal for all selected units is called a self-weighting design. • Regarding the sample as a simple random sample then gives us the correct mean.
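A minimal Python sketch of the self-weighting check follows. It uses exact fractions so that the equality of the two selection probabilities is not blurred by rounding. The sample totals per region are derived from the quoted means (n x mean), because the strip-level counts are not given in the slides; the names are illustrative.

```python
from fractions import Fraction

# N = strips in region, n = strips sampled, mean = large trees per sampled strip.
strata = {
    "region 1": {"N": 96, "n": 8, "mean": Fraction("97.875")},
    "region 2": {"N": 72, "n": 6, "mean": Fraction("83.5")},
}

weighted_total = Fraction(0)
weights = {}
for name, s in strata.items():
    prob = Fraction(s["n"], s["N"])      # probability of selecting a strip
    weights[name] = 1 / prob             # sampling weight = 1 / probability
    sample_total = s["n"] * s["mean"]    # sum of counts over the sampled strips
    weighted_total += weights[name] * sample_total

print(weights["region 1"], weights["region 2"])  # 12 12
print(weighted_total)                            # 15408
```

Because both weights are equal, the design is self-weighting: multiplying every sampled strip's count by the same factor and summing recovers the stratified total.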
Results for means • It is easy to see that the estimated mean number of large trees per strip in the forest is [(96/168)*97.875] + [(72/168)*83.5] = 91.71 • Regarding the 14 observations as though they were drawn as a simple random sample gives 91.71, i.e. the same answer. • The results for variances, however, differ • Variance of stratified sample mean = 1.28 • Variance of mean ignoring stratification = 2.18
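The agreement of the two means can be verified as follows. This sketch covers the means only: the variances quoted on the slide depend on the strip-level counts, which are not reproduced here, so they are not recomputed. Variable names are illustrative.

```python
import math

# Figures quoted on the slides.
N1, n1, mean1 = 96, 8, 97.875   # region 1
N2, n2, mean2 = 72, 6, 83.5     # region 2
N = N1 + N2                     # 168 strips in the forest

# Stratified mean: stratum means weighted by each stratum's share of the population.
stratified_mean = (N1 / N) * mean1 + (N2 / N) * mean2

# Mean of the 14 sampled strips treated as one simple random sample.
# n * mean recovers each region's sample total.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(round(stratified_mean, 2))                   # 91.71
print(round(pooled_mean, 2))                       # 91.71
print(math.isclose(stratified_mean, pooled_mean))  # True
```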
More on weights • It is important to note that the weights used in computing a mean, i.e. • (96/168)*(1/8) = 1/14 for strips in region 1, and • (72/168)*(1/6) = 1/14 for strips in region 2, are not sampling weights • Sampling weights refer to the multiplying factor used when estimating a total. • Essentially, they represent the number of elements in the population that an individual sampling unit represents.
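The distinction between the two kinds of weight can be made concrete with the forest figures. The sketch below (illustrative names, exact fractions) shows that the mean weights sum to 1 over the sample, whereas the sampling weights sum to the population size, and that here the two differ by a factor of N = 168.

```python
from fractions import Fraction

N1, n1 = 96, 8    # region 1: strips in population, strips sampled
N2, n2 = 72, 6    # region 2
N = N1 + N2       # 168 strips in total

# Weights used when computing the mean (per sampled strip).
mean_weight_1 = Fraction(N1, N) * Fraction(1, n1)
mean_weight_2 = Fraction(N2, N) * Fraction(1, n2)

# Sampling weights used when estimating a total (per sampled strip).
sampling_weight_1 = Fraction(N1, n1)
sampling_weight_2 = Fraction(N2, n2)

print(mean_weight_1, mean_weight_2)                        # 1/14 1/14
print(sampling_weight_1, sampling_weight_2)                # 12 12
print(n1 * mean_weight_1 + n2 * mean_weight_2)             # 1
print(n1 * sampling_weight_1 + n2 * sampling_weight_2)     # 168
```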
Other uses of weights • Weights are also used to deal with non-response and missing values • If measurements are not available on all units for some reason, the sampling weights may be re-computed to allow for this. • e.g. In conducting the Household Budget Survey 2000/2001 in Tanzania, not all rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis.
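The slides do not say how the Tanzanian weights were re-calculated. As a purely hypothetical illustration, one common approach is to inflate the weights of the units actually measured within a stratum by the ratio of units selected to units measured. The sketch below applies this to region 1 of the forest example, assuming (hypothetically) that only 6 of the 8 selected strips could be measured; the function name and the non-response figure are assumptions, not part of the source.

```python
from fractions import Fraction

def adjusted_weight(n_population, n_selected, n_measured):
    """Hypothetical within-stratum non-response adjustment.

    The base weight (population / selected) is multiplied by
    (selected / measured), so the measured units also represent
    the units that were selected but not measured.
    """
    base_weight = Fraction(n_population, n_selected)
    return base_weight * Fraction(n_selected, n_measured)

# Region 1 of the forest example: 96 strips, 8 selected.
# ASSUMPTION for illustration only: 6 of the 8 strips were measured.
base = adjusted_weight(96, 8, 8)      # no non-response
adjusted = adjusted_weight(96, 8, 6)  # two strips missing

print(base)          # 12
print(adjusted)      # 16
print(6 * adjusted)  # 96, the adjusted weights still sum to the stratum size
```

This adjustment treats the missing units as if they were missing at random within the stratum, which may not hold in a real survey.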
Computation of weights • The general approach is to find the probability of selecting a unit at every stage of the sample selection process • e.g. in a 3-stage design, three sets of probabilities will result • The probability of selecting each final-stage unit is then the product of these three probabilities • The reciprocal of this probability is then the sampling weight
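The general approach above can be sketched for a single final-stage unit in a 3-stage design. All stage names and selection probabilities below are hypothetical, chosen only to make the arithmetic easy to follow; a real survey would supply its own probabilities at each stage.

```python
from fractions import Fraction

# HYPOTHETICAL selection probabilities for one household in a 3-stage design.
stage_probabilities = {
    "district selected":  Fraction(10, 50),    # 10 of 50 districts
    "village selected":   Fraction(4, 20),     # 4 of 20 villages in the district
    "household selected": Fraction(15, 300),   # 15 of 300 households in the village
}

# Overall probability of selection = product of the stage probabilities.
overall_probability = Fraction(1)
for prob in stage_probabilities.values():
    overall_probability *= prob

# Sampling weight = reciprocal of the overall probability.
sampling_weight = 1 / overall_probability

print(overall_probability)  # 1/500
print(sampling_weight)      # 500
```

Under these assumed figures, each sampled household would represent 500 households in the population.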
Difficulties in computations • Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys • Complex sampling designs are common • Computing correct probabilities of selection can then be very challenging • Usually professional assistance is needed to determine the correct sampling weights and to use them correctly in the analysis
Software for dealing with weights • When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights • Packages such as Stata, SAS and Epi-info have facilities for dealing with sampling weights • However, you need to be careful that the approaches used are appropriate for your own survey design. Note: The above discussion was aimed at providing you with an overview of sampling weights. See the next slide for the work of the remainder of this session.
Practical work • To understand how files may be merged, work through sections 10.5 and 10.6 of the Stata Guide. • Now move to your project work and practice file merging to address objectives 4 and 5 of your task. • A description of the work you should undertake is provided in the handout titled Practical 10.