## Types of Surveys


Cross-sectional
- surveys a specific population at a given point in time
- has one or more of these design components:
  - stratification
  - clustering with multistage sampling
  - unequal probabilities of selection

Longitudinal
- surveys a specific population repeatedly over a period of time
- designs include:
  - panel
  - rotating samples

**Cross-Sectional Surveys**

Sampling Design Terminology

**Methods of Sample Selection**

Basic methods:
- simple random sampling
- systematic sampling
- unequal probability sampling
- stratified random sampling
- cluster sampling
- two-stage sampling

**Simple Random Sampling**

Why?
- basic building block of sampling
- sample from a homogeneous group of units

How?
- physically make random draws of the units under study
- computer selection methods: R, Stata

**Systematic Sampling**

Why?
- easy
- can be very efficient, depending on the structure of the population

How?
- choose a random start among the first k units of the population
- then sample every kth unit, for some chosen number k

**Additional Note**

Simplifying assumption:
- for estimation, a systematic sample is often treated as a simple random sample

Key assumption:
- the order of the units is unrelated to the measurements taken on them

**Unequal Probability Sampling**

Why?
- to give greater or lesser weight to certain population units
- two-stage sampling with probability proportional to size (pps) at the first stage and equal sample sizes at the second stage gives a self-weighting design (all units have the same chance of inclusion in the sample)

How?
- with replacement
- without replacement

**With or Without Replacement?**

- in practice, sampling is usually done without replacement
- the variance formula for without-replacement sampling is difficult to use
- the variance formula for with-replacement sampling at the first stage is often used as an approximation

Assumption: the population size is large and the sample size is small (sampling fraction less than 10%).

**Stratified Random Sampling**

Why?
- for administrative convenience
- to improve efficiency
- estimates may be required for each stratum

How?
- independent simple random samples are chosen within each stratum

**Example: Survey of Youth in Custody**

- first U.S. survey of youths confined to long-term, state-operated institutions
- complemented the existing Children in Custody censuses
- companion survey to the Surveys of State Prisons
- the data contain information on criminal histories, family situations, drug and alcohol use, and peer group activities
- carried out in 1989 using stratified systematic sampling

**SYC Design**

Strata
- type (a): groups of smaller institutions
- type (b): individual larger institutions

Sampling units
- strata of type (a)
  - first stage: institutions, selected with probability proportional to size
  - second stage: individual youths in custody
- strata of type (b)
  - individual youths in custody
- individuals chosen by systematic random sampling

**Cluster Sampling**

Why?
- convenience and cost
- the frame (list of population units) may be defined only for the clusters, not for the units

How?
- take a simple random sample of clusters and measure all units in each selected cluster

**Two-Stage Sampling**

Why?
- cost and convenience
- lack of a complete frame

How?
- take either a simple random sample or an unequal probability sample of primary units, then, within each selected primary unit, take a simple random sample of secondary units

**Synthesis to a Complex Design**

Stratified two-stage cluster sampling
- strata: geographical areas
- first-stage units: smaller areas within the larger areas
- second-stage units: households
- clusters: all individuals in the household

**Why a Complex Design?**

- better coverage of the entire region of interest (stratification)
- efficient for interviewing: less travel, lower cost

Problem: estimation and analysis are more complex.

**Ontario Health Survey**

- carried out in 1990
- measured the health status of the population
- collected data on the risk factors associated with major causes of morbidity and mortality in Ontario
- Statistics Canada surveyed 61,239 persons using a stratified two-stage cluster sample

**OHS Sample Selection**

- strata: public health units, divided into rural and urban strata
- first stage: enumeration areas defined by the 1986 Census of Canada, selected by pps
- second stage: dwellings selected by simple random sampling (SRS)
- cluster: all persons in the dwelling

**Longitudinal Surveys**

Sampling Design

**British Household Panel Survey**

Objectives of the survey
- to further understanding of social and economic change at the individual and household level in Britain
- to identify, model and forecast such changes, their causes and consequences in relation to a range of socio-economic variables

**BHPS: Target Population and Frame**

Target population
- private households in Great Britain

Survey frame
- small-users Postcode Address File (PAF)

**BHPS: Panel Sample**

- designed as an annual survey of each adult (16+) member of a nationally representative sample
- approximately 5,000 households
- approximately 10,000 individual interviews
- the same individuals are re-interviewed in successive waves
- if individuals split off from their original households, all adult members of their new households are also interviewed
- children are interviewed once they reach the age of 16
- 13 waves of the survey from 1991 to 2004

**BHPS: Sampling Design**

Implicit stratification embedded in two-stage sampling:
- postcode sectors ordered by region
- within a region, postcode sectors ordered by socio-economic group (as determined from census data), then divided into four or five strata

Sample selection
- systematic sampling of postcode sectors from the ordered list
- systematic sampling of delivery points (≈ addresses or households)

**Survey Weights: Definitions**

Initial weight
- equal to the inverse of the inclusion probability of the unit

Final weight
- the initial weight adjusted for nonresponse, poststratification and/or benchmarking
- interpreted as the number of units in the population that the sample unit represents

**Interpretation**

- the survey weight for a particular sample unit is the number of units in the population that the unit represents

**Effect of the Weights**

- Example: age distribution, Survey of Youth in Custody

**Observations**

- the weighted and unweighted histograms look similar but are significantly different
- the design probably used approximately proportional allocation
- the unweighted age distribution tends to be shifted to the right of the weighted one
- older ages are over-represented in the dataset

**Survey Data Analysis**

Issues and Simple Examples from Graphical Methods

**Issues**

The iid (independent and identically distributed) assumption
- does not hold in complex surveys, because of correlations induced by the sampling design or by the population structure
- blindly applying standard programs to the analysis can lead to incorrect results

**Example: Rank Correlation Coefficient**

Pay equity survey dispute: Canada Post and PSAC
- two job evaluations of the same set of people (using the same set of information), carried out in 1987 and 1993
- the rank correlation between the two sets of job values obtained from the evaluations was 0.539
- assumption needed for a valid estimate of the correlation: the pairs of observations are iid

**Scatterplot of Evaluations**

- rank correlation is 0.539

**A Stratified Design with Distinct Differences Between Strata**

- the pay level increases with each pay category (four in all)
- the job value also generally increases with each pay category
- therefore the observations are not iid

**Correlations within Level**

Correlations within each pay level:
- Level 2: –0.293
- Level 3: –0.010
- Level 4: 0.317
- Level 5: 0.496

Only Level 4 is significantly different from 0.

**Graphical Displays**

First rule of data analysis
- always try to plot the data to get some initial insights into the analysis

Common tools
- histograms
- bar graphs
- scatterplots

**Histograms**

Unweighted
- the height of the bar in the ith class is proportional to the number of observations in the class

Weighted
- the height of the bar in the ith class is proportional to the sum of the weights in the class

**Body Mass Index**

Measured by
- weight in kilograms divided by the square of height in meters
- 7.0 < BMI < 45.0
- BMI < 20: health problems such as eating disorders
- BMI > 27: health problems such as hypertension and coronary heart disease

**Bar Graphs**

Same principle as histograms.

Unweighted
- the size of the ith bar is proportional to the number of observations in the class

Weighted
- the size of the ith bar is proportional to the sum of the weights in the class
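To make the weighted versus unweighted distinction for histograms and bar graphs concrete, here is a minimal Python sketch under stated assumptions: the ages and inclusion probabilities are invented toy values (not Survey of Youth in Custody data), each initial weight is taken as the inverse of the inclusion probability (no nonresponse or poststratification adjustment), and `class_index` and `class_heights` are hypothetical helpers. Bar heights are reported as shares of the total so the two versions can be compared directly.

```python
import bisect

# Hypothetical toy sample (NOT the Survey of Youth in Custody data):
# ages of sampled youths and each youth's inclusion probability.
# By assumption, older youths were sampled at higher rates.
ages = [13, 14, 15, 15, 16, 16, 17, 17, 17, 18]
inclusion_prob = [0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.10, 0.20, 0.20, 0.20]

# Initial weight = inverse of the inclusion probability; each weight is the
# number of population units the sampled youth represents.
weights = [1.0 / p for p in inclusion_prob]

# Class boundaries: [12, 14), [14, 16), [16, 18]
edges = [12, 14, 16, 18]


def class_index(x, edges):
    """Index of the class containing x; the last class includes its right endpoint."""
    if not edges[0] <= x <= edges[-1]:
        raise ValueError(f"{x} lies outside [{edges[0]}, {edges[-1]}]")
    return min(bisect.bisect_right(edges, x) - 1, len(edges) - 2)


def class_heights(values, edges, weights=None):
    """Relative bar heights, one per class.

    Unweighted: proportional to the number of observations in the class.
    Weighted: proportional to the sum of the weights in the class.
    """
    if weights is None:
        weights = [1.0] * len(values)
    totals = [0.0] * (len(edges) - 1)
    for x, w in zip(values, weights):
        totals[class_index(x, edges)] += w
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


if __name__ == "__main__":
    print("estimated population size:", sum(weights))
    print("unweighted:", class_heights(ages, edges))
    print("weighted:  ", class_heights(ages, edges, weights))
```

With these toy values the unweighted heights are 0.1, 0.3 and 0.6, while the weighted heights are 0.16, 0.48 and 0.36: because the older youths were (by assumption) sampled at higher rates, the unweighted histogram over-represents them, the same pattern noted for the SYC age distribution. The weights sum to 125, the estimated size of the toy population. For a bar graph, the same accumulation works with a category label in place of `class_index`.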
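The pay-equity slides argue that a pooled rank correlation can be driven largely by differences between pay levels rather than by agreement within levels. The minimal Python sketch below, using made-up numbers rather than the Canada Post/PSAC data, computes Spearman's rank correlation (the Pearson correlation of average ranks) on the pooled data and then separately within each level. The function names, the two levels "A" and "B", and the toy values are illustrative assumptions; the sketch ignores survey weights and does not test significance.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value equals this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Ordinary (unweighted) Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def within_level(x, y, levels):
    """Spearman correlation computed separately inside each level (stratum)."""
    groups = {}
    for a, b, g in zip(x, y, levels):
        gx, gy = groups.setdefault(g, ([], []))
        gx.append(a)
        gy.append(b)
    return {g: spearman(gx, gy) for g, (gx, gy) in groups.items()}


# Hypothetical job values from two evaluations (NOT the Canada Post/PSAC data).
# Within each level the two evaluations are unrelated, but both shift upward
# from level "A" to level "B".
level = ["A"] * 4 + ["B"] * 4
evaluation_1 = [1, 2, 3, 4, 11, 12, 13, 14]
evaluation_2 = [3, 1, 4, 2, 13, 11, 14, 12]

if __name__ == "__main__":
    print("pooled:", round(spearman(evaluation_1, evaluation_2), 3))
    print("within level:", within_level(evaluation_1, evaluation_2, level))
```

Here the pooled rank correlation is 16/21 (about 0.762), yet the correlation within each level is exactly 0: the pooled value reflects only the shift between levels, which is why treating the pairs as iid across strata can be misleading.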