
P|E|A|S




Presentation Transcript


  1. P|E|A|S • Practical Exemplars on the Analysis of Surveys • Web site to help people analyse surveys • Supported by the ESRC research methods programme • Authors • Gillian Raab, Napier University • Susan Purdon, National Centre for Social Research • Kathy Buckner, Napier University • Iona Waterston, Web designer • http://www.napier.ac.uk/depts/fhls/peas

  2. Summary of this presentation • Background to the project • Our starting point and basic principles • Important concepts in survey design and analysis • Software for survey analysis • Approaches to missing data • What we have learned from the project • Survey methods • Survey software • Missing data challenges • Questions

  3. Starting points (1) • Survey data has special features that need to be considered in the analysis • There is an enormous academic literature on survey analysis • Universities in the UK have less expertise in survey analysis than those in North America or the rest of Europe • Most of the expertise lies in survey organisations

  4. Starting points (2) • The ESRC makes lots of data available via its survey archive – lots of it from Scotland • Scottish Health Survey • Scottish Household Survey • This investment is intended to encourage use by, e.g. • University researchers • Government departments • Local authorities • Voluntary organisations • But there is limited expertise on how best to analyse survey data

  5. Starting points (3) • Basic statistical theory for analysing sample surveys was developed from the 1950s to the 1970s • Cochran, Kish, Rao • The methods calculate confidence intervals and standard errors that take account of the survey design • But none of this methodology found its way into commonly used statistical packages until very recently • STATA – version 8? • SAS version 8 onwards • SPSS version 12 onwards • Splus/R survey packages in the last two years • More recent methods are also available (especially in STATA and R)

  6. Basic principles of what we present on P|E|A|S • To illustrate how to use these new survey procedures effectively • To help you to use them on your own data • To use them to see how effective the design of the survey has been in getting accurate and precise estimates • Like driving a car • We don’t expect you to understand all the details of how it works • But you do need to know the general principles • How to use the controls effectively • What regular checks you should be doing • What roads you should not be driving down

  7. Survey features • Based on current UK practice by ONS and survey organisations • Weighting • Clustering • Stratification • Each of these has an impact on the results you get from analysing a survey. • Only weighting will affect the estimates • But all three will affect the standard errors
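To see why only weighting moves the point estimate, here is a minimal sketch in Python (the language and the toy data are our own choices, not part of P|E|A|S). The weighted estimate is the ratio sum(w·y) / sum(w); cluster and stratum labels never enter it, which is why they change only the standard errors. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean or proportion: sum(w*y) / sum(w)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical toy sample (1 = smoker, 0 = non-smoker). The first four people
# come from an over-sampled group (weight 1), the last four from an
# under-sampled group (weight 3), so each of them represents more people.
smoker = [1, 1, 1, 0, 0, 0, 1, 0]
weight = [1, 1, 1, 1, 3, 3, 3, 3]

unweighted = sum(smoker) / len(smoker)    # 4 / 8 = 0.5
weighted = weighted_mean(smoker, weight)  # 6 / 16 = 0.375
print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

The two answers differ because the smokers are concentrated in the over-sampled group; with weights unrelated to smoking they would agree on average.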

  8. Weighting can make a large difference to answers • Smoking rates from the 1998 Scottish Health Survey (ex3)

  9. Weighting • Why do we (need to) do it? • To make the sample match the population • Because of selection as part of the design • Different sampling fractions in different areas • Selection of one adult per household • To adjust for non-response • How does it affect the precision of estimates? • It depends on both the weights and the data being analysed • It can help or hurt • If the weights are not related to the data being analysed, then having unequal weights will hurt
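The last point can be quantified with Kish's well-known approximation for the design effect due to unequal weights, n·sum(w²) / (sum w)², which equals 1 + CV² of the weights (CV computed with divisor n). A minimal Python sketch follows; the function names and the weights are hypothetical, and the formula is only a guide when the weights are unrelated to the variable being analysed.

```python
def kish_deff(weights):
    """Kish's approximate design effect from unequal weights:
    n * sum(w^2) / (sum w)^2, i.e. 1 + CV^2 of the weights.
    A rough guide only when the weights are unrelated to the variable analysed."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def kish_effective_n(weights):
    """Effective sample size n / deff = (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


equal = [2.0] * 100                # constant weights: no loss of precision
unequal = [1.0] * 50 + [3.0] * 50  # hypothetical two-level weighting
print(kish_deff(equal))            # 1.0
print(kish_deff(unequal))          # 1.25
print(kish_effective_n(unequal))   # 80.0
```

Here 100 unequally weighted interviews carry roughly the information of 80 equally weighted ones. When the weights are related to the variable, the real effect can be smaller, or weighting can even improve precision.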

  10. Effect of weighting on standard errors (ex4) • WERS 98 – a survey of workplaces run by the DWP • Stratified by workplace size • Sampling fractions much larger in strata of large workplaces • This often helps if we want to estimate something like the total number of employees with disabilities • But for the proportion of workplaces with an equal opportunities policy it hurts

  11. Stratification • Divide up the sampling frame into strata (e.g. region, type of area) • Take a sample of a fixed number of units from each stratum • Stratification can be either proportionate or disproportionate • Proportionate stratification means that the sample will match the population BETTER than would be expected by chance • So proportionate stratification improves precision • If it is disproportionate, weights will be needed to estimate population totals • Disproportionate stratification may help or hurt precision
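The gain from proportionate stratification can be seen in the standard stratified estimator: the mean is sum(W_h · ȳ_h) and its variance sum(W_h² · s_h² / n_h), where W_h is the population share of stratum h. Only within-stratum variation contributes. A minimal Python sketch, with hypothetical names and data, ignoring the finite population correction:

```python
import math
import statistics


def stratified_mean_se(strata):
    """strata: dict name -> (population share W_h, list of sampled values).
    Returns (estimate, standard error) of the stratified mean, ignoring the
    finite population correction."""
    est = 0.0
    var = 0.0
    for share, values in strata.values():
        est += share * statistics.fmean(values)
        var += share ** 2 * statistics.variance(values) / len(values)
    return est, math.sqrt(var)


def srs_mean_se(values):
    """Mean and standard error if the same values were treated as a simple random sample."""
    return statistics.fmean(values), math.sqrt(statistics.variance(values) / len(values))


# Hypothetical proportionate design: two equal-sized strata, 4 units from each.
strata = {
    "A": (0.5, [1.0, 2.0, 3.0, 2.0]),
    "B": (0.5, [8.0, 9.0, 10.0, 9.0]),
}
st_est, st_se = stratified_mean_se(strata)
srs_est, srs_se = srs_mean_se(strata["A"][1] + strata["B"][1])
print(f"stratified: {st_est:.2f} (SE {st_se:.3f}); as SRS: {srs_est:.2f} (SE {srs_se:.3f})")
```

Both give the same estimate, but the stratified standard error is far smaller because the large difference between the strata is removed from the variance.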

  12. Clustering • Multi-stage designs are very common in government surveys • First a sample of clusters (e.g. postcode sectors) – stage 1 • Then a sample of households within each cluster – stage 2 • If clusters are selected with probability proportional to size and a fixed sample size is taken within each cluster, then no weighting is required • Clustering almost always makes survey estimates less precise
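A common rule of thumb for why clustering costs precision is Kish's approximation deff ≈ 1 + (m − 1)ρ, where m is the number of interviews per cluster and ρ the intra-cluster correlation, which is usually positive. A minimal Python sketch; the value ρ = 0.05 and the function name are illustrative assumptions.

```python
def clustering_deff(cluster_take, icc):
    """Kish's approximation 1 + (m - 1) * rho for a design with m interviews
    per cluster and intra-cluster correlation rho (equal cluster sizes)."""
    if cluster_take < 1:
        raise ValueError("cluster_take must be at least 1")
    return 1.0 + (cluster_take - 1) * icc


# rho = 0.05 is a hypothetical, illustrative value.
for m in (1, 10, 20, 40):
    print(f"{m:>2} interviews per cluster: deff = {clustering_deff(m, 0.05):.2f}")
```

Even a small ρ matters: with 20 interviews per cluster the design effect is already 1.95, close to halving the effective sample size.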

  13. Design effects (1) • A design effect is a ratio that compares the precision of a survey with what would have been achieved from a simple, unclustered, unweighted, unstratified random sample of the same size from the same population. • A large design effect is bad • A design effect of 2 means that your effective sample size is only half the number of responses you have achieved • Means, proportions, differences between groups, regression coefficients and hazard ratios all have design effects, and chi-squared tests need to be adjusted for design effects • Design effects are often quite different for subgroups of a sample – often not so bad • And design effects for differences between groups are often very different from that for the overall mean – also often much better
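As arithmetic, the design effect is the ratio of two variances (design-based over simple random sample) and the effective sample size is the achieved sample size divided by it. A minimal Python sketch; the function names and figures are hypothetical.

```python
def design_effect(se_design, se_srs):
    """Design effect: variance under the actual design divided by the variance
    a simple random sample of the same size would give."""
    return (se_design / se_srs) ** 2


def effective_sample_size(n_achieved, deff):
    """Number of simple-random-sample interviews giving the same precision."""
    return n_achieved / deff


# Hypothetical figures: standard errors of 0.75 and 0.5 percentage points.
deff = design_effect(0.75, 0.5)
print(deff, effective_sample_size(4500, deff))  # 2.25 2000.0
```

A standard error only 1.5 times the simple random sample value already means that 4,500 interviews are worth about 2,000.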

  14. Design effects (2) • Many surveys publish tables of design effects or design factors for key variables, but rarely more than a page of them and almost never for things like differences • The design factor is just the square root of the design effect • The idea is that you can do an ordinary analysis and multiply your standard error by the design factor • This was for the pre-survey-software days • On balance it probably gave standard errors that were too large for a lot of analyses, since people would try to play safe by taking the biggest design effect in the table • We don’t need to use design effects like that if we use the correct software • But they are a measure of how well the design has worked to get good answers
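The old-style correction described here can be written in a few lines. This minimal Python sketch (hypothetical names and numbers) multiplies an ordinary standard error by the design factor, the square root of a published design effect, and forms an approximate 95% interval with z = 1.96.

```python
import math


def design_factor(deff):
    """Design factor (deft) = square root of the design effect."""
    return math.sqrt(deff)


def adjusted_interval(estimate, naive_se, deff, z=1.96):
    """Inflate an ordinary standard error by the design factor and return
    (lower, upper, adjusted_se) for an approximate 95% confidence interval."""
    se = naive_se * design_factor(deff)
    return estimate - z * se, estimate + z * se, se


# Hypothetical: 30% estimate, ordinary SE of 0.5 points, published deff of 2.25.
low, high, se = adjusted_interval(30.0, 0.5, 2.25)
print(f"SE {se:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

If the deff plugged in is the largest one in a published table, the interval will usually be wider than the survey really warrants, which is the over-caution the slide warns about.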

  15. To summarise • To get unbiased estimates we need to use survey weights • To get correct standard errors we need to take account of the survey design, in particular weighting, clustering and stratification • We can now do this with standard software using survey methods • Survey analysis software can also compare groups, carry out regression analyses, etc.
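What survey software does for a weighted mean can be sketched with Taylor linearisation under the usual "ultimate cluster" approximation (PSUs treated as drawn with replacement within strata, no finite population correction). This is a minimal Python sketch of that general technique, not the internals of any particular package; the function name, arguments and data are hypothetical, and PSU labels are assumed to be unique only within their stratum.

```python
import math
from collections import defaultdict


def svy_mean(y, w, strata, psu):
    """Weighted mean with a Taylor-linearised standard error.

    PSUs are treated as drawn with replacement within strata and no finite
    population correction is applied. Each stratum needs at least two PSUs.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearised values for the ratio-type mean, summed within each PSU.
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[(h, c)] += wi * (yi - mean) / total_w

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _c), z in psu_totals.items():
        by_stratum[h].append(z)

    var = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        z_bar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(var)


# Hypothetical data: 2 strata x 2 PSUs, 3 respondents per PSU.
y = [3, 4, 5, 6, 5, 7, 2, 3, 2, 8, 9, 7]
w = [1, 1, 1, 2, 2, 2, 1, 1, 1, 3, 3, 3]
strata = ["N"] * 6 + ["S"] * 6
psu = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
est, se = svy_mean(y, w, strata, psu)
print(f"mean {est:.3f}, SE {se:.3f}")
```

As a sanity check, with equal weights, a single stratum and every respondent as their own PSU, the formula reduces to the familiar s²/n for a simple random sample.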

  16. Software for survey analysis • You need a package that will allow for the survey design • Specialist packages (SUDAAN, WESVAR) have been in use for many years • STATA was the first general package with survey methods • SAS, SPSS (add-on) and Splus/R now all have them • Different ways of describing the survey design • And different capabilities in • Variety of methods • What feedback they give you about what you have done • Warning you when things are not going right • Latest versions of all four packages will cover almost everything you would expect

  17. Non-response • An increasing problem for survey researchers • From Alasdair Crockett: Weighting the Social Surveys (ESDS web site)

  18. Two ways of dealing with it • Post-stratification • Re-weighting the sample so as to match population totals • Gets a new set of weights • Imputation • Fills in the missing values • Different procedures available • Used in censuses (one number) • And most often in longitudinal surveys

  19. Post-stratification • Only as good as the totals you are using for the population • Will only correct non-response bias if the difference between responders and non-responders is explained by the post-stratification factors • Census survey-link scheme informs us about this • It has the potential to improve precision (see slide 12 if time) • Survey firms and ONS are reluctant to use it because it may interrupt time series • But post-stratification of old survey data is also a possibility • Some survey packages will do it for you (R/Splus, STATA add-on package for version 8, SAS Calmar macro) • Analysing a survey to take account of post-stratification needs extra tricks (Splus/R and STATA provide them)
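In its simplest form, post-stratification rescales the weights within each post-stratum g so that they add up to the known population total N_g: new w = w · N_g / (sum of w in g). This minimal Python sketch shows that cell-weighting version only, not raking or the calibration methods of a macro such as Calmar; the function name and figures are hypothetical.

```python
from collections import defaultdict


def poststratify(weights, groups, population_totals):
    """Rescale weights so that, within each post-stratum g, they add up to the
    known population total N_g: new_w = w * N_g / (sum of w in g)."""
    sample_totals = defaultdict(float)
    for w, g in zip(weights, groups):
        sample_totals[g] += w
    empty = [g for g in population_totals if sample_totals.get(g, 0.0) <= 0.0]
    if empty:
        raise ValueError(f"no responding sample weight in post-strata: {empty}")
    # A KeyError here means a sample group has no population total.
    return [w * population_totals[g] / sample_totals[g] for w, g in zip(weights, groups)]


# Hypothetical: women responded better than men, so they are over-represented.
groups = ["man"] * 3 + ["woman"] * 6
weights = [100.0] * 9
totals = {"man": 480.0, "woman": 520.0}
new_w = poststratify(weights, groups, totals)
print(new_w[0], new_w[-1])  # 160.0 for men, roughly 86.67 for women
```

The new weights reproduce the population totals exactly, but, as the slide says, they remove non-response bias only to the extent that it is explained by the post-stratification factors.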

  20. Imputation • Most often carried out by census takers using detailed information from the census forms • Usually picking up data from other, similar households or household members • More recently, model-based methods have become popular (the books by Little and Rubin and by Schafer are the bibles) • Very large literature on this now • And many sets of recommendations • e.g. make your imputation model large • Carry out multiple imputations and combine estimates • Software to do this is available in Splus/R, STATA and SAS
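"Combine estimates" usually means Rubin's rules: average the m point estimates, then add the average within-imputation variance W to (1 + 1/m) times the between-imputation variance B. A minimal Python sketch with a hypothetical function name and numbers; in a survey setting each variance would itself come from a design-based analysis of one completed dataset.

```python
import math
import statistics


def rubin_combine(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (combined estimate, total variance, standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, with matching variances")
    q_bar = statistics.fmean(estimates)       # combined point estimate
    within = statistics.fmean(variances)      # average within-imputation variance W
    between = statistics.variance(estimates)  # between-imputation variance B
    total = within + (1 + 1 / m) * between    # T = W + (1 + 1/m) B
    return q_bar, total, math.sqrt(total)


# Hypothetical results from m = 5 imputed datasets.
est, var, se = rubin_combine([10.2, 10.6, 9.9, 10.4, 10.4], [0.25] * 5)
print(f"estimate {est:.2f}, SE {se:.3f}")
```

The between-imputation term is what makes the standard error honest about the uncertainty due to the missing data; analysing a single imputed dataset would leave it out.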

  21. Our experience • Working with data from the Edinburgh Study of Youth Transitions and Crime (Exemplar 6) • It is tricky to get imputation models right for real data • Things can go horribly wrong, especially if models are too big for the data • It’s important to check things out • Choice of variables is more important than choice of models • We still have a lot to learn about this • Need to try these methods out on real data, not just simulated data

  22. What we (I) have learned • There is a lot more to know about survey design and analysis, and there are new methods that need to be made available • The literature still does not provide definitive answers to some questions • But a lot of ground rules are well known • Survey software is developing and improving fast • It will do so even more if more people use it and feed back to the providers • Non-response remains an important problem • The jury is out as to whether and when post-stratification weighting, imputation or neither is the best approach to deal with non-response
