
Large-scale Microdata workshop: An introduction to the SARs and ESDS Government Surveys

University of Plymouth, 15 April 2005. Jo Wathan & Reza Afkhami.


Presentation Transcript


  1. Large-scale Microdata workshop: An introduction to the SARs and ESDS Government Surveys. University of Plymouth, 15 April 2005. Jo Wathan & Reza Afkhami

  2. Today SARs: • Introduction to 2001 Individual Licensed SARs • Hands-on: Accessing the SARs in Nesstar • Lunch 12:30 • Working with the Individual Licensed SAR: Data quality and analysis issues • Hands-on: The SARs in SPSS • Coffee 14:45 • Further SARs Issues – CAMS, Household data, SAMs files, User support • ESDS Government Data • End 16:00

  3. Introduction to the 2001 Licensed Individual SAR • Background to data development • Licensing • Accessing the data

  4. Census Microdata • Census outputs have historically been aggregate tables – safe but inflexible • Can be obtained from: • ONS: http://www.statistics.gov.uk • Casweb: http://www.census.ac.uk/cdu • Well suited to analyses at small geographical detail • Microdata permits more flexibility • The Longitudinal Study links census data from 1971 onwards; good for studying processes, but the data have to be kept secure: http://www.celsius.lshtm.ac.uk/ • Demand for a cross-sectional dataset that can be used on the researcher's own desktop

  5. The 1991 Samples of Anonymised Records • Available for the first time after research into the confidentiality risk • Two samples • Individual SAR: detailed geography (large LAs), 2% sample • Household SAR: hierarchical, linked individuals, detailed occupational information, 1% sample

  6. The Request for the 2001 Individual SAR • Request sent in autumn 2001 • Following consultation with users and a confidentiality assessment, we asked for similar detail to 1991, e.g.: • 16 categories of ethnic group (or national equivalent) • SOC 2000 minor (81 categories) • But with a 3% sample and more LADs • ONS had greater concerns over confidentiality • A more detailed ‘Controlled Access Microdata Sample’ is available in a safe setting

  7. Safe Data • Subject to extensive disclosure control • Broad banding • Special uniques analysis • Further recodes • Less detail than 1991 on: • Geography • Industry/occupation • Age • Country of birth • Released October 2004

  8. Second version of SARs • ONS reconsidered confidentiality of SARs • Current version of data is version 2: contains more detail than version 1 • Users must undertake to destroy version 1 before downloading version 2

  9. Licensed file content - geographical • Regional Geography • GOR Region PLUS • Inner/Outer London • Northern Ireland • Scotland • Wales • Country of birth • 16 categories • Increased from version 1

  10. Licensed file contents: demographic • Age banded v.2 • Individual year to 15 • 16-19; 20-24; 25-29; 30-44; • 45-59; 60-64; 65-69; • 70-74; 75-94 single years; 95+ • Ethnic group v.2 • 16 categories (E and W) • 14 Scotland • 2 N. Ireland
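The version 2 age banding on slide 10 can be sketched as a small lookup. This is only an illustration: the function name `sar_age_band` is hypothetical, the labels are not the codes used in the released file, and "75-94 single years" is read literally as one category per year of age.

```python
def sar_age_band(age: int) -> str:
    """Map a single year of age to the version 2 banding listed on the slide.

    Labels are illustrative; the released file's actual codes may differ.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 15:
        return str(age)  # individual years up to 15
    grouped = [(16, 19), (20, 24), (25, 29), (30, 44),
               (45, 59), (60, 64), (65, 69), (70, 74)]
    for low, high in grouped:
        if low <= age <= high:
            return f"{low}-{high}"
    if age <= 94:
        return str(age)  # single years from 75 to 94 (literal reading)
    return "95+"
```

A lookup of this kind is useful when comparing the SAR with a source that holds single years of age, since the other source has to be collapsed to the SAR bands first.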

  11. Licensed file content: Socio-economic • Occupation • 2000 SOC Minor categories • NS-SEC 38 valid categories • Industry • 15 categories A-O, P, Q • Hours of work – single hours to 80+

  12. New or Improved Data • Improved highest qualification • 4 categories • Religion – varies considerably by nation v.2 • 9 categories in England and Wales • 7 in Scotland – current only • 7 in Northern Ireland, plus religion brought up in • General health • Good / fairly good / not good • Caring • Hours caring, 3 bands • Number of carers in household

  13. Research value • Ability to recode variables as wished • Ability to select populations and variables • Ability to conduct multivariate analysis • Learning and Teaching • Preliminary work before using in-house file (CAMS)

  14. The Licence • All users need to be licensed • Academics complete the licence as part of the Census Registration System process • Non-academic users sign the licence as part of the data registration process • Cannot pass the data to an unlicensed user • Cannot attempt to identify an individual

  15. The licence – good practice • Keep your data password protected • Destroy your data when you have finished using it • Remove SAR files before passing on your PC to someone else • Tell CCSR about your publications • Tell CCSR if you leave your institution

  16. Access Arrangements • Data distributed by CCSR • Academics, no charge • Register for the data under Census Registration System • Access the data online from CCSR website • Non-academics • Not for profit £500 per file • Business users £1000 per file • 10 users per application, incl. software • Download End User License from web

  17. Accessing the data • Non-academic users • Data available in NSDstat • Other formats available on CD • Can arrange direct download • Academic users • Direct download (SPSS/Stata/tab delimited) • Nesstar, explore online and subset (wider range of formats available) • NSDstat available

  18. Working with the 2001 Licensed Individual SAR • Coverage and quality • SAR data issues • Analysing SAR data • Software

  19. Census coverage • Major effort to improve coverage in 2001 • One Number Census (ONC) • Used a large Census Coverage Survey (CCS) of 300K households to correct census results • CCS design independent of the census • Matched census and CCS data used to estimate the total population in each area • All results adjusted for census non-response by imputing households and individuals • Result: a final UK database adjusted for non-response
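The idea of using matched census and Census Coverage Survey records to estimate a total can be illustrated with the textbook dual-system (capture-recapture) estimator. This is a simplified sketch, not the ONC procedure itself, which involved further stratification and modelling; the function name and the figures in the usage note are hypothetical.

```python
def dual_system_estimate(census_count: int, ccs_count: int, matched: int) -> float:
    """Textbook dual-system estimate of the true population in one area.

    census_count: people counted by the census in the area
    ccs_count:    people counted by the independent coverage survey
    matched:      people found in both sources
    """
    if matched <= 0:
        raise ValueError("need at least one matched person")
    if matched > min(census_count, ccs_count):
        raise ValueError("matched cannot exceed either count")
    # If the two counts are independent, matched / ccs_count estimates the
    # census coverage rate, so the total is census_count / that rate.
    return census_count * ccs_count / matched
```

With made-up counts of 90 (census), 80 (survey) and 72 (matched), the estimate is 100 people, which implies census coverage of 90% in that area. The estimator relies on the survey being independent of the census, which is why the slide stresses the independent design.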

  20. Census coverage • Coverage before imputation: • 94% of households returned forms, with another 4% estimated to be in households identified by enumerators • Response rate lowest for: • Young people in their early 20s (men aged 20-24: response rate of 87%) • Inner London (response rate of 78%) • Once imputed cases are included, coverage is estimated to be 100%

  21. Population base • One population base: usual residents • Differs from 1991, when the user had to choose either the present or the usual resident base • Students enumerated at term-time address • Communal establishments are included

  22. Implications for 2001 SARs • 1991 SARs selected from 10% sample • Did not include imputed households • 96% coverage • 2001 SARs selected from 100% ONC database • 94% response; 6% imputed • Imputed individuals/hholds are identified • Imputed items are flagged

  23. Two kinds of imputation • Entire individual or household may be imputed as part of ONC • Complete records copied from enumerated individuals/hhold • Variable oncperim • Variables imputed when information missing

  24. Edit • 13.7 million edit procedures undertaken • 28% of the population had 1+ items imputed • Common: • Missing prof quals set to none • Carer set to no where missing (unless economic activity also missing) • Travel to work set to ‘work mainly at/from home’ where workplace was ‘mainly at/from home’ • Others • 14k people multi-ticked ‘sex’ (so imputed) • 6k children had marital status changed to single • Impossible values set to missing then imputed • Missing values are imputed on the basis of similar local cases • Does not remove unlikely values
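Two of the common edits listed on slide 24 can be written as deterministic rules. The sketch below is illustrative only: the field names (`prof_quals`, `carer`, `econ_activity`), the use of `None` for a missing answer and the `imputed_items` list are all assumptions, not the census processing system's actual layout.

```python
def apply_common_edits(record: dict) -> dict:
    """Apply two of the slide's common edits to one person record.

    Returns a copy; None stands for a missing answer (an assumption).
    """
    edited = dict(record)
    changed = []
    # Rule 1: missing professional qualifications are set to "none".
    if edited.get("prof_quals") is None:
        edited["prof_quals"] = "none"
        changed.append("prof_quals")
    # Rule 2: missing carer status is set to "no", unless economic
    # activity is also missing (then it is left for later imputation).
    if edited.get("carer") is None and edited.get("econ_activity") is not None:
        edited["carer"] = "no"
        changed.append("carer")
    edited["imputed_items"] = sorted(changed)
    return edited
```

Recording which items were changed mirrors the way the released data flag imputed items, so an analyst can still separate reported from edited values.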

  25. Item imputation For census output database as a whole: • One or more items imputed for 28% of the population • Employment variables most affected: • Industry ever worked: 18% • Occupation ever worked: 14% • Workplace size: 9% • Under-enumerated groups are most imputed, esp. single people

  26. Can I tell what/who has been imputed? • Oncperim records whether an individual has been imputed as part of the ONC • Copies entire record from census database • ‘z’ variables identify whether individual has imputed information on a specific variable • Parallel set of variables • zethew, zage0
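A minimal sketch of how the flags might be used in analysis, in plain Python. The variable names `oncperim`, `zethew` and `zage0` come from the slide; the coding (1 = imputed, 0 = not imputed) and the four sample records are assumptions made for illustration, so check the codebook before relying on them.

```python
IMPUTED = 1  # assumed code for "imputed"; check the SAR codebook

def enumerated_only(records):
    """Drop persons who were wholly imputed as part of the ONC."""
    return [r for r in records if r["oncperim"] != IMPUTED]

def item_imputation_rate(records, flag_name):
    """Share of records whose value for one variable was imputed."""
    if not records:
        raise ValueError("no records supplied")
    flagged = sum(1 for r in records if r[flag_name] == IMPUTED)
    return flagged / len(records)

# Made-up records, for illustration only.
sample = [
    {"oncperim": 0, "zethew": 0, "zage0": 0},
    {"oncperim": 0, "zethew": 1, "zage0": 0},
    {"oncperim": 1, "zethew": 0, "zage0": 0},
    {"oncperim": 0, "zethew": 0, "zage0": 1},
]
kept = enumerated_only(sample)
ethnic_rate = item_imputation_rate(kept, "zethew")
```

Running an analysis with and without the flagged cases is one simple way to see how sensitive a result is to imputation.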
