Learn about hierarchical data structures in ESDS Government datasets, analyze at multiple levels, and use SPSS and Stata for manipulation. Identify units and levels for better analysis.
Using the hierarchy of the government surveys
Jo Wathan
Centre for Census and Survey Research
Economic and Social Data Service (Government Data)
In this section…
• What is hierarchical data?
• What is the research purpose of hierarchical data?
• What hierarchy is available in ESDS Government datasets?
• Working with hierarchy in SPSS and Stata
• Practical exercise
ESDS Using Hierarchy: v.12/04
What is hierarchy?
• Data which can be analysed at more than one level, where lower levels are nested within higher levels
• Most commonly seen in the form of household data, where information is collected on all individuals within the household
• The data contain a variable indicating which household an individual lives in
• Data can be analysed at the household level or the individual level
• It is often possible to analyse at the family level too
• Other forms of hierarchy are available, e.g. sub-individual level (information per hospital stay, per crime reported)
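As a rough illustration of the idea, the sketch below holds a few person-level records that carry a household identifier and counts units at each level. The records and the field names (`hid`, `persno`, `age`) are hypothetical and are not taken from any ESDS dataset.

```python
from collections import defaultdict

# Hypothetical person-level records: one row per individual,
# each carrying the identifier of the household the person lives in.
persons = [
    {"hid": 1, "persno": 1, "age": 42},
    {"hid": 1, "persno": 2, "age": 40},
    {"hid": 1, "persno": 3, "age": 9},
    {"hid": 2, "persno": 1, "age": 71},
]

# Individual level: every row is a unit of analysis.
n_individuals = len(persons)

# Household level: rows sharing the same household ID form one unit.
households = defaultdict(list)
for p in persons:
    households[p["hid"]].append(p)

n_households = len(households)
household_sizes = {hid: len(members) for hid, members in households.items()}

print(n_individuals, n_households, household_sizes)
```

The same file therefore supports two different denominators: four individuals or two households, depending on the level chosen.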
Compared with flat files…
• Contextual information may be present in a flat file, e.g. an individual is asked about the size of the household, but:
  • Information is collected from only one level (e.g. from one individual within the household)
  • It is not usually appropriate to use the data at other levels
  • It is not usually possible to create additional derived variables at other levels
Hierarchical data: conceptually
More complex hierarchy…
What does the data look like? Flattened data (GHS)
What does the data look like? (2) Multiple tables (FES): Household.por, Jobmain.por
Use the hierarchy to…
• Better describe the household
• Describe the household context of an individual
• Look at intra-household differences (and similarities)
Describing the household
e.g. Is the household deprived / in poverty?
• Equivalising income (e.g. FRS)
  • Need information on total income (all members, not just the Household Reference Person)
  • Need information on household composition
• Identifying workless households
  • E.g. Gregg and Wadsworth (1999)
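A minimal sketch of both derivations follows. It assumes the modified OECD equivalence scale (1.0 for the first person aged 14 or over, 0.5 for each further person aged 14 or over, 0.3 for each child under 14); the slides do not say which scale to use, so treat that choice, the field names and the figures as illustrative assumptions. The workless flag is simplified to "nobody in the household is employed" and ignores any working-age restriction.

```python
def modified_oecd_factor(ages):
    """Equivalence factor under the modified OECD scale (an assumed choice)."""
    n_14plus = sum(1 for a in ages if a >= 14)
    n_children = len(ages) - n_14plus
    extra_14plus = max(n_14plus - 1, 0)  # everyone aged 14+ after the first
    return 1.0 + 0.5 * extra_14plus + 0.3 * n_children


# Hypothetical household: income must be summed over ALL members,
# and the composition (ages) is needed for the equivalence factor.
household = [
    {"age": 42, "income": 300.0, "employed": True},
    {"age": 40, "income": 200.0, "employed": False},
    {"age": 9, "income": 0.0, "employed": False},
]

total_income = sum(m["income"] for m in household)
factor = modified_oecd_factor([m["age"] for m in household])
equivalised_income = total_income / factor

# Simplified workless flag: no member of the household is in employment.
workless = not any(m["employed"] for m in household)

print(round(factor, 2), round(equivalised_income, 2), workless)
```

Neither quantity can be derived from the Household Reference Person's record alone, which is why the hierarchy is needed.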
Source: Richard Dickens, Paul Gregg and Jonathan Wadsworth (2000) ‘New Labour and the Labour Market’, CMPO Working Paper Series 00/19, Table 5
The effect of partnership on employment (mothers)
Ethnic homogeneity: % of household members in the same ethnic group as the head of household (HOH). Source: 1991 Household SAR
Hierarchy in some key datasets
Main levels
• Household
  • A group who have the accommodation as their only or main residence and who either share one meal a day or share the living accommodation
  • Useful for co-residence or policy-related issues
• Family unit
  • An individual plus partner plus any unmarried children
  • The census definition of family unit excludes single childless individuals
  • Useful for identifying partnership and parenthood relationships
• Benefit unit
  • Adult children are in a separate unit from their parents
  • Useful when considering income and benefits
• Check your definitions (despite harmonisation)
Identifying the units
• You will need a unique identifier for the unit at each level
• Several variables may need to be used in combination
• You may need to compute a unique identifier
• You will need to read the documentation to assess this
Straightforward: GHS 00-01
• To identify a household use HSERIAL
• To identify an individual within the household use PERSNO
• To identify a family unit use FSERIAL
• To identify a family unit within a household use AFAM
• To identify the household reference person test for PERSNO = HRP (HRP gives the person number of the household reference person)
• Similarly, to locate the family unit head test for FUH = PERSNO
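The PERSNO = HRP test can be sketched as below. The variable names follow the slide (lower-cased), but the records are invented for illustration and are not GHS data.

```python
# Hypothetical person-level rows using the GHS-style identifiers named above.
# "hrp" holds the person number of the household reference person.
persons = [
    {"hserial": 101, "persno": 1, "hrp": 2, "age": 35},
    {"hserial": 101, "persno": 2, "hrp": 2, "age": 38},
    {"hserial": 102, "persno": 1, "hrp": 1, "age": 67},
]

# A person is uniquely identified by household serial plus person number.
person_keys = [(p["hserial"], p["persno"]) for p in persons]

# Keep one row per household: the row where PERSNO equals HRP.
hrp_rows = [p for p in persons if p["persno"] == p["hrp"]]

print(person_keys)
print([(p["hserial"], p["age"]) for p in hrp_rows])
```

Selecting the reference person's row is one way to move from an individual-level file to one case per household.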
Complex, e.g. QLFS 2003
• If interested in using household information use the Household File
• Information about identifiers is in the read file
• Household identifier is Remserno; however, this is not present in all LFS datasets
• To compute it use:
  Week x 10000000 +
  W1yr x 1000000 +
  Qrtr x 100000 +
  Add x 1000 +
  Wafnd x 100 +
  Hhd
• This has to be used together with either CASEID or QUOTA (which are identical); could combine this with Remserno to derive an easier to use household ID
• To identify a person in the household use person
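The computation above can be sketched as a small function. The multipliers are those listed on the slide; the function name, the example values and the choice to pair the result with QUOTA as a tuple are illustrative assumptions, so check the dataset documentation before relying on it.

```python
def lfs_household_serial(week, w1yr, qrtr, add, wafnd, hhd):
    """Combine the component variables with the multipliers listed above."""
    return (
        week * 10000000
        + w1yr * 1000000
        + qrtr * 100000
        + add * 1000
        + wafnd * 100
        + hhd
    )


# Hypothetical component values for one household.
serial = lfs_household_serial(week=5, w1yr=3, qrtr=2, add=17, wafnd=1, hhd=1)

# The slide says the serial must be used together with CASEID or QUOTA,
# so one option is to treat the pair as the household key.
quota = 412  # hypothetical value
household_key = (quota, serial)

print(serial, household_key)
```

When doing the same arithmetic in a statistics package, store the result in a numeric type with enough precision for eight or more digits; in Stata, for example, the default `float` storage type cannot hold integers of this size exactly, whereas `long` or `double` can.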
Working with hierarchical data
• Which level should I analyse at?
• Manipulating data in SPSS
  • Menu driven approach
  • Syntax
• Manipulating data in Stata
Working with hierarchy in SPSS
• SPSS is not good at data manipulation!
• To generate a household variable from individual data you need to use the aggregate command
• The aggregate command creates a household level file which:
  • Has 1 case per household
  • Contains the household ID variable specified plus any aggregate variables defined
• Slow, memory intensive and unnecessarily complicated compared with some other packages…
Aggregation at the household level
• You can work at the level of the household
  • Use the aggregate outfile
  • Remember to carry across any other household level variables that you will need into the aggregate file as part of the aggregate procedure
• Or match the household level variable back to the original individual level dataset…
Aggregate and match back to individual file
• Usually it is best to match your aggregated variable back to the master file
  • The household variable is distributed to each individual
  • You can then select on household head or family head to work at the level of the household or family
  • Or you can link information about the household to the individual
SPSS syntax used
* compute a variable which is a low value, but which takes the (higher) value for health when respondent is hrp.
compute hlthrep = -9.
if (reltohrp = 1) hlthrep = health.
crosstabs hlthrep by health by reltohrp.
* aggregate to one case per household and save the result as a separate file.
sort cases by hid.
aggregate outfile = "c:\work\esds\aggfile.sav"
  /break hid
  /nperhh = n(hid)
  /oldest = max(age)
  /hrphlth = max(hlthrep).
execute.
* match the household level variables back onto each individual.
match files /file = *
  /table = "c:\work\esds\aggfile.sav"
  /by hid.
execute.
Working with hierarchy in Stata
• Stata is much better at data manipulation than SPSS
• It is not necessary to create an additional file
• Simply run the appropriate procedure for each household separately
  • Sort the data by the household identifier first
  • Use the by subcommand with the household identifier
The equivalent Stata commands:
sort hid
* number of persons and age of the oldest person in each household
egen nperhh = count(hid), by(hid)
egen oldest = max(age), by(hid)
* health of the hrp, distributed to every member of the household
gen hlthrep = -9
replace hlthrep = health if (reltohrp == 1)
egen hrphlth = max(hlthrep), by(hid)
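For readers who want to see what both versions produce, here is a package-neutral sketch of the same aggregate-and-match-back logic. It reuses the variable names from the syntax above (`hid`, `age`, `health`, `reltohrp`), but the records are invented, and it assumes, as the syntax does, that `reltohrp` equal to 1 marks the household reference person and that valid health codes are greater than -9.

```python
# Hypothetical individual-level file.
persons = [
    {"hid": 1, "age": 42, "health": 2, "reltohrp": 1},
    {"hid": 1, "age": 40, "health": 1, "reltohrp": 2},
    {"hid": 1, "age": 9, "health": 1, "reltohrp": 3},
    {"hid": 2, "age": 71, "health": 3, "reltohrp": 1},
]

# Step 1: low placeholder value, replaced by health for the HRP only.
for p in persons:
    p["hlthrep"] = p["health"] if p["reltohrp"] == 1 else -9

# Step 2: aggregate to one record per household.
agg = {}
for p in persons:
    h = agg.setdefault(p["hid"], {"nperhh": 0, "oldest": p["age"], "hrphlth": -9})
    h["nperhh"] += 1
    h["oldest"] = max(h["oldest"], p["age"])
    h["hrphlth"] = max(h["hrphlth"], p["hlthrep"])

# Step 3: match the household-level values back onto every individual.
for p in persons:
    p.update(agg[p["hid"]])

for p in persons:
    print(p["hid"], p["age"], p["nperhh"], p["oldest"], p["hrphlth"])
```

Taking the maximum of the placeholder variable works because only the reference person holds a value above -9, so the maximum within the household is that person's health code.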
Some issues…
• Is the data representative for your choice of unit?
  • Looking at individuals in a household survey will generally omit individuals not living in households
  • Weighting may be necessary to counteract survey design
  • If the survey was not designed for analysis using the units you use, will it still be representative?
• Will there be any clustering effects?
  • Individuals within households will be more alike than individuals in general
  • This could affect the accuracy of the estimates
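One common way to gauge the size of a clustering effect is Kish's approximate design effect, deff = 1 + (m - 1) x rho, where m is the average number of sampled individuals per household and rho is the intra-household correlation of the variable of interest. The formula is a standard approximation and is not taken from these slides; the numbers below are hypothetical.

```python
def design_effect(avg_cluster_size, intra_cluster_corr):
    """Kish's approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * intra_cluster_corr


# Hypothetical values: 1,000 individuals, 2.4 sampled per household on average,
# and an intra-household correlation of 0.3 for the variable analysed.
n_individuals = 1000
deff = design_effect(avg_cluster_size=2.4, intra_cluster_corr=0.3)

# Effective sample size: the simple random sample giving similar precision.
effective_n = n_individuals / deff

print(round(deff, 2), round(effective_n, 1))
```

The more alike household members are on a given variable, the larger rho becomes and the fewer independent observations the sample effectively contains.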