
Introduction to Stata using the Northern Ireland Household Panel Survey (NIHPS)


Presentation Transcript


  1. Introduction to Stata using the Northern Ireland Household Panel Survey (NIHPS). Katrina Lloyd (QUB), Patricia McKee (UU)

  2. Format
     • 9:15 Intro to NIHPS
     • 9:30 Intro to Stata
     • 10:30 – 11:00 Coffee break
     • 11:00 Stata files – log / do; advantages of Stata
     • 12:30 Questions / examples

  3. NIHPS
     • NIHPS began in 2001 and is an extension of the BHPS (1991)
     • ISER at Essex University has overall responsibility for the survey
     • NISRA carries out fieldwork in NI
     • 6 waves of NIHPS data are available from the UK Data Archive (2001-2006)

  4. NIHPS
     • NIHPS follows a representative sample of individuals
     • Household-based interviewing:
       • All adults aged 16+
       • From Wave 4, all children aged 11-15 (Youth Panel)
     • Its unique value is that NIHPS measures change at the individual level

  5. NIHPS
     • Achieved sample (full interviews all years)
       • Wave 1 - 3,458 individuals in 1,978 households
       • Wave 2 - 2,692 individuals
       • Wave 3 - 2,414 individuals
       • By Wave 6 - 2,151 individuals
     • Attrition

  6. Content of the NIHPS
     • NIHPS has 3 components:
       • Core component – asked every year
         • Includes health, housing, finances
       • Rotating core component – every 3 years
         • Includes wealth, assets and debt, parenting
       • Variable component – once in the panel
         • Includes race, place of birth, age left school

  7. NIHPS datasets
     • Cross-sectional files for each wave
     • Longitudinal files for individuals
     • Files linked by common variables
       • PID (unique Personal Identification Number)
       • wHID (household ID – changes year on year)
       • wPNO (person number – changes year on year)
     • w is the wave prefix: k, l, m, n, o, p for the years 2001-2006 respectively
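The wave-prefix convention above can be sketched as a short loop. This is only an illustration: it prints the names that wHID, wPNO and wINDALL would take in each wave and opens no data. The counter `year` is a hypothetical helper, not an NIHPS variable.

```stata
* Sketch: map each wave letter to its year and show the prefixed names
local year = 2001
foreach w in k l m n o p {
    display "wave `w' (`year'): `w'hid  `w'pno  `w'indall"
    local ++year                    // move on to the next survey year
}
```

The same `foreach w in k l m n o p` pattern is what lets one block of commands be applied to every wave's file in turn.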

  8. NIHPS
     Record type   Record description
     wHHSAMP       household-level data for issued households
     wHHRESP       household-level data for responding households
     wINDSAMP      individual-level data for issued households
     wINDALL       enumerated individuals' data (including children and non-respondents)

  9. NIHPS
     Record type   Record description
     wINDRESP      individual-level data for respondents
     wEGOALT       relationship of each individual in a household
     wINCOME       income and payment data
     wJOBHIST      information from the employment history

  10. NIHPS additional files
      wMARRIAG     one record for each reported legal marriage
      wCOHABIT     one record for each cohabitation spell outside marriage
      wCHILDAD     information about adopted and/or step-children
      wCHILDNT     information about natural children
      wCHILD       information on children and parenting styles
      wYOUTH       responses to the Young Persons questionnaire
      wLIFEMST     information about employment status spells

  11. NIHPS additional files (for ALL waves)
      XWAVEID      information for matching individuals between waves
      XWLSTEN      information on the latest known sample status of individuals
      XWAVEDAT     central source of data on individuals which is fixed and only measured once in the panel, e.g. race

  12. File used today: wINDALL

  13. Stata windows
      • Previous commands
      • Results
      • Variables
      • Commands

  14. Edit Preferences
      • Click on the Edit tab
      • Come down to Preferences
      • Select General Preferences

  15. LOG files – record your session
      • Start
        • Either click the icon or select File > Log > Begin
      • Types
        • .smcl = Stata formatted
        • .log = a text (ASCII) file
      • Choices
        • View existing file
        • Append new to old
        • Overwrite with new
      • Closure
        • When you exit
        • Choose to suspend / resume
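The same log operations are also available as typed commands. The sketch below assumes a hypothetical file name, `session_example.log`, written to the current working directory; the `text` option requests the plain-text type rather than .smcl.

```stata
* Sketch of the log choices as commands (file name is hypothetical)
log using session_example.log, text replace   // start a text log, overwriting any old copy
display "this line is recorded in the log"
log off                                       // suspend logging
display "this line is not recorded"
log on                                        // resume logging
log close                                     // close the log
type session_example.log                      // print the existing log in the Results window
* To add a later session to the same file instead of overwriting it:
* log using session_example.log, text append
```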

  16. Log file
      • Choose a folder
      • Give a filename
      • Choose the type (LOG)
      Note: while a log file is on, its name appears below the Results window and above the Commands window

  17. DO files
      • A text file containing commands, used rather than typing commands at the keyboard
      • The contents of the Review window (previous commands) can be saved into a do-file
      • Do-files may call other do-files, which may call further do-files, nested up to 64 deep; alternatively, in a master.do, up to 1,000 do-files can be called one after the other

  18. Do file
      • Note: comment
      • Select the commands to run and click the icon
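A minimal do-file might look like the sketch below. The file names `example.do` and `another.do` are hypothetical, and the data are generated on the spot so that the commands run without any NIHPS file. Comments beginning with `//` are recognised in do-files but not when typed at the command line.

```stata
* example.do : a hypothetical do-file
* A line starting with * is a comment
// So is a line starting with two slashes
/* and so is anything
   between these delimiters */
clear
set obs 5                 // create 5 empty observations
generate id = _n          // id = 1, 2, ..., 5
summarize id
* do another.do           // would run a second (hypothetical) do-file from this one
```

Saving these lines as a .do file and typing `do example` would run them in order.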

  19. Built-in Variables
      • _pi contains the value π to machine precision
      • _n contains the number of the current obs., e.g.

            age   23   34   45   56
            _n     1    2    3    4

      • _N contains the total number of obs., e.g.

            age   23   34   45   56
            _N     4    4    4    4

      Note: Stata respects case, so myvar, Myvar and MYVAR are 3 distinct names
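The four-observation illustration above can be reproduced directly. In this sketch the variable names `obsno` and `total` are made up for the example; the ages are the ones shown on the slide.

```stata
* Sketch: rebuild the age example and inspect _n, _N and _pi
clear
input age
23
34
45
56
end
generate obsno = _n       // number of the current observation: 1 2 3 4
generate total = _N       // total number of observations: 4 in every row
list age obsno total
display _pi               // the value of pi held by Stata
```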

  20. Example of _n and _N

        use kindall, clear
        sort khid kpno                  // sort file by household, and person number within household
        gen totcases = _N               // generate total number of obs
        * For each household, generate the number of people in the household
        bysort khid: gen totninhh = _N
        * For each household, generate the person's number within the household
        bysort khid: gen nwithinhh = _n
        list pid khid kpno totninhh nwithinhh in 1/20
        tab totninhh nwithinhh, miss    // crosstab, including missing values

  21. gen totcases = _N    // generate total number of obs
      tab totcases

  22. bysort khid: gen totninhh = _N
      tab totninhh

        totninhh |    Freq.   Percent      Cum.
        ---------+------------------------------
               1 |      518      9.98      9.98
               2 |    1,238     23.86     33.85   <- 2 persons
               3 |      915     17.64     51.48
               4 |    1,176     22.67     74.15   <- 4 persons
               5 |      830     16.00     90.15
               6 |      252      4.86     95.01
               7 |      175      3.37     98.38
               8 |       56      1.08     99.46
               9 |       18      0.35     99.81
              10 |       10      0.19    100.00
        ---------+------------------------------
           Total |    5,188    100.00

  23. list pid khid kpno totninhh nwithinhh in 1/20

        Case         pid       khid   kpno   totninhh   nwithinhh
          1.   118500023   11850027      1          3           1
          2.   118500058   11850027      2          3           2
          3.   118500074   11850027      3          3           3
          4.   118500317   11850043      1          1           1
          5.   118501135   11850116      1          1           1
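To try the household counts without the NIHPS file, the five cases listed above can be typed in by hand. This is a sketch only: the identifiers are stored as `long` so that the nine-digit pid values are held exactly, and `(kpno)` is added so that the order within each household is made explicit rather than relying on an earlier sort.

```stata
* Sketch: the five listed cases, entered by hand
clear
input long pid long khid byte kpno
118500023 11850027 1
118500058 11850027 2
118500074 11850027 3
118500317 11850043 1
118501135 11850116 1
end
bysort khid (kpno): gen totninhh  = _N   // people in the household
bysort khid (kpno): gen nwithinhh = _n   // position within the household
list pid khid kpno totninhh nwithinhh
```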

  24. Saved Results
      • summarize produces summary statistics

            sum kage12

            Variable |    Obs       Mean   Std. Dev.   Min   Max
            kage12   |   5188   35.46164    22.59792     0    97

      • It also saves 19 scalars in r( ), such as:
        • r(N) – no. of obs
        • r(mean) – mean
        • r(sum) – sum of age
        • r(sd) – std deviation
        • r(p1) – 1st percentile
        • r(p95) – 95th percentile
      • Some are only available with sum kage12, detail
      • To list the results stored in r( ), type return list

  25. . sum kage12, detail

                              age at 1.12.2001
        -------------------------------------------------------------
              Percentiles      Smallest
         1%            0              0
         5%            3              0
        10%            6              0       Obs                5188
        25%           16              0       Sum of Wgt.        5188

        50%           34                      Mean           35.46164
                                Largest       Std. Dev.      22.59792
        75%           53             92
        90%           68             94       Variance       510.6658
        95%           75             96       Skewness       .2723639
        99%           83             97       Kurtosis       2.072386

  26. After sum kage12, detail type return list

        scalars:
                      r(N) = 5188
                  r(sum_w) = 5188
                   r(mean) = 35.46164225134927
                    r(Var) = 510.66577343513
                     r(sd) = 22.59791524533026
               r(skewness) = .2723638715033958
               r(kurtosis) = 2.072386222684342
                    r(sum) = 183975
                    r(min) = 0
                    r(max) = 97
                     r(p1) = 0
                     r(p5) = 3
                    r(p10) = 6
                    r(p25) = 16
                    r(p50) = 34
                    r(p75) = 53
                    r(p90) = 68
                    r(p95) = 75
                    r(p99) = 83
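Saved results are useful because later commands can pick them up. The sketch below uses a small hand-typed dataset rather than kage12; the names `m` and `age_centred` are hypothetical. The mean is copied into a local macro straight after summarize, since the next r-class command would overwrite r( ).

```stata
* Sketch: use r(mean) from summarize in a later calculation
clear
input age
23
34
45
56
end
summarize age
return list                           // shows r(N), r(mean), r(sd), ...
local m = r(mean)                     // keep the mean before r() is overwritten
generate age_centred = age - `m'      // deviation of each age from the mean
list age age_centred
```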

  27. LOCAL variables
      • E.g. a local called var is referred to as `var'
        • the opening ` is from the key beside 1, and the closing ' is from the key down beside L
      Programming – loop over items / values
      • foreach var in – loops over items
        • Can be a varlist, newlist or numlist
      • forvalues x = – loops over consecutive values
        • The loop is executed as long as `x' is in range

  28. Example

        * Comment: set up a local variable testvars
        local testvars "khgr2r khgsex kage12"
        * Start of loop – note the opening { and the ending }
        * Could also use: foreach x in khgr2r khgsex kage12 {
        foreach x of local testvars {
            display "the current variable is `x'"
            tab `x'       // displays frequencies
            sum `x'       // produces summary statistics
            ret list      // displays all the saved results
        }
        * end of loop
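The example above uses foreach; a forvalues loop over consecutive values might look like the following sketch. The macro names `i` and `total` are hypothetical, and no dataset is needed.

```stata
* Sketch: forvalues runs the body once for each value in the range 1 to 5
local total = 0
forvalues i = 1/5 {
    display "i is now `i'"
    local total = `total' + `i'   // running sum of the loop values
}
display "sum of 1 to 5 is `total'"
```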

  29. Merging data files
      • Two kinds of merges
        • One-to-one
        • Match-merge
      • Result contained in new var _merge
        • 1 = obs occurred ONLY in master dataset
        • 2 = obs occurred ONLY in using dataset
        • 3 = obs occurred in BOTH master and using datasets

  30. Example of merging

        local dirdata "j:\nihps\nihps data\"
        * Save a sorted extract of each wave's file
        foreach x in k l m n o p {
            use "`dirdata'`x'indall", clear
            keep pid `x'age12 `x'newhy
            sort pid
            save temp`x', replace
        }
        * Start from wave k and merge the later waves on pid
        use tempk, clear
        foreach x in l m n o p {
            merge pid using temp`x', _merge(mer`x')
            sort pid
        }
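The merge command in the example uses the syntax of the Stata versions current at the time. In Stata 11 and later, a match-merge on a unique key is normally written `merge 1:1`. The sketch below assumes such a version and uses two tiny made-up datasets in place of the wave files; the age values and the names `wave_l` and `merl` are hypothetical.

```stata
* Sketch: match-merge two small made-up "waves" on pid (Stata 11+ syntax)
clear
input pid lage12
1 30
2 41
4 18
end
tempfile wave_l
save `wave_l'                      // using dataset: persons 1, 2 and 4

clear
input pid kage12
1 29
2 40
3 55
end                                // master dataset: persons 1, 2 and 3

merge 1:1 pid using `wave_l', generate(merl)
sort pid
list pid kage12 lage12 merl
tab merl                           // 1 = master only, 2 = using only, 3 = both
```

Here persons 1 and 2 are matched (3), person 3 is in the master only (1) and person 4 is in the using dataset only (2).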

  31. Command to check number of obs: tab1 *newhy
