Data Cleaning Process

Patrick Bartels, MEA Frankfurt, December 6th

Presentation Transcript


  1. Data Cleaning Process
     Patrick Bartels, MEA Frankfurt, December 6th

  2. A short reminder
     • "Respondents don't lie!"
     • Only change values if you're really sure.
     • Gather information about your country-specific database:
         - from references of the survey agencies
         - from the information in remarks
         - by your own investigation
     • Write a syntax file or do-file; don't change the data directly.
     • Save the original variable when recoding values, e.g. varname_original.
     • Indicate changes with a flag variable, e.g. varname_flag.
     • Save corrected data files under a new name, e.g. filename_corrected.
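
The conventions above (never overwrite the delivered data, keep the original value, flag the change, write to a new file) can be sketched in Python as follows. This only illustrates the pattern and is not the SHARE team's tooling: datasets are assumed to be lists of dicts, and `recode_with_audit` and `corrected_name` are hypothetical helper names.

```python
from typing import Any, Callable

Record = dict[str, Any]


def recode_with_audit(records: list[Record], var: str, new_value: Any,
                      condition: Callable[[Record], bool]) -> list[Record]:
    """Return corrected copies of `records`; the input list is never modified."""
    corrected = []
    for rec in records:
        row = dict(rec)                              # work on a copy, not on the data itself
        row.setdefault(f"{var}_original", rec[var])  # varname_original: value as delivered
        row.setdefault(f"{var}_flag", 0)             # varname_flag: 0 = unchanged
        if condition(rec) and rec[var] != new_value:
            row[var] = new_value
            row[f"{var}_flag"] = 1                   # 1 = value was corrected
        corrected.append(row)
    return corrected


def corrected_name(filename: str) -> str:
    """Append '_corrected' before the extension, e.g. dn.csv -> dn_corrected.csv."""
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}_corrected.{ext}" if dot else f"{filename}_corrected"


# Usage with made-up records: respondent 2 was delivered with the wrong gender.
rows = [{"respid": 1, "gender": "female"}, {"respid": 2, "gender": "female"}]
fixed = recode_with_audit(rows, "gender", "male", lambda r: r["respid"] == 2)
print(fixed[1])  # {'respid': 2, 'gender': 'male', 'gender_original': 'female', 'gender_flag': 1}
print(corrected_name("sharew2_dn.csv"))  # sharew2_dn_corrected.csv
```

Returning copies rather than editing in place mirrors the rule "don't change the data directly": the delivered records stay available for comparison.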

  3. Division of work
     What we do:
     • consistency checks
         - between cv_r and the modules
         - between wave_1 and wave_2
         - for demography
         - for children
     • fixing of interchanged IDs by automatic exchanges
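
The consistency check between wave_1 and wave_2 can be sketched as a comparison of the demographic variables of respondents who carry the same IDs in both waves. A minimal Python version, assuming records are dicts keyed by `sampid` and `respid`, and that gender and month/year of birth are stored as `gender`, `mob` and `yob` (hypothetical names):

```python
from typing import Any

Record = dict[str, Any]
DEMOGRAPHY = ("gender", "mob", "yob")  # hypothetical names: gender, month and year of birth


def wave_mismatches(wave1: list[Record], wave2: list[Record]) -> list[tuple[Any, Any]]:
    """Return the (sampid, respid) keys whose demography differs between the waves."""
    by_id = {(r["sampid"], r["respid"]): r for r in wave1}
    problems = []
    for r2 in wave2:
        key = (r2["sampid"], r2["respid"])
        r1 = by_id.get(key)
        if r1 is None:
            continue  # no wave-1 interview under this ID: nothing to compare
        if any(r1[v] != r2[v] for v in DEMOGRAPHY):
            problems.append(key)
    return problems


# Usage with the household from slide 4: both IDs are reported.
w1 = [{"sampid": "100123", "respid": "01", "gender": "female", "mob": 10, "yob": 1945},
      {"sampid": "100123", "respid": "02", "gender": "male", "mob": 4, "yob": 1942}]
w2 = [{"sampid": "100123", "respid": "01", "gender": "male", "mob": 4, "yob": 1942},
      {"sampid": "100123", "respid": "02", "gender": "female", "mob": 10, "yob": 1945}]
print(wave_mismatches(w1, w2))  # [('100123', '01'), ('100123', '02')]
```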

  4. Automatic corrections (respid)

     sampid   respid   gender_w1   gender_w2   month/year of birth_w1   month/year of birth_w2
     100123   01       female      male        Oct. 1945                Apr. 1942
     100123   02       male        female      Apr. 1942                Oct. 1945

  5. Automatic corrections (respid)
     Same household as in slide 4, with respid shown separately for wave1 and wave2:
     the respids 01 and 02 are exchanged in wave_2, so that each respid refers to the
     same person in both waves. Each exchange is documented in the syntax:
         compute respid_original = respid.
         compute respid_flag = 1.
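
The automatic exchange can be sketched as follows: a wave-2 respondent receives the wave-1 respid of the only household member with the same gender and month/year of birth, the delivered value is kept in `respid_original`, and `respid_flag` marks the change. This is an illustrative Python sketch, not the procedure actually used for SHARE. The variable names `gender`, `mob` and `yob` are assumptions, and households where the exchange would produce duplicate IDs are deliberately left unchanged for manual checking.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]
DEMOGRAPHY = ("gender", "mob", "yob")  # hypothetical names: gender, month/year of birth


def profile(r: Record) -> tuple:
    """Demographic fingerprint used to recognise the same person across waves."""
    return tuple(r[v] for v in DEMOGRAPHY)


def exchange_respids(wave1: list[Record], wave2: list[Record]) -> list[Record]:
    """Return corrected copies of the wave-2 records.

    A wave-2 respondent gets the wave-1 respid of the unique wave-1 household
    member with the same profile. Households whose corrected respids would
    collide are left unchanged for manual checking.
    """
    w1_index: dict[tuple, list[Any]] = {}
    for r in wave1:
        w1_index.setdefault((r["sampid"], profile(r)), []).append(r["respid"])

    out = []
    for r in wave2:
        row = dict(r, respid_original=r["respid"], respid_flag=0)
        matches = w1_index.get((r["sampid"], profile(r)), [])
        if len(matches) == 1 and matches[0] != r["respid"]:
            row["respid"] = matches[0]
            row["respid_flag"] = 1
        out.append(row)

    # undo the corrections in households where two wave-2 rows now share a respid
    counts = Counter((row["sampid"], row["respid"]) for row in out)
    clashes = {sampid for (sampid, _), n in counts.items() if n > 1}
    for row in out:
        if row["sampid"] in clashes and row["respid_flag"] == 1:
            row["respid"] = row["respid_original"]
            row["respid_flag"] = 0
    return out


# Usage with the household from slide 4.
w1 = [{"sampid": "100123", "respid": "01", "gender": "female", "mob": 10, "yob": 1945},
      {"sampid": "100123", "respid": "02", "gender": "male", "mob": 4, "yob": 1942}]
w2 = [{"sampid": "100123", "respid": "01", "gender": "male", "mob": 4, "yob": 1942},
      {"sampid": "100123", "respid": "02", "gender": "female", "mob": 10, "yob": 1945}]
for row in exchange_respids(w1, w2):
    print(row["respid_original"], "->", row["respid"], row["respid_flag"])
# 01 -> 02 1
# 02 -> 01 1
```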

  6. Overview of merge between wave_1 and wave_2
     [Figure: wave_1 gender vs. wave_2 gender, after auto-corrections]
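
Assuming the overview is a cross-tabulation of gender in wave_1 against gender in wave_2 for respondents matched on their IDs, it can be produced as sketched below. Cells off the diagonal are the cases still to be checked. The dict layout with `sampid`, `respid` and `gender` fields is an assumption of this Python sketch.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]


def gender_crosstab(wave1: list[Record], wave2: list[Record]) -> Counter:
    """Count (gender_w1, gender_w2) pairs for respondents present in both waves."""
    gender_w1 = {(r["sampid"], r["respid"]): r["gender"] for r in wave1}
    return Counter(
        (gender_w1[(r["sampid"], r["respid"])], r["gender"])
        for r in wave2
        if (r["sampid"], r["respid"]) in gender_w1
    )


def inconsistent(table: Counter) -> int:
    """Number of matched respondents whose gender differs between the waves."""
    return sum(n for (g1, g2), n in table.items() if g1 != g2)


# Usage with the household from slide 4, before any correction.
w1 = [{"sampid": "100123", "respid": "01", "gender": "female"},
      {"sampid": "100123", "respid": "02", "gender": "male"}]
w2 = [{"sampid": "100123", "respid": "01", "gender": "male"},
      {"sampid": "100123", "respid": "02", "gender": "female"}]
table = gender_crosstab(w1, w2)
print(sorted(table.items()))  # [(('female', 'male'), 1), (('male', 'female'), 1)]
print(inconsistent(table))    # 2
```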

  7. Division of work
     What we do (we can fix a lot of cases):
     • consistency checks
         - between cv_r and the modules
         - between wave_1 and wave_2
         - for demography
         - for children
     • fixing of interchanged IDs by automatic exchanges
     • correction of wave_1 using further information from wave_2
     What we want you to do (you're much better at doing this):
     • ID corrections initiated by the survey agencies
     • check booklets, tests, HH composition (→ Omar)
     • check financial modules (→ Mario)
     • check remarks (→ Laura)
     • check country-specific deviations (→ Stephanie)
     • encode open questions; priority: education, ep005

  8. Division of work
     What we do (we can fix a lot of cases):
     • consistency checks
         - between cv_r and the modules
         - between wave_1 and wave_2
         - for demography
         - for children
     • fixing of interchanged IDs by automatic exchanges
     • correction of wave_1 using further information from wave_2
     • reporting cases we cannot fix back to the country teams
     What we want you to do (you're much better at doing this):
     • ID corrections initiated by the survey agencies
     • check booklets, tests, HH composition (→ Omar)
     • check financial modules (→ Mario)
     • check remarks (→ Laura)
     • check country-specific deviations (→ Stephanie)
     • encode open questions; priority: education, ep005
     • check the data again; contact the survey agencies if necessary

  9. Do-file or syntax
     • name of the author, date of the program
     • short description of "what is done"
     • which database
     • and which modules
     • version of the data, date of publishing
     • conditions / order of the do-files
     • for Stata users: define a global path
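
To make sure no item of this checklist is forgotten, a new do-file can start from a generated header. The Python helper below is a hypothetical sketch (the function name, its parameters and the field labels are not part of any SHARE tool); it writes a Stata block comment followed by the `global` path definition.

```python
def dofile_header(author: str, date: str, description: str, database: str,
                  modules: list[str], data_version: str,
                  run_after: list[str], drive: str) -> str:
    """Return the opening lines of a Stata do-file covering the checklist."""
    lines = [
        "/" + "*" * 70,                       # opens a Stata block comment: /****...
        f"Author: {author}, {date}",
        f"What is done: {description}",
        f"Database: {database}",
        f"Modules: {' '.join(modules)}",
        f"Data version: {data_version}",
        "Run only after: " + (", ".join(run_after) if run_after else "no other do-file"),
        "*" * 70 + "/",                       # closes the comment: ****.../
        f'global drive "{drive}"',            # one global path for all files
    ]
    return "\n".join(lines)


print(dofile_header("Omar Paccagnella", "30 October 2007",
                    "changes in cvid and respid in wave2 datasets", "wave2, longitudinal sample",
                    ["dn", "cv"], "2007/Oct/26",
                    ["IT_DN_changes_w2.do", "IT_CV_changes_w2.do"], "S:/Share/wave2"))
```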

  10. Example of a Stata do-file (1)
      /******************************************************************************
      This program provides changes in cvid and respid variables in wave2 datasets
      of the longitudinal sample, in order to get exact matching between wave1 and
      wave2 respondents.
      A variable called "mix_hh_flag" is added to the final dataset: it is equal to 1
      in each household when the value of the respid variable was changed in one or
      two interviews of that household.
      - data-version: 2007/Oct/26
      - Omar Paccagnella, 30 October 2007
      - VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND
        BETWEEN WAVES, THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS:
        "IT_DN_changes_w2.do", "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do"!
      ******************************************************************************/
      (Slide annotations: which dataset, short description, data-version, author's name & date of program, order of do-files)
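
The run-order requirement stated in this header can also be checked mechanically before a batch of do-files is started. A minimal Python sketch: the prerequisite table mirrors the header, but `IT_mix_hh_w2.do` is a hypothetical file name for this program, since the slide does not give one.

```python
def order_violations(run_order: list[str],
                     prerequisites: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (program, missing_or_late_prerequisite) pairs for a planned run order."""
    position = {name: i for i, name in enumerate(run_order)}
    problems = []
    for program, required in prerequisites.items():
        if program not in position:
            continue  # program is not part of this run
        for req in required:
            # a prerequisite that is absent counts as running "too late"
            if position.get(req, len(run_order)) > position[program]:
                problems.append((program, req))
    return problems


PREREQUISITES = {
    "IT_mix_hh_w2.do": ["IT_DN_changes_w2.do", "IT_CV_changes_w2.do", "IT_XT_changes_w2.do"],
}
plan = ["IT_DN_changes_w2.do", "IT_mix_hh_w2.do", "IT_CV_changes_w2.do"]
print(order_violations(plan, PREREQUISITES))
# [('IT_mix_hh_w2.do', 'IT_CV_changes_w2.do'), ('IT_mix_hh_w2.do', 'IT_XT_changes_w2.do')]
```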

  11. Example of a Stata do-file (2)
      global drive "S:/Share/wave2"
      /*************************************************************
      THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV
      **************************************************************/
      foreach module in ac as br cf ch co cs dn ep ex hc hh ho iv mh pf ph sp ws {
          * clear: discard any data in memory before loading the next module
          use "$drive/sharew2_`module'", clear
          gen mix_hh_flag = 0
          gen sampid_original = sampid
          gen respid_original = respid
          replace respid = 1 if sampid == "1604200015300" & cvid == 2 & respid == 2
          replace mix_hh_flag = 1 if sampid == "1604200015300"
          * [...]
          * replace: allows the do-file to be run again over an existing corrected file
          save "$drive/sharew2_`module'_corrected", replace
      }
      (Slide annotations: for which modules?, save original variables, flag-variable, new version of data)
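
For readers who do not use Stata, the loop above can be mirrored in Python. The sketch assumes each module is already loaded as a list of dicts and keeps the do-file's conventions (original variables, `mix_hh_flag`, a `_corrected` output name); the in-memory dictionary of datasets stands in for the files on `$drive` and is an assumption of the sketch.

```python
from typing import Any

Record = dict[str, Any]
MODULES = "ac as br cf ch co cs dn ep ex hc hh ho iv mh pf ph sp ws".split()


def fix_module(records: list[Record]) -> list[Record]:
    """Apply the household correction from the slide to one module."""
    out = []
    for r in records:
        row = dict(r, mix_hh_flag=0,
                   sampid_original=r["sampid"], respid_original=r["respid"])
        if r["sampid"] == "1604200015300":
            if r["cvid"] == 2 and r["respid"] == 2:
                row["respid"] = 1
            row["mix_hh_flag"] = 1  # flag every interview of the affected household
        out.append(row)
    return out


def fix_all(datasets: dict[str, list[Record]]) -> dict[str, list[Record]]:
    """Correct every available module and store it under a new name."""
    return {f"sharew2_{m}_corrected": fix_module(datasets[f"sharew2_{m}"])
            for m in MODULES if f"sharew2_{m}" in datasets}


# Usage with made-up records for the dn module.
data = {"sharew2_dn": [{"sampid": "1604200015300", "cvid": 2, "respid": 2},
                       {"sampid": "1604200015300", "cvid": 1, "respid": 2},
                       {"sampid": "1604200099999", "cvid": 1, "respid": 1}]}
result = fix_all(data)
print([(r["respid"], r["mix_hh_flag"]) for r in result["sharew2_dn_corrected"]])
# [(1, 1), (2, 1), (1, 0)]
```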

  12. Example of SPSS syntax (1)
      COMMENT This program provides changes in cvid and respid variables in wave2 datasets
        of the longitudinal sample, in order to get exact matching between wave1 and wave2 respondents.
      * A variable called "mix_hh_w2" is added to the final dataset (called sharew2_`var'_checked):
        it is equal to 1 in each household when the value of the respid variable was changed
        in one or two interviews of that household.
      * Date of data: 2007/Oct/26.
      * Omar Paccagnella, October 2007.
      * VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND BETWEEN WAVES,
        THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS "IT_DN_changes_w2.do",
        "IT_CV_changes_w2.do" AND "IT_XT_changes_w2.do".
      ****************************************************************************.
      * THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV.
      (Slide annotations: short description, which dataset, data-version, author's name, order of syntax, for which modules?)

  13. Example of SPSS syntax (2)
      GET FILE='S:\SHARE\wave2\dn_module.sav'.
      compute mix_hh_flag = 0.
      compute cvid_original = cvid.
      compute respid_original = respid.
      compute sampid_original = sampid.
      * Test cvid_original: once cvid has been recoded to 1, the condition cvid = 2 is no longer true.
      if (sampid = 1604200015300 & cvid_original = 2) cvid = 1.
      if (sampid = 1604200015300 & cvid_original = 2) respid = 2.
      if (sampid = 1604200015300) mix_hh_flag = 1.
      EXECUTE.
      * [...].
      SAVE OUTFILE='S:\SHARE\wave2\dn_module_corrected.sav'.
      (Slide annotations: save original variables, flag-variable)

  14. Any problems with programming do-files or syntax? Please give us a call.
