Data Processing and Tabulation, Part I
Importance of data processing
If data aren’t clean and well edited:
• analysis will be flawed
• data may be inconsistent across years
• country data may not compare well with those of other countries
• publications may need to be reissued when errors are found
Source: www.sterlingsolutions.co.uk
First steps
Data from paper surveys need to be carefully entered into the computer.
• Minimize typing errors and correct those that are found.
• Code verbatim and “other, specify” responses into an existing category if they actually belong in one.
• Develop a system for tracking which questionnaires have and have not yet been entered.
• Set aside questionnaires that have issues so they will be easier to access and investigate later.
Make sure the data file is an appropriate size and the number of records is on target.
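A questionnaire control system can be as simple as comparing the IDs that were assigned with the IDs that appear in the data file. The sketch below is only an illustration: the ID values and the names `assigned_ids` and `entered_ids` are hypothetical.

```python
from collections import Counter

# Hypothetical control list of questionnaire IDs sent to the field.
assigned_ids = {1001, 1002, 1003, 1004, 1005}

# Hypothetical IDs found in the keyed data file (1002 was keyed twice).
entered_ids = [1001, 1002, 1002, 1004]

entry_counts = Counter(entered_ids)

# Questionnaires still waiting for data entry.
not_yet_entered = sorted(assigned_ids - set(entered_ids))

# Questionnaires keyed more than once.
entered_twice = sorted(i for i, n in entry_counts.items() if n > 1)

# IDs in the file that are not on the control list (possible typos).
unexpected = sorted(set(entered_ids) - assigned_ids)

# Is the number of records on target?
records_on_target = len(entered_ids) == len(assigned_ids)

print(not_yet_entered, entered_twice, unexpected, records_on_target)
```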
What to look for
Response values that are invalid
• A code of “7” would be invalid for a question where the possible response codes are “1” and “2”.
• Find invalid codes by producing a frequency distribution or by sorting the data on that variable.
• Determine the correct response by checking the paper questionnaire or by looking at which subsequent questions were answered.
Logically inconsistent responses
• Responses that can’t both be true. For example, a respondent who is recorded as being younger than her child.
• Determine the correct response by checking the paper questionnaire or by looking at which subsequent questions were answered.
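The two checks above can be sketched in a few lines. This is an illustration under assumptions: the records, the variable names (`q1`, `age`, `child_age`), and the values are made up.

```python
from collections import Counter

# Hypothetical records; "q1" is a question coded 1 or 2.
records = [
    {"id": 101, "q1": 1, "age": 34, "child_age": 8},
    {"id": 102, "q1": 2, "age": 29, "child_age": 3},
    {"id": 103, "q1": 7, "age": 41, "child_age": 15},
    {"id": 104, "q1": 1, "age": 12, "child_age": 30},
]

VALID_Q1 = {1, 2}

# Frequency distribution: any code outside VALID_Q1 stands out.
q1_freq = Counter(r["q1"] for r in records)

# IDs of the questionnaires to pull for an invalid code.
invalid_ids = [r["id"] for r in records if r["q1"] not in VALID_Q1]

# Consistency check: a respondent recorded as younger than her child.
inconsistent_ids = [r["id"] for r in records if r["age"] < r["child_age"]]

print(sorted(q1_freq.items()))
print(invalid_ids, inconsistent_ids)
```

The checks only list which questionnaires to review. Deciding the correct value still requires the paper questionnaire or the subsequent answers.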
What to look for
Impossible responses
• Responses that can’t be true. For example, a respondent who reports working more than 7 days a week.
• Check the paper questionnaire to determine whether it is a typo and what the correct answer is.
Improbable responses
• Responses that are more likely to be a mistake than to be true. For example, a 16-year-old recorded as having a PhD.
• Check the paper questionnaire to determine the correct responses.
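Range and plausibility checks like these are easy to automate. The sketch below uses made-up records and variable names, and the age threshold for a PhD is an assumption chosen only for illustration.

```python
# Hypothetical records and variable names.
records = [
    {"id": 201, "days_worked": 5, "age": 45, "education": "PhD"},
    {"id": 202, "days_worked": 9, "age": 30, "education": "secondary"},
    {"id": 203, "days_worked": 6, "age": 16, "education": "PhD"},
]

MAX_DAYS_PER_WEEK = 7
MIN_PLAUSIBLE_PHD_AGE = 22  # assumed threshold, not from the source

# Impossible: outside the range the question allows.
impossible_ids = [
    r["id"] for r in records
    if not 0 <= r["days_worked"] <= MAX_DAYS_PER_WEEK
]

# Improbable: allowed in principle, but more likely a mistake than true.
improbable_ids = [
    r["id"] for r in records
    if r["education"] == "PhD" and r["age"] < MIN_PLAUSIBLE_PHD_AGE
]

print(impossible_ids, improbable_ids)
```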
What to look for
Omissions of responses or sections of the questionnaire
• The response may have been missed during data entry or the interviewer may have failed to follow the skip instructions.
• Check the paper questionnaire and subsequent responses to determine what happened and whether an answer is available.
What to look for
Responses to questions or sections of the questionnaire that should not have been asked
• The interviewer may have failed to follow the skip instructions correctly. For example, both the employment and unemployment sections may have been asked of the same person, but only one of those sections can be appropriate.
• Use the responses and/or the sorting question to determine which section is valid and which has nonsense answers.
• Delete the nonsense section answers from the clean file.
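One way to find and clean such cases is sketched below. Everything here is hypothetical: the sorting question `worked_last_week` (assumed coded 1 = yes, 2 = no), the one-variable sections, and the rule that the sorting question decides which section is valid. In practice the responses themselves may show that the sorting question is the one in error.

```python
# Hypothetical records; None means the question was not answered.
records = [
    {"id": 301, "worked_last_week": 1, "emp_hours": 40, "unemp_weeks_searching": None},
    {"id": 302, "worked_last_week": 2, "emp_hours": None, "unemp_weeks_searching": 6},
    {"id": 303, "worked_last_week": 1, "emp_hours": 35, "unemp_weeks_searching": 4},
]

EMP_VARS = ["emp_hours"]
UNEMP_VARS = ["unemp_weeks_searching"]


def answered(record, variables):
    """True if any variable in the section has a response."""
    return any(record[v] is not None for v in variables)


def clean(record):
    """Return a copy with the section that should have been skipped blanked."""
    cleaned = dict(record)
    to_blank = UNEMP_VARS if record["worked_last_week"] == 1 else EMP_VARS
    for v in to_blank:
        cleaned[v] = None
    return cleaned


# Records where both sections were asked of the same person.
both_sections = [
    r["id"] for r in records
    if answered(r, EMP_VARS) and answered(r, UNEMP_VARS)
]

# The raw records stay untouched; only the clean file is edited.
clean_file = [clean(r) for r in records]

print(both_sections)
```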
What to look for
The number of responses to questions
• This should be relatively stable and differences should be largely explainable by skips taking people to different follow-up questions.
• Unexplained differences may indicate skips that were not followed, coding errors, or other types of issues.
The number of missing values
• The number of missing values for a question should be appropriate.
• Questions that everyone gets should have few missing values.
• Questions that only some people get should have more.
• When a question sorts people to one of two follow-up questions, the number of responses to one of the questions should be about the same as the number of missing values for the other question.
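The last check, comparing responses to one follow-up question with missing values on the other, might look like the sketch below. The records and the names `sort_q`, `followup_a`, and `followup_b` are hypothetical.

```python
# Hypothetical records: sort_q sends people to followup_a (code 1)
# or followup_b (code 2). None means no response was recorded.
records = [
    {"sort_q": 1, "followup_a": 3, "followup_b": None},
    {"sort_q": 2, "followup_a": None, "followup_b": 1},
    {"sort_q": 1, "followup_a": 2, "followup_b": None},
    {"sort_q": 2, "followup_a": None, "followup_b": 4},
    {"sort_q": 1, "followup_a": None, "followup_b": None},  # neither answered
]


def counts(variable):
    """Return (number of responses, number of missing values)."""
    n_answered = sum(1 for r in records if r[variable] is not None)
    return n_answered, len(records) - n_answered


a_answered, a_missing = counts("followup_a")
b_answered, b_missing = counts("followup_b")

# Responses to A should roughly equal missing values on B.
gap = abs(a_answered - b_missing)

print(a_answered, a_missing, b_answered, b_missing, gap)
```

A gap of zero is what a perfectly followed skip would give. A nonzero gap, as in this made-up data, points to records worth checking.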
Variable transformations
Some variables need to be transformed from what is easy for respondents to provide into what can be readily used by researchers.
• If the respondent’s month and year of birth were recorded, age can be created from these and the month and year of the interview.
• Recording age and other such variables on the file eliminates the need to recreate them every time someone works with the data and reduces the risk of errors.
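A minimal sketch of the age calculation follows. The function name is hypothetical, and because only the month and year are known, it assumes the birthday has already passed when the interview falls in the birth month.

```python
def age_at_interview(birth_year, birth_month, interview_year, interview_month):
    """Age in completed years from month and year of birth and of interview.

    Assumption: if the interview is in the birth month, the birthday
    is treated as having already occurred (the day is unknown).
    """
    age = interview_year - birth_year
    if interview_month < birth_month:
        age -= 1  # birthday not yet reached in the interview year
    return age


print(age_at_interview(1990, 6, 2020, 5))
print(age_at_interview(1990, 6, 2020, 6))
```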
Variable transformations - Counts
Knowing how many of something there are or how many times it was done can be useful.
• Variables for presence and number of children in the household, for example, can be created from the roster.
• Children in the household may be the children of different adults (for example, sisters living together), so it is important to associate the correct adults and children.
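The sketch below builds both household-level and person-level counts from a roster. It assumes a hypothetical roster layout in which each child carries a `parent_line` pointer to the line number of their parent, and an assumed age cutoff for "child".

```python
from collections import Counter

# Hypothetical roster: two sisters (persons 1 and 2) and their children.
roster = [
    {"hh": 1, "person": 1, "age": 35, "parent_line": None},
    {"hh": 1, "person": 2, "age": 32, "parent_line": None},
    {"hh": 1, "person": 3, "age": 7, "parent_line": 1},
    {"hh": 1, "person": 4, "age": 4, "parent_line": 2},
    {"hh": 1, "person": 5, "age": 2, "parent_line": 2},
]

CHILD_MAX_AGE = 17  # assumed definition of a child

children = [p for p in roster if p["age"] <= CHILD_MAX_AGE]

# Number of children per household.
children_in_hh = Counter(p["hh"] for p in children)

# Number of own children per adult, keyed by (household, parent line).
own_children = Counter(
    (p["hh"], p["parent_line"]) for p in children
    if p["parent_line"] is not None
)

# Store the new variables on every person record.
for p in roster:
    p["hh_children"] = children_in_hh[p["hh"]]
    p["hh_has_children"] = p["hh_children"] > 0
    p["own_children"] = own_children[(p["hh"], p["person"])]

print([(p["person"], p["hh_children"], p["own_children"]) for p in roster])
```

Both sisters live in a household with three children, but only the person-level count shows which children belong to which adult.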
Variable transformations - Recodes
Recodes change the category labels in a variable.
• For example, marital status may contain “divorced,” “separated,” and “widowed” as separate categories but it may be that the agency will frequently want to combine these three groups.
• A recode of the marital status variable with these combined into one group (and with any other desired combinations) could be made.
• The original variable with the three separate groups should be maintained, as it provides valuable information.
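A recode can be written as a lookup table that fills a new variable and leaves the original alone. In this sketch the categories "never married" and "married", the combined label, and the variable names are assumptions.

```python
# Mapping from original categories to recoded categories.
# "never married" and "married" are assumed categories for illustration.
MARITAL_RECODE = {
    "never married": "never married",
    "married": "married",
    "divorced": "divorced, separated, or widowed",
    "separated": "divorced, separated, or widowed",
    "widowed": "divorced, separated, or widowed",
}

# Hypothetical records.
records = [
    {"id": 1, "marital": "married"},
    {"id": 2, "marital": "widowed"},
    {"id": 3, "marital": "separated"},
]

for r in records:
    # A KeyError here would flag a category missing from the mapping.
    r["marital_recode"] = MARITAL_RECODE[r["marital"]]

print([(r["marital"], r["marital_recode"]) for r in records])
```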
Variable transformations – Creating Concepts
Multiple variables can be used together to create a concept of interest.
• For example, unemployed can be created from variables for work during the past week, active job search, availability for work, waiting to start, and waiting to be recalled.
• The new variable makes the data easier to use and minimizes the risk of it being created incorrectly in the future.
• The variables that make up the new concept should be kept on the file because they are used to build other concepts as well and because they can be used on their own for analysis.
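The sketch below shows how the five variables could be combined. The rule itself is an assumption made for illustration (no work last week, and either actively searching while available, or waiting to start or be recalled). The source does not state the rule, and each agency should apply its own official definition.

```python
def is_unemployed(worked_last_week, searched, available,
                  waiting_to_start, waiting_for_recall):
    """Illustrative rule only; the real definition is set by the agency.

    All arguments are booleans derived from the questionnaire.
    """
    if worked_last_week:
        return False
    return (searched and available) or waiting_to_start or waiting_for_recall


# Searched and available, did not work: counted as unemployed here.
print(is_unemployed(False, True, True, False, False))
# Searched but not available: not counted as unemployed here.
print(is_unemployed(False, True, False, False, False))
```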
Variable transformations – Math
Mathematical operations, such as creating rates, means, and medians, may be desirable.
• The unemployment rate may be created by dividing the number of unemployed by the civilian labor force and multiplying this by 100.
• Decisions about rounding and how many decimal places to store must be made.
Source: www.cabbagetreesolutions.com
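The rate calculation can be sketched as follows. The function name and the default of one decimal place are assumptions, and the counts in the example are made up.

```python
def unemployment_rate(unemployed, labor_force, decimals=1):
    """Unemployed as a percentage of the civilian labor force."""
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return round(unemployed / labor_force * 100, decimals)


# Hypothetical counts: 50 unemployed in a labor force of 1,000.
print(unemployment_rate(50, 1000))
print(unemployment_rate(1, 3, decimals=2))
```

Keeping the unrounded counts on the file as well lets the rate be recomputed later if the rounding decision changes.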