Explore the process of coding, data cleaning, and statistical analysis in social research. Learn about the logic of data manipulation and the role of computers. Discover the methods for developing code categories and constructing a codebook for error-free data entry.
Quantifying Data
Advanced Social Research (soci5013)
Peter Njuguna
Source: Course Pack, Chapter 14 (pages 383–395)

Overview
• Data analysis
  • Statistical
  • Quantitative
  • Mostly computer-aided nowadays
• Process
  • Mass observations
  • Quantification (through coding)
  • Coding error reduction (data cleaning)
  • Data analysis

Introduction
• Social science data (largely non-numeric)
• Machine readability, manipulation
• Logic of data manipulation in quantitative analysis
• Biological & physical science data (mostly numeric attributes, e.g. counts, pH, length, temperature)
• Baseline: the logic remains the same even as more powerful technology is developed
• Computers are tools to enhance research. They understand only the basics

Computers in Social Research
• France (1801): Joseph-Marie Jacquard (automatic loom; punched cards controlling weaving patterns)
• USA (1790): 10-year census, under 4 million people
• 1880: about 50 million people (9 years to tabulate!)
• 1890: Herman Hollerith's punched card system (results reported in 6 weeks)
• Tabulating Machine Co. + mergers = IBM
• Baseline: information coding, storage, retrieval
• Today's computer data analysis: converting observations into machine-readable form; electronic data storage, retrieval, manipulation and presentation
• Statistical analysis (some programs are specific to social science, e.g. SPSS)

Coding for Quantitative Analysis
• Social science methods (interviews, questionnaires, etc.)
• Open-ended & closed-ended questions: non-numeric responses
• Coding reduces responses to a limited set of attributes to enable analysis
  • Using a pre-established coding scheme: results comparable with other studies
  • Coding from the data set (the responses): flexibility, fuller coverage of responses
• The coding system should be appropriate to the theoretical concepts
• If data are coded to maintain detail, codes can be combined later where detail is not necessary, but not vice versa
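
The last point above can be illustrated with a minimal Python sketch. The occupation codes, labels and broad categories are hypothetical examples, not taken from the course pack. The sketch shows that detailed codes can always be collapsed into broader ones with a many-to-one mapping.

```python
# Hypothetical detailed occupation codes assigned during coding.
DETAILED_LABELS = {
    1: "nurse",
    2: "physician",
    3: "primary school teacher",
    4: "university lecturer",
    5: "farmer",
}

# Hypothetical many-to-one mapping from detailed codes to broad categories.
COLLAPSE = {
    1: "health",
    2: "health",
    3: "education",
    4: "education",
    5: "agriculture",
}


def collapse(detailed_codes):
    """Recode a list of detailed codes into broad categories."""
    return [COLLAPSE[code] for code in detailed_codes]


responses = [1, 3, 2, 5, 4, 1]   # coded answers from six respondents
broad = collapse(responses)
print(broad)
# ['health', 'education', 'health', 'agriculture', 'education', 'health']
```

The reverse step is not possible: once a response is stored only as "health", nothing in the data says whether the respondent was a nurse or a physician.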

Developing code categories
• Well-developed coding scheme
  • Derived from research purpose
  • Existing coding scheme (comparable)
  • Generate codes from your data
• Many possible schemes (cf. pp. 388, 389), specific to your research purpose
• Review for recoding as you progress
• Code categories should be:
  • Exhaustive
  • Mutually exclusive
• Coder reliability (including your own) is crucial
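
As a rough sketch of how these requirements might be checked, the Python below tests a set of age categories for gaps (not exhaustive) and overlaps (not mutually exclusive), and computes simple percent agreement between two coders. The category boundaries, function names and example codes are all assumptions made for illustration. Percent agreement is only one of several possible reliability measures.

```python
def check_categories(categories, low, high):
    """Return (gaps, overlaps) for integer values low..high inclusive.

    categories maps a label to an inclusive (minimum, maximum) range.
    A gap is a value no category covers; an overlap is a value that
    falls into more than one category.
    """
    gaps, overlaps = [], []
    for value in range(low, high + 1):
        matches = [label for label, (lo, hi) in categories.items()
                   if lo <= value <= hi]
        if not matches:
            gaps.append(value)
        elif len(matches) > 1:
            overlaps.append(value)
    return gaps, overlaps


def percent_agreement(coder_a, coder_b):
    """Share of cases to which two coders assigned the same code."""
    if not coder_a or len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same non-empty set of cases")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)


# Hypothetical flawed scheme: age 30 fits two categories, age 51 fits none.
flawed = {"young": (18, 30), "middle": (30, 50), "older": (52, 99)}
print(check_categories(flawed, 18, 99))   # ([51], [30])

# Revised scheme: every age from 18 to 99 fits exactly one category.
revised = {"young": (18, 29), "middle": (30, 50), "older": (51, 99)}
print(check_categories(revised, 18, 99))  # ([], [])

# Two coders coding the same four open-ended responses.
print(percent_agreement([1, 2, 3, 3], [1, 2, 3, 1]))  # 0.75
```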

Codebook construction
• Codebook: describes the location of variables and the assignment of codes to attributes
• Primary guide in the coding process
• Guide for locating variables & interpreting codes in the data file during analysis
• Contains:
  • Variable names
  • Full descriptions (cf. exact wording of questions)
  • Categorized response options
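
One way to picture a codebook is as a small lookup structure. The sketch below is a minimal Python illustration. The variable names (SEX, ATTEND), column positions, question wording and codes are hypothetical and not taken from the course pack.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str        # short variable name used in the data file
    column: int      # location of the variable in each record
    question: str    # full description, e.g. exact question wording
    codes: dict      # numeric code -> response category


# Hypothetical two-variable codebook.
CODEBOOK = {
    "SEX": Variable(
        name="SEX",
        column=2,
        question="What is your sex?",
        codes={1: "Male", 2: "Female", 9: "No answer"},
    ),
    "ATTEND": Variable(
        name="ATTEND",
        column=3,
        question="How often do you attend religious services?",
        codes={1: "Never", 2: "A few times a year", 3: "Weekly or more",
               9: "No answer"},
    ),
}


def decode(variable_name, code):
    """Translate a stored code back into its response category."""
    variable = CODEBOOK[variable_name]
    return variable.codes.get(code, "UNDEFINED CODE")


print(decode("SEX", 2))      # Female
print(decode("ATTEND", 7))   # UNDEFINED CODE
```

The same structure serves both uses named on the slide: coders look up which code to assign, and analysts look up where a variable sits and what a stored code means.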

Coding and data entry options (1)
• Transfer sheets
  • Useful technique, especially with complex questionnaires and other data sources
• Source: Course pack, p. 391

Coding and data entry options (2)
• Edge-coding
• Direct data entry (pre-coded questionnaires)
• Data entry by interviewers
  • e.g. CATI (computer-assisted telephone interviewing)
  • Closed-ended data ready for analysis
  • Open-ended responses: additional coding step before analysis
• Coding to optical scan sheets
  • Coder error high
  • Low scanner tolerance
  • Direct coding on optical scan sheets by respondents
• Connecting with a data analysis program
  • e.g. SPSS: blank data sheet, then entry, then analysis
  • Create the data set elsewhere (spreadsheet, etc.), then import & export
  • Compatibility options are well developed
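
The import and export route can be sketched with a plain CSV file, a format that spreadsheets and statistical packages can generally exchange. This is a minimal Python illustration using only the standard library. The field names and records are hypothetical, and the text is kept in memory rather than written to disk.

```python
import csv
import io

FIELDS = ["ID", "SEX", "ATTEND"]   # hypothetical variable names

# Hypothetical coded records, one per respondent.
rows = [
    {"ID": 1, "SEX": 1, "ATTEND": 3},
    {"ID": 2, "SEX": 2, "ATTEND": 1},
    {"ID": 3, "SEX": 2, "ATTEND": 9},
]


def export_csv(records):
    """Write coded records as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def import_csv(text):
    """Read CSV text back into records, converting codes to integers."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [{key: int(value) for key, value in row.items()} for row in reader]


csv_text = export_csv(rows)
restored = import_csv(csv_text)
print(restored == rows)   # True
```

CSV stores everything as text, so the import step has to convert codes back to numbers. Code labels are not carried along either, which is one reason the codebook stays the primary guide.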

Screening and elimination of errors (Data cleaning)
• Errors are almost inevitable
  • Incorrect coding
  • Incorrect reading of codes
  • Incorrect sensing of marks, etc.
• Two types of data cleaning methods
  • Possible-code cleaning
    • Checking for errors as data are entered (“beep!”)
    • Testing for illegitimate codes in stored data files
  • Contingency cleaning
    • Checking that only cases to which an attribute applies have entries for it (e.g. a number of pregnancies recorded for a man is inappropriate)
• Some errors can be ignored (depending on their significance; researcher's discretion)
• Remember that “dirty” data almost always produce misleading results
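
Both cleaning methods can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the variable names, the legal code sets, and the convention that 98 means "not applicable" for PREGNANCIES.

```python
# Hypothetical legal codes per variable (1 = male, 2 = female, 9 = no answer;
# PREGNANCIES is a count from 0 to 20, or 98 for "not applicable").
LEGAL_CODES = {
    "SEX": {1, 2, 9},
    "PREGNANCIES": set(range(0, 21)) | {98},
}
NOT_APPLICABLE = 98


def possible_code_errors(records):
    """Possible-code cleaning: list (ID, variable, value) for illegal codes."""
    errors = []
    for record in records:
        for variable, legal in LEGAL_CODES.items():
            if record[variable] not in legal:
                errors.append((record["ID"], variable, record[variable]))
    return errors


def contingency_errors(records):
    """Contingency cleaning: IDs of men with a pregnancy count recorded."""
    return [record["ID"] for record in records
            if record["SEX"] == 1 and record["PREGNANCIES"] != NOT_APPLICABLE]


records = [
    {"ID": 1, "SEX": 1, "PREGNANCIES": 98},
    {"ID": 2, "SEX": 2, "PREGNANCIES": 3},
    {"ID": 3, "SEX": 7, "PREGNANCIES": 0},   # 7 is not a legal SEX code
    {"ID": 4, "SEX": 1, "PREGNANCIES": 2},   # legal codes, impossible combination
]

print(possible_code_errors(records))   # [(3, 'SEX', 7)]
print(contingency_errors(records))     # [4]
```

Case 4 passes the possible-code check because each value is legal on its own. Only the contingency check, which looks at two variables together, catches it.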

AT LONG LAST … YOUR DATA IS READY FOR ANALYSIS … GO!