Survey Research Methods

Survey Research Methods 2. Reliability, Validity and Scale Construction
Steve Fisher, Robert Andersen and Anthony Heath
Department of Sociology, University of Oxford

The Logic of Sampling & Measurement

General Problem of Measurement: Reliability & Validity
Reliability
• Refers to the ability of the measurement procedure to yield consistent results when it is replicated
Validity
• Refers to the extent to which the measurement procedure actually measures the concept that it is intended to measure

Different aspects of reliability: Test-retest, Inter-observer, Inter-item
Test-retest reliability
• Consistency of repeated measurements on the same subjects
• Used for concepts that are believed to be stable
• N.B. subjects may genuinely change between measurements; repeated measurement may also condition their answers
Inter-observer reliability
• Repeated measures by different observers on the same subject
• Especially important in coding open-ended questions
Inter-item reliability
• Do the items in a composite measure correlate highly with one another?
• Cronbach’s alpha
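Inter-observer reliability when coding open-ended answers is often summarised with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is one common choice and is not named in the slides; the codes assigned by the two coders are hypothetical.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same cases."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of cases")
    n = len(coder_a)
    # Observed agreement: share of cases given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes given to ten open-ended answers (illustrative data only).
coder_a = ["econ", "econ", "health", "crime", "econ",
           "health", "crime", "econ", "health", "econ"]
coder_b = ["econ", "econ", "health", "crime", "health",
           "health", "crime", "econ", "econ", "econ"]

kappa = cohen_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Raw percentage agreement would overstate reliability here because some agreement is expected by chance alone; kappa subtracts that expected share before rescaling.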

Types of Validity 1. Face and Content
Face validity
• Basically a subjective measure of validity
• Does it seem that we are measuring what we claim?
Content validity
• Does the content of the measuring instrument cover the full domain of the concept?
• e.g., a measure of left/right ideology requires items tapping different but related things like redistribution, privatisation, government intervention etc.

Types of Validity 2. Criterion-related Validity
Criterion-related validity
• Correlation with other measures known to be valid (e.g., questionnaire measures of turnout validated against registers)
• Must know the criterion itself has been measured well
• Appropriate criteria do not always exist
Predictive validity
• Does our measure predict expected outcomes?
• e.g., attitudes to taxes can be validated by their ability to predict electoral support for a tax-cutting party. But other factors influence voting behaviour, so this is not clear-cut and is more theory-dependent

Types of Validity 3. Construct Validity
• Based on a theoretical prediction about the relationship between the concept and other items
• Does the measured concept relate empirically to other measured variables in ways that are theoretically expected (i.e., does the measure yield the expected correlations)?
• Theory-laden and rather weak, but at least it can always be attempted
• N.B. the lack of an expected correlation may reflect a bad theory, or poor measurement of one of the other variables involved
• If your measure has been taken to test a theory, you cannot use the same theory to test the construct validity of the measure

Reliability and validity: Some conclusions
• Reliability is relatively straightforward to demonstrate; validity is much more difficult and often theory-laden
• You don’t necessarily have to demonstrate reliability and validity every time a measurement procedure is used
• Sometimes there will be existing measures which previous researchers have shown to be valid and reliable, and you can simply borrow these
• There are remarkably few such measures in sociology and political science. See Heath, A. and J. Martin (1997) “Why Are There so Few Formal Measuring Instruments in Social and Political Research?” in Lyberg et al. (eds.) Survey Measurement and Process Quality. New York: Wiley.

Advantage of Composite Measures
• Based on the idea of triangulation: combining several reliable measures improves validity
• Error scores tend to cancel out when we sum over items
• Gives greater variability in respondents’ scores
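The claim that error scores tend to cancel out can be illustrated with a small simulation. This is a sketch under simple assumptions that are not in the slides: each item equals a respondent's true score plus independent random error of equal variance, and the sample size, number of items and seed are arbitrary.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def simulate(n_respondents=2000, n_items=6, seed=1):
    """Correlate the true score with one item and with the summed scale."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_respondents)]
    # Assumption: item = true score + independent error with the same variance.
    items = [[t + rng.gauss(0, 1) for _ in range(n_items)] for t in true_scores]
    single_item = [row[0] for row in items]
    composite = [sum(row) for row in items]
    return pearson(true_scores, single_item), pearson(true_scores, composite)


r_single, r_composite = simulate()
print(f"single item vs true score: {r_single:.2f}")
print(f"six-item sum vs true score: {r_composite:.2f}")
```

Under these assumptions the summed scale tracks the true score more closely than any single item, because the independent errors partly offset one another while the true score accumulates.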

Measuring abstract concepts: Indices and scales
• Composite measures of an abstract concept
• Add together the scores assigned to several different measures of a construct
• Assume an underlying continuum (i.e., there is structure in the data); scores represent specific points along the continuum
• Indices: Individual measures can be distinct, but together they represent a larger abstract concept (e.g., Consumer Price Index, United Nations index of the best countries to live in)
• Scales: All indicators measure a single, unidimensional concept (e.g., Likert scales measuring attitudes)

Measuring attitudinal intensity: Likert scales
• Evenly balanced response choices
• Mix the direction of questions to avoid response set bias
• Standard response formats typically have five categories for each item, for example:
  • Strongly agree, agree, disagree, strongly disagree, undecided
  • Definitely like to definitely dislike
  • Very important to very unimportant
  • Definitely true to definitely false
• Assign scores to the response categories (1-5)
• Sum the item scores (or average them if desired)

Using scales to measure political attitudes
• For each statement, please circle the category that best reflects your opinion
  • I believe that I can help change the minds of public officials
  • Sometimes politics and government seem so complicated that a person like me can’t really understand what is going on
  • People should vote in elections because each individual vote can make a difference
  • Generally, those elected to parliament soon lose touch with people
  • The government cares about what people like me think
  • I doubt that individual people like me could influence the platforms of political parties
• Each of these items is a Likert item with response categories ranging from strongly agree to strongly disagree
Adapted from: Gray, G. and N. Guppy (1999). Successful Surveys: Research Methods and Practice. Toronto: Harcourt Brace, p. 72.
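Scoring a set of Likert items like the six statements above might be sketched as follows. The item labels q1 to q6, the choice to treat "undecided" as the midpoint, and the decision to reverse-score statements 2, 4 and 6 (so that a high total means a stronger sense of political efficacy) are all assumptions made for illustration, not part of the original instrument.

```python
# Assumed scoring: 1-5 with "undecided" as the midpoint.
SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "undecided": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Assumed negatively worded items (statements 2, 4 and 6 above).
REVERSED = {"q2", "q4", "q6"}


def score_item(item, response):
    """Convert one response to a 1-5 score, reversing negatively worded items."""
    raw = SCORES[response.lower()]
    return 6 - raw if item in REVERSED else raw


def scale_score(answers, average=False):
    """Sum (or average) the item scores for one respondent."""
    scores = [score_item(item, response) for item, response in answers.items()]
    total = sum(scores)
    return total / len(scores) if average else total


# Hypothetical respondent.
respondent = {
    "q1": "agree",
    "q2": "strongly disagree",
    "q3": "strongly agree",
    "q4": "disagree",
    "q5": "undecided",
    "q6": "agree",
}

print(scale_score(respondent))                # summed score
print(scale_score(respondent, average=True))  # mean item score
```

Reverse-scoring matters because the questions deliberately mix direction: without it, agreeing with a negatively worded statement would wrongly raise the total.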

Some other types of scales to consider
Feeling thermometers
• Scale of 0 (very cold) to 100 (very warm)
• Often used to determine how people feel about certain groups (used in AES)
Bogardus Social Distance Scale
• Measures the distance separating ethnic or other groups from each other
Semantic Differential Scales
• Measure subjective feelings using polar opposite adjectives (e.g., light/dark, deep/shallow, modern/traditional, bad/good)

An example from the BSA: Measuring income satisfaction
• It is sometimes useful to give more detailed response categories
• e.g., Which of the phrases on this card would you say comes closest to your feelings about your household’s income these days?
  • Living comfortably on present income
  • Coping on present income
  • Finding it very difficult on present income
  • Other answer (WRITE IN)
  • (Don’t know)
• Feel free to be creative

Constructing Scales
• Choose items
• Check for dimensionality using factor analysis
• Compute a scale (i.e., sum the scores on the items, recoding if necessary)
• Check reliability of the scale using Cronbach’s alpha and bivariate correlations
• Make adjustments and test reliability again if necessary
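The reliability check in the steps above can be sketched with the usual variance-based formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale). The response matrix below is made up for illustration, and the items are assumed to have been recoded already so that they all point in the same direction.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (columns) and of the summed scale (row totals).
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("scale totals do not vary, so alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents answering three 1-5 items.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

To make adjustments, one option is to recompute alpha with each item dropped in turn and see whether reliability improves without that item.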

How big an alpha?
• Early stages of research: >0.7
• Individual-level comparisons: >0.9
• Even with 0.9 the standard error of measurement is almost a third of the standard deviation of test scores (Nunnally and Bernstein, 1994).
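The "almost a third" figure follows from the classical test theory formula for the standard error of measurement, SEM = SD * sqrt(1 - reliability). A minimal sketch, with the function name chosen here for illustration:

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


# With SD set to 1, the result is the SEM as a fraction of the standard deviation.
for alpha_level in (0.7, 0.9):
    fraction = standard_error_of_measurement(1.0, alpha_level)
    print(f"alpha = {alpha_level}: SEM is {fraction:.2f} of the SD")
```

With alpha = 0.9 the fraction is sqrt(0.1), roughly 0.32, which is why a higher threshold is suggested when individual scores are compared.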

Group work
• Spend the next session working in groups in Seminar room A or B
• The aim is to design a set of questions that can be used to build a valid and reliable scale for your concept
• Bring a draft questionnaire with you next week
