
Operationalizing concepts




Presentation Transcript


  1. DTC Quantitative Methods: Operationalization of Concepts, Questionnaire Design and Scale Construction, Thursday 24th January 2013

  2. Operationalizing concepts Qualitative research: concepts are often derived from data (e.g. via ‘Grounded Theory’) Quantitative research: data are ‘collected’ (constructed) using pre-determined measures of the concepts of interest. The operationalization of concepts is therefore a vital process in terms of the validity and reliability of the data collected. How good are the measures used as indicators of the concepts of interest? (For a discussion of these issues see the online course extract from De Vaus, 2002).

  3. How theory and data are linked… [Diagram linking: THEORY, CONCEPTS, SUB-CONCEPTS, DATA, QUESTIONS, INDICATORS, CODING]

  4. The relationship between ‘class’ and ‘health’ Table 1: Death rates by sex and social (occupational) class (15–64 years), rates per 1,000 population, England and Wales, 1971 [table not reproduced]. • Source: Occupational Mortality 1970–72 (Decennial Supplement). • Adapted from: Townsend, P. and Davidson, N. 1982. Inequalities in Health: the Black Report. Harmondsworth: Penguin.

  5. Standardization of rates • When examining the relationship between class and health, it will often be desirable to control for other factors, for example, via a statistical multivariate analysis. • More specifically, it is common to present death rates which are standardised for age, such as the Standardised Mortality Ratio (SMR; as discussed by Marsh and Elliott, 2009, and earlier by Marsh, 1988#). • Such measures remove the impact of age structure on rates, and thus allow appropriate comparisons to be made over time, or between countries with different age structures. [#Marsh, C. 1988. Exploring Data. Cambridge: Polity Press.]
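To make the arithmetic concrete, here is a minimal Python sketch of an SMR calculation; the age bands, rates and counts are invented for illustration. Expected deaths are obtained by applying reference-population age-specific rates to the study group's population, and the SMR is the ratio of observed to expected deaths (conventionally multiplied by 100).

```python
# Hypothetical age-specific death rates in a reference population (per person per year)
reference_rates = {"15-44": 0.001, "45-54": 0.005, "55-64": 0.015}

# Hypothetical study group: population at risk and observed deaths in each age band
study_population = {"15-44": 20000, "45-54": 8000, "55-64": 5000}
observed_deaths = {"15-44": 30, "45-54": 55, "55-64": 95}

# Expected deaths: apply the reference rates to the study group's age structure
expected = sum(reference_rates[band] * study_population[band] for band in reference_rates)
observed = sum(observed_deaths.values())

smr = 100 * observed / expected  # 100 = same mortality as the reference population
print(f"Expected deaths: {expected:.1f}, observed: {observed}, SMR: {smr:.0f}")
```

With these invented figures the SMR is about 133, i.e. roughly a third more deaths than would be expected if the group experienced the reference population's age-specific rates.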

  6. Measuring inequality • Marsh and Elliott (2009) also discuss some other ‘specific’ quantitative measures, such as the Gini coefficient for measuring (income) inequality • This can be adapted for use in measuring some forms of health inequalities, occupational segregation, etc. • Overtly quantitative disciplines such as economics and demography can be useful sources of measures that can be adapted for specific purposes by researchers in other social science disciplines!
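As an illustration of the kind of ‘specific’ measure being referred to, here is a minimal sketch of the Gini coefficient using the mean-absolute-difference formula; the income figures are invented.

```python
import numpy as np

def gini(incomes):
    """Gini coefficient: G = (sum_i sum_j |x_i - x_j|) / (2 * n^2 * mean(x))."""
    x = np.asarray(incomes, dtype=float)
    mean_abs_diff = np.abs(x[:, None] - x[None, :]).mean()  # mean absolute difference over all pairs
    return mean_abs_diff / (2 * x.mean())

print(gini([20000, 20000, 20000, 20000]))  # 0.0: perfect equality
print(gini([5000, 10000, 20000, 100000]))  # about 0.55: markedly unequal
```

The coefficient runs from 0 (everyone has the same income) towards 1 (one unit holds everything), which is what makes it adaptable to other forms of inequality.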

  7. Social mobility table Class distribution of respondents by class of father at respondent’s age 14 [table not reproduced]. Adapted from Table 2.2 in Goldthorpe, J.H. with Llewellyn, C. and Payne, C. 1987. Social Mobility and Class Structure in Modern Britain (2nd edition). Oxford: Clarendon Press. [Page 49].

  8. How should class be measured? • The preceding slide shows a table constructed using the Goldthorpe class schema – but what do the categories mean? • Social class is an interesting concept, as it is a common feature of ‘lay’ discussions without being especially clearly defined. One might therefore usefully ask: ‘What IS social class?’ • Or, if one is happy to restrict attention to occupational social class: ‘What are the theoretically important characteristics of occupations that should be used to allocate occupations to particular social classes?’ • Goldthorpe classes are, supposedly, (neo-)Weberian in theoretical origin, reflecting the different market and work situations of people with different occupations (and emphasising features such as economic prospects, and authority and autonomy at work). • Another sociological class schema owes more theoretically to Marx: Erik Olin Wright’s neo-Marxist class schema (Wright, 1985). Wright, E.O. 1985. Classes. London: Verso.

  9. An aside on odds ratios as a measure of differences between groups/inequality: • Returning to the measurement of inequality, the patterns in the earlier social mobility table can be expressed as differences in percentages (e.g. the differences between the percentages of sons with fathers in classes I and VII who are themselves in classes I and VII). • However, an alternative way of quantifying these class differences is to compare the odds of class I fathers having sons in class I as opposed to class VII with the odds of class VII fathers having sons in class I as opposed to class VII. • The ratio of these two sets of odds is an odds ratio, which will have a value close to 1.0 if the two sets of odds are similar, i.e. if there is little or no difference between the chances of being in classes I and VII for sons with fathers in classes I and VII respectively.
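A worked sketch of this odds-ratio calculation, using invented counts for a 2x2 extract of a mobility table:

```python
# Hypothetical counts of sons in class I vs class VII, by father's class (I vs VII)
sons_of_class_I_fathers = {"I": 120, "VII": 30}
sons_of_class_VII_fathers = {"I": 25, "VII": 150}

# Odds of a son being in class I rather than class VII, for each group of fathers
odds_I = sons_of_class_I_fathers["I"] / sons_of_class_I_fathers["VII"]      # 4.0
odds_VII = sons_of_class_VII_fathers["I"] / sons_of_class_VII_fathers["VII"]  # about 0.17

odds_ratio = odds_I / odds_VII
print(f"Odds (fathers in I): {odds_I:.2f}, odds (fathers in VII): {odds_VII:.2f}, OR: {odds_ratio:.1f}")
# An odds ratio close to 1.0 would indicate little or no difference between the two sets of odds;
# here the (invented) figures give an odds ratio of 24, i.e. a very marked class difference.
```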

  10. Pros and cons of different class measures • Wright’s classes are not very good predictors of empirical outcomes • His schema has a high level of content validity (i.e. it looks credible in theoretical terms) but is less good in terms of construct validity. • Conversely, Registrar General’s Social Class (RGSC), which was used in UK official statistics until 2001, was frequently criticised for its lack of clear theoretical underpinnings but related relatively strongly to class-related outcomes (i.e. it was better in terms of its construct validity). • An awareness of the limitations of Registrar General’s Social Class (and a more detailed set of occupational categories, Socio-Economic Groups) led to an ESRC (Economic and Social Research Council) Review of the OPCS (since renamed ONS [Office for National Statistics]) Social Classifications • This aimed to produce more theoretically and empirically satisfactory sets of social classifications (Rose and O’Reilly, 1997). Rose, D. and O’Reilly, K. (eds) 1997. Constructing Classes: Towards a New Social Classification for the UK. Swindon: ESRC/Office for National Statistics.

  11. NS-SEC • The revised social classification is discussed in Roberts (2001: online course extract; see also Rose and Pevalin, 2002; Rose et al., 2005). • Its finalised version, NS-SEC (the National Statistics Socio-Economic Classification) is now a standard feature of British official statistics (for example, it has been used in analyses of the 2001 Census data). • NS-SEC echoes key aspects of the conceptual basis of Goldthorpe’s schema. • Further details can be found within the ONS ‘Guidance and Methodology’ web pages under ‘Standard Classifications’.

  12. …so what is ‘class’? • To quote from the ONS web pages, “The NS-SEC has been constructed to measure the employment relations and conditions of occupations (see Goldthorpe 2007). Conceptually, these are central to showing the structure of socio-economic positions in modern societies and helping to explain variations in social behaviour and other social phenomena.” • Of course, an implication of this is that UK official statistics conceptualise ‘class’ in a particular way, theoretically-speaking (and they don’t call it ‘class’!)

  13. Occupational categories • Sets of class categories such as those described in the preceding slides are typically based on a list of occupations known as the Standard Occupational Classification, or SOC (ONS, 2000). • There is also a similar, international list: ISCO {International Standard Classification of Occupations}. See http://www.ilo.org/public/english/bureau/stat/isco/index.htm • Occupations are matched to one of the occupations on this list, and cross-classified with an individual’s employment status to identify to which category (or sub-category) of the schema in question that individual belongs. • A quite detailed set of questions needs to be asked for an individual’s occupation to be matched in an unambiguous way to one of the SOC occupations and/or to a class category.
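A minimal sketch of the cross-classification step described above, with entirely hypothetical occupation codes and category labels; in practice the derivation uses the official SOC/NS-SEC lookup tables rather than a hand-written dictionary.

```python
# (occupation code, employment status) -> class category  [illustrative values only]
lookup = {
    ("1234", "employee"): "Lower professional",
    ("1234", "self-employed"): "Own account worker",
    ("5678", "employee"): "Routine",
}

def derive_class(occupation_code, employment_status):
    """Return the class category for a coded occupation cross-classified with employment status."""
    return lookup.get((occupation_code, employment_status), "Unclassified")

print(derive_class("1234", "employee"))       # Lower professional
print(derive_class("1234", "self-employed"))  # Own account worker
```

The point of the sketch is simply that the same occupation code can be allocated to different categories depending on employment status, which is why a fairly detailed set of questions is needed.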

  14. Why seven classes? Why not a scale? • How many classes should one bring into consideration in a piece of research? • Why use a set of categories (i.e. a nominal or ordinal-level variable) rather than a position on a scale (interval-level variable)?

  15. Perspective and pragmatism • An answer to the above questions is likely to mix theoretical and practical considerations. However, the need for aggregating categories to produce a smaller number of ‘classes’ in some circumstances is fairly clear, which has the implication that we need to be able to ‘recode’ variables in our statistical analyses (see the sketch below). • It is notable that there is a high-profile occupational scale (the Cambridge Scale; see Prandy, 1990) in British sociological research, and, perhaps more importantly, that North American social researchers also often use the position of occupations on a scale to measure ‘Socio-Economic Status’ (SES). • Scales and sets of categories have different pros and cons (e.g. in terms of the level of detail measured/represented), and lead to the use of different techniques of statistical analysis. • It is worth noting that the Cambridge Scale is derived empirically by looking at the behaviour of people with different occupations rather than allocating occupations to categories on a more overtly ‘conceptually-driven’ basis. But, either way, the discussion so far has focused on ‘objective’ measures of class, rather than subjective self-identified measures. Prandy, K. 1990. ‘The Revised Cambridge Scale of Occupations’, Sociology, 24.4: 629-655.
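A minimal sketch of the kind of recoding referred to above, collapsing a seven-category class variable into three aggregated groupings; the grouping shown is illustrative rather than an official collapse of any particular schema.

```python
import pandas as pd

# Hypothetical respondents with a seven-category class variable
df = pd.DataFrame({"class7": ["I", "II", "III", "IV", "V", "VI", "VII", "III"]})

# Illustrative collapse of seven categories into three broader groupings
recode_map = {
    "I": "Service", "II": "Service",
    "III": "Intermediate", "IV": "Intermediate", "V": "Intermediate",
    "VI": "Working", "VII": "Working",
}
df["class3"] = df["class7"].map(recode_map)

print(df["class3"].value_counts())
```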

  16. Class… and gender • If we were considering the social classes of married women in a piece of research, should we consider their own occupations, or the occupations of their husbands? • Does it make any difference whether we are thinking about: (i) their attitudes towards work; (ii) their material circumstances; (iii) their voting behaviour; (iv) the occupational attainment of their children?

  17. Politics and practicalities • There has been a long-standing sociological debate on this issue. See Roberts (1993) for a discussion. Presumably one should work on the basis of the relative theoretical and empirical merits of different approaches! • Note that a decision to base a married person’s class on both their own and their partner’s occupation can lead to a need to aggregate information from two variables and derive a new variable for one’s statistical analyses. Roberts, H. 1993. ‘The women and class debate’. In Morgan, D. and Stanley, L. (eds) Debates in Sociology. Manchester: Manchester University Press.

  18. Is social position best thought of in terms of ‘class’? • It is worth noting that recent political discussions of social mobility (e.g. a November 2008 Cabinet Office Strategy Unit document: see the link below) draw upon evidence of intergenerational income mobility as much as upon evidence of intergenerational class mobility, suggesting that social position (in the context of the concept of ‘social mobility’) is being viewed as much in terms of income as in terms of class! http://www.cabinetoffice.gov.uk/media/cabinetoffice/strategy/assets/socialmobility/gettingon.pdf • More generally, in recent years a number of authors have suggested that it may be appropriate to bring additional dimensions into considerations of ‘class’/‘social’ position, over and above the ‘occupational dimension’ (e.g. an ‘educational’ dimension, leading to considerations of educational mobility).

  19. Multi-item scales • Some measures that take the form of scales are derived from multiple items • An example is the frequently-used GHQ (General Health Questionnaire), a scale based on the aggregation of 30 items such as “... have you recently ... been feeling reasonably happy, all things considered?” and “... have you recently ... been losing confidence in yourself?” • It was originally designed as a screening instrument for psychiatric illness, but has been used by researchers as a measure of self-perceived well-being (Blaxter, 1990). • Something similar (WEMWBS) has recently been developed at Warwick, and has already been subject to processes of critique and revision (see Stewart-Brown et al., 2009). Blaxter, M. 1990. Health and Lifestyles. London: Tavistock. Stewart-Brown, S., Tennant, R., Tennant, A., Platt, S., Parkinson, J. and Weich, S. 2009. ‘Internal construct validity of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS): a Rasch analysis using data from the Scottish Health Education Population Survey’, Health and Quality of Life Outcomes 7.15.

  20. Using existing measures of concepts Reasons for using existing measures of concepts: • Availability (although even secondary analysts may be able to reconfigure data to produce their own measures or variants of measures, e.g. a different class schema) • Comparability • The time and effort saved • Existing measures may also have the benefit of being ‘tried and trusted’ (e.g. thoroughly piloted). However, the crunch question is whether there is an adequate match in terms of validity between the concept as operationalized and the researcher’s intended use for it. For those considering using existing measures/questions, the Survey Question Bank (SQB) (see web pages) is a helpful on-line resource containing measures and questions from various high-profile UK government and academic surveys.

  21. A doctoral example: Measuring cultural capital In her research on cultural capital and educational attainment, Sullivan operationalises the former concept (as discussed by Bourdieu) multidimensionally, using the following types of indicator, which were included as six explanatory variables in her statistical analyses of GCSE results: “1. Activities • Reading: type and amount of books read, library use, newspapers read. • Television: type of programmes watched. • Music: type of music listened to, playing an instrument • Participation in ‘public’ or ‘formal’ culture: art gallery, theatre and concert attendance. 2. Cultural knowledge • Tested knowledge of famous cultural figures. 3. Language • Active and passive vocabulary test scores.” From: Sullivan, A. 2001. ‘Cultural Capital and Educational Attainment’, Sociology 35.4: 893-912. [p899]

  22. ...and an official example: Social capital • An operational definition and indicator/set of indicators for this concept, as written about by Bourdieu, Coleman, Putnam, etc., has been developed on behalf of the Health Development Agency using questions within the General Household Survey • See the National Statistics website for relevant material. (The material is listed under the heading ‘The Social Capital Project’ within the ‘Guidance and Methodology’ section). • The questions used to tap this multi-dimensional concept relate to civic engagement and aspects of neighbourhoods/local areas, social networks and support, trust and mutual reciprocity, etc.

  23. Questionnaire Design: Reading material • In addition to the online course extract from Aldridge and Levine (2001), there are useful chapters in Oppenheim (1992) and De Vaus (2001). Remember • While the research instrument in ‘face-to-face’ surveys is arguably more correctly referred to as the ‘interview schedule’, the literature often incorporates such instruments under the heading ‘questionnaire design’.

  24. The four stages of questionnaire design (i) “Mapping the semantic domain” This stage involves identifying for each of the concepts fundamental to the proposed research its components (i.e. sub-concepts), context (i.e. related and ‘similar but different’ concepts), and parameters (i.e. the conditions which must be present for the concept to apply). (ii) Formulating the question ideas This stage involves identifying the range of issues on which specific questions need to be asked, i.e. it identifies that a question has to be asked relating to a particular sub-concept: “We need to ask a question about...”. At this stage one is moving from concepts towards indicators. (iii) Writing the questions At this stage the specific wording of the questions is determined. (iv) Assembling the questionnaire At this stage the questions are assembled into a coherent, structured questionnaire. Source: Halfpenny, P., Parthemore, J., Taylor, J. and Wilson, I. 1992. ‘A knowledge based system to provide intelligent support for writing questionnaires’. In Westlake, A. et al. (eds) Survey and Statistical Computing. London: North Holland.

  25. Four aspects of a questionnaire’s contents De Vaus refers to (i) Measures of the dependent variables (ii) Measures of the independent variables (iii) Measures of “test” variables (i.e. variables that intervene between the independent and dependent variables, or which are temporally prior and possibly causally related to both) (iv) Background measures (i.e. ‘demographic’ variables)

  26. Preliminaries • Unstructured discussions with potential respondents/‘key informants’. The idea of this qualitative work is to give the researcher a more complete picture of the topic, and the language and perspectives of respondents. • A search for existing questions on the topic (e.g. via the ‘Question Bank’). However, inheriting measures from previous research carries a risk of reflecting the pre-conceptions of earlier researchers. • Piloting the draft questionnaire, to check for problems and to allow questions to be refined.

  27. Some general points • “Avoid putting ideas into the respondent’s mind early in the interview if we need spontaneous responses later on” (Oppenheim). • Beware of asking questions that unwittingly reveal the researcher’s attitude to the topic. • “A question that strikes the respondent as rude or inconsiderate may affect not only his [sic] reply to that particular question but also his [sic] attitude to the next few questions and to the survey as a whole” (Oppenheim) • Make the questionnaire attractive to the respondent by making it interesting and of obvious relevance to its stated purpose.

  28. A framework for question sequences Oppenheim puts forward the following chronological framework (derived from a Gallup schema): • Respondent’s awareness of issue. • Respondent’s general feelings about issue. • Respondent’s views on specific aspects of issue. • Reasons for respondent’s views. • Strength of respondent’s views.

  29. ‘Open’ or ‘Closed’ I? • ‘Open’ questions, where the respondent’s verbatim answer to the question is recorded in full, are easy to ask, less easy to answer and difficult to analyse. The emphasis is on the respondent’s perspective, but there is still the possibility that answers will reflect what is uppermost in respondents’ minds. • ‘Closed’ questions are easier to quantify, but result in a loss of spontaneity and expressiveness, and the ‘forced’ choice of answers may result in bias (and shift the balance towards the researcher’s perspective). • A compromise is to ask an ‘Open’ question and then a similar ‘Closed’ question later in the questionnaire/interview. • Oppenheim comments that “All closed questions should start their careers as open ones, except those where certain alternatives are the only ones possible”.

  30. ‘Open’ or ‘closed’ II? • Awareness of issue → Closed, • General feelings on issue → Open, • Views on specific aspects of issue → Closed, • Reasons for views on issue → Open or Closed, • Strength of views on issue → Closed. Source: De Vaus (1986), adapted from Gallup.

  31. Types of ‘forced choice’ answers • A Likert scale: e.g. a range of answers from ‘Strongly agree’ through to ‘Strongly disagree’. • A semantic differential scale: A range of positions between two extremes of a continuum (e.g. ‘Caring’ through to ‘Uncaring’). • A checklist: e.g. a list of leisure activities. • A ranking of items: e.g. placing the most important attributes of a potential partner in order. • A choice between statements: e.g. a choice of responses to the acquisition of the knowledge that one’s best friend’s partner is being unfaithful to them.
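A minimal sketch of how two of these answer formats might be stored as coded variables in a dataset; the numeric codes, labels and items are illustrative conventions rather than fixed standards.

```python
# Illustrative numeric coding for a five-point Likert item
likert_codes = {
    "Strongly agree": 5, "Agree": 4, "Neither agree nor disagree": 3,
    "Disagree": 2, "Strongly disagree": 1,
}

responses = ["Agree", "Strongly agree", "Neither agree nor disagree"]
coded = [likert_codes[r] for r in responses]
print(coded)  # [4, 5, 3]

# A ranking item is often stored as one variable per attribute, holding its rank
ranking = {"kindness": 1, "sense of humour": 2, "income": 3}
print(ranking["kindness"])  # 1 = ranked most important
```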

  32. Some key issues Some ‘crunch’ questions that one might ask: • Does the respondent understand the question (in the same way as the researcher does)? • Are they willing to answer it (accurately)? • And are they able to answer it?

  33. Scale construction • Researchers sometimes want to measure some latent characteristic of their respondents, e.g. whether they have a ‘traditional’ or a ‘modern’ viewpoint on couple relationships. • This is often done by asking a number of questions which each tap that characteristic and which, when aggregated, collectively do so in a more reliable way.
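A minimal sketch of this aggregation step, using three hypothetical Likert-type items coded 1-5, one of which is worded in the opposite direction and so is reverse-coded before summing.

```python
import pandas as pd

# Hypothetical responses from three respondents to three 1-5 Likert-type items
df = pd.DataFrame({
    "item1": [5, 2, 4],
    "item2": [4, 1, 5],
    "item3": [1, 5, 2],  # worded in the opposite direction to items 1 and 2
})

df["item3_rev"] = 6 - df["item3"]  # reverse-code on a 1-5 scale (1<->5, 2<->4)

# Aggregate the items into a single scale score for the latent characteristic
df["scale_score"] = df[["item1", "item2", "item3_rev"]].sum(axis=1)
print(df["scale_score"])  # higher score = more 'traditional', say
```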

  34. Some issues... • How does one know that a measure constructed by aggregating various items to give a scale is measuring the ‘right’ quantity, i.e. is a valid measure? • How does one ensure that what a measure is measuring is unidimensional, i.e. that it is not a composite of measures of two or more underlying concepts? • How does one assess which items need to be included to maximise the reliability of the measure? • And how does one assess the overall reliability of the scale?

  35. Some answers... • For a discussion of assessing various forms of validity, see Oppenheim (1992). • Unidimensionality can be assessed using a technique called factor analysis (see DeVellis, 2003). • Reliability can be assessed using a measure called Cronbach’s alpha (see De Vaus, 2001; DeVellis, 2003).

  36. Factor analysis Factor analysis generates a set of underlying factors which successively maximise the amount of (remaining) variation in the items that they can explain. If a scale is working properly unidimensionally, then the first factor will explain a high proportion of the variation, and the subsequent factors will each explain similar, small amounts.
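A rough sketch of this idea using simulated data. Rather than a full factor analysis, it uses a principal-components style screening (the eigenvalues of the items' correlation matrix) as an indicative check: when the items tap a single underlying trait, the first component accounts for most of the variation and the remaining components account for similar, small amounts.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)  # one underlying trait
# Five items, each equal to the trait plus independent measurement noise
items = np.column_stack([latent + rng.normal(scale=0.8, size=n) for _ in range(5)])

corr = np.corrcoef(items, rowvar=False)              # item correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first
explained = eigenvalues / eigenvalues.sum()

print(np.round(explained, 2))  # e.g. roughly [0.69, 0.08, 0.08, 0.08, 0.08]
```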

  37. Cronbach’s alpha • According to DeVellis (2003: 95), “Alpha is an indication of the proportion of variance in the scale scores that is attributable to the true score”. • Items are chosen for inclusion so as to maximise that proportion, i.e. items are retained if they have relatively high correlations with the rest of the items within the scale (viewed collectively). • De Vaus (2001) suggests that a value of at least 0.7 is preferable.
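A minimal sketch of the alpha calculation from a respondents-by-items matrix, using the standard variance-based formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score); the response data are invented.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a matrix with rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 respondents x 4 items, each coded 1-5
items = np.array([
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 1],
    [4, 4, 5, 5],
])
print(round(cronbach_alpha(items), 2))  # high for these strongly correlated toy items; at least 0.7 is usually preferred
```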

  38. Some additional issues... • Should all the items in the scale be treated as of equal importance? Or should their values be added in such a way as to increase/ decrease the relative importance of some items? • Are the gaps between the values that a variable can take uniform in meaning?
