Political Science 585: Techniques of Political Analysis • Concepts, Variables and Measurement • April 3, 2007
Key terms in today’s lecture • Quantitative and qualitative data • Operationalization • Working hypothesis • Indicator • Dummy variable • Multidimensional concepts • Measurement error • Validity (face and construct) • Reliability
Quantitative and qualitative techniques • In both quantitative and qualitative techniques, we proceed from the theory/hypothesis stages by operationalizing the relevant concepts, especially our independent and dependent variables. • Most of our focus, however, will be on quantitative techniques. This means that no matter what measurement strategy we use, our final product will be numerical in nature. • For example, if we measure income precisely (e.g. a person’s score might be 30K), the values will be obviously numerical. • However, if we choose a categorical measurement technique (e.g. ask people if they are “rich”, “middle income” or “poor”), we will translate those scores into numerical values (e.g. 2, 1, 0).
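The translation from verbal categories to numbers can be sketched in a few lines. This is only an illustration: the function name and the sample answers are invented, and the codes simply follow the 2, 1, 0 ordering mentioned above.

```python
# Coding scheme assumed from the slide's example: poor = 0, middle income = 1, rich = 2.
INCOME_CODES = {"poor": 0, "middle income": 1, "rich": 2}


def code_income(response):
    """Translate a verbal survey answer into its numeric code."""
    return INCOME_CODES[response.strip().lower()]


# Illustrative answers, not real survey data.
responses = ["Rich", "poor", "Middle income"]
coded = [code_income(r) for r in responses]
print(coded)  # [2, 0, 1]
```

An answer outside the three categories raises a `KeyError`, which is one way of forcing the researcher to decide how to treat unexpected responses rather than coding them silently.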
Quantitative and qualitative techniques (cont.) • Note that it is not the presence of verbal scores that makes a technique qualitative. • Asking a question about a person’s religion yields verbal answers, but we translate these answers into numerical values. • By contrast, in qualitative techniques, our data are mostly in the form of prose and discussion, and we do not generally seek to translate these responses into numerical categories. • In this case, the data need not be translated into numbers because the data analysis does not use numbers. It uses argument and interpretation. • That having been said, today’s discussion is primarily, if not exclusively, focused on an end goal of quantitative analysis.
Two major measurement decisions • To decide how to measure the concept captured by each of our key variables, we must make two key decisions: • The first is substantive: what concrete referent or referents in the real world are going to represent our abstract concept? • The second is the level and precision of measurement.
Measurement • In order to begin the process of testing our hypotheses, we need to figure out what concrete measures are going to “stand in” for our abstract concepts. We call this “operationalization”. • First, we should note that measurement decisions have little to do with whether the variable/concept in question is the independent or dependent variable. • It will matter later on when we choose techniques of data analysis, but it should not matter in measurement decisions.
Operationalizing Concepts • Generally, we are faced with one of two scenarios with respect to data: • Sometimes, we need to take this step in order to create an instrument: a survey question, or rubric for collecting data. • For example, we want to find out about individuals’ ideology, but don’t know what questions to ask them. • Other times, we have data, but we must consider what abstract concepts our data can stand in for. • For example, we have data on the number of terrorist attacks, the number of deaths and injuries, and the costs of damages in a country, but we don’t know which one is most appropriate as a measure of terrorism.
Example: Voting • Let’s look at a hypothesis based on our running example about voting: • Theory: Potential voters consider whether the benefits of voting outweigh the costs. • Hypothesis: The competitiveness of an election will increase voting. (Note: this is a different hypothesis than the one we looked at last time. As we mentioned, theories imply multiple hypotheses.) • Our variables here are competitiveness and voting. However, we do not yet have a sufficient level of concreteness. • Until you could tell a research associate exactly what data to collect and what it looks like, the process of operationalization is not complete.
Aspects of the operationalization process • For now, let’s assume we could represent our concept with one measure. We will return to cases where we need multiple measures later. • The important steps in the process are: • First, we want to analyze our concept. What is its “essence?” A good first step is always to define the term completely. Then we can start considering potential measures. • We want to be certain that our measure actually represents the concept itself, and not a related concept. • For example, ideology and party identification are closely related, but asking a person’s party in order to determine their ideology is incorrect. The definition of ideology relates to principles and policy preferences, and ideology would exist in a country with no parties.
Competitiveness • Let’s consider the concept in the voting example. Competitiveness is too abstract a concept in its current form. • At its essence, the concept of competitiveness is about whether each candidate has an equal chance of winning. This is the idea we’re trying to get at. • We might still ask a number of questions before choosing an indicator: • Do we care about real competitiveness or perceived competitiveness? • Do we care about Election Day as a snapshot, or the whole campaign period? • We might settle on recent polls, actual election results, or individuals’ perceptions. At this point, we have what Manheim calls a working hypothesis.
Aspects of the operationalization process (cont.) • The process is not complete yet. Second, we want to consider what type of data the measure we have chosen will yield. • Let’s say we decide to use opinion poll results. The difference between Candidates A and B one week before the election captures our concept well. Thus, this number (the absolute difference between A’s percentage and B’s percentage) would be what Manheim calls our indicator. • Now, what else should we know? • The level of measurement for our indicator. In this case, the indicator is interval level. • The range of our indicator. In this case, (50-50) = 0 yields the lowest value; (100-0) = 100 yields the highest value. • The meaning of measure values. 0 indicates a very competitive race; 100 indicates a very uncompetitive race.
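A minimal sketch of the poll-based indicator follows. It assumes the gap is taken as an absolute value, so that the indicator runs from 0 to 100 as the stated range implies; the function name and the input checks are illustrative choices, not part of Manheim's treatment.

```python
def poll_margin(pct_a, pct_b):
    """Absolute gap between two candidates' poll percentages.

    0 means a dead heat (very competitive); 100 means a walkover.
    """
    if not (0 <= pct_a <= 100 and 0 <= pct_b <= 100):
        raise ValueError("percentages must lie between 0 and 100")
    if pct_a + pct_b > 100:
        raise ValueError("the two percentages cannot sum to more than 100")
    # Assumption: the sign of the gap is dropped, since we care about
    # closeness of the race, not about which candidate leads.
    return abs(pct_a - pct_b)


print(poll_margin(50, 50))   # 0
print(poll_margin(100, 0))   # 100
print(poll_margin(44, 52))   # 8
```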
Aspects of the operationalization process (cont.) • Finally, do we want to change the level of measurement? Although we generally prefer the highest level of measurement possible, do we have a compelling reason to break our measure down into ordered categories? • Let’s say we want to break the above polling data into three categories: close, competitive, and uncompetitive races. • We then must decide what values to assign to each category. • For example, we might say that races where (A-B) is less than 5 are “close”, races where (A-B) is between 5 and 15 are “competitive” and other races are “uncompetitive”. • These categorization decisions are a largely subjective part of the operationalization process.
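The three-category recoding can be written out explicitly, which also forces a decision the verbal rule leaves open: where margins of exactly 5 and exactly 15 belong. The sketch below assumes both boundary values fall in "competitive"; that choice, like the function name, is an illustration of the subjectivity involved rather than a recommended rule.

```python
def competitiveness_category(margin):
    """Collapse a 0-100 poll margin into three ordered categories."""
    if not 0 <= margin <= 100:
        raise ValueError("margin must lie between 0 and 100")
    if margin < 5:
        return "close"
    # Assumption: the boundary values 5 and 15 are both treated as "competitive".
    if margin <= 15:
        return "competitive"
    return "uncompetitive"


for m in (3, 5, 15, 16):
    print(m, competitiveness_category(m))
```

Moving either cutoff by a point or two would reclassify some races, which is why the cutoffs should be reported alongside the results.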
Voting (cont.) • If we turn to our dependent variable, voting, we want to measure whether or not an individual voted, i.e. a yes/no question. • Indicators that answer a yes/no question are a special class of nominal level variables, called dummy variables. A dummy variable takes on the value of 1 if the answer is yes, and 0 if the answer is no. • As you can see, this answers the questions about level of measurement, range, and meaning of values in one fell swoop.
One final caveat • Voting might seem to already be concrete and suggest an obvious indicator. We could count the number of people (or the percent) who voted in each race and this is our value for the dependent variable for that observation. • However, this is not in fact a good operationalization because our hypothesis was about individual behavior and our measure is of aggregate behavior. We want each observation to be a single voter. • Our measurement is now complete. For each individual, we would have generated data that might look like the following:
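As a rough illustration of individual-level data, the sketch below uses invented records (the respondent IDs, margins, and votes are all hypothetical). Each row is one potential voter, with the poll margin of that person's race and a dummy variable for voting. It also shows that aggregate turnout can be recovered from individual records, whereas the reverse is not possible.

```python
# Hypothetical records: one row per potential voter (values are invented).
records = [
    {"respondent": 1, "poll_margin": 3, "voted": 1},
    {"respondent": 2, "poll_margin": 3, "voted": 0},
    {"respondent": 3, "poll_margin": 22, "voted": 0},
    {"respondent": 4, "poll_margin": 22, "voted": 1},
    {"respondent": 5, "poll_margin": 22, "voted": 0},
]


def turnout_by_margin(rows):
    """Aggregate the individual 0/1 voting dummy into a turnout rate per race."""
    totals = {}
    for row in rows:
        voters, n = totals.get(row["poll_margin"], (0, 0))
        totals[row["poll_margin"]] = (voters + row["voted"], n + 1)
    return {margin: voters / n for margin, (voters, n) in totals.items()}


turnout = turnout_by_margin(records)
print(turnout)
```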
Choosing measures • Sometimes, measures are chosen due to practical considerations. You might have data available which does not perfectly capture the concept, but you cannot afford to collect perfect data. • Other times, choosing a measure requires an intellectual trade-off: how close can you get to the abstract idea without the concrete version being too complex, unobservable, or not measurable? • For example, we might like to know more than whether a person voted; we could measure how enthused they were about voting from measuring brain activity. Not very practical though. • However, often, complex theories require more complex operationalizations.
Multidimensional concepts • Some concepts have multiple aspects or dimensions at the abstract level. (Arguably, most or all concepts do). • For example, “democracy” might entail: • Regular, free and fair elections • Multiple political parties • Peaceful transitions in power, etc. • Thus, at the concrete level, we seemingly have no choice but to collect multiple measures.
Multiple measures • If we are analyzing second-hand data, we may use multiple measures because we lack a single strong, conceptually faithful measure. • For example, if we are trying to measure ideology but lack an appropriate question, we might look at a battery of questions that ask people about policy positions. More questions may allow us to “triangulate” on the concept of ideology itself.
Scales • Although we may collect multiple indicators, we still may choose to use only one variable to represent the concept. • For example, the well-known POLITY scores summarize the various aspects of democracy into a single score. • Each characteristic of democracy that a country possesses might earn it one point, and those points would then be summed (or weighted, then summed). • This sum would then be the value of the variable democracy for that observation. • Chapter 9 of Manheim (not required) talks more about scaling techniques.
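The "one point per characteristic, then sum (or weight, then sum)" idea can be sketched as follows. This is a generic additive scale with invented indicator names and weights; it is not the actual POLITY coding procedure.

```python
from typing import Dict, Optional


def scale_score(indicators: Dict[str, int],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Sum 0/1 indicators into one score, optionally weighting each first."""
    if weights is None:
        # Unweighted scale: every characteristic is worth one point.
        weights = {name: 1.0 for name in indicators}
    return sum(weights[name] * value for name, value in indicators.items())


# Hypothetical country: has elections and parties, but no peaceful transitions.
country = {"free_fair_elections": 1, "multiple_parties": 1, "peaceful_transitions": 0}
print(scale_score(country))  # 2.0

# Hypothetical weights that count elections double.
weights = {"free_fair_elections": 2.0, "multiple_parties": 1.0, "peaceful_transitions": 1.0}
print(scale_score(country, weights))  # 3.0
```

The choice of weights is itself a measurement decision and should be justified in the same way as the choice of indicators.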
Working hypotheses • Remember that refutation of a working hypothesis is not exactly the same as refutation of a hypothesis: • If the working hypothesis is disconfirmed, we may always question the operationalization of the variables. • Only if we accept the operationalization behind the working hypothesis would we accept that its disconfirmation also disconfirms the hypothesis itself.
Krasno and Green • The theory in this article involves the relative importance of national and local forces in convincing candidates to run. • There are basically two main hypotheses: • The more positive the national forces are for a candidate’s party, the higher the quality of the challenger who will emerge. • The more positive the local forces are for a candidate’s party, the higher the quality of the challenger who will emerge. • The article is also interested in the relative importance of these factors. At the end of this class, we’ll examine techniques for comparing two causal factors. • The variable whose measurement is in question is the dependent variable, challenger quality. The two IVs are relatively uncontroversial in terms of measurement.
Krasno and Green: the measurement problem • In the literature on legislative elections, incumbents are difficult to defeat; yet it is easier to predict the success of challengers if we know their “quality.” • In the abstract, quality means a lot of things: charisma, ability to raise money, ability to debate, experience, etc. • Most of these things are difficult to measure.
Krasno and Green (cont.) • In the past (and indeed, still now) most studies use a single proxy for quality: prior office-holding. • Krasno and Green suggest that we can get closer to the concept of quality by collecting additional data: famous names, specific office held, etc. • Of course, collecting this data is costly and difficult. This finding may be worthwhile without being all that practicable.
Measurement problems • Although we can often justify our measurement decisions through argument, there are more formal criteria for assessing how good one’s measures are. Most importantly, our goal is to minimize measurement error. • Measurement error is any difference between the recorded value and the “true” value. • Error can be systematic or random • If the measure creates systematic error, it has a validity problem • If the measure creates random error, it has a reliability problem
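The distinction between systematic and random error can be made concrete with a toy calculation. The sketch below assumes we somehow know the "true" values, which in real research we almost never do; the numbers are invented. The average error captures the systematic component and the spread of the errors captures the random component.

```python
from statistics import mean, pstdev


def error_summary(recorded, true):
    """Return (average error, spread of errors) for paired measurements."""
    errors = [r - t for r, t in zip(recorded, true)]
    return mean(errors), pstdev(errors)


true_values = [10, 20, 30, 40]   # hypothetical "true" scores
biased = [12, 22, 32, 42]        # every reading is 2 too high: systematic error
noisy = [9, 21, 29, 41]          # errors of -1, +1, -1, +1: random error

print(error_summary(biased, true_values))  # average error 2, spread 0
print(error_summary(noisy, true_values))   # average error 0, spread 1
```

The biased measure is perfectly stable but consistently wrong (a validity problem), while the noisy measure is right on average but unstable from one reading to the next (a reliability problem).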
Validity • Validity: Do our measures accurately capture the concepts in our theory? • Obviously, this first and foremost requires proper operationalization. • It also, however, requires that there are no systematic measurement problems. For example, measuring social class based on home ownership would ignore the reality that home ownership is easier, even for the working class, in some parts of the country. People’s social class would be systematically overstated in some areas and understated in other areas.
Reliability • Reliability: Do our measures provide stable measurement of the concept? • If a measure is not reliable, it cannot be valid either. The reverse does not hold. • Random error can come from a variety of sources (for a fairly complete list, see pp. 74-75 of Manheim) • Important examples include different interpretation of questions, recording of data, and different settings and temporal contexts.
Validity and reliability (cont.) • How do we assess reliability? • Mostly objective criteria • Test/retest methods. An everyday example would be a bathroom scale. A reliable scale would weigh a person the same on two consecutive occasions. • Sub-sample methods are more statistically complex and rely on large samples. The basic idea is that if we cut the sample randomly in half, there will be no major differences between the two halves on the measure in question. • Reliability can only be estimated, not calculated.
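The sub-sample idea can be sketched crudely: shuffle the sample, cut it in half, and compare the two halves. The version below only compares means, uses invented scores, and fixes the random seed for repeatability; a real analysis would use a much larger sample and a formal test of the difference.

```python
import random
from statistics import mean


def split_half_gap(scores, seed=0):
    """Randomly split scores into two halves; return the gap between their means."""
    if len(scores) < 2:
        raise ValueError("need at least two observations to split")
    shuffled = list(scores)
    # Assumption: a fixed seed, so the same split is reproduced on every run.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    return abs(mean(first) - mean(second))


scores = [3, 4, 4, 5, 3, 4, 5, 4, 3, 5]  # hypothetical responses on a 1-5 item
gap = split_half_gap(scores)
print(f"gap between half-sample means: {gap:.2f}")
```

A small gap is consistent with a reliable measure but does not prove it, which is one sense in which reliability is estimated rather than calculated.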
Validity and reliability (cont.) • How do we assess validity? • Criteria are somewhat more subjective, but nonetheless involve statistical analysis. • Please note: my terms here differ slightly from the book, to avoid double usage of the terms internal and external validity (which we’ll discuss next time). • Two ways of assessing validity: • Face validity: Is it self-evident that the measure represents the concept well? Put another way, would your fellow scholars and experts accept your measure? • Construct validity: Is the measure correlated with measures of other concepts to which our concept should be related?
Construct validity: an example • Let’s say you are asking individuals a question about their ideology, and you want to assess its validity. In other words, is this question doing a good job of identifying conservatives as conservatives, and liberals as liberals? • To measure construct validity, we might also ask the individuals who they have voted for in recent elections. If there’s not at least a modest correlation between ideology and vote choice, the ideology measure probably has a validity problem, even if it makes sense conceptually.
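The check described above amounts to computing a correlation between the ideology measure and vote choice. Here is a minimal sketch using a hand-written Pearson correlation and invented responses; the 7-point ideology scale and the 0/1 vote coding are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cov / (sx * sy)


# Hypothetical respondents: 1 = very liberal ... 7 = very conservative.
ideology = [1, 2, 3, 4, 5, 6, 7]
# Hypothetical dummy: 1 = voted for the conservative candidate last election.
voted_conservative = [0, 0, 0, 1, 0, 1, 1]

r = pearson_r(ideology, voted_conservative)
print(f"r = {r:.2f}")
```

A correlation near zero would suggest a validity problem with the ideology question. What counts as "at least modest" is a judgment call rather than a fixed threshold.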
Validity and reliability: a few final comments • IMPORTANT: The two types of validity on the previous slides refer to measures. • Later, we will discuss internal and external validity, which are criteria for research designs. • Even if a single measure has less than perfect validity or reliability, multiple measures can salvage the research design. • Also, validity problems may be mitigated depending on the scope of our study. If our study only covered a single city, for example, the above use of home ownership as a proxy for social class might be less problematic.
Krasno and Green • So how do Krasno and Green do on these assessments? • Examining the elements of the challenger quality scale, each characteristic appears to have face validity. • We might also say that the weighting decision has face validity because the authors justify the 4-point award for office-holding in terms of the four elements it implies: political contacts, name recognition, candidate skills, establishing occupational qualification.
Krasno and Green (cont.) • In addition, the measure’s construct validity is strong based on the examination of year-by-year scores: • 1974 was a horrible year for Republicans (Watergate) and this was the Democrats’ best year in terms of relative candidate quality scores. • 1980 was a bad year for Democrats and likewise, Republicans had their best candidate quality scores that year.
Krasno and Green (cont.) • The reliability of this measure is not necessarily high because it depends on the authors’ judgments: • What is a celebrity? • Where do you draw the cutoff for important offices? • What qualifies a person as “professional”? • If you or I took the same raw data and tried to construct the scores ourselves, we might get different results. • For measures such as this, coding should always use multiple coders and researchers should measure intercoder reliability. This helps offset the problems associated with subjectivity. We’ll talk more about this later in the quarter when we discuss expert surveys.
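Two common ways of quantifying intercoder reliability are simple percent agreement and Cohen's kappa, which corrects agreement for chance. The sketch below uses invented scores from two hypothetical coders. Note that kappa treats the scores as unordered categories, so a one-point disagreement counts the same as a four-point disagreement.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of cases on which the two coders assigned the same score."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal share, summed over categories.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical challenger-quality scores assigned by two coders to eight candidates.
coder_1 = [4, 4, 0, 2, 4, 0, 1, 4]
coder_2 = [4, 3, 0, 2, 4, 0, 2, 4]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```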
Class exercise • Using the concept assigned to your group, you are going to choose one hypothesis in which your concept is the dependent variable. It may be a hypothesis from last time, or one you create today. • Follow the operationalization process outlined today, including: • A working definition of each of the two key concepts and a discussion of its important features (concept analysis). Most importantly, is either a multidimensional concept?
Class exercise • Brainstorm some possible measures for each concept. Can you use one measure, or are multiple measures preferable? Which one are you going to choose? • What will the data generated by this measure (or measures) look like, in terms of level of measurement, range, and indicator values? • As before, please turn in a written copy of your group’s notes.
Next time • Next class, we will lay out a framework for the examination of research techniques. This entails a number of considerations: • How do we choose cases, and what implications do those decisions have? • How do we know that we can trust our results? (internal validity) • When can we generalize from our study to all instances of the phenomenon? (external validity) • What ethical considerations come into play? • As you read Geddes, think not only about the questions she poses, but also about the theory, hypothesis, and the measurement decisions made in her study.