Analyzing & Measuring Categorical Data

Gui Shichun

What is Categorical Data?

Scale of Measurement
Categorical data can be arranged on either:
• a nominal (unordered) scale, or
• an ordinal (ordered) scale.
Tip: Make sure that your ordinal data are really rank ordered!

Now let’s look at nominal data first

Explanatory, Response, Control Variables
Categorical variables are named for the characteristics of the data. By contrast, explanatory (independent, or x), response (dependent, or y), and control (z) variables are named for the role the data play in the study.

The question of interest is whether the pass probabilities for CAI and Tutoring are the same. The null hypothesis (H0) is that the two variables are independent; the alternative hypothesis (H1) is that they are not.
Explanatory (independent) variable: instruction method (CAI or Tutoring).
Response (dependent) variable: performance on the exam.

In statistical terms, we can address this question by investigating whether there is a statistical association between the instruction method (explanatory variable) and performance on the exam (response variable). The hypotheses of this test are:
• H0: There is no association between the instruction method and performance on the exam.
• H1: There is an association between the instruction method and performance on the exam.
The chi-square test is one way to test these hypotheses. A significant chi-square statistic (p < .05) indicates only that an association exists between the explanatory variable and the response variable; it says nothing about how strong the association is. Here the chi-square value is .934 with p = .334 > .05, so we cannot reject H0: the data give no evidence of an association between the two variables.
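As a rough check on the reported figures, the sketch below converts a chi-square statistic into a p-value for a 2x2 table, which has (2-1)(2-1) = 1 degree of freedom. The function names are illustrative, the statistic is assumed to be the Pearson chi-square without continuity correction (the slides do not say which variant was used), and the counts in the second call are made up, since the original table is not reproduced here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for 2x2 tables: n(ad - bc)^2 / (product of the four margins).
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With 1 df, chi2 is the square of a standard normal variable,
    # so the tail probability is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Statistic reported on the slide; the result should be close to the reported p = .334.
print(round(p_value_df1(0.934), 3))

# Hypothetical counts (NOT the slide's data), only to show the statistic being computed.
made_up_table = [[10, 20], [20, 10]]
print(round(chi_square_2x2(made_up_table), 3))
```

Because the p-value is well above .05, the conclusion is the same as in the text: H0 is not rejected.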

The strength of the association is measured by:
• the difference of proportions,
• the relative risk, and
• the odds ratio.

Difference of Proportions
• The difference of proportions compares the yes (success, pass) probability between the two row groups (the levels of the explanatory variable). It is defined as Difference of Proportions (DP) = p1 - p2, where p1 = n11/n1 and p2 = n21/n2; n11 and n21 are the yes counts in rows 1 and 2, and n1 and n2 are the row totals.
• In our example, the difference of proportions is the difference in pass probability between the CAI group and the Tutoring group: 0.846 - 0.914 = -0.068. The pass probability of the Tutoring group is larger than that of the CAI group by 0.068.

Relative Risk
• The relative risk is the ratio of the yes probabilities of the two row groups (the levels of the explanatory variable). It is defined as Relative Risk (RR) = p1 / p2.
• In our example, the relative risk is 0.846/0.914 = 0.926. The proportion passing the exam in the CAI group is 0.926 times that in the Tutoring group.

Odds Ratio
• The odds ratio is the ratio of two odds: the odds of success in the first row (odds1) and the odds of success in the second row (odds2). The odds are defined as
odds1 = p1 / (1 - p1) = 0.846/(1 - 0.846) = 5.494
odds2 = p2 / (1 - p2) = 0.914/(1 - 0.914) = 10.628
The odds ratio is odds1/odds2 = 5.494/10.628 = 0.517.
• The odds of passing the exam in the Tutoring group are almost twice those in the CAI group (1/0.517 ≈ 1.93).
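The three measures of strength can be reproduced from the two pass probabilities quoted in the slides. This is a minimal sketch; the function name and dictionary keys are illustrative and not part of the original material.

```python
def association_measures(p1, p2):
    """Difference of proportions, relative risk, and odds ratio for two row groups.

    p1 and p2 are the yes (pass) probabilities of row 1 and row 2.
    Both must lie strictly between 0 and 1, otherwise the odds are undefined.
    """
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return {
        "difference_of_proportions": p1 - p2,
        "relative_risk": p1 / p2,
        "odds1": odds1,
        "odds2": odds2,
        "odds_ratio": odds1 / odds2,
    }


# Pass probabilities from the example: CAI group (row 1) and Tutoring group (row 2).
measures = association_measures(0.846, 0.914)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

An odds ratio below 1 means the odds of passing are lower in the first row (CAI) than in the second (Tutoring); taking the reciprocal gives the comparison in the other direction.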

Now let’s consider ordinal data!

An Example of Ordinal Data (Employment*salary)
The question of interest is how these two ordinal variables are related to each other.

Another example of ordinal data (employee*smoking)

Considering the kind of relationship that may exist between two ordered variables leads to the notion of the direction of a relationship and to the concept of correlation.
• For directional measures, look at Somers' d, which reflects the level of association between the two ordered variables. It is 0.689 (against 0.169) in this case.
• For correlational measures, look at Gamma, which equals 1 if all observations are concentrated on the diagonal running from the upper left to the lower right of the table. It is 0.978 here (against 0.236). Pearson's R, which is the interval-to-interval correlation, is 0.742.
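Both Gamma and Somers' d are built from counts of concordant and discordant pairs of observations. The sketch below shows one common way to compute them from an ordered contingency table. The function names are illustrative, the Somers' d shown treats the column variable as dependent, and both tables are made-up counts, since the employment and smoking tables are not reproduced here.

```python
def pair_counts(table):
    """Count concordant, discordant, and column-tied pairs in an ordered r x c table.

    Rows and columns must both be listed in increasing order of their categories.
    """
    rows, cols = len(table), len(table[0])
    concordant = discordant = tied_on_column = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):  # later rows only, so each pair is counted once
                for m in range(cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs      # higher row and higher column
                    elif m < j:
                        discordant += pairs      # higher row but lower column
                    else:
                        tied_on_column += pairs  # different rows, same column
    return concordant, discordant, tied_on_column


def gamma(table):
    """Goodman-Kruskal Gamma: (C - D) / (C + D). Assumes C + D > 0."""
    c, d, _ = pair_counts(table)
    return (c - d) / (c + d)


def somers_d_columns_dependent(table):
    """Somers' d with the column variable as dependent: (C - D) / (C + D + ties on columns)."""
    c, d, tied = pair_counts(table)
    return (c - d) / (c + d + tied)


# Hypothetical table with every observation on the upper-left to lower-right diagonal.
diagonal_table = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]
print(gamma(diagonal_table))

# Hypothetical 2x2 table with some discordant pairs and ties.
mixed_table = [[4, 1], [2, 3]]
print(round(gamma(mixed_table), 3), round(somers_d_columns_dependent(mixed_table), 3))
```

Gamma ignores tied pairs entirely, whereas Somers' d keeps the pairs tied on the dependent variable in the denominator, which is why Somers' d is never larger in magnitude than Gamma for the same table.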

Now let’s see how these categorical data can be arranged and analyzed in Excel and SPSS!
