Examining Data Constructing Variable: Unidimensionality and PCA Analysis

Examining Data

Constructing a variable
• Assemble a set of items that might work together to define a construct/variable.
• Hypothesize the hierarchy of these items along that construct.
• Choose a response format.
• Investigate how well the hierarchy holds for members of your response frame.
• Ensure that your scale is unidimensional.

Unidimensionality
• Always remember: unidimensionality is never perfect. It is always approximate.
• Need to ask: "Is dimensionality in the data big enough to merit dividing the items into separate tests, or constructing new tests, one for each dimension?"
• It may be that two or three off-dimension items have been included in your instrument and should be dropped.
• The question then becomes: "Is the lack of unidimensionality in my data sufficiently large to threaten the validity of my results?"

Do my items fall along a unidimensional scale?
• We can investigate through:
  • Person and item fit statistics
  • The Principal Components Analysis of Residuals (PCAR)

A Rasch Assumption
• The Rasch model is based on the specification of "local independence".
• This means that after the contribution of the measures to the data has been removed, all that is left is random, normally distributed noise.
• When a residual is divided by its model standard deviation, it will have the characteristics of being sampled from a unit normal distribution.
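The standardized residual described above can be sketched for the simplest case. This is a minimal illustration that assumes the dichotomous Rasch model (responses scored 0 or 1); the function names and the person and item measures are hypothetical, and rating scale data would need the polytomous expected value and variance instead.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a 1 response under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def standardized_residual(observed, ability, difficulty):
    """(observed - expected) divided by the model standard deviation."""
    p = rasch_probability(ability, difficulty)
    model_sd = math.sqrt(p * (1.0 - p))  # binomial SD of a single 0/1 response
    return (observed - p) / model_sd

# Hypothetical measures and responses, for illustration only.
abilities = [-1.0, 0.0, 1.5]           # person measures in logits
difficulties = [-0.5, 0.5]             # item measures in logits
responses = [[0, 0], [1, 0], [1, 1]]   # rows = persons, columns = items

# Person-by-item matrix of standardized residuals.
z = [[standardized_residual(responses[n][i], abilities[n], difficulties[i])
      for i in range(len(difficulties))]
     for n in range(len(abilities))]
```

A matrix like `z` (persons in rows, items in columns) is the input that a principal components analysis of residuals works on.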

Residual-based Principal Components Analysis
• This is not a typical factor analysis.
• The intention of PCAR (Principal Components Analysis of Residuals) is to explain variance. Specifically, it looks for the factor in the residuals that explains the most variance.
• If that factor is at the "noise" level, then there is no shared second dimension.
• If that factor is above the "noise" level, then it is the "second" dimension in the data.
• Similarly, a third dimension is investigated, etc.
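The search for the largest factor in the residuals can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular Rasch program: it takes an already computed person-by-item matrix of standardized residuals (the values below are made up), forms the item-by-item correlation matrix, and finds the largest eigenvalue by power iteration. The function names are hypothetical, and items with zero residual variance are not handled.

```python
import math

def correlation_matrix(rows):
    """Item-by-item Pearson correlations; rows = persons, columns = items."""
    n_persons = len(rows)
    n_items = len(rows[0])
    cols = [[row[i] for row in rows] for i in range(n_items)]
    means = [sum(c) / n_persons for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j])
             for j in range(n_items)]
            for i in range(n_items)]

def largest_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector for a symmetric
    positive semi-definite matrix, found by power iteration."""
    size = len(matrix)
    # Non-uniform start so it is unlikely to be orthogonal to a contrast
    # in which positive and negative loadings cancel out.
    vector = [float(i + 1) for i in range(size)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue, vector

# Made-up standardized residuals: 6 persons by 4 items.
residuals = [
    [0.9, 1.1, -1.0, -0.8],
    [-1.2, -0.7, 0.9, 1.1],
    [0.4, 0.6, -0.5, -0.3],
    [-0.8, -1.0, 1.2, 0.7],
    [1.3, 0.8, -0.9, -1.2],
    [-0.5, -0.9, 0.4, 0.6],
]
eigenvalue, loadings = largest_eigenvalue(correlation_matrix(residuals))
```

Because the eigenvalues of a correlation matrix sum to the number of items, the eigenvalue is in "item" units. In this made-up data, items 1 and 2 load with the opposite sign to items 3 and 4, which is the pattern a second dimension would produce.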

Example: Table 23
Table of STANDARDIZED RESIDUAL variance (in Eigenvalue units)
                                         Empirical
Total variance in observations      =    127.9   100.0%
Variance explained by measures      =    102.9    80.5%
Unexplained variance (total)        =     25.0    19.5%   (100%)
Unexpl var explained by 1st factor  =      4.6     3.6%   (18.5)
• The Rasch dimension explains 80.5% of the variance in the data. Is this good?
• The largest secondary dimension, "the first factor in the residuals", explains 3.6% of the variance. What do you think?
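The percentages in this table follow directly from the eigenvalue units. The short check below only re-derives them from the four values shown; the variable and function names are illustrative.

```python
# Values copied from the example table (eigenvalue units).
total = 127.9
explained_by_measures = 102.9
unexplained = 25.0
first_factor = 4.6

def percent(part, whole):
    """Share of `whole` taken by `part`, in percent, to one decimal place."""
    return round(100.0 * part / whole, 1)

share_explained = percent(explained_by_measures, total)    # 80.5
share_unexplained = percent(unexplained, total)            # 19.5
first_factor_of_total = percent(first_factor, total)       # 3.6
first_factor_of_unexplained = percent(first_factor, unexplained)  # 18.4
```

The last figure comes out as 18.4 rather than the 18.5 printed in the table, presumably because the table computes it from an unrounded eigenvalue while 4.6 is the rounded display value.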

Table of STANDARDIZED RESIDUAL variance
• Empirical: variance components for the observed data.
• Model: variance components expected for the data if they exactly fit the Rasch model.
• Total variance in observations: total variance in the observations around their Rasch expected values, in standardized residual units.
• Variance explained by measures: variance explained by the item difficulties, person abilities and rating scale structures.
• Unexplained variance (total): variance not explained by the Rasch measures.
• Unexplained variance (explained by 1st, 2nd, ... factor): size of the first, second, ... component in the principal component decomposition of the residuals.

Unexplained variance explained by 1st factor
• The eigenvalue of the biggest residual dimension is 4.6, indicating that it has the strength of almost 5 items.
• In other words, the contrast between the strongly positively loading items and the strongly negatively loading items on the first factor in the residuals has the strength of about 5 items.
• Since the sign of the loadings (positive or negative) is arbitrary, it is necessary to look at the items at both the top and the bottom of the factor plot.
• Are those items substantively different, to the point that they merit the construction of two separate tests?

How Big is Big? Rules of Thumb
• A "secondary dimension" must have the strength of at least 3 items. If the first factor has an eigenvalue less than 3, then the test is probably unidimensional.
• Individual items may still misfit.
• Simulation studies indicate that an eigenvalue less than 1.4 is at the random level; larger values indicate there is some structure present (R. Smith).
• There are no established criteria for when a deviation becomes a dimension.
• PCA is indicative, not definitive.
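These rules of thumb can be written down as a small helper. It is only a sketch: the function name and the wording of the returned labels are made up here, the two cut-offs (1.4 and 3) are the ones quoted above, and the result is a prompt for inspecting item content rather than a verdict.

```python
def describe_first_factor(eigenvalue, noise_level=1.4, dimension_level=3.0):
    """Label the first residual factor using the rule-of-thumb cut-offs."""
    if eigenvalue < noise_level:
        return "at the random (noise) level"
    if eigenvalue < dimension_level:
        return "some structure present, test probably unidimensional"
    return "possible secondary dimension, inspect the item content"

# The example table's first factor has an eigenvalue of 4.6.
label = describe_first_factor(4.6)
```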

Consider Liking for Science Output…
• Do the items at the top differ substantively from those at the bottom?

If still in doubt…
• Split your items into two subtests, based on positive and negative loadings on the first residual factor.
• Measure everyone on the two subtests and cross-plot the measures.
• What is their correlation?
• Do you see two versions of the same story about the persons?
• If only a few people are noticeably off-diagonal, then you have a substantively unidimensional test.
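The comparison of the two subtests can be sketched numerically. The example assumes the person measures on each subtest have already been estimated and sit on a comparable logit scale; the measures, the function names, and the 1.0 logit tolerance are all hypothetical. A fuller check would judge "off-diagonal" against each person's measurement error rather than a fixed tolerance.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of measures."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def off_diagonal(x, y, tolerance):
    """Indices of persons whose two measures differ by more than `tolerance`."""
    return [i for i, (a, b) in enumerate(zip(x, y)) if abs(a - b) > tolerance]

# Made-up person measures (logits) on the two subtests.
positive_subtest = [-1.2, -0.4, 0.1, 0.8, 1.5, 2.0]
negative_subtest = [-1.0, -0.6, 0.3, 0.7, 3.1, 1.8]

correlation = pearson(positive_subtest, negative_subtest)
flagged = off_diagonal(positive_subtest, negative_subtest, tolerance=1.0)
```

Here only the person at index 4 is flagged, which is the "only a few people are noticeably off-diagonal" situation described above.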
