Goodness of fit in structural equation models Jose M. Cortina Tiffany M. Bludau George Mason University
SEM and Fit • SEM is an analysis technique • We need to know whether data and model are consistent with one another • The assessment of “Model Fit” is the assessment of this consistency
Outline • General discussion of fit • Observed vs. reproduced matrices • Identification • The role of chi-squared • Alternative fit indices • Which to use • Pitfalls
The Two Faces of Fit • The term “Model Fit” is often used to denote overall fit • But assessment of fit comes more directly from consideration of the individual path coefficients and endogenous errors • If the coeffs linking constructs to one another are small, then the data and model are inconsistent!
Overall Fit • As a whole, are the linkages in the model consistent with the relationships among the observed variables? • In one way or another, this is the question addressed by model fit indices • Specifically, fit indices compare observed and reproduced correlation matrices
Reproduced matrices • Same form as the observed matrix • Contains r's that are implied by the model • Consider a simple mediation model: X → M → Y, with a coefficient of .20 on each path
Observed vs. Reproduced

Observed:
     X    M    Y
X    1
M   .20   1
Y   .10  .20   1

Reproduced:
     X    M    Y
X    1
M   .20   1
Y   .04  .20   1

Lack of fit in this model stems from the discrepancy between the observed r(X,Y) of .10 and the reproduced value of .04 (highlighted in red on the original slide)
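To make the comparison concrete, here is a minimal Python sketch (numpy is my choice; the deck names no software) that builds the matrix implied by the mediation model via path tracing and subtracts it from the observed matrix:

```python
import numpy as np

# Observed correlation matrix for X, M, Y (from the slide above)
R_obs = np.array([[1.00, 0.20, 0.10],
                  [0.20, 1.00, 0.20],
                  [0.10, 0.20, 1.00]])

# Under X -> M -> Y, the two paths are estimated by r(X,M) and r(M,Y);
# path tracing then implies r(X,Y) = product of the two paths
a, b = R_obs[0, 1], R_obs[1, 2]
R_imp = np.array([[1.00,  a,    a * b],
                  [a,     1.00, b    ],
                  [a * b, b,    1.00 ]])

print(R_imp[0, 2])    # 0.04: the reproduced r(X,Y)
print(R_obs - R_imp)  # residual matrix: all lack of fit sits in r(X,Y)
```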
Overidentification • The discrepancy is only possible because the model is “overidentified” • There are more knowns (i.e., observed r’s) than unknowns (i.e., coeffs to be estimated) • In this example, there are three knowns and two unknowns • What if we add a third path?
Just identified model • Adding the direct path gives estimates of .20 (X → M), .19 (M → Y), and .062 (X → Y) • Here there are the same number of knowns and unknowns • Observed and reproduced matrices will be identical • "Fit" is perfect
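To show where those three estimates come from, a short sketch (numpy again, assumed): the X → M path is r(X,M), and the two paths into Y solve the normal equations for Y regressed on X and M.

```python
import numpy as np

r_xm, r_my, r_xy = 0.20, 0.20, 0.10  # the three knowns

a = r_xm  # X -> M path

# Y regressed on X and M: solve R_xx @ [c, b] = [r(X,Y), r(M,Y)]
R_xx = np.array([[1.0, r_xm],
                 [r_xm, 1.0]])
c, b = np.linalg.solve(R_xx, np.array([r_xy, r_my]))

print(round(a, 3), round(b, 3), round(c, 3))  # 0.2 0.188 0.062
```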
Underidentified model In this model, there is one known and there are two unknowns. There are an infinite number of solutions that would reproduce the observed r perfectly
To summarize • In order for a unique solution to exist, a model must be at least just identified • In order for “fit” to be relevant, a model must be overidentified • The closer a model is to being just identified, the better (and less relevant) fit will be
Fit Basics: Chi-squared • Begins with the model R-squared • R²_M = 1 − (1 − R²_1)(1 − R²_2)…(1 − R²_E), taking the product over the E endogenous variables • This value is computed for both the hypothesized (overidentified) model AND the just identified model • We then use these R-squareds to compute Q = (1 − R²_saturated)/(1 − R²_hypothesized) • χ² = −(N − d) ln Q
Worksheet • For the hypothesized or overidentified model: R²_M = .04, R²_Y = .04, R²_Model = .078 • For the saturated or just identified model: R²_M = .04, R²_Y = .044, R²_Model = .082 • Q = (1 − .082)/(1 − .078) = .9956 • Assuming N = 101, χ² = −(101 − 1) ln(.9956) = .44 with 1 df
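As a quick check of that arithmetic, a minimal Python sketch (the function name is mine):

```python
import math

def model_r2(r2_per_equation):
    # R2_M = 1 - (1 - R2_1)(1 - R2_2)...(1 - R2_E)
    prod = 1.0
    for r2 in r2_per_equation:
        prod *= 1.0 - r2
    return 1.0 - prod

r2_hypo = model_r2([0.04, 0.040])  # hypothesized model: ~.078
r2_sat  = model_r2([0.04, 0.044])  # saturated model:    ~.082

Q = (1 - r2_sat) / (1 - r2_hypo)
N = 101
chi2 = -(N - 1) * math.log(Q)
# chi2 ~= .42 with 1 df; the worksheet's .44 comes from rounding Q to .9956 first
print(round(Q, 4), round(chi2, 2))
```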
Why isn’t χ² used? • It is a test statistic for badness of fit, and as such, rewards poor designs (i.e., small N) • It is sample size dependent, which means that it doesn’t give effect size information • It does serve as a basis for other indices
What are the alternatives? • There are dozens, among them • NFI = (χ²_n − χ²_t)/χ²_n, or equivalently (F_n − F_t)/F_n • PNFI = (df_t/df_n) NFI_t • GFI = 1 − .5 tr(S − Σ)² • AGFI = 1 − (1 − GFI_t) [(p + q)(p + q + 1)]/(2 df_t) • PGFI = (df_t/df_n) GFI_t • RMR = √[Σ(S − Σ)² / ((p + q)(p + q + 1)/2)] • IFI = (F_n − F_t)/[F_n − df_t/(N − 1)] • (subscript n = null model, t = target model)
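To illustrate how the chi-squared-based indices on that list behave, a small sketch with invented chi-squared values (the numbers are hypothetical, chosen only for illustration):

```python
# Hypothetical values: n = null model, t = target (hypothesized) model
chi2_n, df_n = 900.0, 55
chi2_t, df_t = 85.0, 40

nfi  = (chi2_n - chi2_t) / chi2_n           # ~0.906
pnfi = (df_t / df_n) * nfi                  # ~0.659: note the parsimony penalty
# IFI: with F = chi2 / (N - 1), the (N - 1) terms cancel
ifi  = (chi2_n - chi2_t) / (chi2_n - df_t)  # ~0.948
print(round(nfi, 3), round(pnfi, 3), round(ifi, 3))
```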
A few things about those formulas • Note that chi-squared appears often • Note the simplicity of RMR • Note the F’s • Note that df are used to adjust the AGFI, PGFI, and PNFI
RMR • RMR stands for Root Mean Squared Residual • It is the square root of the average of the squared differences between the values in the observed and reproduced correlation matrices • The smaller, the better • RMSEA is computed differently, but is very similar and more commonly used • In practice it ranges from 0 to 1, with .08 being the conventional cutoff for acceptable fit
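A minimal sketch of the RMR computation, averaging squared residuals over the unique elements of the two matrices (for a correlation matrix, the lower triangle including the diagonal has exactly (p + q)(p + q + 1)/2 entries, so a plain mean matches the formula above):

```python
import numpy as np

def rmr(S, Sigma):
    # Square root of the mean squared residual over unique elements
    idx = np.tril_indices_from(S)
    resid = S[idx] - Sigma[idx]
    return np.sqrt(np.mean(resid ** 2))

S = np.array([[1.00, 0.20, 0.10],      # observed (mediation example)
              [0.20, 1.00, 0.20],
              [0.10, 0.20, 1.00]])
Sigma = np.array([[1.00, 0.20, 0.04],  # reproduced
                  [0.20, 1.00, 0.20],
                  [0.04, 0.20, 1.00]])
print(round(rmr(S, Sigma), 4))  # 0.0245, driven entirely by the r(X,Y) residual
```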
Other indices • For other common indices, the larger, the better • Note the F’s, which stand for Fit Function • For the least squares family of estimators (e.g., ULS, GLS, WLS), the fit function is F(Θ) = (s − σ)′W⁻¹(s − σ), where s and σ are the vectors of unique elements of S and Σ
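A sketch of that fit function in Python (with W = I it reduces to ULS; other weight matrices give the GLS/WLS variants):

```python
import numpy as np

def vech(A):
    # Stack the unique (lower-triangle) elements of a symmetric matrix
    return A[np.tril_indices_from(A)]

def fit_function(S, Sigma, W):
    # F(theta) = (s - sigma)' W^{-1} (s - sigma)
    d = vech(S) - vech(Sigma)
    return d @ np.linalg.solve(W, d)

S = np.array([[1.00, 0.20, 0.10],
              [0.20, 1.00, 0.20],
              [0.10, 0.20, 1.00]])
Sigma = np.array([[1.00, 0.20, 0.04],
                  [0.20, 1.00, 0.20],
                  [0.04, 0.20, 1.00]])
print(fit_function(S, Sigma, np.eye(6)))  # ULS: 0.06**2 = 0.0036
```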
Adjusted fit indices • One problem with most indices is that they reward lack of parsimony • This is true of GFI, for example • The AGFI includes a penalty for lack of parsimony • The PGFI includes a large penalty for lack of parsimony
Other ways to distinguish • Degree of penalty for lack of parsimony is one dimension on which indices differ • There are others • As a set, these dimensions can be used to choose a set of indices that are maximally diagnostic
Tanaka’s (1993) dimensions • Population-based vs. sample-based • Parsimony • Normed vs. non-normed • Absolute vs. relative • Reliance on estimation method • Sample size dependence
Theoretical work • A number of theoretical papers, e.g., • Mulaik, James, Van Alstine, & Bennett, 1989 • Medsker, Williams, & Holahan, 1994 • Hu & Bentler, 1999 • Lack of empirical work • What has been done often uses simulated datasets (e.g., Marsh, Balla, & McDonald, 1988)
Little guidance • The literature offers little in the way of guidance with regard to which indices should be reported • Reviewers and editors do no better • So, authors tend to report the indices that are most flattering to their models • We sought to combine Tanaka’s work with empirical work to generate the best set of indices
Specifically • We conducted a meta-analysis of correlations among fit indices • We compiled studies that reported at least two indices, then computed the correlation between each pair of indices • The indices that are least redundant with the others offer the most unique information
Studies used • Multiple disciplines • Keywords: Structural equation modeling, SEM, covariance structures model, and causal model • Currently have 400+ articles collected • Eliminated articles that: • Were theoretical in nature • Did not report results
Coding of the studies • Two co-authors coded all articles • Coded for: • Discipline, software used, estimation method • Sample size, degrees of freedom • Various fit indices • Coded only the final model
Correlations among indices • [Correlation table omitted; entries were flagged at p < .001 and p < .05] • RMSEA = Root Mean Square Error of Approximation, NFI = Normed Fit Index, TLI = Tucker-Lewis Index, CFI = Comparative Fit Index, SRMR = Standardized Root Mean Square Residual, GFI = Goodness of Fit Index, AGFI = Adjusted Goodness of Fit Index
Results from factor analysis • Ran the analysis on 5 indices, dropping the NFI and TLI • A 2-factor structure emerged, accounting for 83% of the variance • The two factors correlated r = −.46
Regressions • Regressed each index onto the remaining indices • GFI: R² = .91 • AGFI: R² = .94 • SRMR: R² = .65 • RMSEA: R² = .61 • CFI: R² = .46
Recommendations • Select one index from each factor • CFI rather than the GFI or AGFI (Bentler & Bonett, 1980; Marsh, Balla, & McDonald, 1988) • RMSEA or SRMR • Our choice: the CFI and RMSEA, based on Tanaka’s dimensions, the formulas, and the other information below
Recommendations cont’d • Our study could only focus on indices that are commonly reported, and parsimony indices are not among them • We would suggest that PGFI or PNFI also be reported
Tanaka’s dimensions • RMSEA: population based; accounts for parsimony; normed; absolute; not estimation method specific; sample size dependent • CFI: population based; does not account for parsimony; normed; relative; not estimation method specific; sample size dependent
Formulas of the RMSEA and CFI • RMSEA = √[max(χ²_t − df_t, 0)/(df_t (N − 1))] • CFI = 1 − max(χ²_t − df_t, 0)/max(χ²_n − df_n, χ²_t − df_t, 0) • Note the comparative nature of CFI: it is defined relative to the null model • Note how RMSEA does not account for the null model
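Those formulas in a minimal Python sketch (the chi-squared values are hypothetical):

```python
import math

def rmsea(chi2, df, N):
    # RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1)))
    return math.sqrt(max(chi2 - df, 0.0) / (df * (N - 1)))

def cfi(chi2_t, df_t, chi2_n, df_n):
    # CFI = 1 - max(chi2_t - df_t, 0) / max(chi2_n - df_n, chi2_t - df_t, 0)
    num = max(chi2_t - df_t, 0.0)
    den = max(chi2_n - df_n, chi2_t - df_t, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

print(round(rmsea(chi2=85.0, df=40, N=250), 3))  # 0.067
print(round(cfi(85.0, 40, 900.0, 55), 3))        # 0.947
```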
Other information • CFI • Works well with ML estimation • Works well with small sample sizes • Tends to worsen as the # of variables in a model increases • RMSEA • Does not include comparison with a null model • Tends to improve as the # of variables increases • Known distribution, so confidence intervals can be computed • Stable across estimation methods and sample sizes
Still in progress • Not able to account for all indices in each study coded • Plan to replicate the correlation matrices and generate missing values for coded studies • Reporting tendencies of researchers • Our plan is to show how patterns ACROSS this set of indices are diagnostic of particular plusses and minuses in a model
Pitfalls • Reward for lack of parsimony • Overemphasis on overall fit • Overemphasis on absolute fit • Fit driven by measurement model • Specification searches
Lack of parsimony • Models with few df generate very good values for almost all fit indices regardless of the quality of the model • In such cases, it is better to focus on the individual path coefficients
Overall fit • Regardless of the magnitude of fit indices, individual path coefficients are very important • It is entirely possible to generate good indices for a bad model • For any given data set, there are many very different models that “fit”
Absolute Fit • Knowledge of fit in an absolute sense is helpful but insufficient • Also helps to know how a model compares to alternatives • Relative fit indices help, but generally involve comparisons against a straw man (e.g., the null model) • Better to evaluate hypothesized model against plausible alternatives (e.g., additive model in MSEM)
Decoupling measurement and structural models • Consider the following model [path diagram omitted] • Excluding latent variances and correlated errors, there are 21 path coefficients to be estimated • Only 1 of these is part of the structural model
What happens when the ratio of meas. to struct. linkages is large? • Fit is driven largely by the measurement model • Thus, good fit can be achieved even if the latent vars. are unrelated to one another • Good fit can be impossible even if the latent vars. are strongly related to one another
Anderson & Gerbing (1988) • These authors suggested a two-step approach • Evaluate the measurement model in the first step (i.e., CFA) • Once a measurement model is settled upon, its values are fixed • Only then is the structural model evaluated • Fit indices will then give a better picture of the degree to which hypotheses are supported
Specification searches • If the fit of the hypothesized model is inadequate, one can conduct a specification search • This is an attempt to identify the sources of lack of fit • Modification indices are used most often
Modification indices • MIs give the reduction in chi-squared that would be achieved with the addition of a given path • In many models, inferior fit is due to omission of a small number of paths • So, perhaps we should simply add these paths and move on
Not so fast! • Often, the largest MI values are attached to paths for which there is no theoretical basis • A path should only be added if a theoretical case can be made for it, albeit post hoc
What about correlated errors? • Often, the largest MIs are attached to paths AMONG errors (i.e., off-diagonal elements of the theta or psi matrices) • There is seldom (but not never) any theoretical basis for these, so they should not be added • Exceptions include errors attached to isomorphic variables separated by time and errors attached to variables that share components
Cross-validation • Regardless of justification, spec. searches are post hoc • If N is adequate, plan for cross-validation • Separate sample into two parts at the outset • Test hypotheses on the larger part • Conduct spec search • Test modified model on the holdout sample • This reduces capitalization on chance
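A minimal sketch of the split itself (the 70/30 ratio, N, and the seed are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(42)

N = 600                                # hypothetical total sample size
idx = rng.permutation(N)
cut = int(0.7 * N)
calibration = idx[:cut]                # test hypotheses + spec search here
holdout = idx[cut:]                    # final test of the modified model only
print(len(calibration), len(holdout))  # 420 180
```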
Overall recommendations • Base conclusions on path coefficients as well • Ignore fit for models with few df • Choose fit indices wisely, for yourself and for others! • Beware the pitfalls • Preempt objections to spec search with cross validation • But most important….