General Structural Equations (LISREL)

General Structural Equations (LISREL) Week 3 #4: Mean Models Reviewed; Non-parallel slopes; Non-normal data

Models for Means and Intercepts (continued)
Multiple Group Models: for “zero order” latent variable mean differences:
• “Free” individual measurement equation intercepts, but constrain them to equality across groups
• Fix the latent variable means to 0 in group 1
• Free the latent variable means in groups 2->k
• If the latent variables of interest are endogenous and there are exogenous latent variables in the model, constrain construct equation path coefficients to zero
• Individual LV mean parameters represent a contrast with (difference from) the “reference group” (the group with the LV mean set to zero); LR tests are needed for joint hypotheses (e.g., constrain means to zero in all groups vs. model with groups 2->k freed)
• Check modification indices on measurement equation intercepts to verify that the “proportional indicator differences” assumption holds (or at least holds approximately)

AMOS Programming • Check off “means and intercepts” • Means and intercepts will now appear on diagram. Where variances used to appear, there will now be two parameters (mean + variance); where the variable is dependent, one parameter (intercept) will appear. • Impose appropriate parameter constraints • [insert brief demonstration here!]

Review yesterday’s slides from slide 52 • Uses World Values Study 1990 data for an example • We’ll use an updated version (new data, some difference in countries) today • Refer to handout (slides not reproduced)

Means1a.LS8: tau-x elements allowed to vary between countries. Kappa (the means of the ksi's) must be fixed to 0, since otherwise the model is not identified. Chi-square = 233.65, df = 42.

United States: TAU-X (estimate, standard error in parentheses, t-value)

      A006       F028       F066       F063       F118       F119
  --------   --------   --------   --------   --------   --------
    1.6191     3.6383     2.2287     8.5530     4.7504     2.9739
   (0.0263)   (0.0688)   (0.0563)   (0.0733)   (0.0941)   (0.0749)
   61.4969    52.8937    39.5980   116.6334    50.4717    39.6838

      F120       F121
  --------   --------
    4.3443     5.9000
   (0.0883)   (0.0757)
   49.2263    77.9553

Canada: TAU-X

      A006       F028       F066       F063       F118       F119
  --------   --------   --------   --------   --------   --------
    2.1202     4.7402     3.2042     7.4657     5.4974     3.3091
   (0.0236)   (0.0612)   (0.0551)   (0.0706)   (0.0812)   (0.0646)
   89.9232    77.4453    58.1887   105.7780    67.7035    51.2445

      F120       F121
  --------   --------
    4.4986     6.0079
   (0.0713)   (0.0636)

Means1b.ls8: Measurement model as in Means1a, but group 1 versus group 2 differences in means are now expressed by 2 parameters (1 for each latent variable), as opposed to calculating them for each indicator using, e.g., TX 1 [1] - TX 1 [2]. Chi-square = 276.27, df = 48.

KAPPA in Group 2 (Canada) [kappa in Group 1 is fixed to zero]

     KSI 1      KSI 2
  --------   --------
    1.0712     0.3236
   (0.0731)   (0.0948)
   14.6538     3.4138

The above provides significance tests for:
• Canada-U.S. differences in religiosity (z = 14.6538, p < .001)
• Canada-U.S. differences in sex/morality attitudes (z = 3.4138, p < .001)

For a joint significance test of whether the means for both Religiosity and Sex/morality differ (null hypothesis: both differences = 0), see program Means1c.ls8. Chi-square = 512.9661, df = 50 for that model; subtract the chi-squares (512.97 - 276.27 = 236.70) for the test (df = 2).
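The two kinds of test on this slide can be checked by hand. The sketch below is only an illustration in Python (not part of the LISREL output); the function names are made up for this example, and the chi-square tail formula used is the closed form that holds for df = 2 only.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a N(0,1) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def chi2_upper_tail_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variate with df = 2 (closed form)."""
    return math.exp(-x / 2.0)

# z-values reported for the KAPPA parameters in group 2 (Canada)
p_religiosity = two_sided_p_from_z(14.6538)
p_sex_morality = two_sided_p_from_z(3.4138)

# Joint test: Means1c (both kappas fixed to 0) versus Means1b (kappas free)
chi2_diff = 512.9661 - 276.27
df_diff = 50 - 48
p_joint = chi2_upper_tail_df2(chi2_diff)

print(f"religiosity:  p = {p_religiosity:.3g}")
print(f"sex/morality: p = {p_sex_morality:.3g}")
print(f"joint test:   chi-square = {chi2_diff:.4f}, df = {df_diff}, p = {p_joint:.3g}")
```

Both single-parameter tests and the joint test lead to the same conclusion here, but the joint test is the one that addresses the hypothesis that both mean differences are zero at once.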

Diagnostics for this model: see the modification indices for the TX vectors.

USA: Modification Indices for TAU-X

      A006       F028       F066       F063       F118       F119
  --------   --------   --------   --------   --------   --------
    0.6495     0.2995     2.8724     8.2808    27.0494     2.0749

      F120       F121
  --------   --------
   12.1313     5.2727

CANADA: Modification Indices for TAU-X

      A006       F028       F066       F063       F118       F119
  --------   --------   --------   --------   --------   --------
    0.6495     0.2995     2.8725     8.2808    27.0495     2.0749

      F120       F121
  --------   --------
   12.1312     5.2728

Expected Change for TAU-X

      A006       F028       F066       F063       F118       F119
  --------   --------   --------   --------   --------   --------
    0.0164     0.0238     0.0593     0.1261     0.3003     0.0637

      F120       F121
  --------   --------
   -0.1981    -0.1015

Means2a: Model with exogenous single-indicator variables. Single-indicator ksi-variables: gender, age, education. The specification GA=IN in group 2 implies a parallel slopes model. Thus, the AL parameters in group 2 can be interpreted as “group 1 vs. group 2 differences, controlling for differences in sex, education and age”.

TAU-X

    GENDER        AGE       EDUC
  --------   --------   --------
    0.4217    42.3840     4.5365
   (0.0146)   (0.4750)   (0.0413)
   28.9360    89.2300   109.8409

ALPHA

     ETA 1      ETA 2
  --------   --------
    1.2272     0.5898
   (0.0714)   (0.0954)
   17.1899     6.1819

KAPPA

    GENDER        AGE       EDUC
  --------   --------   --------
   -0.0196     3.3360    -0.4151
   (0.0187)   (0.6297)   (0.0504)
   -1.0482     5.2977    -8.2333

Diagnostics: test of the equal slopes (GA=IN) assumption.

Modification Indices for GAMMA

              GENDER        AGE       EDUC
            --------   --------   --------
  ETA 1       7.7083     6.9705     0.2122
  ETA 2       3.1923     0.1765     9.3836

A global test requires the estimation of a separate model (Means2b) with GA=PS (parallel slopes assumption relaxed).

Chi-square comparisons

             Chi-square     df      CFI
  Means2a:      699.807     90    .9635
  Means2b:      669.594     84    .9649
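The global test compares the two chi-squares above. As a rough sketch (plain Python, no statistics package assumed), the upper-tail probability of a chi-square variate has a simple closed form whenever the degrees of freedom are even, which covers this comparison (df = 90 - 84 = 6). The function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate; valid for even df only.

    Uses P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Means2a (GA=IN, parallel slopes) versus Means2b (GA=PS, slopes free)
chi2_diff = 699.807 - 669.594
df_diff = 90 - 84
p_slopes = chi2_upper_tail_even_df(chi2_diff, df_diff)

print(f"chi-square difference = {chi2_diff:.3f}, df = {df_diff}, p = {p_slopes:.2g}")
```

A small p-value here would indicate that the parallel slopes assumption does not hold for at least some of the gamma coefficients.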

Means2b

ALPHA, Canada (fixed to 0 in US)

     ETA 1      ETA 2
  --------   --------
    1.2545     0.6371
   (0.0725)   (0.0968)
   17.3057     6.5809

GAMMA, USA

              GENDER        AGE       EDUC
            --------   --------   --------
  ETA 1       0.6845    -0.0170     0.0817
             (0.1003)   (0.0031)   (0.0352)
              6.8230    -5.5398     2.3209
  ETA 2       0.0624    -0.0144     0.3074
             (0.1462)   (0.0045)   (0.0520)

GAMMA, Canada

              GENDER        AGE       EDUC
            --------   --------   --------
  ETA 1       0.9597    -0.0308     0.1525
             (0.0931)   (0.0028)   (0.0389)
             10.3099   -11.1125     3.9173
  ETA 2      -0.0936    -0.0246     0.5333
             (0.1200)   (0.0036)   (0.0521)

Expressing effects when the parallel slopes assumption is relaxed: is the pattern diverging, converging, or crossing over?

Equation: $\eta_1 = \alpha_1 + \gamma_1 \xi_1 + \gamma_2 \xi_2 + \gamma_3 \xi_3 + \zeta_1$

Hold all ksi variables except one constant at 0. This is not quite the overall mean (the ksi mean is 0 in group 1, but in group 2 it is 0 + kappa), but it is close enough.

In group 1, $\alpha_1 = 0$, so the equation is:
$\eta_1 = \gamma_1^{[1]} \xi_1$ (with $\alpha_1 = 0$, $\gamma_2 \xi_2 = 0$, $\gamma_3 \xi_3 = 0$, and $E(\zeta_1) = 0$)

In group 2, $\alpha_1 = \alpha_1^{[2]}$:
$\eta_1 = \alpha_1^{[2]} + \gamma_1^{[2]} \xi_1$ (other terms = 0)

Now, the question is: at what values do we evaluate the equation?
1. Ksi1 = 0. This is the Ksi1 mean in group 1. (We could alternatively use something like kappa1[2]/2, which is halfway between the group 1 and the group 2 mean of Ksi1, or even a weighted version.)
2. Ksi1 = 0 + k standard deviations, where k can be any reasonable number (1? 1.5? 2.0?).
3. Ksi1 = 0 - k standard deviations.

How do we find the standard deviation of Ksi? Look at the PHI matrix to obtain the variances, and take the square root of these!

PHI, USA

              GENDER        AGE       EDUC
            --------   --------   --------
  GENDER      0.2441
             (0.0102)
             23.9687
  AGE        -0.4381   259.2400
             (0.2350)  (10.8158)
             -1.8642    23.9687
  EDUC        0.0251     1.7457     1.9599
             (0.0204)   (0.6670)   (0.0818)

For education, if we had a pooled estimate (Canada + US) of the variance we could use it; otherwise, we can approximate: the two group variances, 1.9599 and 1.4733, average to about 1.72, and sqrt(1.72) ≈ 1.3. So we will want to evaluate at EDUC = 0, EDUC = +1.3 (or perhaps +2.6?), and EDUC = -1.3 (or perhaps -2.6?).

At Educ = 0, the Canada-US difference is 1.2545 (see alpha parameter, above):
  USA = 0
  Canada = 1.2545
At Educ = -2.6:
  USA = 0 + (-2.6 * .0817) = -.2124 [USA gamma for educ = .0817]
  Canada = 1.2545 + (-2.6 * .1525) = .858 [Canadian gamma for educ = .1525]
At Educ = +2.6:
  USA = 0 + (2.6 * .0817) = .2124
  Canada = 1.2545 + (2.6 * .1525) = 1.651
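The hand calculations above can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable and function names are invented here, the numbers are the Means2b estimates for ETA 1 shown earlier, and all other ksi variables are held at 0.

```python
import math

# Means2b estimates for ETA 1 (alpha is fixed to 0 in the USA)
ALPHA = {"USA": 0.0, "Canada": 1.2545}
GAMMA_EDUC = {"USA": 0.0817, "Canada": 0.1525}

def predicted_eta1(group: str, educ: float) -> float:
    """Expected ETA 1 at a given EDUC value, other ksi variables held at 0."""
    return ALPHA[group] + GAMMA_EDUC[group] * educ

# Approximate SD of EDUC from the average of the two group variances
approx_sd = math.sqrt((1.9599 + 1.4733) / 2.0)
print(f"approximate SD of EDUC = {approx_sd:.2f}")

# Evaluate at 0 and at roughly +/- 2 SD (2.6, as on the slide)
for educ in (-2.6, 0.0, 2.6):
    usa = predicted_eta1("USA", educ)
    canada = predicted_eta1("Canada", educ)
    print(f"EDUC = {educ:+.1f}: USA = {usa:.4f}, Canada = {canada:.4f}, "
          f"difference = {canada - usa:.4f}")
```

Comparing the printed differences across the three EDUC values shows directly whether the Canada-US gap widens or narrows as education increases.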

For age, the approximate variance is 270, so the standard deviation is sqrt(270) = 16.43. We could thus use 0 ± 16.43 or 0 ± 32.86 (or 0 ± (1.5 * 16.43)); or, since we know that the mean is approximately 42 (see the tau-x parameter), we could simply use something like ± 20 years (more intuitive).

Models for Four Groups • Canada • U.S.A. • Germany • U.K. Means3a GA=PS Chi-square = 1892.25 df=180 Means3b GA=IN Chi-square = 1986.94 df=198

Formulas: USA: =0.0738*B8 Canada: =1.087+(B8*0.1457) UK : =2.4339+(B8*-0.1167) Germany: =1.8139+(B8*0.0957) [B8 refers to the first education row. Formula becomes B9, B10 For rows below]
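For readers working outside a spreadsheet, the same four formulas can be written as a small Python sketch. The intercepts and slopes are the ones in the formulas above; the education grid stands in for the spreadsheet column (B8, B9, ...) and its values are an assumption made for illustration.

```python
# (intercept, slope on education) for each country, from the formulas above
GROUP_LINES = {
    "USA": (0.0, 0.0738),
    "Canada": (1.087, 0.1457),
    "UK": (2.4339, -0.1167),
    "Germany": (1.8139, 0.0957),
}

def predicted(group: str, educ: float) -> float:
    """Value of the spreadsheet formula for one country at one education value."""
    intercept, slope = GROUP_LINES[group]
    return intercept + slope * educ

# Hypothetical education values standing in for cells B8, B9, ...
education_grid = [-2.6, -1.3, 0.0, 1.3, 2.6]

print("EDUC   " + "  ".join(f"{g:>8}" for g in GROUP_LINES))
for educ in education_grid:
    row = "  ".join(f"{predicted(g, educ):8.4f}" for g in GROUP_LINES)
    print(f"{educ:+5.1f}  {row}")
```

Because the UK slope is negative while the others are positive, plotting these rows is a quick way to see whether any of the country lines cross within the plausible range of education.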

Dealing with data that are not normally distributed within the traditional LISREL framework Questions: how bad is it if our data are not normally distributed? what can we do about it? are there easy “fixes”?

Non-Normal Data How about just ignoring the problem? Early 1980s: Robustness studies. • Major findings: • In almost all cases, using LV models better than OLS even if data non-normal • (assumes multiple indicators available) • some discussion of conditions under which parameters might not be accurate (e.g., low measurement coefficient models)

Non-Normal Data • Early articles: • A. Boomsma, On the Robustness of LISREL • Johnson and Creech, American Sociological Review, 48(3), 1983, 398-403 • Henry, ASR, 47: 299-307 • (related: Bollen and Barb, ASR, 46: 232-39) • See a good summary of early and later simulation studies: West, Finch and Curran in Hoyle.


Non-Normal Data
• Many of the studies have involved CFA models
  • E.g., Curran, West, Finch, Psych. Methods, 1(1), 1996.
• General findings (non-normal data):
  • ML, GLS produce chi-square values that are too high
    • Overestimated by 50% in simulations
    • GLS, ML produce slightly inflated chi-square values when sample sizes are small, even when data are normally distributed
  • Underestimation of NFI, TLI, CFI
    • Also underestimated in small samples, esp. NFI
  • Moderate underestimation of std. errors (phi 25%, lambda 50%)

Non-Normality
• Detection:
  • Central moments: $\mu_r = E(x - \mu)^r$
  • Kurtosis: 4th moment; standardized: $\mu_4 / \mu_2^2$ (equals 3 under normality)
  • Skewness: standardized 3rd moment, $\mu_3 / \mu_2^{3/2}$
• Tests of statistical significance usually available (Bollen, p. 421): $b_1$, $b_2$ (skewness, kurtosis)
  • N(0,1) test statistic for kurtosis ($H_0$: $\beta_2 - 3 = 0$)
  • Different tests (one approximation requires N > 1000)
  • Joint test $\kappa^2$: approximately distributed as chi-square, df = 2
• Mardia's multivariate tests: skewness, kurtosis, joint.
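As a minimal sketch of the descriptive part of detection, the standardized third and fourth moments can be computed directly from a variable's values. The function names below are chosen for this example; the moments divide by N (not N - 1), and no significance tests are attempted here.

```python
def central_moment(values, r):
    """r-th sample central moment: mean of (x - mean)^r."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** r for v in values) / n

def skewness(values):
    """Standardized 3rd moment: m3 / m2^(3/2); 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Standardized 4th moment: m4 / m2^2; about 3 for normal data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(f"{name}: skewness = {skewness(data):.3f}, "
          f"kurtosis = {kurtosis(data):.3f}, "
          f"excess kurtosis = {kurtosis(data) - 3:.3f}")
```

Subtracting 3 from the standardized fourth moment gives the quantity that is zero under normality, which is the form tested by the null hypothesis on the slide.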

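Non-Normality
An alternative estimator: $F_{WLS}$ (also called $F_{AGLS}$):
$F_{WLS} = [s - \sigma(\theta)]' W^{-1} [s - \sigma(\theta)]$
Browne, British Journal of Mathematical and Statistical Psychology, 41 (1988), 193ff.; also 37 (1984), 62-83.
Optimal weight matrix? The asymptotic covariance matrix of the $s_{ij}$:
$\mathrm{Acov}(s_{ij}, s_{gh}) = N^{-1} (\sigma_{ijgh} - \sigma_{ij} \sigma_{gh})$
with the sample estimate $s_{ijgh} = \frac{1}{N} \sum z_i z_j z_g z_h$, where $z_i$ is the mean-deviated value.
If multinormal: $\sigma_{ijgh} = \sigma_{ij} \sigma_{gh} + \sigma_{ig} \sigma_{jh} + \sigma_{ih} \sigma_{jg}$ (reduces to GLS)
$W^{-1}$ is $\frac{1}{2}k(k+1) \times \frac{1}{2}k(k+1)$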
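To make the weight matrix less abstract, here is a rough Python sketch of how a single element of the asymptotic covariance matrix could be estimated from raw data using the fourth-order moment formula above. It is an illustration only (PRELIS does this for the whole matrix); the function name and the toy data are invented for the example.

```python
def acov_element(data, i, j, g, h):
    """Estimate Acov(s_ij, s_gh) = (s_ijgh - s_ij * s_gh) / N from raw data.

    data is a list of rows (one per case); i, j, g, h are column indices.
    """
    n = len(data)
    k = len(data[0])
    means = [sum(row[c] for row in data) / n for c in range(k)]
    # z holds the mean-deviated values
    z = [[row[c] - means[c] for c in range(k)] for row in data]

    def second_moment(a, b):
        return sum(r[a] * r[b] for r in z) / n

    fourth_moment = sum(r[i] * r[j] * r[g] * r[h] for r in z) / n
    return (fourth_moment - second_moment(i, j) * second_moment(g, h)) / n

# Toy data: two variables, four cases (purely illustrative)
toy = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]

print("Acov(s_11, s_11) =", acov_element(toy, 0, 0, 0, 0))
print("Acov(s_12, s_12) =", acov_element(toy, 0, 1, 0, 1))
```

Every element requires a fourth-order moment, which is why these estimates are noisy unless N is large.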

Non-Normality
An alternative estimator: $F_{WLS}$ (continued)
$W^{-1}$ is $\frac{1}{2}k(k+1) \times \frac{1}{2}k(k+1)$
• Computationally intensive: with 20 variables, 22,155 distinct elements
• To be non-singular, N must be > p + ½p(p+1)
  • 20 variables: minimum 230
  • 30 variables: minimum 495
• Older versions of LISREL imposed stricter requirements (refused to run until thresholds well above the minima shown above were reached)
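The counts quoted above follow from simple arithmetic, sketched here in Python with hypothetical function names. The number of distinct elements is taken to be the lower triangle (including the diagonal) of the symmetric weight matrix.

```python
def n_distinct_moments(k: int) -> int:
    """Number of distinct variances and covariances among k variables."""
    return k * (k + 1) // 2

def n_distinct_weight_elements(k: int) -> int:
    """Distinct elements of the symmetric weight matrix for k variables."""
    m = n_distinct_moments(k)
    return m * (m + 1) // 2

def sample_size_threshold(k: int) -> int:
    """The quantity k + k(k+1)/2 that N is compared against on the slide."""
    return k + n_distinct_moments(k)

for k in (20, 30):
    print(f"{k} variables: weight matrix is "
          f"{n_distinct_moments(k)} x {n_distinct_moments(k)}, "
          f"{n_distinct_weight_elements(k):,} distinct elements, "
          f"sample size threshold {sample_size_threshold(k)}")
```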

Non-Normality
An alternative estimator: $F_{WLS}$ (continued)
The AGLS estimator is commonly available in SEM software:
• LISREL 8
• AMOS
• SAS-CALIS
• EQS
Be careful! It is not really suitable for small-N problems.
• Good idea to have sample sizes in the thousands, not hundreds.

Non-Normality
An alternative estimator: $F_{WLS}$ (continued)
Requesting the AGLS estimator in SEM software:
• LISREL 8: ME=WL in the OU statement; must also provide the asymptotic covariance matrix generated by PRELIS
  • AC FI= statement follows CM FI= statement
• AMOS: check box on analysis options
Again, the problem is that this estimator can be unstable given the size of the matrix (acov) that needs to be inverted (especially in moderate sample sizes).

Non-Normality
Sample program in LISREL with ADF estimator:

    LISREL model for religiosity and moral conservatism
    Part 2: ADF estimation
    DA NI=14 NO=1456
    CM FI=h:\icpsr2003\Week4Examples\nonnormaldata\relmor1.cov
    ACC FI=h:\icpsr2003\Week4Examples\nonnormaldata\relmor1.acc
    SE
    1 2 3 4 5 6 7 8 9 10 11 12 13 14/
    MO NY=11 NX=3 NE=2 Nk=3 fixedx ly=fu,fi ga=fu,fr c
    ps=sy,fr te=sy
    va 1.0 ly 1 1 ly 8 2
    fr ly 2 1 ly 3 1 ly 4 1 ly 5 1
    fr ly 6 2 ly 7 2 ly 9 2 ly 10 2 ly 11 2
    fr te 2 1 te 11 10 te 7 6
    ! me=wl requests the WLS (ADF) estimator; it uses the ACC file read above
    ou me=wl se tv sc nd=3 mi


Non-Normality
Generating the asymptotic covariance matrix in PRELIS
• The resulting matrix will be much larger than the covariance matrix

Non-Normality
ADF estimation

    LISREL model for religiosity and moral conservatism
    Part 2: ADF estimation
    DA NI=14 NO=1456
    CM FI=h:\icpsr99\nonnorm\relmor1.cov
    ACC FI=h:\icpsr99\nonnorm\relmor1.acc
    SE
    1 2 3 4 5 6 7 8 9 10 11 12 13 14/
    MO NY=11 NX=3 NE=2 Nk=3 fixedx ly=fu,fi ga=fu,fr c
    ps=sy,fr te=sy
    va 1.0 ly 1 1 ly 8 2
    fr ly 2 1 ly 3 1 ly 4 1 ly 5 1
    fr ly 6 2 ly 7 2 ly 9 2 ly 10 2 ly 11 2
    fr te 2 1 te 11 10 te 7 6
    ou me=wl se tv sc nd=3 mi

Non-Normality
ML, scaled statistics

    LISREL model for religiosity and moral conservatism
    Part 2: ADF estimation
    DA NI=14 NO=1456
    CM FI=h:\icpsr2003\Week4Examples\nonnormaldata\relmor1.cov
    ACC FI=h:\icpsr2003\Week4Examples\nonnormaldata\relmor1.acc
    SE
    1 2 3 4 5 6 7 8 9 10 11 12 13 14/
    MO NY=11 NX=3 NE=2 Nk=3 fixedx ly=fu,fi ga=fu,fr c
    ps=sy,fr te=sy
    va 1.0 ly 1 1 ly 8 2
    fr ly 2 1 ly 3 1 ly 4 1 ly 5 1
    fr ly 6 2 ly 7 2 ly 9 2 ly 10 2 ly 11 2
    fr te 2 1 te 11 10 te 7 6
    ou me=ml se tv sc nd=3 mi

Non-Normality
Low tech solutions: for variables that are continuous, TRANSFORMATION
• See classic regression texts such as Fox
• Common transformations:
  • X → log(X) (usually natural log)
  • X → sqrt(X)
  • X → X²
  • X → 1/X (even harder to interpret, since this will result in sign reversal)
• Transforming to remove skewness often/usually removes kurtosis, but this is not guaranteed
• “Normalization” as an extreme option (e.g., map rank-ordered data onto an N(0,1) distribution).

Non-Normality Generally, if kurtosis is between +1 and -1, it is not considered too problematic (see Bollen, 1989).

Transformations
• AMOS: Transformations must be performed on the SPSS dataset. Save the new dataset, and work from this (e.g., COMPUTE X1 = LN(X1).)
• LISREL: Transformations can be performed in PRELIS.
  • PRELIS already provides distribution information on variables as a “check”
  • PRELIS “compute” dialogue box under transformations
  • Remember to SAVE the PRELIS dataset after each transformation.
• Use of a stat package (SPSS, Stata, SAS) may be preferable

Transformations
All the usual caveats apply:
• If a variable only has 4-5 values, transformation will not normalize it (at the very least, it will still have tucked-in tails), though it could help bring kurtosis closer to within the +1 to -1 range
• If a categorized variable has one value with a majority of cases, then no transformation will work
• If the variable has negative values, make sure to add a constant (“offset”) before logging
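The last caveat can be illustrated with a short Python sketch. The rule used here for choosing the offset (shift the minimum to 1 whenever the variable has zero or negative values) is just one hypothetical convention, and the function names and example values are made up for the illustration.

```python
import math

def log_with_offset(values):
    """Natural log after adding a constant so that every value is positive.

    If the minimum is <= 0, the data are shifted so the minimum becomes 1
    (an arbitrary but common choice); otherwise no offset is added.
    """
    smallest = min(values)
    offset = 1.0 - smallest if smallest <= 0 else 0.0
    return [math.log(v + offset) for v in values]

def skewness(values):
    """Standardized 3rd moment, m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 3, 4, 50]            # right-skewed, all positive
logged = log_with_offset(raw)
print(f"skewness before: {skewness(raw):.3f}, after log: {skewness(logged):.3f}")

with_negatives = [-3, 0, 5]       # needs an offset before logging
print(log_with_offset(with_negatives))
```

Note that the size of the offset changes the shape of the transformed variable, so it is worth re-checking skewness and kurtosis after transforming.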

Other solutions:
1. Robust test statistics (Bentler). Implementation: EQS, LISREL
2. Muthén has recently developed a WLSM (mean-adjusted) and WLSMV (mean- and variance-adjusted) estimator. Implementation: Mplus only
3. Bootstrapping. Implementation: AMOS (easy to use), LISREL (awkward)
4. Categorical variable models (CVM).


Bootstrapping
• Computationally intensive
• Sampling with replacement: from resampling space R draw bootstrap sample S*n,j, where j = # of samples and n = bootstrap n
• Typically, bootstrap N = sample N
• Repeat resampling B times to get a set of values
• Issue: what if, across 200 resamples, 2 of them have ill-defined matrices?
  • Usually, these are discarded
• Tests: percentile confidence intervals (want a large # of samples); confidence intervals do not need to be symmetric (can look at the values at the 5th and 95th percentiles among the bootstrapped samples, which gives a 90% interval)
• More common to compute standard errors
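The resampling logic can be sketched in a few lines of Python. This is a generic illustration rather than what AMOS or LISREL do internally: the statistic here is a simple mean instead of an SEM parameter, the data are made up, and the function names are hypothetical.

```python
import random
import statistics

def bootstrap_estimates(data, statistic, n_resamples=1000, seed=0):
    """Draw resamples with replacement (same N as the data) and collect the statistic."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(statistic(resample))
        except (ValueError, ZeroDivisionError):
            continue  # discard resamples where the statistic is ill-defined
    return estimates

def percentile(sorted_values, q):
    """Simple nearest-rank percentile of an already sorted list (0 <= q < 1)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

# Made-up skewed data, for illustration only
sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 15]

estimates = sorted(bootstrap_estimates(sample, statistics.mean))
se = statistics.stdev(estimates)      # bootstrap standard error
lower = percentile(estimates, 0.05)   # 5th percentile
upper = percentile(estimates, 0.95)   # 95th percentile

print(f"bootstrap SE = {se:.3f}; 90% percentile interval = ({lower:.3f}, {upper:.3f})")
```

With skewed data the percentile interval will generally not be symmetric around the sample estimate, which is the point made in the bullet on tests.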

Bootstrapping
• Overall model chi-square correction (available in AMOS): Bollen and Stine.
• Yang and Bentler (chapter in Marcoulides & Schumacker):
  • “faith” in the bootstrap is based on its appropriateness in other applications
  • Simulation study, 1995, of exploratory factor analysis: rotated solutions close, but not so with unrotated solutions
  • “It seems that in the present stage of development, the use of the bootstrap estimator in covariance structure analysis is still limited. It is not clear whether one can trust the bias estimates.”

Bootstrapping
• Ichikawa and Konishi, 1995
  • When data are multinormal, bootstrap se's are not as good as ML
  • Bootstrap doesn't seem to work when N < 150: consistent overestimation (at N = 300, not a problem though).

The Categorical Variable Model
Conceptual background: we observe y, which has C discrete values, but are interested in the latent y*.
$y_i = C_i - 1$ if $\nu_{i,C_i-1} < y_i^*$
$y_i = C_i - 2$ if $\nu_{i,C_i-2} < y_i^* \le \nu_{i,C_i-1}$
$y_i = C_i - 3$ if $\nu_{i,C_i-3} < y_i^* \le \nu_{i,C_i-2}$
…
$y_i = 1$ if $\nu_{i,1} < y_i^* \le \nu_{i,2}$
$y_i = 0$ if $y_i^* \le \nu_{i,1}$
The $\nu$'s are threshold parameters to be estimated.
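The mapping from the latent y* to the observed category can be sketched as follows. The threshold values are invented for the illustration; the function simply counts how many thresholds lie strictly below y*, which matches the "<" and "≤" pattern in the definition above.

```python
from bisect import bisect_left

def observed_category(y_star: float, thresholds) -> int:
    """Observed category (0 .. C-1) for latent y*, given sorted thresholds.

    Category c is observed when threshold c < y* <= threshold c+1, so the
    category equals the number of thresholds strictly below y*.
    """
    return bisect_left(thresholds, y_star)

# Hypothetical thresholds for a 3-category variable (C = 3, two thresholds)
thresholds = [-0.5, 0.5]

for y_star in (-1.0, -0.5, 0.0, 0.5, 0.6):
    print(f"y* = {y_star:+.1f} -> y = {observed_category(y_star, thresholds)}")
```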

The Categorical Variable Model
Observed and Latent Correlations

  X-variable scale   Y-variable scale   Observed correlation   Latent correlation
  ----------------   ----------------   --------------------   ------------------
  Continuous         Continuous         Pearson                Pearson
  Continuous         Categorical        Pearson                Polyserial
  Continuous         Dichotomous        Point-biserial         Biserial
  Categorical        Categorical        Pearson                Polychoric
  Dichotomous        Dichotomous        Phi                    Tetrachoric

If it is reasonable to assume that continuous and normally distributed y* variables underlie the categorical y variables, a variety of latent correlations can be specified.

The Categorical Variable Model
Estimation, assuming continuous and normally distributed y* variables underlie the categorical y variables:
• First step: estimate thresholds using ML
• Second step: estimate the latent correlations
• Third step: obtain a consistent estimator of the asymptotic covariance matrix of the latent correlations (for use in a weighted least squares estimator in the SEM model)
Extreme case: ability to recover the y* model when variables are split into 25%/75% dichotomies: promising (though chi-square underestimated)
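As a rough illustration of the first step, thresholds for a single variable can be obtained from its cumulative category proportions by inverting the standard normal distribution function. This sketch assumes thresholds are estimated one variable at a time from the univariate margins with y* scaled as N(0,1), and that no cumulative proportion is exactly 0 or 1; the function name is hypothetical.

```python
from statistics import NormalDist

def thresholds_from_counts(counts):
    """Thresholds implied by category counts, assuming y* ~ N(0,1).

    counts[c] is the number of cases observed in category c (c = 0 .. C-1).
    Returns C-1 thresholds: the normal quantiles of the cumulative proportions.
    """
    total = sum(counts)
    standard_normal = NormalDist()
    thresholds = []
    cumulative = 0
    for count in counts[:-1]:
        cumulative += count
        thresholds.append(standard_normal.inv_cdf(cumulative / total))
    return thresholds

# A 25%/75% dichotomy, as in the "extreme case" above
print(thresholds_from_counts([25, 75]))

# A hypothetical 4-category variable
print(thresholds_from_counts([10, 40, 40, 10]))
```

With the thresholds in hand, the second step estimates each latent correlation from the corresponding bivariate cross-tabulation.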
