Structure Equation With Nonnormal Variables

Structure Equation With Nonnormal Variables. Presented in DHPR, NHRI 2004.5. A major source of inappropriate use of SEM is failure to satisfy the scaling and normality assumptions. Many measurements are dichotomous or ordered categories, e.g. “agree”, “no preference”, “disagree”.

Presentation Transcript


  1. Structure Equation With Nonnormal Variables • Presented in DHPR, NHRI 2004.5.

  2. Major Source of Inappropriate Use of SEM • Failure to satisfy the scaling and normality assumptions • Many measurements are dichotomous or ordered categories, e.g. “agree”, “no preference”, “disagree” • Some are continuous but depart dramatically from normality, e.g. number of cigarettes smoked by females per day • In 1990, 72 articles published in personality and psychology journals used SEM; only 19% acknowledged the normality assumption, and fewer than 10% explicitly considered whether the assumption had been violated

  3. Review of Normal Theory Estimation • Estimation: minimize the difference between each element of S and the corresponding element of Σ(θ) • S is the sample covariance matrix based on the observed data • Σ(θ) is the covariance matrix implied by a set of parameters θ for the hypothesized model

  4. Most Commonly Used Estimation Techniques • Tangent plane (maximize the function) • Maximum likelihood (ML)

  5. Generalized Least Squares (GLS) • Normal function (used in ML) • Therefore, both ML and GLS are based on the normality assumption

  6. More on GLS • Minimize the weighted discrepancy between S and Σ(θ) • W⁻¹ is the weight matrix; the most common choice is S⁻¹ • Sample covariance between x_i and x_j • The large-sample distribution of the elements of S is assumed to be multivariate normal
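
For reference, the normal-theory fit functions usually written for these two estimators are sketched below. The slides' own formulas did not survive extraction, so this is the common textbook form rather than a copy of the slide; the symbol p (number of observed variables) is an assumption added here.

```latex
% ML fit function: zero when \Sigma(\theta) reproduces S exactly
F_{ML}(\theta) = \ln\lvert\Sigma(\theta)\rvert
               + \operatorname{tr}\!\left(S\,\Sigma(\theta)^{-1}\right)
               - \ln\lvert S\rvert - p

% GLS fit function with a general weight matrix W^{-1}
F_{GLS}(\theta) = \tfrac{1}{2}\operatorname{tr}\!\left\{\left[\left(S - \Sigma(\theta)\right)W^{-1}\right]^{2}\right\}

% With the common choice W^{-1} = S^{-1}
F_{GLS}(\theta) = \tfrac{1}{2}\operatorname{tr}\!\left\{\left[I - \Sigma(\theta)\,S^{-1}\right]^{2}\right\}
```

Both functions equal zero when Σ(θ) = S and grow as the implied matrix moves away from the sample matrix.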

  7. Problems • Assumptions: very large sample, continuous variables, multivariate normality • Statistical properties? • Robustness of the estimators?

  8. Effects and Detection • The observed variables do not have a multivariate normal distribution • The χ² goodness-of-fit test is not an accurate assessment of fit, rejecting too many (>5%) true models • Tests of all parameter estimates are expected to be biased, yielding too many significant results • Categorical variables assumed continuous • Correlation is stronger than it should be

  9. Studies on the Effect of Non-Normality • Olsson, Foss, Troye and Howell (Structural Equation Modeling, 2000) • [Figure: reality (the true model, M_true) generates the population (true) states and covariance Σ; in the theoretical domain, the theoretical model (M_theory) implies the covariance Σ(θ); in the empirical domain, the sample covariance is S. Theoretical fit links the theoretical model to the true model, "true" empirical fit links Σ(θ) to Σ, and empirical fit links Σ(θ) to S.]

  10. Theoretical fit: the degree of isomorphism between the structure and parameter values of a theoretical model and those of the “true” model that generates the data. • Empirical fit: the discrepancy between the observed covariance structure and the one implied by a theoretical model. • “True” empirical fit: the correspondence between the population covariance matrix (Σ) and the covariance structure implied by the theoretical model (Σ(θ))

  11. Comparisons Among ML, GLS, and WLS • The performance of the three methods in terms of empirical and theoretical fit is differentially affected by sample size, specification error, and kurtosis. • ML is considerably less sensitive than the others to variation in sample size and kurtosis. Only its empirical fit is affected by specification error. In general, ML tends to be more stable and more accurate. • GLS requires well-specified models, but allows small sample sizes. Its appealing performance in terms of empirical fit can be misleading. • WLS requires well-specified models as well as large sample sizes.

  12. Detecting Departure From Normality • Skewness and kurtosis (positive vs. negative) • SAS PROC UNIVARIATE • Univariate vs. multivariate • If univariate normality is violated for any variable, multivariate normality (of the joint distribution) cannot hold. The converse is not true: univariate normality of every variable does not guarantee multivariate normality. • Mardia (1970) measures • Outliers • Checking for errors, leverage statistics, etc.
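
As a rough illustration of the univariate screening step, the sketch below computes moment-based skewness and excess kurtosis in plain Python. The function names and the toy data are assumptions made for this example; statistical packages often apply small-sample corrections, so their values can differ slightly from these.

```python
def _central_moment(xs, order):
    """Average of (x - mean)**order over the sample (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** order for x in xs) / n


def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


# Hypothetical data: a symmetric variable and a strongly right-skewed one.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [0, 0, 0, 1, 10]

for name, data in (("symmetric", symmetric), ("right_skewed", right_skewed)):
    print(name, round(skewness(data), 3), round(excess_kurtosis(data), 3))
```

Positive skewness indicates a long right tail, and positive excess kurtosis indicates heavier tails than the normal distribution.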

  13. Remedies for Multivariate Nonnormality • Alternative Estimation Techniques • Asymptotically Distribution-Free Estimator (ADF) • Optimal weight matrix consisting of a combination of second- and fourth-order terms • It has many more elements than the normal-theory GLS weight matrix (S⁻¹) • Computationally demanding: e.g. with 15 measured variables there are ½*15*16=120 unique elements, so the matrix has 120*120=14,400 elements. Inverting the matrix can be difficult. • GLS only takes the diagonal of the matrix (120 elements).
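
The arithmetic on this slide can be checked with a few lines of Python. The helper name below is hypothetical; it only reproduces the counting argument, not the ADF estimator itself.

```python
def adf_weight_matrix_size(p):
    """Return (unique covariance elements, ADF weight-matrix elements) for p variables."""
    unique = p * (p + 1) // 2      # distinct variances and covariances in S
    return unique, unique * unique  # the weight matrix is unique x unique


print(adf_weight_matrix_size(15))  # (120, 14400)
```

Because the count grows with the fourth power of p, doubling the number of measured variables multiplies the size of the weight matrix by roughly sixteen.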

  14. SCALED 2 statistic and standard errors (Satorra, 1990) • Corrected or rescaled the 2 • The 2 from ML or GLS is divided by a constant k, whose value is a function of the model-implied residual weight matrix, the multivariate kurtosis, and the degree of freedom for the model. • k as kurtosis adjusted 2 • Its available in EQS program

  15. Bootstrapping • Taking repeated samples from a population of interest • Calculate the parameter estimates of interest for each sample, resulting in an empirical sampling distribution of the estimates. • Repeated samples of the same size are taken from the original sample with replacement. • For example, if the original sample consists of (1, 2, 3, 4), possible bootstrap samples are (1, 4, 1, 1), (2, 3, 1, 3), or (4, 2, 2, 4).
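
The resampling scheme on this slide can be sketched in plain Python. This is a minimal illustration, not an SEM bootstrap: the sample mean stands in for a model parameter estimate, and the function name, the number of replications, and the seed are assumptions chosen for the example.

```python
import random
import statistics


def bootstrap_means(sample, n_boot=1000, seed=0):
    """Draw n_boot resamples (with replacement, same size) and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # sampling with replacement
        means.append(statistics.mean(resample))
    return means


original = [1, 2, 3, 4]
means = bootstrap_means(original)

# The spread of the bootstrap estimates approximates the standard error.
bootstrap_se = statistics.stdev(means)
print(len(means), round(bootstrap_se, 3))
```

In an SEM application the mean would be replaced by refitting the model to each resample and storing the parameter estimates of interest.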

  16. Re-expression of Variables • Item Parcels: sum or mean of several items that measure the same domain. • Potential complication in the interpretation of relationships and structure. • Use of too few measured variables as indicators of a domain yields less stringent tests of the proposed structure of confirmatory factor models • Identification problems are more likely to occur
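
A minimal sketch of building item parcels as item means is shown below. The item names (q1 to q6), the parcel assignment, and the scores are hypothetical and only illustrate the mechanics.

```python
def make_parcels(responses, parcels):
    """Average the items assigned to each parcel for one respondent.

    responses: dict mapping item name -> score
    parcels:   dict mapping parcel name -> list of item names
    """
    return {
        name: sum(responses[item] for item in items) / len(items)
        for name, items in parcels.items()
    }


# Hypothetical respondent with six items measuring the same domain.
responses = {"q1": 4, "q2": 5, "q3": 3, "q4": 2, "q5": 1, "q6": 3}
parcels = {"parcel_a": ["q1", "q2", "q3"], "parcel_b": ["q4", "q5", "q6"]}

scores = make_parcels(responses, parcels)
print(scores)
```

Six ordinal items become two indicators here, which shows the trade-off noted on the slide: the parcels are closer to continuous, but the model has fewer measured variables per domain.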

  17. Transformation of variables • Linear transformations (e.g. standardization) have no effect on either the distributions of variables or the results of simple structural equation models • Non-linear transformations potentially alter the distribution of the measured variables as well as the relationships among measured variables, potentially eliminating some forms of curvilinear effects and interactions between variables.
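
The first point can be checked numerically. The sketch below, using made-up data and helper names that are assumptions of this example, standardizes a variable and confirms that its skewness and its correlation with another variable are unchanged.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def standardize(xs):
    """Linear transformation to mean 0 and standard deviation 1."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]


# Hypothetical scores on two measured variables.
x = [1, 2, 2, 3, 9]
y = [2, 1, 4, 3, 8]
z = standardize(x)

print(round(skewness(x), 6), round(skewness(z), 6))          # same shape
print(round(pearson_r(x, y), 6), round(pearson_r(z, y), 6))  # same correlation
```

Standardizing changes only the location and scale of the variable, so it cannot be used to repair nonnormality.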

  18. Selecting an Appropriate Transformation • Power function • Positively skewed: generally, raise the scores on the measured variable to a power less than 1.0, e.g. log, square root, reciprocal • Negatively skewed: raise the raw scores to a power greater than 1.0. • Box-Cox transformation: when scatter plots show a possible non-linear relationship between pairs of variables.
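
The effect of these power transformations on skewness can be sketched as follows. The one-parameter Box-Cox form used here, (y^λ − 1)/λ with the log as the λ = 0 case, is the usual textbook definition; the data and function names are assumptions for illustration.

```python
import math


def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def box_cox(y, lam):
    """Box-Cox power transformation for a positive score y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive scores")
    if lam == 0:
        return math.log(y)  # limiting case as lam approaches 0
    return (y ** lam - 1) / lam


# Hypothetical positively skewed scores.
positive = [1, 2, 2, 3, 9]
skew_by_lambda = {
    lam: skewness([box_cox(v, lam) for v in positive])
    for lam in (1.0, 0.5, 0.0)
}
for lam, value in skew_by_lambda.items():
    print(lam, round(value, 3))

# Hypothetical negatively skewed scores: a power greater than 1.0 pulls
# the long left tail in.
negative = [9, 8, 8, 7, 1]
print(round(skewness(negative), 3), round(skewness([v ** 2 for v in negative]), 3))
```

Lower powers compress the right tail more strongly, so the choice of power is usually made by checking the skewness of the transformed scores rather than fixed in advance.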

  19. About the Transformation • Examine the univariate skewness and kurtosis of the transformed data • Examine the multivariate skewness and kurtosis of the transformed data using Mardia's measures • y → y*, so the covariances of the transformed y* should be computed, not those of the original y • Box-Cox transformation can result in considerable confusion in the interpretation of the variables.

  20. Choice Among Remedies • In large samples (1000 to 5000), ADF and the scaled χ² and standard errors perform well for continuous nonnormal data. • In medium samples (200 to 500), the choice depends on the degree of nonnormality. • In small samples where nonnormality is not severe, the scaled χ² is an option as the distributions begin to depart from normality (e.g. skewness=2; kurtosis=7). • Variable re-expression is recommended.