
Factor Analysis Example

Factor Analysis Example. Qian-Li Xue, Biostatistics Program, Harvard Catalyst | The Harvard Clinical & Translational Science Center. Short course, October 28, 2016.



Presentation Transcript


  1. Factor Analysis Example Qian-Li Xue Biostatistics Program Harvard Catalyst | The Harvard Clinical & Translational Science Center Short course, October 28, 2016

  2. Example: Frailty • Frailty is “a biologic syndrome of decreased reserve and resistance to stressors, resulting from cumulative declines across multiple physiologic systems, and causing vulnerability to adverse outcomes” (Fried et al. 2001) • Common phenotypes of “frailty” in geriatrics include “weakness, fatigue, weight loss, decreased balance, low levels of physical activity, slowed motor processing and performance, social withdrawal, mild cognitive changes, and increased vulnerability to stressors” (Walston et al. 2006)

  3. Example: Frailty Manifest variables of frailty: • Body composition: arm circumference, body mass index, tricep skinfold thickness • Slowed motor processing and performance: speed of fast walk, speed of Pegboard test, speed of usual walk, time to do chair stands • Muscle strength: grip strength, knee extension, hip extension

  4. Recap of Basic Characteristics of Exploratory Factor Analysis (EFA) • Most EFA methods extract orthogonal factors, which may not be a reasonable assumption • Distinction between common and unique variances • EFA is underidentified (i.e., no unique solution) • Remember rotation? Equally good fit with different rotations! • All measures are related to each factor

  5. Major steps in EFA • Data collection and preparation • Choose number of factors to extract • Extracting initial factors • Rotation to a final solution • Model diagnosis/refinement • Derivation of factor scales to be used in further analysis

  6. Step 1. Data collection and preparation • Factor analysis is totally dependent on correlations between variables. • Factor analysis summarizes correlation structure: Data Matrix (observations O1…On by variables v1…vk) → Correlation Matrix (variables v1…vk by v1…vk) → Factor Pattern Matrix (variables v1…vk by factors F1…Fj)

  7. Example: Frailty (N=547) Observed Data Correlation Matrix
              bmi    arm   skin   grip   knee    hip  uslwalk fastwk chrstand   peg
  bmi        1.00
  arm        0.89   1.00
  skin       0.65   0.72   1.00
  grip       0.25   0.32   0.23   1.00
  knee      -0.41  -0.36  -0.12   0.01   1.00
  hip       -0.34  -0.34  -0.10   0.00   0.62   1.00
  uslwalk   -0.11  -0.03   0.09   0.14   0.26   0.12    1.00
  fastwk    -0.10   0.01   0.13   0.17   0.29   0.15    0.89   1.00
  chrstand   0.04   0.02  -0.08  -0.09  -0.26  -0.14   -0.41  -0.41    1.00
  peg        0.05   0.10   0.18   0.24   0.13   0.08    0.33   0.35   -0.29   1.00

  8. Step 2. Choose number of factors • Intuitively: the number of uncorrelated constructs that are jointly measured by the Y’s. • Only useful if the number of factors is less than the number of Y’s (recall “data reduction”). • Estimability: is there enough information in the data to estimate all of the parameters in the factor analysis? This may constrain the model to a certain number of factors.

  9. Step 2. Choosing number of factors Use Principal Components Analysis (PCA) to help decide • Similar to “factor” analysis, but conceptually quite different! • number of “factors” is equivalent to number of variables • each “factor” or principal component is a weighted combination of the input variables Y1 … Yn: P1 = a11Y1 + a12Y2 + … + a1nYn • Principal components ARE NOT latent variables • Does not differentiate between common and unique variances
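The weighted-combination formula above can be made concrete: PCA on a correlation matrix is just its eigendecomposition. A minimal Python sketch (the slides use SAS; the 4×4 correlation matrix below is hypothetical, not the frailty data):

```python
import numpy as np

# Hypothetical 4-variable correlation matrix (NOT the frailty data),
# built so variables 1-2 and 3-4 form two correlated pairs.
R = np.array([[1.0, 0.8, 0.2, 0.1],
              [0.8, 1.0, 0.1, 0.2],
              [0.2, 0.1, 1.0, 0.7],
              [0.1, 0.2, 0.7, 1.0]])

# PCA on a correlation matrix is its eigendecomposition: eigenvalues give the
# variance explained by each component, and eigenvector entries are the
# weights a_ij in P_i = a_i1*Y1 + ... + a_in*Yn.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("eigenvalues:", np.round(eigvals, 3))  # they sum to 4, the number of variables
print("components with eigenvalue > 1:", int((eigvals > 1).sum()))  # → 2
```

With this two-pair structure, two components dominate, matching the intuition that the number of useful components is far smaller than the number of variables.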

  10. Choosing Number of Factors /* Principal components analysis */
  proc factor data=frailty method=prin outstat=abc.pca_all plots=(scree);
    var bmi arm skin grip knee hip uslwalk fastwk chrstand peg;
  run;
  %parallel(data=frailty, niter=1000, statistic=Median);

  11. SAS PCA Output

  12. Step 2. Choosing number of factors • To select how many factors to use, evaluate eigenvalues from PCA • Two interpretations: • eigenvalue ≈ equivalent number of variables which the factor represents • eigenvalue ≈ amount of variance in the data described by the factor • Criteria to go by: • number of eigenvalues > 1 (Kaiser-Guttman criterion) • scree plot • parallel analysis • % variance explained • comprehensibility

  13. Choosing Number of Factors

  14. Parallel Analysis (Hayton, Allen, & Scarpello, 2004) • Eigenvalues (EV) that would be expected from random data are compared to those produced by the real data • If EV(random data) > EV(real data), the derived factors are mostly random noise • How to do this in SAS: http://www2.sas.com/proceedings/sugi28/090-28.pdf • How to do this in Stata: type “findit fapara” in Stata to locate the program for free download. Reference: http://www.ats.ucla.edu/stat/stata/faq/parallel.htm
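The parallel-analysis logic can also be sketched outside SAS/Stata. A hypothetical Python illustration (sample size, iteration count, and the "real" eigenvalues below are made up; the frailty example itself uses the %parallel macro with niter=1000):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_var, n_iter = 500, 10, 200    # hypothetical sizes, not the frailty N=547

# Eigenvalues expected from pure noise: repeatedly generate uncorrelated
# normal data of the same shape and keep the median eigenvalue at each rank.
random_eigs = np.empty((n_iter, n_var))
for i in range(n_iter):
    X = rng.standard_normal((n_obs, n_var))
    random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
threshold = np.median(random_eigs, axis=0)

# Hypothetical (already sorted) eigenvalues from "real" data; retain a factor
# only when the real eigenvalue exceeds the noise median at the same rank.
real_eigs = np.array([3.1, 2.4, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.25])
n_retain = int(np.sum(real_eigs > threshold))
print("largest noise eigenvalue (median):", round(float(threshold[0]), 2))
print("factors retained:", n_retain)
```

Note that even pure noise produces a largest eigenvalue above 1, which is exactly why the EV > 1 rule over-retains and parallel analysis corrects for it.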

  15. Choosing Number of Factors

  16. Accuracy of Retention Criteria • EV > 1 • Tends to overestimate the number of factors • More accurate when the number of variables is small and communalities are high • Scree Test • More accurate than EV > 1 • Subjective and sometimes ambiguous • Parallel Analysis • Most accurate • Becoming the standard

  17. Step 3. Extracting initial factors Using MLE
  proc factor data=frailty method=ml priors=smc msa residual rotate=varimax reorder outstat=abc.fa_all plots=(scree initloadings loadings);
    var bmi arm skin grip knee hip uslwalk fastwk chrstand peg;
  run;

  18. Step 3. Extracting initial factors Using MLE

  19. Step 4. Factor Rotation • Steps 2 and 3 determine the minimum number of factors needed to account for observed correlations • After obtaining initial orthogonal factors, we want to find more easily interpretable factors via rotations • While keeping the number of factors and communalities of the Y’s fixed!!! • Rotation does NOT improve fit!

  20. Step 4. Factor Rotation • All solutions are relatively the same • Goal is simple structure • Most construct validation assumes simple (typically rotated) structure. • Rotation does NOT improve fit!
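The claim that rotation keeps communalities fixed and cannot improve fit can be checked directly. A minimal Python varimax sketch (the loading matrix is hypothetical, not the frailty output; SAS's rotate=varimax implements the same idea):

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (standard SVD algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        )
        R = u @ vt                      # R stays orthogonal at every step
        d_new = s.sum()
        if d_new < d * (1 + tol):       # stop when the criterion no longer improves
            break
        d = d_new
    return loadings @ R

# Hypothetical unrotated loadings: 4 variables, 2 factors.
A = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9],
              [0.3, 0.8]])
A_rot = varimax(A)

# Rotation is orthogonal, so communalities (row sums of squared loadings)
# are unchanged -- rotation improves interpretability, not fit.
assert np.allclose((A**2).sum(axis=1), (A_rot**2).sum(axis=1))
print(np.round(A_rot, 2))
```

The rotated loadings move toward "simple structure": each variable loads mainly on one factor, while the variance each variable shares with the factors is untouched.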

  21. Step 4. Factor Rotation (Varimax)

  22. Step 4. Factor Rotation

  23. Step 4. Factor Rotation (Promax)

  24. Step 4. Factor Rotation (Promax) Varimax Promax

  25. Pattern vs. Structure Matrix

  26. Step 5. Model Diagnostics:Goodness-of-Fit

  27. Step 5. Model Diagnostics: Residual Correlations

  28. Step 5. Model Diagnostics: Partial Correlations

  29. Step 6. Model Refinement: Analysis of Cronbach Alpha /* Cronbach alpha */
  proc corr data=frailty nomiss alpha plots;
    var grip knee hip;
  run;
  proc corr data=frailty nomiss alpha plots;
    var uslwalk fastwk chrstand2 peg;
  run;
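PROC CORR's ALPHA option reports Cronbach's alpha; the formula itself is short enough to sketch. A Python illustration with simulated items sharing one common factor (the data are hypothetical, not the frailty measures):

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Simulated data: three items driven by one common factor plus noise.
rng = np.random.default_rng(1)
common = rng.standard_normal((200, 1))
items = common + 0.8 * rng.standard_normal((200, 3))
alpha = cronbach_alpha(items)
print(round(alpha, 2))     # typically around 0.8 for this signal-to-noise ratio
```

When the noise scale grows, the inter-item correlations and hence alpha drop, which is the logic behind using alpha to judge whether items such as grip, knee, and hip belong on one scale.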

  30. Step 6. Model Refinement: Item Deletion? Uniqueness of Chair Stand = 0.77827 Uniqueness of Grip = 0.85577

  31. Step 7. Derivation of Factor Scores • Each object (e.g. each person) gets a factor score for each factor: • The factors themselves are variables • An object’s score is a weighted combination of its scores on the input variables • These weights are NOT the factor loadings! • Different approaches exist for estimating the weights (e.g. the regression method) • Factor scores are not unique • Using factor scores instead of factor indicators can reduce measurement error, but does NOT remove it • Therefore, using factor scores as predictors in conventional regressions leads to inconsistent coefficient estimators!
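For the regression method mentioned above, the weight matrix is W = R⁻¹Λ (R the indicator correlation matrix, Λ the loading matrix), which makes concrete why the weights are not the loadings. A hypothetical Python sketch (loadings and correlations are made up, not the frailty estimates):

```python
import numpy as np

# Hypothetical correlation matrix of three standardized indicators.
R = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
# Hypothetical loadings: one factor, three indicators.
Lam = np.array([[0.8],
                [0.7],
                [0.3]])

# Regression-method scoring weights: solve R @ W = Lam.
W = np.linalg.solve(R, Lam)              # the weights are NOT the loadings
z = np.array([1.2, 0.5, -0.3])           # one person's standardized indicator values
score = (z @ W).item()                   # that person's estimated factor score
print("weights:", np.round(W.ravel(), 3))   # ≈ [0.579 0.329 0.118]
print("score:", round(score, 3))
```

Because R is not the identity, correlated indicators share credit and each weight is pulled below its loading; this is also where the non-uniqueness and residual measurement error in factor scores come from.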

  32. Step 7. Derivation of Factor Scores
  proc factor data=frailty method=ml score outstat=fact priors=smc msa residual rotate=varimax reorder plots=(scree initloadings loadings);
    var bmi arm skin grip knee hip uslwalk fastwk chrstand peg;
  run;
  /* Calculate factor scores */
  proc score data=frailty score=fact out=abc.scores;
    var bmi arm skin grip knee hip uslwalk fastwk chrstand peg;
  run;

  33. Exploratory vs. Confirmatory Factor Analysis • Exploratory: • summarize data • describe correlation structure between variables • generate hypotheses • Confirmatory • Testing correlated measurement errors • Redundancy test of one-factor vs. multi-factor models • Measurement invariance test comparing a model across groups • Orthogonality tests

  34. CFA: Conceptual Model • Body Composition: BMI, Arm Circumference, Skinfold Thickness • Motor Processing/Speed: Usual Walk, Fast Walk, Pegboard • Muscle Strength: Knee Strength, Hip Strength

  35. SAS Code /* Confirmatory factor analysis */
  proc calis data=frailty modification;
    factor
      Body_Factor     ---> bmi arm skin = load1-load3,
      Speed_Factor    ---> uslwalk fastwk peg = load4-load6,
      Strength_Factor ---> knee hip = load7-load8;
    pvar Body_Factor Speed_Factor Strength_Factor = 3*1;
    cov Body_Factor Speed_Factor = 0.;
  run;

  36. SAS Output: Standardized Loadings

  37. SAS Output: Factor Correlations

  38. Model Fit Statistics • Goodness-of-fit tests based on predicted vs. observed covariances: • χ² tests • d.f. = (# non-redundant components in S) – (# unknown parameters in the model) • Null hypothesis: lack of significant difference between Σ(θ) and S • Sensitive to sample size • Sensitive to the assumption of multivariate normality • χ² tests for difference between NESTED models • Root Mean Square Error of Approximation (RMSEA) • A population index, insensitive to sample size • Tests a null hypothesis of poor fit • Availability of confidence interval • <0.10 “good”, <0.05 “very good” (Steiger, 1989, p. 81) • Standardized Root Mean Square Residual (SRMR) • Square root of the mean of the squared standardized residuals • SRMR = 0 indicates “perfect” fit, < .05 “good” fit, < .08 adequate fit
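The RMSEA definition above reduces to a one-line calculation, RMSEA = √(max(χ² − df, 0) / (df·(N − 1))). A sketch with hypothetical fit values (not this example's SAS output):

```python
import math

# Hypothetical model fit: chi-square, degrees of freedom, sample size.
chi2, df, N = 52.3, 32, 547
rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (N - 1)))
print(round(rmsea, 3))   # → 0.034, under the 0.05 "very good" cutoff
```

Note that if χ² ≤ df, the max(·, 0) clamp makes RMSEA exactly 0, i.e. the model fits at least as well as its degrees of freedom would predict.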

  39. Model Fit Statistics • Goodness-of-fit tests comparing the given model with an alternative model • Comparative Fit Index (CFI; Bentler 1989) • compares the existing model fit with a null model which assumes uncorrelated variables in the model (i.e. the "independence model") • Interpretation: CFI×100 = % of the covariation in the data that can be explained by the given model • CFI ranges from 0 to 1, with 1 indicating a very good fit; acceptable fit if CFI > 0.9 • The Tucker-Lewis Index (TLI) or Non-Normed Fit Index (NNFI) • Relatively independent of sample size (Marsh et al. 1988, 1996) • NNFI ≥ .95 indicates a good model fit, < 0.9 poor fit • More about these later
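Both indices are simple functions of the fitted model's chi-square and the independence model's chi-square. A sketch with hypothetical values (not this example's output):

```python
# Hypothetical fit results: fitted model (M) and independence/null model (B),
# the latter assuming all observed variables are uncorrelated.
chi2_m, df_m = 52.3, 32
chi2_b, df_b = 980.0, 45

# CFI: 1 minus the ratio of the two models' noncentrality (chi2 - df), clamped at 0.
cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, chi2_m - df_m, 0)
# TLI/NNFI: compares chi2/df ratios, penalizing model complexity.
tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)
print(round(cfi, 3), round(tli, 3))   # → 0.978 0.969, above the 0.9 / 0.95 cutoffs
```

Because TLI divides each chi-square by its degrees of freedom, adding parameters that barely reduce χ² can lower TLI even as CFI creeps up, which is why the two are usually reported together.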

  40. Model Fit Assessment

  41. Lagrange Multiplier Test (LMT) • For comparison of nested models • Only requires fitting of the restricted model • Based on the score s(θ) = ∂logL(θ)/∂θ, where L(θ) is the unrestricted likelihood function • s(θ) = 0 when evaluated at the unrestricted MLE of θ • The idea: substitute the restricted MLE θ̂r into s(θ) and assess its departure from 0 • LM ~ χ² with d.f. = difference in the d.f. of the two nested models • Modification index (MI): expected drop in chi-square if a parameter that is fixed or constrained to be equal to other parameters is freely estimated
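A modification index therefore approximates a 1-d.f. nested-model chi-square difference test. A sketch with hypothetical chi-square values:

```python
import math

# Hypothetical fit: chi-square before and after freeing one fixed parameter,
# so the difference in degrees of freedom is 1.
chi2_restricted, chi2_free = 52.3, 45.1
delta = chi2_restricted - chi2_free          # ~ chi-square with 1 d.f.

# Upper-tail p-value for a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x/2)).
p = math.erfc(math.sqrt(delta / 2))
print(round(delta, 1), round(p, 4))          # → 7.2 0.0073: freeing the parameter helps
```

In practice the MI reported by PROC CALIS is compared against this kind of χ²(1) reference; large MIs flag constraints worth reconsidering on substantive grounds, not automatically freeing.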

  42. SAS Output: Modification Indices

  43. SAS Output: Modification Indices
