Peter Congdon, Queen Mary University of London Dept of Geography & Centre for Statistics

Bayesian factor and structural equation models in spatial applications. Specification, identification and model assessment, with case study illustrations Peter Congdon, Queen Mary University of London Dept of Geography & Centre for Statistics

Outline • Background: Bayesian approaches to LV models, advantages & disadvantages • Computational options including WINBUGS • Wider application contexts of Bayesian LV & SEM models • Spatial Priors; Common Spatial Factors

Outline (continued) • Different sorts of spatial factor model (depending on form of manifest variables) and possible identification issues • Assessing models, model fit & model choice. Possible variable/model choice approaches • Case studies

Case Studies • Social capital & mental health, multilevel model using Health Survey for England (HSE) • Multilevel model for joint prevalence of obesity & diabetes, BRFSS respondents nested within US counties & states (CDC Behavioral Risk Factor Surveillance System) • Suicide & self-harm, ecological study for small areas (wards) in Eastern England

Background • SEM and factor models originate in (& still most widely used) in psychological, educational & behavioural applications. • Recent Bayesian applications to psychological & education testing data include SEM (e.g. Lee & Song, 2003), LCA, item analysis, and factor analysis per se (e.g. Aitkin & Aitkin, 2005; Press & Shigemasu, 1998). • Also some work on automated Bayesian model choice in normal linear factor model

Advantages of Bayesian Approach • Straightforward to depart from standard assumptions often built into classical estimation methods (e.g. factor scores multivariate normal & independent over subjects) • Advantage in generalizations such as nonlinear factor effects, multiplicative factor schemes

Advantages of Bayesian Approach (continued) • Random effect models (of which factor/SEM models are subclass) can be fitted without relying on numerical methods to integrate out random effects • Potential for Bayesian model choice procedures (e.g. stochastic search variable selection) in factor/SEM models

Disadvantages of Bayesian Approach • Identification issues (re “naming” of factors): can have label switching for latent constructs during MCMC updating if there aren’t constraints to ensure consistent labelling. • Slow convergence of model parameters or global model fit measures (e.g. DIC and effective parameter estimate) in large latent variable applications (e.g. 1000 or 10000 subjects)

Disadvantages of Bayesian Approach • Formal Bayes model assessment (marginal likelihoods/Bayes factors) difficult for large realistic applications • Sensitivity to priors on hyperparameters (e.g. priors for factor covariance matrix) • Bayesian approach may need sensible priors when applied to factor models (“diffuseness“ not necessarily suitable)

Bayesian Computing • Many Bayesian applications to SEM and factor analysis facilitated by WINBUGS package. • See Congdon (Applied Bayesian Modelling, 2003); Lee (Structural Equation Modeling: a Bayesian Approach, 2007) • Alternative is R…more programming involved • BayesX can’t model common factors

WINBUGS • Despite acronym, WINBUGS employs Metropolis-Hastings updating where necessary as well as Gibbs sampling • Program code is essentially a description of the priors & likelihood, but can monitor model-related quantities of interest

Computing Illustration: a Normal SEM • Wheaton Study: 3 latent variables, each measured by two indicators. Alienation67 measured by anomia67 (1967 anomia scale) and powles67 (1967 powerlessness scale). • Alienation71 is measured in same way, but using 1971 scales. • Third latent variable, SES (socio-economic status) measured by years of schooling and Duncan's Socioeconomic Index, both in 1967.

Structural model relates alienation in 1971 (F2) to alienation in 1967 (F1) and SES (G) F2i = βF1i + g2Gi+u2i F1i = g1Gi + u1i Measurement model for alienation yji=aj +ljF1i j=1,2 yji=aj +ljF2i j=3,4 Measurement model for SES xji=dj +kjGi j=1,2

WINBUGS for Wheaton study model { for (i in 1:n) { # structural model F2[i] ~ dnorm(mu.F2[i],1); mu.F2[i] <- beta* F1[i]+gam[2]*G[i] F1[i] ~ dnorm(mu.F1[i],1); mu.F1[i] <- gam[1]*G[i]} # priors (normal uses inverse variance) for (j in 1:2) {gam[j] ~ dnorm(0,0.001)} beta ~ dnorm(0,0.001)

# measurement equations for alienation for (i in 1:n) { for (j in 1:4) { y[i,j] ~ dnorm(mu[i,j],tau[j])} mu[i,1] <- alph[1]+lam[1]* F1[i]; mu[i,2] <- alph[2]+lam[2]* F1[i] mu[i,3] <- alph[3]+lam[3]* F2[i]; mu[i,4] <- alph[4]+lam[4]* F2[i]} # PRIORS for (j in 1:4){ alph[j] ~ dnorm(0,0.001); # gamma prior on precisions tau[j] ~ dgamma(1,0.001) # alternative prior starts with s.d. of residuals # sd.y[j] ~ dunif(0,100); tau[j] <- 1/(sd.y[j]*sd.y[j]) # identifiability constraint on loadings to ensure # positive alienation measure lam[j] ~ dnorm(1,1) I(0,)}

# measurement of SES (G[i]) for (i in 1:n) { G[i] ~ dnorm(0,1) for (j in 1:2) { x[i,j] ~ dnorm(mu.x[i,j],tau.x[j])} mu.x[i,1] <- del[1]+kappa[1]* G[i]; mu.x[i,2] <- del[2]+kappa[2]* G[i]} for (j in 1:2) {del[j] ~ dnorm(0,0.001); # gamma prior on precisions tau.x[j] ~ dgamma(1,0.001) # identifying constraint ensures +ve SES scale kappa[j] ~ dnorm(1,1) I(0,)}}

Monitoring model related quantities • Suppose one were interested in posterior probs that F2i > F1i (alienation increasing for ith subject) • Add code for (i in 1:n) {delF[i] <- step(F2[i]-F1[i])} • Then posterior means of delF provide required probabilities

Widening Applications of Latent Variable Methods • In particular: application contexts of Bayes SEM/factor models now include ecological (area level) studies of health variations. • Usually no longer valid to assume units (i.e. areas) are independent. • Instead spatial correlation in latent variable(s) (common spatial factors) over the areas should be considered

Multi-Level Latent Variable Models • Latent variable methods also more widely applied in multilevel health studies • Such models consider joint impact of individual level and area level risk factors on health status • With several outcomes (data both multivariate & multilevel) can model area effects using common factor(s)

SOME SPATIAL PRIORS: THE BASIS FOR COMMON SPATIAL FACTORS

Priors incorporating spatial structure: basis for common spatial factors • May be specified over continuous space (geostatistical models often used for “kriging”) • OR for discrete sets of areas with irregular boundaries (“lattices” or “polygons”) • Major classes: • Simultaneous Autoregressive (SAR) or Conditional Autoregressive (CAR) priors

Spatial Priors • My focus: CAR priors for “lattices” (e.g. administrative areas) • These are priors for “structured” effects (where labels of area units are important) as opposed to unstructured effects (unaffected or exchangeable over different labelling scheme for areas)

Substantive Basis • Generally taken to represent unmeasured area level risk factors for health that vary relatively smoothly over space (regardless of arbitrary administrative boundaries that may define units of analysis) • Substantive grounding: increased recognition of genuine spatial effects on health (“contextual” effects)

DIFFERENT TYPES OF COMMON SPATIAL FACTOR

(A) Manifest health variables • Manifest variables are health outcomes yij (areas i, variable j) • Common residual factor si, expresses spatial clustering recurring over several outcomes j • Interpretable as index of common health risks over outcomes • Example: Wang & Wall 2003

(B) Census Indicator Confirmatory Model. • Common Spatial Socioeconomic Factor or Factors (deprivation, rurality, etc) based on relevant indicators Zik (k=1,..,K) such as unemployment, low income etc. • Often census indicators form bulk of manifest variables • Example: Hogan & Tchernis JASA 2004

(C) Two Classes of Manifest Variable • Common factor(s) used to explain variations in observed Y variables (health outcomes). • But factors mainly measured by socioeconomic indicators Z (e.g. census data) • Example: my Eastern region suicide study • Partly confirmatory, partly exploratory

MANIFEST VARIABLES: AREA HEALTH VARIABLES

(A) Shared Spatial Residual Effects • Unobserved area effects common to several health outcomes modelled by shared spatial effect • Typical scenario: area counts yij for areas i and outcomes j. Poisson or binomial likelihood

Types of Event • May be deaths, hospitalizations, incidence counts for different cancer types, prevalence counts, etc • Expected events (offset) Eij based on standard age rates applied to area populations: yij ~ Poisson(Eijrij) • Can also have populations at risk: yij ~ Poisson(Nirij) or yij ~ Bin(Ni,pij)

Multivariate Spatial Effects • One option for such data: no reduction • Multivariate residual effects log(rij)=aj+sij (or log(rij)=aj+jxi+sij) • For sij could use multivariate version of conditional autoregressive prior

Multivariate Spatial Effects • Multivariate normal CAR Prior is example of Markov Random Field (Rue & Held, 2005). • Easily applied in WINBUGS using mv.car prior. • May fit well but proliferation of parameters (more parameters than data points)

Alternative : common spatial factor • log(rij)=aj+ljsi • Parsimonious and provides interpretable summary measure of health risk • si is univariate CAR (or some other prior with spatial dependence) • Correlation between outcomes within areas modelled via loadings lj.

Identification: Location & Scale • Need isi=0 for location identification. Centre effects at each MCMC iteration. • Scale identifiability: EITHER set var(s)=1 and all lj are free loadings (fixed scale), OR leave var(s) unknown and constrain a loading, e.g. l1=1.0 (anchoring constraint)

Labelling Problems in Repeated Sampling • Even in simple model, labelling may be an issue. • Consider fixed variance identification option, var(s)=1, loadings all unknown. • Suppose diffuse priors are taken on loadings in log(rij)=aj+ljsi without directional constraint.

Labelling Problems (continued) • Then can have: a) lj all positive combined with si acting as positive measure of health risk (higher si in areas with higher cancer rates) OR b) lj all negative combined with si acting as negative measure of health risk (si higher in areas with lower cancer rates)

Identifying constraints for consistent labelling • For unambiguous labelling advisable to constrain one or more lj to be positive (e.g. truncated normal or gamma prior) • Note that anchoring constraint with var(s) unknown, and preset loading (e.g. l1=1.0), may be intrinsically better identified – steers remaining unknown l coefficients to consistent labelling

Loadings and Labellings • May not be sufficient just to rely on constraining one loading (e.g. assume +ve) to ensure consistent labelling • Sometimes said that constraining direction on one loading ensures consistent identification… • What if indicator chosen for constrained loading (e.g. lii> 0) is poor measure for construct

Loadings and Labellings • If twenty indicators are measuring a construct, the 19 unconstrained loadings may “fit” a different label (e.g. deprivation) to that implied by the remaining constrained loading (e.g. affluence) • Personal View: Much depends on suitable selection of manifest indicators and which (and how many, maybe >1 ) are chosen to have constrained loadings

WINBUGS Code for manifest variable scenario A

Extensions of Spatial Common Factors • Product schemes. Consider health outcomes arranged by area i and age x. Populations at risk Nix yix ~ Poisson(Nixnix) log(nix)=ax+xxsi • xx show which age groups are most sensitive to spatial variations in risk represented by si • Variation on Lee-Carter (JASA 1992) mortality forecasting model

Random Effect Loadings • xx potentially random, rather than fixed effects. • Identified using sum to 1 or averaging to 1 constraint, e.g. xx multinomial, or xx~Gamma(h,h)

Nonlinear effects of common factor • One possibility: just take powers of si, e.g. log(rij)=aj+ljsi+kjs2i • Or: spline for nonlinear effects in common factor score si. • e.g. under fixed variance var(s)=1 option, locate knots wk at selected quantiles on cumulative standard normal.

Linear Spline • Then linear spline log(rij)=aj+ljsi+Skbjk(si- wk)+ • bjk might be random effects, but raises identification issues…?

INDICATOR BASED SPATIAL CONSTRUCTS

(B) Indicator Based Spatial Constructs • Many studies use latent constructs to analyze population health variations. • Such constructs (e.g. deprivation) not directly observed • Instead derived from a collection of relevant indicator variables that are observed, using multivariate techniques or other “composite variable” methods • Many health outcomes show “deprivation gradient”

Latent Constructs in Population Health • Example: Townsend deprivation score based on summing standardized census area values for 4 input variables (sum of “z scores”) • % unemployed, % with no car, % households overcrowded, % households not owner occupiers

Other area constructs • Other examples of latent constructs relevant to area health variations: rurality/urbanicity, social fragmentation • Social fragmentation scores used to analyze variations in area suicide rates and psychiatric hospitalization rates

Peter Congdon, Queen Mary University of London Dept of Geography & Centre for Statistics