Modeling longitudinal data Sanja Franić Vrije Universiteit Amsterdam

Modeling longitudinal data Sanja Franić Vrije Universiteit Amsterdam

Introduction • Frequently, researchers are faced with the question of how to optimally utilize longitudinal data • E.g., one may have collected data on children’s cognitive abilities, at ages 10, 12, 14, 16, and 18 • Some of the possible questions:

Introduction • How large is the role of factors that act in concert across different time points to cause the observed stability of the variable of interest over time? • How large is the role of those factors that cause individual differences specific to a certain time point? • Is there a stabile driving force behind the growth or decline of a trait over time, or do novel factors relevant to the trait emerge at different time points? If so, how to detect and quantify them?

Introduction • How well can a trait at a certain time point be predicted from a measurement at the preceding time point? • How much do individuals differ in the starting level of the variable of interest (e.g., in mathematical skills prior to formal education)? • How much do individuals differ in their speed of growth or decline over time? • Does the development of a skill follow a linear curve, or is there non-linear change?

Introduction • In today’s workshop, we will cover several types of models aimed at addressing the above questions

Overview • Cholesky decomposition • Simplex model • Latent growth curve model

Example • You’ve collected longitudinal data on IQ. You applied the Wechsler Intelligence Scale for Children (WISC) at ages 10, 12, 14, and 16.

Example • Structure of the data at each time point:

Example • Subscale scores:

Example • Subscale scores: VCI PRI WMI PSI Age 10

Example • Subscale scores: VCI VCI PRI PRI WMI WMI PSI PSI Age 10 Age 12

Example • Subscale scores: VCI VCI VCI PRI PRI PRI WMI WMI WMI PSI PSI PSI Age 10 Age 12 Age 14

Example • Subscale scores: VCI VCI VCI VCI PRI PRI PRI PRI WMI WMI WMI WMI PSI PSI PSI PSI Age 10 Age 12 Age 14 Age 16

Example • Let us assume all subscale scores at all time points are continuous normally distributed variables (for IQ scores this is a reasonable assumption; with real data, one can test it) • We will demonstrate each of the three methods as applied to this example dataset; we will use path-diagrammatic representations of the data (as presented in the SEM workshops by Dylan Molenaar) • A brief explanation of path diagrams:

Path diagrams • Squares = observed (measured) variables VCI PRI WMI PSI

Path diagrams • Circles = latent variables VCI PRI g WMI PSI

Path diagrams • Single-headed arrows = causal relations in the model VCI PRI g WMI PSI

Path diagrams • Path coefficients = strength of the relationship between variables VCI .5 PRI .6 g .7 WMI .8 PSI

Path diagrams • Double-headed arrows: variances and covariances VCI .5 1 PRI .6 g .7 WMI .8 PSI

Path diagrams • Double-headed arrows: variances and covariances VCI .5 .1 1 PRI .6 g .7 WMI .8 PSI

Path diagrams • Residuals VCI .5 .1 1 PRI .6 g .7 WMI .8 PSI

Overview • Cholesky decomposition • Simplex model • Latent growth curve model

Cholesky decomposition VCI12 VCI14 VCI16 VCI10 PRI12 PRI14 PRI16 PRI10 WMI12 WMI14 WMI16 WMI10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition 1 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition 1 1 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition 1 1 1 PS14 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • - the first factor (PS10) captures all of the variation in perceptual speed at age 10 (PSI10) and the variation in the other three observed variables (PSI12, PSI14, and PSI16) which they share with PSI10 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition - this factor (PS10) represents what is common to all four observed variables 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • →factor that causes stability of the observed measure (perceptual speed) across all four ages 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • - this factor (PS12) represents what is common only to the last three observed variables 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • →factor that causes stability of the observed measure (over and above the stability caused by the first factor, PS10) across the last three ages 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • - the third factor (PS14) represents what is common only to the last two observed variables 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition - the last factor (PS16) represents the variation that is unique to the last variable (PSI16) 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition If we specify a model like this (e.g., in Mplus) and fit it to observed data, we will get estimates of parameters in the model – in this case, the loadings of the observed variables on the latent factors. 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition If we specify a model like this (e.g., in Mplus) and fit it to observed data, we will get estimates of parameters in the model – in this case, the loadings of the observed variables on the latent factors. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition If we specify a model like this (e.g., in Mplus) and fit it to observed data, we will get estimates of parameters in the model – in this case, the loadings of the observed variables on the latent factors. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ22 λ31 λ32 λ41 λ42 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition If we specify a model like this (e.g., in Mplus) and fit it to observed data, we will get estimates of parameters in the model – in this case, the loadings of the observed variables on the latent factors. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ22 λ31 λ32 λ41 λ33 λ42 λ43 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition If we specify a model like this (e.g., in Mplus) and fit it to observed data, we will get estimates of parameters in the model – in this case, the loadings of the observed variables on the latent factors. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ22 λ31 λ32 λ41 λ33 λ42 λ43 λ44 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition These loadings (i.e., path coefficients in a path diagram) represent the strength of the relationship between the observed variables and the latent factors. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ22 λ31 λ32 λ41 λ33 λ42 λ43 λ44 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition Knowing that the variability that is common to processing speed at ages 10, 12, 14, and 16 is represented by the first latent factor, and the paths between that factor and each of the observed variables... 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition ... and knowing the estimates of the path loading parameters (λs)... 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • ... we can quantify the proportion of variance in processing speed at a given time point that is due to factors that cause temporal stability across all four ages. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • E.g., at age 12 this proportion is: λ21*var(PS10)*λ21 • = λ21*1*λ21 • = λ212. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • E.g., at age 12 this proportion is: λ21*var(PS10)*λ21 • = λ21*1*λ21 • = λ212. • We assume here that the • variance of the observed • variable is 1. Otherwise, to • obtain the proportion, we • have to divide the λ212by the • variance of the observed • variable. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • At age 14 it is: λ31*var(PS10)*λ31 • = λ31*1*λ31 • = λ312. 1 1 1 1 PS14 PS16 PS12 PS10 λ11 λ21 λ31 λ41 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • E.g., if these are the factor loading estimates (below), then the factors that cause stability of processing speed (PS) across all ages explain .52=.25 of the variance in PS at age 12, .32=.09 of the variance in PS at age 14, and .22=.04 of the variance in PS at age 16. 1 1 1 1 PS14 PS16 PS12 PS10 1 .5 .3 .2 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • If there are additional sources of stability arising after the initial age of measurement (age 10), those are quantified by the other path coefficients in the model. 1 1 1 1 PS14 PS16 PS12 PS10 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Cholesky decomposition • For instance, if a factor emerges at age 12, which causes additional temporal stability in processing speed across the ages 12-16, over and above the stability caused by factors present at age 10, the strength of influence of that factor is quantified by the coefficients λ22, λ32, and λ42. 1 1 1 1 PS14 PS16 PS12 PS10 λ22 λ32 λ42 PSI12 PSI14 PSI16 PSI10 Age 10 Age 12 Age 14 Age 16

Modeling longitudinal data Sanja Franić Vrije Universiteit Amsterdam

Modeling longitudinal data Sanja Franić Vrije Universiteit Amsterdam

Presentation Transcript

In response to the paradigma shift FROM SURVIVAL TO QUALITY OF LIFE (the greatest ever) : a progressive evolutionary wo

Conceptual Data Modeling Chapter 9

Multilevel Modeling

K-12 Statewide Longitudinal Data System (SLDS)

Process Modeling and Data Flow Diagrams

Chapter 6: The Relational Data Model

MSS Modeling

2-D FEM Modeling of Woodrow Wilson Bridge

Discrete Choice Modeling

Spray G Modeling

Modeling Data in Formal Verification Bits , Bit Vectors, or Words

Scientific Data MINING for Global Modeling of Sustainable Development

Data Management: Databases and Organizations Richard Watson

Universal Modeling: Introduction to ‘Modern’ MDL

Dr Katy Wolstencroft myGrid University of Manchester Vrije Universiteit, Amsterdam

Chapter 7: Principles of Dimensional Modeling and Data Warehousing Database Design

Alloy: A Modeling Language

Chapter 2, Modeling with UML

Lectures on B-physics 19-20 April 2011 Vrije Universiteit Brussel