
Multivariate models for fMRI data


Presentation Transcript


  1. Multivariate models for fMRI data. Klaas Enno Stephan (with 90% of slides kindly contributed by Kay H. Brodersen), Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich

  2. Why multivariate? Univariate approaches are excellent for localizing activations in individual voxels. [Figure: bar plots of responses in two voxels, v1 and v2, for reward vs. no reward; one voxel shows a significant effect (*), the other does not (n.s.).]

  3. Why multivariate? Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels. [Figure: responses in voxels v1 and v2 to orange juice vs. apple juice; neither voxel shows a significant effect on its own (n.s.), yet the two conditions separate in the joint v1–v2 space.]

  4. Why multivariate? Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths. [Figure: DCM schematic with driving and modulatory inputs; hidden neuronal activity and coupling strengths generate the observed BOLD signals.] Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage

  5. Overview 1 Modelling principles 2 Classification 3 Multivariate Bayes 4 Generative embedding

  6. Overview 1 Modelling principles 2 Classification 3 Multivariate Bayes 4 Generative embedding

  7. Encoding vs. decoding. [Diagram: an encoding model maps from the context (a cause or consequence, e.g. condition, stimulus, response, prediction error) to the BOLD signal; a decoding model maps from the BOLD signal back to the context.]

  8. Regression vs. classification. A regression model maps independent variables (regressors) onto a continuous dependent variable, whereas a classification model maps independent variables (features) onto a categorical dependent variable (label).

  9. Univariate vs. multivariate models. A univariate model considers a single voxel at a time; a multivariate model considers many voxels at once. [Diagram: graphical models linking the context to the BOLD signal of a single voxel vs. of many voxels.] In the univariate case, spatial dependencies between voxels are only introduced afterwards, through random field theory. Multivariate models enable inferences on distributed responses without requiring focal activations.

  10. Prediction vs. inference. The goal of prediction is to find a generalisable encoding or decoding function, e.g. predicting a cognitive state using a brain–machine interface, or predicting a subject-specific diagnostic status; the relevant quantity is the predictive density. The goal of inference is to decide between competing hypotheses, e.g. comparing a model that links distributed neuronal activity to a cognitive state with a model that does not, or weighing the evidence for sparse vs. distributed coding; the relevant quantity is the marginal likelihood (model evidence).

  11. Goodness of fit vs. complexity. Goodness of fit is the degree to which a model explains observed data. Complexity is the flexibility of a model (including, but not limited to, its number of parameters). [Figure: the same noisy data fitted with models of 1, 4, and 9 parameters, illustrating underfitting, a near-optimal fit, and overfitting relative to the true function.] We wish to find the model that optimally trades off goodness of fit and complexity. Bishop (2007) PRML
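A minimal MATLAB sketch of this idea (not taken from the slides): noisy samples of a sine wave are fitted with polynomials of 1, 4, and 9 parameters, reproducing the underfitting / near-optimal / overfitting pattern.

    % Toy under-/overfitting demo: polynomial fits of increasing flexibility.
    rng(1);                                   % reproducibility
    x  = linspace(0, 1, 10)';                 % 10 noisy observations
    y  = sin(2*pi*x) + 0.2*randn(size(x));    % truth = sine wave + noise
    xs = linspace(0, 1, 200)';                % fine grid for plotting

    orders = [0 3 8];                         % 1, 4 and 9 parameters respectively
    figure;
    for i = 1:numel(orders)
        p = polyfit(x, y, orders(i));         % least-squares polynomial fit
        subplot(1, 3, i);
        plot(x, y, 'ko', xs, polyval(p, xs), 'b-', xs, sin(2*pi*xs), 'r--');
        title(sprintf('%d parameters', orders(i) + 1));
        ylim([-1.5 1.5]);
    end
    % Left: underfitting; middle: close to the true function; right: overfitting.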

  12. Overview 1 Modelling principles 2 Classification 3 Multivariate Bayes 4 Generative embedding

  13. Constructing a classifier. A principled way of designing a classifier would be to adopt a probabilistic approach: choose the class c that maximizes the posterior p(c|x). In practice, classifiers differ in terms of how strictly they implement this principle. • Generative classifiers model p(x|c) and p(c) and use Bayes’ rule to estimate p(c|x): Gaussian naïve Bayes, linear discriminant analysis. • Discriminative classifiers estimate p(c|x) directly, without Bayes’ theorem: logistic regression, relevance vector machine, Gaussian process classifier. • Discriminant classifiers estimate the class label c = f(x) directly: Fisher’s linear discriminant, support vector machine.
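To make the generative route concrete, here is a minimal MATLAB sketch on toy data with a single feature (assumes the Statistics and Machine Learning Toolbox for normpdf): class-conditional densities p(x|c) and priors p(c) are estimated from training data, and Bayes' rule yields the posterior p(c|x) for a new example.

    % Minimal generative classifier (Gaussian naive Bayes, one feature).
    rng(1);
    x_train = [randn(20,1) - 1; randn(20,1) + 1];   % feature values
    c_train = [zeros(20,1);     ones(20,1)];        % class labels (0/1)

    mu = zeros(1,2); sigma = zeros(1,2); prior = zeros(1,2);
    for c = 0:1
        mu(c+1)    = mean(x_train(c_train == c));   % class-conditional mean
        sigma(c+1) = std(x_train(c_train == c));    % class-conditional std
        prior(c+1) = mean(c_train == c);            % class prior p(c)
    end

    x_new = 0.3;                                    % a new observation
    lik   = normpdf(x_new, mu, sigma);              % p(x_new | c) for both classes
    post  = lik .* prior ./ sum(lik .* prior);      % Bayes' rule: p(c | x_new)
    [~, idx] = max(post);  c_hat = idx - 1;         % predicted class label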

  14. Common types of fMRI classification studies. • Searchlight approach: a sphere is passed across the brain; at each location, the classifier is evaluated using only the voxels in the current sphere, yielding a map of t-scores. • Whole-brain approach: a constrained classifier is trained on whole-brain data; its voxel weights are related to their empirical null distributions using a permutation test, yielding a map of t-scores. Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS; Mourao-Miranda et al. (2005) NeuroImage
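A schematic searchlight loop on synthetic data (a simplified sketch, not the implementation used by SPM or dedicated decoding toolboxes; assumes the Statistics and Machine Learning Toolbox): for each voxel, the voxels within a fixed radius define the current sphere, and a cross-validated classifier is evaluated on that sphere alone, yielding an accuracy map.

    % Schematic searchlight analysis on synthetic data.
    rng(1);
    n_trials = 40;  n_vox = 200;
    Y      = randn(n_trials, n_vox);                 % trial-wise activity estimates
    labels = [ones(n_trials/2,1); 2*ones(n_trials/2,1)];
    xyz    = 10 * randn(n_vox, 3);                   % voxel coordinates (mm)
    radius = 8;                                      % searchlight radius (mm)

    acc = nan(n_vox, 1);                             % accuracy map
    for v = 1:n_vox
        d      = sqrt(sum((xyz - xyz(v,:)).^2, 2));  % distance to centre voxel
        sphere = d <= radius;                        % voxels in the current sphere
        cv     = fitcsvm(Y(:, sphere), labels, ...   % linear SVM, 10-fold CV
                         'KernelFunction', 'linear', 'CrossVal', 'on');
        acc(v) = 1 - kfoldLoss(cv);                  % cross-validated accuracy
    end
    % In a real analysis, acc is written back into a brain image and assessed
    % against chance, e.g. with a permutation test.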

  15. Support vector machine (SVM). [Figure: linear SVM vs. nonlinear SVM decision boundaries in the space of two voxels, v1 and v2.] Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press
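A minimal linear SVM on two synthetic "voxels" (a sketch assuming MATLAB's Statistics and Machine Learning Toolbox; many fMRI studies use libsvm instead, but the idea is the same):

    % Linear SVM on a two-voxel toy problem.
    rng(1);
    X = [randn(30,2) - 0.8; randn(30,2) + 0.8];     % trials x 2 voxels
    y = [-ones(30,1); ones(30,1)];                  % class labels (-1 / +1)

    svm   = fitcsvm(X, y, 'KernelFunction', 'linear');
    w     = svm.Beta;                               % weight vector defining the hyperplane
    b     = svm.Bias;                               % offset
    y_hat = predict(svm, X);                        % predicted labels (training data)

    % A nonlinear decision boundary is obtained by changing the kernel, e.g.:
    svm_rbf = fitcsvm(X, y, 'KernelFunction', 'rbf');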

  16. Stages in a classification analysis: feature extraction → classification using cross-validation → performance evaluation (e.g. Bayesian mixed-effects inference).

  17. Feature extraction for trial-by-trial classification. We can obtain trial-wise estimates of neural activity by filtering the data with a GLM. [Figure: GLM with data = design matrix × coefficients; the boxcar regressor for trial 2 is highlighted, and the estimate of the corresponding coefficient reflects activity on trial 2.]
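A simplified MATLAB sketch of this step (not SPM's implementation; in practice one would specify one regressor per trial in SPM and use the resulting beta images): each trial gets its own stick regressor, the regressors are convolved with a rough double-gamma HRF (gampdf, Statistics and Machine Learning Toolbox), and the trial-wise coefficients are estimated by least squares. The data matrix Y is a placeholder here.

    % Trial-wise GLM: one regressor per trial, convolved with an HRF.
    TR = 2;  n_scans = 200;  n_trials = 20;
    Y  = randn(n_scans, 50);                         % placeholder data (scans x voxels)

    onsets = 10:10:(10*n_trials);                    % trial onsets (in scans)
    t_hrf  = 0:TR:32;
    hrf    = gampdf(t_hrf, 6, 1) - 0.1*gampdf(t_hrf, 16, 1);   % rough double-gamma HRF

    X = zeros(n_scans, n_trials);
    for tr = 1:n_trials
        stick             = zeros(n_scans, 1);
        stick(onsets(tr)) = 1;                       % event of trial tr
        reg               = conv(stick, hrf');       % convolve with the HRF
        X(:, tr)          = reg(1:n_scans);          % regressor for trial tr
    end
    X(:, end+1) = 1;                                 % constant term

    beta     = pinv(X) * Y;                          % least-squares coefficients
    features = beta(1:n_trials, :);                  % trial-wise activity estimates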

  18. Cross-validation. The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation: [Figure: 100 examples are split into two folds; in each fold, one half serves as training examples and the other half as test examples, followed by performance evaluation.]

  19. Cross-validation. A more commonly used variant is leave-one-out cross-validation: [Figure: with 100 examples there are 100 folds; in each fold, 99 examples are used for training and the single remaining example for testing, followed by performance evaluation.]
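A leave-one-out cross-validation sketch in MATLAB (assumes the Statistics and Machine Learning Toolbox, a trial-by-voxel matrix `features`, and a label vector `labels` with one entry per trial, as in the previous step):

    % Leave-one-out cross-validation of a linear SVM.
    n   = numel(labels);
    cvp = cvpartition(n, 'LeaveOut');                % n folds, one test trial each

    correct = false(n, 1);
    for f = 1:cvp.NumTestSets
        tr  = training(cvp, f);                      % logical index: training trials
        te  = test(cvp, f);                          % logical index: the held-out trial
        mdl = fitcsvm(features(tr, :), labels(tr), 'KernelFunction', 'linear');
        correct(te) = predict(mdl, features(te, :)) == labels(te);
    end
    accuracy = mean(correct);                        % cross-validated accuracy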

  20. Performance evaluation. Single-subject study with n trials: the most common approach is to assess how likely the obtained number of correctly classified trials could have occurred by chance, using a binomial test, where k is the number of correctly classified trials, n is the total number of trials, pi_0 is the chance level (typically 0.5), and binocdf is the binomial cumulative distribution function. In MATLAB: p = 1 - binocdf(k, n, pi_0).
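For example, with n = 100 trials and k = 64 correct classifications (hypothetical numbers), the test can be run as below. Note that 1 - binocdf(k, n, pi_0) gives P(K > k); if the observed count itself should be included, use k - 1:

    % Binomial test of classification accuracy against chance.
    n    = 100;                         % total number of trials
    k    = 64;                          % correctly classified trials (example value)
    pi_0 = 0.5;                         % chance level

    p = 1 - binocdf(k - 1, n, pi_0);    % P(K >= k) under the null hypothesis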

  21. Performance evaluation. [Figure: hierarchical structure of a group study: a population of subjects (subject 1, 2, 3, 4, …), each contributing a sequence of trials classified correctly (+) or incorrectly (−).]

  22. Performance evaluation. Group study with multiple subjects and multiple trials per subject: in a group setting, we must account for both within-subjects (fixed-effects) and between-subjects (random-effects) variance components. Options include a binomial test on concatenated data (fixed effects), a binomial test on averaged data (fixed effects), a t-test on summary statistics (random effects; based on the sample mean of the sample accuracies, the chance level (typically 0.5), the sample standard deviation, and the cumulative Student’s t-distribution), and Bayesian mixed-effects inference (mixed effects; available for MATLAB and R). Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR; Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage
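The random-effects option (t-test on summary statistics) amounts to a one-sample t-test of subject-wise sample accuracies against the chance level, for example as below (hypothetical accuracy values; the Bayesian mixed-effects approach of Brodersen et al. additionally models the within-subject binomial variance and is provided as a separate toolbox):

    % Random-effects group inference: one-sample t-test on subject accuracies.
    acc  = [0.61 0.58 0.72 0.55 0.66 0.59 0.70 0.63];   % one sample accuracy per subject (example values)
    pi_0 = 0.5;                                         % chance level

    [h, p, ci, stats] = ttest(acc, pi_0, 'Tail', 'right');  % is the group mean above chance?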

  23. Research questions for classification. • Overall classification accuracy for a given classification task (healthy or ill? left or right button? truth or lie?). • Spatial deployment of discriminative regions. • Temporal evolution of discriminability, e.g. the within-trial time at which accuracy rises above chance, relative to when the participant indicates a decision. • Model-based classification ({group 1, group 2}). Pereira et al. (2009) NeuroImage; Brodersen et al. (2009) The New Collection

  24. Potential problems • Multivariate classification studies conduct group tests on single-subject summary statistics that • discard the sign or direction of underlying effects • do not necessarily take into account confounding effects (e.g. correlation of task conditions with difficulty etc.) • Therefore, in some analyses confounds rather than distributed representations may have produced positive results. Todd et al. 2013, NeuroImage

  25. Potential problems • Simulation: Experiment condition (rule A vs. rule B) does not affect voxel activity, but difficulty does. Moreover, experiment condition and difficulty are confounded at the individual-subject level in random directions across 500 subjects. Todd et al. 2013, NeuroImage

  26. Potential problems • Empirical example: searchlight analysis of an fMRI study on rule representations (flexible stimulus–response mappings). • Standard MVPA: rule representations in prefrontal regions. • GLM: no significant results. • Controlling for a variable that is confounded with rule at the individual-subject level but not the group level (reaction time differences across rules) eliminates the MVPA results. Todd et al. 2013, NeuroImage

  27. Overview 1 Modelling principles 2 Classification 3 Multivariate Bayes 4 Generative embedding

  28. Multivariate Bayes SPM brings multivariate analyses into the conventional inference framework of hierarchical Bayesian models and their inversion. Mike West

  29. Multivariate Bayes. Multivariate analyses in SPM rest on the central notion that inferences about how the brain represents things can be reduced to model comparison, e.g. comparing a decoding model that posits sparse coding in orbitofrontal cortex with one that posits distributed coding in prefrontal cortex for some cause or consequence.

  30. From encoding to decoding Encoding model: GLM Decoding model: MVB

  31. Multivariate Bayes Friston et al. 2008 NeuroImage

  32. MVB: details • Encoding model: neuronal activity is a linear mixture of causes, A = Xβ; the BOLD data matrix is a temporal convolution of the underlying neuronal activity, Y = TA + Gγ + ε. • Decoding model: the mental state is a linear mixture of voxel-wise activity, X = Aβ, thus A = Xβ⁻¹; substitution into Y gives Y = TXβ⁻¹ + Gγ + ε ⇒ Yβ = TX + Gγβ + εβ ⇒ TX = Yβ − Gγβ − εβ. • Importantly, we can remove the influence of confounds by premultiplying with the appropriate residual forming matrix R: RTX = RYβ + ς.
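A toy numerical illustration of the confound-removal step above (this is not SPM's MVB inversion, which places hierarchical priors on the voxel weights and uses greedy model search; here a simple ridge penalty stands in for the prior, and T is taken to be the identity for brevity):

    % Confound removal with a residual forming matrix, then a regularised
    % estimate of the voxel weights beta in R*T*X ~ R*Y*beta.
    rng(1);
    n_scans = 200;  n_vox = 50;
    Y = randn(n_scans, n_vox);               % BOLD data (scans x voxels)
    X = randn(n_scans, 1);                   % target variable to be decoded
    T = eye(n_scans);                        % temporal convolution (identity for simplicity)
    G = [ones(n_scans,1), (1:n_scans)'];     % confounds: constant + linear drift

    R  = eye(n_scans) - G * pinv(G);         % residual forming matrix
    RY = R * Y;
    lambda = 10;                             % ridge parameter (stand-in for the prior on beta)
    beta = (RY' * RY + lambda * eye(n_vox)) \ (RY' * (R * T * X));   % voxel weights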

  33. MVB: details • To make inversion tractable, MVB imposes priors on β (i.e., assumptions about how mental states are represented by voxel-wise activity). • Scientific inquiry then proceeds by model selection: comparing different priors to identify which neuronal representation of mental states is most plausible.

  34. Specifying the prior for MVB. To make the ill-posed regression problem tractable, MVB uses a prior on voxel weights. Different priors reflect different anatomical and/or coding hypotheses. [Figure: example priors, displayed as patterns over voxels; in one, voxel 2 is allowed to play a role, while in another, voxel 3 is allowed to play a role, but only if its neighbours play similar roles.] Friston et al. 2008 NeuroImage

  35. Example: decoding motion from visual cortex. MVB can be illustrated using SPM’s attention-to-motion example dataset, which is based on a simple block design with three experimental factors: • photic – display shows random dots • motion – dots are moving • attention – subjects asked to pay attention. [Figure: design matrix over scans with columns photic, motion, attention, and a constant; during the highlighted scans, for example, subjects were passively viewing moving dots.] Büchel & Friston 1999 Cerebral Cortex; Friston et al. 2008 NeuroImage

  36. Multivariate Bayes in SPM Step 1 After having specified and estimated a model, use the Results button. Step 2 Select the contrast to be decoded.

  37. Multivariate Bayes in SPM Step 3 Pick a region of interest.

  38. Multivariate Bayes in SPM Step 4 Multivariate Bayes can be invoked from within the Multivariate section. Step 5 Here, the region of interest is specified as a sphere around the cursor (the anatomical hypothesis), and the spatial prior implements a sparse coding hypothesis (the coding hypothesis).

  39. Multivariate Bayes in SPM Step 6 Results can be displayed using the BMS button.

  40. Model evidence and voxel weights. [Figure: model comparison via the log Bayes factor (log BF = 3) and the corresponding map of voxel weights.]

  41. Summary: research questions for MVB How does the brain represent things? Evaluating competing coding hypotheses Where does the brain represent things? Evaluating competing anatomical hypotheses

  42. Overview 1 Modelling principles 2 Classification 3 Multivariate Bayes 4 Generative embedding

  43. Model-based classification. [Figure: pipeline. Step 1, modelling: measurements from an individual subject are fitted with a subject-specific generative model (regions A, B, C). Step 2, embedding: the subject is represented in a model-based feature space (e.g. connection strengths A → B, A → C, B → B, B → C). Step 3, classification: a classification model is applied to these features. Step 4, evaluation: how discriminable are the groups (accuracy)? Step 5, interpretation: which connection strengths are jointly discriminative?] Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage; Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol
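A sketch of the embedding step in MATLAB (assuming subject-specific DCM .mat files with SPM's usual structure, in which the posterior parameter means are stored in DCM.Ep.A, DCM.Ep.B, and DCM.Ep.C; the file naming scheme here is hypothetical):

    % Generative embedding: represent each subject by their DCM parameter estimates.
    files = dir('DCM_subject*.mat');                  % hypothetical file naming scheme
    n_sub = numel(files);

    features = [];
    for s = 1:n_sub
        tmp = load(files(s).name, 'DCM');
        Ep  = tmp.DCM.Ep;                             % posterior means of the parameters
        features(s, :) = [Ep.A(:); Ep.B(:); Ep.C(:)]';   % connectivity parameters as features
    end
    % 'features' (subjects x parameters) can then be classified, e.g. with a
    % linear SVM and leave-one-out cross-validation as in the classification section.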

  44. Model-based classification: model specification. [Figure: anatomical regions of interest at y = –26 mm in both hemispheres: medial geniculate body, Heschl’s gyrus (A1), and planum temporale, together with the stimulus input.]

  45. Model-based classification. [Figure: balanced accuracy for classifying patients vs. controls using activation-based, correlation-based, and model-based features; one approach reaches significance (*) while the others do not (n.s.).]

  46. Generative embedding. [Figure: patients and controls plotted in a voxel-based activity space (voxels 1–3; classification accuracy 75%) and, after generative embedding, in a model-based parameter space (parameters 1–3; classification accuracy 98%).]

  47. Model-based classification: interpretation. [Figure: the fitted network with left and right medial geniculate body (MGB), Heschl’s gyrus (HG, A1), and planum temporale (PT), plus the stimulus input.]

  48. Model-based classification: interpretation. [Figure: the same network with each connection marked as highly discriminative, somewhat discriminative, or not discriminative.]

  49. Model-based clustering. 42 patients diagnosed with schizophrenia and 41 healthy controls; fMRI data acquired during a working-memory (WM) task and modelled using a three-region DCM (VC, PC, dLPFC) with stimulus input. Deserno, Sterzer, Wüstenberg, Heinz, & Schlagenhauf (2012) J Neurosci

  50. Model-based clustering. [Figure: analysis pipeline with model selection, interpretation, and validation stages.] Brodersen et al. 2014, NeuroImage: Clinical
