The statistical analysis of fMRI data

The statistical analysis of fMRI data Keith Worsley12, Chuanhong Liao1, John Aston123, Jean-Baptiste Poline4, Gary Duncan5, Vali Petre2, Frank Morales6, Alan Evans2, Tom Nichols7, Satoru Hayasaki7 1Department of Mathematics and Statistics, McGill University, 2Brain Imaging Centre, Montreal Neurological Institute, 3Imperial College, London, 4Service Hospitalier Frédéric Joliot, CEA, Orsay, 5Centre de Recherche en Sciences Neurologiques, Université de Montréal, 6Cuban Neuroscience Centre 7University of Michigan

First scan of fMRI data Highly significant effect, T=6.59 1000 hot 890 rest 880 870 warm 500 0 100 200 300 No significant effect, T=-0.74 820 hot 0 rest 800 T statistic for hot - warm effect warm 5 0 100 200 300 Drift 810 0 800 790 -5 0 100 200 300 Time, seconds fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, hot, rest, … T = (hot – warm effect) / S.d. ~ t110 if no effect

Choices … • Time domain / frequency domain? • AR / ARMA / state space models? • Linear / non-linear time series model? • Fixed HRF / estimated HRF? • Voxel / local / global parameters? • Fixed effects / random effects? • Frequentist / Bayesian?

More importantly ... • Fast execution / slow execution? • Matlab / C? • Script (batch) / GUI? • Lazy / hard working … ? • Why not just use SPM? • Develop new ideas ... • FMRISTAT: Simple, general, valid, robust, fast analysis of fMRI data

Temporal components (sd, % variance explained) 1 0.68, 46.9% 2 0.29, 8.6% Component 3 0.17, 2.9% 4 0.15, 2.4% 0 20 40 60 80 100 120 140 Frame Spatial components 1 1 0.5 2 Component 0 3 -0.5 4 -1 0 2 4 6 8 10 12 Slice (0 based) PCA_IMAGE: PCA of time  space: 1: exclude first frames 2: drift 3: long-range correlation or anatomical effect: remove by converting to % of brain 4: signal?

FMRILM: fits a linear model for fMRI time series with AR(p) errors • Linear model: • ? ? • Yt = (stimulust * HRF) b + driftt c + errort • AR(p) errors: • ? ? ? • errort = a1 errort-1 + … + ap errort-p + s WNt unknown parameters

FMRIDESIGN example: pain perception Alternating hot and warm stimuli separated by rest (9 seconds each). 2 warm warm hot hot 1 0 -1 0 50 100 150 200 250 300 350 Hemodynamic response function: difference of two gamma densities 0.4 0.2 0 -0.2 0 50 Responses = stimuli * HRF, sampled every 3 seconds 2 1 0 -1 0 50 100 150 200 250 300 350 Time, seconds

FMRILM first step: estimate the autocorrelation ? AR(1) model: errort = a1errort-1 + s WNt • Fit the linear model using least squares • errort = Yt – fitted Yt • â1 = Correlation ( errort , errort-1) • Estimating errort’schanges their correlation structure slightly, so â1 is slightly biased: which_stats = ‘_cor’ Raw autocorrelation Smoothed 12.4mm Bias corrected â1 ~ -0.05 ~ 0

Effective df depends on smoothing • Variability in acor lowers df • Df depends • on contrast • Smoothing acor brings df back up: dfacor = dfresidual(2 + 1) 1 1 2 acor(contrast of data)2 dfeffdfresidualdfacor FWHMacor2 3/2 FWHMdata2 = + Hot-warm stimulus Hot stimulus FWHMdata = 8.79 Residual df = 110 Residual df = 110 100 100 Target = 100 df Target = 100 df Contrast of data, acor = 0.79 50 Contrast of data, acor = 0.61 50 dfeff dfeff 0 0 0 10 20 30 0 10 20 30 FWHM = 10.3mm FWHM = 12.4mm FWHMacor FWHMacor

T > 4.93 (P < 0.05, corrected) FMRILM second step: refit the linear model Pre-whiten: Yt* = Yt – â1 Yt-1, then fit using least squares: Hot - warm effect, % ‘_mag_ef’ Sd of effect, % ‘_mag_sd’ 1 0.25 0.2 0.5 0.15 0 0.1 -0.5 0.05 -1 0 which_stats = ‘_mag_ef _mag_sd _mag_t’ T = effect / sd, 110 df ‘_mag_t’ 6 4 2 0 -2 -4 -6

Higher order AR model? Try AR(3): ‘_AR’ No correlation biases T up ~12% → more false positives a a a 1 2 3 0.3 0.2 AR(1) seems to be adequate 0.1 0 … has little effect on the T statistics: -0.1 AR(1), df=100 AR(2), df=98 AR(3), df=98 5 0 -5

Results from 4 runs on the same subject Run 1 Run 2 Run 3 Run 4 1 Effect, 0 E i ‘_mag_ef’ -1 0.2 Sd, S 0.1 i ‘_mag_sd’ 0 5 T stat, 0 E / S i i ‘_mag_t’ -5

MULTISTAT: mixed effects linear model for combining effects from different runs/sessions/subjects: • Ei = effect for run/session/subject i • Si = standard error of effect • Mixed effects model: Ei = covariatesi c + Si WNiF +  WNiR }from FMRILM ? ? Usually 1, but could add group, treatment, age, sex, ... ‘Fixed effects’ error, due to variability within the same run Random effect, due to variability from run to run

REML estimation using the EM algorithm • Slow to converge (10 iterations by default). • Stable (maintains estimate 2 > 0 ), but • 2 biased if 2 (random effect) is small, so: • Re-parameterize the variance model: Var(Ei) = Si2 + 2 = (Si2 –minj Sj2) + (2 +minj Sj2) = Si*2 + *2 • 2 = *2 –minj Sj2 (less biased estimate) ^ ^ ? ? ^ ^

 Problem: 4 runs, 3 df for random effects sd  ... Run 1 Run 2 Run 3 Run 4 MULTISTAT 1 Effect, 0 E i -1 0.2 Sd, S 0.1 i 0 5 T stat, 0 E / S i i -5 ‘_mag_ef’ … very noisy sd: ‘_mag_sd’ … and T>15.96 for P<0.05 (corrected): ‘_mag_t’ … so no response is detected …

Solution: Spatial regularization of the sd • Basic idea: increase df by spatial smoothing (local pooling) of the sd. • Can’t smooth the random effects sd directly, - too much anatomical structure. • Instead, • random effects sd • fixed effects sd • which removes the anatomical structure before smoothing.  ) sd= smooth  fixed effects sd

^ Average Si  Random effects sd, 3 df Fixed effects sd, 440 df Mixed effects sd, ~100 df 0.2 0.15 0.1 0.05 0 divide multiply Random sd / fixed sd Smoothed sd ratio ‘_sdratio’ 1.5 random effect, sd ratio ~1.3 1 0.5

400 300 200 100 0 0 20 40 Infinity Effective df depends on smoothing FWHMratio2 3/2 FWHMdata2 e.g. dfrandom = 3, dffixed = 4  110 = 440, FWHMdata = 8mm: = + dfratio = dfrandom(2 + 1) 1 1 1 dfeffdfratiodffixed fixed effects analysis, dfeff = 440 dfeff FWHM = 19mm Target = 100 df random effects analysis, dfeff = 3 FWHMratio

Run 1 Run 2 Run 3 Run 4 MULTISTAT 1 Effect, 0 E i -1 0.2 Sd, S 0.1 i 0 5 T stat, 0 E / S i i -5 Final result: 19mm smoothing, 100 effective df … ‘_mag_ef’ ‘_ef’ … less noisy sd: ‘_sd’ ‘_mag_sd’ … and T>4.93 for P<0.05 (corrected): ‘_t’ ‘_mag_t’ … and now we can detect a response!

FWHM – the local smoothness of the noise voxel size (1 – correlation)1/2 FWHM = (2 log 2)1/2 (If the noise is modeled as white noise smoothed with a Gaussian kernel, this would be its FWHM) Volume FWHM3 P-values depend on Resels: Resels = Local maximum T = 4.5 Clusters above t = 3.0, search volume resels = 500 0.1 0.1 0.08 0.08 0.06 0.06 P value of cluster P value of local max 0.04 0.04 0.02 0.02 0 0 0 500 1000 0 0.5 1 1.5 2 Resels of search volume Resels of cluster

Non-isotropic data (spatially varying FWHM) • fMRI data is smoother in GM than WM • VBM data is highly non-isotropic • Has little effect on P-values for local maxima (use ‘average’ FWHM inside search region), but • Has a big effect on P-values for spatial extents: smooth regions → big clusters, rough regions → small clusters, so • Replace cluster volume by cluster resels = volume / FWHM3

‘_fwhm’ ‘_fwhm’ FWHM (mm) of scans (110 df) FWHM (mm) of effects (3 df) 20 20 Resels=1.90 P=0.007 15 15 10 10 Resels=0.57 P=0.387 5 5 0 0 FWHM of effects (smoothed) effects / scans FWHM (smoothed) 20 1.5 15 10 1 5 0 0.5

STAT_SUMMARY Low FWHM use Bonferroni In between use Discrete Local Maxima (DLM) High FWHM use Random Field Theory Bonferroni 4.7 4.6 4.5 True T, 10 df 4.4 Random Field Theory 4.3 T, 20 df Discrete Local Maxima (DLM) 4.2 Gaussianized threshold 4.1 Gaussian 4 3.9 Bonferroni, N=Resels 3.8 3.7 0 1 2 3 4 5 6 7 8 9 10 FWHM of smoothing kernel (voxels)

STAT_SUMMARY Low FWHM use Bonferroni In between use Discrete Local Maxima (DLM) High FWHM use Random Field Theory 0.12 Gaussian T, 20 df T, 10 df 0.1 Random Field Theory Bonferroni 0.08 DLM can ½ P-value when FWHM ~3 voxels 0.06 P-value 0.04 Discrete Local Maxima True 0.02 Bonferroni, N=Resels 0 0 1 2 3 4 5 6 7 8 9 10 FWHM of smoothing kernel (voxels)

STAT_SUMMARY example: single run, hot-warm Detected by BON and DLM but not by RFT Detected by DLM, but not by BON or RFT

T>4.86

T>4.86 T > 4.93 (P < 0.05, corrected)

T>4.86

Conjunction: Minimum Ti > threshold Minimum of Ti ‘_conj’ Average of Ti‘_mag_t’ For P=0.05, threshold = 1.82 For P=0.05, threshold = 4.93 Efficiency = 82%

Efficiency : optimum block design Sd of hot stimulus Sd of hot-warm 0.5 0.5 20 20 0.4 0.4 15 15 Magnitude Optimum design 0.3 0.3 10 10 Optimum design 0.2 0.2 X 5 5 0.1 0.1 0 0 X 0 0 5 10 15 20 5 10 15 20 InterStimulus Interval (secs) (secs) (secs) 1 1 20 20 0.8 0.8 15 15 Delay 0.6 0.6 Optimum design X Optimum design X 10 10 0.4 0.4 5 5 0.2 0.2 (Not enough signal) (Not enough signal) 0 0 0 5 10 15 20 5 10 15 20 Stimulus Duration (secs)

Efficiency : optimum event design 0.5 (Not enough signal) uniform . . . . . . . . . ____ magnitudes ……. delays random .. . ... .. . 0.45 concentrated : 0.4 0.35 0.3 Sd of effect (secs for delays) 0.25 0.2 0.15 0.1 0.05 0 5 10 15 20 Average time between events (secs)

+ + How many subjects? • Largest portion of variance comes from the last stage i.e. combining over subjects: sdrun2sdsess2sdsubj2 nrunnsessnsubj nsessnsubj nsubj • If you want to optimize total scanner time, take more subjects. • What you do at early stages doesn’t matter very much!

0.6 0.4 0.2 0 -0.2 -0.4 -5 0 5 10 15 20 25 t (seconds) Estimating the delay of the response • Delay or latency to the peak of the HRF is approximated by a linear combination of two optimally chosen basis functions: delay basis1 basis2 HRF shift • HRF(t + shift) ~ basis1(t)w1(shift) + basis2(t)w2(shift) • Convolve bases with the stimulus, then add to the linear model

3 2 1 0 -1 -2 -3 -5 0 5 shift (seconds) • Fit linear model, estimate w1andw2 • Equate w2/w1to estimates, then solve for shift (Hensen et al., 2002) • To reduce bias when the magnitude is small, use • shift / (1 + 1/T2) • where T = w1/ Sd(w1) is the T statistic for the magnitude • Shrinks shift to 0 where there is little evidence for a response. w2 /w1 w1 w2

Shift of the hot stimulus T stat for magnitude ‘_mag_t’ T stat for shift ‘_del_t’ Shift (secs) ‘_del_ef’ Sd of shift (secs) ‘_del_sd’

Shift of the hot stimulus T stat for magnitude ‘_mag_t’ T stat for shift ‘_del_t’ T>4 T~2 Shift (secs) ‘_del_ef’ Sd of shift (secs) ‘_del_sd’ ~1 sec +/- 0.5 sec

Run 1 Run 2 Run 3 Run 4 MULTISTAT 4 2 Effect, 0 E i -2 -4 2 Sd, 1 S i 0 5 T stat, 0 E / S i i -5 Combining shifts of the hot stimulus (Contours are T stat for magnitude > 4) ‘_ef’ ‘_del_ef’ ‘_sd’ ‘_del_sd’ ‘_t’ ‘_del_t’

Shift of the hot stimulus Shift (secs) ‘_del_ef’ T stat for magnitude ‘_mag_t’ > 4.93

Comparison: • Different slice acquisition times: • Drift removal: • Temporal correlation: • Estimation of effects: • Rationale: • Random effects: • Map of the delay: • SPM’99: • Adds a temporal derivative • Low frequency cosines (flat at the ends) • AR(1), global parameter, bias reduction not necessary • Band pass filter, then least-squares, then correction for temporal correlation • More robust, • but lower df • No regularization, • low df, no conjuncs • No • fmristat: • Shifts the model • Splines • (free at the ends) • AR(p), voxel parameters, bias reduction • Pre-whiten, then least squares (no further corrections needed) • More accurate, higher df • Regularization, high df, conjuncs • Yes

References • http://www.math.mcgill.ca/keith/fmristat • Worsley et al. (2002). A general statistical analysis for fMRI data. NeuroImage, 15:1-15. • Liao et al. (2002). Estimating the delay of the response in fMRI data. NeuroImage, 16:593-606.

Functional connectivity Activation only Correlation only • Measured by the correlation between residuals at every pair of voxels (6D data!) • Local maxima are larger than all 12 neighbours • P-value can be calculated using random field theory • Good at detecting focal connectivity, but • PCA of residuals x voxels is better at detecting large regions of co-correlated voxels Voxel 2 + + Voxel 2 + + + + + + + Voxel 1 + Voxel 1 + +

|Correlations| > 0.7, P<10-10 (corrected) First Principal Component > threshold

The statistical analysis of fMRI data

The statistical analysis of fMRI data

Presentation Transcript

the statistical analysis of data

Statistical Data Analysis

fMRI Data Analysis

The statistical analysis of fMRI data

Statistical Data Analysis

Statistical Analysis of Microarray Data

Statistical Data Analysis

Statistical Data Analysis

Statistical Analysis of Microarray Data

Statistical Analysis of Data

fMRI Data Analysis in Slicer

A general statistical analysis for fMRI data

STATISTICAL DATA ANALYSIS

Statistical Analysis of Decay Data

Statistical methods of data analysis

FMRI Data Analysis: II. Advanced Data Analysis

Data Analysis for fMRI

A general statistical analysis for fMRI data

fMRI Data Analysis

Statistical Data Analysis

Batch Programming of fMRI Data Analysis

Statistical Analysis of Microarray Data