Mind Reading with fMRI

Mind Reading with fMRI Ken Norman Department of Psychology Princeton University May 1, 2007

Brain Scanning • Today’s topic: Applying pattern classifiers to brain scanning data, to decode the information represented in a person’s brain at a particular point in time • This is NOT the standard approach • Standard approach: • Stick someone in the scanner • Have them perform a cognitive task • Explore which brain regions are engaged by the cognitive task

Brain Scanning • If you’re interested in memory retrieval: • Scan people while they’re retrieving memories • Scan people during a control condition • Look at which brain regions respond differentially • This approach has been very productive for cognitive neuroscience

Brain Scanning • Alternative approach to analyzing brain scanning data: • Use pattern classification algorithms, applied to distributed patterns of neural activity, to identify the neural signatures of particular thoughts and memories • Once we have trained the classifier to recognize a particular thought, we can use the classifier to track the comings and goings of those thoughts over time

Motivation • Why pattern classification? • Reason #1: Improve the interface between fMRI and cognitive theories • Cognitive neuroscientists have developed very detailed theories of how information is processed in the brain • What information is represented in different brain structures? • How is it represented? • How is that information transformed at different stages of processing? • To directly test these theories, we need a way of decoding the informational contents of the subject’s brain state

Motivation • Reason #2: We aren’t doing as good a job of “data mining” fMRI data as we could... • We collect several GB of information from each subject • There is a lot of information about subjects’ thoughts buried in these big data files; the challenge is how to extract this information • Machine learning researchers have developed tremendously powerful algorithms for extracting meaningful regularities from large data sets • These algorithms are not routinely used in fMRI data analysis…

Outline • 3 minute overview of functional MRI • Brief overview of existing research on fMRI pattern classification • Technical challenges & machine learning issues

Brain Scanning 101

Brain Scanning 101 • How do we image neural activity with functional MRI? • Brain regions that are active use up more metabolic resources • In particular, they use up more oxygen from the blood • The MRI machine can be tuned to detect the difference between oxygenated and deoxygenated blood • By looking at which brain areas have deoxygenated vs. oxygenated blood, we can get a sense of which brain areas are active at a particular moment

Brain Scanning 101 • It takes approx. 2 seconds for the MRI machine to take a snapshot of blood flow (across the entire brain)

fMRI images • Big cube, made out of a grid of little cubes • Pixel = one square in a 2D grid (“picture element”) • Voxel = one of the tiny little cubes in an fMRI image (like a volumetric pixel) • Voxels are approx. 3 millimeters on each side • Neuron size: ~10 micrometers • Each voxel reflects the aggregate activity of a very large number of neurons • We aren’t directly measuring neural activity; we are measuring blood flow! • Blood flow response is smeared out in time (peak response = ~6 sec after neural activity)

Patterns in the brain • Key idea: Cognitive states correspond to distributed patterns of brain activity • What do these “patterns in the brain” look like?

The Eight Categories Study (Haxby et al., 2001) • Stimulus categories: Faces, Cats, Scissors, Chairs, Houses, Bottles, Shoes, Scrambled Pictures • Slides courtesy of Jim Haxby

Accuracy of Category Identification • [Figure: identification accuracy ± SE, with chance level shown] • Overall accuracy = 96% • Slides courtesy of Jim Haxby

Our Studies • We set out to extend the basic pattern classification method • The brain patterns from the Haxby study correspond to several minutes’ worth of brain activity • We wanted to see if we could classify cognitive states based on single brain images (reflecting ~2 seconds’ worth of neural activity)

Pattern Classification Method • General approach: Say that we want to be able to track the presence of two different cognitive states in the subject’s brain (e.g., viewing shoes vs. bottles) using fMRI

Pattern Classification Method • Acquire brain data while the subject is thinking about shoes or bottles

Pattern Classification Method • Acquire brain data • Convert each functional brain volume (~2 seconds’ worth of data) into a vector that reflects the pattern of activity across voxels at that point in time • We typically do some kind of feature selection to cut down on the number of voxels
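The conversion from a brain volume to a pattern vector can be sketched as follows. This is a minimal illustration, assuming the volume is stored as nested lists indexed [x][y][z] and that feature selection has already produced a boolean mask of the same shape; the function name `volume_to_pattern` and the toy data are hypothetical.

```python
def volume_to_pattern(volume, mask):
    """Flatten a 3D volume (nested lists indexed [x][y][z]) into a 1D pattern,
    keeping only the voxels where the mask is True."""
    pattern = []
    for x, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                if mask[x][y][z]:
                    pattern.append(value)
    return pattern


# Toy 2 x 2 x 2 volume; real volumes are far larger.
volume = [[[0.1, 0.2], [0.3, 0.4]],
          [[0.5, 0.6], [0.7, 0.8]]]
# Hypothetical mask marking the voxels that survived feature selection.
mask = [[[True, False], [False, True]],
        [[True, True], [False, False]]]

pattern = volume_to_pattern(volume, mask)
```

Each volume in the scan yields one such vector, and the same mask is applied to every volume so that position i always refers to the same voxel.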

Pattern Classification Method • Acquire brain data • Generate brain patterns • Label brain patterns according to whether the subject was viewing shoes vs. bottles (adjusting for lag in the blood flow response)
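One simple way to adjust for the lag is to delay the label sequence by a fixed number of volumes. The sketch below assumes a ~6 second lag and a ~2 second volume time (both figures come from the earlier slides), so labels are shifted by 3 volumes; the function name `shift_labels` and the fixed-shift approach itself are illustrative assumptions, not necessarily the method used in these studies.

```python
def shift_labels(labels, lag_seconds=6.0, tr_seconds=2.0, fill=None):
    """Delay a per-volume label sequence by the blood flow lag.

    labels[i] is what the subject was viewing while volume i was acquired.
    The result gives, for each volume, the label whose blood flow response
    is expected to peak in that volume. Leading volumes get `fill`.
    """
    n = len(labels)
    shift = min(round(lag_seconds / tr_seconds), n)  # lag in volumes
    return [fill] * shift + list(labels[:n - shift])


viewed = ["shoe"] * 4 + ["bottle"] * 4
shifted = shift_labels(viewed)

# Volumes with no usable label are dropped before training.
usable = [(i, label) for i, label in enumerate(shifted) if label is not None]
```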

Pattern Classification Method • Acquire brain data • Generate brain patterns • Label brain patterns • Train a classifier to discriminate between bottle patterns and shoe patterns

Simple Neural Network Classifier (Logistic Regression) • [Diagram: input layer (voxels) connected to an output layer with two units, Bottle vs. Shoe] • To estimate how much subjects are thinking about bottles, compute a weighted sum of voxel activity values; do the same for shoes • Apply decision rule (e.g., sigmoid function) • To train the classifier, we use a learning algorithm that sets the weights to maximize decision performance (e.g., backpropagation)
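The weighted-sum-plus-sigmoid classifier can be sketched in a few lines. This is a toy version under stated assumptions: it uses a single output unit (the estimated probability of "bottle") rather than one unit per category, trains by plain gradient descent on the cross-entropy error, and runs on made-up 3-voxel patterns. The learning rate and epoch count are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, pattern):
    """Weighted sum of voxel values passed through the sigmoid decision rule."""
    z = bias + sum(w * v for w, v in zip(weights, pattern))
    return sigmoid(z)


def train(patterns, labels, learning_rate=0.5, epochs=200):
    """Fit weights by stochastic gradient descent on the cross-entropy error."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(epochs):
        for pattern, label in zip(patterns, labels):
            error = predict(weights, bias, pattern) - label
            for i, value in enumerate(pattern):
                weights[i] -= learning_rate * error * value
            bias -= learning_rate * error
    return weights, bias


# Toy data: voxel 0 responds more to bottles, voxel 1 more to shoes.
patterns = [[1.0, 0.1, 0.5], [0.9, 0.2, 0.4], [0.1, 1.0, 0.5], [0.2, 0.8, 0.6]]
labels = [1, 1, 0, 0]  # 1 = bottle, 0 = shoe

weights, bias = train(patterns, labels)
p_new_bottle = predict(weights, bias, [0.8, 0.1, 0.5])
p_new_shoe = predict(weights, bias, [0.1, 0.9, 0.5])
```

The two final calls apply the trained weights to patterns that were not in the training set, which is the step that matters for judging the classifier.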

Pattern Classification Method • Acquire brain data • Generate brain patterns • Label brain patterns • Train the classifier • Apply the trained classifier to new brain patterns (not presented at training).

Free Recall & Mental Time Travel (Polyn et al., 2005) • How do we selectively retrieve memories from a particular event? • Intuitively: We try to recapture our mindset from that event • Concretely: We try to make our brain state during recall resemble our brain state during the original event • “Mental Time Travel” • Goal of the study: Use fMRI pattern-analysis to image this process of mental time travel as it happens...

Imaging Mental Time Travel (Polyn et al., 2005) • [Example stimuli pictured: Giza pyramids, Jack Nicholson, flask] • Memory experiment: Subjects study 3 types of stimuli • Recall test: Recall items from all 3 categories, in any order • Hypothesis: To recall a particular category, subjects try to recapture their mindset from the study phase • In concrete terms: Subjects try to make their brain state at test resemble their brain state when they were studying that category • If subjects successfully recapture their brain state from the study phase, this will trigger recall of specific studied items...

Analysis strategy • Step 1: Feed fMRI data from the study phase into a pattern classification algorithm • Train the pattern classifier to recognize the brain patterns associated with studying faces vs. locations vs. objects

Neural network classifier • Mapping from voxel activity values to output units (one per category)

Analysis strategy • Step 2: Apply the trained classifier to brain data from the retrieval phase • Use the classifier to track, second-by-second, how well the subject’s brain state at retrieval matches their brain state when they were studying faces vs. locations vs. objects
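Tracking the match over time amounts to running the trained classifier on every retrieval-phase volume and keeping the per-category outputs. The sketch below is illustrative only: the weights are hypothetical stand-ins for a classifier trained on the study phase, and the softmax normalization is an assumption (the slides do not say how the output units are scaled).

```python
import math

CATEGORIES = ["face", "location", "object"]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_trace(weights, patterns):
    """Return one {category: estimate} dict per retrieval-phase volume."""
    trace = []
    for pattern in patterns:
        scores = [sum(w * v for w, v in zip(weights[c], pattern))
                  for c in CATEGORIES]
        trace.append(dict(zip(CATEGORIES, softmax(scores))))
    return trace


# Hypothetical weights standing in for a classifier trained on study data.
weights = {"face": [2.0, 0.0, 0.0],
           "location": [0.0, 2.0, 0.0],
           "object": [0.0, 0.0, 2.0]}
# Three consecutive retrieval-phase patterns (toy values).
retrieval_patterns = [[1.0, 0.1, 0.1], [0.6, 0.5, 0.1], [0.1, 1.0, 0.2]]

trace = classifier_trace(weights, retrieval_patterns)
strongest = [max(step, key=step.get) for step in trace]
```

Plotting each category's value in `trace` against time gives one curve per category, which can then be compared with what the subject recalled at each moment.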

Predictions • As subjects try to recall faces, locations, and objects, their brain state should come into alignment with the brain states associated with studying faces, locations, and objects • This neural measure of category-specific “mental reinstatement” should be predictive of recall

Final free recall: classifier output • [Figure: classifier traces for Subject 9 during final free recall, showing match to face study context, match to location study context, and match to object study context]

Other findings • Kamitani & Tong (2005): decode the orientation of a striped pattern that is being viewed by the subject (accurate to within 20 degrees)

2006 Pittsburgh competition • Subjects were scanned while they watched 3 episodes of “Home Improvement” • Time-varying ratings obtained for “amusement”, “food”, “tools”, “faces”... • Goal: predict ratings using brain data • Train a classifier using brain data + ratings from 2 episodes • Then, feed the trained classifier the brain data from the 3rd episode and use the classifier to predict (in a second-by-second fashion) the subject’s feature ratings

2006 competition • some representative correlation values: • Amusement: .46 • Faces: .67 • Language: .69 • Laughter: .58 • Motion: .49 • Music: .76 • Tools: .62
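Assuming the values above are Pearson correlations between the predicted and the actual rating time courses (the slide does not name the statistic), a score of this kind can be computed as sketched below. The rating values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up rating time courses for one feature of the held-out episode.
predicted = [0.1, 0.4, 0.35, 0.8, 0.7]
actual = [0.0, 0.5, 0.2, 0.9, 0.6]

r = pearson_r(predicted, actual)
```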

2007 Pittsburgh competition • www.braincompetition.org

Interim Summary • By applying classifiers to fMRI data, we can derive a time-varying estimate of the subject’s cognitive state that relates in a meaningful way to their behavior • Technical challenges

Technical Challenges • From the perspective of machine learning, fMRI classification is a particularly difficult problem (Mitchell et al., 2004, Machine Learning) • Big patterns • Noisy patterns • Relatively few patterns • What can we do to improve classification?

Classifiers • We have tried lots of classifiers • Neural network, correlation-based classifiers, support vector machines, Gaussian Naive Bayes, boosting, k-nearest-neighbor, linear discriminant analysis... • The exact classifier that we use doesn’t seem to matter (much); nonlinear classifiers do not systematically outperform linear classifiers... • Regularization helps (e.g., ridge regression outperforms normal regression)
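The effect of regularization can be seen in a small ridge regression sketch. It solves (XᵀX + penalty·I) w = Xᵀy directly, with no intercept; setting the penalty to zero gives ordinary least squares. The helper names and the toy data (two nearly redundant voxels) are assumptions chosen to show how the penalty shrinks unstable weights.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, penalty):
    """Weights minimizing ||Xw - y||^2 + penalty * ||w||^2 (no intercept)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    for i in range(k):
        xtx[i][i] += penalty
    xty = [sum(row[i] * target for row, target in zip(X, y))
           for i in range(k)]
    return solve(xtx, xty)


# Two nearly redundant voxels and a noisy target (toy values).
X = [[1.0, 1.01], [2.0, 1.99], [3.0, 3.02], [4.0, 3.98]]
y = [1.1, 1.9, 3.2, 3.9]

ols_weights = ridge_fit(X, y, penalty=0.0)    # ordinary regression
ridge_weights = ridge_fit(X, y, penalty=1.0)  # regularized
```

With redundant inputs, ordinary regression is free to assign large offsetting weights that fit noise; the penalty pulls the weight vector toward zero, which is why its squared length is smaller for the ridge solution.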

Feature Selection • Getting rid of noisy voxels greatly helps performance • Standard method: • Run a voxel-wise omnibus ANOVA on the conditions of interest (e.g., face vs. location vs. object) • Get rid of voxels that don’t vary significantly across conditions
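The voxel-wise ANOVA step can be sketched as follows. For each voxel, a one-way F statistic is computed across conditions and the voxel is kept only if the statistic exceeds a cutoff. Using a raw F cutoff (9.5 here, chosen arbitrarily) is a simplification for this sketch; a real analysis would normally convert F to a p-value. The function names and data are hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic; `groups` holds one list of values per condition."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_voxels(patterns, labels, threshold):
    """Return indices of voxels whose F statistic exceeds `threshold`."""
    conditions = sorted(set(labels))
    keep = []
    for voxel in range(len(patterns[0])):
        groups = [[p[voxel] for p, lab in zip(patterns, labels) if lab == c]
                  for c in conditions]
        if f_statistic(groups) > threshold:
            keep.append(voxel)
    return keep


# Toy data: voxel 0 varies across conditions, voxel 1 does not.
patterns = [[1.0, 0.5], [1.2, 0.3], [3.0, 0.4],
            [3.1, 0.6], [5.0, 0.5], [5.2, 0.3]]
labels = ["face", "face", "location", "location", "object", "object"]

kept = select_voxels(patterns, labels, threshold=9.5)
```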

Feature Selection • This ANOVA method helps, but it has several problems • Main benefit of linear classifiers is that they can aggregate weak signals across voxels • In light of this, it seems like a bad idea to discard individual voxels just because the voxel’s signal is weak...

Feature Selection • What we really want is a multivariate means of voxel selection • We want to select sets of voxels that, in aggregate, carry useful information • Promising approach: Searchlights (Kriegeskorte et al., 2006, PNAS)
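The general shape of a searchlight analysis can be sketched as: center a small sphere on each voxel, score the set of voxels inside it jointly, and assign the score to the center. The code below is a rough illustration under stated assumptions. The score used here (Euclidean distance between two class-mean patterns) is a simple stand-in and is not claimed to be the statistic from the cited paper; coordinates, radius, and data are made up.

```python
import math


def neighborhood(center, coords, radius):
    """Indices of voxels whose coordinates lie within `radius` of `center`."""
    return [i for i, c in enumerate(coords) if math.dist(center, c) <= radius]


def centroid_distance(patterns, labels, voxels):
    """Distance between the two class-mean patterns, restricted to `voxels`."""
    classes = sorted(set(labels))
    means = []
    for c in classes:
        rows = [p for p, lab in zip(patterns, labels) if lab == c]
        means.append([sum(r[v] for r in rows) / len(rows) for v in voxels])
    return math.dist(means[0], means[1])


def searchlight_map(patterns, labels, coords, radius, score=centroid_distance):
    """Score the sphere around every voxel; returns one score per voxel."""
    return [score(patterns, labels, neighborhood(c, coords, radius))
            for c in coords]


# Four voxels in a row; only voxels 0 and 1 differ between conditions.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
patterns = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels = ["shoe", "shoe", "bottle", "bottle"]

scores = searchlight_map(patterns, labels, coords, radius=1.0)
```

Because each score depends on a group of voxels rather than one voxel at a time, a sphere can score well even when no single voxel inside it would pass a univariate test.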

Dimensionality Reduction • We are also exploring different methods of re-coding the data • There is extensive redundancy across voxels (esp. spatially proximal voxels) • Is there a more efficient way to represent the input (i.e., with fewer dimensions)? • Manifold learning • Spatial wavelet decomposition • ICA

Dimensionality reduction algorithms • Generative models (David Weiss & David Blei) • Each brain state is made of a linear combination of “neural topics” • Each topic = a pattern of voxel activity across the whole brain (positive and negative values are OK) • To generate a brain state from topics, multiply each topic by a positive value and sum the results • Topics are constrained to be spatially sparse (L2 regularization; trying L1 also)
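The generative direction of this model (topics to brain state) can be written out directly. The sketch below only shows how a brain state is composed from topics and their loadings; it does not attempt to learn the topics, and the function name, topic values, and loadings are all made up for illustration.

```python
def compose_brain_state(topics, loadings):
    """Sum the topics, each scaled by its (non-negative) loading."""
    if any(a < 0 for a in loadings):
        raise ValueError("topic loadings must not be negative")
    n_voxels = len(topics[0])
    return [sum(a * topic[v] for a, topic in zip(loadings, topics))
            for v in range(n_voxels)]


# Two spatially sparse toy topics over four voxels; entries may be negative.
topics = [[1.0, -0.5, 0.0, 0.0],
          [0.0, 0.0, 2.0, -1.0]]

state = compose_brain_state(topics, loadings=[0.5, 2.0])
```

Under this kind of model, a brain state is described by one loading per topic instead of one value per voxel, which is where the reduction in dimensionality comes from.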

Next steps • We know a lot about the brain (in general), the fMRI response, and cognition that we are not telling the classifier… • Currently: Each brain pattern is treated as a distinct observation • In actuality: There is massive correlation between adjacent time points • Knowing the information represented at time n tells you a lot about the information represented at time n + 1
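One way to quantify how strongly adjacent time points are related is the lag-1 autocorrelation of a time series, sketched below. The two series are toy examples, not fMRI data: one changes gradually from volume to volume and the other flips sign at every step.

```python
def lag1_autocorrelation(series):
    """Correlation between a series and itself shifted by one time point."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum((series[t] - mean) * (series[t + 1] - mean)
                     for t in range(n - 1))
    return covariance / variance


slow = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]  # gradual change
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]       # flips every step

slow_r = lag1_autocorrelation(slow)
alternating_r = lag1_autocorrelation(alternating)
```

A clearly positive value means the observation at time n carries information about time n + 1, so treating the two as independent observations discards usable structure.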

Next steps • In addition to temporal correlation, there is extensive spatial correlation • Nearby voxels tend to represent similar things • One way to address this issue is by spatially smoothing the data (averaging together activity from nearby voxels) • However, you can lose information this way • A more sophisticated approach would be to directly measure pairwise correlations between voxels and incorporate this information in the model
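Both the benefit and the cost of spatial smoothing show up in a one-dimensional toy example. The sketch below averages each voxel with its immediate neighbors along a single row of voxels (real smoothing is done in 3D, typically with a Gaussian kernel; the box filter here is a simplification). The function name and data are illustrative.

```python
def smooth(values, half_width=1):
    """Replace each voxel by the mean of itself and its neighbors within half_width."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A broad blob of activity survives smoothing largely intact...
broad = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
# ...but a fine-grained alternating pattern is mostly averaged away.
fine = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]

broad_smoothed = smooth(broad)
fine_smoothed = smooth(fine)
```

If the information that distinguishes two cognitive states lives in the fine-grained pattern, smoothing removes exactly the signal the classifier needs.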

Next steps • Currently, our analyses are focused on single subjects • Is there some way to leverage data from other subjects to help with classification? • If you run 10 subjects in the Haxby 8-category experiment, none of the subjects will have the exact same “shoe” representation, but the shoe representations are not random either • It might be possible to draw on data from other subjects to set priors on which voxels will be involved in representing shoes

Next steps • Also, there is an enormous body of evidence relating to which brain structures are involved in a given cognitive task • “face area”, “place area” • We can use this information to set priors on voxel weights in the classification process

Next steps • The cognitive states that we are trying to classify often have a hierarchical structure • How you represent a stimulus depends on the task that you are performing • Informing the classifier about this hierarchical structure should boost classification
