Understanding Learning Through Brain Changes in Humans

REWARD PREDICTION, LEARNING AND THE MESOLIMBIC DOPAMINE SYSTEM – TESTING A COMPUTATIONAL MODEL OF PAVLOVIAN LEARNING USING FMRI IN HUMANS

J. Jensen1, A.J. Smith2, M. Willeit1, I. Vitcu1 & S. Kapur1,3
1Centre for Addiction and Mental Health: Clarke Division/Schizophrenia and Continuing Care Program, Toronto; 2Department of Psychology, McMaster University; 3Department of Psychiatry, University of Toronto, Canada

Abstract

Background: Learning occurs according to the discrepancy, or “prediction error”, between the expected outcome and the actual outcome, computationally operationalized as a temporal difference (TD) learning algorithm. The firing of the midbrain dopamine neurons projecting to the ventral striatum (VS) conforms to the predictions of this algorithm during learning in monkeys. We wanted to test how the TD model fares in understanding brain changes during learning in humans.

Methods: Ten subjects were exposed to a conditioning paradigm in which a neutral visual stimulus predicted winning money (appetitive), while another predicted unpleasant cutaneous electrical stimulation (aversive). SPM analysis was used to assess whether brain activations, especially in the mesolimbic dopamine projections, conformed to the predictions of the TD algorithm.

Results: Learning to associate visual stimuli with appetitive and aversive events led to activation in the ventral striatum. During appetitive learning, activity in the ventral striatum covaried as predicted by the TD algorithm. However, aversive learning also led to changes in the VS that correlated positively with the TD algorithm.

Conclusions: The activations of the VS in this paradigm are consistent with it representing a reward-prediction error signal for learning.
However, while the standard TD model predicts a positive response when things are ‘better than expected’ and a negative response when things are ‘worse than expected’, we find a positive response whenever things are ‘different from expected’. These results suggest that the ventral striatum activations in humans reflect prediction of salience rather than the prediction of positive reward alone.

Results

Behavioral data

As shown in Fig. 2 (left), a higher degree of uneasiness was reported when the CSav was seen than when the CSneut was seen (2.3±0.8 vs. 0.5±0.85; Z=2.85, p<0.01) or the CSapp was seen (2.3±0.8 vs. 0.4±0.7; Z=2.84, p<0.01). No significant differences in reported excitement were obtained between CS types, although there was a trend toward higher values for CSapp than for CSneut (one-tailed p=0.06). As presented in Fig. 2 (right), about 97% of trials including USav and 66% of trials including USapp reached a GSR value above 0.05 μS. For the non-reinforced trials the magnitude was smaller, but there were differences in GSR above threshold between CSav and CSneut (t(7)=4.23; p<0.01), CSav and CSapp (t(7)=3.77; p<0.01), and CSapp and CSneut (t(7)=3.08; p<0.05).

Introduction

An animal’s survival depends on its ability to predict and respond to salient stimuli. Formal learning theory describes how animals learn when expectations about the world are violated. An influential formal learning theory is the temporal difference (TD) model. This model has been successfully applied to describe neural activations during appetitive learning in both animals and humans in regions associated with the mesolimbic dopaminergic system, such as the ventral striatum. These activations can be captured in the hypothesis that this system is involved in encoding prediction errors (PE), where PE = [Actual reward – Predicted reward]. TD also makes predictions for aversive events, where the PEs are negative. Thus, according to conventional TD, PE is valence dependent.
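The prediction-error idea can be illustrated with a minimal sketch. It uses the sign convention described in the abstract, in which a better-than-expected outcome gives a positive error. Note that this is a trial-level simplification (equivalent to a Rescorla-Wagner update), not the full within-trial TD model; the function name, the outcome coding (+1, -1, 0) and the learning rate are assumptions made for illustration and are not taken from the study.

```python
def td_prediction_errors(outcomes, alpha=0.2):
    """Trial-level prediction errors for one conditioned stimulus.

    outcomes: received outcome on each trial (assumed coding:
              +1 reward, -1 aversive event, 0 nothing delivered).
    alpha:    learning rate (hypothetical value, not from the study).
    """
    value = 0.0  # current prediction of the outcome
    errors = []
    for outcome in outcomes:
        pe = outcome - value   # actual minus predicted
        errors.append(pe)
        value += alpha * pe    # move the prediction toward the outcome
    return errors


# Valence dependent errors keep their sign; valence independent ones drop it.
appetitive = td_prediction_errors([1, 0, 0, 1])
aversive = td_prediction_errors([-1, 0, 0, -1])
unsigned_aversive = [abs(pe) for pe in aversive]
```

In this sketch an unexpected reward yields a positive error and an unexpected aversive event a negative one, while the absolute values treat both as equally surprising.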
This study aimed to investigate neural activations that covaried with TD using both appetitive and aversive events in the same run. Further, we wanted to investigate whether a valence dependent or a valence independent model would best fit the data.

Fig 2. The left panel shows the reported values of uneasiness and excitement when seeing the CSs. The right panel shows the percentage of trials above threshold in GSR for the different CSs and also for USs for comparison. *p<0.05; **p<0.01

Imaging data

• Using the valence dependent regressor, no significant clusters were obtained at the a priori determined threshold (0.01) in the regions of interest. Using a more liberal threshold, a cluster was found in the left prefrontal cortex (peak at coordinate -12, 63, 9; Z=3.34; pextent<0.05) (Fig. 1 left column bottom).
• The valence independent regressor yielded a significant cluster in the right ventral striatum (peak at coordinate 6, 3, -3; Z=3.36; pextent<0.001) (Fig. 1 right column bottom).

Conclusions

• A valence independent version of the TD model describes activity in the ventral striatum better than a valence dependent version.
• The activations of the ventral striatum are consistent with it representing a prediction error signal.
• These results also suggest that ventral striatum activations in humans reflect prediction errors of stimulus salience rather than coding for the distinction between appetitive and aversive events.

Materials & Methods

Procedure: 10 right-handed healthy subjects (7 females; age 34±9 years) participated in this experiment. The experiment was based on Pavlovian conditioning using a 33% partial reinforcement schedule with cutaneous electrical stimulations (CES) to the left index finger as aversive unconditioned stimuli (USav) and money ($5) as appetitive unconditioned stimuli (USapp).
The subjects saw colored circles for 5 s as conditioned stimuli (CS), where one of the colors (CSav) was followed by the CES in one third of the trials while another (CSapp) was followed by a $5 bill image in one third of the trials. The subjects were told that after the experiment they would be paid the sum of the $5 images they saw. A third color had no consequences (CSneut). Trials were randomized during the experiment. Further, to assess autonomic arousal, Galvanic Skin Responses (GSR) were also sampled.

Scanning protocol: MRI scans were acquired on a GE Signa 1.5T scanner equipped with a head coil. In a single session, 700 volumes (28 contiguous axial 4.4 mm thick slices) covering the whole brain were acquired using a T2*-sensitive spiral sequence (TR=2240 ms). For localization purposes, T1-weighted anatomical images covering the whole brain were acquired.

Analysis: Pre-processing and analysis of the data were done in SPM99 (http://www.fil.ion.ucl.ac.uk/spm) using a random effects analysis. The regressors were obtained by convolving the TD generated PE with a haemodynamic response function. For the valence independent regressor, the absolute values of PE were used.

Acknowledgements

We would like to thank Adrian Crawley, David Mikulis and Peter Bloomfield for assisting with technical expertise. Many thanks also to Suzi VanderSpek and Catherine Tenn for their creative expertise.

Fig 1. The upper panels show the prediction error values generated by TD for a part of the experiment (volumes 300-340) on the left (valence dependent) and the corresponding absolute values on the right (valence independent). The middle panels show the same values convolved with a haemodynamic response function, used as regressors in the analysis. The bottom panels show the prefrontal activation that covaried with the valence dependent regressor (left) and the activation in the ventral striatum that covaried with the valence independent regressor (right).
The circles indicate the type of CS: yellow circle = aversive CS, blue circle = appetitive CS, grey circle = neutral CS.
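The regressor construction described in the Analysis paragraph and in Fig 1 (signed PE for the valence dependent regressor, absolute PE for the valence independent one, each convolved with a haemodynamic response function) might be sketched as follows. This is an illustration only, not the SPM99 pipeline: the double-gamma shape parameters, the 32 s kernel length, the one-value-per-volume sampling and all function names are assumptions; only the TR of 2.24 s comes from the scanning protocol.

```python
import math

TR = 2.24  # seconds per volume, from the scanning protocol


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, written out to avoid external dependencies."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def double_gamma_hrf(tr, duration=32.0):
    """Double-gamma response sampled every `tr` seconds (assumed parameters)."""
    n = int(duration / tr) + 1
    hrf = [gamma_pdf(i * tr, 6.0) - gamma_pdf(i * tr, 16.0) / 6.0 for i in range(n)]
    total = sum(hrf)
    return [h / total for h in hrf]  # normalise so the kernel sums to 1


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(signal):
                out[i + j] += s * k
    return out


def build_regressors(pe_per_volume, tr=TR):
    """Return (valence dependent, valence independent) regressors."""
    hrf = double_gamma_hrf(tr)
    dependent = convolve(pe_per_volume, hrf)
    independent = convolve([abs(pe) for pe in pe_per_volume], hrf)
    return dependent, independent
```

With this construction a negative prediction error pushes the valence dependent regressor down but the valence independent regressor up, which is the difference the two analyses test.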
