

  1. Show Me the Money! Dmitry Kit

  2. Outline • Overview • Reinforcement Learning • Other Topics • Conclusions

  3. Learning Models • Hebbian Learning • Strengthens the relationship between neurons that exhibit similar activity patterns and/or are in close proximity • Might Explain Topographic Features of the Brain • Population Coding • Basis function learning • Area allocation to different functions • Reinforcement Learning (RL) • Strengthens the relationship between choices that are causally connected in obtaining some reward
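
As a concrete contrast between the two learning rules on this slide, here is a minimal Python sketch of a Hebbian-style update (with an Oja-style normalization so the weights stay bounded); the learning rate, network size, and random inputs are illustrative assumptions, not details from the talk.

    import numpy as np

    def hebbian_update(w, pre, post, lr=0.01):
        # Strengthen each weight in proportion to correlated pre/post activity;
        # the Oja-style decay term keeps the weights from growing without bound.
        return w + lr * post[:, None] * (pre[None, :] - post[:, None] * w)

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(2, 4))      # 2 postsynaptic, 4 presynaptic cells
    for _ in range(1000):
        pre = rng.random(4)                     # presynaptic activity pattern
        post = w @ pre                          # linear postsynaptic response
        w = hebbian_update(w, pre, post)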

  4. RL Framework and the Brain • Reward Signal Representation • Dopamine • Local Action selection structures • Lateral Intra-parietal Area (LIP) • Supplementary Eye Field (SEF) • Frontal Eye Field (FEF) • Global Mechanism for action selection • Basal Ganglia

  5. Outline • Overview • Reinforcement Learning • Reward Signal (Dopamine) • Decision Variables (SEF, LIP, Other) • Global Mechanism for Choice (Basal Ganglia) • Other Topics • Conclusions

  6. Reward Signal (Dopamine) • Located in the Substantia Nigra Pars Compacta (SNc) • Modulates neurons in many different regions • Tonic low-frequency activity • Sends only the error signal between expected and actual rewards
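
The "error signal between expected and actual rewards" is naturally read as a temporal-difference (TD) reward prediction error; the sketch below is only a toy illustration of that reading, with made-up values and a discount factor chosen for the example.

    def td_error(reward, value_now, value_next, gamma=0.9):
        # Prediction error: what actually happened (reward plus discounted
        # future value) minus what was predicted at the previous step.
        return reward + gamma * value_next - value_now

    # Unexpected reward -> positive error (phasic burst in the dopamine analogy);
    # fully predicted reward -> error near zero (activity stays near tonic baseline).
    print(td_error(reward=1.0, value_now=0.0, value_next=0.0))   # 1.0
    print(td_error(reward=1.0, value_now=1.0, value_next=0.0))   # 0.0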

  7. RL and Dopamine

  8. Decision Variables in LIP • Contains neurons that code for: • expected gain • relative rewards between different actions • This activity was observed before the choices were actually presented and before the movement was made • Suggesting that these neurons are used to decide on the appropriate action

  9. LIP Neuron Activity • Expectation of: • High reward produced high firing frequency (black line) • Low reward produced low firing frequency (gray line) • The firing rate was correlated with gain expectation early in the trial

  10. Overall Neural Activity in LIP • A large portion of examined neurons showed significant activity related to gain expectation, outcome probability, and estimated value • These effects were mostly exhibited in the early part of the trial • These neurons were also modulated by the actual movement

  11. Neural Features of SEF • Three types of neurons found • Active upon failure to perform the task • Not responsible for executing actions • Not related to spatial stimuli • Possible error signal coding • Active upon success • Not a response to visual stimuli • Not responsible for motor control • Related to some internal coding of performance • Active before and during the delivery of reinforcement • Possibly interconnected with other regions of the brain • Seem to code expected reward versus actual reward received

  12. Function of SEF • Monitoring and controlling: • Perception and production systems during decision making • Error Correction • Production of responses that are not well-learned • Overcoming habitual responses • Evidence: • Neurons do not generate eye movements • Monitor performance and reward

  13. Reward Coding in Other Structures • Neurons in the orbitofrontal cortex show: • Selectivity to the type of physical reward • Solid • Liquid • etc. • Distinguish between rewards and punishers • Some neurons in the amygdala respond to the magnitude of reward

  14. Local Choices • Multiple areas in charge of decision making • Frontal Eye Field (FEF) • LIP • Supplementary Eye Field (SEF) • etc. • Might have different goals • Need a global mechanism to arbitrate between these different goals

  15. Physiology (Basal Ganglia) • Located at the base of the cerebrum • Consists of: • Caudate Nucleus (CD) and putamen (PUT) (collectively called the striatum) • Input from the cerebral cortex and part of the thalamus • Globus pallidus • External Segment (GPe) • Internal Segment (GPi) • Subthalamic Nucleus (STN) • Receives direct input from the cerebral cortex • Substantia Nigra • Pars reticulata (SNr) • Pars compacta (SNc) • Output Stations (GPi and SNr) • To the thalamus and brain stem motor areas

  16. Anatomical Locations (BG)

  17. BG: Function • Controls • Thalamocortical networks • Mainly involved in hand or arm movements • Brain stem motor networks • Superior Colliculus • eye-head orienting • the pedunculopontine nucleus • locomotion • periaqueductal gray • vocalization • autonomic responses

  18. BG-SC Connection • Exists in many lower mammals • Method of control • CD inhibits neural activity in SNr • SNr projects inhibitory connections to SC • Inhibition is the main method of control • The appropriate action is selected by inhibiting all channels except the one for the desired action
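
A toy sketch of selection-by-disinhibition under the wiring described above: the output stage tonically inhibits every downstream channel, and the striatum lifts that inhibition only for the winning channel. The channel count and drive values are illustrative assumptions, not data from the talk.

    import numpy as np

    def select_by_disinhibition(striatal_drive, tonic_inhibition=1.0):
        # Every channel starts fully inhibited (tonic SNr output); the channel
        # with the strongest striatal drive has its inhibition lifted, which
        # releases the corresponding SC channel while all others stay suppressed.
        inhibition = np.full_like(striatal_drive, tonic_inhibition)
        winner = int(np.argmax(striatal_drive))
        inhibition[winner] = 0.0
        downstream = np.maximum(0.0, 1.0 - inhibition)
        return winner, downstream

    winner, activity = select_by_disinhibition(np.array([0.2, 0.9, 0.4]))
    print(winner, activity)   # 1 [0. 1. 0.]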

  19. Neural Properties of BG • Contains memory-guided neurons • Contains neurons that code expectation of task-specific events • SNr • Only affected by planned movements • The response fields of these neurons match those of the SC neurons they connect to

  20. Circuit Diagram

  21. Coordinated Activity Model • Use GPe to select just the activity you need (Focus) • Use STN to inhibit a planned future activity (Sequencing) • Might be an incorrect model if we emphasize the direct cortical input to the STN • Direct control over movement suppression

  22. Learning of Sequential Procedures • Frontoparietal association cortices and anterior part of the basal ganglia learn new sequences • Uses visuospatial coordinates • Motor-premotor cortices and the mid-posterior part of the basal ganglia exploit learned sequences • Uses motor coordinates

  23. BG and Decision Making • Ventral striatum receives input from neocortical (cognitive) and limbic (emotional) areas • The speed of saccades is related to emotional or motivational state • As with SEF and LIP, many BG neurons respond to the expectation of reward • Uses dopaminergic neurons to: • Modulate the selectivity of individual neurons • Modulate the response magnitude of individual neurons
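
One way to picture "modulating selectivity and response magnitude" is a multiplicative gain on a tuning curve; the sketch below is only a toy model built on that assumption (the Gaussian tuning, parameter values, and scaling rule are not from the talk).

    import numpy as np

    def modulated_response(preferred, stimulus, dopamine, width=1.0):
        # A dopamine-like signal both amplifies the response (magnitude)
        # and narrows the tuning curve (selectivity) in this toy model.
        gain = 1.0 + dopamine
        sharpness = width / (1.0 + dopamine)
        return gain * np.exp(-((stimulus - preferred) ** 2) / (2 * sharpness ** 2))

    # Same stimulus, but a higher reward-related dopamine signal.
    print(modulated_response(0.0, 0.5, dopamine=0.0))
    print(modulated_response(0.0, 0.5, dopamine=1.0))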

  24. Circuit Diagram Revisited

  25. Consequences of BG Disorders • Involuntary movement • Random movement • Visually guided saccades • Shorter saccades • Problems with coordinated movements • Deficient responses to memory-guided saccades • Trouble holding fixation • Especially if STN is damaged • Inability to learn sequential procedures • Lack of motivation to perform actions

  26. Why Disinhibition? • Possibly an evolutionary by-product • Need a gating mechanism, not an enhancement mechanism

  27. Outline • Overview • Reinforcement Learning • Other Topics • Attention Vs. Reward • Credit Assignment Problem • Conclusions

  28. Attention or Reward? • Attention is a more global concept than reward • Defined as the study of vigilance, selective processing of stimuli, and control systems for complex behavior • Attention can modulate neurons before the onset of stimuli, just like reward expectation neurons • Attention is dependent on task difficulty • How does one distinguish between a reward expectation signal and attention to a particular stimulus at a single neuron? • Some studies of attention might have been looking at the same neural signal as those studying reward • Provide better definitions for reward and attention • Attention might be defined in terms of rewards

  29. The Credit Assignment Problem • What chain of actions resulted in the reward? • Which of the actions shown to the right got you your steak?

  30. Solution • Start from the point of receiving the reward • It was recently shown that the rat hippocampus replays daily experiences backwards • Back-propagate the reward at a discounted rate • In monkeys, neurons coding for reward expectation decreased their activity when the delay between the cue and the reward was increased • Converges to the optimal policy if: • Every action has some positive probability of being chosen • Given infinite time
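
The two ideas on this slide (propagate the reward backwards at a discounted rate, and require that every action keeps some probability of being chosen) correspond to a discounted-return update and an epsilon-greedy choice rule; the sketch below is a minimal illustration with invented states, actions, and parameters.

    import random
    from collections import defaultdict

    def update_episode(Q, episode, alpha=0.1, gamma=0.9):
        # Walk the episode backwards from the reward, propagating a
        # discounted return to every (state, action) pair along the way.
        G = 0.0
        for state, action, reward in reversed(episode):
            G = reward + gamma * G
            Q[(state, action)] += alpha * (G - Q[(state, action)])

    def epsilon_greedy(Q, state, actions, eps=0.1):
        # Every action keeps probability >= eps / len(actions): the
        # "positive probability of being chosen" condition on the slide.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    Q = defaultdict(float)
    update_episode(Q, [("cue", "look-left", 0.0), ("target", "saccade", 1.0)])
    # The discounted reward has propagated back to the earlier (cue) step.
    print(Q[("cue", "look-left")], Q[("target", "saccade")])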

  31. Infinite Time • Humans do not need an optimal policy • The inherent randomness in our environment, behaviors, and tasks makes it unlikely that a set of truly unrelated actions coincides frequently

  32. Conclusion • Choose a set of actions (e.g., SEF, LIP, etc.) • Execute a subset of actions that do not violate physical limitations (Basal Ganglia) • Compare the final result against the expected result (Dopamine) • Try to do the task again • Almost every task that we execute uses the eyes to locate the target of interest, so it is not surprising that eye movements are closely related to the current task • Might be a huge oversimplification: “Correlation does not imply causation” • More experiments are needed to show these relationships

  33. The End Thank You
