
On Linking Reinforcement Learning with Unsupervised Learning
Cornelius Weber, FIAS






Presentation Transcript


  1. On Linking Reinforcement Learning with Unsupervised Learning Cornelius Weber, FIAS presented at Honda HRI, Offenbach, 17th March 2009

  2. for taking action, we need only the relevant features [figure with variables x, y, z]

  3. unsupervised learning in cortex; reinforcement learning in basal ganglia [figure: actor, state space] (Doya, 1999)

  4. a 1-layer RL model of the basal ganglia [figure: actor, state space, "go left?", "go right?"] ... is too simple to handle complex input

  5. we need another layer (or layers) to pre-process complex data [figure: actor / action selection; state space / feature detection; complex input (cortex)]

  6. models' background:
  - gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
  - reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
  - reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
  - reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
  - RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)

  7. scenario: bars controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar reaches a specific position [figure: sensory input, action, reward]
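  A minimal sketch of such a bars environment; the class name, grid size, target row, and the choice of which bar each action moves are illustrative assumptions, not the original code:

  import numpy as np

  class BarsWorld:
      """Toy grid with one horizontal and one vertical bar.
      Reward is given when the horizontal bar reaches a target row."""
      def __init__(self, size=12, target_row=0, rng=None):
          self.size = size
          self.target_row = target_row
          self.rng = rng or np.random.default_rng()
          self.reset()

      def reset(self):
          self.h_row = self.rng.integers(self.size)   # horizontal bar position
          self.v_col = self.rng.integers(self.size)   # vertical bar (distractor)
          return self.observe()

      def observe(self):
          img = np.zeros((self.size, self.size))
          img[self.h_row, :] = 1.0    # horizontal bar
          img[:, self.v_col] = 1.0    # vertical bar
          return img.ravel()          # flat sensory input vector

      def step(self, action):         # 0: up, 1: down, 2: left, 3: right (assumed mapping)
          if action == 0: self.h_row = max(self.h_row - 1, 0)
          if action == 1: self.h_row = min(self.h_row + 1, self.size - 1)
          if action == 2: self.v_col = max(self.v_col - 1, 0)
          if action == 3: self.v_col = min(self.v_col + 1, self.size - 1)
          reward = 1.0 if self.h_row == self.target_row else 0.0
          return self.observe(), reward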

  8. model that learns the relevant features. top layer: SARSA RL; lower layer: winner-take-all feature learning; both layers: learning modulated by δ [figure: action, RL weights, feature weights, input]
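  A minimal sketch of this two-layer model as described on the slide: SARSA on top of a winner-take-all feature layer, with both weight sets updated in proportion to the TD error δ. Learning rates, ε-greedy exploration, and the exact form of the updates are assumptions rather than the original implementation:

  import numpy as np

  class WTASarsaAgent:
      def __init__(self, n_input, n_features, n_actions,
                   gamma=0.9, lr_w=0.01, lr_q=0.1, epsilon=0.1, rng=None):
          self.rng = rng or np.random.default_rng()
          # non-negative feature weights (input -> hidden)
          self.W = self.rng.random((n_features, n_input)) * 0.1
          # RL action weights (hidden -> action values)
          self.Q = np.zeros((n_actions, n_features))
          self.gamma, self.lr_w, self.lr_q, self.epsilon = gamma, lr_w, lr_q, epsilon

      def features(self, x):
          """Winner-take-all hidden layer: only the best-matching feature unit is active."""
          h = np.zeros(self.W.shape[0])
          h[np.argmax(self.W @ x)] = 1.0
          return h

      def act(self, h):
          if self.rng.random() < self.epsilon:
              return int(self.rng.integers(self.Q.shape[0]))
          return int(np.argmax(self.Q @ h))

      def update(self, x, h, a, r, h_next, a_next):
          # SARSA TD error on the state-action value
          delta = r + self.gamma * (self.Q[a_next] @ h_next) - (self.Q[a] @ h)
          # both layers modulate their Hebbian-like updates by delta
          self.Q[a] += self.lr_q * delta * h
          self.W += self.lr_w * delta * np.outer(h, x)
          np.clip(self.W, 0.0, None, out=self.W)   # non-negativity constraint on weights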

  9. SARSA with WTA input layer

  10. Energy function: estimation error of the state-action value [the slide lists the equations and the identities used]; note: non-negativity constraint on the weights
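  A hedged reconstruction of that energy function from the slide text (the SARSA estimation error of the state-action value). The identities on the original slide are not recoverable from this transcript, so the symbols and the semi-gradient step below are assumptions:

  \begin{align}
  Q(s,a) &= \sum_j q_{aj}\, h_j(s), \qquad h_j(s)\in\{0,1\} \ \text{(winner-take-all feature layer)} \\
  \delta &= r + \gamma\, Q(s',a') - Q(s,a) \\
  E &= \tfrac{1}{2}\,\delta^{2} \\
  \Delta q_{aj} &\propto -\frac{\partial E}{\partial q_{aj}} \approx \delta\, h_j(s)
  \qquad (\text{semi-gradient: } Q(s',a') \text{ treated as constant}) \\
  \Delta w_{jk} &\propto \delta\, h_j(s)\, x_k, \qquad w_{jk}\ge 0
  \end{align}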

  11. learning the 'short bars' data [figure: feature weights, RL action weights, data, action, reward]

  12. short bars in a 12x12 grid; average number of steps to goal: 11

  13. learning the 'long bars' data [figure: RL action weights, feature weights, data input, reward; 2 actions not shown]

  14. comparison of feature-layer variants [figure panels]: WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints
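  The panels contrast how the hidden layer is computed; a small sketch of the two activation rules being compared (the inverse-temperature parameter beta is an assumption):

  import numpy as np

  def wta_activation(W, x):
      """Winner-take-all: only the best-matching feature unit is active."""
      h = np.zeros(W.shape[0])
      h[np.argmax(W @ x)] = 1.0
      return h

  def softmax_activation(W, x, beta=1.0):
      """Graded softmax alternative; beta is illustrative."""
      u = beta * (W @ x)
      e = np.exp(u - u.max())        # subtract max for numerical stability
      return e / e.sum()

  The comparison illustrates the discussion point on the next slide: non-negative coding aids feature extraction.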

  15. Discussion
  - simple model: SARSA on a winner-take-all network with δ-feedback
  - learns only the features that are relevant for the action strategy
  - theory behind it: derived from (approximate) value function estimation
  - non-negative coding aids feature extraction
  - a link between unsupervised and reinforcement learning
  - demonstration with more realistic data is needed
  Sponsors: Frankfurt Institute for Advanced Studies, FIAS; Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3

  16. thank you ... Sponsors: Frankfurt Institute for Advanced Studies, FIAS; Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3
