
Goal-Directed Feature Learning Cornelius Weber and Jochen Triesch


Presentation Transcript


  1. Goal-Directed Feature Learning. Cornelius Weber and Jochen Triesch, Frankfurt Institute for Advanced Studies (FIAS). IJCNN, Atlanta, 17th June 2009

  2. for taking action, we need only the relevant features (figure: axes x, y, z)

  3. unsupervised learning in cortex provides the state space for the actor; reinforcement learning in the basal ganglia (Doya, 1999)

  4. reinforcement learning: go up? go right? go down? go left?

  5. reinforcement learning (network diagram: action a, weights, input s)

  6. reinforcement learning: v(s,a) is the value of a state-action pair, coded in the weights between input s and action a. Minimizing the value estimation error: Δv(s,a) ≈ 0.9 v(s',a') - v(s,a) (moving target); at the goal, Δv(s,a) ≈ 1 - v(s,a) (value fixed at the goal). Repeated runs to the goal: in state s the agent performs the best action a (with some randomness), yielding s' and a'; values and action choices converge.
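
A minimal tabular sketch of this update (function and parameter names are illustrative; it assumes reward 1 at the goal and a discount factor of 0.9, as on the slide):

```python
import numpy as np

def sarsa_update(v, s, a, r, s_next, a_next, at_goal, lr=0.1, gamma=0.9):
    """One SARSA update on a tabular state-action value function v[s, a]."""
    if at_goal:
        target = r                           # value fixed at the goal (reward = 1)
    else:
        target = gamma * v[s_next, a_next]   # moving target: 0.9 * v(s', a')
    delta = target - v[s, a]                 # value estimation error
    v[s, a] += lr * delta                    # move v(s, a) toward the target
    return delta
```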

  7. reinforcement learning: the actor reads the input (state space). With a simple input: 'go right!' With a complex input: 'go right? go left? ... can't handle this!'

  8. complex input scenario: bars controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position (figure: sensory input, action, reward)
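
A plausible reconstruction of such a bars environment (class name, grid size, and the exact bar dynamics are assumptions for illustration, not the authors' code):

```python
import numpy as np

class BarsWorld:
    """Toy 'bars' world: a horizontal and a vertical bar on an n x n grid.
    Only the horizontal bar's row matters for reward; the vertical bar is a distractor."""

    def __init__(self, n=12, goal_row=0, rng=None):
        self.n, self.goal_row = n, goal_row
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        self.row = self.rng.integers(self.n)   # horizontal bar position (relevant)
        self.col = self.rng.integers(self.n)   # vertical bar position (irrelevant)
        return self.observe()

    def observe(self):
        img = np.zeros((self.n, self.n))
        img[self.row, :] = 1.0                 # horizontal bar
        img[:, self.col] = 1.0                 # vertical bar
        return img.ravel()                     # flat sensory input

    def step(self, action):                    # 0=up, 1=down, 2=left, 3=right
        if action == 0:   self.row = (self.row - 1) % self.n
        elif action == 1: self.row = (self.row + 1) % self.n
        elif action == 2: self.col = (self.col - 1) % self.n
        else:             self.col = (self.col + 1) % self.n
        reward = 1.0 if self.row == self.goal_row else 0.0
        return self.observe(), reward, reward > 0
```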

  9. need another layer (or layers) to pre-process the complex data. Architecture, bottom to top: input I; feature-detection weight matrix W (feature detectors); state s (position of the relevant bar); action-selection weight matrix Q (encodes v); action a. Network definition: s = softmax(W I); P(a=1) = softmax(Q s); v = a Q s
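
In code, this forward pass could look as follows (a sketch; sampling a single one-hot action from P(a=1) = softmax(Q s) is an assumption consistent with the slide's notation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(W, Q, I, rng=None):
    """Two-layer network: s = softmax(W I), P(a) = softmax(Q s), v = a Q s."""
    rng = rng or np.random.default_rng()
    s = softmax(W @ I)                    # feature layer / state representation
    p = softmax(Q @ s)                    # action probabilities
    a = np.zeros_like(p)
    a[rng.choice(len(p), p=p)] = 1.0      # sample one action as a one-hot vector
    v = a @ Q @ s                         # value of the chosen state-action pair
    return s, a, v
```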

  10. network training (same architecture: action a, weight matrix Q, state s, weight matrix W, input I): E = (0.9 v(s',a') - v(s,a))^2 = δ^2; dQ ≈ dE/dQ = δ a s; dW ≈ dE/dW = δ Q s I + ε. Minimizing the error w.r.t. the current target gives reinforcement learning on Q and δ-modulated unsupervised learning on W.
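
A sketch of these δ-modulated updates (one reading of the slide's expressions dQ = δ a s and dW = δ Q s I + ε; the learning rate, noise scale, and the exact form of the W rule are assumptions):

```python
import numpy as np

def train_step(W, Q, I, s, a, delta, lr=0.01, eps=1e-4, rng=None):
    """δ-modulated weight updates for both layers (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    Q += lr * delta * np.outer(a, s)            # SARSA-like rule on the action weights Q
    feedback = (Q.T @ a) * s                    # top-down feedback through Q, gated by s
    W += lr * delta * np.outer(feedback, I)     # δ-modulated feature learning on W
    W += eps * rng.standard_normal(W.shape)     # small noise term ε
    np.clip(W, 0.0, None, out=W)                # non-negativity constraint (see next slide)
    return W, Q
```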

  11. network training: minimize the error w.r.t. the target Vπ (identities used in the derivation not transcribed here); note: non-negativity constraint on the weights

  12. SARSA with WTA input layer
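
With winner-take-all on the input layer, the softmax over feature units is replaced by a hard argmax (a sketch; to be used in place of s = softmax(W @ I) in the forward pass above):

```python
import numpy as np

def wta(x):
    """Winner-take-all: one-hot vector at the maximally activated unit."""
    s = np.zeros_like(x)
    s[np.argmax(x)] = 1.0
    return s
```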

  13. learning the 'short bars' data (figure: feature weights, RL action weights, data, action, reward)

  14. short bars in a 12x12 grid; average number of steps to goal: 11

  15. learning the 'long bars' data (figure: RL action weights, feature weights, data input, reward; 2 of the actions not shown)

  16. model variants compared: WTA with non-negative weights; SoftMax with no weight constraints; SoftMax with non-negative weights

  17. models' background:
  - gradient descent methods generalize RL to several layers: Sutton & Barto, Reinforcement Learning (1998); Tesauro (1992; 1995)
  - reward-modulated Hebb: Triesch, Neural Comput 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comput 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
  - reward-modulated activity leads to input selection: Nakahara, Neural Comput 14, 819-844 (2002)
  - reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-2452 (2007); Florian, Neural Comput 19(6), 1468-1502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-3665 (2007); ...
  - RL models learn partitioning of the input space: e.g. McCallum, PhD thesis, Univ. of Rochester, NY, USA (1996)

  18. Discussion:
  - two-layer SARSA RL performs gradient descent on the value estimation error
  - approximation with winner-take-all leads to a local rule with δ-feedback
  - learns only action-relevant features
  - non-negative coding aids feature extraction
  - link between unsupervised and reinforcement learning
  - demonstration with more realistic data still needed
  Sponsors: Frankfurt Institute for Advanced Studies, FIAS; Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3

  19. thank you ... Sponsors: Frankfurt Institute for Advanced Studies, FIAS; Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3
