
Presentation Transcript


  1. Studies on Goal-Directed Feature Learning Cornelius Weber, FIAS presented at: “Machine Learning Approaches to Representational Learning and Recognition in Vision” Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008

  2. for taking action, we need only the relevant features. (figure: network with layers x, y, z)

  3. models’ background & overview:
  - unsupervised feature learning models are enslaved by bottom-up input
  - reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
  - reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
  - RL models learn a partitioning of the input space, e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)
  - reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007) (model 3 presented here extends this to delayed reward)
  - feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)

  4. purely sensory data, in which one feature type is linked to reward; the action is not controlled by the network. (figure: sensory input, action, reward)

  5. model 1: obtaining the relevant features
  1) build a feature-detecting model
  2) learn associations between features
  3) register each feature’s average reward
  4) spread value along the associative connections
  5) check whether actions increase or decrease value
  6) remove features where the action doesn’t matter
  (figure: irrelevant vs. relevant features)
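
A minimal sketch of steps 3-6, assuming a one-step value spread along the associative weights and a simple action-sensitivity test; the array names and the pruning threshold are illustrative, not taken from the original model.

```python
import numpy as np

def prune_irrelevant_features(avg_reward, W_assoc, Q, threshold=0.01):
    """Illustrative relevance check for model 1 (a sketch, not the original code).

    avg_reward : (n_feat,)         average reward registered per feature (step 3)
    W_assoc    : (n_feat, n_feat)  associative weights between features (step 2)
    Q          : (n_act, n_feat)   value of each action while a feature is active (step 5)
    Returns a boolean mask of the features to keep.
    """
    # step 4: spread value along the associative connections (one propagation step)
    value = avg_reward + W_assoc @ avg_reward
    # steps 5/6: a feature is relevant if the chosen action changes its value;
    # the spread of its action values serves as that sensitivity measure here
    action_sensitivity = Q.max(axis=0) - Q.min(axis=0)
    return action_sensitivity > threshold * (1.0 + np.abs(value))
```

Features for which the mask is False would then be removed from the feature-detecting model.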

  6. Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adapt Behav 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001); Földiák, Biol Cybern 64, 165-70 (1990). (figure: selected features, lateral weights (decorrelation), associative weights, thresholds, action effect) → homogeneous activity distribution → relevant features identified

  7. motor-sensory data (again, one feature type is linked to reward); the network selects the action (to get reward). (figure: sensory input, reward, irrelevant subspace, relevant subspace)

  8. model 2: removing the irrelevant inputs
  1) initialize a feature-detecting model (but continue learning)
  2) perform actor-critic RL, taking the features’ outputs as the state representation
  - works despite irrelevant features
  - challenge: relevant features will occur at different frequencies
  - nevertheless, features may remain stable
  3) observe the critic: it puts negative value on irrelevant features after long training
  4) modulate (multiply) feature learning by the critic’s value
  (figure: feature frequency vs. value)
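
A rough sketch of one learning step of model 2 as described above: actor-critic TD learning on the feature outputs, with the feature update multiplied by the critic's per-feature value so that irrelevant features are unlearned. The variable names and the Oja-style feature rule are assumptions; only the value-modulated feature learning follows the slide.

```python
import numpy as np

def model2_update(x, a, r, x_next, W_feat, v, W_actor, lr=0.01, gamma=0.9):
    """Update on one observed transition (x, a, r, x_next); a sketch, not the published model.

    W_feat  : (n_feat, n_in)   feature-detecting weights, kept plastic during RL
    v       : (n_feat,)        critic weights (one value per feature)
    W_actor : (n_act, n_feat)  actor weights
    """
    y, y_next = W_feat @ x, W_feat @ x_next                # feature outputs = state representation
    delta = r + gamma * float(v @ y_next) - float(v @ y)   # TD error
    v += lr * delta * y                                    # critic update
    W_actor[a] += lr * delta * y                           # actor update
    # step 4: multiply feature learning by the critic's value of each feature, so
    # features that acquired negative value (the irrelevant ones) are unlearned
    W_feat += lr * v[:, None] * (np.outer(y, x) - (y ** 2)[:, None] * W_feat)
    return delta
```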

  9. Lücke & Bouecke, Proc ICANN, 31-7 (2005). (figure: features, critic value, action weights) → relevant subspace discovered

  10. model 3: learning only the relevant inputs
  1) top level: reinforcement learning model (SARSA)
  2) lower level: feature learning model (SOM / K-means)
  3) modulate learning by δ in both layers
  (figure: input → feature weights → RL action weights → action)

  11. model 3: SARSA with SOM-like activation and update
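
A compact sketch of model 3 as outlined on slides 10 and 11: a SOM / K-means feature layer whose winner-take-all output feeds a SARSA learner, with the TD error δ gating the weight updates in both layers. The class layout, the epsilon-greedy action selection and the learning rates are assumptions; the δ-modulation of both layers follows the slides.

```python
import numpy as np

class Model3:
    """SARSA on top of a SOM-like feature layer, with delta gating both updates (sketch)."""

    def __init__(self, n_in, n_feat, n_act, lr=0.05, gamma=0.9, eps=0.1):
        self.W = np.random.rand(n_feat, n_in)   # feature (SOM / K-means) weights
        self.Q = np.zeros((n_act, n_feat))      # RL action weights
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def features(self, x):
        # SOM-like activation: winner-take-all on the distance to the feature weights
        k = int(np.argmin(((self.W - x) ** 2).sum(axis=1)))
        y = np.zeros(len(self.W))
        y[k] = 1.0
        return y, k

    def act(self, y):
        # epsilon-greedy action selection on the action weights (an assumption)
        q = self.Q @ y
        return np.random.randint(len(q)) if np.random.rand() < self.eps else int(q.argmax())

    def learn(self, x, a, r, x_next, a_next):
        y, k = self.features(x)
        y_next, _ = self.features(x_next)
        # SARSA TD error for the on-policy transition (s, a, r, s', a')
        delta = r + self.gamma * self.Q[a_next] @ y_next - self.Q[a] @ y
        self.Q[a] += self.lr * delta * y                  # top level: SARSA update
        # lower level: SOM / K-means update of the winning feature, gated by delta,
        # so only reward-relevant features are learned in the first place
        self.W[k] += self.lr * delta * (x - self.W[k])
        return delta
```

For the 12x12 ‘short bars’ task with four actions (slides 14-15), one would instantiate something like Model3(n_in=144, n_feat=16, n_act=4); the number of features is a free choice in this sketch.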

  12. (figure: feature weights, relevant subspace, RL action weights, subspace coverage)

  13. learning the ‘long bars’ data. (figure: data, input, reward, feature weights, RL action weights; 2 actions, not shown)

  14. learning the ‘short bars’ data: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’. (figure: data, input, reward, feature weights, RL action weights, action)

  15. short bars in a 12x12 input; average number of steps to goal: 11

  16. biological interpretation
  - no direct feedback from striatum to cortex
  - convergent mapping → little receptive field overlap, consistent with subspace discovery
  (figure: cortex: feature/subspace detection; striatum: action selection; GPi: output of the basal ganglia)

  17. Discussion
  - models 1 and 2 learn all features and identify the relevant features:
  - model 1 requires a homogeneous feature distribution
  - model 2 can do only subspace detection (no real feature detection)
  - model 3 is very simple: SARSA on a SOM with δ-feedback
  - it learns only the relevant subspace or features in the first place
  - link between unsupervised and reinforcement learning
  Sponsors: Frankfurt Institute for Advanced Studies (FIAS); Bernstein Focus Neurotechnology; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3

  18. relevant features change during learning: T-maze decision task (rat), Jog et al., Science 286, 1158-61 (1999). (figure: early learning vs. late learning) Units in the basal ganglia are active at the junction during early task acquisition, but not at a later stage.

  19. evidence for reward/action-modulated learning in the visual system: Shuler & Bear, “Reward timing in the primary visual cortex”, Science 311, 1606-9 (2006); Schoups et al., “Practising orientation identification improves orientation coding in V1 neurons”, Nature 412, 549-53 (2001)
