
Presentation Transcript


  1. Learning: A modern review of anticipatory systems in brains and machines Thomas Trappenberg

  2. Outline

  3. Universal learning machines • 1961: Outline of a Theory of Thought-Processes and Thinking Machines • Neuronic & mnemonic equation • Reverberation • Oscillations • Reward learning • Eduardo Renato Caianiello (1921-1993). But: NOT stochastic (only small noise in weights). Stochastic networks: the Boltzmann machine (Hinton & Sejnowski 1983)

  4. Multilayer Perceptron (MLP) • Universal approximator (learner), but: • Overfitting • Needs meaningful input • Unstructured learning • Only deterministic (training just uses the chain rule; see the sketch below)
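
The "just use the chain rule" remark can be made concrete with a minimal backpropagation sketch for a one-hidden-layer MLP. This is a toy illustration; the network shape, tanh units, and squared-error loss are my assumptions, not taken from the talk.

```python
import numpy as np

def mlp_train_step(x, t, W1, W2, lr=0.1):
    """One gradient-descent step for a one-hidden-layer MLP with tanh units
    and squared-error loss; the gradients follow directly from the chain rule."""
    h = np.tanh(W1 @ x)                      # hidden activation
    y = np.tanh(W2 @ h)                      # output activation
    err = y - t                              # dE/dy for E = 0.5*(y - t)^2
    delta2 = err * (1 - y**2)                # backprop through output nonlinearity
    delta1 = (W2.T @ delta2) * (1 - h**2)    # chain rule into the hidden layer
    W2 -= lr * np.outer(delta2, h)
    W1 -= lr * np.outer(delta1, x)
    return W1, W2
```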

  5. Linear large-margin classifiers: Support Vector Machines (SVM). MLP: minimize training error (here threshold perceptron). SVM: minimize generalization error (empirical risk)

  6. Linear-in-parameter learning • Linear hypothesis • Non-linear hypothesis that is linear in parameters • SVM in dual form → kernel function • Liquid/echo state machines • Extreme learning machines. Thanks to Doug Tweet (UoT) for pointing out LIP (a sketch follows below)
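
A minimal sketch of the linear-in-parameters idea behind echo state and extreme learning machines: fix a random nonlinear feature expansion and solve only for the linear readout. The feature count, ridge regularizer, and function names here are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_lip(X, y, n_features=200, reg=1e-3):
    """Extreme-learning-machine style fit: a fixed random hidden layer,
    then ridge regression for the linear readout weights."""
    W_in = rng.normal(size=(n_features, X.shape[1]))   # fixed random projection
    H = np.tanh(X @ W_in.T)                            # nonlinear features
    # linear-in-parameter solve: (H^T H + reg I) w = H^T y
    w = np.linalg.solve(H.T @ H + reg * np.eye(n_features), H.T @ y)
    return W_in, w

def predict_lip(X, W_in, w):
    return np.tanh(X @ W_in.T) @ w
```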

  7. Goal of learning: make predictions! (learning vs. memory). Sources of fluctuations: • Fundamental stochasticity • Irreducible indeterminacy • Epistemological limitations → probabilistic framework

  8. Plant equation for a robot: distance traveled when both motors are running at power 50. Goal of learning: estimate this plant model from data

  9. Hypothesis: the hard problem is how to come up with a useful hypothesis. Learning: choose the parameters that make the training data most likely. Assuming independence of the training examples gives Maximum Likelihood Estimation; consider the (log) likelihood as a function of the parameters
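
A minimal maximum-likelihood sketch for the robot plant example, assuming a Gaussian noise model for the traveled distance (the Gaussian form is my assumption; the slide's actual hypothesis may differ).

```python
import numpy as np

def mle_gaussian_plant(distances):
    """ML estimates for a plant model d ~ N(mu, sigma^2), e.g. the distance
    traveled when both motors run at power 50.  Maximizing the log likelihood
    sum_i log p(d_i | mu, sigma) gives the sample mean and the (biased)
    sample variance."""
    d = np.asarray(distances, dtype=float)
    mu_hat = d.mean()
    sigma2_hat = ((d - mu_hat) ** 2).mean()
    return mu_hat, sigma2_hat
```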

  10. How about building more elaborate multivariate models and arguing with causal (graphical) models (Judea Pearl)? The factored model needs 10 parameters where the full joint distribution needs 31, and the parameters of the CPTs are usually learned from data! (a worked count follows below)
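
A worked count behind the 10 vs. 31 parameter comparison, under the assumption that the slide refers to five binary variables as in Pearl's classic burglary-alarm network; the variable names B, E, A, J, M below are illustrative.

```python
# Full joint over 5 binary variables: 2^5 - 1 = 31 free parameters.
full_joint = 2**5 - 1

# Factored causal model: each CPT needs 2^(number of parents) entries.
# For a burglary-style network (B, E, A|B,E, J|A, M|A): 1 + 1 + 4 + 2 + 2 = 10.
parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}
factored = sum(2**k for k in parents.values())

print(full_joint, factored)   # 31 10
```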

  11. Hidden Markov Model (HMM) for localization • Integrating sensor information becomes trivial • Breakdown of point estimates in global localization (particle filters)
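
A minimal discrete Bayes-filter sketch of HMM localization: integrating a sensor reading is one multiply-and-normalize per step. The one-dimensional corridor, motion model, and likelihood values below are my own toy setup, not from the talk.

```python
import numpy as np

def hmm_localize(belief, transition, likelihood):
    """One HMM filtering step: predict with the motion model, then
    integrate the sensor reading by pointwise multiplication."""
    predicted = transition @ belief            # sum_s' P(s|s') b(s')
    posterior = likelihood * predicted         # P(obs|s) * prediction
    return posterior / posterior.sum()         # normalize

# toy 5-cell corridor: the robot moves one cell right with probability 0.8
n = 5
T = np.zeros((n, n))
for s in range(n):
    T[min(s + 1, n - 1), s] += 0.8
    T[s, s] += 0.2

belief = np.full(n, 1.0 / n)                   # global localization: uniform prior
belief = hmm_localize(belief, T, likelihood=np.array([0.1, 0.1, 0.8, 0.1, 0.1]))
```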

  12. Synaptic plasticity • Gradient descent rule for the LMS loss function … with a linear hypothesis • Perceptron learning rule • Hebb rule (see the sketch below)
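
To make the relationship between these rules concrete, here is a sketch of the three updates for a linear unit y = w·x. The function names and learning-rate values are my own; the slide's exact equations did not survive transcription.

```python
import numpy as np

def lms_update(w, x, t, lr=0.01):
    """Gradient descent on the LMS loss 0.5*(t - w.x)^2."""
    y = w @ x
    return w + lr * (t - y) * x

def perceptron_update(w, x, t, lr=0.01):
    """Same error-driven form, but with a thresholded output."""
    y = 1.0 if w @ x > 0 else 0.0
    return w + lr * (t - y) * x

def hebb_update(w, x, y, lr=0.01):
    """Hebb rule: correlation of pre- and postsynaptic activity."""
    return w + lr * y * x
```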

  13. The organization of behavior (1949): Donald O. Hebb (1904-1985) see also Sigmund Freud, Law of association by simultaneity, 1888

  14. Classical LTP/LTD

  15.-18. Figures from R. Enoki, Y. Hu, D. Hamilton, and A. Fine, Neuron 62 (2009)

  19. Data from G.Q. Bi and M.M. Poo, J Neurosci 18 (1998); D. Standage, S. Jalil and T. Trappenberg, Biological Cybernetics 96 (2007)

  20. Population argument of 'weight dependence': is Bi and Poo's weight-dependent STDP data an experimental artifact? • Three sets of assumptions (B, C, D) • Their data may reflect population effects … with Dominic Standage (Queen's University)

  21. 2. Sparse Unsupervised Learning

  22. Horace Barlow, Possible principles underlying the transformations of sensory messages (1961): "… reduction of redundancy is an important principle guiding the organization of sensory messages …" Sparseness & overcompleteness. The Ratio Club

  23. PCA minimizing reconstruction error and sparsity
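
A minimal sketch contrasting PCA's reconstruction-error objective with a sparsity-penalized one (sparse-coding style). The L1 penalty form and the function names are my assumptions about what the slide illustrates.

```python
import numpy as np

def pca_reconstruction_error(X, k):
    """Project centered data onto the top-k principal components and
    measure the squared reconstruction error that PCA minimizes."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xk = (Xc @ Vt[:k].T) @ Vt[:k]            # rank-k reconstruction
    return np.sum((Xc - Xk) ** 2)

def sparse_coding_objective(X, D, A, lam=0.1):
    """Sparse coding instead adds an L1 penalty on the codes A,
    trading reconstruction error against sparseness."""
    return np.sum((X - A @ D) ** 2) + lam * np.sum(np.abs(A))
```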

  24. Deep belief networks: the stacked Restricted Boltzmann Machine. Geoffrey E. Hinton

  25. Sparse convolutional RBM … with Paul Hollensen & Warren Connors. Sonar images (truncated cone; side-scan sonar; synthetic aperture sonar; scRBM reconstruction). scRBM/SVM mine sensitivity: .983±.024, specificity: .954±.012; SIFT/SVM mine sensitivity: .970±.025, specificity: .944±.008

  26. … with Paul Hollensen sparse and topographic RBM (rtRBM)

  27. Map Initialized Perceptron (MIP) …with Pitoyo Hartono

  28. Free-Energy-Based Supervised Learning: TD learning generalized to Boltzmann machines (Sallans & Hinton 2004) Paul Hollensen: Sparse, topographic RBM successfully learns to drive the e-puck and avoid obstacles, given training data (proximity sensors, motor speeds)

  29. RBM features

  30. 3. Reinforcement learning

  31. Reinforcement learning: grid-world example with reward -0.1 in each non-terminal state (from Russell and Norvig)

  32. Markov Decision Process (MDP). If we know all these factors, the problem is said to be fully observable, and we can think the whole problem through before moving

  33. Two important quantities: the policy and the value function. Goal: maximize the total expected payoff (optimal control)

  34. Calculate the value function (dynamic programming); deterministic policies to simplify notation. Bellman equation for policy π. Solution: analytic or incremental (sketch below). Richard Bellman 1920-1984
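
A minimal incremental solution of the Bellman equation for a fixed deterministic policy. The array conventions (P indexed as action, state, next state; R as state, action) are my own choices for the sketch.

```python
import numpy as np

def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-8):
    """Iteratively solve V(s) = R(s, pi(s)) + gamma * sum_s' P(s'|s,pi(s)) V(s').
    P has shape (A, S, S), R has shape (S, A), policy maps state index -> action."""
    S = R.shape[0]
    V = np.zeros(S)
    while True:
        V_new = np.array([R[s, policy[s]] + gamma * P[policy[s], s] @ V
                          for s in range(S)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```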

  35. Policy iteration: choose one policy → calculate the corresponding value function → choose a better policy based on this value function. Value iteration: for each state evaluate all possible actions with the Bellman equation for the optimal policy
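
The corresponding value-iteration sketch, evaluating all actions in each state with the Bellman optimality equation; same toy array conventions as in the policy-evaluation sketch above.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = np.array([[R[s, a] + gamma * P[a, s] @ V for a in range(A)]
                      for s in range(S)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values and a greedy policy
        V = V_new
```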

  36. Solution exists, but: • Environment not known a priori → online (TD) learning • Observability of states → POMDP • Curse of dimensionality → model-based RL

  37. What if the environment is not completely known? Online value-function estimation (TD learning): use a Monte Carlo method with bootstrapping. Temporal difference = expected payoff after taking the step (actual reward plus discounted expected payoff of the next state) minus expected payoff before taking the step. This leads to the exploration-exploitation dilemma (see the sketch below)
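
The temporal-difference idea as a one-line tabular update (TD(0)). A minimal sketch with my own step-size and discount defaults.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Temporal difference = (actual reward + discounted value of the next state)
    minus the value expected before the step; move V(s) toward that target."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V
```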

  38. Online optimal control: Exploitation versus Exploration On-policy TD learning: Sarsa Off-policy TD learning: Q-learning
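
The two action-value updates named on this slide, side by side, plus an epsilon-greedy behaviour policy to illustrate the exploitation-exploration trade-off (the epsilon-greedy choice is my addition, a common default rather than anything the slide specifies).

```python
import random

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD: bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD: bootstrap from the greedy action, whatever was actually taken."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def epsilon_greedy(Q, s, eps=0.1):
    """Mostly exploit the current estimates, sometimes explore at random."""
    if random.random() < eps:
        return random.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])
```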

  39. Model-based RL: TD(λ). Instead of the tabular methods mainly discussed before, use a function approximator with parameters θ and gradient descent with an exponential eligibility trace e, which weights the updates with λ for each step (Sutton 1988). Free-energy-based reinforcement learning (Sallans & Hinton 2004) … Paul Hollensen
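
A sketch of one TD(λ) step with a linear function approximator and an exponentially decaying eligibility trace in the spirit of Sutton (1988). Linear features and the specific parameter values are my simplifications; the talk's free-energy-based version uses a Boltzmann machine instead.

```python
import numpy as np

def td_lambda_step(theta, e, phi_s, phi_s_next, r,
                   alpha=0.05, gamma=0.9, lam=0.8):
    """Gradient-descent TD(lambda) for V(s) = theta . phi(s).
    The eligibility trace e weights past feature vectors by (gamma*lam)^k."""
    e = gamma * lam * e + phi_s                       # decay trace, add current gradient
    td_error = r + gamma * theta @ phi_s_next - theta @ phi_s
    theta = theta + alpha * td_error * e              # update all eligible weights
    return theta, e
```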

  40. Basal Ganglia … work with Patrick Connor

  41. Our questions • How do humans learn values that guide behaviour? (human behaviour) • How is this implemented in the brain? (anatomy and physiology) • How can we apply this knowledge? (medical interventions and robotics)

  42. Classical Conditioning Ivan Pavlov 1849-1936 Nobel Prize 1904 Rescorla-Wagner Model (1972)
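
The Rescorla-Wagner (1972) rule as a short sketch: the change in associative strength is driven by the prediction error between the obtained reward and the summed strengths of the stimuli present. The vectorized notation and the learning-rate value are mine.

```python
import numpy as np

def rescorla_wagner_update(V, stimuli, reward, alpha=0.1):
    """V: associative strengths per stimulus; stimuli: 0/1 vector of stimuli present.
    Delta V = alpha * (reward - sum of V over present stimuli) * stimuli."""
    prediction_error = reward - V @ stimuli
    return V + alpha * prediction_error * stimuli
```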

  43. Reward signals in the brain (Wolfram Schultz). Figure labels: Stimulus A, no reward; Stimulus B; Stimulus A, reward

  44. Disorders with effects on the dopamine system: Parkinson's disease, Tourette's syndrome, ADHD, drug addiction, schizophrenia (Maia & Frank 2011)

  45. Adding biological qualities to the model: Rescorla-Wagner model (Rescorla and Wagner, 1972); input, striatum, dopamine and reward prediction error (Schultz, 1998)
