
Goal Directed Reaching with the Motor Cortex Model


Presentation Transcript


  1. Goal Directed Reaching with the Motor Cortex Model Cheol Han Feb 20, 2007

  2. Introduction • Goal: a computational model of goal-directed reaching built on a biologically plausible motor cortex model, which can explain • 1. neural coding in the motor cortex • 2. the relationship between skill learning and map formation • 3. reorganization of the motor cortex after a lesion, with recovery of movement

  3. Overview • Dual Map • Motor output map • Motor input map • Models • Arm model with Hill-type muscles • Cortex model • Reinforcement learning framework • Results • Discussion

  4. Directional coding (Georgopoulos, 1986)

  5. Dual Map • Two views of neural coding in the motor cortex • Low-level, muscle coding (Evarts…) • High-level, kinematic coding (Georgopoulos, 1986) • Both, or intermediate (joint) coding • We hypothesize • Motor cortex output map: mainly encodes low-level muscle coding • Motor input map: high-level kinematic coding

  6. Learning goal-directed movements with Actor-Critic • Learning a feed-forward controller using temporal difference learning and the actor-critic architecture (Sutton, 1984; Barto et al., 1983) • Biologically plausible (dopamine and/or acetylcholine modulation of LTP in the motor cortex) • Continuous time and space (Doya, 2000) • Similar approaches • Bissmarck et al., 2005 • Izawa et al., 2004 [Diagram: Trajectory Planner (kinematic coding) → Motor Cortex Model → Motoneurons (Spinal Cord) → Arm Model with muscles; the Critic (Basal Ganglia) supplies temporal difference learning via dopamine neurons, and competitive Hebbian learning shapes the motor cortex map]

  7. Motor output map • ICMS may exhibit characteristics of corticospinal projections • Monosynaptic projections from some M1 neurons to motoneurons • Fetz and Cheney, 1980; Lemon et al., 1986 • Todorov (2003); the Donoghue group [Diagram: Motor Cortex Model → Motoneurons (Spinal Cord)]

  8. Motor input map • Motor cortex neural recording during voluntary movements (e.g., Georgopoulos) • Activation levels during voluntary movement tend to resemble the high-level coding, i.e., kinematic coding [Diagram: Kinematic Coding → Motor Cortex Model]

  9. Models • Motor output map • Competitive Hebbian learning with a motor cortex model (see the sketch below) • Reversed feature extraction • Motor input map • Temporal-difference reinforcement learning
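As an illustration of the first ingredient, here is a minimal sketch of one common winner-take-all form of competitive Hebbian learning applied to cortex-to-muscle weights. The update rule, learning rate, and wiring are assumptions for illustration; only the 400-unit and 6-muscle sizes come from the slides (22 and 10), and the presentation's exact rule may differ:

```python
import numpy as np

def competitive_hebbian_step(w, x, lr=0.05):
    """One competitive Hebbian update (a standard winner-take-all form;
    the model's actual rule may differ).

    w : (n_cortex, n_muscles) weights, one row per cortical unit
    x : (n_muscles,) muscle activation pattern to be represented
    """
    winner = np.argmax(w @ x)               # competition: best-matching unit
    w[winner] += lr * (x - w[winner])       # Hebbian move toward the input
    w[winner] /= np.linalg.norm(w[winner])  # normalization keeps weights bounded
    return winner

# 400 cortical units (slide 22) and 6 muscles (slide 10).
rng = np.random.default_rng(0)
w = np.abs(rng.standard_normal((400, 6)))
w /= np.linalg.norm(w, axis=1, keepdims=True)
competitive_hebbian_step(w, np.array([1.0, 0.0, 0.2, 0.0, 0.5, 0.0]))
```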

  10. Arm model • Arm model: 2 links on the horizontal plane • 6 muscles with a Hill-type muscle model • Shoulder Extensor (E), Shoulder Flexor (F) • Elbow Extensor (O), Elbow Flexor (C) • Biarticular Extensor (B) and Flexor (T) • An accurate arm model is important • Todorov (2002) noted that characteristics of the periphery may propagate from the bottom up • Lan (2002), Zajac (1989), Katayama (1993), Cheng et al. (2000), Spoelstra et al. (2000) (figure from Spoelstra et al., 2000)
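The muscle equations themselves were shown as figures. As a rough stand-in, here is a minimal sketch of a textbook Hill-type active force computation; the Gaussian force-length curve, the linear force-velocity approximation, and every parameter value are illustrative assumptions, not the model's actual formulation:

```python
import numpy as np

def hill_muscle_force(activation, length, velocity,
                      f_max=500.0, l_opt=0.1, v_max=1.0):
    """Active Hill-type force: F = a * F_max * f_L(length) * f_V(velocity).

    activation : neural activation in [0, 1]
    length     : fiber length (m); force peaks at the optimal length l_opt
    velocity   : shortening velocity (m/s); fast shortening reduces force,
                 lengthening raises it (clipped at 1.5 * isometric)
    All curves and parameters here are generic placeholders.
    """
    f_l = np.exp(-((length - l_opt) / (0.5 * l_opt)) ** 2)  # force-length
    f_v = np.clip(1.0 - velocity / v_max, 0.0, 1.5)         # force-velocity
    return activation * f_max * f_l * f_v

print(hill_muscle_force(0.5, 0.1, 0.0))  # isometric at l_opt: 250.0 N
```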

  11. Motor Cortex model • Chernjavsky and Moody, 1990 • Two layers, with GABA neurons • Shunting inhibitory GABA neurons • Mexican-hat activation • Shunting inhibition (Douglas et al., 1995; Prescott et al., 2003) [Diagram: PYR and GABA populations]
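A crude sketch of the idea, assuming a 1-D ring of pyramidal units with narrow excitation and broad divisive (shunting) GABA inhibition; the kernel widths and the gain are made-up values, and this is a caricature of the mechanism, not a reimplementation of Chernjavsky and Moody's model:

```python
import numpy as np

def gaussian_kernel(n, sigma):
    """Distance-based Gaussian weights on a 1-D ring of n units."""
    d = np.minimum(np.arange(n), n - np.arange(n))  # circular distance
    return np.exp(-d**2 / (2.0 * sigma**2))

def cortex_step(inp, sigma_exc=3.0, sigma_inh=9.0, gain=2.0):
    """One update of a PYR layer under shunting (divisive) GABA inhibition.

    Narrow excitatory pooling divided by broad inhibitory pooling gives a
    sharpened, center-surround ("Mexican-hat-like") activity bump.
    """
    n = len(inp)
    exc = np.array([np.roll(gaussian_kernel(n, sigma_exc), i) @ inp
                    for i in range(n)])  # narrow excitation
    inh = np.array([np.roll(gaussian_kernel(n, sigma_inh), i) @ inp
                    for i in range(n)])  # broad GABA pooling
    return exc / (1.0 + gain * inh)      # shunting: divide, don't subtract

inp = np.zeros(100)
inp[50] = 1.0                 # a single localized input
act = cortex_step(inp)        # peaked at unit 50, flanks suppressed
```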

  12. Model Diagram • Our motor cortex model includes the inverse dynamics and the inverse muscle model. • How do we learn it in a biologically plausible manner? Using reinforcement learning, which provides an evaluation of the movement • Implementation with temporal difference learning based on the actor-critic structure • Similar approaches • Bissmarck et al., 2005 • Izawa et al., 2004 [Diagram: Trajectory Generator (joint static-level planning; ACTOR) → Inverse Dynamics (joint "force"-level planning) → Inverse Muscle Model (muscle-level planning) → Motoneurons → Arm; the CRITIC (evaluator of movement) returns the TD error]

  13. Actor-Critic Model (Sutton, 1984) • The "Actor" produces a motor command • The motor command feeds into the plant • The "Critic" evaluates how good the movement was compared with previous expectations (TD error) • Update the "Actor" based on the Critic's evaluation • Update the "Critic": if the actor has improved, the critic can expect better movements • Movements worse than the critic expected are discarded [Diagram: Trajectory Generator → ACTOR (MOTOR CORTEX) → Arm; CRITIC (evaluator of movement) sends the TD error to the actor]
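To make the loop concrete, here is a minimal sketch of one trial with TD(0) actor-critic updates and linear function approximation. The `features`, `reward_fn`, and `env_step` callables, the Gaussian exploration noise, and the learning rates are placeholder assumptions standing in for the presentation's cortex, reward, and arm models:

```python
import numpy as np

def run_trial(state, actor_w, critic_w, features, reward_fn, env_step,
              alpha_actor=0.01, alpha_critic=0.1, gamma=0.95, n_steps=100):
    """One reaching trial with actor-critic TD(0) learning (linear sketch).

    actor_w  : (n_actions, n_features) actor weights (motor command map)
    critic_w : (n_features,) critic weights (value function)
    """
    for _ in range(n_steps):
        phi = features(state)
        noise = np.random.normal(0.0, 0.1, actor_w.shape[0])  # exploration
        action = actor_w @ phi + noise          # actor: motor command
        next_state = env_step(state, action)    # plant: the arm model
        r = reward_fn(next_state)
        # TD error: did the movement go better or worse than expected?
        delta = (r + gamma * critic_w @ features(next_state)
                 - critic_w @ phi)
        critic_w += alpha_critic * delta * phi                 # update critic
        actor_w += alpha_actor * delta * np.outer(noise, phi)  # update actor
        state = next_state
    return actor_w, critic_w
```

A negative delta pushes the actor away from the perturbation that produced it, which matches the slide's point that movements worse than expected are discarded.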

  14. Actor: computing the motor commands • Example of an actor: Bissmarck et al., 2005 • Coding of kinematic variables • Distributed coding • An action-pool layer contains action units, each tuned to a "preferred torque" • Competition between these preferred torques using softmax • p_i is the probability of unit i being chosen (shown as bars in the diagram) • Modifiable weights w connect the kinematic planning signal to the preferred-torque units • Exploration using action perturbation [Diagram: kinematic planning → weights w (modulated by the TD error) → preferred-torque layer (p_i) → torque (joint force)]
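A minimal sketch of the softmax competition described here, in the spirit of Bissmarck et al. (2005); the unit count, feature size, temperature, and evenly spaced preferred torques are illustrative assumptions:

```python
import numpy as np

def select_torque(phi, w, preferred_torques, beta=5.0):
    """Softmax competition over a pool of preferred-torque units.

    phi               : kinematic planning features for the current state
    w                 : (n_units, n_features) modifiable weights
    preferred_torques : (n_units, n_joints) fixed tuning of each unit
    Returns the winning unit's torque and the probabilities p_i.
    """
    scores = w @ phi                             # activation of each unit
    p = np.exp(beta * (scores - scores.max()))   # stable softmax
    p /= p.sum()                                 # probabilities p_i
    i = np.random.choice(len(p), p=p)            # stochastic competition
    return preferred_torques[i], p

# 16 units tuned to evenly spaced shoulder/elbow torque directions.
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
torques = np.stack([np.cos(angles), np.sin(angles)], axis=1)
w = np.random.default_rng(1).standard_normal((16, 8)) * 0.1
torque, p = select_torque(np.ones(8), w, torques)
```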

  15. Critic: providing the reward prediction error for actor learning • Temporal difference learning • The critic learns to predict reward via temporal difference learning • The reward is generally delayed • Predicting the reward helps generate correct action choices before the reward is received (the temporal credit assignment problem) • Doya (2000): a formulation in continuous time and space • Critic: the basal ganglia and dopamine neurons • Dopamine neurons carry the TD error (Schultz, 1998) • The reward prediction error is learned in the basal ganglia (O'Doherty et al., Science, 2004)

  16. Critic: immediate reward • A large reward is given at the goal. • The reward function over space does not have to be continuous; however, a continuous one helps to find a good movement. • The reward function shown below (as a figure) is from Bissmarck et al. (2005)
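The actual formula was part of the slide image and is not in the transcript. As a stand-in consistent with the description (large at the goal, smooth over space), here is a hypothetical Gaussian-of-distance reward; the shape and the sigma value are assumptions, not the function of Bissmarck et al. (2005):

```python
import numpy as np

def reward(hand_pos, goal_pos, r_goal=1.0, sigma=0.05):
    """Smooth immediate reward: peaks at the goal, decays with distance.

    A continuous shape is not required, but it gives the learner a
    gradient toward the target. sigma (meters) sets how sharply the
    reward is concentrated at the goal; it is a placeholder value.
    """
    d = np.linalg.norm(np.asarray(hand_pos) - np.asarray(goal_pos))
    return r_goal * np.exp(-d**2 / (2.0 * sigma**2))

print(reward([0.30, 0.40], [0.30, 0.40]))  # at the goal: 1.0
print(reward([0.35, 0.40], [0.30, 0.40]))  # 5 cm away: ~0.61
```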

  17. Critic: Reward prediction error • The total predicted reward at the current state includes discounted future rewards • The critic learns this predicted reward at the current state • The TD error δ shows how much the action changed the outcome relative to the expected reward • If δ is positive, the action was better than expected

  18. Critic: Reward prediction error • Example: dopamine neurons (CS and US) • A well-trained critic produces its reward prediction just before the reward is expected to be given • When the reward is given, the prediction cancels it, so δ stays near zero • If there is no reward where the well-trained critic expected one, δ becomes negative [Figure: dopamine-like responses with the reward delivered vs. omitted]
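In discrete time (Doya (2000) gives the continuous-time analogue used in the model), the quantities on these two slides can be written directly. A minimal sketch; the gamma value and the example numbers are illustrative:

```python
import numpy as np

def discounted_return(rewards, gamma=0.95):
    """Total predicted reward: the sum of discounted future rewards."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(gamma ** np.arange(len(rewards)) * rewards))

def td_error(r, v_now, v_next, gamma=0.95):
    """delta = r + gamma * V(next) - V(now): positive if the action went
    better than the critic expected, negative if it went worse."""
    return r + gamma * v_next - v_now

# Well-trained critic, expected reward delivered: delta stays near zero.
print(td_error(r=1.0, v_now=1.0, v_next=0.0))  # 0.0
# Expected reward omitted: delta dips negative (the dopamine pause).
print(td_error(r=0.0, v_now=1.0, v_next=0.0))  # -1.0
```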

  19. Results (1): Motor output map • The motor output map of the cortex model • The map representation is muscle coding

  20. Results (2): Motor output map • 50 ms of random stimulation of the motor cortex • The motoneuron pattern shows a "determined" preferred direction • Strictly, motoneurons are tuned to a preferred "torque"; however, at a fixed starting posture, a preferred torque implies a preferred direction

  21. Results (3): Motor input map • NOT FINISHED; NEEDS TUNING OF THE REINFORCEMENT LEARNING • Movement is not fully learned • Motor input map • Activation of the motor cortex during a voluntary movement • Broad activation (during the first 20% of movement time) • Similar directions have similar patterns

  22. Results (4): Motor input map • Population code • During the first 20% of movement time • Excluded insignificantly tuned neurons (about half of the 400 neurons)

  23. Short Discussion • Neural coding and regression • Tuning curve over directions • Cosine • Sharper than cosine • Truncated cosine • Advantage of population coding • Two ways of neural coding

  24. Neural coding and regression • The cricket detects wind direction with four neurons; c_i is the pre-tuned (preferred) wind direction of the i-th neuron, and r_i is its firing rate • The regression error is smallest where a preferred direction exists (the tuning curve is a truncated cosine function) • Pouget A, Dayan P, Zemel RS (2003), "Inference and computation with population codes"
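A minimal sketch of that four-neuron scheme: each rate follows a truncated (half-wave rectified) cosine of the angle between the wind and the unit's preferred direction c_i, and the direction is read out as the population vector. The specific preferred directions and r_max are illustrative:

```python
import numpy as np

prefs = np.deg2rad([45.0, 135.0, 225.0, 315.0])  # preferred directions c_i
c = np.stack([np.cos(prefs), np.sin(prefs)], axis=1)

def rates(wind_angle, r_max=40.0):
    """Truncated-cosine tuning: r_i = r_max * max(0, cos(theta - c_i))."""
    return r_max * np.maximum(np.cos(wind_angle - prefs), 0.0)

def decode(r):
    """Population vector: rate-weighted sum of preferred-direction vectors."""
    v = r @ c
    return np.arctan2(v[1], v[0])

wind = np.deg2rad(80.0)
print(np.rad2deg(decode(rates(wind))))  # recovers ~80 degrees
```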

  25. Tuning Curve • If the tuning curve is a cosine function, as in Georgopoulos (1986) • Perfect reconstruction using the basis • If the tuning curve is sharper than a cosine function • Distortion exists (regression error) • Sharper tuning curves have recently been reported (Paninski et al., 2004; Scott et al., 2001)
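This contrast can be checked numerically. Below is a small sketch under illustrative assumptions (eight evenly spaced preferred directions; a von Mises curve standing in for "sharper than cosine"): population-vector decoding is exact for cosine tuning but acquires a direction-dependent error for the sharper curve:

```python
import numpy as np

n = 8                                            # preferred directions
prefs = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
c = np.stack([np.cos(prefs), np.sin(prefs)], axis=1)

def max_decode_error(tuning):
    """Worst-case |true - decoded| direction over the circle, in degrees."""
    errs = []
    for t in np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False):
        v = tuning(t) @ c                        # population vector
        est = np.arctan2(v[1], v[0])
        errs.append(abs(np.angle(np.exp(1j * (est - t)))))  # wrapped diff
    return np.rad2deg(max(errs))

cosine = lambda t: np.cos(t - prefs)                          # cosine tuning
sharper = lambda t: np.exp(10.0 * (np.cos(t - prefs) - 1.0))  # von Mises

print(max_decode_error(cosine))   # ~0: perfect reconstruction
print(max_decode_error(sharper))  # a few degrees: distortion appears
```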

  26. Advantages of population coding • Low regression error • Ideally, when preferred directions exist for all directions (Pouget et al., 2003) • Robust to noisy input (Pouget et al., 2003) • Less variability in motor control • Assuming signal-dependent noise (SDN), using more muscles yields less variability (Todorov, 2002)

  27. Future work • Fine-tuning of the reinforcement learning • Cerebellum • Concurrent learning of the motor input map and the motor output map • Sensory cortex, which may be related to feedback control • "Premotor cortex" for inverse kinematic coding (action-sensory coding, currently implemented with a SOM)
