
Recurrent Networks






Presentation Transcript


  1. Recurrent Networks Psych 419/719 Feb 22, 2001

  2. Extending Backprop to Recurrent Networks • Consider activation of units at a given time t. • Instead of using backprop equations for previous unit activity in space (e.g., the previous layer), use previous activity in time.

  3. “Unrolling” a Network [Figure: the two-unit network (units a and b, with weights Waa, Wab, Wba, Wbb) is copied once per time slice t0, t1, t2, ..., tn, and each slice’s recurrent weights feed the next slice.]

  4. The Math is Similar to Normal Backprop • The gradient term at time t is the sum of the error at time t (the difference between output and target at time t) and the error term propagated backwards to that unit from the following time slice. • Some units still don’t have targets, or might not have targets at all time samples. • Some units have both targets and error propagated backwards.
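The slide does not give the equations, but the idea can be sketched in a few lines of numpy. This is a minimal illustration of backprop through time, not code from the course; the squared-error loss, the sigmoid units, and all of the names below are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bptt(W, U, xs, targets, h0):
    """One forward/backward sweep of backprop through time.

    W: recurrent weights (n_hidden x n_hidden), U: input weights
    (n_hidden x n_in), xs: list of input vectors, targets: list of
    target vectors, with None at samples that have no target.
    Returns gradients dW, dU of a summed squared-error loss.
    """
    T = len(xs)
    hs = [h0]
    for t in range(T):                        # forward pass: unroll in time
        hs.append(sigmoid(W @ hs[-1] + U @ xs[t]))

    dW, dU = np.zeros_like(W), np.zeros_like(U)
    delta_next = np.zeros_like(h0)            # error arriving from later slices
    for t in reversed(range(T)):              # backward pass: sweep back through time
        dh = W.T @ delta_next                 # error propagated from time t+1
        if targets[t] is not None:            # plus direct error, if a target exists
            dh = dh + (hs[t + 1] - targets[t])
        delta = dh * hs[t + 1] * (1.0 - hs[t + 1])   # through the sigmoid
        dW += np.outer(delta, hs[t])          # hs[t] is the activity at time t-1
        dU += np.outer(delta, xs[t])
        delta_next = delta
    return dW, dU

# Tiny usage example: a 2-unit network with a target only at the last sample.
rng = np.random.default_rng(0)
W, U = rng.normal(0, 0.3, (2, 2)), rng.normal(0, 0.3, (2, 3))
xs = [rng.normal(size=3) for _ in range(4)]
targets = [None, None, None, np.array([1.0, 0.0])]
dW, dU = bptt(W, U, xs, targets, h0=np.zeros(2))
```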

  5. Useful Applications • Time varying behavior • Pressure for speed. Inject error in early samples. • Attractor networks. Settle to a stable state. • Maintain “memory” of events. Activity is a result of current input, and computations on previous inputs.

  6. Time Varying Behavior • Can build oscillators • Rhythmic behavior: walking, chewing… • Sequences of actions • Motor plans: reaching for a cup of coffee, writing, speech… • Higher level plans: making a cup of coffee, going to a movie...

  7. Inputs for Time Varying Behavior • Can be static (like, a plan) • Input is some code meaning “make cup of coffee” • Output: t0 get cup; t1 grind coffee; t2 get water • Can make input be last step done • Input: get cup; output: grind coffee • Input: grind coffee; output: get water….
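The two input schemes on this slide amount to two different training sets for the same sequence. A small illustration using the coffee example; the step names, one-hot codes, and the plan code are made up for the sketch:

```python
# Hypothetical encoding of the "make a cup of coffee" sequence.
steps = ["get cup", "grind coffee", "get water"]
onehot = {s: [1 if i == j else 0 for j in range(len(steps))]
          for i, s in enumerate(steps)}

# 1) Static input: the same "plan" code is presented on every time step,
#    and the network must produce the steps in order as its outputs.
plan_code = [1, 0]                  # arbitrary code meaning "make cup of coffee"
static_training = [(plan_code, onehot[s]) for s in steps]

# 2) Last-step-done input: each step is cued by the step just completed.
chained_training = [(onehot[a], onehot[b]) for a, b in zip(steps, steps[1:])]
# -> ("get cup" -> "grind coffee"), ("grind coffee" -> "get water")
```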

  8. Pressure for Speed • Suppose network is run for 10 time samples • Inject error on samples 2-10. • Network is penalized (gets error) not only for producing wrong answer, but for not producing right answer rapidly • Works well with frequency weighted stimuli: • Frequent items really pressured to go faster
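One way to picture "inject error on samples 2-10" is as an error mask over time, with an extra weight for frequent items. The sketch below is an assumption about how such a loss term could be written, not the course's actual implementation:

```python
import numpy as np

def speed_pressured_error(outputs, target, start=1, freq_weight=1.0):
    """Squared error summed over time samples from `start` onward.

    outputs: array of shape (T, n_out), one output vector per time sample.
    target: the correct answer, shape (n_out,).
    start: index of the first sample that receives error (start=1 means
           samples 2..T are penalized; earlier samples are free).
    freq_weight: larger for frequent items, so they are pressured to
           produce the right answer even faster.
    """
    errs = outputs[start:] - target      # producing the wrong answer late still costs
    return freq_weight * 0.5 * np.sum(errs ** 2)
```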

  9. Example [Figure: error plotted as a function of time for the items “the”, “dog”, and “anvil”.]

  10. Attractor Networks: Some Background (Autoencoder) • Input and Output reps have the same semantics • Train network to reproduce input in output • Hidden units compress representation • Can do pattern completion, repair noisy or incomplete data
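A minimal numpy sketch of this kind of bottleneck autoencoder; the layer sizes, learning rate, and random patterns here are made up, and the training details in the course may well have differed:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

patterns = rng.integers(0, 2, size=(4, 8)).astype(float)   # 8-unit patterns
W1 = rng.normal(0, 0.5, (3, 8))   # input -> hidden (compression to 3 units)
W2 = rng.normal(0, 0.5, (8, 3))   # hidden -> output (reconstruction)

lr = 0.5
for epoch in range(2000):
    for x in patterns:
        h = sigmoid(W1 @ x)
        y = sigmoid(W2 @ h)
        d_out = (y - x) * y * (1 - y)           # target is the input itself
        d_hid = (W2.T @ d_out) * h * (1 - h)
        W2 -= lr * np.outer(d_out, h)
        W1 -= lr * np.outer(d_hid, x)

# Pattern completion: blank out part of a trained pattern and reconstruct it.
noisy = patterns[0].copy()
noisy[:2] = 0.0
print(np.round(sigmoid(W2 @ sigmoid(W1 @ noisy)), 2), patterns[0])
```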

  11. Attractor Networks: A Recurrent Implementation [Figure: a representation layer with recurrent “cleanup” units attached.] Can repair partial patterns over time

  12. Attractor Networks Usually Used in Larger Networks • Plaut & Shallice (1991) deep dyslexia model: attractor in semantic space • Damage to attractor -> semantic errors • Harm & Seidenberg (1999) phonological dyslexia model: attractor in phonological space • Damage to attractor -> errors in generalization

  13. “Memory” of sequential events • The “Elman” network • For sentences: present words one at a time. The target at each step is the next word. • Has “context” units, which are copies of the hidden unit activity on the previous time step • Learns to predict what kinds of words can follow, based on the sequence seen so far [Figure: the current input and the “context” units feed the hidden layer, which produces the “prediction” of the next word.]
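The forward pass of such a simple recurrent ("Elman") network can be sketched as follows; the weight names and sigmoid activation are assumptions, and training (backprop on the prediction error) is omitted:

```python
import numpy as np

def elman_forward(W_in, W_context, W_out, inputs):
    """Forward pass of a simple recurrent ("Elman") network.

    W_in: input -> hidden weights, W_context: context -> hidden weights,
    W_out: hidden -> output weights. `inputs` is a list of word vectors,
    presented one at a time; each output is a prediction of the next word.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    context = np.zeros(W_context.shape[1])        # context starts empty
    predictions = []
    for x in inputs:
        hidden = sigmoid(W_in @ x + W_context @ context)
        predictions.append(sigmoid(W_out @ hidden))
        context = hidden.copy()                   # copy hidden activity into context
    return predictions
```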

  14. Extension to Continuous Time • Activity changes slowly in response to input, not instantly. • Approximate continuous time by taking small discrete samples [Figure: a unit’s output rising gradually over time toward the level its input drives it to.]

  15. Two Ways to do Continuous Time • Time averaged outputs: the output of a unit is a weighted sum of its previous output and what it is being driven to. • Time averaged inputs: the effective input to a unit is a weighted sum of the previous effective input value and what it is being driven to. The output is the instantaneous result of the current effective input.
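The two schemes correspond to two slightly different update rules. A sketch of both, with the sigmoid output function and the mixing proportion tau chosen arbitrarily for illustration:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
tau = 0.2   # weight given to the new sample each step (smaller = slower change)

def step_time_averaged_output(out_prev, net_input):
    """Output is a weighted sum of its previous value and what it is driven to."""
    return (1 - tau) * out_prev + tau * sigmoid(net_input)

def step_time_averaged_input(eff_in_prev, net_input):
    """The effective input is time averaged; the output follows it instantly."""
    eff_in = (1 - tau) * eff_in_prev + tau * net_input
    return eff_in, sigmoid(eff_in)
```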
