
Backpropagation learning

Backpropagation learning. Simple vs. multilayer perceptrons. The hidden layer problem: a radical change to the supervised learning problem, since there are no desired values for the hidden layer and the network must find its own hidden layer activations. The generalized delta rule.


Presentation Transcript


  1. Backpropagation learning

  2. Simple vs. multilayer perceptron

  3. Hidden layer problem • Radical change for the supervised learning problem. • No desired values for the hidden layer. • The network must find its own hidden layer activations.

  4. Generalized delta rule • The delta rule only works for the output layer. • Backpropagation, or the generalized delta rule, is a way of creating desired values for the hidden layers.
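
For reference, the single-layer delta rule can be written as follows (assuming a unit with pre-activation u_i, output x_i = f(u_i), target d_i, and learning rate \eta; the slide's own notation is not preserved in the transcript):

    \delta_i = f'(u_i)\,(d_i - x_i), \qquad \Delta w_{ij} = \eta\,\delta_i\,x_j

Backpropagation generalizes this by producing a \delta for every hidden layer as well.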

  5. Outline • The algorithm • Derivation as a gradient algorithm • Sensitivity lemma

  6. Multilayer perceptron • L layers of weights and biases • L+1 layers of neurons
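
Assuming standard indexing, the L weight layers are the matrices W^1, \dots, W^L with biases b^1, \dots, b^L, and the L+1 neuron layers are the activity vectors x^0 (input layer) through x^L (output layer).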

  7. Reward function • Depends on activity of the output layer only. • Maximize reward with respect to weights and biases.

  8. Example: squared error • The squared difference between the desired and actual output, with a minus sign (so that maximizing the reward minimizes the error).
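
In symbols, with d the desired output and x^L the actual output, a standard form of this reward (the slide's exact expression is assumed) is

    R = -\tfrac{1}{2}\,\lVert d - x^L \rVert^2 = -\tfrac{1}{2}\sum_i \big(d_i - x^L_i\big)^2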

  9. Forward pass
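
A standard way to write the forward pass, with u^l the pre-activation of layer l and f the neuronal nonlinearity applied elementwise:

    u^l = W^l x^{l-1} + b^l, \qquad x^l = f(u^l), \qquad l = 1, \dots, L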

  10. Sensitivity computation • The sensitivity is also called “delta.”
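
With the sensitivity defined as \delta^l = \partial R / \partial u^l, the output-layer sensitivity for the squared-error reward above is

    \delta^L = f'(u^L) \odot (d - x^L)

where \odot denotes the elementwise product.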

  11. Backward pass
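
The backward pass propagates the sensitivity through the transposed weights, one layer at a time:

    \delta^{l-1} = f'(u^{l-1}) \odot \big( (W^l)^{\top} \delta^l \big), \qquad l = L, \dots, 2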

  12. Learning update • The weight and bias updates can be applied in any order.
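
Putting the forward pass, sensitivity computation, backward pass, and learning update together, here is a minimal NumPy sketch of one backprop step, assuming sigmoid units and the squared-error reward above (names such as backprop_step are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def backprop_step(weights, biases, x0, d, eta=0.1):
    """One gradient-ascent step on R = -1/2 ||d - x^L||^2.

    weights: list of L matrices, weights[l] has shape (n_{l+1}, n_l)
    biases:  list of L vectors,  biases[l]  has shape (n_{l+1},)
    x0: input activity vector, d: desired output vector
    """
    # Forward pass: store activities xs[l] and pre-activations us[l].
    xs, us = [x0], []
    for W, b in zip(weights, biases):
        u = W @ xs[-1] + b
        us.append(u)
        xs.append(sigmoid(u))

    # Output-layer sensitivity ("delta") for the squared-error reward.
    fprime = lambda u: sigmoid(u) * (1.0 - sigmoid(u))
    delta = fprime(us[-1]) * (d - xs[-1])

    # Backward pass: each layer's gradient is the outer product of its
    # sensitivity vector and the presynaptic activity vector.
    for l in reversed(range(len(weights))):
        grad_W = np.outer(delta, xs[l])
        grad_b = delta
        if l > 0:
            # Propagate the sensitivity through the pre-update weights.
            delta = fprime(us[l - 1]) * (weights[l].T @ delta)
        # The updates themselves can be applied in any order.
        weights[l] += eta * grad_W
        biases[l] += eta * grad_b

    return weights, biases
```

For example, weights = [0.1 * np.random.randn(3, 2), 0.1 * np.random.randn(1, 3)] with biases = [np.zeros(3), np.zeros(1)] gives a 2-3-1 network that can be trained by calling backprop_step repeatedly.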

  13. Backprop is a gradient update • Consider R as function of weights and biases.
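
In symbols, the learning update above amounts to gradient ascent on R with step size \eta:

    \Delta W^l = \eta\,\frac{\partial R}{\partial W^l}, \qquad \Delta b^l = \eta\,\frac{\partial R}{\partial b^l}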

  14. Sensitivity lemma • Sensitivity matrix = outer product of sensitivity vector and activity vector. • The sensitivity vector is sufficient. • Generalization of “delta.”
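
In symbols, the lemma says the gradient with respect to a weight matrix factors into an outer product of the layer's sensitivity vector and its presynaptic activity vector:

    \frac{\partial R}{\partial W^l} = \delta^l \,(x^{l-1})^{\top}, \qquad \frac{\partial R}{\partial b^l} = \delta^l

so storing the vector \delta^l is enough to reconstruct the full sensitivity matrix.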

  15. Coordinate transformation

  16. Output layer

  17. Chain rule • Composition of two functions
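
For a composition y = g(f(x)) the derivative is y' = g'(f(x))\,f'(x); applied layer by layer, this gives

    \delta^{l-1} = \Big(\frac{\partial u^l}{\partial u^{l-1}}\Big)^{\!\top} \delta^l = f'(u^{l-1}) \odot \big( (W^l)^{\top} \delta^l \big)

which is exactly the backward-pass recursion.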

  18. Computational complexity • Naïve estimate • network output: order N • each component of the gradient: order N • N components: order N² • With backprop: order N
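
The counting works out as follows: a network with N weights costs order N operations per forward pass, so estimating each of the N gradient components separately (for example by finite differences) costs roughly N × N = N² operations in total, while backprop obtains all N components from a single forward and backward pass, each of order N.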

  19. Biological plausibility • Local: uses only pre- and postsynaptic variables • Forward and backward passes use the same weights • Requires an extra set of variables (the sensitivities)

  20. Backprop for brain modeling • Backprop may not be a plausible account of learning in the brain. • But perhaps the networks it creates are similar to biological neural networks. • Zipser and Andersen: train a network on a task, then compare its hidden neurons with those found in the brain.

  21. LeNet • Weight-sharing • Sigmoidal neurons • Learn binary outputs

  22. Machine learning revolution • Gradient following or other hill-climbing methods • Empirical error minimization
