
Presentation Transcript


  1. Training Hard to Learn Networks Using Advanced Simulated Annealing Methods. Proceedings of the 1994 ACM Symposium on Applied Computing. Research by: Bruce E Rosen and James M Goodwin. Presentation by: Blake Adams

  2. The Shortcomings of Backpropagation • Backward Error Propagation works well for network training problems that have a single minimum in the error function. • It is susceptible to convergence problems when a problem has one or more local minima. Once the system is trapped in a local minimum, it cannot escape and search for a lower minimum elsewhere. • Incorporating a large number of hidden units can alleviate the convergence problem, but such networks become costly to train in terms of time. Furthermore, there is no known procedure for finding the optimum number of hidden nodes for a network.

  3. Single/Multiple Minima

  4. Annealing • Real World – Metallurgy – Working a material such as metal to create structure causes an effect known as work hardening. Annealing is a heat treatment that alters the material's microstructure, usually returning the material to its original state. Heating the material tends to return it to its most natural state; once cooled back down, the material is typically more ductile. • Theory – Computer Science – By applying a simulated annealing process to the error surface of the problem a learning algorithm is run on, finding a globally optimal solution can be statistically guaranteed.

  5. Training with Simulated Annealing • Since simulated annealing methods are statistically guaranteed to find globally optimal solutions, they can be practical for minimizing or eliminating network paralysis. • Simulated annealing has been used to train neural networks on difficult classification problems, and while successful, it tends to progress exceedingly slowly.
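As a concrete illustration (not the authors' code), the sketch below trains the weights of a tiny feedforward network by generic simulated annealing: propose a random perturbation of the weight vector, accept it probabilistically, and slowly lower the temperature. The network layout, cooling schedule, and step distribution are illustrative assumptions.

    import numpy as np

    def network_error(w, X, y, n_hidden=2):
        """Mean squared error of a small feedforward net (n_in -> n_hidden -> 1)
        whose weights are packed into the flat vector w (illustrative layout)."""
        n_in = X.shape[1]
        W1 = w[:n_in * n_hidden].reshape(n_in, n_hidden)
        b1 = w[n_in * n_hidden:n_in * n_hidden + n_hidden]
        W2 = w[n_in * n_hidden + n_hidden:-1]
        b2 = w[-1]
        h = np.tanh(X @ W1 + b1)
        out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        return np.mean((out - y) ** 2)

    def anneal_weights(X, y, dim, T0=1.0, steps=20000, seed=0):
        """Simulated annealing over the weight space (generic sketch)."""
        rng = np.random.default_rng(seed)
        w = rng.normal(scale=0.5, size=dim)
        err = network_error(w, X, y)
        best_w, best_err = w.copy(), err
        for k in range(2, steps + 2):
            T = T0 / np.log(k)                                  # Boltzmann-style cooling
            cand = w + rng.normal(scale=np.sqrt(T), size=dim)   # random perturbation
            cand_err = network_error(cand, X, y)
            if rng.random() < 1.0 / (1.0 + np.exp((cand_err - err) / T)):
                w, err = cand, cand_err                         # accept the move
                if err < best_err:
                    best_w, best_err = w.copy(), err
        return best_w, best_err

    # Example: a 2-2-1 network on XOR (dim = 2*2 + 2 + 2 + 1 = 9 weights).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    w, e = anneal_weights(X, y, dim=9)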

  6. Improving Annealing • Boltzmann Annealing is based on three functional relationships: • g(x): probability density* of the state-space of D parameters: g(x) = (2πT)^(−D/2) exp[−Δx² / (2T)] • h(x): probability density for acceptance of a new cost-function value given the just previous value: h(x) = 1 / (1 + exp(ΔE/T)) • T(k): schedule for "annealing" the "temperature" T in annealing-time steps k: T(k) = T0 / ln k *A mathematical distribution of probability over space or time; the total probability must sum (integrate) to one.
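These three relationships translate directly into code. A small NumPy sketch (function and parameter names are mine, not from the paper):

    import numpy as np

    def g(dx, T, D):
        """Boltzmann generating density of a step dx in a D-dimensional state space."""
        return (2 * np.pi * T) ** (-D / 2) * np.exp(-np.dot(dx, dx) / (2 * T))

    def h(dE, T):
        """Acceptance probability for a change dE in the cost function."""
        return 1.0 / (1.0 + np.exp(dE / T))

    def T_boltzmann(k, T0):
        """Logarithmic cooling schedule at annealing-time step k (k >= 2)."""
        return T0 / np.log(k)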

  7. Improving Annealing • Lester Ingber's improvements: • Developed an algorithm based on the Cauchy distribution (Fast Annealing), which provides an annealing schedule that is exponentially faster than the Boltzmann method. • Introduced re-annealing to the algorithm, which permits adaptation to the changing sensitivities in the multi-dimensional parameter space where the global minimum is being sought. In effect, it speeds up the learning process.
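For comparison with the Boltzmann functions above, a sketch of the Cauchy generating density and the faster schedule it permits (my formulation of Fast Annealing, with the normalization constant omitted; not code from the paper):

    import numpy as np

    def g_cauchy(dx, T, D):
        """Cauchy (fat-tailed) generating density used by Fast Annealing
        (normalization constant omitted).  The heavy tails allow occasional
        long jumps that help escape local minima."""
        return T / (np.dot(dx, dx) + T ** 2) ** ((D + 1) / 2)

    def T_fast(k, T0):
        """Fast Annealing cooling schedule: T falls as 1/k rather than 1/ln k."""
        return T0 / k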

  8. Very Fast Simulated Re-annealing (VFSR) • A simulated annealing optimization method that stochastically searches a solution space for the global minimum. • Its exponential annealing schedule causes the algorithm to run faster than Boltzmann and Cauchy annealing. • Main advantages: • Fast annealing time. • Statistical guarantee of convergence.
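A sketch of the exponential schedule and the per-parameter generating step that give VFSR its speed, based on Ingber's published VFSR/ASA formulation; the variable names and bound handling here are illustrative assumptions.

    import numpy as np

    def T_vfsr(k, T0, c, D):
        """VFSR exponential schedule: T(k) = T0 * exp(-c * k**(1/D))."""
        return T0 * np.exp(-c * k ** (1.0 / D))

    def vfsr_step(x, T, lo, hi, rng):
        """Propose a new value for one parameter bounded by [lo, hi].  At high T
        steps span the whole range; as T falls, steps concentrate near x while
        long jumps remain possible."""
        u = rng.random()
        y = np.sign(u - 0.5) * T * ((1.0 + 1.0 / T) ** abs(2.0 * u - 1.0) - 1.0)
        return np.clip(x + y * (hi - lo), lo, hi)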

  9. Testing VFSR versus BEP • A network using only Backward Error Propagation was compared with a network incorporating VFSR. • Architectures ranged from 2 input units, 1 hidden unit, and 1 output unit to 5 input units, 5 hidden units, and 1 output unit. • For VFSR, re-annealing was performed after every 100 accepted tests. • The algorithms were tested on linearly separable problems and parity problems. • Both algorithms were terminated when the maximum number of pattern passes was exceeded or when the network learned the training set to an acceptable level.
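For context, the parity problems used as test cases are straightforward to generate; a small sketch (the paper's exact training patterns and acceptance criteria are not reproduced here):

    import numpy as np
    from itertools import product

    def parity_patterns(n_inputs):
        """All 2**n binary input patterns paired with their parity target
        (1 if the number of 1-bits is odd, else 0)."""
        X = np.array(list(product([0, 1], repeat=n_inputs)), dtype=float)
        y = X.sum(axis=1) % 2
        return X, y

    # Example: 2-bit parity is the XOR problem; 5-bit parity has 32 patterns.
    X2, y2 = parity_patterns(2)
    X5, y5 = parity_patterns(5)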

  10. Linearly Separable Problem Results

  11. Parity Problem Results

  12. Parity Problem Results

  13. Findings • No tuning was performed on VFSR during the testing trials; the large number of successful completions attests to the robustness of the algorithm. • VFSR networks appear to work best when the network's weight space is of moderate dimensionality. When larger numbers of weights are used, convergence can be slowed. • VFSR networks are especially well suited for training difficult input-output pattern mappings requiring large numbers of hidden units.

  14. Room for Improvement • Comparison of VFSR to Boltzmann and Cauchy Annealing. • More variety in problem type. • Explore which algorithms are best suited for which types of problems.
