
Presentation Transcript


  1. Training Hard to Learn Networks Using Advanced Simulated Annealing Methods. Proceedings of the 1994 ACM Symposium on Applied Computing. Research by: Bruce E Rosen and James M Goodwin. Presentation by: Blake Adams

  2. The Shortcomings of Backpropagation • Backward Error Propagation works well for network training problems that have a single minimum in the error function. • It is susceptible to convergence problems when a problem has one or more local minima. Once the system is trapped in a local minimum, it cannot escape and search for a lower minimum elsewhere. • Incorporating a large number of hidden units can alleviate the convergence problem, but such networks become costly to train in terms of time. Furthermore, there is no known procedure for finding the optimum number of hidden nodes for a network.

  3. Single/Multiple Minima

  4. Annealing • Real World – Metallurgy – Working a material such as metal to create structure causes an effect known as work hardening. Annealing is a heat treatment that alters the material's microstructure, usually returning the material to its original state. Heating the material tends to return it to its most natural state; once cooled back down, the material is typically more ductile. • Theory – Computer Science – By applying a simulated annealing process to the error surface of the problem a learning algorithm is run on, finding a globally optimal solution can be statistically guaranteed.

  5. Training with Simulated Annealing • Since simulated annealing methods are statistically guaranteed to find globally optimal solutions, they can be practical for minimizing or eliminating network paralysis. • Simulated annealing has been used to train neural networks on difficult classification problems, and while successful, it tends to progress exceedingly slowly.
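As a concrete illustration (not the authors' code), the sketch below trains the weights of a tiny feedforward network by generic simulated annealing: propose a random perturbation of the weight vector, accept it probabilistically, and slowly lower the temperature. The network layout, cooling schedule, and step distribution are illustrative assumptions.

    import numpy as np

    def network_error(w, X, y, n_hidden=2):
        """Mean squared error of a small feedforward net (n_in -> n_hidden -> 1)
        whose weights are packed into the flat vector w (illustrative layout)."""
        n_in = X.shape[1]
        W1 = w[:n_in * n_hidden].reshape(n_in, n_hidden)
        b1 = w[n_in * n_hidden:n_in * n_hidden + n_hidden]
        W2 = w[n_in * n_hidden + n_hidden:-1]
        b2 = w[-1]
        h = np.tanh(X @ W1 + b1)
        out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        return np.mean((out - y) ** 2)

    def anneal_weights(X, y, dim, T0=1.0, steps=20000, seed=0):
        """Simulated annealing over the weight space (generic sketch)."""
        rng = np.random.default_rng(seed)
        w = rng.normal(scale=0.5, size=dim)
        err = network_error(w, X, y)
        best_w, best_err = w.copy(), err
        for k in range(2, steps + 2):
            T = T0 / np.log(k)                                  # Boltzmann-style cooling
            cand = w + rng.normal(scale=np.sqrt(T), size=dim)   # random perturbation
            cand_err = network_error(cand, X, y)
            if rng.random() < 1.0 / (1.0 + np.exp((cand_err - err) / T)):
                w, err = cand, cand_err                         # accept the move
                if err < best_err:
                    best_w, best_err = w.copy(), err
        return best_w, best_err

    # Example: a 2-2-1 network on XOR (dim = 2*2 + 2 + 2 + 1 = 9 weights).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    w, e = anneal_weights(X, y, dim=9)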

  6. Improving Annealing • Boltzmann Annealing is based on three functional relationships: • g(x): probability density* of the state-space of D parameters: g(x) = (2πT)^(−D/2) exp[−Δx² / (2T)] • h(x): probability density for acceptance of a new cost-function value given the just previous value: h(x) = 1 / (1 + exp(ΔE/T)) • T(k): schedule for "annealing" the "temperature" T in annealing-time steps k: T(k) = T0 / ln k *A mathematical distribution of probability over space or time; the total probability must sum (integrate) to one.
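These three relationships translate directly into code. A small NumPy sketch (function and parameter names are mine, not from the paper):

    import numpy as np

    def g(dx, T, D):
        """Boltzmann generating density of a step dx in a D-dimensional state space."""
        return (2 * np.pi * T) ** (-D / 2) * np.exp(-np.dot(dx, dx) / (2 * T))

    def h(dE, T):
        """Acceptance probability for a change dE in the cost function."""
        return 1.0 / (1.0 + np.exp(dE / T))

    def T_boltzmann(k, T0):
        """Logarithmic cooling schedule at annealing-time step k (k >= 2)."""
        return T0 / np.log(k)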

  7. Improving Annealing • Lester Ingber's improvements: • Developed an algorithm based on the Cauchy distribution (Fast Annealing), which provides an annealing schedule that is exponentially faster than the Boltzmann method. • Introduced re-annealing to the algorithm, which permits adaptation to the changing sensitivities in the multi-dimensional parameter space where the global minimum is being sought. In effect, it speeds up the learning process.
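For comparison with the Boltzmann functions above, a sketch of the Cauchy generating density and the faster schedule it permits (my formulation of Fast Annealing, with the normalization constant omitted; not code from the paper):

    import numpy as np

    def g_cauchy(dx, T, D):
        """Cauchy (fat-tailed) generating density used by Fast Annealing
        (normalization constant omitted).  The heavy tails allow occasional
        long jumps that help escape local minima."""
        return T / (np.dot(dx, dx) + T ** 2) ** ((D + 1) / 2)

    def T_fast(k, T0):
        """Fast Annealing cooling schedule: T falls as 1/k rather than 1/ln k."""
        return T0 / k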

  8. Very Fast Simulated Re-annealing (VFSR) • A simulated annealing optimization method that stochastically searches a solution space for the global minimum. • Its exponential annealing schedule causes the algorithm to run faster than Boltzmann and Cauchy annealing. • Main advantages: • Fast annealing time. • Statistical guarantee of convergence.
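A sketch of the exponential schedule and the per-parameter generating step that give VFSR its speed, based on Ingber's published VFSR/ASA formulation; the variable names and bound handling here are illustrative assumptions.

    import numpy as np

    def T_vfsr(k, T0, c, D):
        """VFSR exponential schedule: T(k) = T0 * exp(-c * k**(1/D))."""
        return T0 * np.exp(-c * k ** (1.0 / D))

    def vfsr_step(x, T, lo, hi, rng):
        """Propose a new value for one parameter bounded by [lo, hi].  At high T
        steps span the whole range; as T falls, steps concentrate near x while
        long jumps remain possible."""
        u = rng.random()
        y = np.sign(u - 0.5) * T * ((1.0 + 1.0 / T) ** abs(2.0 * u - 1.0) - 1.0)
        return np.clip(x + y * (hi - lo), lo, hi)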

  9. Testing VFSR versus BEP • A network using only Backward Error Propagation was compared with a network incorporating VFSR. • Architectures ranged from 2 input units, 1 hidden unit, and 1 output unit to 5 input units, 5 hidden units, and 1 output unit. • For VFSR, re-annealing was performed after every 100 accepted tests. • The algorithms were tested on linearly separable problems and parity problems. • Both algorithms were terminated when the maximum number of pattern passes was exceeded or when the network learned the training set to an acceptable level.
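For context, the parity problems used as test cases are straightforward to generate; a small sketch (the paper's exact training patterns and acceptance criteria are not reproduced here):

    import numpy as np
    from itertools import product

    def parity_patterns(n_inputs):
        """All 2**n binary input patterns paired with their parity target
        (1 if the number of 1-bits is odd, else 0)."""
        X = np.array(list(product([0, 1], repeat=n_inputs)), dtype=float)
        y = X.sum(axis=1) % 2
        return X, y

    # Example: 2-bit parity is the XOR problem; 5-bit parity has 32 patterns.
    X2, y2 = parity_patterns(2)
    X5, y5 = parity_patterns(5)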

  10. Linearly Separable Problem Results

  11. Parity Problem Results

  12. Parity Problem Results

  13. Findings • No tuning was performed on VFSR during the testing trials; the large number of successful completions attests to the robustness of the algorithm. • VFSR networks appear to work best when the network's weight space is of moderate dimensionality. When larger numbers of weights are used, convergence can be slowed. • VFSR networks are especially well suited for training difficult input-output pattern mappings requiring large numbers of hidden units.

  14. Room for Improvement • Comparison of VFSR to Boltzmann and Cauchy Annealing. • More variety in problem type. • Explore which algorithms are best suited for which types of problems.
