

  1. Research on Advanced Training Algorithms of Neural Networks. Hao Yu, Ph.D. Defense, Aug 17th, 2011. Supervisor: Bogdan Wilamowski. Committee Members: Hulya Kirkici, Vishwani D. Agrawal, Vitaly Vodyanoy. University Reader: Weikuan Yu

  2. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  3. What is a Neural Network • Classification: separate the two groups (red circles and blue stars) of twisted points [1].

  4. What is a Neural Network • Interpolation: given the 25 points (red), find the values at points A and B (black).

  5. What is a Neural Network • Human Solutions • Neural Network Solutions

  6. What is a Neural Network • Recognition: restore the noisy digit images (left) to the original images (right). (Figure: noisy images vs. original images)

  7. What is a Neural Network • "Learn to behave" • Build any relationship between inputs and outputs [2]. (Figure: learning process → "behave")

  8. Why Neural Networks • What makes neural networks different • Given patterns (5×5 = 25) vs. testing patterns (41×41 = 1,681)

  9. Different Approximators • Test results of different approximators: Mamdani fuzzy, TSK fuzzy, neuro-fuzzy, SVM-RBF, SVM-poly, nearest, linear, spline, cubic (MATLAB function: interp2), and neural network

  10. Comparison • Neural networks are potentially the best approximators

  11. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  12. A Single Neuron • Two basic computations: (1) the weighted sum net_j = Σ_i w_{j,i} x_i + w_{j,0}, and (2) the activation o_j = f(net_j); see the sketch below
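A minimal C++ sketch of these two computations (the tanh activation and all names are illustrative, not taken from the slides):

```cpp
#include <cmath>
#include <vector>

// Two basic computations of a single neuron:
// (1) weighted sum of the inputs, (2) nonlinear activation.
double neuron_output(const std::vector<double>& w,  // w[0] is the bias weight
                     const std::vector<double>& x) {
    double net = w[0];                              // (1) net = w0 + sum(wi * xi)
    for (size_t i = 0; i < x.size(); ++i)
        net += w[i + 1] * x[i];
    return std::tanh(net);                          // (2) out = f(net)
}
```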

  13. Network Architectures • The multilayer perceptron (MLP) network is the most popular architecture • Networks with connections across layers, such as bridged multilayer perceptron (BMLP) networks and fully connected cascade (FCC) networks, are much more powerful than MLP networks • Wilamowski, B. M., Hunter, D. and Malinowski, A., "Solving parity-N problems with feedforward neural networks," Proc. 2003 IEEE IJCNN, pp. 2546-2551, IEEE Press, 2003. • M. E. Hohil, D. Liu, and S. H. Smith, "Solving the N-bit parity problem using neural networks," Neural Networks, vol. 12, pp. 1321-1323, 1999. • Example: smallest networks for solving the parity-7 problem (analytical results): MLP network, BMLP network, FCC network

  14. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  15. Error Back Propagation Algorithm • The most popular algorithm for neural network training • Update rule of EBP algorithm [3] • Developed based on gradient optimization • Advantages: • Easy • Stable • Disadvantages: • Very limited power • Slow convergence
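The update rule itself is an image in the original; in standard form it is w_{k+1} = w_k − α g_k, where g is the error gradient and α the learning constant. A minimal sketch (the function name is illustrative):

```cpp
#include <vector>

// One EBP (gradient-descent) weight update: w <- w - alpha * g.
void ebp_step(std::vector<double>& w,
              const std::vector<double>& g,  // gradient from backpropagation
              double alpha) {                // learning constant
    for (size_t i = 0; i < w.size(); ++i)
        w[i] -= alpha * g[i];
}
```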

  16. Improvement of EBP • Improved gradient using momentum [4] • Adjusted learning constant [5-6]
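A sketch of the momentum variant, assuming the common rule Δw_k = −α g_k + m Δw_{k−1} (the slides' exact formula is an image; names are illustrative):

```cpp
#include <vector>

// EBP step with momentum: part of the previous weight change is blended
// in, which smooths the search direction and speeds convergence.
void ebp_momentum_step(std::vector<double>& w,
                       std::vector<double>& dw_prev,  // kept between calls
                       const std::vector<double>& g,
                       double alpha, double m) {
    for (size_t i = 0; i < w.size(); ++i) {
        double dw = -alpha * g[i] + m * dw_prev[i];
        w[i] += dw;
        dw_prev[i] = dw;
    }
}
```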

  17. Newton Algorithm • Newton algorithm: uses the derivatives of the gradient to evaluate how the gradient changes, then selects a proper learning constant in each direction [7] • Advantages: • Fast convergence • Disadvantages: • Not stable • Requires computation of second order derivatives
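In standard notation (reconstructed, since the slide's formula is an image), the Newton update uses the Hessian H of the error surface, whose inverse scales the step in each direction:

```latex
\Delta \mathbf{w} = -\mathbf{H}^{-1}\,\mathbf{g}
```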

  18. Gauss-Newton Algorithm • Gauss-Newton algorithm: eliminates the second order derivatives of the Newton method by introducing the Jacobian matrix • Advantages: • Fast convergence • Disadvantages: • Not stable
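The standard Gauss-Newton relations, reconstructed from the surrounding description: both the gradient and an approximate Hessian are built from the Jacobian J and the error vector e, so no true second derivatives are needed:

```latex
\mathbf{g} = \mathbf{J}^{T}\mathbf{e}, \qquad
\mathbf{H} \approx \mathbf{J}^{T}\mathbf{J}, \qquad
\Delta \mathbf{w} = -\left(\mathbf{J}^{T}\mathbf{J}\right)^{-1}\mathbf{J}^{T}\mathbf{e}
```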

  19. Levenberg-Marquardt Algorithm • LM algorithm: blends the EBP algorithm and the Gauss-Newton algorithm [8-9] • When the evaluation error increases, μ increases and the LM algorithm switches toward the EBP algorithm • When the evaluation error decreases, μ decreases and the LM algorithm switches toward the Gauss-Newton method • Advantages • Fast convergence • Stable training • Compared with first order algorithms, the LM algorithm has a much more powerful search ability, but it also requires more complex computation
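The LM update adds a damping term μI that blends the two methods (standard form, reconstructed): Δw = −(J^T J + μI)^{-1} J^T e; large μ makes the step gradient-like (EBP), small μ makes it Gauss-Newton-like. A minimal sketch of the μ rule described above (the name and the factor of 10 are illustrative):

```cpp
// After a trial LM step: increase mu (toward EBP behavior) if the
// evaluation error grew, decrease it (toward Gauss-Newton) if it shrank.
double adapt_mu(double mu, double sse_new, double sse_old,
                double factor = 10.0) {
    return (sse_new > sse_old) ? mu * factor : mu / factor;
}
```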

  20. Comparison of Different Algorithms • Training XOR patterns using different algorithms

  21. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  22. How to Design Neural Networks • Traditional design: • Most popular training algorithm: EBP algorithm • Most popular network architecture: MLP network • Results: • Large neural networks • Poor generalization ability • Many engineers moved to other methods, such as fuzzy systems

  23. How to Design Neural Networks • B. M. Wilamowski, "Neural Network Architectures and Learning Algorithms: How Not to Be Frustrated with Neural Networks," IEEE Ind. Electron. Mag., vol. 3, no. 4, pp. 56-63, 2009. • Over-fitting problem • Mismatch between the number of training patterns and the network size • Recommended design policy: compact networks benefit generalization ability • Powerful training algorithm: LM algorithm • Efficient network architectures: BMLP network and FCC network (Figure panels: 2 through 9 neurons)

  24. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  25. Problems in Second Order Algorithms • Matrix inversion • Inherent to the nature of second order algorithms; inverting the N×N matrix costs O(N³) operations • The size of the matrix is proportional to the size of the network • As the size of the network increases, second order algorithms may not be as efficient as first order algorithms

  26. Problems in Second Order Algorithms • Architecture limitation • M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994. (2,474 citations) • Only developed for training MLP networks • Not suitable for designing compact networks • Neuron-by-Neuron (NBN) Algorithm • B. M. Wilamowski, N. J. Cotton, O. Kaynak and G. Dundar, "Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks," IEEE Trans. on Industrial Electronics, vol. 55, no. 10, pp. 3784-3790, Oct. 2008. • SPICE computation routines • Capable of training arbitrarily connected neural networks • Compact neural network design: NBN algorithm + BMLP (FCC) networks • Very complex computation

  27. Problems in Second Order Algorithms • Memory limitation: • The size of the Jacobian matrix J is P×M×N • P is the number of training patterns • M is the number of outputs • N is the number of weights • In practice, the number of training patterns is huge and is encouraged to be as large as possible • MNIST handwritten digit database [10]: 60,000 training patterns, 784 inputs, and 10 outputs. Using the simplest network architecture (1 neuron per output), the required memory would be nearly 35 GB • This exceeds the limits of most Windows compilers
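As a worked check of the 35 GB figure (assuming 8-byte double-precision storage per Jacobian element; the element count follows from the slide's numbers):

```latex
N = (784 + 1)\times 10 = 7{,}850 \text{ weights}, \qquad
P \times M \times N = 60{,}000 \times 10 \times 7{,}850 \approx 4.71\times 10^{9} \text{ elements}
\;\Rightarrow\; 4.71\times 10^{9} \times 8~\text{bytes} \approx 35~\text{GB}
```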

  28. Problems in Second Order Algorithms • Computational duplication • Forward computation: calculate errors • Backward computation: error backpropagation • In second order algorithms, both the Hagan-Menhaj LM algorithm and the NBN algorithm, the error backpropagation process has to be repeated for each output • Very complex • Inefficient for networks with multiple outputs

  29. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  30. Proposed Second Order Computation – Basic Theory • Matrix algebra [11] • In neural network training, consider that: • Each pattern is related to one row of the Jacobian matrix • Patterns are independent of each other • So J^T J can be formed by column-row (outer-product) multiplication of the rows, one at a time, instead of the usual row-column multiplication of the whole stored matrix; memory and computation comparisons follow, and a sketch is given below
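A minimal sketch of the column-row accumulation, assuming one Jacobian row j and its matching error e per pattern/output pair (names are illustrative):

```cpp
#include <vector>

// Column-row (outer-product) accumulation: the quasi-Hessian Q = J^T J and
// the gradient g = J^T e are built one Jacobian row at a time, so the full
// (P*M) x N Jacobian never has to be stored.
void accumulate_row(const std::vector<double>& j,  // one row of J (N elements)
                    double e,                      // the matching error
                    std::vector<double>& q,        // N x N quasi-Hessian, row-major
                    std::vector<double>& g) {      // N-element gradient
    const size_t n = j.size();
    for (size_t a = 0; a < n; ++a) {
        g[a] += j[a] * e;
        for (size_t b = 0; b < n; ++b)
            q[a * n + b] += j[a] * j[b];           // outer product j^T * j
    }
}
```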

  31. Proposed Second Order Computation – Derivation • Hagan-Menhaj LM algorithm or NBN algorithm: build the full Jacobian J, then compute J^T J and J^T e • Improved computation: accumulate the quasi-Hessian Q = Σ j^T j and the gradient g = Σ j^T e from one Jacobian row j at a time

  32. Proposed Second Order Computation – Pseudo Code • Properties: • No need for Jacobian matrix storage • Vector operations instead of matrix operations • Main contributions: • Significant memory reduction • The memory reduction also benefits computation speed • NO tradeoff! • The memory limitation caused by Jacobian matrix storage in second order algorithms is solved • Again, considering the MNIST problem, the memory cost for storing Jacobian elements is reduced from more than 35 gigabytes to nearly 30.7 kilobytes

  33. Proposed Second Order Computation – Experimental Results • Memory Comparison • Time Comparison

  34. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  35. Traditional Computation – Forward Computation • For each training pattern p • Calculate net for neuron j • Calculate output for neuron j • Calculate derivative for neuron j • Calculate output at output m • Calculate error at output m
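In standard notation (reconstructed; the per-step formulas are images in the original), with y_{j,i} the inputs of neuron j, d_m the desired output, and o_m the actual output at output m:

```latex
net_j = \sum_i w_{j,i}\, y_{j,i}, \qquad
y_j = f(net_j), \qquad
s_j = f'(net_j), \qquad
e_m = d_m - o_m
```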

  36. Traditional Computation – Backward Computation • For first order algorithms • Calculate delta [12] • Build the gradient vector • For second order algorithms • Calculate delta • Calculate the Jacobian elements
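One common convention for these quantities (reconstructed; the slide's formulas are images): δ is propagated backward through the weights, gradient elements come from the aggregated errors, and Jacobian elements use a separate δ_{m,j} per output m:

```latex
\delta_j = s_j \sum_k w_{k,j}\,\delta_k, \qquad
\frac{\partial E}{\partial w_{j,i}} = \delta_j\, y_{j,i}, \qquad
\frac{\partial e_m}{\partial w_{j,i}} = \delta_{m,j}\, y_{j,i}
```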

  37. Proposed Forward-Only Algorithm • Extend the concept of the backpropagation factor δ • Original definition: backpropagated from output m to neuron j • Our definition: backpropagated from neuron k to neuron j

  38. Proposed Forward-Only Algorithm • Regular table • Lower triangular elements (k ≥ j): the δ matrix has a triangular shape • Diagonal elements: δ_{k,k} = s_k • Upper triangular elements: weight connections between neurons • A sketch of filling this table appears below
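A minimal sketch of building this table forward for a fully connected cascade topology, assuming the recurrence δ_{k,k} = s_k and δ_{k,j} = s_k Σ_{i=j}^{k-1} w_{k,i} δ_{i,j} (all names are illustrative):

```cpp
#include <vector>

// Forward-only construction of the triangular delta table for an FCC
// network: delta[k][j] relates neuron j to the later neuron k.
// s[k] is neuron k's activation slope; w[k][i] is the weight from
// neuron i's output to neuron k's input.
std::vector<std::vector<double>>
delta_table(const std::vector<double>& s,
            const std::vector<std::vector<double>>& w) {
    const size_t n = s.size();
    std::vector<std::vector<double>> delta(n, std::vector<double>(n, 0.0));
    for (size_t k = 0; k < n; ++k) {
        delta[k][k] = s[k];                 // diagonal: delta(k,k) = s(k)
        for (size_t j = 0; j < k; ++j) {    // lower triangle, filled forward
            double sum = 0.0;
            for (size_t i = j; i < k; ++i)
                sum += w[k][i] * delta[i][j];
            delta[k][j] = s[k] * sum;
        }
    }
    return delta;
}
```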

  39. Proposed Forward-Only Algorithm • Train arbitrarily connected neural networks

  40. Proposed Forward-Only Algorithm • Train networks with multiple outputs • The more outputs the network has, the more efficient the forward-only algorithm becomes (Figure panels: 1, 2, 3, and 4 outputs)

  41. Proposed Forward-Only Algorithm • Pseudo code of the two algorithms • In the forward-only computation, the backward computation (bold in the left figure) is replaced by extra computation in the forward process (bold in the right figure) (Figures: forward-only algorithm; traditional forward-backward algorithm)

  42. Proposed Forward-Only Algorithm • Computation cost estimation (MLP networks with one hidden layer; 20 inputs) • Properties of the forward-only algorithm • Simplified computation: organized in a regular table with a general formula • Easy to adapt for training arbitrarily connected neural networks • Improved computation efficiency for networks with multiple outputs • Tradeoff: extra memory is required to store the extended δ array

  43. Proposed Forward-Only Algorithm • Experiments: training compact neural networks with good generalization ability • 8 neurons, FO: SSE_train = 0.0044, SSE_verify = 0.0080 • 8 neurons, EBP: SSE_train = 0.0764, SSE_verify = 0.1271 (under-fitting) • 12 neurons, EBP: SSE_train = 0.0018, SSE_verify = 0.4909 (over-fitting)

  44. Proposed Forward-Only Algorithm • Experiments: comparison of computation efficiency • Forward kinematics [13] • ASCII to images • Error correction

  45. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  46. Software • The NBN Trainer tool is developed in Visual C++ and used for training neural networks • Pattern classification and recognition • Function approximation • Available online (currently free): http://www.eng.auburn.edu/~wilambm/nnt/index.htm

  47. Parity-2 Problem • Parity-2 patterns (equivalent to XOR): (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0

  48. Outline • Why Neural Networks • Network Architectures • Training Algorithms • How to Design Neural Networks • Problems in Second Order Algorithms • Proposed Second Order Computation • Proposed Forward-Only Algorithm • Neural Network Trainer • Conclusion & Recent Research

  49. Conclusion • Second order algorithms are more efficient than first order algorithms in training neural networks • The proposed second order computation removes Jacobian matrix storage and multiplication, solving the memory limitation • The proposed forward-only algorithm simplifies the computation process in second order training: a regular table + a general formula • The proposed forward-only algorithm can handle arbitrarily connected neural networks • The proposed forward-only algorithm has a speed benefit for networks with multiple outputs

  50. Recent Research • RBF networks • ErrCor algorithm: a hierarchical training algorithm • The network size increases based on the training information • No more trial-and-error design • Applications of neural networks (future work) • Dynamic controller design • Smart grid distribution systems • Pattern recognition in EDA software design
