Dimensions of Neural Networks Ali Akbar Darabi Ghassem Mirroshandel Hootan Nokhost
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Motivation • Suppose you are an engineer who knows artificial neural networks (ANNs) • You encounter a problem that cannot be solved with common analytical approaches • You decide to use an ANN
But… • Some questions • Is this problem solvable with an ANN? • How many neurons? • How many layers? • …
Two Approaches • Fundamental Analysis • Kolmogorov Theory • Adaptive Networks • Cascade Correlation
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Single-layer Networks • Limitations of the perceptron and other linear classifiers
Network Construction • Map the inputs to second-order features: (x, y) → (x^2, y^2, x*y)
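A minimal sketch of why this mapping helps, assuming the XOR task with inputs in {0, 1} and targets in {-1, +1} (XOR is our illustrative choice; the slide does not name a task). The helper names `perceptron`, `count_errors` and `lift` are hypothetical. A perceptron on the raw inputs always misclassifies at least one XOR pattern, while the same rule on (x^2, y^2, x*y) converges, because for binary inputs XOR = x + y - 2xy is linear in those features.

```python
def perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron rule with targets in {-1, +1}; a constant 1 is appended as the bias input."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, labels):
            xb = list(x) + [1.0]
            s = sum(wi * xi for wi, xi in zip(w, xb))
            if t * s <= 0:  # misclassified (or exactly on the boundary): move the boundary
                w = [wi + t * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w


def count_errors(w, samples, labels):
    """Number of patterns on the wrong side of (or on) the decision boundary."""
    return sum(
        1
        for x, t in zip(samples, labels)
        if t * sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) <= 0
    )


def lift(point):
    """Second-order feature map from the slide: (x, y) -> (x^2, y^2, x*y)."""
    x, y = point
    return (x * x, y * y, x * y)


# XOR with inputs in {0, 1} and targets in {-1, +1} (assumed encoding).
XOR_X = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_T = [-1, 1, 1, -1]

raw_w = perceptron(XOR_X, XOR_T)
lifted_X = [lift(p) for p in XOR_X]
lifted_w = perceptron(lifted_X, XOR_T)

print(count_errors(raw_w, XOR_X, XOR_T))        # at least 1: XOR is not linearly separable in (x, y)
print(count_errors(lifted_w, lifted_X, XOR_T))  # 0: a linear boundary exists in (x^2, y^2, x*y)
```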
Learning Mechanism • Define an error function • Minimize it by gradient descent
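As a hedged illustration of these two bullets, the sketch below trains a single linear unit by batch gradient descent on a mean squared error. The data (generated from t = 2x + 1), the learning rate, and the function name `train_linear_unit` are assumptions made for the example.

```python
def train_linear_unit(xs, ts, lr=0.1, steps=2000):
    """Batch gradient descent on E(w, b) = 1/(2P) * sum_p (t_p - (w*x_p + b))^2."""
    w, b = 0.0, 0.0
    P = len(xs)
    history = []  # error value before each update
    for _ in range(steps):
        ys = [w * x + b for x in xs]
        errs = [t - y for t, y in zip(ts, ys)]
        history.append(sum(e * e for e in errs) / (2 * P))
        # Partial derivatives of E with respect to w and b
        grad_w = -sum(e * x for e, x in zip(errs, xs)) / P
        grad_b = -sum(errs) / P
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


w, b, history = train_linear_unit([0, 1, 2, 3], [1, 3, 5, 7])  # hypothetical data from t = 2x + 1
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

The learning rate must stay small relative to the curvature of E; for this data any rate below roughly 0.47 converges.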
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Kolmogorov theorem (concept) • Any continuous function of n variables can be completely characterized by continuous functions of one variable
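For reference, a common statement of Kolmogorov's superposition theorem: for every continuous f on [0, 1]^n there exist continuous one-variable functions Φ_q and ψ_{p,q} such that the identity below holds. The inner functions ψ_{p,q} do not depend on f; only the outer functions Φ_q do.

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{p,q}(x_p) \right)
```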
An Idea • Suppose we want to construct f(x, y) • A simple idea: find a mapping (x, y) → r • Then define a function g such that g(r) = f(x, y)
An Example • Suppose we have a discrete function f(x, y) • We choose a mapping (x, y) → r • We define the one-dimensional function g • So f(x, y) = g(r)
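Since the slide's own function and mapping were not preserved, here is a hypothetical stand-in: a discrete f on a 4 x 4 integer grid and the mapping r = x * WIDTH + y, which is one-to-one on that grid. Both f and r are our assumptions; the point is only that a lookup table g indexed by the single variable r reproduces f exactly.

```python
# Hypothetical discrete function of two variables on a 4 x 4 grid of integer points.
WIDTH = 4


def f(x, y):
    return (x * y) % 3  # arbitrary example values


def r(x, y):
    """Map (x, y) to one integer; one-to-one on the grid because 0 <= y < WIDTH."""
    return x * WIDTH + y


# The one-dimensional function g is simply a table indexed by r.
g = {r(x, y): f(x, y) for x in range(WIDTH) for y in range(WIDTH)}

# g(r(x, y)) == f(x, y) at every grid point.
assert all(g[r(x, y)] == f(x, y) for x in range(WIDTH) for y in range(WIDTH))
```

For continuous inputs the same trick needs a continuous mapping, which is where the difficulty of Kolmogorov's construction lies.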
Kolmogorov theorem • In the illustrated example we had:
Universal Approximation • Neural networks with one hidden layer can approximate any continuous function to arbitrary precision • The inner functions are chosen independently of the function being approximated • The resulting network is then approximated by a conventional network
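The slide states the result without a construction. One simple way to see it (a textbook staircase argument, not necessarily the construction the authors used) is sketched below: each hidden unit is a steep sigmoid step, the output weights are the jumps of f between neighbouring knots, and the maximum error shrinks as hidden units are added. `staircase_network`, the gain of 50 / h, and the target sin on [0, π] are illustrative choices.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def staircase_network(f, a, b, n_hidden):
    """One-hidden-layer sigmoid network approximating f on [a, b].

    Hidden unit i is a steep sigmoid step centred between knots i and i+1;
    its output weight is the jump f(knot[i+1]) - f(knot[i]). The output bias is f(a).
    """
    h = (b - a) / n_hidden
    knots = [a + i * h for i in range(n_hidden + 1)]
    centers = [(knots[i] + knots[i + 1]) / 2 for i in range(n_hidden)]
    out_w = [f(knots[i + 1]) - f(knots[i]) for i in range(n_hidden)]
    gain = 50.0 / h  # steepness of each step, relative to the knot spacing
    bias = f(a)

    def net(x):
        return bias + sum(w * sigmoid(gain * (x - c)) for w, c in zip(out_w, centers))

    return net


def max_error(f, net, a, b, samples=2000):
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - net(x)) for x in xs)


errors = [max_error(math.sin, staircase_network(math.sin, 0.0, math.pi, n), 0.0, math.pi)
          for n in (5, 20, 80)]
print(errors)  # shrinks as hidden units are added
```

Within each knot interval the output lies between two neighbouring values of f, so the error is bounded by how much f can change over one interval; more hidden units mean shorter intervals.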
A Kolmogorov Network • We have to define: • The mapping • The function g
Spline Function • A linear combination of several cubic (third-degree) basis functions • Used to approximate a function from a given set of points
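A natural cubic spline is one common way to pass a smooth curve through given points; the sketch below assumes that variant (second derivative zero at both ends), which the slide does not specify. It solves the standard tridiagonal system for the second derivatives at the knots and evaluates the piecewise cubic. Any such spline can equally be written as a linear combination of cubic basis functions; the piecewise form is used here only because it is shorter. The function name is hypothetical.

```python
from bisect import bisect_right
import math


def natural_cubic_spline(xs, ys):
    """Return S(x), the natural cubic spline through (xs[i], ys[i]); xs must be strictly increasing.

    On each interval S is a cubic; S, S' and S'' are continuous at the knots, and S'' = 0 at both ends.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Tridiagonal system for the knot second derivatives M[0..n]; rows 0 and n force M = 0.
    sub = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    sup = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        sub[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        sup[i] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    M = [0.0] * (n + 1)
    M[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        M[i] = (rhs[i] - sup[i] * M[i + 1]) / diag[i]

    def S(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval index, clamped at the ends
        hi, left, right = h[i], x - xs[i], xs[i + 1] - x
        return ((M[i] * right ** 3 + M[i + 1] * left ** 3) / (6.0 * hi)
                + (ys[i] / hi - M[i] * hi / 6.0) * right
                + (ys[i + 1] / hi - M[i + 1] * hi / 6.0) * left)

    return S


knots = [math.pi * i / 8 for i in range(9)]
spline = natural_cubic_spline(knots, [math.sin(x) for x in knots])
print(spline(1.0), math.sin(1.0))  # close agreement between the knots
```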
Mapping • [Figure: the mapping over the input plane (axes x and y)]
Example • [Figure: the mapping applied to the input x1 = 2.5, x2 = 4.5]
Function g • For each unique value of a produced by the mapping, we define an output value of g that corresponds to f • We choose the value of f at the center of the square
Reduce Error • Shift the defined patterns • N different patterns are generated • Use the average of their outputs y
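The sketch below combines the last two slides under stated assumptions: the domain is [0, 1]^2, the squares form a K x K grid, each square is mapped to one scalar a, g(a) is f at the square's center, and the N shifted patterns are offset diagonally by k*h/N. These choices, the test function and all names are hypothetical. Averaging the N shifted approximations lowers the maximum error compared with a single grid.

```python
import math


def grid_approximator(f, K, shift):
    """Approximate f on [0, 1]^2 with a K x K grid of squares offset by `shift`.

    Each square is mapped to a single scalar a; g[a] stores f at that square's centre.
    """
    h = 1.0 / K
    width = K + 2  # square indices run from -1 to K in each coordinate

    def to_a(i, j):
        return (i + 1) * width + (j + 1)  # unique scalar for every square

    g = {}
    for i in range(-1, K + 1):
        for j in range(-1, K + 1):
            g[to_a(i, j)] = f(shift + (i + 0.5) * h, shift + (j + 0.5) * h)

    def approx(x, y):
        i = math.floor((x - shift) / h)
        j = math.floor((y - shift) / h)
        return g[to_a(i, j)]

    return approx


def shifted_average(f, K, n_shifts):
    """Average n_shifts grid approximators whose grids are offset by k*h/n_shifts."""
    h = 1.0 / K
    parts = [grid_approximator(f, K, k * h / n_shifts) for k in range(n_shifts)]
    return lambda x, y: sum(p(x, y) for p in parts) / n_shifts


def max_error(f, approx, samples=60):
    pts = [i / samples for i in range(samples + 1)]
    return max(abs(f(x, y) - approx(x, y)) for x in pts for y in pts)


def target(x, y):
    return math.sin(x) + y * y  # hypothetical smooth function to approximate


single = max_error(target, shifted_average(target, K=4, n_shifts=1))
averaged = max_error(target, shifted_average(target, K=4, n_shifts=8))
print(single, averaged)  # averaging the shifted patterns lowers the error
```

For a smooth f, each shifted grid errs by up to half a square width, but the averaged square centers lie within h/(2N) of the input in each coordinate, so the first-order error drops roughly by a factor of N.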
Replace the function • With a sufficiently large number of knots:
Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation
Cascade Correlation • Dynamic size, depth, and topology • Only one layer of weights is trained at a time, despite the multilayer structure • Fast learning
Correlation • E_o(p): residual error of output unit o for pattern p • Ē_o: average residual error of output unit o over all patterns • Z(p): activation of the candidate unit computed for input vector x(p) • Z̄: average activation of the candidate unit over all patterns
Correlation • Use covariance as the similarity criterion • Update the candidate's weights in a manner similar to gradient descent, but maximizing the correlation instead of minimizing an error
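To make the criterion concrete, the sketch below assumes the cascade-correlation score S = Σ_o | Σ_p (Z(p) − Z̄)(E_o(p) − Ē_o) |, i.e. the covariance between the candidate's activation Z(p) and each output's residual error E_o(p), summed in magnitude over output units o. It trains one tanh candidate by gradient ascent using ∂S/∂w_i = Σ_{p,o} σ_o (E_o(p) − Ē_o) f'(net_p) x_i(p), where σ_o is the sign of the current covariance with output o. The data, initial weights, step size and function names are hypothetical.

```python
import math


def candidate_outputs(w, X):
    """Activations Z(p) of a tanh candidate unit for every pattern p."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for x in X]


def score(Z, E):
    """S = sum_o | sum_p (Z(p) - Zbar) * (E[p][o] - Ebar[o]) |."""
    P, n_out = len(Z), len(E[0])
    z_bar = sum(Z) / P
    total = 0.0
    for o in range(n_out):
        e_bar = sum(E[p][o] for p in range(P)) / P
        total += abs(sum((Z[p] - z_bar) * (E[p][o] - e_bar) for p in range(P)))
    return total


def train_candidate(w, X, E, lr=0.5, steps=500):
    """Gradient *ascent* on S with respect to the candidate's input weights."""
    w = list(w)
    P, n_out = len(X), len(E[0])
    e_bar = [sum(E[p][o] for p in range(P)) / P for o in range(n_out)]  # fixed while the candidate trains
    for _ in range(steps):
        Z = candidate_outputs(w, X)
        z_bar = sum(Z) / P
        # sigma_o: sign of the current covariance between Z and output o's residual error
        sigma = [
            1.0 if sum((Z[p] - z_bar) * (E[p][o] - e_bar[o]) for p in range(P)) >= 0 else -1.0
            for o in range(n_out)
        ]
        grad = [0.0] * len(w)
        for p in range(P):
            # sigma-weighted residual error times f'(net_p), with f'(net) = 1 - tanh(net)^2
            coeff = sum(sigma[o] * (E[p][o] - e_bar[o]) for o in range(n_out)) * (1.0 - Z[p] ** 2)
            for i, xi in enumerate(X[p]):
                grad[i] += coeff * xi
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical data: 2 inputs plus a constant bias input, and the residual error of one output unit.
X = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
E = [[-0.5], [0.5], [0.5], [-0.5]]
w0 = [0.3, 0.2, -0.1]
w1 = train_candidate(w0, X, E)
print(score(candidate_outputs(w0, X), E), score(candidate_outputs(w1, X), E))
```

The Z̄ term has no effect on the gradient because the centred errors E_o(p) − Ē_o sum to zero over the patterns.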
An Example • 100 runs • 1700 epochs on average • Beats standard backprop by a factor of 10 with the same complexity
Results • Cascade Correlation is also faster because: • Only a forward pass is needed • Many of the epochs are run while the network is still very small • Caching mechanism
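Because a hidden unit's input weights are frozen once it is installed, its output for each training pattern never changes and can be computed once and cached. The hypothetical `CascadeCache` class below sketches this under that assumption; it also shows the cascade wiring, where each new unit receives the original inputs plus the outputs of all earlier hidden units. The weights are arbitrary stand-ins for trained values.

```python
import math


class CascadeCache:
    """Hypothetical helper: per training pattern, stores the inputs plus the outputs of all frozen hidden units."""

    def __init__(self, patterns):
        # Each row starts as the pattern's raw inputs (a constant bias input is assumed to be included).
        self.rows = [list(p) for p in patterns]
        self.weights = []  # frozen input weights of each installed hidden unit
        self.unit_evaluations = 0

    def install_unit(self, w):
        """Freeze a trained candidate. It sees the inputs and every earlier hidden unit (cascade wiring)."""
        if len(w) != len(self.rows[0]):
            raise ValueError("a new unit needs one weight per input and per existing hidden unit")
        self.weights.append(list(w))
        for row in self.rows:
            # Computed once per pattern; later epochs just read the cached value.
            row.append(math.tanh(sum(wi * vi for wi, vi in zip(w, row))))
            self.unit_evaluations += 1

    def features(self, p):
        """Everything the next candidate (or the output layer) sees for pattern p, read from the cache."""
        return self.rows[p]


cache = CascadeCache([(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)])
cache.install_unit([0.5, -0.5, 0.1])       # hidden unit 1: sees 3 inputs
cache.install_unit([0.2, 0.3, -0.1, 1.0])  # hidden unit 2: sees 3 inputs + unit 1
for epoch in range(100):                   # later training epochs only read the cache
    _ = [cache.features(p) for p in range(4)]
print(cache.unit_evaluations)  # 8: each frozen unit was evaluated once per pattern
```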
Another Example • N-input parity problem • Standard backprop takes 2000 epochs for N = 8 with 16 hidden neurons
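For concreteness, the N-input parity task can be generated as below: all 2^N binary patterns, with target 1 when the number of ones is odd. The 0/1 target encoding and the function name are assumptions.

```python
from itertools import product


def parity_dataset(n):
    """All 2**n binary input patterns with target 1 when the number of 1s is odd, else 0."""
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n)]


data = parity_dataset(8)
print(len(data), sum(t for _, t in data))  # 256 128
```

Flipping any single input bit flips the target, which is why parity is a hard benchmark for networks that rely on local generalization.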
Discussion • There is no need to guess the size, depth, or connectivity pattern • Learns fast • Can build deep networks (high-order feature detectors) • Avoids the herd effect • Results can be cached
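Conclusion • A network with one hidden layer can define complex decision boundaries and can approximate any continuous function • The number of neurons in the hidden layer determines the approximation accuracy • Dynamic networks such as Cascade Correlation grow during training, so this number need not be guessed in advance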