Dimensions of Neural Networks

Ali Akbar Darabi, Ghassem Mirroshandel, Hootan Nokhost

Outline • Motivation • Neural Networks Power • Kolmogorov Theory • Cascade Correlation

Motivation • Suppose you are an engineer who knows ANNs • You encounter a problem that cannot be solved with common analytical approaches • You decide to use an ANN

But… • Some questions: • Is this problem solvable using an ANN? • How many neurons? • How many layers? • …

Two Approaches • Fundamental Analysis • Kolmogorov Theory • Adaptive Networks • Cascade Correlation

Single layer Networks • Limitations of the perceptron and linear classifiers

A Solution

Network Construction • (x, y) → (x^2, y^2, x*y)
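A minimal sketch of why the quadratic mapping helps. It assumes the XOR problem on inputs in {0, 1}, because the slide's figures are not recoverable and the data set is therefore an assumption. The helper names `feature_map` and `train_perceptron` are hypothetical. On the raw inputs a perceptron can never reach zero errors. Once the inputs are mapped to (x^2, y^2, x*y), the classes become linearly separable and the perceptron rule converges.

```python
import numpy as np

def feature_map(x, y):
    """Quadratic feature map from the slide: (x, y) -> (x^2, y^2, x*y)."""
    return np.array([x * x, y * y, x * y], dtype=float)

def train_perceptron(X, t, max_epochs=1000):
    """Perceptron learning rule with a bias; returns (w, b, converged)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, ti in zip(X, t):
            pred = 1 if xi @ w + b > 0 else 0
            if pred != ti:
                step = ti - pred          # +1 or -1
                w += step * xi
                b += step
                errors += 1
        if errors == 0:                   # a full pass without mistakes
            return w, b, True
    return w, b, False

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = np.array([0, 1, 1, 0])          # XOR (assumed example)
X_raw = np.array(points, dtype=float)
X_feat = np.array([feature_map(x, y) for x, y in points])

_, _, ok_raw = train_perceptron(X_raw, targets)
w, b, ok_feat = train_perceptron(X_feat, targets)
print(ok_raw, ok_feat)  # False True
```

In the new space, the weights (1, 1, -2) with bias -0.5 already separate the four points. The perceptron convergence theorem therefore guarantees that training stops after finitely many mistakes.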

Network Construction (cont.)

Network Construction (cont.)

Learning Mechanism • Using an Error Function • Gradient Descent
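A minimal gradient-descent sketch. The slide does not name the error function. As an assumption, this sketch trains a single sigmoid unit with the cross-entropy error on the four XOR points after they have been mapped to (x^2, y^2, x*y), plus a constant bias input. The learning rate `eta = 0.4` is also an arbitrary choice, kept small enough that the error should not increase from one step to the next.

```python
import numpy as np

# XOR points mapped to (x^2, y^2, x*y), plus a constant bias input of 1.
X = np.array([[0, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 0], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(w):
    """Cross-entropy error E(w), summed over all patterns (assumed error function)."""
    y = sigmoid(X @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

w = np.zeros(4)
eta = 0.4                        # learning rate (assumed)
losses = [error(w)]
for _ in range(5000):
    y = sigmoid(X @ w)
    grad = X.T @ (y - t)         # dE/dw for a sigmoid unit with cross-entropy
    w -= eta * grad              # gradient descent step
    losses.append(error(w))

print(losses[0], losses[-1])
print((sigmoid(X @ w) > 0.5).astype(int))
```

Squared error would work the same way with a different gradient. Cross-entropy is used here only because it makes the error surface of a single sigmoid unit convex.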

Kolmogorov Theorem (concept) • An example: • Any continuous function of n variables can be represented exactly by sums and compositions of one-dimensional continuous functions
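For reference, here is the usual statement of Kolmogorov's representation theorem. Every continuous function f on the unit cube [0, 1]^n can be written using only continuous functions of one variable and addition. The inner functions ψ_{q,p} do not depend on f; only the outer functions Φ_q do.

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{q,p}(x_p) \right)
```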

An Idea • Suppose we want to construct f(x, y) • A simple idea: find a mapping • (x, y) → r • Then define a function g such that: • g(r) = f(x, y)

An Example • Suppose we have a discrete function: • We choose a mapping • We define the one-dimensional function • So
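The slide's formulas are not recoverable, so the following tiny sketch illustrates the idea with a hypothetical function f tabulated on a K x K grid of integer points. A row-major index serves as the injective mapping (x, y) → r, and g is simply a lookup table. As a result, g(mapping(x, y)) reproduces f exactly on the grid.

```python
K = 4  # hypothetical grid size: x, y in {0, ..., K-1}

def f(x, y):
    """Hypothetical two-variable function, only needed on the grid."""
    return x * y + x - y

def mapping(x, y):
    """(x, y) -> r: an injective map from grid points to integers."""
    return x + K * y

# Tabulate the one-dimensional function g so that g(r) = f(x, y).
g = {mapping(x, y): f(x, y) for x in range(K) for y in range(K)}

def f_via_g(x, y):
    return g[mapping(x, y)]

print(f_via_g(2, 3), f(2, 3))  # 5 5
```

Any injective mapping would work. The difficulty addressed in the following slides is making the mapping and g continuous, so that they also work between the grid points.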

Kolmogorov Theorem • In the illustrated example we had:

Applying It to Neural Networks

Universal Approximation • Neural networks with one hidden layer can approximate any continuous function (on a compact domain) with arbitrary precision • The inner functions are independent of the function being approximated • The Kolmogorov network can in turn be approximated by conventional networks
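The following is a small numerical sketch of the claim, not a proof. A pool of random sigmoid hidden units is drawn once. Only the output weights are then fitted, using linear least squares instead of gradient descent; this shortcut is an assumption made for brevity. The target function is hypothetical. Because each n-unit network uses the first n units of the same pool, the networks are nested. The training error can therefore only stay the same or shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
target = np.sin(3 * x) + 0.5 * x**2           # hypothetical continuous function

# Pool of random sigmoid hidden units (random weights and biases are assumptions).
a = rng.normal(scale=5.0, size=200)           # input-to-hidden weights
b = rng.uniform(-5.0, 5.0, size=200)          # hidden biases
H = 1.0 / (1.0 + np.exp(-(np.outer(x, a) + b)))   # activations, shape (200, 200)

def rmse_with_hidden_units(n):
    """Fit the output weights (plus a bias) for the first n hidden units."""
    F = np.hstack([np.ones((len(x), 1)), H[:, :n]])
    w, *_ = np.linalg.lstsq(F, target, rcond=None)
    return float(np.sqrt(np.mean((F @ w - target) ** 2)))

for n in (2, 5, 20, 100):
    print(n, rmse_with_hidden_units(n))
```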

A Kolmogorov Network • We have to define: • The mapping • The function g

Spline Function • A piecewise combination of several cubic (third-degree) functions • Used to approximate a function from a set of given points
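The slide does not say which spline is used. As an assumption, this sketch builds a natural cubic spline (second derivative zero at both ends) through given points, using only NumPy. Each interval gets its own cubic piece, and neighbouring pieces join with continuous first and second derivatives. The knots `xs` must be sorted in increasing order.

```python
import numpy as np

def natural_cubic_spline(xs, ys):
    """Return s(x), the natural cubic spline through the points (xs, ys)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    n = len(xs) - 1
    h = np.diff(xs)
    # Solve for the second derivatives M_0..M_n, with M_0 = M_n = 0 (natural ends).
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def s(x):
        x = np.asarray(x, dtype=float)
        i = np.clip(np.searchsorted(xs, x) - 1, 0, n - 1)   # interval index
        t0, t1, hi = xs[i + 1] - x, x - xs[i], h[i]
        return ((M[i] * t0**3 + M[i + 1] * t1**3) / (6.0 * hi)
                + (ys[i] / hi - M[i] * hi / 6.0) * t0
                + (ys[i + 1] / hi - M[i + 1] * hi / 6.0) * t1)

    return s

knots_x = [0.0, 1.0, 2.0, 3.0]
knots_y = [0.0, 1.0, 0.0, 1.0]
s = natural_cubic_spline(knots_x, knots_y)
print(s(np.array([0.5, 1.5, 2.5])))
```

Adding more knots lets the spline follow the sampled function more closely between the given points.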

Mapping

Example • [figure: mapping of the input point x1 = 2.5, x2 = 4.5]

Function g • For each unique value a produced by the mapping, we must define an output value g(a) corresponding to f • We choose the value of f at the center of the square
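This sketch of the mapping and of g rests on stated assumptions. The domain is [0, 1]^2, the squares have side `delta`, and each square gets the label r = i + n*j; any injective labelling would do, and all names here are hypothetical. As on the slide, g(r) is the value of f at the center of square r. The approximation error is therefore bounded by how much f can change within half a square in each direction.

```python
import math

def build_mapping_and_g(f, delta):
    """Hypothetical discretisation of [0, 1]^2 into squares of side delta."""
    n = math.ceil(1.0 / delta)                 # squares per axis

    def mapping(x, y):
        # (x, y) -> one number r identifying the enclosing square
        i = min(int(x / delta), n - 1)
        j = min(int(y / delta), n - 1)
        return i + n * j

    # g(r) = f evaluated at the center of square r
    g = {i + n * j: f((i + 0.5) * delta, (j + 0.5) * delta)
         for i in range(n) for j in range(n)}
    return mapping, g

def f(x, y):
    return x + y                               # hypothetical target function

mapping, g = build_mapping_and_g(f, delta=0.1)
x, y = 0.25, 0.45
print(mapping(x, y), g[mapping(x, y)], f(x, y))
```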

Function g (cont.)

Reducing the Error • Shift the defined patterns • N different (shifted) patterns are generated • Use the average of their outputs y
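This is a sketch of the error-reduction step. It assumes the N grids are shifted diagonally by k*delta/N for k = 0, ..., N-1, since the slide does not specify the shift pattern. Each shifted grid produces its own center-value approximation, and the N outputs are averaged. For brevity, g is evaluated on demand instead of being stored in a table. The function f must also be defined slightly outside [0, 1]^2, because shifted squares overhang the border.

```python
import math

def shifted_average(f, delta, n_shifts):
    """Average of n_shifts center-value approximations on shifted grids."""
    shifts = [k * delta / n_shifts for k in range(n_shifts)]   # assumed shift scheme

    def one_grid(x, y, s):
        # Enclosing square of the grid shifted by s; f is taken at its center.
        i = math.floor((x + s) / delta)
        j = math.floor((y + s) / delta)
        return f((i + 0.5) * delta - s, (j + 0.5) * delta - s)

    return lambda x, y: sum(one_grid(x, y, s) for s in shifts) / n_shifts

def f(x, y):
    return x + y                               # hypothetical target function

single = shifted_average(f, delta=0.1, n_shifts=1)
averaged = shifted_average(f, delta=0.1, n_shifts=4)
pts = [(k / 97, (3 * k % 97) / 97) for k in range(98)]
err1 = max(abs(single(x, y) - f(x, y)) for x, y in pts)
err4 = max(abs(averaged(x, y) - f(x, y)) for x, y in pts)
print(err1, err4)
```

For f(x, y) = x + y in this sketch, a single grid can be off by up to delta. The average of N shifted grids stays within delta/N, which illustrates the effect the slide describes.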

Replace the Function • With a sufficiently large number of knots:

Cascade Correlation • Dynamic size, depth, and topology • Single-layer learning despite the multilayer structure • Fast learning

Architecture

Algorithm Step 1

Adding a Hidden Layer

Correlation • $E_{p,o}$: residual error at output unit o for pattern p • $\bar{E}_o$: average residual error at output unit o • $Z(p)$: computed activation of the candidate unit for input vector x(p) • $\bar{Z}$: average activation, over all patterns, of the candidate unit

Correlation • Use the covariance as the similarity criterion • Update the candidate's weights as in gradient descent, but climbing (gradient ascent) to maximize the criterion
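For reference, the following is the score that the original Cascade-Correlation paper (Fahlman and Lebiere) maximizes for a candidate unit, written with the slide's symbols. $E_{p,o}$ is the residual error at output unit o for pattern p, and $\bar{E}_o$ is its average over patterns. $Z(p)$ is the candidate's activation, and $\bar{Z}$ is its average over patterns. Strictly, S sums the magnitudes of covariances rather than of normalized correlations.

```latex
S = \sum_{o} \left| \sum_{p} \bigl(Z(p) - \bar{Z}\bigr)\bigl(E_{p,o} - \bar{E}_{o}\bigr) \right|

\frac{\partial S}{\partial w_i} = \sum_{o} \sigma_o \sum_{p} \bigl(E_{p,o} - \bar{E}_{o}\bigr)\, f'_p \, I_{i,p}
```

In the derivative:

- σ_o is the sign of the inner sum for output o.
- f'_p is the derivative of the candidate's activation function for pattern p.
- I_{i,p} is the i-th input the candidate receives for pattern p, coming either from an input unit or from an earlier hidden unit.

Because the residuals are centered, so that the terms (E_{p,o} - \bar{E}_o) sum to zero over p, the $\bar{Z}$ term drops out of the derivative. The weights are then moved uphill: w_i ← w_i + η ∂S/∂w_i.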

Algorithm Step 2

Adding a Hidden Neuron

Algorithm Step 3

Final Result
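The three algorithm steps can be condensed into a short sketch. Several simplifications are assumptions, not the original method:

- Output weights are solved by linear least squares instead of iterative training.
- A single tanh candidate is trained instead of a pool of candidates.
- The step size and stopping tolerance are arbitrary.

The feature matrix `F` acts as the cache. Each frozen hidden unit's activations are computed once and appended as a new column, so every later unit sees the inputs and all earlier hidden units.

```python
import numpy as np

def fit_outputs(F, T):
    """Output weights for the current (cached) features.
    Assumption: linear output units solved by least squares."""
    W, *_ = np.linalg.lstsq(F, T, rcond=None)
    return W

def train_candidate(F, E, rng, steps=500, lr=0.01):
    """Gradient ascent on the covariance score S of one tanh candidate unit."""
    w = rng.normal(scale=0.5, size=F.shape[1])
    Ec = E - E.mean(axis=0)                 # E_{p,o} minus its mean over patterns
    for _ in range(steps):
        Z = np.tanh(F @ w)                  # candidate activation Z(p)
        cov = (Z - Z.mean()) @ Ec           # one covariance per output unit
        sigma = np.sign(cov)
        # dS/dw_i = sum_o sigma_o sum_p (E_po - Ebar_o) f'(p) I_ip
        grad = ((Ec @ sigma) * (1.0 - Z**2)) @ F
        w += lr * grad
    return w

def cascade_correlation(X, T, max_hidden=8, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Cache: bias column, inputs, then one column per frozen hidden unit.
    F = np.hstack([np.ones((len(X), 1)), X])
    hidden, history = [], []
    while True:
        W = fit_outputs(F, T)                         # train output weights only
        E = F @ W - T                                 # residual errors E_{p,o}
        history.append(float(np.mean(E**2)))
        if history[-1] < tol or len(hidden) == max_hidden:
            return hidden, W, history
        w = train_candidate(F, E, rng)                # train one new unit
        hidden.append(w)                              # freeze its input weights
        F = np.hstack([F, np.tanh(F @ w)[:, None]])   # cache its activations

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)       # XOR
hidden, W, history = cascade_correlation(X, T)
print(len(hidden), history)
```

On XOR the best linear fit has a mean squared error of 0.25. In this sketch, one added hidden unit whose activations covary with the residual is enough to fit the four patterns. Because each new column only enlarges the least-squares problem, the recorded training error never increases.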

An Example • 100 runs • 1700 epochs on average • Beats standard backprop by a factor of 10 with the same complexity

Results • Cascade Correlation learns faster because: • Only a forward pass is needed • Many of the epochs are run while the network is still very small • Caching mechanism

Network Steps

Network Steps (con..)

Another Example • N-input parity problem • Standard backprop takes 2000 epochs on N = 8 with 16 hidden neurons
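For experimenting with this benchmark, here is a sketch of the N-input parity data set. It assumes inputs coded as 0/1, with target 1 when an odd number of inputs is on; the helper name `parity_dataset` is hypothetical. The check at the end shows why the task is hard for a single layer: flipping any one input bit always flips the target.

```python
import itertools
import numpy as np

def parity_dataset(n):
    """All 2**n binary input vectors; target is 1 for an odd number of ones."""
    X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    T = (X.sum(axis=1) % 2).reshape(-1, 1)
    return X, T

X, T = parity_dataset(8)
print(X.shape, int(T.sum()))  # (256, 8) 128

# Every single-bit flip changes the target: neighbouring inputs always disagree.
flips_change_target = all(
    ((X[p] + np.eye(8)[i]) % 2).sum() % 2 != T[p, 0]
    for p in range(len(X)) for i in range(8)
)
print(flips_change_target)  # True
```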

Discussion • There is no need to guess the size, depth, and connectivity pattern • Learns fast • Can build deep networks (high-order feature detectors) • Avoids the herd effect • Results can be cached

Conclusion • A network with one hidden layer can define complex decision boundaries and can approximate any continuous function • The number of neurons in the hidden layer determines the achievable approximation accuracy • Dynamic networks
