
Artificial Neural Networks



Presentation Transcript


  1. Artificial Neural Networks This is lecture 15 of the module 'Biologically Inspired Computing': an introduction to Artificial Neural Networks.

  2. Recall this from the first (overview) lecture: some things that classical computing is not good at are pattern recognition and classification. In week 1, we recognised that these were the same thing, and defined the problem of classification: to see a complex pattern and assign the correct label to it. Classical computational methods are fine if we know the rules that underpin a classification task, but when we don't, they are useless. 'Brains', however, seem to be very good at this task.

  3. Artificial Neural Networks An ANN is a bio-inspired machine learning technique, very widely applicable in almost every area of industry and science. In this lecture we will look at the basic ideas involved, which are really quite simple. Understanding how ANNs are usually 'trained' needs a certain level of maths, but we won't go into that. Anyway, it turns out that we can train them with EAs instead (in fact PSO is particularly good at it …).

  4. Real Neural Networks [Figures: a single biological neuron; many neurons joined in a network] The business end of the brain is made of lots of these neurons, joined in networks. Our own computations are performed in/by this network, and this type of 'computer' is fabulous at pattern recognition.

  5. Real Neural Networks II [Figure: a cartoon of 'a part of my brain': inputs such as black stripes, yellow stripes, buzzing and mower feed a neuron through excitatory and inhibitory connections; a high level of excitation leads to the neuron firing ('Aarghhh!!!')] An individual neuron receives electrical signals from other (excited) neurons. If the total input is enough, it will become active, and send signals out to those it is connected to.

  6. Artificial Neural Networks [Figures: an artificial neuron (node); an ANN (neural network)] Nodes abstractly model neurons; they do very simple number crunching. Numbers flow from left to right: the numbers arriving at the input layer get "transformed" to a new set of numbers at the output layer. There are many kinds of nodes, and many ways of combining them into a network, but we need only be concerned with the types described here, which turn out to be sufficient for any (consistent) pattern classification task.

  7. A single node (artificial neuron) works like this [Figure: a node with input lines weighted 3, 1 and 2, and output lines weighted 2 and -2]

  8. A single node (artificial neuron) works like this [Figure: the values 4, -3 and 0 arrive along the input lines weighted 3, 1 and 2] Numbers come along (inputs from us, or from other nodes).

  9. A single node (artificial neuron) works like this [Figure: 4×3 = 12, -3×1 = -3, 0×2 = 0] They get multiplied by the strengths (weights) on the input lines …

  10. A single node (artificial neuron) works like this [Figure: the node computes f(12 - 3 + 0) = f(9)] The node adds up its inputs, and applies a simple function to the sum.

  11. A single node (artificial neuron) works like this [Figure: f(9) travels along the output lines, becoming 2 × f(9) and -2 × f(9)] It sends the result out along its output lines, where it will in turn get multiplied by the line weights before being delivered …
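To make the arithmetic concrete, here is a minimal sketch of a single node in Python. The inputs and weights match the figures above; the step activation is an illustrative assumption (it is the LTU rule used on the next slides).

    def node_output(inputs, weights, f):
        # Multiply each input by the weight on its line, sum, then apply f
        weighted_sum = sum(x * w for x, w in zip(inputs, weights))
        return f(weighted_sum)

    # Step activation: fire 1 if the sum reaches 1, otherwise 0
    def step(s):
        return 1 if s >= 1 else 0

    # Inputs 4, -3, 0 arriving on lines weighted 3, 1, 2, as in the figures
    out = node_output([4, -3, 0], [3, 1, 2], step)   # f(12 - 3 + 0) = f(9)
    print(out)   # -> 1; this value then travels out along the lines weighted 2 and -2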

  12. Simple ANN Example [Figure: A and B feed two hidden units, an OR unit (weights 1, 1) and an AND unit (weights 0.5, 0.5), which in turn feed the output unit with weights 1 and -1 respectively] This one calculates XOR of the inputs A and B. Each non-input node is an LTU (linear threshold unit), with a threshold of 1. Which means: if the weighted sum of its inputs is >= 1, it fires out a 1; otherwise it fires out a zero.
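To check the wiring just described, here is a small sketch in Python; the weights are read off the figure as reconstructed above, so treat the exact wiring as an assumption.

    def ltu(inputs, weights, threshold=1):
        # Linear threshold unit: fires 1 iff the weighted input sum reaches the threshold
        return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

    def xor_net(a, b):
        h_or  = ltu([a, b], [1, 1])         # hidden unit computing A OR B
        h_and = ltu([a, b], [0.5, 0.5])     # hidden unit computing A AND B
        return ltu([h_or, h_and], [1, -1])  # OR but not AND, i.e. XOR

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))   # prints the XOR truth table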

  13. Computing AND with a NN [Figure: inputs A and B feed the blue output node with weights 0.5 and 0.5] The blue node is the output node. It adds the weighted inputs, and outputs 1 if the result is >= 1, otherwise 0. With these weights, the sum only reaches 1 when both inputs are 1.

  14. Computing OR with a NN [Figure: inputs A and B feed the blue output node with weights 1 and 1] The blue node is the output node. It adds the weighted inputs, and outputs 1 if the result is >= 1, otherwise 0. With these weights, only one of the inputs needs to be a 1 for the output to be 1; the output will be 0 only if both inputs are zero.

  15. Computing NOT with a NN [Figure: input A connects with weight -1, and a bias unit, which always sends a fixed signal of 1, connects with weight 1] This NN computes the NOT of input A. The blue unit is a threshold unit with a threshold of 1, as before. So if A is 1, the weighted sum at the output unit is 0, hence the output is 0; if A is 0, the weighted sum is 1, so the output is 1.
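The three gates from slides 13-15 can be verified with the same LTU rule; a minimal sketch (the function names are mine, not the slides'):

    def ltu(inputs, weights, threshold=1):
        return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

    def and_gate(a, b): return ltu([a, b], [0.5, 0.5])   # sum reaches 1 only if both fire
    def or_gate(a, b):  return ltu([a, b], [1, 1])       # any single 1 reaches the threshold
    def not_gate(a):    return ltu([a, 1], [-1, 1])      # second input is the bias unit's constant 1

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b), not_gate(a))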

  16. So, an NN can compute AND, OR and NOT – so what? It is straightforward to combine ANNs together, with outputs from some becoming the inputs of others, etc. That is, we can combine them just like logic gates on a microchip. E.g. this one computes (A AND B) OR NOT(A OR C) … [Figure: the AND, OR and NOT sub-networks wired together into a single network]
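Composing the sub-networks really is just wiring outputs to inputs; here is a sketch of the example formula built from the same LTU rule (the layout of the original figure is not preserved):

    def ltu(inputs, weights, threshold=1):
        return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

    def circuit(a, b, c):
        # (A AND B) OR NOT(A OR C), each gate being a small LTU sub-network
        a_and_b    = ltu([a, b], [0.5, 0.5])
        a_or_c     = ltu([a, c], [1, 1])
        not_a_or_c = ltu([a_or_c, 1], [-1, 1])   # NOT, using a bias input of 1
        return ltu([a_and_b, not_a_or_c], [1, 1])

    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, circuit(a, b, c))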

  17. And you’re telling me this because … ? Imagine this: an image of a handwritten character is converted into an array of grey levels (the inputs), and there are 26 outputs, one for each character. [Figure: grey-level values such as 7, 2, 0, 3 feed the network; output nodes are labelled a, b, c, d, e, f, …] The weights on the links are chosen such that the output corresponding to the correct letter emits a 1, and all the others emit a 0. This sort of thing is not only possible, but routine: medical diagnosis, wine-tasting, lift control, sales prediction, …
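The forward pass through such a layered network is just the node arithmetic repeated layer by layer. A minimal sketch follows; the layer sizes, random weights and sigmoid activation are illustrative assumptions, not the slide's exact setup:

    import math, random

    def forward(layer_weights, inputs):
        # Propagate the inputs through successive fully connected layers
        activations = inputs
        for W in layer_weights:             # W[j] holds the input weights of node j
            activations = [
                1 / (1 + math.exp(-sum(a * w for a, w in zip(activations, node_w))))
                for node_w in W
            ]
        return activations

    # Toy network: 9 grey-level inputs -> 5 hidden nodes -> 26 outputs (a..z)
    random.seed(0)
    sizes = [9, 5, 26]
    weights = [[[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
               for m, n in zip(sizes, sizes[1:])]
    outputs = forward(weights, [0.1] * 9)
    print(max(range(26), key=outputs.__getitem__))   # index of the most active letter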

  18. Getting the Right Weights Clearly, an application will only be accurate if the weights are right. An ANN starts with randomised weights, and with a database of known examples for training. [Figure: a grey-level pattern known to correspond to a “c” is presented; we want the “c” output to be 1 and all the others 0] If the output is wrong, the weights are adjusted in a simple way which makes it more likely that the ANN will be correct for this input next time.

  19. Training an NN It works like this: [Flowchart: send training pattern in → crunch to outputs → if all correct, STOP; if some wrong, adjust weights and repeat] Present a pattern as a series of numbers at the first layer of nodes. Each node in the next layer does its simple processing, and sends its results to the next layer, and so on, until numbers come out at the output layer. Compare the NN’s output pattern with the known correct pattern for this input. If different, adjust the weights somehow to make it more likely to be correct on this pattern next time; one simple such rule is sketched below.
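One concrete (and deliberately simple) instance of "adjust the weights somehow" is the classic perceptron rule for a single threshold output. This is not the slides' algorithm, just a hedged sketch, with the threshold folded into a trainable bias weight and an arbitrary learning rate:

    import random

    def train(examples, n_inputs, rate=0.2, epochs=100):
        random.seed(1)
        # One weight per input, plus a bias weight (the threshold in disguise)
        w = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]
        for _ in range(epochs):
            for inputs, target in examples:
                x = list(inputs) + [1]   # append the always-on bias input
                out = 1 if sum(a * b for a, b in zip(x, w)) >= 0 else 0
                # Nudge each weight so this pattern is more likely correct next time
                w = [wi + rate * (target - out) * xi for wi, xi in zip(w, x)]
        return w

    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # the AND task
    w = train(data, 2)
    for x, t in data:
        out = 1 if sum(a * b for a, b in zip(list(x) + [1], w)) >= 0 else 0
        print(x, t, out)   # after training, out should match t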

  20. ‘Classical’ NN Training An algorithm called backpropagation (BP) is the classic way of training a neural network. Based on partial differentiation, it prescribes a way to adjust the weights so that the error on the latest pattern is likely to be reduced next time. However, we can instead use an EA to evolve the weights of a NN. In this context, we can see BP as similar to a constructive heuristic approach: it provides fast results, but these results will usually be at a poor local minimum. The first ever application of particle swarm optimisation, on the other hand, showed that it was faster than BP, with better results.

  21. Generalisation The ANN is learning during its training phase. When it is in use, providing decisions/classifications for live cases it hasn’t seen before, we expect a reasonable decision from it; i.e. we want it to generalise well. [Figure: three scatter plots of training As and Bs separated by a decision line, with a white A as an unseen test case; the panels illustrate good generalisation, fairly poor generalisation, and ‘stereotyping?’] Suppose a network was trained with the black As and Bs; the black line is a visualisation of its decision space: it will think anything on one side is an A, and anything on the other side is a B. The white A represents an unseen test case; in the third example, the network thinks it is a B. Coverage and extent of the training data help to avoid poor generalisation. Main point: when an NN generalises well, its results seem sensible, intuitive, and generally more accurate than people.

  22. More next time There are certain things you still need to know about ANNs: example applications; how to avoid poor generalisation; other useful types of NNs, with applications. You will have to wait until Thursday to find out.
