Neural Networks Part 4

Presentation Transcript


  1. Neural Networks Part 4. Dan Simon, Cleveland State University

  2. Outline • Learning Vector Quantization (LVQ) • The Optimal Interpolative Net (OINet)

  3. Learning Vector Quantization (LVQ). Invented by Teuvo Kohonen in 1981. Same architecture as the Kohonen Self-Organizing Map. Supervised learning. (Figure: fully connected network with inputs x1, …, xn, outputs y1, …, ym, and weights wik from input i to output k.)

  4. LVQ Notation: x = [x1, …, xn] = training vector; T(x) = target (the class or category to which x belongs); wk = [w1k, …, wnk] = weight vector of the k-th output unit; a = learning rate.
LVQ Algorithm:
Initialize reference vectors (that is, vectors which represent prototype inputs for each class)
while not (termination criterion)
  for each training vector x
    k0 = argmin_k ||x – wk||
    if the class of unit k0 = T(x) then wk0 ← wk0 + a(x – wk0)
    else wk0 ← wk0 – a(x – wk0)
    end if
  end for
end while
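
A minimal Python/NumPy sketch of this loop; the function name, fixed epoch count, and array layout are illustrative assumptions rather than the slides' Matlab implementation:

```python
import numpy as np

def train_lvq(x_train, targets, weights, classes, alpha=0.1, epochs=20):
    """Basic LVQ training loop following the algorithm above.

    x_train : (q, n) array of training vectors x
    targets : (q,) array of class labels T(x)
    weights : (m, n) array of reference vectors w_k (modified in place)
    classes : (m,) array giving the class represented by each reference vector
    alpha   : learning rate a
    """
    for _ in range(epochs):
        for x, t in zip(x_train, targets):
            k0 = np.argmin(np.linalg.norm(weights - x, axis=1))  # nearest reference vector
            if classes[k0] == t:
                weights[k0] += alpha * (x - weights[k0])   # move w_k0 towards x
            else:
                weights[k0] -= alpha * (x - weights[k0])   # move w_k0 away from x
    return weights
```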

  5. LVQ Example. We have three input classes. Training input x is closest to w2. If x ∈ class 2, then w2 ← w2 + a(x – w2); that is, move w2 towards x. If x ∉ class 2, then w2 ← w2 – a(x – w2); that is, move w2 away from x. (Figure: reference vectors w1, w2, w3, the training input x, and the difference vector x – w2.) LVQ reference vector initialization: use a random selection of training vectors, one from each class; use randomly-generated weight vectors; or use a clustering method (e.g., the Kohonen SOM).

  6. LVQ Example: LVQ1.m. Training data: (1, 1, 0, 0) → Class 1; (0, 0, 0, 1) → Class 2; (0, 0, 1, 1) → Class 2; (1, 0, 0, 0) → Class 1; (0, 1, 1, 0) → Class 2. Final weight vectors: (1.04, 0.57, 0.04, 0.00) and (0.00, 0.30, 0.62, 0.70).

  7. LVQ Example: LVQ2.m Training data from Fausett, p. 190. Four initial weight vectors are at the corners of the training data. Final classification results on the training data, and final weight vectors. 14 classification errors after 20 iterations.

  8. LVQ Example: LVQ3.m Training data from Fausett, p. 190. 20 initial weight vectors are randomly chosen with random classes. Final classification results on the training data, and final weight vectors. Four classification errors after 600 iterations. In practice it would be better to use our training data to assign the classes of the initial weight vectors.

  9. LVQ Extensions: • The graphical illustration of LVQ gives us some ideas for algorithmic modifications. • Always move the correct vector towards x, and move the closest p vectors that are incorrect away from x. • Move incorrect vectors away from x only if they are within a distance threshold. • Popular modifications are called LVQ2, LVQ2.1, and LVQ3 (not to be confused with the names of our Matlab programs). (Figure: the same w1, w2, w3, x illustration as before.)
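
A hedged one-step sketch of the first two modifications above; the parameters p and dist_thresh are illustrative, and this is not exactly LVQ2, LVQ2.1, or LVQ3:

```python
import numpy as np

def lvq_update_extended(x, t, weights, classes, alpha=0.1, p=2, dist_thresh=1.0):
    """One update step: attract the nearest correct vector, repel nearby incorrect ones."""
    d = np.linalg.norm(weights - x, axis=1)
    correct = np.where(classes == t)[0]
    incorrect = np.where(classes != t)[0]
    # always move the nearest correct reference vector towards x
    k_best = correct[np.argmin(d[correct])]
    weights[k_best] += alpha * (x - weights[k_best])
    # move the p closest incorrect vectors away from x, but only if within the threshold
    for k in incorrect[np.argsort(d[incorrect])][:p]:
        if d[k] < dist_thresh:
            weights[k] -= alpha * (x - weights[k])
    return weights
```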

  10. LVQ Applications to Control Systems: • Most LVQ applications involve classification • Any classification algorithm can be adapted for control • Switching control – switch between control algorithms based on the system features (input type, system parameters, objectives, failure type, …) • Training rules for a fuzzy controller – if input 1 is Ai and input 2 is Bk, then output is Cik – LVQ can be used to classify x and y • User intent recognition – for example, a brain machine interface (BMI) can recognize what the user is trying to do

  11. The optimal interpolative net (OINet) – March 1992. Pattern classification; M classes; if x ∈ class k, then yi = δki (that is, the targets are one-hot). The network grows during training, but only as large as needed. (Figure: N inputs x1, …, xN; q hidden neurons; M outputs y1, …, yM; input weights vi and output weights w.) vi = weight vector to the i-th hidden neuron; vi is N-dimensional. vi = prototype; {vi} ⊂ {xi}.
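
A sketch of the forward pass implied by this architecture, assuming a Gaussian radial basis function ρ (the particular ρ and sigma are assumptions):

```python
import numpy as np

def rho(r, sigma=1.0):
    """Radial basis function; a Gaussian is assumed here purely for illustration."""
    return np.exp(-r**2 / (2.0 * sigma**2))

def oinet_forward(x, V, W):
    """Forward pass of the OINet.

    x : (N,) input vector
    V : (p, N) prototype vectors v_i (hidden-neuron input weights)
    W : (p, M) output weight matrix
    Returns the (M,) output vector y.
    """
    h = rho(np.linalg.norm(V - x, axis=1))  # hidden activations rho(||v_i - x||)
    return h @ W                            # y_k = sum_i h_i * w_ik
```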

  12. Suppose we have q training samples: y(xi) = yi, for i ∈ {1, …, q}. Then the output weights satisfy GW = Y, where Gik = ρ(||xi – xk||) and Y is the matrix whose rows are the targets yi. Note: if xi = xk for some i ≠ k, then G is singular.

  13. The OINet works by selecting a set of {xi} to use as input weights. These are the prototype vectors {vi}, i = 1, …, p. Choose a set of {xi} to optimize the output weights W. These are the subprototypes {zi}, i = 1, …, l. Include xi in {zi} only if needed to correctly classify xi. Include xi in {vi} only if G is not ill-conditioned, and only if it decreases the total classification error. Use l inputs for training. Use p hidden neurons. Gik = ρ(||vi – zk||), where ρ is the hidden-layer basis function.
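
A batch least-squares sketch of fitting W under these definitions (the paper computes W recursively, as the following slides show; the Gaussian ρ and sigma are assumptions):

```python
import numpy as np

def fit_output_weights(Z, V, Y, sigma=1.0):
    """Fit W to minimize ||G W - Y||, where G[k, i] = rho(||v_i - z_k||).

    Z : (l, N) subprototypes z_k (the training inputs used to fit W)
    V : (p, N) prototypes v_i (hidden-neuron weight vectors)
    Y : (l, M) target outputs (y_i = delta_ki when x belongs to class k)
    """
    dists = np.linalg.norm(Z[:, None, :] - V[None, :, :], axis=2)   # (l, p) distances
    G = np.exp(-dists**2 / (2.0 * sigma**2))                        # basis-function matrix
    W, *_ = np.linalg.lstsq(G, Y, rcond=None)                       # (p, M) output weights
    return W
```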

  14. Suppose we have trained the network for a certain l and p. All training inputs considered so far have been correctly classified. We then consider xi, the next input in the training set. Is xi correctly classified with the existing OINet? If so, everything is fine, and we move on to the next training input. If not, then we need to add xi to the subprototype set {zi} and obtain a new set of output weights W. We also consider adding xi to the prototype set {vi}, but only if it does not make G ill-conditioned, and only if it reduces the error by enough.

  15. Suppose we have trained the network for a certain l and p. Suppose xi, the next input in the training set, is not correctly classified. We need to add xi to the subprototype set {zi} and retrain W. This is going to get expensive if we have lots of data, and if we have to perform a new matrix inversion every time we add a subprototype. Note: Equation numbers from here on refer to those in Sin & deFigueiredo

  16. (Eq. 10) We only need scalar division to compute the new inverse, because we already know the old inverse!
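
A sketch of the idea using the standard bordered-matrix inverse update (an assumption; the exact form of Eq. 10 in the paper may differ): given A⁻¹, appending one row and one column to the matrix requires only a single scalar division to get the new inverse.

```python
import numpy as np

def bordered_inverse(A_inv, b, c, d):
    """Inverse of the bordered matrix [[A, b], [c^T, d]] using the known A_inv.

    Only the scalar Schur complement s = d - c^T A^{-1} b has to be divided by.
    """
    u = A_inv @ b                 # A^{-1} b
    v = c @ A_inv                 # c^T A^{-1}
    s = d - c @ u                 # scalar Schur complement (small s => ill conditioning)
    top = A_inv + np.outer(u, v) / s
    return np.block([[top,             -u[:, None] / s],
                     [-v[None, :] / s,  np.array([[1.0 / s]])]])

# quick numerical check with arbitrary well-conditioned data
A = np.array([[2.0, 0.3], [0.1, 1.5]])
b = np.array([0.2, 0.4]); c = np.array([0.5, 0.1]); d = 3.0
M = np.block([[A, b[:, None]], [c[None, :], np.array([[d]])]])
assert np.allclose(bordered_inverse(np.linalg.inv(A), b, c, d), np.linalg.inv(M))
```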

  17. We can implement this equation, but we can also find a recursive solution.

  18. We have a recursive equation for the new weight matrix.

  19. Now we have to decide if we should add xi to the prototype set {vi}. We will do this if it does not make G ill-conditioned, and if it reduces the error by enough. I wonder if we can think of something clever to avoid the new matrix inversion …

  20. Homework: Derive this

  21. We already have everything on the right side, so we can derive the new inverse with only scalar division (no additional matrix inversion). A small scalar divisor indicates ill conditioning, so don't use xi as a prototype if the divisor falls below a threshold. Even if it exceeds the threshold, don't use xi as a prototype if the error only decreases by a small amount, because it won't be worth the extra network complexity. Before we check the error decrease, let's see if we can find a recursive formula for W …

  22. We have a recursive equation for the new weight matrix.

  23. Back to computing e (see Eq. 38): Suppose Ax = b, and dim(x) < dim(b), so system is over-determined Least squares: Now suppose that we add another column to A and another element to x. We have more degrees of freedom in x, so the approximation error should decrease.

  24. Matrix inversion lemma: (A + UCV)⁻¹ = A⁻¹ – A⁻¹U(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹. But notice what happens when we apply it here:
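
A quick numerical check of the lemma as stated above (arbitrary sizes and data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # well-conditioned A
U = 0.1 * rng.standard_normal((n, k))
C = np.eye(k)
V = 0.1 * rng.standard_normal((k, n))

A_inv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ C @ V)
rhs = A_inv - A_inv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U) @ V @ A_inv
assert np.allclose(lhs, rhs)
```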

  25. We have a very simple formula for the error decrease due to adding a prototype. Don't add the prototype unless the ratio of the error decrease to the current error exceeds a second threshold.

  26. The OINet Algorithm. Training data {xi} → {yi}, i ∈ {1, …, q}. Initialize the prototype set: V = {x1}, and # of prototypes = p = |V| = 1. Initialize the subprototype set: Z = {x1}, and # of subprototypes = l = |Z| = 1. (Figure: the initial network with N inputs x1, …, xN, 1 hidden neuron with input weights v11, …, vN1, and M outputs y1, …, yM with weights w11, …, w1M. What are the dimensions of these quantities?)

  27. Begin outer loop: loop until all training patterns are correctly classified.
n = q – 1
Re-index {x2 … xq} and {y2 … yq} from 1 to n
For i = 1 to n (training data loop)
  Send xi through the network. If correctly classified, continue the loop.
  Otherwise, begin subprototype addition:
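
A Python skeleton of this control structure; classify and add_subprototype are hypothetical stand-ins for the recursions on the surrounding slides, so only the loop logic is shown:

```python
def train_oinet(x_train, y_train, classify, add_subprototype, max_passes=100):
    """Outer OINet training loop (structure only).

    classify(x)            -> predicted class using the current network
    add_subprototype(x, y) -> add x to Z, retrain W recursively, and possibly add x to V
    """
    # x_train[0] initializes both V and Z (previous slide); re-index the remaining q-1 samples
    xs, ys = x_train[1:], y_train[1:]
    for _ in range(max_passes):
        all_correct = True
        for x, y in zip(xs, ys):
            if classify(x) == int(y.argmax()):    # targets are one-hot: y_i = delta_ki
                continue                          # correctly classified; move on
            all_correct = False
            add_subprototype(x, y)                # begin subprototype addition
        if all_correct:
            break                                 # all training patterns correctly classified
```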

  28. Homework: • Implement the OINet using FPGA technology for classifying subatomic particles using experimental data. You may need to build your own particle accelerator to collect data. Be careful not to create any black holes. • Find the typo in Sin and deFigueiredo’s original OINet paper.

  29. References • L. Fausett, Fundamentals of Neural Networks, Prentice Hall, 1994 • S. Sin and R. deFigueiredo, "An evolution-oriented learning algorithm for the optimal interpolative net," IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 315–323, March 1992
