
Warm-up example (1)



  1. Warm-up example (1)
  Having the well-known XOR problem and a NN for its approximation, answer the following questions (one possible configuration is sketched below):
  • How many hidden layers would you use?
  • How many hidden units per layer?
  • How many connections would your net have?
  • How would you select the initial weights of the connections?
  • When would you stop the iterations of the error back-propagation algorithm?
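A minimal sketch of one possible answer, in the old-style Neural Network Toolbox notation used later in these slides: one hidden layer with two tansig units (4 + 2 hidden-layer weights and biases plus 2 + 1 output weights and bias, i.e. 9 connections in total). The variable names and parameter values are illustrative assumptions, not from the slide.

  P = [0 0 1 1; 0 1 0 1];                               % the four XOR input patterns (columns)
  T = [0 1 1 0];                                        % desired outputs
  net = newff(minmax(P), [2 1], {'tansig' 'purelin'});  % 1 hidden layer, 2 hidden units
  net = init(net);                                      % small random initial weights (default init)
  net.trainParam.epochs = 1000;                         % stop after a fixed number of epochs ...
  net.trainParam.goal   = 1e-3;                         % ... or once the error goal is reached
  net = train(net, P, T);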

  2. Warm-up example (2)
  What would you do if the trained neural net does not generate the desired outputs and behaves as follows:
  • the updated weights after an iteration of the error back-propagation procedure are almost identical to the weights before that iteration, but the output is not the desired one?
  • the number of iterations exceeds a pre-defined threshold?
  • the output error seems to be increasing instead of decreasing?

  3. Item-by-item learning (sequential)
  % plain sequential learning: weights are updated after every sample
  for epoch = 1:num_epochs
    for t = 1:numSamples
      % forward pass
      % backward pass
    end
  end
  % sequential learning with the training data reshuffled at every epoch
  for epoch = 1:num_epochs
    % shuffle training data
    perm = randperm(numSamples);
    x = x(perm);
    d = d(perm);
    for t = 1:numSamples
      % forward pass
      % backward pass
    end
  end

  4. Batch learning
  bs = ...;                          % batch size (set as needed)
  for epoch = 1:num_epochs
    for s = 1:bs:numSamples
      % zero the in-batch gradient sums here
      for b = 1:bs
        t = s + b - 1;
        % forward pass
        % backward pass
        % accumulate the in-batch sums of the BP deltas
      end
      % update weights and biases with the averaged gradients
      Wi = Wi - LR * (sumWi / bs);   % etc.
    end
  end
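As a concrete illustration of the batch loop above, here is a minimal sketch for a single linear neuron; the toy data, the learning rate LR and the batch size bs are assumptions made for the example, not part of the original slide.

  numSamples = 20;  R = 3;
  x = randn(R, numSamples);                        % toy inputs (one column per sample)
  d = [1 -2 0.5]*x + 0.1*randn(1, numSamples);     % toy targets
  W = 0.1*randn(1, R);  b = 0;                     % initial weights and bias
  LR = 0.05;  bs = 4;  num_epochs = 50;
  for epoch = 1:num_epochs
    for s = 1:bs:numSamples
      sumW = zeros(1, R);  sumB = 0;               % zero the in-batch sums
      for k = s:s+bs-1
        y = W*x(:,k) + b;                          % forward pass
        e = y - d(k);                              % output error
        sumW = sumW + e*x(:,k)';                   % accumulate gradient w.r.t. weights
        sumB = sumB + e;                           % accumulate gradient w.r.t. bias
      end
      W = W - LR*(sumW/bs);                        % update with averaged gradients
      b = b - LR*(sumB/bs);
    end
  end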

  5. Generalization
  • Overfitting, network pruning
  [Figure: (c) The MathWorks (Matlab help)]

  6. Strategies
  • Regularization (the Bias/Variance Dilemma)
    1) "trainbr" (Bayesian regularization)
    2) Specific adjustment of weights: many techniques suggested, e.g. net.performFcn = 'msereg' + the corresponding parameters, giving the regularized performance index MSE_REG = A * MSE + (1 - A) * MSW, where MSW = (1/N) * sum(w^2). Decreases the weights and biases (see the sketch below).
  • Early stopping
    3 sets (training, validation, testing; 40:30:30)
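A minimal sketch of both regularization options, assuming the old-style Neural Network Toolbox API used elsewhere in these slides; the toy data, the layer sizes and the performance-ratio value are illustrative assumptions.

  p = rand(3, 100);  t = sum(p) + 0.05*randn(1, 100);             % toy data (assumption for the sketch)

  % 1) Bayesian regularization via the training function
  net = newff(minmax(p), [5 1], {'tansig' 'purelin'}, 'trainbr');
  net = train(net, p, t);

  % 2) regularized performance index MSE_REG = A*MSE + (1-A)*MSW
  net = newff(minmax(p), [5 1], {'tansig' 'purelin'}, 'trainlm');
  net.performFcn = 'msereg';
  net.performParam.ratio = 0.5;                                   % the A (performance ratio) parameter
  net = train(net, p, t);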

  7. Early stopping
  • After some training, calculate the validation error (synaptic weights fixed)
  • Continue either with training or testing
  [Figure: MSE vs. number of epochs for the training and validation samples, with the early stopping point marked]

  8. Bayesian regularization
  [Figure: (c) The MathWorks (Matlab help)]

  9. Early stopping
  [Figure: (c) The MathWorks (Matlab help)]

  10. Matlab example 1/4
  The goal is to determine serum cholesterol levels from measurements of the spectral content of a blood sample. There are 264 patients for whom we have measurements at 21 wavelengths of the spectrum. For the same patients we also have measurements of hdl, ldl, and vldl cholesterol levels, based on serum separation.
  load choles_all
  [pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
  [ptrans,transMat] = prepca(pn,0.001);
  [R,Q] = size(ptrans)          % R = 4, Q = 264
  iitst = 2:4:Q;
  iival = 4:4:Q;
  iitr = [1:4:Q 3:4:Q];
  val.P = ptrans(:,iival);  val.T = tn(:,iival);
  test.P = ptrans(:,iitst); test.T = tn(:,iitst);
  ptr = ptrans(:,iitr);     ttr = tn(:,iitr);

  11. Matlab example 2/4
  net = newff(minmax(ptr),[5 3],{'tansig' 'purelin'},'trainlm');
  [net,tr] = train(net,ptr,ttr,[],[],val,test);
    TRAINLM, Epoch 0/100, MSE 3.11023/0, Gradient 804.959/1e-10
    TRAINLM, Epoch 15/100, MSE 0.330295/0, Gradient 104.219/1e-10
    TRAINLM, Validation stop.
  plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
  legend('Training','Validation','Test',-1);
  ylabel('Squared Error'); xlabel('Epoch')
  an = sim(net,ptrans);
  a = poststd(an,meant,stdt);
  for i=1:3
    figure(i)
    [m(i),b(i),r(i)] = postreg(a(i,:),t(i,:));
  end

  12. Matlab example 3/4
  [Figure: (c) The MathWorks (Matlab help)]

  13. Matlab example 4/4
  Regression of network outputs against targets: ldl, R = 0.862; hdl, R = 0.886; vldl, R = 0.563
  [Figure: (c) The MathWorks (Matlab help)]

  14. Cover’s separability theorem
  • A pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space. With input X = (x1, x2), compare the mappings below (a quick numerical check follows):
  • phi(X) = [x1, x2] (2 basis functions): decision boundary a1*x1 + a2*x2 + a0 = 0
  • phi(X) = [x1, x2, x1^2, x2^2] (4 basis functions): a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a0 = 0
  • phi(X) = [x1, x2, x1^2, x2^2, x1*x2] (5 basis functions): a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2 + a0 = 0
  [Figure: scatter of two classes (X and O) separated by boundaries of increasing complexity]
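A quick numerical check in the Matlab style of these slides: XOR is not linearly separable in (x1, x2), but it is once the product term x1*x2 is added. The particular separating plane x1 + x2 - 2*x1*x2 = 0.5 is a hand-picked illustration, not from the slide.

  X = [0 0; 0 1; 1 0; 1 1];                        % the four XOR patterns
  y = [0; 1; 1; 0];                                % XOR targets
  f = X(:,1) + X(:,2) - 2*X(:,1).*X(:,2) - 0.5;    % linear in the mapped features [x1 x2 x1*x2]
  disp([y, f > 0])                                 % the sign of f reproduces the XOR labels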

  15. Radial Basis Function (RBF) networks
  • Architecture: [Figure]
  • Gaussian basis function radbas(n) = exp(-n^2), shown for spread s = 0.5, 1.0, 1.5 (plotted in the sketch below)
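A small sketch of how curves like those on the slide might be reproduced; treating the spread s as a width parameter of the Gaussian, exp(-(n/s)^2), is an assumption about how the plot was generated.

  n = -3:0.01:3;
  plot(n, exp(-(n/0.5).^2), n, exp(-(n/1.0).^2), n, exp(-(n/1.5).^2));
  legend('s = 0.5', 's = 1.0', 's = 1.5');
  xlabel('n'); ylabel('Gaussian basis function');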

  16. Structure of RBF networks
  • Input layer
  • Hidden layer: the hidden units provide a set of basis functions; the higher the dimension, the more likely the classes are linearly separable (i.e. separable by a linear combination of the basis functions)
  • Output layer: linear combination of the hidden functions

  17. XOR example
  x1  x2  y   phi1(x)  phi2(x)
  0   0   0   0.13     1
  0   1   1   0.36     0.36
  1   0   1   0.36     0.36
  1   1   0   1        0.13
  (the phi values are reproduced in the sketch below)
  [Figure: the patterns shown in the (x1, x2) and (phi1(x), phi2(x)) planes]
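A minimal sketch that reproduces the phi values in the table; the Gaussian centres t1 = [1 1] and t2 = [0 0] with unit width are an assumption inferred from the numbers (exp(-2) ≈ 0.13, exp(-1) ≈ 0.37, rounded to 0.36 on the slide).

  X  = [0 0; 0 1; 1 0; 1 1];                      % the four XOR input patterns (rows)
  t1 = [1 1];  t2 = [0 0];                        % assumed Gaussian centres
  phi1 = exp(-sum((X - repmat(t1,4,1)).^2, 2));   % ~ [0.13; 0.37; 0.37; 1.00]
  phi2 = exp(-sum((X - repmat(t2,4,1)).^2, 2));   % ~ [1.00; 0.37; 0.37; 0.13]
  disp([X, phi1, phi2])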

  18. This does the trick
  [Figure: (c) The MathWorks (Matlab help)]

  19. RBF, well-estimated
  RBF in Matlab (usage sketch below):
  net = newrbe(P,T,SPREAD)       % exact design: one radial basis neuron per training sample
  net = newrb(P,T,GOAL,SPREAD)   % incremental design: adds neurons until the error goal GOAL is reached
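A minimal usage sketch for newrb, assuming a toy 1-D regression problem; the data and the GOAL and SPREAD values are illustrative.

  P = -1:0.1:1;                             % inputs
  T = sin(2*pi*P) + 0.1*randn(size(P));     % noisy targets
  net = newrb(P, T, 0.01, 1.0);             % GOAL = 0.01, SPREAD = 1.0
  Y = sim(net, P);                          % network response on the training inputs
  plot(P, T, 'o', P, Y, '-');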

  20. RBF, too few BF

  21. RBF, too small stdev

  22. RBF, too large stdev

  23. NN taxonomy 1/2
  1) Paradigm
  • Supervised
  • Unsupervised
  2) Learning rule
  • Error-correction
  • Memory-based
  • Hebbian
  • Competitive
  • Boltzmann
  According to: Jain, A. K. and Mao, J. (1996). Artificial Neural Networks: A Tutorial, IEEE Computer, vol. 29, no. 3, pp. 31-44.

  24. NN taxonomy 2/2
  3) Learning algorithm
  • Perceptron
  • BP
  • Kohonen SOM, ...
  4) Network architecture
  • Feed-forward (FF)
  • Recurrent (REC)
  5) Task
  • Pattern classification
  • Time-series modeling, ...
