
Warm-up example (1)



  1. Warm-up example (1)
  Having the well-known XOR problem and a NN for its approximation, answer the following questions (one possible configuration is sketched below):
  • How many hidden layers would you use?
  • How many hidden units per layer?
  • How many connections would your net have?
  • How would you select the initial weights of the connections?
  • When would you stop the iterations of the error back-propagation algorithm?
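A minimal sketch of one possible answer, in the old-style Neural Network Toolbox notation used later in these slides: one hidden layer with two tansig units (4 + 2 hidden-layer weights and biases plus 2 + 1 output weights and bias, i.e. 9 connections in total). The variable names and parameter values are illustrative assumptions, not from the slide.

  P = [0 0 1 1; 0 1 0 1];                               % the four XOR input patterns (columns)
  T = [0 1 1 0];                                        % desired outputs
  net = newff(minmax(P), [2 1], {'tansig' 'purelin'});  % 1 hidden layer, 2 hidden units
  net = init(net);                                      % small random initial weights (default init)
  net.trainParam.epochs = 1000;                         % stop after a fixed number of epochs ...
  net.trainParam.goal   = 1e-3;                         % ... or once the error goal is reached
  net = train(net, P, T);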

  2. Warm-up example (2)
  What would you do if the trained neural net does not generate the desired outputs and behaves as follows:
  • the updated weights after an iteration of the error back-propagation procedure are almost identical to the weights before that iteration, but the output is not the desired one?
  • the number of iterations exceeds a pre-defined threshold?
  • the output error seems to be increasing instead of decreasing?

  3. Item-by-item learning (sequential)
  % plain sequential learning: weights are updated after every sample
  for epoch = 1:num_epochs
    for t = 1:numSamples
      % forward pass
      % backward pass
    end
  end
  % sequential learning with the training data reshuffled at every epoch
  for epoch = 1:num_epochs
    % shuffle training data
    perm = randperm(numSamples);
    x = x(perm);
    d = d(perm);
    for t = 1:numSamples
      % forward pass
      % backward pass
    end
  end

  4. Batch learning
  bs = ...;                          % batch size (set as needed)
  for epoch = 1:num_epochs
    for s = 1:bs:numSamples
      % zero the in-batch gradient sums here
      for b = 1:bs
        t = s + b - 1;
        % forward pass
        % backward pass
        % accumulate the in-batch sums of the BP deltas
      end
      % update weights and biases with the averaged gradients
      Wi = Wi - LR * (sumWi / bs);   % etc.
    end
  end
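As a concrete illustration of the batch loop above, here is a minimal sketch for a single linear neuron; the toy data, the learning rate LR and the batch size bs are assumptions made for the example, not part of the original slide.

  numSamples = 20;  R = 3;
  x = randn(R, numSamples);                        % toy inputs (one column per sample)
  d = [1 -2 0.5]*x + 0.1*randn(1, numSamples);     % toy targets
  W = 0.1*randn(1, R);  b = 0;                     % initial weights and bias
  LR = 0.05;  bs = 4;  num_epochs = 50;
  for epoch = 1:num_epochs
    for s = 1:bs:numSamples
      sumW = zeros(1, R);  sumB = 0;               % zero the in-batch sums
      for k = s:s+bs-1
        y = W*x(:,k) + b;                          % forward pass
        e = y - d(k);                              % output error
        sumW = sumW + e*x(:,k)';                   % accumulate gradient w.r.t. weights
        sumB = sumB + e;                           % accumulate gradient w.r.t. bias
      end
      W = W - LR*(sumW/bs);                        % update with averaged gradients
      b = b - LR*(sumB/bs);
    end
  end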

  5. Generalization
  • Overfitting, network pruning
  [Figure: (c) The MathWorks (Matlab help)]

  6. Strategies
  • Regularization (the Bias/Variance Dilemma)
    1) "trainbr" (Bayesian regularization)
    2) Specific adjustment of weights: many techniques suggested, e.g. net.performFcn = 'msereg' + the corresponding parameters, giving the regularized performance index MSE_REG = A * MSE + (1 - A) * MSW, where MSW = (1/N) * sum(w^2). Decreases the weights and biases (see the sketch below).
  • Early stopping
    3 sets (training, validation, testing; 40:30:30)
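A minimal sketch of both regularization options, assuming the old-style Neural Network Toolbox API used elsewhere in these slides; the toy data, the layer sizes and the performance-ratio value are illustrative assumptions.

  p = rand(3, 100);  t = sum(p) + 0.05*randn(1, 100);             % toy data (assumption for the sketch)

  % 1) Bayesian regularization via the training function
  net = newff(minmax(p), [5 1], {'tansig' 'purelin'}, 'trainbr');
  net = train(net, p, t);

  % 2) regularized performance index MSE_REG = A*MSE + (1-A)*MSW
  net = newff(minmax(p), [5 1], {'tansig' 'purelin'}, 'trainlm');
  net.performFcn = 'msereg';
  net.performParam.ratio = 0.5;                                   % the A (performance ratio) parameter
  net = train(net, p, t);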

  7. Early stopping
  • After some training, calculate the validation error (synaptic weights fixed)
  • Continue either with training or testing
  [Figure: MSE vs. number of epochs for the training and validation samples, with the early stopping point marked]

  8. Bayesian regularization
  [Figure: (c) The MathWorks (Matlab help)]

  9. Early stopping
  [Figure: (c) The MathWorks (Matlab help)]

  10. Matlab example 1/4
  The goal is to determine serum cholesterol levels from measurements of the spectral content of a blood sample. There are 264 patients for whom we have measurements at 21 wavelengths of the spectrum. For the same patients we also have measurements of hdl, ldl, and vldl cholesterol levels, based on serum separation.
  load choles_all
  [pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
  [ptrans,transMat] = prepca(pn,0.001);
  [R,Q] = size(ptrans)          % R = 4, Q = 264
  iitst = 2:4:Q;
  iival = 4:4:Q;
  iitr = [1:4:Q 3:4:Q];
  val.P = ptrans(:,iival);  val.T = tn(:,iival);
  test.P = ptrans(:,iitst); test.T = tn(:,iitst);
  ptr = ptrans(:,iitr);     ttr = tn(:,iitr);

  11. Matlab example 2/4
  net = newff(minmax(ptr),[5 3],{'tansig' 'purelin'},'trainlm');
  [net,tr] = train(net,ptr,ttr,[],[],val,test);
    TRAINLM, Epoch 0/100, MSE 3.11023/0, Gradient 804.959/1e-10
    TRAINLM, Epoch 15/100, MSE 0.330295/0, Gradient 104.219/1e-10
    TRAINLM, Validation stop.
  plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
  legend('Training','Validation','Test',-1);
  ylabel('Squared Error'); xlabel('Epoch')
  an = sim(net,ptrans);
  a = poststd(an,meant,stdt);
  for i=1:3
    figure(i)
    [m(i),b(i),r(i)] = postreg(a(i,:),t(i,:));
  end

  12. Matlab example 3/4
  [Figure: (c) The MathWorks (Matlab help)]

  13. Matlab example 4/4
  Regression of network outputs against targets: ldl, R = 0.862; hdl, R = 0.886; vldl, R = 0.563
  [Figure: (c) The MathWorks (Matlab help)]

  14. Cover’s separability theorem
  • A pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space. With input X = (x1, x2), compare the mappings below (a quick numerical check follows):
  • phi(X) = [x1, x2] (2 basis functions): decision boundary a1*x1 + a2*x2 + a0 = 0
  • phi(X) = [x1, x2, x1^2, x2^2] (4 basis functions): a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a0 = 0
  • phi(X) = [x1, x2, x1^2, x2^2, x1*x2] (5 basis functions): a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2 + a0 = 0
  [Figure: scatter of two classes (X and O) separated by boundaries of increasing complexity]
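A quick numerical check in the Matlab style of these slides: XOR is not linearly separable in (x1, x2), but it is once the product term x1*x2 is added. The particular separating plane x1 + x2 - 2*x1*x2 = 0.5 is a hand-picked illustration, not from the slide.

  X = [0 0; 0 1; 1 0; 1 1];                        % the four XOR patterns
  y = [0; 1; 1; 0];                                % XOR targets
  f = X(:,1) + X(:,2) - 2*X(:,1).*X(:,2) - 0.5;    % linear in the mapped features [x1 x2 x1*x2]
  disp([y, f > 0])                                 % the sign of f reproduces the XOR labels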

  15. Radial Basis Function (RBF) networks
  • Architecture: [Figure]
  • Gaussian basis function radbas(n) = exp(-n^2), shown for spread s = 0.5, 1.0, 1.5 (plotted in the sketch below)
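A small sketch of how curves like those on the slide might be reproduced; treating the spread s as a width parameter of the Gaussian, exp(-(n/s)^2), is an assumption about how the plot was generated.

  n = -3:0.01:3;
  plot(n, exp(-(n/0.5).^2), n, exp(-(n/1.0).^2), n, exp(-(n/1.5).^2));
  legend('s = 0.5', 's = 1.0', 's = 1.5');
  xlabel('n'); ylabel('Gaussian basis function');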

  16. Structure of RBF networks
  • Input layer
  • Hidden layer: the hidden units provide a set of basis functions; the higher the dimension, the more likely the classes are linearly separable (i.e. separable by a linear combination of the basis functions)
  • Output layer: linear combination of the hidden functions

  17. XOR example
  x1  x2  y   phi1(x)  phi2(x)
  0   0   0   0.13     1
  0   1   1   0.36     0.36
  1   0   1   0.36     0.36
  1   1   0   1        0.13
  (the phi values are reproduced in the sketch below)
  [Figure: the patterns shown in the (x1, x2) and (phi1(x), phi2(x)) planes]
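A minimal sketch that reproduces the phi values in the table; the Gaussian centres t1 = [1 1] and t2 = [0 0] with unit width are an assumption inferred from the numbers (exp(-2) ≈ 0.13, exp(-1) ≈ 0.37, rounded to 0.36 on the slide).

  X  = [0 0; 0 1; 1 0; 1 1];                      % the four XOR input patterns (rows)
  t1 = [1 1];  t2 = [0 0];                        % assumed Gaussian centres
  phi1 = exp(-sum((X - repmat(t1,4,1)).^2, 2));   % ~ [0.13; 0.37; 0.37; 1.00]
  phi2 = exp(-sum((X - repmat(t2,4,1)).^2, 2));   % ~ [1.00; 0.37; 0.37; 0.13]
  disp([X, phi1, phi2])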

  18. This does the trick
  [Figure: (c) The MathWorks (Matlab help)]

  19. RBF, well-estimated
  RBF in Matlab (usage sketch below):
  net = newrbe(P,T,SPREAD)       % exact design: one radial basis neuron per training sample
  net = newrb(P,T,GOAL,SPREAD)   % incremental design: adds neurons until the error goal GOAL is reached
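A minimal usage sketch for newrb, assuming a toy 1-D regression problem; the data and the GOAL and SPREAD values are illustrative.

  P = -1:0.1:1;                             % inputs
  T = sin(2*pi*P) + 0.1*randn(size(P));     % noisy targets
  net = newrb(P, T, 0.01, 1.0);             % GOAL = 0.01, SPREAD = 1.0
  Y = sim(net, P);                          % network response on the training inputs
  plot(P, T, 'o', P, Y, '-');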

  20. RBF, too few BF

  21. RBF, too small stdev

  22. RBF, too large stdev

  23. NN taxonomy 1/2
  1) Paradigm
  • Supervised
  • Unsupervised
  2) Learning rule
  • Error-correction
  • Memory-based
  • Hebbian
  • Competitive
  • Boltzmann
  According to: Jain, A. K. and Mao, J. (1996). Artificial Neural Networks: A Tutorial, IEEE Computer, vol. 29, no. 3, pp. 31-44.

  24. NN taxonomy 2/2
  3) Learning algorithm
  • Perceptron
  • BP
  • Kohonen SOM, ...
  4) Network architecture
  • Feed-forward (FF)
  • Recurrent (REC)
  5) Task
  • Pattern classification
  • Time-series modeling, ...
