
Supervised Learning


Presentation Transcript


  1. Business School, Institute of Business Informatics • Supervised Learning • Uwe Lämmel • www.wi.hs-wismar.de/~laemmel • U.laemmel@wi.hs-wismar.de

  2. Neural Networks • Idea • Artificial Neuron & Network • Supervised Learning • Unsupervised Learning • Data Mining – other Techniques

  3. Feed-Forward Networks • Perceptron – AdaLinE – LTU • Multi-layer networks • Backpropagation algorithm • Pattern recognition • Data preparation • Examples: Bank Customer, Customer Relationship

  4. Connections • Feed-back / auto-associative: links from an (output) layer back to a previous (hidden/input) layer, or all neurons fully connected to each other (e.g. Hopfield network) • Feed-forward: input layer, hidden layer(s), output layer

  5. Perceptron – AdaLinE – TLU • a class of neural networks with a special architecture: one layer of trainable links only • AdaLinE = Adaptive Linear Element • TLU = Threshold Linear Unit

  6. Papert, Minsky and Perceptron - History "Once upon a time two daughter sciences were born to the new science of cybernetics. One sister was natural, with features inherited from the study of the brain, from the way nature does things. The other was artificial, related from the beginning to the use of computers. … But Snow White was not dead. What Minsky and Papert had shown the world as proof was not the heart of the princess; it was the heart of a pig." Seymour Papert, 1988

  7. Perception • [figure: picture, a mapping layer connected by fixed 1-1 links, and a trainable, fully connected output layer] • Perception: the first step of recognition, becoming aware of something via the senses

  8. Perceptron • Input layer: binary input, passed through, no trainable links • Propagation function: net_j = Σ_i o_i·w_ij • Activation function: o_j = a_j = 1 if net_j ≥ θ_j, 0 otherwise • A perceptron can learn every function it can represent, and it does so in finite time (perceptron convergence theorem, F. Rosenblatt).
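As an illustration of the propagation and activation functions above, here is a minimal Java sketch of a single perceptron neuron j; the input pattern, weights, and threshold are made-up example values, not taken from the slides.

```java
// Minimal sketch of a perceptron neuron j:
// net_j = sum_i(o_i * w_ij), o_j = 1 if net_j >= theta_j, else 0.
public class PerceptronNeuron {
    static double net(double[] o, double[] w) {
        double net = 0.0;
        for (int i = 0; i < o.length; i++) net += o[i] * w[i];
        return net;
    }
    static int activation(double net, double theta) {
        return net >= theta ? 1 : 0;        // binary threshold activation
    }
    public static void main(String[] args) {
        double[] input = {1, 0, 1};         // binary input pattern (example)
        double[] weights = {0.5, -0.3, 0.8};
        double theta = 1.0;                 // threshold of neuron j
        System.out.println(activation(net(input, weights), theta)); // prints 1
    }
}
```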

  9. Linearly separable? • Neuron j should be 0 iff both neurons 1 and 2 have the same value (o1 = o2), otherwise 1: net_j = o1·w1j + o2·w2j • the four input patterns require: 0·w1j + 0·w2j < θj, 0·w1j + 1·w2j ≥ θj, 1·w1j + 0·w2j ≥ θj, 1·w1j + 1·w2j < θj • the first inequality gives θj > 0, the second and third give w1j + w2j ≥ 2θj > θj, which contradicts the fourth: no such weights exist.

  10. Linearly separable • [figure: the o1-o2 plane with the points (0,0), (0,1), (1,0), (1,1) and the line o1·w1 + o2·w2 = θ] • net_j = o1·w1j + o2·w2j = θj describes a line in a 2-dimensional space • the line would have to divide the plane so that (0,1) and (1,0) lie in a different half-plane than (0,0) and (1,1); no straight line does this, so the network cannot solve the problem • a perceptron can represent only some functions • ⇒ a neural network representing the XOR function needs hidden neurons

  11. Learning is easy
  repeat
    while input patterns are available do
    begin
      take the next input pattern;
      calculate the output;
      foreach j in OutputNeurons do
        if oj <> tj then
          if oj = 0 then  { output = 0, but 1 expected }
            foreach i in InputNeurons do wij := wij + oi
          else if oj = 1 then  { output = 1, but 0 expected }
            foreach i in InputNeurons do wij := wij - oi;
    end
  until desired behaviour
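The following Java sketch mirrors the learning loop above for a single output neuron; the training set (the logical OR function, which is linearly separable) and the threshold are illustrative assumptions.

```java
// Sketch of the perceptron learning rule for one output neuron.
public class PerceptronTraining {
    public static void main(String[] args) {
        double[][] patterns = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] targets =       {  0,      1,      1,      1  };   // logical OR
        double[] w = new double[2];
        double theta = 0.5;
        boolean ok = false;
        while (!ok) {                                   // repeat until desired behaviour
            ok = true;
            for (int p = 0; p < patterns.length; p++) {
                double net = 0;
                for (int i = 0; i < w.length; i++) net += patterns[p][i] * w[i];
                int o = net >= theta ? 1 : 0;
                if (o != targets[p]) {                  // o_j <> t_j
                    ok = false;
                    int sign = (o == 0) ? +1 : -1;      // output 0 but 1 expected: add o_i, else subtract
                    for (int i = 0; i < w.length; i++) w[i] += sign * patterns[p][i];
                }
            }
        }
        System.out.println(java.util.Arrays.toString(w));
    }
}
```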

  12. Exercise • Decoding • input: binary code of a digit • output: unary representation, as many 1-digits as the digit represents, e.g. 5 : 1 1 1 1 1 • architecture:

  13. Exercise • Decoding • input: binary code of a digit • output: classification: 0 ~ 1st neuron, 1 ~ 2nd neuron, ..., 5 ~ 6th neuron, ... • architecture:

  14. Exercises • Look at the EXCEL-file of the decoding problem • Implement (in PASCAL/Java) a 4-10-perceptron which transforms a binary representation of a digit (0..9) into a decimal number. Implement the learning algorithm and train the network. • Which task can be learned faster? (Unary representation or classification)

  15. Exercises • Develop a perceptron for the recognition of the digits 0..9 (pixel representation), input layer: 3x7 input neurons. Use the SNNS or JavaNNS. • Can we recognize numbers greater than 9 as well? • Develop a perceptron for the recognition of capital letters (input layer 5x7).

  16. Multi-layer Perceptron • overcomes the limits of a (single-layer) perceptron • several trainable layers • a two-layer perceptron can classify convex polygons • a three-layer perceptron can classify arbitrary sets • multi-layer perceptron = feed-forward network = backpropagation network

  17. Multi-layer feed-forward network

  18. Feed-Forward Network

  19. Evaluation of the net output in a feed-forward network • [figure: a training pattern p is propagated through input layer, hidden layer(s) and output layer] • input layer: o_i = p_i • hidden layer: net_j = Σ_i o_i·w_ij, o_j = act_j • output layer: net_k = Σ_j o_j·w_jk, o_k = act_k
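A minimal Java sketch of this forward pass, assuming a small 2-2-1 network with the logistic activation function; the weight values are arbitrary examples.

```java
// Forward pass: o_i = p_i at the input layer, then layer by layer
// net_j = sum_i(o_i * w_ij) and o_j = f_act(net_j).
public class ForwardPass {
    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    // one layer: output[j] = f_act( sum_i input[i] * w[i][j] )
    static double[] layer(double[] input, double[][] w) {
        double[] out = new double[w[0].length];
        for (int j = 0; j < out.length; j++) {
            double net = 0;
            for (int i = 0; i < input.length; i++) net += input[i] * w[i][j];
            out[j] = logistic(net);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] p = {1.0, 0.0};                          // input pattern, o_i = p_i
        double[][] wHidden = {{0.5, -0.5}, {0.3, 0.8}};   // input -> hidden weights
        double[][] wOutput = {{1.0}, {-1.0}};             // hidden -> output weights
        double[] hidden = layer(p, wHidden);
        double[] output = layer(hidden, wOutput);
        System.out.println(output[0]);
    }
}
```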

  20. Backpropagation Learning Algorithm • supervised learning • the error is a function of the weights w_i: E(W) = E(w1, w2, ..., wn) • we are looking for a minimal error • minimal error = a hollow in the error surface • backpropagation uses the gradient for weight adaptation

  21. [figure: error surface plotted over two weights, weight1 and weight2]

  22. Problem • error in the output layer: the difference between output and teaching output • error in a hidden layer? • [figure: network with input layer, hidden layer, output and teaching output]

  23. Gradient descent • Gradient: vector orthogonal to a surface, pointing in the direction of the steepest slope • the derivative of a function in a certain direction is the projection of the gradient onto this direction • [figure: example of an error curve over a weight w_i]

  24. Example: Newton Approximation • [figure: tangent at x for f(x) = x² − a; tan α = f′(x) = 2x and tan α = f(x)/(x − x′), hence x′ = ½(x + a/x)] • calculation of the root: f(x) = x² − 5 • x = 2 • x′ = ½(x + 5/x) = 2.25 • x″ = ½(x′ + 5/x′) ≈ 2.2361
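The same iteration as a small Java sketch, using the values from the slide (a = 5, starting at x = 2):

```java
// Newton approximation for f(x) = x^2 - a:
// x' = x - f(x)/f'(x) simplifies to x' = (x + a/x) / 2.
public class NewtonSqrt {
    public static void main(String[] args) {
        double a = 5.0, x = 2.0;
        for (int step = 0; step < 4; step++) {
            x = 0.5 * (x + a / x);     // x' = 1/2 (x + a/x)
            System.out.println(x);     // 2.25, 2.2361..., converging to sqrt(5)
        }
    }
}
```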

  25. Backpropagation - Learning • gradient-descent algorithm • supervised learning: an error signal is used for weight adaptation • error signal δ_j: teaching output minus calculated output, if j is an output neuron; weighted sum of the error signals of the successors, if j is a hidden neuron • weight adaptation: Δw_ij = η · δ_j · o_i • η: learning rate • δ_j: error signal

  26. Standard Backpropagation Rule • gradient descent needs the derivative of the activation function • logistic function: f′_act(net_j) = f_act(net_j)·(1 − f_act(net_j)) = o_j·(1 − o_j) • the error signal δ_j is therefore: δ_j = o_j·(1 − o_j)·(t_j − o_j) for an output neuron, and δ_j = o_j·(1 − o_j)·Σ_k δ_k·w_jk for a hidden neuron
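A minimal Java sketch of one backpropagation step using these error signals, for the hidden-to-output part of a small network; all numeric values and array names are illustrative assumptions.

```java
// One backpropagation step with logistic error signals:
//   delta_k = o_k (1 - o_k)(t_k - o_k)             for output neurons
//   delta_j = o_j (1 - o_j) * sum_k delta_k w_jk   for hidden neurons
//   weight update: w_jk := w_jk + eta * delta_k * o_j
public class BackpropStep {
    public static void main(String[] args) {
        double eta = 0.5;                         // learning rate
        double[] oHidden = {0.6, 0.4};            // hidden outputs after the forward pass
        double[] oOut = {0.7};                    // calculated output
        double[] t = {1.0};                       // teaching output
        double[][] wHiddenOut = {{0.2}, {-0.3}};  // weights hidden -> output

        double[] deltaOut = new double[oOut.length];
        for (int k = 0; k < oOut.length; k++)
            deltaOut[k] = oOut[k] * (1 - oOut[k]) * (t[k] - oOut[k]);

        double[] deltaHidden = new double[oHidden.length];
        for (int j = 0; j < oHidden.length; j++) {
            double sum = 0;
            for (int k = 0; k < oOut.length; k++) sum += deltaOut[k] * wHiddenOut[j][k];
            deltaHidden[j] = oHidden[j] * (1 - oHidden[j]) * sum;
        }

        // adapt the hidden -> output weights (the input -> hidden weights
        // would be adapted analogously with deltaHidden)
        for (int j = 0; j < oHidden.length; j++)
            for (int k = 0; k < oOut.length; k++)
                wHiddenOut[j][k] += eta * deltaOut[k] * oHidden[j];

        System.out.println(wHiddenOut[0][0] + " " + wHiddenOut[1][0]);
    }
}
```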

  27. Backpropagation • Examples: • XOR (Excel) • Bank Customer

  28. Backpropagation - Problems • [figure: error surface with three problematic regions A, B and C, explained on the next slide]

  29. Backpropagation Problems • A: flat plateau: weight adaptation is slow, finding a minimum takes a lot of time • B: oscillation in a narrow gorge: the algorithm jumps from one side to the other and back • C: leaving a minimum: if the modification in one training step is too high, the minimum can be lost

  30. Solutions: looking at the values • change the parameters of the logistic function in order to get other values • the modification of a weight depends on the output: if o_i = 0, no modification takes place • if we use binary input we probably have a lot of zero values: change [0,1] into [-½, ½] or [-1,1] • use another activation function, e.g. tanh, and use values in [-1, 1]

  31. Solution: Quickprop • assumption: the error curve of a weight is a quadratic function (a parabola) • calculate the vertex of the parabola from the slope S = ∂E/∂w_ij of the error curve at the last two steps: Δw_ij(t) = S(t) / (S(t−1) − S(t)) · Δw_ij(t−1)
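A hedged Java sketch of the resulting weight step, following Fahlman's Quickprop rule; the slope values are invented for illustration.

```java
// Quickprop idea: approximate the error curve of a single weight by a parabola
// through the last two slopes and jump towards its vertex:
//   deltaW(t) = S(t) / (S(t-1) - S(t)) * deltaW(t-1)
public class QuickpropStep {
    static double quickpropDelta(double slopePrev, double slopeNow, double deltaPrev) {
        return slopeNow / (slopePrev - slopeNow) * deltaPrev;
    }
    public static void main(String[] args) {
        double deltaPrev = -0.10;   // previous weight change
        double slopePrev = 0.40;    // S(t-1)
        double slopeNow  = 0.10;    // S(t): smaller slope, same sign -> minimum lies ahead
        // prints a further step in the same direction as the previous change
        System.out.println(quickpropDelta(slopePrev, slopeNow, deltaPrev));
    }
}
```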

  32. Resilient Propagation (RPROP) • sign and size of the weight modification are calculated separately • b_ij(t): size of the modification:
  b_ij(t) = η⁺ · b_ij(t−1)   if S(t−1)·S(t) > 0
  b_ij(t) = η⁻ · b_ij(t−1)   if S(t−1)·S(t) < 0
  b_ij(t) = b_ij(t−1)        otherwise
  with η⁺ > 1 (both ascents have the same sign ⇒ "big" step) and 0 < η⁻ < 1 (the ascents differ ⇒ smaller step) • weight modification:
  Δw_ij(t) = −b_ij(t)             if S(t−1) > 0 ∧ S(t) > 0
  Δw_ij(t) = +b_ij(t)             if S(t−1) < 0 ∧ S(t) < 0
  Δw_ij(t) = −Δw_ij(t−1)          if S(t−1)·S(t) < 0  (*)
  Δw_ij(t) = −sgn(S(t))·b_ij(t)   otherwise
  (*) S(t) is set to 0 (S(t) := 0); at time t+1 the 4th case will be applied.
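A minimal Java sketch of one RPROP update for a single weight; the factors η⁺ = 1.2 and η⁻ = 0.5 are commonly used values and an assumption here, since the slide gives no concrete numbers.

```java
// One RPROP update for a single weight, following the case distinction above.
// The backtracking case (sign change: revert the last weight change, S(t) := 0)
// is only noted in a comment to keep the sketch short.
public class RpropStep {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;

    public static void main(String[] args) {
        double w = 0.3, b = 0.1;                  // weight and step size b_ij(t-1)
        double slopePrev = 0.2, slopeNow = 0.05;  // S(t-1), S(t): both positive here

        if (slopePrev * slopeNow > 0)      b *= ETA_PLUS;   // same sign: "big" step
        else if (slopePrev * slopeNow < 0) b *= ETA_MINUS;  // sign change: smaller step
        // (on a sign change the previous weight change would be reverted instead)

        w += -Math.signum(slopeNow) * b;   // w_ij(t) = w_ij(t-1) - sgn(S(t)) * b_ij(t)
        System.out.println("w = " + w + ", step size = " + b);
    }
}
```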

  33. Limits of the Learning Algorithm • it is not a model for biological learning • there is no teaching output in natural learning • there are no such feedbacks in a natural neural network (at least none has been discovered yet) • training an ANN is rather time-consuming

  34. Exercise - JavaNNS • Implement a feed-forward network consisting of 2 input neurons, 2 hidden neurons and one output neuron. Train the network so that it simulates the XOR function. • Implement a 4-2-4 network which works like the identity function (encoder-decoder network). Try other versions: 4-3-4, 8-4-8, ... What can you say about the training effort?

  35. Pattern Recognition • [figure: network with input layer, 1st hidden layer, 2nd hidden layer, output layer]

  36. Example: Pattern Recognition JavaNNS example: Font

  37. „font“ Example • input = 24x24 pixel array • output layer: 75 neurons, one neuron for each character: digits, letters (lower case, capital), separators and operator characters • two hidden layers of 4x6 neurons each • all neurons of a row of the input layer are linked to one neuron of the first hidden layer • all neurons of a column of the input layer are linked to one neuron of the second hidden layer.

  38. Exercise • load the network “font_untrained” • train the network using various learning algorithms (look at the SNNS documentation for the parameters and their meaning): Backpropagation η=2.0; Backpropagation with momentum η=0.8, mu=0.6, c=0.1; Quickprop η=0.1, mg=2.0, n=0.0001; Rprop η=0.6 • use various values for the learning parameter, momentum, and noise: learning parameter 0.2, 0.3, 0.5, 1.0; momentum 0.9, 0.7, 0.5, 0.0; noise 0.0, 0.1, 0.2

  39. Example: Bank Customer • A1: credit history, A2: debt, A3: collateral, A4: income • the network architecture depends on the coding of input and output • How can we code values like good, bad, 1, 2, 3, ...?

  40. Data Pre-processing • objectives • prospects of better results • adaptation to algorithms • data reduction • trouble shooting • methods • selection and integration • completion • transformation • normalization • coding • filter

  41. Selection and Integration • unification of data (different origins) • selection of attributes/features • reduction: omit obviously non-relevant data (all values are equal, key values, meaning not relevant, data protection)

  42. Completion / Cleaning • missing values • ignore / omit the attribute • add values: manually, a global constant („missing value“), the average, a highly probable value • remove the data set • noisy data • inconsistent data

  43. Transformation • Normalization • Coding • Filter

  44. Normalization of values • normalization for equally distributed values • into the range [0,1], e.g. for the logistic function: act = (x − minValue) / (maxValue − minValue) • into the range [−1,+1], e.g. for the activation function tanh: act = (x − minValue) / (maxValue − minValue) · 2 − 1 • logarithmic normalization: act = (ln(x) − ln(minValue)) / (ln(maxValue) − ln(minValue))
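The three normalizations as a small Java sketch; the example values (an input of 1500 with known minimum and maximum from the training data) are assumptions for illustration.

```java
// The three normalizations from the slide; minValue/maxValue are assumed
// to be known from the training data.
public class Normalization {
    static double toUnitRange(double x, double min, double max) {
        return (x - min) / (max - min);              // [0, 1], e.g. for the logistic function
    }
    static double toSymmetricRange(double x, double min, double max) {
        return (x - min) / (max - min) * 2 - 1;      // [-1, +1], e.g. for tanh
    }
    static double logarithmic(double x, double min, double max) {
        return (Math.log(x) - Math.log(min)) / (Math.log(max) - Math.log(min));
    }
    public static void main(String[] args) {
        System.out.println(toUnitRange(1500, 0, 5000));       // 0.3
        System.out.println(toSymmetricRange(1500, 0, 5000));  // -0.4
        System.out.println(logarithmic(1500, 1, 5000));       // ~0.859
    }
}
```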

  45. Binary Coding of nominal values I • no order relation, n values • n neurons, each neuron represents one and only one value • example: red, blue, yellow, white, black ⇒ 1,0,0,0,0 / 0,1,0,0,0 / 0,0,1,0,0 / ... • disadvantage: n neurons necessary ⇒ lots of zeros in the input
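A small Java sketch of this 1-of-n coding for the colour example on the slide:

```java
// 1-of-n coding: one neuron per nominal value, exactly one neuron is set to 1.
public class OneOfNCoding {
    static double[] encode(String value, String[] values) {
        double[] code = new double[values.length];
        for (int i = 0; i < values.length; i++)
            if (values[i].equals(value)) code[i] = 1.0;
        return code;
    }
    public static void main(String[] args) {
        String[] colours = {"red", "blue", "yellow", "white", "black"};
        System.out.println(java.util.Arrays.toString(encode("blue", colours)));
        // -> [0.0, 1.0, 0.0, 0.0, 0.0]
    }
}
```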

  46. Bank Customer • Are these customers good ones?
  customer | credit history | debt | collateral | income
  1        | bad            | high | adequate   | 3
  2        | good           | low  | adequate   | 2

  47. Data Mining Cup 2002 • The Problem: A Mailing Action • mailing action of a company: a special offer • estimated annual income per customer: • given: 10,000 sets of customer data containing 1,000 cancellers (training) • problem: a test set containing 10,000 customer data sets • Who will cancel? Whom to send an offer?

  48. Mailing Action – Aim? • no mailing action: • 9,000 x 72.00 = 648,000 • everybody gets an offer: • 1,000 x 43.80 + 9,000 x 66.30 = 640,500 • maximum (100% correct classification): • 1,000 x 43.80 + 9,000 x 72.00 = 691,800

  49. Goal Function: Lift • basis: no mailing action: 9,000 · 72.00 • goal = extra income of the mailing action M: lift_M = 43.80 · c_M + 66.30 · nk_M − 72.00 · nk_M = 43.80 · c_M − 5.70 · nk_M, where c_M is the number of cancellers and nk_M the number of non-cancellers who receive the offer
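A small Java sketch of the lift function; the variable names cM and nkM follow the reading above (cancellers and non-cancellers who receive the offer), and the printed comparisons use the figures from the previous slide.

```java
// Lift of a mailing action M relative to the "no mailing" baseline of 9,000 * 72.00.
public class MailingLift {
    static double lift(int cM, int nkM) {
        return 43.80 * cM + 66.30 * nkM - 72.00 * nkM;   // = 43.80*cM - 5.70*nkM
    }
    public static void main(String[] args) {
        System.out.println(lift(1000, 9000)); // everybody gets an offer: -7,500 (640,500 - 648,000)
        System.out.println(lift(1000, 0));    // perfect classification:  43,800 (691,800 - 648,000)
        System.out.println(lift(0, 0));       // no mailing action: 0
    }
}
```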

  50. Data • [table: the 32 input attributes per customer, with important results and missing values marked]
