
Learning via Neural Networks


Presentation Transcript


  1. Learning via Neural Networks L. Manevitz All rights reserved

  2. Natural versus Artificial Neuron • Natural Neuron • McCulloch-Pitts Neuron

  3. Definitions and History • McCulloch-Pitts Neuron • Perceptron • Adaline • Linear Separability • Multi-Level Neurons • Neurons with Loops

  4. Sample Feed-forward Network (no loops) • Diagram: input units x_j feed one layer through weights W_ji and the next through weights V_ik; each unit computes F(Σ_j w_ji x_j) of its weighted inputs.
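
A minimal sketch of the forward pass this slide's diagram describes, assuming a sigmoid activation for F and illustrative layer sizes; the names W, V, and forward are stand-ins, not from the slides.

    import numpy as np

    def sigmoid(s):
        # Smooth activation F applied to the weighted sum
        return 1.0 / (1.0 + np.exp(-s))

    def forward(x, W, V):
        # Hidden layer: unit i computes F(sum_j W[i, j] * x[j])
        hidden = sigmoid(W @ x)
        # Output layer: unit k computes F(sum_i V[k, i] * hidden[i])
        return sigmoid(V @ hidden)

    # Illustrative sizes: 3 inputs, 4 hidden units, 2 outputs
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3))
    V = rng.normal(size=(2, 4))
    print(forward(np.array([1.0, 0.0, 1.0]), W, V))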

  5. Perceptron • Diagram: an input pattern (the letter “A”) feeds through weights into a single neuron for pattern identification. (Note: the neuron is trained.)

  6. Feed-Forward Network • Diagram: the input pattern “A” passes through two layers of weights to the output.

  7. Reason for Explosion of Interest • Two coincident effects (around 1985–87) • (Re-)discovery of mathematical tools and algorithms for handling large networks • Availability (hurray for Intel and company!) of sufficient computing power to make experiments practical.

  8. Some Properties of NNs • Universal: Can represent and accomplish any task. • Uniform: “Programming” is changing weights • Automatic: Algorithms for Automatic Programming; Learning

  9. Networks are Universal • All logical functions can be represented by a three-level (non-loop) network (McCulloch-Pitts) • All continuous (and more) functions can be represented by three-level feed-forward networks (Cybenko et al.) • Networks can self-organize (without a teacher). • Networks serve as associative memories.

  10. Universality • McCulloch-Pitts: Adaptive Logic Gates; can represent any logic function • Cybenko: Any continuous function is representable by a three-level NN.

  11. Networks can “LEARN” and Generalize (Algorithms) • One Neuron (Perceptron and Adaline): very popular in the 1960s – early 70s • Limited by representability (only linearly separable concepts) • Feed-forward networks (Back Propagation) • Currently the most popular network (1987 – now) • Kohonen Self-Organizing Network (1980s – now) (loops) • Attractor Networks (loops)

  12. Learnability (Automatic Programming) • One neuron: Perceptron and Adaline algorithms (Rosenblatt and Widrow-Hoff) (1960s – now) • Feed-forward networks: Backpropagation (1987 – now) • Associative Memories and Looped Networks (“Attractors”) (1990s – now)

  13. Generalizability • Typically train a network on a sample set of examples • Use it on the general class • Training can be slow, but execution is fast.

  14. Feed-Forward Network • Diagram (repeated from slide 6): the input pattern “A” passes through two layers of weights to the output.

  15. Classical Applications (1986 – 1997) • “Net Talk”: text to speech • ZIP codes: handwriting analysis • Glovetalk: Sign Language to speech • Data and Picture Compression: “Bottleneck” • Steering of an Automobile (up to 55 m.p.h.) • Market Predictions • Associative Memories • Cognitive Modeling (especially reading, …) • Phonetic Typewriter (Finnish)

  16. Neural Network • Once the architecture is fixed, the only free parameters are the weights • Thus Uniform Programming • Potentially Automatic Programming • Search for Learning Algorithms

  17. Programming: Just find the weights! • AUTOMATIC PROGRAMMING • One Neuron: Perceptron or Adaline • Multi-Level: Gradient Descent on Continuous Neuron (Sigmoid instead of step function).
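
As a minimal sketch of the gradient-descent idea on this slide, the following trains a single sigmoid neuron by descending the squared-error gradient; the AND training set, learning rate, and iteration count are illustrative assumptions, not part of the original slides.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    # Illustrative training set: inputs with a bias component appended, targets in {0, 1}
    X = np.array([[0.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0]])
    targets = np.array([0.0, 0.0, 0.0, 1.0])   # the AND function, for concreteness

    w = np.zeros(3)
    eta = 0.5                                   # learning rate (assumed)
    for _ in range(5000):
        for x, t in zip(X, targets):
            y = sigmoid(w @ x)
            # Squared-error gradient for one sigmoid unit: (y - t) * y * (1 - y) * x
            w -= eta * (y - t) * y * (1.0 - y) * x

    print(w, sigmoid(X @ w))                    # outputs approach 0, 0, 0, 1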

  18. Prediction • Diagram: the NN and a delay element; the network's output is compared against the delayed input/output signal.

  19. Abstracting • Note: Size may not be crucial (an Aplysia or a crab does many things) • Look at simple structures first

  20. Real and Artificial Neurons

  21. One Neuron: McCulloch-Pitts • Diagram: inputs x1 … xn with weights w1 … wn feed a summation Σ followed by a threshold. • The biological neuron is very complicated, but abstracting the details we have the integrate-and-fire neuron.

  22. Representability • What functions can be represented by a network of McCulloch-Pitts neurons? • Theorem: Every logic function of an arbitrary number of variables can be represented by a three-level network of neurons.

  23. Proof • Show the simple functions: AND, OR, NOT, IMPLIES • Recall the representability of logic functions in DNF (disjunctive normal form).

  24. AND, OR, NOT • AND gate: weights 1.0 and 1.0 into an integrate-and-threshold unit with threshold 1.5 (fires only when both inputs are 1).

  25. AND, OR, NOT • OR gate: weights 1.0 and 1.0, threshold 0.9 (fires when at least one input is 1).

  26. AND, OR, NOT • NOT gate: weight -1.0, threshold -0.5 (the unit fires exactly when its input is 0).
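
A minimal sketch of the McCulloch-Pitts unit behind slides 24–26: a weighted sum compared against a threshold, instantiated with the weights shown above. The NOT threshold of -0.5 is the value needed for the gate to behave correctly and is inferred rather than taken verbatim from the transcript.

    def mp_neuron(inputs, weights, threshold):
        # McCulloch-Pitts unit: fire (output 1) iff the weighted sum exceeds the threshold
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s > threshold else 0

    AND = lambda x1, x2: mp_neuron((x1, x2), (1.0, 1.0), 1.5)
    OR  = lambda x1, x2: mp_neuron((x1, x2), (1.0, 1.0), 0.9)
    NOT = lambda x1:     mp_neuron((x1,),    (-1.0,),    -0.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "NOT a:", NOT(a))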

  27. DNF and All Functions
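
A sketch of the DNF construction that slides 22–23 and 27 rely on: given any truth table, build one AND-like threshold unit per true row and OR their outputs, giving a three-level network counting the inputs. The helper names dnf_network and threshold_unit are illustrative, not from the slides.

    from itertools import product

    def threshold_unit(inputs, weights, threshold):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

    def dnf_network(truth_table, n):
        # truth_table maps each n-tuple of 0/1 inputs to 0/1.
        # Level 1: one AND-like unit per true row; weight +1 where the row has a 1,
        # -1 where it has a 0, with the threshold set so only that exact row fires.
        true_rows = [row for row in product((0, 1), repeat=n) if truth_table[row] == 1]

        def network(x):
            hidden = [threshold_unit(x,
                                     [1.0 if bit else -1.0 for bit in row],
                                     sum(row) - 0.5)
                      for row in true_rows]
            # Level 2: OR over the hidden units (fires if any of them fired)
            return threshold_unit(hidden, [1.0] * len(hidden), 0.5)

        return network

    # Example: XOR, which a single unit cannot represent, is easy for the two-level construction
    xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    xor_net = dnf_network(xor_table, 2)
    print([xor_net(row) for row in product((0, 1), repeat=2)])   # [0, 1, 1, 0]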

  28. Learnability and Generalizability • The previous theorem tells us that neural networks are potentially powerful, but doesn’t tell us how to use them. • We desire simple networks with uniform training rules.

  29. One Neuron (Perceptron) • What can be represented by one neuron? • Is there an automatic way to learn a function from examples?

  30. Perceptron Training Rule • Loop: Take an example. Apply it to the network. If the answer is correct, return to Loop. If incorrect, go to FIX. FIX: Adjust the network weights by the input example. Go to Loop.

  31. Example of Perceptron Learning • x1 = 1 (+) x2 = -.5 (-) • x3 = 3 (+) x4 = -2 (-) • Expanded vectors: • y1 = (1,1) (+) y2 = (-.5,1) (-) • y3 = (3,1) (+) y4 = (-2,1) (-) • Random initial weight: (-2.5, 1.75)

  32. Trace of Perceptron • W1 y1 = (-2.5,1.75)·(1,1) < 0 wrong • W2 = W1 + y1 = (-1.5, 2.75) • W2 y2 = (-1.5, 2.75)·(-.5, 1) > 0 wrong • W3 = W2 – y2 = (-1, 1.75) • W3 y3 = (-1,1.75)·(3,1) < 0 wrong • W4 = W3 + y3 = (2, 2.75)

  33. Perceptron Convergence Theorem • If the concept is representable in a perceptron then the perceptron learning rule will converge in a finite amount of time. • (MAKE PRECISE and Prove)

  34. Percentage of Boolean Functions Representable by a Perceptron
      Inputs    All Boolean functions    Representable by a perceptron
      1         4                        4
      2         16                       14
      3         256                      104
      4         65,536                   1,882
      5         ~10**9                   94,572
      6         ~10**19                  15,028,134
      7         ~10**38                  8,378,070,864
      8         ~10**77                  17,561,539,552,946
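
A small brute-force check of the first three rows of this table, under the assumption that integer weights and bias in the range -4…4 suffice to realize every threshold function of up to 3 inputs (true for these small cases); it counts the distinct truth tables a single threshold unit can produce.

    from itertools import product

    def representable_count(n, weight_range=range(-4, 5)):
        # Enumerate single-threshold-unit functions: output 1 iff w . x + b > 0
        tables = set()
        points = list(product((0, 1), repeat=n))
        for coeffs in product(weight_range, repeat=n + 1):
            *w, b = coeffs
            table = tuple(1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
                          for x in points)
            tables.add(table)
        return len(tables)

    for n in (1, 2, 3):
        total = 2 ** (2 ** n)
        # Should reproduce the first three rows of the table above
        print(n, total, representable_count(n))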

  35. What is a Neural Network? • What is an abstract Neuron? • What is a Neural Network? • How are they computed? • What are the advantages? • Where can they be used? • Agenda • What to expect

  36. Perceptron Algorithm • Start: Choose arbitrary value for weights, W • Test: Choose arbitrary example X • If X pos and WX >0 or X neg and WX <= 0 go to Test • Fix: • If X pos W := W +X; • If X negative W:= W –X; • Go to Test;

  37. Perceptron Conv. Thm. • Let F be a set of unit length vectors. If there is a vector V* and a value e>0 such that V*X > e for all X in F then the perceptron program goes to FIX only a finite number of times.

  38. Applying Algorithm to “And” • W0 = (0,0,1) or random • X1 = (0,0,1) result 0 • X2 = (0,1,1) result 0 • X3 = (1,0, 1) result 0 • X4 = (1,1,1) result 1
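
A minimal sketch of the Start/Test/Fix algorithm of slide 36 run on the augmented AND examples above; the cyclic presentation order and the zero initial weight vector are assumptions made for illustration.

    import numpy as np

    # Augmented AND examples from slide 38: (x1, x2, 1) with the desired output
    examples = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]

    w = np.zeros(3)                        # Start: here we take the zero vector
    fixes = 0
    while True:
        mistakes = 0
        for x, target in examples:         # Test: cycle through the examples
            x = np.array(x, dtype=float)
            if target == 1 and w @ x <= 0:      # positive example misclassified
                w += x                          # Fix: W := W + X
                fixes += 1
                mistakes += 1
            elif target == 0 and w @ x > 0:     # negative example misclassified
                w -= x                          # Fix: W := W - X
                fixes += 1
                mistakes += 1
        if mistakes == 0:                  # a full pass with no fixes: converged
            break

    print("weights:", w, "fixes:", fixes)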

  39. “And” continued • W0 X1 = 1 > 0 wrong; • W1 = W0 – X1 = (0,0,0) • W1 X2 = 0 OK (Bdry) • W1 X3 = 0 OK • W1 X4 = 0 wrong; • W2 = W1 + X4 = (1,1,1) • W2 X1 = 1 wrong • W3 = W2 – X1 = (1,1,0) • W3 X2 = 1 wrong • W4 = W3 – X2 = (1,0, -1) • W4 X3 = 0 OK • W4 X4 = 0 wrong • W5 = W4 + X4 = (2, 1, 0) • W5 X1 = 0 OK • W5 X2 = 1 wrong • W6 = W5 – X2 = (2,0, -1)

  40. “And” page 3 • W8 X3 = 1 wrong • W9 = W8 – X3 = (1,0, 0) • W9 X4 = 1 OK • W9 X1 = 0 OK • W9 X2 = 0 OK • W9 X3 = 1 wrong • W10 = W9 – X3 = (0,0,-1) • W10 X4 = -1 wrong • W11 = W10 + X4 = (1,1,0) • W11 X1 = 0 OK • W11 X2 = 1 wrong • W12 = W11 – X2 = (1,0, -1)

  41. Proof of Conv Theorem • Note: 1. By hypothesis, there is a d > 0 such that V*X > d for all X in F. 2. Can eliminate the threshold (add an additional dimension to the input): W(x,y,z) > threshold if and only if W*(x,y,z,1) > 0. 3. Can assume all examples are positive ones (replace negative examples by their negated vectors): W(x,y,z) < 0 if and only if W(-x,-y,-z) > 0.

  42. Proof (cont). • Consider the quotient V*W/|W|. (Note: this is the multidimensional cosine between V* and W.) Recall V* is a unit vector, so the quotient <= 1.

  43. Proof (cont) • Now each time FIX is visited, W changes by adding the example X: V*W(n+1) = V*(W(n) + X) = V*W(n) + V*X >= V*W(n) + d. Hence after n visits to FIX, V*W(n) >= n d (*)

  44. Proof (cont) • Now consider the denominator: • |W(n+1)|**2 = W(n+1)W(n+1) = (W(n) + X)(W(n) + X) = |W(n)|**2 + 2W(n)X + 1 (recall |X| = 1) <= |W(n)|**2 + 1, since FIX is entered only when W(n)X <= 0. So after n visits to FIX, |W(n)|**2 <= n (**)

  45. Proof (cont) • Putting (*) and (**) together: Quotient = V*W/|W| >= n d / sqrt(n) = d sqrt(n). Since Quotient <= 1, this means n <= 1/d**2. So we enter FIX a bounded number of times. Q.E.D.
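
A small empirical check of the bound just derived: generate unit-length positive examples whose margin against a fixed unit vector V* is at least d, run the Fix loop, and compare the number of fixes with 1/d**2. The data-generation details (dimension, sample count, random seed) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 0.2
    v_star = np.array([1.0, 0.0])                      # unit vector defining the concept

    # Unit-length positive examples with margin: V* . X >= d for every X
    examples = []
    while len(examples) < 200:
        x = rng.normal(size=2)
        x /= np.linalg.norm(x)
        if v_star @ x >= d:
            examples.append(x)

    w = np.zeros(2)
    fixes = 0
    changed = True
    while changed:                                      # cycle until a pass with no fixes
        changed = False
        for x in examples:
            if w @ x <= 0:                              # positive example not yet accepted
                w += x                                  # Fix: W := W + X
                fixes += 1
                changed = True

    print("fixes:", fixes, "bound 1/d**2:", 1 / d**2)   # fixes should not exceed the bound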

  46. Geometric Proof • See hand slides.

  47. Additional Facts • Note: If the X's are presented in a systematic way (e.g. cyclically), then a solution W is always found. • Note: It is not necessarily the same as V*. • Note: If F is not finite, we may not obtain a solution in finite time. • The algorithm can be modified in minor ways and remains valid (e.g. bounded rather than unit-length examples; different changes to W(n)).

  48. What won't work? • Example: Connectedness, with a bounded-diameter perceptron. • Compare with convexity (which can be computed using sensors of order three).

  49. What won't work? • Try XOR.

  50. Limitations of Perceptron • Representability • Only concepts that are linearly separable. • Compare: convex versus connected • Examples: XOR vs OR (for XOR with a bias b, the positive examples force w1 + b > 0 and w2 + b > 0, so w1 + w2 + 2b > 0, while the negative examples force b <= 0 and w1 + w2 + b <= 0, giving w1 + w2 + 2b <= 0, a contradiction).
