
Neural Networks


Presentation Transcript


  1. Neural Networks 10701/15781 Recitation, February 12, 2008. Parts of the slides are from previous years’ 10701 recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

  2. Recall Linear Regression • Prediction of continuous variables • Learn the mapping f: X → Y • Model is linear in the parameters w (plus some noise) • Assume Gaussian noise • Learn the MLE weights w = (XᵀX)⁻¹ Xᵀ y, the least-squares solution
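A minimal NumPy sketch of that closed-form MLE, assuming the standard least-squares identity w = (XᵀX)⁻¹ Xᵀ y; the toy data values and the added bias column are purely illustrative.

```python
import numpy as np

# Toy data: 5 examples, 2 features (made-up values for illustration).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([3.1, 2.4, 4.2, 7.1, 7.6])

# Add a constant column so w includes a bias term.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

# MLE under i.i.d. Gaussian noise = ordinary least squares:
# w = (X^T X)^{-1} X^T y, computed with lstsq for numerical stability.
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("MLE weights:", w)
```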

  3. Neural Network • Neural nets are also models with parameters w in them; the parameters are now called weights. • As before, we want to compute the weights that minimize the sum of squared residuals • Which, under the “Gaussian i.i.d. noise” assumption, turns out to be maximum likelihood. • Instead of explicitly solving for the maximum-likelihood weights, we use gradient descent

  4. Perceptrons • Input x = (x1, …, xn) and target value t; the perceptron outputs o = w0 + Σi wi xi (linear unit) or o = σ(w0 + Σi wi xi) (sigmoid unit) • Given training data {(x(l), t(l))}, find w which minimizes E(w) = ½ Σl (t(l) − o(l))²

  5. Gradient descent • General framework for finding a minimum of a continuous (differentiable) function f(w) • Start with some initial value w(1) and compute the gradient vector ∇f(w(1)) • The next value w(2) is obtained by moving some distance from w(1) in the direction of steepest descent, i.e., along the negative of the gradient: w(2) = w(1) − η ∇f(w(1))
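A minimal sketch of this general framework, assuming a fixed learning rate η and a known gradient function; the quadratic example and the names gradient_descent, grad_f, eta are illustrative choices, not anything from the slides.

```python
import numpy as np

def gradient_descent(grad_f, w_init, eta=0.1, n_steps=100):
    """Repeatedly step along the negative gradient: w <- w - eta * grad_f(w)."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad_f(w)
    return w

# Example: minimize f(w) = (w1 - 3)^2 + (w2 + 1)^2, whose gradient is known in closed form.
grad = lambda w: np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])
print(gradient_descent(grad, w_init=[0.0, 0.0]))   # approaches [3, -1]
```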

  6. Gradient Descent on a Perceptron • The sigmoid perceptron update rule: wi ← wi + η Σl (t(l) − o(l)) o(l) (1 − o(l)) xi(l), where o(l) = σ(w · x(l))
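A hedged sketch of batch gradient descent with that sigmoid perceptron update, assuming squared-error loss; the OR training data, learning rate, and epoch count are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_perceptron(X, t, eta=0.5, n_epochs=5000):
    """Batch gradient descent on E(w) = 1/2 * sum_l (t(l) - o(l))^2 with o = sigmoid(w . x)."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # constant bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        o = sigmoid(X @ w)
        # dE/dw = -sum_l (t(l) - o(l)) * o(l) * (1 - o(l)) * x(l)
        grad = -(X.T @ ((t - o) * o * (1 - o)))
        w -= eta * grad
    return w

# Learn OR on binary inputs (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = train_sigmoid_perceptron(X, t)
# Outputs should move toward [0, 1, 1, 1] as training progresses.
print(np.round(sigmoid(np.hstack([np.ones((4, 1)), X]) @ w), 2))
```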

  7. Boolean Functions e.g., using a step activation function with threshold 0, can we learn the function • X1 AND X2? • X1 OR X2? • X1 AND NOT X2? • X1 XOR X2?
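One way to check the first three cases by hand: a step-activation unit with threshold 0 and hand-picked weights. The weight values below are my own choices, not from the slides; no such weight vector exists for XOR, since its positive and negative examples are not linearly separable.

```python
import numpy as np

def step_perceptron(w, x):
    """Step activation with threshold 0: output 1 if w . [1, x1, x2] > 0, else 0."""
    return int(np.dot(w, np.concatenate(([1.0], x))) > 0)

inputs = [np.array(p, dtype=float) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]

# Hand-picked weights (w0 = bias, w1, w2) realizing each linearly separable function.
functions = {
    "X1 AND X2":     np.array([-1.5, 1.0, 1.0]),
    "X1 OR X2":      np.array([-0.5, 1.0, 1.0]),
    "X1 AND NOT X2": np.array([-0.5, 1.0, -1.0]),
}
for name, w in functions.items():
    print(name, [step_perceptron(w, x) for x in inputs])
# X1 XOR X2 cannot be realized by any single such unit, which motivates
# the multilayer networks on the next slides.
```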

  8. Multilayer Networks • The class of functions representable by a single perceptron is limited • Think of nonlinear functions (e.g., XOR)

  9. A 1-Hidden-Layer Net • N_input = 2, N_hidden = 3, N_output = 1
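A sketch of the forward pass for this 2-3-1 architecture, assuming sigmoid hidden and output units and a constant bias input of 1 for every non-input unit; the random weight values are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 3))   # 3 hidden units, each sees [bias, x1, x2]
W_output = rng.normal(size=4)        # 1 output unit, sees [bias, h1, h2, h3]

def forward(x):
    """Forward pass of the 2-3-1 network with sigmoid units and bias inputs."""
    h = sigmoid(W_hidden @ np.concatenate(([1.0], x)))            # hidden activations
    return sigmoid(np.dot(W_output, np.concatenate(([1.0], h))))  # scalar output

print(forward(np.array([0.5, -1.0])))
```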

  10. Backpropagation • HW2 – Problem 2 • Output of the k-th output unit from input x: ok = σ(Σj wkj σ(Σi wji xi)) • With bias: add a constant term for every non-input unit • Learn w to minimize E(w) = ½ Σl Σk (tk(l) − ok(l))²

  11. Backpropagation Initialize all weights. Do until convergence: 1. Input a training example to the network and compute the output ok 2. Update each hidden-to-output weight wkj by wkj ← wkj + η δk hj, where δk = ok(1 − ok)(tk − ok) and hj is the output of hidden unit j 3. Update each input-to-hidden weight wji by wji ← wji + η δj xi, where δj = hj(1 − hj) Σk wkj δk
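A hedged sketch of this procedure for the 1-hidden-layer sigmoid network above, assuming squared-error loss and per-example (online) updates; the XOR data, learning rate, epoch count, and random seed are illustrative, not part of the recitation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=3, eta=0.5, n_epochs=5000, seed=0):
    """Online backprop for a 1-hidden-layer sigmoid net with squared-error loss."""
    rng = np.random.default_rng(seed)
    W_h = rng.normal(scale=0.5, size=(n_hidden, X.shape[1] + 1))  # rows: [bias | inputs]
    W_o = rng.normal(scale=0.5, size=n_hidden + 1)                # [bias | hidden]
    for _ in range(n_epochs):
        for x, t in zip(X, T):
            # 1. Forward pass: hidden activations h and output o.
            h = sigmoid(W_h @ np.concatenate(([1.0], x)))
            o = sigmoid(np.dot(W_o, np.concatenate(([1.0], h))))
            # Error terms: delta_k = o(1-o)(t-o), delta_j = h_j(1-h_j) * w_kj * delta_k.
            delta_o = o * (1 - o) * (t - o)
            delta_h = h * (1 - h) * W_o[1:] * delta_o
            # 2. Update hidden-to-output weights: w_kj += eta * delta_k * h_j (h_0 = 1 bias).
            W_o += eta * delta_o * np.concatenate(([1.0], h))
            # 3. Update input-to-hidden weights: w_ji += eta * delta_j * x_i (x_0 = 1 bias).
            W_h += eta * np.outer(delta_h, np.concatenate(([1.0], x)))
    return W_h, W_o

# Illustrative run: learn XOR, which a single perceptron cannot represent.
# Outputs should approach 0, 1, 1, 0; training can occasionally stall in a
# local minimum for some initializations.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)
W_h, W_o = train_backprop(X, T)
for x in X:
    h = sigmoid(W_h @ np.concatenate(([1.0], x)))
    print(x, round(float(sigmoid(np.dot(W_o, np.concatenate(([1.0], h))))), 2))
```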
