
Linear Discrimination


Presentation Transcript


  1. Linear Discrimination Reading: Chapter 2 of textbook

  2. Framework • Assume our data consists of instances x = (x1, x2, ..., xn). • Assume the data can be separated into two classes, positive and negative, by a linear decision surface. • Learning: assuming the data is n-dimensional, learn an (n−1)-dimensional hyperplane to classify the data into the two classes.

  3. Linear Discriminant [figure: two classes of data points separated by a linear discriminant; axes: Feature 1, Feature 2]

  4. Linear Discriminant [figure: a linear discriminant in a two-feature space; axes: Feature 1, Feature 2]

  5. Linear Discriminant [figure: a linear discriminant in a two-feature space; axes: Feature 1, Feature 2]

  6. Example where a line won’t work? [figure: a two-feature data set that no single line can separate; axes: Feature 1, Feature 2]

  7. Perceptrons • Discriminant function: g(x) = w0 + w1 x1 + w2 x2 + ... + wn xn. w0 is called the “bias”; −w0 is called the “threshold”. • Classification: output +1 if g(x) > 0, and −1 otherwise.
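
To make the discriminant function and classification rule concrete, here is a minimal sketch in Python; the function names and the example weights are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def discriminant(w0, w, x):
    """g(x) = w0 + w . x"""
    return w0 + np.dot(w, x)

def classify(w0, w, x):
    """Perceptron classification: +1 if g(x) > 0, else -1."""
    return 1 if discriminant(w0, w, x) > 0 else -1

# Illustrative weights: bias w0 = -0.5, weights w = (1.0, 1.0).
print(classify(-0.5, np.array([1.0, 1.0]), np.array([1.0, 0.0])))  # prints 1
print(classify(-0.5, np.array([1.0, 1.0]), np.array([0.0, 0.0])))  # prints -1
```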

  8. Perceptrons as simple neural networks [diagram: a single unit with inputs x1, x2, ..., xn plus a constant input +1, connected to the output o through weights w1, w2, ..., wn and bias weight w0]

  9. Example • What is the class y? [diagram: a perceptron with bias input +1 and inputs 1 and −1, with weights 0.4, −0.1, and −0.4 on its connections]

  10. Geometry of the perceptron • In 2D: [figure: the hyperplane defined by the perceptron appears as a line separating the two classes; axes: Feature 1, Feature 2]

  11. In-class exercise Work with one neighbor on this: (a) Find weights for a perceptron that separates “true” and “false” instances of x1 ∧ x2 (logical AND). Find the slope and intercept, and sketch the separation line defined by this discriminant. (b) Do the same, but for x1 ∨ x2 (logical OR). (c) What (if anything) might make one separation line better than another? (One way to check a candidate answer is sketched below.)
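
As a quick check for part (a), the sketch below tests a candidate weight vector against the AND truth table; the particular weights (w0 = −1.5, w1 = 1, w2 = 1) are just one possible answer, assumed here for illustration.

```python
def perceptron_output(w0, w1, w2, x1, x2):
    """Threshold unit: +1 if w0 + w1*x1 + w2*x2 > 0, else -1."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

# Candidate weights for x1 AND x2 (inputs encoded as 0/1, targets as -1/+1).
w0, w1, w2 = -1.5, 1.0, 1.0
for x1 in (0, 1):
    for x2 in (0, 1):
        target = 1 if (x1 and x2) else -1
        print(x1, x2, "output:", perceptron_output(w0, w1, w2, x1, x2), "target:", target)
```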

  12. To simplify notation, assume a “dummy” coordinate (or attribute) x0 = 1. Then we can write: g(x) = w0 x0 + w1 x1 + ... + wn xn = w · x. • We can generalize the perceptron to cases where we project data points x into a “feature space” φ(x), giving g(x) = w · φ(x).
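
A minimal sketch of the dummy-coordinate trick (the helper name is an illustrative assumption): prepending x0 = 1 to every input absorbs the bias into the weight vector, so g(x) becomes a single dot product.

```python
import numpy as np

def add_dummy(x):
    """Prepend the dummy coordinate x0 = 1 to an input vector."""
    return np.concatenate(([1.0], x))

# With the dummy coordinate, w = (w0, w1, ..., wn) and g(x) = w . x.
w = np.array([-0.5, 1.0, 1.0])           # w0 = -0.5 is now just another weight
x = add_dummy(np.array([1.0, 0.0]))      # x = (1, 1, 0)
print(np.dot(w, x))                      # 0.5, the same as w0 + w1*x1 + w2*x2
```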

  13. Notation • Let S = {(xk, tk) : k = 1, 2, ..., m} be a training set. Note: xk is a vector of inputs, and tk ∈ {+1, −1} for binary classification (tk is real-valued for regression). • Output o: the perceptron’s output on input xk, i.e., +1 if w · xk > 0 and −1 otherwise. • Error of a perceptron on a single training example (xk, tk): Ek = ½ (tk − ok)².

  14. Example • S = {((0,0), −1), ((0,1), 1)} • Let w = (w0, w1, w2) = (0.1, 0.1, −0.3). What is E1? What is E2? [diagram: perceptron with bias input +1 (weight 0.1), input x1 (weight 0.1), input x2 (weight −0.3), and output o]
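
A small sketch of how these errors might be computed with Ek = ½ (tk − ok)². Whether ok is the thresholded output (±1) or the raw linear activation w · xk is a convention choice not settled by this slide, so both are shown; this is an assumption for illustration.

```python
import numpy as np

w = np.array([0.1, 0.1, -0.3])              # (w0, w1, w2)
S = [((0.0, 0.0), -1.0), ((0.0, 1.0), 1.0)]

for k, (x, t) in enumerate(S, start=1):
    g = w[0] + np.dot(w[1:], x)             # linear activation w0 + w1*x1 + w2*x2
    o = 1.0 if g > 0 else -1.0              # thresholded output
    print(f"E{k}: with thresholded output {0.5 * (t - o) ** 2}, "
          f"with linear output {0.5 * (t - g) ** 2}")
```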

  15. How do we train a perceptron? Gradient descent in weight space. [figure: error surface over weight space, from T. M. Mitchell, Machine Learning]

  16. Perceptron learning algorithm • Start with random weights w = (w1, w2, ..., wn). • Do gradient descent in weight space, in order to minimize the error E. • Given the error E, we want to modify the weights w so as to take a step in the direction of steepest descent.

  17. Gradient descent • We want to find w so as to minimize the sum-squared error: E(w) = ½ Σk (tk − ok)², summed over the m training examples. • To minimize E(w), take the derivative of E(w) with respect to w. • A vector of derivatives is called a “gradient”: ∇E(w) = (∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn).

  18. Here is how we change each weight: wi ← wi + Δwi, with Δwi = −η ∂E/∂wi, where η is the learning rate.
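
A minimal sketch of one such batch weight update, assuming the unthresholded linear output ok = w · xk so that E is differentiable (as the next slides require); the function name, data, and learning rate are illustrative.

```python
import numpy as np

def batch_gradient_step(w, X, t, eta=0.1):
    """One step of "true" gradient descent on E(w) = 1/2 * sum_k (t_k - w.x_k)^2.

    X has a leading dummy column of 1s. Since dE/dw_i = -sum_k (t_k - o_k) * x_{k,i},
    the update is delta_w = -eta * dE/dw = eta * sum_k (t_k - o_k) * x_k.
    """
    o = X @ w                           # unthresholded outputs for all examples
    return w + eta * X.T @ (t - o)      # summed over the whole training set

# Illustrative data with the dummy coordinate x0 = 1 prepended to each example.
X = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
t = np.array([-1.0, 1.0])
w = np.array([0.1, 0.1, -0.3])
print(batch_gradient_step(w, X, t))     # approximately [0.11, 0.1, -0.18]
```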

  19. Error function has to be differentiable, so output function o also has to be differentiable.

  20. Activation functions [figure: two plots of output (−1 to 1) against activation: a step/threshold function, which is not differentiable, and a smooth function, which is differentiable]

  21. Carrying out the differentiation gives the update Δwi = η Σk (tk − ok) xk,i. This is called the perceptron learning rule, with “true gradient descent”: the gradient is computed over the entire training set before the weights are changed.

  22. Problem with true gradient descent: the search process can land in a local optimum. • Common approach to this: use stochastic gradient descent. • Instead of doing a weight update after all training examples have been processed, do a weight update after each training example has been processed (i.e., after the perceptron output has been calculated for that example). • Stochastic gradient descent approximates true gradient descent increasingly well as the learning rate η → 0.

  23. Training a perceptron 1. Start with random weights, w = (w1, w2, ..., wn). 2. Select a training example (xk, tk). 3. Run the perceptron with input xk and weights w to obtain the output o. 4. Let η be the learning rate (a user-set parameter); update each weight: wi ← wi + η (tk − o) xk,i. 5. Go to 2. (A runnable sketch of this loop follows.)
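
Below is a minimal, self-contained sketch of this stochastic training loop; the function name, stopping rule (a fixed number of epochs), and example data are illustrative assumptions rather than part of the slides.

```python
import random
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50, seed=0):
    """Stochastic training: update the weights after each training example.

    X: inputs with a leading dummy coordinate x0 = 1; t: targets in {+1, -1}.
    """
    rng = random.Random(seed)
    w = np.array([rng.uniform(-0.5, 0.5) for _ in range(X.shape[1])])  # step 1
    for _ in range(epochs):
        for k in rng.sample(range(len(X)), len(X)):    # step 2: pick an example
            o = 1.0 if np.dot(w, X[k]) > 0 else -1.0   # step 3: run the perceptron
            w = w + eta * (t[k] - o) * X[k]            # step 4: w_i <- w_i + eta*(t_k - o)*x_{k,i}
    return w                                           # step 5: loop until epochs are exhausted

# Example: learn logical OR (inputs as 0/1 with dummy x0 = 1, targets as -1/+1).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, 1, 1, 1], dtype=float)
w = train_perceptron(X, t)
print(w, [1 if np.dot(w, x) > 0 else -1 for x in X])   # should reproduce the OR targets
```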
