
Classification and Linear Classifiers


Presentation Transcript


  1. In the Name of God Machine Learning Classification and Linear Classifiers Mohammad Ali Keyvanrad Thanks to: M. Soleymani (Sharif University of Technology) R. Zemel (University of Toronto) P. Smyth (University of California, Irvine) Fall 1392

  2. Outline • Classification • Linear classifiers • Perceptron • Multi-class classification • Generative approach • Naïve Bayes classifier

  3. Classification: Oranges and Lemons

  4. Classification: Oranges and Lemons

  5. Classification problem • Given: training set $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ • a labeled set of input-output pairs • Goal: given an input $\mathbf{x}$, assign it to one of $K$ classes • Examples: • Spam filter • Handwritten digit recognition

  6. Linear classifiers • Linear classifiers: • Decision boundaries are linear functions • a $(D-1)$-dimensional hyperplane within the $D$-dimensional input space • Examples (some of these yield only piecewise-linear boundaries) • Perceptron • Support vector machine • Decision Tree • KNN • Naïve Bayes classifier • Linear Discriminant Analysis (or Fisher's linear discriminant)

  7. Linear classifiers • Linearly separable data • Data points can be exactly classified by a linear decision surface • Binary classification • Target variable $y \in \{-1, +1\}$

  8. Decision boundary • Discriminant function: $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$ • $w_0$: bias • if $f(\mathbf{x}) \ge 0$ then $\hat{y} = 1$, else $\hat{y} = -1$ • Decision boundary: $f(\mathbf{x}) = 0$ • The sign of $f(\mathbf{x})$ predicts binary class labels
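
A minimal sketch of this discriminant function in Python (the weights, bias, and sample point below are invented for illustration):

```python
import numpy as np

def discriminant(x, w, w0):
    """f(x) = w^T x + w0; its sign gives the predicted class."""
    return w @ x + w0

def predict(x, w, w0):
    """Return +1 if f(x) >= 0, else -1."""
    return 1 if discriminant(x, w, w0) >= 0 else -1

# Invented weights and a sample point, for illustration only.
w = np.array([2.0, -1.0])   # weight vector (normal to the decision boundary)
w0 = -0.5                   # bias
x = np.array([1.0, 0.3])

print(discriminant(x, w, w0))  # 1.2
print(predict(x, w, w0))       # 1
```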

  9. Linear Decision boundary (Perceptron)

  10. Linear Decision boundary (Decision Tree) [figure: axis-aligned splits at thresholds t1, t2, t3 on features including Income]

  11. Linear Decision boundary (K Nearest Neighbor) [figure: two classes of points, x and O, plotted against Feature 1 and Feature 2]

  12. Non-Linear Decision boundary [figure: a non-linear decision boundary separating Decision Region 1 from Decision Region 2]

  13. Decision boundary • Linear classifier

  14. Non-linear decision boundary • Choose non-linear features $\boldsymbol{\phi}(\mathbf{x})$ • The classifier $f(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$ is still linear in the parameters $\mathbf{w}$
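
A small sketch of this idea with a hand-rolled quadratic feature map (the particular map and weights are assumptions; the slide does not fix them):

```python
import numpy as np

def phi(x):
    """Quadratic feature map: [1, x1, x2, x1^2, x2^2, x1*x2].
    f(x) = w^T phi(x) is non-linear in x but linear in w."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

# Example: the circular boundary x1^2 + x2^2 = 1 becomes linear in phi-space.
w = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])  # f(x) = x1^2 + x2^2 - 1

for x in [np.array([0.2, 0.3]), np.array([1.5, 0.0])]:
    print(x, np.sign(w @ phi(x)))  # -1 inside the circle, +1 outside
```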

  15. Linear boundary: geometry • In this slide: • $\mathbf{w}$ is orthogonal to the decision boundary • $-w_0 / \|\mathbf{w}\|$ is the signed distance from the origin to the boundary • $f(\mathbf{x}) / \|\mathbf{w}\|$ is the signed distance from a point $\mathbf{x}$ to the boundary
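
A quick numeric check of the signed-distance formula (weight vector, bias, and point are made-up values):

```python
import numpy as np

# Signed distance from x to the boundary w^T x + w0 = 0 is f(x) / ||w||.
w = np.array([3.0, 4.0])       # ||w|| = 5
w0 = -5.0
x = np.array([2.0, 1.0])

f = w @ x + w0                 # 3*2 + 4*1 - 5 = 5
print(f / np.linalg.norm(w))   # 1.0: x lies one unit on the positive side
```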

  16. SSE cost function for classification • The SSE (sum of squared errors) cost function is not suitable for classification • The squared-error loss penalizes "too correct" predictions, i.e. outputs far on the correct side of the boundary • SSE also lacks robustness to noise
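
A tiny numeric illustration of the "too correct" problem (the outputs below are invented): with target $y = 1$, a confidently correct output is punished far more than a borderline one.

```python
# Squared error (f(x) - y)^2 with target y = +1; illustrative outputs only.
y = 1.0
for f in [0.9, 1.0, 10.0]:
    print(f, (f - y) ** 2)   # 0.01, 0.0, 81.0
# f(x) = 10 classifies the point correctly (and confidently),
# yet SSE penalizes it far more than the borderline f(x) = 0.9.
```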

  17. SSE cost function for classification • Is it more suitable if we set ?

  18. Perceptron algorithm • Linear classifier: $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$ • Two-class: $y = 1$ for class $\mathcal{C}_1$, $y = -1$ for class $\mathcal{C}_2$ • Goal: find $\mathbf{w}$ such that $y^{(i)} f(\mathbf{x}^{(i)}) > 0$ for all training samples

  19. Perceptron criterion • Penalize each misclassified sample by how far it is on the wrong side: $J_P(\mathbf{w}) = -\sum_{i \in \mathcal{M}} y^{(i)} f(\mathbf{x}^{(i)})$, where $\mathcal{M}$ is the set of misclassified samples

  20. Batch gradient descent for Perceptron • "Gradient Descent" to solve the optimization problem: $\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_{i \in \mathcal{M}} y^{(i)} \mathbf{x}^{(i)}$ • Batch Perceptron converges in a finite number of steps for linearly separable data
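
A minimal sketch of the batch update (learning rate, epoch cap, and toy data are assumptions; the bias is absorbed into a leading column of 1s):

```python
import numpy as np

def batch_perceptron(X, y, eta=1.0, max_epochs=100):
    """Batch gradient descent on the perceptron criterion J_P.
    X: (n, d) inputs with a leading column of 1s absorbing the bias.
    y: (n,) labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mis = y * (X @ w) <= 0                        # misclassified samples
        if not mis.any():                             # no errors: converged
            break
        w += eta * (y[mis][:, None] * X[mis]).sum(axis=0)
    return w

# Toy linearly separable data (invented values).
X = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0], [1, -1.0, -1.5], [1, -2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = batch_perceptron(X, y)
print(w, np.sign(X @ w))   # predicted signs match y on this toy set
```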

  21. Stochastic gradient descent for Perceptron • Single-sample perceptron • If $\mathbf{x}^{(i)}$ is misclassified: $\mathbf{w} \leftarrow \mathbf{w} + \eta\, y^{(i)} \mathbf{x}^{(i)}$ • Perceptron convergence theorem (for linearly separable data) • If the training data are linearly separable, the single-sample perceptron is also guaranteed to find a solution in a finite number of steps.
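
The single-sample variant, sketched under the same assumptions (it updates on each misclassified sample instead of summing over all of them):

```python
import numpy as np

def single_sample_perceptron(X, y, eta=1.0, max_epochs=100):
    """Cycle through the samples; update on each misclassified one."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:     # xi is misclassified
                w += eta * yi * xi     # step in the correcting direction
                errors += 1
        if errors == 0:                # one full clean pass: converged
            break
    return w

# Same invented toy data as the batch sketch above.
X = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0], [1, -1.0, -1.5], [1, -2.0, -0.5]])
y = np.array([1, 1, -1, -1])
print(single_sample_perceptron(X, y))
```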

  22. Convergence of Perceptron • Change in a direction that corrects the error

  23. Convergence of Perceptron

  24. Multi-class classification • Solutions to multi-category problems • Converting the problem to a set of two-class problems • "one versus rest" or "one against all" • For each class $\mathcal{C}_k$, a linear discriminant function that separates samples of $\mathcal{C}_k$ from all the other samples is found. • "one versus one" • $K(K-1)/2$ linear discriminant functions are used, one to separate the samples of each pair of classes.

  25. Multi-class classification • One-vs-all (one-vs-rest)
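
A sketch of one-vs-rest built on a binary learner such as the perceptron above (the choice of base learner, and breaking ties by the largest discriminant value, are assumptions):

```python
import numpy as np

def one_vs_rest_fit(X, y, K, fit_binary):
    """Train K binary classifiers; classifier k separates class k
    (relabeled +1) from all other classes (relabeled -1)."""
    return [fit_binary(X, np.where(y == k, 1, -1)) for k in range(K)]

def one_vs_rest_predict(X, ws):
    """Assign each sample to the class with the largest discriminant
    w_k^T x; the argmax resolves regions claimed by several (or no)
    binary classifiers."""
    scores = np.stack([X @ w for w in ws], axis=1)   # shape (n, K)
    return scores.argmax(axis=1)

# Usage (hypothetical): ws = one_vs_rest_fit(X, y, 3, single_sample_perceptron)
#                       labels = one_vs_rest_predict(X, ws)
```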

  26. Multi-class classification • One-vs-one

  27. Multi-class classification: ambiguity • Converting the multi-class problem to a set of two-class problems can lead to regions in which the classification is undefined

  28. Probabilistic approach • Bayes' theorem: $P(\mathcal{C}_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid \mathcal{C}_k)\, P(\mathcal{C}_k)}{p(\mathbf{x})}$

  29. Bayes’ theorem

  30. Bayes decision theory • Bayes decision: Choose the class with the highest posterior probability $P(\mathcal{C}_k \mid \mathbf{x})$
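
A worked numeric example of the rule (all probabilities below are invented):

```python
# Two classes, one observation x; numbers are made up for illustration.
prior = {"C1": 0.3, "C2": 0.7}        # P(C_k)
likelihood = {"C1": 0.8, "C2": 0.2}   # p(x | C_k)

evidence = sum(prior[c] * likelihood[c] for c in prior)             # p(x) = 0.38
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}

print(posterior)                          # {'C1': 0.631..., 'C2': 0.368...}
print(max(posterior, key=posterior.get))  # C1: the class with highest posterior
```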

  31. Probabilistic classifiers • Probabilistic classification approaches can be divided in two main categories • Generative • Discriminative

  32. Discriminative vs. generative approach

  33. Generative approach • Learning stage • Determine the class-conditional density $p(\mathbf{x} \mid \mathcal{C}_k)$ for each class individually • Determine the prior $P(\mathcal{C}_k)$ • Use Bayes' theorem to find $P(\mathcal{C}_k \mid \mathbf{x})$ • Decision stage • After learning the model, make the optimal class assignment for a new input $\mathbf{x}$ • if $P(\mathcal{C}_1 \mid \mathbf{x}) > P(\mathcal{C}_2 \mid \mathbf{x})$ then decide $\mathcal{C}_1$, else $\mathcal{C}_2$
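
A minimal generative sketch under an assumed model: one-dimensional Gaussian class-conditionals (the slide does not commit to a particular density, and the data is invented):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def fit_generative(X, y):
    """Learning stage: estimate p(x|C_k) (here a 1-D Gaussian per class)
    and the prior P(C_k) from the training set."""
    return {k: (X[y == k].mean(), X[y == k].var(), np.mean(y == k))
            for k in np.unique(y)}   # k -> (mu, var, prior)

def predict_generative(x, model):
    """Decision stage: maximize p(x|C_k) P(C_k), which ranks classes
    the same way as the posterior P(C_k|x)."""
    return max(model, key=lambda k: gaussian_pdf(x, *model[k][:2]) * model[k][2])

# Invented 1-D training data.
X = np.array([1.0, 1.2, 0.8, 3.0, 3.2, 2.9])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_generative(X, y)
print(predict_generative(1.1, model), predict_generative(2.8, model))  # 0 1
```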

  34. Discriminative approach • Learning stage • Determine the posterior class probabilities $P(\mathcal{C}_k \mid \mathbf{x})$ directly • Decision stage • After learning the model (inference stage), make the optimal class assignment for a new input $\mathbf{x}$ • if $P(\mathcal{C}_1 \mid \mathbf{x}) > P(\mathcal{C}_2 \mid \mathbf{x})$ then decide $\mathcal{C}_1$, else $\mathcal{C}_2$

  35. Naïve Bayes classifier • Conditional independence assumption: $p(\mathbf{x} \mid \mathcal{C}_k) = \prod_{j=1}^{d} p(x_j \mid \mathcal{C}_k)$ • For each class $\mathcal{C}_k$, it finds $d$ univariate distributions instead of finding one $d$-variate distribution

  36. Naïve Bayes classifier • It first estimates the class-conditional densities $p(x_j \mid \mathcal{C}_k)$ and the prior probability $P(\mathcal{C}_k)$ for each class based on the training set. • In the decision phase, it finds the label of a new input $\mathbf{x}$ according to: $\hat{y} = \arg\max_k P(\mathcal{C}_k) \prod_{j=1}^{d} p(x_j \mid \mathcal{C}_k)$
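
A sketch of this decision rule for discrete features, with add-one smoothing and an invented spam-style toy set (the smoothing choice and the data are assumptions):

```python
import numpy as np

def fit_nb(X, y):
    """X: (n, d) discrete feature values; y: (n,) class labels.
    Estimates P(C_k) and one univariate table P(x_j|C_k) per feature,
    with add-one (Laplace) smoothing."""
    classes = np.unique(y)
    priors = {k: np.mean(y == k) for k in classes}
    tables = {}
    for k in classes:
        Xk = X[y == k]
        for j in range(X.shape[1]):
            vals = np.unique(X[:, j])
            for v in vals:
                tables[(k, j, v)] = (np.sum(Xk[:, j] == v) + 1) / (len(Xk) + len(vals))
    return priors, tables

def predict_nb(x, priors, tables):
    """Label = argmax_k P(C_k) * prod_j P(x_j | C_k)."""
    return max(priors, key=lambda k:
               priors[k] * np.prod([tables[(k, j, v)] for j, v in enumerate(x)]))

# Invented toy data: features (contains_link, all_caps); classes 0=ham, 1=spam.
X = np.array([[1, 1], [1, 0], [0, 0], [0, 0], [1, 1]])
y = np.array([1, 1, 0, 0, 1])
priors, tables = fit_nb(X, y)
print(predict_nb((1, 1), priors, tables))  # 1 (spam)
print(predict_nb((0, 0), priors, tables))  # 0 (ham)
```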

  37. Naïve Bayes: discrete example
