
Linear Classification


Presentation Transcript


  1. Linear Classification Fall 2014 The University of Iowa Tianbao Yang

  2. Content • K nearest neighbor classification • Basic variant • Improved variants • Probabilistic Generative Model • Discriminant Functions • Probabilistic Discriminative Model • Support Vector Machine

  3. Content • K nearest neighbor classification • Basic variant • Improved variants • Probabilistic Generative Model • Discriminant Functions • Probabilistic Discriminative Model • Support Vector Machine

  4. Classification Problems • Given input x (a feature vector) • Predict the output (class label) y • Binary classification: y ∈ {+1, −1} • Multi-class classification: y ∈ {1, 2, …, C} • Learn a classification function f: x → y • Regression: the output y is continuous

  5. Examples of Classification Problem • Text categorization: given a document (e.g., “Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, …”), predict its topic (Politics vs. Sport)

  6. Examples of Classification Problem • Text categorization: • Input features x: word frequencies, e.g., {(campaigning, 1), (democrats, 2), (basketball, 0), …} • Class label y: ‘Politics’: y = +1, ‘Sport’: y = −1

  7. Examples of Classification Problem • Image Classification: • Input features X • Color histogram • {(red, 1004), (blue, 23000), …} • Class label y • Y = +1: ‘bird image’ • Y = -1: ‘non-bird image’ • Which images have birds, which ones do not?

  8. Examples of Classification Problem • Image Classification: • Input features • Color histogram • {(red, 1004), (blue, 23000), …} • Class label • ‘bird image’: y = +1 • ‘non-bird image’: y = −1 • Which images have birds, which ones do not?

  9. Supervised Learning • Training examples: {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)} • Independent and identically distributed (i.i.d.) assumption • A critical assumption for machine learning theory

  10. Regression for Classification • It is easy to turn binary classification into a regression problem • Ignore the binary nature of the class label y • How to convert multi-class classification into a regression problem? • Pros: computational efficiency • Cons: ignores the discrete nature of the class label
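A minimal sketch of this "regression for classification" idea on made-up toy data (the data and variable names are mine, not from the slides): fit least squares to labels encoded as ±1 and classify by the sign of the fitted value.

```python
import numpy as np

# Toy data: two Gaussian blobs labeled +1 and -1 (illustrative only).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, scale=1.0, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

# Least-squares regression on the +/-1 labels (ignores their binary nature).
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

# Classify by the sign of the regression output.
y_pred = np.sign(X_aug @ w)
print("training accuracy:", np.mean(y_pred == y))
```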

  11. Regression for Classification

  12. Regression for Classification

  13. K Nearest Neighbor (k-NN) Classifier (figure: k = 1)

  14. K Nearest Neighbor (k-NN) Classifier (figure: k = 1)

  15. K Nearest Neighbor (k-NN) Classifier • Decision boundary (figure: k = 1)

  16. K Nearest Neighbor (k-NN) Classifier • How many neighbors should we count? (figures: k = 1 and k = 4)

  17. K Nearest Neighbor (k-NN) Classifier
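A minimal k-NN sketch under the usual assumptions (Euclidean distance, unweighted majority vote); the helper name knn_predict and the toy data are mine, not from the lecture.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=1):
    """Predict the label of x_query by majority vote among its k nearest
    training points (Euclidean distance, ties broken arbitrarily)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Example usage with tiny made-up data.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([-1, -1, +1, +1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1
```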

  18. Leave-One-Out Method

  19. Leave-One-Out Method

  20. Leave-One-Out Method (figure: k = 1)

  21. Leave-One-Out Method (figure: err(1) = 1)

  22. Leave-One-Out Method (figure: k = 2; err(1) = 3, err(2) = 2, err(3) = 6)
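One way to compute the leave-one-out error err(k) illustrated on these slides, reusing the hypothetical knn_predict helper from the earlier sketch (the specific error counts above come from the lecture's figure, not from this code):

```python
import numpy as np

def loo_error(X, y, k):
    """Leave-one-out error: classify each point using all the other points."""
    mistakes = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i                 # hold out point i
        pred = knn_predict(X[mask], y[mask], X[i], k)
        mistakes += (pred != y[i])
    return mistakes

# err(k) for a few candidate k; pick the k with the smallest error.
# for k in (1, 2, 3):
#     print(k, loo_error(X_train, y_train, k))
```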

  23. Cross-validation • Divide the training examples into two sets • A training set (80%) and a validation set (20%) • Predict the class labels of the validation set using only the examples in the training set • Choose the number of neighbors k that maximizes the validation accuracy
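A sketch of the 80/20 hold-out selection described on this slide, again assuming the knn_predict helper defined earlier; the random split and candidate k values are illustrative.

```python
import numpy as np

def choose_k_by_validation(X, y, candidate_ks, train_frac=0.8, seed=0):
    """Split the data 80/20, evaluate each k on the validation part,
    and return the k with the highest validation accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, va = idx[:n_train], idx[n_train:]
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = [knn_predict(X[tr], y[tr], X[i], k) for i in va]
        acc = np.mean(np.array(preds) == y[va])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```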

  24. Bayes Optimal Solution for Classification • expected loss for classification • consider 0-1 loss • point-wise loss • Bayes Optimal Classifier
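Spelling out the formulas this slide refers to (standard definitions rather than text recovered from the deck): under 0-1 loss the expected loss is minimized point-wise by predicting the most probable class.

\[
\mathbb{E}[L] \;=\; \sum_{y} \int L\bigl(y, f(x)\bigr)\, \Pr(x, y)\, dx,
\qquad
L_{0\text{-}1}\bigl(y, f(x)\bigr) \;=\; \mathbb{1}\bigl[f(x) \neq y\bigr],
\]
\[
f^{*}(x) \;=\; \arg\max_{y}\ \Pr(y \mid x).
\]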

  25. Probabilistic Interpretation of KNN • Bayes’ theorem: Pr(y|x) = Pr(x|y) Pr(y) / Pr(x) • KNN uses non-parametric density estimation • Given a data set with N_k data points from class C_k and N data points in total

  26. Probabilistic Interpretation of KNN • Estimate the density p(x) • Consider a small neighbourhood R containing x, with volume V small enough that p(x) is roughly constant inside it • Given the total number N of data points, K data points fall inside R, so p(x) ≈ K / (N V)

  27. Probabilistic Interpretation of KNN • Estimate the class-conditional density p(x | C_k) • Consider the same small neighbourhood containing x: with K_k of the K neighbors in class C_k (and N_k points in class C_k overall), p(x | C_k) ≈ K_k / (N_k V)

  28. Probabilistic Interpretation of KNN • Given a data set with N_k data points from class C_k and N points in total, we have p(x | C_k) ≈ K_k / (N_k V) • and correspondingly p(x) ≈ K / (N V) • Since p(C_k) = N_k / N, Bayes’ theorem gives p(C_k | x) ≈ K_k / K
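Putting slides 25–28 together (the symbols N_k, K_k, K, V follow the standard non-parametric treatment; the slide's own notation did not survive extraction):

\[
p(x \mid C_k) \approx \frac{K_k}{N_k V}, \qquad
p(x) \approx \frac{K}{N V}, \qquad
p(C_k) = \frac{N_k}{N},
\]
\[
p(C_k \mid x) \;=\; \frac{p(x \mid C_k)\, p(C_k)}{p(x)}
\;\approx\; \frac{K_k}{K}.
\]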

  29. Probabilistic Interpretation of KNN • Estimate the conditional probability Pr(y|x) • Count of data points in class y in the neighborhood of x • Bias and variance tradeoff • A small neighborhood → large variance → unreliable estimation • A large neighborhood → large bias → inaccurate estimation

  30. Content • K nearest neighbor classification • Basic variant • Improved variants • Probabilistic Generative Model • Discriminant Functions • Probabilistic Discriminative Model • Support Vector Machine

  31. Weighted kNN • Weight the contribution of each close neighbor based on its distance to the query point • Weight function: a decreasing function of the distance • Prediction: a weighted vote over the k nearest neighbors
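The slide does not specify the weight function; a common choice, assumed here purely for illustration, is a Gaussian function of the distance:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=5, sigma=1.0):
    """Weighted k-NN: each of the k nearest neighbors votes with weight
    exp(-d^2 / (2 sigma^2)); predict the class with the largest total weight."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] ** 2 / (2.0 * sigma ** 2))
    totals = {}
    for label, w in zip(y_train[nearest], weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)
```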

  32. Kernel Density Estimation • fix V, estimate K from the data. Let R be a hypercube centred on x

  33. Kernel Density Estimation • The hard hypercube count gives a discontinuity in the estimated density • to avoid the discontinuity, replace the hypercube with a smooth kernel function (e.g., a Gaussian)

  34. Kernel Density Estimation • More generally, p(x) ≈ (1/N) Σ_n (1/h^d) k((x − x_n)/h) for a kernel function k(·) and bandwidth h
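A minimal Gaussian kernel density estimator in the spirit of slides 32–34; the bandwidth h and the Gaussian kernel are standard choices and not necessarily the ones used in the lecture.

```python
import numpy as np

def kde(x_query, X_train, h=0.5):
    """Kernel density estimate p(x) = (1/N) sum_n N(x; x_n, h^2 I),
    i.e. a Gaussian kernel of bandwidth h placed on every training point."""
    N, d = X_train.shape
    diffs = X_train - x_query                      # (N, d) differences
    sq = np.sum(diffs ** 2, axis=1)                # squared distances to x
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)     # Gaussian normaliser
    return np.mean(np.exp(-sq / (2.0 * h ** 2)) / norm)
```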

  35. When to Consider Nearest Neighbor? • Less than 20 attributes per example • Advantages: • Training is very fast • Can learn complex target functions • Disadvantages: • Slow at query time • Easily fooled by irrelevant attributes

  36. Curse of Dimensionality • Imagine instances described by 20 attributes, but only 2 are relevant to the target function • Curse of dimensionality: the nearest neighbors are determined mostly by the 18 irrelevant attributes, so they may not be close in the 2 relevant dimensions

  37. Curse of Dimensionality • Curse of dimensionality: more data points lie close to the boundary • data uniformly distributed over a unit ball (figure: # of data points vs. dimensionality)
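A quick check of the "mass near the boundary" claim (my own toy calculation, not the slide's figure): for points uniform in the unit ball, the fraction within ε of the surface is 1 − (1 − ε)^d, which tends to 1 as the dimensionality d grows.

```python
# Fraction of a unit ball's volume within eps of its surface, as a function of d.
eps = 0.1
for d in (1, 2, 10, 100, 1000):
    frac = 1.0 - (1.0 - eps) ** d
    print(f"d = {d:5d}: {frac:.4f} of the volume lies in the outer shell")
```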

  38. Curse of Dimensionality • High-dimensional problems are not rare • Bioinformatics: microarray gene expression: d ≈ 10^2 – 10^4 • Computer vision: images: d ≈ 10^4 – 10^6 • Text analysis: d ≈ 10^4 – 10^6

  39. Dimensionality Reduction • Can we reduce the dimensionality? • it is possible

  40. Principal Component Analysis • Dimensionality Reduction by Linear Transformation

  41. Principal Component Analysis • Dimensionality Reduction by Linear Transformation • In which direction should we project the data?

  42. Principal Component Analysis • The big picture when we look at the data (figure: mean and variance of the data)

  43. Principal Component Analysis • Mean-centered data (figure: variance)

  44. Principal Component Analysis • Projection should keep the variance as much as possible

  45. Principal Component Analysis • Let us compute the variance after projection • assume all data points are mean-centered • data after projection onto a unit vector w: z_n = w^T x_n • variance after projection: (1/N) Σ_n (w^T x_n)^2 = w^T S w, where S is the covariance matrix

  46. Principal Component Analysis • The best projection should maximize the variance • the first projection direction (the first component) is the first eigenvector of the covariance matrix S, corresponding to its largest eigenvalue • What about other projections?
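Writing out the optimization this slide alludes to (the standard PCA derivation, with S the covariance matrix of the mean-centered data):

\[
w_1 \;=\; \arg\max_{\|w\|=1} \; w^{\top} S\, w,
\qquad
S \;=\; \frac{1}{N}\sum_{n=1}^{N} x_n x_n^{\top} \quad (\text{data mean-centered}),
\]
and the maximizer is the eigenvector of \(S\) with the largest eigenvalue \(\lambda_1\); the variance after projection is \(w_1^{\top} S\, w_1 = \lambda_1\).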

  47. Principal Component Analysis • Maximize the variance of the residual over all data • after removing the first component, each point leaves a residual x_n − (w_1^T x_n) w_1 • the second projection should maximize the variance of these residuals

  48. Principal Component Analysis • The m components are the first m eigenvectors of the covariance matrix • the variance along each component equals the corresponding eigenvalue of the covariance matrix

  49. Principal Component Analysis • In geometry

  50. Principal Component Analysis • In summary (step by step) • Compute the data mean and the covariance matrix of the mean-centered data • Compute the first m eigenvectors of the covariance matrix as the m components for projection • Compute the new data: z_n = W^T (x_n − mean), where W = [w_1, …, w_m]
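A compact numpy sketch of the step-by-step recipe above; the function and variable names are mine.

```python
import numpy as np

def pca(X, m):
    """Project the rows of X onto the first m principal components.
    Returns the projected data Z and the projection matrix W (d x m)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # 1. mean-center the data
    S = np.cov(Xc, rowvar=False)                   # 2. covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(S)           # 3. eigen-decomposition (ascending)
    order = np.argsort(eigvals)[::-1]              #    sort eigenvalues descending
    W = eigvecs[:, order[:m]]                      #    first m eigenvectors
    Z = Xc @ W                                     # 4. new data: z_n = W^T (x_n - mean)
    return Z, W
```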
