1 / 71

Lecture 2: Math Primer

Lecture 2: Math Primer. Machine Learning CUNY Graduate Center. Today. Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers. Classical Artificial Intelligence.

keala
Télécharger la présentation

Lecture 2: Math Primer

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 2: Math Primer Machine Learning CUNY Graduate Center

  2. Today • Probability and Statistics • Naïve Bayes Classification • Linear Algebra • Matrix Multiplication • Matrix Inversion • Calculus • Vector Calculus • Optimization • Lagrange Multipliers

  3. Classical Artificial Intelligence Expert Systems Theorem Provers Shakey Chess Largely characterized by determinism.

  4. Modern Artificial Intelligence Fingerprint ID Internet Search Vision – facial ID, object recognition Speech Recognition Asimo Jeopardy! Statistical modeling to generalize from data.

  5. Two Caveats about Statistical Modeling Black Swans The Long Tail

  6. Black Swans • In the 18th Century, black swans were discovered in Western Australia • Black Swans are rare, sometimes unpredictable events, that have extreme impact • Almost all statistical models underestimate the likelihood of unseen events. In the 17th Century, all known swans were white. Based on evidence, it is impossible for a swan to be anything other than white.

  7. The Long Tail • Many events follow an exponential distribution • These distributions have a very long “tail”. • I.e. A large region with significant probability mass, but low likelihood at any particular point. • Often, interesting events occur in the Long Tail, but it is difficult to accurately model behavior in this region.

  8. Boxes and Balls 2 Boxes, one red and one blue. Each contain colored balls.

  9. Boxes and Balls Suppose we randomly select a box, then randomly draw a ball from that box. The identity of the Box is a random variable, B. The identity of the ball is a random variable, L. B can take 2 values, r, or b L can take 2 values, g or o.

  10. Boxes and Balls Given some information about B and L, we want to ask questions about the likelihood of different events. What is the probability of selecting an apple? If I chose an orange ball, what is the probability that I chose from the blue box?

  11. Some basics • The probability (or likelihood) of an event is the fraction of times that the event occurs out of n trials, as n approaches infinity. • Probabilities lie in the range [0,1] • Mutually exclusive events are events that cannot simultaneously occur. • The sum of the likelihoods of all mutually exclusive events must equal 1. • If two events are independent then, p(X, Y) = p(X)p(Y) p(X|Y) = p(X)

  12. Joint Probability – P(X,Y) A Joint Probability function defines the likelihood of two (or more) events occurring. Let nij be the number of times event i and event j simultaneously occur.

  13. Generalizing the Joint Probability

  14. Marginalization Consider the probability of X irrespective of Y. The number of instances in column j is the sum of instances in each cell Therefore, we can marginalize or “sum over” Y:

  15. Conditional Probability • Consider only instances where X = xj. • The fraction of these instances where Y = yi is the conditional probability • “The probability of y given x”

  16. Relating the Joint, Conditional and Marginal

  17. Sum and Product Rules Sum Rule Product Rule In general, we’ll refer to a distribution over a random variable as p(X) and a distribution evaluated at a particular value as p(x).

  18. Bayes Rule

  19. Interpretation of Bayes Rule Posterior Prior Likelihood Prior: Information we have before observation. Posterior: The distribution of Y after observing X Likelihood: The likelihood of observing X given Y

  20. Boxes and Balls with Bayes Rule • Assuming I’m inherently more likely to select the red box (66.6%) than the blue box (33.3%). • If I selected an orange ball, what is the likelihood that I selected the red box? • The blue box?

  21. Boxes and Balls

  22. Naïve Bayes Classification This is a simple case of a simple classification approach. Here the Box is the class, and the colored ball is a feature, or the observation. We can extend this Bayesian classification approach to incorporate more independent features.

  23. Naïve Bayes Classification Some theory first.

  24. Naïve Bayes Classification Assuming independent features simplifies the math.

  25. Naïve Bayes Example Data

  26. Naïve Bayes Example Data Prior:

  27. Naïve Bayes Example Data

  28. Naïve Bayes Example Data

  29. Continuous Probabilities So far, X has been discrete where it can take one of M values. What if X is continuous? Now p(x) is a continuous probability density function. The probability that x will lie in an interval (a,b) is:

  30. Continuous probability example

  31. Properties of probability density functions Sum Rule Product Rule

  32. Expected Values Given a random variable, with a distribution p(X), what is the expected value of X?

  33. Multinomial Distribution If a variable, x, can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution. The probability of x being in state k is μk

  34. Expected Value of a Multinomial The expected value is the mean values.

  35. Gaussian Distribution One Dimension D-Dimensions

  36. Gaussians

  37. How machine learning uses statistical modeling • Expectation • The expected value of a function is the hypothesis • Variance • The variance is the confidence in that hypothesis

  38. Variance The variance of a random variable describes how much variability around the expected value there is. Calculated as the expected squared error.

  39. Covariance The covariance of two random variables expresses how they vary together. If two variables are independent, their covariance equals zero.

  40. Linear Algebra • Vectors • A one dimensional array. • If not specified, assume x is a column vector. • Matrices • Higher dimensional array. • Typically denoted with capital letters. • n rows by m columns

  41. Transposition Transposing a matrix swaps columns and rows.

  42. Transposition Transposing a matrix swaps columns and rows.

  43. Addition • Matrices can be added to themselves iff they have the same dimensions. • A and B are both n-by-m matrices.

  44. Multiplication • To multiply two matrices, the inner dimensions must be the same. • An n-by-m matrix can be multiplied by an m-by-k matrix

  45. Inversion • The inverse of an n-by-n or squarematrix A is denoted A-1, and has the following property. • Where I is the identity matrix is an n-by-n matrix with ones along the diagonal. • Iij = 1 iffi = j, 0 otherwise

  46. Identity Matrix Matrices are invariant under multiplication by the identity matrix.

  47. Helpful matrix inversion properties

  48. Norm The norm of a vector,x, represents the euclidean length of a vector.

  49. Positive Definite-ness • Quadratic form • Scalar • Vector • Positive Definite matrix M • Positive Semi-definite

  50. Calculus Derivatives and Integrals Optimization

More Related