
Final Exam Review

Presentation Transcript


  1. Final Exam Review CS479/679 Pattern Recognition, Dr. George Bebis

  2. Final Exam: May 13, 2019 (12:10pm – 2:10pm) • Comprehensive • Everything covered before the Midterm exam • Linear Discriminant Functions • Support Vector Machines • Expectation-Maximization Algorithm • Case studies are included in the final exam (i.e., focus on main ideas/results, not details) • No need to memorize complicated/long equations (e.g., decision boundary equations for Gaussian distributions, Chernoff bound, etc.); they will be provided to you during the exam if needed. • Simpler/shorter equations (e.g., Bayes rule, ML/BE equations, PCA/LDA equations, linear discriminant, gradient descent, etc.) should be memorized.

  3. Linear Discriminant Functions • General form of linear discriminant: g(x) = w^T x + w0 • What is the form of the decision boundary? • The decision boundary is a hyperplane • What is the meaning of w and w0? • The orientation and location of the hyperplane are determined by w and w0, respectively.

  4. Linear Discriminant Functions • What is the geometric interpretation of g(x)? • g(x) measures the (signed) distance of x from the decision boundary (hyperplane), scaled by ||w|| (i.e., distance = g(x)/||w||) – know how to prove it.
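
To make the two points above concrete, here is a minimal sketch (not from the slides; the weight vector, bias, and test point are made-up values) that evaluates g(x) = w^T x + w0, classifies by its sign, and computes the signed distance g(x)/||w|| from the hyperplane:

```python
import numpy as np

# Hypothetical hyperplane parameters: w fixes the orientation, w0 the location.
w = np.array([2.0, -1.0])
w0 = 0.5

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

x = np.array([1.0, 3.0])
score = g(x)
label = 1 if score > 0 else 2            # decide class 1 if g(x) > 0, else class 2
distance = score / np.linalg.norm(w)     # signed distance of x from the hyperplane
print(f"g(x) = {score:.3f}, class {label}, signed distance = {distance:.3f}")
```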

  5. Linear Discriminant Functions • How do we estimate w and w0? • Apply learning using a set of labeled training examples • What is the effect of each training example? • Places a constraint on the solution [figure: each training example defines a constraint region, shown both in parameter space (α1, α2) and in feature space (y1, y2)]

  6. How do we “learn” the parameters? • Iterative optimization – what is the main idea? • Minimize some error function J(α) iteratively • How are the parameters updated? α(k+1) = α(k) + η(k) p(k), where p(k) is the search direction and η(k) is the learning rate.
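
A minimal gradient-descent sketch of this update rule (illustrative only; the error function, starting point, and learning rate below are made up, and the search direction is the negative gradient):

```python
import numpy as np

def J(a):
    """Illustrative error function: a quadratic bowl with its minimum at (1, -2)."""
    return (a[0] - 1.0) ** 2 + (a[1] + 2.0) ** 2

def grad_J(a):
    """Gradient of J, used to form the search direction p = -grad J."""
    return np.array([2.0 * (a[0] - 1.0), 2.0 * (a[1] + 2.0)])

a = np.array([5.0, 5.0])   # initial parameter estimate
eta = 0.1                  # learning rate
for k in range(100):
    p = -grad_J(a)         # gradient descent: search direction is the negative gradient
    a = a + eta * p        # update rule: a(k+1) = a(k) + eta(k) * p(k)
print(a)                   # converges near the minimizer (1, -2)
```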

  7. Methods • Gradient descent – search direction? • Newton – search direction? • Perceptron rule – error function?

  8. Methods (cont’d) • Gradient descent • Effect of parameter initialization • Effect of learning rate • Newton • Computational requirements • Convergence • Perceptron rule • Batch vs single-sample Perceptron • Perceptron Convergence Theorem
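
A rough single-sample perceptron sketch (one plausible coding of the rule, not taken from the slides; the toy data is made up): whenever an augmented training example is misclassified, the label-scaled example is added to the weight vector, and training stops once an epoch produces no errors.

```python
import numpy as np

# Toy linearly separable data (made up); labels are +1 / -1.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])

# Augment each example with a constant 1 so w0 is absorbed into the weight vector.
Xa = np.hstack([X, np.ones((X.shape[0], 1))])

w = np.zeros(Xa.shape[1])
eta = 1.0
for epoch in range(100):
    errors = 0
    for xi, yi in zip(Xa, y):       # single-sample perceptron: update after each example
        if yi * (w @ xi) <= 0:      # misclassified (or on the boundary)
            w = w + eta * yi * xi   # perceptron rule update
            errors += 1
    if errors == 0:                 # converged: every training example classified correctly
        break
print(w)
```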

  9. Support Vector Machines • What is the capacity of a classifier? • How is the VC dimension related to the capacity? • What is structural risk minimization? • Find solutions that: (1) minimize the empirical risk and (2) have low VC dimension

  10. Support Vector Machines • What is the margin of separation? How is it defined? • What is the relationship between VC dimension and margin of separation? • The VC dimension is minimized by maximizing the margin of separation. [figure: maximum-margin hyperplane with the support vectors highlighted]
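
As a quick numeric illustration (the weight vector is made up): for a canonical hyperplane with the support vectors lying on w^T x + w0 = ±1, the margin of separation is 2/||w||, so maximizing the margin amounts to minimizing ||w||.

```python
import numpy as np

w = np.array([3.0, 4.0])          # hypothetical weight vector of a trained linear SVM
margin = 2.0 / np.linalg.norm(w)  # distance between the planes w.x + w0 = +1 and -1
print(margin)                     # 0.4: a smaller ||w|| would give a wider margin
```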

  11. Support Vector Machines • SVM optimization problem: minimize (1/2)||w||^2 subject to yi (w^T xi + w0) ≥ 1 for all i • What is the role of these terms?

  12. Support Vector Machines (cont’d) • SVM solution: w = Σk λk yk xk, where the λk are the Lagrange multipliers • Are all λk non-zero? • Soft margin classifier – tolerate “outliers” • What is the effect of “c” on the solution?
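
A hedged sketch using scikit-learn's SVC (assuming scikit-learn is available; the toy data is made up) to see how the soft-margin penalty C affects the number of support vectors retained in the solution:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-class data with some overlap (made up).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A small C tolerates more margin violations (more support vectors, wider margin);
# a large C penalizes violations heavily (fewer support vectors, narrower margin).
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```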

  13. Support Vector Machines (cont’d) • Non-linear SVM – what is the main idea? • Map the data to a high-dimensional space h • Use a linear classifier in the new space • Computational complexities?

  14. Support Vector Machines (cont’d) • What is the kernel trick? • Compute dot products in the transformed space using a kernel function, e.g., the polynomial kernel K(x, y) = (x · y)^d
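
A small numeric check of the kernel trick (illustrative only): for the degree-2 polynomial kernel, K(x, y) = (x · y)^2 equals the explicit dot product φ(x) · φ(y) in the mapped space, using the feature map φ(x) = (x1^2, sqrt(2)·x1·x2, x2^2) assumed here for 2D inputs.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x . y)^d."""
    return (x @ y) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both quantities are equal, so the SVM never has to form phi(x) explicitly.
print(poly_kernel(x, y))   # (1*3 + 2*(-1))^2 = 1
print(phi(x) @ phi(y))     # same value, computed in the mapped space
```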

  15. Support Vector Machines • SVM is based on exact optimization (i.e., no local optima). • Complexity depends on the number of support vectors, not on the dimensionality of the transformed space. • Performance depends on the choice of the kernel and its parameters.

  16. Expectation-Maximization (EM) • What is the EM algorithm? • An iterative method to perform ML estimation, i.e., maximize p(D | θ) • When is EM useful? • Most useful for problems where the data is incomplete or can be thought of as being incomplete.

  17. Expectation-Maximization (EM) • What are the steps of the EM algorithm? • Initialization: choose an initial estimate θ0 • Expectation Step: compute the expected complete-data log-likelihood Q(θ; θt) given the observed data and the current estimate θt • Maximization Step: set θt+1 = argmaxθ Q(θ; θt) • Test for convergence: stop when the change in θ (or in the log-likelihood) falls below a threshold • Convergence properties of EM? • The solution depends on the initial estimate θ0 • No guarantee of finding the global maximum, but stable (i.e., the likelihood never decreases, so there are no oscillations)

  18. Expectation-Maximization (EM) • What is a Mixture of Gaussians (MoG)? • How are the MoG parameters estimated? • Introduce “hidden” variables zi indicating which component generated each sample • Use the EM algorithm to estimate E[zi]
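
A condensed EM sketch for a 1D two-component Mixture of Gaussians (an illustrative implementation under standard assumptions, not the exact formulation on the slides): the E-step computes the responsibilities E[zi] under the current parameters, and the M-step re-estimates the mixing weights, means, and variances from those responsibilities.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1D data drawn from two Gaussians (made-up parameters).
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

def gaussian(x, mu, var):
    """Univariate Gaussian density."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Initialization: initial estimate theta0 (the solution depends on this choice).
pi = np.array([0.5, 0.5])      # mixing weights
mu = np.array([-1.0, 1.0])     # component means
var = np.array([1.0, 1.0])     # component variances

for _ in range(100):
    # E-step: responsibilities r[i, k] = E[z_ik], the posterior that sample i
    # was generated by component k under the current parameter estimates.
    p = np.stack([pi[k] * gaussian(data, mu[k], var[k]) for k in range(2)], axis=1)
    r = p / p.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters using the responsibilities as soft counts.
    Nk = r.sum(axis=0)
    pi = Nk / len(data)
    mu = (r * data[:, None]).sum(axis=0) / Nk
    var = (r * (data[:, None] - mu) ** 2).sum(axis=0) / Nk

print("weights:", pi, "means:", mu, "variances:", var)
```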

  19. Expectation-Maximization (EM) • Can you interpret the EM steps for MoGs?

  20. Expectation-Maximization (EM) • Can you interpret the EM steps for MoGs?
