Final Exam Review CS479/679 Pattern Recognition, Dr. George Bebis
Final Exam Material • Midterm Exam Material • Dimensionality Reduction • Feature Selection • Linear Discriminant Functions • Support Vector Machines • Expectation-Maximization Algorithm
Dimensionality Reduction • What is the goal of dimensionality reduction and why is it useful? • Reduce the dimensionality of the data • Eliminate redundant and irrelevant features • Fewer training samples needed, faster classification • How is dimensionality reduction performed? • Map the data to a lower-dimensional space through a linear (or non-linear) transformation y = U^T x, where x ∈ R^N, U is an N×K matrix, and y ∈ R^K (a sketch follows below) • Or, select a subset of the features (feature selection)
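A minimal numpy sketch of the linear mapping above; the matrix U here is just a random orthonormal placeholder (in practice it would come from PCA, LDA, etc.):

```python
import numpy as np

N, K, M = 10, 3, 100              # original dim, reduced dim, sample count
X = np.random.randn(M, N)         # M samples, each a point in R^N

# U is N x K with orthonormal columns; a random placeholder here,
# in practice obtained from PCA, LDA, etc.
U, _ = np.linalg.qr(np.random.randn(N, K))

Y = X @ U                         # each row is y = U^T x, a point in R^K
print(Y.shape)                    # (100, 3)
```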
Dimensionality Reduction • Give two examples of linear dimensionality reduction techniques. • Principal Component Analysis (PCA) • Linear Discriminant Analysis (LDA) • What is the difference between PCA and LDA? • PCA seeks a projection that preserves as much information in the data as possible. • LDA seeks a projection that best separates the data.
Dimensionality Reduction • What is the solution found by PCA? • The "largest" eigenvectors of the covariance matrix, i.e., the eigenvectors corresponding to the largest eigenvalues (the principal components) • You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.
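A sketch of the PCA recipe, assuming a data matrix with samples as rows; the 90% variance threshold for choosing the number of components is an arbitrary illustrative choice:

```python
import numpy as np

def pca(X, var_kept=0.90):
    """X: (M, N) data matrix. Returns the projection matrix U (N, K) and mean."""
    mu = X.mean(axis=0)
    Xc = X - mu                               # 1. center the data
    C = np.cov(Xc, rowvar=False)              # 2. covariance matrix (N, N)
    vals, vecs = np.linalg.eigh(C)            # 3. eigendecomposition
    order = np.argsort(vals)[::-1]            # sort by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    # 4. choose K: smallest K whose eigenvalues retain var_kept of the variance
    K = np.searchsorted(np.cumsum(vals) / vals.sum(), var_kept) + 1
    return vecs[:, :K], mu

X = np.random.randn(200, 10) @ np.diag(np.linspace(3, 0.1, 10))
U, mu = pca(X)
Y = (X - mu) @ U                              # projected data, y = U^T (x - mu)
```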
Dimensionality Reduction • You need to know how to apply PCA for face recognition and face detection. • What practical issue arises when applying PCA for face recognition? How do we deal with it? • The covariance matrix AA^T is typically very large (i.e., N^2 × N^2 for N×N images) • Consider the alternative matrix A^T A, which is only M×M (M is the number of training face images)
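A sketch of the trick just described: diagonalize the small M×M matrix A^T A and map its eigenvectors back to eigenvectors of the huge AA^T via u = Av (random data stands in for real face images):

```python
import numpy as np

M, N = 40, 64                        # 40 training images, each N x N pixels
faces = np.random.rand(M, N * N)     # stand-in for real face images
A = (faces - faces.mean(axis=0)).T   # (N^2, M): mean-subtracted faces as columns

# Eigenvectors v of the small M x M matrix A^T A ...
vals, V = np.linalg.eigh(A.T @ A)
keep = vals > 1e-10                  # drop the near-zero eigenvalue (rank M-1)
vals, V = vals[keep], V[:, keep]

# ... yield eigenvectors u = A v of the huge N^2 x N^2 matrix A A^T,
# with the same eigenvalues; normalize each column to unit length.
U = A @ V
U /= np.linalg.norm(U, axis=0)

# sanity check on the largest eigenpair: (A A^T) u = lambda u
u, lam = U[:, -1], vals[-1]
assert np.allclose(A @ (A.T @ u), lam * u)
```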
Dimensionality Reduction • What is the solution found by LDA? • Maximize the between-class scatter Sb while minimizing the within-class scatter Sw • The solution is given by the eigenvectors of the generalized eigenvalue problem Sb w = λ Sw w
Dimensionality Reduction • What practical issue arises when applying LDA for face recognition? How do we deal with it? • The solution can be obtained from the eigenvectors of Sw^-1 Sb • But Sw is singular in practice due to the large dimensionality of the data; apply PCA first to reduce the dimensionality (a sketch follows below)
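A sketch of LDA as a generalized eigenvalue problem, assuming scipy is available and that Sw is non-singular (which is exactly why PCA would be applied first on high-dimensional face data):

```python
import numpy as np
from scipy.linalg import eigh          # solves the generalized eigenproblem

def lda(X, y, out_dim):
    """X: (M, N) data, y: class labels. Returns projection W (N, out_dim)."""
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1],) * 2)
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)              # within-class scatter
        d = (mc - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)                  # between-class scatter
    # generalized eigenproblem Sb w = lambda Sw w (Sw assumed non-singular;
    # on high-dimensional data, project with PCA first so this holds)
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, np.argsort(vals)[::-1][:out_dim]]

X = np.random.randn(150, 5) + np.repeat(np.eye(3), 50, axis=0) @ np.random.randn(3, 5)
y = np.repeat([0, 1, 2], 50)
W = lda(X, y, out_dim=2)
```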
Feature Selection • What is the goal of feature selection? • Select features having high discrimination power while ignoring or paying less attention to the rest. • What are the main steps in feature selection? • Search the space of possible feature subsets. • Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).
Feature Selection • What are the main search and evaluation strategies? • Search strategies: optimal, heuristic, randomized • Evaluation strategies: filter, wrapper • What is the difference between filter and wrapper methods? • In filter methods, evaluation is independent of the classification algorithm. • In wrapper methods, evaluation depends on the classification algorithm.
Feature Selection • You need to be familiar with: • Exhaustive and Naïve search • Sequential Forward/Backward Selection (SFS/SBS) • Plus-L Minus-R Selection • Bidirectional Search • Sequential Floating Selection (SFFS and SFBS) • Feature selection using GAs
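A sketch of Sequential Forward Selection (SFS) from the list above; the criterion `score` is whatever filter or wrapper evaluation you choose, and the cross-validated k-NN accuracy shown here is one hypothetical wrapper choice (assuming scikit-learn is available):

```python
import numpy as np

def sfs(X, y, score, k):
    """Greedy SFS: start empty, repeatedly add the single feature that
    most improves score(X[:, subset], y), until k features are chosen."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best_f = max(remaining, key=lambda f: score(X[:, selected + [f]], y))
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Example criterion (wrapper method): cross-validated classifier accuracy.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def cv_accuracy(Xs, y):
    return cross_val_score(KNeighborsClassifier(), Xs, y, cv=3).mean()

X, y = np.random.randn(90, 8), np.repeat([0, 1, 2], 30)
print(sfs(X, y, cv_accuracy, k=3))
```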
Linear Discriminant Functions • General form of a linear discriminant: g(x) = w^T x + w0 • What is the form of the decision boundary? What is the meaning of w and w0? • The decision boundary is a hyperplane; its orientation is determined by w and its location by w0.
Linear Discriminant Functions • What does g(x) measure? • A signed distance of x from the decision boundary (hyperplane), scaled by ||w||: the actual distance is g(x) / ||w||
Linear Discriminant Functions • How do we find w and w0? • Apply learning using a set of labeled training examples • What is the effect of each training example? • Each one places a constraint on the solution [Figure: the constraints viewed in the solution space (a1, a2) and in the feature space (y1, y2)]
Linear Discriminant Functions • Iterative optimization – what is the main idea? • Minimize some error function J(α) iteratively: α(k+1) = α(k) − η(k) ∇J(α(k)), where ∇J(α(k)) determines the search direction and η(k) is the learning rate
Linear Discriminant Functions • Gradient descent method • Newton method • Perceptron rule
Support Vector Machines • What is the capacity of a classifier? • What is the VC dimension of a classifier? • What is structural risk minimization? • Find solutions that (1) minimize the empirical risk and (2) have low VC dimension. • It can be shown that, with probability 1−δ: R(α) ≤ Remp(α) + sqrt[ (h(ln(2n/h) + 1) − ln(δ/4)) / n ], where h is the VC dimension and n is the number of training examples
Support Vector Machines • What is the margin of separation? How is it defined? • The distance between the separating hyperplane and the closest training samples on either side (the support vectors) • What is the relationship between VC dimension and margin of separation? • VC dimension is minimized by maximizing the margin of separation.
Support Vector Machines • What is the criterion being optimized by SVMs? • Maximize the margin 2/||w||, i.e., minimize (1/2)||w||^2 subject to yi(w^T xi + w0) ≥ 1 for every training sample (xi, yi)
Support Vector Machines • The SVM solution depends only on the support vectors: w = Σi αi yi xi, where the sum runs over the support vectors (the samples with αi > 0) • Soft margin classifier – tolerate "outliers" by introducing slack variables (an illustration follows below)
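A short illustration of this property with scikit-learn (assuming it is available): after training, the weight vector can be reconstructed from the support vectors alone:

```python
import numpy as np
from sklearn.svm import SVC

X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel='linear', C=1.0).fit(X, y)   # C controls the soft margin

# Only the support vectors carry nonzero alpha_i:
print(len(clf.support_vectors_), "support vectors out of", len(X))

# Reconstruct w = sum_i alpha_i y_i x_i from the support vectors alone
w = clf.dual_coef_ @ clf.support_vectors_     # dual_coef_ holds alpha_i * y_i
assert np.allclose(w.ravel(), clf.coef_.ravel())
```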
Support Vector Machines • Non-linear SVM – what is the main idea? • Map the data to a higher-dimensional space through a non-linear transformation Φ(x), where a linear decision boundary is more likely to exist
Support Vector Machines • What is the kernel trick? • Compute the dot products in the transformed space directly in the original space using a kernel function, e.g., the polynomial kernel K(x, y) = (x · y)^d (a numeric check follows below)
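A numeric check of the kernel trick for the degree-2 polynomial kernel: the explicit quadratic feature map `phi` below is chosen so that phi(x)·phi(y) = (x·y)^2, which the kernel computes without ever forming the mapped vectors:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input:
    (x1^2, x2^2, sqrt(2)*x1*x2), so that phi(x).phi(y) = (x.y)^2."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x, y = np.random.randn(2), np.random.randn(2)
K = (x @ y) ** 2                       # kernel: computed in the original space
assert np.isclose(phi(x) @ phi(y), K)  # equals the dot product after mapping
```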
Support Vector Machines • Important comments about SVMs • SVM is based on exact optimization (no local optima). • Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space. • Performance depends on the choice of the kernel and its parameters.
Expectation-Maximization (EM) • What is the EM algorithm? • An iterative method for ML estimation: maximize p(D | θ) over θ • When is EM useful? • Works best for problems where the data is incomplete or can be thought of as being incomplete.
Expectation-Maximization (EM) • What are the steps of the EM algorithm? • Initialization: choose an initial estimate θ0 • Expectation step: compute the expected complete-data log-likelihood Q(θ; θt) = E[ln p(Dc | θ) | D, θt], where Dc is the complete data and D the observed data • Maximization step: θt+1 = argmaxθ Q(θ; θt) • Test for convergence: stop when ||θt+1 − θt|| < ε • Convergence properties of EM? • The solution depends on the initial estimate θ0 • There is no guarantee of finding the global maximum, but convergence is stable: the likelihood never decreases from one iteration to the next
Expectation-Maximization (EM) • What is a mixture of Gaussians? • A weighted sum of Gaussian components, p(x) = Σj πj N(x; μj, Σj) with Σj πj = 1 • How are the parameters of MoGs estimated? • Using the EM algorithm • What is the main idea behind using EM for estimating the MoG parameters? • Introduce "hidden" variables indicating which component generated each sample
Expectation-Maximization (EM) • Explain the EM steps for MoGs • E-step: compute the responsibility of each component for each sample from the current parameter estimates • M-step: re-estimate the mixing weights πj, means μj, and covariances Σj from the responsibilities (a sketch follows below)
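A compact sketch of EM for a one-dimensional mixture of two Gaussians, following the steps above; the initialization and convergence threshold are arbitrary illustrative choices:

```python
import numpy as np
from scipy.stats import norm

def em_mog(x, k=2, iters=100, tol=1e-6):
    # Initialization: theta_0 (random means, unit variances, uniform weights)
    rng = np.random.default_rng(0)
    mu = rng.choice(x, k)
    var = np.ones(k)
    pi = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: responsibility of component j for sample i
        p = pi * norm.pdf(x[:, None], mu, np.sqrt(var))    # (n, k)
        gamma = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        Nj = gamma.sum(axis=0)
        pi = Nj / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / Nj
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nj
        # Convergence test on the log-likelihood (never decreases)
        ll = np.log(p.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, var

x = np.concatenate([np.random.randn(200) - 3, np.random.randn(200) + 3])
print(em_mog(x))
```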