This document explores the fundamentals of Kernel PCA and the Perceptron algorithm, focusing on their roles in machine learning, particularly with high-dimensional data. It discusses the dual representation, kernel trick, and challenges like overfitting in high-dimensional spaces. The document covers essential concepts such as dimensionality reduction, eigenvectors, and the significance of choosing the right kernel function. Additionally, it highlights the differences between PCA and LDA, employing practical examples like face recognition to illustrate their applications.
Is PCA enough? Irena Váňová
[Figure: dot product of two vectors, with components A1, A2, A3 of vector A and vector B]
Perceptron algorithm (the labels of classes are y_i ∈ {−1, +1}):
repeat
  for i = 1...N
    if y_i (w·x_i + b) ≤ 0 then
      w ← w + y_i x_i
      b ← b + y_i
    end
  end
until no sample is misclassified
[Figure: two classes of samples (* and o) separated by a line]
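A minimal NumPy sketch of the perceptron loop above, assuming samples are the rows of an array X and y holds labels in {−1, +1}; the function name and the max_epochs cap are illustrative, not from the slides.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Primal perceptron. X: one sample per row, y: labels in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(max_epochs):              # "repeat"
        mistakes = 0
        for i in range(n_samples):           # "for i = 1...N"
            if y[i] * (X[i] @ w + b) <= 0:   # sample i is misclassified
                w += y[i] * X[i]             # move the hyperplane towards the sample
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # "until no sample is misclassified"
            break
    return w, b
```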
Rewritten algorithm – dual form. The weight vector can be written as w = Σ_j α_j y_j x_j, so finding the coefficients α_j is equivalent to finding w; the data enter only through the Gram matrix G_ij = x_i·x_j.
repeat
  for i = 1...N
    if y_i (Σ_j α_j y_j (x_j·x_i) + b) ≤ 0 then
      α_i ← α_i + 1
      b ← b + y_i
    end
  end
until no sample is misclassified
• In the dual representation, the data points only appear inside dot products
• Many algorithms have a dual form
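A matching NumPy sketch of the dual form, an illustration rather than the slides' own code: only the Gram matrix is touched during training, and the primal weights can be recovered afterwards as w = Σ_j α_j y_j x_j.

```python
import numpy as np

def dual_perceptron_train(X, y, max_epochs=100):
    """Dual perceptron. The data enter only through the Gram matrix G[i, j] = x_i . x_j."""
    n = X.shape[0]
    G = X @ X.T                        # precompute the Gram matrix
    alpha = np.zeros(n)                # alpha[j]: number of updates triggered by sample j
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    w = (alpha * y) @ X                # equivalent primal weight vector
    return alpha, b, w
```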
Mapping to higher dimensions
• The perceptron works only for linearly separable problems; mapping the data into a higher-dimensional feature space x ↦ φ(x) can make them separable
• There is a computational problem (the feature vectors φ(x) can be very large)
• Kernel trick: compute the dot product ⟨φ(x), φ(z)⟩ directly as a kernel K(x, z), without ever forming φ(x) (see the sketch below)
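A tiny check of the kernel trick for a degree-2 polynomial kernel on 2-D inputs, assuming NumPy; the explicit feature map phi below is one possible choice and is shown only to illustrate that the kernel equals a dot product in feature space.

```python
import numpy as np

def phi(x):
    """One explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without forming phi."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))    # 1.0 -- dot product in the 3-D feature space
print(poly_kernel(x, z))  # 1.0 -- same value from the kernel, no feature map needed
```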
Examples of kernels
• Polynomial kernels: K(x, z) = (x·z + c)^d
• Gaussian kernels: K(x, z) = exp(−‖x − z‖² / 2σ²); the implicit feature space has infinitely many dimensions, and the classes are separated by a hyperplane there
• Good kernel? A kernel matrix whose structure reflects the classes.
• Bad kernel! An almost diagonal kernel matrix: every point is similar only to itself (illustrated in the sketch below).
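A short illustration of the "almost diagonal" failure mode with the Gaussian kernel, assuming NumPy and bandwidth values chosen only for demonstration: a very small σ makes every point similar only to itself, so the kernel matrix is nearly the identity and carries little usable information.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))

print(np.round(gaussian_kernel_matrix(X, sigma=1.0), 2))   # informative similarities
print(np.round(gaussian_kernel_matrix(X, sigma=0.01), 2))  # almost diagonal: a "bad" kernel matrix
```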
Kernel Perceptron
repeat
  for i = 1...N
    if y_i (Σ_j α_j y_j K(x_j, x_i) + b) ≤ 0 then
      α_i ← α_i + 1
      b ← b + y_i
    end
  end
until no sample is misclassified
• We precompute the kernel matrix K_ij = K(x_i, x_j)
• We are implicitly working in higher dimensions (too high?)
• Generalization problem: it is easy to overfit in high-dimensional spaces
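The kernel perceptron is the dual algorithm with the Gram matrix replaced by a precomputed kernel matrix. A minimal NumPy sketch; the kernel matrix K is assumed to be computed beforehand, for example with a Gaussian or polynomial kernel as sketched above.

```python
import numpy as np

def kernel_perceptron_train(K, y, max_epochs=100):
    """Kernel perceptron on a precomputed kernel matrix K[i, j] = K(x_i, x_j)."""
    n = len(y)
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (np.sum(alpha * y * K[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

def kernel_perceptron_predict(alpha, b, y, K_test_train):
    """K_test_train[t, i] = K(x_test_t, x_train_i); returns predicted labels in {-1, +1}."""
    return np.sign(K_test_train @ (alpha * y) + b)
```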
Kernel trick
• Kernel function K(x, z) = ⟨φ(x), φ(z)⟩
• Use: replacing dot products with kernels
• Implicit mapping to a feature space
• Solves the computational problem
• Can make it possible to use infinitely many dimensions
• Conditions: continuous, symmetric, positive definite
• Information ‘bottleneck’: the kernel matrix contains all necessary information for the learning algorithm
• Fuses information about the data AND the kernel
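The symmetry and positive-definiteness conditions can be spot-checked on a finite sample by inspecting the kernel matrix. A rough NumPy sketch; the helper name and tolerance are illustrative, not from the slides.

```python
import numpy as np

def looks_like_valid_kernel(K, tol=1e-10):
    """Check that a kernel (Gram) matrix is symmetric and positive semi-definite."""
    symmetric = np.allclose(K, K.T, atol=tol)
    eigenvalues = np.linalg.eigvalsh((K + K.T) / 2)   # eigvalsh expects a symmetric matrix
    return symmetric and np.all(eigenvalues >= -tol)

X = np.random.default_rng(1).normal(size=(5, 3))
print(looks_like_valid_kernel(X @ X.T))      # linear kernel: True
print(looks_like_valid_kernel(-(X @ X.T)))   # negated: symmetric but not positive semi-definite
```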
PCA
• Orthogonal linear transformation
• The greatest variance goes to the first coordinate, the second greatest to the second, …
• A rotation around the mean value
• Dimensionality reduction: with many dimensions there is typically high correlation between them, so a few coordinates capture most of the variance
Singular value decomposition: X = W Σ Vᵀ (X is m×n, W is m×m, Σ is m×n, V is n×n)
• W, V – unitary matrices (WᵀW = I, VᵀV = I)
• Columns of W, V? Basis vectors: the eigenvectors of XXᵀ, resp. XᵀX
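A quick numerical check of this correspondence with NumPy (the m×n shape and the variable names follow the slide's convention): the columns of W are eigenvectors of XXᵀ and the columns of V are eigenvectors of XᵀX, with eigenvalues equal to the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))                        # m x n data matrix

W, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = W diag(s) V^T

# Columns of W are eigenvectors of X X^T with eigenvalues s**2 ...
print(np.allclose((X @ X.T) @ W, W * s**2))
# ... and columns of V (rows of Vt) are eigenvectors of X^T X.
print(np.allclose((X.T @ X) @ Vt.T, Vt.T * s**2))
```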
PCA via SVD
• Center the data (zero mean), then compute the SVD X = W Σ Vᵀ
• The covariance matrix (1/n) XXᵀ has the columns of W as its eigenvectors, so the principal components come directly from the SVD
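A minimal PCA-via-SVD sketch under this convention (features in rows, samples in columns); NumPy, with illustrative function and variable names.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD. X is m x n: m features (rows), n samples (columns)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # zero-mean data
    W, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = W[:, :k]                         # eigenvectors of the covariance (1/n) Xc Xc^T
    variances = s[:k] ** 2 / X.shape[1]           # corresponding eigenvalues
    projected = components.T @ Xc                 # k x n: data expressed in the first k components
    return components, variances, projected

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))                     # 5 features, 100 samples
components, variances, Z = pca_svd(X, k=2)
print(Z.shape)                                    # (2, 100)
```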
Kernel PCA
• Project the data onto the few largest eigenvectors
• Equation for PCA: C v = λ v
• Equation for high-dimensional PCA, written with the kernel function: K α = n λ α, where K_ij = K(x_i, x_j)
• We don't know the eigenvector v explicitly, only the vector of numbers α which identifies it: v = Σ_i α_i φ(x_i)
• Projection onto the k-th eigenvector: ⟨v^k, φ(x)⟩ = Σ_i α_i^k K(x_i, x)
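A compact kernel PCA sketch along these lines, assuming NumPy and a Gaussian kernel; the kernel-matrix centering and the 1/√λ normalization of α follow the standard formulation, which the slides do not spell out.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def kernel_pca(X, k, sigma=1.0):
    """Return the top-k coefficient vectors alpha and the projections of the training data."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one               # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                    # ascending order
    eigvals, eigvecs = eigvals[::-1][:k], eigvecs[:, ::-1][:, :k]
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))   # unit-norm eigenvectors in feature space
    projections = Kc @ alphas                                # column k: sum_i alpha_i^k K(x_i, x_j) for each x_j
    return alphas, projections

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
alphas, Z = kernel_pca(X, k=2)
print(Z.shape)    # (30, 2)
```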
If something goes wrong... PCA is blind: it ignores the class labels, so the direction of greatest variance need not be the direction that separates the classes.
LDA
• Fundamental assumption: normally distributed classes
• First: all classes share the same covariance matrix Σ, which has full rank
LDA
• Fundamental assumption: normal distribution
• Then: only full rank of the covariance matrices is required
• There is also a kernel variant
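A small two-class sketch under the first assumption (shared, full-rank covariance): the discriminant direction is w = S_w⁻¹(μ₁ − μ₀). NumPy; the function name and the toy data are illustrative.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class LDA direction w = Sw^{-1} (mu1 - mu0). X: samples in rows, y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)   # within-class scatter (assumed full rank)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)),
               rng.normal(loc=[2, 1], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(fisher_lda_direction(X, y))   # direction along which the two classes separate best
```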
Example: LDA
• Face recognition – eigenfaces