This document explores the fundamentals of Kernel PCA and the Perceptron algorithm, focusing on their roles in machine learning, particularly with high-dimensional data. It discusses the dual representation, kernel trick, and challenges like overfitting in high-dimensional spaces. The document covers essential concepts such as dimensionality reduction, eigenvectors, and the significance of choosing the right kernel function. Additionally, it highlights the differences between PCA and LDA, employing practical examples like face recognition to illustrate their applications.
Is PCA enough? Irena Váňová
[Figure: dot product of two vectors, with components A1, A2, A3 of vector A and vector B]
Perceptron algorithm (the labels of classes are y_i ∈ {−1, +1}):
repeat
  for i = 1...N
    if y_i (w·x_i + b) ≤ 0 then
      w ← w + y_i x_i
      b ← b + y_i
    end
  end
until no sample is misclassified
[Figure: two classes of samples (* and o) separated by a line]
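A minimal NumPy sketch of the perceptron loop above, assuming samples are the rows of an array X and y holds labels in {−1, +1}; the function name and the max_epochs cap are illustrative, not from the slides.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Primal perceptron. X: one sample per row, y: labels in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(max_epochs):              # "repeat"
        mistakes = 0
        for i in range(n_samples):           # "for i = 1...N"
            if y[i] * (X[i] @ w + b) <= 0:   # sample i is misclassified
                w += y[i] * X[i]             # move the hyperplane towards the sample
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # "until no sample is misclassified"
            break
    return w, b
```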
Rewritten algorithm – dual form. The weight vector can be written as w = Σ_j α_j y_j x_j, so finding the coefficients α_j is equivalent to finding w; the data enter only through the Gram matrix G_ij = x_i·x_j.
repeat
  for i = 1...N
    if y_i (Σ_j α_j y_j (x_j·x_i) + b) ≤ 0 then
      α_i ← α_i + 1
      b ← b + y_i
    end
  end
until no sample is misclassified
• In the dual representation, the data points only appear inside dot products
• Many algorithms have a dual form
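A matching NumPy sketch of the dual form, an illustration rather than the slides' own code: only the Gram matrix is touched during training, and the primal weights can be recovered afterwards as w = Σ_j α_j y_j x_j.

```python
import numpy as np

def dual_perceptron_train(X, y, max_epochs=100):
    """Dual perceptron. The data enter only through the Gram matrix G[i, j] = x_i . x_j."""
    n = X.shape[0]
    G = X @ X.T                        # precompute the Gram matrix
    alpha = np.zeros(n)                # alpha[j]: number of updates triggered by sample j
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    w = (alpha * y) @ X                # equivalent primal weight vector
    return alpha, b, w
```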
Mapping to higher dimensions
• The perceptron works only for linearly separable problems; mapping the data into a higher-dimensional feature space x ↦ φ(x) can make them separable
• There is a computational problem (the feature vectors φ(x) can be very large)
• Kernel trick: compute the dot product ⟨φ(x), φ(z)⟩ directly as a kernel K(x, z), without ever forming φ(x) (see the sketch below)
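A tiny check of the kernel trick for a degree-2 polynomial kernel on 2-D inputs, assuming NumPy; the explicit feature map phi below is one possible choice and is shown only to illustrate that the kernel equals a dot product in feature space.

```python
import numpy as np

def phi(x):
    """One explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without forming phi."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))    # 1.0 -- dot product in the 3-D feature space
print(poly_kernel(x, z))  # 1.0 -- same value from the kernel, no feature map needed
```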
Examples of kernels
• Polynomial kernels: K(x, z) = (x·z + c)^d
• Gaussian kernels: K(x, z) = exp(−‖x − z‖² / 2σ²); the implicit feature space has infinitely many dimensions, and the classes are separated by a hyperplane there
• Good kernel? A kernel matrix whose structure reflects the classes.
• Bad kernel! An almost diagonal kernel matrix: every point is similar only to itself (illustrated in the sketch below).
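A short illustration of the "almost diagonal" failure mode with the Gaussian kernel, assuming NumPy and bandwidth values chosen only for demonstration: a very small σ makes every point similar only to itself, so the kernel matrix is nearly the identity and carries little usable information.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))

print(np.round(gaussian_kernel_matrix(X, sigma=1.0), 2))   # informative similarities
print(np.round(gaussian_kernel_matrix(X, sigma=0.01), 2))  # almost diagonal: a "bad" kernel matrix
```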
Kernel Perceptron
repeat
  for i = 1...N
    if y_i (Σ_j α_j y_j K(x_j, x_i) + b) ≤ 0 then
      α_i ← α_i + 1
      b ← b + y_i
    end
  end
until no sample is misclassified
• We precompute the kernel matrix K_ij = K(x_i, x_j)
• We are implicitly working in higher dimensions (too high?)
• Generalization problem: it is easy to overfit in high-dimensional spaces
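The kernel perceptron is the dual algorithm with the Gram matrix replaced by a precomputed kernel matrix. A minimal NumPy sketch; the kernel matrix K is assumed to be computed beforehand, for example with a Gaussian or polynomial kernel as sketched above.

```python
import numpy as np

def kernel_perceptron_train(K, y, max_epochs=100):
    """Kernel perceptron on a precomputed kernel matrix K[i, j] = K(x_i, x_j)."""
    n = len(y)
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (np.sum(alpha * y * K[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

def kernel_perceptron_predict(alpha, b, y, K_test_train):
    """K_test_train[t, i] = K(x_test_t, x_train_i); returns predicted labels in {-1, +1}."""
    return np.sign(K_test_train @ (alpha * y) + b)
```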
Kernel trick
• Kernel function K(x, z) = ⟨φ(x), φ(z)⟩
• Use: replacing dot products with kernels
• Implicit mapping to a feature space
• Solves the computational problem
• Can make it possible to use infinitely many dimensions
• Conditions: continuous, symmetric, positive definite
• Information ‘bottleneck’: the kernel matrix contains all necessary information for the learning algorithm
• Fuses information about the data AND the kernel
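The symmetry and positive-definiteness conditions can be spot-checked on a finite sample by inspecting the kernel matrix. A rough NumPy sketch; the helper name and tolerance are illustrative, not from the slides.

```python
import numpy as np

def looks_like_valid_kernel(K, tol=1e-10):
    """Check that a kernel (Gram) matrix is symmetric and positive semi-definite."""
    symmetric = np.allclose(K, K.T, atol=tol)
    eigenvalues = np.linalg.eigvalsh((K + K.T) / 2)   # eigvalsh expects a symmetric matrix
    return symmetric and np.all(eigenvalues >= -tol)

X = np.random.default_rng(1).normal(size=(5, 3))
print(looks_like_valid_kernel(X @ X.T))      # linear kernel: True
print(looks_like_valid_kernel(-(X @ X.T)))   # negated: symmetric but not positive semi-definite
```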
PCA
• Orthogonal linear transformation
• The greatest variance goes to the first coordinate, the second greatest to the second, …
• A rotation around the mean value
• Dimensionality reduction: with many dimensions there is typically high correlation between them, so a few coordinates capture most of the variance
Singular value decomposition: X = W Σ Vᵀ (X is m×n, W is m×m, Σ is m×n, V is n×n)
• W, V – unitary matrices (WᵀW = I, VᵀV = I)
• Columns of W, V? Basis vectors: the eigenvectors of XXᵀ, resp. XᵀX
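A quick numerical check of this correspondence with NumPy (the m×n shape and the variable names follow the slide's convention): the columns of W are eigenvectors of XXᵀ and the columns of V are eigenvectors of XᵀX, with eigenvalues equal to the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))                        # m x n data matrix

W, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = W diag(s) V^T

# Columns of W are eigenvectors of X X^T with eigenvalues s**2 ...
print(np.allclose((X @ X.T) @ W, W * s**2))
# ... and columns of V (rows of Vt) are eigenvectors of X^T X.
print(np.allclose((X.T @ X) @ Vt.T, Vt.T * s**2))
```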
PCA via SVD
• Center the data (zero mean), then compute the SVD X = W Σ Vᵀ
• The covariance matrix (1/n) XXᵀ has the columns of W as its eigenvectors, so the principal components come directly from the SVD
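A minimal PCA-via-SVD sketch under this convention (features in rows, samples in columns); NumPy, with illustrative function and variable names.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD. X is m x n: m features (rows), n samples (columns)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # zero-mean data
    W, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = W[:, :k]                         # eigenvectors of the covariance (1/n) Xc Xc^T
    variances = s[:k] ** 2 / X.shape[1]           # corresponding eigenvalues
    projected = components.T @ Xc                 # k x n: data expressed in the first k components
    return components, variances, projected

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))                     # 5 features, 100 samples
components, variances, Z = pca_svd(X, k=2)
print(Z.shape)                                    # (2, 100)
```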
Kernel PCA
• Project the data onto the few largest eigenvectors
• Equation for PCA: C v = λ v
• Equation for high-dimensional PCA, written with the kernel function: K α = n λ α, where K_ij = K(x_i, x_j)
• We don't know the eigenvector v explicitly, only the vector of numbers α which identifies it: v = Σ_i α_i φ(x_i)
• Projection onto the k-th eigenvector: ⟨v^k, φ(x)⟩ = Σ_i α_i^k K(x_i, x)
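A compact kernel PCA sketch along these lines, assuming NumPy and a Gaussian kernel; the kernel-matrix centering and the 1/√λ normalization of α follow the standard formulation, which the slides do not spell out.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def kernel_pca(X, k, sigma=1.0):
    """Return the top-k coefficient vectors alpha and the projections of the training data."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one               # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                    # ascending order
    eigvals, eigvecs = eigvals[::-1][:k], eigvecs[:, ::-1][:, :k]
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))   # unit-norm eigenvectors in feature space
    projections = Kc @ alphas                                # column k: sum_i alpha_i^k K(x_i, x_j) for each x_j
    return alphas, projections

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
alphas, Z = kernel_pca(X, k=2)
print(Z.shape)    # (30, 2)
```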
If something goes wrong... PCA is blind: it ignores the class labels, so the direction of greatest variance need not be the direction that separates the classes.
LDA
• Fundamental assumption: normally distributed classes
• First: all classes share the same covariance matrix Σ, which has full rank
LDA
• Fundamental assumption: normal distribution
• Then: only full rank of the covariance matrices is required
• There is also a kernel variant
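A small two-class sketch under the first assumption (shared, full-rank covariance): the discriminant direction is w = S_w⁻¹(μ₁ − μ₀). NumPy; the function name and the toy data are illustrative.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class LDA direction w = Sw^{-1} (mu1 - mu0). X: samples in rows, y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)   # within-class scatter (assumed full rank)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)),
               rng.normal(loc=[2, 1], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(fisher_lda_direction(X, y))   # direction along which the two classes separate best
```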
Example: LDA
• Face recognition – eigenfaces