Explore the power of principal component analysis (PCA) in reducing dimensionality and finding projections with the largest variance. Learn how to apply PCA to various datasets and discover its applications in different scenarios.
Dynamic Graphics, Principal Component Analysis. Ker-Chau Li, UCLA Department of Statistics
Xlisp-stat (demo) • (plot-points x y) • (scatterplot-matrix (list x y z u w)) • (spin-plot (list x y z)) • Link, remove, select, rescale • Examples: • (1) simulated data • (2) Iris data • (3) Boston Housing data
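The original demo runs in Xlisp-stat, which is no longer widely installed. As a rough static stand-in (an illustrative sketch only; the data are made up, and interactive features such as linking, brushing, point removal, and spinning are not reproduced by plain matplotlib), the same views can be drawn in Python:

```python
# Rough Python stand-ins for the Xlisp-stat demo commands above
# (illustrative only; interactivity such as linking and brushing is not reproduced).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
x, y, z, u, w = rng.normal(size=(5, 100))      # placeholder data

# (plot-points x y): a 2-D scatterplot
plt.figure()
plt.scatter(x, y)

# (scatterplot-matrix (list x y z u w)): all pairwise scatterplots
scatter_matrix(pd.DataFrame({"x": x, "y": y, "z": z, "u": u, "w": w}))

# (spin-plot (list x y z)): a rotatable 3-D point cloud
ax = plt.figure().add_subplot(projection="3d")
ax.scatter(x, y, z)
plt.show()
```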
PCA (principal component analysis) • A fundamental tool for reducing dimensionality by finding the projections with the largest variance • (1) Data version • (2) Population version • Each has a number of variations • (3) Let's begin with an illustration using (pca-model (list x y z))
Find a 2-D plane in 4-D space • Generate 100 cases of u from uniform(0,1) • Generate 100 cases of v from uniform(0,1) • Define x = u + v, y = u - v • Apply pca-model to (x, y, u, v); demo • It still works when small errors (e ~ N(0,1)) are present: • x = u + v + .01 e_1, y = u - v + .01 e_2 • Define x = u + v^2, y = u - v^2, z = v^2 • Apply pca-model to (x, y, z, u); works fine • But not so well with a nonlinear manifold; try • (pca-model (list x y u v))
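A minimal NumPy sketch of this experiment (the original calls (pca-model ...) in Xlisp-stat; here the eigenvalues of the sample covariance matrix are inspected directly, anticipating the "data version" steps below):

```python
# Sketch of the "2-D plane in 4-D space" experiment in NumPy
# (the original demo calls (pca-model (list x y u v)) in Xlisp-stat).
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100)
v = rng.uniform(size=100)
x = u + v
y = u - v

data = np.column_stack([x, y, u, v])           # 100 cases in 4-D
eigvals = np.linalg.eigvalsh(np.cov(data.T))   # eigenvalues of the 4x4 sample covariance
print(np.round(sorted(eigvals, reverse=True), 4))
# Only two eigenvalues are (essentially) non-zero: the data lie on a 2-D plane.

# Small i.i.d. errors barely change the picture:
x_e = u + v + 0.01 * rng.normal(size=100)
y_e = u - v + 0.01 * rng.normal(size=100)
data_e = np.column_stack([x_e, y_e, u, v])
print(np.round(sorted(np.linalg.eigvalsh(np.cov(data_e.T)), reverse=True), 4))
```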
Other examples • 1-D structure from 2-D data • Rings • Yin and Yang
Data version • 1. Construct the sample variance-covariance matrix • 2. Find the eigenvectors • 3. Projection: use each eigenvector to form a linear combination of the original variables • 4. The larger, the better: the k-th principal component is the projection with the k-th largest eigenvalue
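The four steps above translate into a few lines of linear algebra; a minimal NumPy sketch (the function and variable names are illustrative, not from the original demo):

```python
# Minimal "data version" of PCA, following the four steps above.
import numpy as np

def pca(data):
    """data: n x p array of n cases on p variables."""
    # 1. Sample variance-covariance matrix (p x p)
    cov = np.cov(data, rowvar=False)
    # 2. Eigenvectors (eigh returns eigenvalues in ascending order, so reorder)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 3. Projection: each eigenvector gives a linear combination of the variables
    centered = data - data.mean(axis=0)
    scores = centered @ eigvecs
    # 4. The k-th column of `scores` is the k-th principal component,
    #    and its sample variance equals the k-th largest eigenvalue.
    return eigvals, eigvecs, scores
```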
Data Version (alternative view) • 1-D data matrix: rank 1 • 2-D data matrix: rank 2 • k-D data matrix: rank k • Eigenvectors for the 1-D sample covariance matrix: rank 1 • Eigenvectors for the 2-D sample covariance matrix: rank 2 • Eigenvectors for the k-D sample covariance matrix: rank k • Adding i.i.d. noise • Connection with automatic basis curve finding (to be discussed later)
Population version • Let the sample size tend to infinity • The sample covariance matrix converges to the population covariance matrix (by the law of large numbers) • The remaining steps are the same • We shall use the population version for theoretical discussion
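A quick numerical illustration of this convergence (a hedged sketch with an arbitrary 2x2 population covariance; the maximum entrywise error of the sample covariance shrinks as n grows):

```python
# The sample covariance matrix approaches the population covariance matrix as n grows.
import numpy as np

rng = np.random.default_rng(0)
pop_cov = np.array([[2.0, 0.8],
                    [0.8, 1.0]])                 # arbitrary population covariance

for n in (100, 1_000, 10_000, 100_000):
    sample = rng.multivariate_normal([0.0, 0.0], pop_cov, size=n)
    err = np.abs(np.cov(sample, rowvar=False) - pop_cov).max()
    print(f"n = {n:>6d}   max |S_n - Sigma| = {err:.4f}")
```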
Some basic facts • Variance of a linear combination of random variables: • var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x,y) • Easier with matrix notation: • (B.1) var(m'X) = m' Cov(X) m • here m is a p-vector and X consists of p random variables (x_1, …, x_p)' • From (B.1) (a short derivation is sketched after the next slide), it follows that
Basic facts (cont.) • Maximizing var(m'X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1 • (here ||m|| denotes the length of the vector m) • Eigenvalue decomposition: • (B.2) M v_i = λ_i v_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_p • Basic linear algebra tells us that the first eigenvector will do: • the solution of max m' M m subject to ||m|| = 1 must satisfy M m = λ_1 m
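For completeness, a one-line derivation of (B.1), of which the bivariate formula is the special case m = (a, b)' (here μ = E(X)):

```latex
\operatorname{var}(m'X)
  = E\!\left[(m'(X-\mu))^{2}\right]
  = E\!\left[m'(X-\mu)(X-\mu)'m\right]
  = m'\,E\!\left[(X-\mu)(X-\mu)'\right] m
  = m'\operatorname{Cov}(X)\,m .
```

And a quick numerical check (a hedged NumPy sketch with made-up data) that the first eigenvector indeed maximizes the variance of the projection among all unit vectors:

```python
# Check: among unit vectors m, var(m'X) is largest along the first eigenvector.
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean=[0, 0, 0],
                               cov=[[4, 1, 0], [1, 2, 0], [0, 0, 1]],
                               size=500)

S = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
v1 = eigvecs[:, -1]                      # eigenvector with the largest eigenvalue

best = np.var(data @ v1, ddof=1)         # sample variance along v1 (= largest eigenvalue)
for _ in range(5):
    m = rng.normal(size=3)
    m /= np.linalg.norm(m)               # a random unit vector
    assert np.var(data @ m, ddof=1) <= best + 1e-10
print("largest eigenvalue:", eigvals[-1], "  variance along v1:", best)
```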
Basic facts (cont.) • The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S • Rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S • This explains why PCA works for our first example • Why can small errors be tolerated? • Large i.i.d. errors are fine too • Heterogeneity is harmful, and so are correlated errors
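One standard way to see why i.i.d. errors are tolerable (a sketch, assuming the noise ε has mean zero, covariance σ²I, and is independent of X):

```latex
\operatorname{Cov}(X + \varepsilon) \;=\; \operatorname{Cov}(X) + \sigma^{2} I ,
```

so every eigenvalue is shifted up by σ² while the eigenvectors, and hence the principal directions, are unchanged; the low-dimensional structure still shows up as a gap between the "signal" and "noise" eigenvalues. Correlated or heteroscedastic errors add a non-spherical matrix instead, which can rotate the eigenvectors and blur that gap.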
Further discussion • No guarantee of finding nonlinear structure such as clusters, curves, etc. • In fact, sampling properties for PCA are mostly developed for normal data • Still useful • Scaling problem • Projection pursuit: guided; random
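On the scaling problem: PCA is not scale-invariant, so changing the units of one variable changes the principal directions; standardizing the variables first (equivalently, working with the correlation matrix) is the usual remedy. A hedged NumPy sketch with made-up data:

```python
# PCA is sensitive to the scales (units) of the variables.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.5 * rng.normal(size=200)                # correlated with a

def first_pc(d):
    """Eigenvector of the sample covariance with the largest eigenvalue."""
    _, vecs = np.linalg.eigh(np.cov(d, rowvar=False))
    return vecs[:, -1]

raw = np.column_stack([a, b])
rescaled = np.column_stack([a, 100 * b])          # same data, b in different units

print(first_pc(raw))                              # direction shared by a and b
print(first_pc(rescaled))                         # dominated by the rescaled variable

standardized = rescaled / rescaled.std(axis=0)    # PCA on the correlation scale
print(first_pc(standardized))                     # no longer depends on the units
```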