Principal Component Analysis

Principal Component Analysis Tanya and Caroline

Overview
• Basic function is to condense data
• PCA is used when several underlying factors shape the data
• Differences in geology between two areas
• Unlike Bray-Curtis ordination, PCA is objective
• It finds the most useful angle from which to view the shape of the pattern the data points make

PCA is NOT…
• Factor Analysis or Principal Coordinates Analysis (PCO)
• A test of significance
• No null hypothesis is required
• Prior to ordination – no way to objectively decide which variables to include
• After analysis – no way to decide which variables were unimportant
• Cannot cope with missing values

2-D vs. multi-D
• Make a scatter plot of all data points
• As the number of variables increases, data space becomes harder to visualize
This is where PCA comes in!

PCA
• Simplifies data by reducing the dimensions of the data space
• Finds the most informative viewpoint from which to view the data as a scatter plot
• Produces low-dimensional images of high-dimensional shapes
• Shows how much of the variance each axis accounts for

• Find the first principal axis, which always passes through the overall mean of the dataset
• Find the second ordination axis, which must be orthogonal (at 90°) to the first axis
• Each successive axis explains less variance than its predecessors and is assumed to be less important
• The first principal axis accounts for the greatest possible percentage of the overall variance; the second principal axis accounts for the greatest possible percentage of the remaining variance (all of it only when there are just two variables)
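The properties listed above can be checked numerically. The sketch below is only an illustration on made-up two-variable data (the random numbers, seed, and variable names are assumptions, not the example from these slides); it finds the axes with NumPy's singular value decomposition, which is one common way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: two correlated variables (illustration only).
x = rng.normal(size=200)
data = np.column_stack([x, 0.6 * x + rng.normal(scale=0.5, size=200)])

# Centre the data so that every axis passes through the overall mean.
centred = data - data.mean(axis=0)

# Rows of vt are the principal axes, ordered by decreasing variance.
_, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
axis_variance = singular_values**2 / (len(data) - 1)

# Angle between the first and second axes, in degrees.
angle = np.degrees(np.arccos(np.clip(abs(vt[0] @ vt[1]), 0.0, 1.0)))
total_variance = centred.var(axis=0, ddof=1).sum()

print("angle between axes (degrees):", angle)
print("variance per axis:", axis_variance)
print("total variance:", total_variance)
```

With only two variables the two axis variances add up to the total variance, so here the second axis does take everything the first leaves behind; with more variables it would take only part of it.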

Example

Mechanics of PCA
• Normalizing data
• Generating principal axes
• Loadings come from the eigenvalues and eigenvectors, which come from the correlation matrix
• Eigenvalue: rate of growth per multiplication (how much a vector grows each time it is multiplied by the matrix)
• Eigenvector: the pattern formed (the direction the vector settles into under repeated multiplication)
• Interpretation of eigenvalues: each gives the importance of its ordination axis; the largest eigenvalue belongs to the first principal axis, the next largest to the second, and so on
• Eigenvalues and eigenvectors summarize the underlying structure of a matrix
• Deriving axis scores: multiply the normalized data by the first eigenvector to get the scores on the first principal axis, then do the same with the second eigenvector
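The mechanics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure from the course text: the function names `pca_correlation` and `power_iteration` are hypothetical, the data are random numbers made up for the demonstration, and normalization is assumed to mean standardizing each variable to zero mean and unit standard deviation.

```python
import numpy as np

def pca_correlation(data):
    """PCA on the correlation matrix (rows = samples, columns = variables)."""
    # 1. Normalize: zero mean and unit standard deviation per variable.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # 2. Correlation matrix of the variables.
    corr = (z.T @ z) / (len(data) - 1)
    # 3. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices).
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 4. Axis scores: normalized data multiplied by each eigenvector.
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores

def power_iteration(matrix, steps=500):
    """Multiply repeatedly: the growth rate settles on the largest
    eigenvalue and the pattern settles on its eigenvector."""
    vector = np.ones(matrix.shape[0])
    for _ in range(steps):
        product = matrix @ vector
        growth = np.linalg.norm(product) / np.linalg.norm(vector)
        vector = product / np.linalg.norm(product)
    return growth, vector

# Made-up data: 30 samples of 4 variables sharing one underlying factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=30)
weights = np.array([1.0, 0.8, -0.6, 0.3])
data = np.outer(latent, weights) + rng.normal(scale=0.5, size=(30, 4))

eigenvalues, eigenvectors, scores = pca_correlation(data)
growth, pattern = power_iteration(np.corrcoef(data, rowvar=False))

print("eigenvalues:", eigenvalues)
print("growth rate from repeated multiplication:", growth)
print("first five scores on axis 1:", scores[:5, 0])
```

The growth rate found by repeated multiplication should match the largest eigenvalue, and the variance of the scores on each axis should equal that axis's eigenvalue.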

Normalization

Example
• 2 sites: site 1 is a Heath and site 2 is a Mound
• PCA is run only on the data for the 8 plant species (vegetation)

% Variance
The larger the variance, the greater the amount of information that has been condensed into the ordination axis
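Percentage variance per axis is each eigenvalue divided by the sum of all eigenvalues. The short sketch below shows the arithmetic; the eigenvalues are hypothetical numbers chosen for illustration, not the results from the Heath and Mound example.

```python
import numpy as np

# Hypothetical eigenvalues for 8 variables (made up). They sum to 8,
# as the eigenvalues of an 8 x 8 correlation matrix must.
eigenvalues = np.array([3.2, 1.9, 1.1, 0.8, 0.5, 0.3, 0.15, 0.05])

percent_variance = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(percent_variance)

for axis, (pct, cum) in enumerate(zip(percent_variance, cumulative), start=1):
    print(f"Axis {axis}: {pct:5.1f}% of variance, {cum:5.1f}% cumulative")
```

With these made-up numbers the first axis carries 40% of the variance, so a plot of that axis alone already condenses two fifths of the information in the eight variables.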

Kaiser-Guttman vs. Broken Stick
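Both rules decide how many axes are worth interpreting. A minimal sketch, assuming the usual definitions: Kaiser-Guttman keeps the axes whose eigenvalue is above the mean eigenvalue (1 for a correlation matrix), and Broken Stick keeps the leading axes whose share of variance exceeds the share expected if the total variance were broken at random into as many pieces as there are axes. The function names and the eigenvalues are hypothetical.

```python
import numpy as np

def kaiser_guttman(eigenvalues):
    """Number of axes whose eigenvalue exceeds the mean eigenvalue."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(eigenvalues > eigenvalues.mean()))

def broken_stick(n_axes):
    """Expected variance share of each axis: b_k = (1/p) * sum_{i=k..p} 1/i."""
    reciprocals = 1.0 / np.arange(1, n_axes + 1)
    return np.cumsum(reciprocals[::-1])[::-1] / n_axes

def broken_stick_count(eigenvalues):
    """Number of leading axes whose observed share beats the expected share."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    observed = eigenvalues / eigenvalues.sum()
    expected = broken_stick(len(eigenvalues))
    count = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break  # stop at the first axis that fails the comparison
        count += 1
    return count

# Hypothetical eigenvalues for 8 variables (made up for illustration).
eigenvalues = [3.2, 1.9, 1.1, 0.8, 0.5, 0.3, 0.15, 0.05]
print("Kaiser-Guttman keeps:", kaiser_guttman(eigenvalues))
print("Broken Stick keeps:  ", broken_stick_count(eigenvalues))
```

For these made-up eigenvalues Kaiser-Guttman keeps three axes while Broken Stick keeps two, which shows that the two rules need not agree.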

Let’s GRAPH!

Homework
• What is the purpose of PCA, and what 3 things does it give us?
• Define eigenvalue and eigenvector.
• Interpret Figure 6.10 on page 111: what does the first principal axis show, and what does the second principal axis show?
