CHAPTER 19 Correspondence Analysis

Tables, Figures, and Equations. From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon. http://www.pcord.com

Figure 19.1. A synthetic data set of eleven species with noiseless hump-shaped responses to an environmental gradient. The gradient was sampled at eleven points (sample units), numbered 1-11.
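A data set of this kind can be approximated for experimentation. The sketch below is only an illustration: it assumes Gaussian response curves, with hypothetical optima at gradient positions 1-11 and a shared tolerance and peak abundance, none of which are stated in the source.

```python
import math

def hump_shaped_data(n_units=11, n_species=11, tolerance=2.0, peak=100.0):
    """Return an n_units x n_species table of noiseless Gaussian responses.

    Assumption: species optima sit at gradient positions 1..n_species and
    every species shares the same tolerance and peak abundance.
    """
    table = []
    for position in range(1, n_units + 1):        # sample unit on the gradient
        row = []
        for optimum in range(1, n_species + 1):   # optimum of each species
            distance = position - optimum
            row.append(peak * math.exp(-(distance ** 2) / (2 * tolerance ** 2)))
        table.append(row)
    return table

A = hump_shaped_data()  # rows are sample units, columns are species
```

Each row of `A` is one sample unit and each column one species, matching the orientation of the data matrix used in the rest of the chapter.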

Figure 19.2. Comparison of PCA and CA of the data set shown in Figure 19.1. PCA curves the ends of the gradient in, while CA does not. The vectors indicate the correlations of the environmental gradient with the axis scores.

How it works. Axes are rotated simultaneously through species space and sample space with the object of maximizing the correspondence between the two spaces. This produces two sets of linear equations, X and Y, where $X = AY$ and $Y = A'X$. A is the original data matrix with n rows (henceforth sample units) and p columns (henceforth species). X contains the coordinates for the n sample units on k axes (dimensions). Y contains the coordinates for the p species on k axes. Note that both Y and X refer to the same coordinate system.

The goal is to maximize the correspondence, R, defined as: under the constraints that

Major steps: eigenanalysis approach • 1. Calculate weighting coefficients based on the reciprocals of the sample unit totals and the species totals: $v_i = 1/a_{i+}$ and $w_j = 1/a_{+j}$, where • v contains the sample unit weights, • w contains the species weights, • $a_{i+}$ is the total for sample unit i, • $a_{+j}$ is the total for species j.

The square roots of these weights are placed in the diagonal of the two matrices V½ and W½, which are otherwise filled with zeros. • Given n sample units and p species: • V½ has dimensions n × n • W½ has dimensions p × p.

2. Weight the data matrix A by V½ and W½: $B = V^{1/2} A W^{1/2}$. In other words, $b_{ij} = a_{ij} / \sqrt{a_{i+} a_{+j}}$. This is a simultaneous weighting by row and column totals. The resulting matrix B has n rows and p columns.
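The weighting step can be sketched in a few lines. This is a minimal illustration, assuming the weights are the reciprocals of the row and column totals as described in step 1, so that each element of A is divided by the square root of the product of its row total and column total. The function name and the small abundance table are hypothetical, and every sample unit and species is assumed to have a nonzero total.

```python
import math

def weight_matrix(A):
    """Return B, where b_ij = a_ij / sqrt(a_i+ * a_+j)."""
    row_totals = [sum(row) for row in A]          # a_i+ for each sample unit
    col_totals = [sum(col) for col in zip(*A)]    # a_+j for each species
    return [
        [a / math.sqrt(row_totals[i] * col_totals[j]) for j, a in enumerate(row)]
        for i, row in enumerate(A)
    ]

# Hypothetical table: 3 sample units (rows) by 2 species (columns).
A = [[4, 0],
     [1, 3],
     [0, 4]]
B = weight_matrix(A)  # same shape as A: n rows, p columns
```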

3. Calculate a cross-products matrix: $S = BB' = V^{1/2} A W A' V^{1/2}$. • The dimensions of S are n × n. • The term on the right has dimensions: • (n × n)(n × p)(p × p)(p × n)(n × n) • Note that S, the cross-products matrix, is a variance-covariance matrix as in PCA except that the cross-products are weighted by the reciprocals of the square roots of the sample unit totals and the species totals.
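As a rough check on the dimensions, the cross-products matrix can be built directly from a small table. In this sketch the product is taken as B times its transpose, so that the result is n × n and agrees with the V½AWA'V½ form. The function name and the example table are hypothetical.

```python
import math

def cross_products(A):
    """Return the n x n matrix S = B B', with b_ij = a_ij / sqrt(a_i+ * a_+j)."""
    row_totals = [sum(row) for row in A]
    col_totals = [sum(col) for col in zip(*A)]
    n, p = len(row_totals), len(col_totals)
    B = [[A[i][j] / math.sqrt(row_totals[i] * col_totals[j]) for j in range(p)]
         for i in range(n)]
    # Element (i, k) is the cross-product of weighted rows i and k.
    return [[sum(B[i][j] * B[k][j] for j in range(p)) for k in range(n)]
            for i in range(n)]

# Hypothetical table: 3 sample units by 2 species, so S is 3 x 3.
S = cross_products([[4, 0],
                    [1, 3],
                    [0, 4]])
```

The result is symmetric, with one row and one column per sample unit.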

4. Now find eigenvalues as in PCA. Each eigenvalue (latent root) is a lambda ($\lambda$) that solves: $|S - \lambda I| = 0$. This is the “characteristic equation.” Note that it is the same as that used in PCA, except for the contents of S.

5. Also find the eigenvectors Y (p × k) and X (n × k) for each of k dimensions: $[S - \lambda I]x = 0$ and $[W^{1/2} A' V A W^{1/2} - \lambda I]y = 0$, using the same set of $\lambda$ in both cases. For each axis there is one $\lambda$ and one vector x. For every $\lambda$ there is also one vector y.
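The eigenanalysis for one axis can be sketched without a linear algebra library. This is not how the source computes it: power iteration is swapped in here for a full eigensolver, and the function name and example table are hypothetical. Because the data are not centered, the n × n cross-products matrix has a trivial eigenvalue of 1 whose eigenvector is proportional to the square roots of the sample unit totals. The sketch removes that solution first, then iterates toward the first nontrivial axis, assuming its eigenvalue is strictly larger than the next one.

```python
import math

def first_ca_axis(A, iterations=200):
    """Return (eigenvalue, sample unit eigenvector) for the first nontrivial axis."""
    row_totals = [sum(row) for row in A]
    col_totals = [sum(col) for col in zip(*A)]
    grand_total = sum(row_totals)
    n, p = len(row_totals), len(col_totals)
    B = [[A[i][j] / math.sqrt(row_totals[i] * col_totals[j]) for j in range(p)]
         for i in range(n)]
    S = [[sum(B[i][j] * B[k][j] for j in range(p)) for k in range(n)]
         for i in range(n)]
    # Deflate the trivial solution (eigenvalue 1, unit-length eigenvector u).
    u = [math.sqrt(r / grand_total) for r in row_totals]
    for i in range(n):
        for k in range(n):
            S[i][k] -= u[i] * u[k]
    # Power iteration from an arbitrary, non-constant starting vector.
    x = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(S[i][k] * x[k] for k in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(v * v for v in product))
        if eigenvalue == 0.0:
            break
        x = [v / eigenvalue for v in product]   # keep x at unit length
    return eigenvalue, x

# Hypothetical table: 3 sample units by 2 species.
eigenvalue, x = first_ca_axis([[4, 0],
                               [1, 3],
                               [0, 4]])
```

For real analyses a tested eigensolver or singular value decomposition routine is the safer choice. The loop above is only meant to make the role of the characteristic equation concrete.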

6. At this point, we have found X and Y for k dimensions such that: • $X = AY$, with dimensions (n × k) = (n × p)(p × k) • and • $Y = A'X$, with dimensions (p × k) = (p × n)(n × k) • where • Y contains the species ordination, • A is the original data matrix, and • X contains the sample ordination.

Each component or axis can be represented as a linear combination of the original variables. Each eigenvector contains the coefficients for the equation for one axis. For eigenvector 1 (the first column of Y): Score 1: $x_i = y_1 a_{i1} + y_2 a_{i2} + \dots + y_p a_{ip}$ for entity i.
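The linear combination amounts to a dot product between one eigenvector and one row of the data matrix. The sketch below uses made-up coefficients and abundances purely to show the arithmetic. The function name is hypothetical.

```python
def axis_score(abundances, eigenvector):
    """Score of one entity on one axis: the sum of y_j * a_ij over all species j."""
    return sum(y * a for y, a in zip(eigenvector, abundances))

# Made-up values: one sample unit with three species, and one eigenvector.
abundances = [2, 0, 5]            # a_i1, a_i2, a_i3
eigenvector = [0.5, -0.2, 0.1]    # y_1, y_2, y_3
score = axis_score(abundances, eigenvector)   # 0.5*2 + (-0.2)*0 + 0.1*5
```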

The sample unit scores are scaled by multiplying each element of the SU eigenvectors, X, by $\sqrt{a_{++}/a_{i+}}$, where $a_{++}$ is the grand total of the matrix A. The species scores are scaled by multiplying each element of the species eigenvectors, Y, by $\sqrt{a_{++}/a_{+j}}$.

Major steps: reciprocal averaging approach • 1. Arbitrarily assign scores, x, to the n sample units. The scores position the sample units on an ordination axis.

2. Calculate species scores as weighted averages, where $a_{+j}$ is the total for species j: $y_j = \sum_{i=1}^{n} a_{ij} x_i / a_{+j}$

3. Calculate new site scores by weighted averaging of the species scores, where $a_{i+}$ is the total for sample unit i: $x_i = \sum_{j=1}^{p} a_{ij} y_j / a_{i+}$

4. Center and standardize the site scores so that

5. Check for convergence of the solution. If the site scores are closer than a prescribed tolerance to the site scores of the preceding iteration, then stop. Otherwise, return to step 2.
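The five steps of reciprocal averaging translate fairly directly into a loop. The sketch below is one possible reading, not the authors' implementation. In particular, step 4 is assumed here to mean a mean of zero and a variance of one, both weighted by the sample unit totals. The function name, tolerance, iteration cap, and example table are hypothetical, and all row and column totals are assumed to be nonzero.

```python
import math

def reciprocal_averaging(A, tol=1e-10, max_iter=1000):
    """Return (sample unit scores x, species scores y) for the first axis."""
    row_totals = [sum(row) for row in A]          # a_i+
    col_totals = [sum(col) for col in zip(*A)]    # a_+j
    grand_total = sum(row_totals)                 # a_++
    n, p = len(row_totals), len(col_totals)
    x = [float(i) for i in range(n)]              # step 1: arbitrary start
    y = [0.0] * p
    for _ in range(max_iter):
        # Step 2: species scores as weighted averages of site scores.
        y = [sum(A[i][j] * x[i] for i in range(n)) / col_totals[j]
             for j in range(p)]
        # Step 3: new site scores as weighted averages of species scores.
        new_x = [sum(A[i][j] * y[j] for j in range(p)) / row_totals[i]
                 for i in range(n)]
        # Step 4 (assumed form): weighted mean 0 and weighted variance 1.
        mean = sum(row_totals[i] * new_x[i] for i in range(n)) / grand_total
        new_x = [v - mean for v in new_x]
        spread = math.sqrt(
            sum(row_totals[i] * new_x[i] ** 2 for i in range(n)) / grand_total)
        new_x = [v / spread for v in new_x]
        # Step 5: stop when scores match the previous iteration within tol.
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x, y

# Hypothetical table: 3 sample units by 2 species.
x, y = reciprocal_averaging([[4, 0],
                             [1, 3],
                             [0, 4]])
```

The returned species scores come from the last pass through step 2, so at convergence they correspond to the final site scores to within the tolerance.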

Figure 19.4. One-dimensional CA ordination of the same data set used in the weighted averaging example in the previous chapter (Fig. 18.1). Scores were standardized to unit variance, then multiplied by 100.

Figure 19.5. Comparison of 2-D CA (RA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure. The lines connect sample points along the major environmental gradient. The minor secondary gradient is nearly orthogonal to the major gradient. In the perfect ordination, the points would form a horizontally elongate grid. Inset: the ideal result is a regular 3 × 10 grid.

Table 19.1. Comparison of correspondence analysis (CA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure.
