Best Practices vs. Misuse of PCA in the Analysis of Climate Variability

Bob Livezey, Climate Services/Office of Services/NWS/NOAA
30th Climate Diagnostics and Prediction Workshop, State College, PA, October 26, 2005

Outline
• Motivation, take-home messages, and references
• Preprocessing considerations
• S-mode example: mathematics, characteristics, interpretation, testing, and truncation
• Rotation: benefits and truncation considerations
• Conclusions

Eigenvector-Based Linear Techniques
Dealing simultaneously with many time series:
• Principal Component Analysis (PCA): efficient representation of the information in multiple time series (time series of gridded maps);
• Rotation: linear transformation of PCA and other eigenvector-based methods to improve the representation;
• Canonical Correlation Analysis (CCA): one of the better ways to represent, linearly and efficiently, the relationships between two different time series of gridded maps (say, 500 mb heights and surface temperatures).

Take-Home Messages
• PCA is an extremely useful linear tool for data compression, orthogonalization, and filtering
• PCA results are mathematical and (even for the first mode) don’t necessarily have physical relevance
• Even when the first mode has physical relevance, its representation may be flawed (e.g. the “Arctic Oscillation”)
• PCA results can be critically impacted by choices of domain, grid, scaling, etc.
• Effective PC truncation requires insight and experimentation
• Rotation can enhance physical relevance and reduce sampling variability
• Under- and over-rotation can negate these gains
• Just because an area on a map has a closed loading contour doesn’t make it part of a “dipole” or “tripole”

REFERENCES FOR BASIC PCA AND RPCA
• Barnston, A. G., and R. E. Livezey, 1987: Classification, seasonality, and persistence of low frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 1083-1126.
• Huth, R., 2006: The effect of various methodological options on the detection of leading modes of sea level pressure variability. Tellus, under revision.
• Jolliffe, I. T., 1995: Rotation of principal components: choice of normalization constraints. J. Appl. Statistics, 22, 29-35.
• Livezey, R. E., and T. M. Smith, 1999b: Considerations for use of the Barnett and Preisendorfer (1987) algorithm for canonical correlation analysis of climate variations. J. Climate, 12, 303-305.
• North, G. R., T. L. Bell, and R. F. Cahalan, 1982: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699-706.
• O'Lenic, E., and R. E. Livezey, 1988: Practical considerations in the use of rotated principal components analysis (RPCA) in diagnostic studies of upper-air height fields. Mon. Wea. Rev., 116, 1682-1689.
• Richman, M. B., 1986: Rotation of principal components. J. Climatology, 6, 293-335.
• Richman, M. B., and P. J. Lamb, 1985: Climatic pattern analysis of 3- and 7-day summer rainfall in the central United States: Some methodological considerations and a regionalization. J. Clim. Appl. Meteor., 24, 1325-1343.

Preparing Data
1. Preprocessing often has a major impact on results and their interpretation.
2. PCA results are inherently domain dependent, as I will illustrate later.
3. Standardization gives each record equal weight in variance-based multivariate analyses (e.g. high latitudes vs. tropics, January vs. November). If this is desirable, the PCA should be based on the correlation matrix; if not, on the covariance matrix.
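The choice in item 3 can be sketched in a few lines. This is only an illustration, assuming NumPy and a synthetic data matrix whose series have very different variances; all array names are hypothetical. Standardizing each series and then forming the covariance matrix yields the correlation matrix, so every series enters with unit variance, whereas the covariance-matrix PCA is dominated by the high-variance series.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 5                                   # n times, m data points (synthetic)
scale = np.array([1.0, 2.0, 5.0, 10.0, 20.0])   # very different standard deviations
z = rng.standard_normal((n, m)) * scale

anom = z - z.mean(axis=0)                       # remove period-of-record means
cov = anom.T @ anom / (n - 1)                   # covariance matrix

std_anom = anom / anom.std(axis=0, ddof=1)      # standardize each series
corr = std_anom.T @ std_anom / (n - 1)          # covariance of standardized data

# Leading eigenvector of each matrix (eigh returns eigenvalues in ascending order)
e_cov = np.linalg.eigh(cov)[1][:, -1]
e_corr = np.linalg.eigh(corr)[1][:, -1]
```

In this synthetic case the leading covariance-matrix eigenvector sits almost entirely on the last (highest-variance) series, while the correlation-matrix eigenvector has no such built-in preference.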

Preparing Data
4. PCA should be performed on as narrow a window in the seasonal cycle as sample considerations permit, to avoid mixing inhomogeneous climates (like the January vs. November example in 3 above).
5. Area-averaged or gridded data often must be weighted in multivariate analyses:
• Smaller areas can influence results as much as larger ones;
• On lat/lon grids the density of points (and hence their influence) increases with latitude.

Preparing Data
5. (continued) Two ways to treat the problem:
• Create an approximately equal-area representation (e.g. CPC megadivisions, or the Barnston and Livezey, 1987, grid);
• Weight the data, generally in proportion to the square root of the area.

Preparing Data
5. (continued) If weights are needed and PCA on the correlation matrix is the objective, then standardization should be performed before weighting, and the covariance matrix formed afterwards. Otherwise the weights are removed in the standardization step.
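The order of operations in item 5 can be checked numerically. The sketch below assumes NumPy, a regular lat/lon grid (so that the area represented by a point is taken as proportional to the cosine of latitude), and synthetic data; the latitudes and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
lat = np.array([10.0, 30.0, 50.0, 70.0])       # hypothetical grid latitudes (degrees)
n = 300
z = rng.standard_normal((n, lat.size)) * np.array([1.0, 2.0, 3.0, 4.0])

# Weight proportional to the square root of the (relative) grid-box area
w = np.sqrt(np.cos(np.deg2rad(lat)))

def standardize(a):
    a = a - a.mean(axis=0)
    return a / a.std(axis=0, ddof=1)

def covariance(a):
    a = a - a.mean(axis=0)
    return a.T @ a / (a.shape[0] - 1)

# Order from the slide: standardize, then weight, then form the covariance matrix
c_good = covariance(standardize(z) * w)

# Reversed order: the weights cancel in the standardization step
c_bad = covariance(standardize(z * w))
```

With the first ordering each point enters with variance equal to its area weight squared, here cos(latitude). With the reversed ordering the result is just the unweighted correlation matrix.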

Preparing Data
6. In EPCA (see below), CCA, etc., maps of variables with greater numbers of data points will have disproportionate influence on the results unless the maps are weighted, i.e. in proportion to the square root of the ratio of the total variance in all variables to the total variance in the weighted variable (see Livezey and Smith, 1999b).
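One possible reading of the weighting in item 6 is sketched below; it is an illustration under stated assumptions, not the procedure of Livezey and Smith (1999b), which should be consulted for the details. NumPy, the two synthetic fields, and all names are hypothetical. Each field is multiplied by the square root of (total variance of all fields / total variance of that field), so that every field ends up contributing the same total variance regardless of how many points it has.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
# Two hypothetical fields with very different numbers of data points
field_a = rng.standard_normal((n, 40))         # e.g. a dense height grid
field_b = rng.standard_normal((n, 5))          # e.g. a few surface divisions

def total_variance(a):
    return a.var(axis=0, ddof=1).sum()

blocks = [field_a, field_b]
v_all = sum(total_variance(b) for b in blocks)

# Weight for each field: sqrt(total variance of all fields / its own total variance)
weights = [np.sqrt(v_all / total_variance(b)) for b in blocks]
combined = np.hstack([wk * b for wk, b in zip(weights, blocks)])

weighted_totals = [total_variance(wk * b) for wk, b in zip(weights, blocks)]
```

After weighting, the 5-point field carries as much total variance as the 40-point field, so neither dominates the combined analysis by point count alone.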

Principal Component Analysis
• Used principally for data compression and filtering, often as a first step to other analyses; direct physical interpretation is VERY limited.
• The form most commonly used in climate studies (S-mode) starts with n maps or groups of maps z(x,t), t = 1,…,n, each with m data points x and with the period-of-record means removed.
• The maps are decomposed into a linear combination of map patterns; the first pattern explains the most variance, the second is orthogonal to the first and explains the second most variance, etc.

Principal Component Analysis
• $z(x,t) = \sum_{i=1}^{N} a_i(t)\, e_i(x)$, with $N = \min(m,n)$
• $z(x,t)$: Original maps, linear combinations of fixed patterns $e_i(x)$ with time-dependent weights $a_i(t)$
• $a_i(t)$: Principal component scores (time series), the projections of the maps onto the eigenvectors
• $e_i(x)$: Principal component loadings (map patterns), also eigenvectors of the covariance matrix of z
• $\lambda_i$: Eigenvalues of the covariance matrix of z
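The decomposition defined on this slide can be sketched with a singular value decomposition. This is a minimal illustration assuming NumPy and a synthetic anomaly matrix with time along rows and data points along columns; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 120, 8                                  # n maps, m data points
z = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
z = z - z.mean(axis=0)                         # period-of-record means removed

# SVD of the anomaly matrix: rows of vt are the eigenvectors e_i(x)
u, s, vt = np.linalg.svd(z, full_matrices=False)
e = vt.T                                       # loadings, shape (m, N)
lam = s**2 / (n - 1)                           # eigenvalues of the covariance matrix
a = z @ e                                      # scores a_i(t): projections onto e_i

N = min(m, n)
z_rebuilt = a @ e.T                            # sum over i of a_i(t) e_i(x)
cov = z.T @ z / (n - 1)
```

Using the SVD avoids forming the covariance matrix explicitly; an eigendecomposition of `cov` gives the same patterns up to sign.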

Principal Component Analysis
4. Example of the first four patterns of 3-day precipitation for May-August over the central US (Richman and Lamb, 1985). The sequence of patterns is seen repeatedly in other analyses and can be considered an artifact of the geometry of PCA:

Principal Component Analysis
5. All of the patterns (the e’s) are orthogonal, and the leading ones reflect the data points with the most variance. The eigenvalues give these variances; the first four for the Richman and Lamb patterns are 11.13%, 9.33%, 5.55%, and 4.54%.
6. Usually (always when the PCA is on the correlation matrix) the numbers on the maps are correlations of the original data series with the corresponding scores, so their squares represent explained variance. In the correlation-matrix case:
(a) a point with 0.5 is more than 6 times more important than a point with 0.2, a point with 0.8 more than 7 times more important than one with 0.3, etc.;
(b) summations of the squares over the maps give the total variances listed in 5 above;
(c) comparing the squared central values within closed contours allows practical discrimination between monopoles, dipoles, etc.
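The statements about loadings as correlations can be verified on synthetic data. The sketch below assumes NumPy and a correlation-matrix PCA; the names are hypothetical. The loading map is taken as the eigenvector scaled by the square root of its eigenvalue, which equals the correlation of each standardized series with the score; the last two lines reproduce the arithmetic in (a).

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 400, 6
z = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
zs = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)    # standardized series

corr = zs.T @ zs / (n - 1)
lam, e = np.linalg.eigh(corr)
lam, e = lam[::-1], e[:, ::-1]                       # descending order

scores = zs @ e
loadings = e * np.sqrt(lam)                          # correlation loadings

# Direct check: correlation of every series with the first score
r_direct = np.array([np.corrcoef(zs[:, j], scores[:, 0])[0, 1] for j in range(m)])

explained_pct = 100 * lam / m                        # percent variance per mode
ratio_a = 0.5**2 / 0.2**2                            # importance of 0.5 relative to 0.2
ratio_b = 0.8**2 / 0.3**2                            # importance of 0.8 relative to 0.3
```

Summing the squared loadings over a map returns that mode's eigenvalue, which is the sense of (b); the two ratios come out at about 6.25 and 7.1.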

Principal Component Analysis
7. The time series that go with the patterns (the a’s) are uncorrelated (i.e. not collinear), so they are desirable for multiple linear regression.
8. To compress or filter the data some of the patterns must be thrown out, i.e. the series must be truncated; this is an ART (see O’Lenic and Livezey, 1988 for the best approach I know). In these applications over-truncation (throwing baby out with the bath water) is of far more concern than under-truncation (retention of some noise). As a pre-step for rotation, CCA, etc., both should be of concern (see below).
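Items 7 and 8 can be illustrated together. The sketch assumes NumPy and synthetic data, and the truncation point `k` is an arbitrary, hypothetical choice; nothing here selects `k`, which the slide stresses is a matter of judgment. It shows that the scores are mutually uncorrelated and that the variance lost by truncation equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 200, 10
z = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
z = z - z.mean(axis=0)

u, s, vt = np.linalg.svd(z, full_matrices=False)
lam = s**2 / (n - 1)                           # eigenvalues, descending
scores = u * s                                 # a_i(t)

k = 3                                          # hypothetical truncation point
z_k = scores[:, :k] @ vt[:k]                   # filtered data from the first k modes
retained = lam[:k].sum() / lam.sum()           # fraction of variance kept
residual_var = ((z - z_k) ** 2).sum() / (n - 1)

# Scores are uncorrelated, so their correlation matrix is the identity
score_corr = np.corrcoef(scores, rowvar=False)
```

Because the scores are uncorrelated, regression coefficients on one score do not change when other scores are added or dropped, which is what makes them convenient predictors.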

Principal Component Analysis
9. Physical interpretation of any PC pattern other than the leading one is usually unwarranted, and this is often the case for the first as well. Richman (1986) shows this for the example in two ways. First he splits the domain in two and does a separate PCA on each. Here’s the result for the first PCA mode. Note that the first mode for the southern domain (a monopole covering the domain) is not reproduced in the full domain analysis:

Principal Component Analysis
9. (continued) Next he computes the one-point teleconnection pattern for the point with the largest loading on each pattern. Here’s the result for the second PCA mode. The PCA mode is a dipole; the teleconnection pattern (reflecting the physical covariance structure around the point) is a monopole:

Principal Component Analysis
10. The North et al. (1982) test determines whether two consecutive patterns can reasonably be interpreted as distinct patterns or separate signals. It assumes the n samples are independent (heuristically adjust n downward for dependence): the sampling error of an eigenvalue is approximately $\delta\lambda_i \approx \lambda_i \sqrt{2/n}$, and consecutive patterns are not well separated when the spacing between their eigenvalues is comparable to or smaller than this error.
11. Other kinds of PCA:
• Combined (CPCA): more than one mapped variable;
• Extended (EPCA): group of maps of the same variable at different lags to capture pattern evolution (MSSA is a variant);
• Rotated (RPCA): to reduce sampling error and improve physical representativeness.
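The North et al. (1982) rule of thumb, in which the sampling error of an eigenvalue λ is roughly λ·sqrt(2/n), can be sketched as below. NumPy, the function name, and the separation criterion (gap larger than the error of the larger eigenvalue) are assumptions made for illustration. The eigenvalues are the percent variances quoted earlier for the Richman and Lamb patterns, but the effective sample size `n_eff=50` is purely hypothetical, so the flags say nothing about the actual study.

```python
import numpy as np

def north_separation(lam, n_eff):
    """Rule-of-thumb sampling errors and separation flags for sorted eigenvalues."""
    lam = np.asarray(lam, dtype=float)
    err = lam * np.sqrt(2.0 / n_eff)           # approximate eigenvalue sampling error
    gap = lam[:-1] - lam[1:]                   # spacing between consecutive eigenvalues
    # Pair (i, i+1) is flagged as separated when the gap exceeds the error of mode i
    separated = gap > err[:-1]
    return err, separated

# Percent variances from the slides; n_eff is a hypothetical effective sample size
lam_pct = [11.13, 9.33, 5.55, 4.54]
err, separated = north_separation(lam_pct, n_eff=50)
```

With this hypothetical sample size, modes 1 and 2 and modes 3 and 4 would each be flagged as a possibly mixed pair, while modes 2 and 3 would be separated. Serial dependence in the data reduces the effective n and widens the error bars.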

Rotation
1. Rotation, i.e. the linear transformation of a truncated set of patterns (Richman, 1986), should be considered in many problems when patterns with minimum sampling variability, little domain dependence, and increased physical relevance are needed.
2. Note the robustness of rotated patterns in Richman’s split-domain example (all patterns are present in both analyses):

Rotation
2. (continued) Now compare rotated mode 2 and its corresponding teleconnection pattern (both are monopoles with similar scales):

Rotation
3. Barnston and Livezey (1987) compared 120 monthly 700 mb height PCA and RPCA patterns with their corresponding one-point teleconnection patterns; the average pattern correlations were 0.69 for PCA and 0.90 for RPCA. They also used sensitivity tests to demonstrate dramatic reductions in sampling error.
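The comparison between a mode and its one-point teleconnection pattern can be sketched as follows. This is an illustration on a synthetic one-dimensional field, assuming NumPy; the field, the choice of the second PCA mode, and the use of a centered spatial correlation as the "pattern correlation" are all assumptions, and the numbers it produces have no connection to the published 0.69 and 0.90.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 500, 30
x = np.arange(m)
# Synthetic field: two overlapping localized signals plus noise (illustrative only)
p1 = np.exp(-0.5 * ((x - 10) / 4.0) ** 2)
p2 = np.exp(-0.5 * ((x - 20) / 4.0) ** 2)
z = (rng.standard_normal((n, 1)) * p1 + rng.standard_normal((n, 1)) * p2
     + 0.3 * rng.standard_normal((n, m)))
zs = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)

corr = zs.T @ zs / (n - 1)
lam, e = np.linalg.eigh(corr)                  # ascending eigenvalues
mode = e[:, -2] * np.sqrt(lam[-2])             # second PCA mode as correlation loadings

base = int(np.argmax(np.abs(mode)))            # point with the largest loading
telecon = corr[:, base]                        # one-point teleconnection pattern

def pattern_correlation(a, b):
    """Centered spatial correlation between two maps."""
    return float(np.corrcoef(a, b)[0, 1])

r = abs(pattern_correlation(mode, telecon))
```

The teleconnection pattern is simply the column of the correlation matrix at the base point, so it reflects the covariance structure of the data without any orthogonality constraint.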

Barnston and Livezey (1987) RPCA Patterns
• Pacific North America
• North Atlantic Oscillation (a dipole!)
• Western Pacific Oscillation
• Tropical Northern Hemisphere

Rotation
4. The most likely reason for the success of rotation is the relaxation of the geometrical and mathematical constraints on the analysis, i.e. the data can speak more for themselves. In a commonly used variant of varimax, in which the eigenvectors are weighted by the square root of the eigenvalue, the resulting patterns do not have to be orthogonal and the resulting time series do not have to be uncorrelated (Jolliffe, 1995).
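The loss of orthogonality described in item 4 can be seen in a small experiment. The sketch below uses a common SVD-based iteration for the varimax criterion, written here as an illustration rather than as the routine used in any of the cited papers. NumPy, the synthetic field, the truncation to two modes, and the definition of the rotated time series as projections of the data onto the rotated patterns are all assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=200, tol=1e-10):
    """Orthogonal varimax rotation of an (m, k) loading matrix."""
    m, k = loadings.shape
    rot = np.eye(k)
    obj = 0.0
    for _ in range(max_iter):
        b = loadings @ rot
        grad = loadings.T @ (b**3 - b * (b**2).sum(axis=0) / m)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt                           # nearest orthogonal matrix to grad
        new_obj = s.sum()
        if new_obj < obj * (1 + tol):          # stop when the objective stalls
            break
        obj = new_obj
    return loadings @ rot, rot

rng = np.random.default_rng(7)
n, m = 500, 30
x = np.arange(m)
# Two overlapping localized signals plus noise (illustrative only)
p1 = np.exp(-0.5 * ((x - 10) / 4.0) ** 2)
p2 = np.exp(-0.5 * ((x - 20) / 4.0) ** 2)
z = (rng.standard_normal((n, 1)) * p1 + rng.standard_normal((n, 1)) * p2
     + 0.3 * rng.standard_normal((n, m)))
z = z - z.mean(axis=0)

lam, e = np.linalg.eigh(z.T @ z / (n - 1))
lam, e = lam[::-1][:2], e[:, ::-1][:, :2]      # truncate to the two leading modes
weighted = e * np.sqrt(lam)                    # eigenvectors scaled by sqrt(eigenvalue)

rotated, rot = varimax(weighted)
pattern_overlap = rotated.T @ rotated          # off-diagonal nonzero: not orthogonal
rot_scores = z @ rotated                       # data projected onto rotated patterns
score_cov = rot_scores.T @ rot_scores / (n - 1)
```

The rotation matrix itself is orthogonal, and the variance each point contributes to the retained modes is unchanged. What is given up is the orthogonality of the patterns and, under the projection definition used here, the zero correlation between the time series.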

Under- and Over-Rotation
5. Under-rotation (truncation of too many modes) can result in discarded signal, while over-rotation (truncation of too few) can result in over-regionalization of signals (see O’Lenic and Livezey, 1988). Map (a) here is a dipole, but (b) and (c) are monopoles.

Conclusions
• PCA is an extremely useful linear tool for data compression, orthogonalization, and filtering
• PCA results are mathematical and (even for the first mode) don’t necessarily have physical relevance
• Even when the first mode has physical relevance, its representation may be flawed (e.g. the “Arctic Oscillation”)
• PCA results can be critically impacted by choices of domain, grid, scaling, etc.
• Effective PC truncation requires insight and experimentation
• Rotation can enhance physical relevance and reduce sampling variability
• Under- and over-rotation can negate these gains
• Just because an area on a map has a closed loading contour doesn’t make it part of a “dipole” or “tripole”
