INFERRING PAST ENVIRONMENTS FROM BIOLOGICAL DATA - PROGRESS, PROBLEMS, AND PITFALLS

H.J.B. Birks
University of Bergen & University College London

BASIC IDEA OF BIOINDICATION OR ENVIRONMENTAL RECONSTRUCTION

Fossil data:
• Fossil biological data or 'proxy data' Y0 (e.g., pollen, chironomids): t samples by m species.
• Environmental variable X0 (e.g., temperature) for the same t samples: unknown, to be estimated or reconstructed.

To solve for X0, we need modern data ('training data' or a 'calibration set'):
• Modern biology Y (e.g., pollen, chironomids): n samples by m species.
• Modern environment X (e.g., temperature) for the same n samples.

BASIC BIOLOGICAL ASSUMPTIONS

Marine planktonic foraminifera - Imbrie & Kipp (1971)
• Foraminifera are a function of sea-surface temperature.
=> Foraminifera can be used to reconstruct past sea-surface temperature.

Pollen
• Pollen is a function of vegetation.
• Vegetation is a function of climate.
=> Pollen is an indirect function of climate and can be used to reconstruct past climate.

Chironomids (aquatic non-biting midges)
• Chironomids are a function of lake-water temperature.
• Lake-water temperature is a function of climate.
=> Chironomids are an indirect function of climate and can be used to reconstruct past climate.

Freshwater diatoms (microscopic algae)
• Diatoms are a function of lake-water chemistry.
=> Diatoms can be used to reconstruct past lake-water chemistry.
• Lake-water chemistry may be some weak function of climate.
=> Diatoms may be a weak function of climate.

BIOLOGICAL 'PROXY' DATA PROPERTIES
• May have 200-300 species, expressed as proportions or percentages, in 200-500 samples.
• Multicollinearity.
• Biological data contain many zero values (absences).
• Species invariably show non-linear unimodal responses to their environment, not simple linear responses.

'PROXIES'
Pollen - good indicators of vegetation and hence indirect indicators of climate.
[Figure: modern pollen of Betula (birch), Alnus (alder), Quercus (oak), Pinus (pine), Empetrum nigrum (crowberry), and Agropyron repens (Gramineae, grass). Identical treatment, all at the same magnification, all stained with safranin.]

Chironomids - good indicators of past lake-water temperatures and hence past climate.
[Figure: common late-glacial chironomid taxa. a: Tanytarsina; b: Sergentia; c: Heterotrissocladius; d: Hydrobaenus/Oliveridia; e: Chironomus; f: Dicrotendipes; g: Microtendipes; h: Polypedilum; i: Cladopelma. Scale bar represents 50 µm.]

Freshwater diatoms - excellent indicators of lake-water chemistry (e.g. pH, total P). Not reliable climate indicators.

BASIC NUMERICAL MODELS

CLASSICAL APPROACH
(1) $Y = f(X) + \text{error}$, where Y is the biology and X is the environment.
(2) Estimate f by some mathematical procedure and 'invert' the estimated function $\hat{f}$ to find the unknown past environment X0 from fossil data Y0: $X_0 \approx \hat{f}^{-1}(Y_0)$

INVERSE APPROACH
In practice, for various mathematical reasons, do an inverse regression or calibration.
(3) $X = g(Y) + \text{error}$
(4) $X_0 = \hat{g}(Y_0)$
Obtain a 'plug-in' estimate of the past environment X0 from fossil data Y0.

f and g are 'transfer functions'.

'INVERSE' PROCEDURES

1. Principal components regression (PCR). Imbrie & Kipp (1971)
[Diagram: Y is summarised by components PC1, PC2, PC3, which are then related to X.]
• PCA components maximise variance within Y.
• Multiple linear regression or polynomial regression of X on PC1, PC2, PC3, etc.
• Selection of components was done visually until very recently. Now cross-validation is used to select the model with the fewest components that still has a low RMSEP and a low maximum bias.

2. Two-way weighted averaging (WA). ter Braak & van Dam (1989) and Birks et al. (1990)
(i) Estimate species optima ($\hat{u}$) by weighted averaging of the environmental variable (x) of the sites. Species abundant at a site will tend to have their ecological optima close to the value of the environmental variable at that site. (WA regression.)
(ii) Estimate the environmental values ($\hat{x}$) at the sites by weighted averaging of the species optima ($\hat{u}$). (WA calibration.)
(iii) Because averages are taken twice, the range of estimated x-values is shrunken, and a simple 'inverse' or 'classical' deshrinking is required. For inverse deshrinking, regress x on the preliminary estimates ($\hat{x}$) and take the fitted values as final estimates of x.
Can downweight species in step (ii) by their estimated WA tolerances (niche breadths) so that species with wide tolerances have less weight than species with narrow tolerances.
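Steps (i) to (iii) can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the cited papers: the function names, the five-site training set, and the fossil sample are all hypothetical, tolerance downweighting is omitted, and inverse deshrinking is assumed.

```python
def wa_optima(Y, x):
    """WA regression: each species optimum is the abundance-weighted mean of x."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)  # assumes every species occurs at least once
        optima.append(sum(row[k] * xi for row, xi in zip(Y, x)) / total)
    return optima


def wa_calibrate(Y, optima):
    """WA calibration: each site estimate is the abundance-weighted mean of the optima."""
    return [sum(a * u for a, u in zip(row, optima)) / sum(row) for row in Y]


def inverse_deshrink_coefs(x, x_init):
    """Least-squares regression of observed x on the initial WA estimates."""
    n = len(x)
    mean_x, mean_init = sum(x) / n, sum(x_init) / n
    sxy = sum((a - mean_init) * (b - mean_x) for a, b in zip(x_init, x))
    sxx = sum((a - mean_init) ** 2 for a in x_init)
    slope = sxy / sxx
    return mean_x - slope * mean_init, slope


# Hypothetical training set: 5 sites x 3 species (percentages) and July temperature.
Y = [[80, 20, 0], [60, 40, 0], [20, 60, 20], [0, 40, 60], [0, 10, 90]]
x = [8.0, 10.0, 12.0, 14.0, 16.0]

optima = wa_optima(Y, x)                              # step (i)
x_init = wa_calibrate(Y, optima)                      # step (ii)
intercept, slope = inverse_deshrink_coefs(x, x_init)  # step (iii)
x_hat = [intercept + slope * v for v in x_init]

# Apply the same optima and deshrinking coefficients to a hypothetical fossil sample.
fossil = [[10, 50, 40]]
x_fossil = [intercept + slope * v for v in wa_calibrate(fossil, optima)]
print(optima, x_hat, x_fossil)
```

In this toy example the initial estimates span a narrower range than the observed temperatures, which is the shrinkage that step (iii) corrects; the deshrinking slope is therefore greater than 1.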

3. Weighted averaging partial least squares regression (WA-PLS). ter Braak & Juggins (1993) and ter Braak et al. (1993)
[Diagram: Y is summarised by components PLS1, PLS2, PLS3, which are then related to X.]
• Components are selected to maximise the covariance between the species weighted averages and the environmental variable x.
• Selection of the number of PLS components to include is based on cross-validation. The model selected should have the fewest components possible together with a low RMSEP and a low maximum bias.

4. Modern analog technique (MAT) = k-nearest neighbours (k-NN). Hutson (1980), Prell (1985), ter Braak (1995), et al.
(i) Compare fossil sample t with modern sample i: calculate the dissimilarity coefficient (DC) between t and i.
(ii) Repeat for all modern samples.
(iii) Select the k closest analogues for fossil sample t. The value of k is estimated by visual inspection, arbitrary rules (e.g., 10, 20, etc.), or cross-validation.
(iv) Estimate the past environment for sample t as the (weighted) mean of the environment of the k analogues.
(v) Repeat for all fossil samples.
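The MAT procedure can be sketched as follows. The slide does not say which dissimilarity coefficient is used; the squared chord distance on proportions is assumed here purely for illustration, and the function names and data are hypothetical.

```python
import math


def squared_chord(p, q):
    """Squared chord distance between two samples of proportions (assumed DC)."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def mat_reconstruct(fossil, modern_Y, modern_x, k=2, weighted=False):
    """Estimate the environment of each fossil sample from its k closest modern analogues."""
    estimates = []
    for sample in fossil:
        # DC between this fossil sample and every modern sample, closest first.
        ranked = sorted((squared_chord(sample, row), xi)
                        for row, xi in zip(modern_Y, modern_x))
        nearest = ranked[:k]
        if weighted:
            # Inverse-distance weights; the floor avoids division by zero for exact matches.
            weights = [1.0 / max(d, 1e-12) for d, _ in nearest]
        else:
            weights = [1.0] * len(nearest)
        total = sum(w * xi for w, (_, xi) in zip(weights, nearest))
        estimates.append(total / sum(weights))
    return estimates


# Hypothetical modern training set (proportions) and July temperature.
modern_Y = [[0.8, 0.2, 0.0], [0.6, 0.4, 0.0], [0.2, 0.6, 0.2],
            [0.0, 0.4, 0.6], [0.0, 0.1, 0.9]]
modern_x = [8.0, 10.0, 12.0, 14.0, 16.0]

fossil = [[0.0, 0.4, 0.6], [0.5, 0.5, 0.0]]
print(mat_reconstruct(fossil, modern_Y, modern_x, k=2))
print(mat_reconstruct(fossil, modern_Y, modern_x, k=2, weighted=True))
```

The choice of k and of the dissimilarity coefficient both change the result, which is one reason cross-validation is preferable to arbitrary rules for choosing k.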

USE OF METHODS
• In comparisons using simulated and real data, WA and WA-PLS usually, but not always, outperform PCR and MAT.
• Classical methods of Gaussian logit or multinomial logit regression and calibration are rarely used (freshwater, terrestrial).
• Some applications of artificial neural networks and a few studies within a Bayesian framework. The Bayesian framework may be an important future research direction.

HIDDEN BASIC ASSUMPTIONS
1. Species in the training set (Y) are systematically related to the physical environment (X) in which they live.
2. The environmental variable (X0, e.g. summer temperature) to be reconstructed is, or is linearly related to, an ecologically important variable in the system.
3. Species in the training set (Y) are the same as in the fossil data (Y0) and their ecological responses (Gm) have not changed significantly over the timespan represented by the fossil assemblage.
4. Mathematical methods used in regression and calibration adequately model the non-linear biological responses (Gm) to the environmental variable (X).
5. Environmental variables other than, say, summer temperature have negligible influence, or their joint distribution with summer temperature in the past is the same as in the modern training set.

MODEL PERFORMANCE AND SELECTION
• Root mean square error of prediction (RMSEP) as low as possible.
• Maximum bias as low as possible.
• Smallest number of components, to avoid 'overfitting'.
Based on leave-one-out cross-validation, n-fold cross-validation, or bootstrapping. Very rare to have an independent test set.
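As an illustration of these criteria, the sketch below computes RMSEP and maximum bias from leave-one-out predictions. Several things here are assumptions rather than statements from the slides: a bare weighted-averaging model without deshrinking stands in for the transfer function, residuals are taken as predicted minus observed, and maximum bias is taken as the largest mean residual (in absolute value) over equal-width intervals of the gradient. Ten intervals is a common convention; the toy call uses fewer because the data set is tiny.

```python
import math


def wa_fit(Y, x):
    """Species optima by weighted averaging; None for species absent from the training rows."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        optima.append(sum(row[k] * xi for row, xi in zip(Y, x)) / total if total > 0 else None)
    return optima


def wa_predict(optima, sample):
    """Weighted average of the optima of the species present in the sample."""
    pairs = [(a, u) for a, u in zip(sample, optima) if u is not None and a > 0]
    return sum(a * u for a, u in pairs) / sum(a for a, _ in pairs)


def loo_predictions(Y, x):
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(Y)):
        optima = wa_fit(Y[:i] + Y[i + 1:], x[:i] + x[i + 1:])
        preds.append(wa_predict(optima, Y[i]))
    return preds


def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def max_bias(observed, predicted, n_bins=10):
    """Largest (in absolute value) mean residual over equal-width intervals of the gradient."""
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    bins = {}
    for o, p in zip(observed, predicted):
        b = min(int((o - lo) / width), n_bins - 1)  # top edge goes in the last interval
        bins.setdefault(b, []).append(p - o)
    return max((sum(r) / len(r) for r in bins.values()), key=abs)


# Hypothetical training set: 5 sites x 3 species (percentages) and July temperature.
Y = [[80, 20, 0], [60, 40, 0], [20, 60, 20], [0, 40, 60], [0, 10, 90]]
x = [8.0, 10.0, 12.0, 14.0, 16.0]

preds = loo_predictions(Y, x)
rmsep_value = rmsep(x, preds)
bias_value = max_bias(x, preds, n_bins=5)
print(rmsep_value, bias_value)
```

Repeating this for models with one, two, three, and more components and comparing the resulting RMSEP and maximum bias values is the kind of selection the criteria above describe.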

MODEL VALIDATION
Compare reconstructed values with historical data. Rarely possible as few historical data exist. Renberg & Hultberg (1992)
But when done, sometimes the model that gives the closest correspondence is not the model with the lowest RMSEP or maximum bias! There can be a conflict between model selection based on cross-validation performance and validation results using independent historical test-sets.

AN EXAMPLE OF RECONSTRUCTING PAST CLIMATE FROM POLLEN DATA
304 modern pollen samples from Norway, northern Sweden, and Finland (Sylvia Peglar, Heikki Seppä, John Birks, Arvid Odland). Seppä & Birks (2001)

Performance statistics - WA-PLS - leave-one-out cross-validation Seppä & Birks (2001)

Summary pollen diagram from Tsuolbmajavri, northern Finland. The age scale in modelled calibrated years BP is shown along with four phases. The total pollen- and spore-accumulation rate (grains cm^-2 yr^-1) is also shown. The hollow silhouette curves denote the 10x exaggeration of the percentages. Seppä & Birks (2001)

RECONSTRUCTIONS Seppä & Birks (2001)

RECONSTRUCTION VALIDATION
Tibetanus, Abisko Valley, Sweden. Hammarlund et al. (2002)
[Figure: curves labelled 'Isotopes', 'Inferred from pollen', and 'Theory'.]

BROAD-SCALE PATTERNS
Changes in July summer temperature relative to present-day reconstructed temperature on a south-north transect west of the Scandes mountains. 16 sites covering all or much of the Holocene. Anne Bjune et al.
[Figure: sites arranged from south to north.]

FINE-RESOLUTION CHANGES
[Figure: inferred mean July air temperature compared with oxygen isotope ratios in a Greenland ice-core.] Brooks & Birks (2000)

STATISTICAL AND BIOLOGICAL PROBLEMS AND PITFALLS

1. Sample-specific errors of reconstruction for fossil samples.
Estimate by bootstrapping.

Mean square error of prediction: $\mathrm{MSEP} = s_1 + s_2$, where
• $s_1$ = error due to variability in the estimates of species parameters in the training set (s.e. of bootstrap estimates);
• $s_2$ = error due to variation in species abundances at a given environmental value (actual prediction error between the observed value and the mean bootstrap estimate $\bar{x}_{i,\mathrm{boot}}$), where $\bar{x}_{i,\mathrm{boot}}$ is the mean of $x_{i,\mathrm{boot}}$ for all cycles when i is in the test set.

$\mathrm{RMSEP} = (s_1 + s_2)^{1/2}$ ($s_1$ usually ca. 25% of RMSEP, $s_2$ ca. 75%)
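The two error components can be illustrated with a small bootstrap sketch. This is one possible reading of the decomposition, not a reproduction of any published implementation: a bare weighted-averaging model without deshrinking is assumed, s1 and s2 are computed as mean squares over the training samples so that RMSEP = (s1 + s2)^(1/2) as on the slide, and the function names, data, number of cycles, and seed are all hypothetical.

```python
import math
import random


def wa_fit(Y, x):
    """Species optima by weighted averaging; None for species absent from the drawn rows."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        optima.append(sum(row[k] * xi for row, xi in zip(Y, x)) / total if total > 0 else None)
    return optima


def wa_predict(optima, sample):
    """Weighted average of available optima; None if no species in the sample has one."""
    pairs = [(a, u) for a, u in zip(sample, optima) if u is not None and a > 0]
    if not pairs:
        return None
    return sum(a * u for a, u in pairs) / sum(a for a, _ in pairs)


def bootstrap_error_components(Y, x, n_cycles=200, seed=1):
    """Return (s1, s2, RMSEP) from out-of-bag bootstrap predictions."""
    rng = random.Random(seed)
    n = len(Y)
    oob = [[] for _ in range(n)]  # predictions for sample i when i is in the test set
    for _ in range(n_cycles):
        drawn = [rng.randrange(n) for _ in range(n)]  # sample sites with replacement
        in_bag = set(drawn)
        optima = wa_fit([Y[i] for i in drawn], [x[i] for i in drawn])
        for i in range(n):
            if i not in in_bag:
                pred = wa_predict(optima, Y[i])
                if pred is not None:
                    oob[i].append(pred)
    s1_terms, s2_terms = [], []
    for i, preds in enumerate(oob):
        if not preds:
            continue
        mean_pred = sum(preds) / len(preds)  # mean bootstrap estimate for sample i
        s1_terms.append(sum((p - mean_pred) ** 2 for p in preds) / len(preds))
        s2_terms.append((x[i] - mean_pred) ** 2)
    s1 = sum(s1_terms) / len(s1_terms)  # variability of the bootstrap estimates
    s2 = sum(s2_terms) / len(s2_terms)  # observed minus mean bootstrap estimate
    return s1, s2, math.sqrt(s1 + s2)


# Hypothetical training set: 6 sites x 3 species (percentages) and July temperature.
Y = [[80, 20, 0], [60, 40, 0], [40, 50, 10], [20, 60, 20], [0, 40, 60], [0, 10, 90]]
x = [8.0, 10.0, 11.0, 12.0, 14.0, 16.0]

s1, s2, rmsep_boot = bootstrap_error_components(Y, x)
print(s1, s2, rmsep_boot)
```

For a fossil sample, a sample-specific version would presumably replace the training-set average in s1 with the spread of that sample's own predictions across bootstrap cycles, while s2 stays a property of the training set.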

For temperature, RMSEP is usually 1-1.5°C (about 10% of the modern range sampled).
For pH, RMSEP is usually 0.3-0.5 pH units (about 10%).

Components of RMSEP
(i) Within-lake variability - Heiri et al. (2002). Maximum of 15% of total RMSEP.
(ii) Variability in modern environmental data - Nilsson et al. (1996). Can be 30-40% (even 70%) of total RMSEP. Major problem. Cannot take account of natural variability of environmental data.
(iii) Variance in the model (model error or lack of fit).

What to do with sample specific errors? There is a consistent temporal trend but also continuous overlap in RMSEP!

2. How do we identify signal from noise in reconstructions? LOESS smoothers are a help. Seppä & Birks (2002)
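To show what a LOESS smoother does to a reconstruction, here is a minimal local-linear smoother with tricube weights. It is a simplified sketch: there are no robustness iterations, the span value is arbitrary, and the age and temperature series are invented for illustration.

```python
import math


def loess(t, y, span=0.5):
    """Local linear regression with tricube weights, evaluated at each t[i]."""
    n = len(t)
    k = max(3, int(math.ceil(span * n)))  # number of points in each local window
    fitted = []
    for t0 in t:
        dists = sorted(abs(ti - t0) for ti in t)
        h = dists[min(k, n) - 1] or 1e-12  # bandwidth: distance to the k-th closest point
        w = [(1.0 - min(abs(ti - t0) / h, 1.0) ** 3) ** 3 for ti in t]
        sw = sum(w)
        mean_t = sum(wi * ti for wi, ti in zip(w, t)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (ti - mean_t) ** 2 for wi, ti in zip(w, t))
        sxy = sum(wi * (ti - mean_t) * (yi - mean_y) for wi, ti, yi in zip(w, t, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (t0 - mean_t))
    return fitted


# Hypothetical reconstruction: age (cal yr BP) and inferred July temperature (deg C).
age = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]
temp = [11.2, 11.9, 11.4, 12.3, 12.8, 12.1, 13.0, 12.6, 11.8, 11.0]

smoothed = loess(age, temp, span=0.5)
print([round(v, 2) for v in smoothed])
```

A smoother of this kind highlights the trend but says nothing about whether the trend exceeds the prediction error, so it is a complement to the sample-specific errors rather than a substitute for them.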

Trends or RMSEP? Brooks & Birks (2001)

3. Different methods, although they have similar modern model performances, can give very different reconstruction results. Birks (2003)

4. Some indication of consistent model bias when applied to fossil data.
• MAT - low variability, insensitive
• WA - some variability, overestimates at low values
• WA-PLS - more variability
• PCR - considerable variability
But in terms of modern model performance, all seem good in terms of RMSEP and maximum bias.
Extensive experiments using simulated independent test data-sets currently underway by Richard Telford are showing important model differences and biases.

5. Biological data, when sampled over natural environmental gradients, show a mixture of symmetric unimodal responses (40%), monotonic responses (40%), some skewed unimodal responses (5%), and no statistically significant responses (ca. 15%), together with great variation in species tolerances or niche breadths and a compositional turnover gradient of 3-4 standard deviations.
Perhaps too many monotonic responses to feel comfortable with a unimodal-based model like WA or WA-PLS, but too many unimodal responses for linear-based models like PLS or PCR.
The classical approach based on Gaussian or multinomial logit regression and calibration (tried but dropped because of computational limitations in the 1990s) should be re-investigated, possibly within a Bayesian framework (e.g. Toivonen et al. (2001); Korhola et al. (2002)) but incorporating a priori ecological information about the species concerned (depth preferences, lake-chemical preferences, sediment preferences) as priors or conditionals.

6. Incorporation of species tolerances (niche widths) into WA-PLS is needed so that species with narrow tolerances ('good' indicators) have greater weight in the model.

7. Use non-linear deshrinking equations (e.g. smoothing spline) in WA or WA-PLS because the pattern of initially estimated x in relation to observed x is often non-linear, especially at the gradient ends ('edge effects').

8. Some species may show great dominance and abundance in some ecological settings ('weeds') but then occur with lower abundance in other settings. Great dominance can bias estimates of species parameters, not only of the few dominants, but also of the other species because of the percentage compositional constraint.

9. Do we really need all 200-300 species in a calibration set? Would a model based on only those species that are necessary for the model to perform well be more robust, because it is less 'overfitted' than a model based on 200-300 species?

ANN with a backward-elimination pruning algorithm, Racca et al. (2003)

SWAP diatom-pH data-set: 167 samples, 267 species, 18.5% +ve data entries, pH 4.3-7.3, species N2 1-120.9, sample N2 5.1-57.2.

Could eliminate 85% of species with little change in model performance:

                        RMSEP   Maximum bias
All 267 species         0.32    -0.44
37 species remaining    0.33    -0.46

Use difference between RMSE(apparent) and RMSEP(jack-knife) as a guide to possible model 'overfitting'. Racca et al. (2003)

In general we have many species and few lakes in our modern calibration sets. 'Curse of dimensionality' and hence model overfitting. Ideally ratio of species number to lake number should be as close to 1 as possible to minimise 'curse of dimensionality'. Racca et al. (2003) How to find minimal set of 'driving' species in WA or WA-PLS?

10. Covarying environmental variables e.g. temperature and lake trophic status (e.g. total N or P) or temperature and lake depth Brodersen & Anderson (2002)

pH and climate Anderson (2000)

11. Use of different proxies - different proxies may give different reconstructions, e.g. mean July temperature at Bjørnfjell, northern Norway.
Validate using another proxy - macrofossils of tree birch.
Importance of independent validation.

12. One large modern calibration data-set or several regional data-sets? Merging data-sets increases the floristic diversity and environmental range of the resulting transfer function but can introduce further noise due to secondary environmental gradients. Dynamic or local calibration data-set. Use MAT to find 10-20 closest modern analogues for each fossil sample in a core, and use these selected samples as a local calibration data-set for that site. Current evidence suggests a modest improvement only in RMSEP and maximum bias of about 2-5%.

13. Hidden assumption number 5. 'Other environmental variables than, say, summer temperature have negligible influence, or their joint distribution with summer temperature in the past is the same as in the training set.'
Climate model and glaciological results suggest that the joint distribution between summer temperature and winter accumulation has not been the same in the past 11,000 years.
Good evidence to suggest that lake-water pH has decreased naturally (soil deterioration) whilst summer temperature rose and then fell in the last 11,000 years.
In Norway today, lake-water pH is negatively correlated with summer temperature because lakes of pH 6-7.5 are on basic rock, and in Norway such rock happens to occur mainly at high altitudes and hence at low temperatures. In the past, after deglaciation, almost all lakes had a higher pH than today, so the pH-temperature relationship in the past was different from today's.

PROJECTS THAT HAVE STIMULATED OUR TRANSFER FUNCTION WORK
• SWAP Surface Water Acidification Project 1987-1990
• NORPAST 1998-2002
• NORPAST-2 2003-
• 2000-2004
• 1995-2000
• NFR KILO 1993-1996
• NFR SETESDAL 1996-1999
• EU CHILL 1998-2001

PERSON WHO HAS STIMULATED OUR TRANSFER FUNCTION WORK
[Photograph: Cajo ter Braak, April 1992]
Major attributes of Cajo:
1. Wonderful person and loyal friend
2. Exceptional scientist with over 7400 citations
3. Revolutionised numerical ecology and quantitative palaeoecology with his creative ideas, remarkable powers of synthesis, and genius at working at the interface between practical ecology and statistical theory.
Thank you Cajo, for all your contributions in the last 20 years.
