Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald

All for One or One for All? Mapping many species individually vs. simultaneously with random forest. Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald. August 10, 2012, Ecological Society of America Annual Meeting, Portland, Oregon

Species Distribution Modeling • Has been around for a long time, and has exploded over the last decade. “With the rise of new powerful statistical techniques and GIS tools, the development of predictive habitat distribution models has rapidly increased in ecology.” – Guisan and Zimmermann 2000 • Generalized Linear/Additive Models • Neural networks • Bayesian models • Ordination • Classification methods • Web of Knowledge: ‘species distribution’ • 2000 – 2001: 556 articles • 2011 – 2012: 1,389 articles

SDM Uses (from Guisan and Thuiller 2005)

Strategies for community-level modeling • ‘assemble first, predict later’ • ‘predict first, assemble later’ • ‘assemble and predict together’ --Ferrier & Guisan 2006 Objective: Compare two strategies for community-level predictive mapping.

You Are Here

RF classification: # True / # Trees = 4/6 ≈ 0.67. For RF regression, the predicted value for a pixel is the average of the predictions from all trees (one terminal node per tree).
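The two aggregation rules on this slide can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function names and the example values for the regression case are hypothetical, and the six votes simply reproduce the 4-of-6 example above.

```python
from statistics import mean


def vote_fraction(tree_votes):
    """Classification: share of trees that voted True (presence)."""
    return sum(1 for vote in tree_votes if vote) / len(tree_votes)


def regression_prediction(tree_predictions):
    """Regression: average of the per-tree predictions for one pixel."""
    return mean(tree_predictions)


# Six trees, four of which vote True, as in the slide's example.
votes = [True, True, False, True, False, True]
print(round(vote_fraction(votes), 2))  # 0.67

# Hypothetical per-tree basal-area predictions for one pixel.
print(regression_prediction([10.0, 20.0, 30.0]))  # 20.0
```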

Random forest -- Nearest-Neighbor imputation Imputation = Filling in missing values from existing values.

Methods: k-NN (“Assemble and Predict Together”) (1) Place plots within feature space (2) Place new pixel within feature space (3) Find nearest-neighbor plot within feature space (4) Impute nearest neighbor’s Plot ID # to pixel [Figure: study area shown in geographic space and in feature space; feature-space axes: Elevation, Rainfall]
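The four k-NN steps can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented for the talk: the plot IDs and the (elevation, rainfall) values are invented, distance is plain Euclidean, and the features are left unscaled for brevity (in practice they would normally be standardized first).

```python
import math


def nearest_plot_id(pixel_features, plot_features):
    """Return the ID of the plot closest to the pixel in feature space.

    plot_features maps a plot ID to a tuple of feature values
    (here: elevation, rainfall); pixel_features is one such tuple.
    """
    return min(
        plot_features,
        key=lambda plot_id: math.dist(pixel_features, plot_features[plot_id]),
    )


# (1) Place plots within feature space (hypothetical values).
plots = {1: (200.0, 1500.0), 2: (900.0, 800.0), 3: (1500.0, 400.0)}

# (2) Place a new pixel within the same feature space.
pixel = (1000.0, 700.0)

# (3) Find the nearest plot and (4) impute its plot ID to the pixel.
print(nearest_plot_id(pixel, plots))  # 2
```

Because the pixel receives a plot ID rather than a single value, every attribute measured on that plot (the full species list and abundances) is carried to the pixel together.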

Methods: GNN (Ohmann and Gregory 2002) (1) Conduct gradient analysis of plot data (2) Calculate axis scores of pixel from mapped data layers (3) Find nearest-neighbor plot in gradient space (4) Impute nearest neighbor’s Plot ID # to pixel [Figure: study area shown in geographic space and in gradient space; gradient-space axes: CCA Axis 1 (e.g., Rainfall, local topography), CCA Axis 2 (e.g., Temperature, Elevation)]

Methods: Random Forest Nearest Neighbor Imputation [Figure: study area shown in geographic space and in Random Forest space]

[Figure: plot ID numbers] Nearest Neighbor Plot: #3. Second Nearest Neighbor: #5.

Strategies for community-level modeling • ‘assemble first, predict later’ • ‘predict first, assemble later’ • Random forest – classification (binary prediction) • Random forest – regression (continuous prediction) • ‘assemble and predict together’ • Random forest – imputation (continuous prediction) --Ferrier & Guisan 2006

Dimensions of Map Accuracy • Single-species metrics • Range – presence/absence • Abundance – How much basal area? • Is the distribution of values predicted realistic? • Community-level metrics • Diversity • Composition

Fails to predict absences. Sensitivity: True Positives / (True Positives + False Negatives). Specificity: True Negatives / (True Negatives + False Positives). True Skill Statistic (TSS): Sensitivity + Specificity - 1
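The three formulas above translate directly into code. The sketch below is illustrative only; the function name and the confusion-matrix counts are hypothetical. The example shows why a map that never predicts an absence scores poorly: sensitivity is perfect, specificity is zero, and TSS collapses to 0.

```python
def true_skill_statistic(true_pos, false_neg, true_neg, false_pos):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity + specificity - 1


# Hypothetical map that predicts presence at every one of 100 plots,
# 50 of which are true presences and 50 true absences.
print(true_skill_statistic(true_pos=50, false_neg=0, true_neg=0, false_pos=50))  # 0.0
```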

Cannot Predict Abundance Predictions missing Zeros Root Mean Square Difference: 17.72 18.46

Root Mean Square Difference: 21.34 18.73
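For reference, the Root Mean Square Difference reported on these slides can be computed as below. This is a generic sketch: the function name and the basal-area values are made up for illustration and are not the study's data.

```python
import math


def rmsd(observed, predicted):
    """Root mean square difference between paired observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical basal areas at four plots; the two true zeros are
# predicted as small positive values, as a regression model might do.
observed = [0.0, 0.0, 10.0, 30.0]
predicted = [3.0, 4.0, 10.0, 30.0]
print(rmsd(observed, predicted))  # 2.5
```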

Single Species Models • Range • Random Forest – Binary: best • Random Forest – Nearest Neighbor: acceptable • Random Forest – Continuous: fail • Abundance (Basal Area) • RMSD • Random Forest – Continuous: best • Random Forest – Nearest Neighbor: acceptable • Random Forest – Binary: NA • Empirical Cumulative Distribution Functions (predicted value distributions) • Random Forest – Nearest Neighbor: best • Random Forest – Continuous: fail • Random Forest – Binary: NA

Diversity: Species Richness and Evenness

Beta Diversity

Average Alpha Diversity for Blue Pixel: 3.04

Results – Composition What is the Bray-Curtis distance between our observed and predicted communities?
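As a reminder of what the metric measures, here is a minimal sketch of the Bray-Curtis distance between an observed and a predicted community at one plot. The function name and the abundance vectors are hypothetical; each position in a vector is one species' abundance (for example, basal area).

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis distance: sum of absolute differences over sum of all abundances.

    Returns 0.0 for identical communities and 1.0 when no species is shared.
    """
    if len(observed) != len(predicted):
        raise ValueError("abundance vectors must list the same species")
    total = sum(observed) + sum(predicted)
    if total == 0:
        return 0.0  # both communities empty: treat as identical
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / total


# Three species; the prediction adds a species that is truly absent.
observed = [10.0, 0.0, 10.0]
predicted = [5.0, 5.0, 10.0]
print(bray_curtis(observed, predicted))  # 0.25
```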

Discussion • Species absences are an important dimension of composition • Disturbance? • Succession? • Competition/Facilitation? • Dispersal limitations? • Community assembly rules can be used to help refine mapped species lists. (e.g., Guisan and Rahbek, 2011) • But… imputation avoids the pitfalls & complications of re-assembling communities after mapping because they are never taken apart.

Conclusions • Practical Considerations: • Models of individual species may be • Strongest in one dimension • Useful for understanding species’ ecology • The best option for some types of available data (e.g., presence-only data from museum specimens) • Nearest Neighbor mapping is a useful tool for building multipurpose maps. • Ranges and abundances • Composition • Diversity

Acknowledgements • Nationwide Forest Imputation Study • Landscape Ecology Modeling Mapping and Analysis team in Corvallis.
