Understanding Spatial Autocorrelation in R: SAR Models and Neighborhood Definitions

R for Macroecology Spatial models

Next week • Any topics that we haven’t talked about? • Group projects

SAR Models • Augment standard OLS with an additional term to model the spatial autocorrelation • We’ll focus on error SAR models, which focuses on spatial pattern in the error part of the model OLS Y = βX + ε SARlag Y = ρWY + βX + ε SARerror Y = βX + λWu+ ε Defining the spatial weights matrix, W, is crucial

Neighborhoods in R • spdep • dnearneigh() • knearneigh() Coordinates (matrix or SpatialPoints) dnearneigh(x, d1, d2, row.names = NULL, longlat = NULL) Minimum and maximum distances (in km if longlat = T) Returns a list of vectors giving the neighbors for each point

Neighborhoods in R • spdep • dnearneigh() • knearneigh() > x = c(1,3,2,5) > y = c(3,2,4,4) > n = dnearneigh(cbind(x,y),d1 = 0,d2 = 3) > n Neighbour list object: Number of regions: 4 Number of nonzero links: 10 Percentage nonzero weights: 62.5 Average number of links: 2.5 > str(n) List of 4 $ : int [1:2] 2 3 $ : int [1:3] 1 3 4 $ : int [1:3] 1 2 4 $ : int [1:2] 2 3 - attr(*, "class")= chr "nb" - attr(*, "nbtype")= chr "distance” ...

Converting a neighborhood to weights neighbors list what to do with neighborless points nb2listw(neighbours, style="W", zero.policy=NULL) W = row standardized (rows sum to 1) B = binary (0/1) C = global standardized (all links sum to n) U = C/n S = variance stabilization (Tiefelsdorf et al. 1999)

Converting a neighborhood to weights > nb2listw(n,style = "W")$weights [[1]] [1] 0.5 0.5 [[2]] [1] 0.3333333 0.33333330.3333333 [[3]] [1] 0.3333333 0.33333330.3333333 [[4]] [1] 0.5 0.5 > nb2listw(n,style = "B")$weights [[1]] [1] 1 1 [[2]] [1] 1 1 1 [[3]] [1] 1 1 1 [[4]] [1] 1 1 > nb2listw(n,style = "C")$weights [[1]] [1] 0.4 0.4 [[2]] [1] 0.4 0.40.4 [[3]] [1] 0.4 0.40.4 [[4]] [1] 0.4 0.4> > nb2listw(n,style = "S")$weights [[1]] [1] 0.4494897 0.4494897 [[2]] [1] 0.3670068 0.3670068 0.3670068 [[3]] [1] 0.3670068 0.3670068 0.3670068 [[4]] [1] 0.4494897 0.4494897

Converting a neighborhood to weights > nb2listw(n,style = "W")$weights [[1]] [1] 0.5 0.5 [[2]] [1] 0.3333333 0.33333330.3333333 [[3]] [1] 0.3333333 0.33333330.3333333 [[4]] [1] 0.5 0.5 > nb2listw(n,style = "B")$weights [[1]] [1] 1 1 [[2]] [1] 1 1 1 [[3]] [1] 1 1 1 [[4]] [1] 1 1 > nb2listw(n,style = "C")$weights [[1]] [1] 0.4 0.4 [[2]] [1] 0.4 0.40.4 [[3]] [1] 0.4 0.40.4 [[4]] [1] 0.4 0.4 > nb2listw(n,style = "S")$weights [[1]] [1] 0.4494897 0.4494897 [[2]] [1] 0.3670068 0.3670068 0.3670068 [[3]] [1] 0.3670068 0.3670068 0.3670068 [[4]] [1] 0.4494897 0.4494897 Emphasizes weakly connected points Emphasizes strongly connected points Emphasizes strongly connected points Tries to balance

Lots of options – how to choose? • Define the neighborhood • Define the spatial weights matrix • Try things out! • Look for stability in model estimates • Look for residual autocorrelation

Defining the neighborhood - d #1. Small distance n = dnearneigh(cbind(x,y),d1 = 0, d2 = 0.1) w1 = nb2listw(n,zero.policy = T) #2. Medium distance n = dnearneigh(cbind(x,y),d1 = 0, d2 = 0.3) w2 = nb2listw(n,zero.policy = T) #2. Large distance n = dnearneigh(cbind(x,y),d1 = 0, d2 = 0.5) w3 = nb2listw(n,zero.policy = T) par(mfrow = c(1,4)) plot(x,y,axes = F,xlab = "",ylab = "") plot(w1,cbind(x,y)) plot(w2,cbind(x,y)) plot(w3,cbind(x,y))

Defining the neighborhood - K #4. 2 neighbors n = knn2nb(knearneigh(cbind(x,y),k=2,RANN = F)) w4 = nb2listw(n,zero.policy = T) #5. 4 neighbors n = knn2nb(knearneigh(cbind(x,y),k=4,RANN = F)) w5 = nb2listw(n,zero.policy = T) #6. 8 neighbors n = knn2nb(knearneigh(cbind(x,y),k=8,RANN = F)) w6 = nb2listw(n,zero.policy = T) par(mfrow = c(1,4)) plot(x,y,axes = F,xlab = "",ylab = "") plot(w4,cbind(x,y)) plot(w5,cbind(x,y)) plot(w6,cbind(x,y))

Neighborhoods on grids x = rep(1:20,20) y = rep(1:20,each = 20) plot(x,y) n = dnearneigh(cbind(x,y),d1=0,d2 = 1) w = nb2listw(n) plot(w,cbind(x,y)) n = dnearneigh(cbind(x,y),d1=0,d2 = sqrt(2)) w = nb2listw(n) plot(w,cbind(x,y)) Rook’s case Queen’s case

Data size • SAR models can take a very long time to fit • 2000 points is the maximum I have used • sample() is useful again

Fitting the SAR model • errorsarlm() just like lm() what to do with neighborless points errorsarlm(formula, listw, zero.policy=NULL) The neighborhood weights

Try it out • Build several SAR models with different W • Which one works best?

Spatial eigenvector maps • Generate new predictors that represent the spatial structure of the data • Three steps • Calculate a pairwise distance matrix • Do a principal components analysis on this matrix • Select some of these PCA axes to add to an OLS model

Spatial eigenvector maps Diniz-Filho and Bini 2005

Filter 1 Filter 2 Filter 3 Filter 4

Filter 20 Filter 10 Filter 30 Filter 40

Understanding Spatial Autocorrelation in R: SAR Models and Neighborhood Definitions