Spatial Data Analysis

Why Geography Is Important

What is spatial analysis? • From Data to Information • beyond mapping: added value • transformations, manipulations and application of analytical methods to spatial (geographic) data • Lack of locational invariance • analyses where the outcome changes when the locations of the objects under study change » median center, clusters, spatial autocorrelation • where matters • In an absolute sense (coordinates) • In a relative sense (spatial arrangement, distance)

Components of Spatial Analysis • Visualization • Showing interesting patterns • Exploratory Spatial Data Analysis (ESDA) • Finding interesting patterns • Spatial Modeling, Regression • Explaining interesting patterns

Implementation of Spatial Analysis • Beyond GIS • Analytical functionality not part of typical commercial GIS » Analytical extensions • Exploration requires interactive approach » Training requirements » Software requirements • Spatial modeling requires specialized statistical methods » Explicit treatment of spatial autocorrelation » Space-time is not space + time • ESDA and Spatial Econometrics

What Is Special About Spatial Data? • Location, Location, Location • “where” matters • Dependence is the rule • spatial interaction, contagion, externalities, spill-overs, copycatting • First Law of Geography (Tobler) • everything depends on everything else, but closer things more so

Spatial heterogeneity • Lack of stationarity in first-order statistics • Pertains to the spatial or regional differentiation observed in the value of a variable • Spatial drift (e.g., a trend surface) • Spatial association

Nature of Spatial Data • Spatially referenced data “georeferenced” » “attribute” data associated with location » where matters • Example: Spatial Objects • points: x, y coordinates » cities, stores, crimes, accidents • lines: arcs, from node, to node » road network, transmission lines • polygons: series of connected arcs » provinces, cities, census tracts

GIS Data Model • Discretization of geographical reality necessitated by the nature of computing devices (Goodchild) • raster (grid) vs. vector (polygon) • field view (regions, segments) vs. object view (objects in a plane) • Data model implies spatial sampling and spatial errors

3 Classes of Spatial Data • Geostatistical Data • points as sample locations (“field” data as opposed to “objects”) • Continuous variation over space • Lattice/Regional Data • polygons or points (centroids) • Discrete variation over space, observations associated with regular or irregular areal units • Point Patterns • points on a map (occurrences of events at locations in space) • Observations of a variable are made at location X • Assumption that the spatial arrangement is directly related to the interaction between units of observation

Visualization and ESDA • Objective • highlighting and detecting patterns • Visualization • mapping spatial distributions • outlier detection • smoothing rates • ESDA • dynamically linked windows • linking and brushing

Mapping patterns http://www.cdc.gov/nchs/data/gis/atmapfh.pdf

ESDA http://www.public.iastate.edu/~arcview-xgobi/

Spatial Process • Spatial Random Field • { Z(s): s ∈ D } » s ∈ R^d : generic data location (vector of coordinates) » D ⊂ R^d : index set (subset of potential locations) » Z(s) random variable at s, with realization z(s) • Examples • s are x, y coordinates of house sales, Z is the sales price at s • s are counties, Z is the crime rate in s

Point Pattern Analysis • Objective • assessing spatial randomness • Interest in location itself • complete spatial randomness • clustering, dispersion • Distance-based statistics • nearest neighbors • number of events within given radius
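The two distance-based statistics named above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names, the two 16-event patterns, and the 4 x 4 study area are all hypothetical, and the summary ratio used here is the Clark-Evans nearest-neighbor ratio (observed mean nearest-neighbor distance divided by its expectation under complete spatial randomness), computed without any edge correction.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each event to its nearest other event."""
    return [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]

def events_within_radius(points, radius):
    """For each event, count the other events no farther than `radius`."""
    return [
        sum(1 for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius)
        for i, (xi, yi) in enumerate(points)
    ]

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance over its CSR expectation.

    Under complete spatial randomness with intensity n / area, the expected
    mean nearest-neighbor distance is 1 / (2 * sqrt(intensity)).
    Values below 1 suggest clustering, values above 1 suggest dispersion.
    No edge correction is applied in this sketch.
    """
    n = len(points)
    observed = sum(nearest_neighbor_distances(points)) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical patterns inside a 4 x 4 study area (16 events each).
dispersed = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
clustered = [(0.125 * i, 0.125 * j) for i in range(4) for j in range(4)]

ratio_dispersed = clark_evans_ratio(dispersed, area=16.0)
ratio_clustered = clark_evans_ratio(clustered, area=16.0)
counts_dispersed = events_within_radius(dispersed, radius=1.0)
```

With these made-up patterns the regular grid gives a ratio well above 1 (dispersion) and the tight cluster a ratio well below 1 (clustering). In practice the ratio would be compared against a reference distribution before drawing conclusions.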

Point Patterns • Spatial process • index set D is point process, s is random • Data • mapped pattern » examples: location of disease, gang shootings • Research question • interest focuses on detecting absence of spatial randomness (cluster statistics) • clustered points vs dispersed points

Geostatistical Data • Spatial Process • index set D is fixed subset of R^d (continuous) • Data • sample points from underlying continuous surface » examples: mining, air quality, house sales price • Research Question • interest focuses on modeling continuous spatial variation • spatial interpolation (kriging)

Variogram Modeling (Geostatistics) • Objective • modeling continuous variation across space • Variogram • estimating how spatial dependence varies with distance • modeling distance decay • Kriging • optimal spatial prediction
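To make "estimating how spatial dependence varies with distance" concrete, the sketch below computes an empirical semivariogram: point pairs are grouped into distance bins, and each bin reports half the average squared difference between the paired values. The function name, the bin settings, and the ten transect samples are assumptions made for illustration. Fitting a variogram model and kriging are not shown.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Bin point pairs by separation distance and average half the squared
    difference of their values in each bin.

    Returns one (mean_distance, semivariance, n_pairs) tuple per non-empty bin.
    """
    dist_sum = [0.0] * n_bins
    sq_sum = [0.0] * n_bins
    pairs = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            b = int(d // bin_width)          # index of the distance bin
            if b < n_bins:
                dist_sum[b] += d
                sq_sum[b] += (values[i] - values[j]) ** 2
                pairs[b] += 1
    return [
        (dist_sum[b] / pairs[b], sq_sum[b] / (2 * pairs[b]), pairs[b])
        for b in range(n_bins) if pairs[b] > 0
    ]

# Hypothetical samples: ten locations along a transect with a steady gradient.
coords = [(float(x), 0.0) for x in range(10)]
values = [float(x) for x in range(10)]

variogram = empirical_semivariogram(coords, values, bin_width=1.0, n_bins=4)
for mean_distance, semivariance, n_pairs in variogram:
    print(mean_distance, semivariance, n_pairs)
```

In this toy case the semivariance rises with the lag distance, which is the distance-decay pattern a variogram model is meant to capture: nearby samples are more alike than distant ones.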

Lattice or Regional Data • Spatial process • index set D is fixed collection of countably many points in R^d • finite, discrete spatial units • Data • fixed points or discrete locations (regions) » examples: county tax rates, state unemployment • Research question • interest focuses on statistical inference • estimation, specification tests

Spatial Autocorrelation • Objective • hypothesis test on the spatial randomness of attributes (value and location considered together) • Global and local autocorrelation statistics: Moran’s I, Geary’s c, G(d), LISA • Visualization of spatial autocorrelation • Moran scatterplot • LISA maps
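As an illustration of a global autocorrelation test, the sketch below computes Moran's I, I = (n / S0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z are deviations from the mean and S0 is the sum of all weights. It then derives a one-sided pseudo p-value by randomly permuting the values over the locations. The 5 x 5 grid, the binary rook-contiguity weights, the two value surfaces, and all function names are assumptions chosen for the example.

```python
import random

def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular grid, cells stored row by row."""
    n = n_rows * n_cols
    W = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:               # neighbor in the next row
                W[i][i + n_cols] = W[i + n_cols][i] = 1.0
            if c + 1 < n_cols:               # neighbor in the next column
                W[i][i + 1] = W[i + 1][i] = 1.0
    return W

def morans_i(values, W):
    """Global Moran's I for `values` under the weights matrix `W`."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)

def permutation_p_value(values, W, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random relabelings of the values
    whose Moran's I is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, W)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, W) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

W = rook_weights(5, 5)
# Hypothetical attribute surfaces on the grid.
gradient = [float(r + c) for r in range(5) for c in range(5)]
checkerboard = [float((r + c) % 2) for r in range(5) for c in range(5)]

i_gradient = morans_i(gradient, W)
i_checkerboard = morans_i(checkerboard, W)
p_gradient = permutation_p_value(gradient, W)
```

The smooth gradient yields a clearly positive I (similar values are neighbors), while the checkerboard yields I = -1 (every neighbor pair is dissimilar). The permutation approach is one common way to assess significance; analytical approximations are an alternative not shown here.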

Spatial process models • How is the spatial association generated? • Spatial autoregressive process (SAR) • Y = ρWY + ε • Spatial moving average process (SMA) • Y = (I + ρW) ε • ε – vector of independent errors • W – distance weights matrix • In SAR, correlation is fairly persistent with increasing distance, whereas with SMA it decays to zero fairly quickly.
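The contrast between the two processes can be checked numerically from their implied covariance matrices. With unit-variance independent errors, the SAR form Y = (I − ρW)⁻¹ε has covariance (I − ρW)⁻¹(I − ρW)⁻ᵀ, and the SMA form has covariance (I + ρW)(I + ρW)ᵀ. The sketch below is built on assumptions that are not in the slides: W is a row-standardized contiguity matrix on a ring of 21 units (rather than a distance-based matrix), ρ = 0.8, and the helper names are made up. The matrix inverse is a plain Gauss-Jordan routine so the example needs no external library.

```python
def ring_weights(n):
    """Row-standardized weights on a ring: two neighbors, weight 0.5 each."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W

def identity_plus(W, factor):
    """Return the matrix I + factor * W."""
    n = len(W)
    return [[(1.0 if i == j else 0.0) + factor * W[i][j] for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def lag_correlations(cov, max_lag):
    """Correlation between unit 0 and the units 0..max_lag steps away."""
    return [cov[0][k] / (cov[0][0] * cov[k][k]) ** 0.5
            for k in range(max_lag + 1)]

n, rho = 21, 0.8                      # assumed ring size and parameter value
W = ring_weights(n)

sar_multiplier = inverse(identity_plus(W, -rho))        # (I - rho W)^-1
sar_cov = matmul(sar_multiplier, transpose(sar_multiplier))

sma_multiplier = identity_plus(W, rho)                  # (I + rho W)
sma_cov = matmul(sma_multiplier, transpose(sma_multiplier))

sar_corr = lag_correlations(sar_cov, 5)
sma_corr = lag_correlations(sma_cov, 5)
for lag in range(6):
    print(lag, round(sar_corr[lag], 3), round(sma_corr[lag], 3))
```

Under these assumptions the SMA correlation is exactly zero beyond second-order neighbors, because (I + ρW)(I + ρW)ᵀ only involves W and W². The SAR correlation declines gradually because the inverse (I − ρW)⁻¹ links every unit to every other unit.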

Spatial process: the rule governing the trajectory of the system as a chain of changes in state. • Spatial pattern: the map of a single realization of the underlying spatial process (the data available for analysis). • Say you conduct a regression analysis. If the residuals do not display spatial autocorrelation, then there is no need to add “space” to the model. Examine spatial autocorrelation in the residuals using Moran’s I, Geary’s c, or G(d).

Perspectives on spatial process models • Finding out how the variable Y relates to its value in surrounding locations (the spatial lag) while controlling for the influence of other explanatory variables. • Estimating the relation between the explanatory variables X and the dependent variable Y after the spatial effect has been controlled for (referred to as spatial filtering or spatial screening).

In a spatial lag model, the expected value of the dependent variable at each location is a function not only of the explanatory variables at that location, but also of the explanatory variables at all other locations.
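One way to see why is to write out the reduced form. The derivation below assumes a spatial lag model, that is, the SAR process from the earlier slide with a regression term Xβ added. It also assumes that the errors have zero mean given X and that |ρ| < 1 with a row-standardized W, so that the inverse exists and the series converges. None of these assumptions is stated in the slides.

```latex
\begin{aligned}
Y &= \rho W Y + X\beta + \varepsilon \\
(I - \rho W)\,Y &= X\beta + \varepsilon \\
Y &= (I - \rho W)^{-1}\,(X\beta + \varepsilon) \\
E[Y \mid X] &= (I - \rho W)^{-1} X\beta
  = X\beta + \rho W X\beta + \rho^{2} W^{2} X\beta + \cdots
\end{aligned}
```

The term Xβ is the effect of a location's own explanatory variables, ρWXβ carries the explanatory variables of its neighbors, ρ²W²Xβ those of the neighbors' neighbors, and so on, with weights that shrink as the power of ρ grows.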
