This presentation by Dr. Ikuho Yamada discusses the importance of detecting hot spots in network spaces, where the occurrence level of spatial phenomena exceeds expectations. It highlights clusters' role in understanding factors influencing various events, from vehicle crashes to retail locations. The analysis considers network constraints and utilizes techniques like local indicators of network-constrained clusters (LINCS) for accurate detection. The talk focuses on case studies, particularly with vehicle crash data in Buffalo, NY, and emphasizes decision-making implications for regional planning and policy development.
Hot Spot Detection in a Network Space: Geocomputational Approaches Ikuho Yamada, Ph.D. Department of Geography & School of Informatics IUPUI October 3, 2005 Fall 2005 Talk Series on Network and Complex Systems
Introduction • Clusters in a spatial phenomenon = hot spots, where occurrence or level of the phenomenon is higher than expected. • Detecting hot spots is useful for • Understanding of the nature of the phenomenon itself: • Factors influencing the phenomenon; • Decision making in related policies/planning: • Remedial/preventive actions; • Regional development planning; • New facility design, etc…
Introduction (cont.) • Potential problem: • Spatial distribution of the phenomenon may be affected by a transportation network; • E.g., vehicle crashes, retail facilities, crime locations, … • Analytical results derived w/o considering the network's influence will be misleading, especially for detailed micro-scale data and local-scale analysis; → Analysis should be based on a network space, rather than the Euclidean space. [Figure: points that appear to form a cluster in Euclidean space may not cluster along the network]
Objectives • Is there any clustering tendency? • Where are the clusters? • How large are the clusters? • What causes the clusters? [Flowchart: Data → Stage 1: Detecting local clusters (answers Questions 1, 2, & 3) → Stage 2: Identifying influencing factors with a classifier to determine cluster or not, e.g., a decision tree (answers Question 4). Illustrated with a highway network, vehicle crash locations, and black spots (clusters of crashes)]
Objectives (cont.) • Stage 1: Cluster detection in the network space • To develop exploratory spatial data analysis methods for network-based local-cluster detection, named local indicators of network-constrained clusters (LINCS); • K-function for event-based data; • Moran's I and Getis & Ord's G statistics for link-attribute-based data.
Objectives (cont.) • Stage 2: Influencing factor identification • To examine applicability of inductive learning techniques for constructing models that explain the clusters in relation to the characteristics of the network space; • Decision tree induction algorithms; • Feedforward neural networks; • Discrete choice/regression models --- as examples of traditional statistical methods.
Outline • Constraints imposed by the network space • Stage 1 — Development of LINCS • Network K-function for event-based data • Stage 2 — Inductive learning • Decision tree induction to model relationships between the detected clusters using the network attributes • Case study: • 1997 vehicle crash data in Buffalo, NY • Conclusions
Constraints imposed by the network space • Location constraint: • Some spatial phenomena occur only on the links of the network. • E.g., vehicle crashes, retail facilities, geocoded addresses (crime locations, patient residences, …); • Movement constraint: • Movement between locations is restricted to the network links; • E.g., One can get to a gas station only by driving along the streets; • Distance between locations is more appropriately represented by the network (shortest-path) distance than by the Euclidean (straight-line) distance.
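The movement constraint can be made concrete with a small sketch (not from the talk): on a toy street network, the shortest-path (network) distance between two corners differs from their straight-line (Euclidean) distance. The node names, coordinates, and the `networkx` representation are illustrative assumptions.

```python
import math
import networkx as nx

# Toy street network: nodes carry (x, y) coordinates, edges are street
# segments weighted by their length (names and coordinates are made up).
G = nx.Graph()
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
for n, xy in coords.items():
    G.add_node(n, pos=xy)
# Only three of the four block edges exist: there is no direct A-D street.
for u, v in [("A", "B"), ("B", "C"), ("C", "D")]:
    G.add_edge(u, v, length=math.dist(coords[u], coords[v]))

euclid = math.dist(coords["A"], coords["D"])                  # straight line
network = nx.shortest_path_length(G, "A", "D", weight="length")

print(euclid)   # 1.0
print(network)  # 3.0 -- one must travel A-B-C-D along the streets
```

The gap between the two numbers is exactly what makes Euclidean-space methods misleading for network-constrained phenomena.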
Network constraints (cont.) Location constraint Movement constraint
Network K-function • Global network K-function (Okabe & Yamada 2001): extension of Ripley's planar K-function (1976) to determine if a point pattern has a clustering/dispersal tendency significantly different from random with respect to the network; • For a set of network-constrained events P, K(h) = (1/ρ) E[number of other events within shortest-path distance h of a typical event], where ρ is the intensity of points on the network. [Figure: events within vs. not within network distance h]
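A plug-in estimator of this definition can be sketched in a few lines (this code is not from the talk; it omits the edge corrections of the published method and assumes events sit on network nodes):

```python
import networkx as nx

def network_k(G, events, h, weight="length"):
    """Global network K-function estimate at distance h:
    K(h) = (1/rho) * mean number of other events within network
    distance h, with rho = n_events / total network length.
    Sketch only: no edge correction, events on nodes."""
    n = len(events)
    total_len = sum(d[weight] for _, _, d in G.edges(data=True))
    rho = n / total_len
    count = 0
    for i in events:
        dist = nx.single_source_dijkstra_path_length(G, i, weight=weight)
        count += sum(1 for j in events
                     if j != i and dist.get(j, float("inf")) <= h)
    return count / (n * rho)

# Toy line network 0-1-2-3 with unit-length links, events on every node.
G = nx.path_graph(4)
nx.set_edge_attributes(G, 1.0, "length")
print(network_k(G, [0, 1, 2, 3], h=1.0))  # 1.125
```

In practice the observed K(h) curve is compared against simulation envelopes generated on the same network, as the following slides describe.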
Global Net K-function (cont.) An example of random distribution in a network space
Global Net K-function (cont.) • Weakness of the global K-function in determining the scale of clustering: • If there is a strong cluster with radius R, K(h) tends to exceed the upper significance envelope, indicating clustering, even for h ≥ R. • Incremental K-function: IncK(h_t) = K(h_t) − K(h_{t−1}); • Instead of examining the total number of events within distance h, it examines the increment in the number of events per unit distance; • It can identify the scale of clustering more accurately than the original K-function. [Figure: two patterns with similar K(h) but different IncK(h_t)]
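The incremental version is a simple first difference of K over an increasing distance grid, as a short sketch (not from the talk) shows:

```python
def incremental_k(k_values):
    """Incremental K-function from K evaluated on an increasing
    distance grid: IncK(h_t) = K(h_t) - K(h_{t-1}), with the first
    increment taken against 0.  Assumes k_values is ordered by distance."""
    prev = 0.0
    out = []
    for k in k_values:
        out.append(k - prev)
        prev = k
    return out

# A strong cluster of radius ~2 inflates K(h) for every h >= 2, but the
# increments flatten once h passes the cluster radius (made-up numbers).
print(incremental_k([1.0, 4.0, 4.5, 5.0]))  # [1.0, 3.0, 0.5, 0.5]
```

The large increment at the second step, followed by small flat increments, is the signature used below to read off the clustering scale.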
Local Network K-function • Local indicator of clustering tendency: decomposition of the global K-function into contributions of individual locations, K(h) = (1/n) Σ_i LK_i(h); • This indicator is defined only for event locations, i.e., only for limited locations in a network; • Introduction of reference points: distributed over the network at a constant interval, for which indicator values are calculated; • c.f., the regular grid used in planar-space analysis such as the Geographical Analysis Machine (GAM).
Local Net K-function (cont.) • Local network K-function: LK_j(h) = (1/ρ) × (number of events within shortest-path distance h of reference point j), where j = 1, …, m, and m is the number of reference points; • For an observed pattern, local K-function values are obtained at the reference points for a range of distances h; → LINCS for event-based data (KLINCS).
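The local statistic can be sketched by evaluating the event count around each reference point (illustrative code, not from the talk; reference points and events are both assumed to be network nodes, with no edge correction):

```python
import networkx as nx

def local_network_k(G, ref_points, events, h, weight="length"):
    """Local network K at each reference point j: the intensity-scaled
    number of events within network distance h of j (a sketch of the
    KLINCS indicator described above)."""
    total_len = sum(d[weight] for _, _, d in G.edges(data=True))
    rho = len(events) / total_len
    lk = {}
    for j in ref_points:
        dist = nx.single_source_dijkstra_path_length(G, j, weight=weight)
        lk[j] = sum(1 for e in events
                    if dist.get(e, float("inf")) <= h) / rho
    return lk

# Line network 0-1-2-3-4; the events pile up near one end.
G = nx.path_graph(5)
nx.set_edge_attributes(G, 1.0, "length")
lk = local_network_k(G, ref_points=[0, 2, 4], events=[0, 1, 2], h=1.0)
print(lk)  # reference point 4 sees no events within h = 1
```

Mapping these per-reference-point values, rather than one global number, is what turns the K-function into a local indicator.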
Example of the KLINCS analysis • The incremental K-function can be an indicator of the scale of clustering to help us determine which scale(s) of the local K-function to be closely examined; Distance 2, in this case.
KLINCS (cont.) • Results of the local network K-function: • Significance of individual reference points is determined by comparison with 1,000 simulations of random patterns on the network (0.1% significance level); • Obs. LKj(h) ≥ the largest simulated LKj(h) → clustering; • Obs. LKj(h) ≤ the smallest simulated LKj(h) → dispersal.
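The envelope test has a generic shape that a short sketch can capture (not from the talk; the null model below, a count of events falling near a reference point when events land uniformly on equal-length segments, is a toy stand-in for simulating random patterns on the real network):

```python
import random

def mc_envelope(observed_stat, simulate_stat, n_sims=1000, seed=0):
    """Monte Carlo envelope test as described above: the observed local
    statistic is compared against the min and max of n_sims values
    simulated under the null hypothesis."""
    rng = random.Random(seed)
    sims = [simulate_stat(rng) for _ in range(n_sims)]
    if observed_stat >= max(sims):
        return "clustering"
    if observed_stat <= min(sims):
        return "dispersal"
    return "not significant"

def null_count(rng, n_events=10, n_segments=50, window=5):
    # One null draw: how many of n_events uniform placements fall on the
    # window segments nearest the reference point (toy null model).
    return sum(1 for _ in range(n_events) if rng.randrange(n_segments) < window)

print(mc_envelope(10, null_count))  # clustering: observed beats every simulation
```

With 1,000 simulations, an observation outside both envelopes corresponds to roughly the 0.1% level used in the talk.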
LINCS for link-attribute-based data • Moran's I statistic (1948): • A global measure of spatial autocorrelation: dependence of a variable value at a location on those at nearby locations in a spatial context; • LISA (local indicators of spatial association) by Anselin (1995); • Network Moran's I (Black 1992): • A measure of network autocorrelation: dependence between a variable value at a given link and those of other links connected to it in a network context; • Local version → ILINCS. • Getis and Ord local G statistics (1992): • A local measure of concentration of variable values around a region; • Applicable to link-attribute-based data (Berglund and Karlström 1999) → GLINCS.
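Both statistics reduce to simple sums over a connectivity weight matrix, which a sketch makes concrete (not from the talk; the chain network, binary weights, and crash counts are made up, and the local G is shown unstandardized):

```python
def morans_i(values, weights):
    """Global Moran's I with binary weights (weights[i][j] = 1 if links
    i and j share an endpoint, else 0).  In the network version it is
    connectivity, not planar contiguity, that defines the neighborhood."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def local_g(values, weights, i):
    """Getis & Ord local G at link i: the share of the attribute total
    (excluding i) held by i's neighbors; high values flag concentration."""
    num = sum(weights[i][j] * values[j]
              for j in range(len(values)) if j != i)
    den = sum(v for j, v in enumerate(values) if j != i)
    return num / den

# Four links in a chain (0-1-2-3): neighbors share an endpoint.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
crashes = [8.0, 7.0, 1.0, 0.0]   # crash counts per link (made up)
print(morans_i(crashes, W))      # positive: high-crash links adjoin each other
print(local_g(crashes, W, 0))    # link 0's neighbor holds 7/8 of the rest
```

In the full methods, both values are judged against Monte Carlo simulations, just as with the K-function.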
Relationship between I and G statistics [Figure: the value of the target link i vs. the values of the links in the neighborhood of link i]
From LINCS to inductive learning • Question: What causes the detected clusters? • LINCS gives a measure of clustering tendency for each spatial unit (ref. point or link segment). • Network data include attributes that may be related to the cause of the clusters. • E.g., travel speed, traffic volume, … • Spatial attributes can also be assigned to the spatial units. • E.g., distance from the closest intersection, travel time from the closest police station, average income of the area, …
LINCS to IL (cont.) • The spatial units can be categorized based on their LINCS values; • E.g., cluster/random/dispersion; large cluster/medium cluster/small cluster/random; cluster center/cluster fringe/random. [Diagram: spatial units with LINCS results (clustering/random/dispersion), network attributes, and spatial attributes feed into inductive learning (decision tree induction, feedforward neural network) to uncover causality relationships]
Inductive learning • A means to model relationships between input variables and outcome (classification) without relying on prior knowledge: (Gahegan 2000) • Learns from a set of instances for which desired outcome is known; • Predicts outcomes for new instances with known input variables.
Decision tree • A way of representing classification rules in a hierarchical manner (Witten & Frank 2000; Thill & Wheeler 2000): • Node --- test on an attribute; • Leaf node --- specification of a class. • Decision tree induction: • Recursive process of splitting a set of instances with correct class information (the training set) into subsets based on a particular attribute; • E.g., CHAID (Kass 1980), CART (Breiman et al. 1984), C4.5 (Quinlan 1993).
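The induction step can be sketched with scikit-learn (not from the talk; scikit-learn implements CART rather than C4.5, and the attribute values and class labels below are invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training set: one row per spatial unit, columns are hypothetical
# network attributes (AADT, speed limit in mph); the label is the
# KLINCS class of the unit.  All numbers are made up.
X = [[30000, 55], [28000, 65], [25000, 55], [5000, 30],
     [4000, 30], [6000, 45], [3000, 30], [27000, 65]]
y = ["cluster", "cluster", "cluster", "random",
     "random", "random", "random", "cluster"]

# CART stands in for C4.5 here; both recursively split the training set
# on one attribute at a time until the leaves are (nearly) pure.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[29000, 60], [4500, 35]]))  # ['cluster' 'random']
```

Reading the fitted tree's tests off its nodes is what yields the interpretable rules that motivate using trees over black-box models.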
Other techniques of modeling • Feedforward neural network with back-propagation (Thill & Mozolin 2000; Demuth & Beale 2000): • A way of deriving a mapping from multiple input variables to a classification from a training dataset. • Discrete choice model ~ as an example of traditional statistical modeling: • A way to analyze the relationship between a set of independent variables and a dependent variable of binary form or a discrete choice outcome among a small set of alternatives; • Probit model/logit model.
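The binary logit mentioned above has a closed probability form that a sketch can show (not from the talk; the coefficients and attribute choices are invented, not fitted to the Buffalo data):

```python
import math

def logit_prob(x, beta0, betas):
    """Binary logit: P(cluster) = 1 / (1 + exp(-(b0 + sum(b_k * x_k)))).
    A probit model would replace the logistic function with the
    standard normal CDF."""
    z = beta0 + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: AADT (in 10,000s) raises the odds that a
# unit is a crash cluster; distance to the nearest intersection
# (miles) lowers them.  Coefficients are made up.
beta0, betas = -2.0, [1.5, -3.0]
print(round(logit_prob([3.0, 0.1], beta0, betas), 3))  # about 0.900
print(round(logit_prob([0.5, 0.4], beta0, betas), 3))  # about 0.079
```

Unlike the tree, the logit gives each attribute a single global coefficient, which is exactly the contrast the talk draws between inductive and traditional statistical models.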
Data • 1997 vehicle crash data for the Buffalo, NY area (from the New York State Department of Transportation): • NY State highways; • Milepost system with a resolution of 0.1 mile; • 1,658 crashes in the study region; • Mileposts are used as reference points; scale of analysis = 0.1 mile; • Monte Carlo simulation with 1,000 trials (0.1% significance level). [Map: crash distribution in the study region]
Stage 1: Global scale results • Under the null hypothesis: • Crash probability uniform over the network; • Crash probability proportional to traffic volume (Annual Average Daily Traffic, AADT). [Plots: results at the 0.1 mile and 0.1~0.5 mile scales]
Stage 1: Local scale results • KLINCS at 0.1 mile scale, not adjusted for AADT: cluster: 125 ref. points; random: 1,327 ref. points; dispersion: 0 ref. points (total: 1,452). • KLINCS at 0.1 mile scale, adjusted for AADT: cluster: 110 ref. points; random: 1,304 ref. points; dispersion: 38 ref. points (total: 1,452).
Stage 1 local results (cont.) • ILINCS at 0.1 mile scale, adjusted for AADT: positive autocorrelation: 23 links; not significant: 1,462 links; negative autocorrelation: 0 links (total: 1,485). • GLINCS at 0.1 mile scale, adjusted for AADT: high-valued cluster: 19 links; not significant: 1,438 links; low-valued cluster: 28 links (total: 1,485).
Stage 1 local results (cont.) [Maps: KLINCS at 0.1 mile scale, adjusted for AADT, alongside the Priority Investigation Locations (PILs) designated by NYSDOT]
Stage 2: Inductive learning results • AADT-adjusted KLINCS classification • Decision tree by the C4.5 induction algorithm with 24 attributes
Stage 2 results (cont.) • AADT-adjusted GLINCS model • Dependent variable = degree of significant clustering (0~1000) • Model tree, where each leaf node represents a linear model
Stage 2 results (cont.) • Accuracy for the test set: • Not much difference between the three models, especially over all instances; • Because 90% of the instances are “random,” the modeling processes fit the models mainly to the random instances to minimize overall errors; → Weighting schemes to emphasize underrepresented classes.
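Such a weighting scheme can be sketched with scikit-learn's `class_weight` option (not from the talk; the slide does not specify which scheme was used, and the single-attribute toy data below is invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: few "cluster" units, many "random" units, with
# one made-up attribute per unit.  class_weight="balanced" reweights
# each class inversely to its frequency, so errors on the rare
# "cluster" class cost more during tree induction.
X = [[100], [110], [105], [5], [6], [4], [7], [8], [9], [10]]
y = ["cluster"] * 3 + ["random"] * 7

weighted = DecisionTreeClassifier(class_weight="balanced",
                                  random_state=0).fit(X, y)
print(weighted.predict([[102]]))  # the rare class is still recognized
```

Equivalent effects can be had by oversampling the rare class in the training set; either way, the model stops defaulting to "random" on everything.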
Conclusions • This research proposes a comprehensive framework for a network-based spatial cluster analysis when the phenomenon of interest is constrained by a network space; • Event-based data & link-attribute-based data; • Detection of local clusters (stage 1) • The LINCS methods can detect clusters without detecting spurious clusters caused merely by the network constraints; • Identification of influencing factors (stage 2) • Inductive learning techniques are useful to construct robust models to explain the detected clusters in relation to the network’s attributes.
Conclusions (cont.) • The combination of exploratory spatial data analysis and inductive learning modeling is a powerful tool • to reveal latent relationships between distributions of spatial phenomena and characteristics of physical/social environments; and then • to assist spatial decision-making processes by providing guidance on where/what to focus attention; • Stage 1 → spatial focus; Stage 2 → contextual focus. • The case study showed relatively good correspondence between the LINCS results and the PILs, which supports the effectiveness of the LINCS methods.
Thank you! Questions & suggestions are welcome.