USING DATA MINING TO PREDICT ROAD CRASH COUNT WITH A FOCUS ON SKID RESISTANCE VALUES

Authors: Daniel Emerson, Richi Nayak (QUT); Justin Weligamage (QDTMR)
Presenter: Daniel Emerson, Computer Science Discipline, Queensland University of Technology (QUT)

Project Details
• The work presented here was conducted at QUT as part of a larger skid resistance and crash analysis carried out in the CIEAM I and CIEAM II projects from 2009 to 2011.
• Project initiators and organizers: Justin Weligamage, Richi Nayak.
• Data mining supervisor: Richi Nayak.
• Data preparation, data mining and data mining strategy: Daniel Emerson.
• Road engineering advisor: Nappadol Piyatrapoomi.

Motivation (why the work was done)
• Apply data mining as a new approach to analysing Queensland road and crash data.
• Earlier work had found a relationship between the crash risk of roads (measured as whether a road had crashes) and their attributes, with skid resistance being significant.
• Seek a higher-resolution measure of road crash risk through crash count.
• Apply crash count data mining models in decision support systems to identify potential roads for investigation and treatment.

Introduction
• This paper presents a data mining case study in which predictive data mining is applied to model the relationship between skid resistance, other road attributes and crash count, with the purpose of:
  • developing models (algorithms) on sample data, and
  • applying the models to other data to predict high-risk roads.

Data and Data Preprocessing
• Several data sources were obtained from QDTMR for the four-year period 2004 to 2007. They include:
  • annual snapshots of road segments of 1 km or less, each with a list of road variables;
  • road surface texture depth test readings, seal type and seal age, roadway features, traffic flow, features such as intersections, and many others;
  • dated skid resistance values for sections of 100 m or less, from skid resistance tests (F60);
  • crash instances, crash details and their road locations.
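
The slides do not say how the 100 m skid resistance values were combined into the 1 km segment average used in the models (AVG_FRICTION_AT_60 in the sample rule). A minimal sketch, assuming each reading and segment is located by a road identifier and chainage in km, and that the segment value is the plain mean of the readings whose start chainage falls inside the segment. The record classes and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SkidReading:
    # Hypothetical layout: one F60 reading for a section of 100 m or less.
    road_id: str
    start_km: float
    f60: float


@dataclass
class Segment:
    # Hypothetical layout: one road segment of 1 km or less.
    road_id: str
    start_km: float
    end_km: float


def avg_friction_per_segment(segments, readings):
    """Return {(road_id, start_km): mean F60} for segments with at least one reading.

    Assumption: a reading belongs to a segment when its start chainage lies in
    [start_km, end_km) on the same road. A length-weighted mean would be an
    alternative when readings straddle segment boundaries.
    """
    by_road = defaultdict(list)
    for reading in readings:
        by_road[reading.road_id].append(reading)

    result = {}
    for seg in segments:
        values = [r.f60 for r in by_road[seg.road_id]
                  if seg.start_km <= r.start_km < seg.end_km]
        if values:  # segments without readings are left out rather than guessed
            result[(seg.road_id, seg.start_km)] = sum(values) / len(values)
    return result
```

Segments with no readings are simply omitted, so missing skid data stays visible instead of being filled with an invented value.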

Examination of road segment crash count
• Road segment crash count meets our need for a more precise crash measure: crashes per 1 km per year.
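
Producing this measure means assigning each crash to the segment that contains it and counting per year. A minimal sketch, assuming crash records carry a road identifier, a chainage in km and a year (a hypothetical layout). The slides do not say whether segments shorter than 1 km were normalised, so `crashes_per_km_year` is an assumed helper that makes such segments comparable.

```python
from collections import Counter


def crash_counts(segments, crashes):
    """Count crashes per (road_id, start_km, year).

    segments: list of (road_id, start_km, end_km) tuples (hypothetical layout)
    crashes:  list of (road_id, chainage_km, year) tuples (hypothetical layout)
    A crash is assigned to the segment whose [start_km, end_km) contains it;
    crashes outside every segment are not counted.
    """
    counts = Counter()
    for road_id, chainage, year in crashes:
        for seg_road, start, end in segments:
            if seg_road == road_id and start <= chainage < end:
                counts[(seg_road, start, year)] += 1
                break  # assumes segments on one road do not overlap
    return counts


def crashes_per_km_year(count, length_km, years=1):
    """Normalise a crash count to crashes per 1 km per year."""
    if length_km <= 0 or years <= 0:
        raise ValueError("length_km and years must be positive")
    return count / (length_km * years)
```

For example, one crash in a year on a 0.5 km segment corresponds to 2.0 crashes per km per year.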

Crash count characteristics
• Road segment crash count was stable from year to year, indicating its value in crash risk analysis.
[Chart: road segment crash count at a 1-year time scale]

Clusters: crash count ranges (4 yr)
• Clusters of road segments, formed by data mining on road properties, showed characteristic crash counts, relating road crash proneness to road properties.
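
The slides do not name the clustering algorithm. As a hedged illustration, the sketch below swaps in plain k-means on numeric road attributes (assumed to be already scaled to comparable ranges), then reports the crash count range observed in each cluster, which is the kind of summary this slide describes.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def crash_range_per_cluster(labels, crash_counts):
    """Return {cluster: (min crash count, max crash count)}."""
    ranges = {}
    for lab, count in zip(labels, crash_counts):
        lo, hi = ranges.get(lab, (count, count))
        ranges[lab] = (min(lo, count), max(hi, count))
    return ranges
```

Cluster labels are arbitrary integers, so clusters should be compared by their crash count ranges rather than by label.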

Method: Applying predictive data mining
Reasons:
• To demonstrate that road segment crash count can be modelled, thus establishing a relationship between crash count and roadway features.
• To use the rules obtained from the model output to better understand how roadway features contribute to crash count.
• To later apply successful models in decision support.

Method: Applying predictive data mining … using a subset of quality data
• Select the target variable to be predicted (crash count).
• Select the input variables (road segment attributes).
• Select a modelling method (regression tree algorithm).
• Run a range of models with varying configurations (regression tree).
• Evaluate and understand the results.
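
The slides name a regression tree but not the specific implementation (SAS and WEKA are the listed tools). The sketch below is a generic CART-style regression tree written from scratch for numeric inputs only. Each split picks the feature and threshold that most reduce the sum of squared errors of crash count, and each node keeps the segment count, mean and standard deviation of its crash counts, mirroring the N, AVE and SD reported in the sample rule. `max_depth` and `min_leaf` are hypothetical settings standing in for the "varying configurations" that control the number of leaves; categorical inputs such as CWAY_TYPE would need encoding or subset splits like the rule's "IS ONE OF" test.

```python
import statistics


def sse(values):
    """Sum of squared errors of values around their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, targets, max_depth=4, min_leaf=5, depth=0):
    """Grow a regression tree; rows are numeric feature lists, targets crash counts.

    Every node stores n, ave (mean crash count) and sd (population std. dev.).
    """
    node = {"n": len(targets), "ave": sum(targets) / len(targets),
            "sd": statistics.pstdev(targets)}
    if depth >= max_depth or len(targets) < 2 * min_leaf:
        return node
    best = None  # (total SSE after split, feature index, threshold)
    for f in range(len(rows[0])):
        # Candidate thresholds are observed values; split is x < t versus x >= t.
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < t]
            right = [y for r, y in zip(rows, targets) if r[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    # Stop when no admissible split reduces the error.
    if best is None or best[0] >= sse(targets):
        return node
    _, f, t = best
    left_i = [i for i, r in enumerate(rows) if r[f] < t]
    right_i = [i for i, r in enumerate(rows) if r[f] >= t]
    node["feature"], node["threshold"] = f, t
    node["left"] = build_tree([rows[i] for i in left_i], [targets[i] for i in left_i],
                              max_depth, min_leaf, depth + 1)
    node["right"] = build_tree([rows[i] for i in right_i], [targets[i] for i in right_i],
                               max_depth, min_leaf, depth + 1)
    return node


def predict(node, row):
    """Follow the splits down to a leaf and return its mean crash count."""
    while "feature" in node:
        node = node["left"] if row[node["feature"]] < node["threshold"] else node["right"]
    return node["ave"]


def count_leaves(node):
    """Number of leaves (the slides compare models with 143 and 83 leaves)."""
    if "feature" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])
```

Raising `min_leaf` or lowering `max_depth` yields fewer, larger leaves, which is one way to produce a family of models of different sizes to compare.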

Model variables
Road attribute input variables (in order of significance):
• AVG_FRICTION_AT_60_1km (F60 skid resistance)
• AADT (annual average daily traffic)
• traffic_percent_heavy
• lane_count
• Texture Depth
• roughness_average
• rutting_average
• seal_age
• seal_type
• CRASH_SPEED_LIMIT
• CWAY_TYPE (single, double)
• CRAS_DIVIDED_ROAD
• ROAD_TYPE (highway, urban arterial, etc.)
• Roadway Feature (roundabouts, bridges, intersections, etc.)
These road segment attributes were relevant to predicting road segment crash count and became the model input variables.
Target variable: road segment crash count

Model results
• All models showed a high correlation between actual and predicted crash count.

Charts of actual value vs. predicted value
• Comparing models with 143 leaves and 83 leaves
[Charts: predicted value vs. actual value for each model]

A sample output rule
Sample Rule 1:
IF AVG_FRICTION_AT_60 < 0.4095
AND CRASH_SPEED_LIMIT IS ONE OF: 90 100 110
AND 3987 <= AADT < 6105
AND CWAY_TYPE EQUALS SINGLE
THEN
NODE : 48
N : 315 (number of road segments in the group)
AVE : 4.04444 (average crash count for the group)
SD : 2.5357 (standard deviation of the crash counts within the group)
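
Each rule reads as a conjunction of conditions on segment attributes, with the group average serving as the prediction. The sketch below encodes Sample Rule 1 as a predicate; the thresholds and node statistics are copied from the rule, while the dictionary layout (keys named after the slide's variables) and the string "SINGLE" for a single carriageway are assumptions about how a segment record would be represented.

```python
from typing import Optional

# Node statistics reported for Sample Rule 1.
RULE_1_NODE = {"node": 48, "n": 315, "ave": 4.04444, "sd": 2.5357}


def matches_rule_1(segment: dict) -> bool:
    """True if a road segment record satisfies every condition of Sample Rule 1."""
    return (
        segment["AVG_FRICTION_AT_60"] < 0.4095
        and segment["CRASH_SPEED_LIMIT"] in (90, 100, 110)
        and 3987 <= segment["AADT"] < 6105
        and segment["CWAY_TYPE"] == "SINGLE"
    )


def rule_1_prediction(segment: dict) -> Optional[float]:
    """Average crash count of node 48 if the rule applies, otherwise None."""
    return RULE_1_NODE["ave"] if matches_rule_1(segment) else None
```

For a single-carriageway segment with F60 of 0.38, a 100 km/h limit and AADT of 4500, the rule applies and the predicted crash count is the group average, 4.04444; raising F60 to 0.45 moves the segment out of this group.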

Conclusion
• Road segment crash count can be successfully modelled with road attributes using data mining.
• A strong relationship exists between road crash count and road attributes.
• Skid resistance plays an important role in determining the crash characteristics of the road segment.
• The models may be of sufficient quality to use in decision support.
• While the models are specific to Queensland roads, the method can be trialled and evaluated elsewhere.

Future Work
• Work with road asset domain experts to analyse the rules, draw conclusions and improve the models.
• Apply the models to analyse data subsets, such as crashes with severe human outcomes.
• Apply the models to the whole-of-network dataset with the goal of identifying road segments that are skid resistance sensitive, i.e. segments where a surface intervention to improve skid resistance would reduce crash risk.
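
One way to approach the skid resistance sensitivity goal, sketched here as an assumption rather than the project's method, is a what-if comparison: predict each segment's crash count with its current F60 value and again with F60 raised to a hypothetical post-treatment value, then flag segments whose predicted count falls by at least a chosen margin. The `predict` callable stands in for a fitted model; `treated_f60` and `min_reduction` are hypothetical parameters.

```python
from typing import Callable, Dict, List


def skid_sensitive_segments(
    segments: Dict[str, dict],
    predict: Callable[[dict], float],
    treated_f60: float,
    min_reduction: float,
) -> List[str]:
    """Return ids of segments whose predicted crash count drops by at least
    min_reduction when AVG_FRICTION_AT_60 is raised to treated_f60."""
    flagged = []
    for seg_id, attrs in segments.items():
        if attrs["AVG_FRICTION_AT_60"] >= treated_f60:
            continue  # already at or above the treated level
        treated = dict(attrs, AVG_FRICTION_AT_60=treated_f60)  # copy with new F60
        if predict(attrs) - predict(treated) >= min_reduction:
            flagged.append(seg_id)
    return flagged
```

With a toy step model that predicts 4.0 crashes below F60 0.41 and 2.0 otherwise (illustrative numbers only), a segment at F60 0.35 treated to 0.45 is flagged for a margin of 1.0, while a segment already at 0.50 is skipped.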

Acknowledgement
• This study is an ongoing investigation into road crashes, supported by CIEAM (CRC Asset Management), QDTMR and the Faculty of Science and Technology, QUT.
• Data mining tools used include:
  • SAS (Statistical Analysis Software)
  • WEKA (Data Mining Software)

Thanks and Questions
