This presentation by Christine Körner from Fraunhofer AIS and Uni Bonn explores active learning techniques applied to spatial data. It addresses the challenges of obtaining labeled data for various applications, such as traffic prediction in major German cities. The FAW-Project aims to accurately predict traffic frequencies using selective sampling and KNN methods to maximize classification quality with minimal training examples. The presentation outlines experimental strategies and discusses issues like model stability and the influence of spatial characteristics on data analysis.
Active Learning on Spatial Data
Christine Körner, Fraunhofer AIS, Uni Bonn
Outline
• Active Learning
• FAW-Project
• Spatial Data
• Experiment Outline
Active Learning
• Labelled data are difficult or expensive to obtain, e.g.
  • manual preparation of documents for text mining
  • analysis of drugs or molecules
• Active learning strategies actively select which data points to query in order to
  • minimize the number of training examples needed for a given classification quality, or
  • maximize the quality of results for a given number of data points
Selective Sampling
• Which instance should we choose next? Where we
  • have no data?
  • perform poorly?
  • have low confidence?
  • expect our model to change?
  • previously found data that improved quality?
[Diagram: the chosen instance is passed to the ORACLE ("Label?"); the labelled instance is added to the training set.]
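The selection loop in the diagram can be sketched as pool-based selective sampling. This is a minimal sketch, not the method used in the project: it assumes instances are 2-D coordinates, scores candidates with only the first criterion on the slide ("where we have no data", i.e. distance to the nearest labelled point), and treats `oracle` as a hypothetical callable that returns a label. Any of the other criteria could be plugged in as the scoring function.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical instance representation


def nearest_labeled_distance(x: Point, labeled: Sequence[Point]) -> float:
    """Distance from x to its closest labelled point (inf if nothing is labelled yet)."""
    return min((math.dist(x, p) for p in labeled), default=math.inf)


def selective_sampling(pool: List[Point],
                       oracle: Callable[[Point], float],
                       budget: int) -> Dict[Point, float]:
    """Repeatedly query the pool point farthest from all labelled data
    and add the oracle's answer to the training set."""
    training: Dict[Point, float] = {}
    candidates = list(pool)
    for _ in range(min(budget, len(candidates))):
        labeled = list(training)
        # Higher score = less explored region = more informative (by assumption).
        best = max(candidates, key=lambda x: nearest_labeled_distance(x, labeled))
        candidates.remove(best)
        training[best] = oracle(best)  # ask the oracle for the label
    return training
```

For example, `selective_sampling([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)], lambda p: 2 * p[0], 2)` first picks `(0.0, 0.0)` (all candidates tie when nothing is labelled) and then `(10.0, 0.0)`, the point farthest from it.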
The FAW-Project
• FAW: association regulating outdoor advertising
• Goal: prediction of traffic frequencies for 82 major German cities
• Samples: approx. 400 to 1500 measured poster sites per city
Data Characteristics, Prediction
• Attributes:
  • street name
  • segment ID
  • speed class
  • street type
  • sidewalks
  • one-way road
  • POIs
    • no. of restaurants
    • no. of public buildings
    • …
  • spatial coordinates
• KNN:
  • similarity is calculated from scalar attributes and spatial coordinates
  • neighbors are weighted according to their (spatial) distance
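A distance-weighted KNN of the kind described above might look like the following sketch. The mixing weight `alpha`, the inverse-distance weighting and all names are assumptions for illustration; the attributes are assumed to be numeric and already scaled (categorical attributes such as street type would need an encoding first).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical segment representation: (scalar attributes, (x, y) coordinates).
Segment = Tuple[Sequence[float], Tuple[float, float]]


def combined_distance(a: Segment, b: Segment, alpha: float = 0.5) -> float:
    """Blend attribute distance and spatial distance (alpha is an assumed weight)."""
    return alpha * math.dist(a[0], b[0]) + (1 - alpha) * math.dist(a[1], b[1])


def knn_predict(query: Segment, train: List[Tuple[Segment, float]],
                k: int = 3, alpha: float = 0.5, eps: float = 1e-9) -> float:
    """Predict a frequency as the weighted mean of the k most similar segments,
    weighting each neighbour by the inverse of its spatial distance."""
    neighbours = sorted(train, key=lambda t: combined_distance(query, t[0], alpha))[:k]
    # eps avoids division by zero when a neighbour has identical coordinates.
    weights = [1.0 / (math.dist(query[1], seg[1]) + eps) for seg, _ in neighbours]
    return sum(w * freq for w, (_, freq) in zip(weights, neighbours)) / sum(weights)
```

Selecting neighbours by the combined measure but weighting them by spatial distance alone follows the two bullets on the slide; other weighting schemes are equally plausible.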
Spatial Data
• Spatial data:
  • spatial covariance between data points
  • high autocorrelation and concentrated linkage* on street name bias test accuracy
  • 1:n relationship between street name and segments
  • frequencies within one street are alike
  • here: the complete instance space is known (all street segments of a city)
[Figure: frequency per street segment, segments grouped by street (Nordstraße, Riesenweg); frequency axis from 0 to 2000]
*David Jensen, Jennifer Neville: Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners
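Because segments of the same street are linked and have similar frequencies, a random split puts near-duplicates of test segments into the training set. One common countermeasure, which the slide does not prescribe, is a grouped split that keeps all segments of a street on the same side. A minimal sketch with hypothetical names:

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_street(segments: List[Tuple[str, str]], test_fraction: float = 0.3,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split (segment_id, street_name) pairs so that each street lands entirely
    in either the training or the test set."""
    by_street: Dict[str, List[str]] = defaultdict(list)
    for seg_id, street in segments:
        by_street[street].append(seg_id)
    streets = sorted(by_street)
    random.Random(seed).shuffle(streets)  # reproducible random order of streets
    n_test = max(1, round(test_fraction * len(streets)))
    test_streets = set(streets[:n_test])
    train = [s for st in streets if st not in test_streets for s in by_street[st]]
    test = [s for st in streets if st in test_streets for s in by_street[st]]
    return train, test
```

Splitting on streets rather than segments means the test fraction applies to streets, so the number of test segments varies with street length.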
Active Learning in FAW
• Usage:
  • additional samples at ~50 places per city
  • KNN needs the cross product of street segments with all poster places
    • Cologne: 50 GB, 5 days
• Strategy:
  • Data density: mean distance to the k nearest neighbors
  • Model differences: build a Model Tree with the predicted frequencies
  • Disagreement between models?
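The two strategy ideas can be expressed as scoring functions over candidate locations. This sketch assumes planar coordinates and that the KNN and Model Tree predictions were computed elsewhere; `density_scores`, `disagreement_scores` and `rank` are hypothetical helpers, and using the absolute difference as the disagreement measure is an assumption, since the slide leaves that question open.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical planar coordinates


def density_scores(candidates: Sequence[Point], samples: Sequence[Point],
                   k: int = 3) -> List[float]:
    """Mean distance to the k nearest existing samples; a large value means
    the candidate lies in a sparsely sampled region."""
    scores = []
    for c in candidates:
        nearest = sorted(math.dist(c, s) for s in samples)[:k]
        # No samples at all means the region is entirely unexplored.
        scores.append(sum(nearest) / len(nearest) if nearest else math.inf)
    return scores


def disagreement_scores(pred_a: Sequence[float], pred_b: Sequence[float]) -> List[float]:
    """Absolute difference between two models' predicted frequencies."""
    return [abs(a - b) for a, b in zip(pred_a, pred_b)]


def rank(scores: Sequence[float]) -> List[int]:
    """Candidate indices ordered from most to least promising."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For instance, with samples at `(1, 0)` and `(2, 0)` and `k=2`, the candidate `(10, 0)` scores 8.5 and `(0, 0)` scores 1.5, so `rank` puts the distant candidate first.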
Experiment Outline
• Compare the increase in accuracy when samples are added
  • in ranked order vs.
  • in random order
• Alternatives:
  • iterative ranking (realistic? is greedy search optimal?)
  • rank once, then remove similar objects (e.g. exclude segments of the same street, …)
• Possible problems:
  • KNN is not very stable
  • with few samples, the Oracle has little choice in providing the requested data sets
[Diagram of the experiment setup: Samples (Training, Test), KNN, Model Tree, Frequencies, Distance, Ranking for AL, Oracle, Iterations]
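The "rank once, remove similar objects" alternative can be sketched as a single pass over a precomputed ranking that skips any segment whose street has already been chosen. The function name and the one-segment-per-street rule are illustrative assumptions based on the example given on the slide.

```python
from typing import Dict, List, Sequence


def rank_once_exclude_same_street(ranked_ids: Sequence[str],
                                  street_of: Dict[str, str],
                                  n: int) -> List[str]:
    """Walk a precomputed ranking once and keep at most one segment per street."""
    chosen: List[str] = []
    used_streets = set()
    for seg_id in ranked_ids:
        if len(chosen) >= n:
            break  # budget of requested samples reached
        street = street_of[seg_id]
        if street in used_streets:
            continue  # skip segments similar to an already chosen one
        chosen.append(seg_id)
        used_streets.add(street)
    return chosen
```

Compared with iterative ranking, this needs only one expensive scoring pass, at the cost of ignoring how each newly labelled sample would change the scores of the rest.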
Thank you!
• Suggestions
• Ideas?
• Questions