Towards Visual Analytics of Movement Behaviours
Natalia & Gennady Andrienko
www.ais.fraunhofer.de/and
www.geopkdd.eu
Presentation @ Dagstuhl seminar, July 2007
Context
• EU-funded GeoPKDD project: Geographic Privacy Aware Knowledge Discovery and Delivery
• Goal: develop methods for the analysis of movements of discrete objects, i.e. data of the form { (object_id, time, position_in_space) }
• Consortium composition (disciplines): data mining + DB/DW + (geo)visualisation
[Diagram: the same three components stacked in opposite order. Data miners’ vision: Data Mining, DB/DW, Visualisation. Our vision (Visual Analytics): Visualisation, DB/DW, Data Mining.]
Data
• Problem: real data are hard to get! Let’s collect our own data…
N;Time;Lat;Long;Height;Course;Speed;PDOP;State;NSat
…
8;22/03/07 08:51:52;50.777132;7.205580; 67.6;345.4;21.817;3.8;1808;4
9;22/03/07 08:51:56;50.777352;7.205435; 68.4;35.6;14.223;3.8;1808;4
10;22/03/07 08:51:59;50.777415;7.205543; 68.3;112.7;25.298;3.8;1808;4
11;22/03/07 08:52:03;50.777317;7.205877; 68.8;119.8;32.447;3.8;1808;4
12;22/03/07 08:52:06;50.777185;7.206202; 68.1;124.1;30.058;3.8;1808;4
13;22/03/07 08:52:09;50.777057;7.206522; 67.9;117.7;34.003;3.8;1808;4
14;22/03/07 08:52:12;50.776925;7.206858; 66.9;117.5;37.151;3.8;1808;4
15;22/03/07 08:52:15;50.776813;7.207263; 67.0;99.2;39.188;3.8;1808;4
16;22/03/07 08:52:18;50.776780;7.207745; 68.8;90.6;41.170;3.8;1808;4
17;22/03/07 08:52:21;50.776803;7.208262; 71.1;82.0;35.058;3.8;1808;4
18;22/03/07 08:52:24;50.776832;7.208682; 68.6;117.1;11.371;3.8;1808;4
…
About 75,000 records in total
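The log above is plain semicolon-separated text. As a minimal sketch, assuming the date is written day/month/two-digit-year and that the columns mean what their headers suggest (the units of Height and Speed are not stated), the records could be parsed in Python as follows; `GpsFix` and `parse_fixes` are hypothetical names, not part of the authors' software:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GpsFix:
    n: int             # record number
    time: datetime     # assumed DD/MM/YY HH:MM:SS
    lat: float
    lon: float
    height: float
    course: float      # degrees
    speed: float       # unit not stated in the log
    pdop: float        # position dilution of precision
    state: int
    nsat: int          # number of satellites


def parse_fixes(lines):
    """Parse semicolon-separated GPS records, skipping the header, blank lines and '…' rows."""
    fixes = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("N;") or line.startswith("…"):
            continue
        f = [field.strip() for field in line.split(";")]  # strip padding such as '; 67.6'
        fixes.append(GpsFix(
            n=int(f[0]),
            time=datetime.strptime(f[1], "%d/%m/%y %H:%M:%S"),
            lat=float(f[2]), lon=float(f[3]), height=float(f[4]),
            course=float(f[5]), speed=float(f[6]), pdop=float(f[7]),
            state=int(f[8]), nsat=int(f[9]),
        ))
    return fixes


sample = """N;Time;Lat;Long;Height;Course;Speed;PDOP;State;NSat
8;22/03/07 08:51:52;50.777132;7.205580; 67.6;345.4;21.817;3.8;1808;4
9;22/03/07 08:51:56;50.777352;7.205435; 68.4;35.6;14.223;3.8;1808;4"""

fixes = parse_fixes(sample.splitlines())
print(len(fixes), fixes[0].time, fixes[1].speed)  # 2 2007-03-22 08:51:52 14.223
```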
A systematic approach
• Postulate: to develop methods for exploring and analysing data, we first need to determine what results these methods must produce (purpose!)
• In other words, what an analyst is likely to seek in the data
• Axiom: the purpose of (exploratory) analysis is to produce a parsimonious description (or some other sort of representation) of all data, not an enumeration of all data items!
• Can we call it a “model”? (descriptive but not (yet) predictive)
• Corollary: an analyst is likely to seek invariants, i.e. something common to multiple data items
• Parsimony: a single invariant is described instead of multiple data items
• Is an invariant a pattern?
• Reservation: it may not always be possible to find invariants covering the whole dataset; then the analyst will also have to describe deviations
• But invariants need to be discovered first!
A systematic approach (continued)
• Implication: developing methods for data analysis means developing methods that enable the discovery of invariants (+ possibly methods to deal with deviations)
• Implication: to develop such methods, we need to define the possible types of invariants
• Hypothesis: the types of invariants can be predicted by considering
  • the data structure (which components there are and how they are related)
  • the properties of the components
• Let’s take movement data:
  • Components: population (set of objects); time (set of moments); space (set of positions)
  • Relation: [object] × [time] → [position] (space is the dependent component!); for a single moving object: [time] → [position]
  • Population: discrete; no (natural) order; no distances
  • Time: continuous; linearly and, possibly, cyclically ordered; has distances
  • Space: continuous; partly ordered; has distances
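The relation [object] × [time] → [position] maps directly onto a data structure. A minimal Python sketch, assuming planar (x, y) positions and numeric timestamps, and assuming linear interpolation between samples as one way to treat time as continuous (the slides do not say how positions between fixes are obtained); `MovementData` is a hypothetical name:

```python
from bisect import bisect_left
from collections import defaultdict


class MovementData:
    """The relation [object] x [time] -> [position], stored per object as time-sorted samples."""

    def __init__(self):
        self._times = defaultdict(list)      # object_id -> sorted sample times
        self._positions = defaultdict(list)  # object_id -> (x, y) at those times

    def add(self, obj_id, t, pos):
        """Insert a sample, keeping each object's samples ordered by time."""
        i = bisect_left(self._times[obj_id], t)
        self._times[obj_id].insert(i, t)
        self._positions[obj_id].insert(i, pos)

    def position(self, obj_id, t):
        """Position of obj_id at time t, linearly interpolated between samples (an assumption)."""
        times = self._times.get(obj_id, [])
        positions = self._positions.get(obj_id, [])
        if not times or t < times[0] or t > times[-1]:
            return None  # t lies outside the observed interval
        i = bisect_left(times, t)
        if times[i] == t:
            return positions[i]
        t0, t1 = times[i - 1], times[i]
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        w = (t - t0) / (t1 - t0)
        return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


data = MovementData()
data.add("car1", 10, (100.0, 50.0))
data.add("car1", 0, (0.0, 0.0))   # out-of-order insertion is fine
print(data.position("car1", 5))   # (50.0, 25.0)
print(data.position("car1", 20))  # None
```

Keeping one sorted sequence per object mirrors the properties listed above: the population is an unordered set of keys, while time is linearly ordered and has distances, which is what makes interpolation meaningful.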
Types of invariants in movement data
• Let’s take a single moving object: [time] → [position], where time is a continuous, linearly ordered set with distances (+ possibly cyclically ordered)
• First-order invariants (AKA “movement episodes”): constancy over adjacent time moments (throughout an interval) (reservation: “constancy” is not absolute but involves a certain tolerance)
  • Constant position (= stop, no movement)
  • Constant change (a trend?): constant shift of position (direction, speed); constant acceleration (increasing/decreasing speed); constant change of direction (arc)
  • Relaxed constancy: the object keeps moving (the opposite of a stop)
• Movement can be described as a sequence: episode – {change} – episode – … (time is ordered). However, little parsimony is achieved!
• Second-order invariants: repeated episodes; repeated sequences of episodes
• Yet more parsimony (constancy in repetition → third-order invariants?):
  • periodic repetition (constant distance in time between repetitions) (time has distances)
  • repetition at the same positions in temporal cycles
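The hierarchy of invariants can be illustrated with a toy Python sketch. It covers only the simplest first-order invariant (stop vs. movement, with a speed tolerance standing in for "constancy") and a periodicity check for third-order invariants; the threshold `stop_speed`, the helper names and the sample values are assumptions for illustration, not the authors' procedure:

```python
from itertools import groupby


def episodes(samples, stop_speed=1.0):
    """Split time-sorted (t, speed) samples into maximal 'stop' / 'move' episodes.

    A sample counts as stopped when speed <= stop_speed (the tolerance on "constancy").
    Returns (kind, t_start, t_end) tuples, i.e. the sequence episode, change, episode, ...
    """
    result = []
    for kind, group in groupby(samples, key=lambda s: "stop" if s[1] <= stop_speed else "move"):
        group = list(group)
        result.append((kind, group[0][0], group[-1][0]))
    return result


def is_periodic(start_times, period, tolerance):
    """Third-order check: are consecutive repetitions about `period` apart in time?"""
    gaps = [b - a for a, b in zip(start_times, start_times[1:])]
    return bool(gaps) and all(abs(g - period) <= tolerance for g in gaps)


track = [(0, 0.2), (1, 0.0), (2, 5.0), (3, 6.1), (4, 0.3)]
print(episodes(track))  # [('stop', 0, 1), ('move', 2, 3), ('stop', 4, 4)]

# Start times (in hours) of a repeated stop on three consecutive days
print(is_periodic([8.0, 32.1, 55.9], period=24, tolerance=0.5))  # True
```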
Further design considerations
• The data may be very large (possibly not fitting in computer memory) → develop methods that can work outside main memory → use database technology
• Visual Analytics Mantra (D. Keim): Analyze First – Show the Important – Zoom, Filter and Analyze Further – Details on Demand
• Integration model: [Diagram: Visualisation, Database and Data Mining as integrated components]
• What we have achieved so far:
  • Use the DB to extract episodes (1st-order invariants)
  • Use DM to detect repetitions (2nd-order invariants)
  • Use visualisation to see whether the repetitions are regular (3rd-order invariants)
Data preprocessing (one-time operation in the DB)
• Original data: <t, x, y>
• Results of preprocessing:
  • Points are connected into sequences (NEXTTIME field)
  • dX, dY, dT, distance
  • Derived speed, course, acceleration and turn at each point
• Additional temporal components are easy to extract from the database: day of week, day of year, decade (ten-day period) of month…
We separated the time-consuming data pre-processing (which needs to be performed only once!) from the rapid analysis and aggregation procedures (which take just a few seconds).
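The per-point attributes listed above can also be sketched outside a database. This Python version assumes latitude/longitude input projected to local metres with an equirectangular approximation, course measured clockwise from north, and acceleration taken as the change in segment speed divided by the preceding segment's duration; all field names beyond those on the slide are hypothetical, and the authors computed these values in the DB rather than like this:

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lat, lon, lat0, lon0):
    """Equirectangular projection around (lat0, lon0), in metres; adequate over a few km."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def preprocess(points):
    """points: (datetime, lat, lon) tuples. Returns one row per point that has a successor."""
    pts = sorted(points)
    lat0, lon0 = pts[0][1], pts[0][2]
    xy = [to_local_xy(lat, lon, lat0, lon0) for _, lat, lon in pts]
    rows = []
    for i in range(len(pts) - 1):
        t, t_next = pts[i][0], pts[i + 1][0]  # t_next plays the role of NEXTTIME
        dx, dy = xy[i + 1][0] - xy[i][0], xy[i + 1][1] - xy[i][1]
        dt = (t_next - t).total_seconds()
        dist = math.hypot(dx, dy)
        rows.append({
            "time": t, "next_time": t_next, "dx": dx, "dy": dy, "dt": dt,
            "distance": dist,
            "speed": dist / dt if dt > 0 else 0.0,             # m/s
            "course": math.degrees(math.atan2(dx, dy)) % 360,  # clockwise from north
            "acceleration": None, "turn": None,                # filled in below
            "weekday": t.weekday(),                            # 0 = Monday
            "day_of_year": t.timetuple().tm_yday,
            "decade_of_month": min((t.day - 1) // 10 + 1, 3),  # 1, 2 or 3
        })
    for prev, cur in zip(rows, rows[1:]):
        if prev["dt"] > 0:
            cur["acceleration"] = (cur["speed"] - prev["speed"]) / prev["dt"]
        cur["turn"] = (cur["course"] - prev["course"] + 180) % 360 - 180  # in [-180, 180)
    return rows


fixes = [
    (datetime(2007, 3, 22, 8, 0, 0), 50.000, 7.000),
    (datetime(2007, 3, 22, 8, 0, 10), 50.001, 7.000),  # ~111 m north
    (datetime(2007, 3, 22, 8, 0, 20), 50.001, 7.001),  # ~71 m east
]
for row in preprocess(fixes):
    print(round(row["distance"], 1), round(row["course"]), row["turn"])
```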
A new galaxy? (points are plotted according to dX & dY)
• An artefact of the straight-line filtering (threshold = 20 metres) performed by the data collection software
Extraction of stops (DB operation + GUI + Visualisation)
[Figures: stops extracted with duration thresholds of 1 hour, 5 minutes and 30 seconds]
• Stops of these durations differ in meaning!
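Extracting stops with different duration thresholds can be sketched as follows. The slides do not define a stop precisely, so this Python sketch assumes one (consecutive fixes staying within a hypothetical 20 m radius for at least the threshold duration) and assumes positions already projected to metres; it reproduces only the thresholding idea, not the authors' DB query:

```python
import math


def extract_stops(track, min_duration, radius=20.0):
    """Find stops in a time-sorted track of (t_seconds, x_m, y_m) fixes.

    A stop is taken to be a maximal run of fixes staying within `radius` metres of the
    run's first fix and lasting at least `min_duration` seconds (an assumed definition).
    Returns (t_start, t_end, x, y) tuples.
    """
    stops = []
    i, n = 0, len(track)
    while i < n:
        t0, x0, y0 = track[i]
        j = i
        while j + 1 < n and math.hypot(track[j + 1][1] - x0, track[j + 1][2] - y0) <= radius:
            j += 1
        if track[j][0] - t0 >= min_duration:
            stops.append((t0, track[j][0], x0, y0))
            i = j + 1  # continue after the stop
        else:
            i += 1
    return stops


track = [(0, 0, 0), (60, 5, 0), (400, 3, 4), (420, 500, 0), (460, 900, 0), (500, 905, 0)]
for label, seconds in [("1 hour", 3600), ("5 minutes", 300), ("30 seconds", 30)]:
    print(label, extract_stops(track, seconds))
```

Lowering the threshold only adds stops; it never removes the longer ones, which is why the three settings yield nested but differently interpretable results.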
Seeking repetitions and regularity of the stops
• Spatial clustering: find repeated stops
• Visualisation: look for regularity w.r.t. time cycles
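Finding repeated stops means grouping stops that occur at roughly the same place. The slides do not name the clustering method used for stops; as a stand-in, this Python sketch uses simple leader clustering with a hypothetical 100 m radius on projected coordinates, then lists the start hours in each cluster as input for looking at time cycles:

```python
import math
from datetime import datetime


def cluster_stops(stops, radius=100.0):
    """Greedy leader clustering of stops given as (start_time, x_m, y_m).

    Each stop joins the first cluster whose leader (first member) lies within
    `radius` metres; otherwise it starts a new cluster. Returns lists of stops.
    """
    clusters = []  # each entry: (leader_x, leader_y, members)
    for stop in stops:
        _, x, y = stop
        for lx, ly, members in clusters:
            if math.hypot(x - lx, y - ly) <= radius:
                members.append(stop)
                break
        else:
            clusters.append((x, y, [stop]))
    return [members for _, _, members in clusters]


stops = [
    (datetime(2007, 3, 22, 18, 30), 0.0, 0.0),      # evening, place A
    (datetime(2007, 3, 22, 8, 50), 3000.0, 500.0),  # morning, place B
    (datetime(2007, 3, 23, 18, 10), 20.0, -15.0),   # next evening, place A again
]
for cluster in cluster_stops(stops):
    print(len(cluster), sorted(start.hour for start, _, _ in cluster))
```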
Seeking repetitions of moves: progressive clustering
Joint work with S. Rinzivillo, Univ. Pisa:
• generic clustering algorithm OPTICS (Ankerst, Breunig, Kriegel, and Sander 1999)
• the algorithm has been implemented so that cluster building is separated from distance and neighbourhood computation
• several variants of distance measures designed specifically for trajectories
• various ways of handling the times in the data
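Separating cluster building from distance and neighbourhood computation means the neighbourhood query can take any trajectory distance as a parameter. The following Python sketch illustrates that separation with two hypothetical distance functions (common start and end points; mean distance between points resampled along the route). It does not reproduce the OPTICS ordering itself or the measures actually used in the joint work:

```python
import math


def resample(traj, n):
    """Return n points spaced evenly along the path length of traj (list of (x, y), len >= 2)."""
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total, out, j = cum[-1], [], 0
    for k in range(n):
        target = total * k / (n - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1  # advance to the segment containing target
        seg = cum[j + 1] - cum[j]
        w = 0.0 if seg == 0 else (target - cum[j]) / seg
        (x0, y0), (x1, y1) = traj[j], traj[j + 1]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out


def start_end_distance(a, b):
    """Trajectories are close if they start near each other and end near each other."""
    return (math.dist(a[0], b[0]) + math.dist(a[-1], b[-1])) / 2


def route_distance(a, b, n=10):
    """Mean distance between corresponding points after resampling along path length."""
    return sum(math.dist(p, q) for p, q in zip(resample(a, n), resample(b, n))) / n


def neighbours(trajs, i, eps, distance):
    """Neighbourhood query that is independent of the chosen distance function."""
    return [j for j, t in enumerate(trajs) if j != i and distance(trajs[i], t) <= eps]


a = [(0, 0), (100, 0)]
b = [(0, 10), (50, 12), (100, 10)]
c = [(0, 0), (50, 400), (100, 0)]  # same start and end as a, different route
print(neighbours([a, b, c], 0, 15, start_end_distance))  # [1, 2]
print(neighbours([a, b, c], 0, 15, route_distance))      # [1]
```

Swapping the distance function changes which trajectories count as neighbours without touching the neighbourhood query, which is the kind of separation the slide describes.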
Seeking regularity of repetitions w.r.t. time cycles
[Figure panels: hours; days; months]
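Checking regularity with respect to time cycles comes down to aggregating the times of the repetitions by hour of day, day and month. A minimal Python sketch, assuming "days" means days of the week (the slides only label the panels) and using made-up start times for a hypothetical cluster:

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def cycle_profiles(times):
    """Count occurrences per position in the daily, weekly and yearly cycles."""
    return {
        "hour": Counter(t.hour for t in times),
        "day": Counter(WEEKDAYS[t.weekday()] for t in times),
        "month": Counter(t.month for t in times),
    }


# Start times of the trips in one (hypothetical) cluster
starts = [datetime(2007, 3, 22, 8, 52), datetime(2007, 3, 23, 8, 40),
          datetime(2007, 4, 1, 17, 5)]
profiles = cycle_profiles(starts)
print(profiles["hour"][8], profiles["day"]["Thu"], profiles["month"][3])  # 2 1 2
```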
Result 2: refinement of the two selected clusters
[Figure panels: hours; days; months]
What’s next?
• Enhance visualisation!
• Extraction of episodes on the basis of speeds, directions, accelerations and turns
• Spatial aggregation
• Temporal aggregation
• Spatio-temporal aggregation
• Attribute-based aggregation
• We are seeking a killer application
• Data are available… despite the privacy concerns