170 likes | 195 Vues
Trajectory Pattern Mining. Fosca Giannotti. Dino Pedreschi. Mirco Nanni. Fabio Pinelli. Chris Andrews. Georgia Institute of Technology B.S. Computer Science 5 th Year Undergraduate. Concepts. Analyze trajectory of moving objects A 3mins B 5mins C 10mins D
E N D
Trajectory Pattern Mining Fosca Giannotti Dino Pedreschi Mirco Nanni Fabio Pinelli Chris Andrews Georgia Institute of Technology B.S. Computer Science 5th Year Undergraduate
Concepts • Analyze trajectory of moving objects A 3mins B 5mins C 10mins D • Trajectory Patterns – description of frequent behavior relating to space and time • Frequent Sequence Pattern (FSP) • Determine if trajectory sequence matches any trajectory patterns in a given set • Study different methods of preparing a Temporally Annotated Sequence (TAS) for data mining
Trajectory Patterns (T-Patterns) • Trajectory Pattern • sequence of time-stamped locations • S = { ( x0, y0, t0 ) , … , ( xn, yn, tn ) } • Temporal Annotation • set of times relating to trajectories • A = { a1 , a2, … an } • Temporally Annotated Sequence • (S,A) = (x0,y0) a1 (x1,y1) a2 … an (xn,yn)
Neighborhood Function • Neighborhood Function N : R2 -> P (R2) • Calculates spatial containment of regions • Input point to find enclosing Region of Interest • Defines the necessary proximity to fall into a region • Parameters: • e – radius or necessary proximity of points
Regions of Interest (RoI) • Performing these comparisons on points is costly • A simple preprocessing step can alleviate this • Utilize the Neighborhood Function NR() • Translate each set of points into regions • Timestamp is selected from when the trajectory first entered the region • Now compare sequence of regions and timestamps using the TAS mining algorithm presented in [2].
Static RoI • Neighborhood Function NR() • Initially receives set of R disjoint spatial regions • R regions are predefined based on prior knowledge • Each represents relevant place for processing • Static NR() simplifies problem of mining patterns • Sequence of points become grouped • Result: sequence of regions • (x,y) a1 (x’,y’) becomes X a1 Y
Dynamic RoI • Data sets often do not possess predetermined regions • Instead need to formulate regions based on criteria of density of the trajectories • Preprocessing now must determine set R of popular regions from the data set • R is now the set of Region of Interests from used by the Neighborhood Function NR() to translate points into Regions of Interest
Popular Regions Grid G of n x m cells Density Threshold d Each cell with density G(i,j) Set R of popular regions • Each region in R forms rectangular region • Sets in R are pair wise distinct • Dense cells always contained in some region in R • All regions in R have average density above d • All regions in R cannot expand without their average density decreasing below d
Grid Density Preparation • Split space into n x m grid with small cells • Increment cells where trajectory passes • Neighborhood Function NR() determines which surrounding cells • Regression - increment continuously along trajectory
Popular Regions Algorithm • Algorithm: PopularRegions( G, d ) • Complexity: O ( |G| log |G| ) • Iteratively consider each dense cell • For each: • Expands in all four directions • Select expansion that maximizes density • Repeat until expansion would decrease below density threshold
Evaluating the T-Patterns • Compute density of each cell of grid • Compute set of RoI’s by determining Popular Regions • Translate the input trajectories into sequence of RoI’s and timestamps for the transitions • Input the trajectories and times into TAS mining algorithm[2]
Experiments • GPS Data • Fleet of 273 trucks in Athens, Greece • 112,203 total points recorded • Running both static & dynamic pattern algorithms • Various parameter settings • Performance Analysis • Synthetic Data by CENTRE synthesizer • 50% random & 50% predetermined
Pattern Mining Results Static found: A t1 B t2 B Dynamic found: A t1 B’ t2 B’’
Execution Time Results • Increase linearly with increasing number of input trajectories (both algorithms) • Grow when density threshold decreases • Static performs better with extreme threshold • Static does not perform with middle threshold
Additional Results • Increasing radius of spatial neighborhood obtains irregular performance and large values lead to poor execution times • Changing time tolerance (t) obtains results similar to TAS’s • Increasing the number of points in each trajectory causes linear growth of execution times
Works Cited • [1] Trajectory pattern mining, Fosca Giannotti, Mirco Nanni, Fabio Pinelli, Dino Pedreschi, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining KDD. ACM, 2007. • [2] Efficient Mining of Sequences with Temporal Annotations. F. Giannotti, M. Nanni, and D. Pedreschi. In Proc. SIAM Conference on Data Mining, pages 346–357. SIAM, 2006.