Trajectory Splitting Models for Efficient Spatiotemporal Indexing Slobodan Rasetic Jörg Sander James Elding Mario A. Nascimento Department of Computing Science University of Alberta Canada
Overview • Introduction • Optimal Trajectory Splitting • Heuristic Trajectory Splitting • Experiments • Conclusion VLDB 2005
Introduction • Trajectories (e.g., of moving objects) consist of a sequence of observations in space and time [Figures: Moving object trajectory; Moving object trajectories [Pfo+00]]
Introduction • Several query types can be posed over trajectory data • Range queries answer questions about object positions • Topological queries answer questions about topological relationships (e.g., whether a trajectory enters, leaves, or crosses a region) [Figure: (a) Topological and (b) combined queries [Pfo+00]] • To process such queries efficiently, we need spatio-temporal index structures
Introduction • Most spatio-temporal index structures are based on R-trees • R-trees support a wide range of query types for spatial data • Trajectories can be viewed as 3-dimensional “spatial” objects (two spatial dimensions plus time) and can be approximated by a Minimum Bounding Rectangle (MBR) [Figures: Approximation of spatial data; A resulting R-tree structure [Gut84]]
Introduction • Two baseline options for approximating trajectories with MBRs: • “No Split”: the whole trajectory is approximated by one 3d MBR • In general, poor approximation quality but a small index • “Full Split”: each individual “elementary segment” of a trajectory (the line segment between two consecutive observations) is approximated by its own 3d MBR • Better approximation quality due to less “dead space”, but the index can become large; on the other hand, the complete trajectory information can then be stored inside the directory (figure from [Pfo+00])
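To make the two baselines concrete, here is a minimal Python sketch, assuming a trajectory is a list of (x, y, t) observations; the helper names `mbr_volume`, `no_split_volume` and `full_split_volume` are hypothetical. It compares the total MBR volume (a rough proxy for dead space) and the number of index entries of both options.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # one observation (x, y, t)


def mbr_volume(points: Sequence[Point]) -> float:
    """Volume of the 3d minimum bounding rectangle enclosing the given points."""
    xs, ys, ts = zip(*points)
    return (max(xs) - min(xs)) * (max(ys) - min(ys)) * (max(ts) - min(ts))


def no_split_volume(traj: Sequence[Point]) -> float:
    """'No Split': a single MBR (one index entry) for the whole trajectory."""
    return mbr_volume(traj)


def full_split_volume(traj: Sequence[Point]) -> float:
    """'Full Split': one MBR per elementary segment, i.e. len(traj) - 1 index entries."""
    return sum(mbr_volume(traj[i:i + 2]) for i in range(len(traj) - 1))


traj = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 2.0), (3.0, 1.0, 3.0)]
print("No Split:  ", no_split_volume(traj), "volume, 1 entry")
print("Full Split:", full_split_volume(traj), "volume,", len(traj) - 1, "entries")
```

For this small zigzag trajectory, No Split gives a volume of 9.0 with one entry, while Full Split gives 3.0 with three entries: less dead space, but a larger index.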
Introduction • Hadjieleftheriou et al. [Had+02] propose a solution that balances between “No Split” and “Full Split” by splitting trajectories only partially • Given a total number of allowed splits for a set of trajectories, the splits are distributed among the trajectories so that the total volume of the resulting MBRs is minimized [Figure: A 2D example of a trajectory approximated by one and by three MBRs [Had+02]]
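The idea of distributing a fixed split budget can be illustrated with a simple greedy procedure: repeatedly apply the single cut, over all trajectories and all current pieces, that reduces the total MBR volume the most. This is a hedged sketch in the spirit of [Had+02], not necessarily their exact algorithm; all names are hypothetical, and a real implementation would keep candidate cuts in a priority queue instead of rescanning every round.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one observation (x, y, t)


def mbr_volume(points: Sequence[Point]) -> float:
    """Volume of the 3d MBR enclosing the given points."""
    xs, ys, ts = zip(*points)
    return (max(xs) - min(xs)) * (max(ys) - min(ys)) * (max(ts) - min(ts))


def best_cut(traj: Sequence[Point], lo: int, hi: int) -> Tuple[float, int]:
    """Best single cut of the piece traj[lo..hi]: (volume reduction, cut point index)."""
    whole = mbr_volume(traj[lo:hi + 1])
    best = (0.0, -1)
    for k in range(lo + 1, hi):  # the cut point belongs to both resulting pieces
        gain = whole - mbr_volume(traj[lo:k + 1]) - mbr_volume(traj[k:hi + 1])
        if gain > best[0]:
            best = (gain, k)
    return best


def distribute_splits(trajs: List[Sequence[Point]], total_splits: int) -> List[List[int]]:
    """Greedily spend total_splits cuts where they reduce the total MBR volume most.

    Returns, per trajectory, the sorted point indices that delimit its pieces.
    """
    cuts = [[0, len(t) - 1] for t in trajs]
    for _ in range(total_splits):
        best = (0.0, -1, -1)  # (volume reduction, trajectory index, cut point index)
        for ti, t in enumerate(trajs):
            for lo, hi in zip(cuts[ti], cuts[ti][1:]):
                gain, k = best_cut(t, lo, hi)
                if gain > best[0]:
                    best = (gain, ti, k)
        if best[1] < 0:
            break  # no remaining cut reduces the total volume
        cuts[best[1]].append(best[2])
        cuts[best[1]].sort()
    return cuts


zigzag = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 2.0), (3.0, 1.0, 3.0)]
line = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
print(distribute_splits([zigzag, line], 2))
```

With a budget of two splits, the first goes to the straight line (volume reduction 6.0 versus 4.0) and the second to the zigzag, giving [[0, 1, 3], [0, 1, 2]].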
Introduction • Problems: • How many splits should be used? • Is there a cost model that can help us split trajectories “optimally” with respect to the expected number of I/Os (and not just with respect to volume reduction)? • Can we find a split strategy that is more efficient, and even applicable in an “online” fashion?
Introduction • Minimizing the total volume of trajectory approximations does not necessarily lead to the smallest number of disk I/Os when processing range queries • A cost model must therefore take the distribution of query sizes into account
Optimal Trajectory Splitting [Figures: A query-extended MBR; Extended MBRs using 1, 2, and 3 segments] • We model the query size distribution by the mean query size in each dimension, and assume that the centers of the queries are uniformly distributed in a finite 3d space S • Probability that a range query q intersects an MBR B_i
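A minimal Python sketch of this probability, assuming the standard query-extended MBR argument: q intersects B_i exactly when the center of q falls inside B_i enlarged by half the query extent on each side, so with uniformly distributed centers P ≈ Vol(B_i extended by q) / Vol(S). Boundary effects are ignored and each per-dimension factor is capped at 1; the function and parameter names are hypothetical.

```python
from typing import Sequence


def intersect_probability(box_extent: Sequence[float],
                          query_extent: Sequence[float],
                          space_extent: Sequence[float]) -> float:
    """P(a query of the given mean extent intersects an MBR), query centers uniform in S.

    Per dimension, the query's center must fall inside the MBR enlarged by half the
    query extent on each side, i.e. an interval of length b + q. Boundary effects
    are ignored and each factor is capped at 1.
    """
    p = 1.0
    for b, q, s in zip(box_extent, query_extent, space_extent):
        p *= min(b + q, s) / s
    return p


# MBR of extent 10 x 10 x 10 in a 100 x 100 x 100 space, mean query extent 10 per dimension.
print(intersect_probability((10, 10, 10), (10, 10, 10), (100, 100, 100)))
```

The example prints roughly 0.008 (that is, 0.2 cubed). Note that the query extent enters every MBR, which is what penalizes approximations with many small MBRs.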
Optimal Trajectory Splitting • Let B_T = (B_1, …, B_m) denote a specific MBR approximation of T with m segments • Expected number of data page I/Os for a trajectory T split into m segments • The model ignores directory-level I/Os!
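A sketch of the expected number of data-page I/Os for one split trajectory, assuming (as a simplification) that each MBR intersected by the query costs one data-page access, so the expectation is the sum of the intersection probabilities of B_1, …, B_m; directory I/Os are ignored, as stated on the slide. The names `extended_volume`, `expected_ios` and the `cuts` representation are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # one observation (x, y, t)


def extended_volume(points: Sequence[Point], q: Sequence[float]) -> float:
    """Volume of the MBR of the points, enlarged by the query extent q in every dimension."""
    vol = 1.0
    for coords, qd in zip(zip(*points), q):
        vol *= (max(coords) - min(coords)) + qd
    return vol


def expected_ios(traj: Sequence[Point], cuts: Sequence[int],
                 q: Sequence[float], space_volume: float) -> float:
    """Expected data-page I/Os for traj cut at the sorted point indices in cuts.

    Piece i spans traj[cuts[i] .. cuts[i+1]]; each intersected MBR is assumed to
    cost one data-page access, and directory-level I/Os are ignored.
    """
    return sum(extended_volume(traj[a:b + 1], q)
               for a, b in zip(cuts, cuts[1:])) / space_volume


traj = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 2.0), (3.0, 1.0, 3.0)]
for q in ((1.0, 1.0, 1.0), (4.0, 4.0, 4.0)):
    print(q, expected_ios(traj, [0, 3], q, 1000.0), expected_ios(traj, [0, 1, 2, 3], q, 1000.0))
```

For small queries (extent 1) Full Split is cheaper (0.024 versus 0.032), while for larger queries (extent 4) No Split wins (0.245 versus 0.375), which illustrates why the split decision depends on the query size.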
Optimal Trajectory Splitting • Trajectory T can be split into m segments in different ways • Best decomposition of T into m segments
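The best decomposition into exactly m segments can be found with a standard dynamic program over prefixes of T. This sketch assumes the cost of a piece is its query-extended MBR volume (the constant 1/Vol(S) factor is dropped because it does not change which split is best) and that consecutive pieces share their boundary point; `best_m_split` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one observation (x, y, t)


def extended_volume(points: Sequence[Point], q: Sequence[float]) -> float:
    """Volume of the MBR of the points, enlarged by the query extent q in every dimension."""
    vol = 1.0
    for coords, qd in zip(zip(*points), q):
        vol *= (max(coords) - min(coords)) + qd
    return vol


def best_m_split(traj: Sequence[Point], m: int, q: Sequence[float]) -> Tuple[float, List[int]]:
    """Minimum total extended volume over all splits of traj into exactly m pieces."""
    n = len(traj) - 1  # number of elementary segments; points are indexed 0..n
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and the number of elementary segments")
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(a + 1, n + 1):
            cost[a][b] = extended_volume(traj[a:b + 1], q)
    # opt[k][j]: best cost of covering points 0..j with exactly k pieces.
    opt = [[math.inf] * (n + 1) for _ in range(m + 1)]
    back = [[-1] * (n + 1) for _ in range(m + 1)]
    opt[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the last piece spans points i..j
                c = opt[k - 1][i] + cost[i][j]
                if c < opt[k][j]:
                    opt[k][j], back[k][j] = c, i
    cuts = [n]
    for k in range(m, 0, -1):  # walk the back pointers to recover the cut points
        cuts.append(back[k][cuts[-1]])
    return opt[m][n], cuts[::-1]


traj = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 2.0), (3.0, 1.0, 3.0)]
print(best_m_split(traj, 2, (1.0, 1.0, 1.0)))
```

For this trajectory and unit query extents, the best two-piece split cuts at point 1 with a total extended volume of 26.0.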
Optimal Trajectory Splitting • Trajectory T can be split into different numbers of segments • Best overall decomposition of T: • can be found using dynamic programming based on:
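If the number of segments is left free, the dynamic program becomes one-dimensional: best[j] = min over i < j of best[i] + cost(T[i..j]). The sketch below assumes the query-extended MBR volume as the cost of a piece and is hypothetical in its names; as written it runs in roughly cubic time in the number of points, since each candidate piece recomputes its MBR.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one observation (x, y, t)


def extended_volume(points: Sequence[Point], q: Sequence[float]) -> float:
    """Volume of the MBR of the points, enlarged by the query extent q in every dimension."""
    vol = 1.0
    for coords, qd in zip(zip(*points), q):
        vol *= (max(coords) - min(coords)) + qd
    return vol


def best_overall_split(traj: Sequence[Point], q: Sequence[float]) -> Tuple[float, List[int]]:
    """Minimum total extended volume over all decompositions of traj, any number of pieces."""
    n = len(traj) - 1
    best = [math.inf] * (n + 1)  # best[j]: cheapest decomposition of points 0..j
    back = [-1] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):  # the last piece spans points i..j
            c = best[i] + extended_volume(traj[i:j + 1], q)
            if c < best[j]:
                best[j], back[j] = c, i
    cuts = [n]
    while cuts[-1] != 0:  # follow back pointers to the start of T
        cuts.append(back[cuts[-1]])
    return best[n], cuts[::-1]


traj = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 2.0), (3.0, 1.0, 3.0)]
print(best_overall_split(traj, (1.0, 1.0, 1.0)))
print(best_overall_split(traj, (4.0, 4.0, 4.0)))
```

With unit query extents every elementary segment gets its own MBR (cost 24.0), whereas with query extent 4 the optimum keeps the trajectory in a single MBR (cost 245.0).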
Heuristic Trajectory Splitting • Based on the intuition that a trajectory can be approximated by “constant-slope” trajectory segments • The extended MBR volume of a “constant-slope” trajectory can then be determined analytically
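Under the constant-slope assumption, a piece made of c elementary segments, each with absolute per-segment extents (d_x, d_y, d_t), has MBR extents (c·d_x, c·d_y, c·d_t), so its extended volume is (c·d_x + q_x)(c·d_y + q_y)(c·d_t + q_t). The sketch below (hypothetical names) uses this to compare, for a trajectory of n segments cut into n/c pieces, the total raw MBR volume with the total extended volume; n/c is treated as a real number, as in an analytic model.

```python
from typing import Sequence, Tuple


def piece_extended_volume(c: float, d: Sequence[float], q: Sequence[float]) -> float:
    """Extended MBR volume of a constant-slope piece made of c elementary segments."""
    vol = 1.0
    for di, qi in zip(d, q):
        vol *= c * di + qi
    return vol


def total_costs(n: int, c: float, d: Sequence[float], q: Sequence[float]) -> Tuple[float, float]:
    """(total raw MBR volume, total extended volume) when n segments form n/c pieces."""
    pieces = n / c
    raw = pieces * piece_extended_volume(c, d, (0.0, 0.0, 0.0))
    extended = pieces * piece_extended_volume(c, d, q)
    return raw, extended


for c in (1, 2, 5, 10):
    print(c, total_costs(100, c, (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)))
```

The raw volume grows steadily with c (100, 400, 2500, 10000), so pure volume minimization always favors more splits, while the extended volume (21600, 17150, 20000, 33750) is smallest at an intermediate c.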
Heuristic Trajectory Splitting • We introduce the function g(c): the inverse density of elementary trajectory segments in an MBR that covers c elementary segments • g(c) has a global minimum over the positive real numbers, and for a “constant-slope” trajectory T with t-1 elementary segments it holds that:
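Assuming g(c) is the extended volume of a c-segment constant-slope piece divided by c (i.e., extended volume per elementary segment), setting g'(c) = 0 gives 2·d_x·d_y·d_t·c^3 + (d_x·d_y·q_t + d_x·d_t·q_y + d_y·d_t·q_x)·c^2 - q_x·q_y·q_t = 0, which has exactly one positive root when all query extents are positive and the first two coefficients are not both zero. This is our own derivation under that assumption, not necessarily the paper's Theorem 2; the hypothetical `c_opt` finds the root by bisection.

```python
import math
from typing import Sequence


def g(c: float, d: Sequence[float], q: Sequence[float]) -> float:
    """Extended MBR volume per elementary segment for a constant-slope piece of c segments."""
    vol = 1.0
    for di, qi in zip(d, q):
        vol *= c * di + qi
    return vol / c


def c_opt(d: Sequence[float], q: Sequence[float], tol: float = 1e-9) -> float:
    """Real-valued minimizer of g(c) for c > 0, via the cubic g'(c) * c^2 = 0."""
    a, b, e = d  # per-segment extents in x, y, t
    p, r, s = q  # mean query extents in x, y, t
    k3 = 2 * a * b * e
    k2 = a * b * s + a * e * r + b * e * p
    k0 = p * r * s
    if k0 == 0:
        return 0.0  # a zero query extent makes g non-decreasing: split fully
    if k3 == 0 and k2 == 0:
        return math.inf  # g decreases for all c > 0: never split

    def h(c: float) -> float:
        return k3 * c ** 3 + k2 * c ** 2 - k0  # same sign as g'(c)

    lo, hi = 0.0, 1.0
    while h(hi) < 0:  # grow the bracket until it contains the root
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(c_opt((1.0, 1.0, 1.0), (5.0, 5.0, 5.0)))
```

For unit slopes and query extent 5 per dimension the minimizer is c = 2.5, where g(2.5) = 168.75, slightly below g(2) = 171.5 and g(3) ≈ 170.67.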
Heuristic Trajectory Splitting • Algorithm LinearSplit
    u := 1; v := 2;  // after the first two points of T
    while (there is a next point p_{v+1} in trajectory T)
        if find c_opt for T[u,v] using Theorem 2;
        c' := round(c_opt);
        extract the first (v-u)/c' segments from T[u,v], and insert them into the index;
        u := u + k*c;
        v++;
    insert the last MBR(T[u,v]) into the index;  // end of T is reached
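The following is an online sketch in the spirit of LinearSplit, not the authors' exact procedure. Under assumptions of our own, it estimates the per-segment slope of the current window T[u..v] from the window's MBR extents divided by its number of segments, picks the integer c that minimizes g(c) directly (a stand-in for rounding c_opt from Theorem 2), and, as an assumed flush rule, emits k = floor((v-u)/c) - 1 pieces of c segments while keeping at least c segments open. All names and the flush rule are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # one observation (x, y, t)


def g(c: int, d: Sequence[float], q: Sequence[float]) -> float:
    """Extended MBR volume per elementary segment for a constant-slope piece of c segments."""
    vol = 1.0
    for di, qi in zip(d, q):
        vol *= c * di + qi
    return vol / c


def linear_split(traj: Sequence[Point], q: Sequence[float]) -> List[Tuple[int, int]]:
    """Online split of traj into pieces (start, end) of point indices; pieces share endpoints."""
    pieces: List[Tuple[int, int]] = []
    u = 0
    for v in range(2, len(traj)):  # v is the newest point seen so far
        segs = v - u
        window = traj[u:v + 1]
        # Assumed slope estimate: window MBR extents divided by its number of segments.
        d = [(max(col) - min(col)) / segs for col in zip(*window)]
        c = min(range(1, segs + 1), key=lambda n: g(n, d, q))  # stand-in for round(c_opt)
        k = segs // c - 1  # assumed flush rule: keep at least c segments open
        for _ in range(k):
            pieces.append((u, u + c))
            u += c
    pieces.append((u, len(traj) - 1))  # end of T reached: flush the remainder
    return pieces


line = [(float(i), float(i), float(i)) for i in range(11)]
print(linear_split(line, (5.0, 5.0, 5.0)))
```

On a straight line of 10 unit segments with query extent 5, the best integer piece length is 3, so the sketch emits (0, 3) and (3, 6) while streaming and flushes (6, 10) at the end. Each point is processed once, but the window scan makes the per-point work grow with the window size.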
Experiments • Experiments performed on an AMD Athlon 1900+ PC with 512 MB RAM • XXL library used for the R-tree implementation (4 KB page size for all trees) • Data generated with: • Network Data Generator • GSTD • Datasets containing 10,000, 20,000, and 50,000 trajectories • Average trajectory length: ~100 • Reported performance values are averages over 10,000 uniformly distributed queries
Experiments • Test of the cost model • Q_{i,j} denotes the query with spatial extent i (%) and duration j • I_{i,j} denotes the tree optimized for spatial extent i (%) and duration j [Figures: Robustness of Optimal Split for Network data; Robustness of Optimal Split for GSTD data]
Experiments • The approaches compared in the following experiments: • NoSplit: each trajectory is approximated by a single MBR • OptimalSplit: our optimal trajectory splitting algorithm • LinearSplit: our linear trajectory splitting algorithm • FullSplit: each trajectory line segment is approximated by its own MBR • HKTG-k%: the approach of [Had+02], using a total of N·k/100 splits for N trajectories • The query types used in the following experiments [Figures: Snapshot query types; Range query types]
Experiments • I/O performance when varying query size (50K objects)
Experiments • I/O performance when varying database size (RM query)
Experiments • Index building time when varying query size (50K objects)
Experiments • Summary
Conclusion • We develop a cost model for splitting trajectories based on the average query size • Based on this cost model, we develop an optimal split algorithm • We also develop a linear heuristic based on a constant-slope approximation of trajectory segments • In our experimental evaluation, our approaches consistently outperform the other indexing approaches tested
References
[Gut84] Guttman, A.: R-trees: A Dynamic Index Structure for Spatial Searching. In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 47–57, 1984.
[Pfo+00] Pfoser, D., Jensen, C. S., Theodoridis, Y.: Novel Approaches in Query Processing for Moving Object Trajectories. In Proc. of the 26th VLDB Conf. (Cairo, Egypt, September 2000), pp. 395–406.
[Had+02] Hadjieleftheriou, M., Kollios, G., Tsotras, V., Gunopulos, D.: Efficient Indexing of Spatiotemporal Objects. In Proc. of the Intl. Conf. on Extending Database Technology (EDBT), pp. 251–268, 2002.
Thank you for your attention!