Trajectory Splitting Models for Efficient Spatiotemporal Indexing

Slobodan Rasetic, Jörg Sander, James Elding, Mario A. Nascimento
Department of Computing Science, University of Alberta, Canada

Overview
• Introduction
• Optimal Trajectory Splitting
• Heuristic Trajectory Splitting
• Experiments
• Conclusion
VLDB 2005

Introduction
• Trajectories (e.g., of moving objects) consist of a sequence of observations in space and time
Figures: Moving object trajectory [Pfo+00]; Moving object trajectories [Pfo+00]

Introduction
• Several query types can be posed over trajectory data
  • Range queries answer questions about object positions
  • Topological queries answer questions about the topology of objects
Figure: (a) Topological and (b) combined queries [Pfo+00]
• To process such queries efficiently, we need spatiotemporal index structures

Introduction
• Most spatiotemporal index structures are based on R-trees
• R-trees support a wide range of query types for spatial data
• Trajectories can be viewed as 3-dimensional (x, y, t) “spatial” objects and can be approximated using a Minimum Bounding Rectangle (MBR)
Figures: Approximation of spatial data [Gut84]; A resulting R-tree structure [Gut84]

Introduction
• Two baseline options for approximating trajectories with MBRs
• “No Split”: the whole trajectory is approximated by one 3D MBR
  • In general, poor approximation quality but a small index
• “Full Split”: each individual “elementary segment” of a trajectory is approximated by its own 3D MBR
  • Better approximation quality because “dead space” is reduced, but the index can be large (although the complete trajectory information can then be stored inside the directory)
Figure source: [Pfo+00]
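To make the two baselines concrete, here is a minimal Python sketch (an illustration, not code from the talk) that treats a trajectory as a list of (x, y, t) points and compares the total MBR volume of “No Split” and “Full Split”. The helper names `mbr`, `volume`, `no_split` and `full_split` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t)
Box = Tuple[Point, Point]           # (low corner, high corner)


def mbr(points: List[Point]) -> Box:
    """Minimum bounding rectangle of a sequence of 3D points."""
    low = tuple(min(p[d] for p in points) for d in range(3))
    high = tuple(max(p[d] for p in points) for d in range(3))
    return low, high


def volume(box: Box) -> float:
    low, high = box
    v = 1.0
    for d in range(3):
        v *= high[d] - low[d]
    return v


def no_split(traj: List[Point]) -> List[Box]:
    # One MBR for the whole trajectory: small index, but lots of dead space.
    return [mbr(traj)]


def full_split(traj: List[Point]) -> List[Box]:
    # One MBR per elementary segment (pair of consecutive observations).
    return [mbr(traj[i:i + 2]) for i in range(len(traj) - 1)]


# A zig-zagging trajectory with three elementary segments.
traj = [(0, 0, 0), (1, 1, 1), (2, 0, 2), (3, 1, 3)]
print(sum(volume(b) for b in no_split(traj)))    # 9.0
print(sum(volume(b) for b in full_split(traj)))  # 3.0
```

Full Split shrinks the covered volume by a factor of three here, at the price of three index entries instead of one.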

Introduction
• Hadjieleftheriou et al. [Had+02] propose a solution that strikes a balance between “No Split” and “Full Split” by splitting trajectories only partially
• Given a total number of allowed splits for a set of trajectories, the splits are distributed among the trajectories so that the total volume of the resulting MBRs is minimized
Figure: A 2D example of a trajectory approximated by one and by three MBRs [Had+02]
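The idea of distributing a fixed split budget can be sketched as follows. This is a simple greedy allocation written for illustration, assuming each trajectory's best volume for a given number of MBRs is computed by dynamic programming; it is not necessarily the allocation procedure used in [Had+02], and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def mbr_volume(points: List[Point]) -> float:
    v = 1.0
    for d in range(3):
        v *= max(p[d] for p in points) - min(p[d] for p in points)
    return v


def best_volume(traj: List[Point], pieces: int) -> float:
    """Smallest total MBR volume when traj is cut into `pieces` consecutive parts.

    Consecutive parts share their boundary point; dp[j][v] is the best total
    volume for covering points 0..v with j MBRs.
    """
    n = len(traj)
    inf = float("inf")
    dp = [[inf] * n for _ in range(pieces + 1)]
    dp[0][0] = 0.0
    for j in range(1, pieces + 1):
        for v in range(1, n):
            for u in range(v):
                if dp[j - 1][u] < inf:
                    cand = dp[j - 1][u] + mbr_volume(traj[u:v + 1])
                    dp[j][v] = min(dp[j][v], cand)
    return dp[pieces][n - 1]


def allocate_splits(trajs: List[List[Point]], total_splits: int) -> List[int]:
    """Greedily give each split to the trajectory whose volume drops the most."""
    splits = [0] * len(trajs)
    current = [best_volume(t, 1) for t in trajs]
    for _ in range(total_splits):
        best_i, best_new = None, None
        for i, t in enumerate(trajs):
            if splits[i] + 2 > len(t) - 1:  # no elementary segment left to cut
                continue
            new = best_volume(t, splits[i] + 2)
            if best_i is None or current[i] - new > current[best_i] - best_new:
                best_i, best_new = i, new
        if best_i is None:
            break
        splits[best_i] += 1
        current[best_i] = best_new
    return splits


zig = [(0, 0, 0), (1, 1, 1), (2, 0, 2), (3, 1, 3)]
diag = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
print(allocate_splits([zig, diag], 2))
```

A greedy choice is only guaranteed to be optimal when each additional split yields a smaller gain than the previous one; it is used here because it is short and easy to follow.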

Introduction
• Problems:
  • How many splits should be used?
  • Is there a cost model that can help us “optimally” split trajectories w.r.t. the expected number of I/Os (and not just with respect to volume reduction)?
  • Can we find a split strategy that is more efficient, and even applicable in an “online” fashion?

Introduction
• Minimizing the total volume of trajectory approximations does not necessarily lead to the smallest number of disk I/Os when processing range queries
• A cost model must take the distribution of query sizes into account
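A small worked example shows why volume alone can mislead. The sketch below is an illustration under assumed numbers: a straight trajectory of 10 unit steps in x, y and t, covered by MBRs of c segments each, and a hypothetical mean query extent of 10 in every dimension. It compares plain MBR volume with the volume of each MBR enlarged by the query extents (the quantity that governs how often a query hits it).

```python
from math import prod

# A straight "constant-slope" trajectory: 10 elementary segments, each moving
# one unit in x, y and t. It is covered by MBRs of c segments each (c divides 10).
N_SEGMENTS = 10
STEP = (1.0, 1.0, 1.0)
MEAN_QUERY = (10.0, 10.0, 10.0)  # hypothetical mean query extents (x, y, t)


def totals(c: int):
    """Return (total MBR volume, total query-extended volume) for pieces of c segments."""
    extents = [c * s for s in STEP]
    n_boxes = N_SEGMENTS // c
    vol = n_boxes * prod(extents)
    ext_vol = n_boxes * prod(e + q for e, q in zip(extents, MEAN_QUERY))
    return vol, ext_vol


for c in (1, 5, 10):
    print(c, totals(c))
# 1 (10.0, 13310.0)
# 5 (250.0, 6750.0)
# 10 (1000.0, 8000.0)
```

Full splitting (c = 1) has by far the smallest volume, yet the largest query-extended volume: every extra MBR pays for the query extents once more.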

Optimal Trajectory Splitting
• We model the query size distribution by the mean query size in each dimension, and assume that query centers are uniformly distributed in a finite 3D space S
• Probability that a range query q intersects an MBR B_i
Figures: A query-extended MBR; Extended MBRs using 1, 2, and 3 segments
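The slide's formula is not preserved in this transcript. A common form of this model, assumed here rather than taken from the slide, is P(q ∩ B_i) = ∏_{d ∈ {x,y,t}} min(1, (|B_i|_d + q̄_d) / |S|_d): a query hits the MBR exactly when its center falls inside the MBR enlarged by the mean query extents q̄_d. The sketch below evaluates that expression; the function name is hypothetical.

```python
from math import prod
from typing import Sequence


def intersect_probability(box_extents: Sequence[float],
                          mean_query: Sequence[float],
                          space: Sequence[float]) -> float:
    """Probability that a query of the given mean extents hits an MBR.

    Assumes query centers are uniform in the space S and ignores boundary
    effects: the query hits the MBR iff its center falls inside the MBR
    enlarged by the query extents (the "query-extended MBR").
    """
    return prod(min(1.0, (b + q) / s)
                for b, q, s in zip(box_extents, mean_query, space))


# A 2 x 2 x 2 MBR, mean query extents 8 in each dimension, space 100^3.
print(intersect_probability((2, 2, 2), (8, 8, 8), (100, 100, 100)))  # ~0.001
```

The `min(1, …)` clip only matters for MBRs or queries that are nearly as large as the space itself.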

Optimal Trajectory Splitting
• Let B_T = (B_1, …, B_m) denote a specific MBR approximation of T with m segments
• Expected number of data page I/Os for a trajectory T split into m segments
• The model ignores directory-level I/Os!
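One way to read the expected-cost bullet is as a sum of per-MBR hit probabilities over B_1, …, B_m. The sketch below assumes exactly that, i.e. that each MBR a query hits costs one data-page access, which is a simplification and not necessarily the slide's exact formula. The trajectory, query extents and space size are made-up example values, and `expected_ios` is a hypothetical name.

```python
from math import prod
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def expected_ios(traj: List[Point], cuts: Sequence[int],
                 mean_query: Sequence[float], space: Sequence[float]) -> float:
    """Expected data-page I/Os for traj split at the given interior point indices.

    Sums, over the MBRs B_1..B_m, the probability that a uniformly placed query
    hits B_i, assuming (as a simplification) one data-page access per hit MBR.
    """
    bounds = [0, *sorted(cuts), len(traj) - 1]
    total = 0.0
    for u, v in zip(bounds, bounds[1:]):
        piece = traj[u:v + 1]  # consecutive pieces share the cut point
        extents = [max(p[d] for p in piece) - min(p[d] for p in piece)
                   for d in range(3)]
        total += prod(min(1.0, (e + q) / s)
                      for e, q, s in zip(extents, mean_query, space))
    return total


diag = [(float(i), float(i), float(i)) for i in range(11)]  # 10 segments
q, S = (10.0, 10.0, 10.0), (100.0, 100.0, 100.0)
print(expected_ios(diag, [], q, S))   # one MBR  (m = 1)
print(expected_ios(diag, [5], q, S))  # two MBRs (m = 2)
```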

Optimal Trajectory Splitting
• Trajectory T can be split into m segments in different ways
• Best decomposition of T into m segments
Figure (axes: t, x)
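Finding the best decomposition into exactly m segments is a classic segmentation problem. The following is a minimal dynamic-programming sketch, assuming the per-MBR cost is the uniform-query hit probability of the query-extended MBR; `piece_cost` and `best_m_split` are hypothetical names, and this is not claimed to be the paper's implementation.

```python
from math import prod
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def piece_cost(traj: List[Point], u: int, v: int,
               mean_query: Sequence[float], space: Sequence[float]) -> float:
    """Hit probability of MBR(T[u..v]) under the uniform-query model (assumed)."""
    piece = traj[u:v + 1]
    extents = [max(p[d] for p in piece) - min(p[d] for p in piece) for d in range(3)]
    return prod(min(1.0, (e + q) / s) for e, q, s in zip(extents, mean_query, space))


def best_m_split(traj: List[Point], m: int, mean_query: Sequence[float],
                 space: Sequence[float]) -> Tuple[float, List[int]]:
    """Cheapest decomposition of traj into exactly m MBRs (O(m * n^2) cost evaluations)."""
    n = len(traj)
    if not 1 <= m <= n - 1:
        raise ValueError("m must be between 1 and the number of segments")
    inf = float("inf")
    cost = [[inf] * n for _ in range(m + 1)]  # cost[j][v]: T[0..v] covered by j MBRs
    back = [[-1] * n for _ in range(m + 1)]   # start point of the last of those j MBRs
    cost[0][0] = 0.0
    for j in range(1, m + 1):
        for v in range(j, n):
            for u in range(j - 1, v):
                if cost[j - 1][u] == inf:
                    continue
                c = cost[j - 1][u] + piece_cost(traj, u, v, mean_query, space)
                if c < cost[j][v]:
                    cost[j][v], back[j][v] = c, u
    cuts, v = [], n - 1
    for j in range(m, 1, -1):  # walk back to recover the interior cut points
        v = back[j][v]
        cuts.append(v)
    return cost[m][n - 1], sorted(cuts)


diag = [(float(i), float(i), float(i)) for i in range(11)]
q, S = (10.0, 10.0, 10.0), (100.0, 100.0, 100.0)
print(best_m_split(diag, 2, q, S))
```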

Optimal Trajectory Splitting
• Trajectory T can be split into different numbers of segments
• Best overall decomposition of T
  • Can be found using dynamic programming
Figure (axes: t, x)
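When the number of segments is left free, the dynamic program no longer needs an m dimension: the cheapest cover of each prefix T[0..v] is the cheapest prefix T[0..u] plus one MBR for T[u..v]. The sketch below assumes the same uniform-query cost model as before and grows each MBR incrementally so that the whole search takes O(n²) cost evaluations. The recurrence is a standard segmentation DP, offered as an illustration rather than the slide's lost formula; `best_overall_split` is a hypothetical name.

```python
from math import prod
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def best_overall_split(traj: List[Point], mean_query: Sequence[float],
                       space: Sequence[float]) -> Tuple[float, List[int]]:
    """Cheapest decomposition of traj over all numbers of MBRs (O(n^2))."""
    n = len(traj)
    best = [0.0] + [float("inf")] * (n - 1)  # best[v]: cheapest cover of T[0..v]
    back = [0] * n                           # start point of the last MBR
    for v in range(1, n):
        low, high = list(traj[v]), list(traj[v])
        for u in range(v - 1, -1, -1):       # grow MBR(T[u..v]) one point at a time
            for d in range(3):
                low[d] = min(low[d], traj[u][d])
                high[d] = max(high[d], traj[u][d])
            p = prod(min(1.0, (high[d] - low[d] + mean_query[d]) / space[d])
                     for d in range(3))
            if best[u] + p < best[v]:
                best[v], back[v] = best[u] + p, u
    cuts, v = [], back[n - 1]
    while v > 0:                             # follow back-pointers to the start
        cuts.append(v)
        v = back[v]
    return best[n - 1], sorted(cuts)


diag = [(float(i), float(i), float(i)) for i in range(11)]
q, S = (10.0, 10.0, 10.0), (100.0, 100.0, 100.0)
print(best_overall_split(diag, q, S))
```

For this straight trajectory the search settles on two MBRs of five segments each: more splits add query-extended volume faster than they remove dead space.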

Heuristic Trajectory Splitting
• Based on the intuition that a trajectory can be approximated by “constant-slope” trajectory segments
• The extended MBR volume of a “constant-slope” trajectory can then be determined analytically

Heuristic Trajectory Splitting
• We introduce the function g(c): the inverse density of elementary trajectory segments in an MBR (determined by c)
• g(c) has a global minimum in the domain of positive real numbers, and for a “constant-slope” trajectory T with t-1 elementary segments it holds that:
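The following is one possible reading of g(c), derived here under stated assumptions and possibly differing from the paper's Theorem 2. Suppose each elementary segment advances by a_d in dimension d, so an MBR of c segments has extents c·a_d. The query-extended volume per segment is then g(c) = ∏_d (c·a_d + q̄_d) / c. Writing g(c) = A c² + B c + C + D/c gives g′(c) = 0 ⇔ 2A c³ + B c² − D = 0, whose left side is increasing for c > 0, so bisection finds the single positive minimizer. The function names are hypothetical.

```python
import math
from typing import Sequence


def g(c: float, step: Sequence[float], mean_query: Sequence[float]) -> float:
    """Query-extended MBR volume per elementary segment when an MBR holds c segments."""
    return math.prod(c * abs(a) + q for a, q in zip(step, mean_query)) / c


def optimal_c(step: Sequence[float], mean_query: Sequence[float]) -> float:
    """Positive minimizer of g, via bisection on c^2 * g'(c) = 2Ac^3 + Bc^2 - D."""
    a = [abs(x) for x in step]
    q = list(mean_query)
    A = a[0] * a[1] * a[2]
    B = a[0] * a[1] * q[2] + a[0] * a[2] * q[1] + a[1] * a[2] * q[0]
    D = q[0] * q[1] * q[2]
    if A == 0 and B == 0:
        return math.inf  # g only decreases: keep the whole trajectory in one MBR

    def f(c: float) -> float:  # non-decreasing for c > 0, and f(0) = -D <= 0
        return 2 * A * c ** 3 + B * c ** 2 - D

    lo, hi = 0.0, 1.0
    while f(hi) < 0:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


step, query = (1.0, 1.0, 1.0), (10.0, 10.0, 10.0)
c_opt = optimal_c(step, query)
print(round(c_opt, 6), g(5, step, query))  # 5.0 675.0
```

Under this reading, the total expected cost of a constant-slope trajectory cut into pieces of c segments is proportional to (t-1)·g(c), so minimizing g(c) gives the best piece length.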

Heuristic Trajectory Splitting
Algorithm LinearSplit
  u := 1; v := 2;  // after the first two points of T
  while (there is a next point p_{v+1} in trajectory T)
    if (…)
      find c_opt for T[u,v] using Theorem 2;
      c' := round(c_opt);
      extract the first (v-u)/c' segments from T[u,v] and insert them into the index;
      u := u + k*c;
    v++;
  insert the last MBR(T[u,v]) into the index;  // end of T is reached
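A runnable sketch in the spirit of LinearSplit follows. Several details are assumptions of this sketch: the slide's `if` condition is not preserved, so the trigger "the window holds at least one complete piece of c' segments" is hypothetical. The window's slope is estimated as its MBR extent divided by its number of segments, and c_opt is computed with a constant-slope minimizer derived for this example rather than the paper's Theorem 2. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def mbr(points: List[Point]) -> Box:
    return (tuple(min(p[d] for p in points) for d in range(3)),
            tuple(max(p[d] for p in points) for d in range(3)))


def optimal_c(step: Sequence[float], q: Sequence[float]) -> float:
    """Minimizer of g(c) = prod(c*a_d + q_d) / c, i.e. the root of 2Ac^3 + Bc^2 = D."""
    a = [abs(x) for x in step]
    A = a[0] * a[1] * a[2]
    B = a[0] * a[1] * q[2] + a[0] * a[2] * q[1] + a[1] * a[2] * q[0]
    D = q[0] * q[1] * q[2]
    if A == 0 and B == 0:
        return math.inf
    lo, hi = 0.0, 1.0
    while 2 * A * hi ** 3 + B * hi ** 2 < D:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if 2 * A * mid ** 3 + B * mid ** 2 < D:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def linear_split(traj: List[Point], mean_query: Sequence[float]) -> List[Box]:
    """One-pass splitting in the spirit of LinearSplit (hypothetical interpretation)."""
    boxes: List[Box] = []
    u = 0
    for v in range(1, len(traj)):
        n_seg = v - u
        low, high = mbr(traj[u:v + 1])
        # Treat the window T[u..v] as constant-slope: extent per elementary segment.
        step = [(high[d] - low[d]) / n_seg for d in range(3)]
        c_opt = optimal_c(step, mean_query)
        c = n_seg + 1 if math.isinf(c_opt) else max(1, round(c_opt))  # c' := round(c_opt)
        k = n_seg // c  # number of complete c'-segment pieces in the window
        if k >= 1:      # hypothetical trigger condition (an assumption of this sketch)
            for i in range(k):
                boxes.append(mbr(traj[u + i * c:u + (i + 1) * c + 1]))
            u += k * c
    if u < len(traj) - 1:
        boxes.append(mbr(traj[u:]))  # last MBR: end of T is reached
    return boxes


diag = [(i, i, i) for i in range(21)]  # 20 unit steps in x, y and t
print(linear_split(diag, (10.0, 10.0, 10.0)))
```

Because each point is looked at once and the window is only scanned to compute its MBR, the procedure can run while observations arrive, which is the "online" property the talk asks for.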

Experiments
• Experiments performed on an AMD Athlon 1900+ PC with 512 MB RAM
• XXL library for the R-tree implementation (4 KB page size for all trees)
• Data generated with
  • Network Data Generator
  • GSTD
• Datasets containing 10,000, 20,000, and 50,000 trajectories
• Average trajectory length: ~100
• Reported performance values are averages over 10,000 uniformly distributed queries

Experiments
• Test of the cost model
  • Q_{i,j} denotes the query with spatial extent i (in %) and duration j
  • I_{i,j} denotes the tree optimized for spatial extent i (in %) and duration j
Figures: Robustness of Optimal Split for Network data; Robustness of Optimal Split for GSTD data

Experiments
• Approaches compared in the following experiments:
  • NoSplit: each trajectory is approximated by a single MBR
  • OptimalSplit: our optimal trajectory splitting algorithm
  • LinearSplit: our linear trajectory splitting algorithm
  • FullSplit: each trajectory line segment is approximated by its own MBR
  • HKTG-k%: a total of N*k/100 splits is distributed among N trajectories [Had+02]
• Query types used in the following experiments
Figures: Snapshot query types; Range query types

Experiments
• I/O performance when varying the query size (50K objects)

Experiments
• I/O performance when varying the database size (RM query)

Experiments
• Index building time when varying the query size (50K objects)

Experiments
• Summary

Conclusion
• We develop a cost model for splitting trajectories based on the average query size
• Based on the cost model, we develop an optimal split algorithm
• We also develop a linear heuristic based on a constant-slope approximation of trajectory segments
• The experimental evaluation shows that our approaches consistently outperform the other indexing approaches tested

References
[Gut84] Guttman, A.: R-trees: A Dynamic Index Structure for Spatial Searching. In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 47-57, 1984.
[Pfo+00] Pfoser, D., Jensen, C. S., Theodoridis, Y.: Novel Approaches in Query Processing for Moving Object Trajectories. In Proc. of the 26th VLDB Conference (Cairo, Egypt, September 2000), pp. 395-406.
[Had+02] Hadjieleftheriou, M., Kollios, G., Tsotras, V., Gunopulos, D.: Efficient Indexing of Spatiotemporal Objects. In Proc. of the Intl. Conf. on Extending Database Technology (EDBT), pp. 251-268, 2002.

Thank you for your attention!
