Indexing Spatio-Temporal Data Warehouses
Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
26 Feb 2002
This work was supported by grants HKUST 6081/01E and 6070/00E from Hong Kong RGC.
Outline • Preliminary – Spatial data warehouses and aggregate trees • Applications and motivation • Solution for static objects • Solution for dynamic objects • Performance study • Conclusion
Preliminaries – Spatial Data Warehouses • Each spatial object carries some aggregate information (e.g., each land region may carry its population). • A common query is the window aggregate query, which specifies a query window and retrieves the aggregate sum of all objects intersecting it. • This is the spatial analogue of “group-by” in conventional data warehouses. • Materialization techniques common in traditional data warehouses are of limited use: the group-by is ad hoc, so the window may be placed anywhere and the number of possible queries is infinite. [Figure: regions R1-R4 with aggregate values 12, 75, 132, 150 and query window qs]
Preliminaries – Spatial Data Warehouses • A better approach is to deploy aggregate trees, which introduce a spatial hierarchy [Kline and Snodgrass, 1995; Papadias et al., 2001; Lazaridis and Mehrotra, 2001]. [Figure: aggregation R-tree over regions R1-R4 (aggregates 12, 75, 132, 150), with intermediate entries R5 and R6 storing the sums 225 and 144; the query retrieves the sum of the aggregates of the objects intersecting qs]
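A minimal sketch of how a window aggregate query might traverse such a tree. The class and function names, the coordinates, and the assignment of aggregate values to individual regions are assumptions made for illustration; only the idea of storing a sum in each intermediate entry comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Node:
    mbr: Rect                 # bounding rectangle of everything below
    agg: float                # sum of the aggregates below this entry
    children: List["Node"] = field(default_factory=list)  # empty for a data region


def window_sum(node: Node, q: Rect) -> float:
    """Sum the aggregates of all data regions that intersect window q."""
    if not intersects(node.mbr, q):
        return 0.0            # nothing below can intersect q
    if not node.children or contains(q, node.mbr):
        return node.agg       # data region, or subtree fully covered by q
    return sum(window_sum(child, q) for child in node.children)


# Hypothetical layout: the value-to-region mapping is assumed, not taken from the figure.
r1 = Node((0, 0, 2, 2), 132.0)
r2 = Node((3, 0, 5, 2), 12.0)
r3 = Node((0, 4, 2, 6), 150.0)
r4 = Node((3, 4, 5, 6), 75.0)
r5 = Node((0, 0, 5, 2), 144.0, [r1, r2])
r6 = Node((0, 4, 5, 6), 225.0, [r3, r4])
root = Node((0, 0, 5, 6), 369.0, [r5, r6])

print(window_sum(root, (-1, -1, 6, 3)))   # R5 is fully covered, so its stored sum is used
print(window_sum(root, (1, 1, 2.5, 5)))   # partial overlap, so the search descends to R1 and R3
```

The benefit shows in the first query: the stored sum of R5 is returned without visiting R1 or R2.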
Spatio-Temporal DW: Applications and Motivation • Spatio-temporal databases deal with objects whose properties may change with time. • Traditional studies in spatio-temporal databases focus on retrieving the actual objects that satisfy the query predicates. • Example: retrieve all vehicles that appeared in the north district between 3pm and 5pm yesterday. • A more useful type of query may retrieve, instead of the actual object IDs, the number of objects that satisfy the query conditions. • Example: retrieve the (approximate) number of vehicles in the north district between 3pm and 5pm yesterday. • In the above example, the spatial objects (e.g., streets in the north district) that carry the aggregate information (e.g., number of cars) are static. Other queries may involve dynamic objects. • Example: mobile phone antennas (aggregate information = number of users served by the antenna), whose spatial extents (i.e., coverage areas) may change over time.
Example (Static Objects) • Query qs retrieves the aggregate sum (during time T1-T4) of all rectangles that intersect it.
Traditional Methods • Pre-materialization: even more difficult than in a spatial DW because of the additional temporal dimension. • Use an aggregation tree: whenever the aggregate of a region changes, create a 3D box; an aggregate 3D R-tree indexes all these boxes. • Problem: the spatial extent of a region is duplicated in every box, even though it does not change. [Figure: 3D boxes for region R1, with aggregate values 150, 145, 135, 130 along time-axis labels T1, T3, T4, T5]
Aggregate RB-tree Spatial extents are stored only once.
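The idea can be sketched as follows, under simplifying assumptions: a pair of sorted Python lists stands in for each region's B-tree, a flat scan stands in for the R-tree, and an aggregate value is assumed to stay valid until the next recorded change. All names are hypothetical, and the sample values for R1 follow one reading of the figure labels on the previous slide.

```python
from bisect import bisect_right
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class RegionHistory:
    """One region: the extent is stored once, aggregate changes are kept by time."""

    def __init__(self, extent: Rect) -> None:
        self.extent = extent
        self.times: List[int] = []      # timestamps at which the aggregate changed
        self.values: List[float] = []   # aggregate value starting at that timestamp

    def record(self, t: int, value: float) -> None:
        # Assumes changes are recorded in increasing timestamp order.
        self.times.append(t)
        self.values.append(value)

    def value_at(self, t: int) -> float:
        i = bisect_right(self.times, t) - 1   # latest change at or before t
        return self.values[i] if i >= 0 else 0.0

    def sum_over(self, t_start: int, t_end: int) -> float:
        """Sum of the per-timestamp aggregates for t_start..t_end inclusive."""
        return sum(self.value_at(t) for t in range(t_start, t_end + 1))


def window_interval_sum(regions: List[RegionHistory], q: Rect,
                        t_start: int, t_end: int) -> float:
    """Flat scan standing in for the R-tree part of the index."""
    return sum(r.sum_over(t_start, t_end)
               for r in regions if intersects(r.extent, q))


r1 = RegionHistory((0, 0, 2, 2))
for t, v in [(1, 150.0), (3, 145.0), (4, 135.0), (5, 130.0)]:
    r1.record(t, v)
r2 = RegionHistory((10, 10, 12, 12))
r2.record(1, 40.0)

print(r1.sum_over(1, 4))                                   # 150 + 150 + 145 + 135
print(window_interval_sum([r1, r2], (1, 1, 3, 3), 1, 4))   # only r1 intersects the window
```

Recording a new aggregate value appends one (timestamp, value) pair; the rectangle itself is never copied.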
Example (Dynamic Objects) • Situation during timestamps 1-4. [Figure: regions R1-R4 and query window qs] • Query qs retrieves the aggregate sum (during time T1-T4) of all rectangles that intersect it.
Example (cont.) • Situation after regions change position at timestamp 5. [Figure: regions R1-R4 and query window qs] • Query qs retrieves the aggregate sum (during time T1-T4) of all rectangles that intersect it.
Aggregate HRB-tree • Integrates the previous idea with HR-trees, a spatio-temporal access method. [Figure: trees for timestamps 1-4 and for timestamp 5]
Aggregate 3D RB-tree • Creates a 3D box only when the spatial extent of an object changes.
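A small sketch of the difference in the number of 3D boxes, comparing this rule with the rule from the "Traditional Methods" slide (a new box on every change). The function names and the sample history are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Snapshot = Tuple[int, Rect, float]         # (timestamp, extent, aggregate)


def boxes_on_any_change(history: List[Snapshot]) -> int:
    """Boxes needed when a new 3D box starts on every extent or aggregate change."""
    count, prev = 0, None
    for _, extent, agg in history:
        if prev is None or (extent, agg) != prev:
            count += 1
        prev = (extent, agg)
    return count


def boxes_on_extent_change(history: List[Snapshot]) -> int:
    """Boxes needed when a new 3D box starts only when the extent changes."""
    count, prev = 0, None
    for _, extent, _ in history:
        if prev is None or extent != prev:
            count += 1
        prev = extent
    return count


# Hypothetical history of one region: the aggregate changes often,
# the extent changes once (at timestamp 5).
a, b = (0, 0, 2, 2), (1, 0, 3, 2)
history = [(1, a, 150.0), (2, a, 150.0), (3, a, 145.0), (4, a, 135.0), (5, b, 130.0)]

print(boxes_on_any_change(history))     # 4
print(boxes_on_extent_change(history))  # 2
```

In the second scheme the aggregate changes that fall inside one box still have to be stored somewhere, for example in a per-box B-tree as in the aggregate RB-tree.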
Managing Numerous B-trees • If each B-tree is too small (i.e., the rates of spatial extent and aggregate changes are similar) • A block contains too few entries and much space is wasted. • Not suitable for caching. • Our solution is to use a B-File, which “packs” numerous B-trees into a single file • Avoiding empty spaces in a disk page. • Maintaining the same query performance.
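A back-of-the-envelope page count illustrating the wasted space. The slides do not describe the internal layout of the B-File, so this only compares two assumed storage policies (one page per tiny tree versus entries packed densely into shared pages) and ignores any directory overhead; the function names and numbers are hypothetical.

```python
import math
from typing import List


def pages_separate(tree_sizes: List[int], page_capacity: int) -> int:
    """Pages used when every B-tree gets its own page(s)."""
    return sum(math.ceil(n / page_capacity) for n in tree_sizes if n > 0)


def pages_packed(tree_sizes: List[int], page_capacity: int) -> int:
    """Pages used when the entries of all trees are packed densely together."""
    return math.ceil(sum(tree_sizes) / page_capacity)


sizes = [3, 5, 2, 4]   # entries per tiny B-tree (hypothetical)
capacity = 50          # entries that fit in one disk page (hypothetical)

print(pages_separate(sizes, capacity))  # 4
print(pages_packed(sizes, capacity))    # 1
```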
Performance • Dataset settings • Number of spatial objects = 10,000 • History length = 1,000 timestamps • Aggregate agility: describes how fast the aggregate information changes (4%, 8%, 16%, 32%, 64%) • Region agility: describes how fast the spatial extents change • 0% for static objects • 0.01% for dynamic objects (capturing the fact that the spatial dimension changes much more slowly than the aggregate data) • Datasets include 500,000 to 6,500,000 records. • Each query has two parameters: spatial extent and interval length.
Conclusion • We propose indexing techniques that replace the data cube in spatio-temporal data warehouses and answer ad-hoc group-by queries very efficiently. • Both static and dynamic spatial dimensions are discussed. • Extensions • Cost models that predict the performance of alternative structures. • Query optimization based on the cost models. • Complex query evaluation