Improving Min/Max Aggregation over Spatial Objects Donghui Zhang, Vassilis J. Tsotras University of California, Riverside ACM GIS’01

Outline
• Problem Definition
• Straightforward Solutions
• Our Solution
• Performance Results
• By-Product: Optimized the MSB-tree
• Conclusions

Problem Definition
• Consider a collection of spatial objects.
• Each object has a rectangle r and a value v.
• Spatial aggregation: find the aggregate value over all objects that intersect a given query rectangle. We focus on MAX.
• Example: in a database of rainfall over geographical areas, find the maximum rainfall in the Los Angeles area.
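
To make the problem statement concrete, here is a minimal brute-force sketch of a MAX query. The names `Box`, `SpatialObject`, `intersects` and `spatial_max` are illustrative and not from the slides, and rectangles are assumed to be closed and axis-aligned.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


class SpatialObject(NamedTuple):
    box: Box      # rectangle r
    value: float  # value v


def intersects(a: Box, b: Box) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def spatial_max(objects: Sequence[SpatialObject], query: Box) -> Optional[float]:
    """MAX value over all objects intersecting the query, or None if none do."""
    values = [o.value for o in objects if intersects(o.box, query)]
    return max(values) if values else None


# Made-up rainfall data, for illustration only.
rainfall = [
    SpatialObject(Box(0, 0, 10, 10), 3.0),
    SpatialObject(Box(5, 5, 20, 20), 7.5),
    SpatialObject(Box(50, 50, 60, 60), 9.0),
]
print(spatial_max(rainfall, Box(8, 8, 12, 12)))  # 7.5
```

This scan touches every object, which is the cost the index structures on the following slides try to avoid.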

Straightforward Solutions
• Use an R*-tree [BKS+90] to index the objects.
• The query then reduces to a range search.
• Better approach: the aR-tree [PKZ+01, LM01], which stores the MAX of each sub-tree in the internal nodes.
• If the query rectangle contains a sub-tree, there is no need to search it.
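
The aR-tree shortcut can be sketched as follows. This is a simplified in-memory model, not the published structure: the `Node` class is hypothetical, leaf objects are represented as nodes without children, and every bounding box and `max_value` is assumed to be exact for its sub-tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    box: Box          # bounding box of everything below this node
    max_value: float  # MAX over the sub-tree (the aR-tree addition)
    children: List["Node"] = field(default_factory=list)  # empty for an object


def ar_max(node: Node, query: Box) -> float:
    """MAX over objects under `node` that intersect `query` (-inf if none)."""
    if not intersects(node.box, query):
        return float("-inf")
    # Either an intersecting object, or a sub-tree lying fully inside the
    # query: in both cases the stored MAX is the answer, so do not descend.
    if not node.children or contains(query, node.box):
        return node.max_value
    return max(ar_max(child, query) for child in node.children)


a = Node((0, 0, 10, 10), 3.0)
b = Node((5, 5, 20, 20), 7.5)
c = Node((50, 50, 60, 60), 9.0)
left = Node((0, 0, 20, 20), 7.5, [a, b])
right = Node((50, 50, 60, 60), 9.0, [c])
root = Node((0, 0, 60, 60), 9.0, [left, right])
print(ar_max(root, (0, 0, 25, 25)))  # 7.5, answered at `left` without visiting a or b
```

A plain R*-tree range search would have to visit both `a` and `b` for the same query.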

Our Solution -- overview
• The MR-tree: a specialized index for Min/Max aggregation. It uses the R*-tree and four optimization techniques:
• k-max: increase the chance that the search algorithm stops at higher tree levels;
• box-elimination: erase information from the tree that will not contribute to any query;
• union: do not insert an object that will not contribute to any query;
• area-reduction: reduce the area of the object to be inserted.

The k-max Optimization
• Motivation: the aR-tree is not efficient if the query rectangle intersects but does not fully contain a sub-tree rectangle, because the stored MAX cannot be used and the search must descend into the sub-tree.

The k-max Optimization
• Along with each index record r, store the k max-value objects in sub-tree(r).
• Upon query, if the query rectangle intersects any of the k objects at r, omit sub-tree(r): the largest-valued of the intersecting objects already gives the MAX for that sub-tree.
• Trade-off: a larger k lets more sub-trees be omitted during a query, but also costs more space and update time.
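
A minimal sketch of the k-max check during a query, under the same simplified model (hypothetical `Node` class, leaf objects as childless nodes). The `top_k` field is an assumed representation: a list of (box, value) pairs sorted by descending value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    box: Box
    max_value: float
    children: List["Node"] = field(default_factory=list)
    # The k max-value objects of the sub-tree, highest value first.
    top_k: List[Tuple[Box, float]] = field(default_factory=list)


def kmax_query(node: Node, query: Box) -> float:
    """MAX over objects under `node` that intersect `query` (-inf if none)."""
    if not intersects(node.box, query):
        return float("-inf")
    if not node.children or contains(query, node.box):
        return node.max_value
    # First stored object hit by the query: every higher-valued object missed
    # the query and every other object has a value no larger, so this is the
    # MAX for the whole sub-tree.
    for box, value in node.top_k:
        if intersects(box, query):
            return value
    return max(kmax_query(child, query) for child in node.children)


a = Node((0, 0, 10, 10), 3.0)
b = Node((5, 5, 20, 20), 7.5)
c = Node((50, 50, 60, 60), 9.0)
left = Node((0, 0, 20, 20), 7.5, [a, b])
right = Node((50, 50, 60, 60), 9.0, [c])
root = Node((0, 0, 60, 60), 9.0, [left, right],
            top_k=[((50, 50, 60, 60), 9.0), ((5, 5, 20, 20), 7.5)])  # k = 2
print(kmax_query(root, (18, 18, 22, 22)))  # 7.5, decided at the root
```

In the example the query only partially overlaps the root, yet the second stored object answers it without reading any child.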

The box-elimination Optimization
• Motivation: if for objects o1 and o2, o1.box contains o2.box and o1.value ≥ o2.value, then o2 is obsolete, i.e. it does not contribute to any query and thus can be deleted.

The box-elimination Optimization
• Similarly for an object o1 and an index record r2: if o1.box contains r2.box and o1.value ≥ the max value in sub-tree(r2), the whole sub-tree is obsolete.
• The optimization: at insertion, remove obsolete objects and sub-trees along the insertion path.
• Ideally, all obsolete objects and sub-trees would be removed, but that is too expensive. Instead, pick c paths (c is a constant).
• Trade-off: a larger c gives a smaller index and faster queries, but also more update time.
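
The obsolescence test and the bounded pruning can be sketched as below. This is only an illustration: how the c paths are chosen is not stated on the slides, so the sketch simply follows intersecting children in order, and it does not re-tighten bounding boxes or MAX values after a removal. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    box: Box
    max_value: float
    children: List["Node"] = field(default_factory=list)  # empty for an object


def is_obsolete(entry: Node, new_box: Box, new_value: float) -> bool:
    """True if the new object's box contains the entry's box and the new
    value is at least the entry's (maximum) value."""
    return contains(new_box, entry.box) and new_value >= entry.max_value


def eliminate(node: Node, new_box: Box, new_value: float, paths_left: int) -> int:
    """Drop obsolete entries below `node`, following at most `paths_left`
    root-to-leaf paths. Returns the number of paths still allowed."""
    node.children = [ch for ch in node.children
                     if not is_obsolete(ch, new_box, new_value)]
    internal = [ch for ch in node.children
                if ch.children and intersects(ch.box, new_box)]
    if not internal:
        return paths_left - 1  # this path ends here
    for child in internal:
        if paths_left <= 0:
            break
        paths_left = eliminate(child, new_box, new_value, paths_left)
    return paths_left


a = Node((1, 1, 2, 2), 3.0)
b = Node((3, 3, 4, 4), 9.0)
c = Node((6, 6, 7, 7), 2.0)
leaf1 = Node((1, 1, 4, 4), 9.0, [a, b])
leaf2 = Node((6, 6, 7, 7), 2.0, [c])
root = Node((1, 1, 7, 7), 9.0, [leaf1, leaf2])

# Inserting an object with box (0, 0, 8, 8) and value 5, with c = 2 paths.
eliminate(root, (0, 0, 8, 8), 5.0, paths_left=2)
print([ch.max_value for ch in leaf1.children])  # [9.0]: object a was removed
print(len(root.children))                       # 1: the sub-tree leaf2 was removed
```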

The union Optimization
• Motivation 1: if a new object o1 is obsolete due to an existing object o2, o1 should not be inserted.
• Motivation 2: a new object o1 may be obsolete due to the union of several existing objects.

The union Optimization
• Along with each index record r, store the union of the boxes of all objects in sub-tree(r); also store the MIN value of all these objects.
• Do not perform the insertion of object o1 if:
• o1.box is contained in r.union, and
• o1.value ≤ r.min.
• Question: how is the union computed and stored?

The union Optimization
• Store an approximate representation of the union using t boxes (t is a constant).
• The approximation should be fully contained in the actual union, and should cover as much space as possible.
• Def: given a set of n boxes S = {s1, …, sn}, the covered t-union of S is a set of t boxes A = {a1, …, at} s.t.
• the union of the si covers the union of the ai, and
• the union of the ai covers the maximum area possible.
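
The insertion test against a stored t-box approximation can be sketched as follows. The function names are illustrative, boxes are assumed to be closed rectangles with positive area, and the grid-based containment check is one simple way to test coverage by a union, not necessarily how the MR-tree does it.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def covered_by_union(box: Box, union: Sequence[Box]) -> bool:
    """True if `box` lies inside the union of the boxes in `union`.

    Cuts `box` along every edge of the union boxes and checks that the
    centre of each resulting cell falls inside some union box."""
    x1, y1, x2, y2 = box
    xs = sorted({x1, x2, *(v for b in union for v in (b[0], b[2]) if x1 < v < x2)})
    ys = sorted({y1, y2, *(v for b in union for v in (b[1], b[3]) if y1 < v < y2)})
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            cx, cy = (xa + xb) / 2, (ya + yb) / 2
            if not any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in union):
                return False
    return True


def should_skip_insert(new_box: Box, new_value: float,
                       union: Sequence[Box], min_value: float) -> bool:
    """The union test: every query that hits new_box also hits an existing
    object whose value is at least min_value >= new_value."""
    return new_value <= min_value and covered_by_union(new_box, union)


r_union = [(0, 0, 10, 10), (10, 0, 20, 10)]  # t = 2 boxes stored at record r
r_min = 5.0
print(should_skip_insert((5, 2, 15, 8), 4.0, r_union, r_min))  # True
print(should_skip_insert((5, 2, 25, 8), 4.0, r_union, r_min))  # False: sticks out
print(should_skip_insert((5, 2, 15, 8), 6.0, r_union, r_min))  # False: value too large
```

The first example is covered by neither stored box alone, only by their union, which is the case Motivation 2 describes.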

The union Optimization
• Computing the exact covered t-union takes O(n^(2t+4)) time.
• We propose a much faster approximate algorithm: O(n log n).
• Idea of our algorithm: pick the t largest boxes and expand them.
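
The slides give only the idea (pick the t largest boxes and expand them), so the following is a rough sketch of that idea and not the authors' O(n log n) algorithm. The expansion rule is an assumption chosen because it is easy to verify: a box is stretched along one axis only when another box spans its full extent on the other axis, which keeps the result inside the true union.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def expand(a: Box, s: Box) -> Box:
    """Grow `a` using `s` while staying inside the union of a and s."""
    x1, y1, x2, y2 = a
    # s spans a's full height and overlaps or touches it in x: stretch sideways.
    if s[1] <= y1 and y2 <= s[3] and s[0] <= x2 and x1 <= s[2]:
        x1, x2 = min(x1, s[0]), max(x2, s[2])
    # s spans the (possibly stretched) width and overlaps or touches in y.
    if s[0] <= x1 and x2 <= s[2] and s[1] <= y2 and y1 <= s[3]:
        y1, y2 = min(y1, s[1]), max(y2, s[3])
    return (x1, y1, x2, y2)


def approx_t_union(boxes: Sequence[Box], t: int) -> List[Box]:
    """Pick the t largest boxes, then expand them until nothing changes."""
    result = sorted(boxes, key=area, reverse=True)[:t]
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(result):
            for s in boxes:
                grown = expand(a, s)
                if grown != a:
                    a = result[i] = grown
                    changed = True
    return result


boxes = [(0, 0, 10, 10), (10, -1, 15, 11), (100, 100, 101, 101)]
print(approx_t_union(boxes, 1))  # [(0, 0, 15, 10)]
```

This naive loop rescans all boxes after every change, so it is slower than O(n log n); it is meant only to show what "expand" can mean while preserving containment in the actual union.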

The area-reduction Optimization
• Motivation: the box of a new object o1 can be reduced if an existing object o2 intersects it with a larger or equal value.

The area-reduction Optimization
• Reduce the area of the new object o1 when:
• there is an index record r s.t. r.union intersects o1.box and r.min ≥ o1.value, or
• one of the k max-value objects intersects o1 with a larger or equal value, or
• there is a leaf object o2 s.t. o2.box intersects o1.box and o2.value ≥ o1.value.

The area-reduction Optimization
• Benefit 1: reduce overlap among sibling nodes.
• Benefit 2: increase the chance of making new objects obsolete.
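
A minimal sketch of one area-reduction step, assuming the covering box belongs to something whose value is at least that of the new object (a leaf object, one of the k max-value objects, or a box of r.union). The function name and the restriction to cases where the remainder is still a rectangle are assumptions of this sketch.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def reduce_box(new: Box, cover: Box) -> Optional[Box]:
    """Shrink `new` using a `cover` box whose value is >= the new object's.

    Returns None if `new` is fully covered (the object is obsolete), the
    trimmed box if `cover` hides one whole side of `new`, and `new` itself
    if the uncovered remainder would not be a rectangle."""
    nx1, ny1, nx2, ny2 = new
    cx1, cy1, cx2, cy2 = cover
    spans_x = cx1 <= nx1 and nx2 <= cx2
    spans_y = cy1 <= ny1 and ny2 <= cy2
    if spans_x and spans_y:
        return None
    if spans_y:
        if cx1 <= nx1 < cx2:      # cover hides the left part
            nx1 = cx2
        elif cx1 < nx2 <= cx2:    # cover hides the right part
            nx2 = cx1
    elif spans_x:
        if cy1 <= ny1 < cy2:      # cover hides the bottom part
            ny1 = cy2
        elif cy1 < ny2 <= cy2:    # cover hides the top part
            ny2 = cy1
    return (nx1, ny1, nx2, ny2)


print(reduce_box((0, 0, 10, 10), (-5, -1, 4, 12)))   # (4, 0, 10, 10)
print(reduce_box((0, 0, 10, 10), (-1, -1, 11, 11)))  # None
print(reduce_box((0, 0, 10, 10), (3, 3, 5, 5)))      # (0, 0, 10, 10)
```

Trimming is safe for MAX queries because any query that touches the removed strip also touches the covering box, whose value is at least as large.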

Performance Results
• Datasets: 5 million square objects, with size randomly chosen from 10 to 10000 (the space ranges from 1 to one million in each dimension).
• Implemented algorithms:
• R*: the R*-tree [BKS+90];
• aR: the aR-tree [PKZ+01, LM01];
• kaR: the aR-tree with the k-max optimization;
• MR: the MR-tree (with all the optimizations).

Index Sizes (figure)

Query Performance (log scale)
• Query time is the total over 100 random queries with the same query rectangle size.

Optimizing the MSB-tree
• The MSB-tree [YW00] efficiently maintains and computes MIN/MAX aggregates over 1-dimensional interval data.
• Insertion/Query: O(log_B m), where B is the page capacity and m is the number of leaf records.
• [YW00] periodically reconstructs the whole tree to keep m small. During reconstruction, the index is off-line.
• Reconstruction can be avoided by applying the box-elimination optimization. Idea: if a new interval contains all intervals in a sub-tree and has a larger value, the sub-tree is obsolete.

Conclusions
• Addressed the MIN/MAX aggregation problem over spatial objects;
• Four optimization techniques;
• The MR-tree;
• Much smaller index size and query time;
• By-product: optimized the MSB-tree.
Thank You!