Improving Access Efficiency for Spatial Databases Amr El Abbadi Computer Science Department University of California, Santa Barbara
Collaborators • Divyakant Agrawal • Current Graduate Students: • Alireza Aghili • Ying Feng • Abhishek Gupta • Huagang Li • Lin Qiao • Ozgur Sahin • Chengyu Sun • Hailing Yu
Roadmap • Browsing large spatial datasets • Spatial join selectivity estimation • Hardware accelerated spatial selection and join
Browsing • Alexandria Digital Library (ADL) • Started in 1995 • A repository for geo-referenced materials • 6,000,000+ records • Browsing Service • Motivation • Explore large spatial datasets efficiently • Make educated queries • Challenges • 2-dimensional objects • Various spatial relations
Browsing Service Prototype • Modeled after ADL query client • Spatial footprint, temporal coverage, subject type, format type … • Intersection and Containment • Return selectivity instead of actual records • Hundreds of queries (“tiles”) all at once
Histogram-based Approach • Performance is independent of dataset size • Histograms for point data are trivial • More difficult for rectangular objects
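Point data is the easy case because each point falls in exactly one bucket. A two-dimensional prefix-sum table over the bucket counts can then answer any grid-aligned range count with four lookups, however large the dataset is. The following is a minimal Python sketch that assumes unit buckets and non-negative coordinates. `build_prefix` and `range_count` are hypothetical names and do not come from the talk.

```python
from itertools import product


def build_prefix(points, nx, ny):
    """2D prefix sums of point counts over an nx x ny grid of unit buckets.

    A point (x, y) with 0 <= x < nx and 0 <= y < ny falls in bucket (int(x), int(y)).
    P[i][j] holds the number of points in buckets [0, i) x [0, j).
    """
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        counts[int(x)][int(y)] += 1
    P = [[0] * (ny + 1) for _ in range(nx + 1)]
    for i, j in product(range(nx), range(ny)):
        P[i + 1][j + 1] = counts[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_count(P, x1, y1, x2, y2):
    """Points in buckets [x1, x2) x [y1, y2): four lookups, independent of |S|."""
    return P[x2][y2] - P[x1][y2] - P[x2][y1] + P[x1][y1]
```

Rectangles break this scheme because one object can cover many buckets, and a plain per-bucket count would then report it several times.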
Problem Formulation • Given • Rectangular objects • Rectangular queries • A pre-defined grid • Return • selectivity for intersect, contains, contained queries • Requirements • Exact answers, or • Good estimations • FAST!!
Selectivity for Intersection Queries • [BeigelT98], [JinAS00] • Histograms for rectangle objects • Exact query selectivity • Constant query response time • Intersection query only
9-Intersection Model … • [EgenhoferH94] • The spatial relation between two objects P and Q can be defined by the intersections of their interiors (I), boundaries (B), and exteriors (E): • P.I ∩ Q.I, P.I ∩ Q.B, P.I ∩ Q.E • P.B ∩ Q.I, P.B ∩ Q.B, P.B ∩ Q.E • P.E ∩ Q.I, P.E ∩ Q.B, P.E ∩ Q.E • (Figure: the matrix for P contains Q.)
…9-Intersection Model • Eight relations: contains, covers, overlaps, meets, contained, covered, equals, disjoint
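For grid-aligned rectangles, the nine intersections can be decided mechanically. The sketch below is an illustration and not the method of [EgenhoferH94]. It assumes axis-aligned rectangles with integer corners and samples every integer and half-integer coordinate, stored doubled so the values stay integers. These samples hit every face, edge, and vertex of the arrangement formed by the two rectangles, plus a one-step margin for the unbounded exterior. `classify` and `nine_intersection` are hypothetical helpers.

```python
def classify(rect, u, v):
    """Classify doubled-lattice point (u, v) as 'I', 'B' or 'E' for a rectangle
    (x1, y1, x2, y2) with integer corners."""
    x1, y1, x2, y2 = (2 * c for c in rect)
    if x1 < u < x2 and y1 < v < y2:
        return "I"
    if x1 <= u <= x2 and y1 <= v <= y2:
        return "B"
    return "E"


def nine_intersection(p, q):
    """9-intersection matrix of rectangles p and q, keyed by (part of p, part of q).

    I/B/E membership is constant on each face, edge and vertex of the arrangement
    of the rectangles' edge lines, and each of those cells contains a doubled-lattice
    point, so sampling the lattice decides all nine entries exactly."""
    xs = [c for r in (p, q) for c in (r[0], r[2])]
    ys = [c for r in (p, q) for c in (r[1], r[3])]
    m = {(a, b): False for a in "IBE" for b in "IBE"}
    for u in range(2 * min(xs) - 1, 2 * max(xs) + 2):
        for v in range(2 * min(ys) - 1, 2 * max(ys) + 2):
            m[(classify(p, u, v), classify(q, u, v))] = True
    return m


m = nine_intersection((0, 0, 4, 4), (1, 1, 2, 2))   # P contains Q
print([[m[(a, b)] for b in "IBE"] for a in "IBE"])
```

For "P contains Q", the printed rows correspond to P.I, P.B, and P.E. The pattern is all true for P.I, then false, false, true for P.B, and false, false, true for P.E.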
Interior-Exterior Model … • Four intersections: P.I ∩ Q.I, P.I ∩ Q.E, P.E ∩ Q.I, P.E ∩ Q.E • Five spatial relations: Equals (Neq), Contains (Ncs), Contained (Ncd), Overlaps (No), Disjoint (Nd)
… Interior-Exterior Model • Neq = 0 • nee = |S|, where |S| is the size of the dataset • nii is the number of intersecting objects
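Written out as equations, the model links the four intersection counts to the five relation counts. The system below is a reconstruction and is not copied from the slides. It assumes that relations are named from the query's side: Ncs means the query contains the object, and Ncd means the query is contained in the object. It also assumes that in nxy the first letter refers to the query and the second to the object. Under that reading, nei counts objects whose interior meets the query's exterior.

```latex
\begin{aligned}
n_{ii} &= N_{eq} + N_{cs} + N_{cd} + N_{o}\\
n_{ie} &= N_{cs} + N_{o} + N_{d}\\
n_{ei} &= N_{cd} + N_{o} + N_{d}\\
n_{ee} &= N_{eq} + N_{cs} + N_{cd} + N_{o} + N_{d} = |S|
\end{aligned}
```

Only nii and nei can be summed from a histogram of object interiors, and nee = |S| is known. With Neq = 0, this leaves three equations for the four unknowns Ncs, Ncd, No, and Nd, because nie would require object exteriors.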
Euler’s Formula • F – E + V = 2 • For example • 10 faces (including the exterior face) • 24 edges • 16 vertices • 10 – 24 + 16 = 2
Beigel-Tanin’s Corollary • Fi – Ei + Vi = 1 • For example • 9 interior faces • 12 interior edges • 4 interior vertices • 9 – 12 + 4 = 1
Euler Histogram • [BeigelT98] • (Figure: a conventional histogram and an Euler histogram for the same objects.)
Compute nii • Selectivity for an intersection query • Sum up every bucket inside the query • For example: • 1 - 1 + 2 - 1 + 1 - 2 + 1 - 1 + 2 = 2
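Here is a minimal Python sketch of an Euler histogram and the nii sum. It assumes grid-aligned rectangles with integer corners and stores faces, edges, and vertices on one doubled-index array: (odd, odd) is a face, (even, even) is a vertex, and anything else is an edge. Each covering object adds +1 on faces and vertices and -1 on edges. As a result, each object's rectangular intersection with the query contributes F - E + V = 1, in line with the Beigel-Tanin corollary. The function names are hypothetical.

```python
def euler_sign(u, v):
    """Faces (odd, odd) and vertices (even, even) carry +1; edges (mixed parity) carry -1."""
    return 1 if u % 2 == v % 2 else -1


def build_euler_histogram(rects, n, m):
    """Euler histogram for an n x m grid of unit buckets on the doubled lattice.

    Objects are grid-aligned rectangles (x1, y1, x2, y2), 0 <= x1 < x2 <= n, 0 <= y1 < y2 <= m.
    The open interior of an object covers doubled indices 2*x1+1 .. 2*x2-1 (likewise in y),
    so grid lines on the border of the data space are never touched."""
    H = [[0] * (2 * m + 1) for _ in range(2 * n + 1)]
    for x1, y1, x2, y2 in rects:
        for u in range(2 * x1 + 1, 2 * x2):
            for v in range(2 * y1 + 1, 2 * y2):
                H[u][v] += euler_sign(u, v)
    return H


def n_ii(H, q):
    """Objects whose interior meets the interior of grid-aligned query q:
    the sum of all face, edge and vertex buckets strictly inside q."""
    x1, y1, x2, y2 = q
    return sum(H[u][v] for u in range(2 * x1 + 1, 2 * x2) for v in range(2 * y1 + 1, 2 * y2))
```

The cost of a query depends on the query's size in buckets and not on |S|, which is the property the histogram approach relies on.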
Recall • nii is the number of intersecting objects • What about nei?
Compute nei • The Euler histogram is a histogram of object interiors. • nii can be computed by summing up every bucket inside the query … • … so can nei be computed by summing up every bucket outside the query? • Well, not always.
Problem #1: Crossover Objects • An object whose intersection with the query exterior consists of two disconnected regions is counted once per region • Example: 1 + 1 = 2 • Solution? • Life is tough, live with it
Problem #2: Loophole Effect • An object whose intersection with the query exterior has a hole (a ring around the query) is not counted, because its buckets sum to 0 • Solution? • Assume such objects don't exist (Ncd = 0) • Break the loop somehow
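Both problems are easy to reproduce. The sketch below uses the same assumptions as a grid-aligned Euler histogram: integer corners, and doubled indexing with +1 for faces and vertices and -1 for edges. It sums one object's buckets over the query exterior, that is, everything outside the closed query. The correct contribution to nei is 1 whenever the object's interior reaches outside the query. `naive_exterior_count` is a hypothetical name.

```python
def euler_sign(u, v):
    """Faces (odd, odd) and vertices (even, even) count +1; edges (mixed parity) count -1."""
    return 1 if u % 2 == v % 2 else -1


def naive_exterior_count(obj, q):
    """One object's contribution when nei is estimated by summing the Euler histogram
    over every bucket outside the closed query q. Both obj and q are grid-aligned
    rectangles (x1, y1, x2, y2); buckets use doubled indices."""
    ox1, oy1, ox2, oy2 = obj
    qx1, qy1, qx2, qy2 = q
    total = 0
    for u in range(2 * ox1 + 1, 2 * ox2):          # buckets inside the object's interior
        for v in range(2 * oy1 + 1, 2 * oy2):
            inside_closed_query = 2 * qx1 <= u <= 2 * qx2 and 2 * qy1 <= v <= 2 * qy2
            if not inside_closed_query:
                total += euler_sign(u, v)
    return total


q = (4, 2, 6, 8)
print(naive_exterior_count((0, 0, 2, 2), q))    # disjoint object: 1 (correct)
print(naive_exterior_count((0, 4, 10, 6), q))   # crossover object: 2 instead of 1
print(naive_exterior_count((0, 0, 10, 10), q))  # object containing q: 0 instead of 1
```

The crossover error comes from the exterior part of the object splitting into two rectangles, each with F - E + V = 1. The loophole error comes from the ring around the query having F - E + V = 0.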
Simple EulerApprox • From Interior-Exterior Model: • Assume Ncd = 0 • or
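With Neq = 0 and Ncd = 0, the relation counts follow from nii, nei, and |S| alone. The sketch below depends on a reconstruction of the interior-exterior equations, which are listed in its docstring with query-centric naming: Ncs means the query contains the object, and Ncd means the query is contained in the object. These are not the talk's own formulas. `simple_euler_approx` is a hypothetical name.

```python
def simple_euler_approx(n_ii, n_ei, size):
    """Relation counts for one query, assuming N_eq = 0 and N_cd = 0.

    Assumed relations (a reconstruction, query-centric naming):
        n_ii = N_cs + N_cd + N_o
        n_ei = N_cd + N_o + N_d
        size = N_cs + N_cd + N_o + N_d
    """
    n_d = size - n_ii          # interior never meets the query interior
    n_cs = size - n_ei         # interior never leaves the query
    n_o = n_ii + n_ei - size   # with N_cd = 0, the remaining intersecting objects overlap
    return {"contains": n_cs, "contained": 0, "overlaps": n_o, "disjoint": n_d}
```

Consider a hypothetical query over 12 objects, where 3 lie inside the query, 4 overlap it, 3 are disjoint, and 2 contain it. In that case nii = 9 and the true nei = 9. A loophole-affected sum that misses the 2 containing objects gives nei = 7, and `simple_euler_approx(9, 7, 12)` then reports 5 "contains" instead of 3. This is one way large objects can inflate the error for contains queries.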
EulerApprox • For datasets with large objects, Ncd ≠ 0 • Compute nei' by breaking the loop into Region A and Region B: • Objects that are strictly inside Region A, plus • Objects that intersect Region B
Multi-resolution EulerApprox • Multiple Euler histograms • Use EulerApprox at higher levels • Use Simple EulerApprox at lower levels
Experimental Setup • Datasets • ADL Catalog • California Road Segments • SP_SKEW • SZ_SKEW • Data space 360x180 • Histogram resolution 1x1 • Tile-like queries • Each query set covers the complete data space • 2x2, 3x3, … , 20x20 • (Figure: the SP_SKEW and SZ_SKEW datasets.)
Performance - EulerApprox • For ADL dataset • Worst-case Average Relative Error (ARE) for contains queries drops from 120% to about 15% • For SZ_SKEW dataset • Worst-case ARE for contains queries is around 95%
Performance – Multi-resolution EulerApprox • For ADL dataset with 2 histograms • Worst case for contains queries is about 5% • For SZ_SKEW dataset • With 3 histograms, ARE peaks below 3% • With 4 histograms, ARE peaks at around 1% • With 5 histograms, ARE peaks at about 0.5%
Performance - Timing • Timing performed on a Pentium III 800 MHz desktop • Results • All three algorithms process 12,600 queries in under 25 ms • Simple EulerApprox and EulerApprox are about twice as fast as Multi-resolution EulerApprox • More details in ICDE ’02
Spatial Join Selectivity Estimation • Efficient browsing techniques are essential for digital libraries with large spatial datasets. • Selectivity estimation for browsing • Spatial joins are needed for more sophisticated GIS applications and spatial databases. • “Find all French-speaking regions in Europe” • Expensive; needs optimization
Spatial Join • Find pairs of objects from two datasets that satisfy certain criteria • Intersection Join • Rectangular objects • (a,b) is a join result if • a ∈ A • b ∈ B • a intersects b
Spatial Join with Geometric Selections (SJGS) • General case • (a,b) is a join result, and • a intersects SA • b intersects SB • Special case • SA = SB = S • Applications • Map overlays • Data analysis • … • Selectivity Estimation • Find the number of results
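Both definitions have an exact answer that brute force can compute, and that answer is the baseline an estimate is measured against. The following is a minimal Python sketch that takes closed axis-aligned rectangles given as (x1, y1, x2, y2). The nested loop is quadratic and serves only as a reference. All names are hypothetical.

```python
from itertools import product


def intersects(a, b):
    """Closed axis-aligned rectangles intersect iff their x- and y-extents overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection_join(A, B):
    """Nested-loop intersection join: all (a, b) with a in A, b in B, a intersects b."""
    return [(a, b) for a, b in product(A, B) if intersects(a, b)]


def sjgs_count(A, B, s_a, s_b):
    """Exact SJGS result size: join pairs where a meets S_A and b meets S_B."""
    return sum(1 for a, b in intersection_join(A, B)
               if intersects(a, s_a) and intersects(b, s_b))


A = [(0, 0, 2, 2), (3, 3, 5, 5), (6, 0, 8, 2)]
B = [(1, 1, 4, 4), (7, 1, 9, 3), (10, 10, 11, 11)]
S = (0, 0, 3, 3)
print(len(intersection_join(A, B)), sjgs_count(A, B, S, S))
```

In this example three pairs join, and two of them survive the special case SA = SB = S.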
Related Work … • [AnYS01] • Geometric Histogram • Number of intersection points / 4 • Performs well for full-set spatial joins • Does not handle selections
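For rectangles, the "/ 4" has a simple reason. When two axis-aligned rectangles overlap, their intersection is itself a rectangle. Each of its four corners is either a corner of one input lying inside the other, or a crossing of a vertical edge of one with a horizontal edge of the other. Counting such points over all pairs and dividing by 4 therefore counts the intersecting pairs exactly, assuming general position (no shared x or y coordinates within a pair). The sketch enumerates pairs for clarity. A geometric histogram keeps per-bucket summaries instead, and this sketch does not reproduce them. All names are hypothetical.

```python
def corners(r):
    x1, y1, x2, y2 = r
    return [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]


def strictly_inside(p, r):
    return r[0] < p[0] < r[2] and r[1] < p[1] < r[3]


def intersection_points(a, b):
    """Corners of a inside b, corners of b inside a, and crossings of one rectangle's
    vertical edges with the other's horizontal edges (general position assumed)."""
    pts = sum(strictly_inside(c, b) for c in corners(a))
    pts += sum(strictly_inside(c, a) for c in corners(b))
    for v, h in ((a, b), (b, a)):
        for x in (v[0], v[2]):          # vertical edges of v
            for y in (h[1], h[3]):      # horizontal edges of h
                if h[0] < x < h[2] and v[1] < y < v[3]:
                    pts += 1
    return pts


def estimate_join_size(A, B):
    """Total intersection points over all pairs, divided by 4."""
    return sum(intersection_points(a, b) for a in A for b in B) / 4


print(intersection_points((0, 0, 2, 2), (1, 1, 3, 3)))  # corner overlap
print(intersection_points((0, 1, 4, 2), (1, 0, 2, 3)))  # cross shape, no corners inside
```

Both calls print 4. In the first, two corners fall inside and two edges cross. In the second, all four points are edge crossings.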
… Related Work • [MamoulisP01] • Histogram-based approach • Complete Solution for SJGS • General cases • Multi-way join • Strong Uniformity Assumption • Object centers are uniformly distributed • Objects have roughly same widths and heights
Euler Histogram for SJGS • Sum the products HA × HB inside S: faces minus edges plus vertices • Example: (1x1 + 2x2 + 1x1 + 2x1) – (1x1 + 1x1 + 2x1 + 1x1) + (1x1) = 4
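Here is a minimal Python sketch of the SJGS sum. It assumes grid-aligned rectangles for both datasets and a grid-aligned selection window S. It keeps unsigned coverage counts HA and HB on a doubled-index array, adds their products on faces and vertices, and subtracts them on edges. For grid-aligned data, the buckets that a and b share inside S form the rectangle a ∩ b ∩ S, so each such pair contributes exactly 1. The sum therefore counts pairs whose common region reaches into S. This is an assumption about how the special case SA = SB = S is evaluated. All names are hypothetical.

```python
def euler_sign(u, v):
    """Faces (odd, odd) and vertices (even, even) count +1; edges (mixed parity) count -1."""
    return 1 if u % 2 == v % 2 else -1


def coverage(rects, n, m):
    """Unsigned counts on the doubled lattice: C[u][v] = number of grid-aligned
    rectangles whose open interior covers face/edge/vertex (u, v)."""
    C = [[0] * (2 * m + 1) for _ in range(2 * n + 1)]
    for x1, y1, x2, y2 in rects:
        for u in range(2 * x1 + 1, 2 * x2):
            for v in range(2 * y1 + 1, 2 * y2):
                C[u][v] += 1
    return C


def sjgs_euler(A, B, s, n, m):
    """Sum of HA * HB inside window s: + on faces and vertices, - on edges."""
    CA, CB = coverage(A, n, m), coverage(B, n, m)
    x1, y1, x2, y2 = s
    return sum(euler_sign(u, v) * CA[u][v] * CB[u][v]
               for u in range(2 * x1 + 1, 2 * x2)
               for v in range(2 * y1 + 1, 2 * y2))
```

When objects do not align with the grid, a bucket that both objects only partly cover no longer guarantees that they meet there. That is exactly the limitation the next slide raises.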
Euler Histogram Revisited • Limitation • Cannot represent fractions (objects that only partly cover a bucket) • Solution • More information per bucket
Generalized Euler Histogram Framework … • pk is the probability of a set of objects intersecting another set of objects inside bucket hk • (Figure: buckets labeled with probabilities p0 = 1, p1, p2.)
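The framework can be written as a single sum. This formula is an interpretation and not the talk's own. It reads the subscript of p as the dimension of the bucket: 0 for a vertex, 1 for an edge, and 2 for a face. Each Euler-signed product of counts is then weighted by the probability for that bucket. With grid-aligned objects every probability is 1, and the sum reduces to the faces minus edges plus vertices computation used for SJGS.

```latex
\widehat{N}_{S} \;=\; \sum_{h \in S} (-1)^{\dim h}\, p(h)\, H_A(h)\, H_B(h),
\qquad p(h) = 1 \text{ when } \dim h = 0
```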
… Generalized Euler Histogram Framework • Calculate p2 • Probabilistic model: assumptions about the data distribution inside a bucket • Statistics: average height, average width, average area … • (Figure: histograms HA and HB.)
Possible Probabilistic models • [MamoulisP01] model • Uses average object height and width • [AnYS01] model • Uses average object height, width and area. • [SunAE02] hybrid model • Uses a hybrid of two models
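To illustrate what a model computes for p2, here is a simple uniformity-based estimate that uses only average widths and heights, in the spirit of the [MamoulisP01] model. The exact formula is an assumption and is not taken from the cited paper. It treats an A-object as fixed, takes the B-object's center as uniform over the bucket, and ignores boundary effects. Under these assumptions the probability is the Minkowski-sum extent over the bucket extent, capped at 1 per axis. All names are hypothetical.

```python
def pair_intersect_prob(w_a, h_a, w_b, h_b, cell_w, cell_h):
    """Probability that an A-object and a B-object in the same bucket intersect.

    Hypothetical uniformity model: b's center is uniform over the cell_w x cell_h
    bucket, and b must land in a's Minkowski-sum box of size (w_a + w_b) x (h_a + h_b).
    Boundary effects are ignored; each axis factor is capped at 1."""
    px = min(1.0, (w_a + w_b) / cell_w)
    py = min(1.0, (h_a + h_b) / cell_h)
    return px * py


def face_bucket_pairs(n_a, n_b, w_a, h_a, w_b, h_b, cell_w, cell_h):
    """Expected intersecting pairs among n_a A-objects and n_b B-objects in one face bucket."""
    return n_a * n_b * pair_intersect_prob(w_a, h_a, w_b, h_b, cell_w, cell_h)


print(pair_intersect_prob(1, 1, 1, 1, 4, 4))          # 0.25
print(face_bucket_pairs(10, 8, 1, 1, 1, 1, 4, 4))     # 20.0
```

A model that also uses average area, as the [AnYS01]-based and hybrid models do, would replace this product with a different expression. The surrounding Euler sum would stay the same.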
Discussion • Estimation is important for performance optimization in GIS and spatial databases. • More details in EDBT ’02. • Many issues remain: • Explore alternative probabilistic models • General SJGS queries • Alternative queries, e.g., containment
Spatial Query Processing • Filtering Step • MBR / Index • Find candidate objects • Refinement Step • Polygons • Find final results
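The two steps can be sketched for a window query over simple polygons. This is a minimal pure-Python illustration with hypothetical helper names. The filter step compares minimum bounding rectangles (MBRs). The refinement step runs an exact but brute-force test, which checks polygon vertices in the window, window corners in the polygon, and every edge pair, rather than the O(N log N) algorithm mentioned later. The L-shaped polygon in the example is a concave shape whose MBR passes the filter even though the polygon itself misses the window.

```python
def mbr(poly):
    xs, ys = [p[0] for p in poly], [p[1] for p in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def on_segment(p, q, r):
    """For r collinear with segment pq: does r lie within the segment's bounding box?"""
    return min(p[0], q[0]) <= r[0] <= max(p[0], q[0]) and min(p[1], q[1]) <= r[1] <= max(p[1], q[1])


def segments_intersect(p1, p2, p3, p4):
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and on_segment(p3, p4, p1)) or (d2 == 0 and on_segment(p3, p4, p2)) or
            (d3 == 0 and on_segment(p1, p2, p3)) or (d4 == 0 and on_segment(p1, p2, p4)))


def point_in_polygon(pt, poly):
    """Even-odd ray casting; points exactly on the boundary may go either way."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def edges(poly):
    return list(zip(poly, poly[1:] + poly[:1]))


def polygon_intersects_window(poly, w):
    """Exact test of a simple polygon against a closed window w = (x1, y1, x2, y2)."""
    corners = [(w[0], w[1]), (w[2], w[1]), (w[2], w[3]), (w[0], w[3])]
    if any(w[0] <= x <= w[2] and w[1] <= y <= w[3] for x, y in poly):
        return True   # a polygon vertex lies in the window
    if any(point_in_polygon(c, poly) for c in corners):
        return True   # a window corner lies inside the polygon
    return any(segments_intersect(a, b, c, d) for a, b in edges(poly) for c, d in edges(corners))


def window_query(polys, w):
    """Filter on MBRs, then refine the candidates with the exact test."""
    candidates = [p for p in polys if boxes_overlap(mbr(p), w)]
    results = [p for p in candidates if polygon_intersects_window(p, w)]
    return candidates, results


L_shape = [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)]
triangle = [(20, 20), (24, 20), (22, 23)]
print(window_query([L_shape, triangle], (5, 5, 8, 8)))   # L_shape is a candidate but not a result
```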
Refinement • Costs • I/O • Computation • Spatial Selection • For polygon objects, both costs are significant [KothuriR01] • Spatial Join
Computation Cost • Complexity of the data • The Alaska polygon has more than 70,000 vertices • Arbitrary shapes • Concave • Non-simple • Complexity of the algorithms • O(N log N) intersection test • O(N²) distance calculation
Reducing Computation Cost • Better filtering for intersection queries • Convex hull, n-corner, MER … [BrinkhoffKSS94] • Tiling [ZimbraoS98, BadawyA99, KothuriR01] • More efficient intersection test • TR* Tree [BrinkhoffKSS94]
Graphics Hardware • Handles points, lines, and polygons • Fast • Real-time simulation, VR, computer games … • Sophisticated • Graphics processor • GeForce4: 63M transistors • General-purpose processors • AthlonXP: 37.5M transistors • Pentium4: 55M transistors • Ubiquitous • From workstations to desktops to laptops
Graphics HW for Non-Visualization Applications • Interference and Collision Detection • [ShinyaF91] • [RossignacMS93] • [BaciuWS99] • Generalized Voronoi Diagram • [HoffCKLM99] • 2D Intersection Detection • [HoffZLM01]