
An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS

Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger. Department of Computer Science, University of Denver. {wbae, salkobai, leut}@cs.du.edu


Presentation Transcript


  1. An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger Department of Computer Science University of Denver {wbae, salkobai, leut}@cs.du.edu

  2. Outline • Introduction • Motivation • Spatial Join Estimation • IRSJ Algorithm • Sampling • Joining • Statistics • Experiments • Conclusion

  3. Introduction • GIS data is used to describe the geometry and location of geographic phenomena. • GIS data is represented in two ways: • Raster: divides the world into cells. • Vector: defines features based on coordinate-based structures (e.g. point, line, polygon). • This paper targets vector data.

  4. Introduction Cont. • Geographic or spatial queries are applied to spatially indexed databases. • e.g. containment and intersection. • We focus on finding the number of intersections of two spatial datasets (spatial joins). • e.g. number of roads that intersect rivers in the US.

  5. Spatial Joins • Spatial joins relate two data sets that share locations in space. • Spatial queries can be processed faster when a spatial index such as an R-tree exists. • A spatial join of two R-trees can be computed by traversing both trees in a synchronized fashion, descending only into pairs of nodes whose bounding rectangles intersect, to find intersecting items.
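The synchronized traversal described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, the `intersects` helper, and `rtree_join` are hypothetical names, and a node without children is assumed to be a data entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    """Hypothetical R-tree node; a node with no children is a data entry."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    label: str = ""


def rtree_join(r: Node, s: Node, out: List[Tuple[str, str]]) -> None:
    """Collect all pairs of intersecting data entries from the two trees."""
    if not intersects(r.mbr, s.mbr):
        return  # prune: nothing below these two nodes can intersect
    if not r.children and not s.children:
        out.append((r.label, s.label))
        return
    # Descend on every side that still has children.
    r_kids = r.children or [r]
    s_kids = s.children or [s]
    for rc in r_kids:
        for sc in s_kids:
            rtree_join(rc, sc, out)


rivers = Node((0, 0, 6, 6), [Node((0, 0, 2, 2), label="r1"),
                             Node((5, 5, 6, 6), label="r2")])
cities = Node((1, 1, 11, 11), [Node((1, 1, 3, 3), label="s1"),
                               Node((10, 10, 11, 11), label="s2")])
pairs: List[Tuple[str, str]] = []
rtree_join(rivers, cities, pairs)
print(pairs)  # [('r1', 's1')]
```

The pruning test at the top is what makes the traversal cheaper than comparing every pair of entries: whole subtrees are skipped when their bounding rectangles do not intersect.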

  6. Spatial Joins Cont. [Figure: two R-trees, R (Rivers) with nodes R1, R2 and entries r1-r5, and S (Cities) with nodes S1, S2 and entries s1-s4, shown with their spatial layout.]

  7. Motivation • GIS supports very large data sets, so finding exact answers to spatial queries can be very time consuming. • In GIS data analysis, a fast estimate of the final result with error bounded to 2%-10% is often sufficient. • So, provide an approximate answer through an incremental refining process. • This allows for more interactive data exploration.

  8. Examples • “What are the intersections of mineral plants and radiometric ages areas in the US?” • “Where do mineral resources intersect geochemical sediments in the US?”

  9. Examples Cont. [Figure panels: locations of geochemical sediments in CO; locations of mineral resources in CO; intersections.]

  10. Spatial Join Estimation • Parametric: uses some properties of data distribution to present a formula for the estimation. (e.g. power law, fractal dimension). • Histograms: keep certain information for different regions of the data to be used when a query is given. (e.g. Geometric Histogram, Euler Histogram).

  11. Spatial Join Estimation Cont. • Sampling: uses smaller data sets (samples) to calculate an estimate of the final result by applying the join on the sample.

  12. Incremental Refining Join Process [Diagram elements: Dataset 1 (R), Dataset 2 (S), Sampling, Samples, WQ, # intersections (intermediate result), Statistics, Final Estimation w/ CI, Report, User. Legend: Incremental Process.]

  13. Random Sampling • We assume that both data sets are indexed using R-trees. • Samples are chosen from one R-tree called the outer relation R. • Samples are used as window queries to query the inner relation S. • Randomness: • Acceptance/Rejection method: inclusion probability is proportional to some parameter of the item sampled.
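One way the acceptance/rejection idea can be applied is sketched below. It is an assumption for illustration, not necessarily the scheme used by the authors: a leaf is picked uniformly, accepted with probability proportional to its occupancy, and then one tuple is picked from it, which makes every tuple equally likely. The function name `sample_tuple` and the parameter `max_fanout` are hypothetical.

```python
import random
from collections import Counter
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_tuple(leaves: Sequence[Sequence[T]], max_fanout: int,
                 rng: random.Random) -> T:
    """Draw one tuple uniformly at random from leaves of unequal size.

    Assumes at least one leaf is non-empty and that no leaf holds more
    than max_fanout tuples.
    """
    while True:
        leaf = leaves[rng.randrange(len(leaves))]
        # Accept the leaf with probability len(leaf) / max_fanout, so a
        # fuller leaf is kept more often. Each tuple then has the same
        # chance 1 / (len(leaves) * max_fanout) per attempt.
        if rng.random() < len(leaf) / max_fanout:
            return leaf[rng.randrange(len(leaf))]


rng = random.Random(42)
leaves = [["a"], ["b", "c", "d"]]
counts = Counter(sample_tuple(leaves, 3, rng) for _ in range(6000))
print(sorted(counts.items()))  # the four counts should be roughly equal
```

Without the rejection step, the single tuple in the first leaf would be drawn about three times as often as each tuple in the second leaf.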

  14. Tuple and Page Level Sampling • Tuple-level: • A page is selected at random from R and one tuple (MBR) of that page is chosen at random. • Page-level: • A page is selected at random from R and all tuples (MBRs) of that page are used as a sample.
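The difference between the two sampling units can be sketched as follows, assuming each page of R is available as a non-empty list of MBRs. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def tuple_level_sample(pages: Sequence[Sequence[Rect]],
                       rng: random.Random) -> List[Rect]:
    """Pick a random page, then one random MBR from that page."""
    page = rng.choice(pages)
    return [rng.choice(page)]


def page_level_sample(pages: Sequence[Sequence[Rect]],
                      rng: random.Random) -> List[Rect]:
    """Pick a random page and use every MBR on it."""
    page = rng.choice(pages)
    return list(page)


rng = random.Random(0)
pages = [[(0, 0, 1, 1), (1, 1, 2, 2)], [(5, 5, 6, 6)]]
print(len(tuple_level_sample(pages, rng)))  # 1
print(len(page_level_sample(pages, rng)))   # size of whichever page was drawn
```

Page-level sampling yields more window queries per page read, but the MBRs on one page tend to be spatially close, so their intersection counts are likely to be correlated.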

  15. Window Query • The chosen MBR from one data set (R-tree) serves as a window query against the other data set to find the intersections. • The query returns all the objects from the second data set that overlap the query window. • The number of intersections found is used to compute an approximate answer to the query.
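A window query against an R-tree can be sketched as a recursive descent that skips subtrees whose bounding rectangle misses the window. The `Node` class and function names are hypothetical, and a node without children is assumed to be a data entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    """Hypothetical R-tree node; a node with no children is a data entry."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    label: str = ""


def window_query(node: Node, window: Rect) -> List[str]:
    """Return labels of all data entries whose MBR intersects the window."""
    if not intersects(node.mbr, window):
        return []  # prune the whole subtree
    if not node.children:
        return [node.label]
    hits: List[str] = []
    for child in node.children:
        hits.extend(window_query(child, window))
    return hits


cities = Node((1, 1, 11, 11), [
    Node((1, 1, 4, 4), [Node((1, 1, 3, 3), label="s1"),
                        Node((2, 2, 4, 4), label="s2")]),
    Node((10, 10, 11, 11), [Node((10, 10, 11, 11), label="s3")]),
])
river_mbr = (0, 0, 2.5, 2.5)  # sampled MBR from R used as the query window
print(window_query(cities, river_mbr))  # ['s1', 's2']
```

The length of the returned list is the intersection count that the estimation step consumes.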

  16. Window Query Example [Figure: R-trees R (Rivers) with nodes R1, R2 and entries r1-r5, and S (Cities) with nodes S1, S2 and entries s1-s4, illustrating a window query.]

  17. Estimated Value and Confidence Interval • Estimated Value: the statistic computed from sample information. • Population Proportion: the fraction of the population that has a particular attribute of interest. • Confidence interval: a range of values that contains a population parameter with a specified probability. • The specified probability is called the level of confidence.
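The slides do not give the formulas used, so the sketch below is an assumption: it scales the mean intersection count per sampled tuple by the size of R and uses the standard normal approximation for the confidence interval of a sample mean. The function name `estimate_join_size` and its parameters are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def estimate_join_size(counts: Sequence[int], n_outer: int,
                       z: float = 1.96) -> Tuple[float, float]:
    """Return (estimated value, CI half-width) for the join size.

    counts  : intersection count of each sampled window query
    n_outer : number of tuples in the outer relation R
    z       : normal quantile; 1.96 corresponds to a 95% confidence level
    """
    n = len(counts)
    mean = statistics.fmean(counts)
    sd = statistics.stdev(counts) if n > 1 else 0.0
    half_width = z * sd / math.sqrt(n)
    # Scale the per-tuple figures up to the whole outer relation.
    return n_outer * mean, n_outer * half_width


ev, hw = estimate_join_size([2, 0, 1, 3, 2, 0, 1, 3], n_outer=1000)
print(ev)  # 1500.0
print(f"95% CI: {ev - hw:.0f} to {ev + hw:.0f}")
```

The half-width shrinks in proportion to the square root of the sample size, which is why the estimate can be refined incrementally by drawing more samples.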

  18. IRSJt Algorithm • C ← 0; CI ← 0 {count, confidence interval} • repeat • for i = 0 to k do • L ← Choose leaf from R at random • M ← MBR of a randomly chosen tuple within L • I ← number of intersections of a Window Query (M, S) • C ← C + I • end for • CI ← Compute confidence interval using C • EV ← Compute estimated value using C • until the desired confidence interval Cf is attained
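The loop on this slide can be sketched in runnable form as below. Several parts are assumptions made for illustration: the inner relation is scanned by brute force instead of through an R-tree, the stopping rule is a relative confidence-interval half-width, and the estimate uses the normal approximation for a sample mean. All names (`irsj_t`, `rel_ci`, `max_rounds`) are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def irsj_t(outer_leaves: Sequence[Sequence[Rect]], inner: Sequence[Rect],
           k: int, rel_ci: float, rng: random.Random,
           z: float = 1.96, max_rounds: int = 1000) -> Tuple[float, float, int]:
    """Tuple-level incremental estimate of the number of intersections.

    Returns (estimated value, CI half-width, number of samples used).
    Assumes every leaf is non-empty.
    """
    n_outer = sum(len(leaf) for leaf in outer_leaves)
    counts: List[int] = []
    ev, hw = 0.0, math.inf
    for _ in range(max_rounds):
        for _ in range(k):
            leaf = rng.choice(outer_leaves)   # L: random leaf of R
            m = rng.choice(leaf)              # M: random MBR within L
            # I: window query of M against S (brute force stand-in)
            counts.append(sum(intersects(m, s) for s in inner))
        mean = statistics.fmean(counts)
        sd = statistics.stdev(counts) if len(counts) > 1 else math.inf
        ev = n_outer * mean
        hw = n_outer * z * sd / math.sqrt(len(counts))
        if ev > 0 and hw / ev <= rel_ci:      # desired interval attained
            break
    return ev, hw, len(counts)


outer = [[(0, 0, 1, 1)] * 5 for _ in range(4)]          # 20 tuples in R
inner = [(0.5, 0.5, 2, 2), (-1, -1, 0.2, 0.2), (5, 5, 6, 6)]
print(irsj_t(outer, inner, k=10, rel_ci=0.05, rng=random.Random(1)))
```

Picking a leaf uniformly and then a tuple uniformly favors tuples in sparsely filled leaves; when leaf occupancy varies, an acceptance/rejection step on the leaf would be needed to keep the sample uniform.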

  19. Experiments (settings and data sets) • IRSJ compared to full R-tree join. • Confidence level set to 95%. • Varied buffer size and data size. • Data sets: • Synthetic: U x S, S x U, U x U (# of tuples in each relation varied from 100,000 to 600,000). • Real: from the U.S. Geological Survey: • Mineral Resources in the US 2005 (300,432 tuples). • Geochemistry of unconsolidated sediments in the US 2001 (199,850 tuples).

  20. Experiments Cont. (Synthetic data results) Estimated Value U-600K x S-400K

  21. Synthetic Cont. Confidence Interval U-600K x S-400K

  22. Synthetic Cont. I/Os with 10% buffer

  23. Synthetic Cont. R-tree join Ratio to IRSJt

  24. Experiments Cont. (Real data results) Estimated Value

  25. Real Cont. Confidence Interval

  26. Real Cont. I/O and Node Accesses of IRSJt and a full R-tree join

  27. Conclusion • Proposed Incremental Refining Spatial Join: • Page-level (IRSJp) • Tuple-level (IRSJt) • Experimental results showed: • IRSJ provides a reasonably accurate estimate much earlier than a full R-tree join produces the exact answer. • IRSJt performs better than IRSJp. • As the data size increased, the improvement of IRSJt over the full R-tree join increased.

  28. [Diagram elements: Dataset 1 (R), Dataset 2 (S), Sampling, Samples, WQ, # intersections (intermediate result), Statistics, Final Estimation w/ CI, Report, User. Caption: Incremental Joining Process. Legend: Dataset, Output / Input, Process.]

  29. [Figure: R (Rivers) with leaves L1, L2 and entries r1-r5; S (Cities) with nodes S1, S2 and entries s1-s4.]

  30. [Figure: R (Rivers) with leaves L1, L2 and entries r1-r5.]

  31. [Figure: R (Rivers) with leaves L1-L6, entries r1-r12, and labels ST1-ST3.]

  32. [Figure: R (Rivers) with leaves L1-L6 and entries r1-r12.]

  33. [Diagram elements: Dataset 1 (R), Dataset 2 (S), Sampling, Samples, WQ, # intersections (intermediate result), Statistics, Final Estimation w/ CI, Report, User, and a Stop decision (Yes / No) leading to End. Legend: Dataset, Output / Input, Transition step, Process.]

  34. [Figure: R (Rivers) with nodes R1-R4, leaves L1-L8, entries r1-r12, and labels ST1, ST2.]

  35. [Figure: R (Rivers) with nodes R1-R4, leaves L1-L8, entries r1-r18, and labels ST1, ST2.]
