
Spatial Databases - Indexing

Spatial Databases - Indexing. Spring 2010, Ki-Joune Li. What is indexing? Indexing is a fight against time. Example: suppose you have a copy of Hamlet and want to know the name of Hamlet’s father. Without an index: a full (sequential) scan of the book. With an index: direct access to the page.


Presentation Transcript


  1. Spatial Databases - Indexing • Spring 2010 • Ki-Joune Li

  2. What Is Indexing? • Indexing: Fight against TIME • Example: suppose you have a copy of Hamlet and want to know the name of Hamlet’s father. • Without an Index: Full (Sequential) Scan of the book • With an Index: Direct Access to the Page

  3. Some Constraints • Modern Databases • Very Large Volume: e.g. several petabytes • Storage on Disk • Inevitable • But slow compared with main memory: milliseconds vs. nanoseconds • Even in Main-Memory Database Systems • What should we do? Minimize the number of Disk Accesses

  4. The Objective of Indexing • [Figure: a Query Condition goes through the Index (Indexing), which returns a Disk Address (Block Number) in the Database on Disk]

  5. Classification of Indexing • According to the type of query and data • Alphanumeric query • Image • Spatial • Example: What is the nearest post office to the Louvre Museum? • [Figure: a Spatial Query with a Spatial predicate goes through the Spatial Index, which returns a Disk Address (Block Number) in the Database on Disk]

  6. Spatial Query • Sophisticated • Types of Spatial Query • One-Scan Query • Region Query: Containment, Intersection • K-Nearest Neighbor Query • Multi-Scan Query: Join • Spatial Join • Distance Join • Spatial Query Processing • Tightly coupled with the Spatial Indexing Method

  7. Spatial Processing Strategy • Filtering and Refinement Strategy • Filtering: the Spatial Query is run against the Index, which holds a Simplification of the Geometry, and returns Candidates • Refinement: Verification of the Candidates against the Complete Data gives the Result • Benefits • 1. Lighter Index: e.g. < 1 MB • 2. Removes Unnecessary Disk Accesses
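
The filtering and refinement strategy above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the names `Rect`, `mbr`, `filter_step`, and `refine_step` are hypothetical, the "index" is a plain dictionary of bounding rectangles, and each object is assumed to be a set of points so that the exact test stays short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains_point(self, p):
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def mbr(points):
    """Simplified geometry: the bounding rectangle of an object's points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def filter_step(index, window):
    """Cheap test on the simplified geometry only; may return false hits."""
    return [oid for oid, box in index.items() if box.intersects(window)]


def refine_step(candidates, database, window):
    """Exact test on the complete geometry (assumed here: any point in the window)."""
    return [oid for oid in candidates
            if any(window.contains_point(p) for p in database[oid])]


# Complete data (would live on disk) and the much smaller index built from it.
database = {
    "a": [(1, 1), (2, 2)],
    "b": [(0, 10), (10, 0)],   # large MBR, but no point near the window
    "c": [(20, 20), (21, 21)],
    "d": [(5, 5), (7, 7)],
}
index = {oid: mbr(pts) for oid, pts in database.items()}

window = Rect(4, 4, 6, 6)
candidates = filter_step(index, window)
result = refine_step(candidates, database, window)
```

In this toy data, object `b` survives filtering because its bounding rectangle overlaps the window, and is then discarded by refinement; only the candidates need to be read from the database.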

  8. Classification of Spatial Indexing Methods • Hashing and Indexing • Index (in the wide sense) • Hashing, Indexing (in the narrow sense) • Space Decomposition vs. MBR • Decomposition of a space: the Whole Space • Bounding Rectangle: Only the Area of Interest • Dimensionality • No Transformation • Transformation to a Higher Dimension • Transformation to a Lower Dimension: Linearization

  9. Indexing vs. Hashing • Hashing • 1. b = h(r.key) • 2. Store(r, b) • Block number is determined by the hashing function or mechanism • Only for a primary index • Search by a hashing function • Indexing (in the narrow sense) • 1. b = Store(r) • 2. Insert(B, (r.key, b)) • Block number is independent of the indexing mechanism • For a primary or secondary index • Search by a data structure called an index
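
The two insertion sequences on this slide can be contrasted in a small sketch. Everything here is an assumption made for illustration: `HashFile` and `IndexedFile` are hypothetical names, blocks are Python lists rather than disk pages, the hash function is a toy, keys are assumed unique, and a sorted list stands in for the index structure B.

```python
import bisect


class HashFile:
    """Hashing: the block number is computed from the key."""

    def __init__(self, num_blocks):
        self.blocks = [[] for _ in range(num_blocks)]

    def _h(self, key):
        # Toy deterministic hash function (an assumption, not from the slides).
        return sum(key.encode("utf-8")) % len(self.blocks)

    def insert(self, key, record):
        b = self._h(key)                        # 1. b = h(r.key)
        self.blocks[b].append((key, record))    # 2. Store(r, b)
        return b

    def search(self, key):
        b = self._h(key)                        # search by the hash function
        return next((rec for k, rec in self.blocks[b] if k == key), None)


class IndexedFile:
    """Indexing: storage picks the block, the index remembers (key, block)."""

    def __init__(self, block_capacity):
        self.block_capacity = block_capacity
        self.blocks = [[]]
        self.keys = []       # sorted keys; stand-in for the index structure B
        self.block_of = []   # block numbers, parallel to self.keys

    def _store(self, key, record):
        # The storage layer appends to the last block with free space.
        if len(self.blocks[-1]) >= self.block_capacity:
            self.blocks.append([])
        self.blocks[-1].append((key, record))
        return len(self.blocks) - 1

    def insert(self, key, record):
        b = self._store(key, record)            # 1. b = Store(r)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)                # 2. Insert(B, (r.key, b))
        self.block_of.insert(i, b)
        return b

    def search(self, key):
        i = bisect.bisect_left(self.keys, key)  # search by the index structure
        if i == len(self.keys) or self.keys[i] != key:
            return None
        b = self.block_of[i]
        return next(rec for k, rec in self.blocks[b] if k == key)


hash_file = HashFile(num_blocks=4)
hash_file.insert("louvre", "museum")

indexed_file = IndexedFile(block_capacity=2)
placed = [indexed_file.insert(k, k.upper()) for k in ("c", "a", "b")]
```

Because `IndexedFile` keeps the placement decision separate from the index, a second index on another attribute could point at the same blocks, which is why the slide allows indexing for secondary indexes but hashing only for a primary one.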

  10. Decomposition vs. Bounding Region • [Figure: two panels, Decomposition and Bounding Region]

  11. Decomposition Methods • Grid File: An Extension of Hashing to 2-D • Variations • Fixed Grid • Grid File • Multi-Level Grid File • Hierarchical Data Structures • KD-tree • Quadtree • skd-tree • etc.

  12. Fixed Grid • Simplest Method • Minimum Data for Hashing • Query steps • 1. Find intersecting grid cells • 2. Find the corresponding blocks • 3. Read objects from the blocks • 4. Refinement • [Figure: fixed grid over x = 0 to 50 and y = 0 to 40 in steps of 10, with a Query Window; one cell is labeled as 1 Disk Page]
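
The four query steps on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the cell size of 10 matches the axis ticks in the figure, each cell is assumed to map to exactly one block (modeled as a list), and the names `cell_of`, `build_grid`, and `window_query` are hypothetical.

```python
from collections import defaultdict

CELL = 10  # cell width and height (assumed from the 10-unit axis ticks)


def cell_of(x, y):
    """Hashing step: the cell, and hence the block, follows from the coordinates."""
    return (int(x // CELL), int(y // CELL))


def build_grid(points):
    grid = defaultdict(list)  # cell -> objects stored in that cell's block
    for p in points:
        grid[cell_of(*p)].append(p)
    return grid


def window_query(grid, xmin, ymin, xmax, ymax):
    # 1. Find the grid cells that intersect the query window.
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    cells = [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]
    # 2. and 3. Find the corresponding blocks and read their objects.
    candidates = [p for c in cells for p in grid.get(c, [])]
    # 4. Refinement: keep only the objects that really lie in the window.
    result = [p for p in candidates
              if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    return cells, candidates, result


points = [(5, 5), (12, 18), (19, 11), (14, 22), (33, 7), (48, 39)]
grid = build_grid(points)
cells, candidates, result = window_query(grid, 11, 15, 25, 25)
```

Two of the four cells touched by this window are empty, so reading their blocks is wasted work; that is the dead-space problem in miniature.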

  13. Problems of Fixed Grid • Only for Point Objects • Objects with extent: duplicated storage • Degrades performance • Large Dead Space • Causes Unnecessary Disk Accesses • Not very Flexible • With respect to the data Distribution • [Figure: the fixed grid over x = 0 to 50 and y = 0 to 40 with a Query Window]

  14. Grid File • To overcome the problems of Fixed Grid • Reduce Dead Space within a cell • Increase the Blocking Factor • Directory (Grid, Boundary, Block#): A, (0,0),(15,20), Page 0 • B, (15,0),(30,20), Page 1 • ... • I, (30,28),(50,40), Page 15 • [Figure: grid with non-uniform cell boundaries and a Query Window]

  15. Blocking Factor (Bf) • A Key Factor in performance • Number of Objects in a Disk Block • Determines the Number of Disk Accesses • How to increase Bf? • Increase the Block Size: not always possible • Packing

  16. Problems of Grid File • Only for Point Objects • Still Large Dead Space • Large Size of Directory • [Figure: the Grid File directory from slide 14]

  17. Hierarchical Decomposition • To overcome the size of the directory in Grid File • Hierarchical Structure of the Directory • Acceleration of Search

  18. KD-tree: Index • Extension of the Binary Tree to K Dimensions (K = 2 for us) • Example: suppose Bf = 3 • Each leaf node points to a disk page • [Figure: a Directory tree with split nodes x=20, y=10, y=20, x=30 and branches labeled =< and <, next to the partitioned space with regions A to E]

  19. KD-tree: Search • [Figure: the directory tree with split nodes x=20, y=10, y=20, x=30 and the partitioned space with regions A to E, used to illustrate a search]
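
A KD-tree directory with bucket leaves of capacity Bf = 3, and the descent used by a point search, might look like the sketch below. The median split rule, the dictionary node layout, and the names `build` and `find_leaf` are assumptions for illustration; the slides do not specify how split values are chosen.

```python
BF = 3  # blocking factor: maximum number of points per leaf (disk page)


def build(points, depth=0):
    """Build a KD-tree whose leaves hold at most BF points."""
    if len(points) <= BF:
        return {"leaf": sorted(points)}
    axis = depth % 2                              # alternate between x and y
    values = sorted(p[axis] for p in points)
    split = values[(len(values) - 1) // 2]        # median value on this axis
    left = [p for p in points if p[axis] <= split]
    right = [p for p in points if p[axis] > split]
    if not right:
        # Every point shares this coordinate: keep an oversized leaf
        # rather than recurse forever (a simplification of this sketch).
        return {"leaf": sorted(points)}
    return {"axis": axis, "split": split,
            "left": build(left, depth + 1),
            "right": build(right, depth + 1)}


def find_leaf(node, point):
    """Search: one comparison per directory node, then read one leaf page."""
    while "leaf" not in node:
        go_left = point[node["axis"]] <= node["split"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


points = [(5, 5), (10, 25), (15, 12), (22, 8),
          (28, 30), (35, 18), (40, 3), (45, 35)]
tree = build(points)
leaf = find_leaf(tree, (15, 12))
```

With this data the root splits on x = 22 and both children split on y, so the search for (15, 12) reads exactly one leaf.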

  20. Weak Points of KD-tree • Only for Point Objects • Dead Space • How to Store the Tree Structure on Disk • Blocking Problem • Widely used as a main-memory index • Rarely used as a disk-resident index • Unbalanced Tree • Zipf’s Law (or 80/20 law) • Most events are concentrated • Leads to a highly skewed tree

  21. Quadtree • Extension of KD-tree • KD-tree: binary split • Quadtree: 4-way equi-split instead • Example: Bf = 3 • Each leaf node points to a disk page • [Figure: a space recursively split into quadrants and the corresponding tree with leaves A to J]
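
The 4-way equi-split can be sketched with a small bucket quadtree. The class name `QuadNode`, the quadrant numbering, and the assumption that all inserted points are distinct are choices made for this illustration only.

```python
BF = 3  # blocking factor: a leaf splits when it holds more than BF points


class QuadNode:
    def __init__(self, xmin, ymin, xmax, ymax):
        self.box = (xmin, ymin, xmax, ymax)
        self.points = []       # used only while this node is a leaf
        self.children = None   # four sub-quadrants once the node has split

    def _child_for(self, p):
        xmin, ymin, xmax, ymax = self.box
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        i = (1 if p[0] >= xm else 0) + (2 if p[1] >= ym else 0)
        return self.children[i]

    def _split(self):
        xmin, ymin, xmax, ymax = self.box
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        # Equi-split: four equal quadrants, regardless of where the data is.
        self.children = [QuadNode(xmin, ymin, xm, ym), QuadNode(xm, ymin, xmax, ym),
                         QuadNode(xmin, ym, xm, ymax), QuadNode(xm, ym, xmax, ymax)]
        pts, self.points = self.points, []
        for q in pts:
            self._child_for(q).insert(q)

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > BF:   # overflow (assumes distinct points)
            self._split()

    def leaf_for(self, p):
        node = self
        while node.children is not None:
            node = node._child_for(p)
        return node

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(c.depth() for c in self.children)


root = QuadNode(0, 0, 40, 40)
for p in [(1, 1), (2, 2), (3, 3), (4, 4), (35, 35)]:
    root.insert(p)
```

Four points clustered in one corner force four levels of splitting here, while the lone point at (35, 35) sits one level below the root. Concentrated data producing a skewed tree is the weakness that the point quadtree variant tries to reduce.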

  22. Weak Points of Quadtree • Same Problems as KD-tree, in addition to the lack of flexibility • Only for Point Objects • Dead Space • How to Store the Tree Structure on Disk • Blocking Problem • Widely used as a main-memory index • Rarely used as a disk-resident index • Unbalanced Tree • Zipf’s Law (or 80/20 law) • Most events are concentrated • Leads to a highly skewed tree

  23. Point Quadtree • A Simple Variation of Quadtree • Specification of a Partition Point instead of equi-split • More Adaptive to the distribution of objects • Less Skewed • [Figure: a point quadtree with partition points (10,20), (5,25), and (35,10) and leaves A to J]

  24. Linear Quadtree: Space-Filling Curve • Quadtree, but in another representation • Linearization by a Space-Filling Curve • Linearize points (or cells) by their Peano key • [Figure: three space-filling curves: Hilbert, Column-wise, and N-order]

  25. Linear Quadtree • Example: N-order curve • Computation of the Peano key: Bit-Interleaving • 1. Binary representation of the coordinates: (x, y) = (10, 01) • 2. Bit-Interleaving of x = 1 0 and y = 0 1 gives Peano key = 1 0 0 1 = 9 • [Figure: a 4 x 4 grid with both axes labeled 00, 01, 10, 11]
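
The bit-interleaving step can be written directly. The function name `peano_key` and the `bits` parameter are hypothetical; the bit order (x bit first, then y bit, from the most significant bit down) is taken from the slide's example, where x = 10 and y = 01 give 1001 = 9.

```python
def peano_key(x, y, bits):
    """Interleave the bits of x and y, x first, most significant bit first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        x_bit = (x >> i) & 1
        y_bit = (y >> i) & 1
        key = (key << 2) | (x_bit << 1) | y_bit
    return key


# The slide's example: (x, y) = (10, 01) in binary.
example = peano_key(0b10, 0b01, bits=2)

# Sorting the cells of a 2 x 2 grid by key traces the N-shaped order.
n_order = sorted(((x, y) for x in range(2) for y in range(2)),
                 key=lambda c: peano_key(c[0], c[1], bits=1))
```

Cells that are close in key order are usually, though not always, close in space, which is what lets a one-dimensional index such as a B-tree store the linearized cells.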

  26. MBR Methods • MBR (Minimum Bounding Rectangle) • Two-dimensional geometric simplification of objects • Covers not the whole space, only the region occupied by objects • R-tree and its variants • [Figure: an object enclosed by its MBR with corners (X1min, X2min) and (X1max, X2max)]

  27. R-tree • Construction of R-tree: Sequence of Insertions • Upward Split • A leaf node points to the disk page • [Figure: 2-D Objects grouped into nested MBRs and the corresponding R-tree with entries A to K]

  28. Splitting in R-tree • Split the MBR in the case of overflow • Line sweeping: Compare Cost-X and Cost-Y • Cost Measures • Area • Perimeter • Overlapping Area • [Figure: an overflowing node, the Splitting Line, and the New MBR]
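
One possible reading of the line-sweeping split is sketched below: sort the entries along each axis, try every split position, and keep the cheaper of the best x split and the best y split. The slide names area, perimeter, and overlapping area as cost measures; this sketch uses only the sum of the two resulting MBR areas, and the names `union`, `area`, `best_split_on_axis`, and `split` are hypothetical.

```python
def union(boxes):
    """MBR of a group of boxes, each given as (xmin, ymin, xmax, ymax)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def best_split_on_axis(boxes, axis, min_fill):
    """Sweep a splitting line along one axis (0 = x, 1 = y)."""
    # Order by box center along the axis (twice the center, to avoid division).
    order = sorted(boxes, key=lambda b: b[axis] + b[axis + 2])
    best = None
    for k in range(min_fill, len(order) - min_fill + 1):
        left, right = order[:k], order[k:]
        cost = area(union(left)) + area(union(right))   # assumed cost measure
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best


def split(boxes, min_fill=1):
    """Compare Cost-X and Cost-Y and return the cheaper split."""
    cost_x = best_split_on_axis(boxes, 0, min_fill)
    cost_y = best_split_on_axis(boxes, 1, min_fill)
    return cost_x if cost_x[0] <= cost_y[0] else cost_y


# Two clusters side by side: a vertical splitting line should win.
boxes = [(0, 0, 1, 1), (1, 3, 2, 4), (10, 0, 11, 1), (11, 3, 12, 4)]
cost, left, right = split(boxes)
```

For these four boxes the best x split costs 16 while the best y split costs 22, so the sweep along x is chosen and each cluster gets its own compact MBR.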

  29. R-tree: Query Processing • Query Region W is matched against the R-tree to find Candidates • Refinement: for each Candidate, read its exact geometry from the database • Sample: http://www.dbnet.ece.ntua.gr/~mario/rtree/ • [Figure: Query Region W over objects A to K and the R-tree nodes holding them]

  30. Strength of R-tree • For point and non-point Objects • Good for non-uniform distribution • Paged Tree • Hierarchical Structure but Balanced • Less Dead Space than Decomposition Methods • [Figure: objects A to K grouped into MBRs]

  31. Weak Points of R-tree: Overlapping Area • Overlapping: False Matching • False Matching: visits to unnecessary nodes • Performance Degradation • [Figure: a Query Region falling where the MBRs of nodes A to M overlap]

  32. Weak Points of R-tree: Dead Space • At least one visit to this node (K) even though it holds nothing in the Query Region • [Figure: a Query Region falling in the dead space of node K, among nodes A to M]

  33. Weak Points of R-tree: Bad Split • 50:50 Split • 1. Make the nodes as COMPACT as possible • 2. Preserve spatial proximity as far as possible • [Figure: two panels, Good Split and Bad Split]

  34. Improvement of R-tree • Minimize • Overlapping area • Dead Space • Or Make it more COMPACT • Preserve Spatial Proximity • Two approaches • Packing (or Bulk Loading) • Good Split or Insertion Strategies

  35. R*-tree: An Improvement of R-tree • Re-Insertion Strategy on Overflow • [Figure: a Newly Inserted Object causes an Overflow; one entry is marked "Delete and Re-Insert this"]

  36. R*-tree: An Improvement of R-tree • Re-Insertion Strategy on Overflow • [Figure: after the Re-Inserted Object is placed, the node is More Compact]

  37. R*-tree: An Improvement of R-tree • R*-tree • Compact • Small Overlapping Area • Small Sum of MBR areas or perimeters • Small Dead Space • Stable: not strongly affected by the order of insertions • The most widely used spatial indexing method

  38. Packing R-tree: Improvement of R-tree • Preprocessing to make the R-tree more compact • Hilbert R-tree • STR (Sort-Tile Recursive) • Uniformization • Used Instead of Sequential Insertions

  39. Hilbert Packing • Hilbert Curve • A Space-Filling Curve • Linearize spatial objects by their Peano key • [Figure: three space-filling curves: N-order, Hilbert, and Column-wise]

  40. Hilbert Packing • Hilbert Packing • Sort objects by Hilbert key • Pack in a round-robin way • Maximize storage utilization • Minimum Dead Space and Sum of MBR areas • Example: Bf = 3
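
A sketch of Hilbert packing with Bf = 3 follows. It rests on several assumptions: objects are reduced to points with integer coordinates on an n x n grid (n a power of two), the key is computed with the widely used iterative coordinate-to-distance conversion for the Hilbert curve, and leaves are filled with consecutive runs of the sorted objects. The names `hilbert_key` and `hilbert_pack` are hypothetical.

```python
BF = 3  # blocking factor: objects per leaf page


def hilbert_key(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_pack(points, n):
    """Sort by Hilbert key, then fill leaves with BF consecutive objects each."""
    ordered = sorted(points, key=lambda p: hilbert_key(n, p[0], p[1]))
    leaves = [ordered[i:i + BF] for i in range(0, len(ordered), BF)]
    packed = []
    for leaf in leaves:
        xs = [p[0] for p in leaf]
        ys = [p[1] for p in leaf]
        packed.append(((min(xs), min(ys), max(xs), max(ys)), leaf))  # (MBR, objects)
    return packed


points = [(3, 0), (0, 2), (1, 1), (0, 0), (0, 3), (1, 0), (0, 1)]
packed = hilbert_pack(points, n=4)
```

Every leaf except possibly the last is full, which is where the high storage utilization comes from; how small the leaf MBRs turn out depends on how well the curve preserves proximity for the data at hand.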

  41. STR (Sort-Tile Recursive) • Basic idea: “tile” the data space using vertical slices • r: number of rectangles • n: blocking factor • P (number of leaf node pages) = ⌈r / n⌉ • Example: suppose r = 25 and n = 3; then nTile = 9, nV = 3, nH = 3
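
The tiling can be sketched as below, assuming the usual formulation of STR: with r rectangles and blocking factor n there are P = ⌈r / n⌉ leaf pages, the space is cut into S = ⌈√P⌉ vertical slices of S · n rectangles each (sorted by x), and each slice is then sorted by y and cut into runs of n. Rectangles are represented by their center points, and the name `str_pack` is hypothetical.

```python
import math


def str_pack(centers, n):
    """Group rectangle centers into leaf pages of at most n entries each."""
    r = len(centers)
    p = math.ceil(r / n)              # number of leaf pages
    s = math.ceil(math.sqrt(p))       # number of vertical slices
    by_x = sorted(centers)            # sort by x (ties broken by y)
    slice_size = s * n                # rectangles per vertical slice
    leaves = []
    for i in range(0, r, slice_size):
        # Within a slice, sort by y and cut into runs of n.
        vslice = sorted(by_x[i:i + slice_size], key=lambda c: c[1])
        for j in range(0, len(vslice), n):
            leaves.append(vslice[j:j + n])
    return p, s, leaves


# The slide's example: r = 25 rectangles, n = 3.
centers = [(x, y) for x in range(5) for y in range(5)]
p, s, leaves = str_pack(centers, n=3)
```

For r = 25 and n = 3 this gives 9 leaf pages in 3 vertical slices, matching the slide's nTile = 9 and nV = 3; the last slice holds only 7 rectangles, so its final page is not full.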

  42. Comparison: Hilbert Packing vs. STR • [Figure: Hilbert Packing (HP) and STR compared on two data sets, Large Objects and Points]

  43. Uniformization • Non-Uniform Distribution • Negative Effect on performance • But real applications are Non-Uniform • Uniformization Technique • Step 1: Transform Non-Uniform data to Uniform by STR • Step 2: Apply R-tree (or Fixed Grid) • Step 3: Transform the Query Region • Strength • High Storage Utilization • Very Simple and Good Performance

  44. Uniformization • [Figure: two grids, Equi-Width and Non Equi-Width] • 1. Area of each cell: identical • 2. Number of objects within each cell: almost identical

  45. Uniformization: Example • [Figure: three panels: Original, By STR, and By Delaunay Triangulation]

  46. Uniformization: Example • [Figure: two panels: Original and By STR]

  47. Query Processing by R-tree: Nearest Neighbor • [Figure: a Query Point and its Searching Space, with Distances in 2-D; labels: Minimum, 2nd]

  48. Query Processing by R-tree: Nearest Neighbor • [Figure: tree traversal with labels Branching, Pruning, and Minimum]
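
The branching and pruning shown in the figure can be sketched as a depth-first branch-and-bound search. This is one common way to do it, not necessarily the lecture's: children are visited in order of their minimum possible distance to the query point, and a subtree is skipped when that minimum is no better than the best distance found so far. The dictionary node layout and the names `mindist` and `nearest` are hypothetical.

```python
import math


def mindist(q, box):
    """Smallest possible distance from point q to anything inside the MBR."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)


def nearest(node, q, best=(math.inf, None), visited=None):
    """Return (distance, point) of the nearest stored point to q."""
    if visited is not None:
        visited.append(node["name"])
    if "points" in node:                      # leaf: check the actual objects
        for p in node["points"]:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Branching: try the most promising child first.
    children = sorted(node["children"], key=lambda c: mindist(q, c["box"]))
    for child in children:
        # Pruning: this child, and every later one, cannot contain a closer point.
        if mindist(q, child["box"]) >= best[0]:
            break
        best = nearest(child, q, best, visited)
    return best


leaf_a = {"name": "A", "box": (0, 0, 2, 2), "points": [(1, 1), (2, 2)]}
leaf_b = {"name": "B", "box": (10, 10, 12, 12), "points": [(10, 10), (12, 12)]}
root = {"name": "root", "box": (0, 0, 12, 12), "children": [leaf_b, leaf_a]}

visited = []
distance, point = nearest(root, (3, 3), visited=visited)
```

Here leaf B is never read: once (2, 2) has been found in leaf A, the minimum distance to B's rectangle is already larger than the best distance.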

  49. Transformation to Higher Space • Transformation to a Higher Dimension • Transform non-point objects to point objects • Reuse spatial indexing methods that apply only to point objects (e.g. Grid File) for non-point objects • Example • [Figure: 1-D intervals A, B, and C mapped to points in a plane with axes Min and Max; interval A from Amin to Amax becomes the point (Amin, Amax)]

  50. Corner Transformation • From 2-D to 4-D • 1. Simplification by MBR • 2. MBR ((Xmin, Ymin), (Xmax, Ymax)) to Point (Xmin, Ymin, Xmax, Ymax) • [Figure: an MBR with corners (Xmin, Ymin) and (Xmax, Ymax)]
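
The corner transformation and its effect on a window query can be sketched as follows. The function names `corner_transform` and `intersects_window_4d` are hypothetical, and objects are assumed to be given as lists of vertices.

```python
def corner_transform(vertices):
    """Object -> MBR -> 4-D point (Xmin, Ymin, Xmax, Ymax)."""
    xs, ys = zip(*vertices)
    return (min(xs), min(ys), max(xs), max(ys))


def intersects_window_4d(point4, window):
    """A 2-D intersection test written as one-sided bounds on the 4-D point."""
    xmin, ymin, xmax, ymax = point4
    wxmin, wymin, wxmax, wymax = window
    return (xmin <= wxmax and ymin <= wymax
            and xmax >= wxmin and ymax >= wymin)


triangle = [(1, 2), (4, 1), (3, 5)]
p4 = corner_transform(triangle)
hit = intersects_window_4d(p4, (4, 4, 6, 6))
miss = intersects_window_4d(p4, (5, 6, 7, 8))
```

Each condition in `intersects_window_4d` bounds a single coordinate of the 4-D point, so the intersection query becomes a range query over points, which a point-only method such as a Grid File can answer.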
