
Chapter 3: Data Storage and Access Methods



Presentation Transcript


  1. Chapter 3: Data Storage and Access Methods
     • Title: The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
     • Authors: N. Beckmann, H. Kriegel, R. Schneider and B. Seeger
     • Pages: 207-216

  2. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
     • Problem
        • Problem statement
        • Why is this problem important?
        • Why is this problem hard?
     • Approaches
        • Approach description, key concepts
        • Contributions (novelty, improvements)
        • Assumptions

  3. Problem Statement – R*-tree
     • Given
        • Data containing points and rectangles
        • Spatial operations (point query, range query, insert, delete)
     • Find: an access method (data structure)
        • A hierarchical organization of rectangles
        • Example from Wikipedia
     • Objectives
        • Efficiency of spatial queries
     • Constraints
        • Balanced tree
        • Each node is a disk page and, except for the root, has at least m entries (m = minimum number of entries per node).
        • The root has at least two children unless it is a leaf.
     • Efficiency metric: number of disk pages accessed
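
A minimal Python sketch of the structure described in slide 3, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples and each in-memory `Node` stands in for one disk page. The names `Entry`, `Node`, `range_search`, and `leaf_depth` are hypothetical illustrations, not taken from the paper. The search counts visited nodes as a stand-in for the disk-page metric, and `leaf_depth` checks the balance and minimum-fill constraints.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    mbr: Rect
    child: Optional["Node"] = None  # used in directory (inner) entries
    obj_id: Optional[int] = None    # used in leaf entries (pointer to a data object)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def range_search(node: Node, query: Rect, stats: dict) -> List[int]:
    """Collect ids of leaf entries whose MBR intersects `query`.
    stats["pages"] counts visited nodes, modelling disk-page accesses."""
    stats["pages"] = stats.get("pages", 0) + 1
    hits: List[int] = []
    for e in node.entries:
        if intersects(e.mbr, query):
            if node.leaf:
                hits.append(e.obj_id)
            else:
                hits.extend(range_search(e.child, query, stats))
    return hits


def leaf_depth(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Check the slide's constraints and return the height of `node`."""
    n = len(node.entries)
    if n > M:
        raise ValueError("node overflow")
    if is_root and not node.leaf and n < 2:
        raise ValueError("a non-leaf root needs at least two children")
    if not is_root and n < m:
        raise ValueError("node underflow")
    if node.leaf:
        return 0
    depths = {leaf_depth(e.child, m, M, False) for e in node.entries}
    if len(depths) != 1:
        raise ValueError("tree is not balanced")
    return depths.pop() + 1


# Two leaves under one root; M = 4 and m = 2 are illustrative values.
leaf_a = Node(True, [Entry((0, 0, 1, 1), obj_id=1), Entry((2, 2, 3, 3), obj_id=2)])
leaf_b = Node(True, [Entry((5, 5, 6, 6), obj_id=3), Entry((7, 7, 8, 8), obj_id=4)])
root = Node(False, [Entry((0, 0, 3, 3), child=leaf_a), Entry((5, 5, 8, 8), child=leaf_b)])

stats: dict = {}
print(range_search(root, (0.5, 0.5, 2.5, 2.5), stats))  # [1, 2]
print(stats["pages"])                                    # 2: the root and one leaf
print(leaf_depth(root, m=2, M=4))                        # 1
```

Only the subtree whose directory rectangle intersects the query is read, which is why the number of pages accessed is the natural cost measure.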

  4. Why is this problem important?
     • Multi-dimensional applications
        • Large geographic data, e.g., map objects such as countries occupy regions of non-zero size in two dimensions.
        • Common real-world usage: "Find all museums within 2 miles of my current location."
        • CAD
        • …
     • Many DBMS servers support spatial indices
        • Oracle, IBM DB2, …
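
The museum query is a distance query. One common way to evaluate it against bounding rectangles, sketched here under the assumption of planar coordinates measured in miles (real latitude/longitude would need a geodesic distance), is to compute the minimum distance from the query point to each rectangle and keep those within the radius. An R-tree can apply the same test to directory rectangles to prune whole subtrees. `min_dist`, `within`, and the museum data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def min_dist(px: float, py: float, r: Rect) -> float:
    """Distance from (px, py) to the nearest point of rectangle r (0 if inside)."""
    dx = max(r[0] - px, 0.0, px - r[2])
    dy = max(r[1] - py, 0.0, py - r[3])
    return math.hypot(dx, dy)


def within(px: float, py: float, radius: float, mbrs: Dict[str, Rect]) -> List[str]:
    """Names of objects whose bounding rectangle lies within `radius` of the point."""
    return sorted(name for name, r in mbrs.items() if min_dist(px, py, r) <= radius)


# Hypothetical museums stored as degenerate rectangles (points), coordinates in miles.
museums = {"A": (1.0, 1.0, 1.0, 1.0), "B": (3.0, 0.0, 3.0, 0.0), "C": (0.0, 1.5, 0.0, 1.5)}
print(within(0.0, 0.0, 2.0, museums))  # ['A', 'C']
```

For point objects this test is exact. For extended objects the distance to the MBR never exceeds the distance to the object itself, so the test produces no false negatives but may need a refinement step on the real geometry.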

  5. Why is this problem hard?
     • B-tree split methods are ineffective in two dimensions
        • e.g., sorting: there is no natural linear order on 2-D data that preserves spatial proximity
     • Size variation across data rectangles
        • Large rectangles limit split options!
     • Non-uniform data distribution over space
     • Dynamic access method
        • Insertions and deletions
     • Overlapping directory rectangles => multiple search paths
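
A small illustration of the last point, assuming the same `(xmin, ymin, xmax, ymax)` tuples: in a B-tree a key leads down exactly one path, but a point lying in the overlap of two directory rectangles forces the search into both subtrees. `paths_to_follow` is a hypothetical helper.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains_point(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def paths_to_follow(children: List[Rect], x: float, y: float) -> List[int]:
    """Indices of the directory rectangles a point query must descend into."""
    return [i for i, r in enumerate(children) if contains_point(r, x, y)]


overlapping = [(0, 0, 6, 6), (4, 4, 10, 10)]
disjoint = [(0, 0, 5, 10), (5.5, 0, 10, 10)]
print(paths_to_follow(overlapping, 5, 5))  # [0, 1]: both subtrees must be searched
print(paths_to_follow(disjoint, 5, 5))     # [0]: a single search path
```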

  6. Novelty of Contribution
     • Related work
        • Traditional one-dimensional indexing structures (e.g., hashing, B-trees) are not appropriate for multi-dimensional range search.
        • B+ tree
           • Represents sorted data in a way that allows efficient insertion and removal of elements.
           • Dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node.
           • Leaf nodes are linked together as a linked list, which makes range queries easy.
        • R-tree
           • The R-tree is a foundation for spatial access methods.
           • A complex spatial object is represented by its minimum bounding rectangle (MBR), which approximates the object while preserving essential geometric properties.
           • Overlapping regions: directory rectangles may overlap.
           • Heuristic: minimize the area of each enclosing rectangle in the inner nodes.
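
To make the MBR approximation concrete, here is a minimal sketch (hypothetical helper names, polygons assumed to be ordered vertex lists) that computes the bounding rectangle of a polygon and the "dead space" it introduces, i.e. the part of the rectangle the object does not cover. That dead space is why an MBR test is only a conservative filter.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(vertices: List[Point]) -> Rect:
    """Axis-parallel minimum bounding rectangle of a set of vertices."""
    xs, ys = zip(*vertices)
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon given as an ordered vertex list."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
box = mbr(triangle)
box_area = (box[2] - box[0]) * (box[3] - box[1])
print(box)                                # (0.0, 0.0, 4.0, 3.0)
print(box_area - polygon_area(triangle))  # 6.0 units of dead space
```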

  7. Principles of R-tree
     • Height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.
     • Heuristic optimization: minimize the area of each enclosing rectangle in the inner nodes.
     • Reference: A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching", 1984
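
The area heuristic shows up most directly when choosing the subtree for a new entry. A minimal sketch, assuming `(xmin, ymin, xmax, ymax)` tuples and a hypothetical `choose_subtree` helper: pick the child whose rectangle needs the least area enlargement to include the new rectangle, breaking ties by the smaller area.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(children: List[Rect], new: Rect) -> int:
    """Index of the child needing the least area enlargement to include `new`;
    ties are broken by the smaller current area."""
    return min(
        range(len(children)),
        key=lambda i: (area(union(children[i], new)) - area(children[i]), area(children[i])),
    )


children = [(0, 0, 4, 4), (5, 0, 6, 10)]
print(choose_subtree(children, (4, 6, 5, 7)))                        # 1: enlargement 10 vs. 19
print(choose_subtree([(0, 0, 2, 2), (0, 0, 4, 4)], (1, 1, 1.5, 1.5)))  # 0: tie on 0, smaller area wins
```

The R*-tree paper keeps this rule for higher directory levels, but for nodes whose children are leaves it chooses the entry with the least overlap enlargement instead; that variant is not shown here.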

  8. Performance Parameters beyond the R-tree
     • (Q1) The area covered by a directory rectangle should be minimized.
     • (Q2) The overlap between directory rectangles should be minimized.
     • (Q3) The margin (sum of the edge lengths) of a directory rectangle should be minimized.
     • (Q4) Storage utilization should be optimized.
     • Intuitions
        • Reduce overlap between sibling nodes.
        • Reduce the traversal of multiple branches for point queries.
        • Reinserting old data moves entries between neighboring nodes and thus decreases overlap.
        • Because of more restructuring, fewer splits occur.
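
The four criteria can be computed for the entries of a single node with a few lines of Python. This is a minimal sketch assuming `(xmin, ymin, xmax, ymax)` tuples. `node_metrics`, the pairwise-sum definition of a node's overlap, and per-node utilization (entries divided by capacity M) are illustrative assumptions, not the paper's exact formulas.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    """Sum of the edge lengths, i.e. the perimeter of a 2-D rectangle."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection of a and b (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def node_metrics(siblings: List[Rect], capacity: int) -> dict:
    return {
        "area": sum(area(r) for r in siblings),                                # Q1
        "overlap": sum(overlap(a, b) for a, b in combinations(siblings, 2)),   # Q2
        "margin": sum(margin(r) for r in siblings),                            # Q3
        "utilization": len(siblings) / capacity,                               # Q4
    }


print(node_metrics([(0, 0, 4, 4), (2, 2, 6, 6)], capacity=4))
# {'area': 32, 'overlap': 4, 'margin': 32, 'utilization': 0.5}
```

Small margins favour square-like rectangles, which pack better into parent nodes; small overlap reduces the number of branches a query must follow.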

  9. Difference between R-tree and R*-tree
     • Minimization of area, margin, and overlap is crucial to the performance of the R-tree / R*-tree.
     • The R*-tree attempts to reduce both coverage (area) and overlap, using a combination of a revised node split algorithm and forced reinsertion at node overflow. This is based on the observation that R-tree structures are highly sensitive to the order in which their entries are inserted, so an insertion-built (rather than bulk-loaded) structure is likely to be sub-optimal. Deleting and reinserting entries allows them to "find" a place in the tree that may be more appropriate than their original location.
     • Result: improved retrieval performance
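
The selection step of forced reinsertion can be sketched as follows, assuming `(xmin, ymin, xmax, ymax)` tuples for the entries of an overflowing node (M + 1 entries). The entries whose centers lie farthest from the center of the node's bounding rectangle are removed, and the node keeps the rest. `pick_for_reinsert` is a hypothetical name, and `p_frac = 0.3` (p = 30% of M) is an assumed default for the tuning parameter.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bounding(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(entries: List[Rect], max_entries: int, p_frac: float = 0.3):
    """Split the M + 1 entries of an overflowing node into (kept, to_reinsert).

    The p entries whose centers are farthest from the center of the node's
    bounding rectangle are removed; they are returned nearest-first, so the
    closest of them is reinserted first ("close reinsert")."""
    p = max(1, int(p_frac * max_entries))
    node_center = center(bounding(entries))
    by_dist = sorted(entries, key=lambda r: math.dist(center(r), node_center), reverse=True)
    removed, kept = by_dist[:p], by_dist[p:]
    return kept, removed[::-1]


# M = 4, so an overflowing node holds five entries; one lies far from the rest.
overflowing = [(0, 0, 3, 3), (1, 0, 4, 3), (0, 1, 3, 4), (1, 1, 4, 4), (9, 9, 10, 10)]
kept, again = pick_for_reinsert(overflowing, max_entries=4)
print(again)  # [(9, 9, 10, 10)]
print(kept)
```

In the R*-tree this treatment is applied instead of a split only on the first overflow at a given level during one insertion, and not at the root; a second overflow at the same level falls back to splitting.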

  10. Example: grouping of rectangles R1, R2, R3, R4, R5 preferred by the R-tree vs. the grouping preferred by the R*-tree (figure not reproduced)
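
The different groupings come from the R*-tree's revised split. Below is a minimal sketch of that split for 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples. It assumes the split proceeds in two phases: first choose the split axis with the smallest sum of margins over all candidate distributions, then, along that axis, choose the distribution with the least overlap, breaking ties by total area. Function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Split = Tuple[List[Rect], List[Rect]]


def bbox(rs: List[Rect]) -> Rect:
    return (min(r[0] for r in rs), min(r[1] for r in rs),
            max(r[2] for r in rs), max(r[3] for r in rs))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def distributions(entries: List[Rect], axis: int, m: int) -> Iterator[Split]:
    """All candidate splits along `axis` (0 = x, 1 = y) of M + 1 entries.

    Entries are sorted once by lower and once by upper coordinate; the k-th
    split puts the first m - 1 + k entries in group 1, for k = 1 .. M - 2m + 2."""
    M = len(entries) - 1
    keys = (lambda r: (r[axis], r[axis + 2]), lambda r: (r[axis + 2], r[axis]))
    for key in keys:
        s = sorted(entries, key=key)
        for k in range(1, M - 2 * m + 3):
            yield s[: m - 1 + k], s[m - 1 + k:]


def rstar_split(entries: List[Rect], m: int) -> Split:
    # Choose the split axis: smallest margin sum over its candidate splits.
    def margin_sum(axis: int) -> float:
        return sum(margin(bbox(g1)) + margin(bbox(g2))
                   for g1, g2 in distributions(entries, axis, m))

    axis = min((0, 1), key=margin_sum)
    # Choose the split index: least overlap between the groups, then least total area.
    return min(distributions(entries, axis, m),
               key=lambda d: (overlap(bbox(d[0]), bbox(d[1])),
                              area(bbox(d[0])) + area(bbox(d[1]))))


# M = 4, m = 2: five entries forming two clusters separated along x.
overflowing = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (8, 0, 9, 1), (8, 1, 9, 2)]
g1, g2 = rstar_split(overflowing, m=2)
print(g1)  # [(0, 0, 1, 1), (0, 1, 1, 2), (1, 0, 2, 1)]
print(g2)  # [(8, 0, 9, 1), (8, 1, 9, 2)]
```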

  11. Validation Methodology
     • Methodology
        • Experiments with simulated workloads
        • Evaluation of design decisions
     • Results
        • The R*-tree outperforms variants of the R-tree and the 2-level grid file.
        • The R*-tree is robust against non-uniform data distributions.

  12. Summary
     • Paper's focus
        • R*-tree: implementation and performance
     • Ideas
        • Heuristic optimizations (p. 208)
           • Reduction of area, margin, and overlap of the directory rectangles
        • Better storage utilization (p. 211)
           • Forced reinsertion (splits can be prevented)
     • Experimental comparison
        • Using many data distributions

  13. Assumptions, Rewrite Today
     • Assumptions
        • Indexing data in two-dimensional space
        • Bulk loading and bulk reorganization are not available
        • Concurrency control and recovery costs are negligible
        • Reinserts during split!
     • Rewrite today
        • Bulk loading of rectangles
        • Compare with newer methods
           • R+-tree (disjoint siblings), Hilbert R-tree
        • Analytical results
           • Formally compare the R*-tree with alternatives
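
As an illustration of what bulk loading could look like, here is a minimal sketch of Sort-Tile-Recursive (STR) packing. STR is not part of the R*-tree paper; it is swapped in purely as one well-known bulk-loading technique. Rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples, nodes are plain lists rather than disk pages, and `str_pack` and `str_bulk_load` are hypothetical names. The last node of each slice may hold fewer than m entries.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(rs: List[Rect]) -> Rect:
    return (min(r[0] for r in rs), min(r[1] for r in rs),
            max(r[2] for r in rs), max(r[3] for r in rs))


def str_pack(rects: List[Rect], M: int) -> List[List[Rect]]:
    """One level of STR packing: group rects into nodes of at most M entries."""
    P = math.ceil(len(rects) / M)      # number of nodes on this level
    S = math.ceil(math.sqrt(P))        # number of vertical slices
    slice_size = S * M
    by_x = sorted(rects, key=lambda r: r[0] + r[2])  # x-center (times 2)
    nodes: List[List[Rect]] = []
    for i in range(0, len(by_x), slice_size):
        slab = sorted(by_x[i:i + slice_size], key=lambda r: r[1] + r[3])  # y-center
        nodes.extend(slab[j:j + M] for j in range(0, len(slab), M))
    return nodes


def str_bulk_load(rects: List[Rect], M: int) -> List[List[List[Rect]]]:
    """Build levels bottom-up; each level's node MBRs feed the next level."""
    if not rects:
        return []
    levels = []
    current = rects
    while True:
        nodes = str_pack(current, M)
        levels.append(nodes)
        if len(nodes) == 1:
            return levels
        current = [bbox(n) for n in nodes]


# A 4 x 4 grid of unit squares packed with M = 4.
grid = [(x, y, x + 1, y + 1) for x in range(4) for y in range(4)]
levels = str_bulk_load(grid, M=4)
print(len(levels))                    # 2: four leaves under one root
print([bbox(n) for n in levels[0]])   # [(0, 0, 2, 2), (0, 2, 2, 4), (2, 0, 4, 2), (2, 2, 4, 4)]
```

Unlike insertion-built trees, the result does not depend on insertion order, which is the weakness forced reinsertion tries to mitigate.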
