
SD-Rtree: A Scalable Distributed Rtree

SD-Rtree: A Scalable Distributed Rtree. Witold Litwin & Cédric du Mouza & Philippe Rigaux.

Presentation Transcript


  1. SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux

  2. Plan • Introduction • SDDS • R-tree • SD-Rtree Evolution • Balancing • Spatial Rotations • Overlapping • Redundant Coverage • Queries • Performance • Conclusion

  3. SDDS Principles (1993) • Data are stored at server nodes • Nodes communicate through point-to-point messaging • Overloaded servers split over new servers • Queries are issued by client nodes, which use local images of the SDDS • No central addressing component • A node can be both client and server (peer)

  4. SDDS Principles (1993) • An outdated image may send a query to an incorrect server • Servers forward such a query to the correct server • The image gets adjusted: an Image Adjustment Message (IAM) comes back to the client • The client does not repeat the same error twice • Data basically reside in the RAM of the servers
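
A toy sketch of these principles, assuming a one-dimensional key space split into ranges (the classes `Server`, `Cluster` and `Client` and the range-splitting rule are hypothetical, not part of SD-Rtree): the client's image starts with the contact server only, a misdirected query is forwarded to the correct server, and the IAM that comes back fixes the image so the same error is not repeated.

```python
from bisect import bisect_left, bisect_right

class Server:
    def __init__(self, sid, low, high):
        self.sid, self.low, self.high = sid, low, high  # owns keys in [low, high)

    def covers(self, key):
        return self.low <= key < self.high

class Cluster:
    def __init__(self):
        self.servers = [Server(0, float("-inf"), float("inf"))]

    def split(self, sid, at):
        # Hypothetical split rule: the overloaded server moves the upper part of its range to a new server.
        old = self.servers[sid]
        self.servers.append(Server(len(self.servers), at, old.high))
        old.high = at

    def deliver(self, sid, key):
        # If the addressed server is not the owner, the query is forwarded (a direct
        # lookup stands in for hop-by-hop forwarding) and an IAM is returned.
        if self.servers[sid].covers(key):
            return sid, None
        owner = next(s for s in self.servers if s.covers(key))
        return owner.sid, (owner.low, owner.sid)

class Client:
    def __init__(self):
        # The image starts with the contact server only.
        self.lows, self.ids = [float("-inf")], [0]
        self.errors = 0

    def query(self, cluster, key):
        guess = self.ids[bisect_right(self.lows, key) - 1]
        sid, iam = cluster.deliver(guess, key)
        if iam is not None:
            # Addressing error: apply the IAM so this mistake is not repeated.
            self.errors += 1
            low, owner = iam
            i = bisect_left(self.lows, low)
            if i < len(self.lows) and self.lows[i] == low:
                self.ids[i] = owner
            else:
                self.lows.insert(i, low)
                self.ids.insert(i, owner)
        return sid

cluster = Cluster()
cluster.split(0, 10)   # server 1 now owns [10, inf)
cluster.split(1, 20)   # server 2 now owns [20, inf)
client = Client()
print(client.query(cluster, 25), client.query(cluster, 25), client.errors)  # 2 2 1
```

The first query for key 25 is misdirected to the contact server; the second one goes straight to server 2.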

  5. SD-Rtree: a Spatial SDDS • Distributed Spatial Data

  6. SD-Rtree: a Spatial SDDS • Distributed Index • No central component

  7. SD-Rtree: a Spatial SDDS • Point & Window Queries • kNN queries (future)

  8. SD-Rtree: Generalizes R-tree • R-tree: • Nodes are minimal bounding boxes • Leaf nodes point to data • Internal nodes bound subtrees • Boxes may overlap • A node splits when it overflows • The result is a balanced m-ary tree
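
A minimal Python sketch of the minimal bounding box (mbb) operations an R-tree relies on; the `Rect` class and the `mbb` helper are illustrative names, not taken from the slides.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other):
        # Minimal bounding box enclosing both rectangles.
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other):
        return (self.xmin <= other.xmin and self.ymin <= other.ymin and
                other.xmax <= self.xmax and other.ymax <= self.ymax)

def mbb(rects):
    """Minimal bounding box of a non-empty list of rectangles (e.g. a node's entries)."""
    return reduce(Rect.union, rects)

objs = [Rect(0, 0, 1, 1), Rect(2, 3, 4, 5)]
print(mbb(objs))  # Rect(xmin=0, ymin=0, xmax=4, ymax=5)
```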

  9. SD-Rtree: Generalizes R-tree • R-tree: • An insert may have several candidate paths • It ends up in the smallest bounding box that contains the object, if there is any • Otherwise, one of the boxes gets enlarged • The box may then split
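
One way to choose the receiving box, sketched under the assumption that the least-enlargement rule of Guttman's original R-tree is used when no box contains the object; `choose_child` is a hypothetical helper and rectangles are `(xmin, ymin, xmax, ymax)` tuples.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def contains(a, b):
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]

def choose_child(boxes, obj):
    """Return (index, new_box) of the child box that should receive obj."""
    covering = [i for i, b in enumerate(boxes) if contains(b, obj)]
    if covering:
        # Several paths are possible: take the smallest box that already contains obj.
        i = min(covering, key=lambda i: area(boxes[i]))
        return i, boxes[i]
    # No box contains obj: enlarge the one needing the least extra area
    # (ties broken by the smaller original area).
    i = min(range(len(boxes)),
            key=lambda i: (area(union(boxes[i], obj)) - area(boxes[i]), area(boxes[i])))
    return i, union(boxes[i], obj)

children = [(0, 0, 10, 10), (2, 2, 5, 5), (20, 0, 30, 10)]
print(choose_child(children, (3, 3, 4, 4)))    # (1, (2, 2, 5, 5))
print(choose_child(children, (11, 0, 12, 1)))  # (0, (0, 0, 12, 10))
```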

  10. SD-Rtree: Generalizes R-tree • R-tree: • A search may go through multiple paths • Any of these paths may lead to relevant objects
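
A minimal sketch of an R-tree window search over a hypothetical in-memory node layout: because sibling boxes may overlap, every child whose box intersects the window is visited, and in this example both paths return an object.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# A node is ("leaf", [objects]) or ("internal", [(box, child), ...]).
def window_search(node, window, out=None):
    out = [] if out is None else out
    kind, entries = node
    if kind == "leaf":
        out.extend(o for o in entries if intersects(o, window))
        return out
    for box, child in entries:
        # Boxes may overlap, so several children can qualify: visit all of them.
        if intersects(box, window):
            window_search(child, window, out)
    return out

left = ("leaf", [(0, 0, 2, 2), (3, 3, 4, 4)])
right = ("leaf", [(3, 1, 5, 2), (8, 8, 9, 9)])
root = ("internal", [((0, 0, 4, 4), left), ((3, 1, 9, 9), right)])
print(window_search(root, (3, 1, 4, 4)))  # [(3, 3, 4, 4), (3, 1, 5, 2)]
```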

  11. SD-Rtree: a Balanced Binary Tree • The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that: • Each internal node (or routing node) has exactly two sons • Each leaf node stores a subset of the indexed dataset • At each node, the heights of the two subtrees differ by at most one • Each server stores one data node and one routing node (a tree with n leaves has n − 1 routing nodes, so one server holds no routing node)
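
The shape rules can be checked with a short recursive function. This sketch (the `Node` class, node names and the `check` helper are illustrative) also counts leaves and routing nodes: a binary tree whose internal nodes all have exactly two sons has one routing node fewer than data nodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.left is None and self.right is None

def check(node):
    """Return (height, data nodes, routing nodes); raise ValueError if a shape rule is violated."""
    if node.is_leaf():
        return 0, 1, 0                      # a data node d_i
    if node.left is None or node.right is None:
        raise ValueError(f"{node.name}: a routing node needs exactly two sons")
    hl, dl, rl = check(node.left)
    hr, dr, rr = check(node.right)
    if abs(hl - hr) > 1:
        raise ValueError(f"{node.name}: subtree heights {hl} and {hr} differ by more than one")
    return 1 + max(hl, hr), dl + dr, rl + rr + 1

tree = Node("r1", Node("r2", Node("d0"), Node("d1")), Node("d2"))
print(check(tree))  # (2, 3, 2)
```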

  12. SD-Rtree: Binary Tree Structure • d_i = data node (leaf) • r_i = routing node (internal node)

  13. SD-Rtree: Tree Distribution

  14. SD-Rtree Balancing • The binary tree should be height-balanced • The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees) • The tree height is then logarithmic in the number of leaves
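
A worked bound on the height, assuming leaves have height 0: the fewest leaves a height-balanced tree of height h can have follows the AVL-style recurrence m(h) = m(h − 1) + m(h − 2), with m(0) = 1 and m(1) = 2, so the height stays within a constant factor (about 1.44) of log2 of the number of leaves. The helper names are hypothetical.

```python
def min_leaves(h):
    """Fewest leaves of a height-balanced tree of height h whose internal nodes have two sons."""
    a, b = 1, 2                 # m(0) = 1, m(1) = 2
    for _ in range(h):
        a, b = b, a + b         # m(h) = m(h - 1) + m(h - 2)
    return a

def max_height(n):
    """Largest height a height-balanced tree with n leaves can reach."""
    h = 0
    while min_leaves(h + 1) <= n:
        h += 1
    return h

def min_height(n):
    """Smallest possible height, ceil(log2 n), computed exactly with integers."""
    return (n - 1).bit_length()

print(min_height(1000), max_height(1000))  # 10 14
```

With 1,000 data nodes, a query descending from the root therefore crosses between 10 and 14 levels.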

  15. SD-Rtree Balancing • SD-Rtree balancing occurs during splits • Messages are sent bottom-up to adjust the heights of the ancestor nodes • A rotation occurs if an ancestor is imbalanced • SD-Rtree rotations are spatial: they change the rectangles of internal nodes • The best rotation minimizes rectangle overlap • Ties are broken by minimizing the « dead space »
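
A sketch of the selection rule, under two assumptions not stated on the slides: the candidate rotations are the three ways of regrouping the four subtrees being reorganized (named c, d, f and g here, as in the rotation pattern) into two new internal nodes, and the dead space of a node is the part of its box covered by neither child box. Overlap is minimized first; dead space breaks ties.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap(a, b):
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def dead_space(x, y):
    # Assumption: area of the parent box covered by neither child box.
    return area(union(x, y)) - (area(x) + area(y) - overlap(x, y))

def best_rotation(boxes):
    """boxes maps the four regrouped subtrees to their rectangles.
    Returns the pairing with least overlap, ties broken by least dead space."""
    p, q, r, s = sorted(boxes)
    candidates = [((p, q), (r, s)), ((p, r), (q, s)), ((p, s), (q, r))]

    def cost(pairing):
        (a, b), (c, d) = pairing
        left, right = union(boxes[a], boxes[b]), union(boxes[c], boxes[d])
        dead = dead_space(boxes[a], boxes[b]) + dead_space(boxes[c], boxes[d])
        return (overlap(left, right), dead)

    return min(candidates, key=cost)

boxes = {"c": (0, 0, 1, 1), "d": (9, 0, 10, 1), "f": (0, 2, 1, 3), "g": (9, 2, 10, 3)}
print(best_rotation(boxes))  # (('c', 'f'), ('d', 'g'))
```

Here two pairings have zero overlap; the dead space (2 versus 16) selects the vertical grouping.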

  16. Properties • The sons of a node are not ordered => more freedom for reorganizing the tree • Any imbalanced node matches a rotation pattern • A rotation pattern is a subtree a(b(e(f,g),d),c) such that: h(c) = h(d) = h(f) = n − 1 (n > 0) and h(g) = max(0, n − 2) • Figure: Rotation Pattern
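
The height conditions can be checked numerically. This sketch computes the subtree heights of a(b(e(f,g),d),c) for small n and confirms that e and b are balanced while a is off by exactly two. Under the assumption (not stated on the slide) that a rotation regroups c, d, f and g into two pairs, it also checks that every such regrouping is balanced and lowers the subtree by one level.

```python
def pattern(n):
    """Subtree heights in the rotation pattern a(b(e(f,g),d),c), for n > 0."""
    h = {"c": n - 1, "d": n - 1, "f": n - 1, "g": max(0, n - 2)}
    h["e"] = 1 + max(h["f"], h["g"])
    h["b"] = 1 + max(h["e"], h["d"])
    h["a"] = 1 + max(h["b"], h["c"])
    return h

def balanced(h1, h2):
    return abs(h1 - h2) <= 1

# Hypothetical regroupings of c, d, f, g into two new internal nodes.
PAIRINGS = [(("c", "d"), ("f", "g")), (("c", "f"), ("d", "g")), (("c", "g"), ("d", "f"))]

for n in range(1, 6):
    h = pattern(n)
    # e and b are balanced; a is imbalanced by exactly two levels.
    assert balanced(h["f"], h["g"]) and balanced(h["e"], h["d"])
    assert h["b"] - h["c"] == 2
    for left, right in PAIRINGS:
        hl = 1 + max(h[x] for x in left)
        hr = 1 + max(h[x] for x in right)
        # Every regrouping is balanced and the subtree loses one level.
        assert balanced(h[left[0]], h[left[1]]) and balanced(h[right[0]], h[right[1]])
        assert balanced(hl, hr) and 1 + max(hl, hr) == h["a"] - 1

print(pattern(3))  # {'c': 2, 'd': 2, 'f': 2, 'g': 1, 'e': 3, 'b': 4, 'a': 5}
```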

  17. SD-Rtree: Spatial Rotation

  18. Rotation Cost • Constant number of messages (3 or 6, depending on the chosen rotation) • Few rotations in practice • In particular when the dataset is uniformly distributed • See our experiments

  19. SD-Rtree: Images • Each image is a local view of the addressing structure • It resides as a cache on a client or on a peer • It starts with the address of the contact server • IAMs turn it into a subtree of the SD-Rtree • Splits make images outdated • IAMs adjust them incrementally

  20. Image Adjustment • The client contacts a server with a query • An incorrect server initiates a traversal of the tree • During the traversal, the descriptions of the visited nodes are collected • The correct server sends the up-to-date tree structure back to the client (IAM) • The client updates its image
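
A minimal sketch of the client side of image adjustment, assuming an IAM carries a list of node descriptions (node id, hosting server, mbb); `NodeDesc`, `Image` and the `d`/`r` prefixes of node ids are hypothetical. The image starts with the contact server, and each IAM adds or overwrites the described nodes.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NodeDesc:
    # Hypothetical IAM entry: one tree node as collected during the traversal.
    node_id: str   # "d<i>" for a data node, "r<i>" for a routing node (illustrative naming)
    server: int    # server that hosts the node
    mbb: tuple     # (xmin, ymin, xmax, ymax)

@dataclass
class Image:
    contact: int                                # address of the contact server
    nodes: dict = field(default_factory=dict)   # node_id -> NodeDesc

    def apply_iam(self, descs):
        # The correct server sends up-to-date descriptions: they replace older ones.
        for d in descs:
            self.nodes[d.node_id] = d

    def target(self, x, y):
        """Server to contact for point (x, y): a known data node covering it, else the contact server."""
        for d in self.nodes.values():
            xmin, ymin, xmax, ymax = d.mbb
            if d.node_id.startswith("d") and xmin <= x <= xmax and ymin <= y <= ymax:
                return d.server
        return self.contact

image = Image(contact=0)
print(image.target(7, 3))  # 0: the image only knows the contact server
image.apply_iam([NodeDesc("r1", 1, (0, 0, 10, 10)),
                 NodeDesc("d0", 0, (0, 0, 4, 10)),
                 NodeDesc("d1", 1, (5, 0, 10, 10))])
print(image.target(7, 3))  # 1
```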

  21. Out-of-range situation

  22. Insertion of objects

  23. Overlapping management • The directory rectangles in an Rtree may overlap • The local subtree does not suffice to locate all the nodes that contain the point (point query) or intersect the window (window query) searched for • SD-Rtree servers maintain data on node overlapping: redundant coverage • It avoids systematically accessing the root node

  24. Redundant Coverage • Example • The region common to A and B is stored on both nodes • If a point query sent to A falls in the region shared with B, A sends a point query message to B • For D, we must keep its intersections with B and C: here they are empty
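
A sketch of redundant coverage for point queries, with hypothetical names (`DataServer`, `record_overlap`, `point_query_targets`): each data server records the region it shares with every overlapping node, and a point query falling in such a region is also sent to that neighbour. As with D in the example, a node that overlaps nothing stores nothing.

```python
def intersection(a, b):
    """Intersection rectangle of a and b, or None if they do not overlap."""
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] <= r[2] and r[1] <= r[3] else None

def inside(p, r):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

class DataServer:
    def __init__(self, name, mbb):
        self.name, self.mbb = name, mbb
        self.overlaps = {}   # neighbour name -> shared region (stored on both nodes)

    def record_overlap(self, other):
        shared = intersection(self.mbb, other.mbb)
        if shared is not None:
            self.overlaps[other.name] = shared
            other.overlaps[self.name] = shared

    def point_query_targets(self, p):
        """Servers that must process a point query received by this server."""
        targets = [self.name] if inside(p, self.mbb) else []
        targets += [n for n, r in self.overlaps.items() if inside(p, r)]
        return targets

A = DataServer("A", (0, 0, 6, 6))
B = DataServer("B", (4, 4, 10, 10))
D = DataServer("D", (20, 20, 25, 25))
for x, y in [(A, B), (A, D), (B, D)]:
    x.record_overlap(y)
print(A.overlaps)                     # {'B': (4, 4, 6, 6)}
print(A.point_query_targets((5, 5)))  # ['A', 'B']
print(A.point_query_targets((1, 1)))  # ['A']
print(D.overlaps)                     # {}
```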

  25. Queries • Point queries and window queries. The technique is similar to the insertion algorithm: • Search the client image for a server whose mbb contains the point or intersects the window • Send the query to this server • If the server actually covers the point or the window, it answers the client; else it sends the query to its parent node • A server uses the overlapping information to forward the query
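
The server-side routing of a point query can be sketched on an in-memory tree (`TNode`, `point_query` and the three-leaf layout are hypothetical, and the overlapping information is left out): a node that does not cover the point passes the query to its parent until some node covers it, then the query descends to the covering data nodes. The visited path is the kind of node description an IAM could carry back to the client.

```python
class TNode:
    """Hypothetical tree node: a data node (no children) or a routing node (two children)."""
    def __init__(self, name, mbb, children=()):
        self.name, self.mbb, self.parent = name, mbb, None
        self.children = list(children)
        for c in self.children:
            c.parent = self

def covers(r, p):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def descend(node, p, path):
    # Top-down part: visit every child whose mbb covers p.
    path.append(node.name)
    if not node.children:
        return [node.name] if covers(node.mbb, p) else []
    found = []
    for c in node.children:
        if covers(c.mbb, p):
            found += descend(c, p, path)
    return found

def point_query(start, p):
    """Route a point query that a client sent to data node `start`.
    Returns (data nodes covering p, names of the nodes visited)."""
    path, node = [], start
    # Bottom-up part: a node that does not cover p passes the query to its parent.
    while not covers(node.mbb, p) and node.parent is not None:
        path.append(node.name)
        node = node.parent
    return descend(node, p, path), path

d0, d1, d2 = TNode("d0", (0, 0, 5, 5)), TNode("d1", (5, 0, 10, 5)), TNode("d2", (0, 5, 10, 10))
r1 = TNode("r1", (0, 0, 10, 5), [d0, d1])
r0 = TNode("r0", (0, 0, 10, 10), [r1, d2])
print(point_query(d0, (7, 2)))  # (['d1'], ['d0', 'r1', 'd1'])
```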

  26. Experiments • Synthetic data (points and rectangles) generated with GSTD • 50,000 to 500,000 objects • 0 to 3,000 queries • Server capacity: 3,000 objects • Comparison of three SD-Rtree variants: • BASIC: no image; every query is processed top-down from the root • IMSERVER: no IAMs among the servers • IMCLIENT: client images

  27. Per Insert Cost

  28. Cost of balancing

  29. Image convergence

  30. Distribution of messages

  31. Cost per Query

  32. Conclusion • SD-Rtree is an efficient, scalable distributed Rtree • For very large spatial data collections • These can be processed in distributed RAM • Access is much faster than to disk-resident data • Load balancing • Spatial rotations • Overlapping management • Redundant coverage • O(log n) worst-case insert cost • Future work • kNN queries • Balancing the distribution of objects across servers

  33. SD-Rtree Thank You for Your Attention Questions: First.Last@dauphine.fr
