This document presents a comprehensive overview of R-Trees, a dynamic index structure optimized for spatial searching. The R-Tree excels in multi-dimensional data handling, making it essential for applications in GIS, CAD, and location-based services (LBS). We explore R-Tree properties, including node structure, insertion algorithms, and search optimizations, which support efficient access to complex spatial datasets. The challenge of overlapping minimum bounding rectangles (MBRs) during searches is also addressed, and the document reviews query optimization techniques for multiway spatial joins.
Graduate Course: Spatial Data, Korea University of Technology (한국기술대학교), Min Jun-Ki (민준기)
Spatial Data • Traditional data • Single-dimensional • Values, text • New applications • GIS • CAD • LBS • Multimedia data • Multi-dimensional data
Spatial Access Method (SAM) • Supports efficient access to spatial data • B-Tree • Handles only one-dimensional data • Not appropriate for multi-dimensional data • One of the best-known spatial indexes • R-Tree
R-Trees : A Dynamic Index Structure for Spatial Searching • R-Tree • A Height-balanced Tree with index records in its leaf nodes containing pointers to data objects. • Dynamic structure: inserts and deletes can be intermixed with searches and no periodic reorganization is required.
R-Trees : A Dynamic Index Structure for Spatial Searching • R-Tree • Handling exact spatial objects directly is difficult • Based on MBR (minimum bounding rectangle) approximation [Figure: MBR R1 containing MBRs A1 and A2, which enclose objects a1 to a4]
R-Tree Structure • Node = $(E_1, \ldots, E_M)$ • $E_i = (I, \text{pointer})$, where $I = (I_0, \ldots, I_{d-1})$, d is the number of dimensions, and each $I_j = [a, b]$ is a closed interval along dimension j • Let M be the maximum number of entries in a node, and $m \le M/2$ the minimum number of entries
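A minimal Python sketch of this layout, assuming each interval $I_j$ is stored as a `(low, high)` pair. The names `Entry`, `Node`, `area`, `union`, `intersects` and `node_mbr` are hypothetical illustrations, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical representation: one closed interval (low, high) per dimension,
# i.e. I = (I_0, ..., I_{d-1}).
Rect = Tuple[Tuple[float, float], ...]


@dataclass
class Entry:
    mbr: Rect                         # I: the entry's bounding rectangle
    child: Optional["Node"] = None    # pointer to a child node (non-leaf entries)
    obj_id: Optional[int] = None      # pointer to a data object (leaf entries)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)  # at most M entries


def area(r: Rect) -> float:
    """Product of the interval lengths over all d dimensions."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(r: Rect, s: Rect) -> Rect:
    """Smallest rectangle enclosing both r and s."""
    return tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                 for (a_lo, a_hi), (b_lo, b_hi) in zip(r, s))


def intersects(r: Rect, s: Rect) -> bool:
    """True if r and s overlap in every dimension (touching counts)."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(r, s))


def node_mbr(node: Node) -> Rect:
    """MBR covering all entries of a node, as stored in its parent entry."""
    mbr = node.entries[0].mbr
    for e in node.entries[1:]:
        mbr = union(mbr, e.mbr)
    return mbr
```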
Property of R-tree • Every leaf node contains between m and M index records unless it is the root. • For each index record (I, pointer) in a leaf node, I is the smallest rectangle that spatially contains the d-dimensional data object represented by the indicated tuple. • Every non-leaf node has between m and M children unless it is the root. • For each entry (I, pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node. • The root node has at least two children unless it is a leaf. • All leaves appear on the same level.
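These properties translate directly into an invariant checker. This is a sketch under simplifying assumptions: a hypothetical `Node` stores `(mbr, pointer)` pairs, and the leaf-level condition (I is the smallest rectangle containing the data object) is not checked because object geometry is not modeled.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


class Node:
    """Hypothetical node: entries are (mbr, child_node) or, in leaves, (mbr, object_id)."""

    def __init__(self, leaf: bool, entries: List[tuple]):
        self.leaf = leaf
        self.entries = entries


def enclose(rects: List[Rect]) -> Rect:
    """Smallest rectangle containing all given rectangles."""
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(len(rects[0])))


def check(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Assert the R-tree properties below `node`; return the subtree height in levels."""
    n = len(node.entries)
    assert n <= M, "node overflow"
    if is_root:
        assert node.leaf or n >= 2, "non-leaf root needs at least two children"
    else:
        assert n >= m, "node underflow"
    if node.leaf:
        return 1
    heights = set()
    for mbr, child in node.entries:
        heights.add(check(child, m, M, is_root=False))
        # A non-leaf entry must hold the tightest cover of its child's rectangles.
        assert mbr == enclose([r for r, _ in child.entries]), "MBR is not minimal"
    assert len(heights) == 1, "leaves are not all on the same level"
    return heights.pop() + 1
```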
Property of R-Tree • The height of an R-Tree containing N index records is at most $\lceil \log_m N \rceil - 1$ • The maximum number of nodes is $\lceil N/m \rceil + \lceil N/m^2 \rceil + \cdots + 1$, where $\lceil N/m \rceil$ is the maximum number of leaf nodes • Worst-case space utilization for all nodes except the root is $m/M$
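Both bounds can be evaluated with integer arithmetic, which avoids floating-point error in $\log_m N$. A small sketch; `max_height` and `max_nodes` are hypothetical helper names and assume $m \ge 2$.

```python
def max_height(N: int, m: int) -> int:
    """Upper bound ceil(log_m N) - 1 on the height of an R-tree holding N records."""
    assert m >= 2 and N >= 1
    levels, capacity = 0, 1
    while capacity < N:              # smallest h with m**h >= N, i.e. ceil(log_m N)
        capacity *= m
        levels += 1
    return levels - 1


def max_nodes(N: int, m: int) -> int:
    """Upper bound ceil(N/m) + ceil(N/m^2) + ... + 1 on the number of nodes."""
    assert m >= 2 and N >= 1
    total, count = 0, N
    while True:
        count = (count + m - 1) // m   # ceil(count / m): nodes on the next level up
        total += count
        if count == 1:                 # reached the single root
            return total
```

For N = 1000 and m = 10 the bounds give a height of at most 2 and at most 111 nodes (100 + 10 + 1).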
R-Tree Search • Because MBRs can overlap, a search may have to visit many index nodes. Search(MBR) if (leaf node) { report all entries in this node that overlap MBR } else { for each child node nx whose MBR overlaps MBR: nx.search(MBR) }
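A runnable version of the search pseudocode, as a sketch assuming a hypothetical `Node` whose entries are `(mbr, pointer)` pairs. It returns the ids of objects whose MBRs overlap the query, i.e. filter-step candidates only.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


class Node:
    """Hypothetical node: entries are (mbr, child) in inner nodes, (mbr, obj_id) in leaves."""

    def __init__(self, leaf: bool, entries: List[tuple]):
        self.leaf = leaf
        self.entries = entries


def overlaps(r: Rect, s: Rect) -> bool:
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(r, s))


def search(node: Node, query: Rect) -> List[int]:
    """Collect ids of all leaf entries whose MBR overlaps `query`."""
    hits: List[int] = []
    for mbr, ptr in node.entries:
        if not overlaps(mbr, query):
            continue                        # prune: nothing below can overlap
        if node.leaf:
            hits.append(ptr)                # candidate object (MBR test only)
        else:
            hits.extend(search(ptr, query)) # may descend into several children
    return hits
```

Because sibling MBRs may overlap, a single query can descend into more than one child at every level, which is exactly the cost the slide points out.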
R-Tree Insertion • Algorithm Insert(newMBR) • Find the position for the new record • Call ChooseLeaf to select a leaf node • Add the record to the leaf node • If the leaf is full, call SplitNode • Propagate changes upward • AdjustTree • If the root was split, grow the tree taller by creating a new root
R-Tree Insert • Algorithm ChooseLeaf CL1 Set N to be the root. CL2 If N is a leaf, return N. Otherwise, choose the entry in N whose rectangle needs the least area enlargement to include the new data; resolve ties by choosing the entry with the smallest rectangle. CL3 Set N to be the child node pointed to by the chosen entry. CL4 Repeat from CL2.
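A sketch of ChooseLeaf, assuming a hypothetical `Node` with `(mbr, pointer)` entries. It returns the whole root-to-leaf path rather than just the leaf, an illustrative choice that gives AdjustTree the parents it needs later.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


class Node:
    """Hypothetical node: entries are (mbr, child) in inner nodes, (mbr, obj_id) in leaves."""

    def __init__(self, leaf: bool, entries: List[tuple]):
        self.leaf = leaf
        self.entries = entries


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(r: Rect, s: Rect) -> Rect:
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(r, s))


def choose_leaf(root: Node, new_mbr: Rect) -> List[Node]:
    """Follow CL1-CL4 and return the path from the root to the chosen leaf."""
    node = root                                   # CL1
    path = [node]
    while not node.leaf:                          # CL2: stop once N is a leaf
        # Least area enlargement; ties broken by the smaller current area.
        _, child = min(node.entries,
                       key=lambda e: (area(union(e[0], new_mbr)) - area(e[0]), area(e[0])))
        node = child                              # CL3
        path.append(node)                         # CL4: repeat from CL2
    return path
```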
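R-Tree Insert • If there is no room, invoke SplitNode • Split the entries so as to minimize the total area of the resulting MBRs • An optimal SplitNode must examine every way of dividing the M+1 entries into two subsets: $O(2^{M-1})$ cases [Figure: a bad split vs. a good split]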
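To show why the optimal split is exponential, here is a brute-force sketch that enumerates the groupings of the M+1 entries and keeps the one with the smallest total area, with each group holding at least m entries. `exhaustive_split` is a hypothetical name, and total area is assumed as the cost measure; the approach is only practical for very small M.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def enclose(rects: List[Rect]) -> Rect:
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(len(rects[0])))


def exhaustive_split(rects: List[Rect], m: int):
    """Try every two-group split; return ((group1, group2), total_area) of the best one."""
    n = len(rects)
    idx = range(n)
    best, best_cost = None, float("inf")
    for k in range(m, n - m + 1):            # size of group one
        for g1 in combinations(idx, k):
            if 0 not in g1:
                continue                     # keep entry 0 in group one: skip mirrored splits
            g2 = [i for i in idx if i not in g1]
            cost = (area(enclose([rects[i] for i in g1]))
                    + area(enclose([rects[i] for i in g2])))
            if cost < best_cost:
                best, best_cost = (list(g1), g2), cost
    return best, best_cost
```

With entry 0 pinned to one group, the loop still inspects on the order of $2^{M}$ subsets, which is why the approximations on the next slide are used in practice.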
R-Tree Insert • Approximation algorithms (see details) • Quadratic ($O(M^2)$) • Linear ($O(M)$) • Select as seeds the two entries that lie farthest apart • Assign the remaining entries to the two groups
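A sketch of the linear variant, modeled on Guttman's linear split: seeds are the pair with the greatest separation along some dimension, normalized by the width of the whole set; remaining entries are then taken in any order and added to the group whose MBR grows least. The function names and the tie-breaking order (enlargement, then area, then entry count) are illustrative assumptions.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(r: Rect, s: Rect) -> Rect:
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(r, s))


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Pick the pair with the greatest normalized separation along any dimension."""
    n = len(rects)
    best, best_sep = (0, 1), float("-inf")
    for dim in range(len(rects[0])):
        lows = [r[dim][0] for r in rects]
        highs = [r[dim][1] for r in rects]
        width = (max(highs) - min(lows)) or 1.0         # normalize; guard zero width
        hi_low = max(range(n), key=lambda i: lows[i])    # highest low side
        lo_high = min((i for i in range(n) if i != hi_low), key=lambda i: highs[i])
        sep = (lows[hi_low] - highs[lo_high]) / width
        if sep > best_sep:
            best, best_sep = (lo_high, hi_low), sep
    return best


def linear_split(rects: List[Rect], m: int) -> List[List[int]]:
    """Split M+1 rectangles into two groups of at least m entries each."""
    s1, s2 = linear_pick_seeds(rects)
    groups = [[s1], [s2]]
    covers = [rects[s1], rects[s2]]
    rest = [i for i in range(len(rects)) if i not in (s1, s2)]
    while rest:
        # If one group needs all remaining entries to reach m, hand them over.
        needy = [g for g in (0, 1) if len(groups[g]) + len(rest) <= m]
        if needy:
            for i in rest:
                groups[needy[0]].append(i)
                covers[needy[0]] = union(covers[needy[0]], rects[i])
            break
        i = rest.pop(0)  # linear variant: remaining entries are taken in any order
        growth = [area(union(covers[g], rects[i])) - area(covers[g]) for g in (0, 1)]
        g = min((0, 1), key=lambda k: (growth[k], area(covers[k]), len(groups[k])))
        groups[g].append(i)
        covers[g] = union(covers[g], rects[i])
    return groups
```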
R-Tree Insertion • Adjust covering rectangles and propagate node splits as necessary • Ascend from leaf node L to the root AdjustTree Algorithm • [Initialize] Set N = L (and NN = LL if L was split) • [Check if done] If N is the root, stop • [Adjust covering rectangle in parent entry] • Let P be the parent of N, and E_N be N's entry in P • Modify the MBR of E_N to enclose all MBRs in N • [Propagate node split upward] • If N has a partner NN resulting from an earlier split, • Create a new entry E_NN covering NN and add E_NN to P • If P has no room, invoke SplitNode to produce P and PP • [Move up to next node] • Set N = P and NN = PP (if a split occurred), go to step 2
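A sketch of AdjustTree, assuming the caller passes the root-to-leaf path (for example, as produced by ChooseLeaf) instead of storing parent pointers. `adjust_tree` and its `split` callback are hypothetical; any SplitNode variant that moves entries into a new sibling and returns it can be plugged in. The final step also covers "grow tree taller" from the insertion outline.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


class Node:
    """Hypothetical node; entries are [mbr, child_or_obj_id] lists so MBRs can be updated."""

    def __init__(self, leaf: bool, entries: List[list]):
        self.leaf = leaf
        self.entries = entries


def node_mbr(node: Node) -> Rect:
    """Smallest rectangle enclosing every entry of `node`."""
    rects = [e[0] for e in node.entries]
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(len(rects[0])))


def adjust_tree(path: List[Node], nn: Optional[Node], M: int,
                split: Callable[[Node], Node]) -> Node:
    """Walk from leaf L (path[-1]) up to the root; nn is L's split partner or None.
    Returns the root, which is a new node if the old root had to split."""
    n = path[-1]                                   # [Initialize] N = L
    for p in reversed(path[:-1]):                  # [Check if done] loop ends at the root
        for entry in p.entries:                    # [Adjust covering rectangle]
            if entry[1] is n:
                entry[0] = node_mbr(n)
        if nn is not None:                         # [Propagate node split upward]
            p.entries.append([node_mbr(nn), nn])
            nn = split(p) if len(p.entries) > M else None
        n = p                                      # [Move up to next node]
    if nn is not None:                             # root split: grow the tree taller
        return Node(False, [[node_mbr(n), n], [node_mbr(nn), nn]])
    return n
```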
Processing and Optimization of Multiway Spatial Joins Using R-trees • Cost-based query optimizer • Join selectivity • Probability that a tuple belongs to the result • Used to generate the most efficient query execution plan • Spatial join selectivity • Multi-dimensional attributes • Commonly two-dimensional • This work focuses on computing the cost of the filter step (i.e., considering only MBRs)
Previous Work • Assumptions • $[0,1)^d$: the d-dimensional workspace • Data are uniformly distributed • Each dimension is independent
Previous Work • Window Query • Find all points that fall inside window q • $S(q) = \prod_{i=1}^{d} |q_i|$, where $|q_i|$ is the extent of q along dimension i [Figure: window q with extents q_x and q_y]
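A quick numerical check of the window-query formula under the slide's assumptions (uniform points in $[0,1)^d$, independent dimensions). The function names, the seed, and the choice to place the window fully inside the workspace are assumptions of this sketch.

```python
import random
from math import prod
from typing import Sequence


def window_selectivity(extents: Sequence[float]) -> float:
    """S(q): product of the window extents |q_i| over all d dimensions."""
    return prod(extents)


def simulated_selectivity(extents: Sequence[float], n_points: int = 100_000,
                          seed: int = 7) -> float:
    """Fraction of uniform points in [0,1)^d that fall inside a fixed window q."""
    rng = random.Random(seed)
    d = len(extents)
    lo = [rng.uniform(0, 1 - e) for e in extents]   # window placed inside the workspace
    inside = 0
    for _ in range(n_points):
        p = [rng.random() for _ in range(d)]
        inside += all(lo[i] <= p[i] < lo[i] + extents[i] for i in range(d))
    return inside / n_points
```

For a square window of side 0.3 in two dimensions, $S(q) = 0.3 \times 0.3 = 0.09$, so about 9% of the points are expected in the answer.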
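Previous Work • 2-Way Join Query • Find pairs where Ra intersects Rb • $S(R_a, R_b) = (|S_a| + |S_b|)^d$, where $|S_i|$ is the average extent of the rectangles of $R_i$ along one dimension and d is the number of dimensions [Figure: two rectangles intersect when their relative position falls in a region of extent $|S_{a,x}| + |S_{b,x}|$ by $|S_{a,y}| + |S_{b,y}|$]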
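A Monte Carlo sanity check of the 2-way formula. The simulation wraps the workspace around (a torus) so that edge effects, which the closed-form formula ignores, do not bias the result; this wrap-around, the function names, and the seed are assumptions of the sketch rather than part of the slide.

```python
import random


def join_selectivity(sa: float, sb: float, d: int) -> float:
    """S(Ra, Rb) = (|Sa| + |Sb|)^d."""
    return (sa + sb) ** d


def overlap_on_circle(x: float, sx: float, y: float, sy: float) -> bool:
    """Do [x, x+sx) and [y, y+sy) overlap on a wrap-around unit interval?"""
    return (y - x) % 1.0 < sx or (x - y) % 1.0 < sy


def simulated_join_selectivity(sa: float, sb: float, d: int,
                               trials: int = 200_000, seed: int = 1) -> float:
    """Fraction of random rectangle pairs (extents sa and sb per dimension) that intersect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Rectangles intersect iff their intervals overlap in every dimension.
        hits += all(overlap_on_circle(rng.random(), sa, rng.random(), sb) for _ in range(d))
    return hits / trials
```

Per dimension, the two intervals meet exactly when the offset between their lower ends falls in a window of length $|S_a| + |S_b|$, and the d independent dimensions multiply.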
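Previous Work • M-Way Linear Queries (Acyclic Queries) • Ra intersects Rb and Rb intersects Rc: $S(R_a, R_b, R_c) = (|S_a| + |S_b|)^d \, (|S_b| + |S_c|)^d$ • Generalization: $S = \prod_{\forall i,j:\, Q(i,j) = \text{TRUE}} (|S_i| + |S_j|)^d$, where $Q(i,j)$ is TRUE when the query joins $R_i$ and $R_j$ [Figure: chain of rectangles with extents |Sa|, |Sb|, |Sc|]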
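The generalized product can be computed directly from the query graph's edge list. The sketch below also simulates the chain Ra-Rb-Rc on a wrap-around unit workspace (an assumption that removes edge effects) to illustrate that, for acyclic queries, the per-edge probabilities multiply. All names and the seed are hypothetical.

```python
import random
from math import prod
from typing import Dict, List, Tuple


def acyclic_selectivity(sizes: Dict[str, float], edges: List[Tuple[str, str]], d: int) -> float:
    """Product over query-graph edges (i, j) of (|S_i| + |S_j|)^d."""
    return prod((sizes[i] + sizes[j]) ** d for i, j in edges)


def overlap_on_circle(x: float, sx: float, y: float, sy: float) -> bool:
    """Do [x, x+sx) and [y, y+sy) overlap on a wrap-around unit interval?"""
    return (y - x) % 1.0 < sx or (x - y) % 1.0 < sy


def simulated_chain(sa: float, sb: float, sc: float, d: int,
                    trials: int = 100_000, seed: int = 3) -> float:
    """Fraction of random triples where Ra meets Rb and Rb meets Rc in every dimension."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = ([rng.random() for _ in range(d)] for _ in range(3))
        hits += all(overlap_on_circle(a[k], sa, b[k], sb) and
                    overlap_on_circle(b[k], sb, c[k], sc) for k in range(d))
    return hits / trials
```

In the chain, once Rb's position is fixed, whether Ra meets it and whether Rc meets it are independent events, which is why the edge terms can simply be multiplied; a cycle in the query graph breaks this independence.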
Previous Work • M-Way Clique Join Query (M ≥ 3) • Papadias, Mamoulis, Theodoridis (ACM PODS 99) • Clique: if a set of rectangles mutually intersect, then they must share a common area [Figure: query graph over R1, R2, R3; spatial relationship of rectangles S1, S2, S3]
Previous Work • Common area ($q_n$) • Proof (by induction) • Probability: • Representative value: $|s_1|$ [Figure: three relative positions of intersecting rectangles s1 and s2]
Previous Work • Selectivity of M-Way Clique Join Query: $\Pr(s_2 \cap s_1 \ne \emptyset) \cdot \Pr(s_3 \cap s_1 \ne \emptyset \wedge s_3 \cap s_2 \ne \emptyset \mid s_1, s_2 \text{ mutually intersect}) = \Pr(s_2 \cap s_1 \ne \emptyset) \cdot \Pr(s_3 \text{ intersects the common area of } s_1 \text{ and } s_2)$ • General Case:
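One way to extend the three-rectangle case to n rectangles, sketched here under stated assumptions rather than taken from the paper: replace the common area by its expected extent. Two intervals of lengths a and b that are known to intersect overlap by $ab/(a+b)$ on average under uniform relative placement (boundary effects ignored). Starting from $|q_1| = |s_1|$, the hypothetical recurrence is $|q_i| = \frac{|q_{i-1}|\,|s_i|}{|q_{i-1}| + |s_i|}$ and $S \approx \prod_{i=2}^{n} (|q_{i-1}| + |s_i|)^d$. The names `mean_overlap` and `clique_selectivity` are illustrative.

```python
from typing import Sequence


def mean_overlap(a: float, b: float) -> float:
    """Expected overlap of intervals of lengths a and b, given that they intersect
    (uniform relative placement, boundary effects ignored)."""
    return a * b / (a + b)


def clique_selectivity(sizes: Sequence[float], d: int) -> float:
    """Approximate clique-join selectivity; sizes[i] is the average extent |s_i|.
    Hypothetical recurrence: the common area is replaced by its mean extent."""
    common = sizes[0]                      # common area of s1 alone: extent |s1|
    selectivity = 1.0
    for s in sizes[1:]:
        selectivity *= (common + s) ** d   # Prob(s_i meets the current common area)
        common = mean_overlap(common, s)   # extent of the shrunken common area
    return selectivity
```

With two rectangles this reduces to the 2-way formula $(|s_1| + |s_2|)^d$. For three intervals of equal length s in one dimension it gives $2s \cdot 1.5s = 3s^2$, which matches the exact probability that three such intervals pairwise intersect (the region $|y|, |z|, |y - z| < s$ has area $3s^2$); for unequal sizes it remains an approximation.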