Efficient Spatial Data Access Using R-Trees for Multi-Dimensional Queries

Graduate Course: Spatial Data
Korea University of Technology (한국기술대학교), Jun-Ki Min (민준기)

Spatial Data
• Traditional data
  • Single dimension
  • Values, text
• New applications
  • GIS
  • CAD
  • LBS
  • Multimedia data
• Multi-dimensional data

Spatial Access Method (SAM)
• Supports efficient access to spatial data
• B-Tree
  • Handles only one-dimensional data
  • Not appropriate for multi-dimensional data
• One of the best-known spatial indexes
  • R-Tree

R-Trees: A Dynamic Index Structure for Spatial Searching
• R-Tree
  • A height-balanced tree with index records in its leaf nodes containing pointers to data objects.
  • Dynamic structure: inserts and deletes can be intermixed with searches, and no periodic reorganization is required.

R-Trees: A Dynamic Index Structure for Spatial Searching
• R-Tree
  • Handling exact spatial objects directly is difficult
  • Based on the MBR (minimum bounding rectangle) approximation
[Figure: R-tree example with node R1, entries A1 and A2, and objects a1, a2, a3, a4]

R-Tree Structure
• Node = $(E_1, \dots, E_M)$
• $E_i = (I, \text{pointer})$, where $I = (I_0, \dots, I_{d-1})$, d is the number of dimensions, and $I_j = [a, b]$ is the extent along dimension j
• Let M be the maximum number of entries and $m \le M/2$ the minimum number of entries in a node
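The node layout above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names `Entry`, `Node`, and `mbr`, and the choice M = 4, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Interval = Tuple[float, float]   # I_j = [a, b] along one dimension
Rect = Tuple[Interval, ...]      # I = (I_0, ..., I_{d-1})

@dataclass
class Entry:
    rect: Rect      # bounding rectangle I
    pointer: Any    # child Node (non-leaf) or data object id (leaf)

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def mbr(rects):
    """Smallest rectangle that encloses every rectangle in `rects`."""
    dims = len(rects[0])
    return tuple(
        (min(r[k][0] for r in rects), max(r[k][1] for r in rects))
        for k in range(dims)
    )

M = 4        # assumed maximum number of entries per node
m = M // 2   # assumed minimum number of entries per node (m <= M/2)

# A leaf with two 2-dimensional objects, and the parent entry that covers it.
leaf = Node(is_leaf=True, entries=[
    Entry(((0.0, 1.0), (0.0, 1.0)), "a1"),
    Entry(((2.0, 3.0), (1.0, 4.0)), "a2"),
])
parent_entry = Entry(mbr([e.rect for e in leaf.entries]), leaf)
print(parent_entry.rect)  # ((0.0, 3.0), (0.0, 4.0))
```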

Property of R-Tree
• Every leaf node contains between m and M index records unless it is the root.
• For each index record (I, pointer) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
• Every non-leaf node has between m and M children unless it is the root.
• For each entry (I, pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
• The root node has at least two children unless it is a leaf.
• All leaves appear on the same level.

Property of R-Tree
• The height of an R-Tree containing N index records is at most $\lceil \log_m N \rceil - 1$
• The maximum number of nodes is $\lceil N/m \rceil + \lceil N/m^2 \rceil + \dots + 1$, where the first term $\lceil N/m \rceil$ is the number of leaf nodes
• Worst-case space utilization for all nodes except the root is m/M
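To get a feel for these bounds, the short sketch below evaluates them for concrete values. The function names and the sample values of N and m are assumptions chosen for illustration; integer arithmetic is used so that the ceilings are exact.

```python
def ceil_log(n, base):
    """Smallest integer k with base**k >= n, computed without floating point."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k

def max_height(n_records, m):
    """Height bound from the slide: ceil(log_m N) - 1."""
    return ceil_log(n_records, m) - 1

def max_nodes(n_records, m):
    """Node bound from the slide: ceil(N/m) + ceil(N/m^2) + ... + 1 (requires m >= 2)."""
    total, level = 0, n_records
    while True:
        level = -(-level // m)   # ceiling division: nodes needed one level up
        total += level
        if level == 1:           # reached the root
            return total

print(max_height(1_000_000, 50))  # 3
print(max_nodes(1_000_000, 50))   # 20409  (20000 + 400 + 8 + 1)
print(max_height(1000, 2))        # 9
print(max_nodes(1000, 2))         # 1001
```

Even with the minimum fill m = 50, a million records fit in a tree only a few levels deep, which is why a search touches few nodes when MBR overlap is low.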

R-Tree Search
• Because MBRs overlap, many index nodes may be visited.

Search(MBR)
  if (leaf node) {
    check all entries in this node that overlap MBR
  } else {
    for each child node nx that overlaps MBR
      nx.Search(MBR)
  }
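The search procedure above might look as follows in Python. This is a sketch under assumptions: nodes are plain dictionaries with hypothetical keys `leaf` and `entries`, each entry is a `(rect, pointer)` pair, and rectangles are tuples of closed `(lo, hi)` intervals.

```python
def overlaps(a, b):
    """True if rectangles a and b intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def search(node, query):
    """Return the pointers of all leaf entries whose MBR overlaps `query`."""
    results = []
    for rect, pointer in node["entries"]:
        if not overlaps(rect, query):
            continue                       # prune this entry (and its subtree)
        if node["leaf"]:
            results.append(pointer)        # candidate data object
        else:
            results.extend(search(pointer, query))  # descend into the child
    return results

# Hand-built two-level tree (illustrative data only).
leaf1 = {"leaf": True, "entries": [(((0, 1), (0, 1)), "a1"),
                                   (((2, 3), (2, 3)), "a2")]}
leaf2 = {"leaf": True, "entries": [(((5, 6), (5, 6)), "a3"),
                                   (((7, 8), (0, 1)), "a4")]}
root = {"leaf": False, "entries": [(((0, 3), (0, 3)), leaf1),
                                   (((5, 8), (0, 6)), leaf2)]}

print(search(root, ((0.5, 2.5), (0.5, 2.5))))  # ['a1', 'a2']
print(search(root, ((6.5, 9), (0, 0.5))))      # ['a4']
```

The result is a candidate set: entries whose MBRs overlap the query still need an exact geometry check, since an MBR is only an approximation of the object.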

R-Tree Insertion
• Algorithm Insertion(newMBR)
  • Find position for new record
    • Call ChooseLeaf to select a leaf node
  • Add record to leaf node
    • If the leaf is full, call SplitNode
  • Propagate changes upward
    • AdjustTree
  • Grow tree taller
    • If split propagation caused the root to split, create a new root

R-Tree Insert
• Algorithm ChooseLeaf
  CL1 Set N to be the root.
  CL2 If N is a leaf, return N. Otherwise, choose the entry in N whose rectangle needs the least area enlargement to include the new data. Resolve ties by choosing the entry with the smallest rectangle.
  CL3 Set N to be the child node pointed to by the child pointer of the chosen entry.
  CL4 Repeat from CL2.
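A possible rendering of ChooseLeaf in Python is shown below. The dictionary-based node layout (keys `leaf`, `name`, `entries`) and the helper names `area` and `union` are assumptions made for this sketch.

```python
def area(rect):
    """Area (volume in d dimensions) of a rectangle given as (lo, hi) intervals."""
    result = 1.0
    for lo, hi in rect:
        result *= hi - lo
    return result

def union(a, b):
    """Smallest rectangle that encloses both a and b."""
    return tuple((min(l1, l2), max(h1, h2))
                 for (l1, h1), (l2, h2) in zip(a, b))

def choose_leaf(root, new_rect):
    node = root                                    # CL1
    while not node["leaf"]:                        # CL2
        def cost(entry):
            rect, _child = entry
            enlargement = area(union(rect, new_rect)) - area(rect)
            return (enlargement, area(rect))       # ties: smaller rectangle wins
        _rect, child = min(node["entries"], key=cost)
        node = child                               # CL3, then loop again (CL4)
    return node

# Illustrative tree: two leaves under one root.
leaf1 = {"leaf": True, "name": "L1", "entries": []}
leaf2 = {"leaf": True, "name": "L2", "entries": []}
root = {"leaf": False, "name": "root",
        "entries": [(((0.0, 3.0), (0.0, 3.0)), leaf1),
                    (((5.0, 8.0), (0.0, 6.0)), leaf2)]}

# Enlarging the first rectangle costs 3 area units, the second 12.
print(choose_leaf(root, ((3.0, 4.0), (1.0, 2.0)))["name"])  # L1
```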

R-Tree Insert
• If there is no room, invoke SplitNode
• Split the entries so that the resulting MBRs are as small as possible
• Optimal SplitNode: try every way of dividing the M+1 entries into two subsets -> O(2^(M-1)) cases
[Figure: a bad split and a good split]

R-Tree Insert
• Approximation (see details)
  • Quadratic (O(M^2))
  • Linear
    • Select the two entries that are farthest apart as seeds
    • Insert the remaining entries into the groups
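As an illustration of the quadratic approximation, the sketch below picks as seeds the pair of entries that would waste the most area if placed together, then distributes the rest. It is a simplification, not the full published algorithm: remaining entries are assigned in input order to the group that needs the least enlargement, and a group is force-fed only when it would otherwise fall below m. All function names are assumptions.

```python
from itertools import combinations

def area(rect):
    result = 1.0
    for lo, hi in rect:
        result *= hi - lo
    return result

def union(a, b):
    return tuple((min(l1, l2), max(h1, h2))
                 for (l1, h1), (l2, h2) in zip(a, b))

def pick_seeds(rects):
    """Examine all O(M^2) pairs; return the indices of the most wasteful pair."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def split(rects, m):
    """Divide M+1 rectangles into two groups, each with at least m members."""
    i, j = pick_seeds(rects)
    groups = [[rects[i]], [rects[j]]]
    covers = [rects[i], rects[j]]          # current MBR of each group
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest:
        # A group that needs every remaining entry to reach m gets them all.
        forced = [g for g in (0, 1) if len(groups[g]) + len(rest) <= m]
        r = rest.pop(0)
        if forced:
            g = forced[0]
        else:
            g = min((0, 1),
                    key=lambda k: area(union(covers[k], r)) - area(covers[k]))
        groups[g].append(r)
        covers[g] = union(covers[g], r)
    return groups, covers

# Five rectangles overflow a node with M = 4; two natural clusters exist.
rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (8, 9)),
         ((9, 10), (8, 9)), ((0, 1), (1, 2))]
groups, covers = split(rects, m=2)
print(pick_seeds(rects))  # (0, 3)
print(covers)             # [((0, 2), (0, 2)), ((8, 10), (8, 9))]
```

The exhaustive alternative would test every two-way partition of the M+1 entries, which is why the approximations are used in practice.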

R-Tree Insertion
• Adjust covering rectangles and propagate node splits as necessary
• Ascend from leaf node L to the root
AdjustTree Algorithm
1. [Initialize] Set N = L
2. [Check if done] If N is the root, stop
3. [Adjust covering rectangle in parent entry]
   • Let P be the parent of N and E_N be N's entry in P
   • Modify the MBR of E_N so that it encloses all MBRs in N
4. [Propagate node split upward]
   • If N has a partner NN resulting from an earlier split, create a new entry E_NN and add E_NN to P
   • If P has no room, invoke SplitNode to produce P and PP
5. [Move up to next node] Set N = P, and set NN = PP if a split occurred; go to step 2
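The rectangle-adjustment part of AdjustTree can be sketched as below. This covers only the case where no split occurred (split propagation is deliberately omitted), and the `Node` class with a `parent` link is an assumption made for the example.

```python
def mbr(rects):
    """Smallest rectangle that encloses every rectangle in `rects`."""
    dims = len(rects[0])
    return tuple((min(r[k][0] for r in rects), max(r[k][1] for r in rects))
                 for k in range(dims))

class Node:
    def __init__(self, leaf, parent=None):
        self.leaf = leaf
        self.parent = parent
        self.entries = []        # list of [rect, pointer] pairs

def adjust_tree(node):
    """Ascend from `node` to the root, tightening each covering rectangle."""
    while node.parent is not None:               # stop once N is the root
        parent = node.parent
        for entry in parent.entries:
            if entry[1] is node:                 # E_N: N's entry in P
                entry[0] = mbr([rect for rect, _ in node.entries])
                break
        node = parent                            # move up one level

# Build root -> leaf, add a record to the leaf, then propagate the change.
root = Node(leaf=False)
leaf = Node(leaf=True, parent=root)
leaf.entries.append([((0, 1), (0, 1)), "a1"])
root.entries.append([((0, 1), (0, 1)), leaf])

leaf.entries.append([((4, 5), (2, 3)), "a2"])    # new record enlarges the leaf
adjust_tree(leaf)
print(root.entries[0][0])  # ((0, 5), (0, 3))
```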

Processing and Optimization of Multiway Spatial Joins Using R-trees
• Cost-based query optimizer
  • Join selectivity
    • Probability that a tuple is in the result
    • Used to generate the most efficient query execution plan
• Spatial join selectivity
  • Multi-dimensional attributes
  • Commonly 2 dimensions
• This work focuses on computing the cost of the filter step (= considers only MBRs)

Previous Work
• Assumptions
  • The workspace is d-dimensional: $[0,1)^d$
  • Data is uniformly distributed
  • Each dimension is independent

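Previous Work
• Window Query
  • Find all points that fall inside window q
  • $S(q) = |q_i|^d$, where $|q_i|$ = size of q in dimension i (for a window with the same extent in every dimension)
[Figure: window q with side lengths qx and qy]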

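Previous Work
• 2-Way Join Query
  • Find the pairs of rectangles from Ra and Rb that intersect
  • $S(R_a, R_b) = (|S_a| + |S_b|)^d$, where $|S_i|$ = average size of $R_i$ in one dimension and d = number of dimensions
[Figure: region of size $(|S_{a,x}| + |S_{b,x}|) \times (|S_{a,y}| + |S_{b,y}|)$]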

Previous Work
• M-Way Linear Queries (Acyclic Queries)
  • Ra intersects Rb and Rb intersects Rc
  • $S(R_a, R_b, R_c) = (|S_a| + |S_b|)^d \, (|S_b| + |S_c|)^d$
• Generalization
  • $\prod_{\forall i,j:\, Q(i,j) = \mathrm{TRUE}} (|S_i| + |S_j|)^d$
[Figure: rectangles with sizes |Sa|, |Sb|, |Sc|]
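The pairwise and acyclic formulas are easy to evaluate numerically. The sketch below is illustrative only: the function names, the dictionary of average sizes, and the sample values are assumptions, and the uniformity and independence assumptions stated earlier are taken for granted.

```python
import math

def pair_selectivity(sa, sb, d=2):
    """(|Sa| + |Sb|)^d: chance that two rectangles with these average sizes intersect."""
    return (sa + sb) ** d

def acyclic_selectivity(sizes, edges, d=2):
    """Product of pairwise selectivities over every join edge Q(i, j) = TRUE."""
    result = 1.0
    for i, j in edges:
        result *= pair_selectivity(sizes[i], sizes[j], d)
    return result

# Hypothetical average side lengths in the unit workspace [0, 1)^2.
sizes = {"a": 0.01, "b": 0.02, "c": 0.03}
chain = [("a", "b"), ("b", "c")]   # Ra intersects Rb, Rb intersects Rc

two_way = pair_selectivity(sizes["a"], sizes["b"])
three_way = acyclic_selectivity(sizes, chain)
print(math.isclose(two_way, 0.0009))     # True: (0.01 + 0.02)^2
print(math.isclose(three_way, 2.25e-6))  # True: 0.0009 * (0.02 + 0.03)^2
```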

Previous Work
• M-Way Clique Join Query (M ≥ 3)
  • Papadias, Mamoulis, Theodoridis (ACM PODS 99)
  • Clique: if a set of rectangles mutually intersect, then they must share a common area
[Figure: query graph over R1, R2, R3 and the spatial relationship among S1, S2, S3]

Previous Work
• Common Area ($q_n$)
  • Proof (by induction)
  • Probability:
  • Representative value: |s1|
[Figure: rectangles s1 and s2]

Previous Work
• Selectivity of M-Way Clique Join Query
  Prob(s2 intersects s1) * Prob(s3 intersects s1 ∧ s3 intersects s2 | s1 and s2 intersect)
  = Prob(s2 intersects s1) * Prob(s3 intersects the common intersection area of s1 and s2)
• General Case:
