Ch. 16: Sweep-Zones

ismail

Presentation Transcript


  1. Ch. 16: Sweep-Zones. Basic question: Is it possible to compute nearest neighbors in expected time O(n log n)? Basic idea: generalize sweep-lines to sweep-zones. Def.: The sweep-zone SZ of an area is the set of regions touching the upper boundary of the area from below. R. Bayer, Ch. 16, DWH-SS2000
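
The definition can be made concrete in a small toy model. This is a sketch under stated assumptions, not Bayer's formulation: the data space is a 2^bits × 2^bits grid of cells, Z-values come from bit interleaving, the Z-regions are consecutive Z-intervals given by their upper Z-addresses `ends`, and a region is taken to touch the upper boundary of area(R_j) when one of its cells has a neighbor cell outside the area. The names `z_value` and `sweep_zone` are hypothetical.

```python
from bisect import bisect_left


def z_value(x: int, y: int, bits: int) -> int:
    """Z-order (Morton) value of cell (x, y): x bits in even, y bits in odd positions."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def sweep_zone(j: int, ends: list[int], bits: int) -> set[int]:
    """Indices of regions R_0..R_j touching the upper boundary of area(R_j) from below.

    Toy model (assumption): region k covers the Z-interval (ends[k-1], ends[k]],
    and area(R_j) is every cell whose Z-value is <= ends[j].
    """
    side = 1 << bits
    limit = ends[j]
    zone: set[int] = set()
    for x in range(side):
        for y in range(side):
            z = z_value(x, y, bits)
            if z > limit:
                continue  # this cell has not been read yet
            # The Z-value is monotone in x and y, so only the right and upper
            # neighbours of a cell inside the area can lie outside it.
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < side and ny < side and z_value(nx, ny, bits) > limit:
                    zone.add(bisect_left(ends, z))  # region containing (x, y)
                    break
    return zone
```

For a 4 × 4 grid split into its four quadrants (`ends = [3, 7, 11, 15]`), the sweep-zones after reading regions 0, 1, 2 and 3 are {0}, {0, 1}, {1, 2} and the empty set: once the upper-left quadrant has been read, region 0 is enclosed and leaves the sweep-zone.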

  2. UB-Tree Insertion [Figure: partition of the data space into numbered Z-regions of a UB-tree; the layout is not recoverable from the transcript.]
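
The UB-tree stores the data space as Z-regions, each a contiguous interval of the Z-curve. The sketch below assumes 2-D integer coordinates, Morton bit interleaving for the Z-value, and identifies each region by the upper Z-address of its interval. `z_value`, `region_of` and `region_ends` are hypothetical names; a real UB-tree locates the region through its B-tree over these addresses rather than through a sorted Python list.

```python
from bisect import bisect_left


def z_value(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y (x in even, y in odd bit positions)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def region_of(z: int, region_ends: list[int]) -> int:
    """Index of the Z-region whose interval (previous end, own end] contains z.

    region_ends must be sorted ascending and end with the largest Z-value
    of the space, so that every z falls into some region.
    """
    return bisect_left(region_ends, z)


# A 4 x 4 grid (bits=2) split into four regions of four cells each.
ends = [3, 7, 11, 15]
print([region_of(z_value(x, 0, 2), ends) for x in range(4)])  # [0, 0, 1, 1]
```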

  3. Sweep-Zone Algorithm 1
     Invariant for step i: { the Z-regions have been read in increasing Z-order up to region R_{i-1}, i.e. area(R_{i-1}) with upper boundary B(R_{i-1}) } { the set of cached regions C(R_i) is the set of regions in SZ_{i-1} = SZ(area(R_{i-1})) plus region R_i }
     1. For every point p ∈ R_i, let l(p) and h(p) be the lower and higher neighbor of p on the Z-curve; compute l(p) and h(p).
     2. Let q = l(p) if dist(p, l(p)) < dist(p, h(p)), and q = h(p) otherwise.
     3. Let Q(p) be the query box with center p and side length 2*dist(p, q).
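
Steps 1 to 3 can be sketched in memory as follows, assuming distinct 2-D integer points (at least two), Morton bit interleaving for the Z-curve and Euclidean dist. Here l(p) and h(p) are simply the predecessor and successor of p after sorting by Z-value; regions, caching and disk access are left out, and `query_boxes` is a hypothetical name.

```python
import math


def z_value(x: int, y: int, bits: int) -> int:
    """Morton value: x bits in even, y bits in odd positions."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def query_boxes(points, bits):
    """Steps 1-3 for distinct 2-D integer points.

    Returns {p: (q, (x_lo, y_lo, x_hi, y_hi))}, where q is the closer Z-curve
    neighbour of p and the box is Q(p) with centre p and side 2 * dist(p, q).
    """
    order = sorted(points, key=lambda p: z_value(p[0], p[1], bits))
    boxes = {}
    for k, p in enumerate(order):
        low = order[k - 1] if k > 0 else None                # l(p)
        high = order[k + 1] if k + 1 < len(order) else None  # h(p)
        # Step 2: q = l(p) if dist(p, l(p)) < dist(p, h(p)), else h(p).
        if high is None or (low is not None and math.dist(p, low) < math.dist(p, high)):
            q = low
        else:
            q = high
        r = math.dist(p, q)  # half the side length of Q(p)
        boxes[p] = (q, (p[0] - r, p[1] - r, p[0] + r, p[1] + r))
    return boxes
```

Ties in step 2 go to h(p), matching the "otherwise" branch on the slide.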

  4. (Sweep-Zone Algorithm 1, continued)
     4. Retrieve Q(p) from the cache or from disk and compute the nearest neighbor nn(p). { Note: retrieval of Q(p) should take time O(log n); finding nn(p) should take nearly constant time. }
     5. Cache the regions intersecting Q(p) to enforce linear I/O time.
     6. If R_i was the last region in Z-order, then exit.
     7. Release all regions from C(R_i) that are not in SZ_i.
     8. i := i+1; read the next region R_i in Z-order.
     9. Go to step 1.
     { all nearest neighbors are known; now cluster }
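
Step 4 relies on Q(p) being large enough: q lies in Q(p) at distance r = dist(p, q), so the nearest neighbor is at most r away, and every point within Euclidean distance r of p lies inside the box of half-side r. The sketch below checks this by searching only the box, with a linear filter standing in for the O(log n) UB-tree range query. `nearest_in_query_box` is a hypothetical name, and points are assumed to be distinct 2-D integer tuples.

```python
import math


def nearest_in_query_box(p, q, points):
    """Nearest neighbour of p, searching only Q(p): centre p, side 2 * dist(p, q)."""
    r = math.dist(p, q)
    # Any point closer to p than q lies within distance r, hence inside the box;
    # q itself is in the box, so the candidate list is never empty.
    candidates = [c for c in points
                  if c != p and abs(c[0] - p[0]) <= r and abs(c[1] - p[1]) <= r]
    return min(candidates, key=lambda c: math.dist(p, c))
```

The result is exact for any choice of q; choosing q as the closer Z-curve neighbor only serves to keep the box, and hence the number of regions to fetch, small.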

  5. Sweep-Zone Algorithm 2
     Basic idea: run the algorithm forward to compute the lower (w.r.t. Z-order) nearest neighbor nn_low(p) of p, and backward to compute the upper (w.r.t. Z-order) nearest neighbor nn_up(p) of p; then nn(p) = the closer of nn_low(p) and nn_up(p). That is, modify step 4 of Sweep-Zone Algorithm 1 to compute Q(p) ∩ area(R_i).
     Advantages: all pages are read in increasing or decreasing Z-order only (sequential reads), and cache requirements are smaller.
     Disadvantage: the data must be read twice. Is the tradeoff worthwhile?
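
The decomposition behind Algorithm 2 can be checked in memory: every other point has either a lower or a higher Z-value than p, so the closer of nn_low(p) and nn_up(p) is the true nearest neighbor. The sketch below assumes distinct 2-D integer points and Morton Z-values, and uses quadratic scans in place of the box queries; it shows only the forward/backward split, not the sequential-I/O behavior. `two_pass_nearest` and `closest` are hypothetical names.

```python
import math


def z_value(x: int, y: int, bits: int) -> int:
    """Morton value: x bits in even, y bits in odd positions."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def closest(p, candidates):
    """Closest candidate to p, or None if there are no candidates."""
    return min(candidates, key=lambda c: math.dist(p, c), default=None)


def two_pass_nearest(points, bits):
    order = sorted(points, key=lambda p: z_value(p[0], p[1], bits))
    # Forward pass: when p is processed, only points with smaller Z-value are read.
    nn_low = {p: closest(p, order[:k]) for k, p in enumerate(order)}
    # Backward pass: the same, over points with larger Z-value.
    nn_up = {}
    for k in range(len(order) - 1, -1, -1):
        p = order[k]
        nn_up[p] = closest(p, order[k + 1:])
    # nn(p) = the closer of nn_low(p) and nn_up(p).
    return {p: closest(p, [c for c in (nn_low[p], nn_up[p]) if c is not None])
            for p in order}
```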

  6. Cache Contents for Algorithm 2 [Table: regions held in the cache at each step of Algorithm 2; the row and column structure is not recoverable from the transcript.]

  7. Cache Modification
     1. Determine the extent of the next region to be read, using the upper part of the UB-index.
     2. Determine the regions that can be released, i.e. SZ_{i-1} - SZ_i (the regions of the old sweep-zone that are not in the new one).
     3. Release these regions from the cache.
     4. Read the next region, i.e. transfer it from disk to the cache.
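
The cache bookkeeping can be replayed once the sweep-zones are known. The sketch below assumes SZ_i is given as a set of region indices per step, adds R_i when it is read and then releases C(R_i) - SZ_i, following step 7 of Algorithm 1. It ignores the extra regions cached for Q(p) in step 5; `simulate_cache` and the sample sweep-zones are hypothetical.

```python
def simulate_cache(sweep_zones):
    """Replay the cache policy for regions R_0, R_1, ... read in Z-order.

    sweep_zones[i] is SZ_i, the set of region indices in the sweep-zone after
    reading R_i. Returns (i, cache size after reading R_i, released regions).
    """
    cache = set()
    trace = []
    for i, sz in enumerate(sweep_zones):
        cache.add(i)            # read R_i: the cache is now C(R_i) = SZ_{i-1} plus R_i
        size = len(cache)
        released = cache - sz   # regions that no longer touch the boundary
        cache -= released
        trace.append((i, size, sorted(released)))
    return trace


# Hypothetical sweep-zones for four regions.
for step in simulate_cache([{0}, {0, 1}, {1, 2}, set()]):
    print(step)
```

Because an area only grows, a region that stops touching the boundary never touches it again, so the cache always equals SZ_{i-1} plus R_i, and the maximum of the recorded sizes is the peak cache requirement.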

  8. Observations: expected cache size ≈ 1.5 * sqrt(18) ≈ 6.4; maximum cache size that occurred = 6; average cache size = 4.28.
     Cache organization: keep the cache organized as a set of regions sorted in Z-order, e.g. an AVL tree with the elementary operations "append a single element" and "delete a set of elements".
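
The figures above can be checked and the suggested cache organization sketched. Reading 18 as the number of regions in the example, 1.5 * sqrt(18) ≈ 6.36, which rounds to 6.4. Python's standard library has no AVL tree, so the sketch below substitutes a sorted list of region Z-addresses maintained with `bisect`: appending in Z-order is amortized O(1), but deletion shifts list elements and is therefore linear rather than logarithmic. `ZOrderedCache` and `expected_cache_size` are hypothetical names.

```python
import bisect
import math


def expected_cache_size(n_regions: int) -> float:
    """About 1.5 * sqrt(number of regions), as on the slide (18 read as region count)."""
    return 1.5 * math.sqrt(n_regions)


class ZOrderedCache:
    """Cached regions kept sorted by Z-address (stand-in for the AVL tree)."""

    def __init__(self) -> None:
        self._z = []  # upper Z-address of each cached region, ascending

    def append(self, z_end: int) -> None:
        # Regions are read in increasing Z-order, so a new one goes at the end.
        if self._z and z_end <= self._z[-1]:
            raise ValueError("regions must be appended in increasing Z-order")
        self._z.append(z_end)

    def delete_set(self, z_ends) -> None:
        for z in z_ends:
            i = bisect.bisect_left(self._z, z)
            if i < len(self._z) and self._z[i] == z:
                del self._z[i]

    def regions(self) -> list:
        return list(self._z)

    def __len__(self) -> int:
        return len(self._z)


print(round(expected_cache_size(18), 1))  # 6.4
```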

  9. Open Questions:
     • Which algorithm is faster?
     • Which algorithm requires fewer resources?
     • What are the tradeoffs between I/O, cache size, CPU time, total time, etc.?
     • Is an analytic comparison of both algorithms possible?

  10. Algorithm 3
      This is a local optimization of Algorithm 2: if Q(p) ⊆ area(R_i), then nn(p) = nn_low(p), and we can skip the computation of nn_up(p) in the backward phase.
      Algorithm 4
      If nn(p) = nn_low(p), then discard p entirely from the backward phase, i.e. reduce the amount of data and computation for the second phase; but then we have to write out the non-discarded points.
      Open question: under what conditions is Algorithm 4 better than Algorithm 3?
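
In a grid model with Morton Z-values, the test Q(p) ⊆ area(R_i) of Algorithm 3 is cheap: the Z-value is monotone in each coordinate, so the largest Z-value of any cell in an axis-parallel box is that of its upper corner cell. The sketch below assumes area(R_i) consists of all cells with Z-value up to a given upper address `area_end`, clips the box to the grid (there is no data outside it), and, for Algorithm 4, keeps only the points that still need the backward phase. `box_inside_area`, `backward_input` and the (p, r, area_end) record layout are hypothetical.

```python
import math


def z_value(x: int, y: int, bits: int) -> int:
    """Morton value: x bits in even, y bits in odd positions."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


def box_inside_area(p, r, area_end, bits):
    """True if every grid cell of Q(p) (centre p, half-side r) has Z-value <= area_end."""
    side = 1 << bits
    # Upper corner cell of the box, clipped to the grid.
    hx = min(side - 1, math.floor(p[0] + r))
    hy = min(side - 1, math.floor(p[1] + r))
    return z_value(hx, hy, bits) <= area_end


def backward_input(forward_results, bits):
    """Algorithm 4 sketch: points whose nn(p) is not yet settled after the forward pass.

    forward_results holds (p, r, area_end) records, where r = dist(p, q) and
    area_end is the upper Z-address of area(R_i) when p was processed.
    """
    return [p for (p, r, area_end) in forward_results
            if not box_inside_area(p, r, area_end, bits)]
```

Algorithm 3 would call `box_inside_area` during the backward phase to skip nn_up(p), whereas Algorithm 4 evaluates it once after the forward phase and writes out only the surviving points.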

More Related