
CPSC 695 Week 6



  1. CPSC 695 Week 6: Query Processing in Databases. Dr. M. Gavrilova

  2. Overview • Introduction • I/O algorithms for large databases • Complex geometric operations in graphical querying • Applications

  3. Introduction • The geometric algorithms studied so far assumed that all data fits in RAM • In databases, the data is stored on disk and has to be accessed in “pages” • We will see how traditional algorithm design techniques can still be useful.

  4. Example • 4 pages of memory, 10 items in each • Listing all items sequentially requires 4 disk accesses • Listing the items in random order requires up to 40 disk accesses if only 1 page can be held in memory at a time: too expensive!
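The counts on slide 4 can be checked with a small simulation. This is only a sketch: the one-page buffer, the function name `count_page_loads`, and the interleaved worst-case order are assumptions made for illustration.

```python
PAGES = 4            # pages on disk (from the slide)
ITEMS_PER_PAGE = 10  # items per page (from the slide)

def count_page_loads(access_order, items_per_page=ITEMS_PER_PAGE):
    """Count disk reads when memory can hold exactly one page at a time."""
    loads = 0
    resident = None  # page currently held in the one-page buffer
    for item in access_order:
        page = item // items_per_page
        if page != resident:
            loads += 1        # page fault: fetch the page from disk
            resident = page
    return loads

# Sequential listing touches each page once.
sequential = list(range(PAGES * ITEMS_PER_PAGE))

# Worst case: consecutive accesses always fall on different pages.
worst = [p * ITEMS_PER_PAGE + i
         for i in range(ITEMS_PER_PAGE) for p in range(PAGES)]

print(count_page_loads(sequential))  # 4
print(count_page_loads(worst))       # 40
```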

  5. PART 1 Techniques for large data sets • External sorting • Distributed sweeping • Two-step processing

  6. “External” sorting problem • n pages in dataset on disk • m pages of memory, m < n

  7. “Divide and conquer” strategy • Step 1. Sorted “runs” of size m are created in memory using an internal sorting algorithm, then written to disk. • Step 2. Load the first records of each run into memory and merge them in sorted order. Once a block of merged output is complete, write it back to disk. • Complexity: O(n log_m n)
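The two steps on slide 7 can be sketched as follows. This is a minimal illustration, not the course's implementation: `external_sort` and `run_size` are hypothetical names, memory is counted in items rather than pages, and a single merge pass is assumed (all runs are merged at once).

```python
import heapq
import os
import tempfile

def external_sort(values, run_size):
    """Sort integers using sorted runs on disk followed by a k-way merge.

    run_size stands in for the memory capacity m (an assumption of this
    sketch: it is counted in items, not pages).
    """
    run_paths = []
    buffer = []

    def flush_run():
        # Step 1: sort the in-memory buffer and write it out as one run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) == run_size:
            flush_run()
    if buffer:
        flush_run()

    # Step 2: merge the runs, reading each run lazily, line by line.
    files = [open(p) for p in run_paths]
    try:
        streams = [map(int, f) for f in files]
        # A real system would stream the merged output back to disk;
        # here it is collected in a list to keep the sketch short.
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

When there are more runs than memory can buffer at once, the merge has to be repeated over several passes, which is where the log_m n factor in the complexity comes from.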

  8. Distributed sweeping • Segment intersection problem, orthogonal segments. • In RAM: sweep-line algorithm, O(n log n + k), where n is the number of segments and k is the number of intersections. • In DB: O(n log_m n + k) algorithm, where m is the number of pages in RAM. [Figure: sweep-line reaching a vertical segment v, which is handled as a range query]
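The in-RAM sweep mentioned on slide 8 can be sketched as below. The input layout (tuples for horizontal and vertical segments) and the function name are assumptions for illustration. The active list is a plain sorted Python list, so insertions and deletions cost O(n) each; a balanced search tree would be needed to actually reach O(n log n + k).

```python
from bisect import bisect_left, bisect_right, insort

def orthogonal_intersections(horizontals, verticals):
    """Report intersecting (horizontal, vertical) index pairs.

    horizontals: list of (x1, x2, y) with x1 <= x2
    verticals:   list of (x, y1, y2) with y1 <= y2
    """
    START, VERTICAL, END = 0, 1, 2  # tie order makes touching count as a hit
    events = []
    for i, (x1, x2, _y) in enumerate(horizontals):
        events.append((x1, START, i))
        events.append((x2, END, i))
    for j, (x, _y1, _y2) in enumerate(verticals):
        events.append((x, VERTICAL, j))
    events.sort()

    active = []  # sorted (y, index) of horizontals crossing the sweep line
    hits = []
    for _x, kind, idx in events:
        if kind == START:
            insort(active, (horizontals[idx][2], idx))
        elif kind == END:
            active.pop(bisect_left(active, (horizontals[idx][2], idx)))
        else:
            # Range query: every active horizontal with y1 <= y <= y2.
            _, y1, y2 = verticals[idx]
            lo = bisect_left(active, (y1, -1))
            hi = bisect_right(active, (y2, len(horizontals)))
            hits.extend((h, idx) for _y, h in active[lo:hi])
    return hits
```

For example, with `horizontals = [(0, 10, 5), (0, 3, 1)]` and `verticals = [(2, 0, 6), (8, 4, 9)]`, the first vertical segment crosses both horizontals and the second crosses only the first one.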

  9. Distributed sweeping • Idea: split the space into m horizontal strips, each containing approximately n/m segments. • An active list is created for each strip: L1, L2, …, Lm. • When a vertical segment is met, it is tested for intersection against the segments in the active lists of the strips that it overlaps. [Figure: vertical segment v crossing strips 1 to 4]

  10. Distributed sweeping • However, in the worst case, all strips have to be tested for every vertical segment. • In the picture, segment v intersects strip 4, while no intersections are reported. • Solution: split each vertical segment into 3 parts: • The middle part lies completely within some number of strips • The two end parts each partially cover a strip. [Figure: segment v across strips 1 to 4, with its parts labeled “end”, “middle part”, “end”]

  11. Distributed sweeping • Test for intersections between the vertical segment and all segments in the “middle” strips • Then recursively do the same for the two “end” strips. • Recursion terminates when all processing can be carried out in RAM • Complexity: O(n log_m n + k)

  12. Rectangle intersection • The same idea carries over to the case of rectangle intersection • The Θ(n log_m n + k) bound is met again

  13. Two-step processing: Spatial Join • Spatial predicates: overlaps, contains, adjacent, etc. • 2 steps: filter step, then refinement step
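The two steps on slide 13 can be illustrated with a toy spatial join. Everything specific here is an assumption of the sketch: the objects are circles `(x, y, radius)`, the predicate is “overlaps”, and the filter step compares minimum bounding rectangles (a common choice, but not stated on the slide). The filter uses a nested loop for brevity; a database would typically drive it from an index.

```python
from math import hypot

def mbr(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a circle."""
    x, y, r = circle
    return (x - r, y - r, x + r, y + r)

def mbrs_overlap(a, b):
    """Cheap test: do two axis-aligned rectangles overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circles_overlap(c1, c2):
    """Exact test on the real geometry."""
    return hypot(c1[0] - c2[0], c1[1] - c2[1]) <= c1[2] + c2[2]

def spatial_join(R, S):
    # Filter step: keep pairs whose bounding rectangles overlap.
    candidates = [(i, j)
                  for i, a in enumerate(R)
                  for j, b in enumerate(S)
                  if mbrs_overlap(mbr(a), mbr(b))]
    # Refinement step: run the exact predicate on the candidates only.
    results = [(i, j) for i, j in candidates if circles_overlap(R[i], S[j])]
    return candidates, results

R = [(0, 0, 1)]
S = [(1.8, 1.8, 1), (1.5, 0, 1), (5, 5, 1)]
candidates, results = spatial_join(R, S)
print(candidates)  # [(0, 0), (0, 1)]
print(results)     # [(0, 1)]
```

The pair `(0, 0)` passes the filter but is rejected by the refinement step: the bounding rectangles overlap at a corner while the circles themselves do not touch.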

  14. Additional Database Specifics • In databases: challenges with I/O (file access) are resolved using techniques discussed above. • Specific methods exist for: • Grid files (linear structures) • R-trees • Unindexed collections of objects

  15. PART 2 Computer Graphics Applications • DB operations: windowing and clipping • Windowing(g, r) is a Boolean operator that tests whether object g intersects rectangle r. • Clipping(g, r) computes the part of g inside r. [Figure: object g and rectangle r]

  16. Computer Graphics Primitives • Windowing: scan the edges of g; test each for intersection with r; checking vertices alone is not enough; O(n) • Clipping: consider each edge of r as a half-plane; clip g against each of those; combine the results; O(n)
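One common way to realize the clipping step on slide 16 is Sutherland-Hodgman clipping, which clips the polygon against the four half-planes one after another. The slide does not name a specific algorithm, so this choice, the function name, and the rectangle layout `(xmin, ymin, xmax, ymax)` are assumptions. The sketch is reliable for convex g; for concave polygons it can produce degenerate connecting edges.

```python
def clip_polygon(poly, rect):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect

    def clip(points, inside, intersect):
        """Clip against one half-plane, walking the edges p -> q."""
        out = []
        for k in range(len(points)):
            p, q = points[k - 1], points[k]
            if inside(q):
                if not inside(p):
                    out.append(intersect(p, q))  # entering the half-plane
                out.append(q)
            elif inside(p):
                out.append(intersect(p, q))      # leaving the half-plane
        return out

    # Intersection of edge p-q with a vertical or horizontal boundary line.
    # Only called when p and q lie on opposite sides, so no division by zero.
    def x_cross(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def y_cross(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    half_planes = [
        (lambda p: p[0] >= xmin, x_cross(xmin)),
        (lambda p: p[0] <= xmax, x_cross(xmax)),
        (lambda p: p[1] >= ymin, y_cross(ymin)),
        (lambda p: p[1] <= ymax, y_cross(ymax)),
    ]
    for inside, intersect in half_planes:
        if not poly:
            break  # nothing left inside
        poly = clip(poly, inside, intersect)
    return poly

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(sorted(clip_polygon(square, (2, 2, 6, 6))))
```

Each pass visits every edge once, so with a constant number of rectangle edges the total work stays linear in the number of vertices of g.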

  17. Computer Graphics Primitives • Polygon partitioning (for large data sets) • Polygon triangulation • Intersections (polyline, polygon)

  18. Polygon partitioning • Sort the vertices of polygon P by x-coordinate • Use the sweep-line technique with a vertical line L: for each vertex v, compute the maximal vertical segment of L that is internal to P and contains v. This is done by examining the nearest edges above and below v. • The visibility segments define trapezoids. • Complexity: O(n lg n) • Note: complex polygons can be triangulated if the trapezoids are further triangulated.

  19. Polygon partitioning • The visibility segments define trapezoids (geometric objects with 2 parallel edges)

  20. Triangulation of a simple polygon • Triangulation involves finding diagonals within the polygon, i.e. segments v_i v_j between vertices of P that lie inside P. • v_i and v_j are then said to be visible to each other. • Each triangulation of a polygon with n vertices has (n-3) diagonals and (n-2) triangles

  21. Triangulation of a monotone polygon • Idea: monotone polygons can be triangulated in linear time. A simple polygon can be partitioned into monotone polygons. [Figure: a monotone polygon and a simple polygon]

  22. Triangulation of a monotone polygon • Idea: sweep-line, sort all vertices of P • If the angle formed by the 3 most recently processed points is convex → create a triangle and remove the point from list L. • If the angle is reflex → add the next point. • Partitioning a polygon into monotone polygons is done similarly to trapezoidation: sweep-line by edges, find the trapezoids; they represent monotone chains. O(n lg n)

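  23. Convex partitioning • Convex partitioning: partitioning into convex components, aiming for a small number of components. Given a triangulation, a partition within a constant factor of the minimum number of components can be computed in O(n).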

  24. Geometric Relationships • Computing intersections: • Point in a polygon • Polyline intersection • Polygon intersection (general and convex)

  25. Point in a polygon (simple) • Draw a ray from p • Count the number of intersections with the boundary • If odd → p is inside, if even → p is outside • O(n) algorithm [Figure: points p and q relative to a polygon]
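The parity test on slide 25 can be sketched as follows, assuming a horizontal ray shot to the right of p and a polygon given as a list of `(x, y)` vertices (both are choices of this sketch). Points lying exactly on the boundary are not handled consistently.

```python
def point_in_polygon(p, poly):
    """Ray-casting parity test for a simple polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does the edge straddle the horizontal line through p?
        # The half-open comparison counts a vertex on the ray only once.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # one more crossing to the right of p
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((1, 1), square))  # True
print(point_in_polygon((5, 1), square))  # False
```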

  26. Polyline intersection • Given a set of line segments, detect whether any 2 segments intersect. • O(n²): straightforward pairwise testing
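The straightforward quadratic approach on slide 26 can be sketched with an orientation-based segment test. The helper names and the convention that touching or collinear-overlapping segments count as intersecting are assumptions of this sketch.

```python
from itertools import combinations

def orient(a, b, c):
    """Sign of the turn a -> b -> c: 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, c):
    """Given c collinear with a-b, is c within the extent of segment a-b?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_intersect(s, t):
    a, b = s
    c, d = t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # each segment's endpoints lie on both sides of the other
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and on_segment(a, b, c)) or
            (o2 == 0 and on_segment(a, b, d)) or
            (o3 == 0 and on_segment(c, d, a)) or
            (o4 == 0 and on_segment(c, d, b)))

def any_intersection(segments):
    """Test all pairs of segments: O(n^2) intersection tests."""
    return any(segments_intersect(s, t) for s, t in combinations(segments, 2))

crossing = [((0, 0), (4, 4)), ((0, 4), (4, 0)), ((5, 5), (6, 6))]
disjoint = [((0, 0), (1, 1)), ((2, 2), (3, 3)), ((0, 5), (5, 5))]
print(any_intersection(crossing))  # True
print(any_intersection(disjoint))  # False
```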

  27. Polyline intersection • Plane-sweep, O(n lg n): • The sweep line meets the left endpoint of a segment S: S is inserted into the list L, and the two neighboring segments below and above S are tested for intersection with S. • The sweep line meets the right endpoint of S: S is deleted from L, and the segments that were above and below S are tested for intersection with each other.

  28. Polygon intersection • Two simple polygons P and Q. • Possible cases: • One edge of P intersects one edge of Q (use segment intersection test) • P is inside Q (point inside polygon) • Q is inside P (point inside polygon) • Otherwise, P and Q don’t intersect. • O(n lg n)

  29. Convex polygon intersection • Convexity allows a faster O(n) algorithm. • Idea: a synchronized scan of the edges of P and Q, so that all intersection points are eventually found and the “inner” intersection boundary is known at each step. • Scanned edges are advanced if they “point” at each other.

  30. Summary • Dealing with large data sets requires additional resources • Methods such as the following can be useful: • External sorting • Distributed sweeping • Two-step processing • Other applications (spatial map querying) require computer graphics primitives • Various intersection operations exist
