1 / 35

Guest Lecture Tonight

Guest Lecture Tonight. Room 205 in EBII (under Dr. Keoghs office) Starts at 6:10pm Attendance is mandatory Best behavior is mandatory Having an intelligent question is mandatory. Spatial Access Methods Dr Eamonn Keogh

ginger
Télécharger la présentation

Guest Lecture Tonight

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Guest Lecture Tonight • Room 205 in EBII (under Dr. Keoghs office) • Starts at 6:10pm • Attendance is mandatory • Best behavior is mandatory • Having an intelligent question is mandatory

  2. Spatial Access Methods Dr Eamonn Keogh Computer Science & Engineering DepartmentUniversity of California - RiversideRiverside,CA 92521eamonn@cs.ucr.edu

  3. Mathematical Preliminaries I y What is the distance between the points Blue (1,1) and Red (5,7) ? The formula for the Euclidian distance between two points is… (5,7) (1,1) x

  4. Consider a classic database… Assume that record(2).Name returns “Joes Bugers” and record(1).Location returns (244, 365) etc.

  5. Given such a database we can easily answer queries such as • List all Mexican restaurants. • List all Grade A restaurants. • However, classic databases do not allow queries such as • List all Mexican restaurants within five miles of UCR • List the 5 pizza restaurants nearest to 91 and 60. • These kinds of queries are called spatial queries.

  6. One possible way to do spatial queries Algorithm One_Nearest_Neighbor_Query(location) best_so_far = infinity; for i = 1 to number_items_in_database retrieve record(i) from database; if distance(location,record(i).location ) best_so_far =distance(location,record(i).location ) NNpointer= i; end; end; disp(‘The nearest neighbor is’,record(NNpointer ).name ); end; But this does not work well because…

  7. …so we need to index the data My informal definition of indexing. Organizing the data such that queries can be answered in sub linear time. • Indexing some kinds of data is trivial… • We can index the names field simply by creating 26 files that have the names plus pointers back to the original database. Adams Fish House 23 Al’s Burgers 34 Al’s Pizza 1 Amom Tacos 45 Bills Sushi 56 Bobs Burgers 78 Bombs Away! 12 Brian’s eats 99 Can o Beans Cocos Champagnes Chris's Fish House

  8. So, we call always index 1-dimensional data (if you can sort it, you can index it), such that we can answer 1-nearest neighbor queries by accessing just O(log(n) ) of the database. (n is the number of items in the database). But we can’t sort 2 dimensional data… This is an important problem which has attracted a great deal of research interest…. We can in fact index 2 (and higher) dimensional reasonably efficiently with special data structures known as Spatial Access Methods (SAM) or Multidimensional Access Methods. Although there are many variations we will focus on the R-Tree, introduced by Guttman in the 1984 SIGMOD conference.

  9. We have seen the formula for the distance between two points. We will find it useful to have a formula for the distance between a point and the closest possible point within an Minimum Bounding Rectangle MBR… 1) Suppose we have a query point Q and one known point R Could any of the points in the MBR be closer to Q than R is? R = (1,7) Q = (3,5) MBR = {(6,1),(8,4)}

  10. We have seen the formula for the distance between two points. We will find it useful to have a formula for the distance between a point and the closest possible point within an Minimum Bounding Rectangle MBR… Suppose we have a query point Q and one known point B Could any of the points in the MBR be closer to Q than B is? Q = (3,5) MBR = {(6,1),(8,4)} B = (1,1)

  11. The formula for the distance between a point and the closest possible point within an MBR MBR = {(L.x,L.y)(U.x,U.y)} Q = (x,y) MINDIST(Q,MBR) if L.x < x < U.x and L.y < y < U.y then 0 elseif L.x < x < U.x then min( (L.y -y)2 , (U.y -y)2 ) elseif ….

  12. We can visualize the locations of interest simply as points in space...

  13. Q1 …then both nearest neighbor queries and range queries have a simple geometric interpretation… Imagine that each point has a ID pointer back to the original database… Q2 2 1 23

  14. We can group clusters of datapoints into “boxes”, called Minimum Bounding Rectangles (MBRs). Each MBR can be represented with just two points. The lower left corner, and the upper right corner. We can further recursively group MBRs into larger MBRs…. R1 R4 R2 R5 R3 R6 R9 R7 R8

  15. R10 R11 R12 …these nested MBRs are organized as a tree (called a spatial access tree or a multidimensional tree). R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 Data nodes containing points

  16. R10 {(1,0),(7,9)} R1 R2 R3 R12 {(1,3),(5,4)} {(2,0),(7,9)} {(1,1),(7,8)} At the leave nodes we have the location, and a pointer to the record in question. At the internal nodes, we just have MBR information. (5,1) 32 (1,4) 45 (5,6) 27 (7,8) 73 (3,4) 77 (1,3) 88 (2,3) 22 (5,4) 13 (2,2) 47 (3,0) 86 (7,9) 52 Data nodes

  17. We can define a function, MINDIST(point, MBR), which tells us the minimum possible distance between any point and any MBR, at any level of the tree. MINDIST(point, MBR) = 5 MINDIST(point, MBR) = 0

  18. R10 R11 R12 We can use the MINDIST(point, MBR), to do fast search.. MINDIST( , R10) = 0 MINDIST( , R11) =10 MINDIST( , R12) =17 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 Data nodes containing points

  19. R10 R11 R12 We can use the MINDIST(point, MBR), to do fast search.. MINDIST( , R1) = 0 MINDIST( , R2) = 2 MINDIST( , R3) =10 0 10 17 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 Data nodes containing points

  20. We now go to disk, and retrieve all the data objects whose pointers are in the green node. We measure the true distance between our query and those objects. Let us imagine two scenarios, the closest object, the “best-so-far” has a value of.. • 1.5 units (we are done searching!) • 4.0 units (we have to look in R2, but then we are done) R10 R11 R12 0 10 17 R10 R11 R12 0 2 10 R1 R2 R3 R4 R5 R6 R7 R8 R9 Data nodes containing points

  21. R10 R11 R12 A simplifying assumption to make coding the R-Tree easier We have seen that searching R_trees is easy! But building the MBRs looks like a non-trivial problem. I will allow a simplifying assumption to make it much easer to understand and code…

  22. 12,12 We have a collection of datapoints in 2 dimensions. Before we begin, we must decide on the max number of points per leaf MBR, lets say 3… 0,0

  23. 12,12 The definition of an MBR is the smallest (axis parallel) rectangle that can contain a set of points. We can represent the MBR with just 2 points, the lower left corner (L) and the upper right corner (U). MBR1 = {(L.x,L.y)(U.x,U.y)} MBR1 = {(1,1)(9.6,8.9)} 0,0

  24. This is the root of our R-Tree MBR = {(1,1)(9.6,8.9)}

  25. MBR1 = {(1,1)(9.6,8.9)} MBR4 = MBR2 = {(1,1)(4.3,3.95)} MBR3 = {(1,3.95)(4.3,8.9)} MBR4 MBR3 MBR2 MBR5

  26. MBR1 = {(1,1)(9.6,8.9)} MBR4 = MBR2 = {(1,1)(4.3,3.95)} MBR3 = {(1,3.95)(4.3,8.9)} MBR10 = {(1,3.95)(2.65,6.42)} MBR11 = {(1,6.42)(2.65,8.9)} MBR12 MBR11 MBR4 (1.1, 4.4) 44 (1.3, 4.1) 34 (1.5, 4.3) 12 MBR13 MBR10 MBR2 MBR5

  27. Under this definition of an R-tree (which is not the classic definition) • Every internal node has exactly 4 children. • Every MBR at a given level of the tree is mutually exclusive (a point cannot be in two MBRs at a given level). • Every leaf node is in one of these two classes • It does not cover any points, and therefore contains a null pointer. • It covers at least one point, and has a pointer to a file which contains those points, together with the primary key.

  28. Project Report Format • A new file on the webpage gives details on the format of your final report (the white binders)

  29. Title Page • Requirements Analysis. • Requirements Elicitation. • System Specification. • System Design: • Program Design: • Coding: • Unit and Integration Testing: • System Testing: • Acceptance Testing: • A Maintenance Plan: Assume that like the vast majority of software that is commissioned, your contract requires you to maintain the software. • Conclusions: Assume that you write this just before handing in your document. Summarize briefly what you did for you project. Address the following; what most surprised you about the process of creating the project. What would you do differently if you had to do it again? • References: Every, book, paper, webpage you used must be cited in a standard format. You could use the American Psychological format [1, 2], or the IEEE standard [3], or any other format so long as you are consistent and complete. You should also reference any standard template libraries used in your code. • Appendices: Including source code, printed in 2 “pages” per page format. You should also list a professional quality, one-page resume for each member of the group (use the same template for each). • Acknowledgements: Your chance to thank anyone who helped you complete the project. • Two (identical) Cd roms in a high quality plastic wallet: Containing source code, test data, all the weekly archives, and a readme.txt file that contains a brief (one line) description of all the files.

  30. How I grade the project…

  31. Grading Rubric for CS179. Database project. Group name (all last names): The project binder is in the correct basic format, and is neat. The format is consistent. The project is well documented (the students carefully referenced, the books, papers, URLs that they used)

  32. The students made some effort to do elicitation. The came up with interview questions in advance, they noted the answers to the questions. (optional, they also use some other elicitation technique ) The students made some effort to do requirements specification. The specification considers things like the minimum hardware requirements, usability factors, and user scenarios, privacy issues. The specification actually says what the system will do under normal scenarios, and how it will handle errors.

  33. The students actually designed the project (as opposed to going straight to coding) Design may include flowcharts or pseudocode or English descriptions. The code is documented, indented etc..

  34. The project was tested. The students demonstrated how they tested the project. There was unit testing and integration testing. The overall quality of the finished product General remarks. Tentative grade: (may be modified after second pass). Final grade: Special Circumstances: Students in same group get different grades because…

More Related