Creating High Performance Spatial Databases with SQL Server 2008
Creating High Performance Spatial Databases with SQL Server 2008. Alastair Aitchison. About Me. Consultant, Trainer, Author, and Housedad. Session Plan. Do you need geometry and geography ? Constructing a spatial index (the theory ) Filtering spatial query results (the practice )
Creating High Performance Spatial Databases with SQL Server 2008
E N D
Presentation Transcript
Creating High Performance Spatial Databases with SQL Server 2008 Alastair Aitchison
About Me • Consultant, Trainer, Author, and Housedad
Session Plan • Do you need geometry and geography? • Constructing a spatial index (the theory) • Filtering spatial query results (the practice) • Optimising spatial queries
The “Two-Column” Model CREATE TABLE Customers ( Name varchar(32), Address varchar(255), Lat float, Long float );
Point-in-Polygon SELECT * FROM Customers WHERE Lat BETWEENLtMinANDLtMax AND Long BETWEENLnMinANDLnMax (LtMax, LnMax) (LtMin, LnMin)
Calculating Distance SELECT 3963.0 * ACOS( SIN(Lat1) * SIN(Lat2) + COS(Lat1) * COS(Lat2) * COS(Lon2 - Lon1) ) (Lat2, Lon2) (Lat1, Lon1)
“Two-Column” Model Limitations • Only stores points • Calculations on flat plane or perfect sphere • Limited range of methods
SQL Server 2008 • Points, Linestrings, Polygons • Accurate calculations • Ellipsoid model (geography) • Flat plane (geometry) • Full complement of spatial methods • Intersects, Contains, Crosses, Touches • Distance, Length, Area • DE-9IM
Do You Need geometry / geography? • Not all “spatial” apps need spatial datatypes • Example: Store locator
Session Plan a • Do you need geometry and geography? • Constructing a spatial index (the theory) • Filtering spatial query results (the practice) • Optimising spatial queries
Querying geometry and geography • SELECT * WHERE A.STIntersects(B) = 1 • Primary Filter (Based on Index) • Approximate • Fast • Superset of actual results • Secondary Filter (Based on Table) • Refine results of primary filter • Accurate
Assigning Order to Spatial Data • B-Tree indexing for linearly ordered data • decimal, float, money etc. – numeric order • char, varchar, nvarchar etc. – alphabetic order • datetime, date, time etc. – chronological order • How do we assign order to spatial data?
From Grid to Index • Covered, partially covered, or touched cells • Maximise accuracy - Minimise index size • Three Rules • Covering Rule • Deepest-Cell Rule • Cells Per Object Rule
Covering Rule “If a grid cell is completely covered by a geometry, don’t further subdivide that cell.”
Deepest-Cell Rule “Once a cell has been subdivided, only store the intersecting cell(s) at the deepest grid level.”
Cells Per Object Rule “If subdividing a cell would exceed the maximum allowed number of cells for each object, do not subdivide the cell.”
Session Plan a • Do you need geometry and geography? • Constructing a spatial index (the theory) • Filtering spatial query results (the practice) • Optimising spatial queries a
Creating a Spatial Index I CREATE TABLE Grid ( id char(1), shape geometry, CONSTRAINT [idxGridCluster] PRIMARY KEY CLUSTERED ( id ASC ) );
Add Some Points To The Table INSERT INTO Grid VALUES ('A', geometry::Point(0.5, 2.5, 0)), ('B', geometry::Point(2.5, 1.5, 0)), ('C', geometry::Point(3.25, 0.75, 0)), ('D', geometry::Point(3.75, 2.75, 0));
Creating a Spatial Index II CREATE SPATIAL INDEX idxGrid ON Grid(shape) USING GEOMETRY_GRID WITH ( BOUNDING_BOX = (0, 0, 4096, 4096), GRIDS = ( LEVEL_1 = MEDIUM,LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM), CELLS_PER_OBJECT = 16); -- Each L1 cell is 512 x 512-- Each L2 cell is 64 x 64-- Each L3 cell is 8 x 8-- Each L4 cell is 1 x 1
Grid Level 4 D A B C
Finding Intersecting Points DECLARE @Polygon geometry = 'POLYGON ((1.5 0.5, 3.5 0.5, 3.5 2.5, 1.5 2.5, 1.5 0.5)) '; SELECT * FROM Grid WHERE shape.STIntersects(@Polygon) = 1;
sp_help_spatial_geometry_index EXEC sp_help_spatial_geometry_index @tabname = Grid, @indexname = idxGrid, @verboseoutput = 1, @query_sample = 'POLYGON ((1.5 0.5, 3.5 0.5, 3.5 2.5, 1.5 2.5, 1.5 0.5))';
4 Number_Of_ObjectCells_In_Level4_In_Index D A B C
9 Number_Of_ObjectCells_In_Level4_For_QuerySample
Compare the Grid Cells D A B C
25% Percentage_Of_Rows_NotSelected_By_Primary_Filter D A B C
3 Number_Of_Rows_Selected_By_Primary_Filter D A B C
33 Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter D A B C
2 Number_Of_Times_Secondary_Filter_Is_Called D A B C
2 Number_Of_Rows_Output D A B C
66 Primary_Filter_Efficiency D A B C
50 Internal_Filter_Efficiency D A B C
Session Plan a • Do you need geometry and geography? • Constructing a spatial index (the theory) • Filtering spatial query results (the practice) • Optimising spatial queries a a
Making Sure the Index Is Used • Use a Supported Method • STIntersects() • STContains(), STWithin(), STTouches() • STDistance() • Filter() • Syntax must be A.STIntersects(B) = 1 • Upgrade to SP1 • Use a HINT where necessary
Making the Index Effective • Three possible outcomes: • Preselection (Internal Filter) • Discarding (Primary Filter) • Secondary Filter • Adjust Index Settings to fit data in the column and typical query samples
Improving Performance • Make bounding box as tight as possible • Grid Resolution ↑ ... Cells Per Object ↑ • Multiple Indexes (may need HINT) • Use non-spatial predicates • Reduce unnecessary detail • Experiment!
Want To Know More? Beginning Spatial with SQL Server 2008 MSDN Spatial Forum http://social.msdn.microsoft.com/ Forums/en-US/sqlspatial/threads alastair@beginningspatial.com
A Practical Demonstration • Geonames export • 6.9 million points • Search for those in Newport • Without Index: ~100 rows. 12,391,230 secs • With Index: ~100 rows. < 1 sec