SQL Server Indexing for Developers

SQL Server Indexing for Developers Greg Linwood Solid Quality Learning greg@SolidQualityLearning.com

About Me • Live in Melbourne, Australia • Director of SQL Servants • Director of Solid Quality Learning • Microsoft SQL Server MVP since 2003 • Australian SQL Server User Group • Using SQL Server since 1993

Agenda • The Dilemma of SQL • SQL set based logic vs serial execution • SQL Server data caching infrastructure 101 • SQL Server indexing tools • Table storage formats • Clustered Indexes (index organised) • Heaps (non-index organised) • Non-Clustered Indexes • When implemented on Heaps • When implemented on Clustered Indexes • Included Columns • Matching indexes to queries • Designing Covering Indexes • Limits of covering indexes • Analysing index usage via Execution Plans • Discussion

The Dilemma of SQL • SQL is a simple & easy language to learn • Developer concentrates only on WHAT data is being accessed & manipulated • Totally disconnected from HOW the DBMS executes the commands • DBMS hides Cost Based Optimisation process from developers • Optimisation process is largely undocumented • Developers have to “second guess” how it works • Developers have enough to learn already!

The Good News! • A little knowledge of index mechanisms… • A few easy to follow rules… • Can help you solve the majority of query tuning problems with indexes

Le Table Physical Memory (RAM) 8kb Buffer Page Data volume (HDD) Write ahead log (TLOG) SQL Server data caching infrastructure 101… Select * from authors where au_lname = ‘White’ update authors set au_fname = ‘Johnston’ where au_lname = ‘White’ au_id au_lname au_fname phone address city state 172-32-1176 White Johnson 408-496-7223 10932 Bigge Rd. Oakland CA update authors set au_fname = ‘Marj’ where au_lname = ‘Green’ UPDATE Data Cache Proc Cache MTL Data Cache Proc Cache UPDATE

Indexing Tools – Clustered Index Table rows stored in physical order of clustered index key column/s – CustID in this case. • Physical ordering of table row storage enforced • “Physical” meaning physical database model, not “on disk” CIX also provides b-tree lookup pages, similar to a regular index • Table rows stored in leaf level of clustered index, in order of index column/s (key/s) • Default table storage format for tables WITH a primary key • B-Tree index nodes also created • Each level contains entries based on the first row in pages from lower level Can only have one CIX per table (as table storage can only be sorted one way) • Query execution example: Select FName, Lname, PhNo from Customers where CustID = 23

Indexing Tools – Heap No physical ordering of table rows • No physical ordering of rows Scan cannot complete just because a row is located. Because data is not ordered, scan must continue through to end of table (heap) • New rows simply added to last page • NO B-Tree index nodes (not really an “index” • Unless other indexes added, only option is to scan table No b-tree with HEAPs, so no lookup method available unless other indexes are present. Only option is to scan heap • Query execution example: Select FName, Lname, PhNo from Customers where Lname = ‘Smith’

Indexing Tools – Non-Clustered Indexes • NCIXs are “real” indexes, rather than table storage structures • Implemented differently, depending on whether the base table is stored on a heap or a clustered index. • Nearly always more efficient for queries than CIXs • Both for “seeks” and “range scans” • Read further about this topic on my blog: • http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2006/09/25/387.aspx • Can only be 900 bytes & up to 16 columns “wide” • SQL 2005 allows “wider” NCIXs via new “Included Columns” feature • On SQL 2000, any queries that require wider indexes need a good CIX

Indexing Tools – NCIXs on Heaps (1st of 3) • create nonclustered index ncix_lname on customers (lname) • B-tree structure contains one leaf row for every row in base table, containing index columns, sorted by index column values. • Each row contains a “RowID”, which is an 8 byte “pointer” to the heap storage page • (RowID actually contains File, Page & Slot data) • Leaf pages “chained” via doubly linked list for intra index scan • Query execution example: Select Lname from Customers where Lname = ‘Smith’

Indexing Tools – NCIXs on Heaps (2nd of 3) • create nonclustered index ncix_lname on customers (lname) • Previous example “covered” the query. • Where index does NOT cover query, RowID lookups performed to obtain values for non-indexed columns • Query execution example: Select Lname, Fname from Customers where Lname = ‘Smith’ • Very important to “cover” queries where performance is critical • Impact or RowID lookups is far worse with clustered index “Bookmark Lookups” (covered next)

Indexing Tools – NCIXs on Clustered Indexes (1st of 3) • create nonclustered index ncix_lname on customers (lname) NCIX contains CIX keys in leaf pages for Bookmark lookups • B-tree structure contains one leaf row for every row in base table, containing index columns, sorted by index column values. (same as when NCIX is on a heap) • Instead of a RowID, each row’s clustered index “key” value is stored in the index leaf level instead. • This means RowID bookomarks cannot be performed (as RowID is not available). Instead, bookmark lookups are performed, which are considerably more expensive • Leaf pages “chained” via doubly linked list for intra index scan Bookmark Lookup • Query execution example: Select Lname, Fname from Customers where Lname = ‘Plumb’ • Bookmark lookups seriously degrade performance where many rows qualify for results

Indexing Tools – NCIXs on Clustered Indexes (2ndof 3) • create nonclustered index ncix_lname on customers (lname, fname) NCIX now “covers” query because all columns named in query are present in NCIX • B-tree structure contains one leaf row for every row in base table, containing index columns, sorted by index column values. (same as when NCIX is on a heap) • Instead of a RowID, each row’s clustered index “key” value is stored in the index leaf level instead. • Leaf pages “chained” via doubly linked list for intra index scan • Query execution example: Select Lname, Fname from Customers where Lname = ‘Saunders’ • Bookmark lookups seriously degrade performance where many rows qualify for results

Designing indexes to match queries • Cover filter predicates by indexing columns in where & join clauses • Provides SQL Server with efficient access path to identify rows that qualify filters • Inner Joins are filters – equivalent to WHERE • Column order is critical – most selective columns first • Ensure filter predicates are not accessed via Bookmark or RowID lookups • If many rows are being accessed, ensure entire query is “covered” • Include columns referenced by filter predicates first (WHERE, JOIN) then include columns referenced in SELECT list last)

How many indexes should I add? (1st of 2) • Index update overhead is often over-stated • OLTP systems usually ‘Read’ FAR more than they write. • eg, customers usually browse many website pages before actually placing an order • eg, even ‘pure’ system update / insert activity usually generates more read activity than write activity • PKs / FKs at least have to be ‘Read’ during inserts & updates • Usually far more important to tune reads than writes in an OLTP.

How many indexes should I add? (2nd of 2) • What happens when I have too many indexes? • When databases are over-indexed, the performance bottleneck is usually CPU or memory related rather than disk related. Why? • During insert, update & delete operations, SQL Server has to first “find” the pages that contain the rows being manipulated. Finding these pages usually involves multiple reads for every update..

Reference material • A few books with excellent performance tuning content • “SQL Server Query Performance Tuning Distilled”, Sajal Dam • http://www.apress.com/book/bookDisplay.html?bID=371 • “Inside SQL Server 2000”, Kalen Delaney • http://www.amazon.com/exec/obidos/ASIN/0735609985/104-7280867-1941549 • “Guru’s Guide to SQL Server Architecture & Internals”, Ken Henderson • http://www.amazon.com/exec/obidos/tg/detail/-/0201700476/ref=pd_bxgy_img_2/104-7280867-1941549?v=glance&s=books • “SQL Server 2000 Performance Tuning”, Ed Whalen et all • http://www.amazon.com/exec/obidos/tg/detail/-/0735612706/ref=pd_bxgy_img_2/104-7280867-1941549?v=glance&s=books

Questions? Greg Linwood greg@SolidQualityLearning.com http://blogs.sqlserver.org.au/blogs/greg_linwood

SQL Server Indexing for Developers

SQL Server Indexing for Developers

Presentation Transcript

UTS Short Course SQL Server 2005 for Developers

SQL Server 2008 for Developers

SQL Server 2012 for Developers

SQL Server 2008 R2 Update for Developers Part II – Introducing SQL Server 2008 R2

Intro to Sql Server Yukon for Developers

SQL Server 2008 New Features for Developers

SQL Server 2008 R2 for Developers

SQL Server 2008 R2 Update for Developers Part I – SQL Server 2008 Review

Spatial Data Types And Indexing SQL Server 2008

SQL Server 2008 for Developers

SQL Server Optimization for Developers

SQL Server Indexing for the Client Developer

Module 5 Planning for SQL Server ® 2008 R2 Indexing

SQL Server 2008 for .NET Developers

SQL Server performance tuning basics (for the developers)

SQL Server performance tuning basics (for the developers)

Microsoft SQL Server Administration for SAP SQL Server Architecture

PolyServe for SQL Server

for SQL Server Databases

SQL SERVER

SQL Server Indexing for the Client Developer