SQL Server Storage and Index Structures

SQL Server Storage andIndex Structures Physical Data Organization Indexes B-Trees SQL Server Data Access Clustered and Non-Clustered Creating, Altering, Dropping Indexes Choosing your Indexes Maintaining your Indexes

SQL Server Storage Hierarchy • Database • Extent • 8 contiguous 64K data pages • Once extent full, next record will take up a whole additional extent. • Pre-allocating space saves time.

SQL Server Storage Hierarchy • Page • 64K bytes • # of records/page varies with bytes/record • Types of Pages: Data and Index pages • Page Split • When page becomes full, it splits • New page allocated: ½ data from old page moved to new • Rows • 8060 bytes and 1024 columns

B-tree Key Points to Remember • Tree portion includes key attributes only • ordered as in create index statement • Keys packed in index pages • Fewer bytes per key -> more keys per page/extent -> fewer page faults per access. • Clustered indexes have records at leafs • Records are in data pages • Data pages sequentially linked • Non-Clustered indexes point into heap or tree portion of clustered index

Create Index Statement • Create [unique] [clustered | nonclustered] index <indexName> on <table or view> (col. Name> [asc|desc] [,…]) include (<col name> [,…]) (with … pad_index, fillfactor, ignore_dup_key drop_existing, statistics_norecompute sort_in_tempDB, online, allow_row_locks, allow_page_locks, maxdop

Create Index Details • Asc/Desc • Ascending & descending sort order for index • Include (cool!) • Includes col in leaf nodes of clustered index • Allows very fast access to non-key attribute • Useful with very large record – fewer page faults

Create Index “with” Details • Pad_Index= (on|off) • Initial fill-factor for index’s non-leaf pages • Fill Factor = <1% – 100%> • Default is index pages are as full as possible minus two records • Fill factor is how full after index is created • Once split goes to 50% • Ignore_dup_key • Circumvent unique key constraint somewhat • Still get error message, but no rollback • useful for storing unique values but trashing transactions

Create Index “with” Details • Drop_Existing • Any existing index with same name is dropped with this create statement • More efficient than drop index followed by create for clustered index as no need to touch non-clustered indexes or data pages • Statistics_nonrecompute • Default: sql server automates the process of updating the statistics on tables/ indexes • This option says you will maintain stats • DON’T USE THIS!

Create Index “with” Details • Sort_In_tempdb • Only useful when tempdb on physically separate drive • Reads/write for sort compete with read/writes to write data and index pages • This make sense if and only if you understand disk writes - discussion • Online • Keeps table available to users while creating index – sounds good, but ….!!

Create Index “with” Details • Allow row/page locks • Don’t use unless really good • MAXDOP • Overrides system setting for max degree of parallelism while building index • How many processes are used to construct an index. MAXDOP sets limit on how many processors per operation. • Compare and contrast these terms

Create Index “with” Details • ON • Can store index separately from data • Space for index spread across drives • I/O for indexes not compete with physical data retrieval

XML Indexes • Indexes into XML data • Xml VERY unstructured • Column can be of type xml in sql server • Create index on xml column • Page 276 for more details

Implied indexes • created by some constraints • Primary Key • Unique • Can easily end up with duplicate constraints and not realize it

Deciding what indexes go where? • Indexes speed access, but costly to maintain • Almost every update to table requires altering both data pages and every index. • All inserts and deletions affect all indexes • Many updates will affect non-clustered indexes • Sometimes less is more • Not creating an index sometimes may be best • Code for tranasaction have where clause? What columns used? Sort requried?

Deciding what indexes go where? • Selectivity • Indexes, particularly non-clustered indexes, are primarily beneficial in situations where there is a reasonably HIGH LEVEL of Selectivity within the index. • % of values in column that are unique • Higher percentage of unique values, the higher the selectivity • If 80% of parts are either ‘red’ or ‘green’ not very selective

Choosing Clustered Index • Only one per table! - Choose wisely • Default, primary key creates clustered index • Do you really want your prime key to be clustered index? • Option: create table foo myfooExample (column1 int identify primary key nonclustered column2 …. ) • Changing clustered index can be costly • How long? Do I have enough space?

Clustered Indexes Pros & Cons • Pros • Clustered indexes best for queries where columns in question will frequently be the subject of • RANGE query (e.g., between) • Group by with max, min, count • Search can go straight to particular point in data and just keep reading sequentially from there. • Clustered indexes helpful with order by based on clustered key

Clustered Indexes Pros & Cons • The Cons – two situations • Don’t use clustered index on column just because seems thing to do (e.g., primary key default) • Lots of inserts in non-sequential order • Constant page splits, include data page as well as index pages • Choose clustered key that is going to be sequential inserting • Don’t use a clustered index at all perhaps?

Column Order Matters • (P#, S#, Qty) • P# S# together are primary key • One index that includes all columns is not useful in all situations! • Only end up storing data a second time. • Clustered index of P#S# not same as S#P# • P#S# can lookup P# fairly easily, but looking up S# requires a linear search. • S#P# can lookup S# fairly easily, but not P#. • Note that even though key of S#P# means can’t lookup P# quickly, are some advantages in include P# in key.

Dropping Indexes • Sometimes makes sense to constantly re-analyze situation and add indexes • DON’T FORGET TO DROP INDEXES!! • Big overhead for inserts and deletes • Always ask yourself: “Can I get rid of any of these?” • Drop INDEX <TABLE NAME> <Index name>

Index Tuning Wizard • Hopefully you will evolve to the point you don’t need to use this gadget • But still can be quite handy • Uses workload file generated using sql server profiler (ch 19) • Not ideal to depend on this tool, but it may make some suggestions that you have not thought of.

Maintaining Indexes • Page Splits • Insert/delete order and rate critical • Fragmentation • Not OS fragementation – e.g. defrag tool • Happens when database grows, pages split, and then data eventually deleted. • Btrees great on maintaining balance on insertions, but with deletes, can end up with many pages containing small # of records.

Fragmentation Problems • Wasted space • Sql server allocates an extend at a time • Could end up with an extent, containing single page, with single record. • Thrashing (way too many disk hits) • Could end up with page 1 of data on one extend, page 2 on another, page 3 on the first, page 4 on another, …. • Records all over the place • Bit better for inserts but really bad for reads!

Identifying Fragmentation vs. page splits • DBCC SHOWCONTIG • Page 283 • Demo with northwind

DBREINDEX & Fillfactor • DBCC DBREINDEX • Can drop index and rebuild • Usually best to use drop-existing • Completely rebuilds the index • If supply table name, rebuilds all indexes on table. • Re-establishes base fillfactors etc. • Strongly recommend disallow transactions while doing this. • Rebuilding is probably better.

Summary • Clustered indexes usually faster than non-clustered • Only place non-clustered indexes on columns with high selectivity (>95% of rows are unique on that column) • All data manipulation language statements can benefit, from indexes, but inserts, deletes, and updates are slowed. • Indexes take up space and require page hits.

Summary • Index used only if first column in index is relevant to query • Indexes can hurt as much as they help • Make sure don’t add one by accident. • Indexes can provided structured data performance to unstructured XML, but overhead involved.

Summary • Is there a high level of selectivity on the data? • if yes and is frequently target of where clause, then add index • Have I dropped indexes I no longer need? • Why not? • Do I have a maintenance strategy established? • Why not?

Critical Questions • Are there lots of inserts of modifications to this table? • If yes, keep indexes to minimum • Is this a reporting table? • E.g. not many inserts but lots of reports run many different ways • If yes, more indexes are fine. • Is there a high level of selectivity on the data? • If yes and is frequently target of where clause, then add index

SQL Server Storage and Index Structures

SQL Server Storage and Index Structures

Presentation Transcript

Columnstore Index in SQL Server 2012

SQL Server 2012 xVelocity COLUMNSTORE Index

SQL Server Index Internals

Understanding Storage Systems and SQL Server

Index Tuning Strategies for SQL Server

Storage Performance on SQL Server

SQL Server, Storage And You Part 2: SAN, NAS and IP Storage

Understanding Storage Systems and SQL Server

Index Structures

SQL Server, Storage and You Part 1: Storage Basics

Index structures

Index Structures

B+-Trees and Hashing Techniques for Storage and Index Structures

Index Structures

SQL SERVER

SQL Server, Storage and You - Part III: Solid State Storage

Storage and Index

Browse and Modify Database Structures - SQL Server Query Builder