1 / 35

Index Tuning

Index Tuning. Index. An index is a data structure that supports efficient access to data. Set of Records. Matching records. Condition on attribute value. index. (search key). Performance Issues. Type of Query Index Data Structure Organization of data on disk Index Overhead

alka
Télécharger la présentation

Index Tuning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Index Tuning H. Galhardas

  2. Index An index is a data structure that supports efficient access to data Set ofRecords Matchingrecords Conditiononattributevalue index (search key) H. Galhardas

  3. Performance Issues • Type of Query • Index Data Structure • Organization of data on disk • Index Overhead • Data Distribution • Covering H. Galhardas

  4. Point QuerySELECT balanceFROM accountsWHERE number = 1023; Multipoint QuerySELECT balanceFROM accountsWHERE branchnum = 100; Range QuerySELECT numberFROM accountsWHERE balance > 10000; Prefix Match QuerySELECT *FROM employeesWHERE name = ‘Jensen’ and firstname = ‘Carl’ and age < 30; Types of Queries H. Galhardas

  5. Extremal QuerySELECT *FROM accountsWHERE balance = max(select balance from accounts) Ordering QuerySELECT *FROM accountsORDER BY balance; Grouping QuerySELECT branchnum, avg(balance)FROM accountsGROUP BY branchnum; Join QuerySELECT distinct branch.adresseFROM accounts, branchWHERE accounts.branchnum = branch.numberand accounts.balance > 10000; Types of Queries H. Galhardas

  6. Search Keys • A (search) key is a sequence of attributes. create index i1 on accounts(branchnum, balance); • Types of keys • Sequential: the value of the key is monotonic with the insertion order (e.g., counter or timestamp) • Non sequential: the value of the key is unrelated to the insertion order (e.g., social security number) H. Galhardas

  7. Data Structures • Most index data structures can be viewed as trees. • In general, the root of this tree will always be in main memory, while the leaves will be located on disk. • The performance of a data structure depends on the number of nodes in the average path from the root to the leaf. • Data structure with high fan-out (maximum number of children of an internal node) are thus preferred. H. Galhardas

  8. B+-Tree • A B+-Tree is a balanced tree whose leaves contain a sequence of key-pointer pairs. 96 75 83 107 33 48 69 75 80 81 83 92 95 96 98 103 107 110 120 H. Galhardas

  9. B+-Tree Performance • Key length influences fanout • Choose small key when creating an index • Key compression • Prefix compression (Oracle 8, MySQL): only store that part of the key that is needed to distinguish it from its neighbors: Smi, Smo, Smy for Smith, Smoot, Smythe. • Front compression (Oracle 5): adjacent keys have their front portion factored out: Smi, (2)o, (2)y. There are problems with this approach: • Processor overhead for maintenance • Locking Smoot requires locking Smith too. H. Galhardas

  10. values Hashed key key R1 R5 Hashfunction 0 1 R3 R6 R9 R14 R17 R21 R25 2341 n Hash Index • A hash index stores key-value pairs based on a pseudo-randomizing function called a hash function. The length of these chains impacts performance H. Galhardas

  11. Hash Index Performance • The best for answering point queries, provided there are no overflow chains • Good for multipoint queries • Useless for range, prefix or extremal queries • Must be reorganized (drop/add or use reorganize function) if there is a significant amount of overflow chaining • Avoiding overflow may require underutilize the hash space • Size of hash structure is not related to the size of a key, because hash functions return keys to locations or page identifiers • Hash functions take longer to execute on a long key H. Galhardas

  12. Clustered index (primary index) A clustered index on attribute X co-locates records whose X values are near to one another. Non-clustered index (secondary index) A non clustered index does not constrain table organization. There might be several non-clustered indexes per table. Clustered / Non clustered index Records Records H. Galhardas

  13. Sparse index Pointers are associated to pages Dense index Pointers are associated to records Non clustered indexes are dense Dense / Sparse Index P1 P2 Pi record record record H. Galhardas

  14. SQL Server B+-Tree data structure Clustered indexes are sparse Indexes maintained as updates/insertions/deletes are performed DB2 B+-Tree data structure, spatial extender for R-tree Clustered indexes are dense Explicit command for index reorganization Oracle B+-tree, hash, bitmap, spatial extender for R-Tree No clustered index until 10g Index organized table (unique/clustered) Clusters used when creating tables. MySQL B+-Tree, R-Tree (geometry and pairs of integers) Indexes maintained as updates/insertions/deletes are performed Index Implementations in some major DBMS H. Galhardas

  15. Index Tuning Knobs • Index data structure • Search key • Size of key • Clustered/Non-clustered/No index • Covering H. Galhardas

  16. Point QuerySELECT balanceFROM accountsWHERE number = 1023; Multipoint QuerySELECT balanceFROM accountsWHERE branchnum = 100; Range QuerySELECT numberFROM accountsWHERE balance > 10000; Prefix Match QuerySELECT *FROM employeesWHERE name = ‘Jensen’ and firstname = ‘Carl’ and age < 30; Types of Queries H. Galhardas

  17. Extremal QuerySELECT *FROM accountsWHERE balance = max(select balance from accounts) Ordering QuerySELECT *FROM accountsORDER BY balance; Grouping QuerySELECT branchnum, avg(balance)FROM accountsGROUP BY branchnum; Join QuerySELECT distinct branch.adresseFROM accounts, branchWHERE accounts.branchnum = branch.numberand accounts.balance > 10000; Types of Queries H. Galhardas

  18. Benefits of a clustered index • A sparse clustered index stores fewer pointers than a dense index. • This might save up to one level in the B-tree index. • Nb pointers dense index = nb pointers sparse index * nb records per page • If records small compared to pages, there will be many records per page, so sparse has one level less than dense • A clustered index is good for multipoint queries • White pages in a paper telephone book • A clustered index based on a B-Tree supports range, prefix, extremal and ordering queries well • A clustered index (on attribute X) can reduce lock contention: • Retrieval of records or update operations using an equality, a prefix match or a range condition based on X will access and lock only a few consecutive pages of data H. Galhardas

  19. Multipoint query that returns 100 records out of 1000000. Cold buffer Clustered index is twice as fast as non-clustered index and orders of magnitude faster than a scan. Evaluation of Clustered Index H. Galhardas

  20. Inconvenient of a clustered index • Benefits can diminish if there is a large number of overflow data pages • Accessing those pages will usually entail a disk seek • Overflow pages can result from two kinds of updates: • Inserts may cause data pages to overflow • Record replacements that increase the size of a record (e.g., the replacement of a NULL values by a long string) H. Galhardas

  21. Index is created with fillfactor = 100. Insertions cause page splits and extra I/O for each query Maintenance consists in dropping and recreating the index With maintenance performance is constant while performance degrades significantly if no maintenance is performed. Evaluation of clustered indexes with insertions (1) H. Galhardas

  22. Index is created with pctfree = 0 Insertions cause records to be appended at the end of the table Each query thus traverses the index structure and scans the tail of the table. Performances degrade slowly when no maintenance is performed. Evaluation of clustered indexes with insertions (2) H. Galhardas

  23. In Oracle, clustered index are approximated by an index defined on a clustered table No automatic physical reorganization Index defined with pctfree = 0 Overflow pages cause performance degradation Evaluation of clustered indexes with insertions (3) H. Galhardas

  24. Redundant tables • Because there is only one clustered index per table, it might be a good idea to replicate a table in order to use a clustered index on two different attributes • Yellow and white pages in a paper telephone book • Works well if low insertion/update rate H. Galhardas

  25. Covering Index - definition • A nonclustering index can eliminate the need to access the underlying table through covering (composite index) • For the query: Select name from employee where department = “marketing” • Good covering index would be on (department, name) • Index on (name, department) less useful. • Index on department alone moderately useful. Inconvenients (of composite indexes): • Tend to have a large key size • Update to one of the attributes causes index to be modified H. Galhardas

  26. Covering index performs better than clustering index when first attributes of index are in the where clause and last attributes in the select. When attributes are not in order then performance is much worse. Covering Index - impact H. Galhardas

  27. A dense index can eliminate the need to access the underlying table through covering. It might be worth creating several indexes to increase the likelihood that the optimizer can find a covering index A non-clustered index is good if each query retrieves significantly fewer records than there are pages in the table. Point queries always useful Multipoint queries useful if: number of distinct key values > c * number of records per page Where c is the number of pages retrieved in each prefetch Benefits of non-clustered indexes H. Galhardas

  28. IBM DB2 v7.1 on Windows 2000 Range Query If a query retrieves 10% of the records or more, scanning is often better than using a non-clustering non-covering index. Table Scan Can Sometimes Win H. Galhardas

  29. Joins, Foreign Keys and Indexes R |X| R.A = S.B S can be executed through: (NR = nb tuples R, NS = nb tuples S, BR = nb pages R, BS = nb pages S) • Nested Loop (NL) • Cost I/O: NR *BS + BR, can be reduced to BR + BS if the smaller relation fits entirely in memory • Indexed Nested Loop(INL) • Cost: BR + c * NR, c: cost of traversing index and fetching all matching s tuples for one tuple or r • Hash Join (HJ) • Cost I/O: 3(BR + BS), can be reduced to BR + BS if the entire build input can be kept in main memory H. Galhardas

  30. Choice of join algorithms (1) • If there is no index present, the system will choose to use a hash join • If there is an index present: • An INL based on an index on S.B works better than a HJ if the nb of distinct values in S.B is almost equal to the nb of rows of S • Common case because most joins are FK joins • The same, regardless the nb of distinct values of S.B, if the index covers the join • The only accesses to S data occur within the index • The same, regardless the nb of distinct values of S.B, if S is clustered based on B • All S rows having equal B values will be colocated • Otherwise, the hash join may be better H. Galhardas

  31. Choice of join algorithms (2) An index may be particularly useful in two other situations: • In a non-equi-join (R.A > S.B), an index (using a Btree) on the join attribute avoids a full table scan in the NL. • To support the FK constraint when R.A is a subset of R.B, so A is a FK in R; B is a PK in S • An index in S speeds up insertions on R: for every record inserted in R, check the foreign constraint on S • Similarly, an index on R.A speeds up deletions in S H. Galhardas

  32. Index on Small Tables • Tuning manuals suggest to avoid indexes on small tables (containing fewer than 200 records) • This number depends on the size of records compared with the size of the index key • If all data from a relation fits in one page (or in a single disk track and can be read into memory through a single physical read by prefetching) • an index page adds at least an I/O • If each record fits in a page, then 200 records may require 200 disk accesses or more. • an index helps performance • If many inserts execute on a table with a small index, the index itself may become a concurrency control bottleneck • Lock conflicts near the root H. Galhardas

  33. Small table: 100 records Two concurrent processes perform updates (each process works for 10ms before it commits) No index: the table is scanned for each update. No concurrent updates. A clustered index allow to take advantage of row locking. Index on Small Tables and Updates If transactions update a single record, without an index, each transaction scans through many records before it locks the relevant record, thus reducing update concurrency H. Galhardas

  34. Table organization and index selection: basic rules (1) • Use a hash index for point queries only. Use a B-tree if multipoint queries or range queries are used • Use clustering • if your queries need all or most of the fields of each records returned, but the records are too large for a composite index on all fields • if multipoint or range queries are asked • Use a dense index to cover critical queries • Don’t use an index if the time lost when inserting and updating overwhelms the time saved when querying H. Galhardas

  35. Table organization and index selection: basic rules (2) • Use key compression • If you are using a B-tree • Compressing the key will reduce the number of levels in the tree • The system is disk-bound but not CPU-bound • Updates are relatively rare H. Galhardas

More Related