1 / 48

Indexing Techniques

Indexing Techniques. The Problem. What can we introduce to make search more efficient? Indices ! What is an index?. …. …. Anna. Paul. Tim. Page i. Page i+1. Definitions. Index: an auxiliary data structure to speed up record retrieval

Télécharger la présentation

Indexing Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Indexing Techniques

  2. The Problem • What can we introduce to make search more efficient? • Indices! • What is an index? … … Indexing Techniques

  3. Anna Paul Tim Page i Page i+1 Definitions • Index: an auxiliary data structure to speed up record retrieval • Search key: the field/s of a table which is/are indexed • Storage: index files that contain index records • Each entry storing • Actual data record • or, search key value k and record ID <k,rid> • or, search key value k and list of records IDs <k,rid list> • Types: ordered and unordered (hash) indices Indexing Techniques

  4. 00112233 00112233 Paul Anna 00112235 00112234 Anna Carol 00112236 00112235 Matt Paul 00112238 00112236 Tim Tim 00112237 Carol 00112238 Rob Types of Ordered Indices (1/3) • Assuming ordered data files • Depending on which field is indexed • Primary index: search key is ordering key field • Pointer for each page • Secondary index: search key is non ordering field secondary primary Indexing Techniques

  5. 00112233 00112234 00112235 00112233 00112236 00112235 00112237 00112236 00112238 00112238 00112233 Paul 00112234 Anna 00112235 Matt 00112236 Tim 00112237 Carol 00112238 Rob Types of Ordered Indices (2/3) • Depending on the density of index records • Dense index: an index record for each distinct search key value, ie every record • Sparse index: index records for only some search key values • search key value for first record in page • pointer to page dense sparse Indexing Techniques

  6. Anna Carol Matt 00112233 Paul 00112234 Anna 00112234 Rob 00112237 Carol 00112235 Tim 00112235 Matt 00112236 00112237 00112233 Paul 00112238 01112233 Paul 01112233 00112238 Rob 01112236 00112236 Tim 02112236 01112236 Tim 02112236 Tim Types of Ordered Indices (3/3) • Ordering field is nonkey (may have duplicates) • Clustered index • Unclustered index clustered unclustered Indexing Techniques

  7. Indices Exercise • 215 records • 128 bytes/record • 210 bytes/page • ordered file equality search on ordering field, unspanned organization • without an index • with a primary index • on field of size 12 bytes • assume pointer 4 bytes long Indexing Techniques

  8. 00112233 00112233 00112234 00112235 00112235 00112236 00112236 00112237 00112233 Paul 00112238 00112234 Anna 00112235 Matt 00112236 Tim 00112237 Carol 00112238 Rob Multi-level Indices (1/2) • If access using first-level index is still expensive • Build a sparse index on the first-level index • Multi-level Index • Fan-out: index blocking factor first-level index second-level index Indexing Techniques

  9. Multi-level Indices (2/2) • 26 index records/page (fan-out) • 215 index records • 1st-level • 29 pages • 2nd-level • 29 index records • 23 pages • 3rd-level • 23 index records • 1 page • 1 <= 215 / (26)t • t = ceil(log26 215 ) = 3 • t = ceil(logfo#index-records) Indexing Techniques

  10. Dynamic multi-level indices • So far assumed indices are physically ordered files • expensive insertions and deletions • Dynamic multi-level indices • B trees • B+ trees Indexing Techniques

  11. Tree-structured Indices • For each node: K1 < K2 < … Kq-1 • For each value X in subtree pointed to by Pi • Ki-1< X < Ki, 1<i<q • X < Ki, i=1 • Ki-1< X, i=q P1 K1 … Ki-1 Pi Ki … Kq-1 Pq X X X Indexing Techniques

  12. … … … … … … … … B tree • Problems: empty nodes, unbalanced trees • solution: B trees Indexing Techniques

  13. B tree: Definition • Each node: <P1,<K1, Pr1>, P2,…,<Kq-1, Prq-1>, Pq> • Pi tree pointer, Ki search value, Pri data pointer • For each node: K1 < K2 < … Kq-1 • For each value X in subtree pointed to by Pi • Ki-1< X < Ki, 1<i<q • X < Ki, i=1 • Ki-1< X, i=q • Each node at most q pointers • B tree is order q • Each node at least ceil(q/2) tree pointers • except from root • Internal node with p pointers has p-1 values • All leaves at the same level • balanced tree Indexing Techniques

  14. B tree: Example 5 8 ø 1 ø 3 ø ø 6 ø 7 ø ø 9 ø 12 ø tree pointer data pointer ø null pointer Indexing Techniques

  15. B+ tree • Most implementations of B tree are B+ tree • Data pointers only in leaves • more entries in internal nodes than regular B trees • less internal nodes • less levels • faster access Indexing Techniques

  16. 150 100 156 101 179 110 B+ tree: Definition • Internal nodes: <P1,K1, P2,…, Pq-1, Kq-1, Pq> • Leaf nodes: <<K1, Pr1>, <K2, Pr2>,…,<Kp-1, Prp-1>, Pnext> • Pri points a data records or block of pointers of such records • leaf order 120 150 180 120 130 180 200 Indexing Techniques

  17. 120 150 180 3 100 150 101 156 5 179 11 110 B+ tree: Search • At each level, find smallest Ki larger than search key • Follow associated pointer Pi 100 30 30 35 120 130 180 200 Indexing Techniques

  18. B+ tree: Insert • Nodes may overflow or underflow • Ignoring overflow or underflow • Inserting data record with with search key value k • find leaf node • if k found • add record to file, create indirect block if there isn’t one • add record pointer to indirect block • if k not found • add data record to file • insert record pointer in leaf node (all search keys in order) Indexing Techniques

  19. B+ tree: Delete • Ignoring overflow or underflow • Find leaf node with search key value k • Find data record pointer, delete record • delete index record • and indirect block, if any, if empty Indexing Techniques

  20. 100 k < 100 120 150 180 30 30 35 120 130 180 200 100 150 3 156 5 101 11 110 179 B+ tree: Simple Insert • Insert 42 42 Indexing Techniques

  21. 100 k < 100 120 150 180 30 30 35 42 120 130 180 200 3 150 100 5 156 101 11 110 179 B+ tree: Leaf Overflow (1/2) • Insert 9 Indexing Techniques

  22. 120 150 180 150 100 101 156 110 179 B+ tree: Leaf Overflow (2/2) • first ceil(n/2) in existing node, rest in new leaf node • n=3+1=4 100 k < 100 9 30 3 5 9 11 30 35 42 120 130 180 200 Indexing Techniques

  23. 120 150 180 100 150 156 101 179 110 B+ tree: Internal Node Overflow (1/3) • Insert 210, insert 205 100 k < 100 9 30 3 5 9 11 30 35 42 120 130 180 200 210 Indexing Techniques

  24. 120 150 180 100 150 156 101 179 110 B+ tree: Internal Node Overflow (2/3) • Leaf Split 100 k < 100 9 30 3 5 9 11 30 35 42 120 130 180 200 205 210 Indexing Techniques

  25. 120 180 205 150 100 156 101 179 110 B+ tree: Internal Node Overflow (3/3) 100 150 k < 100 9 30 3 5 9 11 30 35 42 120 130 180 200 205 210 Indexing Techniques

  26. 120 150 180 150 100 101 156 179 110 B+ tree: New Root (1/2) • Insert 210, insert 205 120 130 180 200 205 210 Indexing Techniques

  27. 150 180 120 205 100 150 101 156 179 110 B+ tree: New Root (2/2) 120 130 180 200 205 210 Indexing Techniques

  28. Index Insert Exercise • Insert 8, 7, 41 9 30 3 5 9 11 30 35 42 Indexing Techniques

  29. B+ tree: Delete • Simple delete case • Underflow case: • redistribute records • coalesce with siblings • update parents Indexing Techniques

  30. 150 180 120 205 100 150 101 156 179 110 B+ tree: Simple Delete (1/2) • Delete 110 120 130 180 200 205 210 215 Indexing Techniques

  31. 150 120 180 205 150 156 179 B+ tree: Simple Delete (2/2) • Leaf Updated 100 101 120 130 180 200 205 210 215 Indexing Techniques

  32. 150 120 180 205 150 156 179 B+ tree: Delete Redistribution (1/2) • Delete 180 100 101 120 130 180 200 205 210 215 Indexing Techniques

  33. 150 120 B+ tree: Delete Redistribution (2/2) • Redistribute entries • left or right sibling 179 205 100 101 120 130 150 156 179 200 205 210 Indexing Techniques

  34. 150 120 B+ tree: Delete Coalesce (1/4) • Delete 101 179 205 100 101 120 130 150 156 179 200 205 210 215 Indexing Techniques

  35. 150 120 B+ tree: Delete Coalesce (2/4) • Leaf updated • No redistribution • sibling coalesce 179 205 100 120 130 150 156 179 200 205 210 215 Indexing Techniques

  36. 150 B+ tree: Delete Coalesce (3/4) • Leaf updated • No redistribution • sibling coalesce 179 205 100 120 130 150 156 179 200 205 210 215 Indexing Techniques

  37. 150 179 B+ tree: Delete Coalesce (4/4) • Redistribution 205 100 120 130 150 156 179 200 205 210 215 Indexing Techniques

  38. Hashing Techniques

  39. null null h Static Hashing (1/2) • Store records in buckets with overflow chains • Allocate a fixed number of buckets M • Problems: • small M • long overflow chains, slow search-delete-insert Indexing Techniques

  40. Static Hashing (2/2) • Problems: • large M • wasted space, slow scan null null h null Indexing Techniques

  41. Dynamic Hashing • Splitting and coalescing buckets as the database grows-shrinks • One scheme: Extendible Hashing • Hash function generates large values, eg 32 bits • use i bits, change i as database size changes • If overflow, double the number of buckets • use i+1 bits of the hash function • but, expensive: read all pages M and distribute records in 2*M pages • solution: use a directory and double the size of the directory • only split bucket that overflowed Indexing Techniques

  42. 2 16 20 A 2 00 2 01 1 B 10 2 11 C 2 3 7 Directory D Buckets Extendible Hashing (1/4) h(18) = 10010 18 Indexing Techniques

  43. Extendible Hashing (2/4) 2 16 20 A 2 00 2 01 1 B 10 h(4) = 00100 2 11 18 C 2 3 7 D Indexing Techniques

  44. Extendible Hashing (3/4) 3 16 A 2 00 2 01 1 B 10 2 11 18 C 2 3 7 D 3 20 4 A1 Indexing Techniques

  45. Extendible Hashing (4/4) 3 • Global Depth • Local Depth • If bucket full: • split bucket • increment LD • If GD=LD • increment GD • double directory 16 A 3 000 2 001 1 B 010 2 011 18 C 100 101 2 110 3 7 D 111 3 20 4 A1 Indexing Techniques

  46. Extendible Hashing: Delete • If deletion make bucket empty • merge with split image • If directory pointers point to same bucket as split image • directory halved Indexing Techniques

  47. Extendible Hashing: Summary • Avoids overflow pages • Directory can get large • Key search requires just 2 page reads • Space utilization fluctuates • 59-90% for uniformly distributed records Indexing Techniques

  48. Extendible Hashing: Exercise • Initially GD = LD = 1 • M = 2 buckets • Hash function: h(k) = k mod 2i • inserts: 14, 18, 22, 3, 9 • deletes 9, 22, 3 1 12 8 1 00 1 01 5 Indexing Techniques

More Related