1 / 32

CS4432: Database Systems II

CS4432: Database Systems II. Lecture #10. Professor Elke A. Rundensteiner. Chapter 4 – INDEXING Wrap-up. B+-tree Odds and Ends Hashing (briefly). record. ?. value. B+Tree Example n=3. 100. Root. 120 150 180. 30. 3 5 11. 120 130. 180 200. 100 101 110. 150 156 179.

zacharee
Télécharger la présentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner lecture #10 - indexing & hashing

  2. Chapter 4 – INDEXING Wrap-up • B+-tree Odds and Ends • Hashing (briefly) record ? value lecture #10 - indexing & hashing

  3. B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 lecture #10 - indexing & hashing

  4. Less space, so lookup faster Inserts managed by overflow area Requires temporary restructuring Unpredictable performance ComparisonB-tree vs. indexed seq. file • Consumes more space, so lookup slower • Each insert/delete potentially restructures • Build-in restructuring • Predictable performance lecture #10 - indexing & hashing

  5. B-trees better … • DBA does not know when to reorganize • DBA does not know how full to load pages of new index lecture #10 - indexing & hashing

  6. A la buffering… Is LRU a good policy for B+tree buffers? Of course not!  Should try to keep root in memory at all times (and perhaps some nodes from second level) lecture #10 - indexing & hashing

  7. Interesting problem: For B+tree, how large should n be? … n is number of keys / node lecture #10 - indexing & hashing

  8. assumptions:n children per node and N records in database • Time to read B-Tree node from disk is (tseek + tread*n) msec. • Once in main memory, use binary search to locate key, (a + b log_2 n) msec • Need to search (read) log_n (N) tree nodes • t-search = (tseek + tread*n + (a + b*log_2(n)) * log n (N) lecture #10 - indexing & hashing

  9. Can get:f(n) = time to find a record f(n) noptn  FIND nopt by f’(n) = 0 • What happens to nopt as: • Disk gets faster? CPU get faster? … lecture #10 - indexing & hashing

  10. Bulk Loading of B+ Tree • For large collection of records, create B+ tree. • Method 1: Repeatedly insert records  slow. • Method 2: Bulk Loading more efficient. lecture #10 - indexing & hashing

  11. 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Bulk Loading of B+ Tree • Initialization: • Sort all data entries • Insert pointer to first (leaf) page in new (root) page. Root Sorted pages of data entries; not yet in B+ tree lecture #10 - indexing & hashing

  12. Bulk Loading (Contd.) Root 10 20 • Index entries for leaf pages always entered into right-most index page • When this fills up, it splits. (Split may go up right-most path to root.) Faster than repeated inserts, especially when one considers locking! Data entry pages 6 12 23 35 not yet in B+ tree 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Root 20 10 Data entry pages 35 not yet in B+ tree 6 23 12 38 lecture #10 - indexing & hashing 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35*

  13. Summary of Bulk Loading • Method 1: multiple inserts. • Slow. • Does not give sequential storage of leaves. • Method 2:Bulk Loading • Has advantages for concurrency control. • Fewer I/Os during build. • Leaves will be stored sequentially (and linked) • Can control “fill factor” on pages. lecture #10 - indexing & hashing

  14. Hashing key  h(key) <key> Buckets (typically 1 disk block) . . . lecture #10 - indexing & hashing

  15. Example hash function • Key = ‘x1 x2 … xn’ n byte character string • Have b buckets • h: add x1 + x2 + ….. xn • compute sum modulo b lecture #10 - indexing & hashing

  16.  This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets lecture #10 - indexing & hashing

  17. Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical & Inserts/Deletes not too frequent lecture #10 - indexing & hashing

  18. Next: example to illustrate inserts, overflows, deletes h(K) lecture #10 - indexing & hashing

  19. d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1 lecture #10 - indexing & hashing

  20. d maybe move “g” up EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c c e f g lecture #10 - indexing & hashing

  21. If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit lecture #10 - indexing & hashing

  22. Extensible hashing • Others … How do we cope with growth? • Overflows and reorganizations • Dynamic hashing lecture #10 - indexing & hashing

  23. Extensible hashing : idea 1 (a) Use i of b bits output by hash function b h(K)  use i grows over time…. 00110101 lecture #10 - indexing & hashing

  24. Extensible hashing : idea 2 (b) Use directory h(K)[i ] to bucket . . . . . . lecture #10 - indexing & hashing

  25. i = 2 00 01 10 11 1 1 2 1010 New directory 2 1100 Example: h(k) is 4 bits; 2 keys/bucket 1 0001 i = 1 0 1 1001 1100 Insert 1010 lecture #10 - indexing & hashing

  26. 2 0000 0001 2 0111 2 2 Example continued i = 2 00 01 10 11 1 0001 0111 1001 1010 Insert: 0111 0000 1100 lecture #10 - indexing & hashing

  27. i = 3 000 001 010 011 100 101 110 111 3 1001 1001 2 1001 1010 3 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001 lecture #10 - indexing & hashing

  28. Extensible hashing: deletion • Merge blocks and cut directory if possible (Reverse insert procedure) lecture #10 - indexing & hashing

  29. Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - Summary Extensible hashing + Can handle growing files - with less wasted space - with no full reorganizations lecture #10 - indexing & hashing

  30. Indexing vs Hashing • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 lecture #10 - indexing & hashing

  31. Indexing vs Hashing • INDEXING (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 lecture #10 - indexing & hashing

  32. The BIG picture…. • Chapters 2 & 3: Storage, records, blocks... • Chapter 4 & 5: Access Mechanisms - Indexes - B trees - Hashing - Multi key • Chapter 6 & 7: Query Processing lecture #10 - indexing & hashing

More Related