
Data Structures



Presentation Transcript


  1. Data Structures Lecture 5: B-Trees Haim Kaplan and Uri Zwick, November 2012

  2. A 4-node: keys 10, 25, 42. Branches: key < 10 | 10 < key < 25 | 25 < key < 42 | 42 < key. 3 keys, 4-way branch.

  3. An r-node: keys k0, k1, k2, …, kr−3, kr−2 and children c0, c1, c2, …, cr−2, cr−1. r−1 keys, r-way branch.

  4. B-Trees (with minimum degree d) • Each node holds between d−1 and 2d−1 keys • Each non-leaf node has between d and 2d children • The root is special: it has between 1 and 2d−1 keys, and between 2 and 2d children (if not a leaf) • All leaves are at the same depth
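The four invariants above can be checked mechanically. A minimal sketch in Python, using a throwaway `(keys, children)` pair per node (a representation assumed for illustration, not the lecture's):

```python
def check(node, d, is_root=True):
    """Verify the B-tree invariants of minimum degree d below `node`.

    A node is represented as a pair (keys, children); a leaf has
    children == []. Returns the height of the subtree and raises
    AssertionError if an invariant is violated.
    """
    keys, children = node
    lo = 1 if is_root else d - 1        # the root may hold fewer keys
    assert lo <= len(keys) <= 2 * d - 1, "key count out of range"
    assert keys == sorted(keys), "keys must be in increasing order"
    if not children:                    # a leaf: height 0
        return 0
    assert len(children) == len(keys) + 1, "r-1 keys need an r-way branch"
    heights = {check(c, d, is_root=False) for c in children}
    assert len(heights) == 1, "all leaves must be at the same depth"
    return heights.pop() + 1
```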

  5. A 2-4 tree: a B-Tree with minimum degree d = 2. (Figure: root 13; internal nodes 4 6 10 and 15 28; leaves 1 3, 5, 7, 11, 14, 16 17, 30 40 50.)

  6. Node structure • r – the degree • key[0],…,key[r−2] – the keys • item[0],…,item[r−2] – the associated items • child[0],…,child[r−1] – the children • leaf – is the node a leaf? • Possibly a different representation for leaves
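The structure might look as follows in Python; the field names mirror the slide (`key`, `item`, `child`, `leaf`), but everything else is a sketch:

```python
class BTreeNode:
    """One B-tree node of degree r (r-1 keys, r-way branch)."""

    def __init__(self, leaf=True):
        self.key = []     # key[0], …, key[r-2], kept in sorted order
        self.item = []    # item[i] is the payload associated with key[i]
        self.child = []   # child[0], …, child[r-1]; empty for a leaf
        self.leaf = leaf  # is the node a leaf?

    @property
    def r(self):
        # The degree: a node holding r-1 keys makes an r-way branch.
        return len(self.key) + 1
```

A separate, more compact representation for leaves, as the slide suggests, would simply drop the unused `child` list.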

  7. The height of B-Trees • At depth 1 we have at least 2 nodes • At depth 2 we have at least 2d nodes • At depth 3 we have at least 2d^2 nodes • … • At depth h we have at least 2d^(h−1) nodes
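Summing these node counts gives the standard height bound. A back-of-the-envelope sketch (the function names are mine, not the lecture's):

```python
import math

def min_keys(d, h):
    # At depth k >= 1 there are at least 2*d**(k-1) nodes, each
    # holding at least d-1 keys (the root holds at least 1), so a
    # tree of height h has at least
    #   1 + sum_{k=1..h} 2*d**(k-1) * (d-1) = 2*d**h - 1  keys.
    return 2 * d**h - 1

def max_height(d, n):
    # Inverting the bound: h <= log_d((n + 1) / 2).
    return math.floor(math.log((n + 1) / 2, d))
```

With d = 1000 and n = 2^30 this gives height 2, i.e. at most 3 levels of nodes on any root-to-leaf path, which is the handful of disk pages quoted later in the lecture.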

  8. Look for k in node x / Look for k in the subtree of node x • Number of nodes accessed – log_d n • Number of operations – O(d · log_d n) • Number of ops with binary search – O(log_2 d · log_d n) = O(log_2 n)
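The search just described can be sketched as follows (node fields `key`, `child`, `leaf` are assumed; `bisect_left` supplies the O(log_2 d) binary search inside a node instead of an O(d) linear scan):

```python
from bisect import bisect_left

class Node:
    # Minimal node record for illustration.
    def __init__(self, keys, children=()):
        self.key = list(keys)
        self.child = list(children)
        self.leaf = not self.child

def search(x, k):
    # One node is accessed per level, so log_d n nodes in total.
    i = bisect_left(x.key, k)
    if i < len(x.key) and x.key[i] == k:
        return x, i               # k is key i of node x
    if x.leaf:
        return None               # k is not in the tree
    return search(x.child[i], k)  # descend into the i-th subtree
```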

  9. B-Trees vs. binary search trees • Wider and shallower • Access fewer nodes during search • But may take more operations

  10. B-Trees – What are they good for?

  11. The hardware structure: CPU – Cache – RAM – Disk. Each memory level is much larger but much slower than the one before it. Information is moved in blocks.

  12. A simplified I/O model: CPU – RAM – Disk. Each block is of size m. Count both operations and I/O operations.

  13. Data structures in the I/O model • Each node (struct) is allocated contiguously • It is hard to control which disk blocks contain the different nodes ⇒ linked lists and search trees behave poorly in the I/O model: each pointer followed may cause a disk access • Pick d such that a node fits in a block ⇒ B-trees reduce the worst-case # of I/Os

  14. Look for k in node x / Look for k in the subtree of node x • I/Os: number of nodes accessed – log_d n • Number of operations – O(d · log_d n) • Number of ops with binary search – O(log_2 d · log_d n) = O(log_2 n)

  15. Red-Black Trees vs. B-Trees • n = 2^30 ≈ 10^9 • 30 ≤ height of Red-Black Tree ≤ 60 • Up to 60 pages read from disk • Height of B-Tree with d = 1000 is only 3 • Each B-Tree node resides in a block/page • Only 3 (or 4) pages read from disk • Disk access ≈ 1 millisecond (10^−3 sec) • Memory access ≈ 100 nanoseconds (10^−7 sec)

  16. B-Trees – What are they good for? • Large degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block. • Smaller degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties. • B-trees with d = 2, i.e., 2-4 trees, are very similar to Red-Black trees.

  17. Updates to a B-tree

  18. Rotate/Steal right and Rotate/Steal left: a key moves through the parent between two sibling nodes (A and B in the figure). Number of operations – O(d). Number of I/Os – O(1).
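A rotate/steal only touches the parent and the two siblings, hence O(1) I/Os. A sketch of the right rotation (field names assumed; rotate left is symmetric):

```python
class Node:
    # Minimal node record for illustration.
    def __init__(self, keys, children=()):
        self.key = list(keys)
        self.child = list(children)
        self.leaf = not self.child

def rotate_right(parent, i):
    # Steal a key for child i+1 from its left sibling, child i:
    # parent.key[i] moves down to the front of the right sibling,
    # and the largest key of the left sibling moves up to replace it.
    left, right = parent.child[i], parent.child[i + 1]
    right.key.insert(0, parent.key[i])
    parent.key[i] = left.key.pop()
    if not left.leaf:            # the stolen key's subtree moves too
        right.child.insert(0, left.child.pop())
```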

  19. Split and Join: Split divides a node into two nodes of d−1 keys each, with the middle key moving up to the parent; Join is the inverse. Number of operations – O(d). Number of I/Os – O(1).

  20. Insert 13 5 10 15 28 1 3 30 40 50 14 6 11 16 17 Insert(T,2)

  21. Insert 13 5 10 15 28 1 2 3 30 40 50 14 6 11 16 17 Insert(T,2)

  22. Insert 13 5 10 15 28 1 2 3 30 40 50 14 6 11 16 17 Insert(T,4)

  23. Insert 13 5 10 15 28 1 2 3 4 30 40 50 14 6 11 16 17 Insert(T,4)

  24. Split 13 5 10 15 28 1 2 3 4 30 40 50 14 6 11 16 17 Insert(T,4)

  25. Split 13 5 10 15 28 2 30 40 50 14 1 3 4 6 11 16 17 Insert(T,4)

  26. Split 13 2 5 10 15 28 1 30 40 50 14 3 4 6 11 16 17 Insert(T,4)

  27. Splitting an overflowing node (2d keys): the median key moves up to the parent; the two remaining parts hold d and d−1 keys.

  28. Another insert 13 2 5 10 15 28 1 30 40 50 14 3 4 6 11 16 17 Insert(T,7)

  29. Another insert 13 2 5 10 15 28 1 30 40 50 14 6 7 3 4 11 16 17 Insert(T,7)

  30. and another insert 13 2 5 10 15 28 1 30 40 50 14 6 7 3 4 11 16 17 Insert(T,8)

  31. and another insert 13 2 5 10 15 28 1 30 40 50 14 3 4 11 16 17 6 7 8 Insert(T,8)

  32. and the last for today 13 2 5 10 15 28 1 30 40 50 14 3 4 11 16 17 6 7 8 9 Insert(T,9)

  33. Split 13 2 5 10 15 28 7 1 30 40 50 14 3 4 8 9 11 6 16 17 Insert(T,9)

  34. Split 13 2 5 7 10 15 28 1 30 40 50 14 3 4 8 9 11 6 16 17 Insert(T,9)

  35. Split 13 5 2 7 10 15 28 1 30 40 50 14 3 4 8 9 11 6 16 17 Insert(T,9)

  36. Split 5 13 2 7 10 15 28 1 30 40 50 14 3 4 8 9 11 6 16 17 Insert(T,9)

  37. Insert – Bottom up • Find the insertion point by a downward search • Insert the key in the appropriate place • If the current node is overflowing, split it • If its parent is now overflowing, split it, etc. • Disadvantages: • Need both a downward scan and an upward scan • Need to keep parents on a stack • Nodes are temporarily overflowing

  38. Insert – Top down • While conducting the search, split full children on the search path before descending to them! • When the appropriate leaf is reached, it is not full, so the new key may be added!

  39. Split-Root(T): the old root C becomes two children of a new root; the middle key moves up into the new root, and each half holds d−1 keys.

  40. Split-Child(x,i): the full child x.child[i] is split around its middle key, which moves up into x as key[i]; each half holds d−1 keys.
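Split-Child can be sketched as follows (d is the minimum degree; the representation is assumed, not the lecture's code):

```python
class Node:
    # Minimal node record for illustration.
    def __init__(self, keys=(), children=(), leaf=True):
        self.key = list(keys)
        self.child = list(children)
        self.leaf = leaf

def split_child(x, i, d):
    # x.child[i] is full (2d-1 keys). Its median key moves up into x
    # at position i; the d-1 keys (and d children) after the median
    # move to a new right sibling. O(d) operations, O(1) I/Os.
    y = x.child[i]
    z = Node(y.key[d:], y.child[d:], y.leaf)
    x.key.insert(i, y.key[d - 1])
    x.child.insert(i + 1, z)
    y.key, y.child = y.key[:d - 1], y.child[:d]
```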

  41. Insert – Top down • While conducting the search, split full children on the search path before descending to them! • Number of I/Os – O(log_d n) • Number of operations – O(d · log_d n)
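Putting Split-Root and Split-Child together gives the whole top-down insert. A self-contained sketch (minimum degree d; names and representation assumed):

```python
from bisect import bisect_left

class Node:
    # Minimal node record for illustration.
    def __init__(self, keys=(), children=(), leaf=True):
        self.key = list(keys)
        self.child = list(children)
        self.leaf = leaf

def split_child(x, i, d):
    # Split the full child x.child[i] (2d-1 keys) around its median,
    # which moves up into x as key[i].
    y = x.child[i]
    z = Node(y.key[d:], y.child[d:], y.leaf)
    x.key.insert(i, y.key[d - 1])
    x.child.insert(i + 1, z)
    y.key, y.child = y.key[:d - 1], y.child[:d]

def insert(T, k, d):
    # Top-down insert: split every full node met on the way down, so
    # the leaf reached is never full. Returns the (possibly new) root.
    if len(T.key) == 2 * d - 1:          # Split-Root: tree grows taller
        T = Node(children=[T], leaf=False)
        split_child(T, 0, d)
    x = T
    while not x.leaf:
        i = bisect_left(x.key, k)
        if len(x.child[i].key) == 2 * d - 1:
            split_child(x, i, d)
            if k > x.key[i]:             # the median moved up; re-aim
                i += 1
        x = x.child[i]
    x.key.insert(bisect_left(x.key, k), k)
    return T

def inorder(x):
    # Flatten the tree in key order (handy for checking results).
    if x.leaf:
        return list(x.key)
    out = []
    for i, c in enumerate(x.child):
        out += inorder(c)
        if i < len(x.key):
            out.append(x.key[i])
    return out
```

Inserting the keys of the 2-4 tree from slide 5 in some order and flattening the result reproduces them in sorted order.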

  42. Deletions from B-Trees 7 15 3 10 13 22 28 30 40 50 20 24 26 14 1 2 4 6 11 12 8 9 delete(T,26)

  43. Delete 7 15 3 10 13 22 28 30 40 50 20 24 14 1 2 4 6 11 12 8 9 delete(T,26)

  44. Delete 7 15 3 10 13 22 28 30 40 50 20 24 14 1 2 4 6 11 12 8 9 delete(T,13)

  45. Delete (Replace with predecessor) 7 15 3 10 12 22 28 30 40 50 20 24 14 1 2 4 6 11 12 8 9 delete(T,13)
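Finding the predecessor used in this replacement step is a short walk: descend into the child just left of the key, then keep to the rightmost child down to a leaf. A sketch (field names assumed):

```python
class Node:
    # Minimal node record for illustration.
    def __init__(self, keys, children=()):
        self.key = list(keys)
        self.child = list(children)
        self.leaf = not self.child

def predecessor(x, i):
    # The predecessor of x.key[i] is the largest key in child i:
    # keep following the rightmost child down to a leaf.
    node = x.child[i]
    while not node.leaf:
        node = node.child[-1]
    return node.key[-1]
```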

  46. Delete 7 15 3 10 12 22 28 30 40 50 20 11 24 14 1 2 4 6 8 9 delete(T,13)

  47. Delete 7 15 3 10 12 22 28 30 40 50 20 11 24 14 1 2 4 6 8 9 delete(T,24)

  48. Delete 7 15 3 10 12 22 28 30 40 50 20 11 14 1 2 4 6 8 9 delete(T,24)

  49. Delete (steal from sibling) 7 15 3 10 12 22 30 40 50 20 11 28 14 1 2 4 6 8 9 delete(T,24)

  50. Rotate/Steal right and Rotate/Steal left: as in insertion, a key moves through the parent between two sibling nodes.
