
B-Trees



Presentation Transcript


  1. B-Trees Design and Analysis of Algorithms

  2. Why do we need B-Trees?

  3. Data Retrieval from External Storage • In database programs, the data is too large to fit in memory, so it is stored on secondary storage (disks or tapes). • Disk access is very expensive: a disk I/O operation takes milliseconds, while the CPU processes data on the order of nanoseconds, about one million times faster. • When dealing with external storage, disk accesses dominate the running time.

  4. Balanced Binary Search Trees • Balanced binary search trees (AVL and Red-Black) perform well if the entire data set fits in main memory. • These trees are not optimized for external storage: they require many disk accesses and therefore perform poorly.

  5. Reduce Disk Accesses • Data is transferred to and from the disk in blocks (typical block sizes are 512, 2048, 4096 or 8192 bytes). • We can reduce disk accesses by • storing multiple records in a block on the disk, and • reducing the height of the tree by increasing the number of subtrees of a node. • To achieve these goals we use a multiway (m-way) search tree, which is a generalization of the binary search tree (BST). [Figure: example multiway search tree. Root: (25, 62). Level 1: (12, 19), (32, 39), (73, 84). Leaves: (3, 5), (15, 17), (21, 23), (30, 31), (34, 37), (45, 51), (69, 71), (75, 79), (90, 94).]

  6. Multiway (m-way) Search Trees

  7. Multiway (m-way) Search Trees • In an m-way tree every node has degree ≤ m. • Each node has the structure P1, K1, P2, ..., Ki-1, Pi, Ki, ..., Kq-1, Pq, where q ≤ m: • Ki are the keys, 1 ≤ i ≤ q-1. • Pi are pointers to subtrees, 1 ≤ i ≤ q. • The keys in each node are in ascending order: K1 ≤ K2 ≤ ... ≤ Kq-1. • The key Ki is larger than the keys in the subtree pointed to by Pi and smaller than the keys in the subtree pointed to by Pi+1, so P1 leads to keys < K1, Pi leads to keys between Ki-1 and Ki, and Pq leads to keys > Kq-1. • The subtrees are themselves m-way trees.
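The node structure on this slide can be sketched as a small data type together with a checker for the ordering rules. This is only an illustration: the names `MWayNode` and `is_valid_mway` are hypothetical, keys are assumed to be distinct integers, and a leaf is modelled as a node with an empty child list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MWayNode:
    # K1..K(q-1) in ascending order
    keys: List[int] = field(default_factory=list)
    # P1..Pq; an empty list models a leaf, None models an empty subtree
    children: List[Optional["MWayNode"]] = field(default_factory=list)


def is_valid_mway(node, m, low=None, high=None):
    """Return True if the subtree obeys the m-way search tree rules."""
    if node is None:
        return True
    q = len(node.keys) + 1            # number of subtree pointers
    if q > m or not node.keys:
        return False
    # Every key must lie strictly between the bounds inherited from above,
    # and the keys themselves must be ascending.
    bounds = [low] + node.keys + [high]
    for a, b in zip(bounds, bounds[1:]):
        if a is not None and b is not None and not a < b:
            return False
    if not node.children:             # leaf
        return True
    if len(node.children) != q:
        return False
    # Child i holds keys between bounds[i] and bounds[i + 1].
    return all(
        is_valid_mway(child, m, bounds[i], bounds[i + 1])
        for i, child in enumerate(node.children)
    )


good = MWayNode([25, 62], [MWayNode([12, 19]), MWayNode([32, 39]), MWayNode([73, 84])])
bad = MWayNode([25, 62], [MWayNode([12, 30]), MWayNode([32, 39]), MWayNode([73, 84])])
```

Here `good` passes for m = 3, while `bad` fails because key 30 sits in the subtree that may only hold keys below 25.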

  8. Multiway (m-way) Search Trees • An m-way tree is a generalization of the BST; it works the same way and shares its benefits and problems. • Problems • The tree is not necessarily balanced. • Leaf nodes can be on different levels. • Poor space usage: the tree can become skewed. • Benefits • Fast information retrieval. • Fast update. [Figure: an m-way tree]

  9. B-Trees

  10. B-Trees • A B-tree is a balanced m-way tree that is tuned to minimize disk accesses. • The node size of a B-tree is usually equal to the disk block size, and the number of keys in a node depends on • Key size • Disk block size • Data organization (whether only keys or entire data records are stored in the nodes) • Access paths from the root to the leaf nodes are short. [Figure: example B-tree. Root: (25, 62). Level 1: (12, 19), (32, 39), (73, 84). Leaves: (3, 5), (15, 17), (21, 23), (30, 31), (34, 37), (45, 51), (69, 71), (75, 79), (90, 94).]
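As a rough illustration of how key size, block size and data organization determine the number of keys per node, the sketch below assumes a simple node layout that is not given in the slides: q-1 entries of (key + payload) plus q child pointers and an optional fixed header. The function name and all byte sizes are hypothetical.

```python
def max_keys_per_node(block_size, key_size, pointer_size, payload_size=0, header_size=0):
    """Largest q-1 such that (q-1)*(key+payload) + q*pointer + header fits in a block.

    payload_size is what is stored next to each key: 0 for keys only,
    a pointer size for a data pointer, or the full record size.
    """
    usable = block_size - header_size - pointer_size   # one extra child pointer
    return usable // (key_size + payload_size + pointer_size)


# Assumed sizes: 4096-byte block, 8-byte keys, 8-byte pointers.
keys_only = max_keys_per_node(4096, 8, 8)
with_data_pointer = max_keys_per_node(4096, 8, 8, payload_size=8)
with_200_byte_record = max_keys_per_node(4096, 8, 8, payload_size=200)
```

Under these assumed sizes a node holds 255 keys with keys only, 170 with a data pointer per key, and 18 when a 200-byte record is stored inline, which is why storing data pointers rather than records keeps the tree shallow.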

  11. B-Tree: Definition • A B-tree of order 2t has the following properties: • Each node has at most 2t subtrees. • The root has at least two subtrees unless it is a leaf. • Each non-root, non-leaf node holds k-1 keys and k pointers to subtrees, where t ≤ k ≤ 2t. • Each leaf node holds k-1 keys, where t ≤ k ≤ 2t. • All leaves are on the same level. • It follows that a B-tree is always at least half full, has few levels and is perfectly balanced. [Figure: example B-tree. Root: (25, 62). Level 1: (12, 19), (32, 39), (73, 84). Leaves: (3, 5), (15, 17), (21, 23), (30, 31), (34, 37), (45, 51), (69, 71), (75, 79), (90, 94).]
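The properties in this definition can be checked mechanically. The following is a minimal sketch, assuming a non-empty tree, a node type with `keys` and `children` lists (an empty `children` list marks a leaf), and key counts between t-1 and 2t-1 for non-root nodes; the names `Node` and `check_btree` are hypothetical. It checks node sizes and leaf levels only, not the key ranges between subtrees.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means leaf


def check_btree(root, t):
    """Check key counts, pointer counts and that all leaves share one level."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        k = len(node.keys)
        if k > 2 * t - 1:                       # at most 2t subtrees
            return False
        if k < (1 if is_root else t - 1):       # at least half full
            return False
        if node.keys != sorted(node.keys):
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != k + 1:         # one more pointer than keys
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    return visit(root, 0, True) and len(leaf_depths) == 1


example = Node([25, 62], [
    Node([12, 19], [Node([3, 5]), Node([15, 17]), Node([21, 23])]),
    Node([32, 39], [Node([30, 31]), Node([34, 37]), Node([45, 51])]),
    Node([73, 84], [Node([69, 71]), Node([75, 79]), Node([90, 94])]),
])
lopsided = Node([10], [Node([5]), Node([20], [Node([15]), Node([25])])])
```

The example tree from the slides passes for t = 2 and t = 3 but not for t = 4, where two keys per node would be too few; `lopsided` fails because its leaves are on different levels.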

  12. B-Tree • A B-tree node can have a field KeyTally, n, giving the number of keys currently stored in the node. • A B-tree node usually contains pairs of a key and a data pointer. The data pointer points to the data record, which is not stored in the node; with this scheme more keys and pointers fit in one B-tree node. [Figure: node layout n, P1, (K1, D1), P2, ..., (Ki-1, Di-1), Pi, (Ki, Di), ..., (Kq-1, Dq-1), Pq, where Di is the data pointer for key Ki; P1 leads to keys < K1, Pi to keys between Ki-1 and Ki, and Pq to keys > Kq-1.]

  13. Height of B-Tree • The maximum height of a B-tree with n keys is important because it bounds the number of disk accesses. • The height of the tree is maximal when each node has the minimum number of subtree pointers, i.e. t (2 for the root).

  14. Height of B-Tree • The height of a B-tree is maximal if all nodes hold the minimum number of keys: • 1 key in the root (level 0) • $2(t-1)$ keys on level 1 • $2t(t-1)$ keys on level 2 • $2t^{2}(t-1)$ keys on level 3 • ... • $2t^{h-1}(t-1)$ keys in the leaves (level h).
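Summing the per-level key counts listed on this slide gives a closed-form bound on the height. The derivation below is a worked version of that sum, with n the number of keys, t as defined earlier and h the height; it is the standard geometric-series argument rather than something transcribed from the slides.

```latex
\begin{align*}
n &\ge 1 + \sum_{i=1}^{h} 2t^{\,i-1}(t-1)
   = 1 + 2(t-1)\,\frac{t^{h}-1}{t-1}
   = 2t^{h} - 1, \\
\text{hence}\quad t^{h} &\le \frac{n+1}{2}
\quad\Longrightarrow\quad
h \le \log_{t}\frac{n+1}{2}.
\end{align*}
```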

  15. Height of B-Tree • If the number of keys in the B-tree is 2,000,000 (2 million) and t = 100, the maximum height of the B-tree is 3, whereas a balanced binary tree would have height 20. • Note: t is chosen so that the B-tree node size is nearly equal to the disk block size.
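The figures on this slide can be reproduced with a few lines of integer arithmetic. The sketch assumes that n counts keys, that a minimally filled B-tree of height h holds 2t^h - 1 keys, and that the binary tree is perfectly balanced; the function names are hypothetical.

```python
def max_btree_height(n, t):
    """Largest h such that a minimally filled B-tree of height h (2*t**h - 1 keys) fits in n keys."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


def balanced_binary_height(n):
    """Height floor(log2 n) of a perfectly balanced binary tree with n >= 1 nodes."""
    return n.bit_length() - 1


btree_h = max_btree_height(2_000_000, 100)
binary_h = balanced_binary_height(2_000_000)
```

With n = 2,000,000 and t = 100 this gives a B-tree height of at most 3 against a binary tree height of 20, so a lookup touches a handful of disk blocks instead of about twenty.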

  16. Search in a B-Tree • Search in a B-tree is similar to search in a BST, except that at each node we make a multiway branching decision instead of a binary one. [Figure: searching for key 71 in the example tree; the search visits the root (25, 62), then (73, 84), then the leaf (69, 71).]

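  17. x is the root of a subtree and k is the key being searched for. • Searching within one node: complexity O(t). • Number of nodes visited from the root to a leaf: complexity O(log_t n).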
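The search procedure whose costs are quoted on this slide is not preserved in the transcript, so here is a minimal sketch of one way to write it. The names `BTreeNode` and `btree_search` are hypothetical, nodes are assumed to be in memory rather than read from disk, and a binary search (`bisect_left`) is used inside each node where the slide's O(t) figure corresponds to a linear scan.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # ascending
        self.children = children or []   # empty list means leaf


def btree_search(x, k):
    """Return (node, index) where k is stored, or None if k is absent."""
    while True:
        i = bisect_left(x.keys, k)       # first position with keys[i] >= k
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:               # reached a leaf without finding k
            return None
        x = x.children[i]                # multiway branch: one "disk access"


root = BTreeNode([25, 62], [
    BTreeNode([12, 19], [BTreeNode([3, 5]), BTreeNode([15, 17]), BTreeNode([21, 23])]),
    BTreeNode([32, 39], [BTreeNode([30, 31]), BTreeNode([34, 37]), BTreeNode([45, 51])]),
    BTreeNode([73, 84], [BTreeNode([69, 71]), BTreeNode([75, 79]), BTreeNode([90, 94])]),
])
found = btree_search(root, 71)
missing = btree_search(root, 70)
```

Searching for 71 in the example tree ends in the leaf (69, 71) at index 1, while 70 is reported as absent after the same three node visits.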

  18. B-Tree Insert Operation • Insertion in a B-tree is more complicated than in a BST. • In a BST, keys are added top down, which can result in an unbalanced tree. • A B-tree is built bottom up: • keys are added in a leaf node; if the leaf is full, another node is created, the keys are distributed evenly between the two nodes and the middle key is promoted to the parent. • If the parent is full, the process is repeated. • A B-tree can also be built top down using a pre-splitting technique, which avoids the reverse pass from leaf to root.

  19. Basic Idea • Check the number of keys in the current node. • Is the node full? If yes, split the node: • Create a new node. • Move half of the keys from the full node to the new node. • Promote the median key (before the split) to the parent. • The split guarantees that each of the two nodes has at least t-1 keys. • Is the node a leaf? If yes, insert the key; if no, traverse the appropriate child and repeat.
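The flow on this slide (split any full node on the way down, then insert in a leaf) can be sketched as follows. This is a minimal in-memory version under stated assumptions: a node is full at 2t-1 keys, keys are distinct, and the class and method names (`BTree`, `_split_child`, `inorder`) are hypothetical rather than taken from the slides.

```python
from bisect import bisect_right, insort


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = BTreeNode()

    def _split_child(self, parent, i):
        """Split the full child parent.children[i]; promote its median key."""
        t = self.t
        full = parent.children[i]
        new = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        new.keys = full.keys[t:]             # upper t-1 keys move to the new node
        full.keys = full.keys[:t - 1]        # lower t-1 keys stay
        if not full.leaf:
            new.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, new)

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:   # full root: tree grows in height
            old_root = self.root
            self.root = BTreeNode(leaf=False)
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)          # pre-split before descending
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)                      # leaf is guaranteed not full


def inorder(node):
    """Return all keys of the subtree in ascending order."""
    if node.leaf:
        return list(node.keys)
    out = []
    for i, k in enumerate(node.keys):
        out += inorder(node.children[i]) + [k]
    return out + inorder(node.children[-1])


keys = [3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56]
tree = BTree(t=3)
for k in keys:
    tree.insert(k)
```

Because every full node is split before the descent continues, the parent always has room for a promoted median, so no pass back up from leaf to root is needed.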

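  20. Insert Example (t = 3) • Before: root (G M P X); leaves (A C D E), (J K), (N O), (R S T U V), (Y Z). • Insert B: the leaf (A C D E) has room, so B is added in order. • After: root (G M P X); leaves (A B C D E), (J K), (N O), (R S T U V), (Y Z).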

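  21. Insert Example (Continued) • Before: root (G M P X); leaves (A B C D E), (J K), (N O), (R S T U V), (Y Z). • Insert Q: the leaf (R S T U V) is full, so it is split and its median T is promoted to the root. • After: root (G M P T X); leaves (A B C D E), (J K), (N O), (Q R S), (U V), (Y Z).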

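  22. Insert Example (Continued) • Before: root (G M P T X); leaves (A B C D E), (J K), (N O), (Q R S), (U V), (Y Z). • Insert L: the root is full, so it is split and P becomes the new root; L is then added to the leaf (J K). • After: root (P); level 1: (G M), (T X); leaves (A B C D E), (J K L), (N O), (Q R S), (U V), (Y Z).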

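  23. Insert Example (Continued) • Before: root (P); level 1: (G M), (T X); leaves (A B C D E), (J K L), (N O), (Q R S), (U V), (Y Z). • Insert F: the leaf (A B C D E) is full, so it is split and its median C is promoted; F is then added to the right half. • After: root (P); level 1: (C G M), (T X); leaves (A B), (D E F), (J K L), (N O), (Q R S), (U V), (Y Z).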

  24. Exercise: Inserting into a B-Tree • Insert the following keys into a B-tree with t = 3: 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56 26

  25. Bottom Up B-Tree Insert Operation • In B-tree insertion we have the following cases: • Case 1: The leaf node has room for the new key. • Case 2: The leaf in which the key is to be placed is full. • This case can lead to an increase in tree height. • We now explain these cases in detail.

  26. B-Tree Insert Operation • Case 1: The leaf node has room for the new key. • Insert 3: find the appropriate leaf node for key 3 and insert 3 in order. [Figure: a tree with root (10, 25); key 3 goes into the leftmost leaf (5, 8), giving (3, 5, 8).]

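  27. B-Tree Insert Operation • Case 2: The leaf in which the key is to be placed is full. • Insert 16: find the appropriate leaf node for key 16. • There is no room for key 16 in the leaf (14, 19, 20, 23). • Move the median key 19 up and split the node: create a new node and move keys to the new node, giving the leaves (14, 16) and (20, 23). • Insert key 19 in the parent node in order. [Figure: before, root (10, 25) with leaves (3, 5, 8), (14, 19, 20, 23), (32, 38); after, root (10, 19, 25) with leaves (3, 5, 8), (14, 16), (20, 23), (32, 38).]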
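The split step in this case can be sketched with plain lists. The helper name `split_full_leaf` is hypothetical, and the sketch assumes the leaf holds an even number of keys before the insertion so that the median of the combined keys is unique.

```python
from bisect import insort


def split_full_leaf(leaf_keys, key):
    """Insert key into a full leaf, then split around the median.

    Returns (left_keys, median, right_keys); the median goes to the parent.
    """
    keys = list(leaf_keys)
    insort(keys, key)                 # keep the keys in ascending order
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


left, median, right = split_full_leaf([14, 19, 20, 23], 16)

parent = [10, 25]
insort(parent, median)                # promote the median into the parent
```

For the slide's example this yields the leaves (14, 16) and (20, 23), with 19 promoted so that the parent becomes (10, 19, 25).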

  28. B-Tree Insert Operation • Case 2: The leaf in which the key is to be placed is full, and this leads to an increase in tree height.

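  29. B-Tree Insert Operation • Case 2: The height of the tree increases. • Insert 16: there is no room for key 16 in the leaf (14, 19, 20, 23), so move the median key 19 up and split the node into (14, 16) and (20, 23). • Insert 19 in the parent node in order: there is no room for 19 in the parent (13, 27, 33, 38), so split the parent node and move its median 27 up. • Insert 27 in its parent in order: there is no room for 27 in the root (45, 55, 67, 81), so split the root; 55 becomes the new root and the tree grows by one level. [Figure: root (45, 55, 67, 81) with children (13, 27, 33, 38), (48, 52), (57, 61), (72, 77), (86, 92); the leaves under (13, 27, 33, 38) are (9, 12), (14, 19, 20, 23), (29, 31), (35, 36), (41, 42).]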

  30. Deleting from B-Trees

  31. B-Tree: Definition • A B-tree of order 2t has the following properties: • Each node has at most 2t subtrees. • The root has at least two subtrees unless it is a leaf. • Each non-root, non-leaf node holds k-1 keys and k pointers to subtrees, where t ≤ k ≤ 2t. • Each leaf node holds k-1 keys, where t ≤ k ≤ 2t. • All leaves are on the same level. • It follows that a B-tree is always at least half full, has few levels and is perfectly balanced. [Figure: example B-tree. Root: (25, 62). Level 1: (12, 19), (32, 39), (73, 84). Leaves: (3, 5), (15, 17), (21, 23), (30, 31), (34, 37), (45, 51), (69, 71), (75, 79), (90, 94).]

  32. The Concept • You can delete a key from any node, • so you must ensure that the B-tree still satisfies its properties after the deletion. • When deleting, you have to ensure that no node gets too small (the minimum node size is t-1 keys). We prevent this by borrowing keys from siblings or by combining nodes.

  33. Let’s look at an example • We are given this valid B-tree. Note: t = 3. Source: Introduction to Algorithms, Thomas H. Cormen

  34. Deletion Cases • Case 1: If the key k is in node x and x is a leaf node having at least t keys, then delete k from x. [Figure: leaf x with ≥ t keys before the deletion and ≥ t-1 keys after it.]

  35. Simple Deletion • Case 1: We delete “F”. Result: “F” is removed from the leaf node; no further action is needed. Source: Introduction to Algorithms, Thomas H. Cormen

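  36. Deletion Cases (Continued) • Case 2: If the key k is in node x and x is an internal node, do one of the following, where y is the child that precedes k and z is the child that follows k. [Figure: internal node x containing k, with child y to the left of k and child z to the right.]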

  37. Deletion Cases (Continued) • Subcase A: If the child y that precedes k has at least t keys, then find the predecessor k´ of k in the subtree rooted at y, recursively delete k´, and replace k by k´ in x. [Figure: internal node x with child y holding ≥ t keys; the predecessor k´ is taken from y's subtree and replaces k in x.]

  38. Deleting and shifting • Case 2a: We delete “M”. Result: “M” is removed from the internal node, which would leave four child nodes but only two keys. “L”, the predecessor of “M”, moves up to replace it, so the “N O” node again has a separating key in its parent. Source: Introduction to Algorithms, Thomas H. Cormen
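Case 2a can be sketched for the simple situation shown here, where the preceding child y is a leaf with a key to spare. The names `Node` and `delete_case_2a` are hypothetical, and the general case (y is internal, so the predecessor must be deleted recursively) is deliberately left out.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means leaf


def delete_case_2a(x, i, t):
    """Delete x.keys[i] by replacing it with its predecessor from child y = x.children[i].

    Sketch only: assumes y is a leaf, so the predecessor is simply y's largest key.
    """
    y = x.children[i]
    if len(y.keys) < t:
        raise ValueError("child y has only t-1 keys; case 2a does not apply")
    if y.children:
        raise NotImplementedError("internal y needs a recursive delete of the predecessor")
    x.keys[i] = y.keys.pop()             # predecessor k' replaces k


# The internal node and leaves of the running example after "F" was deleted (t = 3).
x = Node(list("CGM"), [Node(list("AB")), Node(list("DE")), Node(list("JKL")), Node(list("NO"))])
delete_case_2a(x, 2, t=3)                # delete "M"
```

After the call the internal node holds C, G, L and the leaf that supplied the predecessor is left with J, K, which is still the minimum of t-1 = 2 keys.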

  39. Deletion Cases (Continued) • Subcase B: Symmetrically, if the child z that follows k in node x has at least t keys, then find the successor k´ of k in the subtree rooted at z, recursively delete k´, and replace k by k´ in x. [Figure: internal node x with child z holding ≥ t keys; the successor k´ is taken from z's subtree and replaces k in x.]

  40. Deletion Cases (Continued) • Subcase C: If y and z both have t-1 keys, merge k and all of z into y, so that x loses both k and the pointer to z; then free z and recursively delete k from y. [Figure: y and z with t-1 keys each become a single node y holding y’s keys, k, z’s keys (2t-1 keys).]

  41. Combining and Deleting • Case 2c: Now we delete “G”. Result: First, “G” is pushed down and the nodes “D E” and “J K” are combined around it into “D E G J K”. Then “G” is deleted from that leaf, leaving “D E J K” under the parent “C L”. Source: Introduction to Algorithms, Thomas H. Cormen
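The merge used in this case (and again in case 3b) can be sketched as a small helper. The names `Node` and `merge_children` are hypothetical; the sketch assumes both children hold t-1 keys, so the merged node ends up with 2t-1.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means leaf


def merge_children(x, i):
    """Merge x.children[i], the separator x.keys[i] and x.children[i+1] into one node."""
    y, z = x.children[i], x.children[i + 1]
    y.keys = y.keys + [x.keys.pop(i)] + z.keys   # y's keys, k, z's keys
    y.children = y.children + z.children         # no-op for leaves
    x.children.pop(i + 1)                        # z is freed
    return y


# Internal node and leaves of the running example before "G" is deleted (t = 3).
x = Node(list("CGL"), [Node(list("AB")), Node(list("DE")), Node(list("JK")), Node(list("NO"))])
merged = merge_children(x, 1)
merged_before_delete = list(merged.keys)
merged.keys.remove("G")                          # then delete k from the merged leaf
```

The merged leaf briefly holds D, E, G, J, K (2t-1 = 5 keys) before G is removed, and the parent shrinks to C, L with three children.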

  42. Deletion Cases (Continued) • Case 3: k is not in the internal node x. Let ci[x] be the root of the subtree that must contain k, if k is in the tree. If ci[x] has at least t keys, recursively descend into it; otherwise, execute subcase 3A or 3B as necessary so that ci[x] has at least t keys, then descend.

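  43. Deletion Cases (Continued) • Subcase A: ci[x] has t-1 keys and a sibling has at least t keys. Move a key from x down into ci[x] and a key from the sibling up into x (together with the corresponding child pointer), then recursively descend. [Figure: ci[x] with t-1 keys gains the separating key from x and ends up with t keys; the sibling's nearest key takes the separator's place in x.]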

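  44. Deleting “B” • Case 3a. Result: “B” is deleted, “C” is demoted from the parent into the leaf, and “E” is promoted from the sibling into the parent. [Figure: the tree before and after the deletion.]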
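The demote and promote steps of case 3a amount to a rotation through the parent, sketched below for a right sibling. The names `Node` and `borrow_from_right` are hypothetical, and the surrounding keys (A, J, K, L, N, O) are assumed for illustration; only the roles of B, C and E come from the slide.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means leaf


def borrow_from_right(x, i):
    """Give x.children[i] one extra key taken via the parent from its right sibling."""
    child, sibling = x.children[i], x.children[i + 1]
    child.keys.append(x.keys[i])             # demote the separator into the child
    x.keys[i] = sibling.keys.pop(0)          # promote the sibling's smallest key
    if sibling.children:                     # internal nodes: move the child pointer too
        child.children.append(sibling.children.pop(0))


# Assumed surroundings for the slide's example, t = 3.
x = Node(list("CL"), [Node(list("AB")), Node(list("EJK")), Node(list("NO"))])
borrow_from_right(x, 0)                      # leaf (A B) now has t keys
x.children[0].keys.remove("B")               # the deletion itself is then case 1
```

After the rotation the leaf can lose B and still keep t-1 keys, the parent holds E, L, and the sibling is reduced to J, K.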

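  45. Deletion Cases (Continued) • Subcase B: ci[x] and its sibling both have t-1 keys. Merge ci[x] with the sibling, moving the separating key k1 from x down into the merged node, then recursively descend. [Figure: ci[x] and ci+1[x] with t-1 keys each become one node holding ci[x]’s keys, k1, ci+1[x]’s keys (2t-1 keys).]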

  46. Combining and Deleting • Case 3b: Now we delete “D”. Result: The leaf “D E” and its sibling “J K” both have t-1 keys, so they are combined with “G” pushed down from the parent, forming “D E G J K”. “D” is then deleted from that leaf, leaving “E G J K” under the parent “C L”. Source: Introduction to Algorithms, Thomas H. Cormen

  47. Type #1: Simple leaf deletion • Assuming a 5-way B-tree, as before. • Delete 2: since there are enough keys in the node, just delete it. [Figure: root (12, 29, 52); leaves (2, 7, 9), (15, 22), (31, 43), (56, 69, 72).]

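  48. Type #2: Simple non-leaf deletion • Delete 52: borrow the predecessor or (in this case) the successor, 56, to take its place. [Figure: root (12, 29, 52); leaves (7, 9), (15, 22), (31, 43), (56, 69, 72). After the deletion, 56 replaces 52 in the root.]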
