B-Trees: Balanced Trees for Use with Random Access Secondary Storage

Gerda Kamberova, Department of Computer Science, Hofstra University.

Presentation Transcript


  1. B-Trees: Balanced Trees for Use with Random Access Secondary Storage
     Gerda Kamberova, Department of Computer Science, Hofstra University
     G.Kamberova, Algorithms

  2. Overview
     • Dynamic Set/Dictionary on a Disk Drive: B-trees
     • Memory
        • Motivation
        • Memory hierarchy
        • Impact of memory organization on the running time of algorithms
     • B-trees
        • Definition and examples
        • Bounding the height of a B-tree
        • Operations on a B-tree: search, insert, delete

  3. Memory Hierarchy
     • Up to now we assumed that reads and writes go from/to main memory and that each such operation takes a fixed, minimal amount of time
     • Some applications deal with huge amounts of data that cannot all fit into main memory:
        • analysis of scientific data
        • processing financial transactions
        • organization and maintenance of databases
        • telephone directories
        • library catalogs, etc.
     • B-Trees are balanced search trees designed to work well on direct-access secondary storage devices (minimizing disk I/O operations)

  4. Memory Hierarchy
     • Computers have a hierarchy of memories that vary in size and speed (listed in order of increasing size and decreasing speed):
        • CPU registers
        • Cache: larger than the registers and about an order of magnitude slower
        • Main memory (RAM): about 2 orders of magnitude slower than cache
           • SRAM
           • DRAM
        • Disks: 100,000 to 1,000,000 times slower than main memory
     • Operating systems support general mechanisms that allow most memory accesses to be fast. These mechanisms rely on the locality-of-reference property of most software.

  5. Locality-of-Reference and Memory Access
     • Locality-of-reference
        • Temporal locality (TL): if a program accesses a certain memory location now, it is likely to access it again in the near future
        • Spatial locality (SL): if a program accesses a certain memory location now, it is likely to access other nearby locations in the near future
     • Caching and Blocking
        • design choices for two-level memory systems
        • present in the interfaces
           • between main memory and cache memory
           • between external memory and main memory
     • Caching: motivated by TL
        • bring data from main memory into the cache, hoping they will be needed again soon; the response is then faster than going to main memory
        • data are accessed in blocks called cache lines
     • Blocking: motivated by SL
        • if location x is required from secondary memory, bring into main memory not only the data at x but also the data at locations close to x
        • data are accessed in blocks called pages (disk blocks)

  6. Implications of Locality-of-Reference
     • In addition, blocking for external memory is motivated by the hardware characteristics of external storage devices
     • With blocking, secondary memory is perceived as much faster than it is
     • Implications of locality-of-reference for programmers
        • The programmer usually does not have to be overly concerned with the memory hierarchy and how blocking and caching are implemented. Still, one should try to
           • Use TL: if an algorithm calls for several accesses to the same variable, try to group these accesses as close together as possible in execution order
           • Use SL: if an algorithm calls for accessing a certain location x in an array or a certain field in an object, try to group the accesses to locations spatially close to x as close together as possible in execution order
        • When selecting an algorithm, can we take advantage of locality of reference?

  7. Dynamic Set on Secondary Storage
     • Goal: minimize the disk accesses needed to perform a search or an update
        • It is preferable to do many main-memory accesses instead of one disk access
     • Disk-access complexity of various implementations of a dynamic set
        • Use the number of pages (blocks) read from disk as a crude approximation of the time spent accessing the disk
        • Doubly linked list: search is O(n); in the worst case each successive link leads to a different block
        • Sorted array: search is O(log n), but insert and delete still require Theta(n/B) accesses
        • Balanced BST, skip lists, or other structures with logarithmic times: in the worst case each accessed node is in a different block, so O(log n) accesses
        • B-Trees: O(log n / log B)
     • Idea: trade 1 slow disk access for O(B) very fast main-memory accesses, where B is the block size
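To get a feel for the gap between O(log n) and O(log n / log B) disk accesses, the sketch below counts tree levels in integer arithmetic. The function name `levels` and the values of `n` and `B` are illustrative assumptions, not figures from the slides.

```python
def levels(n: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n, i.e. ceil(log_fanout(n)) without floating point."""
    h, reach = 0, 1
    while reach < n:
        reach *= fanout
        h += 1
    return h


n = 10**9   # hypothetical number of keys
B = 1000    # hypothetical number of keys that fit in one disk block

binary_levels = levels(n, 2)   # one block per node visited in a balanced binary tree
btree_levels = levels(n, B)    # one block per node when each node fills a block
print(binary_levels, btree_levels)
```

Under these assumed numbers the block-sized fanout cuts the number of levels, and hence worst-case block reads, by roughly a factor of log B.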

  8. B-Trees
     • Balanced search trees designed to work well on data stored on disks. Multiple keys are stored in sorted order in a node. If an internal node holds m keys, it has m+1 children.
     • Property: an n-node B-tree has height O(log n)
     • The maximum branching factor (BF) depends on the disk block size
     • For large B-trees stored on disk, a BF between 50 and 2000 is often used
     • With BF=1001 (1000 keys per node),
        • how many nodes are in a tree of height 2?
        • how many keys can be stored in a tree of height 2?
        • since the root is kept in main memory, at most 2 disk accesses are necessary to locate any key
     [Figure: example B-tree of height 2. Root(T) holds M; its children hold (D H) and (Q T X); the leaves are (B C) (F G) (J K L) (N P) (R S) (V W) (Y Z).]
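The two counting questions on this slide can be answered by summing the nodes level by level in a completely full tree. This is a small sketch; the helper name `full_tree_counts` is made up for illustration.

```python
def full_tree_counts(branching: int, height: int) -> tuple[int, int]:
    """Return (nodes, keys) of a completely full tree with the given branching factor."""
    keys_per_node = branching - 1
    # Depth d holds branching**d nodes; depths run from 0 (root) to height (leaves).
    nodes = sum(branching**d for d in range(height + 1))
    return nodes, nodes * keys_per_node


nodes, keys = full_tree_counts(branching=1001, height=2)
print(nodes)  # 1 + 1001 + 1001**2 = 1003003
print(keys)   # 1000 keys in each node = 1003003000
```

So a height-2 tree with BF=1001 can hold just over a billion keys while needing at most 2 disk reads per lookup.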

  9. Conventions
     • Extend the pseudocode language with
        • DiskRead(x): reads the page containing object x into main memory
        • DiskWrite(x): writes the page containing object x to secondary storage
     • Assume pages no longer in use are flushed from main memory
     • Usually we want a B-tree node to be the size of a whole disk page
     • For simplicity, ignore the "data" information; in practice it is most common to store with each key a pointer to another disk page holding the data

  10. B-Tree Definition
      A B-Tree is a rooted tree whose nodes have the following properties.
      1. Every node x has the following fields:
         • n[x], the number of keys stored in x
         • the n[x] keys themselves, stored in sorted order: $key_1[x] \le key_2[x] \le \dots \le key_{n[x]}[x]$
         • leaf[x], which is TRUE if x is a leaf and FALSE otherwise
      2. If x is an internal node, x has n[x]+1 children, accessed through the pointers $c_1[x], c_2[x], \dots, c_{n[x]+1}[x]$ (in analogy with left[x] and right[x] in a binary tree)
      3. The keys in a node x separate the ranges of the keys stored in its children: if $k_i$ is any key stored in the subtree rooted at $c_i[x]$, then
         $k_1 \le key_1[x] \le k_2 \le key_2[x] \le \dots \le key_{n[x]}[x] \le k_{n[x]+1}$

  11. B-Tree Definition (cont)
      4. Every leaf has the same depth, which is the height h of the tree
      5. Let t>=2 be an integer, the minimum branching factor (the minimum out-degree of the B-Tree).
         • Every node except the root must have >= t-1 keys (n[x]>=t-1), and thus every internal node except the root has >= t children
         • If the tree is not empty, n[root[T]]>=1, and thus a root that is not a leaf has at least 2 children
         • Every node contains <= 2t-1 keys, and thus an internal node has at most 2t children
      Thus: the BF of the root is between 2 and 2t; each internal node other than the root has BF between t and 2t
      Example: t=2, every internal node has between 2 and 4 children (2-3-4 tree)
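The five properties of the definition can be turned into a recursive checker. The following is a minimal sketch under assumptions: the `Node` class (a key list plus a child list, with an empty child list meaning "leaf") and the function name `check` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means the node is a leaf


def check(node, t, is_root=True, low=None, high=None):
    """Return the height of the subtree if it is a valid B-tree, else raise ValueError."""
    n = len(node.keys)
    min_keys = 1 if is_root else t - 1                  # property 5: key-count bounds
    if not min_keys <= n <= 2 * t - 1:
        raise ValueError(f"node {node.keys} holds {n} keys")
    if node.keys != sorted(node.keys):                  # property 1: keys are sorted
        raise ValueError(f"node {node.keys} is not sorted")
    if any((low is not None and k < low) or (high is not None and k > high)
           for k in node.keys):                         # property 3: parent keys separate ranges
        raise ValueError(f"node {node.keys} violates its key range")
    if not node.children:
        return 0
    if len(node.children) != n + 1:                     # property 2: n[x]+1 children
        raise ValueError(f"node {node.keys} has {len(node.children)} children")
    bounds = [low] + node.keys + [high]
    heights = {check(c, t, False, bounds[i], bounds[i + 1])
               for i, c in enumerate(node.children)}
    if len(heights) != 1:                               # property 4: all leaves at one depth
        raise ValueError("leaves at different depths")
    return 1 + heights.pop()


def leaf(s):
    return Node(list(s))


# The example tree from slide 8 (root M).
tree = Node(["M"], [
    Node(["D", "H"], [leaf("BC"), leaf("FG"), leaf("JKL")]),
    Node(["Q", "T", "X"], [leaf("NP"), leaf("RS"), leaf("VW"), leaf("YZ")]),
])
print(check(tree, t=2))
```

The same tree is rejected for t=4, because its leaves hold fewer than t-1 = 3 keys.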

  12. The Height of B-Tree
      • The number of disk accesses for an operation is bounded by the height, thus O(h)
      • Theorem: If n >= 1, then for any n-key B-tree T of height h and minimum BF t >= 2,
           $h \le \log_t \frac{n+1}{2}$
      • Proof: If we prove the statement for the minimum-key B-tree of height h, M, then it will be true for any tree of height h.

  13. B-Tree Height
      • Proof: (cont)
      [Figure: the minimum-key B-tree of height h, with Root(T) at the top and each node below the root having t children.]
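The counting argument behind the height bound can be written out as follows. This is the standard textbook derivation, sketched here because the slide's own figure and formulas did not survive. In a minimum-key tree of height h the root has 1 key and 2 children, and every other node has t-1 keys and (if internal) t children, so there are at least $2t^{i-1}$ nodes at depth $i \ge 1$:

```latex
\begin{align*}
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1}
   = 1 + 2(t-1)\cdot\frac{t^{h}-1}{t-1}
   = 2t^{h} - 1, \\
t^{h} &\le \frac{n+1}{2}
\quad\Longrightarrow\quad
h \le \log_t \frac{n+1}{2}.
\end{align*}
```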

  14. Basic Operations
      Assume root(T) is always in main memory, so we never do a DiskRead on the root; however, we must do a DiskWrite whenever the root is changed.
      • Searching: a straightforward generalization of BST search
         • Ex: search for S
      • Complexity:
         • number of nodes visited to find (or fail to find) the key: <= log(n+1)/log t
         • at each node, O(log t) to do a binary search on the sorted keys and decide which child to go to
         • 1 DiskRead to get the page containing the child
      [Figure: the example B-tree. Root(T) holds M; its children hold (D H) and (Q T X); the leaves are (B C) (F G) (J K L) (N P) (R S) (V W) (Y Z).]
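The search described above can be sketched in a few lines. The `Node` class, the `disk_read` stand-in for DiskRead, and the returned read counter are assumptions made for illustration; in this in-memory sketch every node is already loaded.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means the node is a leaf


def disk_read(node):
    """Stand-in for DiskRead(x): a real implementation would fetch the node's page."""
    return node


def search(node, k, reads=0):
    """Return (node, index, disk_reads) if k is found, else (None, -1, disk_reads)."""
    i = bisect_left(node.keys, k)            # binary search among the sorted keys
    if i < len(node.keys) and node.keys[i] == k:
        return node, i, reads
    if not node.children:                    # reached a leaf without finding k
        return None, -1, reads
    return search(disk_read(node.children[i]), k, reads + 1)


def leaf(s):
    return Node(list(s))


# The example tree from the slide; the root is assumed to be in memory already.
tree = Node(["M"], [
    Node(["D", "H"], [leaf("BC"), leaf("FG"), leaf("JKL")]),
    Node(["Q", "T", "X"], [leaf("NP"), leaf("RS"), leaf("VW"), leaf("YZ")]),
])
found, index, reads = search(tree, "S")
print(found.keys, index, reads)
```

Searching for S descends M, then (Q T X), then (R S), which costs 2 simulated disk reads because the root is not counted.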

  15. Basic B-Tree Operations
      • Creating an empty tree: O(1) time
      • Splitting a node
         • An important operation for insertion is splitting a full node y (with 2t-1 keys) around its median key into 2 nodes having t-1 keys each
         • The median key moves into y's parent (which must not be full prior to splitting y)
         • Ex: t=4, max 7 keys in a node, max BF 8
         • If y is the root, the tree grows in height by 1
      [Figure: split with t=4. Before: parent (… N W …) with subtrees Ta, Tb, Tc, where Tb is rooted at the full node y = (P Q R S T U V) with children 1 to 8. After: parent (… N S W …) with subtrees Ta, Tb1, Tb2, Tc, where Tb1 is rooted at (P Q R) with children 1 to 4 and Tb2 at (T U V) with children 5 to 8.]

  16. Basic Operations: Split (cont)
      • B_Tree_split-child(x,i,y)
         • splits the full child y of the non-full node x (already read into memory) into two subtrees
         • the median key moves into x
      • Complexity of Split:
         • Time: O(t), to copy half of the pointers and keys into the new node and remove them from y
         • Disk access: allocate one node on disk + write 3 nodes to disk, O(1)
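A minimal in-memory sketch of the split, assuming a hypothetical `Node` class with a key list and a child list. The signature `split_child(x, i, t)` drops the slide's explicit y argument, since y is simply child i of x, and the three DiskWrite calls are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means the node is a leaf


def split_child(x, i, t):
    """Split the full child x.children[i] (2t-1 keys) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    assert len(x.keys) < 2 * t - 1, "parent must not be full"
    z = Node(y.keys[t:], y.children[t:])   # new right sibling takes the upper t-1 keys
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]                # y keeps the lower t-1 keys
    y.children = y.children[:t]            # and, if internal, the first t children
    x.keys.insert(i, median)               # the median key moves up into x
    x.children.insert(i + 1, z)


# The t=4 example from slide 15; ta and tc are hypothetical stand-ins for Ta and Tc.
ta, tc = Node(["A"]), Node(["Y"])
y = Node(list("PQRSTUV"))
x = Node(["N", "W"], [ta, y, tc])
split_child(x, 1, t=4)
print(x.keys, [c.keys for c in x.children])
```

After the call, x holds N S W and the former node y has become the two nodes (P Q R) and (T U V).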

  17. Basic Operations: Insert
      • Idea:
         • Use a single pass going down the tree, as for search; the search locates the leaf in which to insert the new key. At each non-full node a binary search is performed to decide which subtree to follow
         • whenever a full node is encountered on the search path, split it, and continue the insert recursively in one of the newly created subtrees
      • Start at the root:
         • if it is full, prior to continuing, create a new node and split the root, pushing the median key of the root up into the new node. (This is the only way the height of a B-Tree grows.)
      [Figure: a full root[T] = (A D F H L N P) is split; the new root[T] = (H) has the children (A D F) and (L N P).]

  18. Basic Operations: Insert (cont)
      The procedure in the textbook implements this one-pass insert.
      • It starts from the root:
         • if the root is full, prior to continuing, it creates a new node and splits the root, pushing the median key of the root up into the new node. The key is always inserted into a non-full leaf (the terminating condition for the recursion).
      • During the search, when the procedure detects that the child to be visited is full, it splits that child before making the recursive call on one of the two new children. This guarantees that when a key is inserted into a leaf, the leaf is non-full.
      • Complexity of Insert:
         • The number of disk accesses (nodes read) is O(h); there are at most h splits, thus at most O(h) nodes allocated
         • The CPU time is O(th)
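The one-pass insert can be sketched as below. It is an in-memory illustration under assumptions: `Node`, `split_child` and `insert` are hypothetical names, the descent is written as a loop rather than the textbook's recursion, and disk reads and writes are left out. The usage part replays the inserts C, Q, L, F from the insert example slides (t=3).

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means the node is a leaf


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key, which moves into x."""
    y = x.children[i]
    z = Node(y.keys[t:], y.children[t:])
    median = y.keys[t - 1]
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


def insert(root, k, t):
    """Insert k in a single downward pass and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:          # full root: the only way the height grows
        root = Node([], [root])
        split_child(root, 0, t)
    node = root
    while node.children:
        i = bisect_left(node.keys, k)
        if len(node.children[i].keys) == 2 * t - 1:
            split_child(node, i, t)          # split the full child before descending
            if k > node.keys[i]:             # k belongs to the new right half
                i += 1
        node = node.children[i]
    insort(node.keys, k)                     # the leaf reached is guaranteed non-full
    return root


def leaf(s):
    return Node(list(s))


# Starting tree of the insert example (t=3), then the four inserts from the slides.
root = Node(list("GMPX"), [leaf("ABDE"), leaf("JK"), leaf("NO"), leaf("RSTUV"), leaf("YZ")])
for key in "CQLF":
    root = insert(root, key, t=3)
print(root.keys, [c.keys for c in root.children])
```

The final tree has root (P) with children (C G M) and (T X), matching the last insert example slide.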

  19. Insert Example
      • Given t=3, a full node has 5 keys
      • Insert C: insert into a non-full leaf
      Before: root (G M P X); leaves (A B D E) (J K) (N O) (R S T U V) (Y Z)
      After:  root (G M P X); leaves (A B C D E) (J K) (N O) (R S T U V) (Y Z)

  20. Insert Example
      • Given t=3
      • Insert Q: the leaf (R S T U V) is full, so it is split and T moves up into the root
      Before: root (G M P X); leaves (A B C D E) (J K) (N O) (R S T U V) (Y Z)
      After:  root (G M P T X); leaves (A B C D E) (J K) (N O) (Q R S) (U V) (Y Z)

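  21. Insert Example
      • Given t=3
      • Insert L: the root is full, so it is split first; L is then inserted into the leaf (J K)
      Before: root (G M P T X); leaves (A B C D E) (J K) (N O) (Q R S) (U V) (Y Z)
      After the root split: root (P); children (G M) with leaves (A B C D E) (J K) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After the insert: root (P); children (G M) with leaves (A B C D E) (J K L) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)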

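  22. Insert Example
      • Given t=3
      • Insert F: the leaf (A B C D E) is full, so it is split and C moves up into its parent
      Before: root (P); children (G M) with leaves (A B C D E) (J K L) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After:  root (P); children (C G M) with leaves (A B) (D E F) (J K L) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)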

  23. Basic Operation: Deletion
      • Key idea is to
         • ensure, as you move down the tree, that the node to visit (i.e. each node on the path from the root to the node with the key to be deleted) has at least 1+(t-1) = t keys; if not, rearrange the tree before continuing
         • this way, if a key is deleted from a node, the minimum number of keys is still maintained
      • Let x be the current node when searching for the node with key k to delete
         • Case 1: x is a leaf: just delete k from x
         • Case 2: k is in x (x is an internal node); let y be the child before k and z the child after
            • Case 2a: at least t keys in y
            • Case 2b: at least t keys in z
            • Case 2c: t-1 keys in both y and z (will have to rearrange)
            • Note: y and z could be leaves

  24. Deletion, Cases 2a,b,c: x has the key
      x holds the key k; y is the child just before k and z is the child just after.
      • 2a (y has at least t keys): find k's predecessor and keep it in memory; delete the predecessor recursively, then put it in place of k in x
      • 2b (z has at least t keys): symmetric to 2a, using k's successor; delete the successor recursively, then put it in place of k in x
      • 2c (y and z have t-1 keys each): merge the nodes y and z, moving k down as the median key of the new node, then delete k recursively from the new node. Note that if x was the root with the single key k, the height shrinks.
      [Figure: x = (… c k o …), y = (… f), z = (m …), with subtrees Ta and Tb; j is k's predecessor and i is k's successor. 2a: x becomes (… c j o …). 2b: x becomes (… c i o …). 2c: x becomes (… c o …) with the merged child (… f k m …).]

  25. Basic Operation: Deletion (cont)
      • Key idea is to
         • ensure, as you move down the tree, that each node visited has at least t keys, one more than the minimum number allowed (t-1)
      • Let x be the current node when searching for the node with key k to delete
         • Case 1: x is a leaf: just delete k from x
         • Case 2: k is in x
         • Case 3: k is not in the current node x, and the node z we want to go to next has t-1 keys (need to rearrange the tree); y is an immediate sibling of z
            • Case 3a: at least t keys in y
            • Case 3b: t-1 keys in each of y and z
            • Note: the roles of y and z can be interchanged (the rotation changes direction, see the next slide)
            • Also, y and z could be leaves

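  26. Deletion, Cases 3a,b: x does not have the key, node to go to next has t-1 keys
      x is the current node (it has at least t keys); z is the child we want to descend into, but z has only t-1 keys; y is an immediate sibling of z. Rearrange so that the node to visit has t keys.
      • 3a (at least t keys in y): do a left-to-right-like rotation around y: the key of x that separates y and z moves down into z, and the adjacent key of y moves up into x. z now has t keys; continue the search there.
      • 3b (t-1 keys in y and in z): merge the nodes y and z, dropping the separating key of x as the median key of the merged node; then continue the search in the merged node.
      [Figure: Example, delete k. x = (… c i o …), y = (… g h), z = (j …), with subtrees Ta, Tb, Tc. 3a: x becomes (… c h o …), y becomes (… g) and z becomes (i j …). 3b: x becomes (… c o …) with the merged child (… g h i j …).]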

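  27. Example 1
      • Given a B-tree rooted at x, t=3, delete F
      • x is not a leaf and does not have the key; the nodes on the path to the leaf, (C G M) and (D E F), have 3 keys each, so F is simply deleted from its leaf (case 1)
      Before: root (P); children (C G M) with leaves (A B) (D E F) (J K L) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After:  root (P); children (C G M) with leaves (A B) (D E) (J K L) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)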

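  28. Example 1
      • t=3, delete M
      • x = (C G M) is not a leaf and has the key; y = (J K L), the child to the left of M, has t keys: put the predecessor of the key up in its place (case 2a)
      Before: root (P); children (C G M) with leaves (A B) (D E) (J K L) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After:  root (P); children (C G L) with leaves (A B) (D E) (J K) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)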

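  29. Example 1
      • t=3, delete G
      • x = (C G L) is not a leaf and has the key; y = (D E) and z = (J K) have t-1 keys each: merge them, drop G into the merged node, and recursively delete it (case 2c)
      Before: root (P); children (C G L) with leaves (A B) (D E) (J K) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After the merge: root (P); children (C L) with leaves (A B) (D E G J K) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After deleting G from the leaf: the merged leaf is (D E J K)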

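  30. Example 1
      • t=3, delete D
      • x = (P) is not a leaf and does not have the key; z = (C L), the node to go to next, has t-1 keys and its sibling y = (T X) has t-1 keys too: merge y and z, drop the root key P into the merged node, and recursively delete (case 3b). The height h shrinks.
      Before: root (P); children (C L) with leaves (A B) (D E J K) (N O), and (T X) with leaves (Q R S) (U V) (Y Z)
      After the merge: root x = (C L P T X); leaves (A B) (D E J K) (N O) (Q R S) (U V) (Y Z)
      After deleting D from the leaf: the leaf (D E J K) becomes (E J K)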

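  31. Example 1
      • t=3, delete B
      • x = (C L P T X) is not a leaf and does not have the key; z = (A B), the node to go to next, has t-1 keys and its sibling y = (E J K) has t keys: do a right-to-left-like rotation, then recursively delete (case 3a)
      Before: root (C L P T X); leaves (A B) (E J K) (N O) (Q R S) (U V) (Y Z)
      After the rotation: root (E L P T X); leaves (A B C) (J K) (N O) (Q R S) (U V) (Y Z)
      After deleting B from the leaf: the leaf (A B C) becomes (A C)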

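  32. Example 2
      • Given a B-tree rooted at x, t=2, delete H
      • x does not have H; the node to visit has 1 key, its sibling has 2 (case 3a)
      Before:
         x = (K)
            (F)
               (B): leaves (A) (C D)
               (H): leaves (G) (J)
            (P W)
               (M): leaves (L) (N)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)
      After the rotation:
         (P)
            x = (F K)
               (B): leaves (A) (C D)
               (H): leaves (G) (J)
               (M): leaves (L) (N)
            (W)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)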

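  33. Example 2
      • delete H (cont)
      • x = (F K) does not have H; the node to visit, (H), has 1 key and its sibling has 1 too: merge + drop (case 3b)
      Before:
         (P)
            x = (F K)
               (B): leaves (A) (C D)
               (H): leaves (G) (J)
               (M): leaves (L) (N)
            (W)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)
      After merging (H) and (M) and dropping K:
         (P)
            (F)
               (B): leaves (A) (C D)
               x = (H K M): leaves (G) (J) (L) (N)
            (W)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)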

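  34. Example 2
      • delete H (cont)
      • x = (H K M) has H and is not a leaf; y = (G) and z = (J) have 1 key each: merge + drop (case 2c)
      • then x = (G H J) is a leaf with H: delete it (case 1)
      Before:
         (P)
            (F)
               (B): leaves (A) (C D)
               x = (H K M): leaves (G) (J) (L) (N)
            (W)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)
      After the merge:
         (P)
            (F)
               (B): leaves (A) (C D)
               (K M): leaves (G H J) (L) (N)
            (W)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)
      After deleting H from the leaf: the leaf (G H J) becomes (G J)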

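  35. Example 2
      • Result from delete H; now delete L
      • x = (P) does not have L; the node to visit, (F), has 1 key and its sibling (W) has 1 too: merge + drop (case 3b). The height shrinks.
      Before:
         x = (P)
            (F)
               (B): leaves (A) (C D)
               (K M): leaves (G J) (L) (N)
            (W)
               (S U): leaves (Q R) (T) (V)
               (Y): leaves (X) (Z)
      After merging (F) and (W) and dropping P:
         x = (F P W)
            (B): leaves (A) (C D)
            (K M): leaves (G J) (L) (N)
            (S U): leaves (Q R) (T) (V)
            (Y): leaves (X) (Z)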

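  36. Example 2
      • Delete L (cont)
      • x = (K M) is not a leaf and does not have the key; we need to go into z = (L), a node with 1 key, and its sibling y = (G J) has 2 (case 3a)
      Before:
         (F P W)
            (B): leaves (A) (C D)
            x = (K M): leaves y = (G J), z = (L), (N)
            (S U): leaves (Q R) (T) (V)
            (Y): leaves (X) (Z)
      After the rotation:
         (F P W)
            (B): leaves (A) (C D)
            (J M): leaves (G) (K L) (N)
            (S U): leaves (Q R) (T) (V)
            (Y): leaves (X) (Z)
      After deleting L from the leaf: the leaf (K L) becomes (K)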

  37. B-Tree Delete
      • Recall BST delete: delete a key from
         • a leaf
         • an internal node with one child
         • an internal node with two children
      • Delete a key k from a B-Tree T rooted at x
         • The node x is in memory
         • Go in one pass, from the root down
         • The procedure is always called recursively on a tree rooted at a node with at least t keys; one of these keys might have to be pushed down to a child before continuing down
         • If the root x ever ends up with no keys (may happen in 2c or 3b), the only child of x becomes the root, decreasing the height. Only the root may become empty (all other nodes have >= 1 key after the manipulation)
      • Next, we just sketch the pseudo-code with the above understanding
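Since the pseudo-code slides did not survive in this transcript, here is one possible in-memory sketch of the one-pass delete covering cases 1, 2a, 2b, 2c, 3a and 3b. All names (`Node`, `merge`, `delete`, `delete_key`) are hypothetical, disk operations are omitted, and the choice of which sibling to try first is an assumption. The usage part replays Example 1 (t=3: delete F, M, G, D, B).

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means the node is a leaf


def merge(x, i):
    """Merge children i and i+1 of x, dropping x.keys[i] down as the median key."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children
    return y


def delete(x, k, t):
    """Delete k below x; x has at least t keys unless it is the root."""
    i = bisect_left(x.keys, k)
    found = i < len(x.keys) and x.keys[i] == k
    if not x.children:                                  # case 1: x is a leaf
        if found:
            x.keys.pop(i)
        return
    if found:                                           # case 2: k is in internal node x
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                            # 2a: replace k by its predecessor
            while y.children:
                y = y.children[-1]
            x.keys[i] = y.keys[-1]
            delete(x.children[i], x.keys[i], t)
        elif len(z.keys) >= t:                          # 2b: replace k by its successor
            while z.children:
                z = z.children[0]
            x.keys[i] = z.keys[0]
            delete(x.children[i + 1], x.keys[i], t)
        else:                                           # 2c: merge y, k, z
            delete(merge(x, i), k, t)
        return
    child = x.children[i]                               # case 3: k is not in x
    if len(child.keys) < t:
        left = x.children[i - 1] if i > 0 else None
        right = x.children[i + 1] if i + 1 < len(x.children) else None
        if left is not None and len(left.keys) >= t:    # 3a: rotate from the left sibling
            child.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                child.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:  # 3a: rotate from the right sibling
            child.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                child.children.append(right.children.pop(0))
        elif right is not None:                         # 3b: merge with the right sibling
            child = merge(x, i)
        else:                                           # 3b: merge with the left sibling
            child = merge(x, i - 1)
    delete(child, k, t)


def delete_key(root, k, t):
    """Delete k and return the root; an emptied root is replaced by its only child."""
    delete(root, k, t)
    return root.children[0] if not root.keys and root.children else root


def leaf(s):
    return Node(list(s))


# Starting tree of Example 1 (t=3).
root = Node(["P"], [
    Node(list("CGM"), [leaf("AB"), leaf("DEF"), leaf("JKL"), leaf("NO")]),
    Node(list("TX"), [leaf("QRS"), leaf("UV"), leaf("YZ")]),
])
for key in "FMGDB":
    root = delete_key(root, key, t=3)
print(root.keys, [c.keys for c in root.children])
```

After the five deletions the tree has the single internal level shown at the end of Example 1: root (E L P T X) over six leaves.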
