## B-trees


**B-trees**
Eduardo Laber, David Sotelo

**What are B-trees?**
• Balanced search trees designed for secondary storage devices.
• Similar to AVL trees, but better at minimizing disk I/O operations.
• The main data structure used by DBMSs to store and retrieve information.
• Nodes may have many children (from a few to thousands).
• The branching factor can be quite large.
• Every B-tree of n keys has height O(log n).
• In practice, its height is smaller than the height of an AVL tree.

**Definition of B-trees**
A B-tree is a rooted tree with the following five properties:
• Every node x has the following attributes:
  • x.n, the number of keys stored in node x;
  • the x.n keys themselves, kept in order: $x.key_1 \le x.key_2 \le \dots \le x.key_{x.n}$;
  • the boolean x.leaf, indicating whether x is a leaf or an internal node.
• If x is an internal node, it contains x.n + 1 pointers $x.p_1, x.p_2, \dots, x.p_{x.n+1}$ to its children.
• The keys $x.key_i$ separate the ranges of keys stored in the subtrees: $x.key_i$ lies between the keys in the subtree of $x.p_i$ and the keys in the subtree of $x.p_{i+1}$.
• All leaves have the same depth, which equals the tree's height.
• Bounds on the number of keys of a node:
  • Let B be a positive integer representing the order of the B-tree.
  • Every node (except the root) must have at least B keys.
  • Every node (except the root) must have at most 2B keys.
  • The root is free to contain between 1 and 2B keys (why?)

**Exercise 1**
Enumerate all valid B-trees of order 2 that represent the set {1, 2, ... , 8}.

**Exercise 1**
Solution (three trees):
• root [4]; leaves [1 2 3] [5 6 7 8]
• root [5]; leaves [1 2 3 4] [6 7 8]
• root [3 6]; leaves [1 2] [4 5] [7 8]

**The height of a B-tree**
Theorem: Let h be the height of a B-tree of n keys and order B > 1. Then $h \le \log_B \frac{n+1}{2}$.
Proof:
• The root contains at least one key.
• All other nodes contain at least B keys.
• At least one key at depth 0.
• At least $2B$ keys at depth 1.
• At least $2B^2 + B$ keys at depth 2.
• At least $2B^i + B^{i-1} + B^{i-2} + \dots + B$ keys at depth i.
• Hence $n \ge 2B^h - 1$, so $B^h \le \frac{n+1}{2}$ and $h \le \log_B \frac{n+1}{2}$.

**Searching a B-tree**
• Similar to searching a binary search tree.
• A multiway branching decision is made according to the number of the node's children.
• A recursive procedure with time complexity $O(B \log_B n)$ for a tree of n keys and order B.

**Searching a B-tree**
B-TREE-SEARCH (x, k)
• i = 1
• while i ≤ x.n and k > x.key_i do i = i + 1
• if i ≤ x.n and k == x.key_i then return (x, i)
• if x.leaf then return NIL
• else DISK-READ (x.p_i); return B-TREE-SEARCH (x.p_i, k)

**Searching a B-tree**
• Search for the key F
[Figure: example B-tree holding the keys A to U]

**Searching a B-tree**
• Search for the key N
[Figure: the same B-tree holding the keys A to U]

**Searching a B-tree**
Lemma: The time complexity of procedure B-TREE-SEARCH is $O(B \log_B n)$.
Proof:
• The number of recursive calls is equal to the tree's height.
• The height of a B-tree is $O(\log_B n)$.
• Each call costs between B and 2B iterations.
• Total of $O(B \log_B n)$ steps. ■

**Exercise 2**
• Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node.
• Show that this change makes the time complexity O(lg n), independently of how B might be chosen as a function of n.

**Exercise 2**
Solution:
• By using binary search, the number of steps of the algorithm becomes $O(\lg B \cdot \log_B n)$.
• Observe that $\log_B n = \lg n / \lg B$.
• Therefore $O(\lg B \cdot \log_B n) = O(\lg n)$.

**Linear or binary B-tree search?**
Lemma: If 1 < B < n then $\lg n \le B \log_B n$.
Proof: $B \log_B n = \frac{B}{\lg B} \lg n$, and $B \ge \lg B$ for every B > 1, so $B \log_B n \ge \lg n$. ■

**Inserting a key into a B-tree**
• The new key is always inserted into an existing leaf node (why?)
• First, we search for the leaf position at which to insert the new key.
• If that leaf is full, we split it.
• A split operation splits a full node around its median key into two nodes having B keys each.
• The median key moves up into the split node's parent (a recursive insertion call).

**Split operation**
• Inserting key F into a full node (B = 2): parent [J]; children [A C E G] [K M O Q].
• Node found but already full: with F it would hold [A C E F G].
• Median key identified: E.
• Splitting the node: parent [E J]; children [A C] [F G] [K M O Q].

**Inserting a key into a B-tree**
• Insertion can be propagated upward (B = 2). Before: root [E J T X]; leaves [A C] [F G] [K M O Q] [U W] [Y Z].
• Inserting N overfills its leaf: [K M N O Q].
• SPLIT: the median N moves up. Root [E J N T X]; leaves [A C] [F G] [K M] [O Q] [U W] [Y Z].
• SPLIT: the root is now overfull and is split around N. New root [N]; internal nodes [E J] [T X]; leaves [A C] [F G] [K M] [O Q] [U W] [Y Z].

**Inserting a key into a B-tree**
B-TREE-INSERT (x, k, y)
• i = 1
• while i ≤ x.n and k > x.key_i do i = i + 1
• x.n = x.n + 1
• for j = x.n downto i + 1 do
  • x.key_j = x.key_{j-1}
  • x.p_{j+1} = x.p_j
• end-for
• x.key_i = k
• x.p_{i+1} = y
• DISK-WRITE (x)

**Inserting a key into a B-tree**
B-TREE-INSERT (x, k, y), continued
• if x.n > 2*B then
  • [m, z] = SPLIT (x)
  • if x.parent != NIL then
    • DISK-READ (x.parent)
  • else
    • x.parent = ALLOCATE-NODE()
    • x.parent.p_1 = x
    • DISK-WRITE (x)
    • root = x.parent
  • end-if
  • B-TREE-INSERT (x.parent, m, z)
• end-if

**Inserting a key into a B-tree**
SPLIT (x)
• z = ALLOCATE-NODE()
• m = FIND-MEDIAN (x)
• COPY-GREATER-ELEMENTS (x, m, z)
• DISK-WRITE (z)
• COPY-SMALLER-ELEMENTS (x, m, x)
• DISK-WRITE (x)
• return [m, z]

**Inserting a key into a B-tree**
• Function B-TREE-INSERT has three arguments:
  • the node x into which the key k should be inserted;
  • the key k to be inserted;
  • a pointer y to the right child of k (the node z returned by SPLIT, or NIL when inserting into a leaf), to be used as one of the pointers of x during the insertion process.
• There is a global variable named root, which is a pointer to the root of the B-tree.
• Observe that the field x.parent was not defined as an original B-tree attribute, but is considered here just to simplify the process.
• The x.leaf fields should also be updated accordingly.

**Inserting a key into a B-tree**
Lemma: The time complexity of B-TREE-INSERT is $O(B \log_B n)$.
Proof:
• Recall that B-TREE-SEARCH is called first and costs O(log n) by using binary search. Then B-TREE-INSERT starts by visiting a node and proceeds upward.
• At most one node is visited per level/depth, and only visited nodes can be split. Each split creates one new node, plus a new root when the root itself is split. The cost of a split is proportional to 2B.
• The number of visited nodes is at most one per level, and the height of a B-tree is $O(\log_B n)$. Each visited node costs between B and 2B iterations. Total of $O(B \log_B n)$ steps. ■

**Some questions on insertion**
• Which split operation increases the tree's height? The split of the root of the tree.
• How many DISK-READ operations are executed by the insertion algorithm? Each node on the key's path is read once by the search and once more whenever a split propagates into it.
• Does binary search make sense here? Not exactly. We already pay O(B) to split a node (for finding the median).

**Drawbacks of our insertion method**
• Once the key's insertion node is found, it may be necessary to read its parent node again (due to splitting).
• DISK-READ/WRITE operations are expensive and may be executed twice for each node on the key's path.
• It would be necessary to store each node's parent or to use the recursion stack to keep its reference.
• (Mond and Raz, 1985) provide a solution that spends one DISK-READ/WRITE per visited node (see CLRS).

**Exercise 3**
• Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I in order into an empty B-tree of order 1.

**Exercise 3**
Solution (final configuration): root [E]; internal nodes [B] [G I]; leaves [A] [C D] [F] [H] [J].

**Exercise 4**
• Is a B-tree of order 1 a good choice for a balanced search tree?
• What about the expression $h \le \log_B \frac{n+1}{2}$ when B = 1?

**Deleting a key from a B-tree**
• Analogous to insertion, but a little more complicated.
• A key can be deleted from any node (not just a leaf), and the deletion can affect its parent and its children (an insertion affects only parents).
• One must ensure that a node does not get too small during deletion (fewer than B keys).
• As a result, deleting a key is the most complex operation on B-trees. It will be considered in 4 particular cases.

**Deleting a key from a B-tree**
• Case 1: The key is in a leaf node with more than B elements.
Procedure:
• Just remove the key from the node.
• Example (B = 2), before: root [N]; internal nodes [E J] [T X]; leaves [A C D] [F G] [K M] [O Q] [U W] [Y Z].
• After removing C from the leaf [A C D]: root [N]; internal nodes [E J] [T X]; leaves [A D] [F G] [K M] [O Q] [U W] [Y Z].

**Deleting a key from a B-tree**
Case 2: The join procedure
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an adjacent sibling of x.
• Suppose that y has exactly B elements.
Procedure:
• Remove the key k1.
• Let k2 be the key that separates nodes x and y in their parent.
• Join the nodes x and y and move the key k2 from the parent to the new joined node.
• If the parent of x is left with B - 1 elements and also has an adjacent sibling with B elements, apply the join procedure recursively to the parent of x (seen as x) and its adjacent sibling (seen as y).

**Deleting a key from a B-tree**
• Case 2: Delete key Q (B = 2). Upper node [F ...]; parent [K T X]; leaves [H I] [O Q] [U W] [Y Z].
• Node x is the leaf [O Q]; node y is its adjacent sibling [U W].
• After removing Q, node x holds only [O].
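
The join procedure of Case 2 can be traced on the example above (delete key Q, B = 2). The following is only a minimal in-memory sketch: the `Node` class and the function `delete_with_join` are hypothetical names that do not appear in the slides, disk reads and writes are omitted, and the recursive propagation to the parent is not implemented. It assumes that y is the right neighbor of x and that both hold exactly B keys before the deletion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical in-memory B-tree node; `children` stays empty for a leaf."""
    keys: List[str]
    children: List["Node"] = field(default_factory=list)


def delete_with_join(parent: Node, idx: int, key: str) -> Node:
    """Delete `key` from leaf x = parent.children[idx], then join x with its
    right neighbor y = parent.children[idx + 1] (Case 2 of the slides).

    Assumption: x and y each hold exactly B keys beforehand, so the joined
    node ends up with (B - 1) + 1 + B = 2B keys, the allowed maximum.
    """
    x = parent.children[idx]
    y = parent.children[idx + 1]
    x.keys.remove(key)                  # remove k1 from x
    k2 = parent.keys.pop(idx)           # k2 separates x and y in the parent
    x.keys = x.keys + [k2] + y.keys     # x absorbs k2 and every key of y
    del parent.children[idx + 1]        # y is no longer a separate node
    return x


# The Case 2 example: parent [K T X], x = [O Q], y = [U W].
parent = Node(
    ["K", "T", "X"],
    [Node(["H", "I"]), Node(["O", "Q"]), Node(["U", "W"]), Node(["Y", "Z"])],
)
joined = delete_with_join(parent, 1, "Q")
print(parent.keys)  # ['K', 'X']
print(joined.keys)  # ['O', 'T', 'U', 'W']
```

In this run the parent is left with B = 2 keys, so it does not fall below the minimum and no recursive join is needed.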