Search Structures CHAPTER 10 All the programs in this file are selected from Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed “Fundamentals of Data Structures in C”, Computer Science Press, 1992.
AVL Trees • Dynamic tables may also be maintained as binary search trees. • Depending on the order in which the symbols are inserted into the table, the resulting binary search trees differ, and so does the average number of comparisons needed to access a symbol.
Binary Search Tree for the Months of the Year • Input sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG, SEPT, OCT, NOV, DEC • Max comparisons: 6 • Average comparisons: 3.5 [Figure: the resulting binary search tree, rooted at JAN]
A Balanced Binary Search Tree for the Months of the Year • Input sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR, JAN, JUNE, SEPT, NOV • Max comparisons: 4 • Average comparisons: 3.1 [Figure: the resulting binary search tree, rooted at JULY]
Degenerate Binary Search Tree • Input sequence: APR, AUG, DEC, FEB, JAN, JULY, JUNE, MAR, MAY, NOV, OCT, SEPT • Max comparisons: 12 • Average comparisons: 6.5 [Figure: the resulting tree, a right-skewed chain from APR down to SEPT]
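The comparison counts quoted for the three input sequences can be reproduced with a small experiment. The following is a minimal sketch, not a program from the book: the names `bst_node`, `bst_insert`, `bst_free`, and `bst_stats` are hypothetical, keys are assumed to be distinct C strings ordered by `strcmp`, and a key at level k (root at level 1) is assumed to cost k comparisons to find.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for this sketch: keys are C strings ordered by strcmp. */
typedef struct bst_node {
    const char *key;
    struct bst_node *left, *right;
} bst_node;

/* Insert a key (assumed distinct from all keys already present) without any
   rebalancing. Returns the level of the new node, with the root at level 1,
   which is the number of comparisons a later search for this key needs. */
int bst_insert(bst_node **root, const char *key)
{
    int level = 1;
    while (*root) {
        root = strcmp(key, (*root)->key) < 0 ? &(*root)->left : &(*root)->right;
        level++;
    }
    *root = calloc(1, sizeof **root);
    assert(*root != NULL);
    (*root)->key = key;
    return level;
}

/* Release every node of the tree. */
void bst_free(bst_node *t)
{
    if (!t) return;
    bst_free(t->left);
    bst_free(t->right);
    free(t);
}

/* Build a tree from keys[0..n-1] in the given order and report the maximum
   and the total number of comparisons over all keys. */
void bst_stats(const char *keys[], int n, int *max_cmp, int *total_cmp)
{
    bst_node *root = NULL;
    *max_cmp = 0;
    *total_cmp = 0;
    for (int i = 0; i < n; i++) {
        int level = bst_insert(&root, keys[i]);
        if (level > *max_cmp) *max_cmp = level;
        *total_cmp += level;
    }
    bst_free(root);
}
```

Dividing the total by the number of keys gives the average: for the first sequence the total is 42, so the average is 42/12 = 3.5, and for the sorted sequence it is 78/12 = 6.5.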
Minimize the Search Time of a Binary Search Tree in a Dynamic Situation • From the above three examples, we see that the average and maximum search times are minimized if the binary search tree is maintained as a complete binary search tree at all times. • However, achieving this in a dynamic situation is expensive, because the tree must be restructured into a complete binary tree after every update. • In 1962, Adelson-Velskii and Landis introduced a binary tree structure that is balanced with respect to the heights of its subtrees. Because of this balance, dynamic retrievals can be performed in O(log n) time if the tree has n nodes. Insertions and deletions can also be performed in O(log n) time, and the resulting tree remains height-balanced. Such a tree is called an AVL tree.
AVL Tree • Definition: An empty tree is height-balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees respectively, then T is height-balanced iff (1) TL and TR are height-balanced, and (2) |hL – hR| ≤ 1, where hL and hR are the heights of TL and TR, respectively. • Definition: The balance factor, BF(T), of a node T in a binary tree is defined to be hL – hR, where hL and hR are the heights of the left and right subtrees of T, respectively. For any node T in an AVL tree, BF(T) = -1, 0, or 1.
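The two definitions translate almost word for word into code. The sketch below is illustrative only: the node type `tree_node` and the functions `height`, `balance_factor`, and `is_height_balanced` are hypothetical names, and heights are recomputed on every call for clarity, which a practical AVL implementation would avoid by storing a height or balance factor in each node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for this sketch. */
typedef struct tree_node {
    int key;
    struct tree_node *left, *right;
} tree_node;

/* Height of a tree: 0 for the empty tree, 1 for a single node. */
int height(const tree_node *t)
{
    if (!t) return 0;
    int hl = height(t->left);
    int hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* BF(T) = hL - hR for a nonempty tree T. */
int balance_factor(const tree_node *t)
{
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* Direct transcription of the definition: the empty tree is height-balanced;
   a nonempty tree is height-balanced iff |hL - hR| <= 1 and both of its
   subtrees are height-balanced. */
int is_height_balanced(const tree_node *t)
{
    if (!t) return 1;
    int bf = balance_factor(t);
    return bf >= -1 && bf <= 1
        && is_height_balanced(t->left)
        && is_height_balanced(t->right);
}
```

For a three-node chain that only ever branches right, the root has balance factor -2, so the check fails at the root even though both of its subtrees pass.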
Balanced Trees Obtained for the Months of the Year [Figure: the AVL tree and its balance factors after each insertion, with the rebalancing rotation shown where one is required]
(a) Insert MARCH
(b) Insert MAY
(c) Insert NOVEMBER: RR rotation
(d) Insert AUGUST
(e) Insert APRIL: LL rotation
(f) Insert JANUARY: LR rotation
(g) Insert DECEMBER
(h) Insert JULY
(i) Insert FEBRUARY: RL rotation
(j) Insert JUNE: LR rotation
(k) Insert OCTOBER: RR rotation
(l) Insert SEPTEMBER
Rebalancing Rotation of Binary Search Tree • LL: new node Y is inserted in the left subtree of the left subtree of A • LR: Y is inserted in the right subtree of the left subtree of A • RR: Y is inserted in the right subtree of the right subtree of A • RL: Y is inserted in the left subtree of the right subtree of A. • If a height–balanced binary tree becomes unbalanced as a result of an insertion, then these are the only four cases possible for rebalancing.
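Rebalancing Rotation LL [Figure: before the insertion BF(A) = +1 and BF(B) = 0; the height of BL increases to h+1, giving BF(A) = +2 and BF(B) = +1; after the LL rotation B is the root of the subtree and BF(A) = BF(B) = 0]
Rebalancing Rotation RR [Figure: before the insertion BF(A) = -1 and BF(B) = 0; the height of BR increases to h+1, giving BF(A) = -2 and BF(B) = -1; after the RR rotation B is the root of the subtree and BF(A) = BF(B) = 0]
Rebalancing Rotation LR(a) [Figure: after the insertion BF(A) = +2, BF(B) = -1, and BF(C) = 0; after the rotation C is the root of the subtree and BF(A) = BF(B) = BF(C) = 0]
Rebalancing Rotation LR(b) [Figure: after the insertion BF(A) = +2, BF(B) = -1, and BF(C) = +1; after the rotation C is the root of the subtree with BF(C) = 0, BF(B) = 0, and BF(A) = -1]
Rebalancing Rotation LR(c) [Figure: after the insertion BF(A) = +2, BF(B) = -1, and BF(C) = -1; after the rotation C is the root of the subtree with BF(C) = 0, BF(B) = +1, and BF(A) = 0]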
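The four rotation cases can be sketched compactly in C. This is not the book's insertion program: as a simplifying assumption, each node stores the height of its subtree instead of a balance factor, the LR and RL cases are carried out as two single rotations (which yields the same tree as the double rotation in the figures), and all names (`avl_node`, `avl_insert`, `rotate_left`, `rotate_right`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: stores the subtree height rather than BF. */
typedef struct avl_node {
    const char *key;
    int height;                        /* a leaf has height 1 */
    struct avl_node *left, *right;
} avl_node;

int avl_height(const avl_node *t) { return t ? t->height : 0; }

/* Recompute the height of t from the heights of its children. */
void avl_update(avl_node *t)
{
    int hl = avl_height(t->left), hr = avl_height(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

/* Balance factor hL - hR of a nonempty subtree. */
int avl_bf(const avl_node *t)
{
    return avl_height(t->left) - avl_height(t->right);
}

/* LL case: the left child B of A becomes the root of the subtree. */
avl_node *rotate_right(avl_node *a)
{
    avl_node *b = a->left;
    a->left = b->right;
    b->right = a;
    avl_update(a);
    avl_update(b);
    return b;
}

/* RR case: the right child B of A becomes the root of the subtree. */
avl_node *rotate_left(avl_node *a)
{
    avl_node *b = a->right;
    a->right = b->left;
    b->left = a;
    avl_update(a);
    avl_update(b);
    return b;
}

/* Insert key into the subtree rooted at t and return the new subtree root.
   Duplicate keys are ignored. */
avl_node *avl_insert(avl_node *t, const char *key)
{
    if (!t) {
        t = calloc(1, sizeof *t);
        assert(t != NULL);
        t->key = key;
        t->height = 1;
        return t;
    }
    int cmp = strcmp(key, t->key);
    if (cmp < 0)      t->left = avl_insert(t->left, key);
    else if (cmp > 0) t->right = avl_insert(t->right, key);
    else              return t;

    avl_update(t);
    int bf = avl_bf(t);
    if (bf > 1) {                              /* left subtree too high */
        if (avl_bf(t->left) < 0)               /* LR: rotate the left child first */
            t->left = rotate_left(t->left);
        return rotate_right(t);                /* LL, or second step of LR */
    }
    if (bf < -1) {                             /* right subtree too high */
        if (avl_bf(t->right) > 0)              /* RL: rotate the right child first */
            t->right = rotate_right(t->right);
        return rotate_left(t);                 /* RR, or second step of RL */
    }
    return t;
}
```

Inserting the months in the order MAR, MAY, NOV, AUG, APR, JAN, DEC, JULY, FEB, JUNE, OCT, SEPT with this sketch ends with JAN at the root, DEC and MAR as its children, and a tree of height 5.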
AVL Trees (Cont.) • Once rebalancing has been carried out on the subtree in question, examining the remaining tree is unnecessary. • Insertion into an ordinary binary search tree with n nodes can take O(n) time in the worst case, whereas insertion into an AVL tree takes O(log n) time.
AVL Insertion Complexity • Let $N_h$ be the minimum number of nodes in a height-balanced tree of height h. In the worst case, the height of one of the subtrees is h-1 and that of the other is h-2. Both subtrees must also be height-balanced, so $N_h = N_{h-1} + N_{h-2} + 1$, with $N_0 = 0$, $N_1 = 1$, and $N_2 = 2$. • This recursive definition is similar to that of the Fibonacci numbers: $F_n = F_{n-1} + F_{n-2}$, $F_0 = 0$, $F_1 = 1$. • It can be shown that $N_h = F_{h+2} - 1$. Since $F_h \approx \phi^h/\sqrt{5}$ with $\phi = (1+\sqrt{5})/2$, we can derive that $N_h \approx \phi^{h+2}/\sqrt{5} - 1$, so the height of a height-balanced tree with n nodes is O(log n). The worst-case insertion time for a height-balanced tree with n nodes is therefore O(log n).
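The identity between the minimum node count and the Fibonacci numbers is easy to check numerically. The sketch below uses two hypothetical helper functions, `min_avl_nodes` and `fib`, that evaluate the two recurrences iteratively; it is a sanity check, not a proof.

```c
#include <assert.h>

/* N_h: minimum number of nodes in a height-balanced tree of height h,
   from N_h = N_{h-1} + N_{h-2} + 1 with N_0 = 0 and N_1 = 1. */
long min_avl_nodes(int h)
{
    assert(h >= 0);
    if (h == 0) return 0;
    long prev = 0, cur = 1;            /* N_0 and N_1 */
    for (int i = 2; i <= h; i++) {
        long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* F_n: Fibonacci numbers with F_0 = 0 and F_1 = 1. */
long fib(int n)
{
    assert(n >= 0);
    long a = 0, b = 1;                 /* F_0 and F_1 */
    for (int i = 0; i < n; i++) {
        long t = a + b;
        a = b;
        b = t;
    }
    return a;
}
```

For example, the recurrence gives 4 nodes for height 3 and 7 nodes for height 4, matching $F_5 - 1 = 4$ and $F_6 - 1 = 7$.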
Probability of Each Type of Rebalancing Rotation • Research has shown that a random insertion requires no rebalancing, a rebalancing rotation of type LL or RR, and a rebalancing rotation of type LR and RL, with probabilities 0.5349, 0.2327, and 0.2324, respectively.
Comparison of Various Structures [Table: comparison of the structures; the table body did not survive extraction] • Table notes: doubly linked list and position of x known; position for insertion known.
2-3 Trees • If search trees of degree greater than 2 are used, the insertion and deletion algorithms are simpler than those for AVL trees. The complexity of the algorithms is still O(log n). • Definition: A 2-3 tree is a search tree that either is empty or satisfies the following properties: (1) Each internal node is a 2-node or a 3-node. A 2-node has one element; a 3-node has two elements. (2) Let LeftChild and MiddleChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3 subtree with root LeftChild have keys less than dataL.key, whereas all elements in the 2-3 subtree with root MiddleChild have keys greater than dataL.key. (3) Let LeftChild, MiddleChild, and RightChild denote the children of a 3-node. Let dataL and dataR be the two elements in this node. Then dataL.key < dataR.key; all keys in the 2-3 subtree with root LeftChild are less than dataL.key; all keys in the 2-3 subtree with root MiddleChild are less than dataR.key and greater than dataL.key; and all keys in the 2-3 subtree with root RightChild are greater than dataR.key. (4) All external nodes are at the same level.
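2-3 Tree Example [Figure: the root A is a 2-node holding 40; its left child B is a 3-node holding 10 and 20; its middle child C is a 2-node holding 80]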
The Height of a 2-3 Tree • As with leftist trees, external nodes are introduced only to make it easier to define and talk about 2-3 trees. External nodes are not physically represented inside a computer. • The number of elements in a 2-3 tree of height h is between $2^h - 1$ and $3^h - 1$. Hence, the height of a 2-3 tree with n elements is between $\lceil \log_3(n+1) \rceil$ and $\lfloor \log_2(n+1) \rfloor$.
2-3 Tree Data Structure
    typedef struct two_three *two_three_ptr;
    struct two_three {
        element data_l, data_r;
        two_three_ptr left_child, middle_child, right_child;
    };
Searching a 2-3 Tree • The search algorithm for binary search trees can easily be extended to obtain the search function for a 2-3 tree (search23). • The search function calls a function compare that compares a key x with the keys in a given node p. It returns the value 1, 2, 3, or 4, depending on whether x is less than the first key, between the first key and the second key, greater than the second key, or equal to one of the keys in node p. Program 10.4: Function to search a 2-3 tree
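The listing for Program 10.4 is not reproduced on the slide, so the following is a sketch of how the description could be realized rather than the book's code. It assumes an `element` type with an integer `key`, assumes that a 2-node is marked by storing `INT_MAX` in `data_r.key` (so real keys must be smaller than `INT_MAX`), and uses the function names `compare` and `search23` for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed element type: only an integer key. */
typedef struct { int key; } element;

typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;            /* data_r.key == INT_MAX marks a 2-node */
    two_three_ptr left_child, middle_child, right_child;
};

/* Returns 1 if x is less than the first key, 2 if it lies between the two
   keys, 3 if it is greater than the second key, and 4 if it equals a key
   stored in p. In a 2-node the second key is the INT_MAX sentinel, so any
   x greater than the first key falls into case 2. */
int compare(element x, two_three_ptr p)
{
    if (x.key == p->data_l.key || x.key == p->data_r.key) return 4;
    if (x.key < p->data_l.key) return 1;
    if (x.key < p->data_r.key) return 2;
    return 3;
}

/* Returns the node that contains x, or NULL if x is not in the tree. */
two_three_ptr search23(two_three_ptr t, element x)
{
    assert(x.key < INT_MAX);           /* INT_MAX is reserved as the sentinel */
    while (t) {
        switch (compare(x, t)) {
        case 1: t = t->left_child;   break;
        case 2: t = t->middle_child; break;
        case 3: t = t->right_child;  break;
        case 4: return t;
        }
    }
    return NULL;
}
```

Each iteration moves one level down the tree, so the number of iterations is bounded by the height of the 2-3 tree.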
Insertion Into a 2-3 Tree • First we use the search function to search the 2-3 tree for the key that is to be inserted. • If the key is already in the tree, the insertion fails, as all keys in a 2-3 tree are distinct. Otherwise, the search ends at a unique leaf node U. The node U may be in one of two states: • U has only one element: the new element can be inserted into this node. • U already contains two elements: a new node is created. The newly created node contains the element with the largest key from among the two elements initially in U and the new element x. The element with the smallest key stays in the original node, and the element with the median key, together with a pointer to the newly created node, is inserted into the parent of U.
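The second case, in which the leaf U is already full, can be sketched as a small helper. This is an illustration under assumptions, not the book's Program 10.5: `split_leaf` is a hypothetical name, `element` is assumed to carry only an integer key, a 2-node is assumed to be marked by `INT_MAX` in `data_r.key`, and the caller is assumed to insert the returned median and new node into the parent of U.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Assumed element type: only an integer key. */
typedef struct { int key; } element;

typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;            /* data_r.key == INT_MAX marks a 2-node */
    two_three_ptr left_child, middle_child, right_child;
};

/* Split the full leaf u so that it can absorb x (x is assumed to differ from
   both keys in u). Afterwards u keeps the element with the smallest key, the
   returned new node holds the element with the largest key, and *up receives
   the element with the median key, which the caller must insert into the
   parent of u together with the returned pointer. */
two_three_ptr split_leaf(two_three_ptr u, element x, element *up)
{
    assert(u->data_r.key != INT_MAX);  /* u must be a 3-node */
    element lo = u->data_l, mid = x, hi = u->data_r, tmp;

    /* Order the three elements so that lo < mid < hi. */
    if (mid.key < lo.key) { tmp = lo; lo = mid; mid = tmp; }
    if (mid.key > hi.key) { tmp = hi; hi = mid; mid = tmp; }

    two_three_ptr q = calloc(1, sizeof *q);
    assert(q != NULL);
    q->data_l = hi;
    q->data_r.key = INT_MAX;           /* the new node is a 2-node */

    u->data_l = lo;
    u->data_r.key = INT_MAX;           /* u shrinks to a 2-node */

    *up = mid;
    return q;
}
```

Splitting a leaf that holds 10 and 20 to make room for 30 leaves 10 in the original node, puts 30 in the new node, and passes 20 up to the parent.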
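Insertion Into a 2-3 Tree Example [Figure: (a) 70 inserted: root A holds 40, B holds 10 and 20, C holds 70 and 80; (b) 30 inserted: root A holds 20 and 40, B holds 10, D holds 30, C holds 70 and 80]
Insertion of 60 Into Figure 10.15(b) [Figure: the new root G holds 40; its children A and F hold 20 and 70; the four leaves B, C, D, and E hold 10, 30, 60, and 80]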
Node Split • From the above examples, we find that each time an attempt is made to add an element into a 3-node p, a new node q is created. This is referred to as a node split. Program 10.5: Insertion into a 2-3 tree (P.501)
Deletion From a 2-3 Tree • If the element to be deleted is not in a leaf node, the deletion can be transformed into a deletion from a leaf node. The deleted element is replaced by either the element with the largest key in its left subtree or the element with the smallest key in its right subtree. • We can therefore focus on deletion from a leaf node.
Deletion From a 2-3 Tree Example [Figure: (a) initial 2-3 tree: A holds 50 and 80, B holds 10 and 20, C holds 60 and 70, D holds 90 and 95; (b) 70 deleted: C holds 60; (c) 90 deleted: D holds 95]
Deletion From a 2-3 Tree Example (Cont.) [Figure: (d) 60 deleted: A holds 20 and 80, B holds 10, C holds 50, D holds 95; (e) 95 deleted: A holds 20, B holds 10, C holds 50 and 80; (f) 50 deleted: A holds 20, B holds 10, C holds 80; (g) 10 deleted: the single node B holds 20 and 80]
Rotation and Combine • As shown in the example, a deletion may invoke a rotation or a combine operation. • For a rotation, there are three cases: • the leaf node p is the left child of its parent r; • the leaf node p is the middle child of its parent r; • the leaf node p is the right child of its parent r.
Three Rotation Cases [Figure: the nodes r, p, and q before and after the rotation in each case] (a) p is the left child of r (b) p is the middle child of r (c) p is the right child of r
Steps in Deletion From a Leaf of a 2-3 Tree
• Step 1: Modify node p as necessary to reflect its status after the desired element has been deleted.
• Step 2:
    while (p has zero elements && p is not the root) {
        let r be the parent of p;
        let q be the left or right sibling of p (as appropriate);
        if (q is a 3-node) rotate;
        else combine;
        p = r;
    }
• Step 3: If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.
Combine When p is the Left Child of r [Figure: two combine cases, (a) and (b), showing the nodes r, p, and q before and after the combine]
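For the case in which p is the left child of r, the rotate and combine operations can be sketched as follows. This is an illustration under several assumptions rather than the book's code: `element` carries only an integer key, `INT_MAX` in `data_r.key` marks a 2-node and `INT_MAX` in `data_l.key` marks an empty node, the only subtree of an empty node hangs off its `left_child`, and the function names `rotate_left_case` and `combine_left_case` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed element type: only an integer key. */
typedef struct { int key; } element;

typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;            /* INT_MAX in a key marks "no element" */
    two_three_ptr left_child, middle_child, right_child;
};

/* Rotation: p, the left child of r, has become empty and its sibling q,
   the middle child of r, is a 3-node. One element moves from r down into p
   and the smallest element of q moves up into r. */
void rotate_left_case(two_three_ptr r)
{
    two_three_ptr p = r->left_child, q = r->middle_child;
    assert(q->data_r.key != INT_MAX);  /* q must be a 3-node */
    p->data_l = r->data_l;
    p->middle_child = q->left_child;
    r->data_l = q->data_l;
    q->data_l = q->data_r;
    q->data_r.key = INT_MAX;
    q->left_child = q->middle_child;
    q->middle_child = q->right_child;
    q->right_child = NULL;
}

/* Combine: p, the left child of r, has become empty and its sibling q is a
   2-node. The first element of r and the element of q are merged into p.
   Returns q, which is no longer part of the tree, so the caller can free it.
   If r was a 2-node it becomes empty and the deletion loop continues with r. */
two_three_ptr combine_left_case(two_three_ptr r)
{
    two_three_ptr p = r->left_child, q = r->middle_child;
    assert(q->data_r.key == INT_MAX);  /* q must be a 2-node */
    p->data_l = r->data_l;
    p->data_r = q->data_l;
    p->middle_child = q->left_child;
    p->right_child = q->middle_child;
    if (r->data_r.key == INT_MAX) {    /* r was a 2-node: it is now empty */
        r->data_l.key = INT_MAX;
        r->middle_child = NULL;
    } else {                           /* r was a 3-node: it becomes a 2-node */
        r->data_l = r->data_r;
        r->data_r.key = INT_MAX;
        r->middle_child = r->right_child;
        r->right_child = NULL;
    }
    return q;
}
```

The cases in which p is the middle or right child of r follow the same pattern with the roles of the children shifted.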
M-Way Search Tree Definition: An m-way search tree either is empty or satisfies the following properties: (1) The root has at most m subtrees and has the following structure: n, A0, (K1, A1), (K2, A2), …, (Kn, An), where the Ai, 0 ≤ i ≤ n < m, are pointers to subtrees, and the Ki, 1 ≤ i ≤ n < m, are key values. (2) Ki < Ki+1, 1 ≤ i < n. (3) All key values in the subtree Ai are less than Ki+1 and greater than Ki, 0 < i < n. (4) All key values in the subtree An are greater than Kn, and those in A0 are less than K1. (5) The subtrees Ai, 0 ≤ i ≤ n, are also m-way search trees.
Searching an m-Way Search Tree • Suppose we wish to search an m-way search tree T for the key value x. Assume T resides on a disk. By searching the keys of the root, we determine i such that Ki ≤ x < Ki+1, where for convenience K0 = -∞ and Kn+1 = +∞. • If x = Ki, the search is complete. • If x ≠ Ki, x must be in the subtree Ai if x is in T. • We then retrieve the root of the subtree Ai and continue the search until we find x or determine that x is not in T.
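The search procedure can be sketched for a tree held in memory. Everything in this block is an assumption made for illustration: the order `M` is fixed at 4, keys are integers, the node layout `mway_node` stores the keys in positions 1 to n to match the K1..Kn notation, and following a child pointer stands in for reading a node from disk. The keys within a node are scanned linearly here.

```c
#include <assert.h>
#include <stddef.h>

#define M 4                            /* hypothetical order of the tree */

/* Hypothetical in-memory node: n, A_0, (K_1, A_1), ..., (K_n, A_n). */
typedef struct mway_node {
    int n;                             /* number of keys, 1 <= n <= M-1 */
    int key[M];                        /* key[1..n] hold K_1..K_n; key[0] is unused */
    struct mway_node *child[M];        /* child[0..n] hold A_0..A_n */
} mway_node;

/* Search for x. Returns the node that contains x and stores the index i with
   K_i == x in *pos, or returns NULL if x is not in the tree. */
const mway_node *mway_search(const mway_node *t, int x, int *pos)
{
    while (t) {
        assert(t->n >= 1 && t->n < M);
        /* Find i such that K_i <= x < K_{i+1}, with K_0 = -infinity and
           K_{n+1} = +infinity. */
        int i = 0;
        while (i < t->n && x >= t->key[i + 1])
            i++;
        if (i >= 1 && t->key[i] == x) {
            *pos = i;
            return t;
        }
        t = t->child[i];               /* on disk, this step is one node read */
    }
    return NULL;
}
```

The number of nodes visited is at most the height of the tree, which is why the height determines the number of disk accesses.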
Searching an m-Way Search Tree • The maximum number of nodes in a tree of degree m and height h is $\sum_{0 \le i \le h-1} m^i = (m^h - 1)/(m - 1)$. • Since each node holds at most m-1 keys, the maximum number of keys in an m-way search tree of height h is $m^h - 1$. • To achieve a performance close to that of the best m-way search trees for a given number of keys n, the search tree must be balanced.
B-Tree Definition: A B-tree of order m is an m-way search tree that either is empty or satisfies the following properties: • The root node has at least two children. • All nodes other than the root node and failure nodes have at least $\lceil m/2 \rceil$ children. • All failure nodes are at the same level.
B-Tree (Cont.) • Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is a B-tree of order 4. • Also, all B-trees of order 2 are full binary trees. • A B-tree of order m and height l has at most $m^l - 1$ keys. • For a B-tree of order m and height l, the minimum number of keys N in such a tree satisfies $N \ge 2\lceil m/2 \rceil^{l-1} - 1$, $l \ge 1$. • If there are N key values in a B-tree of order m, then all nonfailure nodes are at levels less than or equal to l, where $l \le \log_{\lceil m/2 \rceil}\{(N+1)/2\} + 1$. The maximum number of accesses that have to be made for a search is l. • For example, for a B-tree of order m = 200, an index with $N \le 2 \times 10^6 - 2$ will have $l \le 3$.
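The bound on the number of levels can be evaluated with integer arithmetic alone. The function below is a sketch with a hypothetical name, `btree_max_levels`; it returns the largest l for which the minimum key count $2\lceil m/2 \rceil^{l-1} - 1$ does not exceed N, and it assumes m ≥ 3 so that $\lceil m/2 \rceil$ is at least 2.

```c
#include <assert.h>

/* Largest number of levels l that a B-tree of order m holding n_keys keys
   can have, i.e. the largest l with 2 * ceil(m/2)^(l-1) - 1 <= n_keys. */
int btree_max_levels(int m, long long n_keys)
{
    assert(m >= 3 && n_keys >= 1);
    long long half = (m + 1) / 2;                 /* ceil(m/2), at least 2 */
    long long limit = n_keys / 2 + n_keys % 2;    /* floor((n_keys + 1) / 2) */
    long long power = 1;                          /* ceil(m/2)^(l-1) for the current l */
    int l = 1;
    /* A further level is possible while power * half <= limit. The test is
       written with a division so that the product is never formed. */
    while (power <= limit / half) {
        power *= half;
        l++;
    }
    return l;
}
```

With m = 200 and N = 2×10^6 - 2 the function returns 3, in agreement with the example; one more key (N = 2×10^6 - 1) is the smallest N that permits a fourth level.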
The Choice of m • B-trees of high order are desirable since they reduce the number of disk accesses. • If the index has N entries, then a B-tree of order m = N+1 has only one level. But this is not reasonable, since all N entries cannot fit in internal memory. • In selecting a reasonable value for m, we need to keep in mind that we are really interested in minimizing the total amount of time needed to search the B-tree for a value x. This time has two components: • the time for reading in the node from the disk • the time needed to search this node for x.
The Choice of m (Cont.) • Assume a node of a B-tree of order m is of a fixed size and is large enough to accommodate n, A0, and m-1 triples (Ki, Ai, Bi), 1 ≤ i < m. • If the Ki are at most α characters long and the Ai and Bi are each β characters long, then the size of a node is about m(α+2β) characters. The time to access a node is then $t_s + t_l + m(\alpha + 2\beta)t_c = a + bm$, where $a = t_s + t_l$ = seek time + latency time, $b = (\alpha + 2\beta)t_c$, and $t_c$ = transmission time per character. • If binary search is used to search each node of the B-tree, then the internal processing time per node is $c \log_2 m + d$ for some constants c and d. • The total processing time per node is $\tau = a + bm + c \log_2 m + d$. • Since the number of levels is at most $f \log_2\{(N+1)/2\} / \log_2 m$, the maximum search time is $f \log_2\{(N+1)/2\} \left( \frac{a+d}{\log_2 m} + \frac{bm}{\log_2 m} + c \right)$, where f is some constant.
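Figure 10.37: Plot of (35 + 0.06m)/log2 m [Figure: total maximum search time plotted against m, with ticks at m = 50, 125, and 400 on the horizontal axis and at 5.7 and 6.8 on the vertical axis]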
Insertion into a B-Tree • Instead of using the top-down insertion of 2-3-4 trees, we generalize the two-pass insertion algorithm for 2-3 trees. Top-down insertion splits many nodes, and each time a node is changed it has to be written back to disk, which increases the number of disk accesses. • The insertion algorithm for B-trees of order m first performs a search to determine the leaf node p into which the new key is to be inserted. • If the insertion of the new key into p results in p having m keys, the node p is split. • Otherwise, the new p is written to the disk, and the insertion is complete. • Assume that the h nodes read in during the top-down pass can be saved in memory so that they need not be retrieved from disk during the bottom-up pass. Then the number of disk accesses for an insertion is at most h (downward pass) + 2(h-1) (nonroot splits) + 3 (root split) = 3h+1. • The average number of disk accesses is approximately h+1 for large m.