## BST Data Structure


- A BST node contains:
  - A key (used to search)
  - The data associated with that key
  - Pointers to its children and its parent
  - Leaf nodes have NULL pointers for children
- A BST contains:
  - A pointer to the root of the tree

**BST Operations: Insert**

- The BST property must be maintained.
- Algorithm sketch, to insert data with key k:
  - Compare k to root.key.
  - If k < root.key, go left.
  - If k > root.key, go right.
  - Repeat until the next step would follow a NULL child pointer. That's where the new node should be inserted, as a new leaf.
  - Note: keep track of the prospective parent along the way, since the new node must be linked to it.
- Running time: the new node is inserted at a leaf position, so the cost depends on the height of the tree.
  - Worst case: inserting the keys 1, 2, 3, ... in this order results in a tree that looks like a chain (figure: 1 → 2 → 3, each key the right child of the previous one).
    - The tree has degenerated into a list.
    - Height: linear.
    - Such a tree is even worse than a linked list, since it takes up more space (more pointers).
  - Best case: the top levels of the tree are filled up completely (figure: root 12; children 4 and 14; grandchildren 2, 8 and 16).
    - The height is then about $\log n$, where $n$ is the number of nodes in the tree.
- The height of a complete (i.e., all levels filled up) BST with $n$ nodes is logarithmic. Why?
  - Level $i$ has $2^i$ nodes, for $i = 0$ (top level) through $h$ (= height).
  - The total number of nodes is then $n = 2^0 + 2^1 + \cdots + 2^h = \frac{2^{h+1} - 1}{2 - 1} = 2^{h+1} - 1$.
  - Solving for $h$ gives $h = \log_2(n + 1) - 1$, i.e., $h = \Theta(\log n)$.
- Analysis conclusion: an insert operation consists of two parts:
  - Search for the position: logarithmic in the best case, linear in the worst case.
  - Physically insert the node: constant time.
- What if we allow duplicate keys?
  - Idea #1: always insert duplicates in the right subtree. Repeated copies of a key then form a chain, which results in a very unbalanced tree.
  - Idea #2: insert duplicates in alternate subtrees. This makes it difficult to search for all occurrences.
  - Idea #3: store all elements with the same key in a single node. Good idea! It is easy to search and does not affect balance any more than non-duplicate insertion.
- What if we allow a variable number of children (an n-ary tree)?
  - Idea: use a vector/list of pointers to children.

**BST Operations: Search**

- Take advantage of the BST property.
- Algorithm sketch:
  - Compare the target to the root.
  - If equal, return success.
  - If target < root, search the left subtree.
  - If target > root, search the right subtree.
  - If you reach a NULL pointer, the target is not in the tree.
- Running time: similar to insert (logarithmic in the best case, linear in the worst case).

**BST Operations: Delete**

- The delete operation consists of two parts:
  - Search for the node to be deleted: constant in the best case (deleting the root), linear in the worst case.
  - Delete the node: best case? worst case?
- CASE #1: the node to be deleted is a leaf node.
  - Easy! Physically remove the node.
  - Constant time: we just reset its parent's child pointer and deallocate its memory.
- CASE #2: the node to be deleted has exactly one child.
  - Easy! Physically remove the node.
  - Constant time: we just reset its parent's child pointer (to point to the child) and its child's parent pointer, and deallocate its memory.
- CASE #3: the node to be deleted, x, has two children.
  - Not so easy. If we physically delete x, we'll have to place its two children somewhere, which seems to require too much tree restructuring.
  - But we know it's easy to delete a node that has at most one child. What if we find such a node whose contents can be copied over to x without violating the BST property, and then physically delete that node instead?
  - Idea:
    - Find x's immediate successor, y.
      It is guaranteed to have at most one child.
    - Copy y's contents over to x.
    - Physically delete y.
- Finding the immediate successor:
  - We know that x has two children. Due to the BST property, the immediate successor is in the right subtree; in particular, it is the smallest element in the right subtree.
  - The smallest element in a BST is always the leftmost node: start at the root of the subtree and follow left pointers until a node has no left child. That node may still have a right child, which is why y has at most one child.
  - Since this travels down the tree from the current node toward a leaf, it may take linear time in the worst case. If the tree is balanced, it takes at most logarithmic time.
  - Copying y's contents and deleting y take constant time.

**Binary Search Trees**

- Traversing a tree = visiting its nodes.
- Three major ways to traverse a binary tree:
  - Preorder: visit the root, visit the left subtree, visit the right subtree.
  - Postorder: visit the left subtree, visit the right subtree, visit the root.
  - Inorder: visit the left subtree, visit the root, visit the right subtree. When applied to a BST, it visits the nodes in order from smallest to largest key.

```cpp
#include <iostream>
#include <string>
// Node fields as described above; the int/std::string types are assumptions.
struct Node { int key; std::string data; Node *left, *right, *parent; };

void print_inorder(const Node* subroot) {
    if (subroot != nullptr) {
        print_inorder(subroot->left);        // smaller keys first
        std::cout << subroot->data << '\n';  // then this node
        print_inorder(subroot->right);       // then larger keys
    }
}
```

- How long does this take? There is exactly one call to print_inorder() for each of the $n$ nodes (plus one for each of the $n + 1$ NULL child pointers), and each call does a constant amount of work, so the running time is $\Theta(n)$.
- A tree may also be traversed one "level" at a time (top to bottom, left to right). This is usually called a level-order traversal.
  - It requires the use of a temporary queue:

```
enqueue root
while (queue is not empty) {
    get the front element, f
    print f
    enqueue f's children
    dequeue
}
```

- Example tree: root 12; the children of 12 are 4 and 14; the children of 4 are 2 and 8; 14 has the right child 16; the children of 8 are 6 and 10.
  - In-order: 2 - 4 - 6 - 8 - 10 - 12 - 14 - 16
  - Pre-order: 12 - 4 - 2 - 8 - 6 - 10 - 14 - 16
  - Post-order: 2 - 6 - 10 - 8 - 4 - 16 - 14 - 12
  - Level-order: 12 - 4 - 14 - 2 - 8 - 16 - 6 - 10
- Idea for a sorting algorithm:
  - Given a sequence of integers, insert each one into a BST.
  - Perform an inorder traversal. The elements will be visited in sorted order.
  - Running time:
    - In the worst case (e.g., already sorted input), the tree degenerates into a list. Creation takes quadratic time and the traversal is linear. Total: $O(n^2)$.
    - On average, the tree will be mostly balanced. Creation takes $O(n \log n)$ and the traversal is again linear. Total: $O(n \log n)$.

**BSTs vs. Lists**

- Time:
  - In the worst case, all dictionary operations are linear for both.
  - On average, BSTs are expected to do better.
- Space:
  - BSTs store an additional pointer per node.
- The BST seemed like a good idea, but on its own it doesn't offer much improvement: its worst case is no better than a list's.
- We must find a way to keep the tree balanced and guarantee logarithmic height.

**Balanced Trees**

- There are several ways to define balance. Examples:
  - Force the subtrees of each node to have almost equal heights.
  - Place upper and lower bounds on the heights of the subtrees of each node.
  - Force the subtrees of each node to have similar sizes (= number of nodes).
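The Insert and Search slides describe both algorithms in prose. Below is a minimal C++ sketch of them, assuming `int` keys, `std::string` data and distinct keys; the function names `insert`, `search`, `height` and `destroy` are illustrative, not taken from the slides. Height is counted in edges (root at level 0), as in the height derivation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Node layout from the "BST Data Structure" slide. The key/data types
// (int and std::string) are assumptions made for this sketch.
struct Node {
    int key;
    std::string data;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Iterative insert: descend from the root, remembering the prospective
// parent, until the next step would follow a NULL child pointer.
// Assumes distinct keys: inserting an existing key just replaces its data.
Node* insert(Node*& root, int key, const std::string& data) {
    Node* parent = nullptr;
    Node* cur = root;
    while (cur != nullptr) {
        if (key == cur->key) {
            cur->data = data;
            return cur;
        }
        parent = cur;
        cur = (key < cur->key) ? cur->left : cur->right;
    }
    Node* node = new Node{key, data, nullptr, nullptr, parent};
    if (parent == nullptr) {
        root = node;                  // the tree was empty
    } else {
        Node*& slot = (key < parent->key) ? parent->left : parent->right;
        assert(slot == nullptr);      // the new node always becomes a leaf
        slot = node;
    }
    return node;
}

// Search follows the same path; reaching NULL means the key is absent.
Node* search(Node* root, int key) {
    while (root != nullptr && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}

// Height in edges: -1 for an empty tree, 0 for a single node.
int height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// Free every node (postorder, so children are freed before their parent).
void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Inserting the keys 1 through 5 in increasing order builds the degenerate chain from the worst-case slide, so `height` returns 4 (that is, n - 1). Inserting 12, 4, 14, 2, 8, 16, 6, 10 builds the example tree from the traversal slides, whose height is 3.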
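The three deletion cases from the Delete slides can be sketched in C++ as follows. This is a minimal illustration, assuming the same node layout with parent pointers and distinct keys; `erase_key` and `replace_in_parent` are hypothetical names. Cases 1 and 2 share one code path, because both just link the node's only child (or NULL) into its place. Case 3 first copies the successor's contents into x and then deletes the successor through that same path.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Same node layout as the slides; the int/std::string types are assumptions.
struct Node {
    int key;
    std::string data;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Plain BST insert (distinct keys assumed) so the sketch is self-contained.
void insert(Node*& root, int key, const std::string& data) {
    Node* parent = nullptr;
    Node** link = &root;
    while (*link != nullptr) {
        parent = *link;
        link = (key < parent->key) ? &parent->left : &parent->right;
    }
    *link = new Node{key, data, nullptr, nullptr, parent};
}

// Put `child` (possibly nullptr) where `old` used to hang: fix the parent's
// child pointer (or the root pointer) and the child's parent pointer.
void replace_in_parent(Node*& root, Node* old, Node* child) {
    if (old->parent == nullptr) root = child;
    else if (old->parent->left == old) old->parent->left = child;
    else old->parent->right = child;
    if (child != nullptr) child->parent = old->parent;
}

// Delete the node holding `key`; returns false if the key is not present.
bool erase_key(Node*& root, int key) {
    Node* x = root;
    while (x != nullptr && x->key != key)
        x = (key < x->key) ? x->left : x->right;
    if (x == nullptr) return false;

    if (x->left != nullptr && x->right != nullptr) {
        // CASE #3: the successor y is the leftmost node of x's right subtree.
        Node* y = x->right;
        while (y->left != nullptr) y = y->left;
        x->key = y->key;                  // copy y's contents over to x...
        x->data = std::move(y->data);
        x = y;                            // ...and physically delete y instead
    }
    // CASES #1 and #2: x now has at most one child.
    assert(x->left == nullptr || x->right == nullptr);
    replace_in_parent(root, x, x->left != nullptr ? x->left : x->right);
    delete x;
    return true;
}

// Inorder traversal into a vector, handy for checking the result.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

On the example tree, deleting 4 copies its successor 6 up into the old 4 node and removes the leaf that held 6. Deleting 14 then splices its only child, 16, directly under 12.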
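A C++ sketch of the three recursive traversals and the queue-based level-order traversal, which can be checked against the example tree above. It assumes a key-only node without parent pointers and distinct keys; the function names are illustrative, and `std::queue` plays the role of the temporary queue in the pseudocode.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Key-only node; enough to demonstrate the four traversal orders.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Recursive insert (distinct keys assumed; an equal key would go right).
Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node{key};
    if (key < root->key) root->left = insert(root->left, key);
    else root->right = insert(root->right, key);
    return root;
}

void preorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    out.push_back(n->key);        // root
    preorder(n->left, out);       // left subtree
    preorder(n->right, out);      // right subtree
}

void postorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    postorder(n->left, out);      // left subtree
    postorder(n->right, out);     // right subtree
    out.push_back(n->key);        // root
}

void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);        // left subtree
    out.push_back(n->key);        // root
    inorder(n->right, out);       // right subtree
}

// Level order with a FIFO queue, following the pseudocode step by step.
std::vector<int> level_order(const Node* root) {
    std::vector<int> out;
    std::queue<const Node*> q;
    if (root != nullptr) q.push(root);             // enqueue root
    while (!q.empty()) {
        const Node* f = q.front();                 // get the front element
        assert(f != nullptr);                      // only non-null nodes are enqueued
        out.push_back(f->key);                     // "print" f
        if (f->left != nullptr) q.push(f->left);   // enqueue f's children
        if (f->right != nullptr) q.push(f->right);
        q.pop();                                   // dequeue
    }
    return out;
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Inserting 12, 4, 14, 2, 8, 16, 6, 10 in that order and calling these functions reproduces the four orders listed in the example.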
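The sorting idea combines naturally with Idea #3 for duplicate keys: each node stores a key plus a count of how many times it was inserted, and the inorder traversal emits each key that many times. This is a minimal sketch of that combination (commonly called tree sort); the `count` field and the names `emit_inorder` and `tree_sort` are assumptions for illustration. As the sorting slide notes, it is quadratic in the worst case, for example on already sorted input with distinct keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Idea #3: one node per distinct key, with a count of its occurrences.
// The `count` field is an assumption of this sketch.
struct Node {
    int key;
    std::size_t count = 1;
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node{key};
    if (key < root->key) root->left = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    else ++root->count;                 // duplicate: reuse the same node
    return root;
}

// Inorder traversal that emits each key `count` times, in ascending order.
void emit_inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    emit_inorder(n->left, out);
    out.insert(out.end(), n->count, n->key);
    emit_inorder(n->right, out);
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Tree sort: build a BST from the input, then read it back in order.
std::vector<int> tree_sort(const std::vector<int>& input) {
    Node* root = nullptr;
    for (int x : input) root = insert(root, x);
    std::vector<int> out;
    out.reserve(input.size());
    emit_inorder(root, out);
    destroy(root);
    assert(out.size() == input.size());  // duplicates are kept, not dropped
    return out;
}
```

For example, `tree_sort({5, 3, 8, 3, 1, 8, 8})` returns `{1, 3, 3, 5, 8, 8, 8}`. Storing duplicates in one node also keeps repeated keys from adding height, in line with the slide's remark that Idea #3 does not affect balance.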