CSE 326: Data Structures Lecture #7 Binary Search Trees

CSE 326: Data StructuresLecture #7Binary Search Trees Alon Halevy Spring Quarter 2001

Binary Trees A • Many algorithms are efficient and easy to program for the special case of binary trees • Binary tree is • a root • left subtree (maybe empty) • right subtree (maybe empty) B C D E F G H I J

A B C D E F Data left pointer right pointer Representation A B C D E F

Properties of Binary Trees Max # of leafs in a tree of height h = Max # of nodes in a tree of height h = A B C D E F G

Operations create destroy insert find delete Dictionary: Stores values associated with user-specified keys keys may be any (homogenous) comparable type values may be any (homogenous) type implementation: data field is a struct with two parts Search ADT: keys = values kim chi spicy cabbage kreplach tasty stuffed dough kiwi Australian fruit Dictionary & Search ADTs insert • kohlrabi • - upscale tuber find(kreplach) • kreplach • - tasty stuffed dough

Naïve Implementations Goal: fast find like sorted array, dynamic inserts/deletes like linked list

Search tree property all keys in left subtree smaller than root’s key all keys in right subtree larger than root’s key result: easy to find any given key inserts/deletes by changing links Binary Search Tree Dictionary Data Structure 8 5 11 2 6 10 12 4 7 9 14 13

Example and Counter-Example 5 8 4 8 5 18 1 7 11 2 6 10 11 7 3 4 BINARY SEARCH TREE NOT A BINARY SEARCH TREE

In Order Listing visit left subtree visit node visit right subtree 10 5 15 2 9 20 17 7 30 In order listing: 25791015172030

Finding a Node Node *& find(Comparable x, Node * root) { if (root == NULL) return root; else if (x < root->key) return find(x, root->left); else if (x > root->key) return find(x, root->right); else return root; } 10 5 15 2 9 20 17 7 30 runtime:

Insert Concept:proceed down tree as in Find; if new key not found, then insert a new node at last spot traversed void insert(Comparable x, Node * root) { assert ( root != NULL ); if (x < root->key){ if (root->left == NULL) root->left = new Node(x); else insert( x, root->left ); } else if (x > root->key){ if (root->right == NULL) root->right = new Node(x); else insert( x, root->right ); } }

BuildTree for BSTs Suppose a1, a2, …, an are inserted into an initially empty BST: • a1, a2, …, an are in increasing order • a1, a2, …, an are in decreasing order • a1 is the median of all, a2 is the median of elements less than a1, a3 is the median of elements greater than a1, etc. • data is randomly ordered

Examples of Building from Scratch • 1, 2, 3, 4, 5, 6, 7, 8, 9 • 5, 3, 7, 2, 4, 6, 8, 1, 9

Analysis of BuildTree • Worst case is O(n2) 1 + 2 + 3 + … + n = O(n2) • Average case assuming all orderings equally likely is O(n log n) • not averaging over all binary trees, rather averaging over all input sequences (inserts) • equivalently: average depth of a node is log n • proof: see Introduction to Algorithms, Cormen, Leiserson, & Rivest

Find minimum Findmaximum Bonus: FindMin/FindMax 10 5 15 2 9 20 17 7 30

Deletion 10 5 15 2 9 20 17 7 30 Why might deletion be harder than insertion?

Deletion - Leaf Case Delete(17) 10 5 15 2 9 20 17 7 30

Deletion - One Child Case Delete(15) 10 5 15 2 9 20 7 30

Deletion - Two Child Case Delete(5) 10 5 20 2 9 30 7 replace node with value guaranteed to be between the left and right subtrees: the successor Could we have used the predecessor instead?

Finding the Successor Find the next larger node in this node’s subtree. • not next larger in entire tree Node * succ(Node * root) { if (root->right == NULL) return NULL; else return min(root->right); } 10 5 15 2 9 20 17 7 30 How many children can the successor of a node have?

Predecessor Find the next smaller node in this node’s subtree. Node * pred(Node * root) { if (root->left == NULL) return NULL; else return max(root->left); } 10 5 15 2 9 20 17 7 30

Deletion - Two Child Case Delete(5) 10 5 20 2 9 30 7 always easy to delete the successor – always has either 0 or 1 children!

Delete Code void delete(Comparable x, Node *& p) { Node * q; if (p != NULL) { if (p->key < x) delete(x, p->right); else if (p->key > x) delete(x, p->left); else { /* p->key == x */ if (p->left == NULL) p = p->right; else if (p->right == NULL) p = p->left; else { q = successor(p); p->key = q->key; delete(q->key, p->right); } } } }

Lazy Deletion • Instead of physically deleting nodes, just mark them as deleted • simpler • physical deletions done in batches • some adds just flip deleted flag • extra memory for deleted flag • many lazy deletions slow finds • some operations may have to be modified (e.g., min and max) 10 5 15 2 9 20 17 7 30

Lazy Deletion Delete(17) Delete(15) Delete(5) Find(9) Find(16) Insert(5) Find(17) 10 5 15 2 9 20 17 7 30

Dictionary Implementations BST’s looking good for shallow trees, i.e. the depth D is small (log n), otherwise as bad as a linked list!

Beauty is Only (log n) Deep • Binary Search Trees are fast if they’re shallow: • e.g.: perfectly complete • e.g.: perfectly complete except the “fringe” (leafs) • any other good cases? Problems occur when one branch is much longer than the other! What matters here?

CSE 326: Data Structures Lecture #7 Binary Search Trees