
CS 3343: Analysis of Algorithms



  1. CS 3343: Analysis of Algorithms Lecture 16: Binary search trees & red-black trees

  2. Review: Hash tables A hash table T with slots 0 … m − 1 stores the set K of actual keys, drawn from a universe U with |U| >> |K| and |U| >> m. Problem: collisions, e.g. h(k2) = h(k5). [Figure: the universe U, the actual keys k1 … k5, and the hash function h mapping each key to a slot of T.]

  3. Chaining Chaining puts elements that hash to the same slot into a linked list. [Figure: table T with one chain per slot; e.g. k1 and k4 share one list, k5, k2, and k7 share another, k3 is alone, and k8 and k6 share a third.]

  4. Hashing with Chaining
  • Chained-Hash-Insert(T, x): insert x at the head of list T[h(key[x])]. Worst-case complexity: O(1).
  • Chained-Hash-Delete(T, x): delete x from the list T[h(key[x])]. Worst-case complexity: proportional to the length of the list with singly-linked lists; O(1) with doubly-linked lists.
  • Chained-Hash-Search(T, k): search for an element with key k in list T[h(k)]. Worst-case complexity: proportional to the length of the list.
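To make these operations concrete, here is a minimal Python sketch of a chained hash table (not from the slides; the class and method names are illustrative, and Python's built-in hash stands in for h):

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]    # one chain per slot

    def _h(self, key):
        return hash(key) % self.m              # any h: U -> {0, ..., m-1}

    def insert(self, key, value):
        # Insert at the head of the chain: O(1) with a real linked
        # list (Python lists shift elements, but the idea is the same).
        self.table[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # Walk the chain at slot h(k): cost proportional to its length.
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Proportional to chain length here, as with singly-linked lists.
        chain = self.table[self._h(key)]
        self.table[self._h(key)] = [(k, v) for k, v in chain if k != key]
```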

  5. Analysis of Chaining
  • Assume simple uniform hashing: each key is equally likely to be hashed to any slot.
  • Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot.
  • Average cost of an unsuccessful search for a key is Θ(1 + α) (Theorem 11.1).
  • Average cost of a successful search is Θ(2 + α/2) = Θ(1 + α) (Theorem 11.2).
  • If the number of keys n is proportional to the number of slots in the table, α = n/m = O(1).
  • The expected cost of searching is constant if α is constant.

  6. Hash Functions: The Division Method
  • h(k) = k mod m
  • In words: hash k into a table with m slots using the slot given by the remainder of k divided by m.
  • Example: m = 31 and k = 78 ⇒ h(k) = 16.
  • Advantage: fast.
  • Disadvantage: the value of m is critical. Bad if the keys bear a relation to m, or if the hash does not depend on all bits of k.
  • Pick m = a prime number not too close to a power of 2 (or 10).

  7. Hash Functions: The Multiplication Method
  • For a constant A, 0 < A < 1:
  • h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA − ⌊kA⌋)⌋, where kA − ⌊kA⌋ is the fractional part of kA.
  • Advantage: the value of m is not critical.
  • Disadvantage: relatively slower.
  • Choose m = 2^p for easier implementation.
  • Choose A not too close to 0 or 1.
  • Knuth: a good choice is A = (√5 − 1)/2 ≈ 0.6180339887…
  • Example: m = 1024, k = 123: h(k) = ⌊1024 · (123 · 0.6180339887 mod 1)⌋ = ⌊1024 · 0.0181…⌋ = 18.
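Both methods are short enough to render directly in Python; this sketch (function names ours) reproduces the slides' worked examples:

```python
import math

def h_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m

def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    return math.floor(m * ((k * A) % 1))

print(h_division(78, 31))           # 16, as on slide 6
print(h_multiplication(123, 1024))  # 18, as on slide 7
```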

  8. A Universal Hash Function
  • Choose a prime number p that is larger than all possible keys.
  • Choose table size m ≥ n.
  • Randomly choose two integers a, b such that 1 ≤ a ≤ p − 1 and 0 ≤ b ≤ p − 1.
  • h_{a,b}(k) = ((ak + b) mod p) mod m
  • Example: p = 17, m = 6: h_{3,4}(8) = ((3·8 + 4) mod 17) mod 6 = 11 mod 6 = 5.
  • With a random pair of parameters a, b, the chance of a collision between distinct keys x and y is at most 1/m.
  • Expected search time for any input is Θ(1).
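A hedged Python sketch of drawing one function from this universal family (make_universal_hash is our own name):

```python
import random

def make_universal_hash(p, m):
    # Randomly pick a in {1, ..., p-1} and b in {0, ..., p-1}.
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda k: ((a * k + b) % p) % m

# The slide's worked example: p = 17, m = 6, a = 3, b = 4.
h = lambda k: ((3 * k + 4) % 17) % 6
print(h(8))  # ((24 + 4) % 17) % 6 = 11 % 6 = 5
```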

  9. Today • Binary search trees • Red-black trees

  10. Binary Search Trees • Data structures that can support dynamic set operations. • Search, Minimum, Maximum, Predecessor, Successor, Insert, and Delete. • Can be used to build • Dictionaries. • Priority Queues. • Basic operations take time proportional to the height of the tree – O(h).

  11. BST – Representation
  • Represented by a linked data structure of nodes.
  • root(T) points to the root of tree T.
  • Each node contains the fields:
  • key
  • left – pointer to left child: root of left subtree (may be nil)
  • right – pointer to right child: root of right subtree (may be nil)
  • p – pointer to parent; p[root[T]] = NIL (optional)
  • satellite data
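As a concrete illustration, here is a minimal Python rendering of this representation (None plays the role of NIL; all names are ours):

```python
class Node:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data     # satellite data
        self.left = None     # root of left subtree (may be None)
        self.right = None    # root of right subtree (may be None)
        self.p = None        # parent pointer; None for the root

class BST:
    def __init__(self):
        self.root = None     # root(T)
```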

  12. Binary Search Tree Property Stored keys must satisfy the binary search tree property: if y is in the left subtree of x, then key[y] ≤ key[x]; if y is in the right subtree of x, then key[y] ≥ key[x]. [Figure: an example BST with root 56; left subtree rooted at 26 (containing 18, 12, 24, 28, 27), right subtree rooted at 200 (containing 190, 213).]

  13. Inorder Traversal The binary-search-tree property allows the keys of a binary search tree to be printed in (monotonically increasing) order, recursively. [Figure: the example BST with root 56.]
Inorder-Tree-Walk(x)
1. if x ≠ NIL
2. then Inorder-Tree-Walk(left[x])
3. print key[x]
4. Inorder-Tree-Walk(right[x])
How long does the walk take? Θ(n), since every node is visited exactly once.
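The same walk in runnable Python, assuming the Node sketch above:

```python
def inorder_tree_walk(x):
    # Visits the keys in monotonically increasing order; Theta(n) total.
    if x is not None:
        inorder_tree_walk(x.left)
        print(x.key)
        inorder_tree_walk(x.right)
```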

  14. Tree Search
Tree-Search(x, k)
1. if x = NIL or k = key[x]
2. then return x
3. if k < key[x]
4. then return Tree-Search(left[x], k)
5. else return Tree-Search(right[x], k)
Example: search for 27 in the example tree. Running time: O(h).

  15. Iterative Tree Search
Iterative-Tree-Search(x, k)
1. while x ≠ NIL and k ≠ key[x]
2. do if k < key[x]
3. then x ← left[x]
4. else x ← right[x]
5. return x
The iterative tree search is more efficient on most computers; the recursive tree search is more straightforward.
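Both variants in Python, assuming the Node sketch above:

```python
def tree_search(x, k):
    # Recursive search from node x: O(h).
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)

def iterative_tree_search(x, k):
    # Same result without recursion; usually faster in practice.
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x
```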

  16. Finding Min & Max The binary-search-tree property guarantees that the minimum is located at the left-most node and the maximum at the right-most node.
Tree-Minimum(x)
1. while left[x] ≠ NIL
2. do x ← left[x]
3. return x
Tree-Maximum(x)
1. while right[x] ≠ NIL
2. do x ← right[x]
3. return x
Q: How long do they take? A: O(h), since each iteration descends one level.
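In Python, again assuming the Node sketch above:

```python
def tree_minimum(x):
    while x.left is not None:    # keep going left
        x = x.left
    return x

def tree_maximum(x):
    while x.right is not None:   # keep going right
        x = x.right
    return x
```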

  17. Predecessor and Successor
  • The successor of node x is the node y such that key[y] is the smallest key greater than key[x].
  • The successor of the largest key is NIL.
  • The search consists of two cases:
  • If node x has a non-empty right subtree, then x’s successor is the minimum in the right subtree of x.
  • If node x has an empty right subtree, we climb the tree as long as we are moving up through right children; each such step visits a smaller key. x’s successor y is the node that x is the predecessor of (x is the maximum in y’s left subtree). In other words, x’s successor y is the lowest ancestor of x whose left child is also an ancestor of x.

  18. Pseudo-code for Successor
Tree-Successor(x)
1. if right[x] ≠ NIL
2. then return Tree-Minimum(right[x])
3. y ← p[x]
4. while y ≠ NIL and x = right[y]
5. do x ← y
6. y ← p[y]
7. return y
Example: the successor of 56 is 190, the minimum of 56's right subtree. The code for predecessor is symmetric. Running time: O(h).

  19. Pseudo-code for Successor (continued) Example: the successor of 28 is 56, the lowest node whose left child is an ancestor of 28. [Figure: the same pseudocode, with the upward path from 28 to 56 highlighted.]
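A Python version, reusing tree_minimum from above:

```python
def tree_successor(x):
    # Case 1: non-empty right subtree -> minimum of that subtree.
    if x.right is not None:
        return tree_minimum(x.right)
    # Case 2: climb until we stop being a right child; y is the lowest
    # ancestor of x whose left child is also an ancestor of x.
    y = x.p
    while y is not None and x is y.right:
        x = y
        y = y.p
    return y
```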

  20. BST Insertion – Pseudocode Changes the dynamic set represented by a BST while ensuring that the binary-search-tree property still holds after the change. The descent is similar to Tree-Search; z is inserted in place of a NIL.
Tree-Insert(T, z)
 y ← NIL
 x ← root[T]
 while x ≠ NIL
  do y ← x
   if key[z] < key[x]
    then x ← left[x]
    else x ← right[x]
 p[z] ← y
 if y = NIL
  then root[T] ← z
  else if key[z] < key[y]
   then left[y] ← z
   else right[y] ← z
Example: insert 195, which becomes the right child of 190. Running time: O(h).
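Tree-Insert in Python, assuming the BST/Node sketch above:

```python
def tree_insert(T, z):
    # Descend as in search, remembering the trailing parent y; O(h).
    y = None
    x = T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z               # tree was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z
```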

  21. Tree-Delete(T, x)
  • If x has no children (case 0), then remove x.
  • If x has one child (case 1), then make p[x] point to the child.
  • If x has two children/subtrees (case 2), then swap x with its successor and perform case 0 or case 1 to delete it.
  • TOTAL: O(h) time to delete a node.

  22. Case 0: x has no children, e.g. delete 190. [Figure: the leaf 190 is simply removed from the example tree.]

  23. Case 1: x has one child, e.g. delete 28. [Figure: 28's parent is made to point to 28's only child, 27.]

  24. Case 2: x has two children, e.g. delete 26. [Figure: the example tree before deletion.]

  25. Case 2 (continued): swap 26 with its successor, 27. [Figure: 27 now occupies 26's old position.]

  26. Case 2 (continued): 26 now has no children, so case 0 applies and it is removed. [Figure: the resulting tree.]

  27. Case 2, second example: delete 26 from a variant tree that also contains 33. [Figure: the tree before deletion; 26's right subtree now runs 33 → 27 → 28.]

  28. Case 2 (continued): swap 26 with its successor, 27. [Figure: 26 now sits in 27's old position and has one child, 28.]

  29. Case 2 (continued): 26 now has one child, so case 1 applies. [Figure: the same tree, highlighting 26 and its child 28.]

  30. Case 2 (continued): 26 is spliced out and its child 28 takes its place. [Figure: the final tree.]

  31. Correctness of Tree-Delete
  • How do we know case 2 goes to case 0 or case 1 instead of back to case 2?
  • Because when x has two children, its successor is the minimum of its right subtree, and that successor has no left child (hence 0 or 1 child).
  • Equivalently, we could swap with the predecessor instead of the successor. It may be good to alternate between the two to avoid creating a lopsided tree.

  32. Deletion – Pseudocode
Tree-Delete(T, z)
/* Determine which node to splice out: either z or z’s successor. */
1. if left[z] = NIL or right[z] = NIL
2. then y ← z // case 0 or 1
3. else y ← Tree-Successor(z) // case 2
/* Set x to a non-NIL child of y, or to NIL if y has no children. */
4. if left[y] ≠ NIL
5. then x ← left[y]
6. else x ← right[y]
/* y is spliced out of the tree by manipulating the pointers of p[y] and x. */
7. if x ≠ NIL
8. then p[x] ← p[y]
/* Continued on the next slide. */

  33. Deletion – Pseudocode
Tree-Delete(T, z) (contd. from the previous slide)
9. if p[y] = NIL
10. then root[T] ← x
11. else if y = left[p[y]]
12. then left[p[y]] ← x
13. else right[p[y]] ← x
/* If z’s successor was spliced out, copy its data into z. */
14. if y ≠ z
15. then key[z] ← key[y]
16. copy y’s satellite data into z
17. return y
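The full splice-out deletion in Python, mirroring the two pseudocode slides (assumes the Node/BST sketch and tree_successor above):

```python
def tree_delete(T, z):
    # Splice out z itself (0 or 1 child) or z's successor (2 children),
    # which is guaranteed to have at most one child.
    if z.left is None or z.right is None:
        y = z                                  # case 0 or 1
    else:
        y = tree_successor(z)                  # case 2
    # x is y's only child, or None if y has no children.
    x = y.left if y.left is not None else y.right
    # Splice y out by linking x to y's parent.
    if x is not None:
        x.p = y.p
    if y.p is None:
        T.root = x
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    # If the successor was spliced out, move its key and data into z.
    if y is not z:
        z.key = y.key
        z.data = y.data
    return y
```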

  34. Querying a Binary Search Tree
  • All dynamic-set search operations can be supported in O(h) time.
  • h = Θ(lg n) for a balanced binary tree (and for an average tree built by adding nodes in random order).
  • h = Θ(n) for an unbalanced tree that resembles a linear chain of n nodes in the worst case.

  35. Red-black trees: Overview
  • Red-black trees are a variation of binary search trees designed to keep the tree balanced.
  • Height is O(lg n), where n is the number of nodes.
  • Operations take O(lg n) time in the worst case.

  36. Red-black Tree
  • Binary search tree + 1 bit per node: the attribute color, which is either red or black.
  • All other attributes of BSTs are inherited: key, left, right, and p.
  • All empty trees (leaves) are colored black.
  • We use a single sentinel, nil[T], for all the leaves of red-black tree T, with color[nil[T]] = black.
  • The root’s parent is also nil[T].

  37. Red-black Tree – Example [Figure: a red-black tree with root 26, children 17 and 41; 41 has children 30 and 47; 30 has child 38 and 47 has child 50; every leaf pointer goes to the single sentinel nil[T].] Remember: every internal node has two children, even though nil leaves are not usually shown.

  38. Red-black Properties
  1. Every node is either red or black.
  2. The root is black.
  3. Every leaf (nil) is black.
  4. If a node is red, then both its children are black.
  5. For each node, all paths from the node to descendant leaves contain the same number of black nodes.

  39. Height of a Red-black Tree
  • Height of a node: the number of edges in a longest path to a leaf.
  • Black-height of a node x, bh(x): the number of black nodes (including nil[T]) on the path from x down to a leaf, not counting x itself.
  • The black-height of a red-black tree is the black-height of its root.
  • By Property 5, black-height is well defined.

  40. Height of a Red-black Tree – Example [Figure: the example red-black tree annotated with heights and black-heights; e.g. root 26 has h = 4, bh = 2; node 41 has h = 3, bh = 2; nodes 38 and 50 have h = 1, bh = 1.] How are h(x) and bh(x) related? bh(x) ≤ h(x) ≤ 2·bh(x).


  46. h Height of a red-black tree Theorem. A red-black tree with n keys has height h£ 2 log(n + 1). Proof. (The book uses induction. Read carefully.) • INTUITION: • Merge red nodes into their black parents. • This process produces a tree in which each node has 2, 3, or 4 children. • The 2-3-4 tree has uniform depth h of leaves.

  47. Proof (continued)
  • We have h′ ≥ h/2, since at most half the nodes on any root-to-leaf path are red.
  • The number of leaves in each tree is n, so n ≥ 2^h′ − 1
  • ⇒ lg(n + 1) ≥ h′ ≥ h/2
  • ⇒ h ≤ 2 lg(n + 1).

  48. Operations on RB Trees
  • All operations can be performed in O(lg n) time.
  • The query operations, which don’t modify the tree, are performed exactly as in ordinary BSTs.
  • Insertion and deletion are not straightforward. Why? Because adding or removing a node can violate the red-black properties, which must then be restored.

  49. Rotations [Figure: Left-Rotate(T, x) turns node x with right child y and subtrees α, β, γ into y with left child x; Right-Rotate(T, y) undoes it.]

  50. Rotations (continued)
  • Rotations are the basic tree-restructuring operation for almost all balanced search trees.
  • A rotation takes a red-black tree and a node, changes pointers to alter the local structure, and does not violate the binary-search-tree property.
  • Left rotation and right rotation are inverses; see the sketch below.
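As an illustration, Left-Rotate in Python following the standard pointer manipulation (assumes the Node/BST sketch above; Right-Rotate is symmetric):

```python
def left_rotate(T, x):
    # Pivot x's right child y up; preserves the BST property.
    y = x.right                  # assumes x.right is not None
    x.right = y.left             # y's left subtree becomes x's right
    if y.left is not None:
        y.left.p = x
    y.p = x.p                    # link y to x's old parent
    if x.p is None:
        T.root = y               # x was the root
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x                   # put x on y's left
    x.p = y
```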
