
Trees



Presentation Transcript


  1. Trees. Based on slides by Harry Zhou. Read Sections 12, 13, 6, 18, 16.3.

  2. Trees
  [Figure: an example list, tree, and graph built from nodes a–j]
  Lists – one parent & one child (at most)
  Trees – one parent & one or more children
  Graphs – one or more parents and one or more children
  Tree (math definition): a connected acyclic graph.
  Trees (inductive definition, more useful in computer science):
  • A single node is a tree (that node is the root of the tree)
  • A node (the root) plus links to a finite number of trees is a tree
  Every node except the root has exactly one parent and any number of children (the roots of the subtrees attached to it).

  3. Binary Trees: each node has at most 2 children

  public class treenode {
      Object data;
      treenode left;
      treenode right;
      public treenode() { }
      public treenode(Object j) { data = j; left = null; right = null; }
  }

  Tree operations: insert(t,e); delete(t,e); find(t,e)
  Tree traversals: inorder, postorder, preorder, breadth-first

  4. Tree traversals
  [Figure: a tree with root a; a's children are b and w; b's children are c and d; d has left child s; w has right child u]
  Inorder – left, root, right
  Preorder – root, left, right
  Postorder – left, right, root

  public void Inorder(treenode t) {
      if (t != null) {
          Inorder(t.left);
          System.out.println("" + t.data);
          Inorder(t.right);
      }
  }

  public void Postorder(treenode t) {
      if (t != null) {
          Postorder(t.left);
          Postorder(t.right);
          System.out.println("" + t.data);
      }
  }

  Inorder output: c b s d a w u
  Postorder output: c s d b u w a
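The slide lists preorder among the traversals but shows only inorder and postorder; a minimal preorder method in the same style (using the treenode class from slide 3) would be:

  // Preorder sketch: visit the root first, then the left and right subtrees.
  public void Preorder(treenode t) {
      if (t != null) {
          System.out.println("" + t.data); // visit the root
          Preorder(t.left);                // then the left subtree
          Preorder(t.right);               // then the right subtree
      }
  }

On the example tree above this prints: a b c d s w u.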

  5. Converting a recursive algorithm to an iterative version

  public void Inorder(treenode t) { // recursive version
      if (t != null) {
          Inorder(t.left);
          System.out.println("" + t.data);
          Inorder(t.right);
      }
  }

  For the iterative version, use a stack; each stack frame has two fields: (node, action)

  public void Inorder(treenode root) { // iterative version (pseudocode)
      stack.push(root, 'inorder');
      while (!stack.isEmpty()) {
          (c, action) = stack.pop();
          if (action == 'visit')
              visit(c);
          else {
              if (c.right != null) stack.push(c.right, 'inorder');
              stack.push(c, 'visit');
              if (c.left != null) stack.push(c.left, 'inorder');
          }
      }
  }
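The pseudocode above assumes a stack of (node, action) pairs. One way to make it runnable in Java is to wrap each pair in a small helper class; the Frame class, the IterativeInorder class name, and the use of java.util.ArrayDeque are illustrative choices, not part of the original slide (the treenode class from slide 3 is assumed):

  import java.util.ArrayDeque;
  import java.util.Deque;

  class IterativeInorder {
      // One stack frame = a node plus a pending action.
      private static class Frame {
          treenode node;
          boolean visit;          // true = print the node, false = expand it
          Frame(treenode n, boolean v) { node = n; visit = v; }
      }

      public static void inorder(treenode root) {
          if (root == null) return;
          Deque<Frame> stack = new ArrayDeque<>();
          stack.push(new Frame(root, false));
          while (!stack.isEmpty()) {
              Frame f = stack.pop();
              if (f.visit) {
                  System.out.println(f.node.data);
              } else {
                  // Push the right child first so the left subtree is processed first.
                  if (f.node.right != null) stack.push(new Frame(f.node.right, false));
                  stack.push(new Frame(f.node, true));
                  if (f.node.left != null) stack.push(new Frame(f.node.left, false));
              }
          }
      }
  }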

  6. Level order (or breadth-first) traversal
  [Figure: the same example tree rooted at a]
  Traverse the tree level by level, each level from left to right.
  Question: what kind of data structure is needed here, a queue or a stack?

  Algorithm (pseudocode):
  current = root
  while (current != null)
      System.out.print(current.data)
      if (current.left != null) Enq(queue, current.left)
      if (current.right != null) Enq(queue, current.right)
      current = Deq(queue)   // null when the queue is empty

  Output: a b w c d u s
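A runnable Java version of the same idea, using java.util.Queue in place of the Enq/Deq pseudocode (the LevelOrder class name is illustrative; the treenode class from slide 3 is assumed):

  import java.util.LinkedList;
  import java.util.Queue;

  class LevelOrder {
      public static void levelOrder(treenode root) {
          if (root == null) return;
          Queue<treenode> queue = new LinkedList<>();
          queue.add(root);
          while (!queue.isEmpty()) {
              treenode current = queue.remove();                    // dequeue the next node
              System.out.print(current.data + " ");
              if (current.left != null) queue.add(current.left);    // enqueue children,
              if (current.right != null) queue.add(current.right);  // left before right
          }
      }
  }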

  7. Binary Search Trees
  [Figure: example binary search trees]
  1) BST property: a node's key is larger than or equal to the key of any node in its left subtree, and smaller than the key of any node in its right subtree.
  2) Operations: Search, Insert, Delete, Min, Max. Also: Successor, Predecessor.

  8. Algorithm Find
  [Figure: a BST with root 17; the search for 27 follows the path 17 → 26 → 31 → 27]

  public int find(int x, treenode t) {
      if (t == null) return NOT_FOUND;     // report a "not found" result
      if (t.data == x) return x;
      if (x > t.data) return find(x, t.right);
      else return find(x, t.left);
  }

  Trace of find(27, t):
  1) Since 27 > 17, search(27, subtree rooted at 26)
  2) Since 27 > 26, search(27, subtree rooted at 31)
  3) Since 27 < 31, search(27, subtree rooted at 27)
  4) Since 27 = 27, return 27

  9. Insertion in Java

  public treenode insert(treenode t, int k) {
      if (t == null) {
          t = new treenode();
          t.data = k; t.left = null; t.right = null;
          return t;
      } else if (k > t.data) {
          t.right = insert(t.right, k);
          return t;
      } else {
          t.left = insert(t.left, k);
          return t;
      }
  }

  10. Look for the smallest number in the BST

  public treenode min(treenode t) {
      if (t == null) return null;
      else if (t.left == null) return t;
      else return min(t.left);
  }

  11. Tree deletion operations: Three cases: (1) The node is a leaf node (2) The node has one child (3) The node has two children

  12. Delete operation (cont.)
  [Figure: before/after pictures of deleting a leaf and deleting a node with one child]
  (1) The node is a leaf: just delete it.
  (2) The node has one child: connect its child to its parent.

  13. The node has two children
  [Figure: deleting node 40 from a BST; its value is replaced by 46, the smallest value in its right subtree]
  Algorithm:
  - Replace the value of the node with the smallest value in its right subtree.
  - Delete the node holding that smallest value from the right subtree; it has at most one child (no left child), so it is removed by connecting its child, if any, to its parent.
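A possible Java sketch of the full delete operation, covering all three cases and reusing the min method from slide 10. The delete method name is illustrative, and t.data is assumed to hold Integer keys (as in the insert example); this is a sketch, not the textbook's code:

  // BST deletion: cases (1) and (2) splice the node out; case (3) copies the
  // smallest value of the right subtree into the node, then deletes that value.
  public treenode delete(treenode t, int k) {
      if (t == null) return null;                 // key not present
      if (k > (int) t.data) {
          t.right = delete(t.right, k);
      } else if (k < (int) t.data) {
          t.left = delete(t.left, k);
      } else {
          // Cases 1 and 2: at most one child -> connect it to the parent.
          if (t.left == null) return t.right;
          if (t.right == null) return t.left;
          // Case 3: two children.
          treenode m = min(t.right);              // smallest value in the right subtree
          t.data = m.data;
          t.right = delete(t.right, (int) m.data);
      }
      return t;
  }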

  14. Complexity of BST operations
  - Height of a tree: the number of edges on a longest path from the root to a leaf.
  - Find, Insert, Delete – O(h), where h is the height of the tree.
  - h can vary a lot (as a function of n = number of nodes), depending on how well balanced the BST is:
    h can be O(log n) – we say the tree is balanced – or h can be O(n).
  What is the relation between h and n?  h + 1 ≤ n ≤ 2^(h+1) - 1
  The left inequality is obvious (why?). The right inequality: proof by induction (a short sketch follows below).
  We will see later how to keep the tree balanced while doing the tree operations.
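A short sketch of the right inequality, added here as an illustration (argued by induction on the height h rather than directly on n):

  \textbf{Claim: } n \le 2^{h+1} - 1 \text{ for every binary tree of height } h.
  \textbf{Base case } (h = 0): \text{ a single node, so } n = 1 = 2^{1} - 1.
  \textbf{Step: } \text{a tree of height } h \text{ is a root plus two subtrees of height at most } h - 1, \text{ so}
  n \le 1 + 2\,(2^{h} - 1) = 2^{h+1} - 1.

For the left inequality, a longest root-to-leaf path already contains h + 1 distinct nodes, so n ≥ h + 1.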

  15. Search problems: one of the most common applications of computers.
  If the information is static, use binary search – it works in O(log n) time; note that insert and delete would be slow (O(n)).
  If the information is dynamic:
  • we want to implement search, insert, and delete, ideally all in O(log n) time;
  • approach: use binary search trees; we have seen that search, insert, and delete run in O(h) time, where h is the tree's height;
  • but the tree can grow imbalanced and h can be Omega(n);
  • there are variants of binary search trees that do not become unbalanced: AVL trees (store in every node the height difference between the right and left subtrees), red-black trees (store in every node a color, red or black), and splay trees (no extra storage needed, but operations are O(log n) only in the amortized sense, not the worst-case sense).
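For the static case, Java's standard library already provides binary search over a sorted array; a small illustration (the array contents are arbitrary example values):

  import java.util.Arrays;

  class StaticSearch {
      public static void main(String[] args) {
          int[] keys = {3, 8, 15, 23, 42, 57, 91};   // must already be sorted
          int pos = Arrays.binarySearch(keys, 23);    // O(log n) lookup
          System.out.println(pos);                    // prints 3, the index of 23
          // A miss returns -(insertionPoint) - 1, e.g. searching for 40 gives -5.
      }
  }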

  16. Relations between n and h in a binary tree

  17. Red-Black Trees (Chapter 13 in the textbook). Done on the board.

  18. B-trees (Chapter 18 in the textbook)
  Binary search trees are not appropriate for data stored in external memory: each vertical move in the tree involves a mechanical movement of the disk head.
  Memory: a 500 MIPS machine executes 500 million instructions per second.
  Disk: 3600 rotations/min, so 1 rotation takes 1/60 sec ≈ 16.7 ms; on average we wait half a rotation, ≈ 8.3 ms, so only about 120 disk operations per second.
  So the time for one disk access ≈ the time for 4 × 10^6 memory operations.
  Idea: use 'multiway trees' instead of binary trees; at each node we choose among more than two children where to continue. Nodes are fatter, but trees are shallower: more work at a node (done in internal memory), but fewer nodes are visited (each newly visited node typically implies a different disk access). This idea is implemented via B-trees. There are several variants of B-trees; we discuss one of them, the B+-tree.

  19. Definition of a B-tree of order m:
  - The root is either a leaf or has between 2 and m children.
  - Any non-leaf node (except the root) has between m/2 and m children; if it has j children, it contains j-1 key values (to guide the search).
  - All leaves are at the same level; each leaf contains between m/2 and m actual values, stored in sorted order in an array of size m (some slots in the array may be empty).

  20. A B-tree of order m with n values has height h ≤ log(n/2) / log(m/2).
  Search: obvious (just follow the guiding keys).
  Insert and Delete: there are many cases; I'll just illustrate them by examples.
  All the operations work in O(h).

  21. Insert and Delete in a B-tree

  22. Example: when 99 is deleted, the number of values in its leaf falls below the minimum, so two leaves are merged; now the parent has fewer children than the minimum, so a child is adopted from the left sibling.

  23. Heaps (Chapter 6 in the textbook). Main application: priority queues.
  [Figure: one example that is a heap and two examples that are not]
  • The value of each node is less than that of its children (MIN-HEAP).
  • A heap is a complete binary tree: every level is full except possibly the bottom level, which is filled from the left and has no "holes".
  • The height of the tree is ⌊log n⌋, where n is the number of values (proof on board).

  24. Heap implementation
  [Figure: a heap with nodes A–I and its representation in an array indexed 1–9]
  We can use an array (due to the regular structure of the complete binary tree):
  Left child of i: 2*i.  Right child of i: 2*i + 1.  Parent of i: ⌊i/2⌋.
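These index formulas translate directly into the helper functions used by the pseudocode on the following slides; a minimal sketch, using 1-based indices as on the slide (the HeapIndex class name is illustrative):

  // Index arithmetic for a heap stored in a 1-based array.
  class HeapIndex {
      static int left(int i)   { return 2 * i; }       // left child of i
      static int right(int i)  { return 2 * i + 1; }   // right child of i
      static int parent(int i) { return i / 2; }       // floor(i/2) for positive i
  }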

  25. Heap operations
  [Figure: inserting a new value into a min-heap and moving it up]
  1) Insertion  2) DeleteMin  3) BuildHeap (organize an arbitrary array as a heap)
  • Insertion: place the new element NewElem in the left-most open position at the bottom level, then
    while (NewElem < its parent) exchange them.
  Question: is it possible that some element at the same level but in a different branch is smaller than the newly moved-up value?

  26. DeleteMin
  [Figure: deleting the minimum of a min-heap and moving the new root down]
  - Replace the root by the right-most node at the bottom level.
  - While the new root is larger than one of its children, exchange it with the smaller of its children.
  Complexity analysis: deletion & insertion – O(log n).
  Reasons: 1) the number of operations is O(height of the heap); 2) the height of the heap is O(log n).

  27. BuildHeap operation: build a heap from an arbitrary array.
  Bottom-up procedure: start with the subtrees whose roots are at level h-1 and make them heaps; then move to the subtrees with roots at level h-2 and make them heaps, and so on, until level 0 (the root) is reached.

  28. Time complexity of BuildHeap:
  1·2^(h-1) + 2·2^(h-2) + 3·2^(h-3) + … + h·2^0 < 2·2^h
  • Since h = ⌊log n⌋, the time complexity is O(n).
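A short justification of the bound, added as a sketch (the sum counts, level by level, the number of subtree roots times the maximum distance they can move down):

  \sum_{k=1}^{h} k \cdot 2^{\,h-k}
    = 2^{h} \sum_{k=1}^{h} \frac{k}{2^{k}}
    < 2^{h} \sum_{k=1}^{\infty} \frac{k}{2^{k}}
    = 2^{h} \cdot 2
    = 2^{h+1}.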

  29. Heap questions
  (1) Do you need to specify which element of a heap is to be deleted?
  (2) What is the difference between a binary search tree and a heap?

  30. Heaps – Insert (pseudocode)

  minHeapInsert(A, key) {
      A.heap_size = A.heap_size + 1;
      A[A.heap_size] = key;
      i = A.heap_size;
      while (i > 1 and A[parent(i)] > A[i]) {
          swap(A[i], A[parent(i)]);
          i = parent(i);
      }
  }

  31. HEAPIFY an element (move it down in the heap)

  minHeapify(A, i) {      // move down A[i]
      l = left(i); r = right(i);
      if (l <= A.heap_size and A[l] < A[i]) smallest = l;
      else smallest = i;
      if (r <= A.heap_size and A[r] < A[smallest]) smallest = r;
      if (smallest != i) {
          swap(A[i], A[smallest]);
          minHeapify(A, smallest);
      }
  }

  32. DELETE in a heap

  heapExtract(A) {
      if (A.heap_size < 1) error "heap is empty";
      else {
          min = A[1];
          A[1] = A[A.heap_size];   // move the last element into the root
          A.heap_size--;
          minHeapify(A, 1);        // move the new root down
          return min;
      }
  }

  33. BUILD HEAP – pseudocode

  buildMinHeap(A) {    // A is initially an arbitrary array;
                       // the procedure converts it into a heap
      i = A.heap_size / 2;
      while (i > 0) {
          minHeapify(A, i);
          i--;
      }
  }

  34. Heap applications
  (1) Priority queues
  (2) Heap Sort:
      (a) BuildHeap from the initial array – O(n)
      (b) Delete-Min n times – n × O(log n) = O(n log n)
      Total time complexity: O(n log n)
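A compact way to see the two phases of heap sort in Java is to let java.util.PriorityQueue play the role of the min-heap (an illustrative sketch, not the textbook's array-based heapsort; building from a collection corresponds to BuildHeap, repeated poll corresponds to Delete-Min):

  import java.util.Arrays;
  import java.util.List;
  import java.util.PriorityQueue;

  class HeapSortDemo {
      public static void main(String[] args) {
          List<Integer> data = Arrays.asList(40, 13, 20, 8, 31, 5);
          // Phase (a): build a min-heap from the initial values.
          PriorityQueue<Integer> heap = new PriorityQueue<>(data);
          // Phase (b): delete the minimum n times; the values come out sorted.
          while (!heap.isEmpty()) {
              System.out.print(heap.poll() + " ");   // prints 5 8 13 20 31 40
          }
      }
  }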

  35. Compression – Huffman coding – Section 16.3
  Data compression consists of two phases: (1) encoding (compression), (2) decoding (decompression).
  Example (Method 1: fixed-length code): convert each character into a binary sequence of equal length:
  a – 000   b – 001   c – 010   d – 011   e – 100
  Problem: not efficient, since letter frequencies can be vastly different.

  36. Potential problems
  Method 2: use a variable-length code
  letter   frequency   code
  a        0.30        0
  b        0.26        1
  c        0.20        00
  d        0.14        01
  e        0.10        10
  Problem: how do we convert 00110110 or 1010010010 back to letters? With this code the decoding is ambiguous (e.g. 0011 could be read as "a a b b" or as "c b b").
  Solution: we want the code to be prefix-free (no codeword is a prefix of another codeword).

  37. Huffman Code
  General strategy:
  - allow the code length to vary;
  - guarantee that decoding is unambiguous by making the code prefix-free.
  Note: a prefix-free binary code corresponds to a binary tree.
  Huffman algorithm:
  Initially, each character and its frequency are represented by a tree with a single node.
  (1) Find the 2 trees with the smallest weights and merge them as the left and right children of a new root.
  (2) The new root's weight is the sum of its two children's weights.
  Repeat until there is only one tree left.
  Assign a 0 to each left edge and a 1 to each right edge. The path from the root to the leaf representing a character is the codeword for that character.

  38. Tracing the Huffman algorithm
  Frequencies: a – 0.30, b – 0.26, c – 0.20, d – 0.14, e – 0.10
  (1) Merge d and e into T1 (weight 0.24)
  (2) Merge T1 and c into T2 (weight 0.44)
  (3) Merge a and b into T3 (weight 0.56)

  39. Trace (cont.)
  (4) Merge T2 and T3 into T4 (weight 1.0).
  Resulting code:  a – 00   b – 01   c – 10   d – 110   e – 111
  Decoding: there is only one way to convert any encoded string back to letters.
  Example: 0011001111 -> a d b e
  Why? Because no codeword is a prefix of another codeword.
  How to implement the algorithm? (A sketch follows below.)
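One natural implementation keeps the current forest of trees in a priority queue ordered by weight, so the two lightest trees can be extracted and merged repeatedly, as described on slide 37. This is a hedged sketch: the Huffman and Node class names and the buildHuffmanTree and printCodes methods are illustrative, not from the slides.

  import java.util.PriorityQueue;

  class Huffman {
      // A leaf holds a character; an internal node holds the combined weight
      // of its two subtrees.
      static class Node {
          double weight;
          char symbol;        // meaningful only for leaves
          Node left, right;
          Node(double w, char s) { weight = w; symbol = s; }
          Node(Node l, Node r)   { weight = l.weight + r.weight; left = l; right = r; }
      }

      // Repeatedly merge the two lightest trees until one tree remains.
      static Node buildHuffmanTree(char[] symbols, double[] freqs) {
          PriorityQueue<Node> pq =
              new PriorityQueue<>((x, y) -> Double.compare(x.weight, y.weight));
          for (int i = 0; i < symbols.length; i++) pq.add(new Node(freqs[i], symbols[i]));
          while (pq.size() > 1) {
              Node a = pq.poll();        // smallest weight
              Node b = pq.poll();        // second smallest
              pq.add(new Node(a, b));    // merged tree
          }
          return pq.poll();              // the single remaining tree
      }

      // Read off the codewords: 0 for a left edge, 1 for a right edge.
      static void printCodes(Node t, String prefix) {
          if (t.left == null && t.right == null) {
              System.out.println(t.symbol + " : " + prefix);
              return;
          }
          printCodes(t.left, prefix + "0");
          printCodes(t.right, prefix + "1");
      }
  }

For the frequencies of slide 38, buildHuffmanTree(new char[]{'a','b','c','d','e'}, new double[]{0.30, 0.26, 0.20, 0.14, 0.10}) followed by printCodes(root, "") produces a prefix-free code with lengths 2, 2, 2, 3, 3, matching the trace (the exact 0/1 labels may differ depending on how ties are broken).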

  40. The compression problem
  Input: symbols a_1, a_2, …, a_n with probabilities p_1, p_2, …, p_n.
  Goal: find a prefix-free set of codewords C = {c_1, c_2, …, c_n} with lengths l_1, l_2, …, l_n such that the average length L(C) = p_1·l_1 + p_2·l_2 + … + p_n·l_n is as small as possible.
  Theorem: the Huffman algorithm finds an optimal code.
