COP 3540 Data Structures with OOP

Chapter 8 - Part 1: Binary Trees

Why Trees?
• Trees are one of the fundamental data structures.
• Many real-world phenomena cannot be represented in the data structures we’ve had so far.
• Think of arrays:
  • Easy to search, especially if ordered.
  • O(log2 n) performance with binary search. Great!
  • Inserting? Deleting? Horrible! We must find the spot and then shift, on average, half of the items to open or close the gap: O(n).

Why Trees?
• How about linked lists?
• Inserts and deletes? Great. They take O(1) time, the best you can get (if inserting / deleting at one end).
• Searching? Searching to insert / delete / change? Not nearly as good as O(1) or even O(log2 n). (Can you draw this log graph?)
  • On average, must search n/2 items.
  • Process requires O(n) time.
• Ordering the linked list doesn’t help: there is no way to jump to the middle, so we must still step through the links one at a time.

Trees
• Especially binary trees.
• First, trees in general.
• A tree consists of nodes connected by edges.
• Every node in a tree has indegree <= 1, and there are no cycles.
  • This implies exactly one path from the root to any node.
• Structures that allow indegree > 1 or cycles are general graphs.
• More later on graphs.

Trees: General Tree
[Figure: general tree with root node A at the top; nodes B, C, D, E, F, G, H below it, connected by edges]
• Edges connect nodes. The only way to get from one node to another is along the edges, and these run downward.
• Edges are likely to be represented in a program by references; nodes, by objects.
• Typically, one node sits at the top of the tree: the root (sometimes called the distinguished symbol). There can be only one root in a tree.
• Trees are typically small on top and large as we progress down.
• Binary trees have ‘outdegree’ <= 2 for ALL nodes.
• A multi-way tree can have outdegree > 2 for at least one node (as in the figure).

Trees: General Tree
[Figure: the same general tree with root node A and nodes B, C, D, E, F, G, H]
• Parent: every node except the root has exactly one parent; the root has none.
• Child: any node may have one or more lines coming down from it; the nodes at the other ends are its children. A binary tree has at most two children per node.
• Leaf: a node with no children.
• Subtree: any node may be considered the root of a subtree, even a leaf.
• Visiting: a node is visited when program control arrives at it, usually to process the data at that node. Merely passing over a node does not constitute a visit.
• Traversing: visiting all the nodes in a tree in a prescribed order.
• Levels: the root is at level 0, its children are at level 1, and so on.
• Keys: normally what is displayed on the tree, like A, B, C, … in the figure.

Trees: Binary Tree
[Figure: binary tree with root node A and nodes B, D, E, F, G, H, connected by edges]
• Binary tree: every node has no more than two children.
• A child node is called a left child or a right child, and may, in turn, be the root of a subtree.
• A node in a binary tree may have no children.
• We normally talk in terms of binary search trees.
• In theory (the logical abstraction), a tree can extend to any number of levels; in practice (the implementation), we can run out of memory.

How Do Binary Search Trees Work? • Need to carry out basic tree operations such as • finding a node, • traversing a tree (get around in the tree), • adding a node, • deleting a node, etc. • This is what this chapter is all about.

Unbalanced Trees
[Figure: unbalanced tree with root A and nodes D, F, G, H]
• Note: the tree is ‘unbalanced.’ This means most of its nodes are on one side of the root or the other.
• A tree may be nearly balanced down to a certain level and then become quite unbalanced.
• Trees become unbalanced because of the order in which their keys were inserted. Keys that arrive in ascending or descending order produce a chain that behaves like a linked list.
• Trees tend to be more balanced when the keys arrive in random order.
• We greatly prefer balanced trees: find, insert, and delete take O(log2 n) time in a balanced tree but can degrade to O(n) in a badly unbalanced one.
• Red-Black trees address unbalanced trees (further ahead).
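To make the effect of insertion order concrete, here is a minimal sketch that builds two trees from the same 1000 keys, once in ascending order and once shuffled, and reports how many levels each tree has. The class and method names (BalanceDemo, heightAfterInserting) and the random seed are assumptions made for this illustration; they are not from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: names are hypothetical, not from the course text.
class BalanceDemo {
    static class Node {
        int iData;
        Node leftChild, rightChild;
        Node(int key) { iData = key; }
    }

    // Iterative insert that follows the binary-search-tree rule
    // (smaller keys go left, larger or equal keys go right).
    static Node insert(Node root, int key) {
        Node newNode = new Node(key);
        if (root == null) return newNode;
        Node current = root;
        while (true) {
            if (key < current.iData) {
                if (current.leftChild == null) { current.leftChild = newNode; return root; }
                current = current.leftChild;
            } else {
                if (current.rightChild == null) { current.rightChild = newNode; return root; }
                current = current.rightChild;
            }
        }
    }

    // Number of levels in the tree (0 for an empty tree).
    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.leftChild), height(n.rightChild));
    }

    static int heightAfterInserting(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) keys.add(i);
        // Ascending keys: every node has only a right child, so there are 1000 levels.
        System.out.println("sorted order:   " + heightAfterInserting(keys));
        Collections.shuffle(keys, new Random(42));   // arbitrary seed, assumed for repeatability
        System.out.println("shuffled order: " + heightAfterInserting(keys));
    }
}
```

The sorted run degenerates into a chain with one node per level, while the shuffled run should come out far shallower; the best possible height for 1000 keys is 10 levels.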

Trees in Java Code • So, how do we implement binary trees in Java? • Normally, we will store the nodes at unrelated places in memory with references to children as we are accustomed to do. • Can also represent a tree in memory as an array, with children located in specific positions within this array. • Will look later at this. • Let’s look at some code segments now.
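As a preview of the array representation mentioned above, here is a small sketch of one common convention: the root sits at index 0 and the children of the node at index i sit at indices 2i+1 and 2i+2. This convention, the use of null for an empty slot, and all names (ArrayTreeDemo, contains) are assumptions for illustration; the course may use a different layout when it returns to the topic.

```java
// Illustrative sketch only: one common array layout for a binary tree.
class ArrayTreeDemo {
    // With the root at index 0, the node at index i keeps its relatives here:
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }
    static int parent(int i)     { return (i - 1) / 2; }

    // Binary-search-tree lookup that walks indices instead of references.
    static boolean contains(Integer[] tree, int key) {
        int i = 0;
        while (i < tree.length && tree[i] != null) {
            if (key == tree[i]) return true;
            i = (key < tree[i]) ? leftChild(i) : rightChild(i);
        }
        return false;   // ran off the array or hit an empty slot
    }

    public static void main(String[] args) {
        // 50 at the root; 30 and 60 below it; 20 and 40 below 30; 60 has no children.
        Integer[] tree = {50, 30, 60, 20, 40, null, null};
        System.out.println(contains(tree, 40));   // true
        System.out.println(contains(tree, 45));   // false
    }
}
```

The trade-off is that empty positions still occupy array slots, so this layout wastes memory when the tree is unbalanced.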

The Node Class
We need a class of node objects. These will contain appropriate data and up to two references to children (a node can have MORE, as we shall see later…).

class Node
{
    int iData;           // data used as key value
    double fData;        // other data
    Node leftChild;      // this node's left child
    Node rightChild;     // this node's right child

    public void displayNode()
    {
        // whatever…
    } // end displayNode()
} // end class Node

Sometimes the data might be objects rather than primitives, and are better simply referenced from the Node itself, as shown below.

class Node
{
    Person p;            // reference to a Person object
    Node leftChild;      // this node's left child
    Node rightChild;     // this node's right child

    public void displayNode()
    {
        // whatever…
    } // end displayNode()
} // end class Node

class Person
{
    int iData;
    double fData;
    …
} // end class Person

This will certainly work, but it is a bit more complicated because the node and the data item are no longer the same object. Your author will stick with the simpler version.

The Tree Class
• Need a Tree class from which a tree object can be instantiated.
• Class Tree has one field: a Node variable that references the root.
• This plays the same role as ‘first’ (and ‘last’) for linked lists: it is where every operation gets started.
• Also, since we do not allocate the entire structure at one time (as with an array), the tree itself can ONLY hold a reference to the first element; every other node is reached through it.
• Consider the basic format of a Tree class:

Tree Class

class Tree
{
    private Node root;                     // only data field in Tree
                                           // so what does this guy do???

    public Node find(int key)              // returns the node with this key, or null
    {
        // not showing details of this method here
        return null;                       // placeholder so the skeleton compiles
    } // end find()

    public void insert(int id, double dd)
    {
        // not showing details of this method here
    } // end insert()

    public void delete(int id)
    {
        // not showing details of this method here
    } // end delete()

    // additional other methods as needed
} // end class Tree

The TreeApp Class

class TreeApp
{
    public static void main(String[] args)
    {
        Tree theTree = new Tree();         // make a tree: creates an object of type Tree (previous slide)
        theTree.insert(50, 1.5);           // insert three nodes
        theTree.insert(25, 1.7);           // invoking tree methods…
        theTree.insert(75, 1.9);
        Node found = theTree.find(25);     // find node with key 25
        if (found != null)                 // So what does this do???
            System.out.println("Found the node with key 25");
        else
            System.out.println("Could not find node with key 25");
    } // end main()
} // end class TreeApp

Java Code for Finding a Node
• We will look at insert and delete a bit later.
• Will start with finding a node, since this is simpler.
• Remember, nodes have values (keys).
• In building a binary search tree, we ‘assume’ nodes are placed in order: every value in a node’s left subtree is smaller than the value in that node, while every value in its right subtree is larger.
• Further: this rule holds at every node down the tree, not just at the root.
• So, given that this is the way the tree was built, we have the following:
• Note: we are “assuming” that the binary search tree is already built…

Sample Binary Tree
• So we might have a binary tree that looks like this.
• Note the left and right, less than, greater than relationships.

        50
       /  \
     30    60
    /  \
  20    40

Discuss the code:

public Node find(int key)
{
    Node current = root;                   // start at root
    while (current != null && current.iData != key)   // keep going while there is no match
    {
        if (key < current.iData)
            current = current.leftChild;   // recall: current = current.next??
        else
            current = current.rightChild;
    } // end while
    return current;                        // reference to the matching node, or null if not found
} // end find()
// An empty tree (root == null) and a missing key both return null.

Java Code for Inserting a Node • Must, of course, find the place where to insert the new node. • We must follow a path to the parent of where we can insert this new node. • New Node will be connected to a parent and as either the left or right child of that parent. • This depends on whether the new node is greater than or less than the value of the parent. • First: create a new node. • Then, using the same thinking of finding a node, we need to find the right spot to add the new node.

Java Code for Inserting a Node
• Unless we are out of memory, we will always find a spot to insert.
• This is easy.
• We will not treat duplicates specially at this time: a duplicate key is handled like a larger key, so it goes into the right subtree.

public void insert(int id, double dd)      // a method of class Tree
{
    Node newNode = new Node();             // create new node; move in its data
    newNode.iData = id;
    newNode.fData = dd;
    if (root == null)                      // no node in root
        root = newNode;                    // if true, we are done
    else                                   // root occupied
    {
        Node current = root;               // current is a reference that starts at the root
        Node parent;                       // trails one step behind current (think singly-linked list: we needed 'previous')
        while (true)                       // search for the right spot (exits internally)
        {
            parent = current;
            if (id < current.iData)        // go left?
            {
                current = current.leftChild;
                if (current == null)       // there is no left child
                {
                    parent.leftChild = newNode;    // link the new node in on the left
                    return;                // we are done if we get here
                }
            } // end if go left
            else                           // id >= current.iData, so go right
            {
                current = current.rightChild;
                if (current == null)       // end of the line
                {
                    parent.rightChild = newNode;   // link the new node in on the right
                    return;                // we are done if we get here
                }
            } // end else go right
        } // end while
    } // end else not root
} // end insert()

• Note: we use ‘parent’ to keep track of where we are…
• ‘parent’ always holds the last non-null node on the path, so when ‘current’ falls off the tree we still have a node to attach the new node to.
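The find() and insert() logic above can be assembled into one small runnable program. This is a minimal sketch, not the author’s full listing: the wrapper class BstDemo, the nesting of Node and Tree inside it, and the sample fData values are assumptions made so the example is self-contained. Inserting 50, 30, 60, 20, 40 in that order reproduces the sample tree shown earlier.

```java
// Illustrative sketch only: BstDemo and the nested-class layout are hypothetical.
class BstDemo {
    static class Node {
        int iData;       // key
        double fData;    // other data
        Node leftChild, rightChild;
    }

    static class Tree {
        private Node root;

        // Returns the node holding key, or null if the key is absent or the tree is empty.
        Node find(int key) {
            Node current = root;
            while (current != null && current.iData != key) {
                current = (key < current.iData) ? current.leftChild : current.rightChild;
            }
            return current;
        }

        // Walks down from the root and links the new node in as a leaf.
        void insert(int id, double dd) {
            Node newNode = new Node();
            newNode.iData = id;
            newNode.fData = dd;
            if (root == null) { root = newNode; return; }
            Node current = root;
            while (true) {
                Node parent = current;               // last non-null node on the path
                if (id < current.iData) {
                    current = current.leftChild;
                    if (current == null) { parent.leftChild = newNode; return; }
                } else {                             // larger keys and duplicates go right
                    current = current.rightChild;
                    if (current == null) { parent.rightChild = newNode; return; }
                }
            }
        }
    }

    public static void main(String[] args) {
        Tree theTree = new Tree();
        int[] keys = {50, 30, 60, 20, 40};
        for (int k : keys) theTree.insert(k, k / 10.0);   // fData values are arbitrary
        System.out.println(theTree.find(40) != null);     // true
        System.out.println(theTree.find(45) != null);     // false
    }
}
```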

Traversing the Tree
• Traversing the tree means ‘visiting’ all the nodes in some kind of specified order.
• Not particularly fast: every node must be visited, so a traversal takes O(n) time. Recursion makes it simple to code, not faster.
• Three basic ways (there are several others…):
  • Preorder traversal (NLR scan or traversal)
  • Inorder traversal (LNR traversal)
  • Postorder traversal (LRN traversal)
• Most common is inorder.
• We’ll start here.

Inorder Traversal – Binary Tree (LNR Traversal)
• On a binary search tree, results in an ascending scan based on key values.
• Simplest way to traverse a tree is via recursion (ahead).
• Start with ‘a’ node as an argument (initially the root; in the recursive calls it is the root of a subtree).
• The recursive routine will:
  • Call itself to traverse the node’s left subtree.
  • Visit the node (implies do something with it!).
  • Call itself to traverse the node’s right subtree.
• The traversal method doesn’t pay attention to key values; it only checks whether the node has children.

Java Code for Traversing – Inorder Recursive…
• These traversals typically take a total of three statements, if executed recursively.
• Here’s the code:

private void inOrder(Node localRoot)       // initially called with root, as in inOrder(root);
{
    if (localRoot != null)
    {
        inOrder(localRoot.leftChild);
        System.out.print(localRoot.iData + " ");
        inOrder(localRoot.rightChild);
    } // end if
} // end inOrder()

It continues until there are no more nodes to visit. All nodes visited! Let’s examine how this works:

Execute the algorithm: LNR Traversal (inOrder)

private void inOrder(Node localRoot)       // initially called with root, as in inOrder(root);
{
    if (localRoot != null)
    {
        inOrder(localRoot.leftChild);
        System.out.print(localRoot.iData + " ");
        inOrder(localRoot.rightChild);
    }
} // end inOrder()

Sample tree: 50 is the root; 30 and 60 are its children; 20 and 40 are the children of 30.
inOrder means priority Left!!!!
• Start with 50 (root). Check left child (L): not null.
  • Recursive call with 30 as root. Check left child (L): not null.
    • Recursive call with 20 as root. Check left child (L): null.
    • Visit 20 (print). (N)
    • Recursive call with right child of 20 (R): null. Completed most recent call.
  • Visit 30 (print). (N)
  • Recursive call with right child, 40. (R)
    • Recursive call with left child of 40 (L): null.
    • Visit 40 (print). (N)
    • Recursive call with right child of 40 (R): null. Completed another call.
• Visit 50 (print). (N)
• Recursive call with right subtree, 60 (R): not null.
  • Recursive call with left subtree of 60 (L): null.
  • Visit 60 (print). (N)
  • Recursive call with right subtree of 60 (R): null.
• Done! Output: 20 30 40 50 60
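The trace above can be checked by running the traversal on the same five-node tree. In this sketch the visit step appends the key to a list instead of printing it, so the result is easy to inspect; that change, the Node constructor, and the names InOrderDemo and sampleTree are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and the list-collecting visit step are hypothetical.
class InOrderDemo {
    static class Node {
        int iData;
        Node leftChild, rightChild;
        Node(int key, Node left, Node right) {
            iData = key;
            leftChild = left;
            rightChild = right;
        }
    }

    // LNR: left subtree, then the node itself, then the right subtree.
    static void inOrder(Node localRoot, List<Integer> visited) {
        if (localRoot != null) {
            inOrder(localRoot.leftChild, visited);
            visited.add(localRoot.iData);          // the "visit"
            inOrder(localRoot.rightChild, visited);
        }
    }

    // 50 at the root; 30 and 60 below it; 20 and 40 below 30.
    static Node sampleTree() {
        Node thirty = new Node(30, new Node(20, null, null), new Node(40, null, null));
        return new Node(50, thirty, new Node(60, null, null));
    }

    public static void main(String[] args) {
        List<Integer> visited = new ArrayList<>();
        inOrder(sampleTree(), visited);
        System.out.println(visited);   // [20, 30, 40, 50, 60]
    }
}
```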

Priority: L-N-R • Inorder traversal (LNR) means go to the left if at all possible; Continue going to the left as much as possible. THEN: • Next, do the root (node) N; • Lastly go to the right, R. • But even when you go to the right, before you process it, you must determine if you can go to the left again, as much as possible… recursively…

Inorder, preorder, postorder get their names from the position of N in LNR, NLR, and LRN. • In traversing the tree, the first letter in the scan is the priority, second letter is second priority, and third letter is last priority.

Pre-Order and Post-Order Traversals
• These are NLR and LRN traversals.
• What do you think?
• Of course! Same three statements in the algorithm, but we change their order!!
• In NLR, we visit the node first (process it), and then go to the left.
  • If the left child is not null, it is now the root, N, of the left subtree. We visit it and go to the left again… etc.
  • Only when the left subtree is finished do we go to the right.
• Simple priorities: N – L – R and L – R – N, each of which yields a different ordering!!
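To show how reordering the same three statements changes the result, here is a minimal sketch of both traversals applied to the five-node sample tree (50 at the root; 30 and 60 below it; 20 and 40 below 30). The names TraversalOrderDemo and sampleTree, and collecting keys in a list rather than printing them, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical.
class TraversalOrderDemo {
    static class Node {
        int iData;
        Node leftChild, rightChild;
        Node(int key, Node left, Node right) {
            iData = key;
            leftChild = left;
            rightChild = right;
        }
    }

    // NLR: visit the node, then its left subtree, then its right subtree.
    static void preOrder(Node localRoot, List<Integer> visited) {
        if (localRoot != null) {
            visited.add(localRoot.iData);
            preOrder(localRoot.leftChild, visited);
            preOrder(localRoot.rightChild, visited);
        }
    }

    // LRN: left subtree, then right subtree, and the node itself last.
    static void postOrder(Node localRoot, List<Integer> visited) {
        if (localRoot != null) {
            postOrder(localRoot.leftChild, visited);
            postOrder(localRoot.rightChild, visited);
            visited.add(localRoot.iData);
        }
    }

    static Node sampleTree() {
        Node thirty = new Node(30, new Node(20, null, null), new Node(40, null, null));
        return new Node(50, thirty, new Node(60, null, null));
    }

    public static void main(String[] args) {
        List<Integer> pre = new ArrayList<>();
        preOrder(sampleTree(), pre);
        System.out.println("preorder:  " + pre);    // [50, 30, 20, 40, 60]

        List<Integer> post = new ArrayList<>();
        postOrder(sampleTree(), post);
        System.out.println("postorder: " + post);   // [20, 40, 30, 60, 50]
    }
}
```

Note that the root comes first in preorder and last in postorder, which is exactly what the position of N in NLR and LRN says.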

Pre-Order and Post-order Traversals • There are specific applications for these types of traversals. • Ahead… • A binary tree (not a binary search tree) can be used to represent algebraic expressions that involve binary operators. • The root holds the operator and the other nodes hold a variable or another operator. • Each of the subtrees is a valid algebraic expression. • Consider the following:

Pre-Order Traversal:
[Figure: expression tree with * at the root; its left child is +, which has children A and B; its right child is C]
• This tree represents the expression (A+B)*C. Of course, this is called ‘infix’ notation, which we are used to.
• For preorder traversals, NLR, we have the algorithm:
  • Visit the node.
  • Call itself to traverse the node’s left subtree.
  • Call itself to traverse the node’s right subtree.
• See the priority???
• For this tree, preorder gives: *+ABC
• This is also called ‘prefix’ notation. Advantage: parentheses are never required.
• Each operator is applied to the next two operands. Here + is applied to A and B; the result, A+B, is a temporary that counts as a single operand; then * is applied to that result and C, giving (A+B)*C.
• Of course, there are more advanced parse trees! (Different one in book)

Post-Order Traversal – LRN
[Figure: the tree in the book, with * at the root; its left child is A; its right child is +, which has children B and C. It represents A*(B+C).]
• You can guess the execution sequence:
  • Call itself to traverse the node’s left subtree.
  • Call itself to traverse the node’s right subtree.
  • Visit the node, N.
• Will use the tree in the book.
• Priority: L R N in that order.
• So, ABC+*
My words:
• Start at *. Go left: I’m at A. Can I go to the left? No. Can I go to the right? No. Therefore visit A.
• Go back to the node, *. Have gone left; no more. Then go right: I’m at +. But before I visit, any chance to go left? Yes! Go left: I’m at B.
• Any more to the left? No. Anything to the right? No. So visit B.
• Go back to the node, +. Next priority is to the right, or C. But before I ‘visit’ C, can I go to the left? No. Can I go to the right? No. Ergo, visit C.
• Now go back to the node, +. Have visited L and R, so N is left. Visit N, or +.
• Now return to its parent. Have gone to the left; have gone to the right. So visit that node, *. We are done.
• Priority: go left first, then right, then the node (visit): L – R – N.
• This gives us ABC+*, which is postfix notation!!
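The prefix and postfix strings for both expression trees discussed here, (A+B)*C and A*(B+C), can be reproduced with a short sketch. The node class with a char field and the names ExpressionTreeDemo, leaf, prefix, and postfix are assumptions for illustration, not the book’s code.

```java
// Illustrative sketch only: names are hypothetical.
class ExpressionTreeDemo {
    static class Node {
        char symbol;                 // an operator or a variable name
        Node leftChild, rightChild;
        Node(char symbol, Node left, Node right) {
            this.symbol = symbol;
            leftChild = left;
            rightChild = right;
        }
    }

    static Node leaf(char c) { return new Node(c, null, null); }

    // NLR traversal that appends each symbol as it is visited.
    static void preOrder(Node n, StringBuilder out) {
        if (n == null) return;
        out.append(n.symbol);
        preOrder(n.leftChild, out);
        preOrder(n.rightChild, out);
    }

    // LRN traversal that appends each symbol as it is visited.
    static void postOrder(Node n, StringBuilder out) {
        if (n == null) return;
        postOrder(n.leftChild, out);
        postOrder(n.rightChild, out);
        out.append(n.symbol);
    }

    static String prefix(Node root) {
        StringBuilder sb = new StringBuilder();
        preOrder(root, sb);
        return sb.toString();
    }

    static String postfix(Node root) {
        StringBuilder sb = new StringBuilder();
        postOrder(root, sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node first = new Node('*', new Node('+', leaf('A'), leaf('B')), leaf('C'));    // (A+B)*C
        Node second = new Node('*', leaf('A'), new Node('+', leaf('B'), leaf('C')));   // A*(B+C)
        System.out.println(prefix(first) + "  " + postfix(first));     // *+ABC  AB+C*
        System.out.println(prefix(second) + "  " + postfix(second));   // *A+BC  ABC+*
    }
}
```

The two trees hold the same five symbols, yet their prefix and postfix strings differ, which is why neither notation needs parentheses.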

Evaluating a Postfix Notation – postorder traversal… • So, how would you evaluate an expression in postfix notation? • Create a tree using postfix notation as the input. • Once the tree is built, you will need to look for operators. • Play with this…you may need…
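One possible way to play with this is sketched below. It is an assumption about how the exercise could be approached, not the author’s solution: it builds the tree from a postfix string with a stack (operands become leaves; each operator pops two subtrees and becomes their parent), then evaluates the tree in postorder. To stay short it assumes single-digit operands, the four operators + - * /, integer division, and well-formed input; all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: one possible approach, with simplifying assumptions.
class PostfixTreeDemo {
    static class Node {
        char symbol;                 // a digit or an operator
        Node leftChild, rightChild;
        Node(char symbol, Node left, Node right) {
            this.symbol = symbol;
            leftChild = left;
            rightChild = right;
        }
    }

    // Builds an expression tree from well-formed postfix input such as "234+*".
    static Node build(String postfix) {
        Deque<Node> stack = new ArrayDeque<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                stack.push(new Node(c, null, null));       // operand: a leaf
            } else {
                Node right = stack.pop();                  // most recent subtree is the right operand
                Node left = stack.pop();
                stack.push(new Node(c, left, right));      // operator: parent of both
            }
        }
        return stack.pop();                                // the root
    }

    // Postorder evaluation: left value, right value, then apply the node's operator.
    static int evaluate(Node n) {
        if (n.leftChild == null && n.rightChild == null) {
            return n.symbol - '0';                         // leaf: single digit
        }
        int left = evaluate(n.leftChild);
        int right = evaluate(n.rightChild);
        switch (n.symbol) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;                 // integer division
            default:  throw new IllegalArgumentException("unknown operator: " + n.symbol);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(build("234+*")));   // 2*(3+4) = 14
        System.out.println(evaluate(build("23+4*")));   // (2+3)*4 = 20
    }
}
```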

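Finding Maximum and Minimum Values
• Pretty easy to do in a binary search tree.
• Minimum: start at the root and keep going to the left child. The node that has no left child holds the minimum. (It is also the first node an LNR scan visits.)
• Maximum? Keep going to the right until a node has no right child. Same song.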
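The two walks described above can be sketched in a few lines. The names MinMaxDemo, minimum, maximum, and sampleTree are assumptions for illustration; the tree is the five-node sample (50 at the root; 30 and 60 below it; 20 and 40 below 30).

```java
// Illustrative sketch only: names are hypothetical.
class MinMaxDemo {
    static class Node {
        int iData;
        Node leftChild, rightChild;
        Node(int key, Node left, Node right) {
            iData = key;
            leftChild = left;
            rightChild = right;
        }
    }

    // Follow left children until there are none; returns null for an empty tree.
    static Node minimum(Node root) {
        if (root == null) return null;
        Node current = root;
        while (current.leftChild != null) current = current.leftChild;
        return current;
    }

    // Follow right children until there are none; returns null for an empty tree.
    static Node maximum(Node root) {
        if (root == null) return null;
        Node current = root;
        while (current.rightChild != null) current = current.rightChild;
        return current;
    }

    static Node sampleTree() {
        Node thirty = new Node(30, new Node(20, null, null), new Node(40, null, null));
        return new Node(50, thirty, new Node(60, null, null));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println("min = " + minimum(root).iData);   // min = 20
        System.out.println("max = " + maximum(root).iData);   // max = 60
    }
}
```

Each walk touches only the nodes on one path from the root, so no full traversal is needed.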
