
Data Structures 2-3-4 Trees

Phil Tayco. Slide version 1.1, Apr. 14, 2018.


Presentation Transcript


  1. Data Structures: 2-3-4 Trees. Phil Tayco. Slide version 1.1, Apr. 14, 2018.

  2. 2-3-4 Trees: Binary trees revisited. Binary trees combine the best of both worlds: dynamic memory usage, and binary search like you could perform on a sorted array. Searching a binary tree only achieves O(log n) as long as the tree is balanced. The balance of a tree depends on the order in which nodes are inserted and deleted, which can lead to imbalance. In the worst case, imbalance leads to O(n) search performance, at which point the tree is basically a linked list.
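
To make the imbalance concrete, here is a minimal sketch (not from the original slides) of a plain binary search tree with no rebalancing. The class names `BstNode` and `BstHeightDemo` are made up for illustration. Inserting keys in sorted order produces a chain as tall as the number of keys, while a friendlier insertion order produces a short tree.

```java
// Hypothetical names; a plain, unbalanced binary search tree.
class BstNode {
    int key;
    BstNode left, right;
    BstNode(int key) { this.key = key; }
}

class BstHeightDemo {
    // Standard BST insert with no rebalancing (iterative).
    static BstNode insert(BstNode root, int key) {
        BstNode node = new BstNode(key);
        if (root == null) return node;
        BstNode cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = node; break; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = node; break; }
                cur = cur.right;
            }
        }
        return root;
    }

    // Number of nodes on the longest root-to-leaf path.
    static int height(BstNode n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    static BstNode build(int[] keys) {
        BstNode root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        int[] mixed  = {4, 2, 6, 1, 3, 5, 7};
        System.out.println(height(build(sorted))); // 7: every node has only a right child
        System.out.println(height(build(mixed)));  // 3: perfectly balanced
    }
}
```

The same seven keys give a height of 7 or 3 depending only on insertion order, which is the problem the rest of the deck sets out to remove.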

  3. 2-3-4 Trees: Advanced tree ideas. With each new structure, we try to address the cons of the previous structure; in this case, keeping the tree balanced during insert and delete. There are tree algorithms that already do this: AVL trees and red-black trees. These keep the basic structure of a binary tree intact but are functionally more complex.

  4. 2-3-4 Trees: Advanced tree ideas (continued). As we have done when learning other structures, we explore changes to the way nodes are built and the rules supporting them. Binary tree nodes added more pointers than a linked list node, along with rules about which values belong in the left and right children of a node. We can do the same thing here to try to maintain balance in a tree.

  5. 2-3-4 Trees: Multiway tree node. What if we modified the tree node so that each node contains multiple data elements and multiple child links? The tree shape is still present, suggesting that the advantages of a tree are mostly retained. To address balance, we need to establish a set of rules. Example tree: root [20 40 60] with children [10], [30], [50], [70 80].

  6. 2-3-4 Trees: Multiway tree. Leaf nodes can hold any number of data items up to the node's capacity of 3. Example tree: root [20 40 60] with children [10], [30 31 32], [50], [70 80].

  7. 2-3-4 Trees: Multiway tree. A non-leaf node with 1 data item always has 2 children. Example tree: root [20] with children [10], [30].

  8. 2-3-4 Trees: Multiway tree. A non-leaf node with 2 data items always has 3 children. Example tree: root [20 40] with children [10], [30], [50 60].

  9. 2-3-4 Trees: Multiway tree. A non-leaf node with 3 data items always has 4 children. Example tree: root [20 40 60] with children [10], [30], [50], [70 80].

  10. 2-3-4 Trees: Multiway tree. As before, order is maintained: the child node to the left of a data item holds smaller values and the child node to its right holds larger values. Example tree: root [20 40 60] with children [10], [30 31 32], [50], [70 80].
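
The node rules on the last few slides can be sketched as a small checker. This is only an illustration: `MultiNode` and `MultiwayRules` are hypothetical names (the deck's own `Node234` class comes later), values are assumed to be distinct, and the check covers item counts, child counts and ordering but not whether all leaves sit at the same depth.

```java
// Hypothetical node type for illustration only.
class MultiNode {
    final int[] items;       // 1 to 3 data items, ascending
    final MultiNode[] kids;  // empty for a leaf, items.length + 1 otherwise
    MultiNode(int[] items, MultiNode... kids) { this.items = items; this.kids = kids; }
}

class MultiwayRules {
    // True if the subtree under n obeys the rules and holds only values strictly between lo and hi.
    static boolean isValid(MultiNode n, long lo, long hi) {
        if (n.items.length < 1 || n.items.length > 3) return false;
        long prev = lo;
        for (int v : n.items) {               // items ascending and inside the allowed range
            if (v <= prev || v >= hi) return false;
            prev = v;
        }
        if (n.kids.length == 0) return true;  // leaf: nothing else to check
        if (n.kids.length != n.items.length + 1) return false;
        for (int i = 0; i < n.kids.length; i++) {
            long childLo = (i == 0) ? lo : n.items[i - 1];
            long childHi = (i == n.items.length) ? hi : n.items[i];
            if (!isValid(n.kids[i], childLo, childHi)) return false;
        }
        return true;
    }

    static MultiNode leaf(int... items) { return new MultiNode(items); }

    public static void main(String[] args) {
        // The example tree: root [20 40 60] with children [10], [30 31 32], [50], [70 80].
        MultiNode good = new MultiNode(new int[]{20, 40, 60},
                leaf(10), leaf(30, 31, 32), leaf(50), leaf(70, 80));
        // Two data items but only two children: breaks the non-leaf rule.
        MultiNode bad = new MultiNode(new int[]{20, 40}, leaf(10), leaf(30));
        System.out.println(isValid(good, Long.MIN_VALUE, Long.MAX_VALUE)); // true
        System.out.println(isValid(bad, Long.MIN_VALUE, Long.MAX_VALUE));  // false
    }
}
```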

  11. 2-3-4 Trees: Similarities to binary trees. While the number of data items and children per node has increased, the basic ordering is the same. This promotes search and insert performance similar to a balanced binary tree, at O(log n). Search starts at the root, compares the node's data items against the search value, and traverses down to the appropriate child. Insert adds new data items at the appropriate leaf. The algorithms will show that balance is always maintained, which is what keeps search and insert at O(log n).
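
The search described above might look like the following sketch. It is not the deck's code: `SearchNode` and `Search234` are hypothetical names, and the node is a simplified read-only structure built by hand.

```java
// Hypothetical read-only node: ascending items, and either no kids (leaf) or items.length + 1 kids.
class SearchNode {
    final int[] items;
    final SearchNode[] kids;
    SearchNode(int[] items, SearchNode... kids) { this.items = items; this.kids = kids; }
}

class Search234 {
    // Walk down from the root, choosing one child per level.
    static boolean contains(SearchNode root, int value) {
        SearchNode cur = root;
        while (true) {
            int i = 0;
            while (i < cur.items.length && value > cur.items[i]) i++;  // skip smaller items
            if (i < cur.items.length && cur.items[i] == value) return true;
            if (cur.kids.length == 0) return false;                    // leaf reached, not found
            cur = cur.kids[i];                                         // child between items i-1 and i
        }
    }

    static SearchNode leaf(int... items) { return new SearchNode(items); }

    public static void main(String[] args) {
        SearchNode root = new SearchNode(new int[]{20, 40, 60},
                leaf(10), leaf(30, 31, 32), leaf(50), leaf(70, 80));
        System.out.println(contains(root, 31)); // true
        System.out.println(contains(root, 45)); // false
    }
}
```

Each loop iteration descends one level, so the work is proportional to the height of the tree.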

  12. 2-3-4 Trees: Insert. New data items are inserted at the leaf level. To maintain balance, as we perform the normal search for the appropriate leaf, we add a rule to the algorithm: when visiting any node, if it is full, “split” the node. Whether or not a split has occurred, continue down the path using the standard search until a leaf node is reached. Once a leaf is reached, add the new data element to it (if the leaf is full, perform another “split” first).

  13. 2-3-4 Trees: Split. Splitting a node requires creating a new parent node or modifying an existing one, as well as creating a new sibling node. Data elements are moved and child pointers are readjusted as follows. (1) A new node is created as a sibling to the full node. (2) The 3rd data item of the full node is moved to the sibling node as its 1st data item. (3) The 2nd data item of the full node is added to the parent node. (4) The 1st data item of the full node remains where it is. (5) The 3rd and 4th child pointers of the full node move to the sibling node as its 1st and 2nd child pointers.

  14. 2-3-4 Trees: Split example 1. We want to add 5 to the tree below. We start at root; its 1st data item is 14, so we go down the 1st child pointer. We see that node is full, so we must split it. Tree: root [14]; children [3 6 10], [17]; leaves [1 2], [4], [7 8], [12] under [3 6 10] and [16], [18 20] under [17].

  15. 2-3-4 Trees: Step 1: Create new sibling node. Notice that the parent node in this case is root and the sibling is not yet attached to the parent (the 2nd child pointer of root still points to the node containing 17). Tree: root (parent) [14]; (current) [3 6 10], new empty (sibling), [17]; leaves [1 2], [4], [7 8], [12], [16], [18 20].

  16. 2-3-4 Trees: Step 2: Move the 3rd item to the new node as its 1st item. 10 moves from current to the new sibling node. Tree: root (parent) [14]; (current) [3 6], (sibling) [10], [17]; leaves [1 2], [4], [7 8], [12], [16], [18 20].

  17. 2-3-4 Trees: Step 3: Move 2nd item to parent. Notice 6 is inserted into the data item list of parent. This shifts 14, and the child pointers around it, to make room. Tree: root (parent) [6 14]; (current) [3], (sibling) [10], [17]; leaves [1 2], [4], [7 8], [12], [16], [18 20].

  18. 2-3-4 Trees: Step 4: Move the 3rd and 4th child pointers of current to the sibling as its 1st and 2nd child pointers. This keeps the parent-child relationships and orders intact and balanced. Tree: root (parent) [6 14]; (current) [3] with leaves [1 2], [4]; (sibling) [10] with leaves [7 8], [12]; [17] with leaves [16], [18 20].

  19. 2-3-4 Trees: Split Analysis. The split keeps the non-leaf and leaf rules intact: it guarantees that non-leaf nodes with 1, 2 or 3 data items have 2, 3 or 4 child nodes. The split is performed as full nodes are encountered on the way down. In the previous example, the insert of 5 still has not been performed. The insert process resumes at the parent. Note that if the parent becomes full as a result of the split, it is not split at this point.

  20. 2-3-4 Trees: Resume insert at parent. 5 is less than 6, so we go down child pointer 1. 5 is greater than 3 and there is only 1 data item, so we go down the 2nd child pointer. The node with data item 4 is a leaf and is not full, so we add 5 there. Tree: root [6 14]; children [3], [10], [17]; leaves [1 2], [4 5] under [3]; [7 8], [12] under [10]; [16], [18 20] under [17].

  21. 2-3-4 Trees: Insert Analysis. The algorithm keeps the tree balanced. New nodes are created as needed by adding siblings before adding levels. Levels are increased only when the root node is the one that requires splitting. When splitting the root, the same split algorithm applies, but instead of adding the 2nd data item to an existing parent node, a new parent node is created (as the new root).

  22. 2-3-4 Trees: Splitting the root. Here, we will insert 15. Before we even go down to a child node, we must split the root because it is full. Tree: root [20 40 60]; leaves [10], [30 31 32], [50], [70 80].

  23. 2-3-4 Trees: Step 1: Create the sibling node. The algorithm works the same as before, except there is no “parent” node (yet). Tree: (current) root [20 40 60], new empty (sibling); leaves [10], [30 31 32], [50], [70 80].

  24. 2-3-4 Trees: Step 2: Create new root as parent. Since the current node is root, we create another new node to be the parent (and new root). Tree: new empty (parent); (current) root [20 40 60], empty (sibling); leaves [10], [30 31 32], [50], [70 80].

  25. 2-3-4 Trees: Step 3: Move data items. The normal split occurs: the 3rd item of current moves to the 1st position of sibling and the 2nd item of current moves to the 1st position of parent. Tree: (parent) [40]; (current) root [20], (sibling) [60]; leaves [10], [30 31 32], [50], [70 80].

  26. 2-3-4 Trees: Step 4: Update pointers. The 3rd and 4th child pointers of current become the 1st and 2nd child pointers of sibling. The 1st and 2nd child pointers of the new parent point to the current and sibling nodes, respectively. Tree: (parent) [40]; (current) root [20] with leaves [10], [30 31 32]; (sibling) [60] with leaves [50], [70 80].

  27. 2-3-4 Trees: Step 5: New root and continue. Make the parent the new root of the tree. Resume the insert from the root (15 ends up going down to the leaf node containing 10 and is added there). Notice that the full leaf node [30 31 32] is not split. This is because it is never visited. Tree: (root) [40]; children [20], [60]; leaves [10 15], [30 31 32] under [20]; [50], [70 80] under [60].

  28. 2-3-4 Trees: Insert Analysis. Splitting only occurs when a visited node is full, keeping the 2-3-4 tree rules intact. Levels of the tree increase “upward” when the root node is full (because the new parent is created at that moment and becomes the new root). Splitting a leaf node will never result in more than 4 children for a parent node (if the parent node had 4 children, it would be full and would have been split before any of its child leaf nodes were reached). Balance is maintained because, even if one side gets “heavy” with data items, every path from the root to a leaf keeps the same length: new levels are only ever added at the root. The best way to understand the algorithm is to insert a series of numbers and draw the resulting tree.
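
For checking hand-drawn trees, a compact sketch along these lines may help. It is not the deck's implementation: `TinyNode` and `Tiny234` are hypothetical names, there are no parent pointers, and a full child is split from its parent just before it is visited (the formulation commonly used for B-trees). Under those assumptions the splits happen at the same moments as in the algorithm described above.

```java
// Hypothetical compact node: fixed arrays sized for a full node.
class TinyNode {
    int n;                                   // data items in use (0..3)
    final int[] items = new int[3];
    final TinyNode[] kids = new TinyNode[4];
    boolean isLeaf() { return kids[0] == null; }
}

class Tiny234 {
    TinyNode root = new TinyNode();

    void insert(int value) {
        if (root.n == 3) {                   // full root: the tree grows upward
            TinyNode newRoot = new TinyNode();
            newRoot.kids[0] = root;
            splitChild(newRoot, 0);
            root = newRoot;
        }
        TinyNode cur = root;
        while (!cur.isLeaf()) {
            int i = childIndex(cur, value);
            if (cur.kids[i].n == 3) {        // split a full child before visiting it
                splitChild(cur, i);
                i = childIndex(cur, value);  // the promoted item may change the path
            }
            cur = cur.kids[i];
        }
        int pos = cur.n;                     // the leaf has room: shift larger items right
        while (pos > 0 && cur.items[pos - 1] > value) {
            cur.items[pos] = cur.items[pos - 1];
            pos--;
        }
        cur.items[pos] = value;
        cur.n++;
    }

    static int childIndex(TinyNode node, int value) {
        int i = 0;
        while (i < node.n && value >= node.items[i]) i++;
        return i;
    }

    // Split the full child p.kids[i]: 2nd item moves up, 3rd item and the
    // 3rd/4th children move to a new right sibling.
    static void splitChild(TinyNode p, int i) {
        TinyNode full = p.kids[i];
        TinyNode sib = new TinyNode();
        sib.items[0] = full.items[2];
        sib.n = 1;
        sib.kids[0] = full.kids[2];
        sib.kids[1] = full.kids[3];
        full.kids[2] = null;
        full.kids[3] = null;
        for (int j = p.n; j > i; j--) {      // open a slot in the parent
            p.items[j] = p.items[j - 1];
            p.kids[j + 1] = p.kids[j];
        }
        p.items[i] = full.items[1];
        p.kids[i + 1] = sib;
        p.n++;
        full.n = 1;                          // stale slots beyond n are ignored
    }

    // One line of text per level, each node in brackets.
    String draw() {
        int h = 1;
        for (TinyNode t = root; !t.isLeaf(); t = t.kids[0]) h++;
        StringBuilder[] rows = new StringBuilder[h];
        for (int d = 0; d < h; d++) rows[d] = new StringBuilder();
        fill(root, 0, rows);
        StringBuilder out = new StringBuilder();
        for (StringBuilder r : rows) out.append(r.toString().trim()).append('\n');
        return out.toString();
    }

    static void fill(TinyNode t, int depth, StringBuilder[] rows) {
        rows[depth].append('[');
        for (int i = 0; i < t.n; i++) rows[depth].append(i == 0 ? "" : " ").append(t.items[i]);
        rows[depth].append("] ");
        for (int i = 0; i <= t.n && !t.isLeaf(); i++) fill(t.kids[i], depth + 1, rows);
    }

    public static void main(String[] args) {
        Tiny234 t = new Tiny234();
        for (int v = 10; v <= 70; v += 10) t.insert(v);
        System.out.print(t.draw());
        // [20 40]
        // [10] [30] [50 60 70]
    }
}
```

Inserting 10 through 70 in order would turn a plain binary search tree into a chain, yet here every leaf ends up on the same level.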

  29. 2-3-4 Trees

          public class Node234 {
              private int numItems;        // how many data items are in use
              private Node234 parent;      // link back to the parent node
              private Node234[] children;  // up to 4 child pointers
              private int[] dataItems;     // up to 3 data items

  30. 2-3-4 Trees

          public Node234() {
              numItems = 0;
              parent = null;
              children = new Node234[4];
              dataItems = new int[3];
              for (int n = 0; n < 4; n++)
                  children[n] = null;
              for (int n = 0; n < 3; n++)
                  dataItems[n] = -1;       // -1 marks an empty slot
          }

  31. 2-3-4 Trees

          public class Tree234 {
              private Node234 root;

              public Tree234() {
                  root = new Node234();    // an empty node, not null
              }

  32. 2-3-4 Trees: Node234 and Tree234 Code. More properties are needed here for the node: numItems, to keep track of how many data items are in the node; a reference to the parent node (useful for handling splits); an array of child pointers; and an array of data items. The arrays are sized in the constructor and their elements initialized to null (for children) and -1 (for data items). We could also use a linked list for the child and data arrays, but they are so small that we don't need to, and arrays keep the code simple to start. The tree is just the root node. Note that it is not initialized to null, but to a new Node234 object with no data items.

  33. 2-3-4 Trees

          public void insert(int value) {
              Node234 current = root;
              while (true) {
                  if (current.isFull()) {
                      split(current);
                      // Back up to the parent, then pick the child to continue with.
                      current = current.getParent();
                      current = getNextChild(current, value);
                  }

  34. 2-3-4 Trees

                  else if (current.isLeaf())
                      break;
                  else
                      current = getNextChild(current, value);
              }
              current.insertItem(value);
          }

  35. 2-3-4 Trees: Tree234 Insert Code. We start with the current node at root. The loop goes down the child nodes of the tree until we reach a leaf node, at which point we insert the value. Along the way, if the isFull method of the node returns true, we have to split it. After the split, we set current to its parent and then find the appropriate child to go to based on the value to be inserted (getNextChild). Many methods are being used here: isFull, isLeaf, getParent, split, getNextChild, and insertItem.

  36. 2-3-4 Trees

          public boolean isFull() {
              return (numItems == 3);
          }

          public Node234 getParent() {
              return parent;
          }

          public boolean isLeaf() {
              return (children[0] == null);   // a leaf has no children at all
          }

  37. 2-3-4 Trees

          public Node234 getNextChild(Node234 n, int value) {
              int i;
              int numItems = n.getNumItems();
              for (i = 0; i < numItems; i++)
                  if (value < n.getDataItem(i))
                      return n.getChild(i);   // child to the left of item i
              return n.getChild(i);           // rightmost child in use
          }

  38. 2-3-4 Trees: getNextChild Code. This method is designed to find the child node where the given value could be located (it does not check whether the current node contains the value as a dataItem). Using a loop to go through the dataItems array, if the given value is less than the dataItem in the node, the child to the “left” of that dataItem is returned. If the loop goes through all the dataItems without returning, the “right”most child node of the available dataItems is returned. getNextChild also uses the getNumItems() method to determine the number of dataItems in the node.
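
The selection rule can be tried in isolation on a plain array. This is a sketch with a made-up class name (`NextChildDemo`) that returns the index of the child to follow rather than the child node itself.

```java
class NextChildDemo {
    // Index of the child to follow for value, given the first numItems entries of items.
    static int nextChildIndex(int[] items, int numItems, int value) {
        int i;
        for (i = 0; i < numItems; i++)
            if (value < items[i]) return i;  // value belongs to the left of items[i]
        return i;                            // not less than any item: rightmost child in use
    }

    public static void main(String[] args) {
        int[] items = {20, 40, -1};          // a node with 2 data items, so 3 children
        System.out.println(nextChildIndex(items, 2, 10)); // 0
        System.out.println(nextChildIndex(items, 2, 30)); // 1
        System.out.println(nextChildIndex(items, 2, 50)); // 2
    }
}
```

Note that a value equal to a data item is sent to the child on that item's right, because the comparison is a strict less-than.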

  39. 2-3-4 Trees

          public int getNumItems() {
              return numItems;
          }

      This method is a standard get function of a class, returning the numItems property.

  40. 2-3-4 Trees

          public int insertItem(int data) {
              numItems++;
              for (int n = 2; n >= 0; n--) {
                  if (dataItems[n] == -1)
                      continue;

      From right to left of the data items array, we check for non-empty data items (denoted as not equal to -1). If a spot is empty, we ignore it.

  41. 2-3-4 Trees

                  else {
                      int d = dataItems[n];
                      if (data < d)
                          dataItems[n + 1] = dataItems[n];   // shift right by 1
                      else {
                          dataItems[n + 1] = data;
                          return n + 1;
                      }
                  }
              }
              dataItems[0] = data;
              return 0;
          }

  42. 2-3-4 Trees: Node234 Code: inserting a data item. The “else” branch deals with encountering a data item as we go right to left in the data array looking for the correct place to insert the new data item. When a data item is found, compare it to the new item. If the new item is less than it, the new item belongs to its left, so we shift the data item in the array to the right by 1 (similar to the insertionSort algorithm). Otherwise, the new data item belongs to the right of this item in the array, so we set it there and return that index. If we reach the end of the loop, all data items in the array have shifted to the right and the new item belongs in the first spot (index 0). We insert it there and return that index. This method assumes there is space to insert the value in the dataItems array. Why is that okay here in the context of the overall insert algorithm?
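
The same shifting logic can be exercised on a bare array. This is a sketch under the slide's assumptions: `InsertItemDemo` is a made-up name, -1 marks an empty slot, and the array must have at least one empty slot on entry.

```java
class InsertItemDemo {
    static final int EMPTY = -1;

    // Mirrors the slide's insertItem on a plain array; returns the index used.
    // Assumes at least one EMPTY slot, i.e. the node is not full.
    static int insertItem(int[] dataItems, int data) {
        for (int n = dataItems.length - 1; n >= 0; n--) {
            if (dataItems[n] == EMPTY) continue;                       // skip empty slots
            if (data < dataItems[n]) dataItems[n + 1] = dataItems[n];  // shift right by 1
            else { dataItems[n + 1] = data; return n + 1; }            // goes right of item n
        }
        dataItems[0] = data;                                           // smallest so far
        return 0;
    }

    public static void main(String[] args) {
        int[] items = {10, 30, EMPTY};
        int index = insertItem(items, 20);
        System.out.println(index + " " + java.util.Arrays.toString(items)); // 1 [10, 20, 30]
    }
}
```

If the array were already full, the first shift would write to index 3 and throw an ArrayIndexOutOfBoundsException, which is why the question on the slide matters: the method relies on its callers never passing a full node.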

  43. 2-3-4 Trees

          private void split(Node234 n) {
              int thirdItem = n.removeItem();            // goes to the sibling
              int secondItem = n.removeItem();           // goes to the parent
              Node234 fourthChild = n.removeChild(3);    // goes to the sibling
              Node234 thirdChild = n.removeChild(2);     // goes to the sibling
              Node234 sibling = new Node234();
              Node234 parent;

  44. 2-3-4 Trees: Tree234 Split Code. If you haven't been drawing pictures while going through the code, it is important that you start doing so now. Split begins by removing the 3rd and 2nd data items from the full node and storing their values; these will be transferred to the sibling and parent nodes respectively. We do the same with the 3rd and 4th child pointers of the node, disconnecting them so we can transfer them to the sibling. We then create a new sibling node and declare a parent reference (parent is not a new node yet, as we haven't determined at this point whether the full node is root). The setup is complete, but there are 2 new methods in Node234 to review before continuing with the split function: removeItem and removeChild.

  45. 2-3-4 Trees

          public int removeItem() {
              int lastItem = dataItems[numItems - 1];
              dataItems[--numItems] = -1;
              return lastItem;
          }

      This removes the last data item in the data array (setting it to -1), decrements numItems and returns the value that was removed.

  46. 2-3-4 Trees

          public Node234 removeChild(int n) {
              Node234 child = children[n];
              children[n] = null;
              return child;
          }

      This sets the given child of the node to null while returning a reference to that child. Now we can go back and look at the next part of the split function, which is to establish the parent node.

  47. 2-3-4 Trees

          if (n == root) {
              parent = new Node234();
              n.setParent(parent);
              root = parent;
              root.setChild(0, n);
          }
          else
              parent = n.getParent();

      If the node being split is root, we need a new parent node as root, and we set its first child to the current node being split. Otherwise, a parent exists and we just get that node.

  48. 2-3-4 Trees

          int itemLocation = parent.insertItem(secondItem);
          int parentItems = parent.getNumItems();
          int c = parentItems - 1;
          while (c > itemLocation) {
              // Shift each child right of the new item one slot to the right.
              Node234 temp = parent.removeChild(c);
              parent.setChild(c + 1, temp);
              c--;
          }
          parent.setChild(itemLocation + 1, sibling);

  49. 2-3-4 Trees: Tree234 Split Code: adjusting the parent. The second item from the full node being split is inserted into the parent node using the node's insertItem function. The location of that insert can vary, so it is returned here to determine how to adjust the child pointers of the parent. This is done by getting the number of items in the parent and looping downward from the rightmost child pointer, shifting each child pointer one position to the right until the location of the newly inserted item is reached. Once that shift is complete, there is a “hole” in the child array just to the right of the newly inserted item. This hole is filled by connecting it to the new sibling node just created. This part of the process uses a setChild method that, when used in this context, keeps the child node assignments and their parent pointers intact.
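
The pointer shuffle can be followed on an array of labels instead of real nodes. This is a sketch with made-up names (`ChildShiftDemo`, `attachSibling`), where `numItems` is the parent's item count after the promoted item has been inserted.

```java
class ChildShiftDemo {
    // Shift every child to the right of itemLocation one slot right, then
    // drop the sibling into the hole at itemLocation + 1.
    static void attachSibling(String[] children, int numItems, int itemLocation, String sibling) {
        for (int c = numItems - 1; c > itemLocation; c--) {
            children[c + 1] = children[c];
            children[c] = null;
        }
        children[itemLocation + 1] = sibling;
    }

    public static void main(String[] args) {
        // Split example 1: parent [14] had children (current, node17); 6 was inserted at index 0.
        String[] children = {"current", "node17", null, null};
        attachSibling(children, 2, 0, "sibling");
        System.out.println(java.util.Arrays.toString(children)); // [current, sibling, node17, null]
    }
}
```

When the promoted item lands at the right end of the parent, the loop body never runs and the sibling simply becomes the new rightmost child.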

  50. 2-3-4 Trees

          public void setChild(int i, Node234 n) {
              children[i] = n;
              if (n != null)
                  n.setParent(this);
          }

      This assigns the given node n to a specific location in the children array. If n is an actual node, we reassign its parent pointer to this one. This fixes the current and parent nodes. Now all that remains is to complete configuring the sibling node.
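
The transcript ends before the sibling step is shown. Going by the split rules stated earlier (the 3rd data item becomes the sibling's 1st item, and the 3rd and 4th child pointers become the sibling's 1st and 2nd), one plausible way to finish is sketched below together with the pieces already covered, so that it can be run. The last three lines of split, and the accessors setParent, getChild and getDataItem, are assumptions: the slides use them but this transcript does not show them. Access modifiers are loosened so the demo can inspect the tree.

```java
// Sketch in the style of the slides' Node234 and Tree234; "assumed" marks guesses.
class Node234 {
    private int numItems = 0;
    private Node234 parent = null;
    private final Node234[] children = new Node234[4];
    private final int[] dataItems = {-1, -1, -1};       // -1 marks an empty slot

    int getNumItems() { return numItems; }
    Node234 getParent() { return parent; }
    void setParent(Node234 p) { parent = p; }            // assumed
    Node234 getChild(int i) { return children[i]; }      // assumed
    int getDataItem(int i) { return dataItems[i]; }      // assumed
    boolean isFull() { return numItems == 3; }
    boolean isLeaf() { return children[0] == null; }

    void setChild(int i, Node234 n) {
        children[i] = n;
        if (n != null) n.setParent(this);
    }

    Node234 removeChild(int i) {
        Node234 child = children[i];
        children[i] = null;
        return child;
    }

    int removeItem() {
        int lastItem = dataItems[numItems - 1];
        dataItems[--numItems] = -1;
        return lastItem;
    }

    int insertItem(int data) {                           // assumes the node is not full
        numItems++;
        for (int n = 2; n >= 0; n--) {
            if (dataItems[n] == -1) continue;
            if (data < dataItems[n]) dataItems[n + 1] = dataItems[n];
            else { dataItems[n + 1] = data; return n + 1; }
        }
        dataItems[0] = data;
        return 0;
    }
}

class Tree234 {
    Node234 root = new Node234();   // package-private here so the demo can inspect it

    void insert(int value) {
        Node234 current = root;
        while (true) {
            if (current.isFull()) {
                split(current);
                current = getNextChild(current.getParent(), value);
            } else if (current.isLeaf()) break;
            else current = getNextChild(current, value);
        }
        current.insertItem(value);
    }

    Node234 getNextChild(Node234 n, int value) {
        int i;
        for (i = 0; i < n.getNumItems(); i++)
            if (value < n.getDataItem(i)) return n.getChild(i);
        return n.getChild(i);
    }

    void split(Node234 n) {
        int thirdItem = n.removeItem();
        int secondItem = n.removeItem();
        Node234 fourthChild = n.removeChild(3);
        Node234 thirdChild = n.removeChild(2);
        Node234 sibling = new Node234();
        Node234 parent;
        if (n == root) {
            parent = new Node234();
            root = parent;
            root.setChild(0, n);     // setChild also sets n's parent pointer
        } else parent = n.getParent();
        int itemLocation = parent.insertItem(secondItem);
        for (int c = parent.getNumItems() - 1; c > itemLocation; c--)
            parent.setChild(c + 1, parent.removeChild(c));
        parent.setChild(itemLocation + 1, sibling);
        // Assumed final step, not in the transcript: configure the sibling.
        sibling.insertItem(thirdItem);
        sibling.setChild(0, thirdChild);
        sibling.setChild(1, fourthChild);
    }

    public static void main(String[] args) {
        Tree234 t = new Tree234();
        for (int v = 10; v <= 90; v += 10) t.insert(v);
        System.out.println(t.root.getDataItem(0));   // 40
    }
}
```

Inserting 10 through 90 in order forces a root split while the root has children, so the transfer of child pointers to the sibling is exercised; the root ends up as [40] with children [20] and [60].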
