
Introduction to Computer Science 2 Balanced Binary Search Trees (2) & Extended Binary Trees



  1. Introduction to Computer Science 2 Balanced Binary Search Trees (2) & Extended Binary Trees Prof. Neeraj Suri Brahim Ayari

  2. Height of AVL Trees • AVL trees are defined by the height difference of subtrees • Original goal: the tree should be as “balanced” as possible • How balanced is an AVL tree? • The answer is given by the theorem on the height of an AVL tree: Theorem: for the height h(T) of an AVL tree with n nodes: ⌊log2n⌋ + 1 ≤ h(T) ≤ 1.44 log2(n+1)

  3. Fibonacci Trees • The lower bound ⌊log2n⌋ + 1 ≤ h(T) comes from the minimal height of a balanced binary tree (already shown) • For the proof of the upper bound one needs a special class of AVL trees: Fibonacci trees • Fibonacci numbers: F0 = 0, F1 = 1, Fn = Fn-1 + Fn-2 • Definition: Fibonacci trees are constructed as follows: • The empty tree T0 is a Fibonacci tree (height 0) • The tree T1, which contains only one node, is a Fibonacci tree of height 1 • If Th-1 and Th-2 are Fibonacci trees of heights h-1 and h-2, and x is a node, then Th = (Th-1, x, Th-2) is a Fibonacci tree of height h • No other trees are Fibonacci trees • Observe: the number of nodes on the path from the root to the deepest leaf gives the height of the Fibonacci tree!
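The recursive construction can be checked with a short program. A minimal sketch (the class name FibTreeCount is illustrative, not from the slides) that computes the node counts nh from the definition above and compares them against the identity nh = Fh+2 - 1 used on a later slide:

```java
public class FibTreeCount {
    // Number of nodes in the Fibonacci tree T_h = (T_{h-1}, x, T_{h-2}):
    // n_h = n_{h-1} + n_{h-2} + 1
    static long nodes(int h) {
        if (h == 0) return 0;   // T_0 is the empty tree
        if (h == 1) return 1;   // T_1 is a single node
        return nodes(h - 1) + nodes(h - 2) + 1;
    }

    // Fibonacci numbers: F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) { long t = a + b; a = b; b = t; }
        return a;
    }

    public static void main(String[] args) {
        // Check the identity n_h = F_{h+2} - 1 for small h
        for (int h = 0; h <= 10; h++)
            System.out.println("n_" + h + " = " + nodes(h)
                               + ", F_" + (h + 2) + " - 1 = " + (fib(h + 2) - 1));
    }
}
```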

  4. Fibonacci Trees • Number of nodes: n0 = 0 (F0 = 0), n1 = 1 (F1 = 1), n2 = 2 (F2 = 1), n3 = 4 (F3 = 2) • T0: empty tree • T1: one node x • T2 = (T1, x, T0) • T3 = (T2, x, T1)

  5. Fibonacci Trees • Number of nodes: n4 = 7 (F4 = 3), n5 = 12 (F5 = 5) • T4 = (T3, x, T2) • T5 = (T4, x, T3) • T6, T7, etc. analogously

  6. Fibonacci and AVL Trees • To prove: every Fibonacci tree is an AVL tree • Proof (by induction over h): • Note: Th is always a tree of height h • T0 and T1 are AVL trees • If Th-1 and Th-2 are AVL trees, build Th = (Th-1, x, Th-2) according to the rules • As Th-1 and Th-2 are AVL trees, only the balance factor of the root must still be checked • BF(Th) = | h(Th-1) - h(Th-2) | = | (h - 1) - (h - 2) | = 1 ∎

  7. Fibonacci and AVL Trees • Special note: for a given Fibonacci tree there is no AVL tree with the same height and fewer nodes • The construction gives AVL trees of maximal height • One can add nodes while keeping the height, but cannot remove any without violating the AVL criterion (at unchanged height) • Fibonacci trees give the maximal height of an AVL tree for a given number of nodes • Note: the number of nodes nh in Th is the (h+2)-th Fibonacci number minus 1, i.e., nh = Fh+2 - 1 (for h ≥ 0)

  8. Fibonacci and AVL Trees • The following inequality holds for Fibonacci numbers: Fh ≥ Φ^(h-2) for h ≥ 2 and Φ = ½ (1 + √5) • Let n be the number of nodes in an AVL tree of height h. As Th contains a minimal number of nodes: n ≥ nh • Insert nh = Fh+2 - 1: n ≥ nh = Fh+2 - 1 ≥ Φ^h - 1, thus n + 1 ≥ Φ^h • The number of nodes grows exponentially with the height • Conversely: h ≤ logΦ(n + 1) = (1 / log2Φ) log2(n+1) = 1.44... log2(n+1) • Thus: the search path in an AVL tree is in the worst case 44% longer than in a complete tree

  9. Cost Analysis of AVL Trees • h ≤ c·log2(n+1) means: the height of an AVL tree is bounded by O(log2n) • Cost of insertion is in O(log2n) • Only the path from the root to the insertion point must be considered • Rotations have constant cost • Cost of deletion is in O(log2n) • Every node on the path from the root to the deleted node causes at most one rotation • AVL trees are worst-case efficient implementations of binary search trees • Natural trees need Θ(n) steps in the worst case • Calculating the average height is still an open problem • Empirical results give h = c + log2n for c ≈ 0.2

  10. Weight Balanced Binary Search Trees • Treat the “weight difference” of two subtrees as the measure of balance • Weight = number of nodes in a subtree • The properties are very similar to height balanced binary trees • Let T be a binary search tree, TL its left subtree and n(X) the number of nodes in a tree X • Definition: the value ρ(T) = (n(TL) + 1) / (n(T) + 1) is the root balance of T • Definition: a tree T is α-balanced if for every subtree T’: α ≤ ρ(T’) ≤ 1 - α
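The two definitions translate directly into code. A minimal sketch assuming a bare node class (WBNode is a hypothetical name, not from the slides); it computes the root balance ρ and checks the α-balance condition on every subtree:

```java
class WBNode {
    WBNode left, right;

    // n(X): number of nodes in the subtree X (null = empty tree)
    static int size(WBNode t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    // rho(T) = (n(T_L) + 1) / (n(T) + 1), the root balance from the definition
    static double rho(WBNode t) {
        return (size(t.left) + 1.0) / (size(t) + 1.0);
    }

    // T is alpha-balanced if alpha <= rho(T') <= 1 - alpha for every subtree T'
    static boolean isBB(WBNode t, double alpha) {
        if (t == null) return true;
        double r = rho(t);
        return alpha <= r && r <= 1 - alpha
               && isBB(t.left, alpha) && isBB(t.right, alpha);
    }
}
```

A single node has ρ = ½ and lies in BB(α) for any α ≤ ½, while a left chain of three nodes has root balance ¾ and already falls out of BB(0.3).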

  11. Condition α ≤ ρ(T’) ≤ 1 - α • The set of all α-balanced binary trees is called BB(α) („bounded balance“) • The definition of balance only considers the left subtree, but for a BB(α) tree every subtree also satisfies α ≤ 1 - ρ’(T’) ≤ 1 - α, where ρ’ is defined analogously to ρ on the right subtree • The parameter α defines the “distance” from a complete tree: • α = ½: only complete trees allowed • α < ½: relaxed condition • α = 0: no structural conditions • α > ½: makes no sense to consider

  12. Example • Tree with keys Mars, Jupiter, Pluto, Earth, Mercury, Uranus, Neptune, Saturn, Venus • ρ(T) = (n(TL) + 1) / (n(T) + 1) • Choose α = 0.3; then for every subtree: α = 0.3 ≤ ρ ≤ 1 - α = 0.7 • The tree is in BB(α) for α = 0.3

  13. Notes • Already noted: ρ = ½ holds for complete trees • Root balance < ½ means: there are fewer nodes in the left subtree • α bounds the root balance symmetrically from both sides • If the left subtree is complete, the root balance tends towards 1 with an increasing number of nodes • Only α = 0 allows all “degenerations” • Not every tree (with n nodes) can be transformed into a BB(α) tree for every α • There is at least one tree in BB(α) for every n when 0.25 ≤ α ≤ 1 - ½√2 ≈ 0.292

  14. Height of Weight Balanced Trees • Note: when traversing the path from the root to a leaf, one “loses”, depending on α, a fraction of the nodes at every step • Consider the path p = v1, v2, ..., vh • For the left and right subtrees TL and TR of a tree T, the BB(α) condition gives: n(TL) + 1 ≤ (1 - α)(n(T) + 1) and n(TR) + 1 ≤ (1 - α)(n(T) + 1) • Traversal of path p: n(v2) + 1 ≤ (1 - α)(n(v1) + 1), n(v3) + 1 ≤ (1 - α)(n(v2) + 1), ..., n(vh) + 1 ≤ (1 - α)(n(vh-1) + 1)

  15. Height of Weight Balanced Trees • As v1 is the root and vh a leaf: n(T) + 1 = n(v1) + 1 and n(vh) + 1 = 2 • Inserting into the chained inequality: 2 = n(vh) + 1 ≤ (1 - α)^(h-1) (n(v1) + 1) = (1 - α)^(h-1) (n(T) + 1) • Taking logarithms on both sides: 1 ≤ (h - 1) log2(1 - α) + log2(n(T) + 1) • Thus (note: log2(1 - α) < 0 for α > 0), with c = -log2(1 - α): h - 1 ≤ log2(n(T) + 1) / c ∈ O(log2n) • The height of the tree is logarithmic in the number of nodes

  16. Operations on Weight Balanced Binary Trees • Search works as for AVL trees • Cost is logarithmic • For insertion/deletion the root balances must be updated along the path from the root to the corresponding position • If the criterion is violated: rotations as for AVL trees • Open issues: • Are rotations appropriate measures for restructuring BB(α) trees? • How does one efficiently compute the root balance? • The number of rotations on the path to the root is bounded: search/insertion/deletion are all in O(log2n)

  17. Position Search in Balanced Binary Search Trees • Comparison: tree implementations vs. linked lists • Balanced trees allow (almost) all operations in O(log2n) • Linked lists need O(n) for search/insertion/deletion! • For sequential traversal both perform in O(n) • Should sorted data therefore always be stored in trees? • One should not underestimate the implementation cost • The “last” operation where lists “win” is positional search (finding the k-th element) • Positional search: find the k-th element in a list • For trees, the “list” is the inorder traversal

  18. The Problem • For lists: • Traverse k elements in O(k) • For trees: • One does not “know” whether to go left or right, and one knows nothing about the number of nodes in the subtrees • In the worst case all nodes must be visited: O(n)! • That can be improved ...

  19. Rank of a Node • Definition: the rank of a node is the number of nodes in its left subtree plus 1 • Rank = position of node x in the subtree where x is the root

class BinarySearchTree {
    int K;                 /* key */
    Info info;             /* info */
    int balance;           /* BF, for AVL trees: -1, 0, +1 */
    int rank;              /* rank of the node */
    BinarySearchTree L, R;
    /* constructor and methods ... */
    public BinarySearchTree posFind(int pos) { ... }
}

  20. Algorithm • Pseudo code: • Start at the root • If pos < rank: search in the left subtree • If pos > rank: subtract the rank from the position and search in the right subtree • The search stops when pos = rank • Correctness: • The rank of a node is always its position in the subtree where it is the root • Note: when inserting/deleting in a left subtree, the nodes on the path up to the root must update their ranks
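The rank bookkeeping described in the note above can be sketched as follows. RankNode is a stripped-down, hypothetical node class (AVL rebalancing is omitted to keep the rank-update idea visible):

```java
class RankNode {
    int key;
    int rank = 1;            // rank = nodes in left subtree + 1
    RankNode left, right;

    RankNode(int key) { this.key = key; }

    static RankNode insert(RankNode t, int key) {
        if (t == null) return new RankNode(key);
        if (key < t.key) {
            t.rank++;        // insertion goes left: this node's position shifts up
            t.left = insert(t.left, key);
        } else {
            t.right = insert(t.right, key);
        }
        return t;
    }

    // positional search exactly as in the pseudo code: stop when pos == rank
    static RankNode findPos(RankNode t, int pos) {
        while (t != null && pos != t.rank) {
            if (pos < t.rank) {
                t = t.left;
            } else {
                pos -= t.rank;
                t = t.right;
            }
        }
        return t;
    }
}
```

Inserting 50, 30, 70, 20, 40 gives the inorder sequence 20, 30, 40, 50, 70, and findPos returns the node at each position accordingly.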

  21. Example • Tree with city keys Prague, Athens, Tokyo, Rome, Cairo, Paris, Sofia, Lima, Oslo, Bonn, Bern and positions pos = 1 ... 11 • pos = 4 -> Cairo • pos = 9 -> Rome

  22. Java Method

public BinarySearchTree findPos(int pos) {
    BinarySearchTree root = this;
    while ((root != null) && (pos != root.rank)) {
        if (pos < root.rank) {
            root = root.L;
        } else {
            pos = pos - root.rank;
            root = root.R;
        }
    }
    return root;
}

• Complexity in a balanced tree: O(log2n)

  23. Summary: Balanced Search Trees

  24. Extended Binary Trees

  25. Extended binary trees • Replace NULL-pointers with special (external) nodes • A binary tree to which external nodes are added is called an extended binary tree • The data can be stored either in the internal or in the external nodes • The length of the path to a node represents the cost of searching for it

  26. External and internal path length • The cost of searching in extended binary trees depends on the following parameters: • External path length = the sum of the path lengths from the root to all external nodes Si (1 ≤ i ≤ n+1): Extn = Σi=1...n+1 depth(Si) • Internal path length = the sum of the path lengths from the root to all internal nodes Ki (1 ≤ i ≤ n): Intn = Σi=1...n depth(Ki) • Extn = Intn + 2n (proof by induction) • Extended binary trees with minimal external path length also have minimal internal path length
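Both path lengths can be computed by a simple traversal. A minimal sketch in which null children play the role of the external nodes (ExtTree is an illustrative name, not from the slides); it also lets one check the identity Extn = Intn + 2n on concrete trees:

```java
class ExtTree {
    ExtTree left, right;

    // internal path length: sum of depths of all internal (real) nodes
    static int internal(ExtTree t, int depth) {
        if (t == null) return 0;   // external nodes contribute nothing here
        return depth + internal(t.left, depth + 1) + internal(t.right, depth + 1);
    }

    // external path length: sum of depths of all external (null) positions
    static int external(ExtTree t, int depth) {
        if (t == null) return depth;
        return external(t.left, depth + 1) + external(t.right, depth + 1);
    }

    // n: number of internal nodes
    static int count(ExtTree t) {
        return t == null ? 0 : 1 + count(t.left) + count(t.right);
    }
}
```

For a left chain of three nodes this gives Int = 0 + 1 + 2 = 3, Ext = 9 and indeed 9 = 3 + 2·3.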

  27. Example • n = 7 • External path length: Extn = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25 • Internal path length: Intn = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11 • 25 = Extn = Intn + 2n = 11 + 14 = 25

  28. Minimal and maximal path length • For a given n, a balanced tree has the minimal internal path length • Example: for a complete tree with height h (n = 2^h - 1): Intn = Σi=0...h-1 i · 2^i • The internal path length becomes maximal if the tree degenerates to a linear list: Intn = Σi=1...n-1 i = n(n-1)/2 • Example: h = 4, n = 15, Int = 34, Ext = 16·4 = 64 • For comparison: a list with n = 15 nodes has Int = 105, Ext = 105 + 30 = 135

  29. 25 15 8 15 3 25 8 3 Weighted binary trees • Often weights qi are assigned to the external nodes ( 1  i  n+1 ). • The weighted external path length is defined as Extw = i = 1 ... n+1 depth( Si )  qi • Within weighted binary trees the properties of minimal and maximal path lengths do not apply any more. • The determination of the minimal external path length is an important practical problem... Extw = 88 (less than 102 although linear list) Extw = 102

  30. Application example: optimal codes • To convert a text file efficiently into bit strings, there are two alternatives: • Fixed-length coding: each character has the same number of bits (e.g., ASCII) • Variable-length coding: some characters are represented using fewer bits than others • Example of fixed-length coding: 3-bit code for the alphabet A, B, C, D: • A = 001, B = 010, C = 011, D = 100 • The message ABBAABCDADA is converted to • 001010010001001010011100001100001 (length 33 bits) • Using a 2-bit code the same message can be coded with only 22 bits • For decoding, group the bit string into 3-bit (respectively 2-bit) blocks and use a table mapping each code to its character
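Fixed-length decoding as described in the last bullet can be sketched as follows (FixedLength and the method name are illustrative): split the bit string into blocks of the code width and look each block up in the table.

```java
import java.util.*;

class FixedLength {
    // Decode a bit string by cutting it into fixed-width groups and
    // looking each group up in the code table.
    static String decode(String bits, Map<String, Character> table, int width) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + width <= bits.length(); i += width)
            out.append(table.get(bits.substring(i, i + width)));
        return out.toString();
    }
}
```

With the 3-bit table from the slide, decoding the 33-bit string yields ABBAABCDADA again.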

  31. Application example: optimal codes (2) • Idea: more frequently used characters are coded using fewer bits • Message: ABBAABCDADA • Coding: 01010001011111001100 • Length: 20 bits! • Variable-length coding can reduce the memory needed for storing the file • How can this special coding be found, and why is the decoding unique?

  32. Application example: optimal codes (3) • Represent the frequencies and the coding as a weighted binary tree (external nodes: A with weight 5, B with weight 3, D with weight 2, C with weight 1) • First, decoding: given a bit string: • Use the successive bits to traverse the tree starting from the root • When you arrive at an external node, output the character stored there • Example: 010100010111... • 1st bit = 0: external node, A • 2nd bit = 1: from the root to the right • 3rd bit = 0: left, external node, B • 4th bit = 1: from the root to the right • 5th bit = 1: right • ...
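The decoding walk can be sketched without an explicit tree by extending the current bit sequence until it matches a complete code word, which is equivalent for a prefix code (each match corresponds to reaching an external node). The code table A = 0, B = 10, D = 110, C = 111 is reconstructed from the 20-bit message on the previous slide; class and method names are illustrative:

```java
import java.util.*;

class PrefixDecode {
    // Decode a bit string against a prefix-free code table: grow the
    // current code word bit by bit; a hit in the table means an
    // external node of the code tree was reached.
    static String decode(String bits, Map<String, Character> code) {
        StringBuilder out = new StringBuilder();
        String cur = "";
        for (char b : bits.toCharArray()) {
            cur += b;                     // walk one edge down the code tree
            Character c = code.get(cur);
            if (c != null) {              // external node reached
                out.append(c);
                cur = "";                 // restart at the root
            }
        }
        return out.toString();
    }
}
```

Decoding 01010001011111001100 with this table reproduces the message ABBAABCDADA.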

  33. Correctness condition • Observation: with variable-length coding, the code of one character must not be a prefix of the code of any other character • If the coding is represented as an extended binary tree, uniqueness is guaranteed (only one character per external node) • If the frequency of the characters in the original text is taken as the weight of the external nodes, then a tree with minimal weighted external path length yields an optimal code • How is a tree with minimal external path length generated?

  34. Huffman Code • Idea: characters are weighted and sorted according to their frequency • This also works independently of a specific text, e.g., for English (characters with relative weights) • A binary tree with minimal external path length is constructed as follows: • Each character is represented by a tree with its corresponding weight (a single external node) • The two trees with the smallest weights are merged into a new tree • The root of the new tree is marked with the sum of the weights of the original roots • Continue until only one tree remains

  35. Example 1: Huffman • Alphabet and frequencies: weights (4, 5, 9, 10, 29) • Step 1: (4, 5, 9, 10, 29) • Merge 4 and 5; new weight: 9 • Step 2: (9, 9, 10, 29) • Merge 9 and 9; new weight: 18

  36. Example 1: Huffman (2) • Step 3: (18, 10, 29) → (10, 18, 29) • Merge 10 and 18; new weight: 28 • Step 4: (28, 29) • Merge 28 and 29; new weight: 57 • Finished!

  37. Resulting tree • Coding: E = 1, T = 00, N = 011, S = 0100, I = 0101 • Extw = 112 • Using this coding, e.g.: • TENNIS = 00101101101010100 • SET = 0100100 • NET = 011100 • Decoding as described before

  38. Some remarks • The resulting tree is not regular • Regular trees are not always optimal • Example: the best nearly complete tree has Extw = 123 • For the message ABBAABCDADA, 20 bits is optimal (see previous slides)

  39. Example 2: Huffman • Average number of bits without Huffman: 3 (because 2^3 = 8) • Average number of bits using the Huffman code: • There are other “valid” solutions! But the average number of bits is the same for all of them (equal to Huffman)

  40. Analysis

/* Algorithm Huffman */
for (int i = 1; i <= n-1; i++) {
    p1 = smallest element in list L
    remove p1 from L
    p2 = smallest element in L
    remove p2 from L
    create node p
    add p1 and p2 as left and right subtrees to p
    weight p = weight p1 + weight p2
    insert p into L
}

• The run time depends in particular on the implementation of the list: • Time required to find the node with the smallest weight • Time required to insert a new node • “Naive” implementations give O(n^2), “smarter” ones O(n log2n)
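The pseudo code above becomes one of the "smarter" O(n log2n) variants when the list L is kept in a priority queue. A sketch with illustrative names (the letter-to-weight assignment below is assumed to match example 1):

```java
import java.util.*;

class Huffman {
    static class Node {
        int weight;
        Character sym;       // non-null only for leaves (external nodes)
        Node left, right;
        Node(int w, Character s) { weight = w; sym = s; }
        Node(Node l, Node r) { weight = l.weight + r.weight; left = l; right = r; }
    }

    static Node build(Map<Character, Integer> freq) {
        // the list L, ordered by weight
        PriorityQueue<Node> list =
            new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            list.add(new Node(e.getValue(), e.getKey()));
        while (list.size() > 1) {
            Node p1 = list.poll();       // smallest element in L
            Node p2 = list.poll();       // second smallest element in L
            list.add(new Node(p1, p2));  // weight p = weight p1 + weight p2
        }
        return list.poll();
    }

    // collect the code words: left edge = 0, right edge = 1
    static void codes(Node t, String prefix, Map<Character, String> out) {
        if (t.sym != null) { out.put(t.sym, prefix); return; }
        codes(t.left, prefix + "0", out);
        codes(t.right, prefix + "1", out);
    }
}
```

With the weights from example 1 (4, 5, 9, 10, 29) the root weight is 57 and the weighted external path length Σ qi·|code(qi)| is 112, matching the resulting tree on slide 37.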

  41. Optimality • Observation: the weight of a node K in the Huffman tree is equal to the external path length of the subtree having K as its root • Theorem: a Huffman tree is an extended binary tree with minimal weighted external path length Extw • Proof outline (by induction over n, the number of characters in the alphabet): • The statement to prove is A(n) = “A Huffman tree for n characters has minimal external path length Extw” • Consider first n = 2: prove A(2) = “A Huffman tree for 2 characters has minimal external path length”

  42. Optimality (2) • Proof: • n = 2: only two characters with weights q1 and q2 result in a tree with Extw = q1 + q2. This is minimal, because there are no other trees • Induction hypothesis: for all i ≤ n, A(i) is true • To prove: A(n+1) is true

  43. Optimality (3) • Proof: • Consider a Huffman tree T for n+1 characters. This tree has a root V and two subtrees T1 and T2 with weights q1 and q2, respectively • From the construction method we can deduce that for the weights qi of all internal nodes ni of T1 and T2: qi ≤ min(q1, q2) • Therefore, for these weights qi: q1 + q2 > qi. So if V were exchanged with any node in T1 or T2, the resulting tree would have a greater weight • Exchanging nodes within T1 and T2 cannot help either, because T1 and T2 are already optimal (both are trees for n characters or fewer, so the induction hypothesis holds for them) • Hence T is an optimal tree for n+1 characters

  44. Huffman Code: Applications • Fax machine

  45. Huffman: Other applications • ZIP coding (at least a similar technique) • In principle: most coding techniques with data reduction (lossless compression) • NOT Huffman: lossy compression techniques like JPEG, MP3, MPEG, ...
