This presentation explores the concepts of ordered dictionaries and search trees, foundational elements in algorithm analysis. Based on the work of Michael Goodrich and Roberto Tamassia, it examines data structures used to hold (key, element) pairs, emphasizing efficiency in retrieval and modification operations. The presentation delves into various types of dictionaries, including unordered and ordered forms, and introduces binary search trees, which significantly enhance search efficiency. Additionally, it discusses practical applications and the importance of effective problem-solving strategies for optimal data management.
CS 221 Analysis of Algorithms Ordered Dictionaries and Search Trees
Portions of these slides come from • Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, 2002, John Wiley and Sons • its authors, Michael Goodrich and Roberto Tamassia • the book's publisher, John Wiley & Sons • and… • www.wikipedia.org
Reading material • Goodrich and Tamassia, 2002 • Chapter 2, section 2.5, pages 114-137 • see also section 2.6 • Chapter 3, section 3.1, pages 141-151 • Wikipedia: • http://en.wikipedia.org/wiki/AVL_trees
in the previous episode… • …we defined a data structure which we called a dictionary. It was… • a container to hold multiple objects or, in Goodrich and Tamassia’s terminology, “items” • each item = a (key, element) pair • element = a “piece” of data • think = name, address, phone number • key = a value we associate with the element to help us find, retrieve, delete, etc. an element • think = rdbms autoincrement key, student ID#
Dictionaries • Up till now we have looked at • Unordered dictionaries • container for (k,e) pairs but… • in no particular order • Logfiles • Hash Tables
Dictionaries • A terminology note • for purposes of our discussion: • A linear unordered dictionary = logfile • A linear ordered dictionary = lookup table
Game Time • Twenty Questions • One person thinks of an object that can be any person, place or thing… • and does not disclose the selected object until it is specifically identified by the other players… • All other players take turns asking Yes/No questions in an attempt to identify the mystery object
Game Time • Twenty Questions • An efficient problem solving strategy is to ask questions for which the answers will optimally narrow the size of the problem space (possible solutions) • for example, • Q: Is it a person? • A: Yes ….we just eliminated all places and non-human objects from the solution set
Game Time • Twenty Questions • Size of problem? • N=??? large ~∞ • Yes/No attack makes this a binary search problem… • So, what size of problem space can we effectively search? • 2^20 (about one million)
Game Time • Twenty Questions • Something to think about… • N is conceivably much larger than 2^20 • So, how is it that we can usually solve this problem in 20 steps or less… • i.e. correctly identify the mystery object
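The arithmetic behind the twenty-questions bound can be checked with a short sketch. It assumes every yes/no answer splits the remaining candidates as evenly as possible; the function name `questions_needed` is made up for this illustration.

```python
def questions_needed(n: int) -> int:
    """Smallest q with 2**q >= n, i.e. the number of ideal yes/no
    questions needed to single out one of n candidates."""
    if n < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (n - 1).bit_length()


print(2 ** 20)                        # 1048576 candidates fit in 20 questions
print(questions_needed(2 ** 20))      # 20
print(questions_needed(2 ** 20 + 1))  # 21: one more candidate costs one more question
```

Real games usually finish within 20 questions even though N is larger, because good questions exploit what players are likely to pick rather than splitting a uniform space.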
Dictionaries • Ordered Dictionaries • suppose the items in a dictionary are ordered (sorted) • like low to high • Would that make a difference in terms of • size() • isEmpty() • findElement() • insertItem() • removeItem()
Dictionaries • Ordered Dictionaries • suppose we implement an ordered dictionary as a linear data structure or, more specifically, a vector • items are in the vector in key order • we gain considerable efficiency because we can visit D[x], where x is a rank, in O(1) time • Can we achieve the same findElement() time if the ordered dictionary were implemented as a linked list?
Binary Search • Binary search performs operation findElement(k) on a dictionary implemented by means of an array-based sequence, sorted by key • similar to the high-low game • at each step, the number of candidate items is halved • terminates after O(log n) steps • Example: findElement(7) on the keys 0 1 3 4 5 7 8 9 11 14 16 18 19 • (Figure: four successive views of the array with the low, middle and high markers l, m, h narrowing in until l = m = h)
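Binary search over a sorted array can be sketched in a few lines. This version is only an illustration: it stores bare keys instead of (key, element) pairs, returns the rank of the key rather than the element, and uses `None` as a stand-in for NO_SUCH_KEY.

```python
NO_SUCH_KEY = None  # stand-in sentinel, an assumption of this sketch


def find_element(keys, k):
    """Return the rank (index) of k in the sorted list keys, or NO_SUCH_KEY."""
    low, high = 0, len(keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if k < keys[mid]:
            high = mid - 1      # discard the upper half
        elif k > keys[mid]:
            low = mid + 1       # discard the lower half
        else:
            return mid          # found: low <= mid <= high
    return NO_SUCH_KEY


keys = [0, 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
print(find_element(keys, 7))   # 5, reached after probing ranks 6, 2, 4, 5
print(find_element(keys, 6))   # None
```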
Binary Search • Lookup tables are not very efficient for dynamic data (lots of insertItem, removeElement) • Lookup tables are efficient for dictionaries where the predominant access is findElement, with relatively few inserts or removes • credit card authorizations, code translation tables,…
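The cost of updating a lookup table can be seen with Python's standard `bisect` module, used here purely as an illustration. Finding the insertion point takes O(log n) comparisons, but the list still has to shift every later item, so an insert is O(n) overall.

```python
import bisect

table = [1, 3, 4, 8]        # sorted keys of a small lookup table

# Locating the slot is a binary search ...
slot = bisect.bisect_left(table, 5)
print(slot)                 # 3

# ... but inserting there shifts all later items one place to the right.
bisect.insort(table, 5)
print(table)                # [1, 3, 4, 5, 8]
```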
Binary Search Tree • Binary tree for holding (k,e) items, such that… • each internal node v stores an element e with key k • keys stored in the left subtree of v are <= the key of v • keys stored in the right subtree of v are >= the key of v • external nodes store no elements… • only a placeholder (NULL_NODE)
Binary Search Tree • Every key in a node's left subtree is less than or equal to the node's key • Every key in a node's right subtree is greater than or equal to the node's key • External (leaf) nodes hold no items • (Figure: example tree with keys 58, 31, 90, 62, 25, 42, 12, 36, 75)
Search • Algorithm findElement(k, v) • if T.isExternal(v) • return NO_SUCH_KEY • if k < key(v) • return findElement(k, T.leftChild(v)) • else if k = key(v) • return element(v) • else { k > key(v) } • return findElement(k, T.rightChild(v)) • (Figure: search path through a tree with keys 6, 2, 9, 1, 4, 8, annotated with the comparisons <, > and =)
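The findElement pseudocode translates almost line for line into runnable code. In this sketch the `Node` class and the use of `None` for external (placeholder) nodes are assumptions made for illustration, not part of the slides.

```python
class Node:
    """Internal node holding one (key, element) item."""
    def __init__(self, key, element, left=None, right=None):
        self.key = key
        self.element = element
        self.left = left      # None plays the role of an external node
        self.right = right


NO_SUCH_KEY = object()        # unique sentinel for a failed search


def find_element(k, v):
    if v is None:                         # T.isExternal(v)
        return NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)    # continue in the left subtree
    if k == v.key:
        return v.element
    return find_element(k, v.right)       # k > key(v)


tree = Node(6, "six",
            Node(2, "two", Node(1, "one"), Node(4, "four")),
            Node(9, "nine", Node(8, "eight")))
print(find_element(4, tree))                  # four (path: 6, then 2, then 4)
print(find_element(5, tree) is NO_SUCH_KEY)   # True
```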
removeElement(k) – simple case • To perform operation removeElement(k), we search for key k • Assume key k is in the tree, and let v be the node storing k • If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w) • Example: remove 4 • (Figure: tree with keys 6, 2, 9, 1, 4, 8, 5, where v is the node storing 4 and w is its leaf child; after the removal the keys are 6, 2, 9, 1, 5, 8)
removeElement(k) – more complicated case • We consider the case where the key k to be removed is stored at a node v whose children are both internal • we find the internal node w that follows v in an inorder traversal • we copy key(w) into node v • we remove node w and its left child z (which must be a leaf) by means of operation removeAboveExternal(z) • Example: remove 3 • (Figure: tree with keys 1, 3, 2, 8, 6, 9, 5, where v stores 3, w stores 5 and z is the left child of w; after the removal the keys are 1, 5, 2, 8, 6, 9)
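Both removal cases can be sketched together. This is a minimal illustration, not the textbook's removeAboveExternal operation: external nodes are represented by `None`, only keys are stored, and the function returns the (possibly new) root of the subtree it was given.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left      # None stands in for an external node
        self.right = right


def remove_element(v, k):
    """Remove key k from the subtree rooted at v; return the new subtree root."""
    if v is None:
        return None                       # key not present, nothing to do
    if k < v.key:
        v.left = remove_element(v.left, k)
    elif k > v.key:
        v.right = remove_element(v.right, k)
    elif v.left is None:                  # simple case: a child is external
        return v.right
    elif v.right is None:
        return v.left
    else:                                 # both children are internal
        w = v.right                       # w = node that follows v in inorder
        while w.left is not None:
            w = w.left
        v.key = w.key                     # copy key(w) into v
        v.right = remove_element(v.right, w.key)   # then remove w itself
    return v


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


# Simple case: remove 4, whose left child is external.
simple = Node(6, Node(2, Node(1), Node(4, None, Node(5))), Node(9, Node(8)))
simple = remove_element(simple, 4)
print(inorder(simple))                    # [1, 2, 5, 6, 8, 9]

# Harder case: remove 3, whose children are both internal.
harder = Node(1, None, Node(3, Node(2), Node(8, Node(6, Node(5)), Node(9))))
harder = remove_element(harder, 3)
print(harder.right.key, inorder(harder))  # 5 [1, 2, 5, 6, 8, 9]
```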
Binary Search Tree Performance • Consider a dictionary with n items implemented by means of a binary search tree of height h • the space used is O(n) • methods findElement , insertItem and removeElement take O(h) time • The height h is O(n) in the worst case and O(log n) in the best case
Balanced Trees • When a path in a tree gets very long relative to other paths in the tree… • the tree is unbalanced • In fact, in its extreme form an unbalanced tree is a linear list. • So, to achieve optimal performance… • you need to keep the tree balanced
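A quick experiment illustrates how insertion order decides the height. The sketch below uses a plain, non-balancing insert (names are illustrative, and `None` stands in for external nodes): inserting keys in sorted order produces a chain of height n, while a friendlier order gives height about log n.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None      # None stands in for an external node
        self.right = None


def insert_item(v, key):
    """Plain BST insert with no rebalancing; returns the subtree root."""
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert_item(v.left, key)
    else:
        v.right = insert_item(v.right, key)
    return v


def height(v):
    """Longest path from v down to an external node."""
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def build(keys):
    root = None
    for key in keys:
        root = insert_item(root, key)
    return root


chain = build([1, 2, 3, 4, 5, 6, 7])   # sorted input degenerates into a list
bushy = build([4, 2, 6, 1, 3, 5, 7])   # same keys, balanced shape
print(height(chain), height(bushy))    # 7 3
```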
AVL Trees • we want to maintain a balanced tree • recall: • height of a node v = length of the longest path from v to an external node • We want to maintain the principle that • for every node v, the heights of its children differ by no more than 1 • Height-Balance Property
AVL Trees • Balance Factor = h(right_subtree) - h(left_subtree) • in a balanced tree, |h(right_subtree) - h(left_subtree)| ∈ {0, 1} at every node • a tree with a Balance Factor ∉ {-1, 0, 1} at some node • is an Unbalanced Tree • and must be rebalanced • a Balance Factor exists for every node v • except (trivially) external nodes
AVL Trees • If Balance Factor = -1, 0, 1 • tree is balanced • does not need restructuring • If Balance Factor = -2, 2 • tree is unbalanced • needs restructuring • restructuring is done by a process called rotation
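The balance factor is easy to compute directly from the definition. The sketch below recomputes heights recursively for clarity (a practical AVL tree would cache the height in each node); the class and function names are illustrative assumptions.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left      # None stands in for an external node
        self.right = right


def height(v):
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def balance_factor(v):
    """h(right_subtree) - h(left_subtree), the convention used on the slides."""
    return height(v.right) - height(v.left)


def is_height_balanced(v):
    """True if every node in the subtree has a balance factor in {-1, 0, 1}."""
    if v is None:
        return True
    return (abs(balance_factor(v)) <= 1
            and is_height_balanced(v.left)
            and is_height_balanced(v.right))


chain = Node(1, None, Node(2, None, Node(3)))   # right-leaning chain
bushy = Node(2, Node(1), Node(3))               # same keys, balanced

print(balance_factor(chain), is_height_balanced(chain))   # 2 False
print(balance_factor(bushy), is_height_balanced(bushy))   # 0 True
```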
AVL Trees • Rotation • Four types, but two are symmetrical • Left Single Rotation • Right Single Rotation • Left Double Rotation • Right Double Rotation • Since two are symmetrical, only consider single and double rotation
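A single rotation is a constant-time pointer rearrangement that preserves the inorder sequence of keys. The following sketch shows the two single rotations on a bare node structure; the names are illustrative, and a double rotation is simply one single rotation on the child followed by the opposite one on the parent.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(v):
    """Lift v's right child w above v; returns w, the new subtree root."""
    w = v.right
    v.right = w.left      # w's left subtree moves under v
    w.left = v
    return w


def rotate_right(v):
    """Mirror image: lift v's left child w above v."""
    w = v.left
    v.left = w.right
    w.right = v
    return w


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


chain = Node(1, None, Node(2, None, Node(3)))   # balance factor 2 at the root
root = rotate_left(chain)
print(root.key, root.left.key, root.right.key)  # 2 1 3
print(inorder(root))                            # [1, 2, 3], order unchanged
```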
AVL Trees • Rotation • if BF = 2
AVL Trees • Binary Trees that maintain the Height-Balance Property are called • AVL trees • the name comes from the inventors • G.M. Adelson-Velsky and E.M. Landis, in a paper entitled “An Algorithm for Information Organization”
AVL Trees • (Figure: an unbalanced tree next to a balanced tree) • from: http://en.wikipedia.org/wiki/AVL_trees
AVL Trees • h(right_subtree) - h(left_subtree) = Balance Factor (BF) • If BF ∈ {-1, 0, 1} then the tree is balanced (do nothing) • If BF ∉ {-1, 0, 1} then the tree is unbalanced (must be restructured) • Restructuring is done by rotation
AVL Trees • Rotation • four cases, but pairs are symmetrical • left single rotation • right single rotation • left double rotation • right double rotation • since they are symmetric, we only examine single and double rotation
AVL Trees - Insertion • Rotation • If BF > 2, the imbalance occurred further down in the right subtree • Recursively walk down the subtree until |BF| = 2 • If BF < -2, the imbalance occurred further down in the left subtree • Recursively walk down the subtree until |BF| = 2
AVL Trees - Insertion • Rotation • If BF = 2, the imbalance occurred in the right subtree • Recursively walk down the subtree until |BF| = 2 • If BF = -2, the imbalance occurred in the left subtree • Recursively walk down the subtree until |BF| = 2
AVL Trees - Insertion • Rotation • If BF = 2, the imbalance occurred in the right subtree • Step down into the subtree to find where the insertion occurred • If BF = -2, the imbalance occurred in the left subtree • Step down into the subtree to find where the insertion occurred
AVL Trees - Insertion • Rotation (shown for BF = 2; the BF = -2 case is the mirror image) • If BF at subtree = 1 • insertion occurred on right leaf node • single rotation required • If BF at subtree = -1 • insertion occurred on left leaf node • double rotation required
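The insertion rules can be put together in one sketch. This is an illustrative implementation under stated assumptions rather than the textbook's restructuring routine: each node caches its height, `None` stands in for external nodes, duplicates go to the right, and BF is h(right) - h(left) as on the slides.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1       # a node with two external children has height 1


def height(v):
    return v.height if v is not None else 0


def update(v):
    v.height = 1 + max(height(v.left), height(v.right))


def balance_factor(v):
    return height(v.right) - height(v.left)


def rotate_left(v):
    w = v.right
    v.right = w.left
    w.left = v
    update(v)                 # v is now below w, so refresh v first
    update(w)
    return w


def rotate_right(v):
    w = v.left
    v.left = w.right
    w.right = v
    update(v)
    update(w)
    return w


def insert_item(v, key):
    """Insert key below v, rebalance on the way back up, return the new root."""
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert_item(v.left, key)
    else:
        v.right = insert_item(v.right, key)
    update(v)
    bf = balance_factor(v)
    if bf == 2:                                  # right subtree too tall
        if balance_factor(v.right) < 0:          # inserted on its left side
            v.right = rotate_right(v.right)      # first half of double rotation
        return rotate_left(v)
    if bf == -2:                                 # mirror image
        if balance_factor(v.left) > 0:
            v.left = rotate_left(v.left)
        return rotate_right(v)
    return v


root = None
for key in [1, 2, 3, 4, 5, 6, 7]:                # sorted input, worst case for a plain BST
    root = insert_item(root, key)
print(root.key, root.height)                     # 4 3
```

Each insertion walks one root-to-leaf path and performs at most a constant number of O(1) rotations, which is where the O(log n) bound for insertItem comes from.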
AVL Trees - Insertion • Rotation • See • http://en.wikipedia.org/wiki/AVL_trees
AVL Trees - Insertion • Performance • rotations – O(1) • Recall h(T) maintained at O(log n) • insertItem – O(log n) • balanced tree - priceless
Bounded-depth Search Trees • Search efficiency in a tree is related to the depth of the tree • A depth-bounded tree can be used to build ordered dictionaries whose search and update operations run in O(log n) time
Multi-way Search Trees • Remember Binary Search Trees • any node v can have at most 2 children • what if we get rid of that rule? • Suppose a node could have multiple children (>2) • Terminology: if v has d children, v is a d-node
Multi-way Search Trees • Multi-way Search Tree - T • Each internal node must have at least two children: an internal node is a d-node with d ≥ 2 • Internal nodes store collections of items (k,e) • Each d-node stores d-1 items • Special keys k_0 = -∞ and k_d = +∞ • External nodes are only placeholders
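Searching a multi-way search tree generalizes the binary case: at each d-node, locate the key among the d-1 stored keys or pick the one child whose key range could contain it. The sketch below assumes the usual ordering rule (keys in child i lie between k_(i-1) and k_i), which the slide only hints at through the special keys; the class name, the `None` external nodes and the sentinel are all illustrative.

```python
import bisect


class MultiWayNode:
    """A d-node: d-1 sorted (key, element) items and d children."""
    def __init__(self, items, children=None):
        self.items = items
        # None stands in for an external (placeholder) node.
        self.children = children or [None] * (len(items) + 1)


NO_SUCH_KEY = object()


def find_element(k, v):
    while v is not None:
        keys = [key for key, _ in v.items]
        i = bisect.bisect_left(keys, k)        # keys[:i] < k <= keys[i:]
        if i < len(keys) and keys[i] == k:
            return v.items[i][1]               # found in this node
        v = v.children[i]                      # descend between k_(i-1) and k_i
    return NO_SUCH_KEY


root = MultiWayNode(
    [(10, "a"), (20, "b")],
    [MultiWayNode([(5, "c")]),
     MultiWayNode([(12, "d"), (17, "e")]),
     MultiWayNode([(30, "f")])],
)
print(find_element(17, root))                  # e
print(find_element(25, root) is NO_SUCH_KEY)   # True
```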