CS 221

CS 221 Analysis of Algorithms Ordered Dictionaries and Search Trees

Portions of these slides come from • Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, 2002, John Wiley and Sons. • and its authors, Michael Goodrich and Roberto Tamassia, • the book's publisher, John Wiley & Sons • and… • www.wikipedia.org

Reading material • Goodrich and Tamassia, 2002 • Chapter 2, section 2.5, pages 114-137 • see also section 2.6 • Chapter 3, section 3.1, pages 141-151 • Wikipedia: • http://en.wikipedia.org/wiki/AVL_trees

In the previous episode… • …we defined a data structure which we called a dictionary. It was… • a container to hold multiple objects, or in Goodrich and Tamassia's terminology, “items” • each item = a (key, element) pair • element = a “piece” of data • think = name, address, phone number • key = a value we associate with the element to help us find, retrieve, delete, etc. the element • think = RDBMS auto-increment key, student ID#

Dictionaries • Up until now we have looked at • Unordered dictionaries • containers for (k,e) pairs, but… • in no particular order • Logfiles • Hash Tables

Dictionaries • A terminology note • for purposes of our discussion: • A linear unordered dictionary = logfile • A linear ordered dictionary = lookup table

Game Time • Twenty Questions • One person thinks of an object that can be any person, place or thing… • and does not disclose the selected object until it is specifically identified by the other players… • All other players take turns asking Yes/No questions in an attempt to identify the mystery object

Game Time • Twenty Questions • An efficient problem-solving strategy is to ask questions whose answers optimally narrow the size of the problem space (possible solutions) • for example, • Q: Is it a person? • A: Yes. …we just eliminated all places and non-human objects from the solution set

Game Time • Twenty Questions • Size of problem? • N = ??? large, ~∞ • a Yes/No attack makes this a binary search problem… • So, what size of problem space can we effectively search? • $2^{20}$ (1,048,576) possibilities
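The $2^{20}$ figure is just the halving argument run backwards: $q$ yes/no answers can distinguish at most $2^q$ candidates. A minimal sketch, assuming every question splits the remaining candidates exactly in half (the function name `questions_needed` is ours, not from the slides):

```python
def questions_needed(n: int) -> int:
    """Smallest q with 2**q >= n: the number of yes/no answers needed to
    single out one object among n candidates, assuming each question
    halves the remaining candidates."""
    if n < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() is the number of bits needed to write n - 1,
    # which equals ceil(log2(n)) for n >= 1.
    return (n - 1).bit_length()

for n in (1, 2, 1000, 2**20, 2**20 + 1):
    print(n, questions_needed(n))
```

For 1,048,576 candidates this prints 20; one more candidate already pushes the count to 21.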

Game Time • Twenty Questions • Something to think about… • N is conceivably much larger than $2^{20}$ • So, how is it that we can usually solve this problem in 20 steps or fewer… • i.e., correctly identify the mystery object

Dictionaries • Ordered Dictionaries • suppose the items in a dictionary are ordered (sorted) • like low to high • Would that make a difference in terms of • size() • isEmpty() • findElement() • insertItem() • removeItem()

Dictionaries • Ordered Dictionaries • suppose we implement an ordered dictionary as a linear data structure, or more specifically a vector • items are stored in the vector in key order • we gain considerable efficiency because we can visit D[x], where x is a rank, in O(1) time • Could we achieve the same findElement() running time if the ordered dictionary were implemented as a linked list?

Binary Search • Binary search performs operation findElement(k) on a dictionary implemented by means of an array-based sequence sorted by key • similar to the high-low game • at each step, the number of candidate items is halved • terminates after O(log n) steps • Example: findElement(7) on the sorted sequence 0 1 3 4 5 7 8 9 11 14 16 18 19 • [figure: four snapshots of the low (l), mid (m) and high (h) indices; in the last one, l = m = h]
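A minimal Python sketch of binary search on a sorted list, assuming mid = ⌊(low + high)/2⌋ (a common choice; the slide does not fix the rounding). It also records each (low, mid, high) probe so the halving can be traced on the slide's example:

```python
def binary_search(keys, k):
    """Return (index of k or None, list of (low, mid, high) probes).

    keys must be sorted in increasing order.
    """
    trace = []
    low, high = 0, len(keys) - 1
    while low <= high:               # candidates are keys[low..high]
        mid = (low + high) // 2
        trace.append((low, mid, high))
        if k == keys[mid]:
            return mid, trace
        if k < keys[mid]:
            high = mid - 1           # discard the upper half
        else:
            low = mid + 1            # discard the lower half
    return None, trace               # no candidates left: NO_SUCH_KEY

keys = [0, 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
index, trace = binary_search(keys, 7)
print(index, trace)
```

This prints `5 [(0, 6, 12), (0, 2, 5), (3, 4, 5), (5, 5, 5)]`: four probes, the last with low = mid = high, matching the four snapshots described on the slide.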

Binary Search • Lookup tables are not very efficient for dynamic data (lots of insertItem and removeElement calls), because each insertion or removal must shift items in the array, which takes O(n) time in the worst case • Lookup tables are efficient for dictionaries where the predominant access is findElement, with relatively few inserts or removes • e.g., credit card authorizations, code translation tables,…
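To make the cost asymmetry concrete, here is a hedged sketch of a lookup table built on Python's standard `bisect` module: lookups are binary searches, while `list.insert` and `del` shift the tail of the list. The class and method names are our own illustrations of the slides' operations, and the card numbers are made-up sample data:

```python
import bisect

class LookupTable:
    """Ordered dictionary stored as two parallel sorted Python lists.

    find_element is O(log n) (binary search), but insert_item and
    remove_element shift list entries, so they are O(n) in the worst case.
    """

    def __init__(self):
        self._keys = []
        self._elems = []

    def find_element(self, k):
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return self._elems[i]
        return None                          # stands in for NO_SUCH_KEY

    def insert_item(self, k, e):
        i = bisect.bisect_right(self._keys, k)
        self._keys.insert(i, k)              # O(n): shifts later keys right
        self._elems.insert(i, e)

    def remove_element(self, k):
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            del self._keys[i]                # O(n): shifts later keys left
            return self._elems.pop(i)
        return None

table = LookupTable()
for card, status in [(4417, "ok"), (1234, "ok"), (9999, "stolen")]:
    table.insert_item(card, status)
print(table.find_element(9999))                          # stolen
print(table.remove_element(1234), table.find_element(1234))  # ok None
```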

Binary Search Tree • a binary tree holding (k,e) items, such that… • each internal node v stores an element e with key k • keys of items in the left subtree of v ≤ key of v • keys of items in the right subtree of v ≥ key of v • external nodes store no elements… • they are only placeholders (NULL_NODE)

Binary Search Tree • Every key in a node's left subtree is less than the node's key • Every key in a node's right subtree is greater than the node's key • Leaf (external) nodes hold no items • [figure: binary search tree with keys 58, 31, 90, 62, 25, 42, 12, 36, 75]
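A minimal sketch of building a binary search tree by repeated insertion, assuming `None` plays the role of the external placeholder nodes. We insert the figure's keys in the order they appear in the text; the resulting shape is not guaranteed to match the slide's drawing, but an inorder traversal of any BST returns its keys in sorted order:

```python
class Node:
    def __init__(self, key, elem=None):
        self.key = key
        self.elem = elem
        self.left = None    # None plays the role of an external node
        self.right = None

def insert_item(root, key, elem=None):
    """Insert (key, elem) below root and return the (possibly new) root."""
    if root is None:
        return Node(key, elem)
    if key < root.key:
        root.left = insert_item(root.left, key, elem)
    else:                   # equal keys go right, allowed by the >= rule
        root.right = insert_item(root.right, key, elem)
    return root

def inorder(root):
    """Keys in inorder; for a binary search tree this is sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in [58, 31, 90, 62, 25, 42, 12, 36, 75]:
    root = insert_item(root, k)
print(inorder(root))    # [12, 25, 31, 36, 42, 58, 62, 75, 90]
```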

Search
• Algorithm findElement(k, v)
    if T.isExternal(v)
        return NO_SUCH_KEY
    if k < key(v)
        return findElement(k, T.leftChild(v))
    else if k = key(v)
        return element(v)
    else { k > key(v) }
        return findElement(k, T.rightChild(v))
• [figure: a search path marked <, >, = in a tree with keys 6, 2, 9, 1, 4, 8]
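The pseudocode translates almost line for line into Python. In this sketch `None` stands for an external node, and `NO_SUCH_KEY` is a sentinel object we introduce for illustration. The tree shape (6 at the root, 2 and 9 below it, and so on) is an assumption consistent with the figure's keys:

```python
NO_SUCH_KEY = object()  # sentinel standing in for the slide's NO_SUCH_KEY

class Node:
    def __init__(self, key, elem, left=None, right=None):
        self.key, self.elem = key, elem
        self.left, self.right = left, right   # None = external node

def find_element(k, v):
    """Recursive search mirroring findElement(k, v)."""
    if v is None:                      # T.isExternal(v)
        return NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)
    elif k == v.key:
        return v.elem
    else:                              # k > key(v)
        return find_element(k, v.right)

root = Node(6, "six",
            Node(2, "two", Node(1, "one"), Node(4, "four")),
            Node(9, "nine", Node(8, "eight")))
print(find_element(4, root))                   # four
print(find_element(5, root) is NO_SUCH_KEY)    # True
```

Searching for 4 follows the path 6 (go left), 2 (go right), 4 (found), so the cost is proportional to the depth of the node reached.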

removeElement(k) – simple case • To perform operation removeElement(k), we search for key k • Assume key k is in the tree, and let v be the node storing k • If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w) • Example: remove 4 • [figure: before, a tree with keys 6, 2, 9, 1, 4, 8, 5, where v holds 4 and w is its leaf child; after, a tree with keys 6, 2, 9, 1, 5, 8]

removeElement(k) – more complicated case • We consider the case where the key k to be removed is stored at a node v whose children are both internal • we find the internal node w that follows v in an inorder traversal • we copy key(w) into node v • we remove node w and its left child z (which must be a leaf) by means of operation removeAboveExternal(z) • Example: remove 3 • [figure: before, a tree with keys 1, 3, 2, 8, 6, 9, 5, where v holds 3, w holds 5 and z is w's leaf child; after, v holds 5 and the tree has keys 1, 5, 2, 8, 6, 9]

Binary Search Tree Performance • Consider a dictionary with n items implemented by means of a binary search tree of height h • the space used is O(n) • methods findElement, insertItem and removeElement take O(h) time • The height h is O(n) in the worst case and O(log n) in the best case

Balanced Trees • When a path in a tree gets very long relative to other paths in the tree… • the tree is unbalanced • In fact, in its extreme form an unbalanced tree is a linear list. • So, to achieve optimal performance… • you need to keep the tree balanced
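The gap between the O(n) worst case and the O(log n) best case is easy to measure. This sketch inserts the same 1023 keys into a plain (unbalanced) BST in two orders: sorted, which degenerates into a linear list, and a hypothetical "median first" order, which yields a perfectly balanced tree. Insertion is iterative so the degenerate case does not hit Python's recursion limit. Height counts internal nodes on the longest root-to-external path, as on the slides:

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def build_and_measure(keys):
    """Insert keys into an unbalanced BST; return the tree's height."""
    root, height = None, 0
    for k in keys:
        depth = 1
        if root is None:
            root = Node(k)
        else:
            v = root
            while True:                  # walk down to an external node
                depth += 1
                if k < v.key:
                    if v.left is None:
                        v.left = Node(k)
                        break
                    v = v.left
                else:
                    if v.right is None:
                        v.right = Node(k)
                        break
                    v = v.right
        height = max(height, depth)
    return height

def median_first(lo, hi):
    """Keys lo..hi-1 ordered so each subrange's median is inserted first."""
    if lo >= hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + median_first(lo, mid) + median_first(mid + 1, hi)

n = 1023                                     # 2**10 - 1 keys
print(build_and_measure(range(n)))           # 1023: a linear list
print(build_and_measure(median_first(0, n))) # 10: perfectly balanced
```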

AVL Trees • we want to maintain a balanced tree • recall: • height of a node v = longest path from v to an external node • We want to maintain the principle that • for every internal node v, the heights of the children of v differ by at most 1 • the Height-Balance Property

AVL Trees • Balance Factor (BF) = h(right_subtree) - h(left_subtree) • balanced: |h(right_subtree) - h(left_subtree)| ∈ {0, 1} • a tree with some Balance Factor ∉ {-1, 0, 1} • is an unbalanced tree • must be rebalanced • a Balance Factor exists for every node v • except (trivially) external nodes
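The definitions translate directly into code. A minimal sketch using the slides' sign convention, BF = h(right) - h(left), with `None` as an external node of height 0. It recomputes heights on every call for clarity; a real AVL tree caches each node's height instead:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(v):
    """Longest path from v to an external node (None has height 0)."""
    if v is None:
        return 0
    return 1 + max(height(v.left), height(v.right))

def balance_factor(v):
    """BF = h(right_subtree) - h(left_subtree)."""
    return height(v.right) - height(v.left)

def is_avl(v):
    """True if every internal node has BF in {-1, 0, 1}."""
    if v is None:
        return True
    return (balance_factor(v) in (-1, 0, 1)
            and is_avl(v.left) and is_avl(v.right))

balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))    # right-leaning linear list
print(balance_factor(balanced), is_avl(balanced))  # 0 True
print(balance_factor(chain), is_avl(chain))        # 2 False
```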

AVL Trees • If Balance Factor ∈ {-1, 0, 1} • the tree is balanced • it does not need to be restructured • If Balance Factor = -2 or 2 • the tree is unbalanced • it needs to be restructured • restructuring is done by a process called rotation

AVL Trees • Rotation • Four types, forming two symmetric pairs • Left Single Rotation • Right Single Rotation • Left Double Rotation • Right Double Rotation • Since each pair is symmetric, we only consider single and double rotation
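A minimal sketch of the two rotation shapes on three-node trees. A single rotation fixes a straight right-right chain; a zig-zag (right child with a left child) needs a double rotation, here written as a right rotation at the child followed by a left rotation at the root. Which textbook name ("left double" or "right double") goes with which direction varies, so the function names below are our own:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    """Single left rotation: x's right child y becomes the subtree root."""
    y = x.right
    x.right = y.left     # y's left subtree moves under x
    y.left = x
    return y

def rotate_right(x):
    """Single right rotation, the mirror image of rotate_left."""
    y = x.left
    x.left = y.right
    y.right = x
    return y

def double_rotate_right_left(x):
    """Double rotation for a right-left zig-zag below x."""
    x.right = rotate_right(x.right)
    return rotate_left(x)

def preorder(v):
    return [] if v is None else [v.key] + preorder(v.left) + preorder(v.right)

# Inserting 1, 2, 3 leaves a right-right chain: one single rotation fixes it.
print(preorder(rotate_left(Node(1, None, Node(2, None, Node(3))))))
# Inserting 1, 3, 2 leaves a right-left zig-zag: it needs a double rotation.
print(preorder(double_rotate_right_left(Node(1, None, Node(3, Node(2))))))
```

Both calls print `[2, 1, 3]`: 2 at the root with children 1 and 3. Each rotation only relinks a constant number of pointers, which is why rotations cost O(1).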

AVL Trees • Rotation • if BF = 2

AVL Trees • Binary trees that maintain the Height-Balance Property are called • AVL trees • the name comes from the inventors • G.M. Adelson-Velsky and E.M. Landis, in their 1962 paper entitled “An Algorithm for the Organization of Information”

AVL Trees • [figure: an unbalanced tree next to a balanced tree] from: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees • h(right_subtree) - h(left_subtree) = Balance Factor (BF) • If BF ∈ {-1, 0, 1}, then the tree is balanced (do nothing) • If BF ∉ {-1, 0, 1}, then the tree is unbalanced (must be restructured) • Restructuring is done by rotation from: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees • Rotation • four cases, forming symmetric pairs • left single rotation • right single rotation • left double rotation • right double rotation • since the pairs are symmetric, we only examine single and double rotation from: http://en.wikipedia.org/wiki/AVL_trees

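AVL Trees - Insertion • Rotation • If BF > 2, the imbalance occurred further down in the right subtree • Recursively walk down the subtree until |BF| = 2 • If BF < -2, the imbalance occurred further down in the left subtree • Recursively walk down the subtree until |BF| = 2 • (after a single insertion into an AVL tree, |BF| cannot exceed 2, since every BF was in {-1, 0, 1} before the insertion and each subtree height grows by at most 1) from: http://en.wikipedia.org/wiki/AVL_trees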

AVL Trees - Insertion • Rotation • If BF = 2, the imbalance occurred in the right subtree • Recursively walk down the subtree until |BF| = 2 • If BF = -2, the imbalance occurred in the left subtree • Recursively walk down the subtree until |BF| = 2 from: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion • Rotation • If BF = 2, the imbalance occurred in the right subtree • Step down into that subtree to find where the insertion occurred • If BF = -2, the imbalance occurred in the left subtree • Step down into that subtree to find where the insertion occurred from: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion • Rotation (for BF = 2 at node v; the BF = -2 case is symmetric) • If BF at the subtree (v's right child) = 1 • the insertion occurred in that child's right subtree • a single rotation is required • If BF at the subtree = -1 • the insertion occurred in that child's left subtree • a double rotation is required from: http://en.wikipedia.org/wiki/AVL_trees
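Putting the insertion rules together: a minimal recursive AVL insert, assuming the slides' convention BF = h(right) - h(left) and cached node heights. On the way back up from the new leaf, each node whose BF reaches ±2 is fixed with a single rotation when its child leans the same way, or a double rotation when the child leans the opposite way. This is a teaching sketch (no deletion, no elements), not the textbook's reference code:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.height = 1          # internal node with two external children

def h(v):
    return v.height if v else 0

def bf(v):
    return h(v.right) - h(v.left)    # same sign convention as the slides

def update(v):
    v.height = 1 + max(h(v.left), h(v.right))

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)                    # x is now below y, so update it first
    update(y)
    return y

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    update(x)
    update(y)
    return y

def insert(v, key):
    """Insert key below v and return the rebalanced subtree root."""
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    update(v)
    if bf(v) == 2:                        # imbalance in the right subtree
        if bf(v.right) == -1:             # child leans left: double rotation
            v.right = rotate_right(v.right)
        return rotate_left(v)             # child leans right: single rotation
    if bf(v) == -2:                       # mirror image: left subtree
        if bf(v.left) == 1:
            v.left = rotate_left(v.left)
        return rotate_right(v)
    return v

def check(v):
    """Return the height of v, asserting |BF| <= 1 at every node."""
    if v is None:
        return 0
    hl, hr = check(v.left), check(v.right)
    assert abs(hr - hl) <= 1 and v.height == 1 + max(hl, hr)
    return v.height

root = None
for k in range(1, 1024):         # sorted input: worst case for a plain BST
    root = insert(root, k)
print(check(root))               # 10
```

The sorted input that turns a plain BST into a 1023-node linear list gives an AVL tree of height 10 here, and each insertion does O(log n) work plus at most one (single or double) rotation.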

AVL Trees - Insertion • Rotation • See http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion • Performance • rotations – O(1) • Recall that h(T) is maintained at O(log n) • insertItem – O(log n) • balanced tree - priceless from: http://en.wikipedia.org/wiki/AVL_trees

Bounded-Depth Search Trees • Search efficiency in a tree is related to the depth of the tree • We can use depth-bounded trees to create ordered dictionaries that run in O(log n) time for both search and update

Multi-way Search Trees • Remember Binary Search Trees • any node v can have at most 2 children • what if we get rid of that rule? • Suppose a node could have multiple children (>2) • Terminology: if v has d children, v is a d-node

Multi-way Search Trees • Multi-way Search Tree T • Each internal node must have at least two children, i.e., every internal node is a d-node with d ≥ 2 • Internal nodes store collections of items (k,e) • Each d-node stores d-1 items • Special keys $k_0 = -\infty$ and $k_d = +\infty$ • External nodes are only placeholders
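Searching a multi-way tree generalizes findElement: inside a d-node with sorted keys $k_1 < \dots < k_{d-1}$, either k matches a key or it falls strictly between two consecutive keys (using $k_0 = -\infty$, $k_d = +\infty$), which picks the child to descend into. A minimal sketch, assuming each node keeps its d-1 keys in a sorted Python list and `None` marks external nodes; the class name `MNode` and the sample tree are hypothetical:

```python
from bisect import bisect_left

class MNode:
    """Internal d-node: d-1 sorted keys with their elements, and d children.

    None stands for an external (placeholder) node.
    """
    def __init__(self, keys, elems, children=None):
        self.keys = keys
        self.elems = elems
        self.children = children or [None] * (len(keys) + 1)
        assert len(self.children) == len(keys) + 1

def find_element(v, k):
    while v is not None:
        i = bisect_left(v.keys, k)        # number of keys in v smaller than k
        if i < len(v.keys) and v.keys[i] == k:
            return v.elems[i]
        v = v.children[i]                 # keys[i-1] < k < keys[i]
    return None                           # reached an external node

# Hypothetical tree: a 3-node root (keys 10, 20) over a 2-node, 4-node, 3-node.
root = MNode([10, 20], ["ten", "twenty"], [
    MNode([5], ["five"]),
    MNode([12, 15, 18], ["twelve", "fifteen", "eighteen"]),
    MNode([25, 30], ["twenty-five", "thirty"]),
])
print(find_element(root, 15))   # fifteen
print(find_element(root, 16))   # None
```

With a binary search inside each node, the cost per level is O(log d) instead of O(1), traded for a shallower tree.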
