
CSE 326






Presentation Transcript


  1. CSE 326 Nov 18, 1999 (Title pages make Powerpoint happy)

  2. Menu • Hash tables • Hashing summary • Hashing for Web server load balancing • 2-3, (a,b), B-trees • 2-3 Tree: invariant, and insert/delete • (a,b)-Tree: why many children, why keys at leaves • Zasha’s Up-tree mistake • k-d trees • Assignment 7 group discussion

  3. “Hash Table” terminology confusion • Regular, vanilla Arrays sometimes called “Hash Table” • Don’t get confused • Laugh quietly • Ask “where’s the hashing function?”

  4. Hashing in Greg Badros’s childhood • Call customer service for status of your order. • Notes on status stored in one of 100 boxes. • Which box? Last 2 digits of phone number. • Hash(Customer)=PhoneNumber mod 100 • External hashing with list stored in array cell (papers in box)
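The customer-service story maps directly onto code. A minimal sketch, with made-up phone numbers and the slide's 100 boxes:

```python
# External hashing a la the customer-service story: 100 boxes,
# each holding a list of notes, keyed by the last 2 digits of the phone number.
NUM_BOXES = 100
boxes = [[] for _ in range(NUM_BOXES)]

def box_for(phone_number: int) -> int:
    # Hash(Customer) = PhoneNumber mod 100
    return phone_number % NUM_BOXES

def file_note(phone_number: int, note: str) -> None:
    boxes[box_for(phone_number)].append((phone_number, note))

def look_up(phone_number: int) -> list:
    # Scan only the one box this customer hashes to.
    return [n for p, n in boxes[box_for(phone_number)] if p == phone_number]

file_note(2065551234, "shipped Tuesday")
file_note(4255559934, "backordered")     # collides: both numbers end in 34
print(box_for(2065551234))               # -> 34
print(look_up(2065551234))               # -> ['shipped Tuesday']
```

The collision is harmless: both customers' papers share box 34, and `look_up` scans that one box for the exact number.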

  5. Basic elements of hashing

  6. External/Open Hashing • Collision: Array indexes point to buckets • Buckets = linked lists • Buckets = AVL trees → O(log n) worst case • Insert/Delete easy
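A minimal sketch of external hashing with list buckets (Python lists stand in for the linked lists; the table size is arbitrary):

```python
class ChainedHashTable:
    """External/open hashing: each array slot holds a bucket of (key, value) pairs."""
    def __init__(self, size=11):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.table[hash(key) % self.size]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # collisions just extend the chain

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]

t = ChainedHashTable()
t.insert("apple", 1)
t.insert("pear", 2)
print(t.find("apple"))   # -> 1
```

Insert and delete are easy, as the slide says: each is a local edit of one short chain. Swapping the list buckets for AVL trees would bound a pathological bucket at O(log n).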

  7. Coalesced Chaining • Store the linked list inside the Array (instead of an external list) • On collision: find the first free Array element, starting at the first cell, and link it into the colliding chain • Reserve a “Cellar” at the beginning of the Array for hotspots

  8. Open Addressing • Collision: Implicit list based on a 2nd hashing function • On collision: add the 2nd hash function’s value to the last index • Types: • “Linear probing”: 2nd function is H(K)=c for some constant c (can form “clusters”) • “Double Hashing”: pseudo-random 2nd function (clusters unlikely)
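A minimal sketch of open addressing with a constant probe step, i.e. linear probing; the table size and the full-table guard are illustrative, and a key-dependent step in place of the constant would give double hashing:

```python
def probes(key, size, step=1):
    """Probe sequence for open addressing: start at the hash index and keep
    adding the 2nd hash value. step=1 (a constant) is linear probing; a
    pseudo-random, key-dependent step would be double hashing."""
    index = hash(key) % size
    for _ in range(size):            # at most `size` probes: guards a full table
        yield index
        index = (index + step) % size

class ProbingHashTable:
    def __init__(self, size=11):
        self.slots = [None] * size

    def insert(self, key, value):
        for i in probes(key, len(self.slots)):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table full")

    def find(self, key):
        for i in probes(key, len(self.slots)):
            if self.slots[i] is None:
                return None          # empty slot ends the implicit list
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

t = ProbingHashTable()
t.insert(0, "zero")
t.insert(11, "eleven")    # 0 and 11 collide mod 11: 11 lands in the next slot
print(t.find(11))         # -> eleven
```

The colliding keys 0 and 11 form exactly the kind of cluster the slide warns about: they occupy adjacent slots, so later keys hashing nearby must probe past both.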

  9. Open Addressing cont’d • Deletes are tricky: mark the slot with a “Deleted flag” instead of emptying it, so probe chains stay intact • “Ordered hashing” • keep each chain of collisions in sorted order • reduces time for unsuccessful searches
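Why the "Deleted flag" matters: emptying a slot would cut the implicit probe chain, hiding any key that probed past it. A minimal tombstone sketch under linear probing (sentinel objects and table size are illustrative; reinsertion past a tombstone is simplified):

```python
# Sentinels: EMPTY = never used (stops a search), DELETED = tombstone (does not).
EMPTY, DELETED = object(), object()

class TombstoneTable:
    def __init__(self, size=11):
        self.slots = [EMPTY] * size

    def _probes(self, key):
        h = hash(key) % len(self.slots)
        return [(h + i) % len(self.slots) for i in range(len(self.slots))]

    def insert(self, key, value):
        for i in self._probes(key):
            s = self.slots[i]
            if s is EMPTY or s is DELETED or s[0] == key:
                self.slots[i] = (key, value)
                return

    def find(self, key):
        for i in self._probes(key):
            s = self.slots[i]
            if s is EMPTY:
                return None                  # truly empty: key can't be further on
            if s is not DELETED and s[0] == key:
                return s[1]
        return None

    def delete(self, key):
        for i in self._probes(key):
            s = self.slots[i]
            if s is EMPTY:
                return
            if s is not DELETED and s[0] == key:
                self.slots[i] = DELETED      # tombstone, NOT EMPTY
                return

t = TombstoneTable()
t.insert(0, "a")
t.insert(11, "b")   # collides with 0, lands one slot later
t.delete(0)
print(t.find(11))   # -> b  (the tombstone lets the probe continue past slot 0)
```

Had `delete` reset slot 0 to `EMPTY`, `find(11)` would stop at the empty slot and wrongly report the key absent.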

  10. Web Server Without Load Balancing

  11. Random Load Balancing

  12. Hashed Load Balancing
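The diagram for this slide is lost, but the idea is simple: hash something stable about the request (e.g. the client address) so the same client always reaches the same server, unlike random balancing. A minimal sketch; the server names and IP are made up, and md5 is used only because, unlike Python's built-in `hash()`, it is stable across runs:

```python
import hashlib

SERVERS = ["web1", "web2", "web3"]   # hypothetical server pool

def pick_server(client_ip: str) -> str:
    # Stable hash of the client -> same client always routed to the same server.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Same client -> same server, every time; different clients spread across the pool.
print(pick_server("128.95.1.4") == pick_server("128.95.1.4"))   # -> True
```

The per-client stickiness is what random balancing lacks: with hashing, any per-client state cached on a server (sessions, open files) is found again on the next request.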

  13. 2-3 Tree: Invariants • All leaves are same depth • log time • All internal nodes have 1 or 2 keys (2 or 3 children) • All leaves have 1 or 2 keys • In-order property • like BST property • so we can find things efficiently

  14. 2-3 Tree: Zasha’s flawed intuition • BST is like 1-2 tree • 2-child nodes: can be balanced • 1-child nodes: unbalanced linked list • 2-3 Tree • 2-child nodes is worst case. • (FLAW: also need “all leaves same depth”, or else degenerate Huffman tree)

  15. 2-3 Tree: Insert (p. 231)
  Find leaf node to get K; nodeWithInsert is the node that gets K
  Loop
    If nodeWithInsert has 2 Keys (3 children) Then
      { Fine }
      Exit Loop
    If nodeWithInsert has 3 Keys (4 children) Then
      { Oops – too many }
      Split nodeWithInsert into 2 nodes
      { the 2 nodes get a new parent key – throw this new parent key to nodeWithInsert’s Parent }
      nodeWithInsert = nodeWithInsert.Parent
      If nodeWithInsert is Root Then
        Create new root with 2 children
        Exit Loop

  16. 2-3 Tree: Why insert works • All nodes end up with 1/2 Keys (2/3 children) • Keeps in-order property • split values for 3-Key (4-child) nodes into 2-child nodes • Keeps same levels • only time we add a level anywhere is Root • adding new root adds levels symmetrically

  17. 2-3 Tree: Delete (p. 231)
  Find nodeWithDelete based on Key
  If nodeWithDelete is not a leaf Then
    nodeWithDelete = In-Order Successor(nodeWithDelete)
  { nodeWithDelete is a leaf }
  Loop
    If nodeWithDelete has 1 Key (2 Children) Then
      { Fine }
      Exit Loop
    { cont’d }

  18. 2-3 Tree: Delete cont’d
    If nodeWithDelete has 0 keys (1 child) Then
      { Oops }
      If nodeWithDelete is Root Then
        Remove Root
        Exit Loop
      Set nodeSibling { parent has 2/3 children, so a sibling exists }
      Set S = the parent key between Key(nodeWithDelete) and Key(nodeSibling)
      Set siblingKey = a Key of nodeSibling
      If nodeSibling has 2 Keys (3 Children) Then { borrow }
        Move S from parent down to nodeWithDelete
        Replace S in parent with siblingKey from nodeSibling
        Move the child next to siblingKey over to nodeWithDelete
        Exit Loop
      Else { nodeSibling has 1 Key: merge }
        Parent = parent(nodeWithDelete, nodeSibling)
        { give the merged node 2 keys (3 children) }
        Merged node gets keys S and siblingKey
        nodeWithDelete = Parent { we took a key from Parent }
        { loop again with Parent }

  19. 2-3 Tree: Why Delete works • All nodes end up with 1/2 Keys (2/3 children) • Keeps in-order property • careful about which keys we take from parent/sibling • Keeps same levels • only time we remove a level is Root • removing the root removes a level symmetrically

  20. (a,b) Tree: Why such big nodes? • Hard Drive / CD-ROM • Read sector at a time • Sector: 256 bytes – 4 KB (typical) • AVL/2-3 node likely <20 bytes • You read 256 bytes to get 20 useful bytes???
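The arithmetic behind the slide's rhetorical question, with illustrative sizes (a 4 KB sector and an 8-byte key-plus-pointer entry are assumptions, not from the slide):

```python
# Why big nodes: a disk read always fetches a whole sector, so size the
# node to fill the sector instead of wasting the read.
SECTOR = 4096          # bytes per disk read (slide says 256 B - 4 KB typical)
ENTRY = 4 + 4          # illustrative: one 4-byte key + one 4-byte child pointer

avl_node = 20          # a small AVL/2-3 node, as on the slide
print(f"useful fraction with a small node: {avl_node / SECTOR:.1%}")  # -> 0.5%

branching = SECTOR // ENTRY
print(f"children per sector-sized node: {branching}")                 # -> 512
```

So a sector-sized node turns one disk read into hundreds of branching choices, which is exactly why (a,b)/B-tree nodes are made large.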

  21. (a,b) Tree: Why keys at leaf? • Database: nodes on disk. • Every access to the tree asks for Root → keep Root in RAM • Keys stored in the root rarely help • Put Keys in leaves, cache the higher nodes

  22. (a,b) Tree: Why keys at leaf: 2 • No in-order successor business • Can fiddle with Keys at internal nodes • e.g. Chop off unnecessary suffixes

  23. (a,b) Tree: Invariants • a >= 2, b >= 2a-1 • All leaves have same depth • All internal nodes have a..b children • All leaves have (a-1)..(b-1) keys • In-order property for efficient searching

  24. (a,b) Tree: Insert idea • Same idea as 2-3 Tree
  Vague code snippet:
    If nodeWithInsert has too many children Then
      Split it: get 2 nodes, each with about b/2 children
      Now Parent becomes nodeWithInsert

  25. (a,b) Tree: Delete idea • Same idea as 2-3 Tree
  Vague code snippet:
    If nodeWithDelete has too few children Then
      Get nodeSibling
      If nodeSibling has > a children Then
        Borrow a key from parent
        Replace the borrowed key in parent with one from nodeSibling
        Take nodeSibling’s child from its just-removed key
      Else { nodeSibling has exactly a children }
        Borrow a key from parent and merge
        Oops, parent lost a key
        nodeWithDelete = Parent

  26. B-Tree: Summary • B-Tree is (a,b) tree with b=2a-1 • Why not get the biggest value of a we can? • Who wants to bother with two different parameters???

  27. B-tree for full text index (aka “inverted index”) • One possibility • Key: word • Info: list of documents (or document IDs) • Jim & Zasha’s idea for phrases • Key: word • Info: list of (document,list of occurrences as word #n)
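The slide's first possibility (word → list of document IDs) in miniature; a dict stands in for the B-tree's key/info pairs, and the documents are made up:

```python
from collections import defaultdict

# Inverted index: Key = word, Info = set of document IDs containing it.
index = defaultdict(set)

def add_document(doc_id, text):
    for word in text.lower().split():
        index[word].add(doc_id)

def lookup(word):
    return sorted(index[word.lower()])

add_document(1, "the quick brown fox")
add_document(2, "the lazy dog")
print(lookup("the"))   # -> [1, 2]
print(lookup("fox"))   # -> [1]
```

Jim & Zasha's phrase-query variant would store, per word, a list of (document, positions) pairs instead of bare IDs, so "quick brown" can be checked for adjacent positions.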

  28. Depth first search • Avoid loops: don’t repeat nodes. • Use some kind of Set ADT
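A minimal sketch of the slide's point, using a Python set as the Set ADT; the example graph is made up and deliberately contains a cycle:

```python
def dfs(graph, start):
    """Depth-first search; a Set ADT of visited nodes avoids loops."""
    visited = set()
    order = []

    def visit(node):
        if node in visited:              # don't repeat nodes
            return
        visited.add(node)
        order.append(node)
        for neighbor in graph.get(node, []):
            visit(neighbor)

    visit(start)
    return order

# A graph with a cycle: A -> B -> C -> A.
# Without the visited set, dfs would recurse forever.
g = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(dfs(g, "A"))   # -> ['A', 'B', 'C']
```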

  29. Zasha’s Up-tree mistake

  30. 2-d Trees

  31. 2-d Trees: points • Like binary search trees, but each node splits the plane with a line (alternating x and y) • Not always balanced • R-trees are • balanced • big nodes like B-trees • Usually faster than lists… • source: Zasha
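A bare-bones 2-d tree sketch: compare on x at even depths and y at odd depths, exactly like a BST otherwise (the example points are made up, and no balancing is attempted, matching the "not always balanced" bullet):

```python
class Node:
    def __init__(self, point):
        self.point = point     # a (x, y) tuple
        self.left = None
        self.right = None

def insert(root, point, depth=0):
    if root is None:
        return Node(point)
    axis = depth % 2                           # 0: compare x, 1: compare y
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root

def contains(root, point, depth=0):
    if root is None:
        return False
    if root.point == point:
        return True
    axis = depth % 2
    child = root.left if point[axis] < root.point[axis] else root.right
    return contains(child, point, depth + 1)   # same rule used on the way down

root = None
for p in [(5, 4), (2, 6), (9, 1)]:
    root = insert(root, p)
print(contains(root, (2, 6)))   # -> True
print(contains(root, (7, 7)))   # -> False
```

A search visits one node per level, like a BST, but each level discards a half-plane instead of a half-line, which is what makes 2-d trees useful for point queries.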

  32. Assignment 7 Discussion Points • What task will your program accomplish? • What sub-problems will it need to solve, to accomplish this task? • How do the sub-problems relate to each other? What does one sub-problem need from another in order to work? • What seems tricky?

  33. Discussion example: Huffman • Purpose: • Program will encode text files using Static Huffman trees, and then decode the encoded files. This will compress files. • Sub-problems: • Reading/writing files. • Reading/writing files with bits • Heap with DeleteMin/Insert • Building Static Huffman tree using Heap • Encoding/Decoding scheme for Static Huffman tree with bit-oriented files

  34. Discussion example: Huffman cont’d • Sub-problem relationships: • Reading/writing files is lowest level. • R/W files with bits needs R/W files • Heap is also lowest level • Building static Huffman needs reading files and Heap • Encoding/Decoding tree needs Static Huffman tree and bit-oriented files.
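The "Building Static Huffman tree using Heap" sub-problem can be sketched compactly; this is an illustrative reduction that tracks only each symbol's code length (the real assignment would keep the tree for encoding/decoding):

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Build a static Huffman tree with a heap and return each symbol's
    code length (frequent symbols get shorter codes)."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): 1}
    # Heap entries: (frequency, tiebreak, {symbol: depth_so_far}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # DeleteMin twice: the two
        f2, _, c2 = heapq.heappop(heap)      # least-frequent subtrees...
        merged = {s: d + 1 for s, d in {**c1, **c2}.items()}  # ...get 1 deeper
        count += 1
        heapq.heappush(heap, (f1 + f2, count, merged))        # Insert the merge
    return heap[0][2]

lengths = huffman_code_lengths("aaaabbc")
print(lengths)   # -> {'a': 1, 'b': 2, 'c': 2}
```

This shows the dependency the slide states: tree-building needs only DeleteMin/Insert from the Heap sub-problem plus the symbol counts read from the file.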
