
Tirgul 7


Presentation Transcript


  1. Tirgul 7 • Heaps & Priority Queues: Reminder, Examples • Hash Tables: Reminder, Examples

  2. The heap property & heapify • A heap is a complete binary tree in which each node is at least as large as both of its children. • The largest element is therefore the root of the tree. • Notice, however, that this does not mean that the two children of the root are the second- and third-largest elements in the heap.

  3. The heap property & heapify • Heapify assumes that both subtrees of the root are heaps, but the root itself may be smaller than one of its children: • Heapify(Node x) largest = max {x, left(x), right(x)} if (largest ≠ x) exchange(largest, x) Heapify(largest)
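On an array-based heap (zero-based, with the children of index i at 2i+1 and 2i+2), the pseudocode above can be sketched as follows. This is an illustrative sketch, not the slides' exact code; the function and variable names are made up:

```python
def heapify(a, i, n):
    """Sift a[i] down in the max-heap a[0:n], assuming that both subtrees
    rooted at its children already satisfy the heap property."""
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]   # exchange with the larger child
        heapify(a, largest, n)                # keep sifting down from there

# Example: the small root 3 sinks below its larger descendants.
a = [3, 16, 12, 2, 8, 10, 11]
heapify(a, 0, len(a))
# a is now a valid max-heap with 16 at the root
```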

  4. The heap property & heapify • The node 3 is replaced with 16, then with 12, and then with 8.

  5. Priority Queue • A Priority Queue serves elements by priority: First In, Highest Priority Out. • Supports the operations: • insert, maximum, extract-maximum. • It is simple to implement using a heap: • maximum: just return the root. • extract-maximum: save the root, move the last leaf to the root, perform heapify, and return the saved root. • insert: add the node as a leaf and then move it up until its value is lower than its parent’s value.
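The three heap-based operations described above can be sketched together as one small class (a minimal illustration; the class and method names are made up, not an interface from the slides):

```python
class MaxPriorityQueue:
    """A max-priority queue over an array-based binary heap."""

    def __init__(self):
        self.a = []

    def maximum(self):
        return self.a[0]              # the root is the largest element

    def insert(self, key):
        self.a.append(key)            # add as the last leaf...
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] < self.a[i]:
            parent = (i - 1) // 2     # ...and move it up while it is
            self.a[parent], self.a[i] = self.a[i], self.a[parent]
            i = parent                # larger than its parent

    def extract_max(self):
        root = self.a[0]              # save the root
        last = self.a.pop()           # move the last leaf to the root...
        if self.a:
            self.a[0] = last
            self._heapify(0)          # ...and restore the heap property
        return root

    def _heapify(self, i):
        n = len(self.a)
        largest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and self.a[c] > self.a[largest]:
                largest = c
        if largest != i:
            self.a[i], self.a[largest] = self.a[largest], self.a[i]
            self._heapify(largest)
```

Inserting 5, 9, 11, 2 and then repeatedly calling extract_max returns the keys in decreasing order: 11, 9, 5, 2.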

  6. Priority Queue • An example of the Priority Queue insert operation. • When inserting 11, first add it as a leaf, then exchange it with 5, then with 9.

  7. Performance • Priority Queue using heaps: • maximum operation takes O(1). • extract-max operation takes O(log n). • insert operation takes O(log n). • Priority Queue using ordered list: • maximum operation takes O(1). • extract-max operation takes O(1). • insert operation takes O(n).

  8. Insert versus Build Heap • What is the difference between the insert operation of a Priority Queue and Build-Heap? • Build-Heap: for i = ⌊n/2⌋ downto 1 do Heapify(i) • Insert-Build-Heap: while there’s some x not yet in the heap do insert(x)

  9. Insert versus Build Heap • Build-Heap and Insert-Build-Heap sometimes create different heaps. • For example, consider the sequence 1, 2, 3, 4: • Build-Heap will create 4, 2, 3, 1, • Insert-Build-Heap will create 4, 3, 2, 1. • Run time: • Build-Heap: O(n). • Insert-Build-Heap: O(n log n) in the worst case.
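The slide's example can be checked directly with the two build strategies side by side (an illustrative sketch with made-up names, using a zero-based array):

```python
def _heapify(a, i):
    """Sift a[i] down until the subtree rooted at i is a max-heap."""
    n = len(a)
    largest = i
    for c in (2 * i + 1, 2 * i + 2):
        if c < n and a[c] > a[largest]:
            largest = c
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        _heapify(a, largest)

def build_heap(keys):
    """O(n) bottom-up build: heapify from the last internal node up to the root."""
    a = list(keys)
    for i in range(len(a) // 2 - 1, -1, -1):
        _heapify(a, i)
    return a

def insert_build_heap(keys):
    """O(n log n) build: insert the keys one by one, bubbling each one up."""
    a = []
    for key in keys:
        a.append(key)
        i = len(a) - 1
        while i > 0 and a[(i - 1) // 2] < a[i]:
            parent = (i - 1) // 2
            a[parent], a[i] = a[i], a[parent]
            i = parent
    return a

build_heap([1, 2, 3, 4])         # → [4, 2, 3, 1]
insert_build_heap([1, 2, 3, 4])  # → [4, 3, 2, 1]
```

Both results are valid max-heaps; the two procedures simply arrange the keys differently.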

  10. Questions • How to implement a queue/stack with a priority queue? What are the differences in running times? • How to implement an increase-key operation, which increases the value of some node? • How to delete a given node from the heap in O(log n)?

  11. Dictionary / Map ADT • This ADT stores pairs of the form <key, data> (in java: “value” instead of “data”). • Supports the operations insert(key, data), find(key), and delete(key). • One way to implement it is by search trees. The standard operations take O(log n) this way. • Can we achieve better performance for the standard operations?

  12. Direct addressing • Say the keys come from a (large) set U. One way to have fast operations is to allocate an array of size |U|. This is of course a waste of memory, since most entries in the array will remain empty. • For example, a Hebrew dictionary (e.g. Even-Shushan) holds fewer than 100,000 words, whereas the number of possible combinations of Hebrew letters is much bigger (22^5 for 5-letter words alone). It is impractical to allocate all this space that will never be used.

  13. Hash table • In a hash table, we allocate an array of size m, which is much smaller than |U|. • We use a hash function h() to determine the entry of each key. • When we want to insert/delete/find a key k we look for it in the entry h(k) in the array. • Notice that this way, it is not necessary to have an order among the elements of the table.
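A minimal sketch of such a table follows. The slides only raise collision handling later, so this sketch assumes one common resolution, chaining (a list of pairs per entry), just so that insert/find/delete are complete; the class name, the choice of m, and the use of Python's built-in hash are all illustrative:

```python
class HashTable:
    """A hash table of m entries with chaining: each entry holds a
    (possibly empty) list of [key, data] pairs."""

    def __init__(self, m=100):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _h(self, key):
        return hash(key) % self.m          # maps any key to an entry 0..m-1

    def insert(self, key, data):
        chain = self.table[self._h(key)]
        for pair in chain:
            if pair[0] == key:             # key already present: overwrite
                pair[1] = data
                return
        chain.append([key, data])

    def find(self, key):
        for k, data in self.table[self._h(key)]:
            if k == key:
                return data
        return None                        # key not in the table

    def delete(self, key):
        i = self._h(key)
        self.table[i] = [p for p in self.table[i] if p[0] != key]
```

All three operations look only at the single entry h(k), which is what makes the table fast when the keys spread well.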

  14. Example of usage • Take for example the login names of the students in the dast course. There are about 300 login names. • If we use a binary search tree, a tree of height 8 will be created. • We can instead store the login names in a hash table of 100 entries, using the hash function h(k) = k mod 100. • The next slide presents a possible spread of the login names over the entries.

  15. Example of usage • The x-axis describes the number of items in an entry. • The y-axis describes how many entries with this load exist.

  16. Example of usage • Notice that even if the spread of login names were perfect, there would still be 3 names in each entry (300 names over 100 entries), so this spread is considered good. • Searching for a login name with the binary search tree: • Since half of the elements are in the leaves, it takes 8 operations to find them. • Searching for a login name with the hash table: • The worst case is still 8, but the search for most of the elements (about 80% of them) takes about half of that.

  17. Example of usage • Two questions arise from the example: • What are the best hash functions to use, and how do we find them? • What happens when several keys map to the same entry? (Clearly this can happen, since U is much larger than m.)

  18. How to choose hash functions • The crucial point: the hash function should “spread” the keys of U equally among all the entries of the array. • Unfortunately, since we don’t know in advance the keys that we’ll get from U, this can be done only approximately. • Remark: the hash functions usually assume that the keys are numbers. We’ll discuss next class what to do if the keys are not numbers.

  19. The division method • If we have a table of size m, we can use the hash function h(k) = k mod m. • Some values of m are better than others: • Good m’s are prime numbers not too close to a power of 2. • A bad choice is m = 2^p: the function then uses only the p least significant bits of k. • Likewise, if the keys are decimal numbers, m = 10^p is bad. • A bad choice example: if m = 100 and key = 3674 (a decimal number), then h(k) = 74, i.e. the function looks only at the last two digits.

  20. The division method • A good choice example: if |U| = 2000 and we want each search to take (on average) 3 operations, we can choose a prime number close to 2000/3, m = 701. The keys then spread over the entries as: • entry 0: 701, 1402 • entry 1: 702, 1403 • … • entry 700: 700, 1401
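The division-method examples from the last two slides can be checked directly. This is a small sketch; the extra colliding keys 1274 and 9974 are made up for illustration:

```python
def h(k, m):
    """The division method: the entry of key k in a table of size m."""
    return k % m

# Bad choice: m = 100 = 10**2 looks only at the last two decimal digits,
# so keys that end in the same two digits always collide.
assert h(3674, 100) == 74
assert h(1274, 100) == h(9974, 100) == 74

# The slide's good choice m = 701 reproduces its table of entries...
assert h(701, 701) == 0 and h(1402, 701) == 0   # entry 0
assert h(702, 701) == 1 and h(1403, 701) == 1   # entry 1

# ...and spreads the three colliding keys above over distinct entries.
assert len({h(k, 701) for k in (3674, 1274, 9974)}) == 3
```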

  21. More About Hash Tables • Next class 
