Collections Overview: Sets, Maps, and Implementations
Presentation Transcript
Overview • The Collections Hierarchy • Sequence vs. Associative Containers • Sets • Maps • Implementing Sets and Maps • Trees • Hash tables
The Collection Interface • A Collection is an object that holds other objects • Collections have base types (<String>, ...) • The List interface extends the Collection interface • Sets are another kind of Collection, as are Queues • Hierarchy: Collection<E> is extended by List<E>, Set<E> and Queue<E>; ArrayList<E> and LinkedList<E> implement List<E> • Collection, List, Set & Queue are interfaces; ArrayList & LinkedList are classes
Collection Methods • Common to Lists, Sets and Queues • add(E), remove(Object), clear() • contains, equals, isEmpty, size, toArray • addAll, containsAll, removeAll, retainAll • hashCode, iterator • List and Deque add several methods • different methods for each • Set adds no more methods
the Collections class • The Collections (note the s) class • static methods for Collection objects • like ArrayLists • e.g. Collections.sort(…) • Collection is an interface • set of method headers for a “skill set” • Collections is a class • (static) methods with definitions
Other Collections Methods • Collections.methodName(args..); • starting with list = [10, 20, 10, 5] • frequency(list, 10) returns 2 • max(list) returns 20 • min(list) returns 5 • reverse(list) changes list to [5, 10, 20, 10] • replaceAll(list, 10, 7) changes list to [5, 7, 20, 7] • swap(list, 0, 2) changes list to [20, 7, 5, 7] • shuffle(list) changes list to maybe [7, 20, 5, 7] • max & min can also take comparators as 2nd argument
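The calls on this slide can be tried out with a short program. This is a minimal sketch; the class name CollectionsDemo and the helper sample() are made up for illustration, while every Collections method used is part of java.util.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollectionsDemo {
    // Hypothetical helper: builds the slide's sample list as a mutable ArrayList.
    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(10, 20, 10, 5));
    }

    public static void main(String[] args) {
        List<Integer> list = sample();
        System.out.println(Collections.frequency(list, 10)); // 2
        System.out.println(Collections.max(list));           // 20
        System.out.println(Collections.min(list));           // 5

        Collections.reverse(list);            // reverses in place
        System.out.println(list);             // [5, 10, 20, 10]
        Collections.replaceAll(list, 10, 7);  // every 10 becomes 7
        System.out.println(list);             // [5, 7, 20, 7]
        Collections.swap(list, 0, 2);         // exchange positions 0 and 2
        System.out.println(list);             // [20, 7, 5, 7]

        // max with a comparator as 2nd argument: under reverse order
        // the "largest" element is the naturally smallest one.
        System.out.println(Collections.max(list, Collections.reverseOrder())); // 5
    }
}
```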
Exercise • Write a method that shows how many of each element is in a List<Integer> – but only in the range of the elements in the list • for example: [10, 5, 10, 3, 5, 10, 22, 19, 10, 5] • 3 appears 1 time(s) • 4 appears 0 time(s) • 5 appears 3 time(s) • ... • 22 appears 1 time(s)
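One possible approach to this exercise, sketched under the assumption that simplicity matters more than speed: find the range with Collections.min and Collections.max, then call Collections.frequency for each value in between. The class name RangeCounts and the method report are invented for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RangeCounts {
    // Returns one line "v appears n time(s)" for every v from min(list) to max(list).
    // Each frequency call scans the whole list, so this is O(range * size).
    static String report(List<Integer> list) {
        if (list.isEmpty()) {
            return "";
        }
        int lo = Collections.min(list);
        int hi = Collections.max(list);
        StringBuilder sb = new StringBuilder();
        for (int v = lo; v <= hi; v++) {
            sb.append(v).append(" appears ")
              .append(Collections.frequency(list, v))
              .append(" time(s)\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(Arrays.asList(10, 5, 10, 3, 5, 10, 22, 19, 10, 5)));
    }
}
```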
Lists vs. Sets • List elements allow duplicates; Sets do not • list1 [a, b, c, a, b, d, a, a, a, a, b, z] • set1 [a, b, c, d, z] • Client puts List elements in order; computer chooses order for Set elements • list1.add("e"); [a, b, c, a, b, d, a, a, a, a, b, z, e] • set1.add("e"); [a, b, c, d, e, z] • Set interface implemented by (e.g.) TreeSet
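The difference shows up directly in code. The sketch below copies the slide's list into a TreeSet; the class name ListVsSet and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ListVsSet {
    // Hypothetical helper: the slide's list1, duplicates included.
    static List<String> letters() {
        return new ArrayList<>(Arrays.asList(
                "a", "b", "c", "a", "b", "d", "a", "a", "a", "a", "b", "z"));
    }

    // Copying into a TreeSet drops duplicates and sorts the elements.
    static Set<String> unique(List<String> list) {
        return new TreeSet<>(list);
    }

    public static void main(String[] args) {
        List<String> list1 = letters();
        Set<String> set1 = unique(list1);
        list1.add("e");                    // goes at the end, where the client put it
        set1.add("e");                     // the TreeSet decides where it goes
        System.out.println(list1);         // [a, b, c, a, b, d, a, a, a, a, b, z, e]
        System.out.println(set1);          // [a, b, c, d, e, z]
        System.out.println(set1.add("a")); // false: already present
    }
}
```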
Container Types • Sequence containers (List) • accessed by position • first element, second, third, …, last • duplicates allowed • Associative containers (Set, Map) • accessed by key (= part of its value) • position is incidental (doesn’t matter) • no duplicates allowed
Container Types • Sequence (List) • Associative (Set, Map) [Figure: a List drawn as an array holding 10, 20, 30, 15, 7, 19, 40, 21, 10, 8, 42, 90, 54 at positions 0 to 12; a Set and a Map drawn as unordered collections]
Associative Containers • Is 4 in the set? yes • Is 7 in the set? no • How many things are in the set? 10 • What is the value of “cat”? 3 • Change the value of “elephant” to 7 • Is the map empty? no [Figure: set {31, 99, 4, 8, 15, 18, 55, 12, 11, 42}; map {lion→6, cat→3, elephant→6 (changed to 7), dog→7, elf→0, walrus→11}]
Java Set and Map • Set = collection of unique values • is <key> there or not? • Map = function from key to value • what is the value of <key>? • Set extends Collection; Map does not • Sets can be implemented with Maps • TreeSet implemented using a TreeMap • HashSet implemented using a HashMap
Set and Map Interfaces • Set contains elements of a single type • Set<String>, Set<Integer>, Set<Student> public interface Set<E> extends Collection<E> • Map needs two types: key and value • Map<String, Integer>, Map<String, Student> public interface Map<K, V>
Map and Set Application • Simple web page search engine • maintains a Map<String, Set<URL> > • key string = search word (“nonmonotonic”) • Set<URL> = web pages containing that word • Multi-term search uses set intersection • set of pages with “nonmonotonic” = s1 • set of pages with “reasoning” = s2 • s1 intersect s2 = set of pages with both
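The search-engine idea can be sketched in a few lines. This is an illustration only: the class SearchIndex, its methods and the example URLs are invented, and pages are stored as plain Strings rather than java.net.URL objects to keep the sketch simple. Set intersection is done with retainAll.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SearchIndex {
    // key = search word, value = set of pages containing that word
    private final Map<String, Set<String>> pagesByWord = new HashMap<>();

    // Records that the page at url contains each of the given words.
    void addPage(String url, String... words) {
        for (String w : words) {
            pagesByWord.computeIfAbsent(w, k -> new TreeSet<>()).add(url);
        }
    }

    // Multi-term search: intersect the page sets of all terms.
    Set<String> search(String... terms) {
        Set<String> result = null;
        for (String t : terms) {
            Set<String> pages = pagesByWord.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(pages); // copy, so the index is not modified
            } else {
                result.retainAll(pages);       // s1 intersect s2
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        SearchIndex idx = new SearchIndex();
        idx.addPage("a.example/1", "nonmonotonic", "reasoning");
        idx.addPage("a.example/2", "nonmonotonic", "logic");
        idx.addPage("a.example/3", "reasoning");
        System.out.println(idx.search("nonmonotonic", "reasoning")); // [a.example/1]
    }
}
```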
Set Element Order • Depends on the implementation Set<String> treeSet = new TreeSet<>(); Set<String> hashSet = new HashSet<>(); treeSet.add("words"); hashSet.add("words"); treeSet.add("in"); hashSet.add("in"); treeSet.add("this"); hashSet.add("this"); treeSet.add("set"); hashSet.add("set"); System.out.println(treeSet); System.out.println(hashSet); [in, set, this, words] [set, in, words, this]
Implementing Sets and Maps • Binary Search Trees • linked structure • Hash Tables • array structure [Figure: the values 31, 99, 4, 8, 15, 18, 55, 12, 11, 42 stored once as a binary search tree and once in a hash table with cells 0 to 12]
Binary Search Trees • Tree: root and children • each child has one parent above it in the tree • Binary Tree: at most two children • a left child and a right child • Binary Search Tree: left < root < right • everything to the left of the root is smaller • everything to the right of root is larger
BST Example • Root is 31 • its children are 8 and 99: 8 < 31 < 99 • Root is 8 • its children are 4 and 12: 4 < 8 < 12 • Everything under 8 is < 31, too • Everything under 99 is > 31, too [Figure: BST containing 31, 8, 99, 55, 42, 4, 12, 11, 18, 15]
BST Nodes • Each BST node contains data and two links // for a Set<E> private class BSTNode { E data; BSTNode left; BSTNode right; BSTNode(E data) { this.data = data; } } // for a Map<K, V> private class BSTNode { K key; V value; BSTNode left; BSTNode right; }
Set and Map Implementations • Root is a BSTNode • also need a comparator • and maybe a count // for Set<E> private BSTNode root; private Comparator<E> comp; private int numInTree; • root starts as null, numInTree as 0 • comp may be the natural order comparator (o1, o2) -> o1.compareTo(o2)
Finding an Element in a BST • Want to know if 6 is in the tree • How to see? • 6 < 7, so if it’s there, it must be in the left sub-tree • 6 > 2, so look right • 6 > 3, look right again • There it is [Figure: BST containing 7, 2, 12, 3, 1, 6, 5, 4]
Finding an Element in a BST • Want to know if 8 is in the tree • 8 > 7, so look right • 8 < 12, so look left • Nothing there • 8 must not be in the tree [Figure: BST containing 7, 2, 12, 3, 1, 6, 5, 4]
BST contains Method • Like a binary search for item public boolean contains(E item) { BSTNode cur = root; while (cur != null) { int c = comp.compare(item, cur.data); if (c < 0) { cur = cur.left; } // item < root else if (c > 0) { cur = cur.right; } // item > root else { return true; } // item == root } return false; // item not found }
Inserting Into a BST • Newly inserted node must appear in exactly the right position • If it is bigger than the root: • it must go in the right sub-tree • If smaller, into the left sub-tree • Where in the sub-tree? Recurse. • for example, insert 9 [Figure: BST containing 7, 2, 12, 1, 3, 6, 5, 4, with 9 about to be inserted]
To Insert into a BST • If the root is null • create the new node here • If the item to insert is less than the root • insert into the left sub-tree • If the item to insert is more than the root • insert into the right sub-tree • Otherwise – ignore duplicate item • return false Sets do not allow duplicate elements
BST insert Method • Returns true if inserted, false otherwise • recall: no duplicates allowed public boolean insert(E anItem) { int oldCount = numInTree; root = insertNode(anItem, root); return numInTree > oldCount; } • insertNode creates the new Node (if necessary), places it in the tree, and updates numInTree
BST insert Method • Insert new node, return the root of subtree private BSTNode insertNode(E anItem, BSTNode cur) { if (cur == null) { ++numInTree; return new BSTNode(anItem); } int c = comp.compare(anItem, cur.data); if (c < 0) { cur.left = insertNode(anItem, cur.left); } else if (c > 0) { cur.right = insertNode(anItem, cur.right); } return cur; }
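Putting the node class, the fields, contains and insert together gives a small runnable set. This is a sketch assembled from the slide fragments; the class name BstSet, the inner class name Node, the constructor taking a comparator and the size() method are additions for illustration.

```java
import java.util.Comparator;

class BstSet<E> {
    private class Node {
        E data;
        Node left, right;
        Node(E data) { this.data = data; }
    }

    private Node root;                 // starts as null
    private int numInTree;             // starts as 0
    private final Comparator<E> comp;

    BstSet(Comparator<E> comp) { this.comp = comp; }

    int size() { return numInTree; }

    // Walk down from the root, going left or right like a binary search.
    boolean contains(E item) {
        Node cur = root;
        while (cur != null) {
            int c = comp.compare(item, cur.data);
            if (c < 0) { cur = cur.left; }
            else if (c > 0) { cur = cur.right; }
            else { return true; }
        }
        return false;
    }

    // Returns true if inserted, false if the item was a duplicate.
    boolean insert(E item) {
        int oldCount = numInTree;
        root = insertNode(item, root);
        return numInTree > oldCount;
    }

    // Inserts into the subtree rooted at cur and returns that subtree's root.
    private Node insertNode(E item, Node cur) {
        if (cur == null) {
            ++numInTree;
            return new Node(item);
        }
        int c = comp.compare(item, cur.data);
        if (c < 0) { cur.left = insertNode(item, cur.left); }
        else if (c > 0) { cur.right = insertNode(item, cur.right); }
        return cur;                    // duplicate: tree unchanged
    }

    public static void main(String[] args) {
        BstSet<Integer> set = new BstSet<>(Comparator.<Integer>naturalOrder());
        for (int v : new int[] {7, 2, 12, 1, 3, 6, 5, 4}) { set.insert(v); }
        System.out.println(set.contains(6)); // true
        System.out.println(set.contains(8)); // false
        System.out.println(set.insert(9));   // true
        System.out.println(set.insert(9));   // false
    }
}
```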
Deleting from a BST • Easy if the node is a leaf: • just delete it (for example, 4) [Figure: BST containing 7, 2, 12, 1, 3, 9, 6, 5, 4]
Deleting from a BST • Easy if the node is a leaf: • just delete it • If it has only one child • we can re-attach the child to the deleted node’s parent (for example, 3) [Figure: BST containing 7, 2, 12, 1, 3, 9, 6, 5]
Deleting from a BST • Easy if the node is a leaf: • just delete it • If it has only one child • we can re-attach the child to the deleted node’s parent • But what if it has two children? • delete the 2, for example • now what? [Figure: BST containing 7, 2, 12, 1, 9, 6, 5]
Deleting from a BST • Need to keep the BST property • After 2 is deleted there will just be the 1, 5 and 6 in the left sub-tree [Figure: BST containing 7, 2, 12, 1, 9, 6, 5, alongside the possible arrangements of 1, 5 and 6]
Deleting from a BST • Find the minimum value in the right sub-tree • or the maximum in the left • Copy its value into the root • root was going to be deleted anyway • Delete the node you copied from • it’ll have at most one child [Figure: BST containing 7, 2, 12, 1, 9, 6, 5, with 5 copied over the 2]
Exercise • Show the BST that results when you delete 5 from this tree • use “min on right” rule • Show the BST that results when you delete 7 from this tree • use “min on right” rule [Figure: BST containing 7, 5, 12, 6, 1, 9, 11]
BST delete Method • Returns true if deleted, false otherwise public boolean delete(E anItem) { int oldCount = numInTree; root = deleteNode(anItem, root); return numInTree < oldCount; } • Exercise: write deleteNode • hint: it’s very similar to insertNode
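The three deletion cases (leaf, one child, two children with the "min on right" rule) might be coded as below. This is one possible sketch, not the only way to do it; to stay short it stores plain int values, and the names IntBst, Node and inOrder are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class IntBst {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    Node root;
    int numInTree;

    boolean insert(int item) {
        int old = numInTree;
        root = insertNode(item, root);
        return numInTree > old;
    }

    private Node insertNode(int item, Node cur) {
        if (cur == null) { ++numInTree; return new Node(item); }
        if (item < cur.data) { cur.left = insertNode(item, cur.left); }
        else if (item > cur.data) { cur.right = insertNode(item, cur.right); }
        return cur;
    }

    boolean delete(int item) {
        int old = numInTree;
        root = deleteNode(item, root);
        return numInTree < old;
    }

    // Deletes item from the subtree rooted at cur; returns the subtree's new root.
    private Node deleteNode(int item, Node cur) {
        if (cur == null) { return null; }                    // not found
        if (item < cur.data) { cur.left = deleteNode(item, cur.left); }
        else if (item > cur.data) { cur.right = deleteNode(item, cur.right); }
        else if (cur.left == null) { --numInTree; return cur.right; } // leaf or right child only
        else if (cur.right == null) { --numInTree; return cur.left; } // left child only
        else {
            Node min = cur.right;                            // minimum of the right sub-tree
            while (min.left != null) { min = min.left; }
            cur.data = min.data;                             // copy its value into this node
            cur.right = deleteNode(min.data, cur.right);     // remove the copied-from node
        }
        return cur;
    }

    // In-order traversal lists the values in sorted order.
    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private void collect(Node n, List<Integer> out) {
        if (n == null) { return; }
        collect(n.left, out);
        out.add(n.data);
        collect(n.right, out);
    }
}
```

Inserting 7, 2, 12, 1, 6, 5, 9 in that order and then deleting 2 leaves 5 in the position where 2 used to be, with 1 and 6 as its children.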
Complexity • Average complexity: • on average the tree is fairly balanced • contains, insert and delete are all O(log N) • Worst-case complexity: • inserting data in sorted order produces one long list • contains, insert and delete are all O(N) • There are ways to keep trees balanced • Java’s TreeSet and TreeMap use red-black trees
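The gap between the average and worst case can be seen by measuring tree height. The sketch below inserts the same 1000 values into an unbalanced BST in sorted order and in shuffled order; the class BstHeight, its helper names and the random seed are arbitrary choices for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class BstHeight {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Iterative insert into an unbalanced BST; duplicates are ignored.
    static Node insert(Node root, int item) {
        if (root == null) { return new Node(item); }
        Node cur = root;
        while (true) {
            if (item < cur.data) {
                if (cur.left == null) { cur.left = new Node(item); break; }
                cur = cur.left;
            } else if (item > cur.data) {
                if (cur.right == null) { cur.right = new Node(item); break; }
                cur = cur.right;
            } else {
                break;
            }
        }
        return root;
    }

    // Height counted in nodes on the longest root-to-leaf path.
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    static int heightAfterInserting(List<Integer> items) {
        Node root = null;
        for (int item : items) { root = insert(root, item); }
        return height(root);
    }

    static List<Integer> oneTo(int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i <= n; i++) { out.add(i); }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = oneTo(1000);
        System.out.println("sorted:   " + heightAfterInserting(data)); // 1000: one long list
        Collections.shuffle(data, new Random(42));
        System.out.println("shuffled: " + heightAfterInserting(data)); // far smaller
    }
}
```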
Hash Table Operations • Hash tables are designed to make insertion and finding particular elements fast • both in O(1) average time • deletion also supported in O(1) average time • Other operations expensive or impossible • no way to find the minimum/maximum except by looking at every element • no information on order used at all
Simple-Minded Structure • Simplest way to get O(1) is to have an array with one cell per possible data element • Example: keys drawn from 0..19 • A is an array indexed by 0..19 • A[i] is null if the item with key i is not present • A[i] is the data element if it is present
Simple-Minded Structure • Insertion = create new item & set A[i] • Find = look at A[i] • Example array A[00..19] (all other cells are null): • A[01] = 01 Smith Brian • A[07] = 07 Wilson Debra • A[14] = 14 Chan Paulina • A[17] = 17 Lafleur Denis • A[18] = 18 Burns Monty
Large Key Spaces • Key spaces can be very large • key is a 20-character string • coded as ASCII: 128^20 = 2^140 ≈ 10^42 possible keys • Impossible to get that much space • Generally only have a few keys • don’t need that much space • a few thousand or a few million (around 10^7 at most)
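With one cell per possible key, every operation is a single array access. A minimal sketch, assuming keys 0..19 and String data; the class name DirectAddressTable is made up.

```java
class DirectAddressTable {
    // One cell per possible key (0..19); null means "not present".
    private final String[] cells = new String[20];

    void insert(int key, String data) { cells[key] = data; }

    String find(int key) { return cells[key]; }

    void delete(int key) { cells[key] = null; }

    public static void main(String[] args) {
        DirectAddressTable a = new DirectAddressTable();
        a.insert(1, "Smith Brian");
        a.insert(7, "Wilson Debra");
        System.out.println(a.find(7)); // Wilson Debra
        System.out.println(a.find(8)); // null
    }
}
```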
Hash Table Idea • Make the array much smaller • maybe about twice the number of items you’re expecting • Divvy up the keys between the array cells • function from keys to array indices: “hashing” • try to keep them spread out = avoid “collisions”
Hashing • Keys in range 1..100 • take key mod 20 as index into the array • 41 mod 20 = 1: A[01] = 41 Smith Brian • 27 mod 20 = 7: A[07] = 27 Wilson Debra • 94 mod 20 = 14: A[14] = 94 Chan Paulina • 37 mod 20 = 17: A[17] = 37 Lafleur Denis • 78 mod 20 = 18: A[18] = 78 Burns Monty • all other cells of A[00..19] are null
Note on Hash Table Size • Want HT size to be a prime number • will explain why later • 20 not a good size for a hash table • 19 or 23 would be better • Will use 20 anyway in these notes • easier to do the math! • computer doesn’t have to worry about that….
Note on Hashing Function • Usually broken into two parts • Hash function proper translates key to integer • string → number • number → other number • Result mapped into table using mod (%) • For now just keep it really simple • just using positive integer values and mod
Hashing Exercise • Place the following keys into the hash table • 42, 17, 81, 20, 73 • Hash function = key mod 20 • h(42) = 42 % 20 = 2 • The table has cells 00..19, all initially empty
Collisions • Collision happens when two items hash to the same array location • Table A[00..19] holds 41 at 01, 27 at 07, 94 at 14, 37 at 17 and 78 at 18 • Item 58 wants to go in position 18 (58 mod 20 = 18) • item 78 is already using it: “collision”
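The two steps can be sketched as follows. Using String.hashCode() as the "hash function proper" is an assumption made for the example, and the class name TwoStepHash is invented. Math.floorMod is used instead of % because hashCode() can be negative and % would then give a negative index.

```java
class TwoStepHash {
    // Integer key: only the second step (mod) is needed.
    static int indexFor(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // String key: step 1 turns the string into a number (here String.hashCode(),
    // an assumption for this sketch), step 2 maps that number into the table.
    static int indexFor(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(41, 20));  // 1
        System.out.println(indexFor(94, 20));  // 14
        System.out.println(indexFor("a", 20)); // 17, since "a".hashCode() is 97
    }
}
```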
Collision Frequency • Take a 1000 cell table & good hash function • each cell is equally likely to be hashed to • Chance of collision: • 1st item: 0/1000 • 2nd item: 1/1000 • 3rd item: 2/1000 • … • Nth item: (N–1)/1000
Collision Frequency • What’s the chance of at least one collision? • 1 entry = 1 – (1000/1000) = 0 • 2 entries = 1 – (1000/1000)*(999/1000) = 1/1000 • 3 entries = 1 – (1000/1000)*(999/1000)*(998/1000) ≈ 3/1000 • 4 entries ≈ 6/1000 • 5 entries (table 0.5% full) ≈ 10/1000 = 1% • 100 entries (table 10% full)? about 99.4% (!)
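These figures can be checked by multiplying out the "no collision" probabilities. A small sketch, assuming every cell is equally likely; the class and method names are made up.

```java
class CollisionOdds {
    // Chance that at least two of `entries` items land in the same cell
    // of a table with `cells` equally likely cells.
    static double atLeastOneCollision(int entries, int cells) {
        double noCollision = 1.0;
        for (int i = 0; i < entries; i++) {
            // the (i+1)-th item must avoid the i cells already taken
            noCollision *= (double) (cells - i) / cells;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 5, 100}) {
            System.out.printf("%d entries: %.4f%n", n, atLeastOneCollision(n, 1000));
        }
    }
}
```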
Dealing with Collisions • Two ways of dealing with the problem • Open addressing • put the new item somewhere else • need to figure out where • Separate chaining • make a chain of all the elements that hash to a given location
Open Addressing • Put item somewhere else in the array • Need to be able to find it again • need some rules that can be followed • Stupid rules make things very bad! • Clever rules perform very well • clever rules are hard to describe/understand • Java’s IdentityHashMap uses open addressing (linear probing); HashMap and HashSet use separate chaining
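To make "put it somewhere else, by a rule you can follow again" concrete, here is a sketch using the simplest rule, linear probing: try the next cell, wrapping around at the end. This is not one of the clever rules the slide has in mind; the class LinearProbeSet and its methods are invented, and deletion is left out because it needs extra bookkeeping.

```java
class LinearProbeSet {
    private final Integer[] cells;   // null means "empty cell"

    LinearProbeSet(int size) { cells = new Integer[size]; }

    // Returns true if inserted, false if the key was already present.
    boolean insert(int key) {
        int start = Math.floorMod(key, cells.length);
        for (int step = 0; step < cells.length; step++) {
            int i = (start + step) % cells.length;   // next cell, wrapping around
            if (cells[i] == null) { cells[i] = key; return true; }
            if (cells[i] == key) { return false; }
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the cell holding key, or -1 if it is absent.
    // Follows the same probe sequence as insert and stops at the first empty cell.
    int indexOf(int key) {
        int start = Math.floorMod(key, cells.length);
        for (int step = 0; step < cells.length; step++) {
            int i = (start + step) % cells.length;
            if (cells[i] == null) { return -1; }
            if (cells[i] == key) { return i; }
        }
        return -1;
    }

    boolean contains(int key) { return indexOf(key) >= 0; }

    public static void main(String[] args) {
        LinearProbeSet table = new LinearProbeSet(20);
        for (int k : new int[] {41, 27, 94, 37, 78}) { table.insert(k); }
        table.insert(58);                      // 58 mod 20 = 18 is taken by 78
        System.out.println(table.indexOf(58)); // 19: the next free cell
    }
}
```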
Chained Hash Tables • Array elements are linked lists instead of data element pointers • data stored in nodes • Example with cells 00..19: 01 → [41], 07 → [27], 14 → [94], 17 → [37], 18 → a chain holding both 78 and 58; all other cells are empty
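A chained table might look like the sketch below: each cell holds a list, and colliding keys simply share a list. The class ChainedSet, its method names and the use of LinkedList for the chains are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedSet {
    // One chain (linked list) per cell.
    private final List<List<Integer>> chains = new ArrayList<>();

    ChainedSet(int size) {
        for (int i = 0; i < size; i++) { chains.add(new LinkedList<>()); }
    }

    // The chain that key hashes to.
    private List<Integer> chainFor(int key) {
        return chains.get(Math.floorMod(key, chains.size()));
    }

    // Returns true if inserted, false if the key was already present.
    boolean insert(int key) {
        List<Integer> chain = chainFor(key);
        if (chain.contains(key)) { return false; }
        chain.add(key);
        return true;
    }

    boolean contains(int key) { return chainFor(key).contains(key); }

    // Integer.valueOf makes this remove-by-value, not remove-by-index.
    boolean remove(int key) { return chainFor(key).remove(Integer.valueOf(key)); }

    // Copy of the chain stored in a given cell, for inspection.
    List<Integer> chainAt(int cell) { return new ArrayList<>(chains.get(cell)); }

    public static void main(String[] args) {
        ChainedSet table = new ChainedSet(20);
        for (int k : new int[] {41, 27, 94, 37, 78, 58}) { table.insert(k); }
        System.out.println(table.chainAt(18)); // [78, 58]
    }
}
```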