Symbol Tables and Hashing: Review and Implementation Details

Data Structures and Algorithms I Day 19, 11/8/11 Review Chapters 3 and 4 CMP 338

Symbol Table
• A symbol table is a mapping from Keys to Values
• Conventions:
  • (At most) one Value per Key
  • null is not a legal Key
  • null is not a legal Value
  • Keys should be immutable
• Shorthand implementations:
  • delete(Key k) { put(k, null); }
  • contains(Key k) { return get(k) != null; }
  • isEmpty() { return size() == 0; }

Symbol Table API
• public class SymbolTable<Key, Value>
  • SymbolTable()
  • void put(Key k, Value v)
  • Value get(Key k)
  • void delete(Key k)
  • boolean contains(Key k)
  • boolean isEmpty()
  • int size()
  • Iterable<Key> keys()
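The conventions and API above map naturally onto a Java interface in which the three shorthand operations become default methods. This is a minimal sketch under stated assumptions: the slide lists SymbolTable as a class, while here it is recast as an interface, and MapBackedST is a hypothetical implementation backed by java.util.HashMap that exists only to exercise the defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the slide's API recast as an interface (an assumption; the slide
// lists a class). The shorthand operations become default methods.
interface SymbolTable<Key, Value> {
    void put(Key k, Value v);      // by convention, put(k, null) deletes k
    Value get(Key k);              // null means "no such key"
    int size();
    Iterable<Key> keys();

    default void delete(Key k) { put(k, null); }
    default boolean contains(Key k) { return get(k) != null; }
    default boolean isEmpty() { return size() == 0; }
}

// Hypothetical implementation backed by java.util.HashMap, used only to
// exercise the default methods above.
class MapBackedST<Key, Value> implements SymbolTable<Key, Value> {
    private final Map<Key, Value> map = new HashMap<>();

    public void put(Key k, Value v) {
        if (k == null) throw new IllegalArgumentException("null is not a legal Key");
        if (v == null) map.remove(k);   // null Value means delete
        else map.put(k, v);
    }

    public Value get(Key k) { return map.get(k); }
    public int size() { return map.size(); }
    public Iterable<Key> keys() { return map.keySet(); }
}
```

Because null is never a legal Value, contains() can be defined purely in terms of get(), which is what makes the shorthand implementations valid.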

Sequential Search Implementation
• public Value get(Key k)
    for (Node n = first; n != null; n = n.next)
        if (k.equals(n.key))
            return n.val;
    return null;                      // search miss
• public void put(Key k, Value v)
    for (Node n = first; n != null; n = n.next)
        if (k.equals(n.key)) {        // search hit: update the value
            n.val = v;
            return;
        }
    first = new Node(k, v, first);    // search miss: add a new node at the front
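A self-contained version of the sequential-search table above may help when trying it out. This is a sketch, not the course's own class: the Node layout and the size counter are assumptions, and delete (put with a null Value) is omitted for brevity. Each operation scans the list front to back, so get and put cost O(N) in the worst case.

```java
// Sketch of a sequential-search symbol table: an unordered linked list of
// key-value pairs, scanned front to back on every get() and put().
class SequentialSearchST<Key, Value> {
    private class Node {
        final Key key;
        Value val;
        Node next;

        Node(Key key, Value val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private Node first;   // head of the list
    private int n;        // number of key-value pairs

    public Value get(Key k) {
        for (Node x = first; x != null; x = x.next)
            if (k.equals(x.key)) return x.val;   // search hit
        return null;                             // search miss
    }

    public void put(Key k, Value v) {
        for (Node x = first; x != null; x = x.next)
            if (k.equals(x.key)) { x.val = v; return; }   // hit: overwrite value
        first = new Node(k, v, first);                   // miss: prepend
        n++;
    }

    public int size() { return n; }
}
```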

Hash Table Implementations
• Hash function: hash()
  • Maps Keys to small ints (array indices 0..M-1)
  • Java hashCode() maps Objects to 32-bit ints
• Collision resolution:
  • Strategy for handling two Keys mapped to the same int
  • Closed addressing (e.g., separate chaining)
    • Array entries point to a secondary symbol table
  • Open addressing (e.g., linear probing)
    • All Key-Value pairs are stored in the same array

Hash Functions
• Uniform hashing assumption:
  • hash: Key → 0..M-1 is uniform and independent
• Implementing hashCode() for user-defined types
  • Combine the hashCodes of each field (or array entry)
    • Start with a small prime (e.g., 17)
    • Multiply the accumulating hash by a small prime (e.g., 31)
    • Add the hashCode() of the next field (or array entry)
    • Box primitive values (e.g., ((Integer) 14).hashCode())
  • Requirement: x.equals(y) => x.hashCode() == y.hashCode()
• Hash function: hash(Key k)
    return (k.hashCode() & 0x7FFFFFFF) % M;   // clear the sign bit, then reduce mod M
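The hashCode() recipe can be illustrated on a small key type. PhoneNumber and its fields are hypothetical, chosen only to show the 17/31 pattern, boxing of primitive fields, and an equals() that is consistent with hashCode(). The static hash() helper shows why the parentheses matter: % binds tighter than &, so without them the mask would be applied to 0x7FFFFFFF % M instead of to the hash code.

```java
import java.util.Objects;

// Hypothetical immutable key type whose hashCode() follows the recipe above:
// start with 17, multiply by 31, add each field's hash code.
final class PhoneNumber {
    private final int area, exchange, extension;
    private final String label;

    PhoneNumber(int area, int exchange, int extension, String label) {
        this.area = area;
        this.exchange = exchange;
        this.extension = extension;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PhoneNumber)) return false;
        PhoneNumber p = (PhoneNumber) o;
        return area == p.area && exchange == p.exchange
            && extension == p.extension && Objects.equals(label, p.label);
    }

    @Override
    public int hashCode() {
        int h = 17;                                   // small prime seed
        h = 31 * h + ((Integer) area).hashCode();     // boxed primitive field
        h = 31 * h + ((Integer) exchange).hashCode();
        h = 31 * h + ((Integer) extension).hashCode();
        h = 31 * h + Objects.hashCode(label);         // 0 if label is null
        return h;
    }

    // Modular hash into 0..M-1: clear the sign bit, then take % M.
    static int hash(Object k, int M) {
        return (k.hashCode() & 0x7FFFFFFF) % M;
    }
}
```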

Separate Chaining Hash Table
• SeparateChainingHashTable(int size)
    M = size;
    st = (SequentialSearchST<Key, Value>[]) new SequentialSearchST[M];
    for (int i = 0; i < M; i++)
        st[i] = new SequentialSearchST<Key, Value>();   // one empty chain per slot
• public Value get(Key k)
    return st[hash(k)].get(k);
• public void put(Key k, Value v)
    st[hash(k)].put(k, v);
• private int hash(Key k)
    return (k.hashCode() & 0x7FFFFFFF) % M;
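For a self-contained experiment, the chains can be plain linked lists instead of separate SequentialSearchST objects. This sketch assumes that substitution; the class name SeparateChainingHashST is hypothetical, and nodes store Object fields to avoid creating a generic array.

```java
// Sketch of separate chaining: an array of M unordered linked lists
// ("chains"); a key always lives in the chain at index hash(k).
class SeparateChainingHashST<Key, Value> {
    private static final class Node {
        final Object key;
        Object val;
        Node next;

        Node(Object key, Object val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private final int M;        // number of chains
    private final Node[] st;    // st[i] = head of chain i (null = empty)

    SeparateChainingHashST(int size) {
        M = size;
        st = new Node[M];
    }

    private int hash(Key k) {
        return (k.hashCode() & 0x7FFFFFFF) % M;   // parentheses needed: % binds tighter than &
    }

    @SuppressWarnings("unchecked")
    public Value get(Key k) {
        for (Node x = st[hash(k)]; x != null; x = x.next)
            if (k.equals(x.key)) return (Value) x.val;
        return null;
    }

    public void put(Key k, Value v) {
        int i = hash(k);
        for (Node x = st[i]; x != null; x = x.next)
            if (k.equals(x.key)) { x.val = v; return; }   // update existing key
        st[i] = new Node(k, v, st[i]);                   // prepend to chain i
    }
}
```

With N keys and M chains, the uniform hashing assumption gives an average chain length of N/M, which is why performance degrades gradually rather than abruptly as the table fills.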

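Linear Probing Hash Table
• public Value get(Key k)
    for (int i = hash(k); keys[i] != null; i = (i + 1) % M)
        if (keys[i].equals(k)) return vals[i];   // compare with equals(), not ==
    return null;                                  // reached an empty slot: miss
• public void put(Key k, Value v)
    int i;                                        // declared outside the loop: used after it
    for (i = hash(k); keys[i] != null; i = (i + 1) % M)
        if (keys[i].equals(k)) {
            vals[i] = v;
            return;
        }
    keys[i] = k;                                  // assumes an empty slot exists (array resizing)
    vals[i] = v;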
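The probing loops above assume the array always has an empty slot. One common way to guarantee this is to double the table whenever it is half full. The sketch below uses that assumption; the class name, the 1/2 load-factor threshold, and the rehash-by-reinsertion resize are illustrative choices. The initial capacity must be at least 1.

```java
// Sketch of linear probing with parallel keys[]/vals[] arrays. On a
// collision, scan forward (wrapping around) to the next slot. The table
// doubles when half full, so probing always finds an empty slot.
class LinearProbingHashST<Key, Value> {
    private int n;            // number of key-value pairs
    private int M;            // table size (must be >= 1)
    private Key[] keys;
    private Value[] vals;

    LinearProbingHashST(int capacity) {
        M = capacity;
        keys = newArray(M);
        vals = newArray(M);
    }

    @SuppressWarnings("unchecked")
    private static <T> T[] newArray(int size) { return (T[]) new Object[size]; }

    private int hash(Key k) { return (k.hashCode() & 0x7FFFFFFF) % M; }

    public Value get(Key k) {
        for (int i = hash(k); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(k)) return vals[i];
        return null;                                 // empty slot: search miss
    }

    public void put(Key k, Value v) {
        if (n >= M / 2) resize(2 * M);               // keep load factor below 1/2
        int i;
        for (i = hash(k); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(k)) { vals[i] = v; return; }
        keys[i] = k;
        vals[i] = v;
        n++;
    }

    // Rehash every pair into a new table of the given capacity.
    private void resize(int capacity) {
        LinearProbingHashST<Key, Value> t = new LinearProbingHashST<>(capacity);
        for (int i = 0; i < M; i++)
            if (keys[i] != null) t.put(keys[i], vals[i]);
        keys = t.keys;
        vals = t.vals;
        M = t.M;
    }

    public int size() { return n; }
}
```

Resizing must rehash every key, because hash(k) depends on M and a key's correct slot changes when the table size changes.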

Separate Chaining vs. Linear Probing
• Separate chaining
  • Easier to implement
  • Performance degrades gracefully
  • Less sensitive to clustering caused by a poorly designed hash()
• Linear probing
  • Wastes less space
    • However, array resizing must be implemented
  • Better cache performance

Hashing vs. Balanced Search Trees
• Hashing
  • Simpler to code
  • No effective alternative for unordered keys
  • Faster (assuming an efficient hash function)
  • Better system support for Java Strings (e.g., String caches its hashCode())
• Balanced search trees
  • Stronger performance guarantees
  • Support for ordered operations
  • compareTo() is easier to implement correctly than equals() and hashCode()

Java Symbol Tables
• Map<K, V> interface
  • TreeMap<K, V> implements SortedMap<K, V>
    • O(lg N) get(), put(), and ordered operations (worst case)
  • HashMap<K, V> implements Map<K, V>
    • O(1) put() and get() (average case)
• Set<K> interface
  • TreeSet<K> implements SortedSet<K>
  • HashSet<K> implements Set<K>
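A short usage sketch of the two library maps follows. A HashMap counts word frequencies without any key order, and copying it into a TreeMap adds sorted iteration plus ordered queries such as firstKey(), floorKey(), and ceilingKey(), which TreeMap provides through its NavigableMap interface. The class JavaSymbolTablesDemo and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of the two java.util symbol tables named above.
class JavaSymbolTablesDemo {
    // Count word frequencies with a HashMap (no key order guaranteed).
    static Map<String, Integer> counts(String[] words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) freq.merge(w, 1, Integer::sum);   // add 1, or insert 1
        return freq;
    }

    // Copy into a TreeMap to get sorted keys and ordered queries.
    static TreeMap<String, Integer> sorted(Map<String, Integer> m) {
        return new TreeMap<>(m);
    }
}
```

For example, after counting "it was the best of times it was", the TreeMap's keys in order are best, it, of, the, times, was, so floorKey("tim") is "the" and ceilingKey("tim") is "times".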

Graphs (Mathematics)
• (Directed) graph <V, E>
  • V is a set of vertices
  • E is a set of edges, $E \subseteq V \times V$
• DAG: directed acyclic graph
• Undirected graph: E is symmetric
• Edge-weighted graph: weight: $E \to \mathbb{R}$

Graph Vocabulary
• A path is a sequence of edges connecting vertices
  • Simple path: no vertex appears twice
• A cycle is a path from a vertex back to itself
  • Simple cycle: removing the final edge leaves a simple path
• A connected component (undirected graph):
  • A maximal set of connected vertices
• A strongly connected component (directed graph):
  • A maximal set of vertices such that there is a directed path from any vertex to any other vertex

Depth-First Search
• Each node is visited exactly once
• Visit each neighbor during the visit to a node
    void visit(Node n)
        if (visited(n)) return;
        mark n visited
        do stuff
        for each neighbor m of n
            visit(m)
        maybe do more stuff
• Trace: (1(2(3)(4(5)(6))(7))(8(9)(A(B)(C))))(D(E)(F))
• Example: ConnectedComponent.java
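The DFS template above can be turned into a connected-components labeler. The example ConnectedComponent.java mentioned on the slide is not reproduced here. This is a hypothetical sketch that assumes vertices numbered 0..V-1 and an undirected graph given as an edge list. The "do stuff" step records the component id, and each unvisited root starts a new component.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: label the connected components of an undirected graph with
// recursive DFS. Vertices are 0..V-1; the graph is stored as adjacency lists.
class ConnectedComponents {
    private final List<List<Integer>> adj;
    private final boolean[] marked;
    private final int[] id;     // id[v] = component number of v
    private int count;          // number of components found so far

    ConnectedComponents(int V, int[][] edges) {
        adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) {              // undirected: store both directions
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        marked = new boolean[V];
        id = new int[V];
        for (int s = 0; s < V; s++)
            if (!marked[s]) { visit(s); count++; }   // new root = new component
    }

    private void visit(int n) {
        if (marked[n]) return;       // already visited
        marked[n] = true;
        id[n] = count;               // "do stuff": record the component
        for (int m : adj.get(n)) visit(m);
    }

    int count() { return count; }
    boolean connected(int v, int w) { return id[v] == id[w]; }
}
```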

Breadth-First Search
• Each node is visited exactly once
• Schedule a visit to each neighbor during the visit to a node
    void visit(Node n)
        if (visited(n)) return;
        mark n visited
        do stuff
        for each neighbor m of n
            put m on the queue of nodes to visit
        maybe do more stuff
• Trace: (1)(2)(3)(6)(7)(8)(4)(5)(9)(A)(C)(B)(D)(E)(F)
• Example: ShortestPath.java
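In an unweighted graph, BFS from a source finds paths with the fewest edges, because nodes are dequeued in order of distance. The slide's ShortestPath.java is not shown. The following hypothetical sketch uses a common variant that marks a vertex when it is enqueued instead of when it is dequeued, so no vertex is queued twice. Vertices are assumed to be 0..V-1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch: BFS from source s in an unweighted undirected graph computes the
// fewest-edges distance to every reachable vertex. A vertex is marked
// (given a distance) when it is enqueued, so it is queued at most once.
class BreadthFirstPaths {
    private final int[] distTo;     // -1 = unreachable
    private final int[] edgeTo;     // edgeTo[w] = previous vertex on a shortest path

    BreadthFirstPaths(int V, int[][] edges, int s) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }

        distTo = new int[V];
        edgeTo = new int[V];
        Arrays.fill(distTo, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        distTo[s] = 0;
        queue.add(s);
        while (!queue.isEmpty()) {
            int n = queue.remove();
            for (int m : adj.get(n)) {
                if (distTo[m] == -1) {          // unmarked: schedule a visit
                    distTo[m] = distTo[n] + 1;
                    edgeTo[m] = n;
                    queue.add(m);
                }
            }
        }
    }

    int distTo(int v) { return distTo[v]; }

    // Vertices on a shortest path from s to v (empty if v is unreachable).
    List<Integer> pathTo(int v) {
        List<Integer> path = new ArrayList<>();
        if (distTo[v] == -1) return path;
        int x = v;
        while (distTo[x] != 0) { path.add(0, x); x = edgeTo[x]; }
        path.add(0, x);                         // x is now the source
        return path;
    }
}
```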

Spanning Tree (Undirected Graph)
• A tree in an undirected graph:
  • A set of connected edges containing no cycle
• A spanning tree of an undirected graph:
  • A tree that connects every vertex of the graph
• A spanning forest of an undirected graph:
  • A set of spanning trees, one for each connected component
• A minimum spanning tree (MST) of a weighted graph:
  • A spanning tree with minimum total weight

Prim's MST Algorithm
    mark any node
    while there exists an edge from a marked node to an unmarked node
        pick the shortest such edge
        add the edge to the MST
        mark its unmarked endpoint
• Use a priority queue to keep track of the edges
• Optimization: keep only 1 edge per unmarked node
  • Requires decreasing a key in the priority queue
• Running time: O(||E|| + ||V|| lg ||V||) with a Fibonacci heap; O(||E|| lg ||V||) with a binary heap
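java.util.PriorityQueue has no decrease-key operation, so the "one edge per unmarked node" optimization cannot be expressed directly with it. The sketch below therefore substitutes the lazy variant of Prim's algorithm. It queues every edge that leaves the marked set and discards stale edges whose endpoints are both marked, which costs O(||E|| lg ||E||) time. Edges are assumed to be {v, w, weight} triples over vertices 0..V-1 (V ≥ 1). The tree grows from vertex 0, so only vertex 0's component is spanned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the lazy variant of Prim's algorithm: every edge leaving the
// marked set is queued, and stale edges (both endpoints marked) are skipped
// when dequeued. Edges are {v, w, weight}; vertices are 0..V-1, V >= 1.
class LazyPrimMST {
    private final List<double[]> mst = new ArrayList<>();
    private double weight;

    LazyPrimMST(int V, double[][] edges) {
        List<List<double[]>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (double[] e : edges) {
            adj.get((int) e[0]).add(e);
            adj.get((int) e[1]).add(e);
        }
        boolean[] marked = new boolean[V];
        PriorityQueue<double[]> pq =
            new PriorityQueue<double[]>((a, b) -> Double.compare(a[2], b[2]));

        marked[0] = true;                           // "mark any node": vertex 0
        pq.addAll(adj.get(0));
        while (!pq.isEmpty()) {
            double[] e = pq.poll();                 // shortest queued edge
            int v = (int) e[0], w = (int) e[1];
            if (marked[v] && marked[w]) continue;   // stale: no longer crosses the cut
            int u = marked[v] ? w : v;              // the unmarked endpoint
            mst.add(e);
            weight += e[2];
            marked[u] = true;
            for (double[] f : adj.get(u))           // queue new crossing edges
                if (!marked[(int) f[0]] || !marked[(int) f[1]]) pq.add(f);
        }
    }

    List<double[]> edges() { return mst; }
    double weight() { return weight; }
}
```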

Kruskal's MST Algorithm
    while there exists an unconsidered edge
        consider the shortest unconsidered edge
        if it would not create a cycle
            add the edge to the MST
• Use a priority queue to keep track of the edges
• Cycle detection: disjoint-set union-find algorithm
  • An edge creates a cycle iff both endpoints are in the same set
  • Adding an edge to the MST requires the union of two sets
• Running time: O(||E|| lg ||E||)
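Here is a compact sketch of Kruskal's algorithm under a few assumptions. Sorting the edge array once stands in for the priority queue, since both yield edges in increasing weight order. The union-find structure uses union by rank with path halving, and edges are {v, w, weight} triples over vertices 0..V-1. On a disconnected graph the result is a minimum spanning forest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of Kruskal's algorithm: take edges in increasing weight order
// (a one-time sort stands in for the priority queue) and keep an edge
// unless both endpoints are already in the same union-find set.
class KruskalMST {
    private final int[] parent;   // union-find forest
    private final int[] rank;     // upper bound on tree height (union by rank)
    private final List<double[]> mst = new ArrayList<>();
    private double weight;

    KruskalMST(int V, double[][] edges) {
        parent = new int[V];
        rank = new int[V];
        for (int v = 0; v < V; v++) parent[v] = v;   // each vertex in its own set

        double[][] sorted = edges.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(e -> e[2]));
        for (double[] e : sorted) {
            int rv = find((int) e[0]), rw = find((int) e[1]);
            if (rv == rw) continue;                  // same set: edge would close a cycle
            union(rv, rw);                           // merge the two sets
            mst.add(e);
            weight += e[2];
        }
    }

    private int find(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];           // path halving
            v = parent[v];
        }
        return v;
    }

    private void union(int rv, int rw) {            // both arguments are roots
        if (rank[rv] < rank[rw]) parent[rv] = rw;
        else if (rank[rv] > rank[rw]) parent[rw] = rv;
        else { parent[rw] = rv; rank[rv]++; }
    }

    List<double[]> edges() { return mst; }
    double weight() { return weight; }
}
```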

Shortest Path Algorithms
• Shortest paths in edge-weighted directed graphs
• The problem is ill-defined if a negative cycle is reachable from the source
• If the graph is a DAG:
  • Relax nodes in topological order
  • O(||E|| + ||V||)
• If all edge weights are non-negative (Dijkstra):
  • Mark and relax the nearest unmarked node
  • O(||E|| + ||V|| lg ||V||) with a Fibonacci heap; O(||E|| lg ||V||) with a binary heap
• General edge-weighted directed graphs (Bellman-Ford):
  • Repeat up to ||V|| times: relax the nodes changed in the previous iteration
  • O(||E|| ||V||)
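The Bellman-Ford description above ("relax the nodes changed in the previous iteration") can be sketched in rounds. Without a reachable negative cycle, distances stop changing within ||V|| - 1 rounds, so a change in round ||V|| signals one. This is a sketch under assumptions: the class name is hypothetical, and edges are directed {from, to, weight} triples over vertices 0..V-1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of Bellman-Ford in rounds: each round relaxes only the edges
// leaving vertices whose distance changed in the previous round. A change
// in round V means a negative cycle is reachable from the source.
class BellmanFordSP {
    private final double[] distTo;
    private boolean negativeCycle;

    BellmanFordSP(int V, double[][] edges, int s) {
        List<List<double[]>> out = new ArrayList<>();
        for (int v = 0; v < V; v++) out.add(new ArrayList<>());
        for (double[] e : edges) out.get((int) e[0]).add(e);

        distTo = new double[V];
        Arrays.fill(distTo, Double.POSITIVE_INFINITY);
        distTo[s] = 0.0;

        boolean[] changed = new boolean[V];
        changed[s] = true;
        for (int round = 1; round <= V; round++) {
            boolean[] next = new boolean[V];
            boolean any = false;
            for (int n = 0; n < V; n++) {
                if (!changed[n]) continue;
                for (double[] e : out.get(n)) {          // relax(n, m)
                    int m = (int) e[1];
                    if (distTo[n] + e[2] < distTo[m]) {
                        distTo[m] = distTo[n] + e[2];
                        next[m] = true;
                        any = true;
                    }
                }
            }
            if (!any) return;                            // converged early
            if (round == V) negativeCycle = true;        // still changing after V-1 rounds
            changed = next;
        }
    }

    double distTo(int v) { return distTo[v]; }
    boolean hasNegativeCycle() { return negativeCycle; }
}
```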

Relaxation
    void relax(Node n)
        for Node m in edgeFrom(n)
            relax(n, m)
    void relax(Node n, Node m)
        if dist(s, n) + w(n, m) < dist(s, m)
            dist(s, m) = dist(s, n) + w(n, m)
            parent(m) = n
• dist is a Map<Node, Double>
• parent is a Map<Node, Node>
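The relax(n, m) step translates almost directly into Java using the two maps named on the slide. In this sketch, Relaxer is a hypothetical class, Node is a type parameter, the edge weight is passed in explicitly, and a missing dist entry stands for infinity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of relax(n, m) with the slide's two maps. Node is a type
// parameter; a missing dist entry is treated as positive infinity.
class Relaxer<Node> {
    final Map<Node, Double> dist = new HashMap<>();   // dist(s, x)
    final Map<Node, Node> parent = new HashMap<>();   // predecessor on best known path

    Relaxer(Node source) { dist.put(source, 0.0); }

    // Relax edge n -> m of weight w; returns true if dist(s, m) improved.
    boolean relax(Node n, Node m, double w) {
        double dn = dist.getOrDefault(n, Double.POSITIVE_INFINITY);
        double dm = dist.getOrDefault(m, Double.POSITIVE_INFINITY);
        if (dn + w < dm) {
            dist.put(m, dn + w);
            parent.put(m, n);
            return true;
        }
        return false;
    }
}
```

For example, relaxing s→a (5), then s→b (1), then b→a (2) lowers dist(s, a) from 5 to 3 and sets parent(a) to b.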

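Dijkstra's Shortest Path Algorithm
    mark the source node
    while there exists an edge from a marked node to an unmarked node
        pick the unmarked node n closest to the source
        pick the edge (u, n) from a marked node u that minimizes dist(s, u) + w(u, n)
        add that edge to the shortest-path tree
        mark and relax n
• Use a priority queue to order the unmarked nodes
• Running time: O(||E|| + ||V|| lg ||V||) with a Fibonacci heap; O(||E|| lg ||V||) with a binary heap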
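As with Prim's algorithm, java.util.PriorityQueue has no decrease-key, so this sketch substitutes the lazy form of Dijkstra's algorithm. It re-inserts a node whenever its distance improves and skips stale queue entries for nodes that are already marked. That gives O(||E|| lg ||E||) time rather than the bound stated above. Edge weights are assumed non-negative. Edges are assumed to be directed {from, to, weight} triples over vertices 0..V-1, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a lazy Dijkstra for non-negative weights: a node is re-queued
// whenever its distance improves, and stale entries are skipped on removal.
class DijkstraSP {
    private final double[] distTo;
    private final int[] edgeTo;     // parent in the shortest-path tree (-1 = none)

    DijkstraSP(int V, double[][] edges, int s) {
        List<List<double[]>> out = new ArrayList<>();
        for (int v = 0; v < V; v++) out.add(new ArrayList<>());
        for (double[] e : edges) out.get((int) e[0]).add(e);

        distTo = new double[V];
        edgeTo = new int[V];
        Arrays.fill(distTo, Double.POSITIVE_INFINITY);
        Arrays.fill(edgeTo, -1);
        boolean[] marked = new boolean[V];

        // Queue entries are {distance, vertex}, ordered by distance.
        PriorityQueue<double[]> pq =
            new PriorityQueue<double[]>((a, b) -> Double.compare(a[0], b[0]));
        distTo[s] = 0.0;
        pq.add(new double[] {0.0, s});
        while (!pq.isEmpty()) {
            int n = (int) pq.poll()[1];      // closest candidate node
            if (marked[n]) continue;         // stale entry: n already final
            marked[n] = true;                // distTo[n] is now final
            for (double[] e : out.get(n)) {  // relax every edge leaving n
                int m = (int) e[1];
                if (distTo[n] + e[2] < distTo[m]) {
                    distTo[m] = distTo[n] + e[2];
                    edgeTo[m] = n;
                    pq.add(new double[] {distTo[m], m});
                }
            }
        }
    }

    double distTo(int v) { return distTo[v]; }
    int parent(int v) { return edgeTo[v]; }
}
```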
