
Hash Tables and Constant Access Time

CS-2303 System Programming Concepts (Slides include materials from The C Programming Language, 2nd edition, by Kernighan and Ritchie and from C: How to Program, 5th and 6th editions, by Deitel and Deitel).



  1. Hash Tables and Constant Access Time
     CS-2303 System Programming Concepts
     (Slides include materials from The C Programming Language, 2nd edition, by Kernighan and Ritchie and from C: How to Program, 5th and 6th editions, by Deitel and Deitel)

  2. New Challenge
     • What if we require a data structure that has to be accessed by value in constant time?
       • I.e., O(log n) is not good enough!
     • Need to be able to add or delete items
     • Total number of items unknown
       • But an approximate maximum might be known

  3. Examples
     • Anti-virus scanner
     • Symbol table of compiler
     • Virtual memory tables in operating system
     • Bank or credit card account for a person

  4. Example – Validate a Credit Card
     • 16-digit credit card numbers
       • 10^16 possible card numbers
     • Sparsely populated space
       • E.g., 10^8 MasterCard holders, similar for Visa
     • Not “random” enough for a binary tree
       • Too many single branches → really deep searches
     • Need to respond to customer in 1-2 seconds
       • 1000s or tens of 1000s of customers per second!
     • Same is true for
       • ATM card numbers
       • Bank account numbers
       • Etc.

  5. Example – Anti-Virus Scanner
     • Look at each sequence of bytes in a file
     • See if it matches against a library of virus patterns
       • How many possible patterns? Tens of thousands!
     • If so, flag it as a possible problem

  6. Anti-Virus Scanner (continued)
     • Time to scan a file?
       • O(length) × O(# of patterns)
     • Can we do better?
       • Store patterns in a tree
       • O(length) × O(log (# of patterns))
     • Can we do even better?
       • Yes: a hash table. Today’s topic.
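To make the three costs concrete, here is a rough worked comparison. The file length L = 10^6 bytes and the pattern count P = 10^4 are assumptions chosen for illustration (the slides say only "tens of thousands"), and each probe of the tree or table is counted as one step.

```latex
\begin{align*}
\text{linear list:}   \quad & L \times P = 10^{6} \times 10^{4} = 10^{10} \text{ comparisons} \\
\text{balanced tree:} \quad & L \times \log_2 P \approx 10^{6} \times 13.3 \approx 1.3 \times 10^{7} \text{ comparisons} \\
\text{hash table:}    \quad & L \times O(1) \approx 10^{6} \text{ lookups}
\end{align*}
```

Under these assumptions the tree saves roughly three orders of magnitude over the list, and the hash table saves about one more.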

  7. Requirement
     • In these applications (and many like them), need constant time access
       • I.e., O(1)
     • Need to access by value!

  8. Observation
     • Arrays provide constant time access …
       • … but you have to know which element you want!
       • We only know the contents of the item we want!
     • Also
       • Not easy to grow or shrink
       • Not open-ended
     • Can we do better?
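The gap this slide points at can be sketched in C. The function names below are illustrative assumptions, not from the slides: reading an array by index takes one step, while finding an element by its contents in a plain array means scanning it.

```c
#include <stddef.h>

/* Access by index: one step, no matter how large the array is. */
int get_by_index(const int *a, size_t i)
{
    return a[i];
}

/* Access by value: with a plain array we have to scan, which is O(n).
 * Returns the index of the first match, or -1 if the value is absent. */
long find_by_value(const int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == value)
            return (long)i;
    return -1;
}
```

A hash table aims to make the second operation cost about as much as the first.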

  9. Definition – Hash Table
     • A data structure comprising an array
       • for constant time access
     • A set of linked lists
       • one list for each array element
     • A hashing function (also known as a hash function) to convert search key to array index
       • a randomizing function to assure uniform distribution of values across array indices
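The definition might be declared in C roughly as follows. The names item, data, next, hashtab and HASHSIZE are the ones used by the code on the later slides; the exact field types and the helper count_empty_lists are assumptions added for illustration.

```c
#include <stddef.h>

#define HASHSIZE 101                /* number of array elements */

/* One node of a linked list. */
struct item {
    const char *data;               /* payload containing the search key */
    struct item *next;              /* next item in the same list, or NULL */
};

/* The array: one list head per element. Static storage is zero-initialized,
 * so every list starts out empty (NULL). */
static struct item *hashtab[HASHSIZE];

/* Hypothetical helper: count how many of the lists are currently empty. */
size_t count_empty_lists(void)
{
    size_t empty = 0;

    for (size_t i = 0; i < HASHSIZE; i++)
        if (hashtab[i] == NULL)
            empty++;
    return empty;
}
```

The hashing function that picks the array element is the third ingredient; it is not declared in this sketch.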

  10. Definition – Search Key
     • A value stored as (part of) the payload of the item you are looking for
       • E.g., your credit card number
       • Your account number at Amazon
       • A pattern characteristic of a virus
     • Need to find the item containing that value (i.e., that key)

  11. Definition – Hash Function
     • A function that randomizes the search key to produce an index into the array
       • Always returns the same value for the same key
       • So that non-random keys don’t concentrate around a subset of the indices in the array
     • See §6.6 in Kernighan & Ritchie

  12. Hash Table Structure
     [Figure: the array, with each element pointing to one of the lists; every list is a chain of items, each holding data and a next pointer.]

  13. Hash Table Structure (continued)
     [Figure: the array and its lists of items, each item holding data and a next pointer.]
     • Average length of list should be in single digits
     • Note that some of the lists are empty

  14. Guidelines for Hash Tables
     • Lists from each array element should be short
       • I.e., with short search time (approximately constant)
     • Size of array should be based on expected # of entries
       • Err on large side if possible
     • Hashing function
       • Should “spread out” the values relatively uniformly
       • Multiplication and division by prime numbers usually works well
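The sizing guideline comes down to simple arithmetic: with a hash that spreads keys uniformly, the average list length is the number of entries divided by the array size. A minimal sketch, with both function names being assumptions for illustration:

```c
#include <stddef.h>

/* Average list length for `entries` items spread over `array_size`
 * array elements, assuming the hash distributes keys uniformly. */
double average_list_length(size_t entries, size_t array_size)
{
    if (array_size == 0)
        return 0.0;                 /* no array, nothing to average over */
    return (double)entries / (double)array_size;
}

/* Smallest array size that keeps the average list length at or below
 * `max_average` for the expected number of entries. */
size_t min_array_size(size_t entries, size_t max_average)
{
    if (max_average == 0)
        max_average = 1;            /* treat a zero target as "at most one" */
    return (entries + max_average - 1) / max_average;   /* ceiling division */
}
```

For instance, 1000 entries in an array of 101 elements average just under 10 items per list, which is at the edge of the single-digit guideline. This sketch does not round the result up to a prime, which the last bullet suggests may be worthwhile.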

  15. Example Hashing Function
     • P. 144 of K & R
     • Note the prime numbers (101 and 31) used to “mix it up”

         #define HASHSIZE 101

         unsigned int hash(char *s)
         {
             unsigned int hashval;

             for (hashval = 0; *s != '\0'; s++)
                 hashval = *s + 31 * hashval;
             return hashval % HASHSIZE;
         }
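A short trace shows what the function does, assuming ASCII character codes: for "ab", hashval goes 0, then 97 ('a'), then 98 + 31 × 97 = 3105, and 3105 % 101 = 75. The sketch below repeats the hash function (with a const parameter, a small liberty) and adds a hypothetical helper, bucket_histogram, for checking how evenly a set of keys spreads over the array.

```c
#include <stddef.h>

#define HASHSIZE 101

/* Hash function from the slide; same key always yields the same index. */
unsigned int hash(const char *s)
{
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Hypothetical helper: tally how many of the n keys land on each array
 * index. `counts` must have room for HASHSIZE elements. */
void bucket_histogram(const char *keys[], size_t n, size_t counts[HASHSIZE])
{
    for (size_t i = 0; i < HASHSIZE; i++)
        counts[i] = 0;
    for (size_t i = 0; i < n; i++)
        counts[hash(keys[i])]++;
}
```

A histogram with a few very large counts and many zeros would suggest the hash is not spreading that particular set of keys well.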

  16. Using a Hash Table

         struct item *lookup(char *s)
         {
             struct item *np;

             for (np = hashtab[hash(s)]; np != NULL; np = np->next)
                 if (strcmp(s, np->data) == 0)
                     return np;      /* found */
             return NULL;            /* not found */
         }

  17. Using a Hash Table (same code as slide 16)
     • Hash table is indexed by hash value of s

  18. Using a Hash Table (same code as slide 16)
     • Traverse the linked list to find item s
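To see both steps at work, here is a self-contained sketch of the lookup with the declarations it needs. The field types are assumptions; hash and lookup follow the slides, with const added to the parameters. Under ASCII, the keys "a" and "Af" both hash to index 97 with this function, so they end up on the same list and the traversal has to tell them apart with strcmp.

```c
#include <stddef.h>
#include <string.h>

#define HASHSIZE 101

struct item {
    const char *data;               /* the search key */
    struct item *next;              /* next item on the same list */
};

static struct item *hashtab[HASHSIZE];

unsigned int hash(const char *s)
{
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Step 1: index the array by the hash value of s.
 * Step 2: walk that one list, comparing keys, until a match or the end. */
struct item *lookup(const char *s)
{
    struct item *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;              /* found */
    return NULL;                    /* not found */
}
```

Only the items on one list are ever compared, which is why short lists keep the search time approximately constant.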

  19. Using a Hash Table (continued)

         struct item *addItem(char *s, …)
         {
             struct item *np;
             unsigned int hv;

             if ((np = lookup(s)) == NULL) {
                 np = malloc(sizeof(struct item));
                 /* fill in s and data */
                 np->next = hashtab[hv = hash(s)];
                 hashtab[hv] = np;
             }
             return np;
         }

  20. Using a Hash Table (continued) (same code as slide 19)
     • Inserts new item at head of the list indexed by hash value
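A fuller sketch of insertion, under some assumptions: the item stores only a private copy of the key, the elided extra parameters are left out, and allocation failure returns NULL. The slides ask for deletion as well but do not show it, so removeItem below is a hypothetical addition rather than the course's code.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct item {
    char *data;                     /* private copy of the key */
    struct item *next;
};

static struct item *hashtab[HASHSIZE];

static unsigned int hash(const char *s)
{
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

struct item *lookup(const char *s)
{
    struct item *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;
    return NULL;
}

/* Add s if it is not already present. Returns the item, or NULL if
 * memory runs out. New items go at the head of their list. */
struct item *addItem(const char *s)
{
    struct item *np = lookup(s);

    if (np == NULL) {
        unsigned int hv = hash(s);

        if ((np = malloc(sizeof *np)) == NULL)
            return NULL;
        if ((np->data = malloc(strlen(s) + 1)) == NULL) {
            free(np);
            return NULL;
        }
        strcpy(np->data, s);
        np->next = hashtab[hv];
        hashtab[hv] = np;
    }
    return np;
}

/* Hypothetical: unlink and free the item for s.
 * Returns 1 if an item was removed, 0 if s was not in the table. */
int removeItem(const char *s)
{
    struct item **pp;

    for (pp = &hashtab[hash(s)]; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp(s, (*pp)->data) == 0) {
            struct item *dead = *pp;

            *pp = dead->next;       /* bypass the item in its list */
            free(dead->data);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

Walking the list through a pointer-to-pointer lets removeItem treat the list head and interior items the same way.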

  21. Challenge
     • In what kinds of situations in your field might you need a hash table?

  22. Example – Source Code Control System
     • System stores every version of every file since creation
     • Storage for one file comprises two parts:
       • Hash table of lines of the file
         • I.e., each line that has ever been part of that file!
       • List of lines for each version of that file
     • Easy to reconstruct any version of the file
     • Easy to do an intelligent diff of two files
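One way such storage might work is sketched below. It is an illustration under stated assumptions, not the design of any particular source code control system: each distinct line of text is stored once in the hash table, and a version is an array of pointers to those stored lines. The names intern_line and common_prefix are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct item {
    char *data;                     /* text of one line */
    struct item *next;
};

static struct item *hashtab[HASHSIZE];

static unsigned int hash(const char *s)
{
    unsigned int hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Return the single shared item for this line text, creating it the
 * first time the text is seen. Returns NULL if memory runs out. */
struct item *intern_line(const char *text)
{
    unsigned int hv = hash(text);
    struct item *np;

    for (np = hashtab[hv]; np != NULL; np = np->next)
        if (strcmp(text, np->data) == 0)
            return np;
    if ((np = malloc(sizeof *np)) == NULL)
        return NULL;
    if ((np->data = malloc(strlen(text) + 1)) == NULL) {
        free(np);
        return NULL;
    }
    strcpy(np->data, text);
    np->next = hashtab[hv];
    hashtab[hv] = np;
    return np;
}

/* Count the leading lines two versions share. Comparing pointers is
 * enough, because equal text always maps to the same item. */
size_t common_prefix(struct item *const *v1, size_t n1,
                     struct item *const *v2, size_t n2)
{
    size_t i = 0;

    while (i < n1 && i < n2 && v1[i] == v2[i])
        i++;
    return i;
}
```

Reconstructing a version is then a matter of printing the data of each pointer in order, and a diff can compare pointers instead of strings.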

  23. Hash Table Summary
     • Widely used for constant time access
     • Easy to build and maintain
     • There is an art and science regarding the choice of hashing functions
       • Consult textbooks, web, etc.

  24. Questions?
