Object Oriented Data Structures

Presentation Transcript


  1. Object Oriented Data Structures Tables and Information Retrieval • Rectangular Tables • Tables of Various Shapes • Radix Sort • Hashing Kruse/Ryba ch09

  2. What is an INDEX? • An index lets you impose order on a file without actually rearranging the file. • An index gives keyed access to fixed-length or variable-length record files.

  3. Simple Index A simple index uses a simple array to implement the index. IBM calls this organization ISAM (Indexed Sequential Access Method).

  4. Indexfile and Datafile
     Indexfile                 Datafile
     Key        Reference      Address   Actual data record
     ANG3795    167            32        LON|2312|Romeo and Juliet|...
     COL31809   353            77        RCA|2626|Quartet in C Sharp...
     COL38358   211            132       WAR|23699|Touchstone|...
     DG139201   396            167       ANG|3795|Symphony No. 9|...
     DG18807    256            211       COL|38358|Nebraska|...
     FF245      442            256       DG|18807|Symphony No. 9|...
     LON2312    32             300       MER|75016|Coq d'or Suite|...
     MER75016   300            353       COL|31809|Symphony No. 9|...
     RCA2626    77             396       DG|139201|Violin Concerto|...
     WAR23699   132            442       FF|245|Good News|...

  5. Concerns • Two files to deal with • Index file easier to deal with than data file because it has fixed-length records • Fixed-length fields impose limits on size of keys • In the example, the index carries no information other than the keys and the reference fields. Other data, such as the record length, could be included.

  6. Basic Operations • Create the original empty index and data files. • Load the index file into memory before using it. • Rewrite the index file from memory after using it. • Add records to the data file and index. • Delete records from the data file. • Update records in the data file.

  7. Creating the Files Create both the index and data files as empty files. Write headers to both files.

  8. Loading the Index into Memory Assume that the index file is small enough to fit into RAM. Each array element is an index record.
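
Once the index is held in memory as a sorted array, a key lookup can be a binary search. The following is a minimal sketch under that assumption; the names IndexEntry, find_offset and slide4_index are hypothetical and do not come from the slides, and the sample entries are the key/reference pairs of slide 4.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical layout of one index record: a key plus a reference
// (the byte address of the record in the data file).
struct IndexEntry {
    std::string key;
    long offset;
};

// Binary search over an index that is kept sorted by key.
// Returns the byte address of the record, or -1 if the key is absent.
long find_offset(const std::vector<IndexEntry> &index, const std::string &key)
{
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry &entry, const std::string &k) { return entry.key < k; });
    if (it != index.end() && it->key == key) return it->offset;
    return -1;
}

// The ten index records of slide 4, already in key order.
std::vector<IndexEntry> slide4_index()
{
    return {
        {"ANG3795", 167}, {"COL31809", 353}, {"COL38358", 211},
        {"DG139201", 396}, {"DG18807", 256}, {"FF245", 442},
        {"LON2312", 32}, {"MER75016", 300}, {"RCA2626", 77},
        {"WAR23699", 132}};
}
```

With this data, find_offset(slide4_index(), "LON2312") yields 32, the address at which the data file would then be read.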

  9. Safety Mechanisms • Know when the index is out of date. • Be able to reconstruct the index from the data file.
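
Reconstructing the index amounts to scanning the data file, extracting each record's key, and sorting. The sketch below assumes the record format of slide 4 ('|' separated fields, with the key formed from the first two fields) and assumes the scan has already produced (address, record text) pairs; primary_key and rebuild_index are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct IndexEntry {
    std::string key;
    long offset;
};

// Form the key from the first two '|' separated fields,
// e.g. "LON|2312|Romeo and Juliet|..." gives "LON2312".
std::string primary_key(const std::string &record)
{
    std::size_t first = record.find('|');
    if (first == std::string::npos) return record;
    std::size_t second = record.find('|', first + 1);
    std::size_t id_length = (second == std::string::npos)
                                ? std::string::npos
                                : second - first - 1;
    return record.substr(0, first) + record.substr(first + 1, id_length);
}

// Rebuild a key-ordered index from (address, record text) pairs.
std::vector<IndexEntry> rebuild_index(
    const std::vector<std::pair<long, std::string>> &records)
{
    std::vector<IndexEntry> index;
    for (const auto &record : records)
        index.push_back({primary_key(record.second), record.first});
    std::sort(index.begin(), index.end(),
              [](const IndexEntry &a, const IndexEntry &b) { return a.key < b.key; });
    return index;
}
```

Detecting that the index is out of date is a separate concern (for example a flag or timestamp in the file header) and is not shown here.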

  10. Record Addition Adding a new record to the data file requires that we also add a record to the index file.

  11. Indexfile and Datafile after adding LON783
      Indexfile                 Datafile
      Key        Reference      Address   Actual data record
      ANG3795    167            32        LON|2312|Romeo and Juliet|...
      COL31809   353            77        RCA|2626|Quartet in C Sharp...
      COL38358   211            132       WAR|23699|Touchstone|...
      DG139201   396            167       ANG|3795|Symphony No. 9|...
      DG18807    256            211       COL|38358|Nebraska|...
      FF245      442            256       DG|18807|Symphony No. 9|...
      LON2312    32             300       MER|75016|Coq d'or Suite|...
      LON783     486            353       COL|31809|Symphony No. 9|...
      MER75016   300            396       DG|139201|Violin Concerto|...
      RCA2626    77             442       FF|245|Good News|...
      WAR23699   132            486       LON|783|Sweet Somethings|...
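
The new data record is appended at the end of the data file, while its index record has to go into the middle of the sorted array. A minimal sketch of that insertion, assuming an in-memory std::vector as the index (IndexEntry and add_entry are hypothetical names):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long offset;
};

// Insert a new index record so that the array stays sorted by key.
// Returns false, leaving the index unchanged, if the key already exists.
bool add_entry(std::vector<IndexEntry> &index, const std::string &key, long offset)
{
    auto pos = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry &entry, const std::string &k) { return entry.key < k; });
    if (pos != index.end() && pos->key == key) return false;
    index.insert(pos, IndexEntry{key, offset});  // shifts later entries up
    return true;
}
```

Calling add_entry(index, "LON783", 486) places the entry between LON2312 and MER75016, since the keys compare as character strings.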

  12. Record Deletion Any of the methods discussed in chapter 5 could be used. However, the index file must now be considered. The index entry could be removed and the array adjusted, or the index entry could just be marked as deleted.

  13. Record Updating • If the update changes the key field, it is best thought of conceptually as a deletion followed by an addition. • If the update does not change the key field, the index file is not affected, but the data file may still have to change if the size of the record changes.

  14. Indexes too large to fit in RAM Essentially, the later text material deals with this problem: • Hashed organization • Tree structures

  15. Access by Multiple Keys Secondary key organized by composer
      Secondary key        Primary key
      BEETHOVEN            ANG3795
      BEETHOVEN            DG139201
      BEETHOVEN            DG18807
      BEETHOVEN            RCA2626
      COREA                WAR23699
      DVORAK               COL31809
      PROKOFIEV            LON2312
      RIMSKY-KORSAKOV      MER75016
      SPRINGSTEEN          COL38358
      SWEET HONEY IN THE   FF245

  16. Record Addition Additional indices imply additional overhead when new records are added.

  17. Record Deletion This usually implies removing all references to that record in the file system. Alternatively, the secondary index entries can be left in place: since the primary index does reflect the deletion, a request that arrives through a secondary index fails in the primary index, which shows that the record has been deleted. This method wastes space in the secondary index.

  18. Record Updating • If the update changes the secondary key, it may be necessary to rearrange the secondary key index so it stays in sorted order. • If the update changes the primary key, this has a major impact on the secondary indices. • If the update is confined to other fields, affecting neither the primary nor the secondary key fields, the secondary key index is not affected.

  19. Access by Multiple Keys Secondary key organized by recording title
      Secondary key        Primary key
      COQ D'OR SUITE       MER75016
      GOOD NEWS            FF245
      NEBRASKA             COL38358
      QUARTET IN C SHAR    RCA2626
      ROMEO AND JULIET     LON2312
      SYMPHONY NO. 9       ANG3795
      SYMPHONY NO. 9       COL31809
      SYMPHONY NO. 9       DG18807
      TOUCHSTONE           WAR23699
      VIOLIN CONCERTO      DG139201

  20. Access by Multiple Keys Find all data records with composer = BEETHOVEN and title = SYMPHONY NO. 9
      Secondary key        Primary key
      COQ D'OR SUITE       MER75016
      GOOD NEWS            FF245
      NEBRASKA             COL38358
      QUARTET IN C SHAR    RCA2626
      ROMEO AND JULIET     LON2312
      SYMPHONY NO. 9       ANG3795
      SYMPHONY NO. 9       COL31809
      SYMPHONY NO. 9       DG18807
      TOUCHSTONE           WAR23699
      VIOLIN CONCERTO      DG139201

  22. Access by Multiple Keys Find all data records with composer = BEETHOVEN and title = SYMPHONY NO. 9
      Secondary key        Primary key
      BEETHOVEN            ANG3795
      BEETHOVEN            DG139201
      BEETHOVEN            DG18807
      BEETHOVEN            RCA2626
      COREA                WAR23699
      DVORAK               COL31809
      PROKOFIEV            LON2312
      RIMSKY-KORSAKOV      MER75016
      SPRINGSTEEN          COL38358
      SWEET HONEY IN THE   FF245

  24. LOGICAL AND [Figure: the Indexfile and Datafile of slide 4, used to retrieve the records whose primary keys appear in both secondary-key lists]
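
The logical AND of the two searches is the intersection of the two lists of primary keys. Because each secondary index lists the primary keys for a given secondary key in sorted order, the lists can be matched in a single pass. A minimal sketch using std::set_intersection (match_both is a hypothetical name):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Return the primary keys present in both lists.
// Both input lists must already be sorted in ascending order.
std::vector<std::string> match_both(const std::vector<std::string> &by_composer,
                                    const std::vector<std::string> &by_title)
{
    std::vector<std::string> matches;
    std::set_intersection(by_composer.begin(), by_composer.end(),
                          by_title.begin(), by_title.end(),
                          std::back_inserter(matches));
    return matches;
}
```

For composer = BEETHOVEN (ANG3795, DG139201, DG18807, RCA2626) and title = SYMPHONY NO. 9 (ANG3795, COL31809, DG18807), the result is ANG3795 and DG18807. Each of those keys is then looked up in the primary index to reach the data record.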

  25. Problems • We have to rearrange the index file every time a new record is added to the file, even if the new record has an existing secondary key.

  26. A Better Solution: Linking the List of References Inverted lists work their way backward from a secondary key to the primary key to the record itself.

  27. BEETHOVEN: ANG3795, DG139201, DG18807, RCA2626
      COREA: WAR23699
      DVORAK: COL31809
      PROKOFIEV: LON2312

  28. BEETHOVEN: ANG3795, DG139201, DG18807, RCA2626
      COREA: WAR23699
      DVORAK: COL31809
      PROKOFIEV: LON2312
      Might create a large number of small files, one for each composer.

  29. Improved Version Redefine the secondary key index so it consists of records with two fields - a secondary key field, and a field containing the relative record number of the first corresponding primary key reference in the inverted list. The actual primary key references associated with each secondary key would be stored in a separate entry-sequenced file.
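
A minimal sketch of this two-file organization, with both files modeled as in-memory vectors. All names (SecondaryEntry, ListEntry, InvertedIndex) are hypothetical. Two simplifications are assumed: the secondary index is searched linearly rather than kept sorted, and a new reference is linked at the front of its chain, so a chain is not necessarily in primary-key order.

```cpp
#include <string>
#include <vector>

// One record of the secondary index: a secondary key plus the relative
// record number (RRN) of the first reference in the list file, or -1.
struct SecondaryEntry {
    std::string secondary_key;
    int first;
};

// One record of the entry-sequenced list file: a primary key plus the
// RRN of the next reference for the same secondary key, or -1.
struct ListEntry {
    std::string primary_key;
    int next;
};

class InvertedIndex {
public:
    // Append the reference to the list file and link it into the chain.
    void add(const std::string &secondary_key, const std::string &primary_key)
    {
        int new_rrn = static_cast<int>(list_.size());
        for (SecondaryEntry &entry : secondary_) {
            if (entry.secondary_key == secondary_key) {
                list_.push_back({primary_key, entry.first});
                entry.first = new_rrn;  // existing key: index is not rearranged
                return;
            }
        }
        list_.push_back({primary_key, -1});
        secondary_.push_back({secondary_key, new_rrn});
    }

    // Follow the chain and collect every primary key for the secondary key.
    std::vector<std::string> lookup(const std::string &secondary_key) const
    {
        std::vector<std::string> keys;
        for (const SecondaryEntry &entry : secondary_) {
            if (entry.secondary_key != secondary_key) continue;
            for (int rrn = entry.first; rrn != -1; rrn = list_[rrn].next)
                keys.push_back(list_[rrn].primary_key);
        }
        return keys;
    }

private:
    std::vector<SecondaryEntry> secondary_;  // secondary index file
    std::vector<ListEntry> list_;            // entry-sequenced list file
};
```

The point of the design is visible in add: a record with an existing secondary key only appends to the list file and updates one link field.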

  30. Secondary Index File
      Record   Secondary key      First reference
      0        BEETHOVEN          3
      1        COREA              2
      2        DVORAK             7
      3        PROKOFIEV          10
      4        RIMSKY-KORSAKOV    6
      5        SPRINGSTEEN        4
      6        SWEET HONEY IN     9

      Label ID List File
      Record   Primary key   Next reference
      0        LON2312       -1
      1        RCA2626       -1
      2        WAR23699      -1
      3        ANG3795       8
      4        COL38358      -1
      5        DG18807       1
      6        MER75016      -1
      7        COL31809      -1
      8        DG139201      5
      9        FF245         -1
      10       ANG3193       0

  31. Hash Functions • Truncation • Ignore part, use the rest for key • Folding • Partition and combine • Modular Arithmetic • Perfect Hash Function
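
The first three methods can be sketched for integer keys as follows. The choices made here (keeping the last three decimal digits, folding in three-digit groups, the function names) are illustrative assumptions rather than anything prescribed by the slides.

```cpp
#include <cstdint>

// Truncation: ignore part of the key and use the rest.
// Here the last three decimal digits are kept.
unsigned truncation_hash(std::uint32_t key)
{
    return key % 1000;
}

// Folding: partition the key into three-digit groups and add them up.
// The sum may exceed the table size, so it is usually reduced afterwards.
unsigned folding_hash(std::uint32_t key)
{
    unsigned sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return sum;
}

// Modular arithmetic: the remainder after dividing by the table size.
// A prime table size tends to spread the keys more evenly.
unsigned modular_hash(std::uint32_t key, unsigned table_size)
{
    return key % table_size;
}
```

For the key 62538194, truncation gives 194, folding gives 194 + 538 + 62 = 794, and modular arithmetic with a table of size 97 gives 63.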

  32. C++ Example
      int hash(const Key &target)
      {
         int value = 0;
         for (int position = 0; position < 8; position++)
            value = 4 * value + target.key_letter(position);
         return value % hash_size;
      }

  33. [Figure: table positions numbered 0 through 35]

  34. Collision Resolution • Linear Probing • Clustering • Rehashing • Increment Functions • Quadratic Probing: h + i^2 • Key-Dependent Increments: increment = (int) the_data.key_letter(0); • Random Probing
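
To compare the probe sequences, the sketch below lists the positions visited by linear probing and by quadratic probing from the same home address. The function names are hypothetical. The quadratic version adds the odd increments 1, 3, 5, ... which lands on h + i^2 at the i-th step without any multiplication.

```cpp
#include <vector>

// Linear probing: h, h + 1, h + 2, ... (mod table_size).
std::vector<int> linear_probes(int home, int table_size, int count)
{
    std::vector<int> sequence;
    for (int i = 0; i < count; i++)
        sequence.push_back((home + i) % table_size);
    return sequence;
}

// Quadratic probing: h, h + 1, h + 4, h + 9, ... (mod table_size),
// computed by adding successive odd numbers since 1 + 3 + ... + (2i - 1) = i^2.
std::vector<int> quadratic_probes(int home, int table_size, int count)
{
    std::vector<int> sequence;
    int probe = home % table_size;
    int increment = 1;
    for (int i = 0; i < count; i++) {
        sequence.push_back(probe);
        probe = (probe + increment) % table_size;
        increment += 2;
    }
    return sequence;
}
```

With home address 3 in a table of size 11, six linear probes visit 3, 4, 5, 6, 7, 8 while six quadratic probes visit 3, 4, 7, 1, 8, 6.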

  35. Error_code Hash_table::insert(const Record &new_entry)
      {
         Error_code result = success;
         int probe_count,   // Counter to be sure that table is not full.
             increment,     // Increment used for quadratic probing.
             probe;         // Position currently probed.
         Key null;          // Null key for comparison purposes.
         null.make_blank();
         probe = hash(new_entry);
         probe_count = 0;
         increment = 1;
         while (table[probe] != null                      // Is the location empty?
                && table[probe] != new_entry              // Duplicate key?
                && probe_count < (hash_size + 1) / 2) {   // Has overflow occurred?
            probe_count++;
            probe = (probe + increment) % hash_size;
            increment += 2;   // Prepare increment for next iteration.
         }
         if (table[probe] == null) table[probe] = new_entry;
         else if (table[probe] == new_entry) result = duplicate_error;
         else result = overflow;   // The table is full.
         return result;
      }

  36. Collision Resolution with Buckets

  37. Collision Resolution by Chaining

  38. Collision Resolution by Chaining • Advantages • Saving of space • Simple, efficient collision handling • Size of hash table does not need to exceed the number of records • Deletion becomes quick and easy • Disadvantage • Links require space
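
A minimal sketch of a chained hash table, assuming string keys, std::list for the chains and std::hash as the hash function; none of these choices come from the slides, and ChainedTable is a hypothetical name. It illustrates the listed advantages: a collision is handled by adding to a chain, the table may hold more records than it has slots, and deletion simply unlinks one node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t slots) : chains_(slots)
    {
        assert(slots > 0);  // at least one chain is required
    }

    // Add the key to its chain; returns false if it is already present.
    bool insert(const std::string &key)
    {
        std::list<std::string> &chain = chains_[slot(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end())
            return false;
        chain.push_front(key);
        return true;
    }

    bool contains(const std::string &key) const
    {
        const std::list<std::string> &chain = chains_[slot(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Unlink the key from its chain; returns false if it was not found.
    bool remove(const std::string &key)
    {
        std::list<std::string> &chain = chains_[slot(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

private:
    std::size_t slot(const std::string &key) const
    {
        return std::hash<std::string>{}(key) % chains_.size();
    }

    std::vector<std::list<std::string>> chains_;  // one chain per table slot
};
```

The disadvantage noted on the slide shows up here as the per-node link storage that std::list adds to every key.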

  39. Theoretical Comparison
      Successful search, expected number of probes:
      Load factor            0.10   0.50   0.80   0.90   0.99   2.00
      Chaining               1.05   1.25   1.40   1.45   1.50   2.00
      Open, random probes    1.05   1.40   2.0    2.6    4.6    -----
      Open, linear probes    1.06   1.50   3.0    5.5    50.5   -----

  40. Theoretical Comparison
      Unsuccessful search, expected number of probes:
      Load factor            0.10   0.50   0.80   0.90   0.99   2.00
      Chaining               0.10   0.50   0.80   0.90   0.99   2.00
      Open, random probes    1.1    2.00   5.0    10.0   100    -----
      Open, linear probes    1.12   2.50   13.    50.    5000   -----
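
The theoretical figures agree with the standard large-table approximations for each method, written in terms of the load factor. The sketch below encodes those approximations so that individual table entries can be recomputed; the function names are hypothetical, and the open-addressing formulas are only defined for load factors below 1.

```cpp
#include <cmath>

// Chaining: a successful search examines about 1 + load/2 entries,
// an unsuccessful search examines a whole chain of average length load.
double chaining_successful(double load)   { return 1.0 + load / 2.0; }
double chaining_unsuccessful(double load) { return load; }

// Open addressing with random probes (requires 0 < load < 1).
double random_successful(double load)
{
    return std::log(1.0 / (1.0 - load)) / load;
}
double random_unsuccessful(double load)
{
    return 1.0 / (1.0 - load);
}

// Open addressing with linear probes (requires load < 1).
double linear_successful(double load)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - load));
}
double linear_unsuccessful(double load)
{
    double empty_fraction = 1.0 - load;
    return 0.5 * (1.0 + 1.0 / (empty_fraction * empty_fraction));
}
```

For example, linear_successful(0.9) evaluates to 5.5 and linear_unsuccessful(0.5) to 2.5, matching the table entries, while random_successful(0.5) is about 1.39, which the table rounds to 1.40.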

  41. Empirical Comparison
      Successful search, expected number of probes:
      Load factor               0.10   0.50   0.80   0.90   0.99   2.00
      Chaining                  1.04   1.2    1.4    1.4    1.59   2.00
      Open, quadratic probes    1.04   1.50   2.1    2.7    5.2    -----
      Open, linear probes       1.05   1.60   3.4    6.2    21.3   -----

  42. Empirical Comparison
      Unsuccessful search, expected number of probes:
      Load factor               0.10   0.50   0.80   0.90   0.99   2.00
      Chaining                  0.10   0.50   0.80   0.90   0.99   2.00
      Open, quadratic probes    1.13   2.20   5.2    11.9   126.   -----
      Open, linear probes       1.13   2.70   15.4   59.8   430.   -----

  43. Highlights

  44. Chapter 9 - The End
