Lecture 21: Indexed Files



  1. CSC 213 – Large Scale Programming. Lecture 21: Indexed Files

  2. Today’s Goals
     • Look at how Dictionary is used in the real world
     • Where this would occur & why they are used there
     • In a real-world setting, what problems can & do occur
     • Indexed file usage presented and shown
     • How & why we split index & data files
     • Formatting of each file and how they get used
     • Describe what problems are solved using indexed files
     • Java coding techniques that simplify using these files
     • Idea needed when using multiple indexes shown

  3. Dictionaries in Real World
     • Often need a large database spread across many machines
     • Split search terms across machines
     • Updating & searching work split between machines
     • Database way too large for any single machine
     • If you think about it, this is incredibly common
     • Where?

  4-5. Split Dictionaries

  6-7. Splitting Keys From Values
     • In the real world, we often have many indices
     • Each index uses a simple unit, such as a position in a file, to record where a value can be found
     • Values could be searched for in multiple ways

  8. Index & Data Files
     • Split information into two (or more) files
     • Data file uses fixed-size records to store data
     • Index files contain search terms & data locations
     • Fixed-size records usually used in data file
     • Each record will use exactly that much space
     • Extra space is wasted if the value is smaller
     • But this limits data size; a record cannot get more space
     • Makes it far easier to reuse space & rebuild index
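A minimal sketch of fixed-size records follows. It assumes a 50-byte name field and a 4-byte int price, matching the sizes used for Stock later in the lecture; the class FixedRecordDemo and its method names are hypothetical, chosen only for illustration. Because every record occupies exactly RECORD_SZ bytes, the record in slot i always starts at byte i * RECORD_SZ.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FixedRecordDemo {
    static final int NAME_SZ = 50;                     // bytes reserved for the name field
    static final int PRICE_SZ = 4;                     // one int
    static final int RECORD_SZ = NAME_SZ + PRICE_SZ;   // every record uses exactly this much

    /** Writes a record into the given slot; short names are padded with zero bytes. */
    static void write(RandomAccessFile data, int slot, String name, int price) throws IOException {
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > NAME_SZ) {
            // Fixed-size records limit data size: there is no way to get more space.
            throw new IllegalArgumentException("name needs more than " + NAME_SZ + " bytes");
        }
        data.seek((long) slot * RECORD_SZ);
        data.write(Arrays.copyOf(raw, NAME_SZ));       // the zero padding is the wasted space
        data.writeInt(price);
    }

    /** Reads the name back, stopping at the first padding byte. */
    static String readName(RandomAccessFile data, int slot) throws IOException {
        byte[] raw = new byte[NAME_SZ];
        data.seek((long) slot * RECORD_SZ);
        data.readFully(raw);
        int len = 0;
        while (len < NAME_SZ && raw[len] != 0) {
            len++;
        }
        return new String(raw, 0, len, StandardCharsets.US_ASCII);
    }

    /** Reads the price, which sits right after the name field. */
    static int readPrice(RandomAccessFile data, int slot) throws IOException {
        data.seek((long) slot * RECORD_SZ + NAME_SZ);
        return data.readInt();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("stocks", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            write(data, 0, "IBM", 106);
            write(data, 1, "AT & T", 23);
            System.out.println(readName(data, 1) + " " + readPrice(data, 1));
            System.out.println("file length = " + data.length());
        }
    }
}
```

Overwriting a slot never disturbs its neighbors, which is why fixed-size records make reusing space straightforward.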

  9. Index File Format
     • No standard format; it depends on the type of data
     • Often variable sized, but this is not a specific requirement
     • Each entry in index file begins with the exact search term
     • Followed by the position containing the matching data
     • As a result, index entries are often packed together with no padding
     • Can read indexes at start of program execution
     • Reasonably assumes index file is smaller than data file
     • Changes are written immediately, however
     • When program starts, do NOT read data file
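Since there is no standard index format, the sketch below simply assumes one: each entry is the search term written with writeUTF, followed by the record's position written with writeLong. The class IndexFileDemo and its save and load methods are hypothetical names. The load method shows the "read indexes at start of program execution" step without ever opening the data file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

final class IndexFileDemo {
    /** Writes variable-sized entries back to back: search term, then data position. */
    static void save(File indexFile, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(indexFile)))) {
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }
    }

    /** Reads the whole index into memory; the data file is not touched. */
    static Map<String, Long> load(File indexFile) throws IOException {
        Map<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(indexFile)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break;                      // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("ticker", ".idx");
        tmp.deleteOnExit();
        Map<String, Long> index = new TreeMap<>();
        index.put("IBM", 0L);
        index.put("T", 60L);
        index.put("F", 120L);
        save(tmp, index);
        System.out.println(load(tmp));
    }
}
```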

  10. Never Read Entire Data File

  11. Indexed Files
     • Enables splitting search terms across computers
     • Alphabetical split makes searches faster on many servers
     [Diagram: servers labeled A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
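One way to picture the alphabetical split is a small router that maps a search term to the server owning its first letter. This is only a sketch: the letter ranges come from the slide's diagram, while the ShardRouter class, its serverFor method, and the idea of identifying servers by their range label are assumptions made for illustration.

```java
import java.util.TreeMap;

final class ShardRouter {
    // First letter of each range -> label of the server that owns that range.
    private final TreeMap<Character, String> rangeStart = new TreeMap<>();

    ShardRouter() {
        String[] ranges = {"A-C", "D-E", "F-H", "I-P", "Q-R", "S-T", "U-X", "Y-Z"};
        for (String range : ranges) {
            rangeStart.put(range.charAt(0), range);
        }
    }

    /** Returns the label of the server responsible for the given search term. */
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter A-Z: " + term);
        }
        // floorEntry finds the range whose starting letter is the closest one <= first.
        return rangeStart.floorEntry(first).getValue();
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter();
        for (String term : new String[] {"IBM", "AT & T", "Ford"}) {
            System.out.println(term + " -> server " + router.serverFor(term));
        }
    }
}
```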

  12. Indexed Files
     • Enables splitting search terms across computers
     • Create indexes for different types of searching
     [Diagram: one index by song name, another by song length]

  13. How Does This Work?
     • Using index files is simplified by using positions
     • Look in index structure to find position of data in file
     • With this position, can then seek to the specific record
     • Create instance & initialize it by reading data from file
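The lookup steps above can be sketched as follows. To keep the example short it assumes each record is just a 4-byte int price, and that the index is an in-memory Map from ticker symbol to file position; the class IndexedLookupDemo and the method priceOf are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

final class IndexedLookupDemo {
    /** Looks up the position in the index, seeks to that record, and reads it. */
    static OptionalInt priceOf(Map<String, Long> index, RandomAccessFile data, String ticker)
            throws IOException {
        Long pos = index.get(ticker);        // step 1: find position in index structure
        if (pos == null) {
            return OptionalInt.empty();      // term is not indexed
        }
        data.seek(pos);                      // step 2: seek to the specific record
        return OptionalInt.of(data.readInt()); // step 3: read the data from the file
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("prices", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            Map<String, Long> index = new HashMap<>();
            String[] tickers = {"IBM", "T", "F"};
            int[] prices = {106, 23, 12};
            for (int i = 0; i < tickers.length; i++) {
                index.put(tickers[i], data.getFilePointer()); // remember where the record starts
                data.writeInt(prices[i]);
            }
            System.out.println(priceOf(index, data, "T"));
            System.out.println(priceOf(index, data, "MSFT").isPresent());
        }
    }
}
```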

  14. Starting with Indexed Files
     Data file records (name, price, ticker):
       IBM     | 106 | IBM
       AT & T  | 23  | T
       Ford    | 2   | F

  15. Where Was "Searching" Used?
     • Indexed files are used with Map and Dictionary implementations
     • Read data into searchable object after opening file
     • For each record, Entry uses indexed data as its key
     • Single data file has multiple indexes to search it
     • Not a problem, each index has own Collection
     • Cannot have multiple instances for each data item
     • Cannot have single instance for each data item
     • Then how can we construct each Entry's value?

  16-17. Proxy Pattern For The Win!
     • Create proxy instances to use as Entry's value
     • Proxy pretends it has the data by defining getters & setters
     • Data's position & file are the only fields these objects have
     • Whenever a method is called, it looks up & returns the data
     • Other classes will think the proxy has the fields declared
     • Simplifies using the class & ensures up-to-date data is used
     • But little memory needed, since data resides on disk!

  18. Starting with Indexed Files
     Data file records (name, price, ticker):
       IBM     | 106 | IBM
       AT & T  | 23  | T
       Ford    | 12  | F

  19-21. Coding
      import java.io.RandomAccessFile;

      public class Stock {
          // Offset in record to each field's start, and fixed max. size of each field
          private static final int NAME_OFF = 0;
          private static final int NAME_SZ = 50;
          private static final int PRC_OFF = NAME_OFF + NAME_SZ;
          private static final int PRC_SZ = 4;
          private static final int TICK_OFF = PRC_OFF + PRC_SZ;
          private static final int TICK_SZ = 6;
          // Fixed size of a record in data file
          private static final int SIZE = TICK_OFF + TICK_SZ;

          // The proxy's only fields: where its record starts and which file holds it
          private long position;
          private RandomAccessFile theFile;

          public Stock(long pos, RandomAccessFile file) {
              position = pos;
              theFile = file;
          }
          // Class continues on the next slide

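  22. Coding
      public class Stock { // Continues from last time; also needs import java.io.IOException
          // Every getter & setter seeks to its field, then reads or writes the file directly
          public int getStockPrice() throws IOException {
              theFile.seek(position + PRC_OFF);
              return theFile.readInt();
          }

          public void setStockPrice(int price) throws IOException {
              theFile.seek(position + PRC_OFF);
              theFile.writeInt(price);
          }

          // writeUTF stores a 2-byte length before the text, so sym must
          // encode to at most TICK_SZ - 2 bytes to stay inside its field
          public void setTickerSymbol(String sym) throws IOException {
              theFile.seek(position + TICK_OFF);
              theFile.writeUTF(sym);
          }
          // More getters & setters from here…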

  23. Visualizing Indexed Files
     Data file records (name, price, ticker):
       IBM     | 106 | IBM
       AT & T  | 23  | T
       Ford    | 12  | F

  24. How Do We Add Data?
     • Adding new records takes only a few steps
     • Add space for record with setLength on data file
     • Update index structure(s) to include new record
     • Records in data file are updated at each change
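The steps for adding a record might look like the sketch below. It again assumes a simplified record holding only a 4-byte int price and an in-memory Map as the index structure; AppendRecordDemo and add are hypothetical names. Note that the Java documentation leaves the contents of space added by setLength undefined, so the new record is written right away.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

final class AppendRecordDemo {
    static final int RECORD_SZ = 4;   // simplified record: a single int price

    /** Adds a record at the end of the data file and returns its position. */
    static long add(RandomAccessFile data, Map<String, Long> index, String ticker, int price)
            throws IOException {
        long pos = data.length();             // new record starts at the old end of file
        data.setLength(pos + RECORD_SZ);      // step 1: add space for the record
        data.seek(pos);
        data.writeInt(price);                 // data file is updated immediately
        index.put(ticker, pos);               // step 2: update the index structure
        return pos;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("append", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            Map<String, Long> index = new HashMap<>();
            add(data, index, "IBM", 106);
            add(data, index, "C", -2);
            System.out.println(index + ", file length = " + data.length());
        }
    }
}
```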

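  25. Adding New Data To The Files
     Data file records (name, price, ticker), with space added for a new record:
       IBM     | 106 | IBM
       AT & T  | 23  | T
       Ford    | 12  | F
       (empty) | 0   | Ø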

  26. Adding New Data To The Files
     Data file records (name, price, ticker), after the new record is filled in:
       IBM      | 106 | IBM
       AT & T   | 23  | T
       Ford     | 12  | F
       Citibank | -2  | C

  27. How Does This Work?
     • Removing records is even easier
     • To prevent using a record, remove its items from the indexes
     • Do NOT update index file(s) until program completes
     • Use impossible magic numbers for the record in the data file
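A removal along these lines could be sketched as below. The magic number is an assumption: the example treats Integer.MIN_VALUE as an impossible price and, as before, uses a simplified record of one 4-byte int. RemoveRecordDemo, remove, and isDeleted are hypothetical names. Only the in-memory index changes here; writing the index file is left until the program completes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

final class RemoveRecordDemo {
    static final int DELETED = Integer.MIN_VALUE;   // assumed "impossible" price

    /** Removes the ticker from the in-memory index and marks its record as unused. */
    static boolean remove(RandomAccessFile data, Map<String, Long> index, String ticker)
            throws IOException {
        Long pos = index.remove(ticker);   // record can no longer be found by searching
        if (pos == null) {
            return false;                  // nothing to remove
        }
        data.seek(pos);
        data.writeInt(DELETED);            // magic number marks the space as reusable
        return true;
    }

    /** Reports whether the record at the given position holds the magic number. */
    static boolean isDeleted(RandomAccessFile data, long pos) throws IOException {
        data.seek(pos);
        return data.readInt() == DELETED;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("remove", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(tmp, "rw")) {
            Map<String, Long> index = new HashMap<>();
            index.put("T", 0L);
            data.writeInt(23);
            index.put("F", 4L);
            data.writeInt(12);
            remove(data, index, "F");
            System.out.println(index + ", slot 4 deleted: " + isDeleted(data, 4L));
        }
    }
}
```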

  28. Removing Data As We Go
     Data file records (name, price, ticker), before removal:
       IBM      | 106 | IBM
       AT & T   | 23  | T
       Ford     | 12  | F
       Citibank | -2  | C

  29. Removing Data As We Go
     Data file records (name, price, ticker), after the Ford record is removed:
       IBM      | 106 | IBM
       AT & T   | 23  | T
       (empty)  | 0   | Ø
       Citibank | -2  | C

  30. Using Multiple Indexes
     • Multiple indexes for a data file are very often needed
     • Provides many ways of searching for important data
     • Since each index file is read individually, this could also create a problem
     • Multiple proxy instances for the same data could be created
     • Duplicates of an instance are created for each index
     • Makes removing them all difficult, since they are not linked
     • Very easy to solve: use a Map while loading each index
     • Map converts positions in file to proxy instances to solve this

  31. Linking Multiple Indexes
     • Use one Map instance while reading all indexes
     • For each position in file, check if already in Map
     • Use existing proxy instance, if position already in Map
     • If a search in Map returns null, create new instance
     • Make sure to call put() when we must create proxy
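The linking steps above can be sketched as follows. The nested Proxy class is a hypothetical stand-in for a proxy such as Stock and keeps only a position; ProxyCacheDemo and link are likewise illustrative names. One shared Map from position to proxy is passed in while every index is processed, so two indexes that point at the same record end up sharing one proxy instance.

```java
import java.util.HashMap;
import java.util.Map;

final class ProxyCacheDemo {
    /** Hypothetical stand-in for a proxy such as Stock: it only remembers a position. */
    static final class Proxy {
        final long position;

        Proxy(long position) {
            this.position = position;
        }
    }

    /** Converts one index (search term -> position) into search term -> shared proxy. */
    static Map<String, Proxy> link(Map<String, Long> index, Map<Long, Proxy> byPosition) {
        Map<String, Proxy> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : index.entrySet()) {
            Proxy proxy = byPosition.get(entry.getValue());  // already created for this position?
            if (proxy == null) {
                proxy = new Proxy(entry.getValue());         // search returned null: create it
                byPosition.put(entry.getValue(), proxy);     // and remember it for later indexes
            }
            result.put(entry.getKey(), proxy);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> nameIndex = new HashMap<>();
        nameIndex.put("IBM", 0L);
        nameIndex.put("Ford", 120L);
        Map<String, Long> tickerIndex = new HashMap<>();
        tickerIndex.put("IBM", 0L);
        tickerIndex.put("F", 120L);

        Map<Long, Proxy> byPosition = new HashMap<>();       // one Map for all indexes
        Map<String, Proxy> byName = link(nameIndex, byPosition);
        Map<String, Proxy> byTicker = link(tickerIndex, byPosition);
        System.out.println("same proxy: " + (byName.get("Ford") == byTicker.get("F")));
    }
}
```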

  32. For Next Lecture
     • Angel now has week #9 assignment (due 3/20)
     • This is after break, but might want to get started now
     • Angel will also have project #2 available
     • Has staggered submissions like previous project
     • Based upon index files, so can start working now!
     • Will discuss implementing space-efficient BST
     • Start coloring nodes red & black
     • Keeps balanced, but limits amount of movement
