Chapter 11: File System Implementation • Chapter 11.1 • File-System Structure • File-System Implementation • Directory Implementation • Chapter 11.2 • Allocation Methods • Chapter 11.3 • Free-Space Management • Recovery • Log-Structured File System
11.4 Allocation Methods • An allocation method refers to how the disk blocks that store file data (records) are allocated to files. • There are three primary approaches: • Contiguous allocation • Linked allocation • Indexed allocation
Contiguous Allocation of Disk Space • Each file occupies a set of contiguous blocks on the disk. • Blocks have a linear ordering, so disk head movement (a disk seek) is needed only to reach the next sectors on the track or the next track within the cylinder, etc. • The number of disk seeks is therefore minimal, since all blocks are kept together. • The directory entry typically holds only the address of the first block and the number of blocks. • This is all that is needed. • File access is very straightforward. • For sequential access, the file system keeps track of the last block referenced and can readily read the next block (see FCB format). • For random access to block i of a file that starts at block b, we can go directly to block b + i. • Biggest problem: file growth. • Is totally new space required, or some other mechanism? Ahead. Extents may help, but this is still a significant problem… • Let’s look and see what a file might look like…
Contiguous Allocation of Disk Space - Visual Can easily see the starting block number and the number of blocks for each file. See ‘count’ starts at block 0 on the disk. ‘Mail’ starts at block 19 and runs for six blocks. All allocations are contiguous! Note: there are holes! This is simplistic, however.
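The address calculation for contiguous allocation can be sketched in a few lines. This is a minimal illustration, not any particular file system's code: the `DirEntry` class and the `physical_block` function are hypothetical names, and only the ‘mail’ entry (start 19, length 6) comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    """Hypothetical directory entry: all contiguous allocation needs."""
    name: str
    start: int   # address of the first block (b)
    length: int  # number of blocks in the file


def physical_block(entry: DirEntry, i: int) -> int:
    """Map logical block i of a file to its disk block: b + i."""
    if not 0 <= i < entry.length:
        raise IndexError(f"block {i} is outside file '{entry.name}'")
    return entry.start + i


# 'mail' starts at block 19 and occupies six blocks (19..24).
mail = DirEntry("mail", start=19, length=6)
print(physical_block(mail, 0))  # first block of the file
print(physical_block(mail, 5))  # last block of the file
```

No disk read is needed to find the address, which is why both sequential and direct access are cheap in this scheme.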
Contiguous Allocation of Disk Space • Finding Space – allocation schemes: • Both first fit and best fit work pretty well, with first fit generally a bit faster. • (We will see how the system keeps track of available blocks ahead…) • Worst fit is undesirable in terms of both time and storage utilization. • All contiguous allocation schemes have external fragmentation issues. • This could be a major or minor problem in managing an overall disk resource. • Downside: generally, all installations schedule downtime during low system usage, when the disk can be compacted and the external fragments brought together. • Can be done off-line – generally best. Users get a ‘warning’ of impending ‘non-availability,’ like at 3am, etc. • Save your files, the system will not be available for a while. • Disks can be ‘reorganized’ and garbage collected… • We have ‘periodic maintenance’ and ‘system saves’ and compaction… • More later…
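The three hole-selection strategies named above can be sketched as follows. This is only an illustration under stated assumptions: the free list is modeled as a hypothetical Python list of `(start, size)` holes, and the hole values are made up for the example.

```python
def first_fit(holes, n):
    """Return the start of the first hole with at least n blocks, or None."""
    for start, size in holes:
        if size >= n:
            return start
    return None


def best_fit(holes, n):
    """Return the start of the smallest hole that still fits n blocks."""
    fits = [(size, start) for start, size in holes if size >= n]
    return min(fits)[1] if fits else None


def worst_fit(holes, n):
    """Return the start of the largest hole that fits n blocks."""
    fits = [(size, start) for start, size in holes if size >= n]
    return max(fits)[1] if fits else None


# Hypothetical free list: (start block, number of free blocks)
holes = [(2, 4), (9, 5), (17, 2), (25, 3)]

print(first_fit(holes, 3))  # hole at 2: first one that is big enough
print(best_fit(holes, 3))   # hole at 25: exactly 3 blocks, no leftover
print(worst_fit(holes, 3))  # hole at 9: largest hole, biggest leftover
```

First fit can stop at the first match, while best fit and worst fit must examine every hole (unless the list is kept sorted by size), which is one reason first fit tends to be quicker.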
Extent-Based Systems • How much space is needed for the file? Oftentimes we do not know! • Lots of times, files cannot be extended ‘in place.’ So, what to do? • Can take the system offline, allocate more space, move the data, and then restart the system. • Very costly in run time. • We often overestimate required space – this can be very wasteful, especially if the newly requested ‘required’ space is not actually used or needed. • Can find a larger space elsewhere, copy the file into the new space, and release the old space. • But this involves down time, possibly rerunning a process, and other management considerations. • Some systems use extent-based file systems, which allocate disk blocks in extents. • An extent is a contiguous set of disk blocks. • A file consists of a basic allocation plus one or more extents. • IBM uses a SPACE parameter: a process requests an original allocation of, say, 10 tracks and 2 possible extents of one track each. Ten are allocated and two are held in reserve and used if needed. • Extents are ‘linked in’ as needed.
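A rough sketch of how a logical block number could be resolved in an extent-based file follows. The representation is an assumption made for illustration: a file is a hypothetical list of `(start, length)` runs, with the basic allocation first and any extents after it, counted in blocks rather than tracks.

```python
def locate(extents, i):
    """Map logical block i to a disk block by walking the extent list."""
    if i < 0:
        raise IndexError("negative block number")
    for start, length in extents:
        if i < length:
            return start + i   # block lies inside this contiguous run
        i -= length            # skip this run and keep looking
    raise IndexError("block is beyond the end of the file")


# Hypothetical file: a basic allocation of 10 blocks plus two
# one-block extents that were linked in later.
extents = [(100, 10), (340, 1), (512, 1)]

print(locate(extents, 9))   # last block of the basic allocation
print(locate(extents, 10))  # first extent
print(locate(extents, 11))  # second extent
```

Within each run the access is as cheap as in contiguous allocation; only the short extent list has to be searched.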
Linked Allocation of Disk Space • With linked allocation, we no longer have the problems of the contiguous allocation scheme. • Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk. • The directory points to the first block, and each block points to the next block. (Of course, links take some of the space in the block.) • For a new file: create a new entry in the directory – no final size is needed. • The pointer is initially set to null, and each request requires the space management system to find a block and link it in. • No external fragmentation, and the file can grow. • The disk need not be compacted due to this kind of allocation. • Major disadvantage: cannot be used effectively for random access – only sequential access. • We must follow the pointers until we find the desired block. • Not efficient if we need a direct-access capability. • Also, pointers do take up some space, if one adds them up!
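Why direct access is expensive under linked allocation can be seen in a small sketch. The model here is an assumption for illustration only: the disk is a hypothetical dictionary mapping a block number to `(data, next_block)`, and the block numbers are invented.

```python
NULL = -1  # marks the end of the chain in this sketch


def find_block(disk, first, i):
    """Follow the chain from `first` to logical block i.

    Returns (block_number, data, reads), where `reads` counts how many
    blocks had to be read, since each pointer lives inside a block.
    """
    current, reads = first, 0
    while current != NULL:
        data, next_block = disk[current]  # one disk read
        reads += 1
        if i == 0:
            return current, data, reads
        current = next_block
        i -= 1
    raise IndexError("block is beyond the end of the file")


# Hypothetical file whose blocks are scattered: 9 -> 16 -> 1 -> 10 -> 25
disk = {9: ("A", 16), 16: ("B", 1), 1: ("C", 10), 10: ("D", 25), 25: ("E", NULL)}

print(find_block(disk, 9, 0))  # first block: a single read
print(find_block(disk, 9, 3))  # fourth block: four reads to get there
```

Reaching logical block i costs i + 1 reads in this model, which is fine for sequential processing but poor for direct access.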
Linked Allocation of Disk Space - Clusters • Lots of times, clusters of blocks are allocated rather than single blocks. • If so, the pointers occupy much less space, and efficiency is improved because the blocks of a cluster are located in contiguous locations. • But, of course, this increases internal fragmentation: more space is wasted when a cluster is only partially full. • Clusters are nevertheless used in most systems. • There are a lot of inherent dangers in linked allocation, chiefly dropping (losing or damaging) a pointer. • Could link into a protected area • Could link into some other file • Could simply lose your data!!! • Potential Solution 1 – often used: have a doubly-linked list • Potential Solution 2 – store the file name and relative block number in each block – but this requires more space! • And these links add up! • So there are issues with linked allocation. • Let’s see what linked allocation looks like…
Linked Allocation - Visual Note: Only the starting location is stored in the directory. All else is linked! Question: why might the directory store the last link in addition to the starting link?
Linked Allocation with File Allocation Table • Many disks use a FAT (File Allocation Table), which is a data structure on disk located at the beginning of each volume. • The directory has one entry per file, and this entry points into the FAT at the file’s first block. • (The FAT is indexed by block number.) • Each FAT entry contains the address of the ‘next’ block in the file. • The entry for the final block of the file holds a special end-of-file mark. (See next slide) • Remember: plain linked allocation only permits sequential access! • Unused blocks have a 0 value in the FAT. • When more space is needed for the linked file, the file management system finds an available block (value 0 in the FAT), replaces the EOF value in the previous last block’s entry with that block number, and marks the new block’s entry as EOF. (simply a singly-linked list…) • Downside: This scheme may result in a lot of disk head movement between the FAT and the data, which definitely slows things down. • Solution: Cache the FAT for sure. • Advantage: random access is greatly improved, because the chain can be followed in the FAT (particularly if the FAT is in cache) to find the location of any block without reading the data blocks themselves.
File-Allocation Table - Visual Indexed by block number.
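The FAT bookkeeping described on the previous slides can be sketched as below. This is a simplified model, not the on-disk format of any real FAT file system: the table is a hypothetical Python list indexed by block number, `EOF` is an invented marker value, and block 0 is assumed to be reserved so that 0 can unambiguously mean "free".

```python
FREE = 0   # unused blocks have a 0 table value
EOF = -1   # hypothetical end-of-file mark for this sketch


def chain(fat, first):
    """List the blocks of a file by following the FAT, not the data blocks."""
    blocks, b = [], first
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks


def append_block(fat, first):
    """Grow a file by one block: find a free entry and link it in."""
    last = first
    while fat[last] != EOF:
        last = fat[last]
    for b in range(1, len(fat)):      # block 0 is reserved in this sketch
        if fat[b] == FREE:
            fat[last] = b             # old last block now points to b
            fat[b] = EOF              # b becomes the new last block
            return b
    raise OSError("no free blocks")


# Hypothetical 8-block volume holding one file: 2 -> 5 -> 3 -> EOF
fat = [FREE] * 8
fat[2], fat[5], fat[3] = 5, 3, EOF

print(chain(fat, 2))         # blocks of the file, in order
print(append_block(fat, 2))  # number of the newly linked block
print(chain(fat, 2))         # the chain is now one block longer
```

Because the whole chain lives in one table, finding logical block i only touches the FAT (ideally cached) and then needs a single read of the data block.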
Indexed Allocation of Disk Space • In linked allocation, we • don’t have the external fragmentation problem and we • don’t have the size declaration problem, but • we also do not have direct access capability without the FAT, because the pointers to the blocks are stored within the blocks themselves and must be retrieved one at a time. • Indexed Allocation brings all pointers (links) together into the index block. • Each file has its own index, built as an array of block addresses. • To access block i, we use the index: • look up entry i in the index, and • the entry (if present) points to the disk location of that block.
Indexed Allocation of Disk Space • Indexed allocation supports direct access with no external fragmentation. • Any free block will suffice when a block needs to be added to the file. • Pointer overhead is greater than in linked allocation because each file has a separate structure: the index. • This index itself will occupy at least one block of disk storage. (Of course, it can be cached during use – and generally is.) • So how large should the index block be? • Want it to be small, since every indexed file will have one, but we want a sufficient number of entries to support large file access. • Want it to be large? Might need to link several index blocks. • Several implementations of this, as we shall see.
Example of Indexed Allocation - Visual Shows records in block 19 as well as unused space…
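A minimal sketch of an index block follows. The details are assumptions for illustration: the index is a hypothetical fixed-size Python list of block addresses, unused entries hold `-1`, and the pointer values are invented rather than taken from the figure.

```python
UNUSED = -1  # marks an empty slot in the index block


def lookup(index_block, i):
    """Direct access: entry i of the index gives the disk block of block i."""
    if not 0 <= i < len(index_block) or index_block[i] == UNUSED:
        raise IndexError(f"file has no block {i}")
    return index_block[i]


def add_block(index_block, free_block):
    """Grow the file: any free disk block will do; record it in the index."""
    for slot, pointer in enumerate(index_block):
        if pointer == UNUSED:
            index_block[slot] = free_block
            return slot
    raise OSError("index block is full")


# Hypothetical index block with eight entries, five of them in use.
index_block = [9, 16, 1, 10, 25, UNUSED, UNUSED, UNUSED]

print(lookup(index_block, 3))      # disk address of logical block 3
print(add_block(index_block, 30))  # slot that now points to block 30
print(lookup(index_block, 5))      # the new block is directly reachable
```

The unused slots are the price of the scheme: a small file still pays for a whole index block.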
Structure of the Index Block • Linked Scheme: the index is usually one block long, but we can link blocks (that is, several ‘indices’) for particularly large files. • Multilevel index: The first-level index block holds only a set of pointers to second-level index blocks. These in turn point to the data blocks. • IBM uses this organization for its indexed sequential files, which it calls Key Sequenced Data Sets (KSDS). • It calls the outermost level the index set, followed by the sequence set, followed by the data themselves, organized into what they call control areas and control intervals… • Note: a two-level index would allow a file size of up to 4GB (with 4K blocks, assuming 4-byte block pointers). • Combined Scheme: (used by Unix) keeps the first set of pointers of the index block in the file’s inode. • This scheme involves a number of direct and indirect blocks, and we will not spend time on this one.
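The 4GB figure for a two-level index can be checked with a short calculation. The pointer size is an assumption (4 bytes per block address); the function name is hypothetical.

```python
def max_file_size(block_size, pointer_size, levels):
    """Largest file addressable through a multilevel index, in bytes."""
    pointers_per_block = block_size // pointer_size
    # Each index level multiplies the number of reachable data blocks.
    data_blocks = pointers_per_block ** levels
    return data_blocks * block_size


GIB = 2 ** 30
MIB = 2 ** 20

# 4K blocks with 4-byte pointers: 1024 pointers fit in one index block.
print(max_file_size(4096, 4, levels=1) // MIB, "MiB with one level")
print(max_file_size(4096, 4, levels=2) // GIB, "GiB with two levels")
```

With one level, 1024 pointers reach 1024 data blocks (4 MiB); with two levels, 1024 × 1024 data blocks of 4 KiB each give 4 GiB.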
Indexed Allocation – Mapping (Cont.) - Visual General mappings with multiple indices: an outer index points to index tables, which point to the blocks of the file. Some systems have ‘coarse’ indices followed by ‘fine’ indices, etc.
KSDS Structure - Visual Index component: index set, then sequence set. Data component: control intervals, grouped into control areas.
KSDS Example - Visual (key values extremely exaggerated!) Index set (I1, I2): entries 9/S1, 62/S2, FREE. Sequence sets (S1, S2, S3): entries 3/D1, 9/D2, 36/D3, 62/D4, FREE, FREE. Control intervals: D1 = 1, 3, FREE; D2 = 5, 9, FREE; D3 = 35, 36, FREE; D4 = 42, 43, 62, FREE. Control intervals are grouped into control areas.
Performance • The choice of an allocation method depends largely on how the data needs to be accessed. • Contiguous Allocation – requires only one access to get to a data block. • Keep the initial address in memory and calculate disk addresses from there. • Linked Allocation – keep the address of the next block in memory and read it directly. • Major disadvantage – no random access; access to a specific block might well require multiple reads to get ‘to’ that block. • Some systems use contiguous allocation for files that require direct access and linked allocation for files accessed sequentially. • The type of access must be declared when the file is created. • Sequential files will be linked. • Direct access files will be contiguous and can support both direct access and sequential access, such as indexed sequential file organizations.
Performance - 2 • Indexed Allocation – If the index is in memory, accesses are quick. • Retaining the index in memory does require space, but it is often held in cache. • If space is available, then this is good. • If space is not available, then reading the index and the data requires two I/Os – and this is not desirable. • For multiple index blocks, more reads might be needed. • Performance using indexed allocation depends on the index structure, the size of the file, and the position of the block desired. • Caching the index block(s) is significantly helpful if space is available. • There are a number of other approaches to optimization. Your book notes that, given the disparity between CPU speed and disk speed, it is not unreasonable to add thousands of extra instructions to the operating system to save just a few disk-head movements. • “Furthermore, this disparity is increasing over time, to the point where hundreds of thousands of instructions reasonably could be used to optimize head movements.” Discuss.
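The access costs discussed on the two Performance slides can be compared with a rough model. This is a deliberate simplification, not a measurement: it counts only block reads needed to fetch logical block i, ignores seek distance and caching of data blocks, and all function names are hypothetical.

```python
def reads_contiguous(i):
    """Address is computed as b + i, so one read fetches the block."""
    return 1


def reads_linked(i):
    """Every block before block i must be read to follow its pointer."""
    return i + 1


def reads_indexed(i, index_levels=1, index_cached=False):
    """One read per index level (unless cached), plus the data block."""
    return 1 if index_cached else index_levels + 1


for i in (0, 10, 1000):
    print(i, reads_contiguous(i), reads_linked(i),
          reads_indexed(i), reads_indexed(i, index_cached=True))
```

Under this model the cost of linked allocation grows with the position of the block, while indexed allocation pays a fixed penalty that disappears when the index is cached.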