Topic 9 : File Systems

Reading • L & E: pages 92-113 • Tanenbaum: pages 17-20 and 401-434

File System Objectives • Provide persistent storage for programs and data files. • creation and deletion of files • reading and writing of files • allow reference to files by symbolic name • manage space available on secondary storage • protect files from unauthorised access and system failure • allow sharing of files

Basic Operations • fd = open(filename, mode), where fd is a file descriptor and mode is read or write • read(fd, buffer) • write(fd, buffer) • seek(fd, position) • close(fd)
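The operations above map closely onto the low-level file calls in Python's standard os module. The following is a minimal sketch rather than a definitive implementation; the scratch file name and the function name demo_basic_operations are made up for illustration.

```python
import os
import tempfile


def demo_basic_operations():
    """Exercise open, write, seek, read and close on a scratch file."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        path = os.path.join(scratch_dir, "example.txt")  # hypothetical file name

        # open(filename, mode) for writing: the result is a file descriptor.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.write(fd, b"hello file system")  # write(fd, buffer)
        os.close(fd)                        # close(fd)

        # open(filename, mode) for reading.
        fd = os.open(path, os.O_RDONLY)
        os.lseek(fd, 6, os.SEEK_SET)        # seek(fd, position)
        data = os.read(fd, 4)               # read(fd, buffer): up to 4 bytes
        os.close(fd)
        return data
```

Note that the program only names the file once, in the open call; every later operation uses the descriptor.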

Organising Files for the User • A flat file space is difficult for users. • Imposing a directory structure allows users to logically group files. • [Figure: example directory tree for user keith, with nodes labelled slides, letters, programs, 1st year, 2nd year, Ada and Pascal.]

Sharing and Security • This is only an issue in multi-user operating systems or those which support remote access to files. • The owners of files must be able to specify the access rights for files. • Two main approaches are to use either a protection mask or access control lists.

Protection Masks • Users are classified as • owner (O) • members of a particular group (G) • others, i.e. the world (W) • Access privileges are classified as • read access (R) • write access (W) • execute access (E) • delete access (D)

Protection Masks ... contd. • Owner allocates a mask to a file, e.g. O:RWED, G:RWE, W:RE • Note: for this to work there must be a mechanism for grouping users. • UNIX systems use a variant of this approach (read, write and execute bits for owner, group and others).
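A protection mask check can be sketched in a few lines. This is an illustration only: the mask is the example from the slide, while the function names and the way users and groups are represented are assumptions.

```python
# The example mask from the slide: owner, group, world.
MASK = {"O": "RWED", "G": "RWE", "W": "RE"}


def user_class(uid, gids, owner_uid, file_gid):
    """Classify the requesting user as owner (O), group (G) or world (W)."""
    if uid == owner_uid:
        return "O"
    if file_gid in gids:
        return "G"
    return "W"


def is_allowed(mask, uid, gids, owner_uid, file_gid, access):
    """Return True if the mask grants `access` (one of R, W, E, D)."""
    return access in mask[user_class(uid, gids, owner_uid, file_gid)]
```

The check is a single lookup, which is why masks are cheap, but every user falls into exactly one of three classes.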

Access Control Lists • For each file a list of users and permissible operations is specified, e.g. a file could be read by uid x and any gid and written to by processes with uid y and gid staff. • Pros: • much more flexible than protection masks: it is possible to prohibit specific uids or gids from accessing a file. • Cons: • access control lists can take a lot of space • the ordering of entries in the list is critical to avoid a big performance hit
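The sketch below shows one possible way to evaluate an access control list. It assumes a first-match rule (the first entry that matches the uid and gid decides), which is an assumption and not something the slide specifies. The uid mallory and the final staff entry are hypothetical additions to the slide's example.

```python
ANY = None  # wildcard: matches every uid or every gid

# Each entry is (uid, gid, permitted operations).
ACL = [
    ("mallory", ANY, set()),      # hypothetical: prohibit one specific uid
    ("x", ANY, {"read"}),         # uid x with any gid may read
    ("y", "staff", {"write"}),    # uid y with gid staff may write
    (ANY, "staff", {"read"}),     # hypothetical: all of staff may read
]


def acl_allows(acl, uid, gid, op):
    """Scan the list in order; the first matching entry decides."""
    for entry_uid, entry_gid, ops in acl:
        if entry_uid in (ANY, uid) and entry_gid in (ANY, gid):
            return op in ops
    return False  # no entry matched: deny by default
```

Under this rule the ordering also affects correctness: if the list were reversed, mallory in group staff would match the general staff entry first and be allowed to read.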

Storing Files on Disk • The next issue to address is how to store files on the disk. • Disks are block-oriented devices, i.e. data is stored as a series of blocks. • Therefore files must be stored as a series of blocks. • Of course, the question is how big to make the blocks.

Block Size and File System Performance • The average (median) UNIX file is 1K. • If blocks are made large then small files are very wasteful of disk space. • For example, if we make the block size equal to one cylinder (32K) then a 1K file wastes 97% (i.e. 31/32) of the space allocated to it.
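The 31/32 figure can be checked with a short calculation. The function name wasted_fraction is made up for this sketch; it assumes a file always occupies a whole number of blocks.

```python
import math


def wasted_fraction(file_size, block_size):
    """Fraction of the allocated space that holds no file data."""
    blocks = max(1, math.ceil(file_size / block_size))
    allocated = blocks * block_size
    return (allocated - file_size) / allocated


# A 1K file in a single 32K block wastes 31/32 of the block.
print(wasted_fraction(1024, 32 * 1024))
```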

Block Size and File System Performance • But each block requires a separate read/write and so incurs a seek and a rotational delay. • So, if we make blocks small then we get poor performance. • Good example of a fundamental conflict between resource utilisation and performance.

The Trade Off… • [Figure: disk space utilisation (%) and data rate (Kbytes/sec) plotted against block size, from 128 bytes to 8K, assuming a file size of 1K.]
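The performance side of the trade-off can be illustrated with a simple cost model in which every block read pays one seek, half a rotation on average, and the time to transfer the block. The disk parameters below are hypothetical values chosen for illustration only; they are not taken from the figure.

```python
def data_rate(block_size, seek_ms=30.0, rotation_ms=16.67,
              track_bytes=32 * 1024):
    """Bytes per second when each block costs seek + half rotation + transfer.

    All default parameters are assumed, illustrative values.
    """
    transfer_ms = rotation_ms * block_size / track_bytes
    total_ms = seek_ms + rotation_ms / 2 + transfer_ms
    return block_size / (total_ms / 1000.0)


for size in (128, 256, 512, 1024, 2048, 4096, 8192):
    print(size, round(data_rate(size)))
```

Because the seek and rotational delays are fixed costs per block, the data rate in this model rises with block size, while space utilisation for a 1K file falls once the block exceeds 1K.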

Keeping Track of Blocks • We need to keep track of which blocks make up a given file. • Four main methods of doing this: • Make all blocks contiguous for a given file. • Link the blocks of a given file. • Use a file map. • Use index blocks.

Contiguous Files • [Figure: directory entries and the contiguous file blocks they refer to.]

Contiguous Files • Keep all of the blocks of the file adjacent. • The directory entry for the file contains: • a pointer to the start • the length of the file • Simple to implement • Good performance • Main problem is fragmentation (again!). • Need to reserve disk space in advance, since a file can only grow into adjacent free blocks. • Resilient way of arranging data since corruptions are localised.
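A short sketch of contiguous allocation, with made-up function names. It shows why access is simple (block k is just start + k) and how fragmentation can stop an allocation even when enough free blocks exist in total. First fit is used here as one possible placement policy, not necessarily the one any real system uses.

```python
def contiguous_block(start, length, k):
    """Disk block holding the k-th block of a contiguous file."""
    if not 0 <= k < length:
        raise IndexError("block index outside file")
    return start + k


def first_fit(free_map, length):
    """Start of the first run of `length` free blocks, or None.

    free_map is a list of booleans, True meaning the block is free.
    """
    run = 0
    for i, free in enumerate(free_map):
        run = run + 1 if free else 0
        if run == length:
            return i - length + 1
    return None


free_map = [True, False, True, True, False, True]
# Four blocks are free in total, but no three of them are adjacent.
print(first_fit(free_map, 3))
```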

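Block Linkage • [Figure: a directory entry pointing to the first block of a file, with each block linked to the next and the last block holding NULL.]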

Block Linkage • Maintain blocks in a linked list. • Each block contains a pointer to the next block or null if it is the end of the list. • Access must be sequential. • Data stored in a block is no longer 2^n bytes, because the pointer takes up part of each block. • Corrupted links can cause real problems. • Storing data using a doubly linked list can help.
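The sketch below models a linked file as a dictionary from block number to (data, next block). The block size, pointer size and block numbers are hypothetical. It shows the two costs noted above: the payload is no longer a power of two, and reaching block k means reading every block before it.

```python
BLOCK_SIZE = 512                          # assumed block size in bytes
POINTER_SIZE = 4                          # assumed size of the link field
PAYLOAD_SIZE = BLOCK_SIZE - POINTER_SIZE  # 508 bytes, not a power of two


def find_block(disk, first, k):
    """Return (block number of the k-th block, number of blocks read)."""
    block, reads = first, 0
    for _ in range(k):
        _, block = disk[block]            # must read the block to get its link
        reads += 1
        if block is None:
            raise IndexError("block index outside file")
    return block, reads


# Hypothetical file occupying blocks 4 -> 7 -> 2.
disk = {4: (b"a", 7), 7: (b"b", 2), 2: (b"c", None)}
print(find_block(disk, 4, 2))
```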

File Map • The state of the entire disk is stored in a file map. • The directory entry points to the first block of the file in the map. • The map contains links to the other blocks in the file. • Effectively, this is linked list allocation using an index.

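File Map • [Figure: a file map with entries 0 to 9. The directory entry points to the file's first entry in the map, each entry holds the number of the file's next block, and the final entry holds NULL.]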

File Map • Entire block available for data. • Random access much quicker given that the file map is held in main memory – no disk references. • File map can become very large. • Damage to the file map can result in serious data loss. • Many systems store multiple copies of the file map (though MS-DOS puts them all in the same place!).
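A file map can be sketched as an array indexed by block number, where each entry holds the number of the next block in the same file. The map contents below are hypothetical and are not a reconstruction of the figure.

```python
END = -1    # marks the last block of a file
FREE = -2   # marks a block that belongs to no file

# Hypothetical 12-block disk holding one file: blocks 4 -> 7 -> 2 -> 10.
file_map = [FREE] * 12
file_map[4] = 7
file_map[7] = 2
file_map[2] = 10
file_map[10] = END


def file_blocks(file_map, first):
    """Follow the chain in the map and list the file's blocks in order."""
    blocks = []
    block = first
    while block != END:
        blocks.append(block)
        block = file_map[block]
    return blocks


print(file_blocks(file_map, 4))
```

The chain is followed entirely in memory, so finding the k-th block needs no disk reads, and the data blocks themselves carry no pointers.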

Index Blocks • Pointers to a file’s blocks are stored in one or more index blocks. • These index blocks are chained together if required for big files. • The directory entry for a file points to the first index block for the file.

Index Blocks • [Figure: an i-node.]

Index Blocks • Big advantage is that files can be accessed non-sequentially. • In UNIX, small files don't have separate index blocks; instead the block pointers are held in the i-node, which is (effectively) the directory entry. • Corruption of an index block is bad news for the file in question.
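The following sketch looks up the k-th block of a file through chained index blocks, as described above. The number of pointers per index block and all block numbers are hypothetical. It models the simple chained scheme, not the UNIX i-node layout.

```python
POINTERS_PER_INDEX_BLOCK = 4  # assumed, kept tiny for the example


def locate(index_blocks, first_index, k):
    """Return the disk block holding the k-th block of the file.

    index_blocks maps an index block number to
    (list of data block numbers, next index block or None).
    """
    hops, slot = divmod(k, POINTERS_PER_INDEX_BLOCK)
    index = first_index
    for _ in range(hops):
        index = index_blocks[index][1]    # follow the chain of index blocks
        if index is None:
            raise IndexError("block index outside file")
    return index_blocks[index][0][slot]


# Hypothetical file with six data blocks spread over two index blocks.
index_blocks = {20: ([3, 9, 14, 5], 21), 21: ([8, 11], None)}
print(locate(index_blocks, 20, 5))
```

Only the index blocks are read on the way to block k, never the intervening data blocks, which is what makes non-sequential access practical.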

Free Space • In order to be able to write data to the disk we need to manage the free space. • This is often done using either: • a linked list of disk blocks, where each block holds free disk block numbers • a file map (or bit map), where a disk with n blocks requires a bit map with n bits; this is the preferred technique.
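A bit map for free space can be sketched as below. The class and method names are made up, and the convention that a 1 bit means "in use" is an assumption; some systems use the opposite sense.

```python
class FreeBitmap:
    """One bit per disk block: 1 means in use, 0 means free (assumed)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)  # n bits, rounded up to bytes

    def is_free(self, block):
        return ((self.bits[block // 8] >> (block % 8)) & 1) == 0

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it."""
        for block in range(self.n_blocks):
            if self.is_free(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return None  # disk full

    def release(self, block):
        """Mark a block as free again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

A disk of n blocks needs only n bits here, so the whole map for a modest disk fits comfortably in main memory.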

Implementing the Open Operation • look up the directory entry for the file • check access privileges • check if the file is already open (reading vs writing) • find device and location of file or create new space • create a file descriptor which acts as a handle to the file

A Quick Note on File Descriptors • These are the handles that application programs use to refer to open files. • Avoids hard-wiring file name dependencies into programs. • Provides a convenient structure in which the O/S can store all its data about the open file, such as • the location of the first block of the file • the location of the next block to read
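The steps of the open operation and the role of the file descriptor can be sketched together. Everything here is hypothetical: the class names, the directory layout, and in particular the sharing policy (many readers or one writer), which is one possible answer to the "reading vs writing" check. Creating new space for a new file is omitted.

```python
class FileDescriptor:
    """Per-open-file data kept by the O/S."""

    def __init__(self, name, mode, first_block):
        self.name = name
        self.mode = mode
        self.first_block = first_block  # location of the first block
        self.next_block = first_block   # location of the next block to read


class TinyFS:
    def __init__(self, directory):
        self.directory = directory      # name -> {"first_block", "rights"}
        self.open_files = {}            # fd -> FileDescriptor
        self.next_fd = 0

    def open(self, name, mode, uid):
        entry = self.directory.get(name)              # look up directory entry
        if entry is None:
            raise FileNotFoundError(name)
        if mode not in entry["rights"].get(uid, ""):  # check access privileges
            raise PermissionError(name)
        for desc in self.open_files.values():         # already open?
            if desc.name == name and "w" in (mode, desc.mode):
                raise RuntimeError("file busy: " + name)
        fd = self.next_fd                             # create the handle
        self.next_fd += 1
        self.open_files[fd] = FileDescriptor(name, mode, entry["first_block"])
        return fd

    def close(self, fd):
        del self.open_files[fd]
```

The application only ever sees the integer fd; the location of the file and the current position stay inside the O/S structure.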

Summary • Looked at the major issues in building file systems. • Examined file space structuring, sharing and protection. • Conflict between space and time efficiency with respect to block sizes. • The basic steps involved in carrying out an open operation. • The notion of file descriptors.

Coming Next Week • Systems software
