
Recitation 8 Disk & File System



  1. Recitation 8 Disk & File System

  2. Disk Scheduling • Disks are at least four orders of magnitude slower than main memory • The performance of disk I/O is vital for the performance of the computer system as a whole • Access time (seek time + rotational delay) >> transfer time for a sector • Therefore the order in which sectors are read matters a lot • Disk scheduling • Usually based on the position of the requested sector rather than on the priority of the requesting process • May reorder the stream of read/write requests to improve performance

  3. Disk Scheduling (Cont.) • Several algorithms exist to schedule the servicing of disk I/O requests. • We illustrate them with a request queue (tracks 0-199). • 98, 183, 37, 122, 14, 124, 65, 67 • Head pointer 53

  4. FCFS (First-Come, First-Served) • Requests are serviced in the order they arrive • Illustration shows total head movement of 640 cylinders.

  5. SSF (Shortest Seek First) • Selects the request with the minimum seek time from the current head position. • SSF scheduling is a form of SJF scheduling; may cause starvation of some requests. • Illustration shows total head movement of 236 cylinders.

  6. SSF (Cont.)

  7. SCAN (Elevator 1) • The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. • Sometimes called the elevator algorithm. • Illustration shows total head movement of 208 cylinders.

  8. SCAN (Cont.)

  9. C-SCAN (Elevator 2) • Provides a more uniform wait time than SCAN. • The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, it immediately returns to the beginning of the disk without servicing any requests on the return trip. • Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.

  10. C-SCAN (Cont.)

  11. Sample Question 1 • Request queue (head currently at cylinder 20): 10, 22, 20, 2, 40, 6, 38. A seek takes 6 msec per cylinder moved. • (FCFS) [20] 10, 22, 20, 2, 40, 6, 38 • 10 + 12 + 2 + 18 + 38 + 34 + 32 = 146 cylinders = 876 msec • (SSF) [20] 20, 22, 10, 6, 2, 38, 40 • 0 + 2 + 12 + 4 + 4 + 36 + 2 = 60 cylinders = 360 msec • (Elevator 1) [20] 20, 22, 38, 40, 10, 6, 2 • 0 + 2 + 16 + 2 + 30 + 4 + 4 = 58 cylinders = 348 msec • (Elevator 2) [20] 20, 22, 38, 40, wrap to cylinder 0, 2, 6, 10 • 20 + 40 + 10 = 70 cylinders = 420 msec
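The totals above can be checked with a short script. The sketch below is not part of the recitation; it assumes LOOK-style behaviour for the two elevator variants (the head reverses, or wraps back to cylinder 0, after servicing the last pending request in its direction of travel), which is the convention the 58- and 70-cylinder answers imply.

    # Seek-distance calculators for the four policies in this recitation.
    # Sketch only: LOOK-style reversal/wrap is assumed (see note above).

    def fcfs(head, requests):
        total = 0
        for r in requests:                     # service in arrival order
            total += abs(head - r)
            head = r
        return total

    def ssf(head, requests):
        pending, total = list(requests), 0
        while pending:                         # always pick the closest pending request
            nxt = min(pending, key=lambda r: abs(head - r))
            total += abs(head - nxt)
            head = nxt
            pending.remove(nxt)
        return total

    def elevator1(head, requests):
        # Sweep toward higher cylinders, then reverse (SCAN / Elevator 1).
        up = sorted(r for r in requests if r >= head)
        down = sorted((r for r in requests if r < head), reverse=True)
        return fcfs(head, up + down)

    def elevator2(head, requests, wrap_to=0):
        # Sweep up, jump back to cylinder wrap_to, sweep up again (C-SCAN / Elevator 2).
        up = sorted(r for r in requests if r >= head)
        rest = sorted(r for r in requests if r < head)
        return fcfs(head, up + [wrap_to] + rest)

    queue, head, ms_per_cyl = [10, 22, 20, 2, 40, 6, 38], 20, 6
    for name, fn in [("FCFS", fcfs), ("SSF", ssf),
                     ("Elevator 1", elevator1), ("Elevator 2", elevator2)]:
        cyl = fn(head, queue)
        print(f"{name:10}: {cyl:3d} cylinders = {cyl * ms_per_cyl} msec")
    # Prints 146/876, 60/360, 58/348 and 70/420, matching the answers above.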

  12. RAID • Redundant Array of Inexpensive Disks (RAID) • A set of physical disk drives viewed by the OS as a single logical drive • Replace large-capacity disks with multiple smaller-capacity drives to improve the I/O performance (at lower price) • Data are distributed across physical drives in a way that enables simultaneous access to data from multiple drives • Redundant disk capacity is used to compensate for the increase in the probability of failure due to multiple drives • Improve availability because no single point of failure • Six levels of RAID representing different design alternatives

  13. RAID Level 0 • Does not include redundancy • Data is striped across the available disks • Total storage space across all disks is divided into strips • Strips are mapped round-robin to consecutive disks • A set of consecutive strips that maps exactly one strip to each disk in the array is called a stripe • Can you see how this improves the disk I/O bandwidth? • What access pattern gives the best performance? [Figure: strips 0-3 form stripe 0 and strips 4-7 form the next stripe, laid out round-robin across four disks]
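As a concrete illustration of the round-robin mapping (the 4-disk array size and the function name below are assumptions made for the example, not part of the slide), a logical strip number can be translated into a (disk, strip-on-disk) pair:

    # Round-robin mapping of logical strips onto a RAID 0 array.
    NUM_DISKS = 4                              # assumed array size for the example

    def strip_location(logical_strip, num_disks=NUM_DISKS):
        disk = logical_strip % num_disks             # which disk holds the strip
        strip_on_disk = logical_strip // num_disks   # position of the strip on that disk
        return disk, strip_on_disk

    # Strips 0..3 form stripe 0, strips 4..7 form stripe 1, and so on.
    for s in range(8):
        disk, pos = strip_location(s)
        print(f"logical strip {s} -> disk {disk}, strip {pos} (stripe {s // NUM_DISKS})")

Because consecutive strips land on different disks, a request that covers a whole stripe keeps every disk busy at once; large sequential transfers therefore see the biggest bandwidth gain.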

  14. RAID Level 1 • Redundancy achieved by duplicating all the data • Every disk has a mirror disk that stores exactly the same data • A read can be serviced by either of the two disks that contain the requested data (improved performance over RAID 0 if reads dominate) • A write request must be done on both disks but can be done in parallel • Recovery is simple but cost is high [Figure: each data disk is paired with a mirror disk holding identical strips]

  15. RAID Levels 2 and 3 • Parallel access: all disks participate in every I/O request • Small strips (1 bit), since the size of each read/write = # of disks * strip size • RAID 2: 1-bit strips and an error-correcting code. The ECC is calculated across corresponding bits on the data disks and stored on O(log(# data disks)) ECC disks • Hamming code: can correct single-bit errors and detect double-bit errors • Example configurations, data disks/ECC disks: 4/3, 10/4, 32/7 • Less expensive than RAID 1 but still high overhead – not needed in most environments • RAID 3: 1-bit strips and a single redundant disk for parity bits: P(i) = X2(i) ⊕ X1(i) ⊕ X0(i) • On a failure, data can be reconstructed as X2(i) = P(i) ⊕ X1(i) ⊕ X0(i); only tolerates one failure at a time [Figure: data bits b0, b1, b2 and parity P(b)]
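The parity relation on this slide is bytewise XOR, so a failed data disk can be rebuilt by XOR-ing the survivors with the parity disk. A minimal sketch with three data strips and made-up byte values:

    # RAID 3 style parity over three data strips (values are made up).
    x0 = bytes([0b1010, 0b0001])
    x1 = bytes([0b0110, 0b1100])
    x2 = bytes([0b0011, 0b1111])

    # P(i) = X2(i) xor X1(i) xor X0(i)
    parity = bytes(a ^ b ^ c for a, b, c in zip(x2, x1, x0))

    # Suppose disk X2 fails: rebuild it from the parity and the surviving disks.
    # X2(i) = P(i) xor X1(i) xor X0(i)
    rebuilt = bytes(p ^ b ^ c for p, b, c in zip(parity, x1, x0))
    assert rebuilt == x2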

  16. Disk Hardware (3) • RAID levels 0 through 2 • Backup and parity drives are shaded

  17. Disk Hardware (4) • RAID levels 3 through 5 • Backup and parity drives are shaded

  18. RAID Levels 4 and 5 • RAID 4 • Large strips with a parity strip, as in RAID 3 • Independent access - each disk operates independently, so multiple I/O requests can be satisfied in parallel • Independent access ⇒ a small write = 2 reads + 2 writes • Example: if a write is performed only on strip 0: • P'(i) = X2(i) ⊕ X1(i) ⊕ X0'(i) • = X2(i) ⊕ X1(i) ⊕ X0'(i) ⊕ X0(i) ⊕ X0(i) • = P(i) ⊕ X0'(i) ⊕ X0(i) • The parity disk can become a bottleneck • RAID 5 • Like RAID 4, but the parity strips are distributed across all disks [Figure: strips 0-2 with parity P(0-2), strips 3-5 with parity P(3-5)]
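The small-write identity above says the new parity can be computed from the old parity, the old data, and the new data alone, which is exactly why an independent small write costs 2 reads + 2 writes. A sketch with arbitrary example values:

    # RAID 4/5 small-write parity update for one strip (all values arbitrary).
    x0_old = bytes([0b1010, 0b0001])           # old contents of the strip being written
    x0_new = bytes([0b0101, 0b0011])           # new contents
    x1     = bytes([0b0110, 0b1100])           # other data strips (never touched)
    x2     = bytes([0b0011, 0b1111])

    p_old = bytes(a ^ b ^ c for a, b, c in zip(x2, x1, x0_old))   # existing parity

    # Shortcut: P'(i) = P(i) xor X0(i) xor X0'(i)  -- 2 reads (x0_old, p_old)
    # plus 2 writes (x0_new, p_new); the other data disks are not read.
    p_new = bytes(p ^ o ^ n for p, o, n in zip(p_old, x0_old, x0_new))

    # Same result as recomputing the parity over all data strips.
    assert p_new == bytes(a ^ b ^ c for a, b, c in zip(x2, x1, x0_new))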

  19. Sample Question 2 • Ts-b is the time to read a sector into a buffer from a rotating disk. Tb-MM is the time to empty a sector of data from the buffer into main memory (MM). (The average rate into the buffer must equal the average rate out of the buffer.) • What is the maximum rate of data into the MM, given as the number of sectors read into the MM in time Ts-b, in each of the following cases? Also give the degree of interleaving in each case. • a) If Tb-MM = Ts-b and single buffering is used • b) If Tb-MM = Ts-b and double buffering is used • c) If Tb-MM = 2Ts-b and single buffering is used • d) If Tb-MM = 2Ts-b and double buffering is used

  20. Sample Question 2 • a) If Tb-MM = Ts-b and single buffering is used • Rate = 1 sector / 2 Ts-b , single interleaving • b) If Tb-MM=Ts-b and double buffering is used • Rate = 1 sector / Ts-b , zero interleaving • c) If Tb-MM = 2Ts-b and single buffering is used • Rate = 1 sector / 3 Ts-b , double interleaving • d) If Tb-MM = 2Ts-b and double buffering is used • Rate = 1 sector / 2 Ts-b , single interleaving
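These answers follow from two relations: with a single buffer the next sector cannot be read while the buffer is being drained, so each sector costs Ts-b + Tb-MM; with double buffering reading and draining overlap, so each sector costs max(Ts-b, Tb-MM). The interleaving formulas in the sketch below are an assumption that happens to reproduce all four cases:

    from math import ceil

    def time_and_interleave(tbmm_in_tsb, double_buffered):
        # tbmm_in_tsb: Tb-MM expressed in units of Ts-b.
        if double_buffered:
            per_sector = max(1, tbmm_in_tsb)   # reading overlaps draining
            interleave = ceil(tbmm_in_tsb) - 1
        else:
            per_sector = 1 + tbmm_in_tsb       # read the sector, then drain the buffer
            interleave = ceil(tbmm_in_tsb)
        return per_sector, interleave

    for case, ratio, dbl in [("a", 1, False), ("b", 1, True),
                             ("c", 2, False), ("d", 2, True)]:
        per_sector, k = time_and_interleave(ratio, dbl)
        print(f"({case}) 1 sector every {per_sector} Ts-b, interleaving degree {k}")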

  21. UNIX File i-nodes

  22. Implementing Files (4) An example i-node

  23. Tree-Structured Directories

  24. Sample question 3 • (See the i-node figure in the previous slide.) An i-node contains a single indirect block and 10 direct disk addresses of 4 bytes each, and all disk blocks are 1024 bytes. What is the largest possible file? • The indirect block can hold 256 disk addresses. • Together with the 10 direct disk addresses, the maximum file has 266 blocks. • Since each block is 1 KB, the largest file is 266 KB.
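The same arithmetic, spelled out as a check:

    BLOCK_SIZE = 1024                # bytes per disk block
    ADDR_SIZE  = 4                   # bytes per disk address
    DIRECT     = 10                  # direct addresses in the i-node

    indirect_entries = BLOCK_SIZE // ADDR_SIZE        # 256 addresses in the indirect block
    max_blocks = DIRECT + indirect_entries            # 266 blocks
    print(max_blocks, "blocks =", max_blocks * BLOCK_SIZE // 1024, "KB")   # 266 KB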

  25. Sample Question 4 • What would happen if the bitmap or free list containing the information about free disk blocks was completely lost due to a crash? Is there any way to recover from this disaster, or is it bye-bye disk? Discuss your answer for UNIX and the FAT-16 file system. • It is not a serious problem at all. Repair is straightforward; it just takes time. • The recovery algorithm is to make a list of all the blocks in all the files and take the complement as the new free list. • In UNIX this can be done by scanning all the i-nodes. • In the FAT file system, the problem cannot occur because there is no free list. But even if there were, all that would have to be done to recover it is to scan the FAT looking for free entries.
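Schematically (this is not real fsck or scandisk code, just the "complement of all used blocks" idea):

    # Rebuild a lost free list: any block not referenced by some file is free.
    def rebuild_free_list(total_blocks, files):
        # files maps a file name to the list of block numbers it occupies
        # (in UNIX this information comes from scanning every i-node).
        used = set()
        for blocks in files.values():
            used.update(blocks)
        return sorted(set(range(total_blocks)) - used)

    files = {"a": [2, 3, 7], "b": [5], "c": [0, 1]}
    print(rebuild_free_list(10, files))      # [4, 6, 8, 9]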

  26. Sample question 5 • A certain file system uses 2-KB disk blocks. The median file size is 1 KB. (a) If all files were exactly 1 KB, what fraction of the disk space would be wasted? (b) Do you think the wastage for a real file system would be higher or lower than this number? • Answer: If all files were 1 KB, then each 2-KB block would contain one file and 1 KB of wasted space. Putting two files in one block is not allowed because the unit used to keep track of data is the block, not the half-block. This leads to 50 percent wasted space. In practice, every file system has large files as well as many small ones, and large files use the disk much more efficiently. For example, a 32,769-byte file would use 17 disk blocks for storage, giving a space efficiency of 32,769/34,816, which is about 94 percent.
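The two numbers in the answer, recomputed:

    from math import ceil

    BLOCK = 2048                              # 2-KB disk blocks

    def efficiency(file_size, block=BLOCK):
        allocated = ceil(file_size / block) * block
        return file_size / allocated

    print(f"1-KB file:        {efficiency(1024):.0%} efficient (half of the block wasted)")
    print(f"32,769-byte file: {efficiency(32769):.1%} efficient")      # about 94%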

  27. Sample question 6 • A floppy disk has 40 cylinders. A seek takes 6 msec per cylinder moved. If no attempt is made to put the blocks of a file close to each other, two blocks that are logically consecutive will be about 13 cylinders apart on average. If, however, the OS makes an attempt to cluster related blocks, the mean interblock distance can be reduced to 2 cylinders. How long does it take to read a 100-block file in both cases, if the rotational latency is 100 msec and the transfer time is 25 msec per block? • Answer: The time per block is built up of three components: seek time, rotational latency, and transfer time. In all cases the rotational latency plus transfer time is the same, 125 msec; only the seek time differs. For 13 cylinders the seek is 78 msec; for 2 cylinders it is 12 msec. Thus a randomly placed block takes 203 msec and a clustered block takes 137 msec, so reading the 100-block file takes about 20.3 seconds in the random case and about 13.7 seconds in the clustered case.
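The per-block and whole-file times, recomputed:

    SEEK_PER_CYL = 6          # msec per cylinder moved
    LATENCY      = 100        # msec rotational latency per block
    TRANSFER     = 25         # msec transfer time per block
    BLOCKS       = 100

    for label, cylinders in [("random placement", 13), ("clustered placement", 2)]:
        per_block = cylinders * SEEK_PER_CYL + LATENCY + TRANSFER
        total_sec = per_block * BLOCKS / 1000
        print(f"{label}: {per_block} msec per block, {total_sec:.1f} sec for the file")
    # random: 203 msec/block, 20.3 sec; clustered: 137 msec/block, 13.7 sec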
