Disk Failures and Error Handling in Data Representation

DATA REPRESENTAION Presenters: Namrata Buddhadev (104) Deepti Bhardwaj (103)

Index 13.4 Disk Failures 13.4.1 Intermittent Failures 13.4.2 Organizing Data by Cylinders 13.4.3 Stable Storage 13.4.4 Error- Handling Capabilities of Stable Storage 13.4.5 Recovery from Disk Crashes 13.4.6 Mirroring as a Redundancy Technique 13.4.7 Parity Blocks 13.4.8 An Improving: RAID 5 13.4.9 Coping With Multiple Disk Crashers

Intermittent Failures • If we try to read the sector but the correct content of that sector is not delivered to the disk controller • Check for the good or bad sector • To check write is correct: Read is performed • Good sector and bad sector is known by the read operation

Checksums • Each sector has some additional bits, called the checksums • Checksums are set on the depending on the values of the data bits stored in that sector • Probability of reading bad sector is less if we use checksums • For Odd parity: Odd number of 1’s, add a parity bit 1 • For Even parity: Even number of 1’s, add a parity bit 0 • So, number of 1’s becomes always even

Intermittent Failure: Parity Check • Media Decay And Write Failure: Stable Storage • Disk Crash: RAID • Example: 1. Sequence : 01101000-> odd no of 1’s parity bit: 1 -> 011010001 2. Sequence : 111011100->even no of 1’s parity bit: 0 -> 111011100

By finding one bit error in reading and writing the bits and their parity bit results in sequence of bits that has odd parity, so the error can be detected • Error detecting can be improved by keeping one bit for each byte • Probability is 50% that any one parity bit will detect an error, and chance that none of the eight do so is only one in 2^8 or 1/256 • Same way if n independent bits are used then the probability is only 1/(2^n) of missing error

Stable Storage • Correct Errors • Sectors are paired and each pair is said to be X, having left and right copies as Xl and Xr respectively and check the parity bit of left and right by subsituting spare sector of Xl and Xr until the good value is returned

Error Handling Capabilities of Stable Storage • Failures: If out of Xl and Xr, one fails, it can be read form other, but in case both fails X is not readable, and its probability is very small • Write Failure: During power outage, 1. While writing Xl, the Xr, will remain good and X can be read from Xr 2. After writing Xl, we can read X from Xl, as Xr may or may not have the correct copy of X

Recovery from Disk Crashes: Ways to recover the data • The most serious mode of failure for disks is “head crash” where data permanently destroyed. • So to reduce the risk of data loss by disk crashes there are number of schemes which are know as RAID (Redundant Arrays of Independent Disks) schemes. • Each of the schemes starts with one or more disks that hold the data and adding one or more disks that hold information that is completely determined by the contents of the data disks called Redundant Disk.

Mirroring as a Redundancy Technique • Mirroring Scheme is referred as RAID level 1 protection against data loss scheme. • In this scheme we mirror each disk. • One of the disk is called as data disk and other redundant disk. • In this case the only way data can be lost is if there is a second disk crash while the first crash is being repaired.

Parity Blocks • RAID level 4 scheme uses only one redundant disk no matter how many data disks there are. • In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks. • It means, the jth bits of all the ith blocks of both data disks and redundant disks, must have an even number of 1’s and redundant disk bit is used to make this condition true.

Parity Block – Reading disk Reading data disk is same as reading block from any disk. • We could read block from each of the other disks and compute the block of the disk we want to read by taking the modulo-2 sum. disk 2: 10101010 disk 3: 00111000 disk 4: 01100010 If we take the modulo-2 sum of the bits in each column, we get - disk 1: 11110000

Parity Block - Writing • When we write a new block of a data disk, we need to change that block of the redundant disk as well. • One approach to do this is to read all the disks and compute the module-2 sum and write to the redundant disk. But this approach requires n-1 reads of data, write a data block and write of redundant disk block. Total = n+1 disk I/Os • Better approach will require only four disk I/Os 1. Read the old value of the data block being changed. 2. Read the corresponding block of the redundant disk. 3. Write the new data block. 4. Recalculate and write the block of the redundant disk.

Parity Blocks – Failure Recovery If any of the data disk crashes then we just have to compute the module-2 sum to recover the disk. Suppose that disk 2 fails. We need to re compute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like: disk 1: 11110000 disk 2: ???????? disk 3: 00111000 disk 4: 01100010

If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is : 10101010

An Improvement: RAID 5 • RAID 4 is effective in preserving data unless there are two simultaneous disk crashes. • Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk. • However we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.

For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n + 1. For example, n = 3 so there are 4 disks. The first disk, numbered 0, is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4. The disk numbered 1 is redundant for blocks numbered 1, 5, 9, and so on; disk 2 is redundant for blocks 2, 6. 10,. . ., and disk 3 is redundant for 3, 7, 11,. . . .

Coping With Multiple Disk Crashes • Error-correcting codes theory known as Hamming code leads to the RAID level 6. • By this strategy the two simultaneous crashes are correctable. • The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3. • The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4. • The bits of disk 7 are the module2 sum of the corresponding bits of disks 1, 3, and 4

Coping With Multiple Disk Crashes – Reading / Writing • We may read data from any data disk normally. • To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks that have 1 in a row in which the written disk also has 1.

Thank You

Disk Failures and Error Handling in Data Representation

Disk Failures and Error Handling in Data Representation

Presentation Transcript

Data Mining: Data

Data Mining: Data

Data Mining: Data

Data, Data Everywhere

Data, Data, and more Data

Data, Data, and more Data

Data Data Data

Data Mining: Data

Data ! Data! Data!

DATA? What DATA?

Data Mining: Data

Data Mining: Data

DATA REPRESENTAION

Data Mining: Data

Data Mining: Data

DATA REPRESENTAION

Data Begets Data

Data, Data, Everywhere...

Data, Data, and More Data

Data Mining: Data

Data Mining: Data