Secondary Storage Management
Sections 13.1 – 13.3
Sanuja Dabade & Eilbroun Benjamin
CS 257 – Dr. TY Lin
Presentation Outline • 13.1 The Memory Hierarchy • 13.1.1 The Memory Hierarchy • 13.1.2 Transfer of Data Between Levels • 13.1.3 Volatile and Nonvolatile Storage • 13.1.4 Virtual Memory • 13.2 Disks • 13.2.1 Mechanics of Disks • 13.2.2 The Disk Controller • 13.2.3 Disk Access Characteristics
Presentation Outline (con’t) • 13.3 Accelerating Access to Secondary Storage • 13.3.1 The I/O Model of Computation • 13.3.2 Organizing Data by Cylinders • 13.3.3 Using Multiple Disks • 13.3.4 Mirroring Disks • 13.3.5 Disk Scheduling and the Elevator Algorithm • 13.3.6 Prefetching and Large-Scale Buffering
13.1.1 Memory Hierarchy
• Several components for data storage, with different data capacities available
• Cost per byte to store data also varies
• The device with the smallest capacity offers the fastest speed, at the highest cost per bit
Memory Hierarchy Diagram
(From smallest/fastest to largest/slowest: Cache → Main Memory → Disk, used as virtual memory and as a file system → Tertiary Storage. Programs and main-memory DBMSs operate at the upper levels.)
13.1.1 Memory Hierarchy
• Cache
• Lowest level of the hierarchy
• Data items are copies of certain locations of main memory
• Sometimes, values in the cache are changed and the corresponding changes to main memory are delayed
• The machine looks in the cache for instructions as well as the data those instructions operate on
• Holds a limited amount of data
13.1.1 Memory Hierarchy (con’t)
• In a single-processor computer, there is no need to update the data in main memory immediately
• With multiple processors, data is updated to main memory immediately; this is called write-through
Main Memory
• Everything the computer does, i.e., instruction execution and data manipulation, works on information that is resident in main memory
• Main memories are random access: one can obtain any byte in the same amount of time
Secondary Storage
• Used to store data and programs when they are not being processed
• More permanent than main memory: data and programs are retained when the power is turned off
• E.g., magnetic disks (hard disks)
Tertiary Storage
• Holds data volumes in terabytes
• Used for databases much larger than what can be stored on disk
13.1.2 Transfer of Data Between Levels
• Data moves between adjacent levels of the hierarchy
• At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a lot of time
• The disk is organized into blocks
• Entire blocks are moved to and from a region of main memory called a buffer
13.1.2 Transfer of Data Between Levels (cont’d)
• A key technique for speeding up database operations is to arrange the data so that when one item on a block is needed, it is likely that other data on the same block will be needed at the same time
• The same idea applies to the other hierarchy levels
13.1.3 Volatile and Nonvolatile Storage
• A volatile device forgets what data is stored on it when the power goes off
• A nonvolatile device holds its data even when the device is turned off
• All secondary and tertiary devices are nonvolatile, while main memory is volatile
13.1.4 Virtual Memory
• Typical software executes in virtual memory
• The address space is typically 32-bit, i.e., 2^32 bytes or 4 GB
• Transfer between memory and disk is in terms of blocks
13.2.1 Mechanics of Disks
• Use of secondary storage is one of the important characteristics of a DBMS
• A disk consists of 2 moving pieces:
• 1. the disk assembly
• 2. the head assembly
• The disk assembly consists of 1 or more platters
• Platters rotate around a central spindle
• Bits are stored on the upper and lower surfaces of the platters
13.2.1 Mechanics of Disks (con’t)
• Each surface of a disk is organized into tracks
• The tracks at a fixed radius from the center, across all surfaces, form one cylinder
• Tracks are organized into sectors
• Sectors are segments of the circle, separated by gaps
13.2.2 The Disk Controller
• One or more disks are controlled by a disk controller
• Disk controllers are capable of:
• Controlling the mechanical actuator that moves the head assembly
• Selecting a sector from among all those in the cylinder at which the heads are positioned
• Transferring bits between the desired sector and main memory
• Possibly buffering an entire track
13.2.3 Disk Access Characteristics
• Accessing (reading/writing) a block requires 3 steps:
• The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
• The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
• All the sectors and the gaps between them pass under the head while the disk controller reads or writes data in these sectors. This is the transfer time.
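The three components above add up to the access time for a block. Below is a minimal sketch of that cost model in Python, using the Megatron 747 figures quoted later in these slides (6.46 ms average seek, 8.33 ms per rotation); the per-block transfer time is an assumption chosen so the total matches the ~11 ms figure in 13.3.1.

```python
# Rough disk-access time model: seek + rotational latency + transfer.
AVG_SEEK_MS = 6.46          # average seek time (M747 example)
ROTATION_MS = 8.33          # one full rotation

def block_access_ms(transfer_ms=0.5):
    """Estimated time to access one block at a random location."""
    rotational_latency = ROTATION_MS / 2   # on average, half a rotation
    return AVG_SEEK_MS + rotational_latency + transfer_ms

print(f"~{block_access_ms():.1f} ms per random 16K block read")  # ~11.1 ms
```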
13.3 Accelerating Access to Secondary Storage
• Several approaches for accessing data in secondary storage more efficiently:
• Place blocks that are accessed together in the same cylinder.
• Divide the data among multiple disks.
• Mirror disks.
• Use disk-scheduling algorithms.
• Prefetch blocks into main memory.
• Scheduling latency – the added delay in accessing data caused by a disk-scheduling algorithm.
• Throughput – the number of disk accesses per second that the system can accommodate.
13.3.1 The I/O Model of Computation
• The number of block accesses (disk I/O’s) is a good approximation of an algorithm’s running time, and it should be minimized.
• Ex 13.3: You want an index on relation R that identifies the block on which the desired tuple appears, but not where on the block it resides.
• In the Megatron 747 (M747) example, it takes 11 ms to read a 16K block.
• A standard microprocessor can execute millions of instructions in 11 ms, making the delay in searching the block for the desired tuple negligible.
13.3.2 Organizing Data by Cylinders
• If we read all blocks on a single track or cylinder consecutively, we can neglect all but the first seek time and the first rotational latency.
• Ex 13.4: We request 1024 blocks of the M747.
• If the data is randomly distributed, the average latency is 10.76 ms per block (by Ex 13.2), making the total latency about 11 s.
• If all blocks are stored consecutively on 1 cylinder:
  6.46 ms (1 average seek) + 8.33 ms (time per rotation) × 16 (# rotations) ≈ 139 ms
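A quick check of the Ex 13.4 arithmetic, using only the numbers quoted on the slide (10.76 ms average random access from Ex 13.2, 6.46 ms average seek, 8.33 ms per rotation, 16 rotations):

```python
# Worked comparison: 1024 blocks read randomly vs. consecutively by cylinder.
BLOCKS = 1024
AVG_ACCESS_MS = 10.76     # average latency per random block (Ex 13.2)
AVG_SEEK_MS = 6.46
ROTATION_MS = 8.33
ROTATIONS = 16            # rotations needed to sweep the blocks, per the slide

random_ms = BLOCKS * AVG_ACCESS_MS                    # ~11,000 ms = 11 s
cylinder_ms = AVG_SEEK_MS + ROTATION_MS * ROTATIONS   # ~139.7 ms

print(f"random: {random_ms / 1000:.1f} s, by cylinder: {cylinder_ms:.1f} ms")
```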
13.3.3 Using Multiple Disks
• If we have n disks, read/write performance increases by a factor of n.
• Striping – distributing a relation R across n disks in this pattern:
• Data on disk 1: R1, R1+n, R1+2n, …
• Data on disk 2: R2, R2+n, R2+2n, …
  …
• Data on disk n: Rn, Rn+n, Rn+2n, …
• Ex 13.5: We request 1024 blocks with n = 4:
  6.46 ms (1 average seek) + 8.33 ms (time per rotation) × (16/4) (# rotations) ≈ 39.8 ms
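A small sketch of the striping pattern above: block i of R is placed on disk i mod n. The mapping is illustrative; real systems also stripe at other granularities.

```python
# Block striping: block i of relation R lives on disk (i % n).
def disk_for_block(block_index: int, n_disks: int) -> int:
    """Return which disk (0..n_disks-1) holds this block under striping."""
    return block_index % n_disks

n = 4
for i in range(8):
    print(f"block R{i + 1} -> disk {disk_for_block(i, n) + 1}")

# With n = 4 disks working in parallel, the Ex 13.5 estimate becomes
# 6.46 ms (one seek) + 8.33 ms * (16 / 4) rotations ~= 39.8 ms.
```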
13.3.4 Mirroring Disks
• Mirroring disks – having 2 or more disks hold identical copies of the data.
• Benefit 1: If n disks are mirrors of each other, the system can survive the crash of up to n-1 disks.
• Benefit 2: With n disks, read performance increases by a factor of n.
• Performance increases further if, for each read, the controller selects the disk whose head is closest to the desired data block (see the sketch below).
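A minimal sketch of the nearest-head selection mentioned in the last bullet; head positions and cylinder numbers here are hypothetical.

```python
# With mirrored disks, a read can go to whichever copy's head is
# closest to the target cylinder.
def pick_mirror(head_positions, target_cylinder):
    """Index of the mirror whose head is nearest the requested cylinder."""
    return min(range(len(head_positions)),
               key=lambda d: abs(head_positions[d] - target_cylinder))

heads = [12000, 45000, 30000]      # current cylinder of each mirror's head
print(pick_mirror(heads, 32000))   # -> 2 (head at 30000 is closest)
```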
13.3.5 Disk Scheduling and the Elevator Algorithm
• The disk controller runs this algorithm to select which of several pending requests to process first.
• Pseudocode:
    requests // set of all unprocessed data requests
    upon receiving a new data request:
        requests.add(new request)
    while requests is not empty:
        sweep the head toward the next cylinder in its current direction
        if the head reaches a cylinder with a request in requests:
            retrieve the data and remove the request from requests
        if no requests remain ahead of the head:
            reverse the head's direction
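Below is a runnable sketch of the elevator policy, simplified to a static set of requests (real controllers also accept requests that arrive mid-sweep, as in the trace on the next slide). Cylinder numbers follow the slide's example.

```python
# A minimal sketch of elevator (SCAN-style) disk scheduling.
def elevator_schedule(head, requests, direction=+1):
    """Return the order in which pending cylinder requests are serviced."""
    pending = sorted(requests)
    order = []
    while pending:
        # requests still ahead of the head in the current direction
        ahead = [c for c in pending if (c - head) * direction >= 0]
        if not ahead:
            direction = -direction          # end of sweep: reverse
            continue
        nxt = min(ahead, key=lambda c: abs(c - head))
        order.append(nxt)
        head = nxt
        pending.remove(nxt)
    return order

# The slide's initial requests, starting from cylinder 0:
print(elevator_schedule(head=0, requests=[8000, 24000, 56000]))
# -> [8000, 24000, 56000]
```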
13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)
Example trace (cylinders 8000–64000):
• Head at starting point; requests arrive for data at 8000, 24000, and 56000
• Get data at 8000
• Request arrives for data at 16000
• Get data at 24000
• Request arrives for data at 64000
• Get data at 56000
• Request arrives for data at 40000
• Get data at 64000
• Get data at 40000
• Get data at 16000
13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)
(Diagram: comparison of the Elevator Algorithm and the FIFO Algorithm on the request sequence above.)
13.3.6 Prefetching and Large-Scale Buffering
• If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.
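A minimal sketch of application-level prefetching under that assumption; read_block and the prediction list are hypothetical stand-ins for real I/O.

```python
# Prefetching into a large buffer: if the application knows the order in
# which blocks will be needed, it can read ahead instead of one-at-a-time.
def read_block(block_id):
    return f"<contents of block {block_id}>"   # placeholder for a disk read

buffer_pool = {}

def prefetch(predicted_order, window=4):
    """Load the next `window` predicted blocks before they are requested."""
    for block_id in predicted_order[:window]:
        if block_id not in buffer_pool:
            buffer_pool[block_id] = read_block(block_id)

prefetch(predicted_order=[17, 18, 19, 20, 21])
print(sorted(buffer_pool))   # -> [17, 18, 19, 20]
```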
13.4 Disk Failures
Ways in which disks fail, and their mitigation
By Priya Gangaraju and Xiaqing He
Ways in which disks can fail:
• Intermittent failure
• Media decay
• Write failure
• Disk crash
Intermittent Failures
• A read or write operation on a sector succeeds not on the first try, but only after repeated attempts.
• This is the most common form of failure.
• Parity checks can be used to detect this kind of failure.
Media Decay
• A more serious form of failure: a bit or bits are permanently corrupted.
• It becomes impossible to read the sector correctly, even after many tries.
• The stable-storage technique for organizing a disk is used to cope with this failure.
Write Failure
• An attempt to write a sector is not possible.
• The attempt to retrieve the previously written sector is also unsuccessful.
• A possible reason: a power outage during the writing of the sector.
• The stable-storage technique can be used to cope with this failure.
Disk Crash • Most serious form of disk failure. • Entire disk becomes unreadable, suddenly and permanently. • RAID techniques can be used for coping with disk crashes.
More on Intermittent Failures…
• When we try to read a sector, the correct contents of that sector may not be delivered to the disk controller.
• If the controller has a way to tell whether a sector is good or bad (checksums), it can reissue the read request when bad data is read.
More on Intermittent Failures…
• The controller may attempt to write a sector, but the contents actually written are not what was intended.
• The only way to check this is to let the disk go around again and read the sector.
• One way to perform the check is to read the sector and compare it with the sector we intended to write.
Continued
• Instead of performing the complete comparison at the disk controller, a simpler way is to read the sector back and check whether a good sector was read.
• If it is a good sector, we assume the write was correct; otherwise the write was unsuccessful and must be repeated.
Checksums
• A technique used to determine the good/bad status of a sector.
• Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector.
• If the checksum is not proper on reading, then there is an error in reading.
More on Checksums…
• There is a small chance that a block was not read correctly even if its checksum is proper.
• The probability of detecting an error can be increased by using more checksum bits.
Checksum Calculation
• The simplest checksum is based on the parity of all bits in the sector.
• If there is an odd number of 1’s among a collection of bits, the bits are said to have odd parity, and a parity bit of 1 is added.
• If there is an even number of 1’s, the collection of bits is said to have even parity, and a parity bit of 0 is added.
Continued
• The number of 1’s among a collection of bits and their parity bit is always even.
• During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector.
• Every sector will thus have even parity.
Examples…
• The sequence of bits 01101000 has an odd number of 1’s, so the parity bit is 1, and the sequence with its parity bit is 011010001.
• The sequence of bits 11101110 has an even number of 1’s, so the parity bit is 0, and the sequence with its parity bit is 111011100.
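The two examples can be checked with a few lines of Python; parity_bit below implements exactly the rule from the previous slides.

```python
# The parity bit is chosen so that the sector bits plus the parity bit
# always contain an even number of 1's.
def parity_bit(bits: str) -> str:
    """'1' if bits contain an odd number of 1's, else '0'."""
    return '1' if bits.count('1') % 2 == 1 else '0'

for seq in ("01101000", "11101110"):
    p = parity_bit(seq)
    print(f"{seq} -> parity {p} -> stored as {seq + p}")
# 01101000 -> parity 1 -> stored as 011010001
# 11101110 -> parity 0 -> stored as 111011100
```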
Continued
• Any one-bit error in reading or writing results in a sequence of bits with odd parity, i.e., an odd number of 1’s.
• The disk controller can count the number of 1’s and detect the error, since a correct sector always has even parity.
Odds…
• It is possible for more than one bit to be corrupted, in which case the error may go unnoticed.
• Increasing the number of parity bits increases the chance of detecting errors.
• In general, with n independent bits as the checksum, the chance of an error going undetected is only 1 in 2^n.
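One way to get several independent checksum bits, sketched below, is to keep one parity bit per bit position of a byte (n = 8), so bit i of the checksum is the parity of bit i across all bytes; the layout is illustrative, not a specific controller's format.

```python
# Multi-bit checksum: 8 parity bits, one per bit position of a byte, so
# masking an error requires mistakes in every position (~1-in-2^8 odds).
def byte_checksum(data: bytes) -> int:
    """8 parity bits: bit i is the parity of bit i across all bytes."""
    check = 0
    for b in data:
        check ^= b           # XOR accumulates per-position parity
    return check

sector = bytes([0b01101000, 0b11101110, 0b00001111])
print(f"checksum: {byte_checksum(sector):08b}")
```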
Stable Storage
• Checksums can detect an error but cannot correct it.
• Sometimes we overwrite the previous contents of a sector and yet cannot read the new contents correctly.
• To deal with these problems, the stable-storage policy can be implemented on the disks.
Continued
• Sectors are paired, and each pair represents one sector-contents X.
• The left copy of the sector is denoted XL and the right copy XR.
Assumptions
• We assume that the copies are written with a sufficient number of parity bits that the chance of a bad sector looking good when the parity checks are considered is negligible.
• Also, if the read function returns a good value w for either XL or XR, then w is assumed to be the true value of X.
Stable-Storage Writing Policy:
1. Write the value of X into XL. Check that the value has status "good", i.e., the parity-check bits are correct in the written copy. If not, repeat the write. If after a set number of write attempts we have not successfully written X into XL, assume that there is a media failure in this sector; a fix-up, such as substituting a spare sector for XL, must be adopted.
2. Repeat (1) for XR.
Stable-Storage Reading Policy:
• Alternate trying to read XL and XR until a good value is returned.
• If a good value is not returned after a pre-chosen number of tries, then X is assumed to be truly unreadable.
• (Both policies are sketched in code below.)
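A minimal, runnable sketch of both policies, assuming hypothetical read_sector/write_sector operations whose "good" status stands in for the parity checks:

```python
import random
random.seed(0)  # deterministic demo

MAX_TRIES = 5

def write_sector(sector_name, value):
    """Pretend write: occasionally garbles the data (returns None)."""
    return value if random.random() > 0.1 else None

def read_sector(copy):
    """Pretend read: None means the parity checks failed."""
    return copy.get("data")

def stable_write(pair, value):
    """Write value to XL then XR, retrying each until it reads back good."""
    for name in ("XL", "XR"):
        for _ in range(MAX_TRIES):
            stored = write_sector(name, value)
            if stored is not None:          # parity checks out: status "good"
                pair[name] = {"data": stored}
                break
        else:
            raise IOError(f"media failure on {name}: substitute a spare sector")

def stable_read(pair):
    """Alternate reading XL and XR until a good value is returned."""
    for i in range(2 * MAX_TRIES):
        value = read_sector(pair["XL" if i % 2 == 0 else "XR"])
        if value is not None:
            return value
    raise IOError("X is truly unreadable")

pair = {"XL": {}, "XR": {}}
stable_write(pair, "new contents of X")
print(stable_read(pair))                    # -> new contents of X
```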
Error-Handling Capabilities: Media Failures
• If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can always read X from the other copy.
• If both sectors have failed, then X cannot be read.
• The probability of both copies failing is extremely small.