1 / 51

ECE 6160: Advanced Computer Networks Introduction to Storage Devices

ECE 6160: Advanced Computer Networks Introduction to Storage Devices. Instructor: Dr. Xubin (Ben) He Email: Hexb@tntech.edu Tel: 931-372-3462. Prev… . Gigabit Ethernet MAC PHY VLAN. Anatomy: Key components for a typical computer. Motivation: Who Cares About I/O?.

jaden
Télécharger la présentation

ECE 6160: Advanced Computer Networks Introduction to Storage Devices

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ECE 6160: Advanced Computer NetworksIntroduction to Storage Devices Instructor: Dr. Xubin (Ben) He Email: Hexb@tntech.edu Tel: 931-372-3462

  2. Prev… • Gigabit Ethernet • MAC • PHY • VLAN ECE6160:Advanced Computer Networks

  3. Anatomy: Key components for a typical computer ECE6160:Advanced Computer Networks

  4. Motivation: Who Cares About I/O? • CPU Performance: 60% per year • I/O system performance limited by mechanical delays (disk I/O) < 10% per year (IO per sec) • Amdahl's Law: system speed-up limited by the slowest part! 10% IO & 10x CPU => 5x Performance (lose 50%) 10% IO & 100x CPU => 10x Performance (lose 90%) • I/O bottleneck: Diminishing fraction of time in CPU Diminishing value of faster CPUs ECE6160:Advanced Computer Networks

  5. I/O Systems interrupts Processor Cache Memory - I/O Bus Main Memory I/O Controller I/O Controller I/O Controller Graphics Disk Disk Network ECE6160:Advanced Computer Networks

  6. Storage Technology Drivers • Driven by the prevailing computing paradigm • 1950s: migration from batch to on-line processing • 1990s: migration to ubiquitous computing • computers in phones, books, cars, video cameras, … • nationwide fiber optical network with wireless tails • Effects on storage industry: • Embedded storage • smaller, cheaper, more reliable, lower power • Data utilities • high capacity, hierarchically managed storage ECE6160:Advanced Computer Networks

  7. Outline • Disk Basics/Disk History • Current Disk options • Disk fallacies and performance • FLASH • Tapes • RAID ECE6160:Advanced Computer Networks

  8. Inner Track Outer Track Sector Head Arm Platter Actuator Disk Device Terminology • Several platters, with information recorded magnetically on both surfaces (usually) • Bits recorded in tracks, which in turn divided into sectors (e.g., 512 Bytes) • Actuator moves head (end of arm,1/surface) over track (“seek”), select surface, wait for sector rotate under head, then read or write • “Cylinder”: all tracks under heads

  9. Tracks, Sectors, and Cylinders A cylinder is comprised of the set of tracks described by all the heads (on separate platters) at a single seek position ECE6160:Advanced Computer Networks

  10. { Platters (12) Photo of Disk Head, Arm, Actuator Spindle Arm Head Actuator ECE6160:Advanced Computer Networks source: www.seagate.com

  11. Disk Device Performance • Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead • Seek Time? depends on tracks move arm, seek speed of disk • Rotation Time? depends on speed disk rotates, how far sector is from head • Transfer Time? depends on data rate (bandwidth) of disk (bit density), size of request Inner Track Outer Track Sector Head Controller Arm Spindle Platter Actuator ECE6160:Advanced Computer Networks

  12. Disk Device Performance • Average rotation time: • Average distance sector from head? • 1/2 time of a rotation • 10000 Revolutions Per Minute  166.67 Rev/sec • 1 revolution = 1/ 166.67 sec  6.00 milliseconds • 1/2 rotation (revolution)  3.00 ms • Average seek time? • Sum all possible seek distances from all possible tracks / # possible • Assumes average seek distance is random • Disk industry standard benchmark • About 10ms (ATA) and 5ms (SCSI) ECE6160:Advanced Computer Networks

  13. Data Rate: Inner vs. Outer Tracks • To keep things simple, orginally kept same number of sectors per track • Since outer track longer, lower bits per inch • Competition  decided to keep BPI the same for all tracks (“constant bit density”)  More capacity per disk  More of sectors per track towards edge  Since disk spins at constant speed, outer tracks have faster data rate • Bandwidth outer track 1.7X inner track! • Inner track highest density, outer track lowest, so not really constant • 2.1X length of track outer / inner, 1.7X bits outer / inner ECE6160:Advanced Computer Networks

  14. Disk Performance Model /Trends • Capacity + 100%/year (2X / 1.0 yrs) • Transfer rate (BW) + 40%/year (2X / 2.0 yrs) • Rotation + Seek time – 8%/ year (1/2 in 10 yrs) • MB/$ > 100%/year (2X / 1.0 yrs) 60GB/$300 => $5/GB =>0.5c/MB ECE6160:Advanced Computer Networks

  15. Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth { per access + per byte State of the Art: Barracuda 180 • 181.6 GB, 3.5 inch disk • 12 platters, 24 surfaces • 24,247 cylinders • 7,200 RPM; • 7.4/8.2 ms avg. seek (r/w) • 64 to 35 MB/s (internal) • 0.1 ms controller time • 10.3 watts (idle) Track Sector Cylinder Track Buffer Arm Platter Head Q:Calculate time to read 64 KB (128 sectors) for Barracuda 180, sector is on outer track. source: www.seagate.com ECE6160:Advanced Computer Networks

  16. Disk Performance Example • Calculate time to read 64 KB (128 sectors) for Barracuda 180 X using advertised performance; sector is on outer track Disk latency = average seek time + average rotational delay + transfer time + controller overhead = 7.4 ms + 0.5 * 1/(7200 RPM) + 64 KB / (64 MB/s) + 0.1 ms = 7.4 ms + 0.5 /(7200 RPM/(60000ms/M)) + 64 KB / (64 KB/ms) + 0.1 ms = 7.4 + 4.2 + 1.0 + 0.1 ms = 12.7 ms ECE6160:Advanced Computer Networks

  17. Areal Density: linear density x track density • Bits recorded along a track • Metric is Bits Per Inch (BPI) along a track • Number of tracks per surface • Metric is Tracks Per Inch (TPI) • Disk Designs Brag about bit density per unit area • Metric is Bits Per Square Inch • Called Areal Density • Areal Density =BPI x TPI ECE6160:Advanced Computer Networks

  18. Areal Density • Areal Density =BPI x TPI • Change slope 30%/yr to 60%/yr about 1991 ECE6160:Advanced Computer Networks

  19. Historical Perspective • 1956 IBM Ramac — early 1970s Winchester • Developed for mainframe computers, proprietary interfaces • Steady shrink in form factor: 27 in. to 14 in • Form factor and capacity drives market, more than performance • 1970s: Mainframes  14 inch diameter disks • 1980s: Minicomputers,Servers  8”,5 1/4” diameter • PCs, workstations Late 1980s/Early 1990s: • Mass market disk drives become a reality • industry standards: SCSI, IPI, IDE • Pizzabox PCs  3.5 inch diameter disks • Laptops, notebooks  2.5 inch disks • Palmtops didn’t use disks, so 1.8 inch diameter disks didn’t make it • 2000s: • 1 inch for cameras, cell phones? ECE6160:Advanced Computer Networks

  20. Disk History Data density Mbit/sq. in. Capacity of Unit Shown Megabytes 1973: 1. 7 Mbit/sq. in 140 MBytes 1979: 7. 7 Mbit/sq. in 2,300 MBytes source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”

  21. Disk History 1989: 63 Mbit/sq. in 60,000 MBytes 1997: 1450 Mbit/sq. in 2300 MBytes 1997: 3090 Mbit/sq. in 8100 MBytes source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even mroe data into even smaller spaces”

  22. 1 inch disk drive! • 2003 IBM MicroDrive: • 1.7” x 1.4” x 0.2” • 1 GB, 3600 RPM, 13.3 MB/s, 15 ms seek • Digital camera, PalmPC? • 2003 USB mini flash drive • 75mmx28mmx12mm, 17g • 256MB • 0.5-1MB/s • 2006 MicroDrive? • 9 GB, 50 MB/s! • Assuming it finds a niche in a successful product • Assuming past trends continue

  23. Seagate Cheetah 3146807LW Hitachi TravelStar 7K60 IBM Microdrive (07N5574) Dimension 3.5”, 1.5 lbs 2.5”, 0.2 lbs 1.7”, 0.6 lbs Interface Ultra 320 SCSI ATA-100 (Ultra) CF+ Capacity 146.8GB 60GB 1GB Data Transfer Rate 320MB/s 100MB/s 13.3MB/s Average seek time 4.7ms 10ms 15ms Spindle Speed 10000RPM 7200RPM 3600RPM Buffer 8MB 8MB 128KB Price (warehouse.com) $899.95 $299.95 $319.95 Disk Characteristics in 2003

  24. Seagate Cheetah ST3146707LW Hitachi TravelStar 7K60 Seagate Barracuda ES ST3750640NS Dimension 3.5” 2.5”, 0.2 lbs 3.5” Interface Ultra 320 SCSI ATA-100 (Ultra) Serial ATA-300 Capacity 146.8GB 60GB 750GB Data Transfer Rate 320MB/s 100MB/s Average seek time 4.7ms 10ms Spindle Speed 10000RPM 7200RPM 7200RPM Buffer 8MB 8MB 16MB Price (warehouse.com) $259 $93.99 (80GB) $319.95 Disk Characteristics in 2007

  25. Today’s Hard Drive (09/29/2008)(warehouse.com) • Seagate Barracuda 7200.11 - hard drive - 750 GB - SATA-300 Hard drive - 750 GB - internal - 3.5" - SATA-300 - 7200 rpm - buffer: 32 MB, $116 • Seagate FreeAgent Pro - hard drive - 500 GB - FireWire/Hi-Speed USB/eSATA Hard drive – 500 GB – external – 3.5” – USB 2.0, eSATA-300, FireWire interfaces – 7200 rpm – buffer: 32 MB, $100 ECE6160:Advanced Computer Networks

  26. Fallacy: Use Data Sheet “Average Seek” Time • Manufacturers needed standard for fair comparison (“benchmark”) • Calculate all seeks from all tracks, divide by number of seeks => “average” • Real average would be based on how data laid out on disk, where seek in real applications, then measure performance • Usually, tend to seek to tracks nearby, not to random track • Rule of Thumb: observed average seek time is typically about 1/4 to 1/3 of quoted seek time (i.e., 3X-4X faster) • Barracuda 180 X avg. seek: 7.4 ms  2.5 ms ECE6160:Advanced Computer Networks

  27. Fallacy: Use Data Sheet Transfer Rate • Manufacturers quote the speed off the data rate off the surface of the disk. • Sectors contain an error detection and correction field (can be 20% of sector size) plus sector number as well as data • There are gaps between sectors on track • Rule of Thumb: disks deliver about 3/4 of internal media rate (1.3X slower) for data • For example, Barracuda 180X quotes 64 to 35 MB/sec internal media rate  47 to 26 MB/sec external data rate (74%) ECE6160:Advanced Computer Networks

  28. Disk Performance Example: revision • Calculate time to read 64 KB for UltraStar 72 again, this time using 1/3 quoted seek time, 3/4 of internal outer track bandwidth; (12.7 ms before) Disk latency = average seek time + average rotational delay + transfer time + controller overhead = (0.33 * 7.4 ms) + 0.5 * 1/(7200 RPM) + 64 KB / (0.75 * 65 MB/s) + 0.1 ms = 2.5 ms + 0.5 /(7200 RPM/(60000ms/M)) + 64 KB / (47 KB/ms) + 0.1 ms = 2.5 + 4.2 + 1.4 + 0.1 ms = 8.2 ms (64% of 12.7) ECE6160:Advanced Computer Networks

  29. Future Disk Size and Performance • Continued advance in capacity (60%/yr) and bandwidth (40%/yr). • Slow improvement in seek, rotation (8%/yr). • Time to read whole disk. Year Sequentially Randomly (1 sector/seek) 1990 4 minutes 6 hours 2000 12 minutes 1 week(!) • 3.5” form factor make sense in 5 yrs? • What is capacity, bandwidth, seek time, RPM? • Assume today 80 GB, 100 MB/sec, 6 ms, 10000 RPM • 838GB, 537MB/s, 4ms, 15000RPM ECE6160:Advanced Computer Networks

  30. What about FLASH • Compact Flash Cards • Intel Strata Flash • 16 Mb in 1 square cm. (.6 mm thick) • 100,000 write/erase cycles. • Standby current = 100uA, write = 45mA • Compact Flash 256MB~=$120 512MB~=$542 • Transfer @ 3.5MB/s • IBM Microdrive 1G~370 • Standby current = 20mA, write = 250mA • Efficiency advertised in wats/MB • VS. Disks • Nearly instant standby wake-up time • Random access to data stored • Tolerant to shock and vibration (1000G of operating shock) ECE6160:Advanced Computer Networks

  31. Disk Operations • Seek: move head to track; • Rotation: wait for sector under head; • Transfer:Move data to/from disks; • Overhead: • Controller delay • Queuing delay Access time = seek time + rotational delay + transfer time+overhead ECE6160:Advanced Computer Networks

  32. Improving disk performance. • Use large sectors to improve bandwidth • Use track caches and read ahead; • Read entire track into on-controller cache • Exploit locality (improves both latency and BW) • Design file systems to maximize locality • Allocate files sequentially on disks (exploit track cache) • Locate similar files in same cylinder (reduce seeks) • Locate simlar files in near-by cylinders (reduce seek distance) • Pack bits closer together to improve transfer rate and density. • Use a collection of small disks to form a large, high performance one--->disk array Stripping data across multiple disks to allow parallel I/O, thus improving performance. ECE6160:Advanced Computer Networks

  33. Use Arrays of Small Disks? • Katz and Patterson asked in 1987: • Can smaller disks be used to close gap in performance between disks and CPUs? Conventional: 4 disk designs 3.5” 5.25” 10” 14” High End Low End Disk Array: 1 disk design 3.5” ECE6160:Advanced Computer Networks

  34. Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks) IBM 3390K 20 GBytes 97 cu. ft. 3 KW 15 MB/s 600 I/Os/s 250 KHrs $250K x70 23 GBytes 11 cu. ft. 1 KW 110 MB/s 3900 IOs/s ??? Hrs $150K IBM 3.5" 0061 320 MBytes 0.1 cu. ft. 11 W 1.5 MB/s 55 I/Os/s 50 KHrs $2K Capacity Volume Power Data Rate I/O Rate MTTF Cost 9X 3X 8X 6X Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

  35. Array Reliability • MTTF: Mean Time To Failure: average time that a non - repairable component will operate before experiencing failure. • Reliability of N disks = Reliability of 1 Disk ÷N • 50,000 Hours ÷ 70 disks = 700 hours • Disk system MTTF: Drops from 6 years to 1 month! • Arrays without redundancy too unreliable to be useful! • Solution: redundancy. ECE6160:Advanced Computer Networks

  36. Redundant Arrays of (Inexpensive) Disks • Replicate data over several disks so that no data will be lost if one disk fails. • Redundancy yields high data availability • Availability: service still provided to user, even if some components failed • Disks will still fail • Contents reconstructed from data redundantly stored in the array  Capacity penalty to store redundant info  Bandwidth penalty to update redundant info ECE6160:Advanced Computer Networks

  37. ECE6160:Advanced Computer Networks

  38. Levels of RAID • Original RAID paper described five categories (RAID levels 1-5). (Patterson et al, “A case for redundant arrays of inexpensive disks (RAID)”, ACM SIGMOD, 1988) • Disk striping with no redundant now is called RAID0 or JBOD(Just a bunch of disks). • Other kinds have been proposed in literature, Level 6 (P+Q Redundancy), Level 10, RAID53, etc. • Except RAID0, all the RAID levels trade disk capacity for reliability, and the extra reliability makes parallism a practical way to improve performance. ECE6160:Advanced Computer Networks

  39. file data block 0 block 1 block 2 block 3 Disk 1 Disk 0 Disk 2 Disk 3 RAID 0: Nonredundant (JBOD) • High I/O performance. • Data is not save redundantly. • Single copy of data is striped across multiple disks. • Low cost. • Lack of redundancy. • Least reliable: single disk failure leads to data loss. ECE6160:Advanced Computer Networks

  40. RAID 0: Striped Disk Array without Fault Tolerance RAID Level 0 requires a minimum of 2 drives to implement ECE6160:Advanced Computer Networks

  41. Redundant Arrays of Inexpensive DisksRAID 1: Disk Mirroring/Shadowing recovery group • • Each disk is fully duplicated onto its “mirror” • Very high availability can be achieved • • Bandwidth sacrifice on write: • Logical write = two physical writes • • Reads may be optimized, minimize the queue and disk search time • • Most expensive solution: 100% capacity overhead Targeted for high I/O rate , high availability environments

  42. RAID 1: Mirroring and Duplexing For Highest performance, the controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair. RAID Level 1 requires a minimum of 2 drives to implement ECE6160:Advanced Computer Networks

  43. block 0 block 1 block 2 block 3 P(0-3) block 4 block 5 block 6 block 7 P(4-7) block 9 block 10 block 11 block 8 P(8-11) block 13 block 15 block 14 P(12-15) block 12 RAID 4: Block Interleaved Parity • Blocks: striping units • Allow for parallel access by multiple I/O requests, high I/O rates • Doing multiple small reads is now faster than before. (allows small read requests to be restricted to a single disk). • Large writes(full stripe), update the parity: • P’ = d0’ + d1’ + d2’ + d3’; • Small writes(eg. write on d0), update the parity: • P = d0 + d1 + d2 + d3 • P’ = d0’ + d1 + d2 + d3 = P + d0’ + d0; • However, writes are still very slow since the parity • disk is the bottleneck.

  44. RAID 4: Independent Data disks with shared Parity disk Each entire block is written onto a data disk. Parity for same rank blocks is generated on Writes, recorded on the parity disk and checked on Reads. RAID Level 4 requires a minimum of 3 drives to implement ECE6160:Advanced Computer Networks

  45. D0 D1 D2 D3 P D7 P D4 D5 D6 Inspiration for RAID 5 • RAID 4 works well for small reads • Small writes (write to one disk): • Option 1: read other data disks, create new sum and write to Parity Disk • Option 2: since P has old sum, compare old data to new data, add the difference to P • Small writes are limited by Parity Disk: Write to D0, D5 both also write to P disk. Parity disk must be updated for every write operation!

  46. Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity Increasing Logical Disk Addresses D0 D1 D2 D3 P Independent writes possible because of interleaved parity D4 D5 D6 P D7 D8 D9 P D10 D11 D12 P D13 D14 D15 Example: write to D0, D5 uses disks 0, 1, 3, 4 P D16 D17 D18 D19 D20 D21 D22 D23 P . . . . . . . . . . . . . . . Disk Columns

  47. block 0 block 2 block 3 P(0-3) block 1 block 5 block 4 block 6 P(4-7) block 7 block 9 block 10 block 11 P(8-11) block 8 block 12 P(12-15) block 13 block 14 block 15 P(16-19) block 16 block 17 block 18 block 19 RAID 5: Block Interleaved Distributed-Parity Left Symmetric Distribution • Parity disk = (block number/4) mod 5 • Eliminate the parity disk bottleneck of RAID 4 • Best small read, large read and large write performance • Can correct any single self-identifying failure • Small logical writes take two physical reads and two • physical writes. • Recovering needs reading all nonfailed disks

  48. RAID 5: Independent Data disks with distributed parity blocks Each entire data block is written on a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads. RAID Level 5 requires a minimum of 3 drives to implement ECE6160:Advanced Computer Networks

  49. Comparison of RAID Levels (N disks, each with capacity of C)

  50. Other RAIDs • HP AutoRAID • AFRAID • RAPID • SwiftRAID • TickerTAIP • SMDA • …… ECE6160:Advanced Computer Networks

More Related