
Reliability of MEMS-Based Storage Enclosures

Bo Hong, Thomas J. E. Schwarz, S.J.*, Scott A. Brandt, Darrell D. E. Long. Storage Systems Research Center, University of California, Santa Cruz. *Also Santa Clara University, Santa Clara, CA.


Presentation Transcript


  1. Reliability of MEMS-Based Storage Enclosures • Bo Hong, Thomas J. E. Schwarz, S.J.*, Scott A. Brandt, Darrell D. E. Long • Storage Systems Research Center, University of California, Santa Cruz • *Also Santa Clara University, Santa Clara, CA

  2. MEMS Storage Technology • Micro-Electro-Mechanical Systems (MEMS) storage • A promising alternative secondary storage technology • Hardware research: IBM, HP, CMU, Nanochip • Magnetic storage, but very different mechanics

  3. MEMS Storage Technology • MEMS-based storage vs. magnetic disk • Provides non-volatile storage, too • Delivers 10× faster access time (< 1 ms) • Delivers higher bandwidth (100 MB/s – 1 GB/s) • Small (about the size of a penny) • Consumes 100× less power • Costs ~10 USD per device • Expected to be more reliable • Stores a limited amount of data per device (3–10 GB) • A serious alternative to disk drives, in particular for mobile computing applications

  4. Reliability Implication of MEMS-based Storage • Storage systems built from MEMS-based storage … • Require more MEMS devices • At least 10 times the number of disks to meet capacity requirements • Require more connection components • Reliability implication • More components, hence (?) lower reliability

  5. MEMS Storage Enclosure • Our proposal: MEMS enclosures • A device with dozens of MEMS devices • Single interface to the rest of the system • Might be serviceable, but service calls during the economic lifetime should be very rare [Figure: MEMS enclosure with a single interface]

  6. MEMS Storage Enclosures • Reliability is an issue: • MTTF of 1–2 years without redundant data storage • Uses RAID Level 5 technology with distributed sparing • Additional k spares • Calls for service when necessary • i.e., when we run out of spares • Organization and number of spares can • Decrease the data recovery time and thus improve reliability • Reduce human interference • No servicing errors • Reduce maintenance costs

  7. MEMS Enclosure Reliability • Measure MTBF for enclosures • Without replacing spares • With replacing spares (service calls) • Determine number of failures that trigger a service call • Mandatory replacement: no redundancy left • Preventive replacement: no spare left

  8. MEMS Enclosure Reliability without Replacement [Figure: reliability curves labeled by MTTF: no spare 2.3 yrs, 1 spare 3.5 yrs, 2 spares 4.6 yrs, 3 spares 5.8 yrs, 4 spares 6.9 yrs, 5 spares 8.1 yrs, disk 11.5 yrs, disk 23 yrs] • MTTF_DISK = 11.5 or 23 yrs • MTTF_MEMS = 23 yrs • 19 data + 1 parity + k dedicated spares • 15-minute data recovery • MTTF is not enough to measure reliability of enclosures without repairs • Instead: focus on data reliability during the economic lifetimes (3–5 years) of enclosures
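The MTTF labels on slide 8 can be approximated with a back-of-the-envelope calculation. The sketch below is not taken from the slides; it assumes exponentially distributed device lifetimes, idle spares that do not fail, and a negligible chance of a second failure during the 15-minute recovery. Under those assumptions the enclosure sees k + 1 failure intervals with all 20 devices active, then one final interval with 19 devices. The function name `enclosure_mttf` is hypothetical.

```python
def enclosure_mttf(k: int, n: int = 20, device_mttf: float = 23.0) -> float:
    """Approximate MTTF (years) of an n-device RAID-5 group with k dedicated spares.

    Assumptions (not from the slides): exponential lifetimes, idle spares
    never fail, and the short rebuild window is ignored.
    """
    # k failures are absorbed by spares and one more puts the group in
    # degraded mode: k + 1 intervals with n active devices each.
    full_strength = (k + 1) * device_mttf / n
    # In degraded mode n - 1 devices remain; the next failure loses data.
    degraded = device_mttf / (n - 1)
    return full_strength + degraded


for k in range(6):
    print(f"{k} spare(s): {enclosure_mttf(k):.2f} years")
```

With the slide's parameters (19 data + 1 parity, MTTF_MEMS = 23 years) this approximation lands within a few hundredths of a year of the labels on the figure, which suggests the labels are dominated by the same effect, although the authors' model may include details this sketch omits.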

  9. MEMS Enclosures with Replacement • Markov model for a MEMS enclosure with N data devices, one parity device, and one dedicated spare device [Figure: state diagram with preventive and mandatory replacement transitions] • States: N – Normal; D – Degraded; DL – Data Loss • 1/λ – MTTF_MEMS (in tens of years) • 1/µ – Mean Time Between Recovery (in minutes) • 1/ν – Mean Time Between Replacement (in days or weeks) • Preventive and mandatory replacement
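To show how such a Markov model yields a mean time to data loss, here is a minimal sketch of the classic three-state RAID-5 chain (Normal, Degraded, Data Loss). It is deliberately simpler than the model on slide 9: it has no spare states and no replacement rate, so it only illustrates the technique. The failure rate `lam` and recovery rate `mu` are per device and per year, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8766.0  # 365.25 days


def mttdl_closed_form(n: int, lam: float, mu: float) -> float:
    """Textbook mean time to data loss for an n-device single-parity group."""
    return ((2 * n - 1) * lam + mu) / (n * (n - 1) * lam ** 2)


def mttdl_first_step(n: int, lam: float, mu: float) -> float:
    """Same quantity from first-step equations on the chain.

    T_N = 1/a + T_D                 with a = n * lam
    T_D = 1/b + (mu / b) * T_N      with b = (n - 1) * lam + mu
    """
    a = n * lam
    b = (n - 1) * lam + mu
    # Substitute T_D into the first equation and solve for T_N.
    return (1 / a + 1 / b) / (1 - mu / b)


n = 20                      # 19 data + 1 parity, as on slide 8
lam = 1 / 23.0              # failures per device-year (MTTF_MEMS = 23 years)
mu = HOURS_PER_YEAR * 4     # recoveries per year for a 15-minute recovery
print(f"closed form: {mttdl_closed_form(n, lam, mu):.0f} years")
print(f"first step:  {mttdl_first_step(n, lam, mu):.0f} years")
```

The two functions agree, which is a useful sanity check before extending the chain with spare and replacement states as the slide's model does.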

  10. MEMS Enclosure Reliability with Replacement [Figure: reliability curves for 1, 2, and 3 spares under preventive + mandatory replacement and under mandatory replacement only, with a no-spare baseline] • Preventive replacement increases reliability and reduces replacement urgency

  11. MEMS Enclosure Reliability • Dedicated Sparing • Rebuild all data from a failed MEMS device onto a single spare MEMS device • Distributed Sparing • Every device contains • Client data • Parity data • Spare space

  12. Distributed Sparing [Menon and Mattson 1992] [Figure: data layout before failure and after MEMS 4 fails] • Shorter data recovery time • More devices can fail
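The shorter recovery time under distributed sparing comes from spreading rebuild writes over all surviving devices. The sketch below uses a hypothetical rotated layout (one parity block and one spare block per stripe, both shifting by one device per stripe); the exact layout in the slide's figure or in Menon and Mattson may differ. The names `role` and `rebuild_targets` are illustrative.

```python
from collections import Counter


def role(stripe: int, device: int, n: int) -> str:
    """Role of a block in an assumed rotated layout: D(ata), P(arity), S(pare)."""
    parity_dev = (n - 1 - stripe) % n
    spare_dev = (parity_dev + 1) % n
    if device == parity_dev:
        return "P"
    if device == spare_dev:
        return "S"
    return "D"


def rebuild_targets(failed: int, n: int, stripes: int) -> Counter:
    """Count how many rebuilt blocks each surviving device receives."""
    writes = Counter()
    for s in range(stripes):
        if role(s, failed, n) == "S":
            continue  # only spare space was lost in this stripe
        spare_dev = next(d for d in range(n) if role(s, d, n) == "S")
        writes[spare_dev] += 1
    return writes


n, stripes, failed = 6, 12, 3
for s in range(n):
    print(" ".join(role(s, d, n) for d in range(n)))
targets = rebuild_targets(failed, n, stripes)
print(dict(targets))
```

In this layout every surviving device absorbs an equal share of the rebuilt blocks, whereas dedicated sparing would send all of them to one spare device.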

  13. Reliability Comparison: Dedicated Sparing vs. Distributed Sparing [Figure: reliability curves for dedicated sparing with 1 and 2 spares under preventive + mandatory replacement and under mandatory replacement only, with a no-spare baseline; compare with the following slide]

  14. Reliability Comparison: Dedicated Sparing vs. Distributed Sparing [Figure: reliability curves for 1 and 2 spares; separate distributed and dedicated curves under preventive + mandatory replacement, a shared dedicated & distributed curve under mandatory replacement, and a no-spare baseline] • Distributed sparing is only better at short replacement times when using preventive replacement

  15. Durability of MEMS Storage Enclosures • All about economy • How long can MEMS enclosures work without repairs? • How often do they need repairing in the first 3–5 years? • How do replacement policies affect maintenance frequency? • Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m >= 0): • (m + 1) × k, under the preventive replacement policy • (m + 1) × (k + 1), under the mandatory replacement policy
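The two expressions on slide 15 can be tabulated directly. This is a small sketch; the function name `tolerated_failures` and the policy strings are hypothetical labels for the slide's two policies.

```python
def tolerated_failures(k: int, m: int, policy: str) -> int:
    """Failures tolerated before the (m+1)th repair, for an enclosure with k spares."""
    if k < 0 or m < 0:
        raise ValueError("k and m must be non-negative")
    if policy == "preventive":
        # Service is called as soon as the spares run out.
        return (m + 1) * k
    if policy == "mandatory":
        # Service is called only once the redundancy is used up as well.
        return (m + 1) * (k + 1)
    raise ValueError(f"unknown policy: {policy!r}")


k = 4
for m in range(3):
    p = tolerated_failures(k, m, "preventive")
    q = tolerated_failures(k, m, "mandatory")
    print(f"repair #{m + 1}: preventive {p}, mandatory {q}")
```

For an enclosure with four spares, the second repair (m = 1) is reached after 8 failures under preventive replacement and after 10 under mandatory replacement.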

  16. Durability of MEMS Storage Enclosures [Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t]; curves for no failure and 1, 2, 4, 6, 8, and 10 failures, plus a disk with a 23-year MTTF] • First-year survivability: 95.7% for a disk vs. 98.8% for a MEMS enclosure with two spares • Chance that a MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
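The first-year figures on slide 16 are consistent with a simple Poisson failure count. The sketch below is an assumption, not the authors' stated method: it treats device failures as a Poisson process with a constant rate of 20 active devices, each with a 23-year MTTF, and counts an enclosure with two spares and one parity device as surviving up to three failures. The helper name `prob_at_most` is hypothetical.

```python
import math


def prob_at_most(j: int, rate: float, t: float) -> float:
    """P(at most j events in (0, t]) for a Poisson process with the given rate."""
    mean = rate * t
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(j + 1))


DEVICE_MTTF = 23.0   # years, from slide 8
N_ACTIVE = 20        # 19 data + 1 parity, assumed always active

# A single disk survives the first year only if it has no failure.
disk_first_year = prob_at_most(0, 1 / DEVICE_MTTF, 1.0)
# Two spares plus one parity device: data survives up to three failures.
enclosure_first_year = prob_at_most(3, N_ACTIVE / DEVICE_MTTF, 1.0)

print(f"disk:      {disk_first_year:.3f}")
print(f"enclosure: {enclosure_first_year:.3f}")
```

Under these assumptions the results come out close to the slide's 95.7% and 98.8%. Applying the same model to the five-year service probabilities gives values in the same range as the slide's 3.5% and 0.6% but not identical, so the authors' model likely differs in detail.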

  17. Related Work • MEMS-based storage technology development • IBM, HP, CMU CHI2PS, Nanochip • Digital Micromirror Devices by TI • Reported Mean Time Between Failure: 650,000 hours [Douglass] • RAID reliability • Dedicated sparing [Dunphy et al.] • Distributed sparing [Menon and Mattson] • Parity sparing [Reddy and Banerjee] • Disk failure prediction • S.M.A.R.T (Self-Monitoring Analysis and Reporting Technology)

  18. Summary • Reliability of MEMS storage enclosures • Can be more reliable than disks even without failed device replacement • Highly reliable when using preventive replacement • Dedicated sparing and distributed sparing provide comparable or almost identical reliability • Economy of MEMS storage enclosures • Preventive replacement trades more maintenance services for higher reliability

  19. Thank You! • Acknowledgements • Dave Nagle, Greg Ganger, CMU PDL • The rest of the UCSC SSRC • More information: • http://ssrc.cse.ucsc.edu • http://ssrc.cse.ucsc.edu/mems.shtml • Questions?

  20. Backup Slides

  21. MEMS Storage Technology • Micro-Electro-Mechanical Systems (MEMS) storage • A promising alternative secondary storage technology • Hardware Research: IBM, HP, CMU, Nanochip • Radical differences between MEMS storage and magnetic disk technologies

  22. MEMS Storage Device Characteristics • Physical size: 1–2 cm² • Recording density: 250–750 Gb/in² [Figure: Predicted Performance in 2005; throughput (1–7 GB/s) vs. access latency (1 ns – 10 ms) for DRAM (0.5–2 GB, $100–$200/GB), MEMS (3–10 GB, $5–$50/GB), and disk (100–500 GB, $1–$2/GB)]

  23. MEMS Storage Device [Figure: MEMS storage device, with labels for the spring and the X and Y axes]

  24. Durability of MEMS Storage Enclosures [Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t]; curves for no failure and 1, 2, 4, 6, 8, and 10 failures, plus a disk with a 23-year MTTF]
