This paper discusses the reliability implications of Micro-Electro-Mechanical Systems (MEMS) in storage technology. MEMS-based storage offers significant advantages over traditional magnetic disk drives, including faster access times, lower power consumption, and smaller size. However, the increased number of MEMS devices raises concerns about system reliability. We propose MEMS enclosures equipped with multiple MEMS devices to enhance reliability and reduce human intervention in maintenance. By implementing dedicated and distributed sparing strategies, we outline methods to improve data reliability and enhance serviceability while managing the economic lifecycle of these storage systems.
Reliability of MEMS-Based Storage Enclosures. Bo Hong, Thomas J. E. Schwarz, S. J.*, Scott A. Brandt, Darrell D. E. Long. Storage Systems Research Center, University of California, Santa Cruz. *Also Santa Clara University, Santa Clara, CA
MEMS Storage Technology • Micro-Electro-Mechanical Systems (MEMS) storage • A promising alternative secondary storage technology • Hardware Research: IBM, HP, CMU, Nanochip • Magnetic storage, but very different mechanics
MEMS Storage Technology • MEMS-based storage vs. magnetic disk • Also provides non-volatile storage • Delivers 10× faster access time (< 1 ms) • Delivers higher bandwidth (100 MB/s – 1 GB/s) • Small (about the size of a penny) • Consumes 100× less power • Costs ~10 USD per device • Expected to be more reliable • Stores a limited amount of data per device (3–10 GB) • A serious alternative to disk drives, in particular for mobile computing applications
Reliability Implication of MEMS-based Storage • Storage systems built from MEMS-based storage … • Require more MEMS devices • At least 10 times the number of disks to meet capacity requirements • Require more connection components • Reliability implication • More components, hence (?) lower reliability
MEMS Storage Enclosure • Our proposal: MEMS Enclosures • A device with dozens of MEMS devices • Single interface to the rest of the system • Might be serviceable, but service calls during the economic lifetime should be very rare
MEMS Storage Enclosures • Reliability is an issue: • MTTF of 1–2 years without redundant data storage • Uses RAID Level 5 technology with distributed sparing • Additional k spares • Calls for service only when necessary • i.e., when we run out of spares • Organization and number of spares can • Decrease the data recovery time and thus improve reliability • Reduce human intervention • Avoid errors made during servicing • Reduce maintenance costs
MEMS Enclosure Reliability • Measure MTBF for enclosures • Without replacing spares • With replacing spares (service calls) • Determine number of failures that trigger a service call • Mandatory replacement: no redundancy left • Preventive replacement: no spare left
MEMS Enclosure Reliability without Replacement • Figure curves, labeled with MTTF: no spare 2.3 yrs; 1 spare 3.5 yrs; 2 spares 4.6 yrs; 3 spares 5.8 yrs; 4 spares 6.9 yrs; 5 spares 8.1 yrs; disk 11.5 yrs; disk 23 yrs • Parameters: MTTF_DISK = 11.5 or 23 yrs; MTTF_MEMS = 23 yrs; 19 data + 1 parity + k dedicated spares; 15-minute data recovery • MTTF is not enough to measure the reliability of enclosures without repairs • Instead: focus on data reliability during the economic lifetime (3–5 years) of enclosures
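The MTTF labels on this slide can be approximated with simple arithmetic. The sketch below is not the authors' model. It assumes exponential device lifetimes, idle spares that never fail, and a rebuild short enough to ignore (15 minutes against years). The function name and its parameters are illustrative.

```python
def enclosure_mttf_years(spares, active=20, device_mttf=23.0):
    """Mean time to data loss, in years, with no device replacement.

    Assumptions (not from the slides): exponential lifetimes, idle spares
    never fail, and the 15-minute rebuild is short enough to ignore.
    """
    # While a spare is available, each failure hits a full set of `active`
    # devices and is rebuilt at once. That covers `spares` failures, plus the
    # one that uses up the parity redundancy: spares + 1 failures in total.
    full_set_phase = (spares + 1) * device_mttf / active
    # The array then runs degraded on active - 1 devices until the next
    # failure, which causes data loss.
    degraded_phase = device_mttf / (active - 1)
    return full_set_phase + degraded_phase


for k in range(6):
    print(f"{k} spares: {enclosure_mttf_years(k):.2f} years")
```

Under these assumptions the results land within about 0.1 year of the slide's values for zero to five spares, which suggests that each extra spare buys roughly one more "full set" failure interval of 23/20 years.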
MEMS Enclosures with Replacement • Markov model for a MEMS enclosure with N data devices, one parity device, and one dedicated spare device • States: N – Normal; D – Degraded; DL – Data Loss • 1/λ – MTTF_MEMS (in tens of years) • 1/µ – Mean Time Between Recovery (in minutes) • 1/ν – Mean Time Between Replacement (in days, weeks) • Transitions cover preventive and mandatory replacement
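A much simpler chain than the one on this slide shows how the failure and recovery rates interact. Write λ for the per-device failure rate (the reciprocal of MTTF_MEMS) and µ for the recovery rate. The sketch keeps only the states Normal, Degraded, and Data Loss, and assumes a rebuild target is always available, so it omits the spare and replacement states of the slide's model. The closed form is the standard mean first-passage time for this three-state chain, and the function name is illustrative.

```python
def mttdl_years(n_data, device_mttf_years, recovery_minutes):
    """Mean time to data loss for a Normal -> Degraded -> Data Loss chain.

    Simplifying assumption: a rebuild target is always available, so the
    spare pool and the replacement transitions are not modeled.
    """
    lam = 1.0 / device_mttf_years              # per-device failure rate, per year
    mu = 365.25 * 24 * 60 / recovery_minutes   # recovery rate, per year
    n = n_data
    # Normal: any of n + 1 devices (data + parity) may fail.
    # Degraded: any of the n survivors may fail, or the recovery completes.
    return ((2 * n + 1) * lam + mu) / (n * (n + 1) * lam ** 2)


# 19 data devices + 1 parity, 23-year device MTTF, 15-minute recovery.
print(f"{mttdl_years(19, 23.0, 15):.0f} years")
```

Passing `float("inf")` as the recovery time turns recovery off and gives the sum 23/20 + 23/19 years, the case in which the second failure always loses data.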
MEMS Enclosure Reliability with Replacement [Figure: curves labeled Preventive + mandatory (3, 2, 1), Mandatory (3, 2, 1), and No spare; 1, 2, 3 = number of spares] • Preventive replacement increases reliability and reduces replacement urgency
MEMS Enclosure Reliability • Dedicated Sparing • Rebuild all data from a failed MEMS device onto a single spare MEMS device • Distributed Sparing • Every device contains • Client data • Parity data • Spare space
Distributed Sparing [Menon and Mattson 1992] [Figure: layout before failure and after MEMS 4 fails] • Shorter data recovery time • More devices can fail
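The shorter recovery time comes from spreading rebuild writes over all surviving devices. The sketch below illustrates this with a hypothetical rotating layout: one parity unit and one spare unit per stripe, shifted by one device per stripe. This is an assumption chosen for simplicity and not necessarily the exact placement used by Menon and Mattson or by the authors.

```python
from collections import Counter


def stripe_layout(stripe, n_devices):
    """Roles per device for one stripe: 'D' data, 'P' parity, 'S' spare."""
    parity = (n_devices - 1 - stripe) % n_devices  # parity rotates per stripe
    spare = (parity - 1) % n_devices               # spare sits next to parity
    roles = []
    for dev in range(n_devices):
        if dev == parity:
            roles.append("P")
        elif dev == spare:
            roles.append("S")
        else:
            roles.append("D")
    return roles


def rebuild_targets(failed, n_devices, n_stripes):
    """Count the rebuilt units each surviving device receives."""
    writes = Counter()
    for stripe in range(n_stripes):
        roles = stripe_layout(stripe, n_devices)
        if roles[failed] == "S":
            continue  # only unused spare space was lost in this stripe
        writes[roles.index("S")] += 1  # lost unit goes to the stripe's spare
    return writes


# Five devices, five stripes, the fourth device (index 3) fails.
print(dict(rebuild_targets(failed=3, n_devices=5, n_stripes=5)))
```

In this layout each of the four surviving devices receives one rebuilt unit, whereas dedicated sparing would send all of the writes to a single spare device.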
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing [Figure: dedicated sparing; curves labeled Preventive + mandatory (2, 1), Mandatory (2, 1), and No spare; 1, 2 = number of spares] • Compare with the following slide
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing [Figure: curves labeled Distributed and Dedicated (preventive + mandatory), Dedicated & Distributed (mandatory), and No spare; 1, 2 = number of spares] • Distributed sparing is better only at short replacement times when using preventive replacement
Durability of MEMS Storage Enclosures • All about economy • How long can MEMS enclosures work without repairs? • How often do they need repairing in the first 3-5 years? • How do replacement policies affect maintenance frequency? • Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m >= 0): • (m + 1) × k, under the preventive replacement policy • (m + 1) × (k + 1), under the mandatory replacement policy
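The two failure budgets above are easy to tabulate. This is a minimal sketch of the slide's formulas; the function name and the policy strings are illustrative.

```python
def failures_before_repair(m, k, policy):
    """Failures tolerated before the (m+1)th repair, for k spares (m >= 0)."""
    if policy == "preventive":
        # A repair is scheduled once the spares are used up.
        return (m + 1) * k
    if policy == "mandatory":
        # A repair is scheduled once the parity redundancy is used up too.
        return (m + 1) * (k + 1)
    raise ValueError(f"unknown policy: {policy}")


for policy in ("preventive", "mandatory"):
    budget = failures_before_repair(m=1, k=4, policy=policy)
    print(f"{policy}: {budget} failures before the 2nd repair")
```

For four spares, the second repair is scheduled after 8 failures under the preventive policy and after 10 under the mandatory policy.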
Durability of MEMS Storage Enclosures [Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t]; curves labeled No failure and 1, 2, 4, 6, 8, 10 failures, plus Disk 23 Yrs] • First-year survivability: 95.7% for a disk vs. 98.8% for a MEMS enclosure with two spares • Chance that a MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
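The percentages on this slide can be approximated by treating device failures as a Poisson process. This is an assumption of the sketch, not a statement of the authors' method. It assumes 20 active devices at all times (19 data + 1 parity), a 23-year device MTTF, and spares that do not fail while idle. The function name is illustrative.

```python
import math


def prob_at_most(failures, years, active=20, device_mttf=23.0):
    """P(at most `failures` device failures in `years`), Poisson assumption."""
    mean = active * years / device_mttf
    terms = (mean ** i / math.factorial(i) for i in range(failures + 1))
    return math.exp(-mean) * sum(terms)


# A single disk with a 23-year MTTF survives its first year.
disk_first_year = math.exp(-1 / 23.0)
# Two spares plus parity absorb up to three failures without data loss.
enclosure_first_year = prob_at_most(3, 1)
# Four spares: a second service follows more than 8 failures (preventive)
# or more than 10 failures (mandatory).
second_service_preventive = 1 - prob_at_most(8, 5)
second_service_mandatory = 1 - prob_at_most(10, 5)

print(f"disk, first year:      {disk_first_year:.3f}")
print(f"enclosure, first year: {enclosure_first_year:.3f}")
print(f"second service in 5 years, preventive: {second_service_preventive:.3f}")
print(f"second service in 5 years, mandatory:  {second_service_mandatory:.3f}")
```

Under these assumptions the first-year figures match the slide to within rounding, and the five-year service probabilities come out close to the quoted 3.5% and 0.6%.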
Related Work • MEMS-based storage technology development • IBM, HP, CMU CHI2PS, Nanochip • Digital Micromirror Devices by TI • Reported Mean Time Between Failure: 650,000 hours [Douglass] • RAID reliability • Dedicated sparing [Dunphy et al.] • Distributed sparing [Menon and Mattson] • Parity sparing [Reddy and Banerjee] • Disk failure prediction • S.M.A.R.T (Self-Monitoring Analysis and Reporting Technology)
Summary • Reliability of MEMS storage enclosures • Can be more reliable than disks even without failed device replacement • Highly reliable when using preventive replacement • Dedicated sparing and distributed sparing provide comparable or almost identical reliability • Economy of MEMS storage enclosures • Preventive replacement trades more maintenance services for higher reliability
Thank You! • Acknowledgements • Dave Nagle, Greg Ganger, CMU PDL • The rest of the UCSC SSRC • More information: • http://ssrc.cse.ucsc.edu • http://ssrc.cse.ucsc.edu/mems.shtml • Questions?
MEMS Storage Technology • Micro-Electro-Mechanical Systems (MEMS) storage • A promising alternative secondary storage technology • Hardware Research: IBM, HP, CMU, Nanochip • Radical differences between MEMS storage and magnetic disk technologies
Predicted Performance in 2005 • MEMS Storage Device Characteristics • Physical size: 1–2 cm² • Recording density: 250–750 Gb/in² • [Figure: throughput (1–7 GB/s) vs. access latency (1 ns to 10 ms)] DRAM: 0.5–2 GB, $100–$200/GB; MEMS: 3–10 GB, $5–$50/GB; DISK: 100–500 GB, $1–$2/GB
MEMS Storage Device [Figure: labels Spring, X, Y]