Reliability of MEMS-Based Storage Enclosures • Bo Hong, Thomas J. E. Schwarz, S. J.*, Scott A. Brandt, Darrell D. E. Long • Storage Systems Research Center, University of California, Santa Cruz • *Also Santa Clara University, Santa Clara, CA
MEMS Storage Technology • Micro-Electro-Mechanical Systems (MEMS) storage • A promising alternative secondary storage technology • Hardware research: IBM, HP, CMU, Nanochip • Magnetic storage, but with very different mechanics
MEMS Storage Technology • MEMS-based storage vs. magnetic disk • Also provides non-volatile storage • Delivers 10× faster access time (< 1 ms) • Delivers higher bandwidth (100 MB/s – 1 GB/s) • Small (the size of a penny) • Consumes 100× less power • Costs ~10 USD per device • Expected to be more reliable • Stores a limited amount of data per device (3–10 GB) • A serious alternative to disk drives, in particular for mobile computing applications
Reliability Implications of MEMS-based Storage • Storage systems built from MEMS-based storage … • Require more MEMS devices • At least 10 times as many devices as disks to meet capacity requirements • Require more connection components • Reliability implication • More components, hence (?) lower reliability
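The "more devices, lower reliability" intuition can be made concrete with a little arithmetic. This minimal sketch assumes independent, exponentially distributed device lifetimes, the per-device capacities from the performance slide (disk 100 GB, MEMS 3–10 GB), and MTTF_MEMS = 23 years as used later in the deck; the function names are hypothetical. Without redundancy any single failure loses data, so the failure rates of n devices add.

```python
import math

def devices_needed(capacity_gb, per_device_gb):
    """Smallest number of devices whose combined capacity reaches capacity_gb."""
    return math.ceil(capacity_gb / per_device_gb)

def mttf_no_redundancy(n_devices, mttf_device_years):
    """MTTF of n independent devices where any single failure loses data.

    Assumes exponential lifetimes, so the n failure rates (1 / MTTF each) add.
    """
    return mttf_device_years / n_devices

if __name__ == "__main__":
    # 1 TB: 100 GB disks vs. 10 GB (best case) and 3 GB (worst case) MEMS devices.
    print(devices_needed(1000, 100), devices_needed(1000, 10), devices_needed(1000, 3))  # 10 100 334
    # 20 MEMS devices with MTTF_MEMS = 23 years and no redundancy.
    print(mttf_no_redundancy(20, 23.0))  # 1.15
```

With 20 MEMS devices and no redundancy this gives 1.15 years, in line with the 1–2 year MTTF quoted for enclosures without redundant data storage.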
MEMS Storage Enclosure • Our proposal: MEMS enclosures • A device containing dozens of MEMS devices • Single interface to the rest of the system • Might be serviceable, but service calls during the economic lifetime should be very rare
MEMS Storage Enclosures • Reliability is an issue: • MTTF of 1–2 years without redundant data storage • Uses RAID Level 5 technology with distributed sparing • Additional k spares • Calls for service when necessary • i.e. when we run out of spares • Organization and number of spares can • Decrease the data recovery time and thus improve reliability • Reduce human intervention • Fewer servicing errors • Reduce maintenance costs
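RAID Level 5 protects each stripe with one parity block, the byte-wise XOR of its data blocks, so any single lost block can be recomputed from the survivors. A minimal sketch of one stripe, using made-up 4-byte blocks and the 19 data + 1 parity layout that appears later in the deck (rotation of parity across devices is omitted for brevity):

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity function)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

# One stripe of a 19 data + 1 parity array with tiny, illustrative blocks.
data = [bytes((7 * d + i) % 256 for i in range(4)) for d in range(19)]
parity = xor_blocks(data)

# Device 5 fails: XOR of the surviving data blocks and the parity rebuilds its block.
failed = 5
survivors = [blk for d, blk in enumerate(data) if d != failed]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[failed])
```

A second failure in the same stripe before the rebuild completes leaves two unknowns and one equation, which is why spares and short recovery times matter.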
MEMS Enclosure Reliability • Measure MTBF for enclosures • Without replacing spares • With replacing spares (service calls) • Determine the number of failures that trigger a service call • Mandatory replacement: no redundancy left • Preventive replacement: no spare left
MEMS Enclosure Reliability without Replacement • [Figure: MTTF by configuration: no spare 2.3 yrs; 1 spare 3.5 yrs; 2 spares 4.6 yrs; 3 spares 5.8 yrs; 4 spares 6.9 yrs; 5 spares 8.1 yrs; single disk 11.5 yrs and 23 yrs] • MTTF_DISK = 11.5 or 23 yrs • MTTF_MEMS = 23 yrs • 19 data + 1 parity + k dedicated spares • 15-minute data recovery • MTTF alone is not enough to measure the reliability of enclosures without repairs • Instead: focus on data reliability during the economic lifetime (3–5 years) of an enclosure
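The MTTF figures above can be approximated with a simple sum of mean inter-failure times. This is a sketch under stated assumptions, not the authors' model: 20 active devices (19 data + 1 parity), independent exponential lifetimes, idle dedicated spares that never fail, and the 15-minute rebuild treated as instantaneous. The first k failures are absorbed by spares, the (k+1)-th leaves the array degraded, and the next failure loses data.

```python
import math

def enclosure_mttf(k_spares, n_active=20, mttf_mems=23.0):
    """Approximate MTTF (years) of an enclosure with k dedicated spares and no replacement.

    Assumptions (sketch only): independent exponential lifetimes, idle spares
    never fail, rebuild onto a spare is instantaneous.
    """
    lam = 1.0 / mttf_mems
    # k + 1 failures among n_active devices, each with mean gap 1 / (n_active * lam).
    until_degraded = (k_spares + 1) / (n_active * lam)
    # Degraded: any failure among the n_active - 1 survivors loses data.
    until_loss = 1.0 / ((n_active - 1) * lam)
    return until_degraded + until_loss

for k in range(6):
    # Truncate to one decimal place for comparison with the slide.
    print(k, math.floor(enclosure_mttf(k) * 10) / 10)
```

Truncated to one decimal place, this approximation yields 2.3, 3.5, 4.6, 5.8, 6.9 and 8.1 years for 0 to 5 spares, the same values as the figure; each extra spare adds only about 23/20 = 1.15 years.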
MEMS Enclosures with Replacement • Markov model for a MEMS enclosure with N data, one parity, and one dedicated spare device • [Figure: state diagram with preventive replacement and mandatory replacement transitions] • N – Normal; D – Degraded; DL – Data Loss • 1/λ – MTTF_MEMS (in tens of years) • 1/µ – Mean data recovery time (in minutes) • Mean time to replacement (in days, weeks) • Preventive and mandatory replacement
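The slide's state diagram did not survive extraction, so the chain below is a hypothetical reconstruction, not the authors' exact model. States: S0 (normal, spare available), R1 (rebuilding onto the spare), S1 (normal, no spare left), D1 (degraded, no spare), plus an absorbing data-loss state. It assumes n active devices with failure rate λ each, rebuild rate µ, and a replacement rate called `nu` here (a hypothetical name) that restores all failed devices, with any post-replacement rebuild folded into it. Preventive replacement is requested as soon as the spare is used (S1); mandatory replacement only when no redundancy is left (D1). The mean time to data loss t solves (−Q_T) t = 1 over the transient states.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mttdl(n, lam, mu, nu, preventive):
    """Mean time to data loss from S0 in a hypothetical 4-state chain (see lead-in)."""
    S0, R1, S1, D1 = range(4)  # normal+spare, rebuilding, normal/no spare, degraded
    rates = {
        (S0, R1): n * lam,          # a device fails; rebuild onto the spare starts
        (R1, S1): mu,               # rebuild finishes; no spare left
        (R1, "DL"): (n - 1) * lam,  # second failure during rebuild
        (S1, D1): n * lam,          # failure with no spare: degraded
        (D1, "DL"): (n - 1) * lam,  # failure while degraded: data loss
        (D1, S0): nu,               # mandatory replacement (no redundancy left)
    }
    if preventive:
        rates[(S1, S0)] = nu        # preventive replacement (no spare left)
    a = [[0.0] * 4 for _ in range(4)]  # a = -Q restricted to the transient states
    for (i, j), r in rates.items():
        a[i][i] += r
        if j != "DL":
            a[i][j] -= r
    return solve(a, [1.0] * 4)[S0]

# Assumed rates per year: 23-year MTTF, 15-minute rebuild, one-week replacement.
lam, mu, nu = 1 / 23, 365 * 24 * 4, 365 / 7
print(mttdl(20, lam, mu, nu, preventive=False), mttdl(20, lam, mu, nu, preventive=True))
```

Setting `nu = 0` (no replacement) reduces the chain to the one-spare case and gives about 3.51 years, consistent with the 3.5 years reported for one spare without replacement; adding the preventive S1 → S0 transition can only lengthen the time to data loss.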
MEMS Enclosure Reliability with Replacement • [Figure: reliability with 1, 2, and 3 spares under preventive + mandatory replacement and under mandatory replacement only, compared with no spare] • Preventive replacement increases reliability and reduces replacement urgency
MEMS Enclosure Reliability • Dedicated Sparing • Rebuild all data from a failed MEMS onto a single spare MEMS • Distributed Sparing • Every device contains • Client data • Parity data • Spare space
Distributed Sparing [Menon and Mattson 1992] • [Figure: data, parity, and spare layout before failure and after MEMS 4 fails] • Shorter data recovery time • More devices can fail
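To illustrate why distributed sparing shortens recovery, here is a minimal sketch of a layout in which parity ("P") and spare ("S") units rotate across all devices, so every device holds client data, parity, and spare space. The rotation rule and function names are hypothetical choices for illustration, not necessarily Menon and Mattson's exact layout. After a failure, each stripe's lost unit is rebuilt into that stripe's spare slot.

```python
def distributed_layout(n_devices, n_stripes):
    """layout[stripe][device]: 'P' (parity), 'S' (spare), or data 'D0', 'D1', ...

    Hypothetical rotation: parity and spare shift one device per stripe.
    """
    rows = []
    for s in range(n_stripes):
        p = (n_devices - 1 - s) % n_devices
        sp = (n_devices - 2 - s) % n_devices
        row, d = [], 0
        for dev in range(n_devices):
            if dev == p:
                row.append("P")
            elif dev == sp:
                row.append("S")
            else:
                row.append(f"D{d}")
                d += 1
        rows.append(row)
    return rows

def rebuild(rows, failed):
    """Rebuild each lost unit into its stripe's spare slot; return layout and writes per device."""
    writes = [0] * len(rows[0])
    rebuilt = []
    for row in rows:
        row = list(row)
        lost, row[failed] = row[failed], None
        if lost != "S":  # a lost spare slot holds nothing to rebuild
            target = row.index("S")
            row[target] = lost
            writes[target] += 1
        rebuilt.append(row)
    return rebuilt, writes

layout = distributed_layout(6, 6)
after, writes = rebuild(layout, failed=4)
print(writes)  # [1, 1, 1, 1, 0, 1]
```

The five rebuild writes are spread over five surviving devices, one each; with a dedicated spare all five would land on a single device, which bounds how fast recovery can proceed.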
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing • [Figure: dedicated sparing with 1 and 2 spares under preventive + mandatory replacement and under mandatory replacement only, compared with no spare] • Compare with the following slide
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing • [Figure: distributed vs. dedicated sparing with 1 and 2 spares under preventive + mandatory replacement, and dedicated & distributed sparing under mandatory replacement, compared with no spare] • Distributed sparing is better only at short replacement times, and only when using preventive replacement
Durability of MEMS Storage Enclosures • All about economy • How long can MEMS enclosures work without repairs? • How often do they need repairing in the first 3–5 years? • How do replacement policies affect maintenance frequency? • Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m ≥ 0): • (m + 1) × k, under the preventive replacement policy • (m + 1) × (k + 1), under the mandatory replacement policy
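The failure thresholds above turn into service-call probabilities once a failure process is assumed. This sketch assumes failures arrive as a Poisson process with constant rate 20 / MTTF_MEMS (19 data + 1 parity, MTTF_MEMS = 23 years), meaning replaced devices restore the active count immediately and the wait for service is ignored; the function names are hypothetical. The (m+1)th repair is needed once the number of failures exceeds the tolerated count.

```python
import math

def failures_tolerated(m, k, policy):
    """Failures tolerated before the (m+1)-th repair is scheduled (slide formulas)."""
    if policy == "preventive":
        return (m + 1) * k
    if policy == "mandatory":
        return (m + 1) * (k + 1)
    raise ValueError(f"unknown policy: {policy}")

def poisson_cdf(x, mean):
    """P(N <= x) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for i in range(1, x + 1):
        term *= mean / i
        total += term
    return total

def prob_repair_scheduled(m, k, policy, years, n_active=20, mttf_mems=23.0):
    """P(the (m+1)-th repair is scheduled within `years`), under the Poisson assumption."""
    mean = n_active * years / mttf_mems
    return 1.0 - poisson_cdf(failures_tolerated(m, k, policy), mean)

# More than one service (m = 1) in five years for an enclosure with four spares.
for policy in ("preventive", "mandatory"):
    print(policy, round(prob_repair_scheduled(1, 4, policy, 5) * 100, 1), "%")
```

This simple model gives roughly 3.4% (preventive) and 0.5% (mandatory), close to, though not identical with, the 3.5% and 0.6% reported for four spares; the sketch ignores details such as the time an enclosure spends waiting for service.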
Durability of MEMS Storage Enclosures • [Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t], for no failure and for 1, 2, 4, 6, 8, and 10 failures, compared with a disk with MTTF 23 yrs] • First-year survivability: 95.7% for a disk vs. 98.8% for a MEMS enclosure with two spares • Chance that a MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
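The first-year survivability figures can be checked with a short calculation. This sketch assumes "survivability" means no data loss, a disk with exponential lifetime and MTTF 23 years, and an enclosure whose failures form a Poisson process at rate 20 / 23 per year (idle spares never fail). With k spares and no replacement, data survive up to k + 1 failures: k are rebuilt onto spares and one more leaves the array degraded. Using 20 active devices throughout is a slight simplification; the function names are hypothetical.

```python
import math

def disk_survival(t_years, mttf_years=23.0):
    """P(a single disk survives to time t), exponential lifetime."""
    return math.exp(-t_years / mttf_years)

def enclosure_survival(t_years, k_spares, n_active=20, mttf_mems=23.0):
    """P(no data loss by t): at most k + 1 failures in a Poisson process (assumption)."""
    mean = n_active * t_years / mttf_mems
    term = math.exp(-mean)
    total = term
    for i in range(1, k_spares + 2):  # sum P(N = i) for i = 0 .. k + 1
        term *= mean / i
        total += term
    return total

print(round(disk_survival(1.0), 3), round(enclosure_survival(1.0, 2), 3))
```

Both values round to the slide's 95.7% and 98.8%, so in the first year the enclosure's four tolerated failures outweigh its 20-fold higher device count.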
Related Work • MEMS-based storage technology development • IBM, HP, CMU CHI²PS, Nanochip • Digital Micromirror Devices by TI • Reported Mean Time Between Failures: 650,000 hours [Douglass] • RAID reliability • Dedicated sparing [Dunphy et al.] • Distributed sparing [Menon and Mattson] • Parity sparing [Reddy and Banerjee] • Disk failure prediction • S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology)
Summary • Reliability of MEMS storage enclosures • Can be more reliable than disks even without replacing failed devices • Highly reliable when using preventive replacement • Dedicated sparing and distributed sparing provide comparable or almost identical reliability • Economy of MEMS storage enclosures • Preventive replacement trades more maintenance services for higher reliability
Thank You! • Acknowledgements • Dave Nagle, Greg Ganger, CMU PDL • The rest of the UCSC SSRC • More information: • http://ssrc.cse.ucsc.edu • http://ssrc.cse.ucsc.edu/mems.shtml • Questions?
MEMS Storage Technology • Micro-Electro-Mechanical Systems (MEMS) storage • A promising alternative secondary storage technology • Hardware Research: IBM, HP, CMU, Nanochip • Radical differences between MEMS storage and magnetic disk technologies
Predicted Performance in 2005 • MEMS Storage Device Characteristics • Physical size: 1–2 cm² • Recording density: 250–750 Gb/in² • [Figure: throughput (1–7 GB/s) vs. access latency (1 ns – 10 ms) for DRAM, MEMS, and disk] • DRAM: 0.5–2 GB, $100–$200/GB • MEMS: 3–10 GB, $5–$50/GB • Disk: 100–500 GB, $1–$2/GB
MEMS Storage Device • [Figure: device schematic with springs and X/Y axes]