Failure Trends in a Large Disk Drive Population

Shiva Srivastava and Vaibhav Rastogi (offense): Failure Trends in a Large Disk Drive Population

Relevance • Work conducted 5-10 years ago • Disk drives have changed • All hard disks were Parallel ATA • Prevalent technology today is SATA • Some aspects not covered • Power cycles

Assumptions • Drives fail independently of each other • Enables AFRs (annualized failure rates)
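For reference, a minimal sketch of how an AFR is usually computed: failures divided by drive-years of exposure. The function name and the numbers are hypothetical and are not taken from the paper. Reading the result as a per-drive failure probability only makes sense if drives fail independently, which is the assumption questioned on this slide.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return the AFR as a fraction: failures per drive-year of exposure.

    Hypothetical helper for illustration; it treats every drive-day as
    equivalent, which implicitly assumes independent failures.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return failures / drive_years


# Made-up example: 30 failures observed over 365,000 drive-days
# (1,000 drive-years of exposure).
afr = annualized_failure_rate(30, 365_000)
print(f"AFR = {afr:.1%}")
```

If failures are correlated (shared rack, shared power, same manufacturing batch), the same ratio can still be computed, but it no longer describes the risk of any single drive.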

Coarse measurements • Definition of failures • Too coarse grained • When do the disks get replaced? • Utilization • Weekly averages • Do not have anything better • Same for temperature

Statistical correctness • What was the size of the fleet? • How does it compare with others? • Patterns like those in Figure 3 may be random • Difference between 2% and 4% is not much
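Whether a 2% versus 4% difference is meaningful depends on how many drives sit behind each number, which is why the fleet size matters. The sketch below uses a Wilson score interval at roughly 95% confidence (z = 1.96) and treats each drive as an independent trial. The population sizes are invented for illustration and are not the paper's.

```python
import math


def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure proportion (independent trials assumed)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical group sizes: the same 2% vs 4% rates at two different scales.
for n in (100, 10_000):
    low_group = wilson_interval(round(0.02 * n), n)
    high_group = wilson_interval(round(0.04 * n), n)
    print(n, low_group, high_group, intervals_overlap(low_group, high_group))
```

With only 100 drives per group the two intervals overlap, so the difference could be noise. With 10,000 drives per group they separate. Overlapping intervals are a rough screen, not a formal hypothesis test.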

Usefulness • No good empirical model • Perhaps the measurements are too coarse • How am I supposed to use them? • Can they be different for different data centers, for different usage patterns?

Improper presentation • Where is the control in Figures 8 and 11? • Why do the confidence levels decrease so much in Figure 11? • Shows there is a lot of variance? Why?

Comparison with others • Your findings do not corroborate manufacturers’ findings • Does it not go against you? • Others have used large numbers of disks • How do you compare with them?
